Large corporations such as Amazon, eBay, Google, Facebook and LinkedIn are as much data science companies as they are leaders in their specific domains.

The global data science market is projected to grow to $320 billion by 2020, says Graham Williams, data scientist at data processing company Togaware and at the Australian Taxation Office.

DATA MINING TOOL

According to McKinsey, there will be a shortage of over 1.8 lakh data scientists in the US by 2018, reflecting the explosive growth of the sector.

Williams said this while speaking at a three-day workshop on ‘Data mining and analytics with R’ organised by the International Centre for Free and Open Source Software.

‘R’ is the most widely used data mining and analytics tool globally for statistics and data science.

Data mining is the process of excavating data in an attempt to uncover hitherto unknown but useful patterns, particularly in large datasets.

It is intended to discover new insights and knowledge and to develop predictive models.

WIDESPREAD USE

The ‘R’ tool is used in different disciplines such as retail, financial services, health research, weather modelling, astronomy, psychology, and social sciences.

As computerisation becomes common in governments, enormous volumes of data are generated.

Domains such as data mining, analytics and big data, previously used mostly by the IT industry, are becoming increasingly important for governments around the world, Williams said.

Open source tools such as ‘R’ are of immense use in this context, given their significant power, very low cost, rapid adoption of new technology, vibrant communities, and licence-free regimes.

MASSIVE DATASETS

Governments are increasingly applying tools such as data mining, analytics and visualisation on massive datasets to uncover patterns of interest, including fraud and tax evasion.

The Australian government uses ‘R’ for data mining at the Australian Taxation Office, the office of Immigration and Border Control, and Health and Human Services.

As more governments join the open data movement, the use of ‘R’ is expected to increase even further.

According to Satish Babu, Director of the International Centre for Free and Open Source Software, better training in ‘R’ could help India leverage the potential of this domain.

The workshop attracted the attention of the IT industry, researchers, students and government officers. It will conclude on Thursday.
