Large corporations such as Amazon, eBay, Google, Facebook and LinkedIn are as much data science companies as they are leaders in their specific domains.

The global data science market is projected to be worth $320 billion by 2020, says Graham Williams, a data scientist with the data processing company Togaware and the Australian Taxation Office.

According to McKinsey, given the explosive growth of the sector, the US will face a shortage of over 1.8 lakh (180,000) data scientists by 2018.

Data mining tool

Williams was speaking at a three-day workshop on ‘Data mining and analytics with R’ organised here by the International Centre for Free and Open Source Software. ‘R’ is the most widely used tool for data mining, analytics and statistics globally.

Data mining is the process of excavating data in an attempt to uncover hitherto unknown but useful patterns, particularly in large datasets. It is intended to discover new insights and knowledge and to develop predictive models.

‘R’ is being used in fields as varied as retail, financial services, health research, weather modelling, astronomy, psychology and the social sciences.

As computerisation becomes common in government, enormous volumes of data are being generated.

Significance of open source

Domains such as data mining, analytics and big data, previously the preserve mostly of the IT industry, are becoming increasingly important for governments around the world, Williams said. Open source tools such as ‘R’ are of immense use in this context, given their significant power, very low cost, rapid adoption of new technology, vibrant communities and licence-free regimes.

Massive datasets

Governments are increasingly applying techniques such as data mining, analytics and visualisation to massive datasets to uncover patterns of interest, including fraud and tax evasion.

The Australian government uses ‘R’ for data mining at the Australian Taxation Office, the office of Immigration and Border Control, and Health and Human Services.

As more governments join the open data movement, it is expected that the use of ‘R’ will increase even further.

According to Satish Babu, Director of the host institution, the International Centre for Free and Open Source Software, better training in ‘R’ could help India leverage the potential of this domain.

comment COMMENT NOW