Big data refers to the massive volumes of data generated by the increasing use of digital tools and information systems. Data has no meaning without context. Big data analytics is commonly classified into four types: descriptive, diagnostic, predictive and prescriptive.

The benefits of data analytics include a better understanding of the behavioural patterns of consumers and citizens, which helps maximise commercial and policy outcomes; greater proactivity and better anticipation of needs; and tailored policy solutions that reduce inefficiencies.

Its risks include: (a) greater infringement of privacy; (b) difficulties in integrating data, compounded by cultural diversity, reluctance to share data, and the scattering of data across companies, countries, languages and formats; and (c) the need for greater computing power to process massive volumes of data.

The true owner of data is the source that cleans and verifies it so that it stands up to scrutiny. Governments and the public sector treat personal data as a public good and are playing an increasingly active regulatory role, modernising policy frameworks to protect it from unlawful processing. The question of how governments across the world are promoting data analytics, and what answers they have found, is therefore relevant for countries including India.

Global models

In Singapore, open data sharing is a priority area of the Smart Nation vision: a government portal makes data sets from more than 70 agencies public, and data visualisation is actively used. More than 20,000 public sector officers, or 14 per cent of the total, are being trained in data science. Estonia introduced its ‘Digital Agenda 2020’, which promotes the use of smart ICT solutions to improve quality of life and productivity.

How governments both promote and regulate big data is another aspect worth considering. Regulation is strict in the European Union, Singapore and Japan, which have the General Data Protection Regulation or a Personal Data Protection Act, and where the sharing of data is based on deemed consent.

China, on the other hand, has a non-binding Personal Data Protection Guideline, and its regulation is far less strict. The question is which approach is the most efficient and desirable, and here the form that personal data protection takes plays a major role. Overly rigid regulations obstruct policymaking, public policy and healthcare research, and business, while overly loose regulations lead to violations of data privacy. The trade-off between the two is therefore central.

Japan strikes a fine balance in this trade-off between data protection and data promotion. While its Personal Data Protection Act prohibits deemed consent for sensitive information, its Next Generation Healthcare Infrastructure Act allows an anonymisation agency to handle medical information on the basis of deemed consent, and researchers obtain data from that agency.

As per Japan’s Cancer Information Registration Act, hospitals must report anonymised cancer information to the government, which runs a cancer database.

Data philanthropy is also relevant here, and the question is: what is in it for the data providers? By sharing data, providers help solve public problems that existing data sources cannot address, and they align their business and philanthropic activities in ways that benefit both customers and the business.

Data philanthropy can also mitigate potential business risks by contributing to a better-informed policy environment. It generates goodwill, supports community partnerships, provides insights for social good, helps validate internal data and sparks innovation. Broadly, it may be described as the sharing of private data for the public good.

A prominent example is Google Flu Trends, a programme that used search queries to track influenza outbreaks. Google analysed the search data privately but made its findings public to help health providers track flu outbreaks. Similarly, Sesame Credit, an affiliate of China’s Alibaba Group, uses data from Alibaba’s services to compile a credit score. The score is based on social media interactions and on purchases made on Alibaba Group websites or paid for with its affiliate’s mobile wallet.

Sesame Credit scores are used to facilitate bike-sharing, power bank and small loan services. The benefits include lowering the cost of trust, extending credit services to more Chinese citizens and helping build a trustworthy society.

Another noteworthy example is Japan’s use of ‘disaster big data’, which helped it rebuild after an earthquake and tsunami. It helped Japan answer questions such as: ‘How many people were in the tsunami-affected area at the exact time the tsunami struck?’, ‘How did people in the affected area actually behave when the evacuation warning was issued?’ and ‘How long did it actually take to move 100 m along a heavily congested road during the evacuation?’

The sources included car-navigation GPS data, tweets posted in the week after the disaster, mobile phone GPS data, disaster simulation data and the government’s recovery policies. The data showed that serious traffic jams, in which vehicles were unable to move, were caused by two factors. One was the geography of the city, which is surrounded by rivers. The other was a heavy inflow of vehicles engaged in “pick-up” behaviour. This may have been a major reason for the extremely high number of dead and missing. These findings helped Japan devise disaster-resilient city development planning in its Rehabilitation Master Plan.

The way forward for policymakers in the debate over data analytics and confidentiality, then, is to set clear objectives for the use of big data, which is essential to mobilise multiple institutions. There should be a clear balance between the needs of an open data platform and its cost of operation and maintenance. Policymakers should also place due importance on making decisions based on big data analysis. There is no doubting the famous quote: “Information is the oil of the 21st century, and analytics is the combustion engine.”

The writer serves as Deputy Secretary, Ministry of Finance. The views are personal.