India’s unemployment rate currently sits at 9 per cent. Yet one in three citizens with at least a bachelor’s degree is out of work. India’s working-age population is projected to rise from over 750 million today to almost a billion by 2020. At the same time, agricultural employment is in decline, accounting for less than 50 per cent of total employment for the first time in Indian history. These market pressures are pushing the labour force towards higher-skilled occupations. Yet even young, college-educated Indians often lack the skills required to obtain these jobs.

It is perhaps with this transition in mind that Finance Minister Arun Jaitley announced Skill India, a programme to give young workers the training they need to find jobs. The goal is to train 500 million workers by 2022. In the past, skill development programmes have been run by an existing organisation, the National Skill Development Corporation, as well as by 18 ministries. Skill India aims to consolidate and replace these fragmented initiatives. However, the Government needs to upgrade not only its programmes but also its data collection and evaluation systems.

The German example

The argument for high-quality, accessible data is that, regardless of ideology, governments should pursue effective policies. Moreover, even well-intentioned policies can have perverse effects. Identifying what works and detecting such unintended consequences both require data-driven analysis. For example, recent research shows that India’s child labour ban led to an increase in child labourers and a decrease in their wages.

At present, the National Sample Survey is the only source of nationally representative labour market data. At best, it can uncover broad trends in employment levels. Given India’s large labour force and many state-sponsored initiatives, meaningful policy evaluation requires a richer data set that links employees, employers, and benefit records. Other countries have made considerable progress in this regard.

Fifteen years ago in Germany, a push from researchers and the strong will of the administration led the Federal Employment Agency to create a unique database on individual workers. The data originate from notifications to the social security system. Basic information on each employee’s employment, including a rich set of socio-demographic characteristics of the employee and some information on the employing establishment, is compiled and submitted annually. Using unique social security numbers, this employment information is combined with other data, such as periods of unemployment benefit receipt, registered job searches, and participation in programmes and training schemes.

Widely used

The resulting database allows researchers and policymakers to follow workers from the beginning of their training until they leave the labour force and enter the pension system. As a result, almost all active labour market programmes in Germany are evaluated using these data. Prominent examples include evaluations of the so-called “Hartz reforms”, major labour market reforms that in 2005 introduced new payment schemes for unemployment benefits, and of the recently introduced comprehensive minimum wage.

Most of this evaluation comes at a very low cost to the government. Resources like the German employee-employer matched data have become the gold standard for high-quality academic research in labour economics, because the data are accessible to researchers while confidentiality is preserved. Rules are in place to ensure that the privacy of every individual in the database is maintained.

The Federal Employment Agency has a Research Data Centre (RDC), a facility specifically designed to give researchers access to confidential microdata in a secure environment that complies with privacy laws. Several field offices of the German RDC have opened in the US, making these data available to academic researchers there.

India’s head start

India does not have to start from scratch to create a similar data set. In fact, most of the raw data needed are already collected. Many employers have to report employee wages to comply with Employees’ Provident Fund and Employees’ State Insurance laws. Individuals are required to report income in their tax returns. Almost all benefit programmes in India collect and maintain their own data. What remains is to link and clean the data from these different sources.

It might be argued that creating this data set is pointless because most of India’s labour force is informal or contract-based. However, the same problem plagues any data collection effort, including the National Sample Survey. Moreover, the broader benefits are considerable. Hard data on what works will lead to greater stability in India’s labour policies. The knowledge gained can be used to fix India’s labour laws and encourage greater formal participation. Better data lead to more efficient investment by the Government as well as by outside agencies. Most importantly, the costliest part of this endeavour, collecting the raw data, is already being done.

The lesson from the German experience is clear: preparing such data is neither a bureaucratic burden nor prohibitively expensive. Creating, maintaining, and allowing access to administrative data will encourage high-quality research on the Indian labour market, to the benefit of policymakers and the Indian public. Unless India invests in systems that help it understand how its labour markets operate, effective interventions will happen only by chance.

(Bender and Heining are with the German Federal Employment Agency at the Institute for Employment Research. Krishnan is doing his PhD at UC Berkeley. This article is by special arrangement with the Center for the Advanced Study of India, University of Pennsylvania)