Researchers at IIT Madras’s ‘Initiative for Biological Systems Engineering’ (IBSE) are poring over millions of data points to see why babies are delivered prematurely in India. They want to develop models that will predict the possibility of preterm births and help pregnant women guard against such deliveries. The IBSE is an interdisciplinary group using data science to solve biological problems with machine learning.

Professors Himanshu Sinha, a quantitative geneticist, and Karthik Raman and Raghunathan Rengaswamy, both chemical engineers, have the basic raw material for their research: oodles of data. The Translational Health Science and Technology Institute (THSTI), a government clinical research institute of the Department of Biotechnology, runs Garbh-Ini, a pregnancy cohort programme led by Dr Shinjini Bhatnagar to study preterm birth in India. Since 2015, the programme has gathered from the Gurugram Civil Hospital a mind-boggling 1,300 parameters for each of the 8,000 pregnant women surveyed.

Some of these are microbiome data collected from saliva, feces and the vagina; some come from ultrasound scans; and some are clinical parameters such as blood samples, temperature and blood pressure. Other pieces of data relate to socio-economic factors: income levels, number of rooms in the house, the type of cooking stoves used (for possible smoke effects) and so on.

Now, using machine learning, the researchers will develop a model that will show, early during a pregnancy, if a woman runs the risk of a preterm delivery.

India is the preterm delivery capital of the world. Thirteen per cent of the deliveries in India are preterm, which works out to a quarter of all preterm deliveries in the world. Half of the babies delivered early in India don’t survive beyond five months. (Preterm is before 37 weeks, while normal term is 40 weeks.) Obviously, this situation needs correction.

Sinha and Raman told Quantum that they got really ‘clean’ data — all numbers checked, outliers verified and properly formatted for machine learning. But some challenges popped up.

One was ‘class imbalance’, a common problem in machine learning. “The algorithm will learn more from the majority class in the sample and less from the minority,” explains Sinha. In this case, the majority of the pregnant women are ‘normal term’; only about 13 per cent are preterm. If this is not factored in — there are many techniques to do that — the predictions will be less accurate. “The factors that cause preterm would be not learnt,” Sinha says.
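One common way to factor in class imbalance is to give each class a weight inversely proportional to its size, so that mistakes on the rare class count for more during training. The sketch below is only an illustration of that general idea, not the IBSE team's method, which the article does not describe. The cohort split (1,040 preterm out of 8,000) is an assumption derived from the 13 per cent figure, and the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get larger weights, so every class contributes the
    same total weight to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Assumed, illustrative cohort: 8,000 women, about 13 per cent preterm.
labels = ["preterm"] * 1040 + ["term"] * 6960
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: weight {weight:.3f}")
```

With these weights, each preterm case counts several times as much as a normal-term case, and the two classes carry equal total weight. Many machine learning libraries accept per-class or per-sample weights of this kind; resampling the data is another common option.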

‘Non-linear effects’ posed another challenge. Simply put, the effect of something the pregnant woman does in the first three months may pop up, not in the second three months, but much later in the pregnancy. It is easier to predict linear effects than non-linear ones.

The researchers have analysed the data of the first three months and developed the first India-specific model to date the pregnancy in the first trimester. Data pertaining to the next two trimesters is being processed right now.

In the end, doctors will have tools to tell whether a preterm birth is highly likely, which will point to the need for corrective measures. The outcome will be healthy babies and happy mothers.