Esophageal cancer, or cancer of the food pipe, is among the common cancers in India. Every year, some 47,000 new cases are reported and 42,000 people die of the disease. Its occurrence is particularly high in the north-eastern states. It is often diagnosed late because its symptoms are not very specific and patients are treated for other conditions. In such a situation, early diagnosis can help save lives.

Researchers from the Indian Institute of Technology, Kharagpur have used machine learning techniques to come up with a possible solution. They have developed a machine learning-based algorithm for predicting signs of esophageal cancer, based on demographic data and the results of certain clinical tests. This can help in screening people for further tests to confirm whether they have cancer. It is a pre-screening tool that can be used by health workers in rural areas.

The software has been developed using data from 3,000 people, collected by mobile screening vans of the Mumbai-based Tata Memorial Hospital in rural areas of Maharashtra. From the data collected by paramedical staff, researchers used 49 data points such as tobacco consumption, tobacco chewing duration, alcohol consumption, cancer deaths in the family and difficulty in swallowing. Cancer of the food pipe is usually accompanied by symptoms like pain while swallowing and a hoarse voice.

“This software may be installed in premises of hospitals or health centres or can be hosted in cloud and accessed over the internet. A suspected patient can enter his or her demographic information, lifestyle details and available clinical test results. The software can predict if the patient has a particular disease. The prediction can be refined by adding more test results. If the prediction is positive, he or she may contact a doctor for further tests and treatment,” explained Dr Sourangshu Bhattacharya, assistant professor of computer science and engineering at IIT Kharagpur, who co-authored the study along with PhD student Asis Roy.

The researchers used open source machine learning software, Weka and LibSVM, along with Python to develop the prediction software. The objective was twofold: to tune the parameters of the machine learning algorithm so that the false normal rate (the proportion of diseased people marked as normal) is zero, and to select features (tests conducted by medical laboratories) based on cost or convenience.
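One common way to push the false normal rate to zero is to adjust the decision threshold applied to a classifier's scores. The sketch below illustrates that idea only. The report does not say which parameters the researchers tuned, so the threshold approach, the function names and the toy scores here are all assumptions made for illustration.

```python
def zero_false_negative_threshold(scores, labels):
    """Return the highest threshold that still flags every diseased case.

    scores: classifier outputs, higher means more likely diseased (assumed).
    labels: 1 for diseased, 0 for normal.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not positive_scores:
        raise ValueError("need at least one diseased case")
    return min(positive_scores)


def evaluate(scores, labels, threshold):
    """Return (false normal rate, accuracy) at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    false_negatives = sum(
        1 for p, y in zip(predictions, labels) if y == 1 and p == 0
    )
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return false_negatives / sum(labels), correct / len(labels)


# Made-up scores for six hypothetical patients, not data from the study.
scores = [0.10, 0.35, 0.40, 0.55, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]

threshold = zero_false_negative_threshold(scores, labels)
fn_rate, accuracy = evaluate(scores, labels, threshold)
print(threshold, fn_rate, accuracy)
```

Lowering the threshold in this way trades some accuracy for safety: a few healthy people are sent for further tests so that no diseased person is marked as normal.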

“We searched through all combinations of 15 tests, costing a total of Rs 6500 and found subsets of tests costing Rs 2000, which have zero false negative rate. We could then find the subset which gives the highest accuracy. Similarly, we assigned indices of discomfort to each of the tests and assigned budgets to total discomfort that a patient may be willing or be able to suffer in order to get an initial diagnosis. The main idea was to allow users of the software or implementing agencies to be able to customize selection of initial tests based on individual requirements,” explained Dr Bhattacharya while speaking to India Science Wire.
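The search Dr Bhattacharya describes can be sketched as an exhaustive loop over subsets of tests that keeps only those within budget and with zero false negatives, then picks the most accurate. This is a minimal illustration, not the researchers' code: the test names, costs, patient records and the simple "flag if any selected test is positive" rule are all hypothetical stand-ins for a trained classifier.

```python
from itertools import combinations


def evaluate_subset(subset, patients):
    """Toy stand-in for a classifier: flag a patient if any selected test is positive.

    Returns (false negative rate, accuracy).
    """
    false_negatives = correct = positives = 0
    for results, diseased in patients:
        flagged = any(results[test] for test in subset)
        if diseased:
            positives += 1
            if not flagged:
                false_negatives += 1
        if flagged == bool(diseased):
            correct += 1
    return false_negatives / positives, correct / len(patients)


def best_subset(costs, budget, patients):
    """Try every subset within budget; keep the most accurate one with no false negatives."""
    best = None
    tests = sorted(costs)
    for size in range(1, len(tests) + 1):
        for subset in combinations(tests, size):
            cost = sum(costs[test] for test in subset)
            if cost > budget:
                continue
            fn_rate, accuracy = evaluate_subset(subset, patients)
            if fn_rate > 0:
                continue
            if best is None or accuracy > best[1]:
                best = (subset, accuracy, cost)
    return best


# Hypothetical tests and costs in rupees; not the 15 tests used in the study.
costs = {"test_a": 500, "test_b": 1500, "test_c": 1000, "test_d": 3500}
patients = [
    ({"test_a": 1, "test_b": 0, "test_c": 0, "test_d": 1}, 1),
    ({"test_a": 0, "test_b": 1, "test_c": 0, "test_d": 1}, 1),
    ({"test_a": 0, "test_b": 0, "test_c": 1, "test_d": 0}, 0),
    ({"test_a": 0, "test_b": 0, "test_c": 0, "test_d": 0}, 0),
    ({"test_a": 1, "test_b": 1, "test_c": 1, "test_d": 1}, 1),
]

best = best_subset(costs, budget=2000, patients=patients)
print(best)
```

With 15 tests there are 32,767 non-empty subsets, so an exhaustive search of this kind remains feasible. The same loop could use a discomfort index in place of cost, as the researchers describe.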


A machine learning-based algorithm can help predict esophageal cancer from demographic, lifestyle and medical history data and customized clinical tests, with accuracy of up to 99.18% and sensitivity nearing 100%, the researchers have claimed in the study published in the journal Artificial Intelligence in Medicine.

India Science Wire


Twitter handle: @dineshcsharma