Artificial intelligence has become an essential part of our daily lives, impacting everything from our shopping habits to our healthcare. However, the increased dependence on AI systems has also opened up new avenues for cyber threats, including ‘data poisoning’ attacks. These malicious acts involve feeding inaccurate information into AI systems during training, causing them to learn distorted patterns and make biased or inaccurate judgments. Data poisoning poses a significant risk to AI in general, but generative large language models (LLMs) are particularly vulnerable because they are trained on massive datasets — often scraped from the public internet — that enable them to recognise, translate, predict, or generate text and other content.

There are two primary forms of data poisoning attacks. Targeted attacks seek to manipulate the AI’s behaviour for specific inputs. For example, an attacker could manipulate a facial recognition system into misidentifying a particular person. Untargeted attacks, on the other hand, seek to diminish the overall performance of the AI by injecting corrupt or irrelevant data that degrades the system’s accuracy across a wide range of inputs. These are similar to poisoning a well, and the consequences can be as wide-ranging.
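A targeted attack of the kind described above can be sketched in miniature. The toy example below is purely illustrative and not drawn from the article: it uses a 1-nearest-neighbour classifier (a hypothetical stand-in for a real facial recognition model), and shows how a single mislabelled training point placed near the attacker’s chosen input flips that one prediction while leaving the rest of the data untouched.

```python
def predict(samples, x):
    # 1-nearest-neighbour: return the label of the closest training point
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(samples, key=lambda s: dist(s[0], x))[1]

# Clean training data: two well-separated clusters (hypothetical values)
clean = [((0.0, 0.0), "benign"), ((0.2, 0.1), "benign"),
         ((5.0, 5.0), "malicious"), ((5.2, 4.9), "malicious")]

# Targeted poisoning: one mislabelled point placed next to the exact
# input the attacker wants misclassified
poison = [((5.05, 5.02), "benign")]

target = (5.05, 5.02)
print(predict(clean, target))           # prints "malicious"
print(predict(clean + poison, target))  # prints "benign"
```

An untargeted attack would instead scatter mislabelled points everywhere, dragging down accuracy across the board rather than flipping one chosen prediction.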

Errors in AI systems

Biased AI models can perpetuate social disparities in various domains, including loan approvals, hiring practices, and the criminal justice system. Errors in the AI systems behind self-driving cars can cause accidents. The potential for disruption is extensive, affecting a wide range of areas, including financial markets and national security.

While some companies have made efforts to improve their data collection practices, incorporate diverse and reliable datasets for training AI models, and mitigate the risk of manipulation, it may be necessary to introduce legislation to address concerns related to data poisoning. This is especially so because data poisoning is not always malicious or criminal in nature: it can also serve as a tool for artists to defend their artwork against copyright infringement. For example, an artist can subtly alter the pixels of an image; when an AI model is trained on these poisoned samples, the hidden perturbations gradually degrade the model’s output.
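The pixel-altering defence mentioned above can be illustrated with a minimal sketch. Everything here is hypothetical: the image is a flat list of 0–255 greyscale values, and the “hidden pattern” is just a small signed nudge per pixel, clamped to an imperceptibility budget. Real tools work on full-size images and craft the pattern to mislead specific models, but the basic move is the same: change each pixel by too little for a human to notice.

```python
def perturb(image, pattern, budget=2):
    # image, pattern: lists of 0-255 pixel values and per-pixel nudges;
    # clamp each nudge to +/- budget, then clamp pixels to the valid range
    out = []
    for px, direction in zip(image, pattern):
        shifted = px + max(-budget, min(budget, direction))
        out.append(max(0, min(255, shifted)))
    return out

image = [100, 120, 140, 160]      # hypothetical 4-pixel "image"
pattern = [3, -3, 3, -3]          # hidden, artist-chosen signal
poisoned = perturb(image, pattern)
print(poisoned)                   # [102, 118, 142, 158]
```

Each pixel moves by at most 2 levels, invisible to the eye, yet a model trained on many such images absorbs the hidden pattern along with the picture.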

The recent introduction of the Artificial Intelligence Act by the European Parliament is the first set of comprehensive regulations to govern AI and represents a significant step forward. It acknowledges the potential danger of data poisoning and requires specific precautions for AI systems deemed “high-risk”. These systems, such as facial recognition or those used in critical infrastructure, must be designed with robust safeguards against data poisoning attacks. Article 15 of the Act requires the implementation of “technical solutions to address AI specific vulnerabilities including, where appropriate, measures to prevent and control for attacks trying to manipulate the training dataset (‘data poisoning’), inputs designed to cause the model to make a mistake (‘adversarial examples’), or model flaws”. It also calls for restricting access to training datasets and ensuring that only authorised personnel can modify the data, in order to minimise the risk of tampering.

With a strong emphasis on data integrity and cybersecurity, Article 15 aims to foster the creation of reliable AI systems. Ensuring fairness and avoiding discrimination is crucial, as it helps build public confidence in AI technology. Nevertheless, there are still some obstacles to overcome. The development of enforcement mechanisms for Article 15 is ongoing. In addition, establishing international collaboration on data security in AI development continues to be a challenge. In spite of these difficulties, the EU’s approach provides valuable insights for other countries.

The writer is Advocate, Madras High Court