A ‘deepfake’ is a type of synthetic media that uses deep learning algorithms to create realistic images, videos or audio recordings that are difficult to distinguish from genuine ones. In a deepfake video, one person’s face is superimposed on another’s, producing a convincing-looking fake.

Deepfakes are created using machine learning algorithms trained on large datasets of images and videos to generate realistic-looking media. They are increasingly common on social networking sites, and advances in technology have made them much quicker and cheaper to produce.

While deepfakes have the potential for positive applications, such as in the entertainment industry, they also pose a significant threat to society by enabling the spread of false information and the manipulation of public opinion. Being able to distinguish between real and fake videos can help prevent the spread of false information and protect individuals from potential harm.

In the paper “Revealing and classification of deepfakes video’s images using a customize convolution neural network model”, Usha Kosarkar, Gopal Sarkarkar and Shilpa Gedam propose a strategy to detect deepfakes using residual noise, the difference between an image and its denoised version.

The authors used a multi-layer perceptron convolutional neural network (MLP-CNN) model for the study. The MLP-CNN model consists of multiple layers of perceptrons and convolutional layers, which are trained on a dataset of genuine and altered video frames to learn how to distinguish between the two by analysing the residual noise in the frames.

The proposed model consists of three main components: pre-processing, feature extraction and classification. In the pre-processing stage, the video frames are first denoised to remove any noise that may interfere with the detection process. The residual noise is then obtained by subtracting the denoised frame from the original.
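The pre-processing step can be sketched as follows. The paper does not specify which denoising filter is used, so a simple mean filter stands in here; everything below is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def denoise(frame, k=3):
    # Stand-in denoiser: a k-by-k mean filter with edge padding.
    # (The paper does not state which denoising method it uses.)
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros_like(frame, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    return out / (k * k)

def residual_noise(frame):
    # Residual noise = original frame minus its denoised version.
    return frame.astype(np.float64) - denoise(frame)

# Toy check: a flat patch leaves no residual, a noisy patch does.
rng = np.random.default_rng(0)
flat = np.full((8, 8), 128.0)
noisy = flat + rng.normal(0, 5, size=(8, 8))
print(np.abs(residual_noise(flat)).mean())   # zero: nothing left after denoising
print(np.abs(residual_noise(noisy)).mean())  # clearly non-zero
```

The intuition is that manipulation pipelines leave statistical traces in exactly this residual signal, which the later stages then learn to recognise.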

In the feature extraction stage, the residual noise is used to extract features that are unique to deepfake videos. The authors used a CNN with transfer learning to extract these features, which are then used to train a binary classifier that can distinguish between genuine and altered videos.
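The two-stage idea of a frozen feature extractor feeding a trainable binary classifier can be illustrated in miniature. Here a fixed random projection stands in for the pretrained CNN backbone, and logistic regression plays the classifier head; the data, shapes and the assumption that altered frames carry higher-variance residual noise are all illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "pretrained" backbone: a frozen random projection of a
# flattened residual-noise patch. In the paper a CNN with transfer
# learning plays this role; its weights stay fixed while only the
# classifier head is trained.
W_frozen = rng.normal(size=(64, 16))

def extract_features(residuals):
    # residuals: (n, 64) array of flattened residual-noise patches.
    return np.abs(residuals) @ W_frozen / 8.0

# Toy data: pretend genuine frames carry low-variance residual noise
# and altered frames carry high-variance residual noise.
X_gen = rng.normal(0, 0.5, size=(200, 64))
X_alt = rng.normal(0, 2.0, size=(200, 64))
X = extract_features(np.vstack([X_gen, X_alt]))
y = np.array([0] * 200 + [1] * 200)  # 0 = genuine, 1 = altered

# Classifier head: logistic regression trained by gradient descent.
w, b = np.zeros(16), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

preds = (X @ w + b > 0).astype(int)
acc = (preds == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The design point is that only the small head is trained on the deepfake dataset, while the expensive feature extractor reuses knowledge from a larger pretraining task.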

In the classification stage, the binary classifier is used to classify each video frame as either genuine or altered. A threshold on the percentage of frames classified as altered is then used to determine the authenticity of the whole video.
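The frame-to-video decision rule amounts to a simple vote. The paper uses a percentage threshold but does not state its value, so the 50 per cent cut-off below is a hypothetical choice.

```python
def classify_video(frame_labels, tau=0.5):
    """frame_labels: per-frame predictions, 0 = genuine, 1 = altered.
    tau: hypothetical threshold on the altered-frame fraction (the
    paper does not state the value it uses)."""
    altered_fraction = sum(frame_labels) / len(frame_labels)
    return "altered" if altered_fraction > tau else "genuine"

print(classify_video([0, 0, 1, 0, 1, 1, 1]))  # 4/7 altered -> "altered"
print(classify_video([0, 0, 1, 0, 0]))        # 1/5 altered -> "genuine"
```

Aggregating over many frames makes the verdict robust to occasional per-frame misclassifications.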

Low-resolution video clips from FaceForensics++ and high-resolution video clips from Kaggle DFDC (Deepfake Detection Challenge) were used by the researchers to test the effectiveness of the technique. The performance of the model was also compared with other competing approaches.

The authors reported that the MLP-CNN model achieved a testing accuracy of 95.5 per cent, higher than the 85.2 per cent achieved by the CNN model alone.

The nature of AI is such that as detection becomes more effective, generation will improve in response: an ongoing race between creating better deepfakes and detecting them better.