Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention
AI tool detecting deepfakes with 98% accuracy using multimodal audio-visual analysis.
Researchers at Purdue University have developed a method for detecting AI-generated video content of individuals, commonly known as deepfakes. Deepfakes can be used to spread misinformation, damage reputations, and undermine trust in institutions, so as the technology to generate them becomes increasingly widespread, it is important to have tools that identify fraudulent video content. Purdue's approach analyzes inconsistencies between lip movements and speech audio to determine the legitimacy of a video. By combining audio and visual (multimodal) analysis, it achieves more robust and reliable detection than single-modality methods. This technology has applications across traditional media, social media, and cybersecurity.
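The listing does not publish the architecture in detail, but the lip-audio cross-attention named in the title can be sketched with scaled dot-product attention, where each lip-frame embedding queries the audio-frame embeddings. The shapes, dimensions, and random features below are illustrative assumptions, not the patented model:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: lip features attend to audio features.

    Returns one audio-conditioned vector per query (video frame) plus the
    attention weights, whose per-row peakiness can signal lip-audio mismatch.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (T_video, T_audio) similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values, weights

# Hypothetical shapes: 30 video frames and 100 audio frames, 64-dim features.
rng = np.random.default_rng(0)
lip = rng.standard_normal((30, 64))
audio = rng.standard_normal((100, 64))

fused, attn = cross_attention(lip, audio, audio)
print(fused.shape)  # (30, 64): one audio-conditioned vector per video frame
```

A classifier head on `fused` (together with facial self-attention over the visual stream alone) could then score a video as real or fake; that head is omitted here.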
Advantages
- Accurate detection of deepfakes
- Multimodal analysis of video data (audio and visual)
- More robust and reliable detection than existing methods
Applications
- Cybersecurity
- Media and Journalism
- Social Media
- Artificial Intelligence / Machine Learning
Technology Validation:
This technology has been validated by training and testing the model on the DeepFake Detection Challenge (DFDC) dataset, comprising over 60,000 training videos and 40,000 validation videos. The model was also tested on real-world videos from the internet. The model achieved 98% accuracy.
TRL: 6
Intellectual Property:
Provisional Patent, 2023-06-27, United States
Utility Patent, 2024-04-24, United States
Keywords: deepfake detection, AI video authentication, lip sync analysis, audio-visual forensics, synthetic media detection, misinformation prevention, facial recognition security, media integrity verification, cybersecurity solutions, AI fraud detection