A team of researchers led by Assoc Prof Lu Shijian from the NTU School of Computer Science and Engineering has developed a computer program that creates realistic videos reflecting the facial expressions and head movements of the person speaking, requiring only an audio clip and a face photo.
DIverse yet Realistic Facial Animations, or DIRFA, is an artificial intelligence-based program that takes audio and a photo and produces a 3D video showing the person demonstrating realistic and consistent facial animations synchronised with the spoken audio. The NTU-developed program improves on existing approaches, which struggle with pose variations and emotional control.
To accomplish this, the team trained DIRFA on more than one million audiovisual clips of more than 6,000 people, drawn from an open-source database, to predict cues from speech and associate them with facial expressions and head movements.
The researchers said DIRFA could lead to new applications across various industries, including healthcare, as it could enable more sophisticated and realistic virtual assistants and chatbots that improve user experiences. It could also serve as a powerful tool for individuals with speech or facial disabilities, helping them to convey their thoughts and emotions through expressive avatars or digital representations and enhancing their ability to communicate.