[The era in which we can’t distinguish even high-definition video of real people from quickly and easily created computer-generated imagery (CGI) is fast approaching. The Ars Technica story below describes a recent demonstration of the MetaHuman Animator motion-capture animation technology this way: “Epic showed off the new machine-learning-powered system, which needed just a few minutes to generate impressively real, uncanny-valley-leaping facial animation from a simple head-on video taken on an iPhone.” The coverage in PC Gamer adds this:
“The potential here is huge. Not only will major studios be able to create facial animations in a fraction of the time, allowing for increasingly realistic interactions in upcoming games, but smaller developers will be able to create mocap-quality scenes with just a phone and a PC, instead of an entire studio full of 4D cameras and those little white dot things.”
See the original stories for the 5:08 video demonstration and 1:07 trailer the authors describe (also available on YouTube here and here). –Matthew]
[Image: Source: TechSpot]
Epic’s new motion-capture animation tech has to be seen to be believed
“MetaHuman Animator” goes from iPhone video to high-fidelity 3D movement in minutes.
By Kyle Orland
March 23, 2023
SAN FRANCISCO—Every year at the Game Developers Conference, a handful of competing companies show off their latest motion-capture technology, which transforms human performances into 3D animations that can be used on in-game models. Usually, these technical demonstrations involve a lot of specialized hardware for the performance capture and a good deal of computer processing and manual artist tweaking to get the resulting data into a game-ready state.
Epic’s upcoming MetaHuman facial animation tool looks set to revolutionize that kind of labor- and time-intensive workflow. In an impressive demonstration at Wednesday’s State of Unreal stage presentation, Epic showed off the new machine-learning-powered system, which needed just a few minutes to generate impressively real, uncanny-valley-leaping facial animation from a simple head-on video taken on an iPhone.
The potential to get quick, high-end results from that kind of basic input “has literally changed how [testers] work or the kind of work they can take on,” Epic VP of Digital Humans Technology Vladimir Mastilovic said in a panel discussion Wednesday afternoon.
A stunning demo
The new automatic animation technology builds on Epic’s MetaHuman modeling tool, which launched in 2021 as a way to manually create highly detailed human models in Unreal Engine 5. Since that launch, over 1 million users have created millions of MetaHumans, Epic said, some generated in just a few minutes of processing from three photos of a human face.
The main problem with these MetaHumans, as Mastilovic put it on stage Wednesday morning, is that “animating them still wasn’t easy.” Even skilled studios would often need to use a detailed “4D capture” from specialized hardware and “weeks or months of processing time” and human tweaking to get game-usable animation, he said.
MetaHuman Animator is designed to vastly streamline that process. To demonstrate that, Epic relied on Ninja Theory Performance Artist Melina Juergens, known for her role as Senua in 2017’s Hellblade: Senua’s Sacrifice.
Juergens’ 15-second on-stage performance was captured on a stock iPhone mounted on a tripod in front of her. The resulting video of that performance was then processed on a high-end AMD machine in less than a minute, producing a fully animated version of her MetaHuman model.
The speed and fidelity of the result drew a huge round of applause from the developers gathered at the Yerba Buena Center for the Arts and really needs to be seen to be believed. Tiny touches in Juergens’ performance—from bared teeth to minuscule mouth quivers to sideways glances—are all incorporated into the animation in a way that makes it almost indistinguishable from the original video. Even realistic tongue movements are extrapolated from the captured audio, using an “audio to tongue” algorithm that “is what it sounds like,” Mastilovic said.
What’s more, Epic also showed how all those facial tics could be applied not just to Juergens’ own MetaHuman model but to any model built on the same MetaHuman standard. Seeing Juergens’ motions and words coming from the mouth of a highly stylized cartoon character, just minutes after she performed them, was striking, to say the least.
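That kind of retargeting is possible because the captured animation is stored as values on a shared set of rig controls rather than as deformations of one specific face, so any rig that exposes the same controls can play it back. The sketch below is only an illustration of that idea under assumed names; the control names, `Rig` class, and clip data are hypothetical, not the actual MetaHuman control set or Epic’s code.

```python
# Minimal sketch of rig-agnostic retargeting via a shared control standard.
# All names and values here are hypothetical, for illustration only.
from typing import Dict, List

# One frame of animation: control name -> normalized value in [0, 1].
Frame = Dict[str, float]

class Rig:
    """Any character rig (realistic or stylized) exposing the shared controls."""
    def __init__(self, name: str, controls: List[str]):
        self.name = name
        self.controls = set(controls)
        self.pose: Frame = {}

    def apply(self, frame: Frame) -> None:
        # The same captured frame drives any compliant rig, because only
        # control names and values are shared -- not geometry.
        self.pose = {c: v for c, v in frame.items() if c in self.controls}

shared_controls = ["jaw_open", "mouth_corner_L", "mouth_corner_R", "brow_raise"]
captured_clip: List[Frame] = [
    {"jaw_open": 0.2, "mouth_corner_L": 0.7, "mouth_corner_R": 0.7, "brow_raise": 0.1},
    {"jaw_open": 0.6, "mouth_corner_L": 0.3, "mouth_corner_R": 0.3, "brow_raise": 0.4},
]

realistic = Rig("RealisticHuman", shared_controls)
cartoon = Rig("StylizedCartoon", shared_controls)
for frame in captured_clip:
    realistic.apply(frame)
    cartoon.apply(frame)  # identical animation data, very different look
print(cartoon.pose)
```

The point of the sketch is simply that once a performance is expressed as control curves against a common standard, swapping the character underneath is trivial, which matches what the on-stage retargeting demo showed.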
The presentation finished with the debut of a performance-focused trailer for the upcoming Senua’s Saga: Hellblade II. The trailer was made all the more impressive by Mastilovic’s note that Juergens’ full-body motion-captured performance in it “hasn’t been polished or edited in any way and took a MetaHuman animator just minutes to process, start to finish.”
The machines are learning
At a panel later in the day, Mastilovic discussed how the MetaHuman Animator is powered in part by “a large, varied, highly curated database” of detailed facial-capture data that Epic has gathered over the years (with the help of acquisitions like 3Lateral, Cubic Motion, and Hyprsense). That wide array of curated faces is then processed into what Epic calls an “n-dimensional human space”—essentially a massive database of all the ways different parts of different head morphologies can move and stretch.
First, Epic’s own “facial solver and landmark detector” identifies the key “rigging” points on the video-captured face. Then, using those points, a machine-learning algorithm essentially maps each video frame to its nearest neighbor in that massive n-dimensional database of captured faces. The algorithm uses a “semantic space solution” that Mastilovic said guarantees the resulting animation “will always work the same in any face logic… it just doesn’t break when you move it onto something else.”
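As a rough illustration of that described pipeline (not Epic’s implementation), the sketch below treats the curated capture database as pre-computed embedding vectors and maps each frame’s detected landmarks to the nearest capture, whose rig-control values then drive the animation. The embedding, the brute-force nearest-neighbor search, and the random placeholder data are all assumptions made for the example.

```python
# Illustrative sketch only -- not Epic's code. Models the idea of projecting
# per-frame facial landmarks into a shared "n-dimensional" face space and
# picking the nearest curated capture, whose rig-control values drive the
# animation. The database here is random placeholder data.
import numpy as np

def embed(landmarks: np.ndarray) -> np.ndarray:
    """Normalize (N, 2) landmark points into a position/scale-invariant
    vector so frames from different heads land in a comparable space."""
    pts = landmarks - landmarks.mean(axis=0)   # drop head translation
    pts = pts / (np.linalg.norm(pts) + 1e-8)   # drop overall scale
    return pts.ravel()

def solve_frame(landmarks, db_embeddings, db_rig_controls):
    """Return the rig-control values of the nearest database capture
    (brute-force nearest-neighbor search, for clarity)."""
    query = embed(landmarks)
    dists = np.linalg.norm(db_embeddings - query, axis=1)
    return db_rig_controls[np.argmin(dists)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_landmarks, n_captures, n_controls = 68, 10_000, 200

    # Placeholder "curated database": an embedding per captured expression,
    # paired with the rig-control values produced for that expression.
    db_embeddings = np.stack([embed(rng.normal(size=(n_landmarks, 2)))
                              for _ in range(n_captures)])
    db_rig_controls = rng.uniform(0.0, 1.0, size=(n_captures, n_controls))

    # One "video frame" worth of detected landmarks (random stand-in for
    # the output of a real landmark detector).
    frame_landmarks = rng.normal(size=(n_landmarks, 2))
    controls = solve_frame(frame_landmarks, db_embeddings, db_rig_controls)
    print(controls.shape)  # (200,) rig-control values for this frame
```

A per-frame lookup like this also hints at why the result transfers cleanly between characters, as described above: the output is a set of rig-control values in a shared space rather than geometry tied to one face, and a real system would additionally smooth those values over time.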
Unlike some other machine-learning models, Mastilovic said MetaHuman Animator “doesn’t hallucinate any details.” The focus is on generating animation that exactly matches the performance that’s put into it. And Mastilovic said the model is pretty resilient to low light and background distractions in the shot.
Tools to generate usable motion-capture animation data in a short timescale aren’t entirely new—Epic showed off something similar in 2016, using a real-time performance of an early Hellblade cut scene. Back then, though, the performance required Juergens to be outfitted with special makeup and tracking dots and to wear a head-mounted lighting and camera rig. And the resulting “real-time” model, while impressive for the time, was better suited to quick “pre-visualization” than to final, cut-scene-ready performance data.
Epic’s new MetaHuman Animator, on the other hand, looks like it could be used even by small developers to create highly convincing 3D animation without the usual time and labor investment. Mastilovic said he hopes the tool’s wider launch this summer leads to a “democratization of complex character technologies” by “allowing you to work faster and see the results immediately.”
Based on the demo we saw at GDC this week, we’d say that goal seems well within reach.