Ira Kemelmacher-Shlizerman – UW News

ClearBuds: First wireless earbuds that clear up calls using deep learning
/news/2022/07/11/clearbuds-first-wireless-earbuds-clear-calls-deep-learning/ (July 11, 2022)

ClearBuds use a novel microphone system and are one of the first machine-learning systems to operate in real time and run on a smartphone. Photo: Raymond Smith/University of Washington

As meetings shifted online during the COVID-19 lockdown, many people found that chattering roommates, garbage trucks and other loud sounds disrupted important conversations.

This experience inspired three University of Washington researchers, who were roommates during the pandemic, to develop better earbuds. To enhance the speaker's voice and reduce background noise, "ClearBuds" use a novel microphone system and one of the first machine-learning systems to operate in real time and run on a smartphone.

The researchers presented this work at the ACM International Conference on Mobile Systems, Applications, and Services.

"ClearBuds differentiate themselves from other wireless earbuds in two key ways," said a co-lead author, a doctoral student in the Paul G. Allen School of Computer Science & Engineering. "First, ClearBuds use a dual microphone array. Microphones in each earbud create two synchronized audio streams that provide information and allow us to spatially separate sounds coming from different directions with higher resolution. Second, the lightweight neural network further enhances the speaker's voice."

While most commercial earbuds also have microphones on each earbud, only one earbud is actively sending audio to a phone at a time. With ClearBuds, each earbud sends a stream of audio to the phone. The researchers designed Bluetooth networking protocols to allow these streams to be synchronized within 70 microseconds of each other.

The team's neural network algorithm runs on the phone to process the audio streams. First it suppresses any non-voice sounds. Then it isolates and enhances any sound that arrives at both earbuds at the same time: the speaker's voice.

"Because the speaker's voice is close by and approximately equidistant from the two earbuds, the neural network can be trained to focus on just their speech and eliminate background sounds, including other voices," said a co-lead author, a doctoral student in the Allen School. "This method is quite similar to how your own ears work. They use the time difference between sounds coming to your left and right ears to determine the direction a sound came from."
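
To make that spatial idea concrete, here is a minimal Python sketch, not the ClearBuds code itself, of how two synchronized channels can be filtered so that only sounds arriving at both microphones at nearly the same time are kept; the frame size and time-difference threshold are placeholder values.

```python
# Toy illustration of dual-microphone spatial filtering (not the ClearBuds code).
# It keeps only short frames whose dominant sound reaches both earbuds at nearly
# the same time, which is true for the wearer's mouth but not for off-axis noise.
import numpy as np

def frame_tdoa(left, right, sr):
    """Estimate the time offset (seconds) between two short, synchronized frames."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # lag in samples at the correlation peak
    return lag / sr

def keep_near_equidistant(left, right, sr, frame_ms=20, max_tdoa_us=100):
    """Zero out frames whose dominant source is not roughly equidistant."""
    hop = int(sr * frame_ms / 1000)
    out = np.zeros_like(left)
    for start in range(0, len(left) - hop, hop):
        l, r = left[start:start + hop], right[start:start + hop]
        if abs(frame_tdoa(l, r, sr)) * 1e6 <= max_tdoa_us:
            out[start:start + hop] = 0.5 * (l + r)   # crude two-channel average
    return out

# Synthetic demo: a shared "voice" plus independent noise in each channel.
sr = 16000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 220 * t)
left = voice + 0.3 * np.random.randn(sr)
right = voice + 0.3 * np.random.randn(sr)
cleaned = keep_near_equidistant(left, right, sr)
```

In ClearBuds, the learned network described above plays the role of the final averaging step here, doing the actual voice enhancement rather than a simple channel mix.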

Shown here, the ClearBuds hardware (round disk) in front of the 3D printed earbud enclosures. Photo: Raymond Smith/University of Washington

When the researchers compared ClearBuds with Apple AirPods Pro, ClearBuds performed better, achieving a higher signal-to-distortion ratio across all tests.
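
Signal-to-distortion ratio is a standard way to score such comparisons. As a rough sketch, when a clean reference recording is available it is just the energy ratio between the reference and the residual error; published benchmarks often use scale-invariant variants, which this simple version omits.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB, given a clean reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-12))

# A noisier estimate scores lower than a cleaner one.
ref = np.sin(np.linspace(0, 100, 16000))
print(sdr_db(ref, ref + 0.05 * np.random.randn(16000)))   # higher SDR
print(sdr_db(ref, ref + 0.50 * np.random.randn(16000)))   # lower SDR
```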

“It’s extraordinary when you consider the fact that our neural network has to run in less than 20 milliseconds on an iPhone that has a fraction of the computing power compared to a large commercial graphics card, which is typically used to run neural networks,” said co-lead author , a doctoral student in the Allen School. “That鈥檚 part of the challenge we had to address in this paper: How do we take a traditional neural network and reduce its size while preserving the quality of the output?”

The team also tested ClearBuds "in the wild" by recording eight people reading in noisy environments, such as a coffee shop or on a busy street. The researchers then had 37 people rate 10- to 60-second clips of these recordings. Participants rated clips that were processed through ClearBuds' neural network as having the best noise suppression and the best overall listening experience.

  • The hardware and software design for ClearBuds is open source and freely available.

One limitation of ClearBuds is that people have to wear both earbuds to get the noise suppression experience, the researchers said.

But the real-time communication system developed here can be useful for a variety of other applications, the team said, including smart-home speakers, tracking robot locations or search and rescue missions.

The team is currently working on making the neural network algorithms even more efficient so that they can run on the earbuds themselves.

Additional co-authors are an associate professor in the Allen School; a professor in both the Allen School and the electrical and computer engineering department; and two other professors in the Allen School. This research was funded by the National Science Foundation and the University of Washington's Reality Lab.

For more information, contact the team at clearbuds@cs.washington.edu.

Behind the magic: Making moving photos a reality
/news/2019/06/11/making-moving-photos-a-reality/ (June 11, 2019)

People moving in and out of photographs used to be reserved for the world of Harry Potter. But now computer scientists at the University of Washington have brought that magic to real life.

Pablo Picasso's "Niña con corona y barco" (1939) steps out of the frame. Photo: University of Washington

Their algorithm, Photo Wake-Up, can take a person from a 2D photo or a work of art and make them run, walk or jump out of the frame. The system also allows users to view the animation in three dimensions using augmented reality tools. The researchers will be presenting their results June 19 at the Conference on Computer Vision and Pattern Recognition in Long Beach, California. This research first attracted media attention when it was released in preprint form in December on arXiv.

"This is a very hard fundamental problem in computer vision," said co-author Ira Kemelmacher-Shlizerman, an associate professor at the UW's Paul G. Allen School of Computer Science & Engineering. "The big challenge here is that the input is only from a single camera position, so part of the person is invisible. Our work combines technical advancement on an open problem in the field with artistic creative visualization."

Previously, researchers thought it would be impossible to animate a person running out of a single photo.

"There is some previous work that tries to create a 3D character using multiple viewpoints," said co-author Brian Curless, a professor in the Allen School. "But you still couldn't bring someone to life and have them run out of a scene, and you couldn't bring AR into it. It was really surprising that we could get some compelling results using just one photo."

The applications of Photo Wake-Up are numerous, the team says. The researchers envision this could lead to a new way for gamers to create avatars that actually look like them, a method for visitors to interact with paintings in an art museum, say, sitting down to have tea with the Mona Lisa, or something that lets children bring their drawings to life. Examples in the research paper include animating the Golden State Warriors' Stephen Curry to run off the court, Paul McCartney to leap off the cover of the "Help!" album and Matisse's "Icarus" (1944) to leave his frame.

Matisse's "Icarus" (1944) walks out of the frame. Photo: University of Washington

To make the magic a reality, Photo Wake-Up starts by identifying a person in an image and making a mask of the body’s outline. From there, it matches a 3D template to the subject’s body position. Then the algorithm does something surprising: In order to warp the template so that it actually looks like the person in the photo, it projects the 3D person back into 2D.

“It’s very hard to manipulate in 3D precisely,” said co-author , a doctoral student in the Allen School. “Maybe you can do it roughly, but any error will be obvious when you animate the character. So we have to find a way to handle things perfectly, and it’s easier to do this in 2D.”

Photo Wake-Up stores 3D information for each pixel: its distance from the camera or artist and how a person’s joints are connected together. Once the template has been warped to match the person’s shape, the algorithm pastes on the texture 鈥 the colors from the image. It also generates the back of the person by using information from the image and the 3D template. Then the tool stitches the two sides together to make a 3D person who will be able to turn around.
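
As a toy illustration of that front-and-back idea, and not the Photo Wake-Up implementation, the sketch below treats every masked pixel as a colored 3D point at its stored depth and adds a mirrored back surface behind it; in the real system the back texture and depth come from the image and the fitted 3D template rather than from a constant offset.

```python
import numpy as np

def two_sided_point_cloud(image, mask, depth, thickness=0.1):
    """Build a crude front-plus-back colored point cloud from per-pixel depth.

    image: HxWx3 colors in [0, 1]; mask: HxW bool; depth: HxW distances.
    Returns an (N, 6) array of x, y, z, r, g, b points."""
    ys, xs = np.nonzero(mask)
    front = np.stack([xs, ys, depth[ys, xs]], axis=1).astype(float)
    colors = image[ys, xs]
    # Hypothetical back surface: the same silhouette pushed back by a fixed
    # "body thickness" and textured with the front colors for lack of a real
    # back view (the actual system infers the back from the 3D template).
    back = front.copy()
    back[:, 2] += thickness
    return np.concatenate(
        [np.hstack([front, colors]), np.hstack([back, colors])], axis=0)

# Synthetic example: a circular "person" at constant depth.
h = w = 64
yy, xx = np.mgrid[0:h, 0:w]
mask = (xx - 32) ** 2 + (yy - 32) ** 2 < 20 ** 2
cloud = two_sided_point_cloud(np.random.rand(h, w, 3), mask, np.ones((h, w)))
```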

Stephen Curry runs off the court. Photo: University of Washington

Once the 3D character is ready to run, the algorithm needs to set up the background so that the character doesn’t leave a blank space behind. Photo Wake-Up fills in the hole behind the person by borrowing information from other parts of the image.
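
That hole-filling step is a classic image inpainting problem. Below is a hedged sketch using OpenCV's built-in inpainting, a stand-in for whatever filling method the authors actually used, with a synthetic image and mask so it runs on its own.

```python
import cv2
import numpy as np

# Toy scene: a textured "background" with a person-shaped hole to fill in.
h, w = 240, 320
image = np.dstack([np.tile(np.linspace(0, 255, w, dtype=np.uint8), (h, 1))] * 3)
person_mask = np.zeros((h, w), np.uint8)
cv2.ellipse(person_mask, (160, 150), (40, 90), 0, 0, 360, 255, -1)  # the "person"

# Slightly dilate the mask so the fill overlaps the old silhouette edges,
# then let inpainting borrow colors from the surrounding background.
mask = cv2.dilate(person_mask, np.ones((5, 5), np.uint8), iterations=2)
background = cv2.inpaint(image, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```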


Right now Photo Wake-Up works best with images of people facing forward, and can animate both artistic creations and photographs of real people. The algorithm can also handle some photos where people’s arms are blocking part of their bodies, but it is not yet capable of animating people who have their legs crossed or who are blocking large parts of themselves.

“Photo Wake-Up is a new way to interact with photos,” Weng said. “It can’t do everything yet, but this is just the beginning.”

A graffiti child runs off the wall and into a room with augmented reality
Photo Wake-Up also allows users to view the animation in three dimensions using augmented reality tools. Photo: University of Washington

This research was funded by the National Science Foundation, UW Animation Research, UW Reality Lab, Facebook, Huawei and Google.

###

For more information, contact Weng at chungyi@cs.washington.edu, Kemelmacher-Shlizerman at kemelmi@cs.washington.edu or Curless at curless@cs.washington.edu.

Grant number: #VEC1538618

UW Reality Lab launches with $6M from tech companies to advance augmented and virtual reality research
/news/2018/01/08/uw-reality-lab-launches-with-6m-from-tech-companies-to-advance-augmented-and-virtual-reality-research/ (Jan. 8, 2018)

The UW Reality Lab will focus on developing next-generation virtual and augmented reality technologies and educating an industry workforce. In this holographic chess game developed by UW students, opponents move pieces that can only be seen through a virtual reality headset. Photo: Dennis Wise/University of Washington

The University of Washington is launching a new augmented and virtual reality research center, funded by Facebook, Google, and Huawei, to accelerate innovation in the field and educate the next generation of researchers and practitioners.

The $6 million UW Reality Lab, funded with equal contributions from the three initial sponsors, creates one of the world's first academic centers dedicated to virtual and augmented reality. The new center, housed in the Paul G. Allen School of Computer Science & Engineering and located in Seattle, a national hub of VR activity, will support research and education initiatives with the potential to deliver game-changing breakthroughs in the field.

"Allen School faculty have produced pioneering research in many of the areas that underpin AR and VR technologies, including computer vision, graphics, perception, and machine learning," said Hank Levy, Allen School director and Wissner-Slivka Chair in Computer Science and Engineering. "Through our partnership with Facebook, Google, and Huawei, the Allen School and UW will be at the forefront of the next great wave of AR and VR innovation, pursuing breakthrough research and educating the next generation of innovators in this exciting and rapidly expanding field."

To date, AR and VR applications have taken only their first steps, mostly focusing on entertainment and games, and everyone is interested in finding the "killer app" for AR and VR. The goal of the UW Reality Lab is to develop technology to power the next generation of applications that will speak to a wider population. Those diverse ideas range from learning Spanish by seeing objects labeled in your field of view to achieving telepresence by conversing with a remote relative or co-worker as if you were in the same room.

UW Reality Lab Advisory Board

  • Michael Abrash, Chief Scientist, Oculus
  • Michael Cohen, Director, Computational Photography Group, Facebook
  • Paul Debevec, Senior Researcher at Google Daydream and Adjunct Research Professor at the University of Southern California’s Institute for Creative Technologies
  • Shahram Izadi, CTO, PerceptiveIO
  • Wei Su, Senior Architect of Fields Lab, Huawei Seattle Research Center
  • Fan Zhang, Chief Architect, Head of Fields Lab, Huawei Seattle Research Center

鈥淲e’re seeing some really compelling and high quality AR and VR experiences being built today,鈥 said center co-lead and Allen School professor . 鈥淏ut, there are still many core research advances needed to move the industry forward 鈥 tools for easily creating content, infrastructure solutions for streaming 3D video, and privacy and security safeguards 鈥 that university researchers are uniquely positioned to tackle.鈥

The UW Reality Lab will bring together an interdisciplinary team of UW faculty, graduate students and undergraduates working in 3D computer vision and perception, object recognition, graphics, game science and education, distributed computing, stream processing, databases, computer architecture, and privacy and security.

Another key function of the UW Reality Lab will be to educate tomorrow's AR and VR researchers and workers. The funding will support new courses and access to state-of-the-art labs and infrastructure for UW students to develop new technologies and applications. That includes accessing emerging technologies from the center's sponsors and allowing those companies to test new ideas in a focused setting with computer science students. An advisory board of luminaries from across the AR and VR community will help the center remain at the forefront of this burgeoning field.

The UW Reality Lab builds on the Allen School's established leadership in cutting-edge AR/VR education and research. In one example, the school in 2016 introduced the world's first virtual and augmented reality capstone course, in which students built AR applications using 40 HoloLens units loaned from Microsoft before they were commercially available.

One goal of the UW Reality Lab, funded with initial investments from Facebook, Google and Huawei, is to achieve telepresence, allowing one to have a lifelike conversation with a person in a remote location. Photo: Dennis Wise/University of Washington

"Students had fantastic ideas and were able to create amazing AR and VR applications ranging from Holographic Chess to teaching one how to play the piano or cook. This opened our eyes to the potential of investing deeper in the development of algorithms and applications for AR and VR," said center co-lead and Allen School assistant professor Ira Kemelmacher-Shlizerman. "We realized there were so many cool things we could do if only we had more resources, more time and more devices. Given those, we can help bring the world's AR and VR dreams to life."

The UW Reality Lab's location in Seattle, one of the world's most active centers for VR and AR innovation, paves the way for unique industry and academic collaborations aimed at achieving new capabilities and offering users seamless experiences.

"Having an opportunity to be at the leading edge of this industry is really exciting," said a center co-lead and Allen School professor. "It's big, it's happening now and there's a lot of research to be done. We're thrilled to take a leading role in making it all happen."

For more information, contact realitylab@cs.washington.edu or on Twitter.

Lip-syncing Obama: New tools turn audio clips into realistic video
/news/2017/07/11/lip-syncing-obama-new-tools-turn-audio-clips-into-realistic-video/ (July 11, 2017)

University of Washington researchers have developed new algorithms that solve a thorny challenge in the field of computer vision: turning audio clips into a realistic, lip-synced video of the person speaking those words.

As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017, the team successfully generated realistic, lip-synced videos of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics, using audio clips of those speeches and existing weekly video addresses that were originally on a different topic.

"These types of results have never been shown before," said Ira Kemelmacher-Shlizerman, an assistant professor at the UW's Paul G. Allen School of Computer Science & Engineering. "Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio. This is the kind of breakthrough that will help enable those next steps."

In a visual form of lip-syncing, the system converts audio files of an individual鈥檚 speech into realistic mouth shapes, which are then grafted onto and blended with the head of that person from another existing video.

The team chose Obama because the machine learning technique needs available video of the person to learn from, and there were hours of presidential videos in the public domain. "In the future, video chat tools like Skype or Messenger will enable anyone to collect videos that could be used to train computer models," Kemelmacher-Shlizerman said.

Because streaming audio over the internet takes up far less bandwidth than video, the new system has the potential to end video chats that are constantly timing out from poor connections.

"When you watch Skype or Google Hangouts, often the connection is stuttery and low-resolution and really unpleasant, but often the audio is pretty good," said co-author and Allen School professor Steve Seitz. "So if you could use the audio to produce much higher-quality video, that would be terrific."

By reversing the process, feeding video into the network instead of just audio, the team could also potentially develop algorithms that could detect whether a video is real or manufactured.

The new machine learning tool makes significant progress in overcoming what's known as the "uncanny valley" problem, which has dogged efforts to create realistic video from audio. When synthesized human likenesses appear to be almost real, but still manage to somehow miss the mark, people find them creepy or off-putting.

"People are particularly sensitive to any areas of your mouth that don't look realistic," said lead author Supasorn Suwajanakorn, a recent doctoral graduate in the Allen School. "If you don't render teeth right or the chin moves at the wrong time, people can spot it right away and it's going to look fake. So you have to render the mouth region perfectly to get beyond the uncanny valley."

A neural network first converts the sounds from an audio file into basic mouth shapes. Then the system grafts and blends those mouth shapes onto an existing target video and adjusts the timing to create a new realistic, lip-synced video. Photo: University of Washington

Previously, audio-to-video conversion processes have involved filming multiple people in a studio saying the same sentences over and over to try to capture how a particular sound correlates to different mouth shapes, which is expensive, tedious and time-consuming. By contrast, Suwajanakorn developed algorithms that can learn from videos that exist "in the wild" on the internet or elsewhere.

"There are millions of hours of video that already exist from interviews, video chats, movies, television programs and other sources. And these deep learning algorithms are very data hungry, so it's a good match to do it this way," Suwajanakorn said.

Rather than synthesizing the final video directly from audio, the team tackled the problem in two steps. The first involved training a neural network to watch videos of an individual and translate different audio sounds into basic mouth shapes.

By combining previous research from the team with a new mouth synthesis technique, they were then able to realistically superimpose and blend those mouth shapes and textures on an existing reference video of that person. Another key insight was to allow a small time shift to enable the neural network to anticipate what the speaker is going to say next.
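
A minimal PyTorch sketch of that first step as described, audio features in and low-dimensional mouth-shape coefficients out, with the small time shift expressed as a target delay; the feature sizes, layer widths and delay here are placeholders rather than the paper's actual choices.

```python
import torch
import torch.nn as nn

class AudioToMouth(nn.Module):
    """Map per-frame audio features (e.g. MFCCs) to mouth-shape coefficients."""
    def __init__(self, n_audio=28, n_mouth=20, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_audio, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_mouth)

    def forward(self, audio_feats):              # (batch, frames, n_audio)
        h, _ = self.rnn(audio_feats)
        return self.head(h)                      # (batch, frames, n_mouth)

model = AudioToMouth()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
delay = 2                                        # frames of audio look-ahead

audio = torch.randn(4, 100, 28)                  # fake training batch
mouth = torch.randn(4, 100, 20)                  # matching mouth-shape targets
# Shift the targets so the prediction at frame t may use audio up to t + delay,
# mimicking the small time shift that lets the network anticipate upcoming speech.
opt.zero_grad()
pred = model(audio)[:, delay:]
loss = nn.functional.mse_loss(pred, mouth[:, :-delay])
loss.backward()
opt.step()
```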

The new lip-syncing process enabled the researchers to create realistic videos of Obama speaking in the White House, using words he spoke on a television talk show or during an interview decades ago.

Currently, the neural network is designed to learn on one individual at a time, meaning that Obama's voice, speaking words he actually uttered, is the only information used to "drive" the synthesized video. Future steps, however, include helping the algorithms generalize across situations to recognize a person's voice and speech patterns with less data, with only an hour of video to learn from, for instance, instead of 14 hours.

"You can't just take anyone's voice and turn it into an Obama video," Seitz said. "We very consciously decided against going down the path of putting other people's words into someone's mouth. We're simply taking real words that someone spoke and turning them into realistic video of that individual."

The research was funded by Samsung, Google, Facebook, Intel and the UW Animation Research Labs.

For more information, contact the research team at audiolipsync@cs.washington.edu.

Imaging software predicts how you look with different hair styles, colors, appearances
/news/2016/07/21/imaging-software-predicts-how-you-look-with-different-hair-styles-colors-appearances/ (July 21, 2016)

A new personalized image search engine synthesizes versions of an input photo (left) with internet queries such as "curly hair" (top row), in "India" (2nd row), and in "1930" (3rd row). Photo: Ira Kemelmacher-Shlizerman, University of Washington

When we go to the hair stylist, we can browse magazines with pictures of models and point to a photo we鈥檇 like to try. Actors change appearances all the time to fit a role. Missing people are often disguised by changing their hair color and style.

But how can we predict if an appearance change will look good without physically trying it? Or explore what missing children might look like if their appearance is changed?


A new personalized image search engine called Dreambit, developed by a University of Washington computer vision researcher, lets a person imagine how they would look with a different hairstyle or color, or in a different time period, age, country or anything else that can be queried in an image search engine.

After uploading an input photo, you type in a search term 鈥 such as “curly hair,” “India” or “1930s.” The software’s algorithms mine Internet photo collections for similar images in that category and seamlessly map the person’s face onto the results.

Dreambit will be presented July 25 at SIGGRAPH 2016, the world's largest annual conference on computer graphics and interactive techniques. Plans are underway to make the system available later this year.

Dreambit draws on research conducted at the UW and elsewhere in facial processing, recognition, three-dimensional reconstruction and age progression, combining those algorithms in a unique way to create the blended images.

The new software can also help show what a missing child or person evading the law might look like if their appearance has been purposefully disguised, or even how they would look at an advanced age if years have passed.

Developer Ira Kemelmacher-Shlizerman, UW assistant professor of computer science and engineering, and her team previously developed age-progression software that focused only on a person's face. The new system adds varied hairstyle options and other contextual elements.

Dreambit can predict what a 1-year-old boy (top) and a 4-year-old girl (bottom) will look like at subsequent ages. Photo: Ira Kemelmacher-Shlizerman, University of Washington

These new features enable one to imagine what a child might look like five or 10 years into the future under different circumstances: with red hair, curly hair, black hair or even a shaved head.

“It’s hard to recognize someone by just looking at a face, because we as humans are so biased towards hairstyles and hair colors,” said Kemelmacher-Shlizerman. “With missing children, people often dye their hair or change the style so age-progressing just their face isn’t enough. This is a first step in trying to imagine how a missing person’s appearance might change over time.”

Another potential application is to envision how a certain actor or actress might appear in a role. For example, the system can marry internet photographs of the actress Cate Blanchett and Bob Dylan to predict how she would appear playing the Dylan role in the movie "I'm Not There."

Actors often change their appearances to fit a new role. The new system could help visualize how they would look, as these predictions show for Cate Blanchett playing Bob Dylan. Photo: Ira Kemelmacher-Shlizerman, University of Washington

"This is a way to try on different looks or personas without actually changing your physical appearance," said Kemelmacher-Shlizerman, who co-leads the UW Graphics and Imaging Laboratory (GRAIL). "While imagining what you'd look like with a new hairstyle is mind blowing, it also lets you experiment with creative, imaginative scenarios."

The software system analyzes the input photo and searches for a subset of internet photographs that fall into the desired category but also match the original photo’s face shape, pose and expression.
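
One way to picture that matching step, using made-up descriptors rather than Dreambit's actual features, is to score every candidate photo from the image search by how close its face shape, head pose and expression descriptors sit to the input photo's, and keep the closest ones.

```python
import numpy as np

def rank_candidates(query, candidates, weights=(1.0, 1.0, 1.0)):
    """Rank candidate photos by weighted distance over shape/pose/expression.

    query: dict with 'shape', 'pose', 'expression' vectors.
    candidates: list of dicts with the same keys."""
    def dist(a, b):
        return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))

    scores = [
        sum(w * dist(query[k], c[k])
            for w, k in zip(weights, ("shape", "pose", "expression")))
        for c in candidates
    ]
    return np.argsort(scores)           # best (smallest distance) first

# Fabricated example descriptors for three "curly hair" search results.
query = {"shape": [0.1, 0.2], "pose": [0.0, 5.0], "expression": [0.3]}
results = [
    {"shape": [0.1, 0.25], "pose": [1.0, 4.0], "expression": [0.35]},
    {"shape": [0.9, 0.8], "pose": [30.0, 0.0], "expression": [0.9]},
    {"shape": [0.15, 0.2], "pose": [0.5, 5.5], "expression": [0.28]},
]
print(rank_candidates(query, results))   # [2 0 1]: the third candidate matches best
```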

Its ability to accurately and automatically synthesize two photographs stems from the combination of algorithms that Kemelmacher-Shlizerman assembled, as well as the sheer volume of photos available on the internet.

"The key idea is to find a doppelgänger set: people who look similar enough to you that you can copy certain elements of their appearance," said Kemelmacher-Shlizerman. "And because the system has hundreds of thousands of photos to choose from, the matching results are spellbinding."

For more information, contact Kemelmacher-Shlizerman at kemelmi@uw.edu.

 

How well do facial recognition algorithms cope with a million strangers?
/news/2016/06/23/how-well-do-facial-recognition-algorithms-cope-with-a-million-strangers/ (June 23, 2016)

The MegaFace dataset contains 1 million images representing more than 690,000 unique people. It is the first benchmark that tests facial recognition algorithms at the million-image scale. Photo: University of Washington

In the last few years, several groups have announced that their facial recognition systems have achieved near-perfect accuracy rates, performing better than humans at picking the same face out of the crowd.

But those tests were performed on a dataset with only 13,000 images, fewer people than attend an average professional U.S. soccer game. What happens to their performance as those crowds grow to the size of a major U.S. city?

University of Washington researchers answered that question with the MegaFace Challenge, the world's first competition aimed at evaluating and improving the performance of face recognition algorithms at the million-person scale. All of the algorithms suffered in accuracy when confronted with more distractions, but some fared much better than others.

"We need to test facial recognition on a planetary scale to enable practical applications; testing on a larger scale lets you discover the flaws and successes of recognition algorithms," said Ira Kemelmacher-Shlizerman, a UW assistant professor of computer science and the project's principal investigator. "We can't just test it on a very small scale and say it works perfectly."

The UW team first developed a dataset with one million Flickr images from around the world that are publicly available under a Creative Commons license, representing 690,572 unique individuals. Then they challenged facial recognition teams to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.

Google's FaceNet algorithm showed the strongest performance on one test, dropping from near-perfect accuracy when confronted with a smaller number of images to 75 percent on the million-person test. A team from Russia's N-TechLab came out on top on another test set, dropping to 73 percent.

Facial recognition algorithms that fared well with 10,000 distracting images all experienced a drop in accuracy when confronted with 1 million images. But some performed much better than others. Photo: University of Washington

By contrast, the accuracy rates of other algorithms that had performed well at a small scale dropped by much larger percentages to as low as 33 percent accuracy when confronted with the harder task.

Initial results are detailed in a paper to be presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016) June 30, and are updated on the project website. More than 300 research groups are working with MegaFace.

The MegaFace challenge tested the algorithms on verification, or how well they could correctly identify whether two photos were of the same person. That’s how an iPhone security feature, for instance, could recognize your face and decide whether to unlock your phone instead of asking you to type in a password.
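
In embedding-based face recognizers, that verification decision typically reduces to thresholding the similarity between two face descriptors; a minimal sketch follows, where the 128-dimensional embeddings and the threshold value are arbitrary choices for illustration.

```python
import numpy as np

def same_person(embedding_a, embedding_b, threshold=0.6):
    """Verification: decide whether two face embeddings depict the same person."""
    a = embedding_a / np.linalg.norm(embedding_a)
    b = embedding_b / np.linalg.norm(embedding_b)
    return float(a @ b) >= threshold        # cosine similarity vs. a tuned threshold

print(same_person(np.random.rand(128), np.random.rand(128)))
```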

"What happens if you lose your phone in a train station in Amsterdam and someone tries to steal it?" said Kemelmacher-Shlizerman, who co-leads the UW Graphics and Imaging Laboratory (GRAIL). "I'd want certainty that my phone can correctly identify me out of a million people, or 7 billion, not just 10,000 or so."

The MegaFace challenge highlights problems in facial recognition that have yet to be fully solved, such as identifying the same person at different ages and recognizing someone in different poses. Photo: University of Washington

They also tested the algorithms on identification, or how accurately they could match a photo of a single individual to a different photo of the same person buried among a million "distractors." That's what happens, for instance, when law enforcement officers have a single photograph of a criminal suspect and are combing through images taken on a subway platform or in an airport to see if the person is trying to escape.
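
The identification test is essentially a nearest-neighbor search over the whole gallery of enrolled faces plus distractors. Here is a brute-force sketch with a smaller random gallery so it runs quickly; a real MegaFace-style gallery would hold a million faces.

```python
import numpy as np

def identify(probe, gallery):
    """Identification: return the index of the gallery face closest to the probe."""
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    probe = probe / np.linalg.norm(probe)
    return int(np.argmax(gallery @ probe))   # rank-1 match by cosine similarity

# 100,000 random 128-dimensional "faces" stand in for a million-image gallery.
gallery = np.random.rand(100_000, 128).astype(np.float32)
print(identify(np.random.rand(128).astype(np.float32), gallery))
```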

"You can see where the hard problems are: recognizing people across different ages is an unsolved problem. So is identifying people from their doppelgängers, and matching people who are in varying poses, like side views to frontal views," said Kemelmacher-Shlizerman. The paper also analyzes age and pose invariance in face recognition when evaluated at scale.

In general, algorithms that "learned" how to find correct matches out of larger image datasets outperformed those that only had access to smaller training datasets. But the SIAT MMLab algorithm, which was developed by a separate research team and learned on a smaller number of images, bucked that trend by outperforming many others.

The MegaFace challenge is ongoing and still accepting results.

The team's next steps include assembling half a million identities, each with a number of photographs, for a dataset that will be used to train facial recognition algorithms. This will help level the playing field and test which algorithms outperform others given the same amount of large-scale training data, as most researchers don't have access to image collections as large as Google's or Facebook's. The training set will be released toward the end of the summer.

"State-of-the-art deep neural network algorithms have millions of parameters to learn and require a plethora of examples to accurately tune them," said a UW computer science and engineering master's student working on the training dataset. "Unlike people, these models are initially a blank slate. Having diversity in the data, such as the intricate identity cues found across more than 500,000 unique individuals, can increase algorithm performance by providing examples of situations not yet seen."

The research was funded by the National Science Foundation, Intel, Samsung, Google, and the University of Washington Animation Research Labs.

Co-authors include a UW computer science and engineering professor, an undergraduate student and web developer, and former student Daniel Miller.

For more information, contact Kemelmacher-Shlizerman at kemelmi@cs.washington.edu.

What makes Tom Hanks look like Tom Hanks?
/news/2015/12/07/what-makes-tom-hanks-look-like-tom-hanks/ (Dec. 7, 2015)

Tom Hanks has appeared in many acting roles over the years, playing young and old, smart and simple. Yet we always recognize him as Tom Hanks.

Why? Is it his appearance? His mannerisms? The way he moves?

UW researchers have reconstructed 3-D models of celebrities such as Tom Hanks from large Internet photo collections. The models can be controlled by photos or videos of another person. Photo: University of Washington

University of Washington researchers have demonstrated that it's possible for machine learning algorithms to capture the "persona" and create a digital model of a well-photographed person like Tom Hanks from the vast number of images of them available on the Internet.

With enough visual data to mine, the algorithms can also animate the digital model of Tom Hanks to deliver speeches that the real actor never performed.

"One answer to what makes Tom Hanks look like Tom Hanks can be demonstrated with a computer system that imitates what Tom Hanks will do," said lead author Supasorn Suwajanakorn, a UW graduate student in computer science and engineering.

The technology relies on advances in 3-D face reconstruction, tracking, alignment, multi-texture modeling and puppeteering that have been developed over the last five years by a research group led by UW assistant professor of computer science and engineering Ira Kemelmacher-Shlizerman. The new results will be presented in a paper at the International Conference on Computer Vision (ICCV) in Chile on Dec. 16.

The team's latest advances include the ability to transfer expressions and the way a particular person speaks onto the face of someone else, for instance, mapping former president George W. Bush's mannerisms onto the faces of other politicians and celebrities.

It’s one step toward a grand goal shared by the UW computer vision researchers: creating fully interactive, three-dimensional digital personas from family photo albums and videos, historic collections or other existing visuals.

As virtual and augmented reality technologies develop, they envision using family photographs and videos to create an interactive model of a relative living overseas or a far-away grandparent, rather than simply Skyping in two dimensions.

"You might one day be able to put on a pair of augmented reality glasses and there is a 3-D model of your mother on the couch," said senior author Kemelmacher-Shlizerman. "Such technology doesn't exist yet (the display technology is moving forward really fast) but how do you actually re-create your mother in three dimensions?"

One day the reconstruction technology could be taken a step further, researchers say.

"Imagine being able to have a conversation with anyone you can't actually get to meet in person (LeBron James, Barack Obama, Charlie Chaplin) and interact with them," said co-author Steve Seitz, UW professor of computer science and engineering. "We're trying to get there through a series of research steps. One of the true tests is can you have them say things that they didn't say but it still feels like them? This paper is demonstrating that ability."

Existing technologies to create detailed three-dimensional or digital movie characters often rely on bringing a person into an elaborate studio. They painstakingly capture every angle of the person and the way they move, something that can't be done in a living room.

Other approaches still require a person to be scanned by a camera to create basic avatars for video games or other virtual environments. But the UW computer vision experts wanted to digitally reconstruct a person based solely on a random collection of existing images.

To reconstruct celebrities like Tom Hanks, Barack Obama and Daniel Craig, the machine learning algorithms mined a minimum of 200 Internet images taken over time in various scenarios and poses, a process known as learning "in the wild."

“We asked, ‘Can you take Internet photos or your personal photo collection and animate a model without having that person interact with a camera?'” said Kemelmacher-Shlizerman. “Over the years we created algorithms that work with this kind of unconstrained data, which is a big deal.”

Suwajanakorn more recently developed techniques to capture expression-dependent textures: small differences that occur when a person smiles or looks puzzled or moves his or her mouth, for example.

By manipulating the lighting conditions across different photographs, he developed a new approach to densely map the differences from one person’s features and expressions onto another person’s face. That breakthrough enables the team to “control” the digital model with a video of another person, and could potentially enable a host of new animation and virtual reality applications.

"How do you map one person's performance onto someone else's face without losing their identity?" said Seitz. "That's one of the more interesting aspects of this work. We've shown you can have George Bush's expressions and mouth and movements, but it still looks like George Clooney."

The research was funded by Samsung, Google, Intel and the University of Washington.

For more information, contact Suwajanakorn at supasorn@cs.washington.edu or Kemelmacher-Shlizerman at kemelmi@uw.edu.

 

Automated age-progression software lets you see how a child will age
/news/2014/04/09/see-what-a-child-will-look-like-using-automated-age-progression-software/ (April 9, 2014)

It's a guessing game parents like to ponder: What will my child look like when she grows up? A computer could now answer the question in less than a minute.

University of Washington researchers have developed software that automatically generates images of a young child's face as it ages through a lifetime. The technique is the first fully automated approach for aging babies to adults that works with variable lighting, expressions and poses.

Using one photo of a 3-year-old, the software automatically renders images of his face at multiple ages while keeping his identity (and the milk moustache). Photo: U of Washington

"Aging photos of very young children from a single photo is considered the most difficult of all scenarios, so we wanted to focus specifically on this very challenging case," said Ira Kemelmacher-Shlizerman, a UW assistant professor of computer science and engineering. "We took photos of children in completely unrestrained conditions and found that our method works remarkably well."

The research team has posted a paper online and will present its findings at the IEEE Conference on Computer Vision and Pattern Recognition in June in Columbus, Ohio.


The shape and appearance of a baby's face, and variety of expressions, often change drastically by adulthood, making it hard to model and predict that change. This technique leverages the average of thousands of faces of the same age and gender, then calculates the visual changes between groups as they age to apply those changes to a new person's face.

More specifically, the software determines the average pixel arrangement from thousands of random Internet photos of faces in different age and gender brackets. An algorithm then finds correspondences between the averages from each bracket and calculates the average change in facial shape and appearance between ages. These changes are then applied to a new child’s photo to predict how she or he will appear for any subsequent age up to 80.
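
A stripped-down version of that idea, assuming the faces have already been aligned to a common template and ignoring the shape warping and lighting correction the full method performs, simply adds the average appearance difference between two age brackets to the input photo.

```python
import numpy as np

def age_progress(face, avg_source_bracket, avg_target_bracket):
    """Apply the average appearance change between two age brackets to one face.

    All inputs are HxWx3 arrays of pre-aligned faces with values in [0, 1]."""
    change = avg_target_bracket - avg_source_bracket   # average aging change
    return np.clip(face + change, 0.0, 1.0)

# Fabricated bracket averages: a toddler bracket and a middle-aged bracket.
h = w = 64
avg_age_3 = np.random.rand(h, w, 3)
avg_age_40 = np.clip(avg_age_3 * 0.8 + 0.05, 0.0, 1.0)
child_photo = np.random.rand(h, w, 3)
adult_prediction = age_progress(child_photo, avg_age_3, avg_age_40)
```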

The researchers tested their rendered images against those of 82 actual people photographed over a span of years. In an experiment asking random users to identify the correct aged photo for each example, they found that users picked the automatically rendered photos about as often as the real-life ones.

A single photo of a child (far left) is age progressed (left in each pair) and compared to actual photos of the same person at the corresponding age (right in each pair). Photo: U of Washington

"Our extensive user studies demonstrated age progression results that are so convincing that people can't distinguish them from reality," said a co-author, a UW professor of computer science and engineering. "When shown images of an age-progressed child photo and a photo of the same person as an adult, people are unable to reliably identify which one is the real photo."

Real-life photos of children are difficult to age-progress, partly due to variable lighting, shadows, funny expressions and even milk moustaches. To compensate for these effects, the algorithm first automatically corrects for tilted faces, turned heads and inconsistent lighting, then applies the computed shape and appearance changes to the new child’s face.

Perhaps the most common application of age progression work is for rendering older versions of missing children. These renderings usually are created manually by an artist who uses photos of the child as well as family members, and editing software to account for common changes to a child’s face as it ages, including vertical stretching, wrinkles and a longer nose.

But this process takes time, and it’s significantly harder to produce an accurate image for children younger than age 5, when facial features more closely resemble that of a baby.

In each of these morphs, the left image is the starting input photo and the right image will transform to age 80 to show the automatic aging process.

The automatic age-progression software can run on a standard computer and takes about 30 seconds to generate results for one face. While this method considered gender and age, the research team that also includes UW doctoral student Supasorn Suwajanakorn hopes to incorporate other identifiers such as ethnicity, and cosmetic factors such as hair whitening and wrinkles to build a robust enough method for representing every human face.

“I’m really interested in trying to find some representation of everyone in the world by leveraging the massive amounts of captured face photos,” Kemelmacher-Shlizerman said. “The aging process is one of many dimensions to consider.”

This research was funded by Google and Intel Corp.

###

For more information, contact Kemelmacher-Shlizerman at kemelmi@cs.washington.edu or 206-616-0621.
