Steve Seitz – UW News

UW Reality Lab launches with $6M from tech companies to advance augmented and virtual reality research
/news/2018/01/08/uw-reality-lab-launches-with-6m-from-tech-companies-to-advance-augmented-and-virtual-reality-research/
Mon, 08 Jan 2018
The UW Reality Lab will focus on developing next-generation virtual and augmented reality technologies and educating an industry workforce. In this holographic chess game developed by UW students, opponents move pieces that can only be seen through a virtual reality headset. Photo: Dennis Wise/University of Washington

The University of Washington is launching a new augmented and virtual reality research center – funded by Facebook, Google, and Huawei – to accelerate innovation in the field and educate the next generation of researchers and practitioners.

The $6 million initiative, funded with equal contributions from the three initial sponsors, creates one of the world’s first academic centers dedicated to virtual and augmented reality. The new center, based in the Paul G. Allen School of Computer Science & Engineering and located in Seattle – a national hub of VR activity – will support research and education initiatives with the potential to deliver game-changing breakthroughs in the field.

“Allen School faculty have produced pioneering research in many of the areas that underpin AR and VR technologies, including computer vision, graphics, perception, and machine learning,” said the Allen School’s director and Wissner-Slivka Chair in Computer Science and Engineering. “Through our partnership with Facebook, Google, and Huawei, the Allen School and UW will be at the forefront of the next great wave of AR and VR innovation – pursuing breakthrough research and educating the next generation of innovators in this exciting and rapidly expanding field.”

To date, AR and VR applications are taking their first steps, focused mostly on entertainment and games, and the search is on for the field's "killer app." The goal of the UW Reality Lab is to develop the technology to power a next generation of applications that speak to a wider population – from learning Spanish by seeing objects labeled in your field of view, to achieving telepresence by conversing with a remote relative or co-worker as if you were in the same room.

UW Reality Lab Advisory Board

  • Michael Abrash, Chief Scientist, Oculus
  • Michael Cohen, Director, Computational Photography Group, Facebook
  • Paul Debevec, Senior Researcher at Google Daydream and Adjunct Research Professor at the University of Southern California’s Institute for Creative Technologies
  • Shahram Izadi, CTO, PerceptiveIO
  • Wei Su, Senior Architect of Fields Lab, Huawei Seattle Research Center
  • Fan Zhang, Chief Architect, Head of Fields Lab, Huawei Seattle Research Center

鈥淲e’re seeing some really compelling and high quality AR and VR experiences being built today,鈥 said center co-lead and Allen School professor . 鈥淏ut, there are still many core research advances needed to move the industry forward 鈥 tools for easily creating content, infrastructure solutions for streaming 3D video, and privacy and security safeguards 鈥 that university researchers are uniquely positioned to tackle.鈥

The UW Reality Lab will bring together an interdisciplinary team of UW faculty, graduate students and undergraduates working in 3D computer vision and perception, object recognition, graphics, game science and education, distributed computing, stream processing, databases, computer architecture, and privacy and security.

Another key function of the UW Reality Lab will be to educate tomorrow’s AR and VR researchers and workers. The funding will support new courses and access to state-of-the-art labs and infrastructure for UW students to develop new technologies and applications. That includes accessing emerging technologies from the center’s sponsors and allowing those companies to test new ideas in a focused setting with computer science students. An advisory board of luminaries from across the AR and VR community will help the center remain at the forefront of this burgeoning field.

The UW Reality Lab builds on the Allen School’s established leadership in cutting-edge AR/VR education and research. In one example, the school in 2016 introduced the world’s first virtual and augmented reality capstone course, in which students built AR applications using 40 HoloLens units loaned from Microsoft before they were commercially available.

One goal of the UW Reality Lab – funded with initial investments from Facebook, Google and Huawei – is to achieve telepresence, allowing one to have a lifelike conversation with a person in a remote location. Photo: Dennis Wise/University of Washington

“Students had fantastic ideas and were able to create amazing AR and VR applications ranging from Holographic Chess to teaching one how to play the piano or cook. This opened our eyes to the potential of investing deeper in development of algorithms and applications for AR and VR,” said a center co-lead and Allen School assistant professor. “We realized there were so many cool things we could do if only we had more resources, more time and more devices. Given those, we can help bring the world’s AR and VR dreams to life.”

The UW Reality Lab’s location in Seattle – one of the world’s most active centers for VR and AR innovation – paves the way for unique industry and academic collaborations aimed at achieving new capabilities and offering users seamless experiences.

“Having an opportunity to be at the leading edge of this industry is really exciting,” said a fellow center co-lead and Allen School professor. “It’s big, it’s happening now and there’s a lot of research to be done. We’re thrilled to take a leading role in making it all happen.”

For more information, contact realitylab@cs.washington.edu or follow the lab on Twitter.

Lip-syncing Obama: New tools turn audio clips into realistic video
/news/2017/07/11/lip-syncing-obama-new-tools-turn-audio-clips-into-realistic-video/
Tue, 11 Jul 2017

University of Washington researchers have developed new algorithms that solve a thorny challenge in the field of computer vision: turning audio clips into a realistic, lip-synced video of the person speaking those words.

As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017, the team successfully generated realistic video of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics, using audio clips of those speeches and existing weekly video addresses that were originally on a different topic.

“These type of results have never been shown before,” said Ira Kemelmacher-Shlizerman, an assistant professor at the UW’s Paul G. Allen School of Computer Science & Engineering. “Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio. This is the kind of breakthrough that will help enable those next steps.”

In a visual form of lip-syncing, the system converts audio files of an individual’s speech into realistic mouth shapes, which are then grafted onto and blended with the head of that person from another existing video.

The team chose Obama because the machine learning technique needs available video of the person to learn from, and there were hours of presidential videos in the public domain. “In the future, video chat tools like Skype or Messenger will enable anyone to collect videos that could be used to train computer models,” Kemelmacher-Shlizerman said.

Because streaming audio over the internet takes up far less bandwidth than video, the new system has the potential to end video chats that are constantly timing out from poor connections.

“When you watch Skype or Google Hangouts, often the connection is stuttery and low-resolution and really unpleasant, but often the audio is pretty good,” said co-author and Allen School professor Steven Seitz. “So if you could use the audio to produce much higher-quality video, that would be terrific.”

By reversing the process – feeding video into the network instead of just audio – the team could also potentially develop algorithms that could detect whether a video is real or manufactured.

The new machine learning tool makes significant progress in overcoming what’s known as the “uncanny valley” problem, which has dogged efforts to create realistic video from audio. When synthesized human likenesses appear to be almost real – but still manage to somehow miss the mark – people find them creepy or off-putting.

“People are particularly sensitive to any areas of your mouth that don’t look realistic,” said lead author Supasorn Suwajanakorn, a recent doctoral graduate in the Allen School. “If you don’t render teeth right or the chin moves at the wrong time, people can spot it right away and it’s going to look fake. So you have to render the mouth region perfectly to get beyond the uncanny valley.”

A neural network first converts the sounds from an audio file into basic mouth shapes. Then the system grafts and blends those mouth shapes onto an existing target video and adjusts the timing to create a new realistic, lip-synced video. Photo: University of Washington

Previous audio-to-video conversion processes have involved filming multiple people in a studio saying the same sentences over and over to try to capture how a particular sound correlates to different mouth shapes – an approach that is expensive, tedious and time-consuming. By contrast, Suwajanakorn developed algorithms that can learn from videos that exist “in the wild” on the internet or elsewhere.

“There are millions of hours of video that already exist from interviews, video chats, movies, television programs and other sources. And these deep learning algorithms are very data hungry, so it’s a good match to do it this way,” Suwajanakorn said.

Rather than synthesizing the final video directly from audio, the team tackled the problem in two steps. The first involved training a neural network to watch videos of an individual and translate different audio sounds into basic mouth shapes.

By combining previous research from the team with a new mouth synthesis technique, they were then able to realistically superimpose and blend those mouth shapes and textures on an existing reference video of that person. Another key insight was to allow a small time shift to enable the neural network to anticipate what the speaker is going to say next.
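The description above suggests a simple two-stage structure. The sketch below is a toy illustration of that idea and not the researchers' code: a least-squares mapping stands in for the trained neural network, a weighted blend of stored mouth textures stands in for the mouth synthesis step, and the array sizes, texture bank and look-ahead value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Step 1: audio features -> mouth shapes (stand-in for the trained network)
def fit_audio_to_shape(audio_feats, mouth_shapes):
    """Least-squares stand-in for the neural network: maps (frames, d) audio
    features to (frames, k) mouth-landmark coordinates."""
    W, *_ = np.linalg.lstsq(audio_feats, mouth_shapes, rcond=None)
    return W

def predict_shapes(audio_feats, W):
    return audio_feats @ W

# --- Step 2: synthesize a mouth patch and blend it into a reference video ----
def synthesize_mouth_patch(shape, shape_bank, texture_bank):
    """Blend stored mouth textures, weighting those whose shapes are closest."""
    d = np.linalg.norm(shape_bank - shape, axis=1)
    w = np.exp(-d / (d.mean() + 1e-8))
    w /= w.sum()
    return np.tensordot(w, texture_bank, axes=1)

def composite(reference_video, shapes, shape_bank, texture_bank, mask, lookahead=2):
    """Graft the synthesized mouth region onto each reference frame; the small
    look-ahead lets the mouth anticipate what is said next."""
    out = []
    for t, frame in enumerate(reference_video):
        s = shapes[min(t + lookahead, len(shapes) - 1)]
        patch = synthesize_mouth_patch(s, shape_bank, texture_bank)
        blended = frame.copy()
        blended[mask] = patch[mask]          # replace only the jaw/mouth region
        out.append(blended)
    return out

# Toy usage with random arrays standing in for real audio features and frames.
T, d, k, H, W_px = 90, 13, 36, 64, 64
audio = rng.normal(size=(T, d))
train_shapes = rng.normal(size=(T, k))
W = fit_audio_to_shape(audio, train_shapes)
shapes = predict_shapes(audio, W)
video = [rng.random((H, W_px, 3)) for _ in range(T)]
shape_bank = rng.normal(size=(20, k))
texture_bank = rng.random((20, H, W_px, 3))
mask = np.zeros((H, W_px), bool)
mask[H // 2:, :] = True                      # crude lower-face mask
synthesized = composite(video, shapes, shape_bank, texture_bank, mask)
```

In the published system, the first stage is a neural network trained on many hours of footage of the speaker, and the compositing stage must render teeth and the chin convincingly to avoid the uncanny-valley artifacts described above.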

The new lip-syncing process enabled the researchers to create realistic videos of Obama speaking in the White House, using words he spoke on a television talk show or during an interview decades ago.

Currently, the neural network is designed to learn on one individual at a time, meaning that Obama’s voice – speaking words he actually uttered – is the only information used to “drive” the synthesized video. Future steps, however, include helping the algorithms generalize across situations to recognize a person’s voice and speech patterns with less data – with only an hour of video to learn from, for instance, instead of 14 hours.

“You can’t just take anyone’s voice and turn it into an Obama video,” Seitz said. “We very consciously decided against going down the path of putting other people’s words into someone’s mouth. We’re simply taking real words that someone spoke and turning them into realistic video of that individual.”

The research was funded by Samsung, Google, Facebook, Intel and the UW Animation Research Labs.

For more information, contact the research team at audiolipsync@cs.washington.edu.

What makes Tom Hanks look like Tom Hanks?
/news/2015/12/07/what-makes-tom-hanks-look-like-tom-hanks/
Mon, 07 Dec 2015

Tom Hanks has appeared in many acting roles over the years, playing young and old, smart and simple. Yet we always recognize him as Tom Hanks.

Why? Is it his appearance? His mannerisms? The way he moves?

UW researchers have reconstructed 3-D models of celebrities such as Tom Hanks from large Internet photo collections. The models can be controlled by photos or videos of another person. Photo: University of Washington

University of Washington researchers have demonstrated that it's possible for machine learning algorithms to capture the "persona" of a well-photographed person like Tom Hanks and create a digital model of him from the vast number of images available on the Internet.

With enough visual data to mine, the algorithms can also animate the digital model of Tom Hanks to deliver speeches that the real actor never performed.

“One answer to what makes Tom Hanks look like Tom Hanks can be demonstrated with a computer system that imitates what Tom Hanks will do,” said lead author , a UW graduate student in computer science and engineering.

The technology relies on advances in 3-D face reconstruction, tracking, alignment, multi-texture modeling and puppeteering that have been developed over the last five years by a research group led by UW assistant professor of computer science and engineering Ira Kemelmacher-Shlizerman. The new results will be presented in a paper at the International Conference on Computer Vision in Chile on Dec. 16.

The team’s latest advances include the ability to transfer expressions and the way a particular person speaks onto the face of someone else – for instance, mapping former president George W. Bush’s mannerisms onto the faces of other politicians and celebrities.

It’s one step toward a grand goal shared by the UW computer vision researchers: creating fully interactive, three-dimensional digital personas from family photo albums and videos, historic collections or other existing visuals.

As virtual and augmented reality technologies develop, they envision using family photographs and videos to create an interactive model of a relative living overseas or a far-away grandparent, rather than simply Skyping in two dimensions.

“You might one day be able to put on a pair of augmented reality glasses and there is a 3-D model of your mother on the couch,鈥 said senior author Kemelmacher-Shlizerman. 鈥淪uch technology doesn鈥檛 exist yet 鈥 the display technology is moving forward really fast 鈥 but how do you actually re-create your mother in three dimensions?”

One day the reconstruction technology could be taken a step further, researchers say.

“Imagine being able to have a conversation with anyone you can’t actually get to meet in person 鈥 LeBron James, Barack Obama, Charlie Chaplin 鈥 and interact with them,” said co-author , UW professor of computer science and engineering. “We’re trying to get there through a series of research steps. One of the true tests is can you have them say things that they didn’t say but it still feels like them? This paper is demonstrating that ability.”

Existing technologies to create detailed three-dimensional models or digital movie characters often rely on bringing a person into an elaborate studio. They painstakingly capture every angle of the person and the way they move – something that can't be done in a living room.

Other approaches still require a person to be scanned by a camera to create basic avatars for video games or other virtual environments. But the UW computer vision experts wanted to digitally reconstruct a person based solely on a random collection of existing images.

To reconstruct celebrities like Tom Hanks, Barack Obama and Daniel Craig, the machine learning algorithms mined a minimum of 200 Internet images taken over time in various scenarios and poses – a process known as learning "in the wild."

“We asked, ‘Can you take Internet photos or your personal photo collection and animate a model without having that person interact with a camera?'” said Kemelmacher-Shlizerman. “Over the years we created algorithms that work with this kind of unconstrained data, which is a big deal.”

Suwajanakorn more recently developed techniques to capture expression-dependent textures – small differences that occur when a person smiles or looks puzzled or moves his or her mouth, for example.

By manipulating the lighting conditions across different photographs, he developed a new approach to densely map the differences from one person’s features and expressions onto another person’s face. That breakthrough enables the team to “control” the digital model with a video of another person, and could potentially enable a host of new animation and virtual reality applications.
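As a rough sketch of that mapping idea (a simplified illustration under assumed representations, not the group's implementation), one can treat a face as a set of 3-D landmarks and transfer per-frame offsets from the driver's neutral face onto the target's neutral geometry, so the motion comes from the driver while the identity – the neutral shape itself – stays the target's. The landmark count and scaling factor below are assumptions, and the real system also blends expression-dependent textures recovered from photos.

```python
import numpy as np

def expression_offsets(source_frames, source_neutral):
    """Per-frame deformation of the driving face relative to its own neutral pose."""
    return [frame - source_neutral for frame in source_frames]

def drive_target(target_neutral, offsets, scale=1.0):
    """Apply the driver's deformations to the target's neutral geometry,
    preserving the target's identity (the neutral shape itself)."""
    return [target_neutral + scale * off for off in offsets]

# Toy usage: 68 landmarks in 3-D, ten frames of a driving performance.
rng = np.random.default_rng(1)
source_neutral = rng.normal(size=(68, 3))
target_neutral = rng.normal(size=(68, 3))
performance = [source_neutral + 0.05 * rng.normal(size=(68, 3)) for _ in range(10)]
driven = drive_target(target_neutral, expression_offsets(performance, source_neutral))
```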

“How do you map one person’s performance onto someone else’s face without losing their identity?” said Seitz. “That’s one of the more interesting aspects of this work. We鈥檝e shown you can have George Bush’s expressions and mouth and movements, but it still looks like George Clooney.”

The research was funded by Samsung, Google, Intel and the 天美影视传媒.

For more information, contact Suwajanakorn at supasorn@cs.washington.edu or Kemelmacher-Shlizerman at kemelmi@uw.edu.

 

Automated age-progression software lets you see how a child will age
/news/2014/04/09/see-what-a-child-will-look-like-using-automated-age-progression-software/
Wed, 09 Apr 2014

It's a guessing game parents like to ponder: What will my child look like when she grows up? A computer could now answer the question in less than a minute.

University of Washington researchers have developed software that automatically generates images of a young child's face as it ages through a lifetime. The technique is the first fully automated approach for aging babies to adults that works with variable lighting, expressions and poses.

Using one photo of a 3-year-old, the software automatically renders images of his face at multiple ages while keeping his identity (and the milk mustache). Photo: University of Washington

“Aging photos of very young children from a single photo is considered the most difficult of all scenarios, so we wanted to focus specifically on this very challenging case,” said , a UW assistant professor of computer science and engineering. “We took photos of children in completely unrestrained conditions and found that our method works remarkably well.”

The research team has posted a paper online and will present its findings at the IEEE Conference on Computer Vision and Pattern Recognition in June in Columbus, Ohio.


The shape and appearance of a baby's face – and variety of expressions – often change drastically by adulthood, making it hard to model and predict that change. This technique leverages the average of thousands of faces of the same age and gender, then calculates the visual changes between groups as they age to apply those changes to a new person's face.

More specifically, the software determines the average pixel arrangement from thousands of random Internet photos of faces in different age and gender brackets. An algorithm then finds correspondences between the averages from each bracket and calculates the average change in facial shape and appearance between ages. These changes are then applied to a new child’s photo to predict how she or he will appear for any subsequent age up to 80.
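A minimal sketch of that bracket-averaging idea appears below. It is illustrative only, not the released software: the faces are assumed to be already aligned, shape and texture change are collapsed into a single pixel-space difference, and the bracket labels are invented for the example.

```python
import numpy as np

def bracket_averages(faces_by_bracket):
    """faces_by_bracket maps (age_bracket, gender) -> array of aligned faces (N, H, W, 3)."""
    return {key: faces.mean(axis=0) for key, faces in faces_by_bracket.items()}

def age_progress(photo, source_bracket, target_bracket, averages):
    """Add the average appearance change between two brackets to a new face."""
    delta = averages[target_bracket] - averages[source_bracket]
    return np.clip(photo + delta, 0.0, 1.0)

# Toy usage with random arrays standing in for thousands of aligned face photos.
rng = np.random.default_rng(2)
H = W = 32
brackets = {
    ("0-3", "m"): rng.random((50, H, W, 3)),
    ("55-60", "m"): rng.random((50, H, W, 3)),
}
averages = bracket_averages(brackets)
child_photo = rng.random((H, W, 3))
aged_photo = age_progress(child_photo, ("0-3", "m"), ("55-60", "m"), averages)
```

The actual pipeline models shape and appearance changes separately and, as described below, first corrects for pose and lighting before applying them.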

The researchers tested their rendered images against those of 82 actual people photographed over a span of years. In an experiment asking random users to identify the correct aged photo for each example, they found that users picked the automatically rendered photos about as often as the real-life ones.

A single photo of a child (far left) is age-progressed (left in each pair) and compared to actual photos of the same person at the corresponding age (right in each pair). Photo: University of Washington

“Our extensive user studies demonstrated age progression results that are so convincing that people can’t distinguish them from reality,” said co-author , a UW professor of computer science and engineering. “When shown images of an age-progressed child photo and a photo of the same person as an adult, people are unable to reliably identify which one is the real photo.”

Real-life photos of children are difficult to age-progress, partly due to variable lighting, shadows, funny expressions and even milk moustaches. To compensate for these effects, the algorithm first automatically corrects for tilted faces, turned heads and inconsistent lighting, then applies the computed shape and appearance changes to the new child’s face.

Perhaps the most common application of age-progression work is rendering older versions of missing children. These renderings are usually created manually by an artist who uses photos of the child and of family members, along with editing software, to account for common changes to a child's face as it ages, including vertical stretching, wrinkles and a longer nose.

But this process takes time, and it's significantly harder to produce an accurate image for children younger than age 5, when facial features more closely resemble those of a baby.

In each of these morphs, the left image is the starting input photo and the right image transforms through age 80 to show the automatic aging process.

The automatic age-progression software can run on a standard computer and takes about 30 seconds to generate results for one face. While this method considered gender and age, the research team – which also includes UW doctoral student Supasorn Suwajanakorn – hopes to incorporate other identifiers, such as ethnicity, and cosmetic factors, such as hair whitening and wrinkles, to build a method robust enough to represent every human face.

“I’m really interested in trying to find some representation of everyone in the world by leveraging the massive amounts of captured face photos,” Kemelmacher-Shlizerman said. “The aging process is one of many dimensions to consider.”

This research was funded by Google and Intel Corp.

###

For more information, contact Kemelmacher-Shlizerman at kemelmi@cs.washington.edu or 206-616-0621.
