Shyam Gollakota – UW News

Tiny cameras in earbuds let users talk with AI about what they see (April 14, 2026)
Two black earbuds: one with the casing removed exposing a computer chip and tiny camera.
UW researchers developed a system called VueBuds that uses tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. Here, the altered earbuds are shown with the camera inserted. Photo: Kim et al./CHI ’26

University of Washington researchers developed the first system that incorporates tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. For instance, a user might turn to a Korean food package and say, “Hey Vue, translate this for me.” They’d then hear an AI voice say, “The visible text translates to ‘Cold Noodles’ in English.”

The prototype system called VueBuds takes low-resolution, black-and-white images, which it transmits over Bluetooth to a phone or other nearby device. A small artificial intelligence model on the device then answers questions about the images within around a second. For privacy, all of the processing happens on the device, a small light turns on when the system is recording, and users can immediately delete images.

The team will present its findings April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona.

“We haven’t seen most people adopt smart glasses or VR headsets, in part because a lot of people don’t like wearing glasses, and they often come with privacy concerns, such as recording high-resolution video and processing it in the cloud,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But almost everyone wears earbuds already, so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process.”

Cameras use far more power than the microphones already in earbuds, so using the same sort of high-res cameras as those in smart glasses wouldn’t work. Also, large amounts of information can’t stream continuously over Bluetooth, so the system can’t run continuous video.

The team found that using a low-power camera, roughly the size of a grain of rice, to shoot low-resolution, black-and-white still images limited battery drain and allowed for Bluetooth transmission while preserving performance.

There was also the matter of placement.

“One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user’s view of the world reliably?” said lead author Kim, who completed this work as a UW doctoral student in the Allen School.

The team found that angling each camera 5-10 degrees outward provides a 98-108 degree field of view. While this creates a small blind spot when objects are held closer than 20 centimeters from the user, people rarely hold things that close to examine them 鈥 making it a non-issue for typical interactions.
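
The reported numbers are consistent with simple geometry: when the two views overlap, their union spans one camera's field of view plus twice the outward angle. Here is a back-of-the-envelope sketch in Python; the per-camera field of view is an assumed value chosen to reproduce the reported range, not a figure from the article.

```python
# Back-of-the-envelope check of the combined field of view (FOV) for two
# outward-angled earbud cameras. PER_CAMERA_FOV_DEG is an assumption
# (not stated in the article), picked so the arithmetic matches the text.
PER_CAMERA_FOV_DEG = 88.0

def combined_fov(outward_angle_deg: float) -> float:
    """Two cameras angled outward by +/- outward_angle_deg with overlapping
    views: the union spans the per-camera FOV plus twice the angle."""
    return PER_CAMERA_FOV_DEG + 2 * outward_angle_deg

print(combined_fov(5))   # 98.0 degrees
print(combined_fov(10))  # 108.0 degrees
```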

Researchers also discovered that while the vision language model was largely able to make sense of the images from each earbud, having to process images from both earbuds slowed it down. So they had the system “stitch” the two images into one, identifying overlapping imagery and combining it. This allows the system to respond in one second, quick enough to feel like real time for users, rather than the two seconds it takes with separate images.
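
The article doesn't detail the stitching step, but a minimal version of "identify overlapping imagery and combine it" for two grayscale frames could look like the sketch below: search for the horizontal overlap where the frames agree best, then cross-fade the shared region. Treating the offset as purely horizontal (ignoring parallax and rotation) is a simplifying assumption, not the authors' pipeline.

```python
import numpy as np

def stitch_pair(left: np.ndarray, right: np.ndarray, min_overlap: int = 16) -> np.ndarray:
    """Stitch two same-sized grayscale frames (H x W, floats in [0, 1]) side
    by side. A sketch of overlap-and-blend stitching for illustration only."""
    _, w = left.shape
    best_overlap, best_score = min_overlap, -np.inf
    for overlap in range(min_overlap, w):
        a = left[:, w - overlap:]   # right edge of the left frame
        b = right[:, :overlap]      # left edge of the right frame
        # normalized correlation as a similarity score for this overlap width
        score = np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if score > best_score:
            best_score, best_overlap = score, overlap
    ramp = np.linspace(1.0, 0.0, best_overlap)  # linear cross-fade weights
    blended = (left[:, w - best_overlap:] * ramp
               + right[:, :best_overlap] * (1.0 - ramp))
    return np.hstack([left[:, : w - best_overlap], blended, right[:, best_overlap:]])
```

The single stitched frame is then what gets handed to the vision-language model, halving the number of images it must process.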

The team then had 74 participants compare recorded outputs from VueBuds with outputs from Ray-Ban Meta glasses in a series of tests. Even though VueBuds used low-resolution images with stronger privacy controls while the Ray-Bans took high-resolution images processed in the cloud, the two systems performed comparably overall. Participants preferred VueBuds’ translations, while the Ray-Bans did better at counting objects.

Sixteen participants also wore VueBuds and tested the system’s ability to translate and answer basic questions about objects. VueBuds achieved 83-84% accuracy when translating or identifying objects and 93% when identifying the author and title of a book.

This study was designed to gauge the feasibility of integrating cameras in wireless earbuds. Since the system only takes grayscale images, it can’t answer questions that involve color in the scene.

The team wants to add color to the system (color cameras require more power) and to train specialized AI models for specific use cases, such as translation.

“This study lets us glimpse what’s possible just using a general-purpose language model and our wireless earbuds with cameras,” Kim said. “But we’d like to study the system more rigorously for applications like reading a book, for people who have low vision or are blind, for instance, or translating text for travelers.”

Co-authors include , a UW master’s student in the Allen School, and , , , and , all UW students in electrical and computer engineering.

For more information, contact vuebuds@cs.washington.edu.

AI headphones automatically learn who you’re talking to, and let you hear them better (Dec. 9, 2025)

UPDATE (Dec. 12, 2025): This story has been updated to correct Malek Itani’s department.

Holding a conversation in a crowded room often leads to the frustrating “cocktail party problem,” or the challenge of separating the voices of conversation partners from a hubbub. It’s a mentally taxing situation that can be exacerbated by hearing impairment.

As a solution to this common conundrum, researchers at the University of Washington have developed headphones that proactively isolate all of the wearer’s conversation partners in a noisy soundscape. The headphones are powered by an AI model that detects the cadence of a conversation and another model that mutes any voices that don’t follow that pattern, along with other unwanted background noise. The prototype uses off-the-shelf hardware and can identify conversation partners using just two to four seconds of audio.

The system’s developers think the technology could one day help users of hearing aids, earbuds and smart glasses to filter their soundscapes without the need to manually direct the AI’s “attention.”

The team presented the research Nov. 7 in Suzhou, China, at the Conference on Empirical Methods in Natural Language Processing. The underlying code is open-source and available for others to build on.

“Existing approaches to identifying who the wearer is listening to predominantly involve electrodes implanted in the brain to track attention,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “Our insight is that when we’re conversing with a specific group of people, our speech naturally follows a turn-taking rhythm. And we can train AI to predict and track those rhythms using only audio, without the need for implanting electrodes.”

Related:

  • For more information, visit
  • Story from

The prototype system, dubbed “proactive hearing assistants,” activates when the person wearing the headphones begins speaking. From there, one AI model begins tracking conversation participants by performing a “who spoke when” analysis and looking for low overlap in exchanges. The system then forwards the result to a second model, which isolates the participants and plays the cleaned-up audio for the wearer. The system is fast enough to avoid confusing audio lag for the user, and can currently juggle one to four conversation partners in addition to the wearer’s audio.
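
As a rough illustration of that logic, the sketch below takes "who spoke when" output (diarized speech segments) and keeps the speakers whose turns rarely overlap the wearer's, the low-overlap, turn-taking signature described above. The data layout and overlap threshold are assumptions for illustration, not the paper's models.

```python
def overlap(a, b):
    """Total seconds during which two lists of (start, end) segments overlap."""
    return sum(max(0.0, min(e1, e2) - max(s1, s2))
               for s1, e1 in a for s2, e2 in b)

def infer_partners(turns, wearer="wearer", max_overlap_ratio=0.15):
    """turns: speaker id -> list of (start, end) speech segments from a
    diarization ('who spoke when') model. Speakers whose speech rarely
    overlaps the wearer's follow a turn-taking rhythm and are treated as
    conversation partners; heavy overlap suggests background talk."""
    partners = []
    for spk, segs in turns.items():
        if spk == wearer:
            continue
        talk_time = sum(e - s for s, e in segs)
        ratio = overlap(segs, turns[wearer]) / talk_time if talk_time else 1.0
        if ratio < max_overlap_ratio:
            partners.append(spk)
    return partners

turns = {"wearer": [(0, 2), (5, 7)],
         "friend": [(2.2, 4.8)],          # speaks between the wearer's turns
         "background": [(0.5, 6.5)]}      # talks straight through them
print(infer_partners(turns))              # ['friend']
```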

The team tested the headphones with 11 participants, who rated qualities like noise suppression and comprehension with and without the AI filtration. Overall, the group rated the filtered audio more than twice as favorably as the baseline.

A pair of headphones with a curly black microphone taped to one ear cup.
The team combined off-the-shelf noise-canceling headphones with binaural microphones to create the prototype, pictured here. Photo: Hu et al./EMNLP

Gollakota’s team has been experimenting with AI-powered hearing assistants for the past few years. They developed one smart headphone prototype that can pick a person’s audio out of a crowd when the wearer looks at them, and another that creates a “sound bubble” by muting all sounds beyond a set distance from the wearer.

“Everything we’ve done previously requires the user to manually select a specific speaker or a distance within which to listen, which is not great for user experience,” said lead author Guilin Hu, a doctoral student in the Allen School. “What we’ve demonstrated is a technology that’s proactive: something that infers human intent noninvasively and automatically.”

Plenty of work remains to refine the experience. The more dynamic a conversation gets, the more the system is likely to struggle, as participants talk over one another or speak in longer monologues. Participants entering and leaving a conversation present another hurdle, though Gollakota was surprised by how well the current prototype performed in these more complicated scenarios. The authors also note that the models were tested on English, Mandarin and Japanese dialog, and that the rhythms of other languages might require further fine-tuning.

The current prototype uses commercial over-the-ear headphones, microphones and circuitry. Eventually, Gollakota expects to make the system small enough to run on a tiny chip within an earbud or a hearing aid. In a previous paper, the authors demonstrated that it is possible to run AI models on tiny hearing aid devices.

Co-authors include , a UW doctoral student in the Allen School, and , a UW doctoral student in the electrical and computer engineering department.

This research was funded by the Moore Inventor Fellows program.

For more information, contact proactivehearing@cs.washington.edu

AI headphones translate multiple speakers at once, cloning their voices in 3D sound (May 9, 2025)

Tuochao Chen, a University of Washington doctoral student, recently toured a museum in Mexico. Chen doesn’t speak Spanish, so he ran a translation app on his phone and pointed the microphone at the tour guide. But even in a museum’s relative quiet, the surrounding noise was too much. The resulting text was useless.

Various technologies have emerged lately promising fluent translation, but none of these solved Chen’s problem of public spaces. Existing devices, for instance, typically function only with an isolated speaker, and they deliver the translation only after the speaker finishes.

Now, Chen and a team of UW researchers have designed a headphone system that translates multiple speakers at once, while preserving the direction and qualities of people’s voices. The team built the system, called Spatial Speech Translation, with off-the-shelf noise-cancelling headphones fitted with microphones. The team’s algorithms separate out the different speakers in a space and follow them as they move, translate their speech and play it back with a 2-4 second delay.

The team presented the research April 30 at the ACM CHI Conference on Human Factors in Computing Systems in Yokohama, Japan. The code for the proof-of-concept device is available for others to build on.

“Other translation tech is built on the assumption that only one person is speaking,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in the real world, you can’t have just one robotic voice talking for multiple people in a room. For the first time, we’ve preserved the sound of each person’s voice and the direction it’s coming from.”

Related:

  • Story in
  • For more information, visit听

The system introduces three innovations. First, when turned on, it immediately detects how many speakers are in an indoor or outdoor space.

“Our algorithms work a little like radar,” said lead author Chen, a UW doctoral student in the Allen School. “So it’s scanning the space in 360 degrees and constantly determining and updating whether there’s one person or six or seven.”

The system then translates the speech and maintains the expressive qualities and volume of each speaker’s voice while running on a device, such as mobile devices with an Apple M2 chip, including laptops and the Apple Vision Pro. (The team avoided using cloud computing because of the privacy concerns with voice cloning.) Finally, when speakers move their heads, the system continues to track the direction and qualities of their voices as they change.
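
The article doesn't say how the translated audio is re-spatialized, but a standard way to make a mono voice appear to come from a particular direction is to reintroduce interaural time and level differences. Below is a minimal sketch using the textbook Woodworth delay model; the head radius, attenuation factor and sign convention are illustrative assumptions, not details from the paper.

```python
import numpy as np

def spatialize(mono: np.ndarray, azimuth_deg: float, sr: int = 16000,
               head_radius_m: float = 0.0875) -> np.ndarray:
    """Render mono speech as stereo arriving from azimuth_deg
    (negative = wearer's left) by delaying and attenuating the far ear."""
    az = np.deg2rad(abs(azimuth_deg))
    itd = head_radius_m / 343.0 * (az + np.sin(az))    # Woodworth ITD model
    delay = int(itd * sr)                              # interaural delay, samples
    near = mono
    far = np.pad(mono, (delay, 0))[: len(mono)] * 0.7  # delayed, quieter ear
    left, right = (near, far) if azimuth_deg < 0 else (far, near)
    return np.stack([left, right], axis=1)             # (samples, 2) stereo
```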

The system functioned when tested in 10 indoor and outdoor settings. And in a 29-participant test, the users preferred the system over models that didn’t track speakers through space.

In a separate user test, most participants preferred a delay of 3-4 seconds, since the system made more errors when translating with a delay of 1-2 seconds. The team is working to reduce the translation delay in future iterations. The system currently works only on commonplace speech, not specialized language such as technical jargon. For this paper, the team worked with Spanish, German and French, but previous work on translation models has shown they can be trained to translate around 100 languages.

“This is a step toward breaking down the language barriers between cultures,” Chen said. “So if I’m walking down the street in Mexico, even though I don’t speak Spanish, I can translate all the people’s voices and know who said what.”

, a research intern at HydroX AI and a UW undergraduate in the Allen School while completing this research, and , a UW doctoral student in the Allen School, are also co-authors on this paper. This research was funded in part by a Moore Inventor Fellow award.

For more information, contact the researchers at babelfish@cs.washington.edu.

A smart ring with a tiny camera lets users point and click to control home devices (Jan. 8, 2025)

While smart devices in homes have grown to include speakers, security systems, lights and thermostats, the ways to control them have remained relatively stable. Users can interact with a phone, or talk to the tech, but these are frequently less convenient than the simple switches they replace: “Turn on the lamp… Not that one… Turn up the speaker volume… Not that loud!”

University of Washington researchers have developed IRIS, a smart ring that allows users to control smart devices by aiming the ring’s small camera at the device and clicking a built-in button. The prototype Bluetooth ring sends an image of the selected device to the user’s phone, which controls the device. The user can adjust the device with the button and, for devices with gradient controls such as a speaker’s volume, by rotating their hand. IRIS, or Interactive Ring for Interfacing with Smart home devices, runs for 16-24 hours on a charge.

The team presented the research Oct. 16 at the 37th Annual ACM Symposium on User Interface Software and Technology in Pittsburgh. IRIS is not currently available to the public.

“Voice commands can often be really cumbersome,” said co-lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “We wanted to create something that’s as simple and intuitive as clicking on an icon on your computer desktop.”

A white ring beside a circuit board and a quarter
UW researchers have developed IRIS, a smart ring that allows users to point and click to control smart devices. Here, the ring (left) is shown beside its circuit board and battery and a quarter. Photo: Kim et al./UIST ’24

The team decided to put the system in a ring because they believed users would realistically wear one throughout the day. The challenge, then, was integrating a camera into a wireless smart ring given its size and power constraints. The system also had to toggle devices in under a second; otherwise, users tend to think it is not working.

To achieve this, researchers had the ring compress the images before sending them to a phone. Rather than streaming images all the time, the ring gets activated when the user clicks the button, then turns off after 3 seconds of inactivity.
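
In pseudocode form, that duty-cycling behavior might look like the sketch below. The `ring` and `phone` objects are hypothetical driver interfaces standing in for hardware the article doesn't specify, and the compressor is a generic stand-in for whatever image codec the ring actually uses.

```python
import time
import zlib

def compress(frame_bytes: bytes) -> bytes:
    """Generic stand-in compressor; the real ring would use an image codec."""
    return zlib.compress(frame_bytes, level=6)

def iris_loop(ring, phone, idle_timeout_s: float = 3.0) -> None:
    """Click-to-capture loop: the camera wakes on a button press, ships one
    compressed frame over Bluetooth, and powers down after ~3 idle seconds
    (hypothetical interfaces: button_clicked, capture, camera_off, send)."""
    last_activity = None
    while True:
        if ring.button_clicked():
            last_activity = time.monotonic()
            phone.send(compress(ring.capture()))  # compressed frame to phone
        if last_activity and time.monotonic() - last_activity > idle_timeout_s:
            ring.camera_off()                     # save power between clicks
            last_activity = None
```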

Related:

For more information, visit

In a study with 23 participants, twice as many users preferred IRIS over a voice command system alone (in this case, Apple’s Siri). On average, IRIS controlled home devices more than two seconds faster than voice commands.

“In the future, integrating the IRIS camera system into a health-tracking smart ring would be a transformative step for smart rings,” Kim said. “It’d let smart rings actually augment or improve human capability, rather than just telling you your step count or heart rate.”

, both UW doctoral students in the Allen School, were co-lead authors on the study, and , a UW professor in the Allen School, was the senior author. Additional co-authors include , a UW research assistant in the Allen School; , a UW undergraduate in the Allen School; , a UW master’s student in the Allen School; and , a UW professor in the Allen School. This research was funded by a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact iris@cs.washington.edu.

AI headphones create a ‘sound bubble,’ quieting all sounds more than a few feet away (Nov. 14, 2024)

Imagine this: You’re at an office job, wearing noise-canceling headphones to dampen the ambient chatter. A co-worker arrives at your desk and asks a question, but rather than needing to remove the headphones and say, “What?”, you hear the question clearly. Meanwhile, the water-cooler chat across the room remains muted. Or imagine being in a busy restaurant and hearing everyone at your table while the other speakers and noise in the restaurant are reduced.

A team led by researchers at the University of Washington has created a headphone prototype that allows listeners to create just such a “sound bubble.” The team’s artificial intelligence algorithms, combined with a headphone prototype, allow the wearer to hear people speaking within a bubble with a programmable radius of 3 to 6 feet. Voices and sounds outside the bubble are quieted by an average of 49 decibels, even if the distant sounds are louder than those inside the bubble.

The team published the research Nov. 14 in Nature Electronics. The code for the proof-of-concept device is available for others to build on. The researchers are creating a startup to commercialize this technology.

“Humans aren’t great at perceiving distances through sound, particularly when there are multiple sound sources around them,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “Our ability to focus on the people in our vicinity can be limited in places like loud restaurants, so creating sound bubbles on a hearing device has not been possible so far. Our AI system can actually learn the distance for each sound source in a room, and process this in real time, within 8 milliseconds, on the hearing device itself.”

Researchers created the prototype with commercially available noise-canceling headphones. They affixed six small microphones across the headband. The team’s neural network, running on a small onboard computer attached to the headphones, tracks when different sounds reach each microphone. The system then suppresses the sounds coming from outside the bubble, while playing back and slightly amplifying the sounds inside the bubble (because noise-canceling headphones physically let some sound through).

“We’d worked on a previous smart-speaker system where we spread the microphones across a table because we thought we needed significant distances between microphones to extract distance information about sounds,” Gollakota said. “But then we started questioning our assumption. Do we need a big separation to create this ‘sound bubble’? What we showed here is that we don’t. We were able to do it with just the microphones on the headphones, and in real-time, which was quite surprising.”

To train the system to create sound bubbles in different environments, researchers needed a distance-based sound dataset collected in the real world, which was not available. To gather such a dataset, they put the headphones on a mannequin head. A robotic platform rotated the head while a moving speaker played noises coming from different distances. The team collected data with the mannequin system, as well as with human users, in 22 different indoor environments, including offices and living spaces.

A man wears Sony headphones with wires and a chip visible on the outside.
The team created a prototype using off-the-shelf headphones fitted with microphones, pictured here. Photo: Chen et al./Nature Electronics

Researchers have determined that the system works for a couple of reasons. First, the wearer’s head reflects sounds, which helps the neural net distinguish sounds from various distances. Second, sounds (like human speech) have multiple frequencies, each of which goes through different phases as it travels from its source. The team’s AI algorithm, the researchers believe, is comparing the phases of each of these frequencies to determine the distance of any sound source (a person talking, for instance).
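
To make that cue concrete, here is a textbook sketch of the kind of per-frequency phase-difference feature such a network could consume. The article does not specify the team's actual network inputs, so this illustrates the idea rather than their pipeline.

```python
import numpy as np

def interchannel_phase_features(frames: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """frames: (n_mics, n_samples) time-aligned audio from the headband mics.
    Returns the phase difference between each mic and a reference mic at
    every frequency bin. Nearby sources (curved wavefronts) produce a
    different phase pattern across frequencies than distant ones (nearly
    planar wavefronts), which is the distance cue described above."""
    spectra = np.fft.rfft(frames, n=n_fft, axis=1)   # (n_mics, n_bins)
    ref = spectra[0:1]                               # reference microphone
    # phase of the cross-spectrum = per-bin phase difference vs. the reference
    return np.angle(spectra[1:] * np.conj(ref))      # (n_mics - 1, n_bins)
```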

Headphones like Apple’s AirPods Pro 2 offer some related listening features. But these features work by tracking head position and amplifying the sound coming from a specific direction, rather than gauging distance. This means the headphones can’t amplify multiple speakers at once, lose functionality if the wearer turns their head away from the target speaker, and aren’t as effective at reducing loud sounds from the speaker’s direction.

The system has been trained to work only indoors, because getting clean training audio is more difficult outdoors. Next, the team is working to make the technology function on hearing aids and noise-canceling earbuds, which requires a new strategy for positioning the microphones.

Additional co-authors are and , UW doctoral students in the Allen School; , a senior researcher at Microsoft; and , director of research at AssemblyAI. This research was funded in part by a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact soundbubble@cs.washington.edu.

AI headphones let wearer listen to a single person in a crowd, by looking at them just once (May 23, 2024)

Noise-canceling headphones have gotten very good at creating an auditory blank slate. But allowing certain sounds from a wearer’s environment through the erasure still challenges researchers. The latest edition of Apple’s AirPods Pro, for instance, automatically adjusts sound levels for wearers, sensing when they’re in conversation, but the user has little control over whom to listen to or when this happens.

A University of Washington team has developed an artificial intelligence system that lets a user wearing headphones look at a person speaking for three to five seconds to “enroll” them. The system, called “Target Speech Hearing,” then cancels all other sounds in the environment and plays just the enrolled speaker’s voice in real time, even as the listener moves around in noisy places and no longer faces the speaker.

The team presented the research May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The code is available for others to build on. The system is not commercially available.

“We tend to think of AI now as web-based chatbots that answer questions,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”

To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker’s voice then should reach the microphones on both sides of the headset simultaneously; there’s a 16-degree margin of error. The headphones send that signal to an embedded computer, where the team’s machine learning software learns the desired speaker’s vocal patterns. The system latches onto that speaker’s voice and continues to play it back to the listener, even as the pair moves around. The system’s ability to focus on the enrolled voice improves as the speaker keeps talking, giving the system more training data.
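
One plausible reading of the enrollment step, sketched below: while the wearer faces the talker, the voice reaches both ear microphones in phase, so simply summing the channels acts as a crude zero-delay beamformer that favors the gazed-at speaker, and the cleaned snippet is then converted into a voice embedding used to pull that speaker out later. The `embed_fn` component (for example, a d-vector style network) is an assumption, not the paper's named model.

```python
import numpy as np

def enroll_speaker(left: np.ndarray, right: np.ndarray, sr: int, embed_fn) -> np.ndarray:
    """Enrollment sketch. left/right: ear-microphone audio captured while the
    wearer looks at the target talker. Averaging the two channels reinforces
    sound arriving from straight ahead (a zero-delay delay-and-sum beam) and
    partially cancels off-axis voices; embed_fn maps the result to a
    fixed-length voice 'fingerprint' for later target-speaker extraction."""
    aligned = 0.5 * (left + right)   # zero-delay beam toward the gaze direction
    return embed_fn(aligned, sr)     # speaker embedding (assumed component)
```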

Related:

  • For more information, visit
  • Stories from and

The team tested its system on 21 subjects, who rated the clarity of the enrolled speaker’s voice nearly twice as high as the unfiltered audio on average.

This work builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes, such as birds or voices, that they wanted to hear and canceled other sounds in the environment.

Currently the TSH system can enroll only one speaker at a time, and it’s only able to enroll a speaker when there is not another loud voice coming from the same direction as the target speaker’s voice. If a user isn’t happy with the sound quality, they can run another enrollment on the speaker to improve the clarity.

The team is working to expand the system to earbuds and hearing aids in the future.

Additional co-authors on the paper were , and , UW doctoral students in the Allen School, and , director of research at AssemblyAI. This research was funded in part by a Moore Inventor Fellow award.

For more information, contact tsh@cs.washington.edu.

New AI noise-canceling headphone technology lets wearers pick which sounds they hear (Nov. 9, 2023)
A man wearing a surgical mask and headphones walks through the University of Washington campus while holding a smartphone. People walk behind him.
A team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. Pictured is co-author Malek Itani demonstrating the system. Photo: University of Washington

Most anyone who’s used noise-canceling headphones knows that hearing the right noise at the right time can be vital. Someone might want to erase car horns when working indoors, but not when walking along busy streets. Yet people can’t choose what sounds their headphones cancel.

Now, a team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. The team is calling the system “semantic hearing.” Headphones stream captured audio to a connected smartphone, which cancels all environmental sounds. Either through voice commands or a smartphone app, headphone wearers can select which sounds they want to include from 20 classes, such as sirens, baby cries, speech, vacuum cleaners and bird chirps. Only the selected sounds will be played through the headphones.

The team presented the research Nov. 1 in San Francisco. In the future, the researchers plan to release a commercial version of the system.

“Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise-canceling headphones haven’t achieved,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second.”

Because of this time crunch, the semantic hearing system must process sounds on a device such as a connected smartphone, instead of on more robust cloud servers. Additionally, because sounds from different directions arrive in people’s ears at different times, the system must preserve these delays and other spatial cues so people can still meaningfully perceive sounds in their environment.
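
One sketch of how those spatial cues can survive filtering: estimate a single time-frequency mask from the mixture, but apply it to the left and right channels separately, so interaural time and level differences pass through unchanged. The `mask_fn` interface and frame size are assumptions; the article doesn't describe the team's network at this level.

```python
import numpy as np

def apply_mask_binaural(left: np.ndarray, right: np.ndarray, mask_fn,
                        frame: int = 256):
    """left/right: ear channels whose length is a multiple of `frame`.
    mask_fn: stand-in for the neural network; takes per-frame magnitude
    spectra and returns a 0-1 mask selecting the wanted sound classes."""
    L = np.fft.rfft(left.reshape(-1, frame), axis=1)
    R = np.fft.rfft(right.reshape(-1, frame), axis=1)
    mask = mask_fn(0.5 * (np.abs(L) + np.abs(R)))   # one mask for both ears
    out_left = np.fft.irfft(mask * L, n=frame, axis=1).ravel()
    out_right = np.fft.irfft(mask * R, n=frame, axis=1).ravel()
    return out_left, out_right                      # spatial cues preserved
```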

Related:

  • For more information, see
  • Story from

Tested in environments such as offices, streets and parks, the system was able to extract sirens, bird chirps, alarms and other target sounds, while removing all other real-world noise. When 22 participants rated the system’s audio output for the target sound, they said that on average the quality improved compared to the original recording.

In some cases, the system struggled to distinguish between sounds that share many properties, such as vocal music and human speech. The researchers note that training the models on more real-world data might improve these outcomes.

Additional co-authors on the paper were and , both UW doctoral students in the Allen School; , who completed this research as a doctoral student in the Allen School and is now at Carnegie Mellon University; and , director of research at AssemblyAI.

For more information, contact semantichearing@cs.washington.edu.

UW team’s shape-changing smart speaker lets users mute different areas of a room (Sept. 21, 2023)
Four people have separate conversations in a meeting room.
A team led by researchers at the University of Washington has developed a shape-changing smart speaker, which uses self-deploying microphones to divide rooms into speech zones and track the positions of individual speakers. Here UW doctoral students Tuochao Chen (foreground), Mengyi Shan, Malek Itani, and Bandhav Veluri, all in the Paul G. Allen School of Computer Science & Engineering, demonstrate the system in a meeting room. Photo: April Hong/University of Washington

In virtual meetings, it’s easy to keep people from talking over each other. Someone just hits mute. But for the most part, this ability doesn’t translate easily to recording in-person gatherings. In a bustling cafe, there are no buttons to silence the table beside you.

The ability to locate and control sound, isolating one person talking from a specific location in a crowded room, for instance, has long challenged researchers, especially without visual cues from cameras.

A team led by researchers at the University of Washington has developed a shape-changing smart speaker, which uses self-deploying microphones to divide rooms into speech zones and track the positions of individual speakers. With the help of the team’s deep-learning algorithms, the system lets users mute certain areas or separate simultaneous conversations, even if two adjacent people have similar voices. Like a fleet of Roombas, each about an inch in diameter, the microphones automatically deploy from, and then return to, a charging station. This allows the system to be moved between environments and set up automatically. In a conference room meeting, for instance, such a system might be deployed instead of a central microphone, allowing better control of in-room audio.

The team published the research Sept. 21 in Nature Communications.

“If I close my eyes and there are 10 people talking in a room, I have no idea who’s saying what and where they are in the room exactly. That’s extremely hard for the human brain to process. Until now, it’s also been difficult for technology,” said co-lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “For the first time, using what we’re calling a robotic ‘acoustic swarm,’ we’re able to track the positions of multiple people talking in a room and separate their speech.”

Previous research on robot swarms has required using overhead or on-device cameras, projectors or special surfaces. The UW team’s system is the first to accurately distribute a robot swarm using only sound.

The team’s prototype consists of seven small robots that spread themselves across tables of various sizes. As they move from their charger, each robot emits a high-frequency sound, like a bat navigating, using this frequency and other sensors to avoid obstacles and move around without falling off the table. The automatic deployment allows the robots to place themselves for maximum accuracy, permitting greater sound control than if a person set them. The robots disperse as far from each other as possible, since greater distances make differentiating and locating people speaking easier. Today’s consumer smart speakers have multiple microphones, but clustered on the same device, they’re too close to allow for this system’s mute and active zones.

A small robot sits on a table beside a coffee cup.
The tiny individual microphones are able to navigate around clutter and place themselves with only sound. Photo: April Hong/天美影视传媒

“If I have one microphone a foot away from me, and another microphone two feet away, my voice will arrive at the microphone that’s a foot away first. If someone else is closer to the microphone that’s two feet away, their voice will arrive there first,” said co-lead author , a UW doctoral student in the Allen School. “We developed neural networks that use these time-delayed signals to separate what each person is saying and track their positions in a space. So you can have four people having two conversations and isolate any of the four voices and locate each of the voices in a room.”
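
The arrival-order cue in that quote is the classic time difference of arrival (TDOA). Below is the textbook cross-correlation estimator, shown for illustration; per the quote, the team's neural networks learn from these delays rather than computing positions this way.

```python
import numpy as np

def tdoa_seconds(mic_a: np.ndarray, mic_b: np.ndarray, sr: int) -> float:
    """Estimate how much later a sound reaches mic A than mic B (in seconds)
    as the cross-correlation lag that best aligns the two recordings."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    lag = int(np.argmax(corr)) - (len(mic_b) - 1)  # positive: A hears it later
    return lag / sr

# Each microphone pair's delay constrains the talker to a hyperbola between
# the two robots; with seven dispersed mics, intersecting those constraints
# pins down who is speaking where.
```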

The team tested the robots in offices, living rooms and kitchens with groups of three to five people speaking. Across all these environments, the system could discern different voices within 1.6 feet (50 centimeters) of each other 90% of the time, without prior information about the number of speakers. The system was able to process three seconds of audio in 1.82 seconds on average, fast enough for live streaming, though a bit too long for real-time communications such as video calls.

As the technology progresses, researchers say, acoustic swarms might be deployed in smart homes to better differentiate people talking with smart speakers. That could potentially allow only people sitting on a couch, in an “active zone,” to vocally control a TV, for example.

The seven robotic microphones sit in their charging station
To charge, the microphones automatically return to their charging station. Photo: April Hong/天美影视传媒

Researchers plan to eventually make microphone robots that can move around rooms, instead of being limited to tables. The team is also investigating whether the speakers can emit sounds that allow for real-world mute and active zones, so people in different parts of a room can hear different audio. The current study is another step toward science fiction technologies, such as the “cone of silence” in “Get Smart” and “Dune,” the authors write.

For more information see .

Of course, any technology that evokes comparison to fictional spy tools will raise questions of privacy. Researchers acknowledge the potential for misuse, so they have included guards against this: The microphones navigate with sound, not an onboard camera like other similar systems. The robots are easily visible and their lights blink when they’re active. Instead of processing the audio in the cloud, as most smart speakers do, the acoustic swarms process all the audio locally, as a privacy constraint. And even though some people’s first thoughts may be about surveillance, the system can be used for the opposite, the team says.

“It has the potential to actually benefit privacy, beyond what current smart speakers allow,” Itani said. “I can say, ‘Don’t record anything around my desk,’ and our system will create a bubble 3 feet around me. Nothing in this bubble would be recorded. Or if two groups are speaking beside each other and one group is having a private conversation, while the other group is recording, one conversation can be in a mute zone, and it will remain private.”

, formerly a principal research manager at Microsoft, is a co-author on this paper, and , a professor in the Allen School, is a senior author. The research was funded by a Moore Inventor Fellow award.

For more information, contact acousticswarm@cs.washington.edu.

With a new app, smart devices can have GPS underwater (July 24, 2023)
A diver uses underwater GPS on a smartwatch.
A team at the University of Washington has developed the first underwater 3D-positioning app for smart devices, such as the smartwatch pictured here. Photo: University of Washington

Even for scuba and snorkeling enthusiasts, the plunge into open water can be disorienting. Divers frequently swim with limited visibility, which can become a safety hazard for teams trying to find each other in an emergency. Yet even though many dive with smartwatches designed to go to depths of over 100 feet, accurately locating mobile devices underwater has confounded researchers.

Now, a team at the University of Washington has developed the first underwater 3D-positioning app for smart devices. When at least three divers are within about 98 feet (30 meters) of each other, their devices’ existing speakers and microphones contact each other, and the app tracks each user’s location relative to the leader. This range can extend with more divers, if each is within 98 feet of another diver. The team will present the research in September in New York City.

“Mobile devices today can work nearly anywhere on Earth. You can be in a forest or on a plane and still get internet connectivity,” said lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “But the one place where we still hadn’t made mobile devices work was underwater. It’s kind of the final frontier.”

Above water, GPS relies on a vast satellite network to locate mobile devices with radio signals. Underwater, these signals quickly fade. Sound, though, travels faster and farther in water than it does in air. Previous underwater positioning systems have relied on strategically placed buoys, but these systems are expensive and cumbersome to deploy, leading many divers to do without.

A smartwatch running the underwater GPS app.
The underwater GPS app runs on a smartwatch. Photo: University of Washington

The UW team found that such buoys aren’t necessary. With the app, if the dive leader has at least one other diver visible, the group’s devices can send acoustic signals to each other through their microphones and speakers and use the timestamps to estimate each diver’s distance. Based on these distances, the app can estimate the group’s formation and each diver’s location. If a device also tracks depth, as sport monitors like the Apple Watch Ultra or the Garmin Descent do, the system can locate divers in 3D.
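
The timestamp arithmetic can be illustrated with a textbook two-way acoustic ranging exchange; the app's actual message protocol isn't given in the article, so the reply-delay bookkeeping below is an assumption.

```python
SOUND_SPEED_WATER = 1500.0  # m/s, a typical value; varies with temperature,
                            # salinity and depth

def estimate_distance(t_sent: float, t_replied: float, reply_delay: float) -> float:
    """Device A pings at t_sent; device B answers after a known processing
    delay; A hears the reply at t_replied. Half the acoustic round trip
    times the speed of sound in water gives the separation in meters."""
    round_trip = (t_replied - t_sent) - reply_delay
    return 0.5 * round_trip * SOUND_SPEED_WATER

# A 40 ms acoustic round trip corresponds to two divers about 30 m apart:
print(estimate_distance(0.0, 0.050, 0.010))  # 30.0
```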

The app needs at least three devices in its network to function, and its accuracy improves as more devices are added. When tested with four to five devices in local lakes and a pool, the app estimated locations with an average error of about 5 feet (1.6 meters), close enough for divers to see each other in most environments. To get actual GPS coordinates, instead of tracking locations relative to the dive leader, the leader needs to be wirelessly connected to a surface device on a boat with GPS capabilities.

For more information and to see the app’s open-source code, visit the .

The study builds on AquaApp, a previous system from the team that allows divers to send messages to each other underwater.

“This and AquaApp can be used together,” said author , a UW doctoral student in the Allen School. “For example, if the dive leader finds someone going the wrong way, the leader can send an alert: ‘Hey, you’re going out of range. You need to come back.’ Or if a diver is running out of gas, an SOS can let the team find the person quickly even in murky water.”

, a professor in the Allen School, is a senior author on this paper. This research was funded by grants from the Gordon and Betty Moore Foundation and National Science Foundation.

For more information, contact underwaterGPS@cs.washington.edu.

How low-cost earbuds can make newborn hearing screening accessible (Oct. 31, 2022)
A team led by researchers at the University of Washington has created a new hearing screening system that uses a smartphone and earbuds. Now the team is working with collaborators to use this tool as part of a hearing screening project in Kenya. Here, lead researcher Justin Chan, a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering, uses the device to test a child's hearing. Photo: Dr. Nada Ali/University of Washington

Newborns across the United States are screened for hearing loss. This test is important because it helps families better understand their child's health, but it's often not accessible to children in other countries because the screening device is expensive.

A team led by researchers at the University of Washington has created a new hearing screening system that uses a smartphone and low-cost earbuds instead. The team tested this device with 114 patients, including 52 babies up to 6 months old. The researchers also tested the device on pediatric patients with known hearing loss. Their tool performed as well as the commercial device, and it correctly identified all patients with hearing loss.

The team published the research Oct. 31 in Nature Biomedical Engineering.

“There is a huge amount of health inequity in the world. I grew up in a country where there was no hearing screening available, in part because the screening device itself is pretty expensive,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “The project here is to leverage the ubiquity of mobile devices people across the world already have (smartphones and $2 to $3 earbuds) to make newborn hearing screening something that's accessible to all without sacrificing quality.”

The earbuds are connected to a microphone in a probe (shown here in blue) that can be placed in the patient’s ear. Photo: Raymond Smith/天美影视传媒

Because babies can’t tell doctors whether they can hear a given sound, these tests rely on the mechanics of the ear.

“When an external sound is played, hair cells in the inner ear move and vibrate. The result is a very quiet sound that our instruments can pick up,” said co-author , an associate professor of otolaryngology-head and neck surgery at the UW School of Medicine who practices at Seattle Children's. “This screening is very sensitive, meaning that if there is a concern about a patient's hearing, they will be referred for a more thorough evaluation with a specialist.”

For the test, doctors send two different tones into the ear at the same time. Based on those tones, the hair cells in the ear vibrate and create a third tone, which is what the doctors are listening for.

One reason the commercial device is expensive is that its speaker has been designed to play the two tones without any interference. The UW researchers found that they could use affordable earbuds instead, where each earbud plays one of the two tones. The earbuds are connected to a microphone in a probe that can be placed in the patient's ear. The microphone records any sounds from the ear and sends them to a smartphone for processing.
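
In standard distortion-product otoacoustic emission (DPOAE) screening, the ear's reply to two probe tones f1 and f2 appears at the frequency 2·f1 - f2. Here is a sketch of the detection step on the phone; the probe frequencies and the simple FFT peak-reading are textbook choices, not the team's noise-robust algorithm.

```python
import numpy as np

SR = 48000            # microphone sample rate (Hz)
F2 = 4000.0           # higher probe tone (Hz); textbook screening values
F1 = F2 / 1.2         # an f2/f1 ratio of 1.2 is the conventional choice
DP = 2 * F1 - F2      # frequency of the ear's own distortion product

def dp_level_db(mic: np.ndarray) -> float:
    """Level (dB, uncalibrated) at the distortion-product frequency in the
    probe-microphone recording; a strong peak suggests healthy hair cells."""
    windowed = mic * np.hanning(len(mic))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(mic), 1.0 / SR)
    dp_bin = int(np.argmin(np.abs(freqs - DP)))
    return 20.0 * np.log10(spectrum[dp_bin] + 1e-12)
```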

The earbuds are connected as shown here to the probe. Photo: Raymond Smith/University of Washington

“As you can imagine, these sounds that are coming out from the ear are very soft, and sometimes it's hard to hear them over noise in the environment or if the patient is moving their head,” said lead author Justin Chan, a UW doctoral student in the Allen School. “We designed algorithms on the phone that help us detect the signal even with all that background noise. These algorithms can run in real time on any smartphone and do not require the latest smartphone models.”

The researchers tested their device at three hearing clinics in the Puget Sound area of Washington state. Each test screened four different frequencies, which is typical for these types of hearing screenings. Participants ranged in age from a few weeks to 20 years old.

Now the team is working with collaborators to use this tool as part of a newborn hearing screening project in Kenya. The researchers teamed up with a group from the UW global health department, the University of Nairobi and the Kenya Ministry of Health to create the project “Toward Universal Newborn and Early Childhood Hearing Screening in Kenya.”

“Right now, this is a prototype that we created. The next challenge is really scaling this up and then working with local experts in each country who are the most familiar with the particular challenges in each situation,” Chan said. “We have an opportunity to really have an impact on global health, especially for newborn hearing. I think it’s pretty gratifying to know that the research we do can help to directly solve real problems.”

A child in Kenya has their hearing tested by lead researcher Justin Chan. Photo: Dr. Nada Ali/University of Washington

Additional co-authors on this paper are , a resident in otolaryngology-head and neck surgery at the UW School of Medicine; , who worked on this project as a UW doctoral student in the electrical and computer engineering department; , a clinical research coordinator at Seattle Children's; , a UW affiliate instructor in speech and hearing sciences; and , an associate professor of pediatrics in the UW School of Medicine who practices at Seattle Children's. This research was funded by the National Institute on Deafness and Other Communication Disorders, the Washington Research Foundation, the Seattle Children's Research Institute, the Seattle Children's Research Integration Hub, the Pilot Awards Support Fund Program, a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact tune@cs.washington.edu.

Grant numbers: T32DC000018, 10617
