Shyam Gollakota – UW News

Tiny cameras in earbuds let users talk with AI about what they see (April 14, 2026)
Two black earbuds: one with the casing removed exposing a computer chip and tiny camera.
UW researchers developed a system called VueBuds that uses tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. Here, the altered earbuds are shown with the camera inserted. Photo: Kim et al./CHI ’26

University of Washington researchers developed the first system that incorporates tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. For instance, a user might turn to a Korean food package and say, “Hey Vue, translate this for me.” They’d then hear an AI voice say, “The visible text translates to ‘Cold Noodles’ in English.”

The prototype system called VueBuds takes low-resolution, black-and-white images, which it transmits over Bluetooth to a phone or other nearby device. A small artificial intelligence model on the device then answers questions about the images within around a second. For privacy, all of the processing happens on the device, a small light turns on when the system is recording, and users can immediately delete images.

The team will present its findings April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona.

“We haven’t seen most people adopt smart glasses or VR headsets, in part because a lot of people don’t like wearing glasses, and they often come with privacy concerns, such as recording high-resolution video and processing it in the cloud,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But almost everyone wears earbuds already, so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process.”

Cameras use far more power than the microphones already in earbuds, so using the same sort of high-res cameras as those in smart glasses wouldn’t work. Also, large amounts of information can’t stream continuously over Bluetooth, so the system can’t run continuous video.

The team found that using a low-power camera, roughly the size of a grain of rice, to shoot low-resolution, black-and-white still images limited battery drain and allowed for Bluetooth transmission while preserving performance.

There was also the matter of placement.

“One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user’s view of the world reliably?” said lead author Kim, who completed this work as a UW doctoral student in the Allen School.

The team found that angling each camera 5-10 degrees outward provides a 98-108 degree field of view. While this creates a small blind spot when objects are held closer than 20 centimeters from the user, people rarely hold things that close to examine them 鈥 making it a non-issue for typical interactions.
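
The reported numbers are consistent with simple geometry: when the two views overlap, their union spans one camera's field of view plus twice the outward angle. Here is a back-of-the-envelope sketch in Python; the per-camera field of view is an assumed value chosen to reproduce the reported range, not a figure from the article.

```python
# Back-of-the-envelope check of the combined field of view (FOV) for two
# outward-angled earbud cameras. PER_CAMERA_FOV_DEG is an assumption
# (not stated in the article), picked so the arithmetic matches the text.
PER_CAMERA_FOV_DEG = 88.0

def combined_fov(outward_angle_deg: float) -> float:
    """Two cameras angled outward by +/- outward_angle_deg with overlapping
    views: the union spans the per-camera FOV plus twice the angle."""
    return PER_CAMERA_FOV_DEG + 2 * outward_angle_deg

print(combined_fov(5))   # 98.0 degrees
print(combined_fov(10))  # 108.0 degrees
```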

Researchers also discovered that while the vision language model was largely able to make sense of the images from each earbud, having to process images from both earbuds slowed it down. So they had the system “stitch” the two images into one, identifying overlapping imagery and combining it. This allows the system to respond in one second, quick enough to feel like real time for users, rather than the two seconds it takes with separate images.
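
The article doesn't detail the stitching step, but a minimal version of "identify overlapping imagery and combine it" for two grayscale frames could look like the sketch below: search for the horizontal overlap where the frames agree best, then cross-fade the shared region. Treating the offset as purely horizontal (ignoring parallax and rotation) is a simplifying assumption, not the authors' pipeline.

```python
import numpy as np

def stitch_pair(left: np.ndarray, right: np.ndarray, min_overlap: int = 16) -> np.ndarray:
    """Stitch two same-sized grayscale frames (H x W, floats in [0, 1]) side
    by side. A sketch of overlap-and-blend stitching for illustration only."""
    _, w = left.shape
    best_overlap, best_score = min_overlap, -np.inf
    for overlap in range(min_overlap, w):
        a = left[:, w - overlap:]   # right edge of the left frame
        b = right[:, :overlap]      # left edge of the right frame
        # normalized correlation as a similarity score for this overlap width
        score = np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if score > best_score:
            best_score, best_overlap = score, overlap
    ramp = np.linspace(1.0, 0.0, best_overlap)  # linear cross-fade weights
    blended = (left[:, w - best_overlap:] * ramp
               + right[:, :best_overlap] * (1.0 - ramp))
    return np.hstack([left[:, : w - best_overlap], blended, right[:, best_overlap:]])
```

The single stitched frame is then what gets handed to the vision-language model, halving the number of images it must process.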

The team then had 74 participants compare recorded outputs from VueBuds with outputs from Ray-Ban Meta glasses in a series of tests. Even though VueBuds used low-resolution images with stronger privacy controls while the Ray-Bans took high-resolution images processed in the cloud, the two systems performed comparably overall. Participants preferred VueBuds’ translations, while the Ray-Bans did better at counting objects.

Sixteen participants also wore VueBuds and tested the system’s ability to translate and answer basic questions about objects. VueBuds achieved 83-84% accuracy when translating or identifying objects and 93% when identifying the author and title of a book.

This study was designed to gauge the feasibility of integrating cameras in wireless earbuds. Since the system only takes grayscale images, it can’t answer questions that involve color in the scene.

The team wants to add color to the system (color cameras require more power) and to train specialized AI models for specific use cases, such as translation.

“This study lets us glimpse what’s possible just using a general-purpose language model and our wireless earbuds with cameras,” Kim said. “But we’d like to study the system more rigorously for applications like reading a book, for people who have low vision or are blind, for instance, or translating text for travelers.”

Co-authors include , a UW master’s student in the Allen School, and , , , and , all UW students in electrical and computer engineering.

For more information, contact vuebuds@cs.washington.edu.

AI headphones automatically learn who you’re talking to, and let you hear them better (Dec. 9, 2025)

UPDATE (Dec. 12, 2025): This story has been updated to correct Malek Itani’s department.

Holding a conversation in a crowded room often leads to the frustrating “cocktail party problem,” or the challenge of separating the voices of conversation partners from a hubbub. It’s a mentally taxing situation that can be exacerbated by hearing impairment.

As a solution to this common conundrum, researchers at the University of Washington have developed headphones that proactively isolate all of the wearer’s conversation partners in a noisy soundscape. The headphones are powered by an AI model that detects the cadence of a conversation and another model that mutes any voices that don’t follow that pattern, along with other unwanted background noise. The prototype uses off-the-shelf hardware and can identify conversation partners using just two to four seconds of audio.

The system’s developers think the technology could one day help users of hearing aids, earbuds and smart glasses to filter their soundscapes without the need to manually direct the AI’s “attention.”

The team presented the research Nov. 7 in Suzhou, China, at the Conference on Empirical Methods in Natural Language Processing. The underlying code is open-source and available for others to build on.

“Existing approaches to identifying who the wearer is listening to predominantly involve electrodes implanted in the brain to track attention,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “Our insight is that when we’re conversing with a specific group of people, our speech naturally follows a turn-taking rhythm. And we can train AI to predict and track those rhythms using only audio, without the need for implanting electrodes.”

Related:

  • For more information, visit
  • Story from

The prototype system, dubbed “proactive hearing assistants,” activates when the person wearing the headphones begins speaking. From there, one AI model begins tracking conversation participants by performing a “who spoke when” analysis and looking for low overlap in exchanges. The system then forwards the result to a second model, which isolates the participants and plays the cleaned-up audio for the wearer. The system is fast enough to avoid confusing audio lag for the user, and can currently juggle one to four conversation partners in addition to the wearer’s audio.
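
As a rough illustration of that logic, the sketch below takes "who spoke when" output (diarized speech segments) and keeps the speakers whose turns rarely overlap the wearer's, the low-overlap, turn-taking signature described above. The data layout and overlap threshold are assumptions for illustration, not the paper's models.

```python
def overlap(a, b):
    """Total seconds during which two lists of (start, end) segments overlap."""
    return sum(max(0.0, min(e1, e2) - max(s1, s2))
               for s1, e1 in a for s2, e2 in b)

def infer_partners(turns, wearer="wearer", max_overlap_ratio=0.15):
    """turns: speaker id -> list of (start, end) speech segments from a
    diarization ('who spoke when') model. Speakers whose speech rarely
    overlaps the wearer's follow a turn-taking rhythm and are treated as
    conversation partners; heavy overlap suggests background talk."""
    partners = []
    for spk, segs in turns.items():
        if spk == wearer:
            continue
        talk_time = sum(e - s for s, e in segs)
        ratio = overlap(segs, turns[wearer]) / talk_time if talk_time else 1.0
        if ratio < max_overlap_ratio:
            partners.append(spk)
    return partners

turns = {"wearer": [(0, 2), (5, 7)],
         "friend": [(2.2, 4.8)],          # speaks between the wearer's turns
         "background": [(0.5, 6.5)]}      # talks straight through them
print(infer_partners(turns))              # ['friend']
```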

The team tested the headphones with 11 participants, who rated qualities like noise suppression and comprehension with and without the AI filtration. Overall, the group rated the filtered audio more than twice as favorably as the baseline.

A pair of headphones with a curly black microphone taped to one ear cup.
The team combined off-the-shelf noise-canceling headphones with binaural microphones to create the prototype, pictured here. Photo: Hu et al./EMNLP

Gollakota’s team has been experimenting with AI-powered hearing assistants for the past few years. They developed one smart headphone prototype that can pick a person’s audio out of a crowd when the wearer looks at them, and another that creates a “sound bubble” by muting all sounds beyond a set distance from the wearer.

“Everything we’ve done previously requires the user to manually select a specific speaker or a distance within which to listen, which is not great for user experience,” said lead author Guilin Hu, a doctoral student in the Allen School. “What we’ve demonstrated is a technology that’s proactive: something that infers human intent noninvasively and automatically.”

Plenty of work remains to refine the experience. The more dynamic a conversation gets, the more the system is likely to struggle, as participants talk over one another or speak in longer monologues. Participants entering and leaving a conversation present another hurdle, though Gollakota was surprised by how well the current prototype performed in these more complicated scenarios. The authors also note that the models were tested on English, Mandarin and Japanese dialog, and that the rhythms of other languages might require further fine-tuning.

The current prototype uses commercial over-the-ear headphones, microphones and circuitry. Eventually, Gollakota expects to make the system small enough to run on a tiny chip within an earbud or a hearing aid. In a previous paper, the authors demonstrated that it is possible to run AI models on tiny hearing aid devices.

Co-authors include , a UW doctoral student in the Allen School, and , a UW doctoral student in the electrical and computer engineering department.

This research was funded by the Moore Inventor Fellows program.

For more information, contact proactivehearing@cs.washington.edu

AI headphones translate multiple speakers at once, cloning their voices in 3D sound (May 9, 2025)

Tuochao Chen, a University of Washington doctoral student, recently toured a museum in Mexico. Chen doesn’t speak Spanish, so he ran a translation app on his phone and pointed the microphone at the tour guide. But even in a museum’s relative quiet, the surrounding noise was too much. The resulting text was useless.

Various technologies have emerged lately promising fluent translation, but none of these solved Chen’s problem of public spaces. Existing devices, for instance, typically function only with an isolated speaker, and they deliver the translation only after the speaker finishes.

Now, Chen and a team of UW researchers have designed a headphone system that translates multiple speakers at once, while preserving the direction and qualities of people’s voices. The team built the system, called Spatial Speech Translation, with off-the-shelf noise-cancelling headphones fitted with microphones. The team’s algorithms separate out the different speakers in a space and follow them as they move, translate their speech and play it back with a 2-4 second delay.

The team presented the research April 30 at the ACM CHI Conference on Human Factors in Computing Systems in Yokohama, Japan. The code for the proof-of-concept device is available for others to build on.

“Other translation tech is built on the assumption that only one person is speaking,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in the real world, you can’t have just one robotic voice talking for multiple people in a room. For the first time, we’ve preserved the sound of each person’s voice and the direction it’s coming from.”

Related:

  • Story in
  • For more information, visit听

The system introduces three innovations. First, when turned on, it immediately detects how many speakers are in an indoor or outdoor space.

“Our algorithms work a little like radar,” said lead author Chen, a UW doctoral student in the Allen School. “So it’s scanning the space in 360 degrees and constantly determining and updating whether there’s one person or six or seven.”

The system then translates the speech and maintains the expressive qualities and volume of each speaker’s voice while running on a device, such as mobile devices with an Apple M2 chip, including laptops and the Apple Vision Pro. (The team avoided using cloud computing because of the privacy concerns with voice cloning.) Finally, when speakers move their heads, the system continues to track the direction and qualities of their voices as they change.
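
The article doesn't say how the translated audio is re-spatialized, but a standard way to make a mono voice appear to come from a particular direction is to reintroduce interaural time and level differences. Below is a minimal sketch using the textbook Woodworth delay model; the head radius, attenuation factor and sign convention are illustrative assumptions, not details from the paper.

```python
import numpy as np

def spatialize(mono: np.ndarray, azimuth_deg: float, sr: int = 16000,
               head_radius_m: float = 0.0875) -> np.ndarray:
    """Render mono speech as stereo arriving from azimuth_deg
    (negative = wearer's left) by delaying and attenuating the far ear."""
    az = np.deg2rad(abs(azimuth_deg))
    itd = head_radius_m / 343.0 * (az + np.sin(az))    # Woodworth ITD model
    delay = int(itd * sr)                              # interaural delay, samples
    near = mono
    far = np.pad(mono, (delay, 0))[: len(mono)] * 0.7  # delayed, quieter ear
    left, right = (near, far) if azimuth_deg < 0 else (far, near)
    return np.stack([left, right], axis=1)             # (samples, 2) stereo
```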

The system functioned when tested in 10 indoor and outdoor settings. And in a 29-participant test, the users preferred the system over models that didn’t track speakers through space.

In a separate user test, most participants preferred a delay of 3-4 seconds, since the system made more errors when translating with a delay of 1-2 seconds. The team is working to reduce the translation delay in future iterations. The system currently works only on commonplace speech, not specialized language such as technical jargon. For this paper, the team worked with Spanish, German and French, but previous work on translation models has shown they can be trained to translate around 100 languages.

“This is a step toward breaking down the language barriers between cultures,” Chen said. “So if I’m walking down the street in Mexico, even though I don’t speak Spanish, I can translate all the people’s voices and know who said what.”

, a research intern at HydroX AI and a UW undergraduate in the Allen School while completing this research, and , a UW doctoral student in the Allen School, are also co-authors on this paper. This research was funded in part by a Moore Inventor Fellow award.

For more information, contact the researchers at babelfish@cs.washington.edu.

A smart ring with a tiny camera lets users point and click to control home devices (Jan. 8, 2025)

While smart devices in homes have grown to include speakers, security systems, lights and thermostats, the ways to control them have remained relatively stable. Users can interact with a phone, or talk to the tech, but these are frequently less convenient than the simple switches they replace: “Turn on the lamp… Not that one… Turn up the speaker volume… Not that loud!”

University of Washington researchers have developed IRIS, a smart ring that allows users to control smart devices by aiming the ring’s small camera at the device and clicking a built-in button. The prototype Bluetooth ring sends an image of the selected device to the user’s phone, which controls the device. The user can adjust the device with the button and, for devices with gradient controls such as a speaker’s volume, by rotating their hand. IRIS, or Interactive Ring for Interfacing with Smart home devices, runs for 16-24 hours on a charge.

The team presented the research Oct. 16 at the 37th Annual ACM Symposium on User Interface Software and Technology in Pittsburgh. IRIS is not currently available to the public.

“Voice commands can often be really cumbersome,” said co-lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “We wanted to create something that’s as simple and intuitive as clicking on an icon on your computer desktop.”

A white ring beside a circuit board and a quarter
UW researchers have developed IRIS, a smart ring that allows users to point and click to control smart devices. Here, the ring (left) is shown beside its circuit board and battery and a quarter. Photo: Kim et al./UIST ’24

The team decided to put the system in a ring because they believed users would realistically wear one throughout the day. The challenge, then, was integrating a camera into a wireless smart ring given its size and power constraints. The system also had to toggle devices in under a second; otherwise, users tend to think it is not working.

To achieve this, researchers had the ring compress the images before sending them to a phone. Rather than streaming images all the time, the ring gets activated when the user clicks the button, then turns off after 3 seconds of inactivity.
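
In pseudocode form, that duty-cycling behavior might look like the sketch below. The `ring` and `phone` objects are hypothetical driver interfaces standing in for hardware the article doesn't specify, and the compressor is a generic stand-in for whatever image codec the ring actually uses.

```python
import time
import zlib

def compress(frame_bytes: bytes) -> bytes:
    """Generic stand-in compressor; the real ring would use an image codec."""
    return zlib.compress(frame_bytes, level=6)

def iris_loop(ring, phone, idle_timeout_s: float = 3.0) -> None:
    """Click-to-capture loop: the camera wakes on a button press, ships one
    compressed frame over Bluetooth, and powers down after ~3 idle seconds
    (hypothetical interfaces: button_clicked, capture, camera_off, send)."""
    last_activity = None
    while True:
        if ring.button_clicked():
            last_activity = time.monotonic()
            phone.send(compress(ring.capture()))  # compressed frame to phone
        if last_activity and time.monotonic() - last_activity > idle_timeout_s:
            ring.camera_off()                     # save power between clicks
            last_activity = None
```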

Related:

For more information, visit

In a study with 23 participants, twice as many users preferred IRIS over a voice command system alone (in this case, Apple’s Siri). On average, IRIS controlled home devices more than two seconds faster than voice commands.

“In the future, integrating the IRIS camera system into a health-tracking smart ring would be a transformative step for smart rings,” Kim said. “It’d let smart rings actually augment or improve human capability, rather than just telling you your step count or heart rate.”

, both UW doctoral students in the Allen School, were co-lead authors on the study, and , a UW professor in the Allen School, was the senior author. Additional co-authors include , a UW research assistant in the Allen School; , a UW undergraduate in the Allen School; , a UW master’s student in the Allen School; and , a UW professor in the Allen School. This research was funded by a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact iris@cs.washington.edu.

AI headphones create a ‘sound bubble,’ quieting all sounds more than a few feet away (Nov. 14, 2024)

Imagine this: You’re at an office job, wearing noise-canceling headphones to dampen the ambient chatter. A co-worker arrives at your desk and asks a question, but rather than needing to remove the headphones and say, “What?”, you hear the question clearly. Meanwhile, the water-cooler chat across the room remains muted. Or imagine being in a busy restaurant and hearing everyone at your table while the other speakers and noise in the restaurant are reduced.

A team led by researchers at the University of Washington has created a headphone prototype that allows listeners to create just such a “sound bubble.” The team’s artificial intelligence algorithms, combined with a headphone prototype, allow the wearer to hear people speaking within a bubble with a programmable radius of 3 to 6 feet. Voices and sounds outside the bubble are quieted by an average of 49 decibels, even if the distant sounds are louder than those inside the bubble.

The team published the research Nov. 14 in Nature Electronics. The code for the proof-of-concept device is available for others to build on. The researchers are creating a startup to commercialize this technology.

“Humans aren’t great at perceiving distances through sound, particularly when there are multiple sound sources around them,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “Our ability to focus on the people in our vicinity can be limited in places like loud restaurants, so creating sound bubbles on a hearing device has not been possible so far. Our AI system can actually learn the distance for each sound source in a room, and process this in real time, within 8 milliseconds, on the hearing device itself.”

Researchers created the prototype with commercially available noise-canceling headphones. They affixed six small microphones across the headband. The team’s neural network, running on a small onboard computer attached to the headphones, tracks when different sounds reach each microphone. The system then suppresses the sounds coming from outside the bubble, while playing back and slightly amplifying the sounds inside the bubble (because noise-canceling headphones physically let some sound through).

“We’d worked on a previous smart-speaker system where we spread the microphones across a table because we thought we needed significant distances between microphones to extract distance information about sounds,” Gollakota said. “But then we started questioning our assumption. Do we need a big separation to create this ‘sound bubble’? What we showed here is that we don’t. We were able to do it with just the microphones on the headphones, and in real-time, which was quite surprising.”

To train the system to create sound bubbles in different environments, researchers needed a distance-based sound dataset collected in the real world, which was not available. To gather such a dataset, they put the headphones on a mannequin head. A robotic platform rotated the head while a moving speaker played noises coming from different distances. The team collected data with the mannequin system, as well as with human users, in 22 different indoor environments, including offices and living spaces.

A man wears Sony headphones with wires and a chip visible on the outside.
The team created a prototype using off-the-shelf headphones fitted with microphones, pictured here. Photo: Chen et al./Nature Electronics

Researchers have determined that the system works for a couple of reasons. First, the wearer’s head reflects sounds, which helps the neural net distinguish sounds from various distances. Second, sounds (like human speech) have multiple frequencies, each of which goes through different phases as it travels from its source. The team’s AI algorithm, the researchers believe, is comparing the phases of each of these frequencies to determine the distance of any sound source (a person talking, for instance).
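
To make that cue concrete, here is a textbook sketch of the kind of per-frequency phase-difference feature such a network could consume. The article does not specify the team's actual network inputs, so this illustrates the idea rather than their pipeline.

```python
import numpy as np

def interchannel_phase_features(frames: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """frames: (n_mics, n_samples) time-aligned audio from the headband mics.
    Returns the phase difference between each mic and a reference mic at
    every frequency bin. Nearby sources (curved wavefronts) produce a
    different phase pattern across frequencies than distant ones (nearly
    planar wavefronts), which is the distance cue described above."""
    spectra = np.fft.rfft(frames, n=n_fft, axis=1)   # (n_mics, n_bins)
    ref = spectra[0:1]                               # reference microphone
    # phase of the cross-spectrum = per-bin phase difference vs. the reference
    return np.angle(spectra[1:] * np.conj(ref))      # (n_mics - 1, n_bins)
```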

Headphones like Apple’s AirPods Pro 2 offer some related listening features. But these features work by tracking head position and amplifying the sound coming from a specific direction, rather than gauging distance. This means the headphones can’t amplify multiple speakers at once, lose functionality if the wearer turns their head away from the target speaker, and aren’t as effective at reducing loud sounds from the speaker’s direction.

The system has been trained to work only indoors, because getting clean training audio is more difficult outdoors. Next, the team is working to make the technology function on hearing aids and noise-canceling earbuds, which requires a new strategy for positioning the microphones.

Additional co-authors are and , UW doctoral students in the Allen School; , a senior researcher at Microsoft; and , director of research at AssemblyAI. This research was funded in part by a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact soundbubble@cs.washington.edu.

AI headphones let wearer listen to a single person in a crowd, by looking at them just once (May 23, 2024)

Noise-canceling headphones have gotten very good at creating an auditory blank slate. But allowing certain sounds from a wearer’s environment through the erasure still challenges researchers. The latest edition of Apple’s AirPods Pro, for instance, automatically adjusts sound levels for wearers, sensing when they’re in conversation, but the user has little control over whom to listen to or when this happens.

A University of Washington team has developed an artificial intelligence system that lets a user wearing headphones look at a person speaking for three to five seconds to “enroll” them. The system, called “Target Speech Hearing,” then cancels all other sounds in the environment and plays just the enrolled speaker’s voice in real time, even as the listener moves around in noisy places and no longer faces the speaker.

The team presented the research May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The code is available for others to build on. The system is not commercially available.

“We tend to think of AI now as web-based chatbots that answer questions,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”

To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker’s voice then should reach the microphones on both sides of the headset simultaneously; there’s a 16-degree margin of error. The headphones send that signal to an embedded computer, where the team’s machine learning software learns the desired speaker’s vocal patterns. The system latches onto that speaker’s voice and continues to play it back to the listener, even as the pair moves around. The system’s ability to focus on the enrolled voice improves as the speaker keeps talking, giving the system more training data.
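
One plausible reading of the enrollment step, sketched below: while the wearer faces the talker, the voice reaches both ear microphones in phase, so simply summing the channels acts as a crude zero-delay beamformer that favors the gazed-at speaker, and the cleaned snippet is then converted into a voice embedding used to pull that speaker out later. The `embed_fn` component (for example, a d-vector style network) is an assumption, not the paper's named model.

```python
import numpy as np

def enroll_speaker(left: np.ndarray, right: np.ndarray, sr: int, embed_fn) -> np.ndarray:
    """Enrollment sketch. left/right: ear-microphone audio captured while the
    wearer looks at the target talker. Averaging the two channels reinforces
    sound arriving from straight ahead (a zero-delay delay-and-sum beam) and
    partially cancels off-axis voices; embed_fn maps the result to a
    fixed-length voice 'fingerprint' for later target-speaker extraction."""
    aligned = 0.5 * (left + right)   # zero-delay beam toward the gaze direction
    return embed_fn(aligned, sr)     # speaker embedding (assumed component)
```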

Related:

  • For more information, visit
  • Stories from and

The team tested its system on 21 subjects, who rated the clarity of the enrolled speaker’s voice nearly twice as high as the unfiltered audio on average.

This work builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes, such as birds or voices, that they wanted to hear and canceled other sounds in the environment.

Currently the TSH system can enroll only one speaker at a time, and it’s only able to enroll a speaker when there is not another loud voice coming from the same direction as the target speaker’s voice. If a user isn’t happy with the sound quality, they can run another enrollment on the speaker to improve the clarity.

The team is working to expand the system to earbuds and hearing aids in the future.

Additional co-authors on the paper were , and , UW doctoral students in the Allen School, and , director of research at AssemblyAI. This research was funded in part by a Moore Inventor Fellow award.

For more information, contact tsh@cs.washington.edu.

New AI noise-canceling headphone technology lets wearers pick which sounds they hear (Nov. 9, 2023)
A man wearing a surgical mask and headphones walks through the University of Washington campus while holding a smartphone. People walk behind him.
A team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. Pictured is co-author Malek Itani demonstrating the system. Photo: University of Washington

Most anyone who’s used noise-canceling headphones knows that hearing the right noise at the right time can be vital. Someone might want to erase car horns when working indoors, but not when walking along busy streets. Yet people can’t choose what sounds their headphones cancel.

Now, a team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. The team is calling the system “semantic hearing.” Headphones stream captured audio to a connected smartphone, which cancels all environmental sounds. Either through voice commands or a smartphone app, headphone wearers can select which sounds they want to include from 20 classes, such as sirens, baby cries, speech, vacuum cleaners and bird chirps. Only the selected sounds will be played through the headphones.

The team presented the research Nov. 1 in San Francisco. In the future, the researchers plan to release a commercial version of the system.

“Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise-canceling headphones haven’t achieved,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second.”

Because of this time crunch, the semantic hearing system must process sounds on a device such as a connected smartphone, instead of on more robust cloud servers. Additionally, because sounds from different directions arrive in people’s ears at different times, the system must preserve these delays and other spatial cues so people can still meaningfully perceive sounds in their environment.
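
One sketch of how those spatial cues can survive filtering: estimate a single time-frequency mask from the mixture, but apply it to the left and right channels separately, so interaural time and level differences pass through unchanged. The `mask_fn` interface and frame size are assumptions; the article doesn't describe the team's network at this level.

```python
import numpy as np

def apply_mask_binaural(left: np.ndarray, right: np.ndarray, mask_fn,
                        frame: int = 256):
    """left/right: ear channels whose length is a multiple of `frame`.
    mask_fn: stand-in for the neural network; takes per-frame magnitude
    spectra and returns a 0-1 mask selecting the wanted sound classes."""
    L = np.fft.rfft(left.reshape(-1, frame), axis=1)
    R = np.fft.rfft(right.reshape(-1, frame), axis=1)
    mask = mask_fn(0.5 * (np.abs(L) + np.abs(R)))   # one mask for both ears
    out_left = np.fft.irfft(mask * L, n=frame, axis=1).ravel()
    out_right = np.fft.irfft(mask * R, n=frame, axis=1).ravel()
    return out_left, out_right                      # spatial cues preserved
```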

Related:

  • For more information, see
  • Story from

Tested in environments such as offices, streets and parks, the system was able to extract sirens, bird chirps, alarms and other target sounds, while removing all other real-world noise. When 22 participants rated the system’s audio output for the target sound, they said that on average the quality improved compared to the original recording.

In some cases, the system struggled to distinguish between sounds that share many properties, such as vocal music and human speech. The researchers note that training the models on more real-world data might improve these outcomes.

Additional co-authors on the paper were and , both UW doctoral students in the Allen School; , who completed this research as a doctoral student in the Allen School and is now at Carnegie Mellon University; and , director of research at AssemblyAI.

For more information, contact semantichearing@cs.washington.edu.

UW team’s shape-changing smart speaker lets users mute different areas of a room (Sept. 21, 2023)
Four people have separate conversations in a meeting room.
A team led by researchers at the University of Washington has developed a shape-changing smart speaker, which uses self-deploying microphones to divide rooms into speech zones and track the positions of individual speakers. Here UW doctoral students Tuochao Chen (foreground), Mengyi Shan, Malek Itani, and Bandhav Veluri, all in the Paul G. Allen School of Computer Science & Engineering, demonstrate the system in a meeting room. Photo: April Hong/University of Washington

In virtual meetings, it’s easy to keep people from talking over each other. Someone just hits mute. But for the most part, this ability doesn’t translate easily to recording in-person gatherings. In a bustling cafe, there are no buttons to silence the table beside you.

The ability to locate and control sound, isolating one person talking from a specific location in a crowded room, for instance, has long challenged researchers, especially without visual cues from cameras.

A team led by researchers at the University of Washington has developed a shape-changing smart speaker, which uses self-deploying microphones to divide rooms into speech zones and track the positions of individual speakers. With the help of the team’s deep-learning algorithms, the system lets users mute certain areas or separate simultaneous conversations, even if two adjacent people have similar voices. Like a fleet of Roombas, each about an inch in diameter, the microphones automatically deploy from, and then return to, a charging station. This allows the system to be moved between environments and set up automatically. In a conference room meeting, for instance, such a system might be deployed instead of a central microphone, allowing better control of in-room audio.

The team published the research Sept. 21 in Nature Communications.

“If I close my eyes and there are 10 people talking in a room, I have no idea who’s saying what and where they are in the room exactly. That’s extremely hard for the human brain to process. Until now, it’s also been difficult for technology,” said co-lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “For the first time, using what we’re calling a robotic ‘acoustic swarm,’ we’re able to track the positions of multiple people talking in a room and separate their speech.”

Previous research on robot swarms has required using overhead or on-device cameras, projectors or special surfaces. The UW team’s system is the first to accurately distribute a robot swarm using only sound.

The team’s prototype consists of seven small robots that spread themselves across tables of various sizes. As they move from their charger, each robot emits a high-frequency sound, like a bat navigating, using this frequency and other sensors to avoid obstacles and move around without falling off the table. The automatic deployment allows the robots to place themselves for maximum accuracy, permitting greater sound control than if a person set them. The robots disperse as far from each other as possible, since greater distances make differentiating and locating people speaking easier. Today’s consumer smart speakers have multiple microphones, but clustered on the same device, they’re too close to allow for this system’s mute and active zones.

A small robot sits on a table beside a coffee cup.
The tiny individual microphones are able to navigate around clutter and place themselves with only sound. Photo: April Hong/天美影视传媒

“If I have one microphone a foot away from me, and another microphone two feet away, my voice will arrive at the microphone that’s a foot away first. If someone else is closer to the microphone that’s two feet away, their voice will arrive there first,” said co-lead author , a UW doctoral student in the Allen School. “We developed neural networks that use these time-delayed signals to separate what each person is saying and track their positions in a space. So you can have four people having two conversations and isolate any of the four voices and locate each of the voices in a room.”
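
The arrival-order cue in that quote is the classic time difference of arrival (TDOA). Below is the textbook cross-correlation estimator, shown for illustration; per the quote, the team's neural networks learn from these delays rather than computing positions this way.

```python
import numpy as np

def tdoa_seconds(mic_a: np.ndarray, mic_b: np.ndarray, sr: int) -> float:
    """Estimate how much later a sound reaches mic A than mic B (in seconds)
    as the cross-correlation lag that best aligns the two recordings."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    lag = int(np.argmax(corr)) - (len(mic_b) - 1)  # positive: A hears it later
    return lag / sr

# Each microphone pair's delay constrains the talker to a hyperbola between
# the two robots; with seven dispersed mics, intersecting those constraints
# pins down who is speaking where.
```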

The team tested the robots in offices, living rooms and kitchens with groups of three to five people speaking. Across all these environments, the system could discern different voices within 1.6 feet (50 centimeters) of each other 90% of the time, without prior information about the number of speakers. The system was able to process three seconds of audio in 1.82 seconds on average, fast enough for live streaming, though a bit too long for real-time communications such as video calls.

As the technology progresses, researchers say, acoustic swarms might be deployed in smart homes to better differentiate people talking with smart speakers. That could potentially allow only people sitting on a couch, in an “active zone,” to vocally control a TV, for example.

The seven robotic microphones sit in their charging station
To charge, the microphones automatically return to their charging station. Photo: April Hong/天美影视传媒

Researchers plan to eventually make microphone robots that can move around rooms, instead of being limited to tables. The team is also investigating whether the speakers can emit sounds that allow for real-world mute and active zones, so people in different parts of a room can hear different audio. The current study is another step toward science fiction technologies, such as the “cone of silence” in “Get Smart” and “Dune,” the authors write.

For more information see .

Of course, any technology that evokes comparison to fictional spy tools will raise questions of privacy. Researchers acknowledge the potential for misuse, so they have included guards against this: The microphones navigate with sound, not an onboard camera like other similar systems. The robots are easily visible and their lights blink when they’re active. Instead of processing the audio in the cloud, as most smart speakers do, the acoustic swarms process all the audio locally, as a privacy constraint. And even though some people’s first thoughts may be about surveillance, the system can be used for the opposite, the team says.

“It has the potential to actually benefit privacy, beyond what current smart speakers allow,” Itani said. “I can say, ‘Don’t record anything around my desk,’ and our system will create a bubble 3 feet around me. Nothing in this bubble would be recorded. Or if two groups are speaking beside each other and one group is having a private conversation, while the other group is recording, one conversation can be in a mute zone, and it will remain private.”

, formerly a principal research manager at Microsoft, is a co-author on this paper, and , a professor in the Allen School, is a senior author. The research was funded by a Moore Inventor Fellow award.

For more information, contact acousticswarm@cs.washington.edu.

With a new app, smart devices can have GPS underwater (July 24, 2023)
A diver uses underwater GPS on a smartwatch.
A team at the University of Washington has developed the first underwater 3D-positioning app for smart devices, such as the smartwatch pictured here. Photo: University of Washington

Even for scuba and snorkeling enthusiasts, the plunge into open water can be disorienting. Divers frequently swim with limited visibility, which can become a safety hazard for teams trying to find each other in an emergency. Yet even though many dive with smartwatches designed to go to depths of over 100 feet, accurately locating mobile devices underwater has confounded researchers.

Now, a team at the University of Washington has developed the first underwater 3D-positioning app for smart devices. When at least three divers are within about 98 feet (30 meters) of each other, their devices’ existing speakers and microphones contact each other, and the app tracks each user’s location relative to the leader. This range can extend with more divers, if each is within 98 feet of another diver. The team will present the research in September in New York City.

“Mobile devices today can work nearly anywhere on Earth. You can be in a forest or on a plane and still get internet connectivity,” said lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “But the one place where we still hadn’t made mobile devices work was underwater. It’s kind of the final frontier.”

Above water, GPS relies on a vast satellite network to locate mobile devices with radio signals. Underwater, these signals quickly fade. Sound, though, travels faster and farther in water than it does in air. Previous underwater positioning systems have relied on strategically placed buoys, but these systems are expensive and cumbersome to deploy, leading many divers to do without.

A smartwatch running the underwater GPS app.
The underwater GPS app runs on a smartwatch. Photo: University of Washington

The UW team found that such buoys aren’t necessary. With the app, if the dive leader has at least one other diver visible, the group’s devices can send acoustic signals to each other through their microphones and speakers and use the timestamps to estimate each diver’s distance. Based on these distances, the app can estimate the group’s formation and each diver’s location. If a device also tracks depth, as sport monitors like the Apple Watch Ultra or the Garmin Descent do, the system can locate divers in 3D.
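
The timestamp arithmetic can be illustrated with a textbook two-way acoustic ranging exchange; the app's actual message protocol isn't given in the article, so the reply-delay bookkeeping below is an assumption.

```python
SOUND_SPEED_WATER = 1500.0  # m/s, a typical value; varies with temperature,
                            # salinity and depth

def estimate_distance(t_sent: float, t_replied: float, reply_delay: float) -> float:
    """Device A pings at t_sent; device B answers after a known processing
    delay; A hears the reply at t_replied. Half the acoustic round trip
    times the speed of sound in water gives the separation in meters."""
    round_trip = (t_replied - t_sent) - reply_delay
    return 0.5 * round_trip * SOUND_SPEED_WATER

# A 40 ms acoustic round trip corresponds to two divers about 30 m apart:
print(estimate_distance(0.0, 0.050, 0.010))  # 30.0
```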

The app needs at least three devices in its network to function, and its accuracy improves as more devices are added. When tested with four to five devices in local lakes and a pool, the app estimated locations with an average error of about 5 feet (1.6 meters), close enough for divers to see each other in most environments. To get actual GPS coordinates, instead of tracking locations relative to the dive leader, the leader needs to be wirelessly connected to a surface device on a boat with GPS capabilities.

For more information and to see the app’s open-source code, visit the .

The study builds on AquaApp, a previous system from the team that allows divers to send messages to each other underwater.

“This and AquaApp can be used together,” said author , a UW doctoral student in the Allen School. “For example, if the dive leader finds someone going the wrong way, the leader can send an alert: ‘Hey, you’re going out of range. You need to come back.’ Or if a diver is running out of gas, an SOS can let the team find the person quickly even in murky water.”

, a professor in the Allen School, is a senior author on this paper. This research was funded by grants from the Gordon and Betty Moore Foundation and National Science Foundation.

For more information, contact underwaterGPS@cs.washington.edu.

How low-cost earbuds can make newborn hearing screening accessible (Oct. 31, 2022)
A team led by researchers at the University of Washington has created a new hearing screening system that uses a smartphone and earbuds. Now the team is working with collaborators to use this tool as part of a hearing screening project in Kenya. Here, lead researcher Justin Chan, a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering, uses the device to test a child's hearing. Photo: Dr. Nada Ali/University of Washington

Newborns across the United States are screened for hearing loss. This test is important because it helps families better understand their child's health, but it's often not accessible to children in other countries because the screening device is expensive.

A team led by researchers at the University of Washington has created a new hearing screening system that uses a smartphone and low-cost earbuds instead. The team tested this device with 114 patients, including 52 babies up to 6 months old. The researchers also tested the device on pediatric patients with known hearing loss. Their tool performed as well as the commercial device, and it correctly identified all patients with hearing loss.

The team published the research Oct. 31 in Nature Biomedical Engineering.

“There is a huge amount of health inequity in the world. I grew up in a country where there was no hearing screening available, in part because the screening device itself is pretty expensive,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “The project here is to leverage the ubiquity of mobile devices people across the world already have (smartphones and $2 to $3 earbuds) to make newborn hearing screening something that's accessible to all without sacrificing quality.”

The earbuds are connected to a microphone in a probe (shown here in blue) that can be placed in the patient’s ear. Photo: Raymond Smith/天美影视传媒

Because babies can’t tell doctors whether they can hear a given sound, these tests rely on the mechanics of the ear.

“When an external sound is played, hair cells in the inner ear move and vibrate. The result is a very quiet sound that our instruments can pick up,” said co-author , an associate professor of otolaryngology-head and neck surgery at the UW School of Medicine who practices at Seattle Children's. “This screening is very sensitive, meaning that if there is a concern about a patient's hearing, they will be referred for a more thorough evaluation with a specialist.”

For the test, doctors send two different tones into the ear at the same time. Based on those tones, the hair cells in the ear vibrate and create a third tone, which is what the doctors are listening for.

One reason the commercial device is expensive is that its speaker has been designed to play the two tones without any interference. The UW researchers found that they could use affordable earbuds instead, where each earbud plays one of the two tones. The earbuds are connected to a microphone in a probe that can be placed in the patient's ear. The microphone records any sounds from the ear and sends them to a smartphone for processing.
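
In standard distortion-product otoacoustic emission (DPOAE) screening, the ear's reply to two probe tones f1 and f2 appears at the frequency 2·f1 - f2. Here is a sketch of the detection step on the phone; the probe frequencies and the simple FFT peak-reading are textbook choices, not the team's noise-robust algorithm.

```python
import numpy as np

SR = 48000            # microphone sample rate (Hz)
F2 = 4000.0           # higher probe tone (Hz); textbook screening values
F1 = F2 / 1.2         # an f2/f1 ratio of 1.2 is the conventional choice
DP = 2 * F1 - F2      # frequency of the ear's own distortion product

def dp_level_db(mic: np.ndarray) -> float:
    """Level (dB, uncalibrated) at the distortion-product frequency in the
    probe-microphone recording; a strong peak suggests healthy hair cells."""
    windowed = mic * np.hanning(len(mic))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(mic), 1.0 / SR)
    dp_bin = int(np.argmin(np.abs(freqs - DP)))
    return 20.0 * np.log10(spectrum[dp_bin] + 1e-12)
```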

The earbuds are connected as shown here to the probe. Photo: Raymond Smith/University of Washington

“As you can imagine, these sounds that are coming out from the ear are very soft, and sometimes it's hard to hear them over noise in the environment or if the patient is moving their head,” said lead author Justin Chan, a UW doctoral student in the Allen School. “We designed algorithms on the phone that help us detect the signal even with all that background noise. These algorithms can run in real time on any smartphone and do not require the latest smartphone models.”

The researchers tested their device at three hearing clinics in the Puget Sound area of Washington state. Each test screened four different frequencies, which is typical for these types of hearing screenings. Participants ranged in age from a few weeks to 20 years old.

Now the team is working with collaborators to use this tool as part of a newborn hearing screening project in Kenya. The researchers teamed up with a group from the UW global health department, the University of Nairobi and the Kenya Ministry of Health to create the project “Toward Universal Newborn and Early Childhood Hearing Screening in Kenya.”

“Right now, this is a prototype that we created. The next challenge is really scaling this up and then working with local experts in each country who are the most familiar with the particular challenges in each situation,” Chan said. “We have an opportunity to really have an impact on global health, especially for newborn hearing. I think it’s pretty gratifying to know that the research we do can help to directly solve real problems.”

A child in Kenya has their hearing tested by lead researcher Justin Chan. Photo: Dr. Nada Ali/University of Washington

Additional co-authors on this paper are , a resident in otolaryngology-head and neck surgery at the UW School of Medicine; , who worked on this project as a UW doctoral student in the electrical and computer engineering department; , a clinical research coordinator at Seattle Children's; , a UW affiliate instructor in speech and hearing sciences; and , an associate professor of pediatrics in the UW School of Medicine who practices at Seattle Children's. This research was funded by the National Institute on Deafness and Other Communication Disorders, the Washington Research Foundation, the Seattle Children's Research Institute, the Seattle Children's Research Integration Hub, the Pilot Awards Support Fund Program, a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact tune@cs.washington.edu.

Grant numbers: T32DC000018, 10617
