Q&A: How to train AI when you don’t have enough data

Thu, 28 Mar 2024

Artificial intelligence excels at sorting through information and detecting patterns or trends. But these machine learning algorithms need to be trained with large amounts of data first.

As researchers explore potential applications for AI, they have found scenarios where AI could be really useful — such as analyzing X-ray images for evidence of rare conditions — but where there’s not enough data to accurately train the algorithms.

Jenq-Neng Hwang. Photo: University of Washington

Jenq-Neng Hwang, a University of Washington professor of electrical and computer engineering, specializes in these issues. For example, Hwang and his team developed a method that teaches AI to monitor how many distinct poses a baby can achieve throughout the day. There are limited training datasets of babies, which meant the researchers had to create a unique pipeline to make their algorithm accurate and useful. The team presented the research at the IEEE/CVF Winter Conference on Applications of Computer Vision 2024.

UW News spoke with Hwang about the project details and other similarly challenging areas the team is addressing.

Why is it important to develop an algorithm to track baby poses?

Jenq-Neng Hwang: We started a collaboration with the UW School of Medicine. The goal of the project was to help families with a history of autism learn whether their babies were also likely to have autism. Babies younger than 9 months don’t really have language skills yet, so it’s difficult to tell whether they’re autistic. Researchers developed a test that scores various poses babies can do: If a baby can do this pose, they get two points; if they can do that one, they get three points; and so on. Then you add up all the points, and if the baby is above a threshold, they likely don’t have autism.
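The scoring Hwang describes is a simple additive checklist. As a rough illustration — the pose names, point values and cutoff below are invented placeholders, not the actual screening instrument:

```python
# Hypothetical sketch of a pose-based screening score.
# Pose names, point values and the threshold are illustrative only.
POSE_POINTS = {
    "rolls_over": 2,
    "sits_unsupported": 3,
    "pulls_to_stand": 3,
}
THRESHOLD = 5  # illustrative cutoff

def screening_score(observed_poses):
    """Sum the points for every pose the baby was observed doing."""
    return sum(POSE_POINTS.get(p, 0) for p in observed_poses)

def likely_typical(observed_poses, threshold=THRESHOLD):
    """Scores at or above the threshold suggest typical development."""
    return screening_score(observed_poses) >= threshold

print(screening_score(["rolls_over", "sits_unsupported"]))  # 5
```

The point of automating observation is simply to fill in this checklist from continuous video rather than from a few hours of in-person watching.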

But to do this test, you need a doctor to observe all the different poses. It becomes a very tedious process because sometimes after three or four hours, we still haven’t seen a baby do a specific pose. Maybe the baby could do it, but at that moment they didn’t want to. One solution could be to use AI. Parents often have a baby monitor at home. The baby monitor could use AI to continuously and consistently track the various poses a baby does in a day.

Why is AI a good fit for this task?

JNH: My background is in traditional image processing and computer vision. We were trying to teach computers to figure out human poses from photos or videos, but the trouble is that there are so many variations. For example, even when it’s the same person wearing different outfits, it’s challenging for traditional image processing to correctly identify that person’s elbow in each photo.

But AI makes it so much easier. These models can learn. For example, you could train a machine learning model with a variety of motion-capture sequences showing all different kinds of people, annotated with the corresponding 3D poses. Then this model could learn to output a 3D model of a person’s pose for a sequence it has never seen before.
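The training setup Hwang describes (annotated sequences in, 3D poses out) can be sketched in miniature. This is a toy stand-in: a least-squares linear “lifter” on synthetic data, where a real system would use a deep network trained on motion-capture corpora.

```python
import numpy as np

# Toy supervised pose "lifting": learn a mapping from 2D keypoints to
# 3D joint positions using annotated sequences. All data is synthetic.
rng = np.random.default_rng(0)
n_frames, n_joints = 500, 17

poses_3d = rng.normal(size=(n_frames, n_joints * 3))        # mocap annotations
projection = rng.normal(size=(n_joints * 3, n_joints * 2))  # stand-in camera model
poses_2d = poses_3d @ projection                            # "observed" 2D keypoints

# Fit the lifter on the annotated training sequences.
W, *_ = np.linalg.lstsq(poses_2d, poses_3d, rcond=None)

# Apply it to a frame the model has never seen.
unseen_3d = rng.normal(size=(1, n_joints * 3))
pred_3d = (unseen_3d @ projection) @ W
print(pred_3d.shape)  # (1, 51)
```

A deep model replaces the single matrix `W` with many learned layers, but the supervision signal — pairs of observations and 3D annotations — is the same.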

But in this case, there aren’t a lot of motion captured sequences of babies that also have 3D pose annotations that you could use to train your machine learning model. What did you do instead?

JNH: We don’t have a lot of 3D pose annotations of baby videos to train the machine learning model for privacy reasons. It’s also difficult to create a dataset where a baby is performing all the possible potential poses that we would need. Our datasets are too small, meaning that a model trained with them would not estimate reliable poses.


But we do have a lot of annotated 3D motion sequences of people in general. So, we developed this pipeline.

First we used the large set of 3D motion sequences of regular people to train a generic 3D pose model, much as ChatGPT and other GPT-style large language models are first pretrained on huge general-purpose datasets.

We then fine-tuned our generic model with our very limited dataset of annotated baby motion sequences. The generic model can then adapt to the small dataset and produce high-quality results.
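The two-stage pipeline can be illustrated with a toy model: pretrain on plentiful generic data, then fine-tune briefly, at a lower learning rate, on a tiny domain-specific set. Everything below is synthetic and illustrative; the team’s actual models are deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_fit(w, X, y, lr, steps):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

d = 8
true_generic = rng.normal(size=d)                     # "adult pose" regularities
true_baby = true_generic + 0.1 * rng.normal(size=d)   # babies differ, but not by much

X_big, X_small = rng.normal(size=(2000, d)), rng.normal(size=(20, d))
y_big, y_small = X_big @ true_generic, X_small @ true_baby

# Stage 1: pretrain on the large generic dataset.
w = sgd_fit(np.zeros(d), X_big, y_big, lr=0.1, steps=500)
# Stage 2: fine-tune on the tiny "baby" dataset at a lower learning rate.
w = sgd_fit(w, X_small, y_small, lr=0.01, steps=200)

# Training on the tiny dataset alone, from scratch, typically does worse.
scratch = sgd_fit(np.zeros(d), X_small, y_small, lr=0.01, steps=200)
err_ft = np.linalg.norm(w - true_baby)
err_scratch = np.linalg.norm(scratch - true_baby)
print(err_ft < err_scratch)
```

The pretrained start already encodes most of what the small dataset would have to teach, so the limited baby data only needs to supply the domain-specific correction.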

Shown on the left is an image of a baby with a 2D “stick figure” created using a set of detected keypoints. On the right is the stick figure model of the baby’s 3D pose. The red stick figure shows the “ground truth” and the blue stick figure is the 3D pose estimated with the researchers’ algorithm.
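Agreement between the estimated and ground-truth stick figures, as in the figure, is commonly quantified with mean per-joint position error (MPJPE): the average Euclidean distance between corresponding joints. A minimal sketch with made-up poses:

```python
import numpy as np

def mpjpe(pred, truth):
    """Mean per-joint position error between two (J, 3) joint arrays."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=1)))

truth = np.zeros((17, 3))                    # "red" ground-truth stick figure
pred = truth + np.array([0.03, 0.0, 0.04])   # "blue" estimate, offset uniformly
print(round(mpjpe(pred, truth), 3))  # 0.05
```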

Are there other tasks like this: good for AI, but there’s not a lot of data to train an algorithm?

JNH: There are many types of scenarios where we don’t have enough information to train the model. One example is a rare disease that is diagnosed by X-rays. The disease is so rare that we don’t have enough X-ray images from patients with the disease to train a model. But we do have a lot of X-rays from healthy patients. So, we can use generative AI again to generate the corresponding synthetic X-ray image without disease, which can then be compared with the diseased image to identify disease regions for further diagnosis.
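One common way to realize this idea is reconstruction-based anomaly detection: build a model of healthy appearance only, reconstruct a “healthy version” of a new scan, and flag the regions where the scan differs most from its reconstruction. The sketch below uses a PCA subspace and tiny synthetic arrays as a stand-in for the generative model and real X-rays.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, n_healthy = 8, 8, 200

# A model of *healthy* appearance: principal components of healthy scans.
healthy = rng.normal(size=(n_healthy, h * w))
mean = healthy.mean(axis=0)
_, _, Vt = np.linalg.svd(healthy - mean, full_matrices=False)
basis = Vt[:20]  # keep the top healthy modes

def healthy_reconstruction(img):
    """Project a flattened image onto the healthy subspace."""
    coeffs = (img - mean) @ basis.T
    return mean + coeffs @ basis

# A "diseased" scan: a healthy image plus a bright local lesion.
scan = healthy[0].copy()
scan[27] += 25.0
# Large residuals mark regions the healthy model cannot explain.
anomaly_map = np.abs(scan - healthy_reconstruction(scan)).reshape(h, w)
print(np.unravel_index(anomaly_map.argmax(), (h, w)))
```

A generative model plays the same role as the PCA subspace here, just with a far richer notion of what “healthy” looks like.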

Autonomous driving is another example. There are so many real events you cannot create. For example, say you are in the middle of driving and a few leaves blow in front of the car. If you use autonomous driving, the car might think something is wrong and slam on the brakes, because the car has never seen this scenario before. This could result in an accident.

We call these “long-tail” events, which means that they are unlikely to happen. But in daily life we always see random things like this. Until we figure out how to train autonomous driving systems to handle these types of events, autonomous driving cannot be useful. Our team is working on this problem by combining data from a regular camera with radar information. The camera and radar persistently check each other’s decisions, which can help a machine learning algorithm make sense of what’s happening.
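The cross-checking Hwang describes can be sketched as a simple agreement rule between sensors. The confidences, threshold and three-way outcome below are illustrative, not the team’s actual fusion algorithm.

```python
def fuse(camera_conf, radar_conf, threshold=0.5):
    """Treat something as a braking-worthy obstacle only when both agree.

    A camera alone can be fooled by visual clutter (blowing leaves);
    radar alone can be fooled by reflective clutter. Requiring agreement
    suppresses single-sensor false alarms on long-tail events.
    """
    cam = camera_conf >= threshold
    rad = radar_conf >= threshold
    if cam and rad:
        return "obstacle"   # both agree: brake
    if cam or rad:
        return "uncertain"  # disagreement: slow down, keep checking
    return "clear"

print(fuse(0.9, 0.8))  # a real car: both sensors fire
print(fuse(0.9, 0.1))  # leaves: camera fires, radar sees nothing solid
```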

Additional co-authors on the baby poses paper are a UW research assistant in the electrical and computer engineering department; UW doctoral students in the electrical and computer engineering department; a UW master’s student studying electrical and computer engineering; and a doctoral fellow at the University of Copenhagen. This research was funded by the Electronics and Telecommunications Research Institute of Korea, the National Oceanic and Atmospheric Administration and Cisco Research.

For more information, contact Hwang at hwang@uw.edu.

Moving cameras talk to each other to identify, track pedestrians

Wed, 12 Nov 2014

It’s not uncommon to see cameras mounted on store ceilings, propped up in public places or placed inside subways, buses and even on the dashboards of cars.

Cameras record our world down to the second. This can be a powerful surveillance tool on the roads and in buildings, but it’s surprisingly hard to sift through vast amounts of visual data to find pertinent information – namely, making a split-second identification and understanding a person’s actions and behaviors as recorded sequentially by cameras in a variety of locations.

Frames from a moving camera recorded by the Swiss Federal Institute of Technology in Zurich, Switzerland, show how UW technology distinguishes among people by giving each person a unique color and number, then tracking them as they walk. Photo: Swiss Federal Institute of Technology

Now, University of Washington electrical engineers have developed a way to automatically track people across moving and still cameras by using an algorithm that trains the networked cameras to learn one another’s differences. The cameras first identify a person in a video frame, then follow that same person across multiple camera views.

“Tracking humans automatically across cameras in a three-dimensional space is new,” said lead researcher Jenq-Neng Hwang, a UW professor of electrical engineering. “As the cameras talk to each other, we are able to describe the real world in a more dynamic sense.”

Hwang and his research team presented the work last month in Qingdao, China, at a conference sponsored by the Institute of Electrical and Electronics Engineers, or IEEE.

Imagine a typical GPS display that maps the streets, buildings and signs in a neighborhood as your car moves forward, then add humans to the picture. With the new technology, a car with a mounted camera could take video of the scene, then identify and track humans and overlay them into the virtual 3-D map on your GPS screen. The UW researchers are developing this to work in real time, which could help pick out people crossing in busy intersections, or track a specific person who is dodging the police.

“Our idea is to enable the dynamic visualization of the realistic situation of humans walking on the road and sidewalks, so eventually people can see the animated version of the real-time dynamics of city streets on a platform like Google Earth,” Hwang said.

Hwang’s research team has worked over the past decade on ways for video cameras – from the most basic models to high-end devices – to talk to each other as they record different places in a common location. The problem with tracking a human across cameras with non-overlapping fields of view is that a person’s appearance can vary dramatically in each video because of different perspectives, angles and color hues produced by different cameras.

The researchers overcame this by building a link between the cameras. Cameras first record for a couple of minutes to gather training data, systematically calculating the differences in color, texture and angle between a pair of cameras for a number of people who walk into the frames, all without human intervention.

After this calibration period, an algorithm automatically applies those differences between cameras and can pick out the same people across multiple frames, effectively tracking them without needing to see their faces.
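The calibration-then-matching procedure might look like the following sketch, which reduces “appearance” to mean clothing color and the camera-pair difference to a per-channel gain and offset — a simplification of the color, texture and angle cues the researchers actually model.

```python
import numpy as np

rng = np.random.default_rng(0)

# During calibration, the same people are seen by both cameras, but
# camera B renders colors differently (simulated here as gain + offset).
gain, offset = np.array([1.2, 0.9, 1.1]), np.array([10.0, -5.0, 3.0])
people_a = rng.uniform(50, 200, size=(30, 3))  # mean RGB in camera A
people_b = people_a * gain + offset            # same people in camera B

# Calibration: least-squares fit of gain/offset for each color channel.
fit_gain, fit_off = [], []
for c in range(3):
    A = np.stack([people_a[:, c], np.ones(30)], axis=1)
    (g, o), *_ = np.linalg.lstsq(A, people_b[:, c], rcond=None)
    fit_gain.append(g); fit_off.append(o)
fit_gain, fit_off = np.array(fit_gain), np.array(fit_off)

def reidentify(query_a, gallery_b):
    """Map a camera-A appearance into camera B, return the best match."""
    mapped = query_a * fit_gain + fit_off
    return int(np.argmin(np.linalg.norm(gallery_b - mapped, axis=1)))

print(reidentify(people_a[7], people_b))  # 7
```

Once the pairwise mapping is learned, matching needs no faces: a person is followed by how their appearance, translated between cameras, lines up.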

The tracking system first systematically picks out people in a camera frame, then follows each person based on his or her clothing texture, color and body movement. Photo: Swiss Federal Institute of Technology

The research team has tested the ability of static and moving cameras to detect and track pedestrians on the UW campus in multiple scenarios. In one experiment, graduate students mounted cameras in their cars to gather data, then applied the algorithms to successfully pick out humans and follow them in a three-dimensional space.

They also installed the tracking system on cameras placed inside a robot and a flying drone, allowing the robot and drone to follow a person, even when the instruments came across obstacles that blocked the person from view.

A robot equipped with a camera follows a researcher by tracking him as he walks. Photo: University of Washington

The linking technology can be used anywhere, as long as the cameras can talk over a wireless network and upload data to the cloud.

This detailed visual record could be useful for security and surveillance, monitoring for unusual behavior or tracking a moving suspect. But it could also give store owners and business proprietors useful information and statistics about consumers’ movement patterns. A store owner could, for example, use a tracking system to watch a shopper’s movements in the store, taking note of her interests. Then, a coupon or deal for a particular product could be displayed on a nearby screen or pushed to the shopper’s phone – in an instant.

Leveraging the visual data produced by our physical actions and movements might, in fact, become the next way in which we receive marketing, advertisements and even helpful tools for our everyday lives.

Inevitably, people will have privacy concerns, Hwang said, and the information extracted from cameras could be encrypted before it’s sent to the cloud.
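As one illustration of protecting extracted information before upload, track identifiers can be pseudonymized on-device with a keyed hash, so the cloud sees stable but unlinkable IDs. This is a hypothetical measure in the spirit of Hwang’s remark, not the system’s actual scheme (which he describes as encryption); the key name is invented.

```python
import hashlib
import hmac

SECRET_KEY = b"per-deployment-secret"  # hypothetical on-device key

def pseudonymize(track_id: str) -> str:
    """Keyed hash: same input gives the same tag, but the tag cannot be
    reversed to the original ID without the device's secret key."""
    return hmac.new(SECRET_KEY, track_id.encode(), hashlib.sha256).hexdigest()[:16]

# Only the pseudonymized track ID and coordinates would leave the camera.
record = {"track": pseudonymize("person-042"), "x": 12.4, "y": 3.1}
print(len(record["track"]))  # 16
```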

“Cameras and recording won’t go away. We might as well take advantage of that fact and extract more useful information for the benefit of the community,” he added.

Co-authors are Kuan-Hui Lee, a UW doctoral student in electrical engineering, and Greg Okopal and James Pitton, engineers at the UW Applied Physics Laboratory.

The research was funded by the Electronics and Telecommunications Research Institute of Korea and the Applied Physics Laboratory.

###

For more information, contact Hwang at hwang@uw.edu or 206-685-1603.
