A team of researchers at the University of Washington is developing a shape-changing smart speaker that uses a robotic acoustic swarm to mute certain areas of a room. While robotic swarms have been used before, the UW team is the first to deploy the small robots using only sound and the first to apply the approach to self-deploying microphones.
The robotic acoustic swarm lets the smart speaker locate, control, and isolate sound without visual cues from cameras. As a result, rooms can be divided into speech zones, since the device can track the positions of users or sound sources. The system also offers finer control of in-room audio without a central microphone or boundary microphones.
“If I close my eyes and there are 10 people talking in a room, I have no idea who’s saying what and where they are in the room exactly. That’s extremely hard for the human brain to process. Until now, it’s also been difficult for technology… For the first time, using what we’re calling a robotic ‘acoustic swarm’, we’re able to track the positions of multiple people talking in a room and separate their speech,” says co-lead author Malek Itani.
Thanks to the robotic acoustic swarm and deep-learning algorithms, the microphones also behave much like robot vacuums: they can automatically leave their charging stations, deploy themselves, and return when their batteries run low.
The prototype uses seven small robots. Each robot emits a high-frequency sound and uses that signal, along with its other sensors, to avoid obstacles and move around. Combined with the acoustic swarm process, the self-deploying microphones position themselves near the desired sound sources while keeping their distance from one another, maximizing sound pickup and avoiding audio clutter.
“If I have one microphone a foot away from me, and another microphone two feet away, my voice will arrive at the microphone that’s a foot away first. If someone else is closer to the microphone that’s two feet away, their voice will arrive there first,” says co-lead author Tuochao Chen. To help isolate voices and track positions, the team developed neural networks that make use of these time-delayed signals.
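The arrival-time difference Chen describes is the classic time-difference-of-arrival (TDOA) cue. The team's actual neural networks are not public here, but as a minimal illustrative sketch (not the UW implementation), the delay between two microphone recordings can be estimated with a simple cross-correlation:

```python
import numpy as np

def estimate_delay(mic_a, mic_b, sample_rate):
    """Estimate the time-difference-of-arrival (seconds) between two
    microphone signals via cross-correlation. A positive result means
    the sound reached mic_a after it reached mic_b."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    lag = np.argmax(corr) - (len(mic_b) - 1)  # peak offset in samples
    return lag / sample_rate

# Toy example: the same pulse reaches mic_b five samples later than mic_a,
# as if the talker were slightly closer to mic_a.
rate = 16000
pulse = np.hanning(32)
mic_a = np.zeros(256); mic_a[100:132] = pulse
mic_b = np.zeros(256); mic_b[105:137] = pulse
delay = estimate_delay(mic_b, mic_a, rate)  # 5 samples ≈ 0.3 ms
```

With several microphones spread around a room, a set of such pairwise delays constrains where a talker can be, which is what makes the swarm's self-chosen spacing useful.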
As of writing, the team’s tests and experiments have shown promising results. With groups of three to five people in a room, the system could accurately discern different voices even when the speakers were within 1.6 feet of each other. It also processed three seconds of audio in 1.82 seconds on average. While that performance is adequate for live-streaming applications, further improvements are needed before the system is ready for video calls.
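The reported figures imply the system runs faster than real time but still adds noticeable delay per chunk, which is why streaming is fine while conversational video calls are not. A quick back-of-the-envelope check:

```python
# Real-time factor: processing time divided by audio duration.
# A value below 1.0 means the system keeps up with incoming audio,
# but the ~1.8 s per-chunk latency is still too long for live conversation.
audio_seconds = 3.0
processing_seconds = 1.82  # average reported by the team
rtf = processing_seconds / audio_seconds
print(f"real-time factor: {rtf:.2f}")  # ~0.61, i.e. faster than real time
```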
Eventually, the team plans to develop microphone robots that can move around rooms on their own, as well as speakers that can emit sound to create real-world mute and active zones.