At Columbia University, a team of researchers has created a program that can block audio eavesdropping through the microphones found in smartphones and other voice-controlled connected devices.
The algorithm works by using predictive voice technology: it recognizes human speech and automatically generates audible background noise, such as muffling or whispers, to camouflage the user’s words.
The technology works in real time, creating the obstruction while a person is speaking to a voice-controlled device or conversing with a friend.
But why create such an algorithm in the first place?
The problem stems from suspected advertiser eavesdropping. While this practice has been neither proved nor disproved, there is plenty of anecdotal evidence behind the suspicion.
For example, many people on social media claim to have seen targeted ads for hyperspecific products (think cat food or a particular brand of toothpaste) after talking about them in real-life conversations. Without doing a single Google search for the product, they find it has made its way into the carousel of advertisements they see daily.
One theory is rogue eavesdropping: companies tapping into smartphone microphones through speech recognition programs that can pick up and decode natural language. That is what this unique new algorithm aims to prevent.
Similar attempts at developing eavesdrop-blocking programs have been made before. Often they worked by generating white noise that could fool voice recognition software to a certain extent, thus preventing data spying.
However, according to the researchers at Columbia University, the rate of human speech is too fast to be blocked by such a simple program.
To address this, they made prediction a key feature of their program. The algorithm not only identifies words but makes an educated guess about what the user will say next, then generates obstructive noise based on that guess. The developers call this a “predictive attack” model.
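The idea can be sketched in pseudocode-like Python. This is a toy illustration of the predictive-attack loop, not the researchers' actual system: the frame size, the 16 kHz sample rate, and both helper functions (`predict_future`, `craft_noise`) are simplifying assumptions, where the real work would use trained neural models and adversarial optimization.

```python
import numpy as np

FRAME = 160      # samples per 10 ms frame at an assumed 16 kHz rate
LOOKAHEAD = 50   # 50 frames = 0.5 s, the reported optimal prediction horizon

def predict_future(history):
    """Stand-in predictor: a real system would use a trained neural model.
    Here we naively extrapolate by repeating the most recent frame."""
    return np.tile(history[-FRAME:], LOOKAHEAD)

def craft_noise(predicted):
    """Stand-in noise generator: real adversarial noise would be optimized
    to maximize a speech recognizer's error on the predicted audio."""
    rng = np.random.default_rng(0)
    return 0.1 * rng.standard_normal(predicted.shape)

def camouflage_stream(frames):
    """Process audio frame by frame, emitting noise crafted for speech
    predicted half a second into the future."""
    out = []
    history = np.zeros(FRAME)
    for frame in frames:
        history = np.concatenate([history, frame])[-10 * FRAME:]
        predicted = predict_future(history)
        out.append(craft_noise(predicted)[:FRAME])  # play this chunk now
    return np.concatenate(out)

# Toy usage: one second of a sine-wave "voice"
t = np.linspace(0, 1, 16000, endpoint=False)
voice = np.sin(2 * np.pi * 220 * t)
noise = camouflage_stream(voice.reshape(-1, FRAME))
print(noise.shape)  # (16000,)
```

The key design point the sketch captures is that the noise played *now* is computed from a prediction of what will be said *next*, which is what lets the defense keep up with the pace of live speech.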
The model was tested over a span of two days on eight NVIDIA RTX 2080Ti GPUs, using a 100-hour speech data set.
Among the notable findings are an optimal prediction horizon of 0.5 seconds into the future, an 80% effectiveness rate, and better performance at masking longer words.
The 80% figure was obtained by testing the algorithm against multiple speech recognition systems. When the whispers were deployed, those systems transcribed speech with an error rate of 80%.
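Error rates like this are commonly measured as word error rate: the edit distance between what was actually said and what the recognizer transcribed, divided by the number of words spoken. A minimal sketch, with made-up example transcripts (the article does not publish the actual test data):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate = (substitutions + deletions + insertions)
    / number of reference words, via dynamic-programming edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

# Hypothetical transcripts of masked speech:
spoken     = "order more cat food tomorrow"
recognized = "order more bat mood borrow"
print(round(word_error_rate(spoken, recognized), 2))  # 0.6
```

An 80% word error rate means four out of five spoken words, on average, were mangled in the recognizer's output.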
The tests also showed that shorter words, such as “the”, “they”, and “our”, are harder to obfuscate. Conversely, longer words with more syllables were easier for the algorithm to mask.
Although some experts deny that it happens, data collection through speech remains a possible issue that many researchers are trying to address.
However, it’s not as simple as creating anti-eavesdropping tools. Many worry that once such tools become widespread, ad companies will simply develop new software to counter them.
Another problem is the trustworthiness of such algorithms: in the wrong hands, they could themselves become another avenue for data collection.