MIT Device Tracks Tone of Conversations in Real Time

The wearable could help those who struggle to pick up on emotional and social cues.


Photo by Jason Dorfman of MIT CSAIL

Imagine if, at the end of a conversation, you could rewind it and see what made the other person happy, sad, or uncomfortable.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) may have found a way to do just that, using an artificial intelligence system worn on the wrist. The system can determine whether the tone of a conversation is happy, sad, or neutral based on a person’s speech patterns and vital signs. That capability could be especially useful for people who struggle with emotional and social cues, such as individuals with Asperger’s syndrome.

An earlier invention that came out of CSAIL last fall, called EQ-Radio, also used vital signs to determine emotion. The new system goes a step further: it analyzes audio, text transcripts, and physiological signals to estimate the overall emotion of a conversation with 83 percent accuracy. It can also estimate the emotion of individual five-second intervals of a conversation, performing roughly 8 percent better than existing methods.

Ph.D. candidate Mohammad Ghassemi and graduate student Tuka Al Hanai, who co-authored a paper about the system, emphasized its implications in a statement.

“Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket,” Al Hanai says in the statement.

The AI system was tested in a study in which subjects wore a Samsung Simband that recorded their movement, heart rate, blood pressure, blood flow, and skin temperature. The researchers also captured audio recordings and text transcripts of the conversations.

All of these signals feed into the system’s algorithm, which associates long pauses and monotone voices with sad stories and energetic speech patterns with happy ones. The algorithm also draws on fidgeting and other movement, along with cardiovascular activity, to refine its estimate of emotion.

If the system ever becomes available to consumers, users would need to get consent from the people they talk with before recording conversations. To protect privacy, the algorithm also runs locally on each device.

The next step, Al Hanai says, is to refine the algorithm so that it can label moments with finer distinctions, such as boring or tense, instead of simply positive or negative.

“Developing technology that can take the pulse of human emotions,” she says in the statement, “has the potential to dramatically improve how we communicate with each other.”