AI analysis of voice data from presidential debate reveals ‘secret’ Trump advantage
By Nicolas Perony, Co-founder and CTO of OTO Systems
As Trump and Biden get set to face off in the final debate of a bitter US election contest, I wanted to look beyond mere words to investigate what voice data would reveal about the two candidates.
We know that words carry meaning, but I wanted to understand more about the power of the delivery of these words, and the influence of hidden insights that would tell the world more about behaviour and intent, in that first chaotic debate televised on September 29, 2020.
An AI-aided analysis of over one million data points extracted from that first 2-hour debate watched by some 73.1 million viewers shows Trump might have dominated the event by achieving a greater density of output, and by more aggressively making use of his turns.
Interestingly, the Trump campaign reacted strongly to the decision by the nonpartisan Commission on Presidential Debates (CPD) to mute opposing candidates’ microphones during the ‘opening arguments’ section of the final debate, to be held on Thursday 22 October, 2020. This decision was taken because Trump repeatedly interrupted Biden during the first debate. The President has called this ‘unfair’, and his campaign communications director Tim Murtaugh charged, without evidence, that the commission’s decision is an “attempt to provide advantage to their favoured candidate.”
When Trump announced that he had contracted the coronavirus, the commission changed the second debate to a virtual one; however, Trump pulled out. The Biden camp said that “they believe most voters, especially undecided voters, will see the president as avoiding a second debate out of his own interests, not because he dislikes the format.” (Associated Press, Oct. 8 2020). Could this be because he relies on the technique of interrupting to unnerve his opponent? Chris Wallace told Fox News that the President had interrupted either Biden or himself “a total of 145 times, which is way more than one a minute.” I wanted to use data to reveal deep, possibly hidden, insights about the debating tactics used by Biden and Trump.
To do the research, I imported the soundtrack from the first debate’s YouTube stream into OTO’s proprietary DeepTone™ engine. I divided the voice data into the six subject segments set out by the CPD: the Supreme Court nomination; COVID-19; the US economy; race relations; the candidates in the election; and electoral integrity. Each segment represented roughly fifteen minutes of talk time.
OTO’s DeepTone™ engine analyses tonality, rhythms of speech and the emotional elements of voice, like anger, happiness, tiredness or irritation in a speaker, that contribute to how communication is received and perceived. What I’m examining in this original approach are the hidden behaviours and intentions behind human communication rather than the lexical content of the communication itself.
I extracted turns from the audio data to look at how many times each of the participants — Trump, Biden and the moderator, Chris Wallace — spoke uninterruptedly. There were 790 turns in the debate in total.
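For readers who want to try this on their own recordings, here is a minimal Python sketch of turn extraction. It assumes the audio has already been reduced to one speaker label per 64-millisecond frame (or None where nobody speaks); that input format, the Turn class and the extract_turns function are illustrative assumptions, not DeepTone™’s actual output or API. A turn here runs until a different speaker is heard, so pauses by the same speaker stay inside the turn.

```python
from dataclasses import dataclass

FRAME_SECONDS = 0.064  # the article states one output every 64 ms


@dataclass
class Turn:
    speaker: str
    start_frame: int
    end_frame: int  # exclusive

    @property
    def duration(self) -> float:
        """Length of the turn in seconds."""
        return (self.end_frame - self.start_frame) * FRAME_SECONDS


def extract_turns(labels):
    """Group per-frame speaker labels into speaking turns.

    labels: one entry per frame, a speaker name or None for no speech
    (a hypothetical input format). A new turn starts whenever a
    different speaker is heard; silent frames never start or end a turn.
    """
    turns = []
    current = None
    for i, who in enumerate(labels):
        if who is None:
            continue  # a pause: keep the current turn open
        if current is not None and current.speaker == who:
            current.end_frame = i + 1  # same speaker carries on
        else:
            current = Turn(who, i, i + 1)  # someone else takes the floor
            turns.append(current)
    return turns


labels = ["trump", "trump", None, "trump", "biden", None, None, "biden", "wallace"]
turns = extract_turns(labels)
```

Counting the resulting list gives the number of turns, and the duration property gives the uninterrupted seconds per turn.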
The first thing I wanted to consider was polarity; in other words, who dominated the debate at any time? And did this change over time, or with different debate topics?
The graph below tracks debate polarity in terms of who ‘holds the floor’ for any given time. At the start of each debate topic, we clearly see how each candidate gets a turn, but that the argument quickly fragments into a heated, rapid-fire exchange (with an average polarity around the middle of the range). Straight out of the gate, Trump dominates his segment, the Supreme Court debate. But once Biden responds, the contest fragments into a ‘free-for-all’ and frequently alternates between the candidates. These ‘ping-pongs’ are a mainstay of the debate.
Biden seems to dominate when discussing the economy, and in the discussion on race, they seem quite evenly matched. However, overall, the polarity spikes on the Trump side more often.
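The article does not spell out how polarity is computed, so the following Python sketch uses one plausible definition as an assumption: within each window of frames, polarity is the difference between the two candidates’ speech frames divided by their sum, giving +1 when only the first candidate speaks, -1 when only the second does, and 0 in the middle of the range. The function name and the per-frame label input are hypothetical.

```python
def polarity(labels, window, a="trump", b="biden"):
    """Return one polarity value per non-overlapping window of frames.

    labels: per-frame speaker name, or None where nobody speaks
    window: number of frames per window
    +1.0 means only `a` speaks, -1.0 means only `b` speaks, and 0.0
    means the floor is evenly shared (or neither candidate speaks).
    """
    values = []
    for start in range(0, len(labels), window):
        chunk = labels[start:start + window]
        n_a = chunk.count(a)
        n_b = chunk.count(b)
        total = n_a + n_b
        values.append((n_a - n_b) / total if total else 0.0)
    return values


labels = (["trump"] * 4 + ["biden"] * 4
          + ["trump", "biden", "trump", "biden"] + [None] * 4)
result = polarity(labels, window=4)
```

A rapid ‘ping-pong’ exchange shows up as values near zero, while a held floor pushes the value towards one end of the range.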
Speaking turn duration
Next, I looked at the speaking turn duration. By this I mean how long each candidate speaks for, in terms of uninterrupted seconds. Here all three distributions are similarly dominated by very short turns, illustrating the interruption dynamics present throughout the debate. But what makes a turn interruptible? This question leads me to our next analytical extrapolation: speaking density.
I extracted the turn-wise speech density for each participant, calculated as the ratio of high-resolution frames containing speech to all frames in the turn (DeepTone™ outputs data every 64 milliseconds). I found a clear pattern showing that Trump uses his speaking turns more effectively, packing more data into his speaking time. Specifically, 90% of his speaking turns have an average speech density of at least 0.75 (that is, at least 75% of the turn is speech rather than pauses), while Joe Biden and Chris Wallace only manage to densely fill half as many of their own speaking turns.
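The density calculation can be sketched in a few lines of Python. This assumes each turn is available as a list of booleans, one per 64-millisecond frame, marking whether the frame contains speech; that representation and both function names are assumptions made for illustration, not DeepTone™’s interface.

```python
def speech_density(frames):
    """Fraction of frames in a turn that contain speech (0.0 to 1.0)."""
    if not frames:
        return 0.0
    return sum(frames) / len(frames)


def share_of_dense_turns(turns, threshold=0.75):
    """Fraction of turns whose speech density is at least `threshold`."""
    if not turns:
        return 0.0
    dense = sum(1 for frames in turns if speech_density(frames) >= threshold)
    return dense / len(turns)


turns = [
    [True, True, True, False],   # density 0.75
    [True, False, False, False], # density 0.25
    [True, True, True, True],    # density 1.0
]
dense_share = share_of_dense_turns(turns)
```

With the threshold of 0.75 used in the article, the second function returns the kind of figure quoted above: the share of a speaker’s turns that are densely filled.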
Biden speaks more languidly, with more pauses and ‘ums and ahs’. So the above result is not surprising, and from this one might conclude that Trump came across as being more resolute. In terms of our analysis, Trump uses his turns to greater effect by simply packing more data into each turn. By comparison, think of what you could do in a day, not with more time, but with a denser use of the time that you are given!
Listen to these samples from the debate:
Joe Biden (turn 9), speech density: 0.32
Donald Trump (turn 391), speech density: 0.97
Here you’re looking at a very sparse turn from Biden, and a very dense turn from Trump. Play the first sound bite and you’ll hear: “First of all, um, thank you”. Listen to how tired and languid Biden sounds. Now compare this to Trump’s “Came in from Germany, came in from Japan, went to Michigan, went to Ohio…”. Here we can immediately hear the difference. Trump speaks in plosive, dense bursts and packs many more words and concepts into his speech. There are many overlaps where people speak over each other, as in this case, when the moderator, Wallace, tried to cut in on a repetitive Trump.
Wallace’s attempts to stop Trump speaking see the incumbent continue without missing a beat or running out of breath. The steam train that is Trump refuses to stall, and it becomes apparent that the former reality performer’s aim is to get as much verbiage in as possible.
Our data shows that the two nominees differ dramatically in terms of density and domination, and the question to ask is whether this is a point of strategy. Do Trump’s decades of television and reality experience, and his intensive PR coaching over the years, offer him the ability to game the system by knowing how it works as an insider? If density is a Trump strategy, this would certainly be the case.
Contrast this with Biden who, and this is a matter of public record, struggled with stuttering during his formative years. He credits his mother with helping him overcome this challenge. Biden has learned to adapt by being more measured and considered in his speaking. Consequently, Biden achieves only a fraction of the density that Trump does.
What we see in the data is that Biden is more practised, measured, and cautious in his approach. This only lapses during times of high emotion, like when Biden told Trump: “Will you shut up, man!” at the end of the first segment. Even this is somewhat restrained, unlike Trump, who doesn’t hold back at all. Trump by contrast is absolutely self-confident and unhesitant.
Now let’s look at DeepTone™’s arousal score, which speaks to energy in tone. All three participants in the debate differ in their arousal score, with Biden tending to make more frequent use of a strident tone. Let’s listen to low/high arousal turns for each of the candidates.
Joe Biden (turn 9), arousal score: 0.02
Joe Biden (turn 198), arousal score: 0.97
Donald Trump (turn 176), arousal score: 0.09
Donald Trump (turn 142), arousal score: 0.96
I find that, on average, longer turns have higher arousal. But there’s a more interesting pattern that emerges when we look at the detailed dynamics of arousal within a turn. I calculated this by extracting over 78,000 individual arousal measurements throughout the debate. What I discovered was that arousal tends to increase, then plateau, then peak for the candidates, but not the moderator.
This might be a manifestation of a speaker preparing to respond to his opponent. In contrast to the candidates’ scores, the moderator’s arousal score tends to drop towards the end of a turn, presumably to give the candidates room to speak.
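One way to look at arousal dynamics within a turn is to rescale every turn to the same relative length and average the scores in each position. The Python sketch below does this with a fixed number of bins; the binning approach, the function name and the input (one list of arousal scores per turn) are assumptions for illustration, since the article does not describe the exact method.

```python
def within_turn_profile(turns, n_bins=10):
    """Average arousal by relative position inside a turn.

    turns: list of turns, each a list of arousal scores in time order
    n_bins: number of equal-width bins over the relative position 0..1
    Returns one mean per bin, or None for a bin that received no data.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for scores in turns:
        n = len(scores)
        for i, score in enumerate(scores):
            # Map sample i of n onto a bin index between 0 and n_bins - 1.
            b = min(i * n_bins // n, n_bins - 1)
            sums[b] += score
            counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]


turns = [[0.2, 0.4, 0.6, 0.8], [0.0, 1.0]]
profile = within_turn_profile(turns, n_bins=2)
```

Computing one profile per participant makes it possible to compare how each speaker’s arousal evolves from the start to the end of a turn.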
Looking at the big picture again — the whole debate — one can gain deeper insights: I grouped the arousal score measurements in 5-second slices and drew a regression line per participant and per segment to show the trend. The graph below reveals the arousal dynamics of each segment.
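The slicing and trend-fitting step can be sketched as follows, assuming the arousal measurements for one participant in one segment are available as (time in seconds, score) pairs. The function names are hypothetical, and ordinary least squares is used here as the assumed regression method.

```python
def slice_means(samples, slice_seconds=5.0):
    """Average arousal per time slice.

    samples: list of (time_seconds, arousal_score) pairs
    Returns a list of (slice_midpoint_seconds, mean_score), in time order.
    """
    buckets = {}
    for t, score in samples:
        buckets.setdefault(int(t // slice_seconds), []).append(score)
    return [((k + 0.5) * slice_seconds, sum(v) / len(v))
            for k, v in sorted(buckets.items())]


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


samples = [(0.0, 0.1), (2.0, 0.3), (5.0, 0.4), (9.0, 0.4), (10.0, 0.6), (14.0, 0.6)]
points = slice_means(samples)
slope, intercept = fit_line(points)
```

A positive slope means arousal rises over the segment, and a negative slope means it falls.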
While there is high variation in arousal scores throughout a segment, the direction of the regression line is notable: the first segment sets the tone, with arousal increasing as the skirmish between the candidates intensifies, culminating in Biden’s now iconic, “Will you shut up, man,” after 15 minutes.
Another notable feature of the graphs is that arousal scores tend to move in the same direction (especially for the candidates) — demonstrating a linguistic phenomenon known as ‘acoustic-prosodic entrainment’, which is how each speaker matches the arousal level of the other, over the duration of each segment.
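A simple, rough proxy for this kind of entrainment is the correlation between two speakers’ arousal series over the same time slices. The Python sketch below computes a Pearson correlation; treating it as an entrainment measure is an assumption made for illustration, not the method used in the analysis above, and the sample values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Invented per-slice arousal means for two speakers in one segment.
speaker_a = [0.2, 0.4, 0.6]
speaker_b = [0.3, 0.5, 0.7]
together = pearson(speaker_a, speaker_b)
apart = pearson(speaker_a, list(reversed(speaker_b)))
```

Values close to +1 indicate that the two speakers’ arousal moves in the same direction, and values close to -1 that it moves in opposite directions.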
As humans, we’re sitting on untold amounts of voice data that can help us better understand the world we live in. I’d like to invite political pundits, journalists, analysts, and anyone interested in creating value through voice intelligence to make their own analysis of speeches, podcasts, streams, and voice in any form using OTO’s proprietary DeepTone™ Software Developer Kit, or simply sign up for a free trial of our cloud API. If you do, please share your results with us.