Machine learning, Twitter, and Donald Trump


I've been studying Trump's tweets for over two years, and I've read every tweet ever posted by @realDonaldTrump. There are nearly 50,000.


But this post is not about Trump's bizarre Twitter activity. Just as interesting as Trump's tweets are the millions of users who tweet replies to @realDonaldTrump. How many of those replies come from #MAGA trolls, and how many from "triggered libs"?


To begin, I randomly sampled 1,000 replies to @realDonaldTrump, posted from 11/26/19 to 5/8/20, and set out on the arduous journey of hand-labeling each one "0" for anti-Trump or "1" for pro-Trump. In the end, exactly 700 (70%) were anti-Trump and 300 (30%) were pro-Trump. These numbers weren't particularly surprising, as previous reports have indicated that Twitter users skew liberal.

And just like that, I had a workable training set. In R (Isaac, computer science? Who would've thought?), I trained a regression model that automatically classifies a reply as 0 or 1, based on the tweet's words and the associations between words. On a test set of 200 tweets (where I compared the model's predictions to a hand-labeled key), the model is ~91% accurate. Not bad!
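To make the idea concrete, here is a minimal Python sketch of a word-based classifier. It is not the R code behind this post: it assumes a plain bag-of-words logistic regression with L2 regularization, skips the word embeddings entirely, and borrows six of the example replies shown later in this post as a stand-in training set. The function names (`tokenize`, `predict_proba`, `train`) and the hyperparameter values are made up for illustration.

```python
import math
import re

def tokenize(text):
    """Lowercase a tweet and split it into tokens, keeping @mentions and #hashtags."""
    return re.findall(r"[@#]?\w+", text.lower())

def predict_proba(weights, bias, text):
    """Estimated probability that a reply is pro-Trump (label 1)."""
    z = bias + sum(weights.get(tok, 0.0) for tok in tokenize(text))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.1, l2=0.01):
    """Fit an L2-regularized logistic regression by stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = predict_proba(weights, bias, text) - label
            bias -= lr * error
            for tok in tokenize(text):
                w = weights.get(tok, 0.0)
                weights[tok] = w - lr * (error + l2 * w)
    return weights, bias

# Stand-in training set: example replies from this post (0 = anti-Trump, 1 = pro-Trump).
TRAIN = [
    ("@realDonaldTrump BECAUSE DONALD TRUMP FAILED AMERICA!!!!", 0),
    ("@realDonaldTrump @nytimes @washingtonpost @CNN SCR*W THOSE 'CUTIE PIES' "
     "MR PRESIDENT!! WE, AS WELL AS MANY MANY MILLIONS OF ANON'S AND PATRIOTS "
     "HAVE YOUR BACK! WE ARE THE NEWS NOW!", 1),
    ('@realDonaldTrump Nancy "Needs some Polident" Pelosi! Public enemy #1', 1),
    ("@realDonaldTrump Was thrown to the #Wolves and he comes back leading the "
     "#WolfPack! #Wolf #StandToGether #AmericaFirst #MAGA", 1),
    ("@realDonaldTrump You are so God dam incompetent please go away.", 0),
    ("@realDonaldTrump Trump ripped for late night removal of watchdog who "
     "reported his COVID-19 failures: Another Friday night assassination", 0),
]

weights, bias = train(TRAIN)
for text, label in TRAIN:
    print(label, round(predict_proba(weights, bias, text), 2), text[:50])
```

Six tweets are obviously far too few to learn anything real; the point is only the shape of the pipeline: tokenize, learn one weight per word, and squash the sum of the weights into a probability.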


*Okay, the initial model was only around 75% accurate -- barely more accurate than a "null" model (equivalent to always guessing "anti-Trump," which is true 70% of the time). After a lot of time fiddling with the model's regularization parameters, up-sampling, under-sampling, and training word "embeddings" (matrices of associations between words) from a larger sample of millions of response tweets...I arrived at something workable!
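For anyone curious what the "null" model and up-sampling look like in practice, here is a small Python sketch. It uses the 700/300 label split from my hand-labeled sample, but the helper names (`null_model_accuracy`, `upsample_minority`) and the placeholder tweet text are hypothetical, and random duplication is just one simple way to up-sample, not necessarily the one used for the final model.

```python
import random

def null_model_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    majority = max(set(labels), key=labels.count)
    return labels.count(majority) / len(labels)

def upsample_minority(examples, seed=0):
    """Randomly duplicate minority-class examples until both classes are the same size.

    Assumes binary labels 0/1 and at least one example of each class.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for text, label in examples:
        by_label[label].append((text, label))
    minority, majority = sorted(by_label.values(), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Label counts from the hand-labeled sample: 700 anti-Trump (0), 300 pro-Trump (1).
labels = [0] * 700 + [1] * 300
baseline = null_model_accuracy(labels)
print(f"null model accuracy: {baseline:.0%}")

# Placeholder text stands in for the real tweets.
examples = [(f"tweet {i}", label) for i, label in enumerate(labels)]
balanced = upsample_minority(examples)
print(len(balanced), sum(label for _, label in balanced))
```

Any classifier worth keeping has to clear that baseline by a comfortable margin, which is why 75% felt like a failure and ~91% felt like a win.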


Here are some examples of tweets automatically classified by the model:


Text of response tweet | Probability_ProTrump | Pro_Trump?
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump BECAUSE DONALD TRUMP FAILED AMERICA!!!! | 0.06 | 0
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump @nytimes @washingtonpost @CNN SCR*W THOSE 'CUTIE PIES' MR PRESIDENT!! WE, AS WELL AS MANY MANY MILLIONS OF ANON'S AND PATRIOTS HAVE YOUR BACK! WE ARE THE NEWS NOW! | 0.86 | 1
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump Nancy "Needs some Polident" Pelosi! Public enemy #1 | 0.93 | 1
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump Was thrown to the #Wolves and he comes back leading the #WolfPack! #Wolf #StandToGether #AmericaFirst #MAGA | 0.97 | 1
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump You are so God dam incompetent please go away. | 0 | 0
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump Trump ripped for late night removal of watchdog who reported his COVID-19 failures: Another Friday night assassination | 0.16 | 0
-------------------------------------------------------------------------------------------------------------------------------


Here’s an example of one tweet that is incorrectly classified:


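Text of response tweet | Probability_ProTrump | Pro_Trump?
-------------------------------------------------------------------------------------------------------------------------------
@realDonaldTrump What have the Democrats done for the average American lately? | 0.49 | 0
-------------------------------------------------------------------------------------------------------------------------------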

(It should be labeled 1, as it’s pro-Trump, but as prob(proTrump) < 0.5, it’s classified as anti-Trump.)
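The decision rule itself is just a cutoff on the predicted probability. A tiny Python sketch, using probabilities from the examples above (the function name `classify` and the choice to count a probability of exactly 0.5 as pro-Trump are my own assumptions here):

```python
def classify(prob_pro_trump, threshold=0.5):
    """Turn a predicted probability into a 0/1 label using a fixed cutoff."""
    return 1 if prob_pro_trump >= threshold else 0

# Probabilities taken from the example tweets in this post.
print(classify(0.86))                  # clearly pro-Trump
print(classify(0.06))                  # clearly anti-Trump
print(classify(0.49))                  # the misclassified tweet, just under the cutoff
print(classify(0.49, threshold=0.45))  # a lower cutoff would flip it
```

Lowering the cutoff would rescue this particular tweet, but it would also push other borderline anti-Trump replies into the pro-Trump pile, so it is a trade-off rather than a free fix.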



I was pretty happy with having built a classification model, but of course any model is only as interesting as its applications. Here are some general findings from applying the model to a sample of 10,000 replies:


Most common words among pro-Trump replies:


They hate sleepy joe "biden," address Trump as "sir," thank "god" for Trump, and "love" the "usa." Who's surprised?

Most common words among anti-Trump replies:


They side with "@cnn" and "cuomo" over @realDonaldTrump, complain about Trump's handling of the "virus," call Trump a "liar," and draw comparisons between Trump and "obama."
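Counting top words is the simplest part of the whole exercise. Here is a rough Python sketch of how it could be done once replies have been split by predicted label. The three sample replies and the tiny stopword list are invented for illustration (they are not from my data), and dropping stopwords and the @realDonaldTrump mention is an assumption about preprocessing rather than a description of my exact pipeline.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "you", "we", "for", "and", "is", "to", "of"}

def top_words(replies, n=10, stopwords=STOPWORDS):
    """Return the n most frequent tokens across replies, skipping stopwords."""
    counts = Counter()
    for text in replies:
        for tok in re.findall(r"[@#]?\w+", text.lower()):
            # Every reply mentions @realDonaldTrump, so it carries no signal.
            if tok not in stopwords and tok != "@realdonaldtrump":
                counts[tok] += 1
    return counts.most_common(n)

# Hypothetical replies that the model has labeled pro-Trump.
pro_replies = [
    "@realDonaldTrump thank you sir",
    "@realDonaldTrump sir we love the usa",
    "@realDonaldTrump love you sir",
]
top_pro = top_words(pro_replies, n=2)
print(top_pro)
```

Running the same function separately on the predicted pro-Trump and anti-Trump replies gives the two word lists to compare.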



That's all for this post. While the model and top words alone don't reveal any particularly novel information, there's much more to come. I need to lie down and rest my head after reading well over 1,000 angry political tweets -- more than a few of which introduced me to conspiracy theories I hadn't heard of before. If anyone has a lot of time on their hands, go digging through #QAnon tweets to remind yourself that, no matter how much quarantine has plagued your mind, there are people far crazier than you out there.


Isaac



Comments

  1. Incredible work. I truly enjoy reading about our glorious leader. All hail the best president in the history of the world and may he rule over us forever!! #keepamericagreat
