How similar are Trump tweets to Mein Kampf? My Weekend Coding Project



Just how similar is a given Trump tweet to a line from Hitler’s Mein Kampf? Let’s find out…

This script uses three packages: tidyverse (pipes, tibbles, and CSV import), quanteda (text pre-processing and cosine similarity calculation), and rtweet (cleaning tweet text with plain_tweets()). If you haven’t installed these packages, un-comment the install.packages() calls below.

#install.packages("tidyverse")
#install.packages("quanteda")
#install.packages("rtweet")

library(tidyverse,quietly=T); library(quanteda,quietly=T); library(rtweet,quietly=T)

As of 7/26/20, Trump has produced 43,137 original tweets (excluding retweets). Let’s read those in along with all 10,317 lines of Mein Kampf:

trump <- read_csv("all_trump_original_tweets_072620.csv",col_types = cols())
fileName <- 'MeinKampf.txt'
MeinKampf <- readChar(fileName, file.info(fileName)$size) %>% strsplit("\\.\\s|\\!\\s|\\?\\s")
# One row per split sentence; no hard-coded row count or auto-generated column name needed
df <- tibble(text = unlist(MeinKampf))
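
To see what the split pattern passed to strsplit() does, here is a minimal base R sketch on a made-up string. The names sample_text and sentences are illustrative and not part of the original script.

```r
# Made-up text standing in for the book file (an assumption for illustration)
sample_text <- "This is one sentence. Is this another? Yes it is! The end."

# Same pattern as above: split on ".", "!" or "?" when followed by whitespace
sentences <- strsplit(sample_text, "\\.\\s|\\!\\s|\\?\\s")[[1]]

# The delimiter (punctuation plus whitespace) is consumed by the split,
# so only the final piece keeps its trailing period
print(sentences)
```

Note that the pattern only splits where punctuation is followed by whitespace, so abbreviations such as "Mr. Smith" would also be split. Treat the resulting "lines" as approximate sentences.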

Mein Kampf text pre-processing:

hitlercorp <- corpus(df$text)
hitlerdfm <- tokens(hitlercorp) %>% tokens_ngrams(n=1:3) %>% dfm(tolower=TRUE,remove_url=TRUE,stem=TRUE,remove_punct=TRUE,remove=c(stopwords("english")))

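Next, we define an interactive function that prompts for a tweet number, displays that tweet, and, once you confirm it, prints the five most similar lines from Mein Kampf. It relies on readline(), so run it in an interactive R session: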

select_trump_tweet <- function() {
  q1 <- readline(prompt="Which Trump tweet would you like to view? (A number from 1-43137) ")
  cat(plain_tweets(trump$text)[as.numeric(q1)])
  q2 <- readline(prompt="Would you like to use this tweet (Yes or No)? ")
  if (as.character(tolower(q2))=="yes") {
    trumpcorp <- corpus(plain_tweets(trump$text)[as.numeric(q1)]) 
    trumpdfm <- tokens(trumpcorp) %>% tokens_ngrams(n=1:10) %>% dfm(tolower=TRUE,remove_url=TRUE,stem=TRUE,remove_punct=TRUE,remove=c(stopwords("english"), "t.co", "https", "rt", "amp", "http", "t.c", "can", "~","RT","realdonaldtrump"))
    cat("\n\nTrump tweet selected!\n\n")
    cat("Searching lines from Mein Kampf...\n\n")
    trump_hitler <- as.data.frame(textstat_simil(hitlerdfm, trumpdfm, margin = "documents",method="cosine"))
    cat("Done! Top 5 matching results:\n\n")
    trump_hitler <- as_tibble(trump_hitler[order(-trump_hitler$cosine),])
    for (tweet in trump_hitler$document1[1:5]) {
      cat(paste(which(trump_hitler$document1==tweet),df$text[as.numeric(substr(tweet,5,nchar(tweet)))],"\n\n"))
    }
  }
  else {
    select_trump_tweet()
  }
}
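
For readers unfamiliar with the score, textstat_simil(..., method="cosine") compares two documents by the cosine of the angle between their feature-count vectors. The calculation can be sketched in base R as below. The function cosine_similarity and the toy counts are assumptions for illustration, not values from the actual data.

```r
# Cosine similarity: dot product divided by the product of the vector lengths
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy feature counts over a shared four-word vocabulary (made up for illustration)
tweet_counts <- c(enemy = 2, people = 1, media = 1, republic = 0)
line_counts  <- c(enemy = 2, people = 0, media = 0, republic = 2)

# Only "enemy" is shared, so the score falls between 0 and 1
score <- cosine_similarity(tweet_counts, line_counts)
print(score)

# A document compared with itself scores 1 (up to floating-point error)
print(cosine_similarity(tweet_counts, tweet_counts))
```

Because the score depends only on the direction of the vectors and not their length, a short tweet can score highly against a long sentence if they share the same mix of features.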

Before running the function, if you’d like to view 10 random Trump tweets, you can do so here:

sample_n(plain_tweets(trump),10)
## # A tibble: 10 x 2
##      num text                                                                      
##    <dbl> <chr>                                                                     
##  1  6732 "\"@mikeyjacques: @realDonaldTrump is a BAD A$$, I could only be so lucky…
##  2 14935 "I pick the best locations- @Trump_Charlotte has incredible views of beau…
##  3 33559 "Yesterday in Pittsburgh I was really impressed with Congressman Keith Ro…
##  4 15986 "Polling strong, Donald Trump starting to get serious //t.co/8LCMa58P7P v…
##  5 21205 "Polls show that the hurricane had a huge positive effect for Obama on hi…
##  6  6980 "\"@uconncrazy @realDonaldTrump Great to hear you on @sternshow. Your hon…
##  7 26360 "\"@1sonny12: @KSmith233035 @mitchellvii FLORIDIANS ARE UPSET BECAUSE RUB…
##  8  7043 "\"@momtoheather: @TrumpLasVegas @realDonaldTrump great Hotel!! Stayed th…
##  9   326 "@MaryBethTHM @TheRealMarilu @CelebApprentice Marilu is a fantastic perso…
## 10 16604 "Bill Clinton has been Obama's most effective surrogate out on the trail."

Time to turn a Trump tweet into Mein Kampf. Can you tell the difference?

select_trump_tweet()
## The FAKE NEWS media (failing @nytimes, @NBCNews, @ABC, @CBS, @CNN) is not my enemy, it is the enemy of the American People!
## 
## Trump tweet selected!
## 
## Searching lines from Mein Kampf...
## 
## Done! Top 5 matching results:
## 
## 1 The result was that the enemies of the Republic ceased to oppose the Republic as such and helped to subjugate those who were also enemies of the Republic, though for quite different reasons 
## 
## 2 Only the enemies of the two countries, Germany and Russia, could have an active interest in such a war under these circumstances 
## 
## 3 It is the task of the propagandist to recruit the followers and it is the task of the organizer to select the members 
## 
## 4 Here again it is the fault of the education given our young people 
## 
## 5 Those who effectively combat this mortal enemy of our people, who is at the same time the enemy of all Aryan peoples and all culture, can only expect to arouse opposition on the part of this race and become the object of its slanderous attacks


Citations:

Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software, 3(30), 774. doi: 10.21105/joss.00774 (URL: https://doi.org/10.21105/joss.00774), URL: https://quanteda.io.

Brown B. Trump Twitter Archive [Internet]. Trumptwitterarchive.com. 2019. Available from: http://www.trumptwitterarchive.com/.

Kearney, M. W. (2019). rtweet: Collecting and analyzing Twitter data. Journal of Open Source Software, 4(42), 1829. doi: 10.21105/joss.01829 (R package version 0.7.0).

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA URL http://www.rstudio.com/.

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686.
