All Your Tweets Are Belong To Us: the Twitterverse declares a winner


This post is the second in a series that examines this year’s console launches through the eyes of the Twitterverse. For more on the project, see the description in the previous post.

As of the evening of the 24th of November, I have dutifully collected 4,168,778 English-language tweets about the PS4 and the Xbox One. A few minor technical glitches aside, this represents 348 consecutive hours of tweets that include words and hashtags relevant to the new consoles.

After the jump, find out whether Swedish soccer legend Zlatan Ibrahimovic and One Direction member Louis Tomlinson play the same system…

During the first day of the PS4 release, the single most influential tweet (measuring influence by the number of retweets, in this case) was by Louis Tomlinson, member of boy band One Direction. His words of wisdom, which were retweeted 42,253 times and favorited 55,334 times?


I didn’t bother digging into the replies to his tweet — many of which discuss how adorable he is.

This anecdote represents a gold mine for marketers, and this is frequently what is done with mined Twitter data. If I were an exec at Sony or one of its studios, I might consider reaching out to Louis and paying him to represent my product.

Zlatan, on the other hand, is an Xbox One fan. Or is he? His single tweet about the console, days before release:


While Microsoft has an Xbox Ambassador program (Sony should take note), it’s unlikely that Zlatan self-registered. He represents one of Microsoft’s attempts to reach into the European console market through celebrities.

Those two tweets represent the basics of how this all works, except most of us aren’t celebrities. Regular people do routine things and tweet about them. When celebrities they admire tweet about something of interest to them, they either retweet or reply, in some fantasy world where a public reply to a public tweet constitutes a conversation with an idol. In aggregate, those tweets, retweets, and replies represent the corpus of text that I’m mining for this project.

Methodology and Tools

Let me take a moment to explain what I did. Using the official Twitter API, I created a Twitter “application” that exists only as a single Python script that I wrote. While the script ran persistently, Twitter pushed tweets relevant to the PS4 and the Xbox One (as well as the Wii U, which isn’t part of this article) to me at the rate allowed by the API. On average, this is about 12,000 tweets per hour. At its quietest, Twitter pushes about 4,500 tweets per hour, while it reaches nearly 40,000 tweets per hour at its peak.

For the most part, about 33% of the tweets, by volume, are retweets. During heavy tweet loads, however, it is common to see that number spike to over 45%. The reach of a retweet, of course, is only as far as the network of followers that the retweeter has. Intuitively, tweets posted by famous people and companies are more likely to be retweeted.

When dealing with millions of tweets, it is not feasible to read them all and provide a personal interpretation. That’s where data mining comes into play. Text mining and sentiment analysis work like this: you bounce words and phrases in the relevant text off of a sentiment dictionary and score each word along one or more vectors. For example, you may look for positive and negative words, or excited and lethargic words. Usually, the vectors along which one performs sentiment analysis have two meaningful poles (such as positive and negative).

If you spend any time reading through tweets, you may come to believe that we already live in an Idiocracy. I personally missed the part of the Twitter Terms of Use that requires poor English skills, a lack of decency, unbridled narcissism flavored with racism, and a penchant for general assholery, but after spending a few days with this project, I’m convinced it exists. What I’m trying to say is that it can be very difficult to perform sentiment analysis on tweets. In the Twitterverse, words considered vulgar and inappropriate are used as terms of endearment as often as insults.

Short of creating a custom sentiment dictionary, I concluded that I would have to treat sentiment scores as if they have larger random error when dealing with tweets than with formal bodies of text. Therefore, lacking any good short-term way to perform robust sentiment analysis along any vector other than positive and negative, I decided to rank all tweets using the AFINN sentiment dictionary. Basically, as described above, this assigns each word an integer valence from -5 (most negative) to 5 (most positive).
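To make the scoring idea concrete, here is a minimal sketch of dictionary-based scoring in the AFINN style. It is not my actual script: the tiny lexicon, its values, and the names `TOY_LEXICON` and `score_tweet` are made up for illustration, and a real run would load the full AFINN word list instead.

```python
import re

# Hypothetical excerpt in the AFINN style: word -> integer valence in [-5, 5].
# These entries and values are illustrative only, not copied from AFINN.
TOY_LEXICON = {
    "love": 3,
    "awesome": 4,
    "good": 3,
    "bad": -3,
    "hate": -3,
    "broken": -1,
}

# Runs of lowercase letters and apostrophes count as words.
TOKEN_PATTERN = re.compile(r"[a-z']+")


def score_tweet(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words contribute 0."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


print(score_tweet("I love my PS4, it is awesome!"))
print(score_tweet("My console is broken and I hate it"))
```

Note that a plain sum like this has no notion of sarcasm, negation, or affectionate vulgarity, which is exactly why the scores on tweets deserve a wide margin of error.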

The process that I follow to analyze the tweets goes like this:

  1. Harvest the tweets through the Twitter API.
  2. Extract relevant fields from the JSON and save them to CSV.
  3. Churn through the CSV, performing sentiment analysis on each tweet and flagging it along a number of variables: is it related to the PS4 or XB1 (or both), is it racist, is it vulgar, is it homophobic, is it a retweet, does it indicate defective hardware or software, etc.
  4. Aggregate all of the data from the tweets by the hour.
  5. Use natural language processing to aggregate the tweets by hour, tokenize them and search for top frequency words, bigrams (two words that often appear next to each other), and trigrams.
  6. Stratify by console relevancy and calculate descriptive statistics (min, max, mean, median, etc).
  7. Improve the script(s) and iterate.
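
Steps 2 through 4 can be sketched roughly as follows. This is an illustration rather than my production code: the keyword tuples and the function names `extract_fields` and `aggregate_by_hour` are hypothetical, and the sketch assumes each tweet arrives as a JSON object with `text` and `created_at` fields, plus a `retweeted_status` field when it is a retweet.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical keyword lists; the real flagging rules are more involved.
PS4_TERMS = ("ps4", "playstation")
XB1_TERMS = ("xbox", "xb1", "xbone")


def extract_fields(raw_json):
    """Pull a few fields out of one tweet's JSON and add simple flags."""
    tweet = json.loads(raw_json)
    text = tweet["text"]
    lowered = text.lower()
    # Assumes timestamps shaped like "Fri Nov 15 05:10:00 +0000 2013".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "hour": created.strftime("%Y-%m-%d %H:00"),
        "text": text,
        "is_retweet": "retweeted_status" in tweet,
        "is_ps4": any(term in lowered for term in PS4_TERMS),
        "is_xb1": any(term in lowered for term in XB1_TERMS),
    }


def aggregate_by_hour(rows):
    """Count tweets, retweets, and per-console mentions for each hour."""
    totals = defaultdict(lambda: {"tweets": 0, "retweets": 0, "ps4": 0, "xb1": 0})
    for row in rows:
        bucket = totals[row["hour"]]
        bucket["tweets"] += 1
        bucket["retweets"] += row["is_retweet"]  # True adds 1, False adds 0
        bucket["ps4"] += row["is_ps4"]
        bucket["xb1"] += row["is_xb1"]
    return dict(totals)
```

Feeding every harvested tweet through `extract_fields` and then handing the rows to `aggregate_by_hour` yields one record per hour, which is the shape needed for volume and retweet-share charts.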

If you are interested, you can find the original and processed data here. The tools that I am using include Anaconda Python, Scikit Learn, Scipy, Numpy, NLTK, R, SAS Enterprise Miner, Matplotlib, Excel, and others. You can find more technical details in the Qt3 forum.

Preliminary Analysis

Beyond any doubt, the Twitter data shows more interest and excitement in the PS4 than in the Xbox One. Let’s look at a few charts that tell the story. First of all, this chart shows the volume of tweets, per hour (time stamps are in UTC). At the time of its release, there were nearly twice as many tweets pouring in about the PS4 as there were about the Xbox One when it was released.


Perhaps more interesting, however, is the mean sentiment per tweet, stratified by console, shown over the same period of time. Here we see several spikes in positive sentiment for the PS4, but few for the Xbox One. Surprisingly, there’s not even a peak in sentiment for the Xbox One at the time of its release. The only real indicator, in the sentiment data, of the Xbox One release is the dip in sentiment for the PS4. (We see a similar dip in sentiment for the Xbox One at the time of the PS4 release.) Overall, across all collected tweets, the mean sentiment score for PS4 tweets is 26% higher than for Xbox One tweets.


Next, I looked for words in the tweets that might indicate either software or hardware problems with the consoles. As you know, reports of errors with new consoles spread like wildfire. Based on this early data, the Xbox One comes out looking slightly better here.


Finally, for the initial analysis, I examined the words that most frequently appear in tweets both combined and stratified by console. These words (and bi/trigrams) represent the most commonly expressed sentiments regarding the consoles. As you might expect, chatter about each console increased surrounding each launch.

For the PS4, these terms indicate excitement building towards the launch, and then a significant number of tweets repeating the initial sales figures, post release. Perhaps most revealing is that there were six consecutive hours of PS4 tweets that were either retweets of or replies to the Microsoft tweets congratulating Sony on the PS4 launch. Those may be part of the reason for the large PS4 sentiment spike at its time of release. Isn’t it ironic that Microsoft may have purposefully given Sony one of its largest social media exposure bumps?

The Xbox One tweets were varied and somewhat reactive to the PS4 release up until several days prior to the Xbox’s release. Our friend Zlatan shows up again in the Xbox chatter, this time because he evidently signed an Xbox One for a giveaway event. In general, it is more difficult to examine the common Xbox One tweet terms and determine trends than it is for the PS4. This may be due to tuning of the word frequency script (which attempts to remove common words that clutter the results, such as “to,” “the,” “I’m,” etc). This is a ripe area for additional future analysis.
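
For the curious, the word frequency idea looks roughly like the sketch below. It is a simplified stand-in for my script: the stop-word set is a hypothetical handful of words (a real list, such as the one that ships with NLTK, is much longer), and `tokenize` and `top_ngrams` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one is far longer.
STOP_WORDS = {"to", "the", "i'm", "a", "is", "my", "for", "and"}


def tokenize(text):
    """Lowercase the text and keep word-like tokens, minus stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_ngrams(tweets, n=2, k=3):
    """Return the k most common n-grams (as tuples) across all tweets."""
    counts = Counter()
    for text in tweets:
        tokens = tokenize(text)
        # Zip n staggered copies of the token list to form n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


sample = ["Xbox One is the best", "My Xbox One arrived", "Waiting for the Xbox One"]
print(top_ngrams(sample, n=2, k=1))
```

One design choice to be aware of: because stop words are dropped before the n-grams are built, two words that were separated only by a stop word end up counted as neighbors. Tuning what goes into the stop-word set changes which terms float to the top.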

My purpose in these articles is simply to show the Twitter data and some methods for analyzing it. I have no favorite in the new console generation: I own neither of them, and probably won’t for another year. There are many reasons why the volume and sentiment charts might look this way, and none of those reasons are known to me. However, with a huge number of tweets to mine, I can and will continue to examine the data to determine what trends it shows with regard to console sentiments.

In the next article, I’ll throw some clustering algorithms at the data to see how market segmentation might enable marketers and Twitter to deliver targeted advertisements to you and your social network. If you have any questions or suggestions, I would be happy to discuss them with you either in the comments here or in the forum!


This post was brought to you by Clay Heaton, former game developer, longtime Qt3 reader and forum member, founder of the nascent GameAid, and current MSc student in Advanced Analytics at North Carolina State University.