Tweetstorm

Social media is becoming increasingly influential and has helped shape public discourse around world events.

In our project, Team 26 built a model that performs sentiment analysis on Twitter posts and created a platform to visualize the link between social media posts and the outcome of events.

Our focus here is on the 2016 US Presidential Election. However, we hope that this work can be applied to any kind of event to give users insight into the impact social media can have.

Whilst we have developed our platform to work across browsers, including mobile, we have tested it on Google Chrome and Mozilla Firefox and recommend these browsers for the best experience.

About Tweetstorm

Our Project

In recent times there has been a lot of discussion about the influence of information sharing on social media, including ‘fake news’ (Allcott, Gentzkow, 2017). Studies that have found social media to be influential far outnumber those that have not, and include research showing that public sentiment and users' voting preferences can be identified from the volume of tweets (Tumasjan, Sprenger, Sandner, Welpe, 2010).

Twitter, “a popular microblogging service where users create status messages (called ‘tweets’)” (Go, Bhayani, Huang, 2009), offers access to services that allow tweets to be analyzed and can be used to assess the platform's influence. Team 26 used these services to build a model of tweet sentiment and then visualized the results for users to assess.

To demonstrate this platform, we have used tweets from the 2016 US Presidential Election to showcase what our model would have suggested as the likely outcome.

Our team of five presents to you Project Tweetstorm.

Matthew Tinker

Team Coordinator

Ensured team deliverables were met; coordinated check-ins, report drafting and submission.

Minhao Leong

Data Specialist

Identified suitable datasets, prepared and curated data, developed analytic models.

Jeffrey Lee

Data Specialist

Identified suitable datasets, prepared and curated data, developed analytic models.

Jun Xiong Tan

Front-end Developer

Developed interactive visualizations and web interface.

Jinchao Lin

Front-end Developer

Developed interactive visualizations and web interface.

Approach

Problem Statement

The problem addressed by this work is to develop an understanding of how influential social media can be on world events, presented in a way that is useful for many audiences.

Data Approach

Over 10 million tweets from key stages of the 2016 US Presidential Election were obtained using the tweet IDs captured by the Harvard Dataverse (Littman, Wrubel, Kerchner, 2016). The team called the Twitter API with these IDs to hydrate them into full tweets. Once extracted, the tweets were run through a series of cleansing processes to remove tags, links, and non-useful labels such as “RT” for retweet. These steps provided a clean and useful dataset for the next stage.
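The cleansing step described above could look roughly like the following minimal sketch. The regular expressions and the function name `clean_tweet` are illustrative assumptions, not the team's actual code. In particular, “tags” is interpreted here as @mentions and the `#` symbol of hashtags.

```python
import re

# Hypothetical patterns; the team's actual cleansing rules are not documented here.
RT_RE = re.compile(r"^\s*RT\b(?:\s+@\w+:?)?\s*")  # leading "RT" or "RT @user:"
URL_RE = re.compile(r"https?://\S+")               # links
MENTION_RE = re.compile(r"@\w+")                   # user tags

def clean_tweet(text: str) -> str:
    """Remove the retweet label, links and tags, keeping emoticon characters."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")   # keep the hashtag word, drop the symbol
    return " ".join(text.split())  # collapse leftover whitespace

print(clean_tweet("RT @user: Great debate tonight! #Election2016 https://t.co/abc :)"))
# prints: Great debate tonight! Election2016 :)
```

Character combinations such as `:)` are left untouched so that they remain available for later analysis.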

The data preparation was then extended to classify the character sequences produced by the use of emojis. Combinations of characters that could be classified as emojis were left in the dataset for use in the analysis. The tweets were stored in an SQLite database.
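As an illustration of the storage step, the sketch below writes cleaned tweets to SQLite using Python's standard `sqlite3` module. The table name, column names and the `store_tweets` helper are hypothetical; the actual schema used by the team is not described here.

```python
import sqlite3

def store_tweets(conn: sqlite3.Connection, rows) -> None:
    """Insert (tweet_id, created_at, text) rows into a hypothetical tweets table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id TEXT PRIMARY KEY, created_at TEXT, text TEXT)"
    )
    # INSERT OR REPLACE avoids duplicates if the same tweet ID is hydrated twice.
    conn.executemany("INSERT OR REPLACE INTO tweets VALUES (?, ?, ?)", rows)
    conn.commit()

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
store_tweets(conn, [("1", "2016-11-08", "Great debate tonight! :)")])
store_tweets(conn, [("1", "2016-11-08", "Great debate tonight! :)")])
count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(count)
```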

Observations

Are emojis useful additions to the sentiment analysis model?
Emojis do generally provide value to the model, although further development could add features such as sarcasm handling.

Can polarity and subjectivity measures from sentiment analysis on tweets help to predict the outcome of events?
At the point when the election was held, the sentiment analysis model predicted 29 of the 52 states correctly. Whilst that may not indicate a resounding success, it did point to a likely win by Donald Trump throughout the majority of the election campaign, which would have been more successful than most voter polls. This also demonstrates some success for the project, which centered on using tweets to correctly predict the outcome of events.

Sentiment Model

Our team produced a sentiment analysis model using a complex series of rules. This approach was taken so that we could expand the model beyond standard text analysis to classify combinations of characters as either positive or negative sentiment. The model then assigned both a polarity and a subjectivity score to each tweet. The underlying approach works as a lookup table over a library of words and phrases, which provides the polarity, subjectivity and intensity. For each word these scores are applied in the following way:

Category	Scoring
Polarity	A score between -1 and 1, with the higher score representing a more positive sentiment.
Subjectivity	A score between 0 and 1, 0 being objective, 1 being subjective.
Intensity	A score which is a multiplier between 0.5 and 2 and considers a word and its influence on the following word.
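A lookup-based scorer of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the team's rule set: the lexicon entries and their values are invented for the example, tokens are split on whitespace only, and a word with an intensity other than 1.0 is treated as a pure modifier of the next word.

```python
from typing import Dict, Tuple

# Hypothetical lexicon: token -> (polarity, subjectivity, intensity).
LEXICON: Dict[str, Tuple[float, float, float]] = {
    "good": (0.5, 0.5, 1.0),
    "terrible": (-1.0, 1.0, 1.0),
    "very": (0.0, 0.0, 1.5),      # intensifier for the following word
    "slightly": (0.0, 0.0, 0.5),  # dampener for the following word
    ":)": (0.5, 1.0, 1.0),        # emoticon treated like a word
}

def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))

def score_tweet(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) averaged over the scored tokens."""
    polarities, subjectivities = [], []
    multiplier = 1.0
    for token in text.lower().split():
        entry = LEXICON.get(token)
        if entry is None:
            multiplier = 1.0  # intensity only carries to the next word
            continue
        polarity, subjectivity, intensity = entry
        if intensity != 1.0:
            multiplier = intensity  # modifier: affects the following word
            continue
        polarities.append(clamp(polarity * multiplier, -1.0, 1.0))
        subjectivities.append(clamp(subjectivity * multiplier, 0.0, 1.0))
        multiplier = 1.0
    if not polarities:
        return 0.0, 0.0
    return sum(polarities) / len(polarities), sum(subjectivities) / len(subjectivities)

print(score_tweet("very good"))         # (0.75, 0.75)
print(score_tweet("slightly good :)"))  # (0.375, 0.625)
```

Clamping keeps each word's scores inside the ranges given in the table even after the intensity multiplier is applied.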

Geographic Sentiment

The following visualization provides a view of the polarity, subjectivity and candidate preference that Tweetstorm has modelled. To use the visualization:
1. Hover over each state to see the model outputs.
2. The candidate preference is displayed beneath the map, the polarity and subjectivity to the right.
3. You can choose the map overlay by using the options beneath the map.
4. Click on a state to zoom in and see the candidate outcomes and top themes for the state.
5. Click again to zoom out.
6. Use the event selector above to visualize data from different events.

This visualization provides an overview of the change in sentiment over time in each state. The model provides good insight into the candidate preference and also demonstrates that the eventual winner, Donald Trump, maintained a candidate preference lead throughout most of the key election stages. This visualization, along with the others below, means the platform can easily be adapted to any scenario in which an event has had, or will have, a binary outcome.

Polarity and Subjectivity Time Series

The following visualization provides a time series view of the change in sentiment based on five key events over the course of the US Presidential Election. There are a number of options available for this visualization including:
1. Toggle the number labels via the checkbox at the top of the chart.
2. Toggle between polarity and subjectivity via the two buttons available.
3. Click the images to filter candidates.
4. View the specific details of the key events through the event selector at the top of the page or by selecting the event time period on the chart.

This visualization demonstrates the movement of subjectivity and polarity over time. The two values are almost the inverse of each other, with polarity peaking at the time of the election whilst subjectivity reached its lowest point.

Tweet Theme Analysis

This theme analysis gives users a view of the top three themes discussed on Twitter over the five key stages of the election. Below the bar chart, the themes are visualized at a state versus topic level in a matrix. To view the top themes for each key event over the election campaign, please use the event selector at the top of the page. To assist with readability, topics and states can be clicked in the matrix to highlight the relevant topics.

This visualization helps users understand the key topics over the election and the impact they had in each state. It also shows which messages meant the most in each area and where the candidates could have focused their attention. An interesting observation following the second debate was that “Women” was the top theme at a time when Hillary Clinton appeared to take over candidate preference from Donald Trump. However, whilst it was the top theme overall, it was the top theme only in the states of Nebraska and Wyoming, although it was in the top three in all other states except for five.
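A ranking of the top three themes per state can be sketched with a simple counter. The example assumes each tweet has already been assigned a state and a theme label; how the team assigned themes is not described here, and the records and theme names other than “Women” are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

def top_themes(records: Iterable[Tuple[str, str]], n: int = 3) -> Dict[str, List[str]]:
    """Return the n most frequent themes for each state."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for state, theme in records:
        counts[state][theme] += 1
    return {
        state: [theme for theme, _ in counter.most_common(n)]
        for state, counter in counts.items()
    }

# Hypothetical (state, theme) pairs, one per tweet.
records = (
    [("Nebraska", "Women")] * 4
    + [("Nebraska", "Economy")] * 3
    + [("Nebraska", "Immigration")] * 2
    + [("Nebraska", "Trade")] * 1
)
print(top_themes(records))
# {'Nebraska': ['Women', 'Economy', 'Immigration']}
```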

Network Analysis

The visualization below shows the networks of Twitter users and their connections throughout the key periods of the US Presidential Election. It shows that the groups change significantly over time, moving from a larger network (with a large tweet volume) associated with Hillary Clinton's account in the first debate to a larger network building around Donald Trump at the later key points. To use this visualization, filter by the minimum tweet count at the top of the visualization. To see the networks in greater detail, double-click any node to stick it to the canvas. The changes over the key time points can be viewed by using the buttons at the top of the page.

The interesting aspect of this visualization is that in the early stages (using a larger tweet volume of > 30) the network of Hillary Clinton is larger than that of Donald Trump. This is not the case when a lower number of tweets is used as the threshold. Over time, however, Clinton's network seemed to shrink as the campaign of Donald Trump gained momentum.
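The effect of the tweet count threshold on apparent network size can be illustrated with a small sketch. The edge list, account names and the `filter_network` helper are hypothetical and chosen only to mirror the observation above; they are not data from the project.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (account_a, account_b, tweet_count)

def filter_network(edges: List[Edge], min_count: int) -> Tuple[List[Edge], Dict[str, int]]:
    """Keep edges with more than min_count tweets and count each node's neighbours."""
    kept = [edge for edge in edges if edge[2] > min_count]
    neighbours = defaultdict(set)
    for a, b, _ in kept:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return kept, {node: len(linked) for node, linked in neighbours.items()}

# Hypothetical edges between candidate accounts and other users.
edges = [
    ("clinton", "user_a", 40),
    ("clinton", "user_b", 35),
    ("trump", "user_c", 50),
    ("trump", "user_d", 10),
    ("trump", "user_e", 5),
]

_, high = filter_network(edges, 30)  # only heavily tweeting connections
_, low = filter_network(edges, 0)    # all connections
print(high["clinton"], high["trump"])  # 2 1
print(low["clinton"], low["trump"])    # 2 3
```

With the higher threshold the first account appears to have the larger network, while the lower threshold reverses the ordering, which is why the chosen filter value matters when reading the chart.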

Emoji Analysis

The innovation of this work is its emoji analysis. Whilst the emoji analysis was built into the sentiment modelling, we also developed visualizations showing emoji use at the different time points. To view the change in the use of emojis throughout the various stages of the election, please use the options at the top of the page.

The first chart is a spider web chart of emoji use based on Team 26's categorization. Emoji use peaked on the day of the election, with 'surprise' increasing markedly in use. The doughnut chart below then looks at emoji use in greater detail.

The final visualization is a dendrogram, which gives another view of emoji use and makes clear to the user how particular emojis are categorized in our analysis.

References:

Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236.

Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N project report, Stanford, 1(12), 2009.

Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010, May). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Fourth International AAAI Conference on Weblogs and Social Media.