Every day a large amount of data is produced. Data come in many formats and shapes, from data points to images and written words. Social media is a major source of data, and companies, institutions, and other organizations increasingly rely on social media analytics to better understand the success of marketing strategies, people’s interests, and other trends. A relatively new command for Stata 15, twitter2stata, allows users to import data from Twitter straight into Stata’s Data Editor.
Before using the command, you need to do a few things outside of Stata:
1. You need a Twitter account. If you do not already have one, you can follow this link to create one.
2. Once you have an account, you will need to create a Twitter application. Here is the link to do so.
- Click on the Create New App button, which should take you to the following page
- Under Name, enter a name for your Twitter application (choose any name you see fit); Note: the name has to be unique
- You will also need a brief description in the Description box
- And information for the Website box
- Accept the Developer Agreement by checking its box to go to the next page
- On the next page, click on the Keys and Access Tokens tab
- At the bottom of the page, click on the Create my access token button
Note: I redacted the information from my screen
- From this page, you will need to copy the following information into a Stata do-file (see step 4)
- Consumer Key (API Key)
- Consumer Secret (API Secret)
- Access Token
- Access Token Secret
3. Install the command twitter2stata in Stata by typing
ssc install twitter2stata, replace
4. Open a new Stata do-file and create four local macros named after the items listed above (consumer key, consumer secret, access token, and access token secret). Note: XXXXXX stands for the values that have been redacted, because this information is private and should not be shared with anyone.
local consumer_key "XXXXXX"
local consumer_secret "XXXXXX"
local access_token "XXXXXX"
local access_token_secret "XXXXXX"
5. Now we need to pass these four pieces of information to the command, so we type
twitter2stata setaccess "`consumer_key'" "`consumer_secret'" "`access_token'" "`access_token_secret'"
6. We are now ready to start importing data from Twitter. As an example, we will download the last 50 tweets that include the search term “world cup” (see file)
twitter2stata searchtweets "world cup", numtweets(50)
Stata downloads a dataset with 45 variables. These include the date and time the tweet was created, the tweet text, the geographical location where available (longitude and latitude), the user’s screen name, display name, and description, the time zone, the user’s URL, list count, status count, and follower count, and whether the tweet was retweeted, among several others.
We can adjust the options to refine the search and the data we download. For instance, we can import user data for a specific user (here we choose FIFA):
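Steps 4 to 6 can be gathered into a single do-file. The following is a minimal sketch, assuming the four local macros from step 4 are stored in a separate file so that the keys never appear in a do-file you might share. The file names twitter_keys.do and worldcup_tweets_<date>.dta are hypothetical; the twitter2stata calls are the ones shown above.

```stata
* twitter_keys.do is a hypothetical file that contains only the four
* -local- definitions from step 4.
* -include- runs that file in the current scope, so the local macros
* it defines remain visible here (they would not after -do- or -run-).
include "twitter_keys.do"

twitter2stata setaccess "`consumer_key'" "`consumer_secret'" "`access_token'" "`access_token_secret'"

* Same search as in step 6
twitter2stata searchtweets "world cup", numtweets(50)

* Look at what came back before relying on any variable name
describe
count

* Save a dated copy, since the same search returns different tweets later
local today : display %tdCCYY-NN-DD date("`c(current_date)'", "DMY")
save "worldcup_tweets_`today'.dta", replace
```

Keeping the keys in their own file is only one possible arrangement; the point is that the do-file containing the analysis can then be shared without editing.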
twitter2stata getuser "@FIFAcom", clear
The imported data include 22 variables (see file). For instance, user_account_timestamp is when the user joined Twitter, user_follower_count is the number of followers, user_favorite_count is the number of likes, user_friend_count is the number of accounts the user follows, and a dummy variable equals 1 because the user is verified.
If we want to import the tweets of a specific user, we can type
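As a quick check of the user data, the variables named above can be listed directly. This is a sketch that assumes the count variables are stored as numeric; followers_per_friend is a hypothetical name introduced here for illustration.

```stata
twitter2stata getuser "@FIFAcom", clear

* These variable names are the ones mentioned in the text; run -describe-
* first if your installed version names them differently.
list user_account_timestamp user_follower_count user_friend_count user_favorite_count

* Followers per account followed. Assumes both counts are numeric;
* if user_friend_count is 0, Stata returns a missing value.
generate double followers_per_friend = user_follower_count / user_friend_count
list followers_per_friend
```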
twitter2stata tweets "@FIFAcom", clear
We end up with 45 variables and 3,200 observations (tweets); 3,200 is the largest number of a user’s most recent tweets that Twitter’s API returns. Check the data here.
Several options let the user refine the request, such as limiting the date range or focusing on a specified list. However, keep in mind that the command is subject to limits imposed by Twitter, which you can read about here.
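When downloading tweets for several users, one way to stay on the safe side of those limits is to pause between requests. The sketch below is only an illustration: the second account name and the 5-second pause are assumptions, not values taken from Twitter's documentation, and the loop uses only the twitter2stata tweets call shown above together with standard Stata commands.

```stata
* Example list of accounts; replace with the ones you need
local accounts "@FIFAcom @Stata"

tempfile combined
local first = 1

foreach acct of local accounts {
    * -capture noisily- displays any error but lets the loop continue
    capture noisily twitter2stata tweets "`acct'", clear
    if _rc == 0 {
        * Stack this user's tweets under those already collected
        if `first' == 0 {
            append using `combined'
        }
        save `combined', replace
        local first = 0
    }
    * Pause for 5,000 milliseconds; the appropriate delay is an assumption
    sleep 5000
}

* Load the stacked data only if at least one download succeeded
if `first' == 0 {
    use `combined', clear
}
```

If a variable turns out to be stored as a string for one user and as a number for another, append will stop with an error, and the variable types would need to be harmonized first.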