Making a TwitterBot to collect data in R

The Twitter API allows you to collect a subset of tweets from the past 7 days, based on a particular search term. The returned tweets are chosen for “relevance” rather than “completeness”, meaning some tweets containing your search phrase may not be collected by your bot. Alternatively, there are paid plans that allow more complete data collection.

Here we will demonstrate how you can collect data for free.

1st. You need a Twitter account.
2nd. You need to apply for a Twitter developer account and have it approved.
3rd. Once you have a developer account, you can start creating apps and get the keys and tokens for each app, which are required for authentication.

Then you need to open RStudio and authenticate your session.
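A minimal sketch of the authentication step, assuming rtweet's create_token() (the interface in rtweet versions before 1.0; newer releases favour rtweet_bot() together with auth_as()). The helper names, the app name and the environment-variable names are placeholders made up for this example, so swap in the details of your own app.

```r
# Read the four credentials from environment variables.
# The variable names are hypothetical; set them beforehand (for example in
# ~/.Renviron) so the keys never appear in your script.
read_twitter_keys <- function() {
  keys <- list(
    consumer_key    = Sys.getenv("TWITTER_API_KEY"),
    consumer_secret = Sys.getenv("TWITTER_API_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
  # Stop early with a readable message if any credential is empty.
  missing <- names(keys)[!vapply(keys, nzchar, logical(1))]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  keys
}

# Build the token for this session.
# Assumes the pre-1.0 rtweet function create_token(); "my_twitter_bot" is a
# placeholder for the name of your own app.
authenticate <- function(app = "my_twitter_bot") {
  do.call(rtweet::create_token, c(list(app = app), read_twitter_keys()))
}
```

Running token <- authenticate() once at the start of a session should be enough; keeping the keys in environment variables rather than in the script makes it safer to share your code.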

Once you are authenticated, you can use R to collect tweets containing a particular search term. Here, we searched for tweets containing the word ‘dog’. If you want to search for an exact phrase, wrap it in double quotation marks inside the query string (i.e. '"Insert Phrase Here"'). q stands for your search query, n stands for the number of tweets you would like, and lang = "en" specifies that you want tweets written in English. For more arguments, see the documentation for the package rtweet: https://cran.r-project.org/web/packages/rtweet/rtweet.pdf
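The search step could look roughly like this, using rtweet's search_tweets() with the q, n and lang arguments. The two small wrappers, exact_phrase() and search_english(), are hypothetical helpers written for this example and are not part of rtweet.

```r
# Wrap a phrase in double quotes so the search treats it as an exact phrase.
exact_phrase <- function(phrase) {
  paste0('"', phrase, '"')
}

# Search recent English-language tweets for a query.
# q is the search query, n the number of tweets requested.
search_english <- function(q, n = 100) {
  rtweet::search_tweets(q = q, n = n, lang = "en")
}

# Usage (needs an authenticated session, so left commented out here):
# dog_tweets    <- search_english("dog", n = 100)
# phrase_tweets <- search_english(exact_phrase("Insert Phrase Here"))
```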

Once you have this initial dataset, you can continually add to it via a loop. Remember, Twitter limits how many times you can request tweets within a specific time frame, so the loop contains a delay to stay within this limit.

What the loop does is find the oldest tweet you have and set max_id to its ID. It then binds the original data set to a second data set, made by running another search query, and saves the combined data frame in your working directory. Then the loop pauses for 600 seconds. The number of repetitions depends on the number you place in the for (i in 1:20) { argument.
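One way to sketch the loop described above. It assumes the column names status_id and created_at used by rtweet versions before 1.0 (newer versions name the ID column differently), and it saves with saveRDS(), which is a choice made for this example rather than something taken from the original script. The function names, the file name and the fetch argument are also hypothetical; fetch is only there so the loop can be tried without touching the API.

```r
# ID of the oldest tweet collected so far.
# Assumes columns `status_id` and `created_at` (rtweet before 1.0).
oldest_id <- function(tweets) {
  tweets$status_id[which.min(as.numeric(tweets$created_at))]
}

# Repeatedly search for older tweets and add them to the data set.
collect_older <- function(tweets, q, rounds = 20, pause = 600,
                          file = "tweets.rds",
                          fetch = rtweet::search_tweets) {
  for (i in seq_len(rounds)) {
    # Ask only for tweets at or before the oldest one already held.
    older <- fetch(q = q, n = 100, lang = "en", max_id = oldest_id(tweets))

    # Bind the old and new rows. max_id is inclusive, so the oldest tweet
    # comes back again and is dropped here.
    tweets <- rbind(tweets, older)
    tweets <- tweets[!duplicated(tweets$status_id), ]

    saveRDS(tweets, file)  # written relative to the working directory
    Sys.sleep(pause)       # wait before the next request (rate limits)
  }
  tweets
}

# Usage (needs an authenticated session and an initial data set):
# dog_tweets <- collect_older(dog_tweets, q = "dog", rounds = 20, pause = 600)
```

Saving on every pass means an interrupted session loses at most one round of tweets.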