Huge amounts of data are generated every day on Twitter. Most of it is text which, if well analyzed, can provide important information about clients and other groups of interest and can support decisions.
In this article we are going to see how to fetch tweet data from a public Twitter account using the Python module Tweepy and store it in MongoDB.
Step 2: Accessing the API using tweepy
Open a text editor of your preference and start the MongoDB server by running the mongod command on the command line.
First, we need to import the tweepy module and the MongoClient class from the pymongo module, as shown in Listing 1.
Listing 1. Importing required modules and classes
01 import tweepy
02 from pymongo import MongoClient
Next we set the variables required to access the Twitter API: the Consumer Key, Consumer Secret, Access Token and Access Token Secret. All of their values are available in the Manage Keys and Access Token tab on the App's main page. We set our variables as in Listing 2.
Listing 2. Setting Consumer Key, Consumer Secret, Access Token and Access Token Secret
03 consumer_key = "Your Consumer Key"
04 consumer_secret = "your Consumer Secret"
05 access_token = "Your Access Token"
06 access_token_secret = "Your Access Token Secret"
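As an alternative to typing the secrets directly into the script, the four values could be read from environment variables. The following is only a minimal sketch: the function name load_twitter_credentials and the TWITTER_* variable names are assumptions made for this illustration and are not part of Tweepy or of the original listing.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # "TWITTER_CONSUMER_KEY" becomes the key "consumer_key", and so on.
    return {name[len("TWITTER_"):].lower(): env[name] for name in REQUIRED_VARS}


# Demonstration with a plain dict standing in for the real environment.
fake_env = {
    "TWITTER_CONSUMER_KEY": "ck",
    "TWITTER_CONSUMER_SECRET": "cs",
    "TWITTER_ACCESS_TOKEN": "at",
    "TWITTER_ACCESS_TOKEN_SECRET": "ats",
}
creds = load_twitter_credentials(fake_env)
print(sorted(creds))
```

The resulting dictionary keys match the variable names used in Listing 2, so consumer_key could be set to creds["consumer_key"], and likewise for the other three values.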
Next, we create a function that receives a screen_name parameter, which is a Twitter username.
Inside the function, we instantiate MongoClient and get the database and collection that we are going to work with. If the database or collection does not exist, MongoDB will create it for you. In our case, the database is called db_twitter and the collection is called Tweets, as Listing 3 shows.
Listing 3. Getting the collection with MongoClient instance
07 def get_all_tweets(screen_name):
08     client = MongoClient()
09     db = client.db_twitter
10     tweets = db.Tweets
Line 09: Getting the database db_twitter;
Line 10: Getting the collection Tweets;
MongoDB creates the database and the collection dynamically if they do not exist yet, at the moment the first document is inserted.
Now we set up OAuth authentication so that tweepy can connect to the Twitter API. To do that, we create a tweepy.OAuthHandler instance, passing the Consumer Key and Consumer Secret as parameters, in this order. We then call the OAuthHandler.set_access_token method, passing the Access Token and Access Token Secret as parameters. Finally, we instantiate the tweepy.API class, passing the OAuthHandler instance as a parameter. We are now able to access the Twitter RESTful API methods, as shown in Listing 4.
Listing 4. Instantiate tweepy.API using keys and secrets
11     auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
12     auth.set_access_token(access_token, access_token_secret)
13     api = tweepy.API(auth)
Lines 11 and 12: Passing the keys and secrets that we have set in Listing 2;
We now create a list to store the tweet data. Then we call the user_timeline API method, passing the screen_name and count parameters, and append the results to the list using the list's extend method. Each call can retrieve at most 200 tweets, but we can write a while loop that keeps calling user_timeline with an additional parameter called max_id. At the end of each iteration, max_id is set to the ID of the oldest tweet retrieved so far minus 1, so the next call returns only older tweets. Twitter allows us to retrieve roughly the 3,200 most recent tweets of a user with this API method. These steps are shown in Listing 5.
Listing 5. Retrieving Twitter tweets
14     alltweets = []
15     new_tweets = api.user_timeline(screen_name=screen_name, count=200)
16     alltweets.extend(new_tweets)
17     oldest = alltweets[-1].id - 1
18     while len(new_tweets) > 0:
19         new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
20         alltweets.extend(new_tweets)
21         oldest = alltweets[-1].id - 1
Line 15: Calling api.user_timeline and retrieving the 200 most recent tweets;
Line 17: Setting the value of the variable oldest, so we can keep getting older tweets;
Lines 18 to 21: While loop that retrieves further tweets: it calls api.user_timeline passing oldest as the max_id parameter, appends the results to the alltweets list created in Line 14 and sets the new value of oldest.
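The max_id paging idea can be tried out without a Twitter account. The sketch below is not the article's function: fetch_all and make_fake_timeline are hypothetical helpers, and the fake timeline is an in-memory stand-in that only mimics how user_timeline returns tweets from newest to oldest with IDs no greater than max_id. Unlike Listing 5, this variant also copes with an account that has no tweets, because it checks for an empty page before reading the last element.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect every item by paging backwards with max_id."""
    all_items = []
    max_id = None  # No upper bound on the first request.
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break  # An empty page means there is nothing older left.
        all_items.extend(page)
        # Ask next time only for items strictly older than the last one seen.
        max_id = page[-1]["id"] - 1
    return all_items


def make_fake_timeline(ids):
    """Build a stand-in for the API call: newest first, filtered by max_id."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(count, max_id=None):
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return fetch_page


# 450 fake tweets are retrieved in pages of 200, 200 and 50.
items = fetch_all(make_fake_timeline(range(1, 451)), page_size=200)
print(len(items))  # 450
```

Subtracting 1 from the oldest ID matters because max_id is inclusive: without it, every page would start with the last tweet of the previous page.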
The JSON data of each tweet is stored in the _json property of the corresponding tweet object. We now write a for loop that adds this JSON data to MongoDB as a document, using the insert_one method of the collection variable, as shown in Listing 6.
Listing 6. Insertion of JSON data to the MongoDB collection
22     for tweet in alltweets:
23         tweets.insert_one(tweet._json)
24     print("Method execution is finished.")
Our function is now complete! We can call it, passing a Twitter username as a parameter (with the @ sign before it), and the JSON data of all the retrieved tweets is stored in MongoDB.
Figure 2 shows the result of running our code and the newly populated MongoDB collection Tweets.
Figure 2. Result of the code and collection queries execution
Now that all this data is stored in our MongoDB database, it is worth taking a look at the document structure, which holds each tweet's hashtags, text and other fields. From here you can try out some interesting things. You could fetch the hashtags and check which hashtag a single user uses most often. Or you could run a sentiment analysis on the text of these tweets. It is up to you to decide what to do with this large amount of data!
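As a starting point for the hashtag idea, here is a minimal sketch that counts hashtags in tweet documents. It assumes each document keeps its hashtags under entities.hashtags as a list of objects with a text field, which is how the Twitter JSON lays them out; the function name count_hashtags and the sample documents are made up for this example.

```python
from collections import Counter


def count_hashtags(documents):
    """Count hashtag occurrences, ignoring case, across tweet documents."""
    counts = Counter()
    for doc in documents:
        # Documents without entities or hashtags simply contribute nothing.
        for tag in doc.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


# Hand-written sample documents standing in for the stored tweets.
sample = [
    {"text": "Learning #Python and #MongoDB",
     "entities": {"hashtags": [{"text": "Python"}, {"text": "MongoDB"}]}},
    {"text": "More #python today",
     "entities": {"hashtags": [{"text": "python"}]}},
    {"text": "No tags here", "entities": {"hashtags": []}},
]
top = count_hashtags(sample).most_common(1)
print(top)  # [('python', 2)]
```

With the collection from Listing 3, the same function could presumably be fed the cursor returned by tweets.find() instead of the sample list.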