Tweepy: Retrieving and storing Twitter data using Python and MongoDB

In this article we are going to see how to retrieve tweet data from Twitter using the Python tweepy module and store it in a MongoDB database.

Motivation

Huge amounts of data are generated every day on Twitter. Most of this data is text which, if analyzed well, can provide important information about clients and other groups of interest and can support decisions.

In this article we are going to see how to fetch tweet data from a public Twitter account using the Python module called Tweepy.

Step 1: Setting up the environment

To retrieve data using Tweepy you need to do the following:

  • Install the tweepy module by running pip install tweepy on the command line.
  • Install the pymongo module by running pip install pymongo on the command line.
  • Create an app at https://apps.twitter.com/. You must be logged in as a Twitter user to do this.
  • Create an access token. Open the Keys and Access Token tab of your Twitter App's main page and click the Create my access token button. Figure 1 shows this page. You need to scroll down to find the Create my access token button.


Figure 1. Keys and Access Token tab of your Twitter App’s main page

In this example we use Python 3.5.1 and MongoDB 3.0.7. You can download them from https://www.python.org/downloads/ and https://www.mongodb.com/download-center#community respectively.

Now we have all the tools and information required to retrieve tweet data from Twitter. It is finally time to code!

Step 2: Accessing the API using tweepy

Open a text editor of your choice and start the MongoDB server by running the mongod command on the command line.

First, we need to import the tweepy module and the MongoClient class from the pymongo module, as we can see in Listing 1.

01 import tweepy
02 from pymongo import MongoClient
Listing 1. Importing required modules and classes

Next we set the variables required to access the Twitter API: the Consumer Key, Consumer Secret, Access Token and Access Token Secret. All of their values are available in the Keys and Access Token tab of the App's main page. We set our variables as in Listing 2.

03 consumer_key = "Your Consumer Key"
04 consumer_secret = "your Consumer Secret"
05 access_token = "Your Access Token"
06 access_token_secret = "Your Access Token Secret"
Listing 2. Setting Consumer Key, Consumer Secret, Access Token and Access Token Secret
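Hardcoding the keys as in Listing 2 is fine for a quick test, but it becomes risky if the script is shared. As a minimal sketch, and only one possible approach, the four values could instead be read from environment variables. The variable names and the load_credentials helper below are assumptions made for illustration; they are not part of tweepy.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error lists them all.
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

Calling load_credentials() with no argument reads from the real environment, and the returned values can then be assigned to consumer_key, consumer_secret, access_token and access_token_secret in place of the string literals.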

Next, it is time to create a function. It will receive a screen_name parameter, which is a Twitter username.

Inside the function, we instantiate MongoClient and get the database and collection that we are going to work with. If the database or collection does not exist, MongoDB will create it for you. In our case, the database is called db_twitter and the collection is called Tweets, as Listing 3 shows.

07 def get_all_tweets(screen_name):
08     client = MongoClient()
09     db = client.db_twitter
10     tweets = db.Tweets
Listing 3. Getting the collection with MongoClient instance

Line 09: Getting the database db_twitter;

Line 10: Getting the collection Tweets;

The database and the collection are therefore created dynamically if they do not exist yet. MongoDB actually creates them when the first document is inserted.

Now we set up OAuth authentication to allow tweepy to connect to the Twitter API. To achieve that, we create a tweepy.OAuthHandler instance and pass the Consumer Key and Consumer Secret as parameters, in this order. We also need to call the OAuthHandler.set_access_token method, passing the Access Token and Access Token Secret as parameters. Finally, we instantiate the tweepy.API class, passing the OAuthHandler instance we have created as a parameter. Now we are able to access the Twitter RESTful API methods, as we can see in Listing 4. Note that this code, like the listings that follow, is still part of the body of get_all_tweets.

11     auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
12     auth.set_access_token(access_token, access_token_secret)
13     api = tweepy.API(auth)
Listing 4. Instantiate tweepy.API using keys and secrets

Lines 11 and 12: Passing the keys and secrets that we have set in Listing 2;

We now create a list to store the tweet data. Then we call the user_timeline API method, passing the screen_name and count parameters, and append the results to the list using the list's extend method. Each call can retrieve at most 200 tweets, but we can write a while loop that keeps calling user_timeline with an additional parameter called max_id, set at the end of each iteration to the id of the oldest tweet retrieved so far minus 1. Twitter only lets us go back through roughly the 3,200 most recent tweets of an account with this method. These steps are shown in Listing 5.

14     alltweets = []
15     new_tweets = api.user_timeline(screen_name=screen_name, count=200)
16     alltweets.extend(new_tweets)
17     oldest = alltweets[-1].id - 1
18     while len(new_tweets) > 0:
19         new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
20         alltweets.extend(new_tweets)
21         oldest = alltweets[-1].id - 1
Listing 5. Retrieving Twitter tweets

Line 15: Calling api.user_timeline to retrieve the 200 most recent tweets;

Line 17: Setting the value of the variable oldest, so that the next call returns only older tweets;

Lines 18 to 21: While loop that retrieves older tweets. It calls api.user_timeline passing oldest as the max_id parameter, appends the results to the alltweets list created in Line 14 and sets the new value of oldest. The loop ends when a call returns no tweets.
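To see how the max_id paging terminates without calling the real API, the loop can be tried against a stand-in. The sketch below is an illustration only: fake_user_timeline and FAKE_TIMELINE are hypothetical names that imitate the behaviour described above (newest first, at most count results, nothing newer than max_id), and plain integers stand in for tweet objects. This variant also copes with an empty timeline, where indexing the last element of an empty list would raise an IndexError.

```python
# 1000 made-up tweet ids, newest (largest) first.
FAKE_TIMELINE = list(range(1000, 0, -1))


def fake_user_timeline(count=200, max_id=None):
    """Hypothetical stand-in for api.user_timeline; not part of tweepy."""
    eligible = [i for i in FAKE_TIMELINE if max_id is None or i <= max_id]
    return eligible[:count]


def fetch_all(page_size=200):
    all_ids = []
    new_ids = fake_user_timeline(count=page_size)
    while new_ids:
        all_ids.extend(new_ids)
        # Ask only for tweets strictly older than the oldest one seen so far.
        oldest = all_ids[-1] - 1
        new_ids = fake_user_timeline(count=page_size, max_id=oldest)
    return all_ids


all_ids = fetch_all()
print(len(all_ids))
```

Subtracting 1 from the oldest id matters because max_id is inclusive: without it, every page would start with the last tweet of the previous page and the list would contain duplicates.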

The JSON data of each tweet is stored in the _json property of the corresponding object. Now we write a for loop that adds the JSON data of each tweet as a document in MongoDB using the insert_one method of the collection variable, as we can see in Listing 6.

22     for tweet in alltweets:
23         tweets.insert_one(tweet._json)
24     print("Method execution is finished.")
Listing 6. Insertion of JSON data to the MongoDB collection
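One thing the insertion loop does not address is running the function twice: MongoDB generates a fresh _id for every inserted document, so a second run would store the same tweets again. A possible approach, which is an assumption on our part and not something the code in this article does, is to reuse the tweet's own id field as the document's _id. The to_document helper and the sample tweet below are hypothetical.

```python
import copy


def to_document(tweet_json):
    """Return a copy of the tweet JSON with _id set to the tweet's id.

    Hypothetical helper: the article's code inserts tweet._json unchanged.
    """
    doc = copy.deepcopy(tweet_json)  # leave the original dict untouched
    doc["_id"] = doc["id"]
    return doc


# Made-up, heavily trimmed tweet JSON used only for illustration.
sample = {"id": 123, "id_str": "123", "text": "hello"}
doc = to_document(sample)
```

With documents built this way, insert_one raises a DuplicateKeyError for a tweet that is already stored. If overwriting is preferred, pymongo's replace_one with upsert=True and a filter on _id is one way to do it.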

Now our function's code is done! We can call it, passing a Twitter username as a parameter (with the @ sign in front), and the JSON data of all the retrieved tweets will be stored in MongoDB, as in the line of code right below.

get_all_tweets("@BarackObama")

In Figure 2 we can see the result of running our code and the newly populated MongoDB collection Tweets.


Figure 2. Result of the code and collection queries execution

Now that we have all this data stored in our MongoDB database, it is worth taking a look at the document structure, since it holds each tweet's hashtags, text and other fields. Now it is possible to try out some interesting things. You could fetch the hashtags and check which hashtag is used most by a single user. Or you could even run a sentiment analysis on the text of these tweets. It is up to you to decide what you want to do with this amount of data!
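As a starting point for the hashtag idea, here is a minimal sketch that counts hashtags with collections.Counter. It runs on a small made-up list of dictionaries shaped like the stored tweet JSON, where hashtags appear under entities.hashtags with a text field. The sample data and the count_hashtags helper are illustrative assumptions; with a real collection, the documents would come from a query such as tweets.find().

```python
from collections import Counter

# Made-up documents, trimmed to the fields this example needs.
sample_docs = [
    {"text": "Learning #Python with #MongoDB",
     "entities": {"hashtags": [{"text": "Python"}, {"text": "MongoDB"}]}},
    {"text": "More #python today",
     "entities": {"hashtags": [{"text": "python"}]}},
    {"text": "No tags here", "entities": {"hashtags": []}},
]


def count_hashtags(docs):
    """Count hashtags case-insensitively across tweet documents."""
    counts = Counter()
    for doc in docs:
        for tag in doc.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


counts = count_hashtags(sample_docs)
print(counts.most_common(1))  # [('python', 2)]
```

Lowercasing the tag text is a design choice here, so that #Python and #python are counted together.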



.NET Developer. Python is my hobby. Particularly interested in NoSQL Databases such as MongoDB and RavenDB.
