Let’s say you want to gather information about a particular website as part of a security analyst job. You could open a terminal and collect everything by hand, but finding the IP address of the website, running an Nmap scan, and fetching the robots.txt and the whois record one at a time takes a lot of time. So why not build a tool in Python that does it all in a single click?
The tool works like this: you type the website URL, like “facebook.com”, and hit GO, and it grabs all the information for you. So it’s pretty cool. We will be scanning the website and storing the results. Many people probably don’t have MySQL installed on their system, so I will show you how to save the data in a simple text file that anyone can read.
Listing 1. Writing general.py file
# Importing os to make operating system calls using Python
import os


# Function to create a new directory
def create_dir(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)


# Function to write data to a file
def write_file(path, data):
    f = open(path, "w")
    f.write(data)
    f.close()
We will create a new file called “general.py”. In it I am going to make two really simple functions: one to create a directory and another to write to a file. Then, whenever we start building our little tools, we can save the results easily using this file.
Let’s start by importing os!
=> import os
Now I want a function that makes a new directory. Say we have a bunch of targets and are scanning a bunch of websites: I want to store the results for each one in its own directory, so one directory for youtube, one for ebay, and so on.
=> def create_dir( directory ):
Now we have to check whether the directory already exists. Say we are scanning a list of 100 websites. Maybe we already scanned some of them, and we do not want to create their directories again.
=> if not os.path.exists ( directory ):
Basically we are only going to create this folder if it is not created yet. Simple Enough! So,
=> os.makedirs ( directory )
Let’s say that we pass “ebay”. The function asks: did we create this folder yet? If not, it creates it. If the folder already exists, it does nothing. That was the easiest function in the world.
So this next one is just to write a simple file. I am just going to call it write_file.
=> def write_file ( path, data ):
Path is where you want to write it and data is what you want to write. So the first thing I am going to do is open the file at path in write mode, write the data, and close the file.
=> f = open ( path, 'w' )
=> f.write ( data )
=> f.close ()
We pass in the path, meaning the folder and location where we want to write, and the data we want in the file, and that is all we need for “general.py”. In the next tutorial we are going to start with the fun stuff and make the actual tools.
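As a variation on the two helpers above, here is a minimal sketch that assumes Python 3.2 or later. It uses the exist_ok flag of os.makedirs instead of a separate existence check, and a with-block so the file is closed even if the write fails. The temporary folder, the “ebay” directory and the file name are made up for the demonstration.

```python
import os
import tempfile


def create_dir(directory):
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(directory, exist_ok=True)


def write_file(path, data):
    # The with-block closes the file even if write() raises an error
    with open(path, "w") as f:
        f.write(data)


# Hypothetical usage: one folder per target, one text file per result
base = tempfile.mkdtemp()
target_dir = os.path.join(base, "ebay")
create_dir(target_dir)
create_dir(target_dir)  # calling it a second time is harmless
report_path = os.path.join(target_dir, "ip_address.txt")
write_file(report_path, "example data")
```

Either version works for our purposes; this one is just a little shorter and a little harder to misuse.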
Top Level Domain Name
Listing 2. Writing domain_name.py file
# Importing get_tld
from tld import get_tld


# Function to get the top level domain
def get_domain_name(url):
    domain_name = get_tld(url)
    print("Domain name done!")
    return domain_name
I am going to show you how to get the top level domain for a website.
Now if you don’t know what the top level domain is, it’s basically a small part of the URL. Let’s understand this by an example.
“https://www.facebook.com/” => This is the full URL. When we talk about the top level domain name in this tutorial, we mean “facebook.com”: not the protocol, not the www, not the directory at the end, only “facebook.com” in this case. (Strictly speaking, “com” by itself is the top level domain and “facebook.com” is the registered domain, but we will stick with the looser term here.)
The user is going to pass in a URL, and we are going to rip off the extra parts that are not needed. For this we are going to use a Python module. First, let’s open a terminal and try the “whois” command.
=> whois https://www.facebook.com
It will not show the result. You will see the error “No whois server is known for this kind of object”, because whois only works with a top level domain name, not a full URL. Now let’s try with a top level domain.
=> whois facebook.com
Now it shows all the results. Let’s get to work. We will create a new file called “domain_name.py”. Go ahead and import “get_tld” from the “tld” module:
=> from tld import get_tld
If you don’t have this module installed, you can use “pip” or do a manual installation. Let’s see how to install “pip” and then “tld”.
=> sudo apt-get install python-pip
This installs pip on Debian or Ubuntu (on newer releases the package is named python3-pip). Now we will install tld using pip.
=> pip install tld
This has installed the Python module “tld”.
Let’s get back to the domain_name.py file. Now I am going to make a function called “get_domain_name” and pass in the URL.
=> def get_domain_name ( url ):
So essentially what the user passes in is the full URL, and we rip the extra parts from it to get the top level domain.
=> domain_name = get_tld ( url )
We pass it a single argument, the full URL of the website, and then we just return the top level domain, i.e. the domain name.
=> return domain_name
What this function does is take a URL and give you the plain top level domain name. To verify it, we can run:
=> print ( get_domain_name ( 'https://www.facebook.com' ) )
Alright, let’s run this real quick and check it out. We passed in the full URL and it returned the top level domain name.
Now we can allow the user to pass in any URL and we can extract the top level domain.
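One thing to watch for: what get_tld returns depends on the installed version of the tld package. As far as I know, from release 0.9 onward get_tld returns only the suffix (such as “com”), and a separate function, get_fld, returns “facebook.com”. If you would rather not depend on the module at all, here is a rough standard-library sketch. The name get_host_name is hypothetical, and the function only strips the scheme, the path and a leading “www.”, so other subdomains such as “mail.example.com” come back unchanged.

```python
from urllib.parse import urlparse


def get_host_name(url):
    # urlparse only fills in the host when a scheme is present, so add one if missing
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    # Drop a leading "www." label; other subdomains are left untouched
    if host.startswith("www."):
        host = host[4:]
    return host


print(get_host_name("https://www.facebook.com/"))  # prints facebook.com
```

This is good enough for feeding “whois” and “host” in simple cases, but it does not know about public suffixes the way the tld module does.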
Listing 3. Writing ip_address.py
# Importing os
import os


# Method to get the IP Address
def get_ip_address(url):
    command = "host " + url
    process = os.popen(command)
    results = str(process.read())
    marker = results.find('has address') + 12
    # Returning only the first IP Address
    print("IP Address done!")
    return results[marker:].splitlines()[0]
Now that we have the top level domain of the target, we need to get the IP address of that website. I am pretty sure there is an easier way to do this, but this is how I do it.
So in the terminal if you type
=> host facebook.com
or any other top level domain and hit enter, it returns the IP address along with some other text. We can’t just take this output and store it in a text file, because we only care about the IP address, not the whole result. So I am going to run this command through Python and then extract the IP address from the output.
Let’s make a new file “ip_address.py”.
We are going to import os, which allows us to make operating system calls and use the command line or terminal through Python.
=> import os
=> def get_ip_address ( url ):
We are passing an argument which is the top level domain name. Now the command that we are going to run is:
=> command = "host " + url
Now, in order to actually run that command and get the results back, we are going to open up a new process.
=> process = os.popen ( command )
This runs the command in a new process; just think of it like opening a new terminal. The variable “process” holds a file-like object from which we can read the command’s output.
Next we need to strip the extra text from that output, as we only need the IP address.
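To see what os.popen hands back, here is a tiny sketch that uses “echo hello” as a stand-in for the “host” command, so it runs without any network access. The command is only an illustration.

```python
import os

# "echo hello" is just a stand-in command for illustration
process = os.popen("echo hello")
output = process.read()  # read() returns everything the command printed, as a str
process.close()

print(type(output).__name__)  # prints str
print(output.strip())         # prints hello
```

So the object returned by os.popen behaves like an open file, and read() gives us the command’s output as text.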
We are going to write:
=> results = str ( process.read () )
All we are doing here is reading the output and making sure it is a string. Now I will make a marker like this:
=> marker = results.find ( 'has address' ) + 12
Let’s understand what this does. find() looks in the string “results” and returns the index of the first character of 'has address'. The phrase 'has address' is 11 characters long and is followed by a space, so we move 12 characters ahead to reach the start of the IP address.
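The marker arithmetic is easier to follow on a concrete string. The sketch below uses made-up sample text in the shape that “host” prints; the domain and addresses are placeholders, not real output.

```python
# Sample text in the shape that `host` prints; the addresses are made up
results = "example.com has address 192.0.2.10\nexample.com has address 192.0.2.11\n"

start = results.find("has address")  # index of the "h" in "has address"
marker = start + 12                   # 11 characters in "has address" plus 1 space

print(start)                          # prints 12
print(results[marker:].splitlines())
# prints ['192.0.2.10', 'example.com has address 192.0.2.11']
```

Notice that everything after the marker is still there, including the second line, so the slice has to be split into lines and the first line picked out to get a single IP address.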
=> return results[marker:].splitlines()[0]
The reason I am doing this is that a domain name can have multiple IP addresses, like google.com:
=> host google.com ( inside terminal )
We do not want all the IP addresses, only the first one. splitlines() breaks the remaining text into a list of lines, and [0] picks the first line, which is the first IP address listed.
So now, let’s verify whether this works.
=> print ( get_ip_address ( 'google.com' ) )
=> print ( get_ip_address ( 'facebook.com' ) )
Let’s run this in the terminal.
We have got the IP addresses of 'google.com' and 'facebook.com'.
So it does not matter whether the result has one IP address or more: we wrote a function in Python that extracts only the first IP address of the website. We can now use this in our other scanning tools.
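Two things are worth tightening up before reusing this in other tools. First, building the command by string concatenation and running it through a shell is risky if the domain comes from user input. Second, when “host” finds no address, find() returns -1 and the slice produces misleading text. Here is a hedged sketch of one way to handle both, assuming Python 3.7 or later and that the “host” command is installed. The helper parse_first_ip and the sample text are my own additions, and the pattern only covers IPv4 addresses.

```python
import re
import subprocess

# Matches lines such as "example.com has address 192.0.2.10" (IPv4 only)
ADDRESS_PATTERN = re.compile(r"has address (\d{1,3}(?:\.\d{1,3}){3})")


def parse_first_ip(host_output):
    # Return the first IPv4 address in the text, or None when there is none
    match = ADDRESS_PATTERN.search(host_output)
    return match.group(1) if match else None


def get_ip_address(domain):
    # Passing a list of arguments avoids handing the domain string to a shell
    completed = subprocess.run(["host", domain], capture_output=True, text=True)
    return parse_first_ip(completed.stdout)


# The parser can be checked without touching the network (made-up sample text)
sample = "example.com has address 192.0.2.10\nexample.com has address 192.0.2.11\n"
print(parse_first_ip(sample))                                         # prints 192.0.2.10
print(parse_first_ip("Host example.invalid not found: 3(NXDOMAIN)"))  # prints None
```

Keeping the parsing separate from the command call also makes it easy to test the tricky part with canned text.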