Free Online Courses for Software Developers - MrBool

Creating a Link Extractor with Java

In this article we will develop a simple link extractor that fetches a web page and extracts all the links present on that page.

To start our example, you'll need to download a library called jsoup, add it to your project's classpath, and import the necessary classes.

Listing 1: Importing the classes

import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Next we declare the class LinkExtractor and its main method.

Note: We named the class LinkExtractor, but you can use any other name in your own code.

Listing 2: Declaring the class and the main method

public class LinkExtractor
{
  public static void main(String[] args) throws Exception {

    String site = "http://www.mysite.com";
    List<String> links = LinkExtractor.extractLinks(site);
    for (String link : links) {
      System.out.println(link);
    }
  }

In the listing above we defined a variable called "site", which stores the URL of the page we want to extract links from.

We then call the extractLinks method (defined in the next listing). It returns the extracted links, which we store in the links variable and print to the console one per line.

Now we will define the extractLinks method:

Listing 3: Defining the extractLinks method

  public static List<String> extractLinks(String url) throws Exception {
    final ArrayList<String> result = new ArrayList<String>();
 
    Document doc = Jsoup.connect(url).get();
 
    Elements links = doc.select("a[href]");
 
    for (Element link : links) {
      result.add(link.attr("abs:href"));
    }
 
    return result;
  }
}

We defined the extractLinks method; its result variable is the list that will hold the links.

We connect to the site and download the page. Jsoup.connect(url).get() fetches the page and parses it into a Document.

To find all the links, we select every anchor element that has an href attribute using doc.select("a[href]"), loop over the matched elements, and store each link in the result variable. The value of the href attribute is read with link.attr("abs:href"); the abs: prefix tells jsoup to resolve relative links against the page's URL, so every stored link is an absolute URL.
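To make the idea of an absolute URL concrete, here is a minimal sketch of resolving an href value against the address of the page it was found on. It uses java.net.URI from the standard library rather than jsoup, so it only approximates what abs:href does; the class name AbsoluteUrlDemo, the resolve helper, and the example URLs are assumptions made for illustration.

```java
import java.net.URI;

// Illustration only: approximates the effect of abs:href with the
// standard library. AbsoluteUrlDemo and resolve are hypothetical names.
final class AbsoluteUrlDemo {

    // Resolves an href value against the URL of the page that contains it.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.mysite.com/docs/index.html";
        // Relative to the page's directory
        System.out.println(resolve(page, "contact.html"));
        // Relative to the site root
        System.out.println(resolve(page, "/about"));
        // Already absolute, returned unchanged
        System.out.println(resolve(page, "http://example.org/"));
    }
}
```

A link written as contact.html on that page resolves to http://www.mysite.com/docs/contact.html, which is the form we want to store so that each link can be used on its own.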

Finally, we return the result list so that the caller can display the links.
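A page often links to the same address more than once, and as far as I know jsoup returns an empty string from abs:href when it cannot build an absolute URL. If that matters for your use, the returned list can be cleaned before printing. The following is a minimal sketch, not part of the original example; the class name LinkCleaner and the method clean are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: removes empty entries and duplicates from a list
// of links while keeping the order in which they first appeared.
final class LinkCleaner {

    static List<String> clean(List<String> links) {
        // LinkedHashSet drops duplicates and preserves insertion order.
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String link : links) {
            if (link != null && !link.isEmpty()) {
                unique.add(link);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

You could call it as LinkCleaner.clean(LinkExtractor.extractLinks(site)) and loop over the cleaned list instead of the raw one.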

The full source code of the example is shown below, and you can test it in your own application.

Full source code:

package article5;

import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkExtractor
{

  // Downloads the page at the given URL and returns the links found on it.
  public static List<String> extractLinks(String url) throws Exception {
    final ArrayList<String> result = new ArrayList<String>();

    // Fetch the page and parse it into a Document
    Document doc = Jsoup.connect(url).get();

    // Select every anchor element that has an href attribute
    Elements links = doc.select("a[href]");

    for (Element link : links) {
      // abs:href resolves relative links against the page's URL
      result.add(link.attr("abs:href"));
    }

    return result;
  }

  public static void main(String[] args) throws Exception {

    // URL of the page to extract links from
    String site = "http://www.mysite.com";
    List<String> links = LinkExtractor.extractLinks(site);
    for (String link : links) {
      System.out.println(link);
    }
  }
}

I hope you liked the article. See you next time.



My main area of specialization is Java and J2EE. I have worked on many international projects, such as recorders, websites, and crawlers. I am also an Oracle Certified Java Professional and DB2 certified.
