Creating a Link Extractor with Java

In this article we will develop a simple link extractor that extracts all the links present on a given web page.

To get started, download the jsoup library, add it to your project's classpath, and import the necessary classes.

Listing 1: Importing the classes

import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Next, we'll declare the LinkExtractor class and its main method.

Note: We named the class LinkExtractor, but in your own code you can use any other name.

Listing 2: Declaring the class and the main method

public class LinkExtractor
{
  public static void main(String[] args) throws Exception {

    String site = "http://www.mysite.com";
    List<String> links = LinkExtractor.extractLinks(site);
    for (String link : links) {
      System.out.println(link);
    }
  }

In the listing above we defined a variable called "site", which holds the URL of the website whose links we want to extract.

We then call the extractLinks method (we'll define it later). It extracts the links, which we store in the links variable and then print to the console.

Now we will define the extractLinks method:

Listing 3: Defining the extractLinks method

  public static List<String> extractLinks(String url) throws Exception {
    final ArrayList<String> result = new ArrayList<String>();
 
    Document doc = Jsoup.connect(url).get();
 
    Elements links = doc.select("a[href]");
 
    for (Element link : links) {
      result.add(link.attr("abs:href"));
    }
 
    return result;
  }
}

We defined the extractLinks method; its result variable (an ArrayList) will store the links.

We connect to the site and obtain the parsed document by calling Jsoup.connect(url).get().

To find all the links, we select every anchor element that has an href attribute using doc.select("a[href]"), loop over the matched elements, and store each link in the result variable. The URL of each link is obtained with link.attr("abs:href"); the abs: prefix tells jsoup to resolve relative URLs against the address of the page, so the list contains absolute URLs.
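To make the idea of an absolute URL more concrete, here is a minimal sketch that resolves a few href values against a page address. It uses java.net.URI from the standard library only to illustrate the concept; it is not how jsoup is implemented. The class name AbsoluteUrlSketch, the helper toAbsolute, and the sample URLs are assumptions made up for this illustration.

```java
import java.net.URI;

// Illustration only: shows what "resolving a relative href" means,
// using java.net.URI rather than jsoup. All names and URLs are hypothetical.
final class AbsoluteUrlSketch {

    // Resolve an href value against the URL of the page it was found on.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.mysite.com/articles/index.html";

        // Relative to the current directory of the page
        System.out.println(toAbsolute(page, "page2.html"));
        // Relative to the root of the site
        System.out.println(toAbsolute(page, "/contact"));
        // Already absolute: returned unchanged
        System.out.println(toAbsolute(page, "http://example.org/"));
    }
}
```

With jsoup you do not need such a helper, since the abs: prefix already returns the resolved form; the sketch is only meant to show why the extracted links come back as full URLs.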

Finally, we return the result, and the main method displays it.

Below is the full source code of the example, which you can test in your own application.

Full Source Code:
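The returned list may contain the same URL more than once, and, as far as the jsoup documentation describes it, abs:href yields an empty string when a URL cannot be resolved. If that matters for your application, one possible clean-up step is sketched below. The class LinkCleanupSketch and the method cleanLinks are hypothetical names, not part of the example above or of jsoup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-processing step for the list returned by extractLinks.
final class LinkCleanupSketch {

    // Drops null, empty and blank entries and removes duplicates,
    // keeping the order in which links first appeared.
    static List<String> cleanLinks(List<String> links) {
        Set<String> unique = new LinkedHashSet<>();
        for (String link : links) {
            if (link != null && !link.trim().isEmpty()) {
                unique.add(link.trim());
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Sample data standing in for the output of extractLinks
        List<String> raw = Arrays.asList(
            "http://www.mysite.com/a", "", "http://www.mysite.com/b", "http://www.mysite.com/a");
        for (String link : cleanLinks(raw)) {
            System.out.println(link);
        }
    }
}
```

A LinkedHashSet is used here so that the links are printed in page order; a plain HashSet would also remove duplicates but would not preserve that order.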

package article5;
import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkExtractor
{

  public static List<String> extractLinks(String url) throws Exception {
    final ArrayList<String> result = new ArrayList<String>();

    Document doc = Jsoup.connect(url).get();

    Elements links = doc.select("a[href]");

    for (Element link : links) {
      result.add(link.attr("abs:href"));
    }

    return result;
  }

  public static void main(String[] args) throws Exception {

    String site = "http://www.mysite.com";
    List<String> links = LinkExtractor.extractLinks(site);
    for (String link : links) {
      System.out.println(link);
    }
  }
}

I hope you liked the article, see you next time.


Anurag Jain
My main area of specialization is Java and J2EE. I have worked on many international projects, such as recorders, websites, and crawlers. I am also an Oracle Certified Java Professional and DB2 certified.
mrbool.com
Copyright 2013 - all rights reserved to www.web-03.net