To start our example, download the jsoup library, add it to your project's classpath, and import the necessary classes.
import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
After this, we'll declare the LinkExtractor class and its main method.
Note: We named the class LinkExtractor, but you can use any other name in your own code.
public class LinkExtractor
{
    public static void main(String[] args) throws Exception {
        String site = "http://www.mysite.com";
        List<String> links = LinkExtractor.extractLinks(site);
        for (String link : links) {
            System.out.println(link);
        }
    }
In the listing above, we defined a variable called "site" that stores the URL of the website whose links we want to extract.
We then call the extractLinks method (we'll define it later), which extracts the links and returns them as a list. We store that list in the links variable and print each link to the console.
Now we will define the extractLinks method:
    public static List<String> extractLinks(String url) throws Exception {
        final ArrayList<String> result = new ArrayList<String>();
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            result.add(link.attr("abs:href"));
        }
        return result;
    }
}
We defined the extractLinks method; its result variable will store the links.
First we connect to the site and fetch the page. We obtain the parsed document by calling Jsoup.connect(url).get();
To find all the links, we select every anchor element that has an href attribute using doc.select("a[href]"), loop over the matching elements, and add each link to the result variable. The href value is read with link.attr("abs:href"); the abs: prefix makes jsoup resolve relative URLs against the page's address, so the list contains absolute URLs.
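To make the idea of an absolute URL more concrete, here is a minimal sketch that uses only the JDK's java.net.URI class instead of jsoup. It approximates what reading "abs:href" gives us: each href is resolved against the address of the page it came from. The class name AbsoluteUrlSketch, the method toAbsolute, and the sample URLs are assumptions made up for this illustration, and jsoup's own resolution may differ in edge cases.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of jsoup or LinkExtractor.
class AbsoluteUrlSketch {

    // Resolves every href against the URL of the page that contained it.
    // URI.create throws IllegalArgumentException for malformed input.
    static List<String> toAbsolute(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<String>();
        for (String href : hrefs) {
            result.add(base.resolve(href).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hrefs = new ArrayList<String>();
        hrefs.add("/about");                     // relative to the site root
        hrefs.add("contact.html");               // relative to the current folder
        hrefs.add("http://other.example/page");  // already absolute, kept as-is
        for (String link : toAbsolute("http://www.mysite.com/docs/index.html", hrefs)) {
            System.out.println(link);
        }
    }
}
```

With the sample page URL above, "/about" becomes http://www.mysite.com/about and "contact.html" becomes http://www.mysite.com/docs/contact.html, while the link that is already absolute is left unchanged.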
Finally, we return the result so that the main method can display the links.
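Before displaying the list, you may want to tidy it up: a page often repeats the same link several times, and jsoup gives back an empty string for "abs:href" when it cannot build an absolute URL. The sketch below is one possible way to handle this with plain JDK collections. The class LinkCleanerSketch and its clean method are hypothetical names introduced for this illustration and are not part of the example's source code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of LinkExtractor.
class LinkCleanerSketch {

    // Drops null/empty entries and duplicates, keeping first-seen order.
    static List<String> clean(List<String> links) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String link : links) {
            if (link != null && !link.isEmpty()) {
                seen.add(link);
            }
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<String>();
        raw.add("http://www.mysite.com/about");
        raw.add("");                              // unresolvable link
        raw.add("http://www.mysite.com/about");   // duplicate
        raw.add("http://www.mysite.com/contact");
        for (String link : clean(raw)) {
            System.out.println(link);
        }
    }
}
```

In the main method of our example, you could pass the list returned by extractLinks through such a helper before the loop that prints it.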
Below is the full source code of our example, which you can test in your own application.
Full Source Code:
package article5;

import java.util.List;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkExtractor
{
    public static List<String> extractLinks(String url) throws Exception {
        final ArrayList<String> result = new ArrayList<String>();
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            result.add(link.attr("abs:href"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String site = "http://www.mysite.com";
        List<String> links = LinkExtractor.extractLinks(site);
        for (String link : links) {
            System.out.println(link);
        }
    }
}
I hope you liked the article. See you next time.






