【Question title】: java - How to print link text available on the URL?
【Posted】: 2014-01-13 08:48:08
【Question】:

Currently I can print all the URLs on a page, but I cannot print the link text that goes with each URL.

For example:

<a class="fbl" href="/preferences?hl=en" jsaction="foot.cst" id="fsettl">Settings</a> 

The code below prints only "/preferences?hl=en", but not the link text, i.e. "Settings".

public static List<String> getLinks(String uriStr) {
    List<String> result = new ArrayList<String>();
    try {
        System.out.println("in the getlinks try");
        // Create a reader on the HTML content
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());

        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(rd, doc, 0);

        // Find all the A elements in the HTML document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
            String link = (String) s.getAttribute(HTML.Attribute.HREF);
            if (link != null) {
                // Add the link to the result list (only the href, not the link text)
                System.out.println(link);
                result.add(link);
            }
            it.next();
        }
    } catch (Exception e) {
        // Covers URISyntaxException, IOException and BadLocationException
        e.printStackTrace();
    }
    return result;
}

How can I print the text of each link as well?

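【Comments】:

  • You don't want to parse the Google search page, do you?
  • People who advise against using other frameworks should also advise against using stackoverflow. Most of the tools you mentioned are open source. Why not check the code?

Tags: java url hyperlink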


【Solution 1】:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PrintURL {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page with jsoup; a separate URL stream is not needed
        Document doc = Jsoup.connect("https://www.google.co.in/").get();

        // Select every <a> element in the document
        Elements anchors = doc.getElementsByTag("a");
        for (Element anchor : anchors) {
            System.out.println("\nLink: " + anchor.attr("href"));  // value of the href attribute
            System.out.println("Link Name: " + anchor.text());     // visible link text
        }
    }
}
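
For comparison, the link text can probably also be collected without jsoup, by using the Swing HTML parser that the question already relies on. The following is only a minimal sketch under stated assumptions: the class name `LinkTextExtractor` and the method `extractLinks` are hypothetical names chosen for illustration, and the example parses an inline HTML string rather than a live URL so that it runs without network access. It records the `href` when an `<a>` tag opens, accumulates the text seen inside it, and stores the pair when the tag closes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original question or answer.
public class LinkTextExtractor {

    /** Returns (href, link text) pairs in document order. */
    public static List<Map.Entry<String, String>> extractLinks(Reader html) throws IOException {
        final List<Map.Entry<String, String>> links = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref;                        // non-null while inside <a href=...>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    // Anchors without an href (e.g. <a name=...>) leave currentHref null
                    currentHref = (String) attrs.getAttribute(HTML.Attribute.HREF);
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data);                         // text found inside the current link
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    links.add(new AbstractMap.SimpleImmutableEntry<>(
                            currentHref, text.toString().trim()));
                    currentHref = null;
                }
            }
        };

        // true = ignore charset directives found in the document
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>"
                + "<a class=\"fbl\" href=\"/preferences?hl=en\" id=\"fsettl\">Settings</a>"
                + "</body></html>";
        for (Map.Entry<String, String> link : extractLinks(new StringReader(html))) {
            System.out.println("Link: " + link.getKey());
            System.out.println("Link Name: " + link.getValue());
        }
    }
}
```

To use it against a real page, the `StringReader` could be replaced by the `InputStreamReader` that the question's code already opens on the `URLConnection`. The Swing parser is considerably less tolerant of modern HTML than jsoup, so jsoup remains the more practical choice for arbitrary web pages.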
