【Question title】:How to get the nth previous or next element in Jsoup
【Posted】:2018-02-18 23:19:07
【Question description】:

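Is there a way, using jsoup, to get the nth previous or next HTML element of a specific kind, where that element may sit at a different nesting level?

Sample HTML: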

<div style="position: relative;">
  <div class="wmd-container">
    <div id="wmd-button-bar-42" class="wmd-button-bar"></div>
    <input id="previousInput" name="communitymode" type="checkbox">
  </div>
</div>

<div class="fl" style="margin-top: 8px; height: 24px;">&nbsp;</div>
<div id="draft-saved-42" class="draft-saved community-option fl" style="margin-top: 8px; height: 24px; display: none;">draft saved
</div>

<div id="draft-discarded-42">draft discarded</div>

<div class="community-option g-row ai-center f-checkbox">
  <div class="g-col -input">
    <input id="NextInput" name="communitymode">
  </div>
  <div class="g-col">
    <label for="communitymode-42">community wiki</label>
  </div>
</div>

For example, in the HTML above, I am pointing at this element:

<div id="draft-discarded-42">draft discarded</div>

using the code below:

Element elem = doc.select("div[id=draft-discarded-42]").first();

I want the first previous input element:

<input id="previousInput" name="communitymode" type="checkbox">

as well as the second previous div:

<div class="fl" style="margin-top: 8px; height: 24px;">&nbsp;</div>

and the second next div:

<div class="g-col -input">
  <input id="NextInput" name="communitymode">
</div>

【Question comments】:

    Tags: java web-scraping jsoup html-parsing


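    【Solution 1】:

    Unless you don't know the value of the id attribute, or of any other attribute that could identify the element, you should use the selector syntax to get the elements you want.

    However, if you have only a vague idea of the element's attributes, or none at all, but know where it occurs relative to the element you are pointing at, you can use these functions:

    Nth previous occurrence of an element matching the query: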

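    // Imports needed by both helper methods (place them at the top of the enclosing class file).
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import org.jsoup.select.Evaluator;
    import org.jsoup.select.QueryParser;

    // Returns the count-th element matching `query` that comes before `origin`,
    // searching previous siblings and their descendants, then ancestors; null if there is none.
    public static Element selectNthElementBefore(Element origin, String query, int count) {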
        Element currentElement = origin;
        Evaluator evaluator = QueryParser.parse(query);
        while ((currentElement = currentElement.previousElementSibling()) != null) {
            int val = 0;
            if (currentElement.is(evaluator)) {
                if (--count == 0) {
                    return currentElement;
                }
                val++;
            }
            Elements elems = currentElement.select(query);
            if (elems.size() > val) {
                int childCount = elems.size() - val;
                int diff = count - childCount;
    
                if (diff == 0) {
                    // Exactly as many nested matches as still needed: counting backwards,
                    // the last one reached is the first nested match
                    // (index val skips currentElement itself when it matched).
                    return elems.get(val);
                }
                if (diff > 0) {
                    count -= childCount;
                }
                if (diff < 0) {
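                    // Count back from the end of elems; using elems.size() rather than childCount
                    // keeps the index right when elems also contains currentElement itself.
                    return elems.get(elems.size() - count);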
                }
            }
        }
    
        if (origin.parent() != null && currentElement == null) {
            if (origin.parent().is(evaluator)) {
                if (--count == 0) {
                    return origin.parent();
                }
            }
            return selectNthElementBefore(origin.parent(), query, count);
        }
        return currentElement;
    }
    

    Nth next occurrence of an element matching the query:

    public static Element selectNthElementAfter(Element origin, String query, int count) {
        Element currentElement = origin;
        Evaluator evaluator = QueryParser.parse(query);
        while ((currentElement = currentElement.nextElementSibling()) != null) {
            int val = 0;
            if (currentElement.is(evaluator)) {
                if (--count == 0)
                    return currentElement;
                val++;
            }
            Elements elems = currentElement.select(query);
            if (elems.size() > val) {
                int childCount = elems.size() - val;
                int diff = count - childCount;
    
                if (diff == 0) {
                    return elems.last();
                }
                if (diff > 0) {
                    count -= childCount;
                }
                if (diff < 0) {
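                    // elems is in document order and includes currentElement itself when it matched
                    // (val == 1), so the count-th nested match sits at index val + count - 1.
                    return elems.get(val + count - 1);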
                }
            }
        }
        if (origin.parent() != null && currentElement == null) {
            return selectNthElementAfter(origin.parent(), query, count);
        }
        return currentElement;
    }
    

    Usage:

    Element elem = doc.getElementById("draft-discarded-42");
    
    Element firstPrevInput = selectNthElementBefore(elem, "input", 1);
    Element secPrevDiv = selectNthElementBefore(elem, "div", 2);
    Element secNextDiv = selectNthElementAfter(elem, "div", 2);
    
    System.out.println("#### First previous input ####");
    System.out.println(firstPrevInput.toString());
    System.out.println("##############################\n"); 
    System.out.println("#### Second previous div ####");
    System.out.println(secPrevDiv.toString());
    System.out.println("#############################\n");
    System.out.println("#### Second next div ####");
    System.out.println(secNextDiv.toString());
    System.out.println("#########################");
    

    Output:

    #### First previous input ####
    <input id="previousInput" name="communitymode" type="checkbox">
    ##############################
    
    #### Second previous div ####
    <div class="fl" style="margin-top: 8px; height: 24px;">
     &nbsp;
    </div>
    #############################
    
    #### Second next div ####
    <div class="g-col -input"> 
        <input id="NextInput" name="communitymode"> 
    </div>
    #########################
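
As a rough cross-check on the results above, the same three lookups can be sketched as a plain scan over the document flattened into document order. This sketch deliberately does not use jsoup: `Node`, `sample()`, and `nthFrom` are hypothetical names, and the list is the question's sample HTML written out by hand. With jsoup, a comparable list could presumably be taken from `doc.getAllElements()`. Treating "nth previous" as strict document order is an assumption; it can rank elements differently from the functions above when a matching sibling itself contains further matches.

```java
import java.util.Arrays;
import java.util.List;

public class NthInDocumentOrder {

    /** Hypothetical stand-in for a parsed element: a tag name plus a label for printing. */
    static final class Node {
        final String tag;
        final String label;

        Node(String tag, String label) {
            this.tag = tag;
            this.label = label;
        }
    }

    /**
     * Scans the flattened list starting next to index `origin`, moving by `step`
     * (-1 = backwards, +1 = forwards), and returns the count-th node whose tag
     * equals `tag`, or null if fewer than `count` such nodes exist in that direction.
     */
    static Node nthFrom(List<Node> all, int origin, String tag, int count, int step) {
        if (count < 1) {
            return null;
        }
        for (int i = origin + step; i >= 0 && i < all.size(); i += step) {
            if (all.get(i).tag.equals(tag) && --count == 0) {
                return all.get(i);
            }
        }
        return null;
    }

    /** The question's sample HTML, flattened by hand into document (pre-order) order. */
    static List<Node> sample() {
        return Arrays.asList(
            new Node("div", "div[style=position: relative;]"),
            new Node("div", "div.wmd-container"),
            new Node("div", "div#wmd-button-bar-42"),
            new Node("input", "input#previousInput"),
            new Node("div", "div.fl"),
            new Node("div", "div#draft-saved-42"),
            new Node("div", "div#draft-discarded-42"), // index 6: the origin element
            new Node("div", "div.community-option"),
            new Node("div", "div.g-col.-input"),
            new Node("input", "input#NextInput"),
            new Node("div", "div.g-col"),
            new Node("label", "label"));
    }

    public static void main(String[] args) {
        List<Node> all = sample();
        int origin = 6; // position of div#draft-discarded-42 in the list

        System.out.println(nthFrom(all, origin, "input", 1, -1).label); // first previous input
        System.out.println(nthFrom(all, origin, "div", 2, -1).label);   // second previous div
        System.out.println(nthFrom(all, origin, "div", 2, +1).label);   // second next div
    }
}
```

For the sample page this prints `input#previousInput`, `div.fl`, and `div.g-col.-input`, the same three elements as in the output above.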
    

    【Comments】:
