【Question title】: How do I extract cells from HTML tables and organize them based on other cells in the table row in Java?
【Posted】: 2014-05-20 11:36:09
【Question】:

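I have extracted the following HTML from a website and stored all of it in a String variable in Java. I want to look at each table row. If that row has a data cell containing the words "Current Assignments Report", I want to look at the other data cells in that row. From those cells, I want to add the course name to an ArrayList and add the number in the href after javascript:rlViewItm to another ArrayList. Here is an example of that line: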

<a href="javascript:rlViewItm('2049144736880355316');">View</a>

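Here is an example to clarify what I'm trying to get. The code would start by looking at the HTML below, which is a String. It would look at each table, and then at each individual table row. If a table row has a data cell that says "Current Assignments Report", it would look at the other data cells in that row and find the line shown below, in which only the number changes. I want those numbers stored in a separate ArrayList.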

<a href="javascript:rlViewItm('2049145027227690148');">View</a>

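I have sorted Strings in Java before, but I don't understand how to store each piece separately in an ArrayList based on specific conditions in an HTML table.

I would really appreciate help from anyone who can do this in Java!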

  <div class="ed-formArea">
  <div class="ed-formHeader noText">
  </div>
  <div class="ed-formContent">
<!--SECTION CODE null Section #1  ENDS - DO NOT MODIFY -->
<!--SECTION CODE null CUSTOM CODE BEGIN -->


<form method="post" name="resourceLabelForm" action="/post/UserDocList.page">
<table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td class="ed-tdEnd">
            Private Reports


                <small><small>&nbsp;(1-40 of 40&nbsp;items)</small></small>

        </td></tr>
</tbody>
</table>

 </form>

<form method="post" name="userDocListTableForm" action="/post/UserDocList.page">
  <input type="hidden" name="selectAllEvent" value="" />
  <input type="hidden" name="deselectAllEvent" value="" />
  <table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>


</tbody>
</table>



<table summary="" border="0" class="ed-formTable" cellspacing="0" cellpadding="5">
<tbody>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td>&nbsp;</td><td valign="bottom" width="12%">
          <div class="smaller"><strong>
            Report Date
          </strong></div>
        </td><td valign="bottom" width="8%">
          <div class="smaller"><strong>Report</strong></div>
        </td><td valign="bottom" width="25%">
          <div class="smaller"><strong>View Home Page</strong></div>
        </td><td valign="bottom" width="25%">
          <div class="smaller"><strong>Report Name</strong></div>
        </td><td valign="bottom" width="2%" class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027192329860');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5151_8701"> 
      PRINS OF ENGIN B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027227690148');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3540_0002"> 
      ADV SCI 4 BIO B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027213095124');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3042_0010"> 
      MAG FUNCTIONS B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/11/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027201539636');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2954_8702"> 
      Algorithms &amp; Data Structures X/Y TBD
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/10/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027226480084');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1324_0005"> 
      HON ENGLISH 10B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027229871460');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> 
      ADV SCI 3 E/SS B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027216196756');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1743_0006"> 
      HON SPANISH 3B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/09/14
        </td><td>
          <a href="javascript:rlViewItm('2049144831908197844');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School"> 
      Local High School
      </a> 
    </td><td>

            Student Grades and Graduation Credit Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/07/14
        </td><td>
          <a href="javascript:rlViewItm('2049145027196480420');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2105_8701"> 
      AP GOVPL US NSL B
      </a> 
    </td><td>

            Current Assignments Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/02/14
        </td><td>
          <a href="javascript:rlViewItm('2049144736912474660');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Current Absences Report
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936031942836');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5151_8701"> 
      PRINS OF ENGIN B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936031809620');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3540_0002"> 
      ADV SCI 4 BIO B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936025439028');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> 
      ADV SCI 3 E/SS B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936016776612');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3042_0010"> 
      MAG FUNCTIONS B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936060013524');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2954_8702"> 
      Algorithms &amp; Data Structures X/Y TBD
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936025100916');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2105_8701"> 
      AP GOVPL US NSL B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936022815204');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1743_0006"> 
      HON SPANISH 3B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049144936043227972');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1324_0005"> 
      HON ENGLISH 10B
      </a> 
    </td><td>

            Marking Period 3 as of Mar 31 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          04/01/14
        </td><td>
          <a href="javascript:rlViewItm('2049145025811761220');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 3 Absences as of Mar 31, 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          03/08/14
        </td><td>
          <a href="javascript:rlViewItm('2049144992192941348');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP3 as of Feb 28
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144934670566308');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 2 Absences as of Jan 24, 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824058685812');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5150_8701"> 
      PRINS OF ENGIN A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824085227764');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3539_0002"> 
      ADV SCI 4 BIO A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824074464628');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3537_0001"> 
      ADV SCI 3 E/SS A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824082665540');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3047_0010"> 
      MAGNET PRECALC C
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824049900244');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2953_8702"> 
      Old Algorithms &amp; Data Structures Y
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824039718948');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2104_8701"> 
      Period 9 AP NSL
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824065741444');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1733_0006"> 
      HON SPANISH 3A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          01/25/14
        </td><td>
          <a href="javascript:rlViewItm('2049144824083064244');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1323_0005"> 
      HON ENGLISH 10A
      </a> 
    </td><td>

            Marking Period 2 as of Jan 24 2014
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          12/13/13
        </td><td>
          <a href="javascript:rlViewItm('2049144874776524020');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP2 as of Dec 06
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144822701443172');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Marking Period 1 Absences as of Nov 04, 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736860489172');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/5150_8701"> 
      PRINS OF ENGIN A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736881890916');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3539_0002"> 
      ADV SCI 4 BIO A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736862291156');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3537_0001"> 
      ADV SCI 3 E/SS A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736866166628');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3047_0010"> 
      MAGNET PRECALC C
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736903239140');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2953_8702"> 
      Old Algorithms &amp; Data Structures Y
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736880355316');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/2104_8701"> 
      Period 9 AP NSL
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736894413524');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1733_0006"> 
      HON SPANISH 3A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr class="ed-alternateRow">
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          11/05/13
        </td><td>
          <a href="javascript:rlViewItm('2049144736870593220');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/1323_0005"> 
      HON ENGLISH 10A
      </a> 
    </td><td>

            Marking Period 1 as of Nov 04 2013
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    <tr>
<td><div class="ed-tdSpacer"></div></td>
<td valign="center">&nbsp;</td><td>
          10/04/13
        </td><td>
          <a href="javascript:rlViewItm('2049144777895089844');">View</a>
        </td><td>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/9151_0027"> 
      HOMEROOM
      </a> 
    </td><td>

            Interim Report MP1 as of Sep 27
        </td><td class="ed-tdEnd">&nbsp;</td></tr>

    </tbody>
</table>

【Question discussion】:

    Tags: java html regex arraylist


    【Answer 1】:

    Disclaimer: Do not use regular expressions to parse HTML.


    If the HTML is formatted as strictly as in the code you posted, you can follow these steps:

    Using the Pattern.DOTALL flag, search the entire string for

    <tr>(.*?)<td> Current Assignments Report </td>.*?</tr>
    

    Iterate over the matches with Matcher.find(). Each assignment's data ends up in capture group one. An example match:

     <td>
      <div class="ed-tdSpacer"></div></td>
     <td valign="center">&nbsp;</td>
     <td> 04/02/14 </td>
     <td> <a href="javascript:rlViewItm('2049145027229871460');">View</a> </td>
     <td> <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> Item 6 </a> </td>
    

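    Within this text, search for each instance of <td> (.*?) </td>. The content of each data item ends up in that match's capture group one. Searching the text above gives the following matches: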

    04/02/14
    <a href="javascript:rlViewItm('2049145027229871460');">View</a>
    <a class="lochomepage" href="/pages/Local_High_School/Classes/3538_0001"> Item 6 </a>
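    Below is a minimal Java sketch of the three steps above. It adapts the answer's patterns rather than reproducing them exactly, under three assumptions:

    - The whitespace inside the cells varies, so \s* stands in for the literal spaces.
    - Rows may carry attributes, since the posted rows alternate between <tr> and <tr class="ed-alternateRow">. For that reason the row pattern starts with <tr[^>]*>.
    - A tempered group (?:(?!</tr>).)*? keeps every match inside a single row.

    The class name RowCells and the row() helper are hypothetical. The sample rows reuse data from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowCells {
    // Steps 1 and 2: one match per row that contains a "Current Assignments Report" cell.
    // <tr[^>]*> also accepts <tr class="ed-alternateRow">, and the tempered group
    // (?:(?!</tr>).)*? cannot step over a </tr>, so a match never spans two rows.
    static final Pattern ROW = Pattern.compile(
            "<tr[^>]*>((?:(?!</tr>).)*?)<td>\\s*Current Assignments Report\\s*</td>",
            Pattern.DOTALL);

    // Step 3: the content of every plain <td>...</td> in the captured row text,
    // without the whitespace that surrounds it.
    static final Pattern CELL = Pattern.compile("<td>\\s*(.*?)\\s*</td>", Pattern.DOTALL);

    static List<List<String>> extract(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {                        // iterate over every matching row
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1));
            }
            rows.add(cells);
        }
        return rows;
    }

    // Hypothetical helper: builds one row shaped like the posted HTML.
    static String row(String tr, String date, String id, String path, String name, String report) {
        return tr + "\n<td><div class=\"ed-tdSpacer\"></div></td>\n"
                + "<td valign=\"center\">&nbsp;</td><td>\n  " + date + "\n</td><td>\n"
                + "  <a href=\"javascript:rlViewItm('" + id + "');\">View</a>\n</td><td>\n"
                + "  <a class=\"lochomepage\" href=\"/pages/Local_High_School/Classes/" + path + "\">\n  "
                + name + "\n  </a>\n</td><td>\n  " + report
                + "\n</td><td class=\"ed-tdEnd\">&nbsp;</td></tr>\n";
    }

    // Three rows from the question; only the first and last are Current Assignments Reports.
    static final String SAMPLE =
            row("<tr class=\"ed-alternateRow\">", "04/11/14", "2049145027192329860",
                    "5151_8701", "PRINS OF ENGIN B", "Current Assignments Report")
            + row("<tr>", "04/02/14", "2049144736912474660",
                    "9151_0027", "HOMEROOM", "Current Absences Report")
            + row("<tr class=\"ed-alternateRow\">", "04/07/14", "2049145027196480420",
                    "2105_8701", "AP GOVPL US NSL B", "Current Assignments Report");

    public static void main(String[] args) {
        for (List<String> cells : extract(SAMPLE)) {
            System.out.println(cells);
        }
    }
}
```

    The spacer cell <td><div class="ed-tdSpacer"></div></td> also matches the cell pattern. As a result, the date, the View link and the course link sit at indices 1, 2 and 3 of each list.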
    

    The date can be used almost as is. The other two items need further parsing, depending on what you want to get from them.
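    For the other two cells, one small pattern each is enough. This sketch assumes the cell strings look like the matches listed above. It also assumes the date is month/day/two-digit year and parses it with java.time. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CellParsing {
    // The digits inside rlViewItm('...')
    static final Pattern ITEM_ID = Pattern.compile("rlViewItm\\('(\\d+)'\\)");
    // Whatever sits between the opening <a ...> tag and </a>, line breaks included
    static final Pattern LINK_TEXT = Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.DOTALL);
    // Assumed date layout: 04/02/14 = month/day/two-digit year
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("MM/dd/yy");

    static LocalDate date(String cell) {
        return LocalDate.parse(cell.trim(), DATE);
    }

    static String itemId(String cell) {
        Matcher m = ITEM_ID.matcher(cell);
        return m.find() ? m.group(1) : null;
    }

    static String linkText(String cell) {
        Matcher m = LINK_TEXT.matcher(cell);
        // trim() drops the padding; only &amp; is decoded, the one entity in the posted names
        return m.find() ? m.group(1).trim().replace("&amp;", "&") : null;
    }

    public static void main(String[] args) {
        System.out.println(date("04/02/14"));
        System.out.println(itemId("<a href=\"javascript:rlViewItm('2049145027229871460');\">View</a>"));
        System.out.println(linkText(
                "<a class=\"lochomepage\" href=\"/pages/Local_High_School/Classes/3538_0001\"> Item 6 </a>"));
    }
}
```

    Running main prints 2014-04-02, 2049145027229871460 and Item 6, one per line.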

    But again, if your input really is as strict as you imply, it shouldn't be that bad.


    Update: Based on your latest input (the long file you posted), this regex captures each item, as far as I understand your needs:

    <td>\s*([0-9]{2}/[0-9]{2}/[0-9]{2})\s*</td>\s*<td>\s*<a href="javascript:rlViewItm\('([0-9]+)'\);">View</a>\s*</td>\s*<td>\s*<a class="lochomepage" href="([^"]+)">\s*(.*?)\s*</a>\s*</td>\s*<td>\s*Current Assignments Report\s*</td>
    

    Debuggex Demo

    Note: this takes a while to load because the input is so long.

    Capture groups:

    1. Date
    2. Item number
    3. lochomepage URL
    4. Link text
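    Below is a sketch of applying a pattern like this from Java to fill the two ArrayLists the question asks for: course names and item numbers. Two design choices matter here:

    - The pattern lists the cells in row order and ends with the report cell. All four groups therefore come from the same row as "Current Assignments Report". A pattern that starts at the report cell and searches forward would pick up the next row's cells instead.
    - The link text is captured with (.*?), because [\w ]+ does not match names containing / or &amp;.

    The class name CurrentAssignments and the row() helper are hypothetical. The sample rows come from the question.

```java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrentAssignments {
    // Cells in row order, ending with the report cell, so every group comes from the same row.
    // Groups: 1 = date, 2 = item number, 3 = lochomepage URL, 4 = link text.
    static final Pattern ROW = Pattern.compile(
            "<td>\\s*([0-9]{2}/[0-9]{2}/[0-9]{2})\\s*</td>\\s*"
            + "<td>\\s*<a href=\"javascript:rlViewItm\\('([0-9]+)'\\);\">View</a>\\s*</td>\\s*"
            + "<td>\\s*<a class=\"lochomepage\" href=\"([^\"]+)\">\\s*(.*?)\\s*</a>\\s*</td>\\s*"
            + "<td>\\s*Current Assignments Report\\s*</td>");

    final ArrayList<String> courseNames = new ArrayList<>();
    final ArrayList<String> itemNumbers = new ArrayList<>();

    CurrentAssignments(String html) {
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            courseNames.add(m.group(4).replace("&amp;", "&")); // decode the one entity in the names
            itemNumbers.add(m.group(2));
        }
    }

    // Hypothetical helper: builds one row shaped like the posted HTML.
    static String row(String tr, String date, String id, String path, String name, String report) {
        return tr + "\n<td><div class=\"ed-tdSpacer\"></div></td>\n"
                + "<td valign=\"center\">&nbsp;</td><td>\n  " + date + "\n</td><td>\n"
                + "  <a href=\"javascript:rlViewItm('" + id + "');\">View</a>\n</td><td>\n"
                + "  <a class=\"lochomepage\" href=\"/pages/Local_High_School/Classes/" + path + "\">\n  "
                + name + "\n  </a>\n</td><td>\n  " + report
                + "\n</td><td class=\"ed-tdEnd\">&nbsp;</td></tr>\n";
    }

    static final String SAMPLE =
            row("<tr class=\"ed-alternateRow\">", "04/11/14", "2049145027192329860",
                    "5151_8701", "PRINS OF ENGIN B", "Current Assignments Report")
            + row("<tr>", "04/02/14", "2049144736912474660",
                    "9151_0027", "HOMEROOM", "Current Absences Report")
            + row("<tr>", "04/11/14", "2049145027201539636",
                    "2954_8702", "Algorithms &amp; Data Structures X/Y TBD", "Current Assignments Report");

    public static void main(String[] args) {
        CurrentAssignments ca = new CurrentAssignments(SAMPLE);
        System.out.println(ca.courseNames);
        System.out.println(ca.itemNumbers);
    }
}
```

    Here the HOMEROOM absences row is skipped. The first list holds PRINS OF ENGIN B and Algorithms & Data Structures X/Y TBD, and the second list holds their item numbers in the same order.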

    I know it's been a while since you asked this. Maybe it still helps...

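    【Comments】:

    • Thank you very much for your reply, but when I tried it I couldn't reach all of the table rows; only a few of the item numbers came up. Any idea why that happens?
    • Do they all start with <td>[space] and end with [space]</td>?
    • Yes, they all have the data cell Current Assignments Report.
    • I need more information to help you. I think you're talking about this line: <a href="javascript:rlViewItm('2049145027229871460');">View</a>. Is this line always surrounded by <td>[space] and [space]</td>?
    • I revised my question to try to make it a bit clearer. I would really appreciate any ideas on how to solve my problem.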