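[Posted]: 2015-04-10 00:48:21
[Question]:
I'm trying to copy a table from a web page. I can't copy the whole page, because it contains buttons and dynamic elements, and pasting those into the worksheet overloads memory and breaks the code. So I'm trying to pull the HTML and paste just the table into Excel.
When I copy the entire page source into Word, it reports about 23k characters, but innerHTML and outerHTML both come back at only about 15-16k characters.
I know innerHTML and outerHTML leave out a lot of functions and other content outside the HTML body, but what confuses me is that they are also missing the table I need from the middle of the code.
Page source: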
<div class="row" >
<div class="col-lg-12 col-md-12 col-sm-12" >
</div>
<div class="col-lg-12 col-md-12 col-sm-12" >
<table class="table table-hover table-bordered table-striped " >
<thead>
<tr style="background:#eee">
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=day&order=asc">Date</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=jobs&order=asc">Current Jobs Listed</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=impressions&order=asc">Impressions</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=clicks&order=asc">Clicks</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=cpc&order=asc">CPC</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=ctr&order=asc">CTR</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=cost&order=asc">Estimated cost</a></th>
<th class="sortable" ><a href="/employer/report/report?_action_report=Run+Report&show_advertisers=on&_show_conversions=&advertiser_id=25&_show_advertisers=&_hide_campaigns=&_show_campaigns=&end=03%2F11%2F2015&campaign_id=&begin=03%2F11%2F2015&sort=daily_budget&order=asc">Current Daily Budget</a></th>
<th style="vertical-align:top" ><a href="#" onclick="return false;">Edit Campaign</a></th>
<th style="vertical-align:top" ></th>
</tr>
</thead>
<tbody>
<tr class="odd 2015-03-11">
<td>2015-03-11</td>
<td class="jobsListed" >437879</td>
<td>148397</td>
<td>1379</td>
<td>$0.36</td>
<td>0.93%</td>
<td >$491.16</td>
<td class="dailyBudget">$15500.00</td>
<td ><a href="/employer/campaign/">Edit</a></td>
</tr>
<tr class="dg" >
<td colspan="1" class="text-right"><b>Total:</b></td>
<td class="jobsListed" >437879</td>
<td>148397</td>
<td>1379</td>
<td>$0.36</td>
<td>0.93%</td>
<td >$491.16</td>
<td class="dailyBudget">$15500.00</td>
<td ></td>
<td ></td>
</tr>
</tbody>
</table>
</div>
</div>
</div><!--container ends here -->
This is how I'm trying to get the table data:
Dim appIE As Object ' InternetExplorer.Application
Set appIE = CreateObject("InternetExplorer.Application")
Dim strSource As String
Dim TableString As String
strSource = CStr(appIE.document.body.outerHTML)
TableString = Mid(strSource, _
InStr(strSource, "<table"), _
InStr(strSource, "</table>") - InStr(strSource, "<table"))
Dim ClipBoard As New DataObject
ClipBoard.SetText TableString
ClipBoard.PutInClipboard
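It gives me an error because it can't find <table in the string. I stepped through the string and found that the spot where the table should be looks like this: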
class="col-lg-12 col-md-12 col-sm-12">
</div>
</div>
</div><!--container ends here -->
Any ideas? Thanks.
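On the error itself: when InStr does not find the search text it returns 0, and Mid raises run-time error 5 when its start argument is 0. A minimal sketch of a guarded extraction follows. ExtractFirstTable is a hypothetical helper name, not part of the code in the question, and it simply takes the first closing tag it finds, so it would cut a nested table short.

```vba
' Returns the first <table>...</table> fragment in html, or "" if none is found.
' ExtractFirstTable is a hypothetical helper name used for illustration only.
Function ExtractFirstTable(ByVal html As String) As String
    Const CLOSE_TAG As String = "</table>"
    Dim startPos As Long
    Dim endPos As Long

    ' vbTextCompare makes the search case-insensitive, so "<TABLE" matches too.
    startPos = InStr(1, html, "<table", vbTextCompare)
    If startPos = 0 Then Exit Function      ' no opening tag: return ""

    ' Look for the closing tag only after the opening tag.
    endPos = InStr(startPos, html, CLOSE_TAG, vbTextCompare)
    If endPos = 0 Then Exit Function        ' no closing tag: return ""

    ' Add Len(CLOSE_TAG) so the closing tag is part of the result.
    ExtractFirstTable = Mid$(html, startPos, endPos - startPos + Len(CLOSE_TAG))
End Function
```

An empty return value then signals that the table is not in the string at all, which is a separate problem from the extraction logic.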
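[Comments]:
- Maybe the table is dynamic and isn't served unless you "hover" over some area? That's just a wild guess based on the descriptive class names; without a URL it's hard for anyone to offer specific help. I'd also ask why you're using string functions to parse HTML. Use a proper DOM parser for HTML or XML documents; these have great methods such as .getElementsByClassName and others designed specifically for traversing the nodes of an XML/HTML tree.
- I tried to get it by class name, with something like: strSource = cstr(appIE.document.getElementsByTagName("table table-hover table-bordered table-striped").innerhtml) but it still gives me an empty table.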
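A note on the call in the last comment: getElementsByTagName expects a tag name such as "table", not a list of class names, and it returns a collection, which has no innerHTML property of its own. The sketch below shows one way the DOM approach might look, under several assumptions: REPORT_URL is a placeholder (the real address is not given in the thread), the page needs no login, and the table is present once the browser reports the document as complete. If a script inserts the table later, this wait will not be enough. As in the question, New DataObject needs a reference to the Microsoft Forms 2.0 Object Library.

```vba
' Sketch only. REPORT_URL is a placeholder; the real address is not in the thread.
Sub CopyReportTable()
    Const REPORT_URL As String = "http://example.com/report"   ' assumption
    Const READYSTATE_COMPLETE As Long = 4
    Dim appIE As Object
    Dim tables As Object
    Dim clip As New DataObject      ' requires Microsoft Forms 2.0 Object Library

    Set appIE = CreateObject("InternetExplorer.Application")
    appIE.Navigate REPORT_URL

    ' Wait until the browser reports that the document has finished loading.
    Do While appIE.Busy Or appIE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Ask the DOM for every <table> element instead of searching the HTML string.
    Set tables = appIE.Document.getElementsByTagName("table")

    If tables.Length = 0 Then
        MsgBox "No <table> element found in the loaded document."
    Else
        ' Put the markup of the first table on the clipboard.
        clip.SetText tables.Item(0).outerHTML
        clip.PutInClipboard
    End If

    appIE.Quit
End Sub
```

If the page holds more than one table, the loop over tables can check each element's className for "table-hover" before copying.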
Tags: html excel vba internet-explorer