[Posted]: 2019-03-25 03:00:56
[Question]:
I have an HTML page containing the following data:
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td style="width:50%;padding-right:8px;" valign="top">
<h2 class="sectionTitle">Corporate Headquarters</h2>
<div itemprop="workLocation">6901 Professional Parkway East<br />Sarasota, Florida 34240<br /><br />United States<br /><br /></div><span class="detail">Phone</span>: <span itemprop="telephone">941-556-2601</span><br /><span class="detail">Fax</span>: <span itemprop="faxNumber">--</span>
<h2 class="sectionTitle">Board Members Memberships</h2>
<div>2011-Present</div><div><strong>Director</strong></div><div style="padding-bottom:15px;"><a href="../../stocks/snapshot/snapshot.asp?capId=11777224">TrustWave Holdings, Inc.</a></div><div>2018-Present</div><div><strong>President, CEO & Director</strong></div><div style="padding-bottom:15px;"><a href="../../stocks/snapshot/snapshot.asp?capId=22751">Roper Technologies, Inc.</a></div>
<h2 class="sectionTitle">Education</h2>
<div><strong>MBA</strong> </div><div style="padding-bottom:15px;" itemprop="alumniOf">Harvard Business School</div><div><strong>Unknown/Other Education</strong> </div><div style="padding-bottom:15px;" itemprop="alumniOf">Miami University</div><div><strong>Bachelor's Degree</strong> </div><div style="padding-bottom:15px;" itemprop="alumniOf">Miami University</div>
<h2 class="sectionTitle">Other Affiliations</h2>
<div><a itemprop="affiliation" href="../../stocks/snapshot/snapshot.asp?capId=424885">MedAssets, Inc.</a></div><div><a itemprop="affiliation" href="../../stocks/snapshot/snapshot.asp?capId=1131022">Harvard Business School</a></div><div><a itemprop="affiliation" href="../../stocks/snapshot/snapshot.asp?capId=4109057">Miami University</a></div><div><a itemprop="affiliation" href="../../stocks/snapshot/snapshot.asp?capId=6296385">MedAssets Net Revenue Systems, LLC</a></div><div><a itemprop="affiliation" href="../../stocks/snapshot/snapshot.asp?capId=11777224">TrustWave Holdings, Inc.</a></div><div><a itemprop="affiliation" href="../../stocks/snapshot/snapshot.asp?capId=138296355">Medassets Services LLC</a></div>
</td>
I am trying to extract the information under "Board Members Memberships" as:
Director
TrustWave Holdings, Inc.
CEO & Director
Roper Technologies, Inc.
None of these elements has a class or id that would make it easy to extract.
So far, all I have managed is:
soup.find('td',style="width:50%;padding-right:8px;").findAll("strong")
This gives me the following result, which also includes the entries from the Education section:
[<strong>Director</strong>,
<strong>President, CEO & Director</strong>,
<strong>MBA</strong>,
<strong>Unknown/Other Education</strong>,
<strong>Bachelor's Degree</strong>]
Can someone guide me on how to do this?
[Comments]:
-
Do you want the string "Director - TrustWave Holdings, Inc. CEO & Director - Roper Technologies, Inc."? It is not clear to me exactly what you want.
-
@VincentBeltman I have edited my question. Please take a look.
-
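You used the `td` tag in your `.find()` call, but the HTML you pasted does not contain that tag. Try sharing the link or adding more of the markup so that it matches your attempt.
-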
@SIM Please take a look.
Tags: python html web-scraping beautifulsoup
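
One possible approach, sketched under assumptions rather than taken from the original thread: because the `div` elements carry no class or id, anchor on the `<h2 class="sectionTitle">` heading whose text is "Board Members Memberships" and walk its following siblings until the next `<h2>`. The function name `extract_board_memberships` and the trimmed `SAMPLE_HTML` are hypothetical, introduced only for illustration; the sketch also assumes the page keeps the heading and its `div` rows as direct siblings, as in the markup quoted in the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustrative sample only).
SAMPLE_HTML = """
<td style="width:50%;padding-right:8px;" valign="top">
<h2 class="sectionTitle">Board Members Memberships</h2>
<div>2011-Present</div><div><strong>Director</strong></div>
<div style="padding-bottom:15px;"><a href="#">TrustWave Holdings, Inc.</a></div>
<div>2018-Present</div><div><strong>President, CEO &amp; Director</strong></div>
<div style="padding-bottom:15px;"><a href="#">Roper Technologies, Inc.</a></div>
<h2 class="sectionTitle">Education</h2>
<div><strong>MBA</strong> </div>
<div style="padding-bottom:15px;" itemprop="alumniOf">Harvard Business School</div>
</td>
"""


def extract_board_memberships(html):
    """Return role and company strings listed under the board memberships heading."""
    soup = BeautifulSoup(html, "html.parser")

    # Locate the section by its heading text, since the rows have no class or id.
    heading = soup.find("h2", class_="sectionTitle",
                        string="Board Members Memberships")
    if heading is None:
        return []

    results = []
    for sibling in heading.find_next_siblings():
        # The next <h2> marks the start of another section, so stop there.
        if sibling.name == "h2":
            break
        # Roles are wrapped in <strong>, companies in <a>; date rows have neither.
        node = sibling.find("strong") or sibling.find("a")
        if node is not None:
            results.append(node.get_text(strip=True))
    return results


for line in extract_board_memberships(SAMPLE_HTML):
    print(line)
```

Stopping at the next `<h2>` is what keeps the Education entries (MBA and so on) out of the result, which was the problem with collecting every `<strong>` in the cell. The sketch returns the full role text as it appears in the page, so the second role comes back as "President, CEO & Director" rather than the shortened form shown in the question.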