[Posted]: 2015-09-23 02:48:27
[Question]:
Here is part of the page source.
<tr>
<td>
<a href="/docdollars/doctors/pid/36602">
<h6>Jane</h6>
</a>
Allopathic & Osteopathic Physicians/Internal Medicine
</td>
<td>
<p>NY Medical Ctr<br>New York City,
<a href="/docdollars/states/NY">NY</a>
</p>
</td>
</tr>
<tr>
<td>
<a href="/docdollars/doctors/pid/1091514">
<h6>Greg</h6>
</a>
Allopathic & Osteopathic Physicians/Family Medicine
</td>
<td>
<p>57950 NYC<br>New York City,
<a href="/docdollars/states/NY">NY</a>
</p>
</td>
</tr>
The data I want to scrape looks like this:
Jane, Allopathic & Osteopathic Physicians/Internal Medicine, NY Medical Ctr, New York City, NY
Greg, Allopathic & Osteopathic Physicians/Family Medicine, 57950 NYC, New York City, NY
My code (below) only partly works (see the comments in the code).
for i in item.find_all('tr'):
    print i.find('a').find('h6').text  # working fine
    print i.find('td').next_sibling.next_sibling.find('p').text.strip()  # this needs revision
    print i.find('td').text.strip()  # this needs revision
Thanks in advance for any suggestions!
[Comments]:
Tags: python python-2.7 web-scraping beautifulsoup
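One possible way to get the desired lines, as a minimal sketch rather than a definitive answer: `.text` concatenates every text node inside a cell, so the name, the specialty, and the address pieces run together, and `next_sibling` can land on whitespace between tags. Iterating over `.stripped_strings` for each `td` yields the pieces one at a time instead. The names `SAMPLE_HTML` and `extract_rows` are made up for illustration, the sample is the two rows from the question, and the code assumes every row has the same two-cell layout with the name as the first text in the first cell.

```python
from bs4 import BeautifulSoup

# The two rows from the question, used as sample input (illustrative name).
SAMPLE_HTML = """
<tr>
<td>
<a href="/docdollars/doctors/pid/36602">
<h6>Jane</h6>
</a>
Allopathic & Osteopathic Physicians/Internal Medicine
</td>
<td>
<p>NY Medical Ctr<br>New York City,
<a href="/docdollars/states/NY">NY</a>
</p>
</td>
</tr>
<tr>
<td>
<a href="/docdollars/doctors/pid/1091514">
<h6>Greg</h6>
</a>
Allopathic & Osteopathic Physicians/Family Medicine
</td>
<td>
<p>57950 NYC<br>New York City,
<a href="/docdollars/states/NY">NY</a>
</p>
</td>
</tr>
"""


def extract_rows(markup):
    """Return one comma-separated line per <tr> (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # assumption: a usable row has both cells
        # stripped_strings yields each non-empty text node, whitespace trimmed
        first = list(cells[0].stripped_strings)
        if not first:
            continue
        name = first[0]                   # text of the <h6>
        specialty = " ".join(first[1:])   # remaining text in the first cell
        # drop the trailing comma that follows "New York City" in the source
        address = [s.rstrip(",") for s in cells[1].stripped_strings]
        rows.append(", ".join([name, specialty] + address))
    return rows


for line in extract_rows(SAMPLE_HTML):
    print(line)
```

In the original loop the same idea would apply to each `i` from `item.find_all('tr')`. The sketch calls `print(line)` with a single argument so it should behave the same on Python 2.7 and Python 3.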