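[Posted at]: 2019-06-09 20:30:21
[Question]:
I'm new to XPath. I'm trying to extract some information from a website of legal regulations, and for now I just want to:
- Find the tag that contains the string "Article 1".
- Starting from the tag found in the first step, get it and everything after it, up to and including the tag whose
<b> tag contains another string, "PRIME MINISTER".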
<p>
<b> <span> Article 1. </span> </b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span>
</p>
<p>
<span>
1. Development viewpoints
</span>
</p>
<p>
<span>To realize general viewpoints of the strategy for and master plan on development of Vietnam’s tourism through 2020.
</span>
</p>
<p>
<span>PRIME MINISTER: Nguyen Tan Dung</span>
</p>
<p>
<span>
<b> PRIME MINISTER </b>
</span>
</p>
<p>
<b> <span> Article 2. </span> </b>
<span>
.................
</span>
</p>
<p>
<span> PRIME MINISTER: Nguyen Tan Dung</span>
</p>
For the expected output, I should get a list like this:
[
'Article 1.' ,
'To approve the master plan on development of tourism in Northern
Central Vietnam with the following principal contents: ',
'1. Development viewpoints' ,
'To realize general viewpoints of the strategy for and master plan on
development of Vietnam’s tourism through 2020.' ,
'PRIME MINISTER: Nguyen Tan Dung',
'PRIME MINISTER'
]
The first item in the list is "Article 1". The last item in the list is the "PRIME MINISTER" that sits inside a <b> tag.
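To make the two steps concrete, here is a rough sketch of the start/stop logic, not a definitive solution. It uses only the standard library's xml.etree.ElementTree, which supports just a subset of XPath (no contains()), so the two conditions are checked in plain Python instead of in a single XPath expression. The names SAMPLE, norm, bold_text and extract are made up for this illustration, and the markup is a trimmed copy of the sample above wrapped in a <root> element so that it parses as XML. A real page would probably need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample markup, wrapped in <root> so it is well-formed XML.
SAMPLE = """<root>
<p><b> <span> Article 1. </span> </b>
<span>
To approve the master plan on development
of tourism in Northern Central Vietnam
with the following principal contents:
</span></p>
<p><span>
1. Development viewpoints
</span></p>
<p><span>PRIME MINISTER: Nguyen Tan Dung</span></p>
<p><span><b> PRIME MINISTER </b></span></p>
<p><b> <span> Article 2. </span> </b><span> ................. </span></p>
<p><span> PRIME MINISTER: Nguyen Tan Dung</span></p>
</root>"""


def norm(element):
    """Return all text inside an element with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def bold_text(p):
    """Return the normalized text of every <b> descendant of a <p>."""
    return " ".join(norm(b) for b in p.iter("b"))


def extract(root, start="Article 1", stop="PRIME MINISTER"):
    """Collect text from the first <p> containing `start` up to and
    including the first <p> whose <b> text contains `stop`."""
    result, collecting = [], False
    for p in root.iter("p"):
        if not collecting and start in norm(p):
            collecting = True
        if collecting:
            # One list item per direct child of <p> that carries text.
            result.extend(t for t in (norm(child) for child in p) if t)
            if stop in bold_text(p):
                break
    return result


result = extract(ET.fromstring(SAMPLE))
print(result)
```

Note that a plain substring test for "Article 1" would also match "Article 10", so a stricter check may be needed on real data. The paragraph with "PRIME MINISTER: Nguyen Tan Dung" does not end the collection because that text is not inside a <b> tag.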
[Discussion]:
Tags: python xpath web-scraping scrapy