【Question title】: How to access the <p> tag next to <p class="bold"> Last Statement:</p>
【Posted】: 2020-07-21 04:24:04
【Question】:
<p class="bold">Date of Execution: </p>
<p>February 6, 2020</p>
<p class="bold"> Offender:</p>
<p>Ochoa, Abel Revill #999450</p>
<p class="bold"> Last Statement:</p>
<p>Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden. </p>
<p> </p>
How do I access the second-to-last paragraph tag (the one just before the final empty tag)?
para = soup('p')
for item in para:
    string = str(item)
    if '<p class="bold"> Last Statement:</p>' not in string: continue
    print(string)
This is my code. What should the next step be?
【Discussion】:
Tags:
python-3.x
beautifulsoup
【Solution 1】:
In [43]: from bs4 import BeautifulSoup
In [44]: a = """<p class="bold">Date of Execution: </p>
...: <p>February 6, 2020</p>
...: <p class="bold"> Offender:</p>
...: <p>Ochoa, Abel Revill #999450</p>
...: <p class="bold"> Last Statement:</p>
...: <p>Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden. </p>
...: <p> </p>"""
In [45]: soup = BeautifulSoup(a,"lxml")
In [46]: soup.find_all("p")[-2].text.strip()
Out[46]: 'Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden.'
In [49]: soup.find_all("p")[-3].text.strip() + ": " + soup.find_all("p")[-2].text.strip()
Out[49]: 'Last Statement:: Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden.'
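Indexing with [-2] only works while the statement stays the second-to-last <p> on the page. As a hedged alternative, the sketch below finds the bold label by its text and then takes the next sibling <p>, so it does not depend on position. The helper name `text_after_label` is made up for illustration, and the markup is an abbreviated copy of the sample from the question.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup (statement text shortened).
html = """<p class="bold">Date of Execution: </p>
<p>February 6, 2020</p>
<p class="bold"> Last Statement:</p>
<p>Thank you warden. </p>
<p> </p>"""

soup = BeautifulSoup(html, "html.parser")


def text_after_label(soup, label):
    """Return the stripped text of the <p> following the bold <p> that contains `label`.

    Hypothetical helper; returns None when the label or its neighbour is missing.
    """
    for bold in soup.find_all("p", class_="bold"):
        if label in bold.get_text():
            # find_next_sibling("p") skips the whitespace text node between the tags.
            nxt = bold.find_next_sibling("p")
            return nxt.get_text().strip() if nxt is not None else None
    return None


statement = text_after_label(soup, "Last Statement")
print(statement)  # Thank you warden.
```

The same call with "Date of Execution" returns 'February 6, 2020', which is the main reason to prefer a label lookup over a fixed index.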
【Solution 2】:
You can search for the <p> tag whose text contains "Last Statement" and then get the next <p> tag.
For example:
from bs4 import BeautifulSoup
txt = '''<p class="bold">Date of Execution: </p>
<p>February 6, 2020</p>
<p class="bold"> Offender:</p>
<p>Ochoa, Abel Revill #999450</p>
<p class="bold"> Last Statement:</p>
<p>Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden. </p>
<p> </p>'''
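soup = BeautifulSoup(txt, 'html.parser')
# 'p.bold:contains(...)' matches the bold label; '+ p' selects the <p> immediately after it.
# Newer Soup Sieve releases deprecate :contains() in favor of :-soup-contains().
p = soup.select_one('p.bold:contains("Last Statement") + p')
print(p.text)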
Prints:
Yes sir. I would like to thank God, my dad, my Lord Jesus savior for saving me and changing my life. I want to apologize to my in-laws for causing all this emotional pain. I love y’all and consider y’all my sisters I never had. I want to thank you for forgiving me. Thank you warden.
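If all of the fields are needed rather than just the last statement, the same label-then-next-sibling idea can be applied to every bold <p>. The following is only a sketch under that assumption: the variable name `record` and the shortened statement text are illustrative, and it assumes each label is directly followed by its value paragraph, as in the sample.

```python
from bs4 import BeautifulSoup

# Same structure as the question's sample, with the statement shortened.
html = """<p class="bold">Date of Execution: </p>
<p>February 6, 2020</p>
<p class="bold"> Offender:</p>
<p>Ochoa, Abel Revill #999450</p>
<p class="bold"> Last Statement:</p>
<p>Thank you warden. </p>
<p> </p>"""

soup = BeautifulSoup(html, "html.parser")

record = {}
for label in soup.select("p.bold"):
    value = label.find_next_sibling("p")
    # Skip labels with no following <p>, or whose next <p> is itself another label.
    if value is None or "bold" in value.get("class", []):
        continue
    # Normalise ' Last Statement:' to 'Last Statement'.
    key = label.get_text().strip().rstrip(":").strip()
    record[key] = value.get_text().strip()

print(record["Last Statement"])  # Thank you warden.
```

A dictionary keyed by label avoids one selector per field, at the cost of assuming the label/value pairing holds for every entry.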