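【Posted】: 2020-09-07 02:05:37
【Question】:
This is the patent example I am using: https://patents.google.com/patent/EP1208209A1/en?oq=medicinal+chemistry . Below is the code I use. I want the code to show only the "Cited By (3)" count, so that I know how many times the patent has been cited. How can I make the output show only the Cited By count of 3? Please help!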
from bs4 import BeautifulSoup

# `patent` holds the HTML of the patent page fetched earlier
soup = BeautifulSoup(patent, 'html.parser')
cited_section = soup.findAll({"h2":"Cited By"})
print(cited_section)
The output I get is [<h2>Info</h2>, <h2>Links</h2>, <h2>Images</h2>, <h2>Classifications</h2>, <h2>Abstract</h2>, <h2>Description</h2>, <h2>Claims (<span itemprop="count">57</span>)</h2>, <h2>Priority Applications (5)</h2>, <h2>Applications Claiming Priority (1)</h2>, <h2>Related Parent Applications (1)</h2>, <h2>Publications (2)</h2>, <h2>ID=38925605</h2>, <h2>Family Applications (1)</h2>, <h2>Country Status (1)</h2>, <h2>Cited By (3)</h2>, <h2>Families Citing this family (12)</h2>, <h2>Citations (306)</h2>, <h2>Patent Citations (348)</h2>, <h2>Non-Patent Citations (23)</h2>, <h2>Cited By (4)</h2>, <h2>Also Published As</h2>, <h2>Similar Documents</h2>, <h2>Legal Events</h2>]
【Discussion】:
-
The page seems to be rendered asynchronously. I suggest you use Selenium.
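Since the output in the question already contains `<h2>Cited By (3)</h2>`, the heading does seem to be present in the HTML that was parsed, so filtering the `<h2>` tags by their text may be enough. The original call appears to return every `<h2>` because a dict passed as the first argument is treated as a set of tag names, so the value "Cited By" is ignored. Below is a minimal sketch of one way to pull out the number. `sample_html` and `cited_by_counts` are hypothetical names used only for illustration, and the sample markup is a small subset of the headings shown in the output, not the real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: a few of the headings
# that appear in the output posted in the question.
sample_html = """
<h2>Claims (<span itemprop="count">57</span>)</h2>
<h2>Cited By (3)</h2>
<h2>Families Citing this family (12)</h2>
<h2>Cited By (4)</h2>
"""

def cited_by_counts(html):
    """Return the numbers found in every 'Cited By (N)' <h2> heading, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"^Cited By \((\d+)\)$")
    counts = []
    for h2 in soup.find_all("h2"):
        # get_text() joins nested strings, so headings containing a <span> are handled too
        match = pattern.match(h2.get_text(strip=True))
        if match:
            counts.append(int(match.group(1)))
    return counts

counts = cited_by_counts(sample_html)
print(counts[0])  # count from the first "Cited By" heading
```

Note that the posted output has two "Cited By" headings, (3) and (4), so the function returns both and the caller picks the one it needs. If the heading turns out to be missing from the HTML you download yourself, the Selenium suggestion above would be the fallback.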
Tags: python html parsing beautifulsoup extract