How to extract a name from HTML text wrapped in a hyperlink, using lxml in Python

【Question title】: How to extract name from html text that is covered by hyper link, using lxml Python
【Posted】: 2017-10-22 19:14:29
【Question description】:

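I have been trying to use lxml and requests to scrape the player names from this site: http://mlb.mlb.com/news/probable_pitchers/. But none of the lxml code I have found on the internet has helped me. Does anyone have any suggestions?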

【Question discussion】:

Tags: python web-scraping lxml

【Solution 1】:

Have you tried this site, "HTML Scraping"? http://python-guide-pt-br.readthedocs.io/en/latest/scenarios/scrape/

【Discussion】:

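First of all, thanks for your reply! Yes, that site was actually one of the first I visited in my search. The problem is that, while I understand why they use buyers = tree.xpath('//div[@title="buyer-name"]/text()'), I unfortunately don't understand what the equivalent would be for getting just the name out of this piece of HTML: mlb.mlb.com/team/player.jsp?player_id=608566">GermanMarquez
To clarify, the name I want to extract is "German Marquez".
Try this: page = requests.get('http://mlb.mlb.com/news/probable_pitchers/'); tree = html.fromstring(page.content); tree.xpath('//div/div/h5/a/text()'). Build the XPath from the hierarchy of tags in the HTML. Note that requests needs the http:// scheme in the URL. I hope this helps.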
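
The approach from the last comment can be tried offline before pointing it at the live page. The following is only a minimal sketch: the SAMPLE markup and the helper name extract_player_names are hypothetical, modeled on the link quoted in the comments, and the real page's tag hierarchy may differ. Instead of a positional path such as //div/div/h5/a, it selects links by their href, which assumes that player links contain "player.jsp".

```python
from lxml import html

# Hypothetical sample markup, modeled on the link quoted in the comments.
# The live page's structure is an assumption and may differ.
SAMPLE = """
<div>
  <div>
    <h5><a href="http://mlb.mlb.com/team/player.jsp?player_id=608566">German Marquez</a></h5>
  </div>
</div>
"""


def extract_player_names(markup):
    """Return the link text of every <a> whose href contains 'player.jsp'."""
    tree = html.fromstring(markup)
    # Match on the href attribute rather than on the exact nesting of tags.
    names = tree.xpath('.//a[contains(@href, "player.jsp")]/text()')
    # Drop surrounding whitespace and skip empty text nodes.
    return [name.strip() for name in names if name.strip()]


print(extract_player_names(SAMPLE))  # ['German Marquez']
```

To run the same helper against the live site, the markup could be fetched with requests.get('http://mlb.mlb.com/news/probable_pitchers/').content and passed in. If the names on that page are filled in by JavaScript, they will not be present in the downloaded HTML and the list will come back empty.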