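【Posted】: 2019-04-07 06:01:52
【Question】:
I am trying to collect the 2017/2018 NHL skater statistics. I have started writing the code, but I am having trouble parsing the data and writing it to Excel.
Here is my code so far: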
# import modules
from urllib.request import urlopen
from lxml.html import fromstring
import pandas as pd

# connect to url
url = "https://www.hockey-reference.com/leagues/NHL_2018_skaters.html"

# remove HTML comment markup
content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)

# setting up excel columns
columns = ("names", "gp", "g", "s", "team")
df = pd.DataFrame(columns=columns)

# attempt at parsing data while using loop
for nhl, skater_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]/tr')):
    names = pitcher_row.xpath('.//td[@data-stat="player"]/a')[0].text
    gp = skater_row.xpath('.//td[@data-stat="games_played"]/text()')[0]
    g = skater_row.xpath('.//td[@data-stat="goals"]/text()')[0]
    s = skater_row.xpath('.//td[@data-stat="shots"]/text()')[0]
    try:
        team = skater_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    # create pandas dataframe to export data to excel
    df.loc[nhl] = (names, team, gp, g, s)

# write data to excel
writer = pd.ExcelWriter('NHL skater.xlsx')
df.to_excel(writer, 'Sheet1')
writer.save()
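For the Excel step at the end of the script, here is a minimal sketch under a few assumptions. The helper names `rows_to_frame` and `save_to_excel` and the sample rows are hypothetical and are not real statistics. Rows are collected as dicts keyed by column name, so the result does not depend on tuple order (the script above declares `team` as the last column but places it second in the tuple). The file path is passed straight to `to_excel`, since to my knowledge `ExcelWriter.save()` is no longer available in recent pandas releases.

```python
import pandas as pd

# Column order taken from the script above.
COLUMNS = ["names", "gp", "g", "s", "team"]


def rows_to_frame(rows):
    """Build the DataFrame in one step from a list of dicts keyed by column name."""
    return pd.DataFrame(rows, columns=COLUMNS)


def save_to_excel(df, path="NHL skater.xlsx"):
    """Write the frame to an .xlsx file.

    Passing the path lets pandas open and close the file itself.
    Writing .xlsx needs an Excel engine such as openpyxl to be installed.
    """
    df.to_excel(path, sheet_name="Sheet1", index=False)


# Hypothetical sample rows for illustration only.
sample_rows = [
    {"names": "Player A", "team": "AAA", "gp": 82, "g": 10, "s": 150},
    {"names": "Player B", "team": "BBB", "gp": 60, "g": 5, "s": 90},
]
frame = rows_to_frame(sample_rows)
```

Calling `save_to_excel(frame)` would then write the sheet. Building the frame once from a list is usually simpler than assigning to `df.loc[...]` inside the loop.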
Can anyone explain how to parse this data? Are there any tips for writing the XPath so that I can loop over the rows?
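One detail in the first parsing step is worth a sketch: `urlopen(url).read()` returns bytes, and calling `str()` on bytes yields the literal `b'...'` representation with escape sequences rather than the page text. The snippet below decodes the bytes before stripping the comment markers. The helper name `strip_comment_markers`, the UTF-8 encoding, and the sample bytes are assumptions for illustration.

```python
def strip_comment_markers(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode the downloaded bytes, then drop the HTML comment delimiters."""
    text = raw.decode(encoding)  # assumes the page is UTF-8 encoded
    return text.replace("<!--", "").replace("-->", "")


# Hypothetical stand-in for the bytes returned by urlopen(url).read().
raw = b'<div><!-- <table class="stats_table"><tr><td>1</td></tr></table> --></div>'

as_repr = str(raw)                    # starts with "b'": a repr, not the markup
cleaned = strip_comment_markers(raw)  # plain markup, comment delimiters removed
```

The cleaned string can then be passed to `fromstring` as in the script above.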
I am having trouble writing this line:
for nhl, skater_row in enumerate(tree.xpath...
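A possible reason this loop finds nothing: `table/tr` only matches rows that are direct children of the table, while rows in a stats table usually sit inside `<thead>` or `<tbody>`. The sketch below uses a hypothetical miniature table (the real page's markup may differ) to compare the two expressions and to skip repeated header rows. `cell_text` is a helper introduced here for illustration.

```python
from lxml.html import fromstring

# Hypothetical miniature of a stats table; not copied from the real page.
SAMPLE_HTML = """
<html><body>
<table class="sortable stats_table" id="stats">
  <thead>
    <tr><th data-stat="ranker">Rk</th><th data-stat="player">Player</th></tr>
  </thead>
  <tbody>
    <tr>
      <th data-stat="ranker">1</th>
      <td data-stat="player"><a href="#">Player A</a></td>
      <td data-stat="team_id"><a href="#">AAA</a></td>
      <td data-stat="games_played">82</td>
      <td data-stat="goals">10</td>
      <td data-stat="shots">150</td>
    </tr>
    <tr class="thead"><th data-stat="ranker">Rk</th><th data-stat="player">Player</th></tr>
    <tr>
      <th data-stat="ranker">2</th>
      <td data-stat="player"><a href="#">Player B</a></td>
      <td data-stat="team_id">TOT</td>
      <td data-stat="games_played">60</td>
      <td data-stat="goals">5</td>
      <td data-stat="shots">90</td>
    </tr>
  </tbody>
</table>
</body></html>
"""


def cell_text(row, stat):
    """Return the stripped text of the <td> with this data-stat, or None if absent."""
    cells = row.xpath('.//td[@data-stat="%s"]' % stat)
    if not cells:
        return None
    # text_content() covers both linked cells (<a>AAA</a>) and plain ones (TOT).
    return cells[0].text_content().strip()


tree = fromstring(SAMPLE_HTML)
direct_rows = tree.xpath('//table[contains(@class,"stats_table")]/tr')
body_rows = tree.xpath('//table[contains(@class,"stats_table")]/tbody/tr')

skaters = []
for row in body_rows:
    name = cell_text(row, "player")
    if name is None:
        continue  # repeated header row: it has <th> cells and no player <td>
    skaters.append({
        "names": name,
        "team": cell_text(row, "team_id"),
        "gp": cell_text(row, "games_played"),
        "g": cell_text(row, "goals"),
        "s": cell_text(row, "shots"),
    })
```

In this sample `direct_rows` is empty and `body_rows` holds the data rows plus the repeated header. Returning `None` for a missing cell also avoids the `IndexError` that `[0]` raises on an empty result.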
How did you find the XPath? Did you use XPath Finder or XPath Helper?
Also, I get an error on this line:
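Besides browser extensions, the structure can be inspected from Python itself. As a small sketch on a hypothetical sample document, lxml can report the absolute path of an element it has already located with `getroottree().getpath(...)`, and `iterancestors()` lists the enclosing tags. Both show which intermediate elements, such as `tbody`, an expression has to pass through.

```python
from lxml.html import fromstring

# Hypothetical sample document for illustration only.
SAMPLE_HTML = """<html><body><table class="stats_table"><tbody>
<tr><td data-stat="player"><a>Player A</a></td><td data-stat="goals">10</td></tr>
</tbody></table></body></html>"""

tree = fromstring(SAMPLE_HTML)

# Locate one known cell with a loose expression first.
cell = tree.xpath('//td[@data-stat="goals"]')[0]

# Ask lxml for the absolute path of that element within the parsed document.
goal_path = tree.getroottree().getpath(cell)

# Tags of the enclosing elements, from the nearest parent up to the root.
ancestor_tags = [element.tag for element in cell.iterancestors()]

print(goal_path)
print(ancestor_tags)
```

The printed path reflects the tree as lxml parsed it, which can differ from what a browser's developer tools show after the browser has normalised the markup.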
df.loc[nhl] = (names, team, gp, g, s)
It says that df is invalid syntax.
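The message points at `df` most likely because the `try:` above it has no `except` or `finally` clause, so Python reports the error at the first statement that follows the incomplete block. The sketch below checks this with the built-in `compile`, using two hypothetical snippets. Catching `IndexError` and using `None` as the fallback are illustrative choices, not the only option.

```python
# A try block with no except/finally, followed by an ordinary statement.
BROKEN = (
    "try:\n"
    "    team = 'AAA'\n"
    "row = ('Player A', team)\n"
)

# The same shape with an except clause that supplies a fallback value.
FIXED = (
    "try:\n"
    "    team = cells[0]\n"
    "except IndexError:\n"
    "    team = None\n"
    "row = ('Player A', team)\n"
)


def compiles(source):
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


broken_ok = compiles(BROKEN)
fixed_ok = compiles(FIXED)

# Running the fixed snippet with an empty list exercises the except branch.
namespace = {"cells": []}
exec(FIXED, namespace)
team_when_missing = namespace["team"]
```

Here `broken_ok` is `False` and `fixed_ok` is `True`, which matches the behaviour described in the question.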
I am new to web scraping and have no previous coding experience. Any help would be greatly appreciated. Thank you in advance for your time!
【Comments】:
Tags: python parsing xpath web-scraping lxml