【Posted】: 2021-04-11 16:00:20
【Question】:
I am trying to add a column that holds the unique date of each table scraped from the weather site https://www.wunderground.com/history/daily/us/dc/washington/KDCA
I started with this code:
import datetime
import time

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome('/Users/razanalthawwadi/Desktop/chromedriver')
link = 'https://www.wunderground.com/history/daily/us/va/arlington-county/KDCA/Date/'


def list_dates(start, end):
    """Create a list of dates between the 'start' date and the 'end' date (inclusive)."""
    # create datetime objects for the start and end dates
    start = datetime.datetime.strptime(start, '%Y-%m-%d')
    end = datetime.datetime.strptime(end, '%Y-%m-%d')
    # generate the list of dates between the start and end dates
    step = datetime.timedelta(days=1)
    dates = []
    while start <= end:
        dates.append(start.date())
        start += step
    # return the list of dates in string format
    return [str(date) for date in dates]


dates = list_dates('2017-01-01', '2017-12-31')
we = []      # one scraped table per date
datess = []  # the dates that were requested, in the same order as `we`
for i in dates:
    # print(i)
    datess.append(i)
    page = link + i
    driver.get(page)
    time.sleep(3)  # give the page time to render before reading its source
    html = driver.page_source
    df = pd.read_html(html)
    we.append(df[1])
I tried this loop, but it gives the same date for all of the tables:
for i in dates:
    wel.insert(loc=0, column='jj', value=i)
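The loop above iterates over `dates` but applies every date to the same object, so no table is ever paired with its own date. One possible approach, shown here as a minimal sketch, is to walk `dates` and `we` together and tag each table before combining them. The small DataFrames below are placeholder stand-ins for the scraped tables (their column names and values are made up for illustration), and the column name `date` is likewise an assumption.

```python
import pandas as pd

# Placeholder stand-ins for the question's `dates` and `we` lists.
# The column names and values here are hypothetical, not real scraped data.
dates = ['2017-01-01', '2017-01-02', '2017-01-03']
we = [
    pd.DataFrame({'Time': ['t1', 't2'], 'Temperature': [1, 2]}),
    pd.DataFrame({'Time': ['t1', 't2'], 'Temperature': [3, 4]}),
    pd.DataFrame({'Time': ['t1', 't2'], 'Temperature': [5, 6]}),
]

# zip() pairs the i-th date with the i-th table, so each table gets
# only its own date. assign() returns a copy with the new column added.
tagged = [table.assign(date=day) for day, table in zip(dates, we)]

# Stack the tagged tables into one DataFrame and move `date` to the front.
combined = pd.concat(tagged, ignore_index=True)
combined = combined[['date'] + [c for c in combined.columns if c != 'date']]

print(combined)
```

This relies on `we` and `dates` having the same length and order, which holds in the scraping loop as long as one table is appended per date. The same tagging could also be done inside that loop, right before `we.append(...)`.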
【Discussion】:
Tags: python web-scraping data-science