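[Posted]: 2019-06-25 12:41:47
[Question]:
I only want to return the prices shown on a grocery retailer's website.
I have already scraped the table from the site, but I only want the delivery price from each cell of the dataframe. My idea is to go through each cell and return the regex match for the price within that cell's string. I'm not sure whether there is a simpler way to do this, perhaps with pd.read_html?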
import requests
import pandas as pd
from bs4 import BeautifulSoup

postcode = 'l4 0th'
payload = {'postcode': postcode}
putUrl = 'https://www.sainsburys.co.uk/gol-api/v1/customer/postcode'
Sains_url = 'https://www.sainsburys.co.uk/shop/PostCodeCheckSuccessView'
Sains_url2 = 'https://www.sainsburys.co.uk/shop/BookingDeliverySlotDisplayView'

# Use a single session so state from the postcode request carries over to later requests
client = requests.Session()
PutReq = client.put(putUrl, data=payload)
rget = client.get(Sains_url)
r2 = client.get(Sains_url2)

# Parse the first table on the delivery slot page into a dataframe
soup = BeautifulSoup(r2.content, 'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table), skiprows=[1])[0]

# Drop the rows whose Time cell contains "Afternoon delivery" or "Evening delivery"
df = df[~df.Time.str.contains("Afternoon delivery")]
df = df[~df.Time.str.contains("Evening delivery")]
My dataframe should look like this:
+-------------+----------------+-------------+-------------+
| Time | Today | Wed 26 June | Thu 27 June |
+-------------+----------------+-------------+-------------+
| 7.30-8:30am | Not Available | £3 | £5 |
+-------------+----------------+-------------+-------------+
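
The per-cell regex idea described in the question could be sketched roughly as below. This is only an illustration under assumptions: the sample cell strings (for example "Book delivery slot for £3") are made up, since the question does not show the raw cell text, and the names `PRICE`, `extract_price`, and `prices` are hypothetical. Cells without a price, such as "Not Available", are left unchanged.

```python
import re
import pandas as pd

# Hypothetical cell contents; the real text scraped from the site may differ.
df = pd.DataFrame({
    "Time": ["7:30-8:30am"],
    "Today": ["Not Available"],
    "Wed 26 June": ["Book delivery slot for £3"],
    "Thu 27 June": ["Book delivery slot for £5.50"],
})

# A pound sign followed by digits, with optional pence (e.g. £3 or £5.50)
PRICE = re.compile(r"£\d+(?:\.\d{1,2})?")

def extract_price(cell):
    """Return the first price found in the cell, or the cell unchanged."""
    match = PRICE.search(str(cell))
    return match.group(0) if match else cell

# Apply the extraction to every column except the time slot label
day_columns = [c for c in df.columns if c != "Time"]
prices = df.copy()
for col in day_columns:
    prices[col] = prices[col].map(extract_price)

print(prices)
```

Mapping column by column with `Series.map` keeps the Time column untouched and avoids depending on a particular pandas version for element-wise DataFrame methods.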
[Discussion]:
Tags: html pandas beautifulsoup python-requests