【Question title】: Using Beautifulsoup to return the value being shown on the table in the webpage (Pandas read html)
【Posted】: 2019-06-25 12:41:47
【Question】:

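I only want to return the prices shown on a grocery retailer's website.

I have already scraped the table from the site, but I only want the delivery price from each cell of the DataFrame. My idea is to filter each cell and return the regex match for the price within that cell's string. I'm not sure whether there is a simpler way to do this, perhaps with pd.read_html?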

import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

postcode = 'l4 0th'
payload = {'postcode': postcode}
putUrl = 'https://www.sainsburys.co.uk/gol-api/v1/customer/postcode'
Sains_url = 'https://www.sainsburys.co.uk/shop/PostCodeCheckSuccessView'
Sains_url2 = 'https://www.sainsburys.co.uk/shop/BookingDeliverySlotDisplayView'

# Reuse one session so cookies persist across the requests
client = requests.Session()
PutReq = client.put(putUrl, data=payload)
rget = client.get(Sains_url)
r2 = client.get(Sains_url2)

# Parse the first table on the delivery-slot page into a DataFrame
soup = BeautifulSoup(r2.content, 'lxml')
table = soup.find_all('table')[0]
# StringIO avoids the deprecation warning newer pandas emits for literal HTML strings
df = pd.read_html(StringIO(str(table)), skiprows=[1])[0]

# Drop the rows whose Time cell contains these labels
df = df[~df.Time.str.contains("Afternoon delivery")]
df = df[~df.Time.str.contains("Evening delivery")]

My DataFrame should look like this:

+-------------+----------------+-------------+-------------+
|    Time     |     Today      | Wed 26 June | Thu 27 June |
+-------------+----------------+-------------+-------------+
| 7:30-8:30am | Not Available  | £3          | £5          |
+-------------+----------------+-------------+-------------+

【Comments】:

    Tags: html pandas beautifulsoup python-requests


    【Answer 1】:

    IIUC, you can do some post-processing using a regex with applymap:

    import re
    
    pat = re.compile(r'£\S+')
    
    # This regex matches '£' and every following character
    # up to the next whitespace
    
    # On pandas >= 2.1, DataFrame.map is the non-deprecated name for applymap
    df.applymap(lambda x: re.findall(pat, str(x))[0] if '£' in str(x) else x)
    

    [out]

                     Time          Today    Wed  26 Jun Thu  27 Jun Fri  28 Jun  \
    0     7:30am - 8:30am  Not Available  Not Available       £4.50          £7   
    1     8:00am - 9:00am  Not Available             £3       £5.50          £6   
    2     8:30am - 9:30am  Not Available             £3       £5.50          £6   
    3    9:00am - 10:00am  Not Available             £3       £4.50          £6   
    4    9:30am - 10:30am  Not Available             £3       £4.50          £6   
    5   10:00am - 11:00am  Not Available          £2.50       £3.50          £5   
    6   11:00am - 12:00pm  Not Available          £1.50       £2.50          £4   
    8    12:00pm - 1:00pm  Not Available             £1          £2          £3   
    9     1:00pm - 2:00pm  Not Available          £0.50          £2       £2.50   
    10    2:00pm - 3:00pm  Not Available          £0.50          £3       £2.50   
    11    3:00pm - 4:00pm  Not Available          £0.50          £3       £3.50   
    12    4:00pm - 5:00pm  Not Available             £1          £3       £4.50   
    13    4:30pm - 5:30pm  Not Available             £1          £3       £4.50   
    15    5:00pm - 6:00pm  Not Available             £1       £3.50       £4.50   
    16    5:30pm - 6:30pm  Not Available             £1       £3.50       £4.50   
    17    6:00pm - 7:00pm  Not Available  Not Available       £2.50          £4   
    18    6:30pm - 7:30pm  Not Available  Not Available       £2.50          £4   
    19    7:00pm - 8:00pm  Not Available  Not Available       £2.50          £4   
    20    7:30pm - 8:30pm  Not Available  Not Available       £2.50          £4   
    21    8:00pm - 9:00pm  Not Available  Not Available       £1.50          £2   
    22   9:00pm - 10:00pm  Not Available          £1.50          £1       £1.50   
    23  10:00pm - 11:00pm  Not Available             £1       £0.50       £1.50   
    
          Sat  29 Jun    Sun  30 Jun Mon  1 Jul  
    0           £6.50  Not Available      £5.50  
    1              £7             £7      £5.50  
    2              £7             £7      £5.50  
    3              £7             £7         £5  
    4              £7             £7         £5  
    5           £5.50          £5.50      £4.50  
    6           £5.50             £5      £2.50  
    8           £3.50          £3.50         £2  
    9              £3          £3.50      £1.50  
    10             £3          £2.50         £3  
    11          £3.50             £3      £2.50  
    12          £3.50          £3.50         £4  
    13          £3.50          £3.50         £4  
    15             £3          £2.50         £4  
    16             £3          £2.50         £4  
    17             £3             £3         £3  
    18             £3             £3         £3  
    19             £3             £3         £3  
    20             £3             £3         £3  
    21             £2             £2         £1  
    22             £2             £2         £1  
    23  Not Available  Not Available      £0.50  
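
To see what the pattern does on its own, here is a minimal sketch using only the standard library. The sample cell texts and the helper name `first_price` are hypothetical; the wording of the real page's cells may differ.

```python
import re

# Same pattern as in the answer, written as a raw string.
PRICE = re.compile(r'£\S+')

# Hypothetical cell texts, for illustration only.
samples = [
    "Wed 26 Jun 8:00am - 9:00am £3 Book this slot",
    "Thu 27 Jun 7:30am - 8:30am £4.50",
    "Not Available",
]

def first_price(text):
    """Return the first '£...' token in text, or text unchanged if there is none."""
    match = PRICE.search(text)
    return match.group(0) if match else text

for sample in samples:
    print(first_price(sample))
```

Note that `\S+` also captures any punctuation attached to the price (for example `£3,`). If that turns out to be a problem, a stricter pattern such as `r'£\d+(?:\.\d{1,2})?'` matches only the digits.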
    

    If lambdas aren't your thing, the lambda above is equivalent to the more explicit:

    def extract_cost(value):
        # str() guards against non-string cells such as NaN
        text = str(value)
        if '£' in text:
            return re.findall(r'£\S+', text)[0]
        else:
            return value
    
    df.applymap(extract_cost)
    

    Here applymap simply 'applies' the function extract_cost to every value in the DataFrame.
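
The element-wise behaviour can be checked without hitting the website. The following is a rough sketch on a small hand-made DataFrame; the cell strings are invented for illustration and are not the site's actual text. It also assumes you may be on pandas 2.1 or later, where `DataFrame.map` replaces the deprecated `applymap`, so it picks whichever is available.

```python
import re
import pandas as pd

def extract_cost(value):
    """Return the first '£...' token in a cell, or the cell unchanged."""
    match = re.search(r'£\S+', str(value))
    return match.group(0) if match else value

# Hypothetical miniature of the scraped table.
df = pd.DataFrame({
    "Time": ["7:30am - 8:30am", "8:00am - 9:00am"],
    "Today": ["Not Available", "Not Available"],
    "Wed 26 Jun": ["Not Available", "Book 8:00am - 9:00am for £3"],
})

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
prices = elementwise(extract_cost)
print(prices)
```

Cells without a `£` pass through untouched, so the `Time` column and the "Not Available" entries are preserved. The result is a new DataFrame; assign it back to `df` if you want to keep it.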

    【Comments】:

    • This is perfect, exactly what I was looking for. Thank you! I think I need to do more research on "applymap" and "lambda"!
    • @Jack Glad it helped. Happy coding!