【Question title】: Need to scrape data from a website using XPath and BeautifulSoup
【Posted】: 2020-10-21 14:36:50
【Question】:

Hi everyone,

website link

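I'm trying to scrape a table named "Open Bets", but unfortunately the table has no class or id. I used BeautifulSoup to scrape the table and XPath to locate it, but nothing happens, as you can see in the image below: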

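I'm trying to scrape the data from the table and pick out the columns named "Team A" and "Team B". The point is that I want to display the data like this: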

print(Player1," vs ",Player2)
print("Odds ",odds)
print("Rate ",rate)
print("stake ",stake)

I think you can see what I'm trying to do here. This is the table:

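I tried contacting the site's webmaster to ask for a class or something similar to be added to the page source, but got no response.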

from lxml import html
import requests
page = requests.get('https://tipsters.asianbookie.com/index.cfm?player=Mitya68&ID=297754')
tree = html.fromstring(page.content)
ID = tree.xpath('/html/body/table[2]/tbody/tr/td[3]/table[7]')
print(ID)

This is the code I used. If anyone can help, that would be great =)

【Comments】:

    Tags: python xpath web-scraping beautifulsoup


    【Solution 1】:

    A simple way is to use pandas. Here's how you do it:

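    import pandas as pd
    import requests

    r = requests.get('https://tipsters.asianbookie.com/index.cfm?player=Mitya68&ID=297754&sortleague=1#playersopenbets&tz=5.5').text

    # read_html returns one DataFrame per <table> found in the page
    dfs = pd.read_html(r)

    # In this answer the "Open Bets" table is at index 141;
    # the index will shift if the page layout changes
    df = dfs[141]

    # Promote the first row to column headers, then drop it from the data
    df.columns = df.iloc[0]
    df = df.drop(0)

    # Keep only the text after the last '.' in each "Bet Placed" cell
    df['Bet Placed ≡'] = [value.split('.')[-1] for value in df['Bet Placed ≡']]

    print(df)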
    

    Output:

    0   Bet Placed ≡              Team A  ...   Rate         Pending Status
    1    9 hours ago         Real Madrid  ...  1.975            pending ?-?
    2    9 hours ago   Red Bull Salzburg  ...  1.875            pending ?-?
    3    9 hours ago                Ajax  ...   2.00            pending ?-?
    4    9 hours ago       Bayern Munich  ...   2.00            pending ?-?
    5    9 hours ago       Bayern Munich  ...   1.85            pending ?-?
    6    9 hours ago         Inter Milan  ...  1.875            pending ?-?
    7    9 hours ago     Manchester City  ...   1.95            pending ?-?
    8    9 hours ago         Midtjylland  ...  1.875            pending ?-?
    9    9 hours ago  Olympiakos Piraeus  ...   1.95            pending ?-?
    10   9 hours ago          Hamburg SV  ...  1.925            pending ?-?
    11   9 hours ago         Vissel Kobe  ...  1.925   Lost(-25,000) FT 1-3
    12   9 hours ago     Shonan Bellmare  ...  1.825   Won½(+10,313) FT 0-0
    13   9 hours ago    Yokohama Marinos  ...  2.025   Won½(+12,812) FT 2-1
    14   9 hours ago        RKC Waalwijk  ...  1.875            pending ?-?
    15   9 hours ago            Espanyol  ...  2.075  lose(-25,000) 29' 1-0
    
    [15 rows x 7 columns]
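
The hard-coded index 141 depends on how many tables the page happens to contain. As a hedged alternative, `pd.read_html` accepts a `match` argument that keeps only tables containing the given text, and a `header` argument that picks the header row. The sketch below runs against a small made-up HTML string (`SAMPLE_HTML` is an assumption for illustration, not the real page). On a page with nested tables, outer tables that contain the text will match as well, so some extra filtering may still be needed there.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: one unrelated table
# followed by a class-less, id-less table like "Open Bets".
SAMPLE_HTML = """
<table><tr><td>Menu</td><td>Home</td></tr></table>
<table>
  <tr><td>Team A</td><td>Team B</td><td>Stake</td><td>Rate</td></tr>
  <tr><td>Ajax</td><td>Liverpool</td><td>25000.00</td><td>2.00</td></tr>
  <tr><td>Vissel Kobe</td><td>Kashima Antlers</td><td>25000.00</td><td>1.925</td></tr>
</table>
"""

# match= keeps only tables whose text matches "Team A";
# header=0 uses the first row as the column names.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Team A", header=0)
open_bets = tables[0]

print(open_bets)
```

Selecting by text rather than by position should keep working if tables are added or removed elsewhere on the page, as long as the column heading itself stays the same.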
    

    You can also get these values as separate lists by adding these lines to the code:

    team_a = list(df['Team A'])
    team_b = list(df['Team B'])
    rate = list(df['Rate'])
    stake = list(df['Stake'])
    

    If you want to print them in the format you mentioned, add these lines to your code:

    final_lst = zip(team_a,team_b,stake,rate)
    
    for teamA,teamB,stakee,ratee in final_lst:
        print(f"{teamA} vs {teamB} - Stake: {stakee}, Rate: {ratee}")
    

    Output:

    Real Madrid vs Shaktar Donetsk - Stake: 25000.00, Rate: 1.975
    Red Bull Salzburg vs Lokomotiv Moscow - Stake: 100000.00, Rate: 1.875
    Ajax vs Liverpool - Stake: 25000.00, Rate: 2.00
    Bayern Munich vs Atl. Madrid - Stake: 25000.00, Rate: 2.00
    Bayern Munich vs Atl. Madrid - Stake: 25000.00, Rate: 1.85
    Inter Milan vs Monchengladbach - Stake: 25000.00, Rate: 1.875
    Manchester City vs Porto - Stake: 25000.00, Rate: 1.95
    Midtjylland vs Atalanta - Stake: 100000.00, Rate: 1.875
    Olympiakos Piraeus vs Marseille - Stake: 25000.00, Rate: 1.95
    Hamburg SV vs Erzgebirge Aue - Stake: 100000.00, Rate: 1.925
    Vissel Kobe vs Kashima Antlers - Stake: 25000.00, Rate: 1.925
    Shonan Bellmare vs Sagan Tosu - Stake: 25000.00, Rate: 1.825
    Yokohama Marinos vs Nagoya - Stake: 25000.00, Rate: 2.025
    RKC Waalwijk vs PEC Zwolle - Stake: 25000.00, Rate: 1.875
    Espanyol vs Mirandes - Stake: 25000.00, Rate: 2.075
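
Instead of building four separate lists and zipping them, the rows can also be iterated directly. This is a minimal sketch using a small hand-built DataFrame with values copied from the output above. The helper name `format_bet` is hypothetical, and the columns are kept as strings here so the formatting matches.

```python
import pandas as pd

# Small stand-in for the scraped table, with string values as scraped.
df = pd.DataFrame({
    "Team A": ["Ajax", "Vissel Kobe"],
    "Team B": ["Liverpool", "Kashima Antlers"],
    "Stake": ["25000.00", "25000.00"],
    "Rate": ["2.00", "1.925"],
})

def format_bet(row):
    """Format one row (a dict keyed by column name) as a single line."""
    return f"{row['Team A']} vs {row['Team B']} - Stake: {row['Stake']}, Rate: {row['Rate']}"

# to_dict("records") yields one dict per row, so column names with
# spaces such as "Team A" can be used as keys without renaming.
lines = [format_bet(row) for row in df.to_dict("records")]
print("\n".join(lines))
```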
    

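    【Discussion】:

    • Awesome! I'm just wondering how to pick out the player names along with the rate and stake and move them into variables.
    • Player names? Is the player name in this case Mitya68?
    • As you can see in the table, there are "Team A" and "Team B" columns. The point is to scrape the Team A names and add them to a variable, then scrape Team B and add them to a variable, and scrape the rate for each one, for example "Vissel Kobe" vs "Kashima Antlers" - Stake: "25,000.00" - Rate: "1.925"
    • I want to scrape it like this: scrape the data, add it to variables, and display the result as follows: var(player1), "vs", var(player2) - stake: var(stake) - rate: var(rate)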
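
For the XPath route from the question, one likely reason the original expression returns an empty list is the `tbody` step: browsers add `<tbody>` to the DOM, so an XPath copied from developer tools includes it, but it is often absent from the raw HTML that `requests` downloads. The sketch below assumes that is the case and selects the table by the text of a header cell rather than by position. `SAMPLE_HTML` is a made-up stand-in for the real page.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page (no class, no id, no tbody).
SAMPLE_HTML = """
<html><body>
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><td>Team A</td><td>Team B</td><td>Stake</td><td>Rate</td></tr>
  <tr><td>Ajax</td><td>Liverpool</td><td>25000.00</td><td>2.00</td></tr>
  <tr><td>Vissel Kobe</td><td>Kashima Antlers</td><td>25000.00</td><td>1.925</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# Pick the table whose own row has a cell reading "Team A",
# then take every row after the header row.
rows = tree.xpath('//table[tr/td[normalize-space()="Team A"]]/tr[position() > 1]')

bets = []
for row in rows:
    # Unpack the four cells of each data row into named variables.
    team_a, team_b, stake, rate = [td.text_content().strip() for td in row.xpath('./td')]
    bets.append((team_a, team_b, stake, rate))

for team_a, team_b, stake, rate in bets:
    print(f"{team_a} vs {team_b} - Stake: {stake}, Rate: {rate}")
```

Because the predicate uses `tr/td` (direct children only), an outer layout table that merely wraps the target table does not match. The real table has more columns than this sample, so the unpacking line would need adjusting to the actual cell count.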