【Question title】: Web Scraping with BS4 - Can you sort this out?
【Posted】: 2021-07-24 08:56:44
【Question】:

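Can you help me fix this code? It gives me an error message like this:

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Can anyone help me solve this? The code is below.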

import pandas as pd
import requests
from bs4 import BeautifulSoup 

url="https://www.cse.lk/pages/trade-summary/trade-summary.component.html"

data  = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')

cse = pd.DataFrame(columns=["Company Name", "Symbol", "Share Volume", "Trade Volume", "Previous Close (Rs.)", "Open (Rs.)", "High (Rs.)", "Low (Rs.)", "**Last Traded Price (Rs.)", "Change (Rs.)", "Change Percentage (%)"])
for row in soup.find_all('tbody').find_all('tr'): ##for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    Company_Name = col[0].text
    Symbol = col[1].text
    Share_Volume = col[2].text
    Trade_Volume = col[3].text
    Previous_Close = col[4].text
    Open = col[5].text
    High = col[6].text
    Low = col[7].text
    Last_Traded_Price = col[8].text
    Change = col[9].text
    Change_Percentage = col[10].text
    cse = cse.append({"Company Name":Company_Name,"Symbol":Symbol,"Share Volume":Share_Volume,"Trade Volume":Trade_Volume,"Previous Close (Rs.)":Previous_Close,"Open (Rs.)":Open,"High (Rs.)":High,"Low (Rs.)":Low,"**Last Traded Price (Rs.)":Last_Traded_Price,"Change (Rs.)":Change,"Change Percentage (%)":Change_Percentage}, ignore_index=True)

【Comments】:

    Tags: python web-scraping


    【Solution 1】:

    The data is loaded via JavaScript from an external URL, so BeautifulSoup doesn't see it. You can load it directly, as in this example:

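    import requests
    import pandas as pd
    
    # JSON endpoint that the page's JavaScript calls to fetch the table data
    url = "https://www.cse.lk/api/tradeSummary"
    
    # The endpoint is queried with POST; the response body is JSON
    data = requests.post(url).json()
    # "reqTradeSummery" (spelled as the API returns it) holds the list of rows
    df = pd.DataFrame(data["reqTradeSummery"])
    
    print(df)
    df.to_csv("data.csv", index=False)  # write the rows without the index column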
    

    Prints:

           id                                                       name      symbol  quantity  percentageChange  change    price  previousClose     high      low  lastTradedTime    issueDate      turnover  sharevolume  tradevolume     marketCap  marketCapPercentage     open  closingPrice  crossingVolume  crossingTradeVol  status
    0     204                                      ABANS ELECTRICALS PLC  ABAN.N0000       317          4.184704    7.25   180.50         173.25   183.00   172.00   1626944252441  01/JAN/1984  1.256363e+06         7012           44  9.224561e+08                  0.0   179.00        180.50            7012                44       0
    1    1845                                          ABANS FINANCE PLC  AFSL.N0000        89         -3.225806   -1.00    30.00          31.00    30.10    30.00   1626944124197  27/JUN/2011  1.160916e+06        38652           11  1.996847e+09                  0.0    30.10         30.00           38652                11       3
    2    2065                                     ACCESS ENGINEERING PLC   AEL.N0000       500         -0.432900   -0.10    23.00          23.10    23.40    22.90   1626944388726  27/MAR/2012  1.968675e+07       855534          264  2.300000e+10                  0.0    23.10         23.00          855534               264       0
    3     472                                             ACL CABLES PLC   ACL.N0000      1000         -0.963855   -0.40    41.10          41.50    41.70    40.90   1626944397450  01/JAN/1976  3.037800e+07       738027          421  9.846521e+09                  0.0    41.50         41.10          738027               421       0
    4     406                                           ACL PLASTICS PLC  APLA.N0000        20          0.842697    2.25   269.25         267.00   272.75   266.00   1626943847820  05/APR/1995  1.436916e+06         5333           26  1.134216e+09                  0.0   272.75        269.25            5333                26       0
    
    ...
    

    It also saves the result to data.csv.
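
A separate note on the AttributeError quoted in the question: `find_all()` returns a ResultSet (a list of tags), so a second `find_all()` cannot be chained onto it, while `find()` returns a single tag that can be searched further. The sketch below reproduces this on a small inline table. The `html` string is a hypothetical stand-in, since the real page's table is filled in by JavaScript, and `html.parser` is used here only to avoid the extra html5lib dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's table is filled in by JavaScript.
html = """
<table>
  <tbody>
    <tr><td>ABANS ELECTRICALS PLC</td><td>ABAN.N0000</td></tr>
    <tr><td>ACL CABLES PLC</td><td>ACL.N0000</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags), which has no find_all() itself.
try:
    soup.find_all("tbody").find_all("tr")
    raised = False
except AttributeError:
    raised = True

# find() returns a single Tag, or None when nothing matches, so guard before chaining.
tbody = soup.find("tbody")
rows = []
if tbody is not None:
    for tr in tbody.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(cells)

print(raised)
print(rows)
```

Fixing the call this way removes the exception, but on the real page `rows` would most likely stay empty, because the static HTML does not contain the data.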

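    【Comments】:

    • Thanks @Andrej. May I know why you used the API link instead of the given URL?
    • @SnyderFox The URL https://www.cse.lk/pages/trade-summary/trade-summary.component.html does not contain the data. Instead, it uses JavaScript to connect to https://www.cse.lk/api/tradeSummary and loads the data dynamically. So when you parse the first URL with bs4, you don't see the data.
    • Thanks. How did you convert the link to the API one? I'm new to Python. That's why...
    • @SnyderFox When you open Firefox Developer Tools -> Network tab (Chrome has something similar) and reload the page, you will see all the requests the page makes. One of these requests is the API URL with the data.
    • Hi Andrej, may I ask one more thing? How did you get "reqTradeSummery" in df = pd.DataFrame(data["reqTradeSummery"])? Where can I find that?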
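
The last question goes unanswered in the thread. One way to find a key such as "reqTradeSummery" is to list the top-level keys of the decoded JSON before building the DataFrame. A minimal sketch, assuming the response is a JSON object whose value under that key is a list of records: `sample_text` is a hypothetical, trimmed payload using values from the printed output above, and `describe` is an illustrative helper, not part of requests or pandas.

```python
import json

# Hypothetical, trimmed payload shaped like the answer's output; the real
# response contains many more records and fields.
sample_text = '{"reqTradeSummery": [{"id": 204, "symbol": "ABAN.N0000", "price": 180.5}]}'

# json.loads decodes the text into Python objects, as requests' .json() does.
data = json.loads(sample_text)


def describe(payload):
    """Map each top-level key to the type name and length of its value."""
    summary = {}
    for key, value in payload.items():
        size = len(value) if isinstance(value, (list, dict, str)) else None
        summary[key] = (type(value).__name__, size)
    return summary


print(describe(data))

# Once the key is known, the field names of one record show the future columns.
records = data["reqTradeSummery"]
print(sorted(records[0].keys()))
```

With the live response, calling `describe(requests.post(url).json())` in the same way should reveal which key holds the list of rows.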