【Question title】: Get specific value BeautifulSoup (parsing)
【Posted】: 2020-08-29 18:27:29
【Question description】:

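I am trying to extract information from a website using Python (BeautifulSoup).

I want to extract the following data (just the numbers):

EPS (Basic)

from: https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter

from the XML.

I have built this code: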

import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur

url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'

# Download the page and parse it with the lxml parser
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')
# Select every table row with the class "mainRow" and print its text
cells = soup_is.find_all('tr', {'class': 'mainRow'})
for cell in cells:
    print(cell.text)

But this does not give me the data for EPS (Basic).

Is there a way to extract just that data and arrange it by column?

【Comments】:

    Tags: python python-3.x parsing beautifulsoup


    【Solution 1】:

    Try the following CSS selector, which checks whether the td tag contains the text EPS (Basic).

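    from bs4 import BeautifulSoup
    import urllib.request as ur
    
    url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
    read_data = ur.urlopen(url_is).read()
    soup_is = BeautifulSoup(read_data, 'lxml')
    # Find the title cell whose text contains "EPS (Basic)".
    # Note: newer soupsieve releases deprecate :contains in favour of :-soup-contains.
    row = soup_is.select_one('tr.mainRow>td.rowTitle:contains("EPS (Basic)")')
    # row is the title <td>; its parent <tr> holds the value cells. Skip empty cells.
    print([cell.text for cell in row.parent.select('td') if cell.text != ''])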
    

    Output

    [' EPS (Basic)', '2.47', '2.20', '3.05', '5.04', '2.58']
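
The selector in the answer above depends on the live page, which may have changed since it was written. As a minimal offline sketch of the same idea, the snippet below runs against a small hand-written HTML sample. The sample markup, the `extract_row` helper, and the use of the built-in `html.parser` are assumptions made for illustration; only the class names `mainRow` and `rowTitle` and the EPS values come from the answer. It matches the label with a plain substring test instead of `:contains`, so it does not depend on the soupsieve version, and it returns `None` instead of raising when the row is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the real page structure is an assumption.
# The EPS values are copied from the output shown in the answer.
SAMPLE_HTML = """
<table>
  <tr class="mainRow">
    <td class="rowTitle"> Some other row</td>
    <td>n/a</td>
  </tr>
  <tr class="mainRow">
    <td class="rowTitle"> EPS (Basic)</td>
    <td>2.47</td><td>2.20</td><td>3.05</td><td>5.04</td><td>2.58</td>
    <td></td>
  </tr>
</table>
"""


def extract_row(html, label):
    """Return the non-empty cell texts of the mainRow whose title contains label."""
    soup = BeautifulSoup(html, "html.parser")
    for title in soup.select("tr.mainRow > td.rowTitle"):
        if label in title.get_text():
            # title.parent is the enclosing <tr>; collect all of its cells
            cells = [td.get_text(strip=True) for td in title.parent.select("td")]
            # Drop empty cells, such as the trailing chart cell in the sample
            return [text for text in cells if text]
    return None  # no matching row found


row = extract_row(SAMPLE_HTML, "EPS (Basic)")
print(row)
```

With the real page, `SAMPLE_HTML` would be replaced by the downloaded `read_data`; whether the class names still match the current site is not something this sketch can confirm.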
    

    To print it in a DataFrame

    import pandas as pd
    from bs4 import BeautifulSoup
    import urllib.request as ur
    
    url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
    read_data = ur.urlopen(url_is).read()
    soup_is=BeautifulSoup(read_data, 'lxml')
    row = soup_is.select_one('tr.mainRow>td.rowTitle:contains("EPS (Basic)")')
    data=[cell.text for cell in row.parent.select('td') if cell.text!='']
    df=pd.DataFrame(data)
    print(df.T)
    

    Output

                  0     1     2     3     4     5
    0   EPS (Basic)  2.47  2.20  3.05  5.04  2.58
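
The question also asks for the data arranged by column. One possible variation, sketched here under the assumption that the extracted list has the shape printed above (label first, then the values), is to use the label as the column name and convert the values to floats. The variable names are illustrative only.

```python
import pandas as pd

# The list as printed by the answer's first snippet
data = [' EPS (Basic)', '2.47', '2.20', '3.05', '5.04', '2.58']

# Split the label from the values, then build a single numeric column
label, *values = data
df = pd.DataFrame({label.strip(): [float(v) for v in values]})
print(df)
```

This yields one row per quarter and a numeric column, so the values can be sorted or aggregated with ordinary pandas operations such as `df.sort_values('EPS (Basic)')`.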
    

    【Discussion】:
