【Question title】: Can't scrape all data from website with BeautifulSoup
【Posted】: 2021-05-11 14:36:22
【Question description】:

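I'm trying to scrape data from this website, but I can't get one specific piece of information, the text of this element:

<p class="mt-3 pt-2 mb-0 rs-rel-085">: "6,10 % aller Aktien sind besser bewertet."

(In English: "6.10% of all stocks are rated better.")

My code works for everything else: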

from bs4 import BeautifulSoup as soup
from urllib.request import Request, urlopen

# Set up scraper
url = "https://aktie.traderfox.com/visualizations/US30303M1027/DI/facebook-inc"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
html = soup(webpage, "html.parser")

#find company name
name_1 = html.find("span",attrs={"class":"h1 m-0"})
name = name_1.text.strip()

#Ticker+WKN+ISIN
WKN_1 = html.find("span",attrs={"class":"color-grey2 d-lg-none"})
WKN = WKN_1.text.strip().replace("[","").replace("]","")

#enterprise value
value_2 = html.find("div",attrs={"class":"col-5 col-lg-auto d-lg-table-cell align-top text-nowrap"})
value_1 = value_2.find("td")
enterprise_value = value_1.text.strip()

#P/E, P/S, div. yield
fin_all = html.find_all("span",attrs={"class":"d-block d-sm-inline d-lg-block fs-rel-110"})
fin_pe = fin_all[0]
PE = fin_pe.text.strip()
fin_ps = fin_all[1]
PS = fin_ps.text.strip()
fin_div20 = fin_all[2]
div20 = fin_div20.text.strip()
fin_div19 = fin_all[3]
div19 = fin_div19.text.strip()

#Performance since year X and avg. return
perf3 = html.find_all("div",attrs={"class":"col-auto py-2 fs-080 color-grey2"})
perf2 = perf3[0]
perf1 = perf3[1]
perf_h = perf2.text.strip()
perf_d = perf1.text.strip()
perf_1 = html.find_all("div",attrs={"class":"col-auto py-2 fs-125 fs-lg-110 fs-xl-125"})
perf_2 = perf_1[0]
perf_hist = perf_2.text.strip()
perf_4 = perf_1[1]
perf_avg = perf_4.text.strip()
perf_year = perf_h[23:27]

print(name)
print(WKN)
print(enterprise_value)
print(PE,PS, div20, div19)
print(perf_year, perf_hist, perf_avg)

【Comments】:

  • I think that part is loaded dynamically with JavaScript, so you can't scrape it with bs4 alone. Maybe you could try Selenium.

Tags: python web-scraping beautifulsoup


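【Solution 1】:

The figure (5,95 when this was run) is not in the page's HTML. It is calculated from a percentile score that the page obtains through a separate JSON request. The value is computed as 100 - (100 * score).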
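To make the arithmetic concrete, here is a minimal sketch of that conversion on its own. The function names `percent_rated_better` and `format_german` are hypothetical, and the input 0.9405 is an illustrative value (the score is assumed to be a fraction between 0 and 1, not a number taken from the live response). The decimal-comma formatting only mimics how the page displays the figure.

```python
def percent_rated_better(score: float) -> float:
    """Convert a percentile score (assumed to lie in 0..1) to the share of stocks rated better."""
    return 100 - (100 * score)


def format_german(value: float) -> str:
    """Format with two decimals and a decimal comma, as the page displays it."""
    return f"{value:.2f}".replace(".", ",")


# Hypothetical score, chosen only to illustrate the formula.
example = percent_rated_better(0.9405)
print(f"{format_german(example)} % aller Aktien sind besser bewertet.")
```

The full script below applies the same formula to the scores returned by the server.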

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from urllib import parse
import json

# Set up scraper
url = "https://aktie.traderfox.com/visualizations/US30303M1027/DI/facebook-inc"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, "html.parser")

# << Your code here to get other items >>

# Locate the stock ID and request the JSON data for it
stock_id = soup.find('span', attrs={"data-id" : True})['data-id']
data = parse.urlencode({"stock_id" : stock_id}).encode()
req_fa = Request("https://aktie.traderfox.com/ajax/getFactorAnalysis.php", data=data)
json_data = json.loads(urlopen(req_fa).read())

umsatzwachstum_growth = 100 - (100 * json_data["data"]["scores"]["salesgrowth5"]["score"])
eps_growth = 100 - (100 * json_data["data"]["scores"]["epsgrowth5"]["score"])
print(f"{umsatzwachstum_growth:.2f}, {eps_growth:.2f}")

This will give you:

5.95, 3.55

I suggest you print out json_data to get a better understanding of the format of the returned data.
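One way to do that inspection is to pretty-print the parsed response and then loop over whatever score entries it contains. The sketch below runs on a small hand-made dictionary that only mirrors the `data` → `scores` → name → `score` nesting used in the code above. The numeric values are placeholders, the helper name `summarize_scores` is hypothetical, and the real response may well contain additional fields.

```python
import json

# Hand-made stand-in for json_data. Only the nesting follows the code above;
# the values are placeholders and were not taken from the real endpoint.
sample_json_data = {
    "data": {
        "scores": {
            "salesgrowth5": {"score": 0.9405},
            "epsgrowth5": {"score": 0.9645},
        }
    }
}


def summarize_scores(json_data: dict) -> dict:
    """Map each score name to 100 - (100 * score), skipping entries without a numeric 'score'."""
    result = {}
    for name, entry in json_data.get("data", {}).get("scores", {}).items():
        score = entry.get("score") if isinstance(entry, dict) else None
        if isinstance(score, (int, float)):
            result[name] = 100 - (100 * score)
    return result


# Pretty-print the whole structure to see how it is laid out.
print(json.dumps(sample_json_data, indent=2, ensure_ascii=False))

# List every available score together with its converted percentage.
for name, value in summarize_scores(sample_json_data).items():
    print(f"{name}: {value:.2f}")
```

With the real response, passing `json_data` to the same helper should show which score names are available beyond the two used above.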

【Comments】:

  • Thank you, Martin Evans! This is exactly what I was looking for.