Scrape data from Yahoo Finance: answers

【Question title】: Scrape data from Yahoo Finance
【Posted】: 2020-04-01 18:58:08
【Question】:

Guys, I want to get the country where a company is located (United States, in this case) from its profile page on Yahoo Finance. The link is:

https://finance.yahoo.com/quote/AAPL/profile?p=AAPL

I tried this code but couldn't extract it. I'm new to scraping data, so I'd really appreciate any help.

My code:

import requests
from lxml import html

xp = "//span[text()='Sector']/following-sibling::span[1]"

symbol = 'AAPL'

url = 'https://finance.yahoo.com/quote/' + symbol + '/profile?p=' + symbol

page = requests.get(url)
tree = html.fromstring(page.content)

d = {}

I prefer lxml and requests and haven't used BeautifulSoup, so I'd prefer an answer that builds on the code above.

Much appreciated.

【Question discussion】:

Tags: python web-scraping

【Solution 1】:

See if this works for you:

xpp = tree.xpath('//div[@data-reactid=7]/p/text()[3]')[0].strip()
xpp

Output:

'United States'
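
To see why text()[3] selects the country, here is a minimal offline sketch. The SAMPLE markup is a hypothetical, simplified stand-in for the address block (it is not copied from the live page, and the phone number is a placeholder), so treat the data-reactid value and the node positions as assumptions to re-check against the HTML that requests actually returns.

```python
from lxml import html

# Hypothetical, simplified stand-in for the address paragraph on a profile page.
SAMPLE = (
    '<div data-reactid="7">'
    '<p>One Apple Park Way<br>Cupertino, CA 95014<br>United States<br>'
    '<a href="tel:0000000000">000-000-0000</a></p>'
    '</div>'
)

tree = html.fromstring(SAMPLE)

# text() yields the paragraph's direct text nodes; each <br> starts a new one.
lines = [t.strip() for t in tree.xpath('//div[@data-reactid="7"]/p/text()') if t.strip()]

# [3] is 1-based in XPath, so this is the third text node of the paragraph.
country = tree.xpath('//div[@data-reactid="7"]/p/text()[3]')[0].strip()

print(lines)
print(country)
```

If the address ever has more or fewer lines, the index 3 stops pointing at the country, so printing all the text nodes first is a cheap sanity check.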

【Discussion】:

【Solution 2】:

Maybe you can combine BeautifulSoup with a regex search to filter out the location:

import requests
from lxml import html
from bs4 import BeautifulSoup
import re

xp = "//span[text()='Sector']/following-sibling::span[1]"
symbol = 'TEVA'
url = 'https://finance.yahoo.com/quote/' + symbol + '/profile?p=' + symbol

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
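# The address paragraph is located by its CSS classes
baseTag = soup.find_all('p', {'class': "D(ib) W(47.727%) Pend(40px)"})
# Capture the text that sits between comment markers; a raw string avoids invalid escape sequences
matches = re.findall(r" -->(.*?)<!--", str(baseTag))
# The location is the last captured piece
print(matches[-1])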

I tested it with Google (GOOG), Apple (AAPL), and Teva Pharmaceutical Industries Limited (TEVA), and it seems to work.
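
To make the regex step easier to follow, here is a small offline sketch. The SAMPLE string is a hypothetical fragment that imitates markup where each piece of text is wrapped in comment markers; the street and city are placeholders, and the real page may be laid out differently.

```python
import re

# Hypothetical fragment: each text piece is wrapped in comment markers.
SAMPLE = (
    '<p class="D(ib) W(47.727%) Pend(40px)">'
    '<!-- react-text: 8 -->1 Example Street<!-- /react-text --><br/>'
    '<!-- react-text: 10 -->Sample City<!-- /react-text --><br/>'
    '<!-- react-text: 12 -->Israel<!-- /react-text --><br/>'
    '</p>'
)

# Lazily capture whatever lies between the end of one comment and the start of the next.
matches = re.findall(r" -->(.*?)<!--", SAMPLE)

# Some captures are just markup between two comments (for example "<br/>"); drop those.
cleaned = [m for m in matches if m and not m.startswith("<")]

print(matches[-1])
print(cleaned)
```

Taking matches[-1] only works while the country is the last commented text node in the paragraph, so the filtered list is handy for checking that assumption.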

【Discussion】:

【Solution 3】:

Don't scrape; use yfinance instead. It is updated regularly and simplifies everything:

import yfinance as yf
df = yf.download('TWTR')

If you want to plot it:

import finplot as fplt
fplt.candlestick_ochl(df[['Open','Close','High','Low']])
fplt.show()
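
The download call above returns price history rather than the company's country. A possible sketch for the country itself, assuming the dictionary returned by yfinance's Ticker(...).info contains a "country" key (this depends on the yfinance version and on what Yahoo returns, so check it for your setup). The helper names country_from_info and fetch_country are made up for illustration.

```python
from typing import Any, Mapping, Optional


def country_from_info(info: Mapping[str, Any]) -> Optional[str]:
    """Return the 'country' entry of a yfinance-style info mapping, or None."""
    value = info.get("country")
    return value if isinstance(value, str) and value else None


def fetch_country(symbol: str) -> Optional[str]:
    """Look up a ticker's country through yfinance (needs network access)."""
    import yfinance as yf  # imported lazily so the helper above runs without it

    # Assumption: Ticker(...).info is a dict that may include a "country" key.
    return country_from_info(yf.Ticker(symbol).info)


# Example call (commented out because it hits the network):
# print(fetch_country("AAPL"))
```

Returning None for a missing key keeps the caller from crashing if Yahoo omits the field for a given symbol.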

【Discussion】: