【Posted】: 2016-07-27 23:36:43
【Question】:
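I'm very new to programming, and this is my first project after reading various guides. I'm trying to scrape data from the Yahoo Finance Key Statistics page and the financial statements (e.g. http://finance.yahoo.com/q/ks?s=GOOG+Key+Statistics). The links to the financial statements are at the bottom of the Key Statistics page. The code for the key statistics function seems to work.
However, in the statement function, the entry variable used in pattern3 does not pick up negative values. The problem is especially noticeable on the cash flow statement. To match negative values, the entry pattern would have to look like this: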
entry = '<td align="right">(.+?)</td>'
Is my approach right? Is there a simple way to get all the values from a financial statement and put them into a list?
My code, in Python 2.7:
import urllib
import re

# Regex patterns for the pieces of the pages we want to extract.
keystat = '<td class="yfnc_tabledata1">(.+?)</td>'
date = '<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">(.+?)</th>'  # obtain the dates; only works for the income statement
total = '<strong>(.+?) </strong>'  # obtain data for any totals on the statements
entry = '<td align="right">(.+?) </td>'  # obtain data for any entries on the statements that are not totals

def keystatfunc(symbol):
    url = 'http://finance.yahoo.com/q/ks?s=' + symbol + '+Key+Statistics'
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_j10_' + symbol + '">(.+?)</span>'
    pattern = re.compile(regex)
    pattern2 = re.compile(keystat)
    marketcap = re.findall(pattern, htmltext)
    keystats = re.findall(pattern2, htmltext)
    return marketcap + keystats[1:31]  # a list with all the data on the key statistics page

def statement(symbol, period, statementtype):
    # period: "quarterly" or "annual"
    # statementtype: "is", "bs" or "cf" (income statement, balance sheet, cash flow statement)
    if period == "quarterly" and statementtype == "bs":
        url = 'http://finance.yahoo.com/q/bs?s=' + symbol
    elif period == "annual" and statementtype == "bs":
        url = 'http://finance.yahoo.com/q/bs?s=' + symbol + '&annual'
    elif period == "quarterly" and statementtype == "is":
        url = 'http://finance.yahoo.com/q/is?s=' + symbol  # no '&annual' for quarterly, as in the "bs" case
    elif period == "annual" and statementtype == "is":
        url = 'http://finance.yahoo.com/q/is?s=' + symbol + '&annual'
    elif period == "quarterly" and statementtype == "cf":
        url = 'http://finance.yahoo.com/q/cf?s=' + symbol  # no '&annual' for quarterly, as in the "bs" case
    elif period == "annual" and statementtype == "cf":
        url = 'http://finance.yahoo.com/q/cf?s=' + symbol + '&annual'
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    pattern = re.compile(date)
    pattern2 = re.compile(total)
    pattern3 = re.compile(entry)
    dates = re.findall(pattern, htmltext)
    totals = re.findall(pattern2, htmltext)
    entries = re.findall(pattern3, htmltext)
    return dates + totals + entries

print keystatfunc("goog")
print statement("goog", "annual", "cf")
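
One possible way to approach the negative-value problem described in the question, as a rough sketch rather than a tested fix: statement negatives are written in parentheses, and a cell may or may not end with padding, so the pattern can treat trailing padding as optional and a small helper can turn the captured text into signed numbers. `SAMPLE_HTML`, `CELL` and `to_number` are hypothetical names made up for this illustration, and the sample markup (including the `&nbsp;` padding) is an assumption, not something copied from the live Yahoo page.

```python
import re

# Hypothetical sample cells; the real Yahoo markup may differ.
SAMPLE_HTML = (
    '<td align="right">1,234&nbsp;&nbsp;</td>'
    '<td align="right">(5,678)</td>'
    '<td align="right">-&nbsp;</td>'
)

# Capture the cell body, tolerating optional trailing spaces or &nbsp; entities.
CELL = re.compile(r'<td align="right">\s*(.*?)(?:&nbsp;|\s)*</td>')


def to_number(text):
    """Turn '1,234' into 1234.0 and '(5,678)' into -5678.0.

    Returns None for blanks, dashes, or anything that is not a number.
    """
    text = text.strip()
    negative = text.startswith('(') and text.endswith(')')
    digits = text.strip('()').replace(',', '')
    try:
        value = float(digits)
    except ValueError:
        return None
    return -value if negative else value


values = [to_number(cell) for cell in CELL.findall(SAMPLE_HTML)]
```

With the sample above, `values` holds one entry per matched cell, with the parenthesised cell converted to a negative float and the dash-only cell mapped to `None`. Whether the same pattern matches the real page depends on its actual markup, which is not verified here.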
【Comments】:
-
Why not use their API?
-
Or at least use something designed for screen scraping, such as Scrapy.
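
In the spirit of the comment above, here is a minimal sketch of collecting every table cell into a list with a real HTML parser instead of regular expressions. It deliberately swaps Scrapy for the standard library's `html.parser` to stay dependency-free, and it assumes Python 3 (in Python 2 the module is named `HTMLParser` and entity handling differs). `TableCollector` and `SAMPLE_HTML` are hypothetical names, and the sample markup is made up for illustration, not taken from the Yahoo page.

```python
from html.parser import HTMLParser  # Python 3 module name


class TableCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <strong> also ends up here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical markup standing in for a downloaded statement page.
SAMPLE_HTML = """
<table>
<tr><th>Period Ending</th><th>Dec 31, 2015</th></tr>
<tr><td>Net Income</td><td align="right">1,234&nbsp;&nbsp;</td></tr>
<tr><td>Capital Expenditures</td><td align="right"><strong>(5,678)</strong></td></tr>
</table>
"""

parser = TableCollector()
parser.feed(SAMPLE_HTML)
parser.close()
rows = parser.rows
```

Each element of `rows` is one table row as a list of strings, so totals wrapped in `<strong>` and plain entries come out the same way, and parenthesised negatives are kept as text for later conversion.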
Tags: python scraper yahoo-finance