Python web scraping stock charts: code gets stuck when a stock symbol is not found

【Question title】: Python web scrape stock charts, code stuck when stock symbol not found
【Posted】: 2019-03-25 06:29:24
【Question】:

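I have a list of stock symbols to run against this website, and I want to get the link to each stock chart.

However, when a symbol is wrong, the website redirects to another page, and Python never moves on to the remaining symbols.

My symbol list is: WOW, AAR, TPM

The error occurs at AAR.

Can anyone give this Py noob some guidance?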


from urllib import urlopen
from bs4 import BeautifulSoup
import re

newsymbolslist = ['WOW','AAR','TPM']

i=0

try:
    while i < len(newsymbolslist):
        try:
            html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+newsymbolslist[i])
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.find_all('img', {'src': re.compile('market')})
            for image in images:
                print (image['src'] + '\n')
                i += 1
        except:
            print "error"
            i += 1
except:
    pass

Ideally it would get the links to all the stock charts, tell me which symbol hit an error, and keep running through the remaining symbols.

Thanks

【Comments】:

Where is the link to the chart defined? Can you post a sample output?

Tags: python loops redirect web-scraping

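【Solution 1】:

No exception is raised when the symbol does not exist. In your code, i += 1 sits inside the for loop that iterates over the images found, and for AAR that list is empty, so the loop body never runs and i is never incremented. The while condition therefore stays true and the loop repeats the same symbol forever. Moving i += 1 into a finally block ensures it is always incremented.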

from urllib import urlopen
from bs4 import BeautifulSoup
import re
newsymbolslist = ['WOW','AAR','TPM']
i=0
try:
    while i < len(newsymbolslist):
        try:
            html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+newsymbolslist[i])
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.find_all('img', {'src': re.compile('market')})
            for image in images:
                print (image['src'] + '\n')      
        except Exception as e:
            print "error"
        finally:
            i += 1
except:
    pass
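
The effect of finally can be checked without touching the network. The following is a minimal Python 3 sketch, not the code from the answer: fake_find_images is a hypothetical stand-in for the urlopen and BeautifulSoup steps, the 'BAD' symbol and the example.invalid URLs are made up for illustration, and the empty list for AAR mirrors what the answer describes.

```python
# Hypothetical stand-in for the download-and-parse steps (not a real API).
def fake_find_images(symbol):
    if symbol == 'AAR':
        return []  # unknown symbol: no exception, just no matching images
    if symbol == 'BAD':
        raise IOError('simulated network failure')
    return ['http://example.invalid/market/' + symbol + '.gif']


def run(symbols):
    """Loop as in the answer: the index advances in finally, whatever happens."""
    visited = []
    i = 0
    while i < len(symbols):
        try:
            for src in fake_find_images(symbols[i]):
                print(src)
        except Exception as e:
            print('error for', symbols[i], e)
        finally:
            # Runs after a normal pass, an empty result, and an exception alike.
            visited.append(symbols[i])
            i += 1
    return visited


visited = run(['WOW', 'AAR', 'BAD', 'TPM'])
```

Every symbol ends up in visited exactly once, including the one with no images and the one that raised.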

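As an improvement, you can drop the while loop entirely by iterating over the list of symbols directly. Then you no longer need to worry about incrementing i: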

for symbol in newsymbolslist:
    try:
        html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+symbol)
        bs = BeautifulSoup(html, 'html.parser')
        images = bs.find_all('img', {'src': re.compile('market')})
        for image in images:
            print (image['src'] + '\n')      
    except Exception as e:
        print "error"
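
Because a missing symbol raises no exception, the except branch alone cannot report which symbol had no chart. One possible way to cover that, sketched in Python 3 under assumptions: collect_chart_links and its find_images parameter are hypothetical names, and find_images stands for any function that takes a symbol and returns a list of image URLs (for example the urlopen and find_all steps above). The stub and its example.invalid URLs are made up for illustration.

```python
def collect_chart_links(symbols, find_images):
    """Sort symbols into found, missing (no images) and failed (exception).

    find_images is an assumed callable: symbol -> list of image URLs.
    """
    found, missing, failed = {}, [], {}
    for symbol in symbols:
        try:
            links = find_images(symbol)
        except Exception as e:
            failed[symbol] = str(e)  # real errors, e.g. network problems
            continue
        if links:
            found[symbol] = links
        else:
            missing.append(symbol)  # page loaded but held no chart image
    return found, missing, failed


# Stub data for illustration only; the URLs are not real.
_stub_pages = {
    'WOW': ['http://example.invalid/market/WOW.gif'],
    'AAR': [],
    'TPM': ['http://example.invalid/market/TPM.gif'],
}

found, missing, failed = collect_chart_links(
    ['WOW', 'AAR', 'TPM'], lambda symbol: _stub_pages[symbol])
print('no chart found for:', missing)
```

Keeping the fetch step behind a parameter also makes the loop easy to test without hitting the site.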

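【Comments】:

【Solution 2】:

There is a logic error: i += 1 is inside the for loop, so it never runs when no images are found. Here is a change that I think will get you unstuck.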

from urllib import urlopen
from bs4 import BeautifulSoup
import re

newsymbolslist = ['WOW','AAR','TPM']

i=0

try:
    while i < len(newsymbolslist):
        try:
            html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+newsymbolslist[i])
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.find_all('img', {'src': re.compile('market')})
            for image in images:
                print (image['src'] + '\n')
            i += 1
        except:
            print "error"
            i += 1
except:
    pass

This might be a bit simpler:

from urllib import urlopen
from bs4 import BeautifulSoup
import re

newsymbolslist = ['WOW','AAR','TPM']

try:
    for symbol in newsymbolslist:
        try:
            html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+symbol)
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.find_all('img', {'src': re.compile('market')})
            for image in images:
                print (image['src'] + '\n')
        except:
            print "error"
except:
    pass

【Comments】:

【Solution 3】:

Slightly more concise, and it reuses the existing connection:

import requests
from bs4 import BeautifulSoup

newSymbolsList = ['WOW','AAR','TPM']

with requests.Session() as s:
    for symbol in newSymbolsList:
        try:
            html = s.get('http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+ symbol).content
            bs = BeautifulSoup(html, 'lxml')
            images = [img['src'] for img in bs.select('img[src*=market]')]
            print(images)
        except Exception as e:
            print("error", e)
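
The question mentions that a bad symbol redirects to another page. If that is an HTTP redirect (an assumption; a client-side redirect would not show up this way), requests records it in the response's history list, which gives a second way to flag bad symbols. The sketch below is Python 3; fetch_page and main are hypothetical names, and the timeout value is arbitrary.

```python
BASE_URL = ('http://bigcharts.marketwatch.com/quickchart/'
            'quickchart.asp?symb=AU%3A')


def fetch_page(session, url, timeout=10):
    """Return (html_bytes, redirected) for url.

    session is expected to behave like requests.Session.
    """
    resp = session.get(url, timeout=timeout)
    resp.raise_for_status()  # turn HTTP 4xx/5xx replies into exceptions
    # resp.history is non-empty when one or more HTTP redirects were followed.
    return resp.content, bool(resp.history)


def main(symbols=('WOW', 'AAR', 'TPM')):
    """Run against the live site; needs requests and bs4 installed."""
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        for symbol in symbols:
            try:
                html, redirected = fetch_page(s, BASE_URL + symbol)
            except requests.RequestException as e:
                print('error', symbol, e)
                continue
            bs = BeautifulSoup(html, 'html.parser')
            images = [img['src'] for img in bs.select('img[src*=market]')]
            if redirected or not images:
                print('no chart for', symbol)
            else:
                print(symbol, images)
```

Call main() to try it. Checking both the redirect flag and the empty image list is deliberate, since it is not confirmed here which of the two the site actually produces.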

【Comments】: