【Question Title】: Python web scrape stock charts, code stuck when stock symbol not found
【Posted】: 2019-03-25 06:29:24
【Question】:

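I have a list of stock symbols to run against this website, and I want to get the links to the stock charts.

However, when a symbol is wrong, the website redirects to another page, and Python stops processing the remaining symbols.

My symbol list is: WOW, AAR, TPM

The error happens at AAR.

Can anyone give this Py noob some guidance?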


from urllib import urlopen
from bs4 import BeautifulSoup
import re

newsymbolslist = ['WOW','AAR','TPM']

i=0

try:
    while i < len(newsymbolslist):
        try:
            html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+newsymbolslist[i])
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.find_all('img', {'src': re.compile('market')})
            for image in images:
                print (image['src'] + '\n')
                i += 1
        except:
            print "error"
            i += 1
except:
    pass

Ideally, it would get all the stock chart links, tell me which stock symbol ran into an error, and keep running for the remaining symbols.

Thanks

【Discussion】:

  • Where are the chart links defined? Can you post a sample output?

Tags: python loops redirect web-scraping


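【Solution 1】:

No exception is raised when the symbol doesn't exist. That means i is never incremented, because the increment sits inside the for loop over the images that were found (for AAR that is just an empty list). As a result, i never satisfies the condition that ends the while loop, so the loop runs forever. Moving i += 1 into a finally block ensures it is always incremented.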

from urllib import urlopen
from bs4 import BeautifulSoup
import re
newsymbolslist = ['WOW','AAR','TPM']
i=0
try:
    while i < len(newsymbolslist):
        try:
            html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+newsymbolslist[i])
            bs = BeautifulSoup(html, 'html.parser')
            images = bs.find_all('img', {'src': re.compile('market')})
            for image in images:
                print (image['src'] + '\n')      
        except Exception as e:
            print "error"
        finally:
            i += 1
except:
    pass
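To see the hang described above without touching the network, here is a minimal Python 3 sketch. FAKE_RESULTS and find_images are hypothetical stand-ins for the urlopen + BeautifulSoup step (AAR returns an empty list, like the redirected page), and max_iterations exists only so the demo terminates; the original code has no such guard.

```python
# Hypothetical stand-in for the urlopen + BeautifulSoup step: maps each symbol
# to the chart image URLs its page would yield. "AAR" yields nothing, which is
# what the redirected page looks like to the scraper (no exception is raised).
FAKE_RESULTS = {
    "WOW": ["http://example.com/market/WOW.gif"],
    "AAR": [],
    "TPM": ["http://example.com/market/TPM.gif"],
}


def find_images(symbol):
    """Return the (simulated) chart image URLs for one symbol."""
    return FAKE_RESULTS.get(symbol, [])


def buggy_loop(symbols, max_iterations=10):
    """The question's pattern: i only advances inside the for loop.

    max_iterations is a safety guard so this demo terminates; the original
    code has no such guard and spins forever on a symbol with no images.
    """
    i = 0
    iterations = 0
    while i < len(symbols):
        iterations += 1
        if iterations > max_iterations:
            return "stuck at " + symbols[i]
        for _ in find_images(symbols[i]):
            i += 1  # runs once per image, zero times for an empty list
    return "finished"


def fixed_loop(symbols):
    """The finally part of the fix: i advances exactly once per symbol."""
    found = []
    i = 0
    while i < len(symbols):
        try:
            found.extend(find_images(symbols[i]))
        finally:
            i += 1
    return found
```

buggy_loop(['WOW', 'AAR', 'TPM']) gives up with "stuck at AAR", while fixed_loop returns the WOW and TPM URLs. The sketch also exposes a second problem with the original: because the increment runs once per image, a page with two matching images would advance i twice and silently skip the next symbol.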

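As an improvement, you can remove the while loop entirely by iterating over your list of symbols directly. Then you don't have to worry about incrementing i at all.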

for symbol in newsymbolslist:
    try:
        html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+symbol)
        bs = BeautifulSoup(html, 'html.parser')
        images = bs.find_all('img', {'src': re.compile('market')})
        for image in images:
            print (image['src'] + '\n')      
    except Exception as e:
        print "error"
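Taking the for loop one step closer to what the question asks for (report which symbol failed and keep going), here is a Python 3 sketch. Since an unknown symbol raises nothing, the empty image list itself has to be treated as the failure signal. collect_chart_links and the injected find_images callable are hypothetical names; in real use find_images would wrap the urlopen + BeautifulSoup lines above and return the list of src values.

```python
from typing import Callable, Dict, List, Tuple


def collect_chart_links(
    symbols: List[str],
    find_images: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Visit every symbol once and never stop on a bad one.

    find_images is whatever fetches and parses one symbol's page.
    Returns (links, failures): links maps good symbols to their image URLs,
    failures maps bad symbols to a short reason.
    """
    links: Dict[str, List[str]] = {}
    failures: Dict[str, str] = {}
    for symbol in symbols:  # no index to forget to increment
        try:
            images = find_images(symbol)
        except Exception as exc:  # e.g. network or HTTP errors
            failures[symbol] = "error: %s" % exc
            continue
        if not images:
            # An unknown symbol raises nothing; the signal visible here is
            # that the page it lands on contains no matching chart images.
            failures[symbol] = "no chart images found"
        else:
            links[symbol] = images
    return links, failures
```

Printing failures at the end then tells you exactly which symbols need checking, while every other symbol is still processed.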

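【Discussion】:

    【Solution 2】:

    There is a logic error: i += 1 sits inside the for loop, so it never runs for a symbol whose page has no matching images. Here is a change (the increment now runs once per symbol, after the for loop) that I think will get you unstuck.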

    from urllib import urlopen
    from bs4 import BeautifulSoup
    import re
    
    newsymbolslist = ['WOW','AAR','TPM']
    
    i=0
    
    try:
        while i < len(newsymbolslist):
            try:
                html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+newsymbolslist[i])
                bs = BeautifulSoup(html, 'html.parser')
                images = bs.find_all('img', {'src': re.compile('market')})
                for image in images:
                    print (image['src'] + '\n')
                i += 1
            except:
                print "error"
                i += 1
    except:
        pass
    

    This might be a bit simpler:

    from urllib import urlopen
    from bs4 import BeautifulSoup
    import re
    
    newsymbolslist = ['WOW','AAR','TPM']
    
    try:
        for symbol in newsymbolslist:
            try:
                html = urlopen( 'http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+symbol)
                bs = BeautifulSoup(html, 'html.parser')
                images = bs.find_all('img', {'src': re.compile('market')})
                for image in images:
                    print (image['src'] + '\n')
            except:
                print "error"
    except:
        pass
    

    【Discussion】:

      【Solution 3】:

      Slightly more concise, and it reuses the existing connection through a requests.Session:

      import requests
      from bs4 import BeautifulSoup
      
      newSymbolsList = ['WOW','AAR','TPM']
      
      with requests.Session() as s:
          for symbol in newSymbolsList:
              try:
                  html = s.get('http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A'+ symbol).content
                  bs = BeautifulSoup(html, 'lxml')
                  images = [img['src'] for img in bs.select('img[src*=market]')]
                  print(images)
              except Exception as e:
                  print("error", e)
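For a dependency-free variant, here is a rough Python 3 sketch using only the standard library. Two things are swapped in and are not what the answers above use: html.parser.HTMLParser replaces BeautifulSoup (the hypothetical MarketImageParser mimics the `img[src*=market]` selector), and the redirect the question describes is detected by comparing urlopen's final URL (geturl()) with the requested one. That comparison is a heuristic that assumes a valid symbol's page is served without any redirect, and decoding the page as UTF-8 is also an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the question; "AU:" is URL-encoded as AU%3A.
BASE_URL = "http://bigcharts.marketwatch.com/quickchart/quickchart.asp?symb=AU%3A"


class MarketImageParser(HTMLParser):
    """Collect src attributes of <img> tags whose src contains 'market'.

    Rough stdlib stand-in for bs.select('img[src*=market]').
    """

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <img ... /> tags also end up here via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src and "market" in src:
                self.sources.append(src)


def extract_market_images(html):
    """Return the matching image URLs found in an HTML string."""
    parser = MarketImageParser()
    parser.feed(html)
    parser.close()
    return parser.sources


def fetch_chart_links(symbol, timeout=10):
    """Fetch one symbol's page; return (image_urls, redirected).

    urlopen follows redirects silently, so the final URL is compared with
    the requested one to spot the redirect (heuristic, see above).
    """
    url = BASE_URL + quote(symbol)
    with urlopen(url, timeout=timeout) as response:
        redirected = response.geturl() != url
        html = response.read().decode("utf-8", errors="replace")
    return extract_market_images(html), redirected
```

Inside the for loop over newSymbolsList, you would call fetch_chart_links(symbol), catch exceptions as Solution 3 does, and treat `redirected or not images` as "symbol not found"; assuming the site still behaves as the question describes, that reports AAR and carries on with TPM.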
      

      【Discussion】:
