【Question title】: Python loop in beautiful soup regex list [duplicate]
【Posted】: 2018-03-03 06:14:08
【Question】:

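When I run the code below, I get three lists printed one below the other, vertically. I want them to be horizontal, with the values separated by commas (similar to the output of the final `print(list)` statement, where the data is comma-separated). I have tried rearranging the for loops and got various combinations, but never the layout described above. Please help!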

import bs4 as bs
import urllib.request
import re

sauce = urllib.request.urlopen('http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3').read()
soup = bs.BeautifulSoup(sauce,'lxml')

regexQ = re.compile('.*Date1 Qty.*')
regexC = re.compile('.*Footnote.*')
regexV = re.compile('.*Date1 Val.*')

for countryPart in soup.findAll("a",{"href":regexC}):
        Country = countryPart.text.strip()
        print(Country)
for DatePart in soup.findAll("td",{"headers":regexQ}):
        Quantity = DatePart.text.strip()
        print(Quantity)
for ValPart in soup.findAll("td",{"headers": regexV}):
        Value = ValPart.text.strip()
        print(Value)

list = [Country,Quantity,Value]
print(list)

【Discussion】:

    Tags: regex python-3.x loops beautifulsoup


    【Solution 1】:

    Take a look at list comprehensions.

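    Also, BeautifulSoup applies a compiled regex with its `search()` method, which finds a match anywhere in the attribute value, so you don't need `.*` at either end of the pattern.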
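A minimal stdlib-only sketch of why the `.*` is unnecessary: `search()` scans the whole string, while `match()` is anchored at position 0. The header string below is a made-up example, not a value taken from the StatCan page.

```python
import re

regexQ = re.compile('Date1 Qty')
header = 'Country Date1 Qty'  # hypothetical value of a td "headers" attribute

# search() looks anywhere in the string, so the bare pattern is enough
print(bool(regexQ.search(header)))  # True

# match() only tries position 0; this is the case where a leading .* is needed
print(bool(regexQ.match(header)))  # False
print(bool(re.compile('.*Date1 Qty').match(header)))  # True
```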

    Use this to get what you want:

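    # Continues from the question: `re` is imported and `soup` is already built.
    # BeautifulSoup applies each pattern with search(), so no .* is needed.
    regexQ = re.compile('Date1 Qty')
    regexC = re.compile('Footnote')
    regexV = re.compile('Date1 Val')
    
    # One list per column
    country = [x.text.strip() for x in soup.find_all("a", {"href": regexC})]
    quantity = [x.text.strip() for x in soup.find_all("td", {"headers": regexQ})]
    value = [x.text.strip() for x in soup.find_all("td", {"headers": regexV})]
    
    # zip() pairs the i-th country with the i-th quantity and value.
    # Don't assign to the name `list` (as the question does), or list(x) fails.
    total_list = [list(x) for x in zip(country, quantity, value)]
    for item in total_list:
        print(item)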
    

    Output:

    ['World', '282,911,404', '67,284,637']
    ['Equatorial Guinea', '146,027,530', '40,493,766']
    ['Trinidad and Tobago', '136,883,464', '26,790,695']
    ['Japan', '410', '176']
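`zip()` lines the rows up by position, and it stops at the shortest input. So if one `find_all()` call returns an extra or missing element, rows can end up silently misaligned or dropped instead of raising an error. A small offline sketch using the values from the output above, with an explicit length check added as a safeguard (that check is my own suggestion, not part of the answer):

```python
country = ['World', 'Equatorial Guinea', 'Trinidad and Tobago', 'Japan']
quantity = ['282,911,404', '146,027,530', '136,883,464', '410']
value = ['67,284,637', '40,493,766', '26,790,695', '176']

# zip() stops at the shortest list, so verify the columns match first
if not len(country) == len(quantity) == len(value):
    raise ValueError("column lists have different lengths")

rows = [list(x) for x in zip(country, quantity, value)]
for row in rows:
    print(row)

# With one column shorter, zip() quietly drops the unmatched rows
short = list(zip(country, quantity[:2], value))
print(len(short))  # 2
```

On Python 3.10 and later, `zip(..., strict=True)` performs the same length check by raising `ValueError`.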
    

    【Discussion】:

      【Solution 2】:

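      You can do this without regex. Instead of collecting each column separately, the code below loops over the table rows (elements with class `ResultRow`) and uses a list comprehension to take three cells from each row. Try either of the following: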

      Using urllib:

      from urllib.request import urlopen
      from bs4 import BeautifulSoup
      
      res = urlopen("http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3")
      soup = BeautifulSoup(res.read(),"lxml")
      for items in soup.find_all(class_="ResultRow"):
          # [1:4] skips the first cell and keeps the next three (country, quantity, value)
          data = [item.get_text(" ",strip=True) for item in items.find_all(["th","td"])[1:4]]
          print(data)
      

      Using requests:

      import requests
      from bs4 import BeautifulSoup
      
      res = requests.get("http://www5.statcan.gc.ca/cimt-cicm/topNCountryCommodities-marchandises?lang=eng&chapterId=27&sectionId=0&refMonth=2&refYr=2017&freq=6&countryId=999&usaState=0&provId=1&arrayId=9900000&commodityId=271111&commodityName=Natural+gas%2C+liquefied&topNDefault=10&tradeType=3")
      soup = BeautifulSoup(res.text,"lxml")
      for items in soup.find_all(class_="ResultRow"):
          # [1:4] skips the first cell and keeps the next three (country, quantity, value)
          data = [item.get_text(" ",strip=True) for item in items.find_all(["th","td"])[1:4]]
          print(data)
      

      Output:

      ['World', '282,911,404', '67,284,637']
      ['Equatorial Guinea', '146,027,530', '40,493,766']
      ['Trinidad and Tobago', '136,883,464', '26,790,695']
      ['Japan', '410', '176']
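Working row by row keeps each country, quantity and value together, because all three cells come from the same table row. The sketch below runs offline against a small made-up table. The column layout (a rank cell first, then country, quantity and value) is only an assumption chosen to fit the `[1:4]` slice above, not the actual markup of the StatCan page. It uses Python's built-in `html.parser`, so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; the real page's markup may differ
html = """
<table>
  <tr class="ResultRow"><td>1</td><th>World</th><td>282,911,404</td><td>67,284,637</td></tr>
  <tr class="ResultRow"><td>2</td><th>Japan</th><td>410</td><td>176</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for row in soup.find_all(class_="ResultRow"):
    # A list of tag names matches any of them, in document order;
    # [1:4] drops the first cell and keeps the next three
    cells = row.find_all(["th", "td"])[1:4]
    rows.append([cell.get_text(" ", strip=True) for cell in cells])

for data in rows:
    print(data)
```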
      

      【Discussion】:

        【Solution 3】:

        Try combining your country, quantity and value results into one list of rows.

        Then try this:

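        # `lists` holds one [country, quantity, value] entry per row,
        # e.g. the rows built with zip() in Solution 1
        for mylist in lists:
            print(*mylist, sep=", ")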
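`print(*row, sep=", ")` unpacks a row into separate arguments and puts `", "` between them, so each row still ends with a newline and prints on its own line. `end=", "` only replaces that trailing newline, so every row runs together on a single line with spaces between the items. A small sketch that captures both outputs in `StringIO` buffers so they can be compared (the rows are taken from the outputs above):

```python
import io

lists = [
    ['World', '282,911,404', '67,284,637'],
    ['Japan', '410', '176'],
]

per_line = io.StringIO()
for mylist in lists:
    # sep goes between the unpacked items; the line still ends with "\n"
    print(*mylist, sep=", ", file=per_line)
print(per_line.getvalue(), end="")

same_line = io.StringIO()
for mylist in lists:
    # end replaces the newline, so all rows land on one line
    print(*mylist, end=", ", file=same_line)
```

Because the quantities already contain thousands separators, comma-joined output is hard to parse back. If the rows are headed for a file, Python's `csv` module, which quotes fields containing the delimiter, may be a better fit.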
        

        【Discussion】: