【Question title】: Python Web Scraper print issue
【Posted】: 2016-09-02 15:10:25
【Question】:

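I made a web scraper in Python. At the end I want to print the downloaded text with a label in front of it, i.e. ("Bakerloo:" + info_from_website), as you can see in the code. However, it always prints only info_from_website and ignores the "Bakerloo:" string. I can't find a solution anywhere.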

import urllib.request
from bs4 import BeautifulSoup

url = 'https://tfl.gov.uk/tube-dlr-overground/status/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

try:
    # First try the class string without the "disrupted" marker
    bakerlooInfo = soup.find('li', {"class": "rainbow-list-item bakerloo "}).find_all('span')[2].text
except:
    # Fallback: class string used when the line is marked as disrupted
    bakerlooInfo = soup.find('li', {"class": "rainbow-list-item bakerloo disrupted expandable "}).find_all('span')[2].text

bakerloo = bakerlooInfo.replace('\n', '')
print("Bakerloo     : " + bakerloo)

【Question discussion】:

    Tags: python python-3.x web-scraping


    【Solution 1】:

    I would use a CSS selector instead and grab the element with the disruption-summary class:

    import requests
    from bs4 import BeautifulSoup
    
    url = 'https://tfl.gov.uk/tube-dlr-overground/status/'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    
    service = soup.select_one('li.bakerloo .disruption-summary').get_text(strip=True)
    print("Bakerloo: " + service)
    

    Prints:

    Bakerloo: Good service
    

    (This uses requests instead of urllib.request.)
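A plausible explanation for the missing "Bakerloo:" label, not confirmed in the original thread: if the scraped text contains carriage returns ("\r", for example from "\r\n" line endings), replace('\n', '') leaves them in place. When printed, a terminal moves the cursor back to the start of the line on each "\r", so the status text overwrites the label. The value of raw below is a hypothetical stand-in for the scraped text. get_text(strip=True) in the answer avoids the problem because it strips surrounding whitespace, "\r" included.

```python
# Hypothetical scraped value: assumes the page text carries "\r\n" line endings.
raw = "\r\n            Good service\r\n        "

# The question's cleanup removes "\n" but leaves every "\r" in place.
after_replace = raw.replace("\n", "")
line = "Bakerloo     : " + after_replace

# repr() shows the hidden carriage returns instead of letting the terminal
# act on them (printing `line` directly would hide the label).
print(repr(line))

# str.strip() removes leading/trailing whitespace, including "\r" and "\n";
# get_text(strip=True) applies the same kind of stripping to each text piece.
cleaned = raw.strip()
print("Bakerloo     : " + cleaned)
```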


    Note that if you want to list every line together with its disruption summary, you can do this:

    import requests
    from bs4 import BeautifulSoup
    
    url = 'https://tfl.gov.uk/tube-dlr-overground/status/'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    
    for station in soup.select("#rainbow-list-tube-dlr-overground-tflrail-tram ul li"):
        station_name = station.select_one(".service-name").get_text(strip=True)
        service_info = station.select_one(".disruption-summary").get_text(strip=True)
    
        print(station_name + ": " + service_info)
    

    Prints:

    Bakerloo: Good service
    Central: Good service
    Circle: Good service
    District: Good service
    Hammersmith & City: Good service
    Jubilee: Good service
    Metropolitan: Good service
    Northern: Good service
    Piccadilly: Good service
    Victoria: Good service
    Waterloo & City: Good service
    London Overground: Good service
    TfL Rail: Good service
    DLR: Good service
    Tram: Good service
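The loop above assumes every li contains both a .service-name and a .disruption-summary element. select_one returns None when nothing matches, so a single incomplete item would raise AttributeError. The sketch below runs against a small hypothetical HTML snippet instead of the live page. The markup, the "Minor delays" text and the helper name service_statuses are all made up for illustration, and the real TfL page has very likely changed since 2016. It shows two things: the class selector li.bakerloo matches whether or not extra classes such as "disrupted expandable" are present, which removes the need for the question's try/except, and a None check skips incomplete items.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the selectors used in the answer.
HTML = """
<div id="rainbow-list-tube-dlr-overground-tflrail-tram">
  <ul>
    <li class="rainbow-list-item bakerloo disrupted expandable">
      <span class="service-name">Bakerloo</span>
      <span class="disruption-summary">
        Minor delays
      </span>
    </li>
    <li class="rainbow-list-item central">
      <span class="service-name">Central</span>
      <span class="disruption-summary">Good service</span>
    </li>
    <li class="rainbow-list-item tram">
      <span class="service-name">Tram</span>
    </li>
  </ul>
</div>
"""


def service_statuses(html):
    """Return (name, status) pairs, skipping items that lack either element."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select("#rainbow-list-tube-dlr-overground-tflrail-tram ul li"):
        name = item.select_one(".service-name")
        summary = item.select_one(".disruption-summary")
        # select_one returns None when nothing matches; check before get_text.
        if name is None or summary is None:
            continue
        results.append((name.get_text(strip=True), summary.get_text(strip=True)))
    return results


soup = BeautifulSoup(HTML, "html.parser")
# "li.bakerloo" matches on one class, so extra classes do not break the lookup.
bakerloo = soup.select_one("li.bakerloo .disruption-summary")
print("Bakerloo: " + bakerloo.get_text(strip=True))

for name, status in service_statuses(HTML):
    print(name + ": " + status)
```

Skipping incomplete items is one design choice; logging them instead would make layout changes on the real page easier to notice.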
    
