【Question title】: How do I scrape real time public transport timings using Python?
【Posted】: 2019-01-20 00:48:21
【Question】:

https://www.ptv.vic.gov.au/next5/diva/10018306/line/9777/2

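I am trying to get the departure times and destinations, but the page refreshes every 60 seconds and I can't get that information.

This is what I have tried so far: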

from bs4 import BeautifulSoup
import requests
from user_agent import generate_user_agent

# Build a desktop browser User-Agent header and actually send it with the request
headers = {'User-Agent': generate_user_agent(device_type="desktop", os=('mac', 'linux'))}
url = 'https://www.ptv.vic.gov.au/next5/diva/10004556/line/11613/2'
response = requests.get(url, headers=headers)

# Parse the returned HTML and look for the timetable containers
html_soup = BeautifulSoup(response.text, 'html.parser')
datatest = html_soup.find_all('div', class_='timetable')
print(type(datatest))
print(len(datatest))

I would like to get at least the next 3 upcoming times and destinations from the site.

【Question comments】:

    Tags: python python-3.x web-scraping beautifulsoup python-requests


    【Solution 1】:

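    The page updates its live data once a minute by making a JSON request. Extracting the information from that JSON is easier than trying to scrape it from the rendered HTML. For example: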

    from datetime import datetime
    import requests
    
    # Request the same JSON the page itself loads for this stop
    r = requests.get("https://www.ptv.vic.gov.au/langsing/stop-services?stopId=10018306&direction=Altona&limit=20&mode=2")
    r.raise_for_status()   # stop early on an HTTP error instead of failing in r.json()
    json_reply = r.json()
    
    for value in json_reply['values']:
        # Timetabled departure time, given as a UTC timestamp string
        dt_departing = datetime.strptime(value['time_timetable_utc'], '%Y-%m-%dT%H:%M:%SZ')
        departing = dt_departing.strftime("%I:%M%p")   # 12-hour format
        line_name = value['platform']['direction']['line']['line_name']
        print(f'{departing} - {line_name}')
    

    This would give you output starting:

    05:57PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    06:14PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    06:31PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    06:41PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    06:57PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    07:09PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    07:20PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    07:30PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    07:42PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    07:51PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    08:06PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    08:20PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    08:32PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    08:44PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    08:59PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    09:14PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    09:30PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    09:45PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    10:00PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    10:15PM - 903 - Altona - Mordialloc (SMARTBUS Service)
    10:36PM - 706 - Mordialloc - Aspendale - Edithvale - Chelsea
    01:32AM - 706 - Mordialloc - Aspendale - Edithvale - Chelsea
    02:51AM - 706 - Mordialloc - Aspendale - Edithvale - Chelsea
    10:36PM - 706 - Mordialloc - Aspendale - Edithvale - Chelsea
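
The question asks for only the next 3 departures, so the loop above can be narrowed to entries that have not yet departed. The following is a minimal sketch under stated assumptions: `next_departures` and `make_value` are hypothetical helper names, and `sample` is made-up data that only imitates the `values` / `time_timetable_utc` / `line_name` fields used above, not a real reply from the site.

```python
from datetime import datetime, timezone

def next_departures(payload, now, limit=3):
    """Return up to `limit` (departure, line_name) pairs at or after `now`."""
    upcoming = []
    for value in payload['values']:
        # Parse the timestamp and mark it as UTC so it can be compared with `now`
        departing = datetime.strptime(
            value['time_timetable_utc'], '%Y-%m-%dT%H:%M:%SZ'
        ).replace(tzinfo=timezone.utc)
        if departing >= now:
            line_name = value['platform']['direction']['line']['line_name']
            upcoming.append((departing, line_name))
    # Sort by time so the result does not depend on the order of the reply
    upcoming.sort(key=lambda pair: pair[0])
    return upcoming[:limit]

def make_value(timestamp, line_name):
    """Build one made-up entry with only the fields used above (illustration only)."""
    return {
        'time_timetable_utc': timestamp,
        'platform': {'direction': {'line': {'line_name': line_name}}},
    }

# Hypothetical sample data, not captured from the site
sample = {'values': [
    make_value('2019-01-20T06:57:00Z', '903 - Altona - Mordialloc (SMARTBUS Service)'),
    make_value('2019-01-20T07:14:00Z', '903 - Altona - Mordialloc (SMARTBUS Service)'),
    make_value('2019-01-20T07:31:00Z', '903 - Altona - Mordialloc (SMARTBUS Service)'),
    make_value('2019-01-20T07:41:00Z', '903 - Altona - Mordialloc (SMARTBUS Service)'),
    make_value('2019-01-20T11:36:00Z', '706 - Mordialloc - Aspendale - Edithvale - Chelsea'),
]}

# A fixed "now" keeps the example reproducible
now = datetime(2019, 1, 20, 7, 0, tzinfo=timezone.utc)
result = next_departures(sample, now)
for departing, line_name in result:
    print(f'{departing.strftime("%I:%M%p")} - {line_name}')
```

With a live reply you would pass `r.json()` as `payload` and `datetime.now(timezone.utc)` as `now`.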
    

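    The URL was found by looking at the requests the browser makes every 60 seconds. You can easily adjust how the time is displayed by changing the format string, for example use "%A %I:%M%p" to include the day of the week.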
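
As a small illustration of the format string, the sketch below formats a single timestamp in a few ways. `format_departure` is a hypothetical helper, the timestamp is made up, and the fixed UTC+11 offset is only an assumption standing in for Melbourne daylight time; a proper time zone database would be the more robust choice for real use.

```python
from datetime import datetime, timedelta, timezone

def format_departure(utc_string, fmt="%I:%M%p", tz=None):
    """Format a 'YYYY-MM-DDTHH:MM:SSZ' timestamp, optionally converted to `tz`."""
    departing = datetime.strptime(utc_string, '%Y-%m-%dT%H:%M:%SZ')
    departing = departing.replace(tzinfo=timezone.utc)   # the field name says UTC
    if tz is not None:
        departing = departing.astimezone(tz)
    return departing.strftime(fmt)

stamp = '2019-01-20T06:57:00Z'   # made-up example timestamp

print(format_departure(stamp))                   # 06:57AM
print(format_departure(stamp, "%A %I:%M%p"))     # Sunday 06:57AM

# Assumption: a fixed UTC+11 offset, used here only to show the conversion step
melbourne_summer = timezone(timedelta(hours=11))
print(format_departure(stamp, "%A %I:%M%p", melbourne_summer))   # Sunday 05:57PM
```

The day and AM/PM names come from the current locale, so the outputs shown assume Python's default locale.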

    【Comments】:
