【Question title】: Can't extract data using BeautifulSoup for JavaScript?
【Posted】: 2020-06-24 05:37:01
【Question description】:

Hi everyone, I am trying to extract data from https://newslab.malaysiakini.com/covid-19/en

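import requests
from bs4 import BeautifulSoup

page = requests.get("https://newslab.malaysiakini.com/covid-19/en")
soup = BeautifulSoup(page.content, "html.parser")

# The long string looks like a list of CSS classes rather than an id,
# so match it with a CSS selector instead of find(id=...).
option_tags = soup.select_one(".uk-grid.uk-grid-small.uk-width-auto.uk-flex.uk-flex-middle.uk-flex-center")

# select_one returns None when nothing matches; guard against that.
patient_items = option_tags.find_all(class_="patient") if option_tags else []

if patient_items:
    print(patient_items[0].prettify())
else:
    print("No patient elements found in the downloaded HTML")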

I can't extract the results. It seems that html.parser does not get the data I can see in Google Chrome's developer console. Can anyone help?

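【Comments】:

  • Which data do you want to extract?
  • I want to get the patient information: their age, gender, which hospital they are being treated at, and the other cases in the same cluster. @αԋɱҽԃαмєяιcαη

Tags: python web-scraping beautifulsoup html-parsing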


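【Solution 1】:

After the initial request to https://newslab.malaysiakini.com/covid-19/en, the site makes a large number of additional requests. Those additional URLs may contain what you are looking for.

The link below may contain all the information you are looking for except the GPS coordinates. The locations are harder to get: they appear to be compiled into some JavaScript and data tags.

https://m5.malaysiakini.com/en/tag/covid-19?alt=json contains, in JSON format, all the stories shown on the Google map/list. For example: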

{
            "title": "Tabligh particiapants: Foreigners the cause of Covid-19 spread, not fair to blame locals",
            "sid": 514832,
            "image_feat": ["https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg"],
            "image_feat_single": "https://i.newscdn.net/publisher-c1a3f893382d2b2f8a9aa22a654d9c97/2020/03/9b6ba685820341c1cfc4f7d7faff7ba0.jpg",
            "summary": "<p>Most of us went to the hospital for testing as soon we were given the directive, says a participant.</p>",
            "author": "",
            "author_array": [],
            "author_display": "no",
            "date_pub": 1584321043,
            "date_pub2": "1584321043000",
            "date_pubh": "2020-03-16 09:10:43+08:00",
            "category": "news",
            "comment_count": 0,
            "tags": ["health", "coronavirus", "covid-19", "tabligh gathering", "infection"],
            "free": false,
            "redirect": "",
            "date_modh": "2020-03-16 09:10:43+08:00"
        }
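
As a rough sketch of how that JSON could be consumed from Python, the snippet below requests the URL quoted above and pulls out a few fields from each story. Only a single item is shown in the answer, so the top-level layout of the response is an assumption; the hypothetical helper `iter_stories` walks the whole decoded structure and picks out any object carrying both `title` and `sid`. The names `iter_stories`, `summarize`, and `fetch_stories` are illustrative and not part of any library.

```python
from datetime import datetime, timezone

# URL quoted in the answer above.
STORIES_URL = "https://m5.malaysiakini.com/en/tag/covid-19?alt=json"


def iter_stories(payload):
    """Yield every dict that looks like a story (has "title" and "sid").

    Assumption: the response layout is unknown beyond the single item
    shown above, so the whole structure is searched recursively.
    """
    if isinstance(payload, dict):
        if "title" in payload and "sid" in payload:
            yield payload
        else:
            for value in payload.values():
                yield from iter_stories(value)
    elif isinstance(payload, list):
        for value in payload:
            yield from iter_stories(value)


def summarize(story):
    """Keep a few fields and convert the epoch timestamp to an ISO string in UTC."""
    published = datetime.fromtimestamp(story["date_pub"], tz=timezone.utc)
    return {
        "sid": story["sid"],
        "title": story["title"],
        "published_utc": published.isoformat(),
        "tags": story.get("tags", []),
    }


def fetch_stories(url=STORIES_URL):
    """Download the JSON document and return the summarized stories."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return [summarize(story) for story in iter_stories(response.json())]


if __name__ == "__main__":
    # Print the first few stories, if the endpoint is still available.
    for story in fetch_stories()[:5]:
        print(story["published_utc"], story["title"])
```

Because the endpoint already returns JSON, no HTML parsing is needed for this part; BeautifulSoup would only come back into play for fields such as `summary`, which contain HTML fragments.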

【Comments】:

  • Wow, how did you track that down? I really need to learn more skills from you, haha.