【Question title】:Is there a Python function to scrape different class names?
【Posted】:2020-07-17 17:28:03
【Question】:

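Here is the link: https://www.mobihealthnews.com/news?page=0

For each article on the news page, I am trying to scrape the article's title, its short summary, its link, its publication date, and the author's name.

I am running into trouble because the rows on the site carry different class names. For example: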

<div class="views-row views-row-1 views-row-odd views-row-first">...</div>
<div class="views-row views-row-2 views-row-even">...</div>
<div class="views-row views-row-3 views-row-odd">...</div>
<div class="views-row views-row-4 views-row-even">...</div>
<div class="views-row views-row-5 views-row-odd">...</div>
<div class="views-row views-row-6 views-row-even">...</div>
<div class="views-row views-row-7 views-row-odd">...</div>
<div class="views-row views-row-8 views-row-even">...</div>
<div class="views-row views-row-9 views-row-odd">...</div>
<div class="views-row views-row-10 views-row-even views-row-last">...</div>

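Is there a way to select these classes other than writing a long chain of if-else statements?

Additional information: I am currently using the BeautifulSoup4 and requests libraries.

Thanks in advance for your time.

Edit: here is my approach, but I am fairly sure something in the links variable has to change.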

soup=BeautifulSoup(page.text,'html.parser')
frame=[]
links=soup.find_all('div',attrs={'class':'group-left list-wrapper'})
print(len(links))
filename="mobi_health_news.csv"
f=open(filename,"w", encoding = 'utf-8')
headers="Title,Content,Date, Link, Author\n"
f.write(headers)

for j in links:
    Title = j.find("div",attrs={'class':'views-field views-field-title'}).text.strip()
    Link = "https://www.mobihealthnews.com"
    Link += j.find("div",attrs={'class':'views-field views-field-title'}).find('a')['href'].strip()
    Date = j.find('span',attrs={'class':'day_list'}).text.strip()
    Content = j.find('div', attrs={'class':'views-field views-field-body'}).text.strip()
    Author = j.find('span', attrs ={'class':'author_list'}).text.strip()
    frame.append((Title,Content,Date,Link,Author))
    f.write(Title.replace(",","^")+","+Link+","+Author.replace(",","^")+","+Content.replace(",","^")+","+Date.replace(",","^")+"\n")
upperframe.extend(frame)
f.close()

【Comments】:

  • Can you share your code? What have you tried?
  • @Umair I just edited my question.

Tags: html python-3.x beautifulsoup python-requests


【Solution 1】:

There is no need to match every class name listed in class="...". Just pick one class per field that identifies it uniquely.
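To see why a single class is enough, here is a minimal offline sketch. The HTML fragment is a hypothetical stand-in modelled on the markup in the question, not the real page. It shows that both a CSS class selector and find_all with class_ match an element when the given name is any one of its classes.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the markup in the question.
html = """
<div class="views-row views-row-1 views-row-odd views-row-first">first</div>
<div class="views-row views-row-2 views-row-even">second</div>
<div class="views-row views-row-3 views-row-odd views-row-last">third</div>
<div class="other">not a row</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: matches every element whose class list contains "views-row".
rows_css = soup.select(".views-row")

# find_all with class_ also compares against each individual class.
rows_find = soup.find_all("div", class_="views-row")

row_texts = [row.get_text(strip=True) for row in rows_css]
print(row_texts)  # ['first', 'second', 'third']
```

The extra names such as views-row-1 or views-row-odd do not need to be listed, so no if-else chain is required.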

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.mobihealthnews.com/news?page=0'    
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for row in soup.select('.group-left .views-row'):
    title = row.select_one('.views-field-title').get_text(strip=True)
    content = row.select_one('.views-field-body').get_text(strip=True)
    link = 'https://www.mobihealthnews.com' + row.a['href']
    dt = row.select_one('.day_list').get_text(strip=True)
    author = row.select_one('.author_list').get_text(strip=True)

    print(title)
    print(link)
    print(dt,'by', author)
    print(content)
    print('-' * 120)

Prints:

Vitls scores 510(k) clearance for continual and remote vital signs monitoring device
https://www.mobihealthnews.com/news/vitls-scores-510k-clearance-continual-and-remote-vital-signs-monitoring-device
July 16, 2020 by Mallory Hackett
The information is stored and sent to hospital systems and the Vitls app, so healthcare providers can monitor the vital signs of their patients in real time, no matter where they are.
------------------------------------------------------------------------------------------------------------------------
Walgreens, DoorDash partner on nonprescription delivery orders
https://www.mobihealthnews.com/news/walgreens-doordash-partner-non-prescription-delivery-orders
July 16, 2020 by Dave Muoio
Through the DoorDash app or website, consumers in certain cities can have over-the-counter medications and other products delivered to their homes.
------------------------------------------------------------------------------------------------------------------------
Roche, Genentech ink real-world data deal with PicnicHealth
https://www.mobihealthnews.com/news/roche-genentech-ink-real-world-data-deal-picnichealth
July 16, 2020 by Laura Lovett
The original focus will be on multiple sclerosis but will extend to include Huntington's disease and hemophilia.
------------------------------------------------------------------------------------------------------------------------
Teva Pharmaceuticals releases its prescription ProAir Digihaler in the US
https://www.mobihealthnews.com/news/teva-pharmaceuticals-releases-its-prescription-proair-digihaler-us
July 16, 2020 by Dave Muoio
The connected inhaler's launch will be followed by Teva's other Digihaler products before the end of the year.
------------------------------------------------------------------------------------------------------------------------
Health equity focused startup Cityblock lands $53.5M in funding
https://www.mobihealthnews.com/news/health-equity-focused-startup-cityblock-lands-535m-funding
July 16, 2020 by Laura Lovett
This comes a year after its last $63 million funding round.
------------------------------------------------------------------------------------------------------------------------
Roundup: Isle of Wight infections drop following launch of COVID-19 app, NHS Providers publish digital guide and more briefs
https://www.mobihealthnews.com/news/europe/roundup-isle-wight-infections-drop-following-launch-covid-19-app-nhs-providers-publish
July 16, 2020 by Sara Mageit
Also, a new study shows workers back restrictions on technology use since the rise of remote working.
------------------------------------------------------------------------------------------------------------------------
Mental health tech firm Meditopia scores $15 million in Series A round
https://www.mobihealthnews.com/news/europe/mental-health-tech-firm-meditopia-scores-15-million-series-round
July 16, 2020 by Tammy Lovell
The funds will be used to expand reach of its culturally-tailored mindfulness app.
------------------------------------------------------------------------------------------------------------------------
Oncoshot partners with MyDoc to offer second opinion advice for cancer patients
https://www.mobihealthnews.com/news/asia-pacific/oncoshot-partners-mydoc-offer-second-opinion-advice-cancer-patients
July 16, 2020 by Dean Koh
The service enables patients from the region to make informed decisions about cancer care, with the aim of expanding their treatment options and improving clinical outcomes.
------------------------------------------------------------------------------------------------------------------------
Tabula Rasa HealthCare launches MedWise to prevent adverse drug events
https://www.mobihealthnews.com/news/tabula-rasa-healthcare-launches-medwise-prevent-adverse-drug-events
July 15, 2020 by Mallory Hackett
With the service, pharmacists can compare multiple different medications and see how risky the combination is.
------------------------------------------------------------------------------------------------------------------------
Care coordination, telehealth startups merge to support vulnerable senior populations
https://www.mobihealthnews.com/news/care-coordination-telehealth-startups-merge-support-vulnerable-senior-populations
July 15, 2020 by Dave Muoio
Arkos Health will weave together Curavi Health, CarePointe and U.S. Health Systems' various care platforms for payer and provider customers.
------------------------------------------------------------------------------------------------------------------------
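
The question also wanted the fields saved to a CSV file. Below is a rough sketch of how the same selectors could feed Python's csv module, which quotes fields that contain commas so they do not have to be replaced with "^". The names parse_rows, write_csv and SAMPLE_HTML are hypothetical, and the sample markup is an assumption built from the class names above so the sketch runs without a network connection. The real page layout may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

BASE_URL = "https://www.mobihealthnews.com"
FIELDS = ["Title", "Content", "Date", "Link", "Author"]

# Hypothetical sample, assumed to resemble one row of the news listing.
SAMPLE_HTML = """
<div class="group-left list-wrapper">
  <div class="views-row views-row-1 views-row-odd views-row-first">
    <div class="views-field views-field-title">
      <a href="/news/example-article">Example, with a comma</a>
    </div>
    <span class="day_list">July 16, 2020</span>
    <span class="author_list">Jane Doe</span>
    <div class="views-field views-field-body">Short summary.</div>
  </div>
</div>
"""


def parse_rows(html):
    """Return one dict per article row found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.select(".group-left .views-row"):
        records.append({
            "Title": row.select_one(".views-field-title").get_text(strip=True),
            "Content": row.select_one(".views-field-body").get_text(strip=True),
            "Date": row.select_one(".day_list").get_text(strip=True),
            "Link": BASE_URL + row.a["href"],
            "Author": row.select_one(".author_list").get_text(strip=True),
        })
    return records


def write_csv(records, fileobj):
    """Write the records with a header row; csv handles the quoting."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


records = parse_rows(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, open the file with open("mobi_health_news.csv", "w", newline="", encoding="utf-8") and pass it to write_csv. For the live site, the HTML would come from requests.get(url).content as in the code above.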
