【Question title】: My list if/else statement only returns the "if" statement
【Posted】: 2020-07-30 09:26:24
【Question】:

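I want to extract this care home's details from the following URL: https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA

The information appears on the site in this format:

Group: Excelcare Holdings

Person in charge: Denise Marks (Registered Manager)

Local Authority / Social Services: London Borough of Tower Hamlets Council (click for contact details)

My get_deets function only outputs the first element of each of its lists, "tag" and "sibling". I want the full list of label texts and the corresponding information.

Script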

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as soup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:\Users\Main\Documents\Work\Projects\chromedriver')

my_url = "https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA"

def make_soup(url):
  driver.get(url)
  m_soup = soup(driver.page_source, features='html.parser')
  return m_soup 

main_page = make_soup(my_url)

strongs = main_page.select(".blue")

def get_deets(strongs):
    tag = []
    sibling = []
    for strong_tag in strongs:
        if strong_tag.next_sibling == '\n':
            tag.append(strong_tag.text)
            sibling.append(strong_tag.next_sibling.next_sibling.text)
        else:
            tag.append(strong_tag.text)
            sibling.append(strong_tag.next_sibling.strip())
        return tag, sibling

My current output:

get_deets(strongs)

    (['Group:'], ['Excelcare Holdings'])

Desired output:

tag

['Group:','Person in charge:', 'Local Authority / Social Services:'] 

sibling

['Excelcare Holdings',  'Denise Marks (Registered Manager)','London Borough of Tower Hamlets Council (click for contact details)' ]

Using this HTML:

<div class="profile-group-description col-xs-12 col-sm-8">

    <p><strong class="blue">Group:</strong>

        <a href="https://www.carehome.co.uk/care_search_results.cfm/searchgroup/36151505EXCA">Excelcare Holdings</a>
    </p>

    <p><strong class="blue">Person in charge:</strong>

      Denise Marks (Registered Manager)</p>

    <p><strong class="blue">Local Authority / Social Services:</strong> 
      London Borough of Tower Hamlets Council (<a href="https://www.carehome.co.uk/local-authorities/profile.cfm/id/Tower-Hamlets">click for contact details</a>)</p>

    <p>
        <strong class="blue">Type of Service:</strong>
      Care Home only (Residential Care) – Privately Owned , Registered for a maximum of 44 Service Users
    </p>

    <p>
        <strong class="blue">Registered Care Categories*:</strong> 
      Dementia • Learning Disability • Mental Health Condition • Old Age
    </p>

【Comments】:

    Tags: list if-statement web-scraping beautifulsoup append


    【Solution 1】:

    Given the HTML in your question, this can be simplified a bit:

    care = """[your HTML]"""
    
    from bs4 import BeautifulSoup as bs
    soup = bs(care, 'lxml')
    
    headers = []
    rows = []
    tags = soup.select('p')
    for tag in tags:
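        # Flatten the paragraph text, then split on the first colon only,
        # so a value that itself contains ':' is not cut short.
        items = tag.text.replace('\n', '').split(':', 1)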
        headers.append(items[0].strip())
        rows.append(items[1].strip())
    for h,r in zip(headers,rows):
        print(h,': ',r)
    

    Output:

    Group :  Excelcare Holdings
    Person in charge :  Denise Marks (Registered Manager)
    Local Authority / Social Services :  London Borough of Tower Hamlets Council (click for contact details)
    Type of Service :  Care Home only (Residential Care) – Privately Owned , Registered for a maximum of 44 Service Users
    Registered Care Categories* :  Dementia • Learning Disability • Mental Health Condition • Old Age
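
As for why the original get_deets returns only one pair: in the question's code the `return` is indented at the same level as the `if`/`else`, so it sits inside the `for` loop and the function exits after the first tag. Below is a minimal sketch of a corrected version that keeps the two-list shape from the question. It assumes the HTML shown in the question; the `HTML` constant is a trimmed copy used only for illustration, and joining all of `next_siblings` (instead of reading a single `next_sibling`) is a choice made here so that values mixing plain text and links come out whole.

```python
from bs4 import BeautifulSoup, Tag

# Trimmed copy of the HTML from the question (first three entries only).
HTML = """
<div class="profile-group-description col-xs-12 col-sm-8">
    <p><strong class="blue">Group:</strong>

        <a href="https://www.carehome.co.uk/care_search_results.cfm/searchgroup/36151505EXCA">Excelcare Holdings</a>
    </p>

    <p><strong class="blue">Person in charge:</strong>

      Denise Marks (Registered Manager)</p>

    <p><strong class="blue">Local Authority / Social Services:</strong>
      London Borough of Tower Hamlets Council (<a href="https://www.carehome.co.uk/local-authorities/profile.cfm/id/Tower-Hamlets">click for contact details</a>)</p>
</div>
"""


def get_deets(strongs):
    """Return (labels, values) for every <strong class="blue"> tag."""
    tags, siblings = [], []
    for strong_tag in strongs:
        tags.append(strong_tag.get_text(strip=True))
        # Collect everything that follows the <strong> inside the same <p>:
        # plain text nodes and nested tags such as <a>.
        parts = []
        for node in strong_tag.next_siblings:
            parts.append(node.get_text() if isinstance(node, Tag) else str(node))
        # Collapse newlines and runs of spaces into single spaces.
        siblings.append(" ".join("".join(parts).split()))
    # The return is outside the loop, so every tag is processed first.
    return tags, siblings


soup = BeautifulSoup(HTML, "html.parser")
tags, siblings = get_deets(soup.select(".blue"))
print(tags)
print(siblings)
```

With the live page, the same function should work on `main_page.select(".blue")` from the question, provided the page structure matches the HTML quoted there.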
    
