【Question title】: Python nested loops not working while parsing a site
【Posted】: 2019-06-27 03:46:16
【Question】:

I am parsing yelp.com and getting the names of the dishes with name_of_dishs=yelp_beat.findAll('div',{'class':'lemon--div__373c0__1mboc businessName__373c0__1fTgn border-color--default__373c0__2oFDT'}) (Soco, SalaThai, Bunker). I also need the reviews for each dish, but it does not work when I use a nested loop:

import requests
from bs4 import BeautifulSoup

base_url = "https://www.yelp.com/search?find_desc=Restaurants&find_loc=New%20York%2CNY&start=30"



yelp = requests.get(base_url)
yelp_beat = BeautifulSoup(yelp.text, 'html.parser')

name_of_dishs=yelp_beat.findAll('div',{'class':'lemon--div__373c0__1mboc businessName__373c0__1fTgn border-color--default__373c0__2oFDT'})
for dish in name_of_dishs:
    #print(dish.text)
    for reviews in dish.findAll('span',{'lemon--span__373c0__3997G text__373c0__2pB8f reviewCount__373c0__2r4xT text-color--mid__373c0__3G312 text-align--left__373c0__2pnx_'}):
        print(reviews.text)

【Comments】:

    Tags: python-3.x parsing beautifulsoup html-parsing


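    【Solution 1】:

    The inner findAll call is missing class as the attribute key. I have also simplified the selectors and select the li elements instead: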

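    import requests
    from bs4 import BeautifulSoup

    base_url = "https://www.yelp.com/search?find_desc=Restaurants&find_loc=New%20York%2CNY&start=30"
    yelp = requests.get(base_url)
    yelp_beat = BeautifulSoup(yelp.text, 'html.parser')

    # Select the li elements inside the main content container
    theList = yelp_beat.select('.mainContentContainer__373c0__32Mqa .domtags--li__373c0__3TKyB.list-item__373c0__M7vhU')

    for li in theList:
        # Look up the name and the review count from the same li
        name_of_dishs = li.select_one('h3 a')
        reviews = li.select_one('.reviewCount__373c0__2r4xT')
        # Skip list items where either element is missing
        if not name_of_dishs or not reviews:
            continue
        print('{}: {}'.format(name_of_dishs.text, reviews.text))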
    

    Result:

    Jajaja: 576 reviews
    Balzem: 364 reviews
    Jane: 2995 reviews
    PMF Pardon My French: 830 review
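
If the counts are needed as numbers rather than strings, the printed text can be parsed afterwards. The following is only a sketch: parse_review_count is a hypothetical helper, not part of the original answer, and it assumes the text always has the form "<digits> review" or "<digits> reviews", as in the output above.

```python
import re


def parse_review_count(text):
    """Return the integer that precedes 'review' or 'reviews', or None if absent."""
    # Hypothetical helper: assumes text such as '576 reviews' or '830 review'.
    match = re.search(r"(\d+)\s+reviews?", text)
    return int(match.group(1)) if match else None


print(parse_review_count("576 reviews"))
print(parse_review_count("830 review"))
```

Returning None instead of raising keeps the loop simple when an entry has no recognizable count.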
    

    【Discussion】:

    • This is because the review count is in a different sibling element; see the updated answer.
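
The sibling issue mentioned in the comment can be reproduced without a network request. The sketch below uses made-up, simplified markup (the class names list-item, businessName and reviewCount and the review counts are assumptions for illustration, not Yelp's real values) to show that a search nested inside the name element finds nothing, while a search from the shared parent li finds both pieces.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the name and the review count
# sit in sibling divs under the same li, not one inside the other.
html = """
<ul>
  <li class="list-item">
    <div class="businessName"><h3><a href="#">Soco</a></h3></div>
    <div class="rating"><span class="reviewCount">120 reviews</span></div>
  </li>
  <li class="list-item">
    <div class="businessName"><h3><a href="#">Bunker</a></h3></div>
    <div class="rating"><span class="reviewCount">85 reviews</span></div>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Nested search from the name div: the span is not a descendant, so nothing matches.
nested = [
    span.text
    for name_div in soup.find_all("div", class_="businessName")
    for span in name_div.find_all("span", class_="reviewCount")
]

# Search from the shared parent li: both the name and the count are descendants.
pairs = []
for li in soup.select("li.list-item"):
    name = li.select_one("h3 a")
    count = li.select_one(".reviewCount")
    if name and count:
        pairs.append((name.text, count.text))

print(nested)
print(pairs)
```

Iterating over the common ancestor and looking up each field from there avoids depending on how the fields are nested relative to each other.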