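【Question title】: Scraping reviews with 'More' text
【Posted】: 2019-05-24 08:47:29
【Question】:

As the title says, I need help scraping reviews from TripAdvisor. The specific page I am using is https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html

The problem is that some reviews are truncated and show a "More" link that reveals the rest of the text (for example, the second review on the page linked above). How can I scrape the full text of reviews that have this "More" link?

Is there a way to expand them by clicking the link, or is it a matter of finding the right tag that contains the whole review?

【Comments】:

    Tags: python beautifulsoup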


    【Solution 1】:

    Use Selenium together with Beautiful Soup. Check whether a "More" button is present, click it if so, and then parse the driver's page_source.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from bs4 import BeautifulSoup
    import time

    MORE_XPATH = "//span[@class='taLnk ulBlueLinks'][contains(.,'More')]"

    driver = webdriver.Chrome()
    driver.get('https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html')
    # find_elements returns an empty list when nothing matches
    more_links = driver.find_elements(By.XPATH, MORE_XPATH)
    if more_links:
        more_links[0].click()

    time.sleep(3)  # give the page time to load the expanded text
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()
    # Each review body sits in a <p class="partial_entry"> element
    items = [item.text for item in soup.select("p.partial_entry")]
    print(items)
    

    Output:

    ['Stopped by to get some chicken strips to go.  They were out of soft drinks, but I was getting coffee.  Restrooms were clean.', "We live in page Arizona and go to McDonald's on the occasion that we don't want to cook but almost every time that we stop in the service is horrible. There has been times where the drive thru would not say anything to us until we decided to drive back around to really let them know we were ready to order food. The manager whom i have talked to on multiple occasions acts like it's bo big deal that their restaurant shows no respect for the customers. Finally i decided to write a review before calling corporate. I understand not wanting or liking your job at McDonald's but you made the life decisions to be where you are the least you could do is show some respect for your customers especially the locals of this tourist town.", 'The location was newer, clean and kept up very well. The hot fudge sundaes were great . Stopped by for a snack', 'We stopped in to grab a little snack before heading to Horseshoe Bend. My husband got a double cheeseburger, I ordered an apple pie. His burger was fine. The apples in the pie were all shriveled up. It looked old. I looked at the time on the box and it had expired 4 hours before. I walked back in and asked for a new one, explaining the one they just gave me was quite old. Then he handed me one and said try this one. I looked at the date and it expired 2 hours before. I asked if the had any fresh ones. He went into the back for awhile and came out with a new one.', 'I like the coffee, there was few times they messed up coffee 3x in a row. but its okay i had patience for them to get it right. I only like their fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. 
clean tables but rude managers', 'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go ask for a refund or compensation but the manager did not want He said if I refund you ,you will not have your mealI find that not acceptable to wait that long and Big Macs were coldI am a big traveller and never saw a Manager like that Don’t go there Go to Taco Bell ...', "the employees were very fast and efficient at the service they provided whilst giving me my food. McDonald's is always reliable whenever you want a quick snack.", "It is a newer looking location with a huge amount of parking. The dining area was very large and quite clean. The service was very good. The food was just like any other McD's.", 'win i eat at the best restaurant the meals are the best i love the fries it gives me taste of joy . i like to eat their again i like to eat their win im on the road and i like to never stop eating its my great place to eat', "This is a new facility in what looks like a newer area of Page. Typical McDonald's but great service and new building makes this a good stop if you are looking for a quick fill up."]
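
The fixed `time.sleep(3)` in the code above could be swapped for Selenium's explicit waits. The following is only a sketch under stated assumptions, not part of the original answer: `fetch_expanded_html` is a hypothetical helper name, and the second wait assumes the clicked "More" span is removed from the DOM once the text expands. If the site behaves differently, that wait simply times out and the page is read as it is.

```python
MORE_XPATH = "//span[@class='taLnk ulBlueLinks'][contains(.,'More')]"


def fetch_expanded_html(url, timeout=10):
    """Open url, click the first 'More' link if one appears, return the HTML."""
    # Imported here so that defining the function does not itself need
    # Selenium or a browser driver to be installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        try:
            # Block until a 'More' link is clickable, up to `timeout` seconds.
            more = wait.until(EC.element_to_be_clickable((By.XPATH, MORE_XPATH)))
            more.click()
            # ASSUMPTION: the clicked span is removed from the DOM on expansion.
            wait.until(EC.staleness_of(more))
        except TimeoutException:
            # No 'More' link showed up, or the span stayed in place after the
            # click; either way, fall through and read the page as it is.
            pass
        return driver.page_source
    finally:
        # Always close the browser, even if something above raised.
        driver.quit()
```

The returned HTML can then be passed to `BeautifulSoup(html, 'html.parser')` and queried with `soup.select("p.partial_entry")`, exactly as in the answer.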
    

    【Comments】:

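      【Solution 2】:

      As it stands, you cannot get the full text of a truncated review because it is not included in the page's HTML.

      Here is how to get it:

      • Fetch the page
      • Find all reviews
      • If a review has a "More" link:
        • Get the review's ID
        • Fetch the "review URL" (the review's own page)

      Code: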

      import requests
      from bs4 import BeautifulSoup as soup
      from pprint import pprint

      website = "https://www.tripadvisor.co.uk/"
      r_review_str = "Restaurant_Review-"
      u_review_str = "ShowUserReviews-"
      restaurant_id = "g60834-d4106745"
      restaurant_name = "McDonald_s-Page_Arizona"

      # Listing page that shows the (possibly truncated) reviews
      base_url = website + r_review_str + restaurant_id + "-Reviews-" + restaurant_name + ".html"
      req = requests.get(base_url)
      page = soup(req.text, 'html.parser')

      reviews_text = []
      reviews = page.find_all('div', {'class': 'reviewSelector'})
      for r in reviews:
          # Element ids have the form "review_<id>"; keep only the id part
          r_id = r.get('id').replace('review_', '')
          p_text = r.find('p', {'class': 'partial_entry'})
          if p_text.find('span', {'class': 'ulBlueLinks'}):
              # Truncated review: fetch its own page to get the full text
              url = website + u_review_str + restaurant_id + "-r" + r_id + "-" + restaurant_name + ".html"
              req_u = requests.get(url)
              page_u = soup(req_u.text, "html.parser")
              text = page_u.find('div', {'id': 'review_' + r_id}).find('p', {'class': 'partial_entry'}).text
          else:
              # Short review: the listing page already holds the whole text
              text = p_text.text
          reviews_text.append(text)

      pprint(reviews_text)
      

      Output:

      ['Stopped by to get some chicken strips to go.  They were out of soft drinks, '
       'but I was getting coffee.  Restrooms were clean.',
       "We live in page Arizona and go to McDonald's on the occasion that we don't "
       'want to cook but almost every time that we stop in the service is horrible. '
       'There has been times where the drive thru would not say anything to us until '
       'we decided to drive back around to really let them know we were ready to '
       'order food. The manager whom i have talked to on multiple occasions acts '
       "like it's bo big deal that their restaurant shows no respect for the "
       'customers. Finally i decided to write a review before calling corporate. I '
       "understand not wanting or liking your job at McDonald's but you made the "
       'life decisions to be where you are the least you could do is show some '
       'respect for your customers especially the locals of this tourist town.',
       'The location was newer, clean and kept up very well. The hot fudge sundaes '
       'were great . Stopped by for a snack',
       'We stopped in to grab a little snack before heading to Horseshoe Bend. My '
       'husband got a double cheeseburger, I ordered an apple pie. His burger was '
       'fine. The apples in the pie were all shriveled up. It looked old. I looked '
       'at the time on the box and it had expired 4 hours before. I walked back in '
       'and asked for a new one, explaining the one they just gave me was quite old. '
       'Then he handed me one and said try this one. I looked at the date and it '
       'expired 2 hours before. I asked if the had any fresh ones. He went into the '
       'back for awhile and came out with a new one.',
       'I like the coffee, there was few times they messed up coffee 3x in a row. '
       'but its okay i had patience for them to get it right. I only like their '
       'fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. '
       'clean tables but rude managers',
       'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go '
       'ask for a refund or compensation but the manager did not want He said if I '
       'refund you ,you will not have your mealI find that not acceptable to wait '
       'that long and Big Macs were coldI am a big traveller and never saw a Manager '
       'like that Don’t go there Go to Taco Bell ...',
       'the employees were very fast and efficient at the service they provided '
       "whilst giving me my food. McDonald's is always reliable whenever you want a "
       'quick snack.',
       'It is a newer looking location with a huge amount of parking. The dining '
       'area was very large and quite clean. The service was very good. The food was '
       "just like any other McD's.",
       'win i eat at the best restaurant the meals are the best i love the fries it '
       'gives me taste of joy . i like to eat their again i like to eat their win im '
       'on the road and i like to never stop eating its my great place to eat',
       'This is a new facility in what looks like a newer area of Page. Typical '
       "McDonald's but great service and new building makes this a good stop if you "
       'are looking for a quick fill up.']
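
The two URL patterns that the code above assembles by string concatenation can be isolated into small helpers, which makes them easy to check without touching the network. This is a minimal sketch: the function names `listing_url` and `review_url` are hypothetical, the patterns are taken from the answer's code as written, and the review id in the demo is made up for illustration.

```python
WEBSITE = "https://www.tripadvisor.co.uk/"


def listing_url(restaurant_id, restaurant_name):
    """URL of the restaurant page that lists the (possibly truncated) reviews."""
    return f"{WEBSITE}Restaurant_Review-{restaurant_id}-Reviews-{restaurant_name}.html"


def review_url(restaurant_id, restaurant_name, element_id):
    """URL of a single review's page, given the id attribute of its <div>."""
    # Element ids have the form "review_<id>"; strip the prefix once.
    review_id = element_id.replace("review_", "", 1)
    return f"{WEBSITE}ShowUserReviews-{restaurant_id}-r{review_id}-{restaurant_name}.html"


if __name__ == "__main__":
    print(listing_url("g60834-d4106745", "McDonald_s-Page_Arizona"))
    # "review_123456789" is a made-up id used only for illustration.
    print(review_url("g60834-d4106745", "McDonald_s-Page_Arizona", "review_123456789"))
```

Building the listing URL this way reproduces the link given in the question, which is a quick sanity check that no stray characters have crept into the pattern.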
      

      【Comments】:
