【Question title】: How to scrape Google Maps for all data using Python
【Posted】: 2020-06-09 09:49:36
【Question description】:

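I am trying to use Python to scrape a place's title, phone number, website, address, rating, and number of reviews from Google Maps. For example, I need all of this information for the restaurant Pike's Landing (see the Google Maps URL below). I want to pull it into Python.

URL: https://www.google.com/maps?cid=15423079754231040967&hl=en

I can see the HTML when I inspect the page, but when I scrape it with Beautiful Soup, all of the markup comes back transformed. On Stack Overflow I found a solution only for the number of reviews, shown in the following code: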

import re
import requests
from ast import literal_eval

urls = [
'https://www.google.com/maps?cid=15423079754231040967&hl=en',
'https://www.google.com/maps?cid=16168151796978303235&hl=en']

for url in urls:
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        print(data[1])
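
To make the loop above easier to follow, here is a minimal offline sketch of the same regex and `literal_eval` steps applied to a made-up fragment. `SAMPLE` and `parse_review_arrays` are hypothetical names introduced for illustration, and the fragment only imitates the escaped array the regex targets; the real page source may be laid out differently.

```python
import re
from ast import literal_eval

# Made-up fragment imitating an escaped array embedded in the page source (assumption).
SAMPLE = r'junk[\"https://example.com/place?cid\\u003d1\",\"825 reviews\",null]junk'


def parse_review_arrays(page_text):
    """Return (url, label) pairs for every escaped array that mentions a review count."""
    pairs = []
    for raw in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', page_text):
        # Turn the escaped JSON fragment into a Python list literal.
        data = literal_eval(raw.replace('null', 'None').replace('\\"', '"'))
        # Decode the \uXXXX escapes that remain in the URL.
        url = bytes(data[0], 'utf-8').decode('unicode_escape')
        pairs.append((url, data[1]))
    return pairs


print(parse_review_arrays(SAMPLE))
```

The first element of each pair is the decoded URL and the second is the text label that contains the review count.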

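But I need all of the data. I could use the Google Maps API to get the actual data, but getting the phone number, rating, and reviews is no longer free. So I want to scrape the data from the front end.

Please help me.

【Comments】:

  • You need to use Selenium or some other headless browser to scrape it.
  • Have you checked whether the data you need is generated dynamically?

Tags: python django web-scraping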


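【Solution 1】:

I asked the same question on reddit a long time ago. I ended up solving it myself; have a look at the code below. Note: it was written strictly to extract details for my own use case, but you can get the gist of what is going on here.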

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')

browser = webdriver.Chrome(options=options)

url = "https://www.google.com/maps/place/Papa+John's+Pizza/@40.7936551,-74.0124687,17z/data=!3m1!4b1!4m5!3m4!1s0x89c2580eaa74451b:0x15d743e4f841e5ed!8m2!3d40.7936551!4d-74.0124687"
# url = "https://www.google.com/maps/place/Lucky+Dhaba/@30.653792,76.8165233,17z/data=!3m1!4b1!4m5!3m4!1s0x390feb3e3de1a031:0x862036ab85567f75!8m2!3d30.653792!4d76.818712"

browser.get(url)

# NOTE: the class names below date from 2020 and may no longer match the page markup.
# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name,
# which was removed in Selenium 4.

# review titles / username / person who reviews
review_titles = browser.find_elements(By.CLASS_NAME, "section-review-title")
print([a.text for a in review_titles])

# review text / what did they think
review_text = browser.find_elements(By.CLASS_NAME, "section-review-review-content")
print([a.text for a in review_text])

# get the number of stars of the first review (skipped if no reviews were found)
stars = browser.find_elements(By.CLASS_NAME, "section-review-stars")
if stars:
    active_stars = stars[0].find_elements(By.CLASS_NAME, "section-review-star-active")
    print(f"the stars the first review got was {len(active_stars)}")

browser.quit()

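【Comments】:

    【Solution 2】:

    If you inspect the page source, you will find a window.APP_INITIALIZATION_STATE block, which contains JSON with all of the place data. You just need to parse it.

    You can also use a third-party solution such as SerpApi. It is a paid API with a free trial.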
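
As a rough sketch of that parsing step, the function below locates the `window.APP_INITIALIZATION_STATE` assignment in a page's HTML and decodes the value that follows it. It assumes the assigned value is valid JSON, which may not hold for the live page, and `extract_app_state` is a hypothetical helper name. The layout of the nested arrays is undocumented, so finding the individual fields inside the result is left to inspection.

```python
import json
import re


def extract_app_state(html):
    """Return the value assigned to window.APP_INITIALIZATION_STATE, or None."""
    match = re.search(r"window\.APP_INITIALIZATION_STATE\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever JavaScript follows it.
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        # Assumption violated: the assigned value is not plain JSON.
        return None
    return value


# Synthetic page used only to show the call; a real page is far larger.
demo_html = '<script>window.APP_INITIALIZATION_STATE=[[1,2],null,"x"];window.other=1;</script>'
print(extract_app_state(demo_html))
```

With a real page, `html` would be the response text, for example `requests.get(url).text`.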

    from serpapi import GoogleSearch
    
    params = {
      "engine": "google_maps",
      "type": "place",
      "q": "Pike's Landing",
      "ll": "@40.7455096,-74.0083012,14z",
      "data": "!3m1!4b1!4m5!3m4!1s0x0:0xd609c9524d75cbc7!8m2!3d64.8299557!4d-147.8488774",
      "api_key": "API_KEY",
    }
    
    search = GoogleSearch(params)
    results = search.get_dict()
    

    Example output:

    "place_results": {
      "title": "Pike's Landing",
      "data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
      "reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf:0xd609c9524d75cbc7",
      "gps_coordinates": {
        "latitude": 64.8299557,
        "longitude": -147.8488774
      },
      "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
      "rating": 3.9,
      "reviews": 825,
      "price": "$$",
      "type": [
        "American restaurant"
      ],
      "description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
      "service_options": {...},
      "extensions": [...],
      "address": "4438 Airport Way, Fairbanks, AK 99709",
      "website": "https://www.pikeslodge.com/pikes-landing",
      "phone": "(907) 479-6500",
      "hours": [...],
      "images": [...],
      "user_reviews": {
        "summary": [...],
        "most_relevant": [
          {
            "username": "Vasisht Raghavendra",
            "rating": 5,
            "description": "Restaurant with a view and good food. The Biggest Berry Blast cocktail is a highly recommended one. For food we tried the Alaska Crab with Mushrooms and the Bisque. Both were excellent. For dessert we tried the Bread Pudding - probably not the best we have had. The ambiance is very nice and the servers are friendly.",
            "images": [...],
            "date": "2 months ago"
          },
          ...
        ]
      },
      ...
    }
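
Given a response shaped like the example output above, the fields the question asks for can be collected with ordinary dictionary lookups. This is a minimal sketch: `FIELDS` and `summarize_place` are hypothetical names, and the key names are taken from the sample output rather than from a formal schema.

```python
# Keys as they appear in the sample "place_results" object above.
FIELDS = ("title", "phone", "website", "address", "rating", "reviews")


def summarize_place(results):
    """Pick the requested fields out of a response dict; missing keys become None."""
    place = results.get("place_results", {})
    return {field: place.get(field) for field in FIELDS}


# Trimmed copy of the sample output, used here instead of a live API call.
sample_results = {
    "place_results": {
        "title": "Pike's Landing",
        "rating": 3.9,
        "reviews": 825,
        "address": "4438 Airport Way, Fairbanks, AK 99709",
        "website": "https://www.pikeslodge.com/pikes-landing",
        "phone": "(907) 479-6500",
    }
}
print(summarize_place(sample_results))
```

In real use, the dict returned by `search.get_dict()` would take the place of `sample_results`.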
    

    You can check the documentation for more details.

    Disclaimer: I work at SerpApi.

    【Comments】:
