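【Question title】: Python/BeautifulSoup with JavaScript source
【Posted】: 2016-07-12 16:43:42
【Question】:

First of all, I'm new to Python and BeautifulSoup, so please forgive me if I use the wrong terminology.

I've run into a problem: I can find the element when I use Inspect Element, but it isn't there when I go to View Source. The data seems to be pulled in by JavaScript, so it is probably dynamic.

So my question is: how do I get at the data (source/elements/tags) that is loaded by JavaScript?

This is the code I have so far. I can't get the URL for each "search":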

import csv
import urllib.request

from bs4 import BeautifulSoup

rootURL = "http://www.homestead.ca"

def HomeStead2(URL):
    # Download the page and parse the raw HTML (no JavaScript is executed).
    thePage = urllib.request.urlopen(URL)
    soup = BeautifulSoup(thePage, "html.parser")
    return soup

soup = HomeStead2(rootURL)

# Walk the city links in the navigation dropdown.
for dropdownlist in soup.find("ul", {"class": "nav navbar-nav primary"}).find('ul').findAll('a'):
    # NOTHING IS WORKING FROM HERE ONWARDS WHEN I TRY TO GET THE HREF
    citySoup = HomeStead2(rootURL + dropdownlist.get('href'))
    for btnPreview in citySoup.find("div", {"class": "search extended-search"}).findAll('li'):
        try:
            for ApartmentLink in btnPreview.findAll("div", {"class": "property-container"}):
                print(ApartmentLink)
        except Exception:
            print('skip')
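
A quick way to confirm the "Inspect Element shows it, View Source doesn't" symptom is to check whether the class you are searching for appears anywhere in the raw HTML the server sends. The sketch below uses only the standard library; the two HTML strings are made-up stand-ins (not the real homestead.ca markup), and static_html_has_class is a hypothetical helper name.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any tag carries the given CSS class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value and self.wanted in value.split():
                self.found = True


def static_html_has_class(html, wanted):
    """Return True if `wanted` appears as a class in the raw HTML text."""
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.found


# Made-up HTML standing in for what the server returns before any script runs.
static_page = '<div class="search extended-search"><ul id="results"></ul></div>'
# Made-up HTML standing in for the DOM after a script has filled in the results.
rendered_page = (
    '<div class="search extended-search"><ul id="results">'
    '<li><div class="property-container">North Park Tower</div></li>'
    '</ul></div>'
)

print(static_html_has_class(static_page, "property-container"))
print(static_html_has_class(rendered_page, "property-container"))
```

If the class is missing from the downloaded HTML, then urllib plus BeautifulSoup alone will never see those elements, no matter how the find calls are written.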

【Comments】:

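  • Try Selenium; it runs the JavaScript and produces the resulting markup
  • So then there's no need for BeautifulSoup?
  • You need to use Selenium from Python. No need for BS.
  • You need a headless browser, something that can execute JavaScript. Selenium is one; PhantomJS (a JavaScript library) is another.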
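
For reference, the Selenium route suggested in these comments might look roughly like the sketch below. It assumes the selenium package, Chrome, and a matching driver are installed; rendered_html and fetch_with_headless_chrome are hypothetical helper names, not part of Selenium.

```python
def rendered_html(driver, url):
    """Load `url` in a browser driver and return the current DOM as HTML text."""
    driver.get(url)
    return driver.page_source


def fetch_with_headless_chrome(url):
    """Assumption: Selenium, Chrome and a compatible driver are available."""
    from selenium import webdriver  # third-party; imported only when needed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        return rendered_html(driver, url)
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

The returned string can still be handed to BeautifulSoup(html, "html.parser") if you prefer its search API, so the two tools are not mutually exclusive. Note that page_source reflects the DOM at the moment it is read; content fetched by a later ajax call may need an explicit wait before it shows up.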

Tags: javascript python beautifulsoup


【Solution 1】:

You can do all of this without Selenium. Once you visit each apartment URL, the data is retrieved by an ajax call to an API, so all we need is the city-id:

import requests
from bs4 import BeautifulSoup

root = "http://www.homestead.ca"

# Query parameters sent to the search API; city_id is filled in per city below.
data = {'keyword': 'false', 'max_bed': '100', 'geocode': '',
        'min_rate': '0', 'offset': '0', 'max_rate': '4000',
        'show_custom_fields': 'true', 'limit': '50',
        'pet_friendly': '', 'city_id': '', 'amenities': '',
        'client_id': '6', 'max_bath': '10',
        'auth_token': 'sswpREkUtyeYjeoahA2i',
        'count': 'false', 'min_bath': '0',
        'order': 'max_rate ASC, min_rate ASC, min_bed ASC, max_bath ASC',
        'city_ids': '', 'region': '',
        'property_types': 'low-rise-apartment,mid-rise-apartment,high-rise-apartment,luxury-apartment,townhouse,house,multi-unit-house,single-family-home,duplex,tripex,semi',
        'min_bed': '-1',
        'show_promotions': 'true'}

get = "http://api.theliftsystem.com/v2/search"
with requests.Session() as s:
    # The home page's dropdown menu carries one city id per entry.
    r = s.get(root)
    soup = BeautifulSoup(r.content, "lxml")
    lis = soup.select("ul.child-pages.dropdown-menu li")
    for li in lis:
        city_id = li["data-city-id"]
        data["city_id"] = city_id
        # Call the same API the page's JavaScript uses and print the JSON.
        p = s.get(get, params=data)
        print(p.json())

You can modify the data dict to match any query you want.
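
As a rough illustration of changing the query, the sketch below copies a parameter dict, overrides a couple of values, and shows the query string that would be sent. The base dict is a trimmed-down stand-in for the full one, with_overrides is a hypothetical helper, and how the API actually interprets each value is an assumption based on the parameter names.

```python
from urllib.parse import urlencode

API_URL = "http://api.theliftsystem.com/v2/search"  # endpoint from the answer


def with_overrides(base_params, **overrides):
    """Return a copy of base_params with some values replaced."""
    params = dict(base_params)
    params.update(overrides)
    return params


# Trimmed-down stand-in for the full parameter dict used in the answer.
base = {"city_id": "332", "min_rate": "0", "max_rate": "4000", "limit": "50"}

# Assumption: a lower max_rate and limit narrow the search accordingly.
cheaper = with_overrides(base, max_rate="1000", limit="10")
print(API_URL + "?" + urlencode(cheaper))
```

Working on a copy keeps the original dict unchanged, which helps when the same base query is reused for every city in the loop.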

The output will be JSON, for example:

[{'building_header': '', 'office_hours': '', 'name': 'North Park Tower', 'matched_suite_names': ['Bachelor', 'One Bedroom', 'Two Bedroom'], 'matched_beds': ['0', '1', '2'], 'id': 309, 'statistics': {'suites': {'rates': {'average': 950.0, 'max': 1275.0, 'min': 625.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '1.0', 'max': 2, 'min': 0}, 'bathrooms': {'average': 1.0, 'max': 1.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2605725', 'latitude': '43.1703624', 'distance': None}, 'photo': '1443018148_2.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '325 North Park Street', 'postal_code': 'N3R 2X4', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/325-north-park-street-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443018148_2.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': "Located on North Park Street and Memorial Avenue,this quiet building is within walking distance of the following: - Zehrs Plaza, North Park Plaza, Shoppers Drug Mart, Zehrs Grocery Store, Zellers, Pet Store, Party Supply Store, furniture store, variety store, Black's Photography, paint shop and veterinary clinic\xa0  - Restaurants and coffee shops\xa0  - Wayne Gretzky Recreational Arena\xa0  - Medical Clinic,Shoppers Home Health Care Clinic and Pharmacy\xa0  - Catholic Elementary School\xa0  - On bus route "}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 752-6855', 'alt_phone': '', 'name': '', 'phone': '519-752-3596', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 
'description': ''}, 'availability_count': 6, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443018148_2.jpg'}, {'building_header': '', 'office_hours': '', 'name': 'Westgate Apartments', 'matched_suite_names': ['Bachelor', 'One Bedroom', 'Two Bedroom'], 'matched_beds': ['0', '1', '2'], 'id': 310, 'statistics': {'suites': {'rates': {'average': 975.0, 'max': 1300.0, 'min': 650.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '1.0', 'max': 2, 'min': 0}, 'bathrooms': {'average': 1.0, 'max': 1.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2482991', 'latitude': '43.1733242', 'distance': None}, 'photo': '1443017488_1.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '661 West Street', 'postal_code': 'N3R 6W9', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/661-west-street-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017488_1.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': 'Located in the North end of Brantford, Westgate Tower is in an area that resembles a city within a city. 
There are a variety of banks, grocery stores, drug stores, malls, a wide selection of fast food, fine dining restaurants and an after hours medical centre, within waking distance.'}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 751-0379', 'alt_phone': '', 'name': '', 'phone': '519-751-3867', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 'description': ''}, 'availability_count': 6, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017488_1.jpg'}, {'building_header': '', 'office_hours': '', 'name': 'Dornia Manor', 'matched_suite_names': ['One Bedroom', 'Two Bedroom', 'Three Bedroom'], 'matched_beds': ['1', '2', '3'], 'id': 308, 'statistics': {'suites': {'rates': {'average': 1124.5, 'max': 1350.0, 'min': 899.0}, 'square_feet': {'average': 0.0, 'max': '0.0', 'min': '0.0'}, 'bedrooms': {'average': '2.25', 'max': 3, 'min': 1}, 'bathrooms': {'average': 1.375, 'max': 2.0, 'min': 1.0}}}, 'geocode': {'longitude': '-80.2584034', 'latitude': '43.1706331', 'distance': None}, 'photo': '1443017947_1.jpg', 'min_availability_date': '', 'address': {'intersection': '', 'country_code': 'CAN', 'province_code': 'ON', 'address': '321 Fairview Drive', 'postal_code': 'N3R 2X6', 'province': 'Ontario', 'country': 'Canada', 'neighbourhood': '', 'city_id': 332, 'city': 'Brantford'}, 'permalink': 'http://www.homestead.ca/apartments/321-fairview-drive-brantford', 'pet_friendly': True, 'thumbnail_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017947_1.jpg', 'details': {'location': '', 'suite': '', 'features': '', 'overview': 'Dornia Manor is a quiet, 
ninety-two unit apartment building located in the North end of Brantford. We offer one, two and three bedroom units and one penthouse suite. The building is located in close proximity to many major services such as banking, shopping, health services, recreational facilities, beauty shops, dry cleaners, schools and churches. There is a bus stop at the front door and highway 403 is within minutes.'}, 'availability_status_label': 'Available Now', 'availability_status': 1, 'contact': {'email': 'rentals@homestead.ca', 'fax': '(519) 752-6855', 'alt_phone': '', 'name': '', 'phone': '519-752-3596', 'alt_extension': '', 'extension': ''}, 'parking': {'indoor': '', 'additional': '', 'outdoor': ''}, 'property_type': 'High-rise-apartment', 'website': {'url': '', 'title': '', 'description': ''}, 'availability_count': 8, 'client': {'email': 'bcadieux@homestead.ca', 'phone': '613-546-3146', 'id': 6, 'website': 'www.homestead.ca', 'name': 'Homestead Land Holdings'}, 'promotion': {'featured': 0}, 'photo_path': 'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017947_1.jpg'}]
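
Since each result is a plain list of dicts, pulling out the fields you care about is ordinary dictionary access. Here is a minimal sketch that assumes every record has the same keys as the sample output above; summarize is a hypothetical helper, and the two records are trimmed copies of the ones shown.

```python
def summarize(listings):
    """Yield (name, city, min_rate, permalink) for each listing dict."""
    for item in listings:
        rates = item["statistics"]["suites"]["rates"]
        yield (item["name"], item["address"]["city"], rates["min"], item["permalink"])


# Two records trimmed from the sample output; only the keys used here are kept.
sample = [
    {
        "name": "North Park Tower",
        "address": {"city": "Brantford"},
        "statistics": {"suites": {"rates": {"average": 950.0, "max": 1275.0, "min": 625.0}}},
        "permalink": "http://www.homestead.ca/apartments/325-north-park-street-brantford",
    },
    {
        "name": "Westgate Apartments",
        "address": {"city": "Brantford"},
        "statistics": {"suites": {"rates": {"average": 975.0, "max": 1300.0, "min": 650.0}}},
        "permalink": "http://www.homestead.ca/apartments/661-west-street-brantford",
    },
]

for name, city, min_rate, link in summarize(sample):
    print(f"{name} ({city}): from ${min_rate:.0f}, {link}")
```

In the answer's loop you would pass p.json() to summarize instead of the sample list. The permalink field also gives the per-apartment URLs the question was trying to collect.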

【Comments】:
