【Question title】: How to scroll down to the end of a dynamic page using Selenium Python ChromeDriver
【Posted】: 2020-04-13 15:44:40
【Question description】:

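Please help me. I'm trying to scroll down to the end of a dynamic page and get its HTML code, but it isn't working properly. I tried this. It only scrolls down once. I changed the sleep time from 2 to more than 5, and then it scrolls down only twice before breaking out of the while loop. The page is here.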

Any help is greatly appreciated.
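
A minimal sketch of the scroll loop this question describes, assuming the common pattern of comparing `document.body.scrollHeight` before and after each scroll. The function name `scroll_to_bottom` and the `max_stalls` parameter are hypothetical; `driver` is assumed to be a Selenium WebDriver (only its `execute_script` method is used). Tolerating a few unchanged rounds before giving up is one way to avoid leaving the loop too early when the page loads slowly:

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_stalls=3):
    """Scroll until the page height stops growing for max_stalls rounds in a row.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    stalls = 0
    rounds = 0
    while stalls < max_stalls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            stalls += 1  # nothing new yet; retry instead of stopping at once
        else:
            stalls = 0
            last_height = new_height
        rounds += 1
    return rounds
```

With a real browser this would be called after `driver.get(...)`, and the HTML read afterwards from `driver.page_source`.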

【Question comments】:

Tags: javascript python web-scraping beautifulsoup


【Solution 1】:

You can call the API directly, as follows:

import csv
import json
import requests

params = {
    "BodyType": "",
    "Year": "",
    "Make": "",
    "Model": "",
    "PriceRange": "",
    "PriceStart": "",
    "PriceEnd": "",
    "Condition": "pre-owned-cars",
    "Color": "",
    "InteriorColor": "",
    "CityMpg": "",
    "HighwayMpg": "",
    "Transmission": "",
    "DriveTrain": "",
    "Fuel": "",
    "SearchExpression": "",
    "SortCriteria": "",
    "SortDirection": "",
    "LocationId": "",
    "IsCertified": "-1",
    "IsSold": "",
    "IsFuzzySearch": "false",
    "startIndex": "1",
    "Results": "60"
}

names = ["VehicleName", "Engine", "Transmission",
         "FuelEconomyCity", "FuelEconomyHighway", "StockNo", "Vin", "IsSold", "Mileage"]


def main(url):
    r = requests.get(url, params=params).json()
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)
        for item in r['vehicles']:
            writer.writerow([item[name] for name in names])


main("https://www.tgmotorsales.com/Inventory/Search")

Output: view-online
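
The `startIndex` and `Results` parameters suggest that the endpoint is paged. Below is a hedged sketch of collecting every page. It assumes `startIndex` is a 1-based page number (it could equally be an item offset, which the answer does not say) and that a batch shorter than the page size marks the last page. `fetch_all` and `make_fetcher` are hypothetical helper names, not part of the answer above:

```python
def fetch_all(fetch_page, page_size=60, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a short batch comes back."""
    vehicles = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        vehicles.extend(batch)
        if len(batch) < page_size:  # short or empty batch: assumed last page
            break
    return vehicles


def make_fetcher(url, params, page_size=60):
    """Build a fetch_page callable for the endpoint used above."""
    import requests  # imported lazily so fetch_all can be tried without it

    def fetch_page(page):
        # Assumption: startIndex is a page number, Results is the page size.
        query = dict(params, startIndex=str(page), Results=str(page_size))
        return requests.get(url, params=query, timeout=30).json()["vehicles"]

    return fetch_page
```

Usage would look like `fetch_all(make_fetcher(url, params))`, with `url` and `params` as defined in the answer.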

Short pandas version:

import pandas as pd


# Reuses `requests` and `params` from the snippet above.
def main(url):
    r = requests.get(url, params=params).json()
    df = pd.DataFrame(r['vehicles'])
    df.to_csv("data.csv", index=False)


main("https://www.tgmotorsales.com/Inventory/Search")

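【Comments】:

  • Where did you find https://www.tgmotorsales.com/Inventory/Search? I've been looking for it but couldn't find it!
  • @chitown88 You can find it under the XHR requests in your browser's Network Monitor, as in the following.
  • Yes, that's what I mean. It never showed up in my pane. That's exactly what I expected to see.
  • @chitown88 OK, let me give you a hint. Enable Persist Logs, open the page, and then scroll down; the request appears when the scroll fetches more data.
  • Yes! There it is! Thanks, mate.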
【Solution 2】:

Selenium isn't needed. The data is rendered from a JSON structure embedded in the page's source HTML.

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.tgmotorsales.com/pre-owned-cars?results='
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# The inventory is embedded as JSON in the data-json attribute of this div.
jsonStr = soup.find('div', {'id': 'ds-vehicles-json'})['data-json']
jsonData = json.loads(jsonStr)
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0).
df = pd.json_normalize(jsonData)

Output:

print (df.head().to_string())
         AccidentIndicatorsText  Age BasicExteriorColor BasicExteriorColorSwatch BodyType BodyTypeName  CabType CabTypeName  CarHighestCost  CarLowestCost                                CarfaxFeedText  CarfaxOk     CarfaxText CategoryId CategoryIdList CategoryName CategoryType  CertifiedStatus ChromeCode                                      ComingSoonUrl                                  ComingSoonUrlBase Comments  Condition  CreatedBy            CreatedOn  CreatorUserType  Cylinders DamageType DamageTypeName DataOptionSelected DealerCost  DealerId  DealerLocationId  DestinationPrice  DisabilityEquipped  Discount  DiscountPrice  DiscountType  DiscountValue  Doors        DriveTrain  EndRow             Engine  FDiff FactoryColor         FactoryColorText FactoryInterior FactoryInteriorText FileName  FinalPrice                                      FirstImageUrl           Fuel FuelCapacity FuelEconomyCity FuelEconomyHighway FuelName GrossVehicleWeightRating  HasHighlightedFeatures         HorsePower  ImageCount ImageUrls Images          InStockDate  InternetPrice  InvoicePrice InvoiceValue  IsActive  IsAutoTrader  IsBasicColor  IsCarsCom  IsCudl  IsCustomerSaved  IsDeleted  IsEdmunds  IsHighlighted  IsInboundLocked  IsInspected  IsNew  IsNewPrice  IsOnSale  IsOptional IsPakageInstalled  IsPriceLocked  IsPriority  IsPromotionLock  IsPublishedQualityControlInspectionReport  IsPublishedVehicleBuyersGuid  IsRecycler  IsRemoved  IsSelected  IsShowMsrpInvoice  IsSmogChecked  IsSold  IsStockManual  IsUniversalPromotionText  IsUpdateNow LastModifiedBy LicensePlate LicensePlateState  MDiff  Make  MakeId MakeName MakeOther  Mileage  MoDiff     Model ModelId  ModifiedBy           ModifiedOn  ModifierUserType  MsrpPrice NewPrice          NonWaterMarkedImageUrl Notes OldInternetPrice OptionCodes OptionDescription Options OwnershipText PackageValue Pakage PakageSelected        PriceLockDate PriceLockNotes         PromoExpires          PromoStarts  PromotionCode PromotionDescription 
PromotionText  Rank            RemovedOn RetailValue        ScheduledDate SearchText  SequenceNumber SortColumn SortDirection  Source  StartRow StateCode Status           StatusDate StockNo StockNumberFirst StockNumberSettingType StockNumberStartsWith  StockSettingId                                Style  StyleId SubCategoryName  Title TitleState TitleStatus Torque Transmission  TransmissionType          Trim TruckBedLength TruckBedWidth  UpdateGroupId  VehicleId                    VehicleImage                          VehicleImageHash  VehicleImageId VehicleInventorySource VehicleInventorySourceScript VehicleInventoryUpdateSource VehicleInventoryUpdateSourceScript                    VehicleName VehicleTitle  VideoCount                Vin  Warranty  YDiff  Year
0          1 Accidents Reported    0               None                     None        D         None        0        None               0              0                                 Carfax Report      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          6       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  3.0L  6 Cylinders      0      #000000          Brilliant Black         #000000               Black     None     13000.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              19                 28     None                     None                   False  310 hp @ 5500 rpm          35      None     []  2020-02-24T00:00:00        13000.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False    True          False                     False        False           None         None              None      0  Audi       0     None      None    90060       0        A6    None           0  0001-01-01T00:00:00                 0    49900.0     None  20200225004903504_IMG_4695.jpg  None             None        None              None      []      2 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      S  0001-01-01T00:00:00    1347             None                   None                  None               0         4dr Sdn quattro 3.0T Premium        0            None      1       None       Clear   None    Automatic                 1  3.0T Premium           None          None              0     496489  20200225004903504_IMG_4695.jpg  919e844a50786ba437eea8ca98fb1b15c02f9d62               0                   None                         None                         None                               None      2012 Audi A6 3.0T Premium         None           0  WAUBGAFCXCN003858         0      0  2012
1  No Accidents/Damage Reported    0               None                     None        C         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          4       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  1.8L  4 Cylinders      0      #CDCDCD     Lake Silver Metallic                               Ebony     None      7950.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              20                 28     None                     None                   False  225 hp @ 5900 rpm          32      None     []  2019-12-18T00:00:00         7950.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False    True          False                     False        False           None         None              None      0  Audi       0     None      None    80000       0        TT    None           0  0001-01-01T00:00:00                 0    39600.0     None  20200102234214276_IMG_3807.jpg  None             None        None              None      []      3 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      S  0001-01-01T00:00:00    1311             None                   None                  None               0                2dr Cpe quattro 6-Spd        0            None      1       None       Clear   None       Manual                 2                         None          None              0     470937  20200102234214276_IMG_3807.jpg  5710e20e7ff2a325b0bbdd65b6155058d583fab7               0                   None                         None                         None                               None                  2002 Audi TT          None           0  TRUWT28N221000808         0      0  2002
2  No Accidents/Damage Reported    0               None                     None        D         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          6       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  3.0L  6 Cylinders      0      #000000  Black Sapphire Metallic                               Beige     None      8950.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              17                 25     None                     None                   False  230 hp @ 6500 rpm          47      None     []  2020-02-24T00:00:00         8950.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False   False          False                     False        False           None         None              None      0   BMW       0     None      None   107223       0  3 Series    None           0  0001-01-01T00:00:00                 0    36600.0     None  20200326000458548_IMG_4870.jpg  None             None        None              None      []      5 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      I  0001-01-01T00:00:00    1345             None                   None                  None               0        4dr Sdn 328i xDrive AWD SULEV        0            None      1       None       Clear   None    Automatic                 1   328i xDrive           None          None              0     502702  20200326000458548_IMG_4870.jpg  6fb61e8026cc85b3f2e312de4b1a3e3140e7d56c               0                   None                         None                         None                               None  2011 BMW 3 Series 328i xDrive         None           0  WBAPK5C50BF124652         0      0  2011
3  No Accidents/Damage Reported    0               None                     None        D         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          6       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0  Rear Wheel Drive       0  3.0L  6 Cylinders      0      #010101  Black Sapphire Metallic                               Beige     None      6950.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              18                 28     None                     None                   False  230 hp @ 6500 rpm          37      None     []  2020-01-02T00:00:00         6950.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False    True          False                     False        False           None         None              None      0   BMW       0     None      None   104650       0  3 Series    None           0  0001-01-01T00:00:00                 0    42595.0     None  20200108013341357_IMG_4019.jpg  None             None        None              None      []      3 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      S  0001-01-01T00:00:00    1314             None                   None                  None               0  4dr Sdn 328i RWD SULEV South Africa        0            None      1       None       Clear   None    Automatic                 1          328i           None          None              0     473172  20200108013341357_IMG_4019.jpg  378ef180ba92675b9d573dc232a516152a2b7d59               0                   None                         None                         None                               None         2010 BMW 3 Series 328i         None           0  WBAPH5G52ANM36680         0      0  2010
4  No Accidents/Damage Reported    0               None                     None        C         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          4       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  2.0L  4 Cylinders      0      #E9EEE8             Alpine White         #000000               Black     None     16500.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              22                 33     None                     None                   False  240 hp @ 5000 rpm          41      None     []  2020-03-10T00:00:00        16500.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False   False          False                     False        False           None         None              None      0   BMW       0     None      None    82460       0  4 Series    None           0  0001-01-01T00:00:00                 0    57650.0     None  20200311010048004_IMG_4926.jpg  None             None        None              None      []      2 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      I  0001-01-01T00:00:00    1356             None                   None                  None               0        2dr Cpe 428i xDrive AWD SULEV        0            None      1       None       Clear   None    Automatic                 1   428i xDrive           None          None              0     504580  20200311010048004_IMG_4926.jpg  63a2ff0985a8e2276d33ab7dca1ae40221e528a5               0                   None                         None                         None                               None  2014 BMW 4 Series 428i xDrive         None           0  WBA3N9C56EK245257         0      0  2014
....

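【Comments】:

  • Thanks for your reply. Unfortunately, I don't know the element IDs, because I need to get the HTML code not only from this site but from many others as well. I will fetch the HTML from this and other AJAX pages and use regular expressions. Do you have any way to solve this?
  • Well, if you're going to hit multiple sites, you'll have multiple cases to handle. But I'd guess that on this site the id attribute won't change.
  • I have to extract data from many sites like the one above. I could probably get the IDs from those sites, but that would take more time and change my system's logic. What would be the best approach for me?
  • @Danniel Check my answer below.
  • Which other sites? Either way, you still have to parse different HTML structures, so each page will still need its own logic. It's also possible that the other pages have an API you can pull the data from directly.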
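
For the case raised in these comments, where the element id is not known in advance, one possible approach is to scan every attribute of every tag and keep the values that parse as JSON. This is only a sketch using the standard library's `html.parser`; `JsonAttrFinder` and `find_embedded_json` are hypothetical names, and it only helps on sites that embed their data in an attribute the way this one does:

```python
import json
from html.parser import HTMLParser


class JsonAttrFinder(HTMLParser):
    """Collect every attribute value that parses as a JSON list or object."""

    def __init__(self):
        super().__init__()
        self.found = []  # tuples of (tag, attribute name, parsed value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Cheap pre-filter: only values that look like a JSON list/object.
            if not value or not value.lstrip().startswith(("[", "{")):
                continue
            try:
                self.found.append((tag, name, json.loads(value)))
            except ValueError:
                pass  # looked like JSON but was not; ignore it


def find_embedded_json(html):
    finder = JsonAttrFinder()
    finder.feed(html)
    return finder.found
```

Each hit still needs site-specific handling afterwards, since the keys inside the JSON differ from site to site.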