Pagination without a different URL for each page

【Question title】: Pagination without having different urls to each page
【Posted】: 2019-05-01 12:46:46
【Question description】:

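I am scraping a web page (using the Python requests and requests-html modules), and I need to go through all the pages of a list of items.

In the "human user" world, I click "2" to go to the second page, or click "->" to move from the current page to the next one.

When I inspect those elements, each one is a <div> tag, for example:

<div class="pagination__Page..."> 2 </div> or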

<div class="pagination__Page..."> -> </div>

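Both have an event attached, so clicking one moves to the next page.

I tried the for-loop pagination suggested in the requests-HTML documentation, but it does not work here, because the r.html object has no link to follow and there is no link to each page of the list.

When I click these divs on the site, the URL does not change at all.

Inspecting the event (for the "2" case), it calls a JS function such as: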

function() {
   return a({
      pageNum: e
   })
}

Inspecting the event function (for the "->" case), it calls a JS function such as:

function() {
   return a({
      direction: "right"
   })
}

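I want to get the same result as when I click, but I don't know how.

【Question discussion】:

I can think of a couple of ways to do this. 1) Use Selenium to simulate opening a browser, process the HTML, then have it "click" on the next page and repeat. 2) If you share the URL, we can check whether it gets its data from an XHR. If so, you may be able to fetch the data with a POST request in which the page number is part of the query parameters, and iterate over it that way.
In the Chrome/Firefox DevTools you can see all the requests sent to the server. You can inspect the request that is sent when you click the next page, and then try to make the same request with Python.
@chitown88, if by query parameters you mean parameters sent in the URL, I don't think that is possible (or at least I don't know how to do it) because, as I said before, the URL does not change after clicking "next page". Still, the URL is: link. furas, I don't think there is such a request; it seems to be a list that is somehow fully preloaded and displayed in blocks of 25 rows.
No, I don't mean the URL. I mean the XHR (if there is one). I'll take a look tomorrow, since I'm not near my laptop right now.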

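Tags: python-3.x web-scraping python-requests python-requests-html

【Solution 1】:

You will have to use the dev tools to get the exact query parameters (the rqid in particular), but this should get you going. It returns the full list, with no need to go page by page: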

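import requests
import pandas as pd

# JSON endpoint that the flight-tracker page calls (found via the browser dev tools)
url = 'https://www.flightstats.com/v2/api-next/flight-tracker/arr/ORY/2019/4/29/6'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

# Query parameters copied from the captured request; take a current rqid from the dev tools
query = {
    'carrierCode': '',
    'numHours': '6',
    'rqid': '7tl8o43bkps'}

jsonData = requests.get(url, headers=headers, params=query).json()

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(jsonData['data']['flights'])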

Output:

print (df)
                  airport.city  ...                                                url
0                      Cayenne  ...  /flight-tracker/TX/571?year=2019&month=4&date=...
1    Saint Denis de la Reunion  ...  /flight-tracker/AF/671?year=2019&month=4&date=...
2               Pointe-a-Pitre  ...  /flight-tracker/SS/3541?year=2019&month=4&date...
3               Pointe-a-Pitre  ...  /flight-tracker/TX/541?year=2019&month=4&date=...
4                       Moscow  ...  /flight-tracker/S7/4021?year=2019&month=4&date...
5                       Moscow  ...  /flight-tracker/ZI/516?year=2019&month=4&date=...
6                      Cayenne  ...  /flight-tracker/AF/853?year=2019&month=4&date=...
7                      Cayenne  ...  /flight-tracker/KL/2245?year=2019&month=4&date...
8                     Toulouse  ...  /flight-tracker/AF/6101?year=2019&month=4&date...
9               Pointe-a-Pitre  ...  /flight-tracker/KL/2261?year=2019&month=4&date...
10                    Toulouse  ...  /flight-tracker/HOP/5101?year=2019&month=4&dat...
11              Pointe-a-Pitre  ...  /flight-tracker/AF/793?year=2019&month=4&date=...
12                      Beirut  ...  /flight-tracker/SS/6628?year=2019&month=4&date...
13                      Beirut  ...  /flight-tracker/ZI/628?year=2019&month=4&date=...
14                 Montpellier  ...  /flight-tracker/AF/7541?year=2019&month=4&date...
15                      Geneva  ...  /flight-tracker/U2/1399?year=2019&month=4&date...
16                 Montpellier  ...  /flight-tracker/HOP/5541?year=2019&month=4&dat...
17                     Ajaccio  ...  /flight-tracker/AF/4442?year=2019&month=4&date...
18                      Bastia  ...  /flight-tracker/HOP/7780?year=2019&month=4&dat...
19                     Ajaccio  ...  /flight-tracker/HOP/7770?year=2019&month=4&dat...
20                     Ajaccio  ...  /flight-tracker/XK/770?year=2019&month=4&date=...
21                      Bastia  ...  /flight-tracker/XK/780?year=2019&month=4&date=...
22                      Bastia  ...  /flight-tracker/AF/4458?year=2019&month=4&date...
23                   Marseille  ...  /flight-tracker/HOP/5001?year=2019&month=4&dat...
24                   Marseille  ...  /flight-tracker/AF/6001?year=2019&month=4&date...
25            Clermont-Ferrand  ...  /flight-tracker/AF/7433?year=2019&month=4&date...
26            Clermont-Ferrand  ...  /flight-tracker/HOP/5433?year=2019&month=4&dat...
27                    Bordeaux  ...  /flight-tracker/HOP/5253?year=2019&month=4&dat...
28                    Bordeaux  ...  /flight-tracker/AF/6253?year=2019&month=4&date...
29                        Nice  ...  /flight-tracker/HOP/5203?year=2019&month=4&dat...
..                         ...  ...                                                ...
192                  Marseille  ...  /flight-tracker/HOP/5009?year=2019&month=4&dat...
193                    Sevilla  ...  /flight-tracker/TO/3201?year=2019&month=4&date...
194                   Bordeaux  ...  /flight-tracker/AF/6277?year=2019&month=4&date...
195                   Toulouse  ...  /flight-tracker/U2/4026?year=2019&month=4&date...
196                   Toulouse  ...  /flight-tracker/HOP/5117?year=2019&month=4&dat...
197                   Toulouse  ...  /flight-tracker/AF/6117?year=2019&month=4&date...
198                       Rome  ...  /flight-tracker/IB/5193?year=2019&month=4&date...
199                       Rome  ...  /flight-tracker/VY/6251?year=2019&month=4&date...
200                   Bordeaux  ...  /flight-tracker/HOP/5277?year=2019&month=4&dat...
201                       Faro  ...  /flight-tracker/U2/4278?year=2019&month=4&date...
202                   Campinas  ...  /flight-tracker/AD/8900?year=2019&month=4&date...
203                 Casablanca  ...  /flight-tracker/AT/760?year=2019&month=4&date=...
204                   Campinas  ...  /flight-tracker/ZI/36?year=2019&month=4&date=2...
205                       Rome  ...  /flight-tracker/U2/4242?year=2019&month=4&date...
206                    Ajaccio  ...  /flight-tracker/XK/772?year=2019&month=4&date=...
207                    Ajaccio  ...  /flight-tracker/AF/4445?year=2019&month=4&date...
208                    Ajaccio  ...  /flight-tracker/HOP/7772?year=2019&month=4&dat...
209                     Madrid  ...  /flight-tracker/AV/6049?year=2019&month=4&date...
210                     Madrid  ...  /flight-tracker/AA/8758?year=2019&month=4&date...
211                     Madrid  ...  /flight-tracker/IB/3436?year=2019&month=4&date...
212                      Setif  ...  /flight-tracker/AH/1108?year=2019&month=4&date...
213                     Berlin  ...  /flight-tracker/ZI/608?year=2019&month=4&date=...
214                     Berlin  ...  /flight-tracker/SS/6608?year=2019&month=4&date...
215                     Toulon  ...  /flight-tracker/AF/7513?year=2019&month=4&date...
216                     Toulon  ...  /flight-tracker/HOP/5513?year=2019&month=4&dat...
217                  Perpignan  ...  /flight-tracker/AF/7465?year=2019&month=4&date...
218                  Perpignan  ...  /flight-tracker/HOP/5465?year=2019&month=4&dat...
219                      Rodez  ...  /flight-tracker/BE/7682?year=2019&month=4&date...
220                     Nantes  ...  /flight-tracker/AF/7383?year=2019&month=4&date...
221                     Nantes  ...  /flight-tracker/HOP/5383?year=2019&month=4&dat...

[222 rows x 13 columns]
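
As an aside that is not part of the original answer: because the response already contains the whole list, the 25-row pages shown in the browser can be rebuilt locally if they are needed. The sketch below assumes a page size of 25 (taken from the asker's comment) and uses a hypothetical helper named paginate, with a plain list standing in for the 222 flight records.

```python
def paginate(rows, page_size=25):
    """Split a sequence into consecutive pages of at most page_size items."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


# Stand-in for the 222 flight records returned by the request above.
flights = list(range(222))

pages = paginate(flights)
print(len(pages))      # 9 pages: eight full pages plus one partial page
print(len(pages[-1]))  # 22 rows on the last page
```

The same slicing works on the records in jsonData['data']['flights'], since that is an ordinary Python list.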

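【Discussion】:

chitown88, I am trying to reproduce the code above with the "original path" (flightstats.com/v2/flight-tracker/arrivals/ORY/…), and when I reach the line jsonData = requests.get(url_address, headers=headers, params=query).json() I get JSONDecodeError: Expecting value: line 1 column 1 (char 0). Any idea why? And how can I avoid it (other than using your link, of course xD)? Thanks.
The original path returns its response as HTML, unlike the api link, which returns JSON. So when you try to parse that HTML as JSON it raises an error, because the body is not valid JSON.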
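
One way to guard against this, sketched here under the assumption that the server sets a Content-Type header, is to check the declared content type before decoding. The helper name parse_json_body is hypothetical, and the sample bodies are made up for illustration.

```python
import json


def parse_json_body(content_type, body):
    """Return the decoded JSON payload, or None if the body is not JSON."""
    # A page served as HTML normally declares text/html, not a JSON type.
    if "json" not in content_type.lower():
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # The header claimed JSON but the body could not be parsed.
        return None


html_reply = parse_json_body("text/html; charset=utf-8", "<!DOCTYPE html><html></html>")
json_reply = parse_json_body("application/json", '{"data": {"flights": []}}')
print(html_reply)  # None
print(json_reply)  # {'data': {'flights': []}}
```

With requests, the two arguments would be r.headers.get("Content-Type", "") and r.text for a response object r.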