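【Posted】: 2020-07-10 16:44:55
【Problem description】:
I am trying to extract three pieces of information from each row. Once that is done, the script should scroll to the bottom of the page, click "Load more", scrape the new data, and repeat until there is no "Load more" button left.
To extract all the data from the table I used $$eval, but it results in undefined. If I use $eval instead I do get data, but only from the first row of the table. Why does $$eval return undefined, and if I cannot use it, how can I loop over the table with $eval to get all the values?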
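The "click until the button is gone" loop described in the first paragraph could be sketched roughly as follows. The helper name `loadAllRows`, its parameters, and the `maxClicks` safety cap are hypothetical. The sketch also assumes the button is removed from the DOM once nothing is left to load, and that each click adds at least one row; if either assumption is wrong, `waitForFunction` or `click` will throw.

```javascript
// Hypothetical helper: keeps clicking a "load more" button until it no longer exists.
// `page` is expected to be a Puppeteer Page; both selectors are supplied by the caller.
async function loadAllRows(page, buttonSelector, rowSelector, maxClicks = 100) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // page.$ resolves to null when nothing matches the selector.
    const button = await page.$(buttonSelector);
    if (button === null) break;

    // Count the rows first so we can wait for the list to grow after the click.
    const before = await page.$$eval(rowSelector, (rows) => rows.length);

    // ElementHandle.click() scrolls the element into view if needed.
    await button.click();

    // Wait (default timeout applies) until more rows are present than before.
    await page.waitForFunction(
      (selector, count) => document.querySelectorAll(selector).length > count,
      {},
      rowSelector,
      before
    );
    clicks += 1;
  }
  return clicks;
}

// Possible usage inside an async function, with selectors taken from the question:
// await loadAllRows(
//   page,
//   '.ExCategory-results > .ExLoadMore > .bb-flat-btn',
//   '.ExCategory-results > .ExResult-row'
// );
```

Scraping once after the loop finishes, rather than after every click, avoids having to track which rows were already collected.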
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false }); // default is true
  const page = await browser.newPage();
  await page.goto('someexamplesite.com', {
    waitUntil: 'domcontentloaded',
  });

  const ExerciseName = await page.$$eval(
    '.ExCategory-results > .ExResult-row:nth-child(2) > .ExResult-cell > .ExHeading > a',
    (e) => e.innerText
  );
  const muscleTargeted = await page.$$eval(
    '.ExCategory-results > .ExResult-row:nth-child(2) > .ExResult-cell > .ExResult-muscleTargeted > a',
    (e) => e.innerText
  );
  const equipmentType = await page.$$eval(
    '.ExCategory-results > .ExResult-row:nth-child(2) > .ExResult-cell > .ExResult-equipmentType > a',
    (e) => e.innerText
  );

  //click on load more
  await page.waitForSelector(
    '#js-ex-content > #js-ex-category-body > .ExCategory-results > .ExLoadMore > .bb-flat-btn'
  );

  console.log({ ExerciseName, muscleTargeted, equipmentType });
  await browser.close();
})().catch((e) => {
  console.error(e);
});
The markup I am trying to scrape:
<div class="ExCategory-results">
  <div class="ExCategory-resultsLoadIndicator" id="js-ex-finder-load-indicator">
    <div class="ExCategory-resultsLoadIndicatorBox">
      <div class="ExCategory-resultsLoadIndicatorSpinner bb-spinner-btn__spinner"></div>
    </div>
  </div>
  <div class="ExResult-row flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
    <div class="ExResult-cell ">
      <!-- using male photos -->
      <img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt=" thumbnail image" src="https://www.websites.com/exercises/exerciseImages/sequences/742/Male/m/742_1.jpg" data-src="https://www.websites.com/exercises/exerciseImages/sequences/742/Male/m/742_1.jpg" itemprop="image">
    </div>
    <div class="ExResult-cell ExResult-cell--nameEtc">
      <h3 class="ExHeading ExResult-resultsHeading">
        <a href="/exercises/rickshaw-carry" itemprop="name">
          Rickshaw Carry
        </a>
      </h3>
      <div class="ExResult-details ExResult-muscleTargeted">
        Muscle Targeted:
        <a href="/exercises/muscle/forearms">
          Forearms
        </a>
      </div>
      <div class="ExResult-details ExResult-equipmentType">
        Equipment Type:
        <a href="/exercises/equipment/other">
          Other
        </a>
      </div>
    </div>
    <div class="ExResult-cell ExResult-cell--rating">
      <div class="ExRating">
        <div class="ExRating-badge">
          9.6
        </div>
        <div class="ExRating-description ExRating-description--Average">
          Average
        </div>
      </div>
    </div>
  </div>
  <div class="ExResult-row flexo-container flexo-between" itemscope="" itemtype="http://schema.org/ExerciseAction">
    <div class="ExResult-cell ">
      <!-- using male photos -->
      <img class="ExImg ExResult-img ls-is-cached lazyloaded" width="70" height="70" onerror="if (window._E_) _E_(this)" alt=" thumbnail image" src="https://www.websites.com/images/2020/xdb/cropped/xdb-50m-single-leg-leg-press-m1-square-600x600.jpg" data-src="https://www.websites.com/images/2020/xdb/cropped/xdb-50m-single-leg-leg-press-m1-square-600x600.jpg" itemprop="image">
    </div>
    <div class="ExResult-cell ExResult-cell--nameEtc">
      <h3 class="ExHeading ExResult-resultsHeading">
        <a href="/exercises/single-leg-press" itemprop="name">
          Single-Leg Press
        </a>
      </h3>
      <div class="ExResult-details ExResult-muscleTargeted">
        Muscle Targeted:
        <a href="/exercises/muscle/quadriceps">
          Quadriceps
        </a>
      </div>
      <div class="ExResult-details ExResult-equipmentType">
        Equipment Type:
        <a href="/exercises/equipment/machine">
          Machine
        </a>
      </div>
    </div>
    <div class="ExResult-cell ExResult-cell--rating">
      <div class="ExRating">
        <div class="ExRating-badge">
          9.6
        </div>
        <div class="ExRating-description ExRating-description--Average">
          Average
        </div>
      </div>
    </div>
  </div>
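Given the markup above, one way to collect all three values per row is to select the rows themselves and read each field relative to its row. Note that `.ExResult-row:nth-child(2)` in the question matches only the row that is the second child of `.ExCategory-results` (the first child is the load indicator), so that selector can never return more than one row. The sketch below is a possible approach, not a confirmed fix; the function name `extractRows` and the output key names are hypothetical.

```javascript
// Hypothetical $$eval callback: receives the array of matched row elements and
// returns one plain object per row. It is self-contained on purpose, because
// Puppeteer serializes the function and runs it inside the page.
function extractRows(rows) {
  // Read the trimmed text of the first descendant matching `selector`, or null.
  const text = (row, selector) => {
    const node = row.querySelector(selector);
    return node ? node.innerText.trim() : null;
  };
  return rows.map((row) => ({
    exerciseName: text(row, '.ExHeading > a'),
    muscleTargeted: text(row, '.ExResult-muscleTargeted > a'),
    equipmentType: text(row, '.ExResult-equipmentType > a'),
  }));
}

// Possible usage inside an async function that already has a Puppeteer `page`:
// const rows = await page.$$eval('.ExCategory-results > .ExResult-row', extractRows);
// console.log(rows);
```

Returning plain objects of strings keeps the result serializable, which is required for values passed back from the page to Node.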
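【Discussion】:
-
Thank you. I have also posted a snippet of the markup I am trying to scrape.
-
Can you show what goes wrong, rather than what works? If $$eval fails, please show the code that does not work so that we can tell you what you are probably doing wrong. Also, remember that if you include code, try to make it a minimal reproducible example, because right now you are showing more JS and markup than is needed to demonstrate the problem.
-
I have updated it to use $$eval. I did not know how much of the page I would want to scrape. Next time I will only post 2 divs.
Tags: javascript web-scraping puppeteer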