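[Posted]: 2021-07-17 09:55:14
[Question]:
I'm trying to use XPath to get the last film/series I rated on Letterboxd and then print it out. To get the first film, I found this in the HTML: "<span class="frame-title">Magnolia (1999)</span>".
To get the first film's rating: "<span class="rating -tiny -darker rated-6">★★★</span>"
I know I'll get 3 stars every time I run this code, so I only wrote half of it.
This is what I did: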
let data = await page.evaluate(() => {
    let titles = document.evaluate("//span[contains(@class, 'frame-title')]", document, null, XPathResult.ANY_TYPE, null);
    let title = titles.iterateNext();
    let ratings = document.evaluate("//span[contains(@class, ' -tiny')]", document, null, XPathResult.ANY_TYPE, null);
    let rating = ratings.iterateNext();
    return {
        title,
        rating
    };
});
When I run this code, I see that `data` is undefined. What am I doing wrong, and what should I do instead?
Here is my full code:
const puppeteer = require('puppeteer');
(async () => {
    let movieUrl = 'https://letterboxd.com/sdeer/films/';
    let browser = await puppeteer.launch({ headless: true });
    let page = await browser.newPage();
    await page.goto(movieUrl, { waitUntil: 'networkidle2' });
    let data = await page.evaluate(() => {
        let titles = document.evaluate("//span[contains(@class, 'frame-title')]", document, null, XPathResult.ANY_TYPE, null);
        let title = titles.iterateNext();
        let ratings = document.evaluate("//span[contains(@class, ' -tiny')]", document, null, XPathResult.ANY_TYPE, null);
        let rating = ratings.iterateNext();
        return {
            title,
            rating
        };
    });
    debugger
    console.log(data.title.textContent);
    console.log(data.rating.textContent);
    await browser.close();
})();
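A likely cause, offered as a sketch rather than a confirmed diagnosis: `page.evaluate` can only hand back values that can be serialized, and DOM nodes such as `title` and `rating` cannot, so the call resolves to `undefined`. One way around this is to read `textContent` inside the page and return plain strings. The function name `extractFirstRated` and the helper `firstText` below are hypothetical names chosen for illustration; the XPath expressions are the ones from the question.

```javascript
// Sketch: meant to be passed to page.evaluate(), so it must be self-contained
// and return only plain, serializable values (strings or null), not DOM nodes.
function extractFirstRated() {
  // Hypothetical helper: trimmed text of the first node matching an XPath, or null.
  const firstText = (xpath) => {
    const result = document.evaluate(
      xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
    const node = result.singleNodeValue;
    return node ? node.textContent.trim() : null;
  };
  return {
    title: firstText("//span[contains(@class, 'frame-title')]"),
    rating: firstText("//span[contains(@class, ' -tiny')]"),
  };
}

// Usage inside the async function from the question (not run here):
//   const data = await page.evaluate(extractFirstRated);
//   console.log(data.title, data.rating);
```

With this shape, `data.title` and `data.rating` are already strings, so the later `.textContent` accesses are no longer needed. Either field can be `null` if the page has no matching element, which is worth checking before printing.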
[Discussion]:
Tags: javascript html web-scraping xpath