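[Posted]: 2021-06-11 16:27:25
[Question]:
I have a list of URLs to scrape, looking for "a" tags that carry both of the following classes: industry-name and industry-link. I want to grab the innerHTML and the href of each one. I can get both in the browser console, but I'm running into problems doing the same with Cheerio.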
<a href="xxxxxxx" class="industry-name industry-link">Retail</a> // taken from target page
document.getElementsByClassName('industry-name industry-link')[0].innerHTML //runs in console and works as expected
document.getElementsByClassName('industry-name industry-link')[0].href //runs in console and works as expected
const cheerio = require('cheerio');
const got = require('got');
const url = 'xxxxxxx';
got(url).then(response => {
  const $ = cheerio.load(response.body);
  $('a', '.industry-name', '.industry-link').text();
}).catch(err => {
  console.log(err);
});
[Discussion]:
Tags: web-scraping cheerio