【Question title】: Trouble with pagination while scraping NodeJS
【Posted】: 2019-12-15 20:57:10
【Question】:

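I'm writing a small script that scrapes some information from a public directory. Saving the results to CSV already works, but I'm having trouble automating the pagination.

My source is: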

const rp = require('request-promise');
const request = ('request');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');

// URL To scrape
const baseURL = 'xx';
const searchURL = 'xxx';

// scrape info
const getCompanies = async () => {
    // Pagination test 

    for(let index = 0; index <= 2; index = index + 1) {
        const html = await request.get("xxx" + index);
        const $ = await cheerio.load(html);
        console.log("Loading Pages....");
        // console.log("At page number" + index);
        // end pagination test
        const htmls = await rp(baseURL + searchURL);
        const businessMap = cheerio('a.business-name', htmls).map(async (i, e) => {
            const link = baseURL + e.attribs.href;
            const innerHtml = await rp(link);
            const emailAddress = cheerio('a.email-business', innerHtml).prop('href');
            const name = e.children[0].data || cheerio('h1', innerHtml).text();
            const phone = cheerio('p.phone', innerHtml).text();

            return {
                emailAddress: emailAddress ? emailAddress.replace('mailto:', '') : '',
                //  link,
                name,
                phone,
            }

        }).get();
        return Promise.all(businessMap);
    }
};

// save to CSV
getCompanies()
  .then(result => {
    const transformed = new otcsv(result);
    return transformed.toDisk('./output.csv');
  })
  .then(() => console.log('SUCCESSFULLY COMPLETED THE WEB SCRAPING SAMPLE'));

The error I get is: request.get is not a function.

Edit

The second part of this question is here: Nodejs Scraper isn't moving to next page(s)

【Comments】:

    Tags: javascript arrays node.js cheerio


    【Solution 1】:

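    request.get should be rp.get, because the request module does not return a Promise.

    Either way, you get this error because you are not requiring request; you are only assigning the string 'request' to the request variable: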

    const request = ('request');
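
To illustrate the point above, here is a minimal sketch that reproduces the error without any scraping libraries. It only assumes plain Node.js; the URL 'xxx0' is the placeholder from the question followed by a page index.

```javascript
// Parentheses around a string literal are just a grouping operator,
// so this assigns the string 'request', not the request module.
const request = ('request');

console.log(typeof request);     // "string"
console.log(typeof request.get); // "undefined": strings have no get method

let message = '';
try {
    // Calling an undefined property throws a TypeError
    request.get('xxx0');
} catch (err) {
    message = err.message;
}
console.log(message); // "request.get is not a function"
```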
    

    Change it to:

    const request = require('request');
    

    Since you are working with Promises, I recommend requiring only request-promise:

    const request = require('request-promise');
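
With a promise-returning client, each page request can be awaited directly inside the loop. The following is only a sketch under stated assumptions: fetchPages and stubClient are hypothetical names that do not appear in the question, and the client is passed in as a parameter so the example runs without network access. In the real script one would presumably pass require('request-promise') instead of the stub.

```javascript
// Hypothetical helper: fetch pages 0..lastPage one after another.
// `client` is any object with a promise-returning get(url),
// for example the module returned by require('request-promise').
const fetchPages = async (client, urlPrefix, lastPage) => {
    const pages = [];
    for (let index = 0; index <= lastPage; index += 1) {
        // Await the response body for this page before requesting the next one
        const html = await client.get(urlPrefix + index);
        pages.push(html);
    }
    // Return only after the loop has visited every page
    return pages;
};

// Stub client for illustration only; it performs no HTTP requests.
const stubClient = {
    get: async (url) => `<html>${url}</html>`,
};

// 'xxx' is the placeholder URL prefix from the question
const pagesPromise = fetchPages(stubClient, 'xxx', 2);
pagesPromise.then((pages) => console.log(pages.length)); // 3
```

Collecting the results in an array and returning after the loop is a design choice of this sketch: a return statement inside the for loop would end the function on the first iteration.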
    

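    【Comments】:

    • Thanks for the answer, Marcos. Now I can't get it to move on to the next page. Any ideas?
    • Great, I didn't realize I could link the two posts together. The second post is here - stackoverflow.com/questions/59363001/…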