【Question title】: Upload Node Request-Response Response To MongoDB
【Posted】: 2017-11-28 16:37:10
【Question description】:

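I'm playing around with Cheerio in Node.js. I have a scraper that goes to a list of articles, grabs all the article URLs, then visits each article and scrapes its title and URL. Everything works, except that when I try to upsert the results into my MongoDB, I get undefined.

I assume it's trying to upsert before the values are defined... but even with request-promise I can't get it to work. Any help would be greatly appreciated! Since the code isn't too long, I'll paste the whole thing so it's easier to see what I'm trying to do. Again, the main problem is getting upsertArticle to actually insert the variables.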

const request = require('request');
const cheerio = require('cheerio');
const rp = require('request-promise');
const mongoose = require('mongoose');
const Article = require('./models/article');

var urls = [];
//get the list of articles to scrape
rp('https://www.somesite.com/', function(error, response, html) {
    if (!error && response.statusCode == 200) {
        var $ = cheerio.load(html);
        $('.c-entry-box--compact__title').each(function(i, element) {
            var a = $(this);
            urls.push(a.children().attr('href'));
        });
    }
})
    //scrape over each article individually
    .then(function(getStuff) {
        var arrayLength = urls.length;
        //get the list of articles to scrape and upsert each one
        for (var i = 0; i < arrayLength; i++) {
            const result = rp(urls[i], function(error, response, html) {
                if (!error && response.statusCode == 200) {
                    var $ = cheerio.load(html);
                    var parsedResults = [];
                    $('.l-main-content').each(function(n, element) {
                        var a = $(this);
                        var title = a.find('.c-page-title').text();
                        var url = response.request.uri.href;
                        //I also tried upserting the variables right here, that didn't work
                        return { title, url };
                    });
                } else {console.log(error);}
            }).then(function(upsertStuff) {
                    //also tried returning and upserting stuff here... but nothing gets upserted
                    upsertArticle({
                        title: result.title,
                        source: result.url,
                        dateCrawled: new Date()
                    });
                    console.log('Upserted ' + result.title);
                }).catch(function(err) {
                    console.log(err);
                });
        }
    }).catch(function(err) {
        console.log(err);
    });

function upsertArticle(userObj) {
    const DB_URL = 'mongodb://localhost/articles';
    if (mongoose.connection.readyState == 0) {
        mongoose.connect(DB_URL, {
            useMongoClient: true
        });
    }
    let conditions = {
        title: userObj.title
    };
    let options = {
        upsert: true,
        new: true,
        setDefaultsOnInsert: true
    };
    Article.findOneAndUpdate(conditions, userObj, options, (err, result) => {
        if (err) throw err;
    });
}

【Question discussion】:

    Tags: javascript node.js mongodb response cheerio


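    【Solution 1】:

    I made a few changes to the code you provided. Namely, I used promises instead of callbacks to keep your logic consistent and to make sure everything runs when it should.

    For the for loop, I moved upsertArticle({...}) back inside the each function so that title and url are defined when it runs.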
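
    To illustrate why result.title came back undefined in the question, here is a minimal sketch. It makes two assumptions: fetchPage is a hypothetical stand-in for rp(url), and Array.prototype.forEach stands in for Cheerio's each, on the understanding that in both cases an object returned from the callback does not become the result of the loop.

```javascript
// Hypothetical stand-in for rp(url): resolves with a page-like object.
function fetchPage(url) {
    return Promise.resolve({ title: 'Example title', url: url });
}

// 1) The variable holds the promise itself, not the value it resolves to.
const result = fetchPage('https://www.example.com/a');
console.log(result.title); // undefined

// 2) A value returned from an iteration callback is discarded by the loop.
const returned = ['a', 'b'].forEach(function (item) {
    return { title: item };
});
console.log(returned); // undefined

// 3) Using the value inside .then(), where it is actually defined.
const titlePromise = fetchPage('https://www.example.com/a').then(function (page) {
    return page.title;
});
titlePromise.then(function (title) {
    console.log('Got ' + title);
});
```

    This is why the upsert call belongs where title and url are in scope, rather than reading them off the variable that holds the promise.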

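    Finally, I'm using Bluebird's Promise.all (request-promise already depends on Bluebird) to signal when all the links have been upserted. This change is optional, but I think it's useful to get feedback once everything is done.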
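
    The Promise.all pattern on its own can be sketched as follows. This uses the native Promise rather than Bluebird so that it runs without dependencies, and processUrl is a hypothetical placeholder for one "fetch a page and handle it" unit of work.

```javascript
// Hypothetical placeholder for one unit of asynchronous work per URL.
function processUrl(url) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            resolve('done: ' + url);
        }, 10);
    });
}

// Start every job, collect the promises, and wait for all of them.
function processAll(urls) {
    const promiseArray = [];
    for (let i = 0; i < urls.length; i++) {
        promiseArray.push(processUrl(urls[i]));
    }
    // Resolves with the results in input order once every promise has resolved;
    // rejects as soon as any one of them rejects.
    return Promise.all(promiseArray);
}

const allDone = processAll([
    'https://www.example.com/1',
    'https://www.example.com/2'
]);

allDone.then(function (results) {
    console.log(results.length + ' pages processed');
});
```

    Promise.all only waits for the promises it is given, so any work that should count as "finished" has to be part of one of them.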

    Try this:

    const request = require('request');
    const cheerio = require('cheerio');
    const rp = require('request-promise');
    const mongoose = require('mongoose');
    const Article = require('./models/article');
    const Promise = require("bluebird");
    
    var urls = [];
    
    rp({uri: 'https://www.somesite.com',  resolveWithFullResponse: true}).then(function(response) {
    
        if(response.statusCode != 200) throw "Response: " + response.statusCode;
    
        var html = response.body;
    
        var $ = cheerio.load(html);
    
        $('.c-entry-box--compact__title').each(function(i, element) {
            var a = $(this);
            urls.push(a.children().attr('href'));
        });
    
    }).then(function(getStuff) {
    
        var arrayLength = urls.length;
        var promiseArray = [];
    
        for(var i = 0; i < arrayLength; i++) {
    
            const p = rp({uri: urls[i],  resolveWithFullResponse: true}).then(function(response) {
    
                if(response.statusCode != 200) throw "Response: " + response.statusCode;
    
                var html = response.body;
    
                var $ = cheerio.load(html);
                var parsedResults = [];
    
                $('.l-main-content').each(function(n, element) {
    
                    var a = $(this);
                    var title = a.find('.c-page-title').text();
                    var url = response.request.uri.href;
    
                    upsertArticle({
                        title: title,
                        source: url,
                        dateCrawled: new Date()
                    });
    
                    console.log('Upserted ' + title);
                });
    
            });
    
            promiseArray.push(p);
        }
    
        return Promise.all(promiseArray);
    
    }).then(function() {
        console.log("Done upserting!");
    })
    .catch(function(err) {
        console.log(err); 
    });
    
    function upsertArticle(userObj) {
        const DB_URL = 'mongodb://localhost/articles';
        if (mongoose.connection.readyState == 0) {
            mongoose.connect(DB_URL, {
                useMongoClient: true
            });
        }
        let conditions = {
            title: userObj.title
        };
        let options = {
            upsert: true,
            new: true,
            setDefaultsOnInsert: true
        };
        Article.findOneAndUpdate(conditions, userObj, options, (err, result) => {
            if (err) throw err;
        });
    }
    

    I couldn't test the code without knowing the real value of https://www.somesite.com, so let me know if it gives you any new errors.
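
    One possible refinement, offered as a sketch rather than as part of the tested answer: upsertArticle above does not return anything, so Promise.all waits for the page requests but not for the database writes. Returning the promise from the query lets the caller wait for the write too. The version below takes the model as a parameter, and FakeArticle is a hypothetical in-memory stand-in for the Mongoose Article model, included only so the example runs without MongoDB. It assumes the real model is used through findOneAndUpdate(...).exec(), which returns a promise in Mongoose.

```javascript
// Returns the promise from the query so callers can wait for the write.
// `model` is expected to expose findOneAndUpdate(conditions, update, options)
// whose result has an exec() method returning a promise.
function upsertArticle(model, userObj) {
    const conditions = { title: userObj.title };
    const options = { upsert: true, new: true, setDefaultsOnInsert: true };
    return model.findOneAndUpdate(conditions, userObj, options).exec();
}

// Hypothetical in-memory stand-in for the Article model (not Mongoose itself).
const fakeStore = new Map();
const FakeArticle = {
    findOneAndUpdate: function (conditions, update, options) {
        return {
            exec: function () {
                fakeStore.set(conditions.title, update);
                return Promise.resolve(update);
            }
        };
    }
};

const upserted = upsertArticle(FakeArticle, {
    title: 'Example title',
    source: 'https://www.example.com/a',
    dateCrawled: new Date()
});

upserted.then(function (doc) {
    // Runs only after the (fake) write has completed.
    console.log('Upserted ' + doc.title);
});
```

    In the scraper, the promises returned by each upsert would need to be collected and returned from the then handler so that the final Promise.all also covers them.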

    【Discussion】:

    • Perfect! I only had to remove result.var and use var, and everything worked!
    • Ah, whoops! I accidentally left that in there; I'll update the answer.