Running out of memory writing to a file in NodeJS: answers

[Question title]: Running out of memory writing to a file in NodeJS
[Posted]: 2016-10-22 20:13:30
[Question]:

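I'm processing a large amount of data and storing it in a file. I iterate over the dataset, and I want to store all of it in a single JSON file.

My initial approach with fs, keeping everything in one object and then dumping it, didn't work: I ran out of memory and everything became extremely slow.

I'm now using fs.createWriteStream, but as far as I can tell it still keeps everything in memory.

I'd like to write the data to the file one object at a time, unless someone can recommend a better approach.

Part of my code: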

  // Top of the file
  var wstream = fs.createWriteStream('mydata.json');
  ...

  // In a loop
  let JSONtoWrite = {}
  JSONtoWrite[entry.word] = wordData

  wstream.write(JSON.stringify(JSONtoWrite))

  ...
  // Outside my loop (when memory is probably maxed out)
  wstream.end()

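I think I'm using streams incorrectly. Can someone show me how to write all of this data to a file without running out of memory? Every example I've found online is about read streams, but since I'm performing calculations on the data, I can't use a readable stream. I need to append to this file sequentially.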

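[Question discussion]:

Tags: javascript node.js memory io stream

[Solution 1]:

You should wrap your data source in a readable stream as well. I don't know what your source is, but you need to make sure it doesn't load all of the data into memory.

For example, suppose your dataset comes from another file in which JSON objects are separated by newline characters. You could then create a read stream like this: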

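    const Readable = require('stream').Readable;

    class JSONReader extends Readable {
      constructor(options = {}) {
        super(options);
        this._source = options.source; // the source stream
        this._buffer = '';
        this._wanted = false;      // true while the consumer is asking for data
        this._sourceEnded = false;
        this._source.setEncoding('utf8');
        // try to produce data whenever the source has more, or has ended
        this._source.on('readable', () => this._pump());
        this._source.on('end', () => {
          this._sourceEnded = true;
          this._pump();
        });
        this._source.on('error', (err) => this.destroy(err));
      }

      _read(size) {
        this._wanted = true; // the consumer wants more data
        this._pump();
      }

      _pump() {
        try {
          while (this._wanted) {
            const lineIndex = this._buffer.indexOf('\n'); // find end of line
            if (lineIndex === -1) {
              const chunk = this._source.read(); // read more from source when no full line is buffered
              if (chunk !== null) {
                this._buffer += chunk;
                continue;
              }
              if (this._sourceEnded) {
                // the last line may have no trailing newline
                if (this._buffer.trim()) {
                  this.push(JSON.stringify(JSON.parse(this._buffer)) + '\n');
                }
                this._buffer = '';
                this._wanted = false;
                this.push(null); // signal end of data
              }
              return; // otherwise wait for the next 'readable' event
            }
            // we have an end of line and therefore a new object
            const line = this._buffer.slice(0, lineIndex);
            this._buffer = this._buffer.slice(lineIndex + 1);
            if (line.trim()) {
              const result = JSON.parse(line);
              // push to the internal read queue; false means "stop until _read is called again"
              this._wanted = this.push(JSON.stringify(result) + '\n');
            }
          }
        } catch (err) {
          this.destroy(err); // e.g. a line that is not valid JSON
        }
      }
    }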

Now you can use it like this:

    const fs = require('fs');

    const source = fs.createReadStream('mySourceFile');
    const reader = new JSONReader({ source });
    const target = fs.createWriteStream('myTargetFile');
    reader.pipe(target);

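With this setup, memory use stays bounded: only the current chunk and any partial line are held in memory, not the whole dataset.

Note that the diagram that originally accompanied this answer and the example above are both taken from the excellent book Node.js in Practice.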
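On current Node.js versions, the same line-by-line idea can be written more compactly as a `Transform` stream combined with `stream.pipeline`, which handles backpressure and, unlike plain `pipe()`, also propagates errors from any stage. This is a minimal sketch, not the book's code, assuming newline-delimited JSON input; `ndjsonTransform` and its `transformObject` parameter are hypothetical names, not a library API.

```javascript
const { Transform, pipeline } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper (not a library API): parses newline-delimited JSON,
// optionally transforms each object, and re-emits one JSON line per object.
// Only the current chunk and one partial line are held in memory.
function ndjsonTransform(transformObject = (obj) => obj) {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact across chunks
  let buffer = '';
  // Serializes one complete line; blank lines produce no output.
  const handleLine = (line) =>
    line.trim() ? JSON.stringify(transformObject(JSON.parse(line))) + '\n' : '';

  return new Transform({
    transform(chunk, encoding, callback) {
      buffer += decoder.write(chunk);
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last piece may be an incomplete line
      let out;
      try {
        out = lines.map(handleLine).join('');
      } catch (err) {
        return callback(err); // invalid JSON fails the whole pipeline
      }
      callback(null, out);
    },
    flush(callback) {
      let out;
      try {
        out = handleLine(buffer + decoder.end()); // last line may lack a trailing '\n'
      } catch (err) {
        return callback(err);
      }
      callback(null, out);
    },
  });
}
```

With it, the usage example above becomes `pipeline(fs.createReadStream('mySourceFile'), ndjsonTransform(), fs.createWriteStream('myTargetFile'), (err) => { if (err) console.error(err); })`.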

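[Discussion]:

Thanks for the detailed answer, but I need to load the initial source into memory to compute various values from that data. Each record is cross-referenced against every other record to build the data I'll output, so I can't use this approach.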
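Even if the input must be fully in memory for the cross-referencing, the output does not have to be. One option, sketched here under that assumption, is to produce results lazily from a generator and let `Readable.from()` with `stream.pipeline` handle backpressure, so the generator is only resumed when the file stream can accept more. `records` and `computeWordData` are hypothetical placeholders for the asker's in-memory data and per-word computation.

```javascript
const fs = require('fs');
const { Readable, pipeline } = require('stream');

// Hypothetical in-memory input that the computation cross-references.
const records = [
  { word: 'alpha', related: ['beta'] },
  { word: 'beta', related: ['alpha', 'gamma'] },
  { word: 'gamma', related: [] },
];

// Hypothetical per-word computation that may look at every record.
function computeWordData(entry, all) {
  return { relatedCount: entry.related.length, totalWords: all.length };
}

// Yields one output line at a time. pipeline only pulls the next value
// when the destination can accept more, so the results are never all in memory.
// Nested loops work here too: just `yield` from the inner loop.
function* generateLines(all) {
  for (const entry of all) {
    yield JSON.stringify({ [entry.word]: computeWordData(entry, all) }) + '\n';
  }
}

pipeline(
  Readable.from(generateLines(records)),
  fs.createWriteStream('mydata.ndjson'),
  (err) => {
    if (err) console.error('Write failed:', err);
    else console.log('Done');
  }
);
```

The output is one JSON object per line (newline-delimited JSON), which can later be read back line by line without loading the whole file.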

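[Solution 2]:

The problem is that you aren't waiting for the data to be flushed to the file system. Instead, you keep synchronously pushing more and more data into the stream in a tight loop, so it piles up in the stream's internal buffer.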
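To see why the tight loop runs out of memory: `write()` never blocks. It queues the chunk in the stream's internal buffer and returns `false` once that buffer reaches `highWaterMark`, but if the return value is ignored the queue simply keeps growing. The sketch below makes this visible with a hypothetical, deliberately slow `Writable` standing in for the disk and a tiny `highWaterMark`; the numbers are illustrative only.

```javascript
const { Writable } = require('stream');

// A deliberately slow sink standing in for the disk: each chunk takes 10 ms to "flush".
const slowSink = new Writable({
  highWaterMark: 64, // bytes; tiny on purpose so the effect shows up quickly
  write(chunk, encoding, callback) {
    setTimeout(callback, 10);
  },
});

let firstFalseAt = -1;
for (let i = 0; i < 100; i++) {
  // The return value is ignored for the loop's purposes, just like in the question.
  const ok = slowSink.write(JSON.stringify({ word: 'w' + i }) + '\n');
  if (!ok && firstFalseAt === -1) firstFalseAt = i;
}

// Everything not yet flushed is still held in memory by the stream.
console.log('write() first returned false at iteration', firstFalseAt);
console.log('bytes buffered in memory:', slowSink.writableLength);
slowSink.end();
```

The stream signalled "stop" after a handful of writes, yet the loop kept going and every remaining chunk stayed buffered. With a real dataset, that buffer is what exhausts the heap.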

Here's a sketch that should work for you:

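    const fs = require('fs');

    // Top of the file
    const wstream = fs.createWriteStream('mydata.json');
    // I'm not sure how you're getting the data; let's say you have it all in an object
    const entry = {};
    const words = Object.keys(entry);

    function writeCB(index) {
       if (index >= words.length) {
           wstream.end();
           return;
       }

       const JSONtoWrite = {};
       JSONtoWrite[words[index]] = entry[words[index]];
       // Newline-separated so each entry can be parsed back line by line.
       // The callback fires only after this chunk has been handled by the stream,
       // so the next entry is not written until the previous one is flushed.
       wstream.write(JSON.stringify(JSONtoWrite) + '\n', (err) => {
           if (err) return; // errors are also emitted as an 'error' event on wstream
           writeCB(index + 1);
       });
    }

    // Start with the first entry
    writeCB(0);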
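Separately, note that writing `JSON.stringify(...)` for each entry one after another, as in the question, produces concatenated objects (or, with newline separators, newline-delimited JSON), not one valid JSON document, so `JSON.parse` on the whole file will fail. If a single JSON object is required, one option is to emit the opening brace, then each `"key":value` pair separated by commas, then the closing brace. A minimal sketch, assuming keys are unique and every value is JSON-serializable; `jsonObjectChunks` and the sample `entries` are hypothetical.

```javascript
const fs = require('fs');
const { Readable, pipeline } = require('stream');

// Hypothetical helper: yields the pieces of ONE JSON object, e.g.
// '{', '"a":1', ',"b":2', '}', so the whole object is never built in memory.
// Assumes every value is JSON-serializable (undefined would break the output).
function* jsonObjectChunks(entries) {
  yield '{';
  let first = true;
  for (const [key, value] of entries) {
    // JSON.stringify on the key escapes quotes and other special characters
    yield (first ? '' : ',') + JSON.stringify(key) + ':' + JSON.stringify(value);
    first = false;
  }
  yield '}';
}

// Example data standing in for the asker's words and wordData (hypothetical).
const entries = new Map([
  ['apple', { count: 3 }],
  ['banana', { count: 5 }],
]);

pipeline(
  Readable.from(jsonObjectChunks(entries)),
  fs.createWriteStream('mydata.json'),
  (err) => {
    if (err) console.error('Write failed:', err);
  }
);
```

Any iterable of `[key, value]` pairs works as input, including `Object.entries(obj)` or a generator that computes each value on demand.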

[Discussion]:

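Is there a way to do this without recursion? My loop is actually a loop inside another loop, so I can't really process the data in a single recursive call inside the nested loop.
To organize code like this better, you could consider using a library such as async (npmjs.com/package/async).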
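Regarding avoiding recursion with nested loops: on Node.js versions with `async`/`await`, one option is to keep ordinary nested `for` loops and, whenever `write()` returns `false`, pause with `await once(stream, 'drain')`, where `once` comes from the built-in `events` module. This is a sketch, not the answerer's code; `groups`, the placeholder computation, and the output file name are hypothetical.

```javascript
const fs = require('fs');
const { once } = require('events');

// Writes one chunk and, if the stream's buffer is full, waits for 'drain'.
// once() also rejects if the stream emits 'error' while we wait.
async function writeWithBackpressure(stream, chunk) {
  if (!stream.write(chunk)) {
    await once(stream, 'drain');
  }
}

async function main(outPath = 'mydata.ndjson') {
  const wstream = fs.createWriteStream(outPath);
  // Hypothetical nested input: an outer list of groups, each with inner words.
  const groups = [
    { name: 'g1', words: ['alpha', 'beta'] },
    { name: 'g2', words: ['gamma'] },
  ];

  for (const group of groups) {
    for (const word of group.words) {
      const wordData = { group: group.name, length: word.length }; // placeholder computation
      await writeWithBackpressure(wstream, JSON.stringify({ [word]: wordData }) + '\n');
    }
  }

  wstream.end();
  await once(wstream, 'finish'); // resolves after all buffered data has been written
  return outPath;
}

main().catch((err) => {
  console.error('Write failed:', err);
  process.exitCode = 1;
});
```

The loops stay flat and readable; the only change from the question's code is awaiting `'drain'` when the stream asks the producer to slow down, so at most about one `highWaterMark` worth of data is buffered at a time.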