Read FileReader object line-by-line without loading the whole file into RAM

【Question title】: Read FileReader object line-by-line without loading the whole file into RAM
【Posted】: 2015-08-07 01:47:04
【Question】:

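Many browsers now support reading local files with HTML5's FileReader. This opens the door to websites that go beyond being "database front-ends" and run scripts that do something useful with local data without first sending it to a server.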

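Besides pre-processing images and video before upload, one big application of FileReader is loading data from some kind of on-disk table (CSV, TSV, etc.) into the browser for manipulation, perhaps for plotting or analysis in D3.js or for building landscapes in WebGL.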

The problem is that most examples on StackOverflow and elsewhere use FileReader's .readAsText() method, which reads the entire file into RAM before returning the result:

javascript: how to parse a FileReader object line by line

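To read a file without loading all of its data into RAM, you have to read it in slices with .readAsArrayBuffer(), and this SO post is the closest I could get to a good answer: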

filereader api on big files
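
What keeps memory bounded is reading the file one Blob.slice() at a time instead of all at once. The chunk size only sets how many bytes are held per read (smaller chunks mean more reads, larger ones more memory per read), and a Uint8Array is simply a byte view over each ArrayBuffer. The sketch below shows just that slicing loop. As an assumption, it uses the promise-based Blob.arrayBuffer() instead of FileReader.readAsArrayBuffer() (same per-slice effect, newer API), and forEachChunk is a hypothetical name.

```typescript
// Hypothetical helper: visit a Blob (a File is a Blob) in fixed-size byte chunks.
// Only one chunk's bytes are held in memory at a time.
export async function forEachChunk(
  blob: Blob,
  chunkSize: number,
  onChunk: (bytes: Uint8Array, offset: number) => void,
): Promise<void> {
  if (chunkSize <= 0) {
    throw new RangeError('chunkSize must be positive');
  }
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() only creates a new Blob; its bytes are read when arrayBuffer() resolves
    const buffer = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    onChunk(new Uint8Array(buffer), offset);
  }
}
```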

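However, that answer is a bit too specific to that particular problem. Honestly, I could spend days trying to make the solution more general and still not get there, because I don't understand the significance of the chunk size or why a Uint8Array is used. A solution to the more general problem would be ideal: reading a file line by line with a user-definable line separator (preferably via .split(), since it also accepts regular expressions) and then doing something with each line (such as printing it with console.log).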

【Question discussion】:

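Re "reading a file line by line with a user-definable line separator (ideally using .split(), since it also accepts regular expressions)": if you can use split, you've already loaded the whole file...
Not if you split it in chunks as you read :) Say, read 1 MB, split it, process the lines, prepend the remainder to the next 1 MB, rinse and repeat :)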
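
The carry-over approach in that reply can be sketched like this: split each decoded chunk on the separator, emit every complete piece, and keep the last (possibly unfinished) piece to prepend to the next chunk. ChunkedLineSplitter is a hypothetical helper, not from either answer. It assumes the chunks are already strings, and the separator regex must not contain capturing groups (split() would return the captured text) or be able to match only the start of a longer separator at a chunk boundary (/\r?\n/ is fine, /\r\n?/ is not).

```typescript
// Hypothetical helper: split text that arrives in chunks on a user-supplied separator,
// carrying the unfinished last piece of each chunk over to the next one.
export class ChunkedLineSplitter {
  private remainder = '';

  constructor(private readonly separator: string | RegExp) {}

  // Returns the lines completed by this chunk; the trailing partial line is kept back.
  push(chunk: string): string[] {
    const parts = (this.remainder + chunk).split(this.separator);
    this.remainder = parts.pop() ?? '';
    return parts;
  }

  // Call once after the last chunk to get a final line that had no trailing separator.
  flush(): string[] {
    const last = this.remainder;
    this.remainder = '';
    return last.length > 0 ? [last] : [];
  }
}
```

Only the current chunk plus one partial line is ever held in memory, which is what makes split() usable here.
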
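The reason you use a Uint8Array (or a Buffer in node.js) is that the file may be binary, and JavaScript strings can't handle binary data (for example, the byte 0x00, also known as the nul terminator (yes, that's nul with one "l")).
Two points here. First, the use of Uint8Array. Remember that a file is a sequence of bytes, not of characters the way a string is. As a result, the file is read in chunks of bytes (using a Uint8Array) rather than chunks of characters (which is why that answer says it "assumes the input is ASCII"). To convert bytes to characters, you need to know the character encoding (which nowadays can be assumed to be UTF-8 unless metadata says otherwise) and a character decoder, such as the TextDecoder class.
Next, line separators. In general there are only two or three possible choices of line separator: LF, CR/LF and CR. Of these, the first two are by far the most common (the first on Linux, the second on Windows); other choices would be very unusual. It is therefore better to write a line reader that handles the most common separators, so that the separator doesn't need to be specified manually.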
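
The bytes-versus-characters point shows up as soon as a chunk boundary splits a character: in UTF-8, "é" is the two bytes 0xC3 0xA9. A single TextDecoder called with { stream: true } buffers the incomplete sequence until the next chunk arrives, which is also what the TypeScript answer below relies on. A minimal sketch; decodeChunks is a hypothetical name.

```typescript
// Hypothetical helper: decode a sequence of UTF-8 byte chunks into text chunks.
export function decodeChunks(chunks: Uint8Array[]): string[] {
  // One decoder for the whole file, so its internal state survives between chunks
  const decoder = new TextDecoder('utf-8');
  // stream: true keeps a trailing partial character buffered instead of emitting U+FFFD
  const texts = chunks.map((chunk) => decoder.decode(chunk, { stream: true }));
  // A final call without arguments flushes whatever is still buffered
  texts.push(decoder.decode());
  return texts;
}
```

Decoding each chunk on its own (a fresh decoder, or no stream option) would instead turn the split "é" into two U+FFFD replacement characters.
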

Tags: javascript html filereader

【Solution 1】:

I've created a LineReader class at the Gist URL below. As I mentioned in my comment, line separators other than LF, CR/LF and CR are uncommon, so my code treats only LF and CR/LF as line separators.

https://gist.github.com/peteroupc/b79a42fffe07c2a87c28

Example:

new LineReader(file).readLines(function(line){
 console.log(line);
});

【Comments】:

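Great solution, Peter! This works perfectly! I hope it gets lots of attention :) I've never donated to an SO answer gist before, but your solution definitely deserves it this time :) Thanks!
This is awesome, Peter. Can I get a callback when all lines have been read?
@Noitidart: I've updated my gist to add a feature that I hope meets your needs. Note that I only edited it in place, and the feature is untested.
Thanks @PeterO.! I'll try it out and let you know this week.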
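
For the "callback when all lines have been read" question, a later alternative (not part of either answer, and relying on Blob.stream() and TextDecoderStream, which browsers added well after 2015) is to expose the lines as an async iterator: the for await loop finishes only after the last line, so code placed after it plays the role of the end callback. A minimal sketch; the lines function name is hypothetical, and like the answers it treats only LF and CRLF as separators.

```typescript
// Hypothetical helper: iterate over the lines of a Blob/File using streams.
export async function* lines(blob: Blob): AsyncGenerator<string> {
  // The stream delivers the file in chunks; TextDecoderStream decodes UTF-8 across chunk boundaries
  const reader = blob.stream().pipeThrough(new TextDecoderStream()).getReader();
  let remainder = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const parts = (remainder + value).split(/\r?\n/);
    // Keep the possibly unfinished last line for the next chunk
    remainder = parts.pop() ?? '';
    yield* parts;
  }
  // A last line without a trailing line break
  if (remainder.length > 0) yield remainder;
}
```

Usage, e.g. in an <input type="file"> change handler: for await (const line of lines(file)) { console.log(line); } followed by whatever should run once every line has been read. Read errors surface as a rejection of the loop.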

【Solution 2】:

Here is an adapted TypeScript class version of Peter O.'s code.

export class BufferedFileLineReader {
  // Bytes requested per read; only one chunk is held in memory at a time
  chunkSize = 8192;
  // Index in the current chunk where the current line's undecoded bytes start
  bufferOffset = 0;
  callback: (line: string) => void = () => undefined;
  currentLine = '';
  // stream: true keeps a multi-byte UTF-8 character that is split across chunks buffered
  decodeOptions: TextDecodeOptions = { stream: true };
  decoder = new TextDecoder('utf-8', { ignoreBOM: true });
  endCallback: () => void = () => undefined;
  // Byte position in the file of the next chunk to read
  offset = 0;
  // True when a CR at the end of the previous chunk was held back in case an LF follows
  omittedCR = false;
  reader = new FileReader();
  sawCR = false;

  readonly _error = (event: Event): void => {
    throw event;
  };

  readonly _readFromView = (dataArray: Uint8Array, offset: number): void => {
    for (let i = offset; i < dataArray.length; i++) {
      // Treats LF and CRLF as line breaks; a lone CR stays part of the line
      if (dataArray[i] === 0x0A) {
        // Line feed read: decode up to the line end, excluding a CR just before the LF
        const lineEnd = this.sawCR ? i - 1 : i;
        if (lineEnd > this.bufferOffset) {
          this.currentLine += this.decoder.decode(dataArray.subarray(this.bufferOffset, lineEnd), this.decodeOptions);
        }
        this.callback(this.currentLine);
        this.decoder.decode(); // reset the decoder's streaming state for the next line
        this.currentLine = '';
        this.sawCR = false;
        this.bufferOffset = i + 1;
      } else if (dataArray[i] === 0x0D) {
        // A held-back CR followed by another CR was not part of a CRLF: restore it
        if (this.omittedCR) {
          this.currentLine += '\r';
        }
        this.sawCR = true;
      } else if (this.sawCR) {
        // Same for a held-back CR followed by an ordinary byte
        if (this.omittedCR) {
          this.currentLine += '\r';
        }
        this.sawCR = false;
      }
      this.omittedCR = false;
    }

    if (this.bufferOffset !== dataArray.length) {
      // No line break at the end of the chunk: decode the partial line and keep it,
      // holding back a trailing CR until we know whether an LF follows
      const lineEnd = this.sawCR ? dataArray.length - 1 : dataArray.length;
      if (lineEnd > this.bufferOffset) {
        this.currentLine += this.decoder.decode(dataArray.subarray(this.bufferOffset, lineEnd), this.decodeOptions);
      }
      this.omittedCR = this.sawCR;
    }
  };

  readonly _viewLoaded = (): void => {
    if (!this.reader.result) {
      // Nothing was read: finish instead of falling through to the processing below
      this.endCallback();
      return;
    }

    const dataArray = new Uint8Array(this.reader.result as ArrayBuffer);
    if (dataArray.length > 0) {
      this.bufferOffset = 0;
      this._readFromView(dataArray, 0);
      this.offset += dataArray.length;
      // Request the next chunk only after this one has been processed
      this.reader.readAsArrayBuffer(this.file.slice(this.offset, this.offset + this.chunkSize));
    } else {
      // End of file: flush the decoder and emit a last line that had no trailing LF
      this.currentLine += this.decoder.decode();
      if (this.currentLine.length > 0) {
        this.callback(this.currentLine);
      }
      this.currentLine = '';
      this.sawCR = false;
      this.endCallback();
    }
  };

  constructor(private file: File) {
    this.reader.addEventListener('load', this._viewLoaded);
    this.reader.addEventListener('error', this._error);
  }

  public readLines(callback: (line: string) => void, endCallback: () => void = () => undefined): void {
    this.callback = callback;
    this.endCallback = endCallback;
    this.reader.readAsArrayBuffer(this.file.slice(this.offset, this.offset + this.chunkSize));
  }
}

Thanks again to Peter O. for the great answer.

【Comments】: