【Posted】: 2016-02-03 22:20:48
【Question】:
I have retrieved an IMDB data dump (thanks to http://www.omdbapi.com/ and a small donation) as a TSV file containing 1,111,073 lines. Each line represents one movie, and they look like this:
ID imdbID Title Year Rating Runtime Genre Released Director Writer Cast Metacritic imdbRating imdbVotes Poster Plot FullPlot Language Country Awards lastUpdated
1 tt0000001 Carmencita 1894 NOT RATED 1 min Documentary, Short William K.L. Dickson Carmencita 5.8 1100 http://ia.media-imdb.com/images/M/MV5BMjAzNDEwMzk3OV5BMl5BanBnXkFtZTcwOTk4OTM5Ng@@._V1_SX300.jpg Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face. Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face. USA 2015-12-10 01:09:33.043000000
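My goal is to visualize how film length has evolved over time. For that I need to build two arrays, one holding the minimum/maximum runtime per year and one holding the average runtime per year (the Highcharts "area range and line" chart type requires this format). I wrote a script that works on a small subset but, not unexpectedly, throws an error when it tries to read the whole file.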
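To make the target format concrete, here is a minimal sketch of the aggregation, not the script from the gist. The helper names `addLine` and `toSeries` are hypothetical, the column positions are taken from the header line shown above, and the runtime is assumed to always look like `N min`. Each call folds one TSV line into a per-year map, so only the yearly statistics need to stay in memory.

```javascript
// Column positions, assuming the tab-separated header shown above:
// ID, imdbID, Title, Year, Rating, Runtime, ...
const YEAR_COL = 3;
const RUNTIME_COL = 5;

// Fold one TSV line into the per-year statistics map (a Map keyed by year).
function addLine(stats, line) {
  const cols = line.split('\t');
  const year = parseInt(cols[YEAR_COL], 10);
  // Assumption: runtimes look like "1 min"; parseInt reads the leading number.
  const minutes = parseInt(cols[RUNTIME_COL], 10);
  // Skip the header line and rows with a missing year or runtime.
  if (Number.isNaN(year) || Number.isNaN(minutes)) return stats;

  const s = stats.get(year) || { min: Infinity, max: -Infinity, sum: 0, count: 0 };
  s.min = Math.min(s.min, minutes);
  s.max = Math.max(s.max, minutes);
  s.sum += minutes;
  s.count += 1;
  stats.set(year, s);
  return stats;
}

// Convert the map into the two arrays: [year, min, max] and [year, average].
function toSeries(stats) {
  const years = [...stats.keys()].sort((a, b) => a - b);
  return {
    ranges: years.map((y) => [y, stats.get(y).min, stats.get(y).max]),
    averages: years.map((y) => [y, stats.get(y).sum / stats.get(y).count]),
  };
}
```

`ranges` would feed the area range series and `averages` the line series.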
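I am well aware that streams should be able to help with this, but my expertise is limited, and this small project is really meant to help me understand streams better...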
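As far as I understand it, a stream-based approach would mean reading the file one line at a time rather than loading it all at once. A rough sketch of that idea, assuming Node's built-in `fs` and `readline` modules (the helper names `eachLine` and `eachLineOfFile` are hypothetical, and this is not what my script currently does):

```javascript
const fs = require('fs');
const readline = require('readline');

// Calls onLine(line) for every line of a readable stream, one at a time,
// and resolves once the input has ended.
function eachLine(input, onLine) {
  return new Promise((resolve, reject) => {
    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    rl.on('line', onLine);
    rl.on('close', resolve);
    input.on('error', reject);
  });
}

// Convenience wrapper: stream a file from disk instead of reading it whole.
function eachLineOfFile(path, onLine) {
  return eachLine(fs.createReadStream(path), onLine);
}
```

With this, something like `eachLineOfFile('dump.tsv', handleLine)` (both names made up here) would hand each row to a callback without ever holding the full file as a single string.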
Here is the script so far:
https://gist.github.com/jfix/f79f011ce99d2049613c
If it would be better to show the whole script inline in my question, I can of course add it.
Here is the error that is thrown:
$ node each.js
buffer.js:382
throw new Error('toString failed');
^
Error: toString failed
at Buffer.toString (buffer.js:382:11)
at StringDecoder.write (string_decoder.js:129:21)
at Parser._transform (/Users/jakob/Projects/imdb-film-length/node_modules/csv-parse/lib/index.js:154:26)
at Transform._read (_stream_transform.js:167:10)
at Transform._write (_stream_transform.js:155:12)
at doWrite (_stream_writable.js:292:12)
at writeOrBuffer (_stream_writable.js:278:5)
at Writable.write (_stream_writable.js:207:11)
at /Users/jakob/Projects/imdb-film-length/node_modules/csv-parse/lib/index.js:46:14
at doNTCallback0 (node.js:419:9)
Thanks for any pointers in the right direction...
【Comments】: