If you can afford to read the whole file into memory at once, you can use something like the following code, which should be faster:
(import (chicken io)      ; read-lines
        (chicken string)) ; string-split

(let loop ((lines (with-input-from-file "largeish_file.txt"
                    read-lines)))
  (if (null? lines)
      '()
      (append (string-split (car lines))
              (loop (cdr lines)))))
Here is some quick benchmark code:
(import (chicken io)
        (chicken string))

;; Warm-up
(with-input-from-file "largeish_file.txt" read-lines)

;; Line-by-line version: append each line's tokens to the accumulator
(time
 (with-output-to-file "a.out"
   (lambda ()
     (display
      (call-with-input-file "largeish_file.txt"
        (lambda (input-file)
          (let loop ([line (read-line input-file)]
                     [tokens '()])
            (if (eof-object? line)
                tokens
                (loop (read-line input-file)
                      (append tokens (string-split line)))))))))))

;; read-lines version: read the whole file first, then split each line
(time
 (with-output-to-file "b.out"
   (lambda ()
     (display
      (let loop ((lines (with-input-from-file "largeish_file.txt"
                          read-lines)))
        (if (null? lines)
            '()
            (append (string-split (car lines))
                    (loop (cdr lines)))))))))
These are the results on my system:
$ csc bench.scm && ./bench
28.629s CPU time, 13.759s GC time (major), 68772/275 mutations (total/tracked), 4402/14196 GCs (major/minor), maximum live heap: 4.63 MiB
0.077s CPU time, 0.033s GC time (major), 68778/292 mutations (total/tracked), 10/356 GCs (major/minor), maximum live heap: 3.23 MiB
Just to make sure we get the same results from both code snippets:
$ cmp a.out b.out && echo They contain the same data
They contain the same data
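largeish_file.txt was generated by repeatedly cat'ing a ~100KB system log file until it reached several thousand lines (mentioning this so you get an idea of the profile of the input file):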
$ ls -l largeish_file.txt
-rw-r--r-- 1 mario mario 587340 Aug 2 11:55 largeish_file.txt
$ wc -l largeish_file.txt
5790 largeish_file.txt
I got these results using CHICKEN 5.2.0 on a Debian system.
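A likely contributor to the gap between the two timings is that the line-by-line loop calls (append tokens (string-split line)), and append copies its first argument, so the ever-growing token list is copied again for every line. If the file has to be processed line by line anyway, a minimal sketch that avoids this is to cons tokens onto a reversed accumulator and reverse it once at the end. The helper name file->tokens is hypothetical, and this variant was not part of the measurements above:

```scheme
(import (chicken io)
        (chicken string))

;; Hypothetical helper (not from the benchmark above): collect all
;; whitespace-separated tokens of a file while reading it line by line.
(define (file->tokens path)
  (call-with-input-file path
    (lambda (port)
      (let loop ((line (read-line port))
                 (acc '()))               ; tokens seen so far, newest first
        (if (eof-object? line)
            (reverse acc)                 ; restore the original order once
            (loop (read-line port)
                  ;; cons each token of this line onto the accumulator
                  (let push ((ts (string-split line))
                             (acc acc))
                    (if (null? ts)
                        acc
                        (push (cdr ts) (cons (car ts) acc))))))))))
```

Calling (display (file->tokens "largeish_file.txt")) should produce the same token list as the two versions above, while only ever holding one line of input plus the tokens in memory.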