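It is always hard to declare a particular approach "the fastest", because there is almost always some way to squeeze out more performance. However, an approach using Data.ByteString.Char8, along the general lines you suggest, should be among the fastest ways to read numbers. If you are seeing poor performance, the problem probably lies elsewhere.
To give some concrete results, I generated a 191 MB file of 20 million 9-digit numbers, space-separated on a single line. I then tried several general approaches to reading that line of numbers and printing their sum (which, for the record, was 10999281565534666). The obvious approach using String: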
reader :: IO [Int]
reader = map read . words <$> getLine
sum' xs = sum xs -- work around GHC ticket 10992
main = print =<< sum' <$> reader
took 52 seconds; a similar approach using Text:
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
readText = map parse . T.words <$> T.getLine
  where parse s = let Right (n, _) = T.decimal s in n
ran in 2.4 seconds (but note that it would need to be modified to handle negative numbers!); and the same approach using Char8:
import qualified Data.ByteString.Char8 as C
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n
ran in 1.4 seconds. All examples were compiled with -O2 on GHC 8.0.2.
As a baseline for comparison, a scanf-based C implementation:
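On the caveat above about negative numbers in the Text version: one way to handle them is to wrap the reader in `T.signed` from Data.Text.Read. The following is a minimal sketch rather than the code that was benchmarked; the name `parseSignedLine` is hypothetical, and it also swaps the partial `let Right ...` pattern for an explicit `Either`, so a malformed token is reported instead of crashing.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Read as T

-- Hypothetical helper (not part of the timed programs): parse one line of
-- whitespace-separated, optionally signed integers.
-- Returns Left with a message on the first malformed token.
parseSignedLine :: T.Text -> Either String [Int]
parseSignedLine = traverse parseToken . T.words
  where
    parseToken tok = case T.signed T.decimal tok of
      -- Accept the token only if the whole of it was consumed.
      Right (n, rest) | T.null rest -> Right n
      Right _  -> Left ("trailing characters in " ++ show tok)
      Left err -> Left err
```

Plugging this into the reader would look something like `parseSignedLine <$> T.getLine`. I have not timed this variant, so the 2.4 second figure should not be assumed to carry over.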
/* GCC 5.4.0 w/ -O3 */
#include <stdio.h>

int main()
{
    long x, acc = 0;
    while (scanf(" %ld", &x) == 1) {
        acc += x;
    }
    printf("%ld\n", acc);
    return 0;
}
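ran in about 2.5 seconds, on par with the Text implementation.
You can squeeze a bit more performance out of the Char8 implementation. Using a hand-rolled parser: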
import Data.List (unfoldr)
import qualified Data.Char as C  -- provides C.isSpace

readChar8' :: IO [Int]
readChar8' = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)
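ran in about 0.9 seconds. I haven't tried to determine why there is a difference, but the compiler must be missing an opportunity to optimize the words-to-readInt pipeline.
Haskell code for reference
Generate some numbers with Numbers.hs: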
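To make the behaviour of the hand-rolled parser easier to see, here is its pure core applied to a small in-memory input. This is only an illustrative sketch: `parseInts` is a hypothetical name for the `unfoldr go` step, separated from the `C.getLine` call.

```haskell
import qualified Data.ByteString.Char8 as C
import Data.Char (isSpace)
import Data.List (unfoldr)

-- Hypothetical name for the pure part of the hand-rolled reader.
-- Each step reads one Int from the front of the buffer, then skips the
-- whitespace that follows it; unfoldr stops as soon as readInt fails.
parseInts :: C.ByteString -> [Int]
parseInts = unfoldr go
  where
    go s = do
      (n, rest) <- C.readInt s
      return (n, C.dropWhile isSpace rest)
```

For example, `parseInts (C.pack "12 -7 30")` yields `[12,-7,30]`. One design consequence worth knowing: because `unfoldr` ends at the first `Nothing`, a malformed token silently truncates the result (`parseInts (C.pack "1 2 x 3")` yields `[1,2]`), whereas the `let Just (n, _) = ...` version throws a pattern-match error when the bad element is demanded.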
-- | Generate 20M 9-digit numbers:
--   ./Numbers 20000000 100000000 999999999 > data1.txt
import qualified Data.ByteString.Char8 as C
import Control.Monad
import System.Environment
import System.Random

main :: IO ()
main = do [n, a, b] <- map read <$> getArgs
          nums <- replicateM n (randomRIO (a,b))
          let _ = nums :: [Int]
          C.putStrLn (C.unwords (map (C.pack . show) nums))
Sum them with Sum.hs:
import Data.List
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
import qualified Data.Char as C  -- provides C.isSpace
import qualified Data.ByteString.Char8 as C
import System.Environment

-- work around https://ghc.haskell.org/trac/ghc/ticket/10992
sum' xs = sum xs

readString :: IO [Int]
readString = map read . words <$> getLine

readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
  where parse s = let Right (n, _) = T.decimal s in n

readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n

readHand :: IO [Int]
readHand = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)

main = do [method] <- getArgs
          let reader = case method of
                "string" -> readString
                "text" -> readText
                "char8" -> readChar8
                "hand" -> readHand
          print =<< sum' <$> reader
with the following timings:
./Sum string <data1.txt # 54.3 secs
./Sum text <data1.txt # 2.29 secs
./Sum char8 <data1.txt # 1.34 secs
./Sum hand <data1.txt # 0.91 secs