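【Posted】: 2015-06-15 17:31:07
【Question】:
I have a Haskell program that reads the contents of an input file and parses it in order to sort the records and remove duplicates. The program has been dormant for a while and I need to revive it. I mention this only to give some historical background for the problem.
When I re-enabled the program, I found that it no longer works correctly. My debugging has isolated the problem to the code that parses and "cleans" the input file. What happens after that is irrelevant to this question, because I end up with an empty list of candidate records from the input file.
I write and test this program on my Windows laptop, then deploy the source to the Ubuntu servers where it needs to run and build it there. As part of debugging, I broke the text parsing into several discrete steps. Running catMaybes on the output of the last step is where I get the empty list, but only when I run it on an Ubuntu server.
Here is the main source that demonstrates the problem: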
import Control.Monad (unless)
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Maybe (catMaybes)
import System.Environment (getArgs)

main :: IO ()
main = do
    [ inFileName ] <- getArgs
    sFile <- readFile inFileName
    let lrec = lines sFile
    putStrLn $ "Number of lines read from the file: " ++ show (length lrec)
    let prec = map processLine lrec
    putStrLn $ "Number of processed lines is " ++ show (length prec)
    -- let persons = mapMaybe processLine lrec
    let persons = catMaybes prec
    putStrLn $ "Number of filtered person records: " ++ show (length persons)
    let records = sortBy (compare `on` personEmployeeID) persons
    putStrLn $ "Number of records read and sorted is " ++ show (length records)
    {-
    Compare and warn about employees with duplicate records.
    -}
    let srec = groupBy ((==) `on` personEmployeeID) records
    putStrLn $ "Number of unique record groups is " ++ show (length srec)
    let dups = map (personEmployeeID . head) $ filter ((> 1) . length) srec
    putStrLn $ "Number of dups: " ++ show (length dups)
    unless (null dups) $ putStrLn $ "WARNING: Duplicate employees: " ++ show dups
    -- Remove the duplicates
    let cleanedRecords = map head srec
    putStrLn $ "Number of records in cleanedRecords is " ++ show (length cleanedRecords)
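As you may notice from the commented-out line, I also tried mapMaybe in place of catMaybes, with no change in the result. Below is the code for the processLine function, with a comment showing the format of the input records: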
{-
Splits a line of the input file into fields. The format includes 11 columns,
separated by semicolons. The 10th column is required to be 'A' or 'S',
indicating the user is active or short-term; otherwise we ignore that line.
Sample Line:
------------------------------------------------------------------------------------------------------------------------------------------------
99XXXXX17;MXXX ;TXXXXX ;MIXXXXXX ;RAA CBP;RAA;19910929;19910929;19910929;A; ;
------------------------------------------------------------------------------------------------------------------------------------------------
emp id ;first name ;middle name ;last name ;loc code;dpt;hiredate;servdate;statdate;s;note ;
------------------------------------------------------------------------------------------------------------------------------------------------
* s = status
-}
-- Note: the type annotations inside the tuple pattern need the
-- ScopedTypeVariables extension.
processLine :: String -> Maybe Person
processLine line =
    let (_ :: String, _ :: String, _ :: String, result) =
            line =~ "^(.+);(.+);(.+);(.+);(.+);(.+);(.+);(.+);(.+);(A|S);(.+);$"
    in case result of
        [empid, fname, mname, lname, lcode, dept, hdate, srvdate, stdate, status, note]
            -> Just $ Person empid (trim fname) (trim mname) (trim lname)
                   (trim lcode) dept hdate srvdate stdate (readStatus status) (trim note)
        _ -> Nothing
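A side note on the commented-out line in main: mapMaybe f xs yields the same list as catMaybes (map f xs), so swapping one for the other is not expected to change the record count. The following is only a minimal sketch of that equivalence; parseEven is a hypothetical stand-in for processLine and is not part of the program.

```haskell
import Data.Maybe (catMaybes, mapMaybe)

-- Hypothetical stand-in for processLine: keeps even numbers, rejects the rest.
parseEven :: Int -> Maybe Int
parseEven n
  | even n    = Just n
  | otherwise = Nothing

main :: IO ()
main = do
  let xs = [1 .. 10]
  -- One-pass version: apply the parser and keep only the Just results.
  print (mapMaybe parseEven xs)
  -- Two-step version used in the program: map first, then drop the Nothings.
  print (catMaybes (map parseEven xs))
  -- The two results are the same list.
  print (mapMaybe parseEven xs == catMaybes (map parseEven xs))
```

So if catMaybes returns an empty list, every call to the parsing function returned Nothing, which points at the pattern match rather than at the filtering step.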
When I run the program on my Windows laptop, it produces the following output:
Number of lines read from the file: 47793
Number of processed lines is 47793
Number of filtered person records: 32993
Number of records read and sorted is 32993
Number of unique record groups is 32949
Number of dups: 44
WARNING: Duplicate employees: [ {List removed for privacy } ]
Number of records in cleanedRecords is 32949
C:>cabal --version
cabal-install version 1.22.4.0
using version 1.22.3.0 of the Cabal library
C:>ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.3
When I run the same code against the same input file on either of two different Ubuntu servers, each with a different version of Ubuntu and Haskell, I get the following output:
Number of lines read from the file: 47793
Number of processed lines is 47793
Number of filtered person records: 0
Number of records read and sorted is 0
Number of unique record groups is 0
Number of dups: 0
Number of records in cleanedRecords is 0
xx:~/$ cabal --version
cabal-install version 0.14.0
using version 1.14.0 of the Cabal library
xx:~/$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.4.1
...and from the other Ubuntu server:
Number of lines read from the file: 47793
Number of processed lines is 47793
Number of filtered person records: 0
Number of records read and sorted is 0
Number of unique record groups is 0
Number of dups: 0
Number of records in cleanedRecords is 0
yy:~/$ cabal --version
cabal-install version 0.10.2
using version 1.10.2.0 of the Cabal library
yy:~/$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.1
As usual, I'm stumped. I'm ready to try anything.
Any ideas?
Dave
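【Comments】:
-
It looks like the difference starts before mapMaybe/catMaybes is ever reached. The number of lines read from the file is different.
-
I have tried your code on my Ubuntu machine. With the sample line in a text file, the filter works fine. Are you sure the files on your Ubuntu machines follow the same format? For example, do they have a non-empty note field?
-
Yes, and now I really feel stupid. In my haste to write this post, I did use the wrong input file on my Windows machine. I have corrected that mistake and edited the OP to reflect the corrected results. I have been told that what I am reporting is impossible, and I agree. I have also been told that there must be some difference in the input files to explain the difference in output. In this last example, I copied both the code and the input file from my Windows laptop to that Ubuntu server, so I believe they are identical. Running wc -l on the input file confirms the line count. Thanks for the comments.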
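One thing that a matching wc -l count would not rule out is a difference in line endings, since wc -l counts only '\n' characters. This is an assumption about the file, not something confirmed in this thread: if the file was created on Windows with CRLF line endings, GHC's text-mode readFile translates "\r\n" to "\n" on Windows but not on Linux, so on Ubuntu each element returned by lines would still end in '\r', and a pattern that requires ';' as the very last character would not match. A minimal sketch of how to check for this, using hypothetical helpers stripCR and countCR and made-up sample data:

```haskell
import Data.List (isSuffixOf)

-- Hypothetical helper (not in the original program): drop one trailing
-- carriage return, if present.
stripCR :: String -> String
stripCR s
  | "\r" `isSuffixOf` s = init s
  | otherwise           = s

-- Hypothetical helper: count the lines that end in '\r'.
countCR :: [String] -> Int
countCR = length . filter ("\r" `isSuffixOf`)

main :: IO ()
main = do
  -- Made-up contents with CRLF endings, standing in for what a file
  -- written on Windows might contain when read without newline translation.
  let contents = "1;A; ;\r\n2;S; ;\r\n"
      ls       = lines contents
  -- Number of lines still carrying a trailing '\r' after `lines`.
  print (countCR ls)
  -- The same count after stripping.
  print (countCR (map stripCR ls))
  -- Does each line end in ';' before and after stripping?
  print (map (";" `isSuffixOf`) ls)
  print (map ((";" `isSuffixOf`) . stripCR) ls)
```

If countCR reports a non-zero count for the real input on the Ubuntu servers, applying something like stripCR to each line before processLine would be one thing to try.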
Tags: list parsing haskell ubuntu