【Question title】: Parsec fails without error if reading from file
【Posted】: 2021-07-07 08:42:08
【Question】:

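I wrote a small Parsec parser to read samples from either a user-provided input string or an input file. If the input is given as a semicolon-separated string, the parser correctly fails on bad input and shows a helpful error message: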

> readUncalC14String "test1,7444,37;6800,36;testA,testB,2000,222;test3,7750,40"
*** Exception: Error in parsing dates from string: (line 1, column 29):
unexpected "t"
expecting digit

But for an input file inputFile.txt with the same entries, it fails silently:

test1,7444,37
6800,36
testA,testB,2000,222
test3,7750,40
> readUncalC14FromFile "inputFile.txt"
[UncalC14 "test1" 7444 37,UncalC14 "unknownSampleName" 6800 36]

Why does this happen? How can I make readUncalC14FromFile fail in a useful way as well?

Here is a minimal subset of my code:

import qualified Text.Parsec                    as P
import qualified Text.Parsec.String             as P

data UncalC14 = UncalC14 String Int Int deriving Show

readUncalC14FromFile :: FilePath -> IO [UncalC14]
readUncalC14FromFile uncalFile = do
    s <- readFile uncalFile
    case P.runParser uncalC14SepByNewline () "" s of
        Left err -> error $ "Error in parsing dates from file: " ++ show err
        Right x -> return x
    where
        uncalC14SepByNewline :: P.Parser [UncalC14]
        uncalC14SepByNewline = P.endBy parseOneUncalC14 (P.newline <* P.spaces)

readUncalC14String :: String -> Either String [UncalC14]
readUncalC14String s = 
    case P.runParser uncalC14SepBySemicolon () "" s of
        Left err -> error $ "Error in parsing dates from string: " ++ show err
        Right x -> Right x
    where 
        uncalC14SepBySemicolon :: P.Parser [UncalC14]
        uncalC14SepBySemicolon = P.sepBy parseOneUncalC14 (P.char ';' <* P.spaces)

parseOneUncalC14 :: P.Parser UncalC14
parseOneUncalC14 = do
    P.try long P.<|> short
    where
        long = do
            name <- P.many (P.noneOf ",")
            _ <- P.oneOf ","
            mean <- read <$> P.many1 P.digit
            _ <- P.oneOf ","
            std <- read <$> P.many1 P.digit
            return (UncalC14 name mean std)
        short = do
            mean <- read <$> P.many1 P.digit
            _ <- P.oneOf ","
            std <- read <$> P.many1 P.digit
            return (UncalC14 "unknownSampleName" mean std)

【Discussion】:

    Tags: parsing haskell parsec


    【Solution 1】:

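    What happens here is that a prefix of your input is already valid input for the parser, so the parser succeeds on that prefix and ignores the rest. To force Parsec to consume the whole input, you can use the eof parser: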

    uncalC14SepByNewline = P.endBy parseOneUncalC14 (P.newline <* P.spaces) <* P.eof
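A minimal, self-contained sketch of this fix applied to both readers. It assumes the parsers run on an in-memory String (so they can be tried without a file) and that they return Either String instead of calling error; the names parseAll, parseUncalC14Lines and parseUncalC14Semicolon are hypothetical. parseOneUncalC14 is copied from the question, and Eq is derived only so results can be compared.

```haskell
import qualified Text.Parsec        as P
import qualified Text.Parsec.String as P

-- Eq is derived in addition to Show only so that results can be compared.
data UncalC14 = UncalC14 String Int Int deriving (Show, Eq)

-- Hypothetical helper: run a parser and require that it consumes the whole
-- input. Without `<* P.eof`, a valid prefix followed by junk is accepted.
parseAll :: P.Parser a -> String -> Either String a
parseAll p s = case P.runParser (p <* P.eof) () "" s of
    Left err -> Left (show err)
    Right x  -> Right x

-- Newline-terminated samples, as in the input file.
parseUncalC14Lines :: String -> Either String [UncalC14]
parseUncalC14Lines = parseAll (P.endBy parseOneUncalC14 (P.newline <* P.spaces))

-- Semicolon-separated samples, as in the input string.
parseUncalC14Semicolon :: String -> Either String [UncalC14]
parseUncalC14Semicolon = parseAll (P.sepBy parseOneUncalC14 (P.char ';' <* P.spaces))

-- File wrapper that keeps the question's error-raising behaviour.
readUncalC14FromFile :: FilePath -> IO [UncalC14]
readUncalC14FromFile path = do
    s <- readFile path
    either (\err -> error ("Error in parsing dates from file: " ++ err))
           return
           (parseUncalC14Lines s)

-- Copied from the question.
parseOneUncalC14 :: P.Parser UncalC14
parseOneUncalC14 = P.try long P.<|> short
  where
    long = do
        name <- P.many (P.noneOf ",")
        _ <- P.oneOf ","
        mean <- read <$> P.many1 P.digit
        _ <- P.oneOf ","
        std <- read <$> P.many1 P.digit
        return (UncalC14 name mean std)
    short = do
        mean <- read <$> P.many1 P.digit
        _ <- P.oneOf ","
        std <- read <$> P.many1 P.digit
        return (UncalC14 "unknownSampleName" mean std)
```

With this version, the file contents from the question (test1,7444,37 / 6800,36 / testA,testB,2000,222 / test3,7750,40, one per line) produce a Left pointing into the third line instead of a two-element list. Note that endBy expects a newline after the last sample, so the file should end with a trailing newline.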
    

    The reason one version reports an error and the other does not is the difference between sepBy and endBy. Here is a simpler example:

    sepTest, endTest :: String -> Either P.ParseError String
    sepTest s = P.runParser (P.sepBy (P.char 'a') (P.char 'b')) () "" s
    endTest s = P.runParser (P.endBy (P.char 'a') (P.char 'b')) () "" s
    

    Here are some interesting cases:

    ghci> sepTest "abababb"
    Left (line 1, column 7):
    unexpected "b"
    expecting "a"
    
    ghci> endTest "abababb"
    Right "aaa"
    
    ghci> sepTest "ababaa"
    Right "aaa"
    
    ghci> endTest "ababaa"
    Left (line 1, column 6):
    unexpected "a"
    expecting "b"
    

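    As you can see, both sepBy and endBy can stop silently, but in different situations: sepBy stops silently when the valid prefix does not end with the separator b (it ends with an element a), while endBy stops silently when the valid prefix does not end with the main parser a (it ends with a separator b). In your file, the first two lines each end with a newline (the separator), and parseOneUncalC14 then fails on the third line without consuming any input (try rewinds long, and short fails on the first character t), so endBy simply returns the two samples parsed so far. In the semicolon-separated string, sepBy has already consumed the ; before that same failure, so it has to report an error.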
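To see where each combinator stops, it may help to write both in terms of many. The definitions below are a sketch that mirrors how Parsec builds endBy and sepBy; the names myEndBy, mySepBy, sepTest' and endTest' are hypothetical. The key point is that many ends quietly when its argument fails without consuming input, but reports an error when the argument fails after consuming some.

```haskell
import qualified Text.Parsec        as P
import qualified Text.Parsec.String as P

-- endBy: repeat "element, then separator". If the next element fails
-- without consuming input, many stops and returns what it has so far.
myEndBy :: P.Parser a -> P.Parser b -> P.Parser [a]
myEndBy p sep = P.many (p <* sep)

-- sepBy: one element, then repeat "separator, then element". Once a
-- separator has been consumed, a failing element is a real parse error.
mySepBy :: P.Parser a -> P.Parser b -> P.Parser [a]
mySepBy p sep = ((:) <$> p <*> P.many (sep *> p)) P.<|> pure []

sepTest', endTest' :: String -> Either P.ParseError String
sepTest' = P.runParser (mySepBy (P.char 'a') (P.char 'b')) () ""
endTest' = P.runParser (myEndBy (P.char 'a') (P.char 'b')) () ""
```

On the four inputs above, these give the same Right values and the same error columns as the library versions: endTest' "abababb" stops after the third "ab", and sepTest' "ababaa" stops after the third "a".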

    So if you want to make sure you read the entire file or string, you should add eof after both parsers.
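Applied to the simpler example, adding `<* P.eof` turns both silent cases into errors while still accepting fully valid input. A short sketch; the names sepTestEof and endTestEof are hypothetical.

```haskell
import qualified Text.Parsec as P

-- Same parsers as sepTest/endTest above, but each must be followed by end of input.
sepTestEof, endTestEof :: String -> Either P.ParseError String
sepTestEof = P.runParser (P.sepBy (P.char 'a') (P.char 'b') <* P.eof) () ""
endTestEof = P.runParser (P.endBy (P.char 'a') (P.char 'b') <* P.eof) () ""
```

Here endTestEof "abababb" now fails at column 7 and sepTestEof "ababaa" fails at column 6, while sepTestEof "aba" and endTestEof "abab" both still return Right "aa".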

    【Comments】:

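    • Awesome! That really did solve the problem. After spending an unreasonable amount of time tonight trying to figure this out, I learned something new. Thanks a lot!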