【Question title】: Replacement / substitution with Haskell regex libraries
【Posted】: 2012-01-30 22:20:17
【Question】:

Is there a high-level API for doing search-and-replace with regexes in Haskell? In particular, I'm looking at the Text.Regex.TDFA or Text.Regex.Posix packages. I'd really like something of type:

f :: Regex -> (ResultInfo -> m String) -> String -> m String

So, for example, to replace "dog" with "cat" you could write

runIdentity . f "dog" (return . const "cat")    -- :: String -> String

or do more advanced things with the monad, like counting occurrences.

The Haskell documentation on this is quite thin. Some low-level API notes are here.

【Discussion】:

    Tags: regex haskell


    【Solution 1】:

    How about Text.Regex.subRegex from the regex-compat package?

    Prelude> import Text.Regex (mkRegex, subRegex)
    
    Prelude> :t mkRegex
    mkRegex :: String -> Regex
    
    Prelude> :t subRegex
    subRegex :: Regex -> String -> String -> String
    
    Prelude> subRegex (mkRegex "foo") "foobar" "123"
    "123bar"
    

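A small sketch to complement the session above. As far as the regex-compat documentation describes it, subRegex replaces every occurrence (not only the first), and `\0`, `\1`, ... in the replacement string refer to the whole match and to the capture groups. The patterns and inputs here are made up for illustration.

```haskell
import Text.Regex (mkRegex, subRegex)

main :: IO ()
main = do
  -- Every occurrence is replaced, not just the first one.
  putStrLn (subRegex (mkRegex "dog") "dog eat dog" "cat")  -- cat eat cat
  -- "\\1" in the replacement refers to the first capture group.
  putStrLn (subRegex (mkRegex "(a+)b") "aab ab" "[\\1]")   -- [aa] [a]
```

Note that the replacement is a fixed template string, so this does not cover the monadic callback asked for in the question.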
    【Discussion】:

      【Solution 2】:

      I don't know of an existing function that provides this, but I think I'd end up using something like the AllMatches [] (MatchOffset, MatchLength) instance of RegexContext to simulate it:

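      {-# LANGUAGE FlexibleContexts #-}
      import Control.Monad (foldM)
      import Data.List (foldl')
      import Text.Regex.Base (AllMatches (..), MatchLength, MatchOffset, RegexLike, match)

      replaceAll :: RegexLike r String => r -> (String -> String) -> String -> String
      replaceAll re f s = start end
        where (_, end, start) = foldl' go (0, s, id) (allSpans re s)
              go (ind,read,write) (off,len) =
                let (skip, start) = splitAt (off - ind) read
                    -- Split `start` (the text from the match onward), not `matched` itself.
                    (matched, remaining) = splitAt len start
                in (off + len, remaining, write . (skip++) . (f matched ++))

      replaceAllM :: (Monad m, RegexLike r String) => r -> (String -> m String) -> String -> m String
      replaceAllM re f s = do
        let go (ind,read,write) (off,len) = do
              let (skip, start) = splitAt (off - ind) read
                  (matched, remaining) = splitAt len start
              replacement <- f matched
              return (off + len, remaining, write . (skip++) . (replacement++))
        (_, end, start) <- foldM go (0, s, return) (allSpans re s)
        start end

      -- All (offset, length) pairs; the annotation selects the list instance of AllMatches.
      allSpans :: RegexLike r String => r -> String -> [(MatchOffset, MatchLength)]
      allSpans re s = getAllMatches (match re s :: AllMatches [] (MatchOffset, MatchLength))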
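
To make the "counting occurrences" use case from the question concrete, here is a minimal, self-contained sketch of a monadic replace-all in the spirit of this answer. It is not the answer's exact code: it assumes the regex-tdfa backend, fixes the regex type to TDFA's Regex, and uses an IORef as the counter. All of these are illustrative choices.

```haskell
import Control.Monad (foldM)
import Data.Char (toUpper)
import Data.Functor.Identity (runIdentity)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Text.Regex.TDFA (AllMatches (..), MatchLength, MatchOffset, Regex, makeRegex, match)

-- Monadic replace-all: runs the action once per match, left to right.
replaceAllM :: Monad m => Regex -> (String -> m String) -> String -> m String
replaceAllM re f s = do
  (_, rest, build) <- foldM go (0, s, id) spans
  pure (build rest)
  where
    -- (offset, length) of every match; the annotation picks the list instance.
    spans = getAllMatches (match re s :: AllMatches [] (MatchOffset, MatchLength))
    go (pos, input, build) (off, len) = do
      let (skipped, fromMatch) = splitAt (off - pos) input
          (matched, remaining) = splitAt len fromMatch
      replacement <- f matched
      pure (off + len, remaining, build . (skipped ++) . (replacement ++))

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  let re = makeRegex "dog" :: Regex
  -- Count matches as a side effect while replacing them.
  out <- replaceAllM re (\_ -> modifyIORef' counter (+ 1) >> pure "cat") "dog eat dog"
  n <- readIORef counter
  putStrLn out  -- cat eat cat
  print n       -- 2
  -- With Identity the same function doubles as a pure replace-all.
  putStrLn (runIdentity (replaceAllM re (pure . map toUpper) "dog eat dog"))  -- DOG eat DOG
```

Any other monad would work in place of IO here, for example a State monad from transformers or mtl, if the count should stay pure.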
      

      【Discussion】:

        【Solution 3】:

        Based on @rampion's answer, but with the typo fixed so that it doesn't just produce <<loop>>:

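        import Data.List (foldl')
        -- Assumption: the TDFA backend; the answer does not say which Regex type it uses.
        import Text.Regex.TDFA (AllMatches (..), MatchLength, MatchOffset, Regex, match)

        replaceAll :: Regex -> (String -> String) -> String -> String
        replaceAll re f s = start end
          where (_, end, start) = foldl' go (0, s, id) matches
                -- The annotation selects the list instance of AllMatches.
                matches = getAllMatches (match re s :: AllMatches [] (MatchOffset, MatchLength))
                go (ind,read,write) (off,len) =
                    let (skip, start) = splitAt (off - ind) read
                        (matched, remaining) = splitAt len start
                    in (off + len, remaining, write . (skip++) . (f matched ++))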
        

        【Discussion】:

          【Solution 4】:

          You can use replaceAll from the Data.Text.ICU.Replace module:

          Prelude> :set -XOverloadedStrings
          Prelude> import Data.Text.ICU.Replace
          Prelude Data.Text.ICU.Replace> replaceAll "cat" "dog" "Bailey is a cat, and Max is a cat too."
          "Bailey is a dog, and Max is a dog too."
          

          【Discussion】:

            【Solution 5】:

            Maybe this approach works for you.

            import Data.Array ((!))
            import Text.Regex.TDFA ((=~), MatchArray)
            
            replaceAll :: String -> String -> String -> String        
            replaceAll regex new_str str  = 
                -- Take only index 0 (the whole match) so capture groups are not replaced separately.
                let parts = map (! 0) (str  =~  regex :: [MatchArray])
                in foldl (replace' new_str) str (reverse parts) 
            
              where
                 replace' :: [a] -> [a] -> (Int, Int) -> [a]
                 replace' new list (shift, l)   = 
                    let (pre, post) = splitAt shift list
                    in pre ++ new ++ (drop l post)
            

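            【Discussion】:

              【Solution 6】:

              For "search-and-replace" that does "more advanced things with the monad, like counting occurrences", I recommend Replace.Megaparsec.streamEditT.

              See the package README for a concrete example of how to count occurrences.

              【Discussion】: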
