[Question title]: How to write a functional file "scanner"
[Posted]: 2012-02-07 00:56:36
[Question]:

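First, let me apologize for the size of this question, but I am really trying to think functionally, and this is one of the more challenging problems I have had to work on.

I would like some suggestions on how to handle my problem in a functional manner, particularly in F#. I am writing a program that walks a list of directories, uses a list of regex patterns to filter the files retrieved from those directories, and uses a second list of regex patterns to find matches in the text of the retrieved files. For each piece of text that matches a given regex pattern, it should return the file name, line index, column index, pattern, and matched value. Exceptions also need to be recorded, and there are 3 possible exception scenarios: the directory cannot be opened, the file cannot be opened, or reading content from the file fails. The final requirement is that the volume of files "scanned" for matches could be very large, so the whole thing needs to be lazy. I am less concerned with a "pure" functional solution than with a "good" solution that reads well and performs well. One last challenge is making it interoperate with C#, because I would like to use the WinForms tools to attach this algorithm to a UI. Here is my first attempt, which will hopefully clarify the problem: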

open System.Text.RegularExpressions
open System.IO

type Reader<'t, 'a> = 't -> 'a //=M['a], result varies

let returnM x _ = x 

let map f m = fun t -> t |> m |> f

let apply f m = fun t -> t |> m |> (t |> f)

let bind f m = fun t -> t |> (t |> m |> f)

let Scanner dirs =
    returnM dirs
    |> apply (fun dirExHandler ->
        Seq.collect (fun directory ->
            try
                Directory.GetFiles(directory, "*", SearchOption.AllDirectories)
            with | e ->
                dirExHandler e directory
                Array.empty))
    |> map (fun filenames ->
        returnM filenames
        |> apply (fun (filenamepatterns, lineExHandler, fileExHandler) ->
            Seq.filter (fun filename ->
                 filenamepatterns |> Seq.exists (fun pattern ->
                    let regex = new Regex(pattern)
                    regex.IsMatch(filename)))
            >> Seq.map (fun filename ->
                    let fileinfo = new FileInfo(filename)
                    try
                        use reader = fileinfo.OpenText()
                        Seq.unfold (fun ((reader : StreamReader), index) ->
                            if not reader.EndOfStream then
                                try
                                    let line = reader.ReadLine()
                                    Some((line, index), (reader, index + 1))
                                with | e -> 
                                    lineExHandler e filename index
                                    None
                            else
                                None) (reader, 0)        
                        |> (fun lines -> (filename, lines))
                    with | e -> 
                        fileExHandler e filename
                        (filename, Seq.empty))
            >> (fun files -> 
                returnM files
                |> apply (fun contentpatterns ->
                    Seq.collect (fun file ->
                        let filename, lines = file
                        lines |>
                            Seq.collect (fun line ->
                                let content, index = line
                                contentpatterns
                                |> Seq.collect (fun pattern ->    
                                    let regex = new Regex(pattern)
                                    regex.Matches(content)
                                    |> (Seq.cast<Match>
                                    >> Seq.map (fun contentmatch -> 
                                        (filename, 
                                            index, 
                                            contentmatch.Index, 
                                            pattern, 
                                            contentmatch.Value))))))))))

Thanks for any input.

Updated -- here is an updated solution based on the feedback I received:

open System.Text.RegularExpressions
open System.IO

type ScannerConfiguration = {
    FileNamePatterns : seq<string>
    ContentPatterns : seq<string>
    FileExceptionHandler : exn -> string -> unit
    LineExceptionHandler : exn -> string -> int -> unit
    DirectoryExceptionHandler : exn -> string -> unit }

let scanner specifiedDirectories (configuration : ScannerConfiguration) = seq {
    let ToCachedRegexList = Seq.map (fun pattern -> new Regex(pattern)) >> Seq.cache

    let contentRegexes = configuration.ContentPatterns |> ToCachedRegexList

    let filenameRegexes = configuration.FileNamePatterns |> ToCachedRegexList

    let getLines exHandler reader = 
        Seq.unfold (fun ((reader : StreamReader), index) ->
            if not reader.EndOfStream then
                try
                    let line = reader.ReadLine()
                    Some((line, index), (reader, index + 1))
                with | e -> exHandler e index; None
            else
                None) (reader, 0)   

    for specifiedDirectory in specifiedDirectories do
        let files =
            try Directory.GetFiles(specifiedDirectory, "*", SearchOption.AllDirectories)
            with e -> configuration.DirectoryExceptionHandler e specifiedDirectory; [||]
        for file in files do
            if filenameRegexes |> Seq.exists (fun (regex : Regex) -> regex.IsMatch(file)) then
                let lines = 
                    let fileinfo = new FileInfo(file)
                    try
                        use reader = fileinfo.OpenText()
                        reader |> getLines (fun e index -> configuration.LineExceptionHandler e file index)
                    with | e -> configuration.FileExceptionHandler e file; Seq.empty
                for line in lines do
                    let content, index = line
                    for contentregex in contentRegexes do
                        for mmatch in content |> contentregex.Matches do
                            yield (file, index, mmatch.Index, contentregex.ToString(), mmatch.Value) }

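Again, any input is welcome.

[Comments]:

  • Have you looked at functional parsers such as Parsec?
  • That is a lot of text. Try splitting it up so it is easier to read.
  • I would simply use an interface and an object expression to create an instance and expose that to the C# code.

Tags: f# functional-programming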


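[Answer 1]:

I think the best approach is to start with the simplest solution and then extend it. Your current approach seems quite hard to read to me, for two reasons:

  • The code uses a lot of combinators and function composition in patterns that are not very common in F#. Some of the processing can be written more easily using sequence expressions.

  • The code is written as a single function, but it is fairly complex and would be more readable if it were split into multiple functions.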
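
To illustrate the first point, here is a rough sketch of how the line-reading step could look as a sequence expression. The name `numberedLines` is made up for this example and does not come from the question. Because the `use` binding sits inside the `seq` block, the reader stays open while the sequence is being enumerated and is disposed once enumeration finishes. That differs from a `use` binding wrapped around a lazy `Seq.unfold`, where the reader is disposed as soon as the enclosing scope exits, before any line has been read.

```fsharp
open System.IO

// Hypothetical helper (not from the question): lazily yields (line, index)
// pairs for a single file.
let numberedLines (path : string) =
  seq {
    // The reader belongs to the sequence: it is opened when enumeration
    // starts and disposed when enumeration ends.
    use reader = new StreamReader(path)
    while not reader.EndOfStream do
      yield reader.ReadLine() }
  |> Seq.mapi (fun index line -> line, index)
```

Nothing is read until the result is iterated, so the sequence can be returned to a caller and consumed later.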

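I would probably start by splitting the code into a function that tests a single file (say, fileMatches) and a function that walks over the files and calls fileMatches. The main iteration can be written quite nicely using an F# sequence expression: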

open System.IO

// Checks whether a file name matches a filename pattern 
// and a content matches a content pattern.
let fileMatches fileNamePatterns contentPatterns 
                (fileExHandler, lineExHandler) file =
  // TODO: This can be implemented using
  // File.ReadLines which returns a sequence.
  failwith "TODO: not implemented"


// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories fileNamePatterns contentPatterns
            (dirExHandler, fileExHandler, lineExHandler) = seq {
  // Iterate over all the specified directories.
  for specifiedDir in specifiedDirectories do
    // Find all files in the directories (and handle exceptions).
    let files =
      try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
      with e -> dirExHandler e specifiedDir; [||]
    // Iterate over all files and report those that match.
    for file in files do
      if fileMatches fileNamePatterns contentPatterns 
                     (fileExHandler, lineExHandler) file then 
        // Matches! Return this file as part of the result.
        yield file }
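
One way the TODO in fileMatches might be filled in is sketched below. This is only my reading of the intent, not the answerer's code: it assumes both pattern arguments are sequences of regex strings, that a file "matches" when its name matches some file name pattern and at least one line matches some content pattern, and that each handler takes the exception and the file name. The helper `matchesAny` is a made-up name.

```fsharp
open System.IO
open System.Text.RegularExpressions

// Hypothetical completion of 'fileMatches' (an assumption, not the original).
let fileMatches (fileNamePatterns : seq<string>) (contentPatterns : seq<string>)
                (fileExHandler, lineExHandler) (file : string) =
  // True when the input matches at least one of the patterns.
  let matchesAny (patterns : seq<string>) (input : string) =
    patterns |> Seq.exists (fun pattern -> Regex.IsMatch(input, pattern))
  if not (matchesAny fileNamePatterns file) then false
  else
    // Failures raised while obtaining the line sequence go to the file handler.
    let lines =
      try Some(File.ReadLines(file))
      with e -> fileExHandler e file; None
    match lines with
    | None -> false
    | Some lines ->
        // Lines are read lazily during this call, so failures raised while
        // reading are reported through the line handler.
        try lines |> Seq.exists (matchesAny contentPatterns)
        with e -> lineExHandler e file; false
```

`Seq.exists` stops at the first matching line, so a large file is not read to the end once a match has been found.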

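This function is still fairly complicated, because you need to pass a lot of parameters around. Wrapping the parameters in a simple type or a record might be a good idea: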

type ScannerArguments = 
  { FileNamePatterns:seq<string>
    ContentPatterns:seq<string>
    FileExceptionHandler:exn -> string -> unit
    LineExceptionHandler:exn -> string -> unit
    DirectoryExceptionHandler:exn -> string -> unit }

Then you can define both fileMatches and scanner as functions that take just two parameters, which will make your code more readable. For example:

// Iterates over all the files and calls 'fileMatches'.
let scanner specifiedDirectories (args:ScannerArguments) = seq {
  for specifiedDir in specifiedDirectories do
    let files =
      try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
      with e -> args.DirectoryExceptionHandler e specifiedDir; [||]
    for file in files do
      // No need to propagate all arguments explicitly to other functions.
      if fileMatches args file then yield file }
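
For completeness, here is a self-contained sketch of how the record-based version might fit together and be called. The body of `fileMatches`, the value `exampleArgs`, and the pattern fields typed as `seq<string>` are my assumptions for illustration. To keep it short, every I/O failure inside `fileMatches` is routed to `FileExceptionHandler`, and `LineExceptionHandler` is left unused.

```fsharp
open System.IO
open System.Text.RegularExpressions

type ScannerArguments =
  { FileNamePatterns : seq<string>
    ContentPatterns : seq<string>
    FileExceptionHandler : exn -> string -> unit
    LineExceptionHandler : exn -> string -> unit
    DirectoryExceptionHandler : exn -> string -> unit }

// Hypothetical two-parameter 'fileMatches' (an assumption, not the original).
let fileMatches (args : ScannerArguments) (file : string) =
  let matchesAny (patterns : seq<string>) (input : string) =
    patterns |> Seq.exists (fun pattern -> Regex.IsMatch(input, pattern))
  if not (matchesAny args.FileNamePatterns file) then false
  else
    // Simplification: open and read failures share one handler here.
    try File.ReadLines(file) |> Seq.exists (matchesAny args.ContentPatterns)
    with e -> args.FileExceptionHandler e file; false

// Iterates over all the files and calls 'fileMatches'.
let scanner (specifiedDirectories : seq<string>) (args : ScannerArguments) = seq {
  for specifiedDir in specifiedDirectories do
    let files =
      try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
      with e -> args.DirectoryExceptionHandler e specifiedDir; [||]
    for file in files do
      if fileMatches args file then yield file }

// Example configuration: F# source files that contain "TODO".
let exampleArgs =
  { FileNamePatterns = seq [ @"\.fs$" ]
    ContentPatterns = seq [ "TODO" ]
    FileExceptionHandler = fun e path -> eprintfn "Cannot read %s: %s" path e.Message
    LineExceptionHandler = fun e path -> eprintfn "Bad line in %s: %s" path e.Message
    DirectoryExceptionHandler = fun e path -> eprintfn "Cannot list %s: %s" path e.Message }
```

Calling `scanner [ "." ] exampleArgs |> Seq.iter (printfn "%s")` would then print each matching file under the current directory. Note that `Directory.GetFiles` builds the full file list of a directory up front, so only the per-file content check is lazy in this sketch.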
