【Question title】: Filtering os.walk() dirs and files
【Posted】: 2011-07-05 17:08:57
【Question】:

I'm looking for a way to include/exclude file patterns and to exclude directories in an os.walk() call.

Here is what I'm doing right now:

import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path

        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)

Is there a better way to do this? If so, how?

【Comments】:

    Tags: python filtering os.walk


    【Solution 1】:

    Why fnmatch?

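    import os

    excludes = [...]  # directory names to skip (elided in the original answer)
    for ROOT, DIR, FILES in os.walk("/path"):
        for file in FILES:
            # include the dot so that a file named "mydoc" is not matched
            if file.endswith(('.doc', '.odt')):
                print(file)
        for directory in DIR:
            if directory not in excludes:
                print(directory)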
    

    Not fully tested.

    【Discussion】:

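    • The endings should be .doc and .odt, because with the code as posted a file named mydoc (no file extension) would be returned. Also, I think this only satisfies the specific case the OP posted. Excludes could contain files too, and includes could contain directories, I guess.
    • If you have to use glob patterns, you need fnmatch (although that is not the case for the example given in the question).
    • @Oben Sonne, glob (IMO) has more "features" than fnmatch, for example path name expansion. E.g. you can do glob.glob("/path/*/*/*.txt")
    • Good point. For simple include/exclude patterns, glob.glob() is probably the better solution.
    • For good practice and easier debugging, I try not to use variable names that match built-in types, such as the "file" you used, since it is a built-in type.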
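
The glob suggestion from the comments above could look roughly like this. It is a minimal sketch, assuming Python 3.5+ (needed for `recursive=True`); the helper name `find_by_globs` is hypothetical and not part of any answer. Note that glob skips names starting with a dot by default.

```python
import glob
import os

def find_by_globs(top, patterns):
    """Return the sorted paths under `top` whose file names match any pattern."""
    found = set()
    for pattern in patterns:
        # '**' combined with recursive=True matches zero or more directory levels
        found.update(glob.glob(os.path.join(top, '**', pattern), recursive=True))
    return sorted(found)
```

For example, `find_by_globs('/home/paulo-freitas', ['*.doc', '*.odt'])` would collect both document types in one call. Unlike os.walk(), this approach offers no way to prune excluded directories before they are scanned.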
    【Solution 2】:

    Here is one way to do it:

    import fnmatch
    import os

    excludes = ['/home/paulo-freitas/Documents']
    matches = []
    for path, dirs, files in os.walk(os.getcwd()):
        # skip this directory if its path contains an excluded path
        for eachpath in excludes:
            if eachpath in path:
                break
        else:
            # reached only when no exclude matched (the loop did not break)
            for result in [os.path.abspath(os.path.join(path, filename))
                           for filename in files
                           if fnmatch.fnmatch(filename, '*.doc')
                           or fnmatch.fnmatch(filename, '*.odt')]:
                matches.append(result)
    print(matches)
    

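    【Discussion】:

    • There is a typo: filename.odt should be filename, '*.odt'
    • This becomes impractical as the number of include patterns grows. It also does not allow glob patterns for the directory names to exclude.
    • Oben, typo corrected. I agree about the include patterns part. It could be coded in a more generic way.
    • Shouldn't the continue under "if eachpath in path" be a break?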
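
The two concerns raised in this discussion (a growing list of include patterns, and testing a path against several excludes) can both be expressed with `any()`. This is only a sketch; the helper names `matches_any` and `contains_any` are hypothetical.

```python
import fnmatch

def matches_any(name, patterns):
    """True if `name` matches at least one glob pattern."""
    return any(fnmatch.fnmatch(name, pat) for pat in patterns)

def contains_any(path, fragments):
    """True if `path` contains at least one of the given substrings."""
    return any(fragment in path for fragment in fragments)
```

With these helpers the loop body reduces to `if not contains_any(path, excludes)` followed by a filter on `matches_any(filename, includes)`, so adding a pattern means extending a list rather than adding another `or` clause.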
    【Solution 3】:

    From docs.python.org:

    os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

    When topdown is True, the caller can modify the dirnames list in-place… this can be used to prune the search…

    import fnmatch
    import os

    for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
        # excludes can be done with fnmatch.filter and complementary set,
        # but it's more annoying to read.
        dirs[:] = [d for d in dirs if d not in excludes]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                print(os.path.join(root, f))
    

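    I should point out that the code above assumes excludes holds plain directory names (patterns), not full paths. To match the OP's case, you would need to adjust the list comprehension to filter on os.path.join(root, d) not in excludes.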
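
To make the distinction concrete, here is a sketch that supports both kinds of exclude at once. The function `iter_matches` and its parameters `exclude_names` and `exclude_paths` are hypothetical names introduced for illustration. Full paths are compared as plain strings, so they must be written in the same form as `top` (for example, both absolute).

```python
import fnmatch
import os

def iter_matches(top, includes, exclude_names=(), exclude_paths=()):
    """Yield files under `top` matching `includes`, pruning excluded dirs."""
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment edits the list os.walk() is iterating over,
        # so pruned directories are never visited.
        dirs[:] = [d for d in dirs
                   if d not in exclude_names
                   and os.path.join(root, d) not in exclude_paths]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                yield os.path.join(root, f)
```

With the values from the question this would be called as `iter_matches('/home/paulo-freitas', ['*.doc', '*.odt'], exclude_paths=['/home/paulo-freitas/Documents'])`.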

    【Discussion】:

    • What do excludes and includes look like here? Does this answer have an example?
    【Solution 4】:

    This solution uses fnmatch.translate to convert glob patterns into regular expressions (it assumes the includes are used for files only):

    import fnmatch
    import os
    import os.path
    import re
    
    includes = ['*.doc', '*.odt'] # for files only
    excludes = ['/home/paulo-freitas/Documents'] # for dirs and files
    
    # transform glob patterns to regular expressions
    includes = r'|'.join([fnmatch.translate(x) for x in includes])
    excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'
    
    for root, dirs, files in os.walk('/home/paulo-freitas'):
    
        # exclude dirs
        dirs[:] = [os.path.join(root, d) for d in dirs]
        dirs[:] = [d for d in dirs if not re.match(excludes, d)]
    
        # exclude/include files
        files = [os.path.join(root, f) for f in files]
        files = [f for f in files if not re.match(excludes, f)]
        files = [f for f in files if re.match(includes, f)]
    
        for fname in files:
            print(fname)
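
The pattern-joining step of this answer can be isolated into a small helper, sketched here. The name `compile_globs` is hypothetical. The fallback `r'$.'` is the never-matching regex the answer uses: it requires one more character after the end of the string.

```python
import fnmatch
import re

def compile_globs(patterns):
    """Combine glob patterns into one compiled regex; match nothing if empty."""
    if not patterns:
        # '$' followed by '.' can never succeed, so nothing is ever matched
        return re.compile(r'$.')
    return re.compile('|'.join(fnmatch.translate(p) for p in patterns))
```

Because `*` in fnmatch also matches path separators, a pattern such as `*.doc` matches a full path like `/a/b/c.doc`, which is why the answer can apply it to joined paths.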
    

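    【Discussion】:

    • Ermm, we need an if excludes check around re.match(excludes, ...), don't we? If excludes = [], it will match all entries. But I like your approach, it is much clearer. :)
    • @pf.me: You're right, I hadn't considered that case. So either you 1) wrap the exclude list comprehension in if excludes, 2) prefix not re.match(excludes, ...) with not excludes or, or 3) set excludes to a regex that never matches if the original excludes list is empty. I updated my answer using variant 3.
    • After some googling, it seems the point of the [:] syntax in dirs[:] = [os.path.join(root, d) for d in dirs] is slice assignment, which changes the existing list instead of creating a new one. This blew my mind: without the [:], it doesn't work.
    • I still don't get the mechanism: how does dirs[:] change the original list? All the manuals say that slice[:] returns a new copy of the list, with members as pointers to the original list's values. Here is a discussion on Stack about this. So how does dirs[:] end up changing the original list?
    • @Daniel: Slices can be used not only to get values from a list but also to assign to the selected items. Since [:] denotes the complete list, assigning to that slice replaces the entire previous content of the list. See docs.python.org/2/library/stdtypes.html#mutable-sequence-types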
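
The difference discussed in these comments can be shown without os.walk() at all. In this sketch the function names `rebind` and `mutate` and the sample list are made up for illustration; the caller's list plays the role of the dirs list that os.walk() keeps a reference to.

```python
def rebind(dirs):
    # Plain assignment only points the local name at a new list;
    # the caller's list object is untouched.
    dirs = [d for d in dirs if d != '.git']

def mutate(dirs):
    # Slice assignment replaces the contents of the existing list object,
    # so every holder of that list sees the change.
    dirs[:] = [d for d in dirs if d != '.git']

walk_dirs = ['src', '.git', 'docs']
rebind(walk_dirs)
after_rebind = list(walk_dirs)   # still ['src', '.git', 'docs']
mutate(walk_dirs)
after_mutate = list(walk_dirs)   # now ['src', 'docs']
```

Reading `dirs[:]` on the right-hand side of an expression does create a copy; it is only as an assignment target that the slice writes into the original list.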
    【Solution 5】:
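    import os

    includes = ['*.doc', '*.odt']
    excludes = ['/home/paulo-freitas/Documents']

    def file_search(path, ext):
        # print every file under path whose name ends with ext
        for root, dirs, files in os.walk(path):
            for name in files:
                if name.endswith(ext):
                    print(os.path.join(root, name))

    # as posted, the search runs over excludes[0]; '*.doc' becomes '.doc'
    for pattern in includes:
        file_search(excludes[0], pattern.lstrip('*'))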
    

    【Discussion】:

      【Solution 6】:

      dirtools is a perfect fit for your use case:

      from dirtools import Dir
      
      print(Dir('.', exclude_file='.gitignore').files())
      

      【Discussion】:

        【Solution 7】:

        Here is an example of excluding directories and files with os.walk():

        import os
        import shutil

        ignoreDirPatterns = [".git"]
        ignoreFilePatterns = [".php"]

        def copyTree(src, dest, onerror=None):
            src = os.path.abspath(src)
            src_prefix = len(src) + len(os.path.sep)
            for root, dirs, files in os.walk(src, onerror=onerror):
                for pattern in ignoreDirPatterns:
                    if pattern in root:
                        break
                else:
                    # Runs only if no directory pattern matched (no break above)
                    for file in files:
                        for pattern in ignoreFilePatterns:
                            if pattern in file:
                                break
                        else:
                            # Runs only if no file pattern matched (no break above)
                            dirpath = os.path.join(dest, root[src_prefix:])
                            try:
                                os.makedirs(dirpath, exist_ok=True)
                            except OSError as e:
                                if onerror is not None:
                                    onerror(e)
                            filepath = os.path.join(root, file)
                            shutil.copy(filepath, dirpath)
        

        Requires Python >= 3.2 because of exist_ok in makedirs.
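
For comparison, the standard library can do a similar filtered copy through `shutil.copytree` and `shutil.ignore_patterns`. This is a sketch of an alternative technique, not the answer's method, and the wrapper name `copy_tree_filtered` is hypothetical. It differs in behaviour: patterns are globs matched against each name rather than substrings of the path, empty directories are copied too, and `dest` must not already exist.

```python
import shutil

def copy_tree_filtered(src, dest, patterns=('.git', '*.php')):
    """Copy `src` to `dest`, skipping names that match any glob pattern."""
    # ignore_patterns() returns a callable that copytree() consults for each
    # directory; ignored directories are not descended into at all.
    return shutil.copytree(src, dest, ignore=shutil.ignore_patterns(*patterns))
```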

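        【Discussion】:

          【Solution 8】:

          None of the approaches above worked for me.

          So here is my take, an extension of my original answer to another question.

          What worked for me was:

          if (not (str(root) + '/').startswith(tuple(exclude_foldr))):

          It builds the path as a string and excludes everything that starts with one of the folders in my tuple.

          This gave me exactly the result I was looking for.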
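
The prefix test can be wrapped in a helper, sketched below. The name `is_under_excluded` is hypothetical. Appending a trailing separator to both sides, which this sketch does with `os.path.join(p, '')`, is an added assumption beyond the line above: it keeps a sibling such as `GitHub2` from matching an excluded `GitHub` folder.

```python
import os

def is_under_excluded(root, excluded_dirs):
    """True if `root` is one of `excluded_dirs` or lies inside one of them."""
    # os.path.join(p, '') appends a trailing separator only when it is missing
    prefixes = tuple(os.path.join(d, '') for d in excluded_dirs)
    # str.startswith() accepts a tuple and returns True if any prefix matches
    return os.path.join(root, '').startswith(prefixes)
```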

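          My goal was to keep my Mac organized.

          I can search any folder by path, locate & move specific file types, ignore subfolders, and preemptively prompt the user if they want to move the files.

          Note: the prompt appears only once per run, NOT once per file.

          By default the prompt answers NO when you just press Enter at [y/N], and the script then only lists the potential files to be moved.

          This is only a snippet from my GitHub; please visit it for the full script.

          Hint: read the script below, as I added comments on nearly every line about what I did.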

          #!/usr/bin/env python3
          # =============================================================================
          # Created On  : MAC OSX High Sierra 10.13.6 (17G65)
          # Created On  : Python 3.7.0
          # Created By  : Jeromie Kirchoff
          # =============================================================================
          """THE MODULE HAS BEEN BUILT FOR KEEPING YOUR FILES ORGANIZED."""
          # =============================================================================
          from os import walk
          from os import path
          from shutil import move
          import getpass
          import click
          
          mac_username = getpass.getuser()
          includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
          search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
          target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
          exclude_foldr = set([target_foldr,
                              path.dirname('/Users/' + mac_username +
                                           '/Documents/GitHub/'),
                               path.dirname('/Users/' + mac_username +
                                            '/Documents/Random/'),
                               path.dirname('/Users/' + mac_username +
                                            '/Documents/Stupid_Folder/'),
                               ])
          
          # click.confirm returns True or False; pressing Enter selects the default (No)
          question_moving = click.confirm("Would you like to move files?",
                                          default=False)
          
          
          def organize_files():
              """LIST, AND OPTIONALLY MOVE, THE MATCHING FILES UNDER search_dir."""
              # topdown=True required for filtering.
              # "Root" had all info i needed to filter folders not dir...
              for root, dir, files in walk(search_dir, topdown=True):
                  for file in files:
                      # creating a directory to str and excluding folders that start with
                      if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                          # showcase only the file types looking for
                          if (file.endswith(tuple(includes_file_extensn))):
                              # using path.normpath as i found an issue with double //
                              # in file paths.
                              filetomove = path.normpath(str(root) + '/' +
                                                         str(file))
                              # forward slash required for both to split
                              movingfileto = path.normpath(str(target_foldr) + '/' +
                                                           str(file))
                              # Answering "NO" this only prints the files "TO BE Moved"
                              print('Files To Move: ' + str(filetomove))
                              # This is using the prompt you answered at the beginning
                              if question_moving is True:
                                  print('Moving File: ' + str(filetomove) +
                                        "\n To:" + str(movingfileto))
                                  # This is the command that moves the file
                                  move(filetomove, movingfileto)
          
          
          if __name__ == '__main__':
              organize_files()
          

          Example of running my script from the terminal:

          $ python3 organize_files.py
          Exclude list: {'/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'}
          Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
          Would you like to move files?
          No? This will just list the files.
          Yes? This will Move your files to the target folder.
          [y/N]: 
          

          Example of listing files:

          Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
          Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
          ...etc
          

          Example of moving files:

          Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
          To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
          Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
          To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
          ...
          

          【Discussion】:
