How to obtain names of only files in a directory which match a certain naming pattern, and ignore others, in Python?

【Question title】: How to obtain names of only files in a directory which match a certain naming pattern, and ignore others, in Python?
【Posted】: 2020-11-11 09:26:12
【Question description】:

I have a directory full of jpeg files, all of which should be named in the same format, for example:

"ABC_00001_D0.jpg"
"ABC_00100_D8.jpg"
"ABC_00023_D4.jpg"
...

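The digit characters can be any digits, but the letters and underscores should always be the same, and in the same positions, in every filename.

I am reading the filenames into a list, while making sure to pick up only the jpg type, like this: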

import os

expected_filename_style = "ABC_00000_D0.jpg"

folder_path = r"C:\my_dir"
filelist = []
for f in os.listdir(folder_path):
    if f.endswith(".jpg"):
        filelist.append(f)
        print(f)

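However, rogue filenames that do not follow my naming convention sometimes turn up in the directory. For example, I want to ignore filenames that look like EFG_00001_D1.jpg or ABC_0E001_D0.jpg.

I would like to be able to change the expected format (for example to "00_XYZ_00.jpg") and have the code accept the new format. Only the digit characters are ever allowed to vary, so I want to check somehow that the non-digit characters in each filename match the non-digit characters at the same positions in expected_filename_style. Can anyone help me with this?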

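【Question comments】:

Use the re library with the regular expression re.compile("ABC_(\d+)_D(\d+).jpg", flags=re.I).
Your suggestion does not take my expected_filename_style variable as input. As I said, what if I want to change it?
Then you change your regex accordingly. I don't think it is hard to build a dynamic regex from the input.
Could you provide a small working example? I am new to Python and just learning.
I'll add an answer.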

Tags: python string directory path filenames

【Solution 1】:

As discussed in the comments, here is a solution using the re library:

import re
expected_file_format = "ABC_00000_D0.jpg"

# as mentioned, this can vary. 
# Also, characters and underscore represent themselves, 
# but 0 represents all digits 0-9

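# re.escape makes the "." literal (unescaped, it would match any character);
# r"\d" is a raw string so the backslash reaches the regex engine intact
regex = re.compile(re.escape(expected_file_format).replace("0", r"\d") + "$", flags=re.I)
# drop flags=re.I if you want a case-sensitive match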

file_name = "ABC_12345_D9.jpg"
print(bool(regex.match(file_name)))  # True

file_name = "ABC_1234_D9.jpg"
print(bool(regex.match(file_name)))  # False
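
To tie this back to the directory listing in the question, the pattern can be built once from the expected style and applied while the names are read. The following is only a sketch and is not part of the original answer: the helper names build_pattern and matching_files are hypothetical, it assumes (as above) that each "0" in the template stands for exactly one digit, and it matches case-sensitively.

```python
import os
import re


def build_pattern(expected_style):
    """Turn a template such as "ABC_00000_D0.jpg" into a compiled regex.

    Assumption: each "0" in the template stands for exactly one digit;
    every other character must match literally.
    """
    escaped = re.escape(expected_style)  # makes "." and similar literal
    return re.compile(escaped.replace("0", r"\d"))


def matching_files(folder_path, expected_style):
    """Return the sorted names in folder_path that fit expected_style."""
    pattern = build_pattern(expected_style)
    # fullmatch requires the whole name to fit, so no "$" is needed
    return sorted(f for f in os.listdir(folder_path) if pattern.fullmatch(f))
```

With this in place, something like matching_files(r"C:\my_dir", "ABC_00000_D0.jpg") would replace the loop in the question, and switching to a new convention only means passing a different template such as "00_XYZ_00.jpg".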

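【Discussion】:

It looks like I don't need .endswith() any more either.
Yes, .match anchors the search at the start of the string, and "$" anchors the match at the end of the string, so "ABC_12345_D6.jpge" fails to match.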
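
The anchoring behaviour described in that comment can be checked directly. This is a small illustrative sketch with a hand-written pattern (an assumption for the demo, not the one generated above); it also shows re's fullmatch, which anchors both ends without needing "$".

```python
import re

pattern = re.compile(r"ABC_\d{5}_D\d\.jpg")

# match only anchors the start, so a name with extra trailing text still passes
print(bool(pattern.match("ABC_12345_D6.jpge")))      # True

# appending "$" rejects the extra character
anchored = re.compile(pattern.pattern + "$")
print(bool(anchored.match("ABC_12345_D6.jpge")))     # False

# fullmatch anchors both ends without needing "$"
print(bool(pattern.fullmatch("ABC_12345_D6.jpge")))  # False
print(bool(pattern.fullmatch("ABC_12345_D6.jpg")))   # True
```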