【Question title】: How to copy only non-duplicate files whilst maintaining folder structure?
【Posted】: 2022-10-15 19:49:06
【Question】:

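I am trying to find duplicates between two folders and copy only the unique image files to the 'dest' folder. I can copy all the non-duplicates with the code below, but it does not maintain the source directory structure. I think os.walk returns 3-tuples, but they do not seem to be linked, so I am not sure how to rebuild the subdirectories.

Example: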

import shutil, os
from difPy import dif
source = input('Input source folder:')
dest = input('Input backup \\ destination folder:')

ext = ('.jpg','.jpeg','.gif','.JPG','.JPEG','.GIF')

search = dif(source, dest)
result = search.result
result


dupes = []
srcfiles = []
filecount = []
failed = []
removed = []

for i in result.values():
    dupes.append(i['location'])

for dirpath, subdirs, files in os.walk(source):
    for x in files:
        if x.endswith(ext):
            srcfiles.append(os.path.join(dirpath, x))

for f in srcfiles:
    if f not in dupes:
        shutil.copy(f, dest)
        print('File copied successfully - '+f)
        filecount.append(f)
    else:
        print('File not copied successfully !!!! - '+f)
        failed.append(f)

I have also tried using shutil.copytree with an ignore list, but it requires a new folder and I could not get the ignore-list function to work.

shutil.copytree example:

for i in result.values(): 
        df = []
        df.append(i['filename'])

def ignorelist(source, df):
        return [f for f in df if os.path.isfile(os.path.join(source, f))]

shutil.copytree(source, destnew, ignore=ignorelist)

【Comments】:

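  • github.com/gchamon/sysrsync might give you what you are after with minimal effort.
  • Do you have an example of unique and duplicate files in the source folders? For example, Folder_A contains: pic1.png, pic2.png, pic9.png; Folder_B contains: pic2.png, picY.png, picW4K.png. The way I interpret your question, you want Folder_NEW to have pic1.png, pic2.png, pic9.png, picY.png, picW4K.png. Does that sound right?
  • @kyrlon, ideally Folder_B would end up with pic1.png, pic2.png, pic9.png, picY.png, picW4K.png without a new folder being created. The problem I cannot solve yet is that when Folder_A has a subfolder, e.g. Folder_A\subfolder\pic.png, the file is just copied into Folder_B without recreating that subfolder (using the first code example).
  • With the shutil.copytree approach a new folder has to be created, so Folder_B would contain Folder_B\New with pic1.png and pic9.png as the non-dupes. But again, if there are subfolders under Folder_A, they are not maintained when copying.
  • @W4K1NG You just need to tell shutil.copy the correct destination and make sure the directory exists before calling it - see the answer below.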
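
On the question of whether the os.walk tuples are "linked": in each tuple the file names belong to the dirpath yielded alongside them, so the subdirectory relative to the source root can be recovered from dirpath. The following is a minimal sketch using a throwaway temporary layout; the helper name files_by_subdir and the file names are made up for illustration.

```python
import os
import tempfile

def files_by_subdir(root):
    """Map each subdirectory (relative to root) to the files directly inside it."""
    layout = {}
    for dirpath, _subdirs, files in os.walk(root):
        # dirpath is the directory currently being visited and files are the
        # names found directly in it, so the two always belong together.
        rel_dir = os.path.relpath(dirpath, root)
        layout[rel_dir] = sorted(files)
    return layout

# Hypothetical layout, created only for this demonstration.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'subfolder'))
    for name in ('pic1.jpg', os.path.join('subfolder', 'pic.jpg')):
        open(os.path.join(root, name), 'w').close()
    layout = files_by_subdir(root)
    print(layout)
```

The root itself shows up under the key '.', which is what os.path.relpath returns when both arguments are the same directory.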

Tags: python python-3.x shutil


【Solution 1】:

This ignorelist function should do the trick:

import shutil, os
from difPy import dif
source = input('Input source folder:')
dest = input('Input backup  destination folder:')

ext = ('.jpg','.jpeg','.gif')

search = dif(source, dest)

dupes = list(value['location'] for value in search.result.values())

def ignorelist(source, files):
    return list(file for file in files
                    if (os.path.isfile(os.path.join(source, file))
                         and (os.path.join(source, file) in dupes
                              or not file.lower().endswith(ext))))

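# dest already exists, so dirs_exist_ok=True (Python 3.8+) is needed;
# without it copytree raises FileExistsError.
shutil.copytree(source, dest, ignore=ignorelist, dirs_exist_ok=True)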
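
To see how the ignore callback behaves without difPy installed, here is a self-contained sketch. copytree calls the callback once per visited directory, passing that directory and the names inside it, and skips whatever names the callback returns. The dupes list is a hand-written stand-in for the difPy result, make_ignore is a hypothetical helper name, and dirs_exist_ok=True assumes Python 3.8 or newer (it is passed because the destination already exists).

```python
import os
import shutil
import tempfile

def make_ignore(dupes, ext):
    """Build a copytree ignore callback that skips duplicates and non-images."""
    dupes = set(dupes)

    def ignore(directory, names):
        skipped = []
        for name in names:
            path = os.path.join(directory, name)
            # Only files are ever skipped; directories are always descended into.
            if os.path.isfile(path) and (path in dupes
                                         or not name.lower().endswith(ext)):
                skipped.append(name)
        return skipped

    return ignore

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'source')
    dest = os.path.join(tmp, 'dest')
    os.makedirs(os.path.join(source, 'sub'))
    os.makedirs(dest)
    for name in ('a.jpg', 'notes.txt',
                 os.path.join('sub', 'b.JPG'), os.path.join('sub', 'dupe.gif')):
        open(os.path.join(source, name), 'w').close()

    # Stand-in for the 'location' values difPy would report.
    dupes = [os.path.join(source, 'sub', 'dupe.gif')]

    shutil.copytree(source, dest,
                    ignore=make_ignore(dupes, ('.jpg', '.jpeg', '.gif')),
                    dirs_exist_ok=True)

    copied = sorted(os.path.relpath(os.path.join(d, f), dest)
                    for d, _, fs in os.walk(dest) for f in fs)
    print(copied)
```

One side effect of this approach: because directories are never ignored, a subdirectory is still created in the destination even when every file inside it is skipped.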

Another, "more manual" way is:

import shutil, os
from difPy import dif
source = input('Input source folder:').rstrip('/\\')
dest = input('Input backup  destination folder:').rstrip('/\\')

ext = ('.jpg','.jpeg','.gif')

search = dif(source, dest)

dupes = list(value['location'] for value in search.result.values())

srcfiles = []
copied = []
failed = []
skipped = []

for dirpath, subdirs, files in os.walk(source):
    for file in files:
        if file.lower().endswith(ext):
            srcfile = os.path.join(dirpath,file)
            srcfiles.append(srcfile)
            if srcfile in dupes:
                print('File not copied (duplicate) - '+srcfile)
                skipped.append(srcfile)
            else:
                try:
                    destfile = os.path.join(dest,srcfile[len(source)+1:])
                    os.makedirs(os.path.dirname(destfile), exist_ok=True)
                    shutil.copy(srcfile,destfile)
                    print('File copied successfully - '+srcfile)
                    copied.append(srcfile)
                except Exception as err:
                    print('File not copied (error %s) - %s' % (str(err),srcfile))
                    failed.append(srcfile)
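
The destination-path step in the loop above can also be written with os.path.relpath instead of slicing by len(source), which avoids depending on whether the source path has a trailing separator. This is only a sketch of that alternative: the helper name copy_preserving_structure and the Folder_A/Folder_B layout are made up for the demonstration.

```python
import os
import shutil
import tempfile

def copy_preserving_structure(srcfile, source, dest):
    """Copy srcfile (located under source) to the same relative path under dest."""
    rel_path = os.path.relpath(srcfile, source)
    destfile = os.path.join(dest, rel_path)
    # Create the intermediate subdirectories if they are missing.
    os.makedirs(os.path.dirname(destfile), exist_ok=True)
    shutil.copy(srcfile, destfile)
    return destfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'Folder_A')
    dest = os.path.join(tmp, 'Folder_B')
    os.makedirs(os.path.join(source, 'subfolder'))
    os.makedirs(dest)

    srcfile = os.path.join(source, 'subfolder', 'pic.jpg')
    with open(srcfile, 'w') as handle:
        handle.write('stand-in image data')

    destfile = copy_preserving_structure(srcfile, source, dest)
    rel_result = os.path.relpath(destfile, dest)
    copied_ok = os.path.isfile(destfile)
    print(rel_result)
```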


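【Solution 2】:

I changed some variable names to make them more descriptive. What you call failed is really just a list of files that were not copied because they are duplicates, not files for which a copy was attempted and failed.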

    import shutil, os
    from difPy import dif
    
    source = input('Input source folder: ')
    dest = input('Input backup  destination folder: ')
    
    # Remove trailing path separators if they exist:
    if source.endswith(('/', '\\')):
        source = source[:-1]
    if dest.endswith(('/', '\\')):
        dest = dest[:-1]
    
    # Use the correct path separator to
    # ensure correct matching with dif results:
    if os.sep == '/':
        source = source.replace('\\', os.sep)
    elif os.sep == '\\':
        source = source.replace('/', os.sep)
    
    source_directory_length = len(source) + 1
    
    ext = ('.jpg','.jpeg','.gif','.JPG','.JPEG','.GIF')
    
    search = dif(source, dest)
    result = search.result
    
    # Set comprehension:
    dupes = {duplicate['location'] for duplicate in result.values()}
    
    copied = []
    not_copied = []
    for dirpath, subdirs, files in os.walk(source):
        for file in files:
            if file.endswith(ext):
                source_path = os.path.join(dirpath, file)
                if source_path not in dupes:
                    # get subdirectory of source directory that this file is in:
                    file_length = len(file) + 1
                    # Get subdirectory relative to the source directory:
                    subdirectory = source_path[source_directory_length:-file_length]
                    if subdirectory:
                        dest_directory = os.path.join(dest, subdirectory)
                        # ensure directory exists:
                        os.makedirs(dest_directory, exist_ok=True)
                    else:
                        dest_directory = dest
                    dest_path = os.path.join(dest_directory, file)
                    shutil.copy(source_path, dest_path)
                    print('File copied successfully -', source_path)
                    copied.append(source_path)
                else:
                    print('File not copied -', source_path)
                    not_copied.append(source_path)
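
The separator handling above exists so that paths built with os.path.join compare equal to the paths in dupes. A possible alternative, sketched here, is to normalize both sides with os.path.normpath and os.path.normcase before the membership test. The sample paths are hypothetical, and whether difPy's 'location' values need this treatment is an assumption rather than something the answer states.

```python
import os

def normalize(path):
    """Return a canonical form of path suitable for set membership tests."""
    # normpath collapses repeated and trailing separators (and converts '/'
    # to '\\' on Windows); normcase additionally lower-cases on Windows.
    return os.path.normcase(os.path.normpath(path))

# Hypothetical duplicate locations, written with inconsistent separators.
raw_dupes = ['photos/sub//pic.jpg', 'photos/other.jpg']
dupes = {normalize(p) for p in raw_dupes}

candidate = os.path.join('photos', 'sub', 'pic.jpg')
is_duplicate = normalize(candidate) in dupes
trailing_ok = normalize('photos/') == normalize('photos')
print(is_duplicate, trailing_ok)
```

Using a set for dupes, as the answer does, also keeps each membership test cheap when there are many files.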
    


【Solution 3】:
      import sysrsync
      
      source = input('Input source folder:')
      dest = input('Input backup  destination folder:')
      sysrsync.run(source=source,
                   destination=dest,
                   sync_source_contents=False)
      

From: https://github.com/gchamon/sysrsync

【Comments】:

  • Hey, thanks, but I would rather not use sysrsync.