Data Extract Tar and 7z: Answers

【Question title】: Data Extract Tar and 7z
【Posted】: 2019-12-12 13:11:42
【Question description】:

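I have a .tar file that contains many folders and subfolders. Among the other files in these folders there are .7z files. I want to search these folders/subfolders, find the .7z files (assign them to an array?) and extract each one into its own location.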

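I am using Apache Commons Compress:
1) org.apache.commons.compress.archivers.sevenz provides classes for reading and writing archives in the 7z format.
2) org.apache.commons.compress.archivers.tar provides stream classes for reading and writing archives in the TAR format.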

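Step 1: I want to untar the .tar file.
Step 2: I want to recursively go through the extracted folder and its subfolders and find the .7z files.
Step 3: I want to feed the array of .7z files I found to the extractor and extract them one by one into their respective locations.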

I'm having trouble with the array call/assignment in step 3 :/ Can you help? Thanks a lot :)

    /**
     * Uncompresses a .tar file.
     * @param in  path of the .tar file to read
     * @param out directory to extract the entries into
     * @throws IOException if reading the archive or writing a file fails
     */
    public static void decompressTar(String in, File out) throws IOException {
        try (TarArchiveInputStream tin = new TarArchiveInputStream(new FileInputStream(in))) {
            TarArchiveEntry entry;
            while ((entry = tin.getNextTarEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                File curfile = new File(out, entry.getName());
                File parent = curfile.getParentFile();
                if (!parent.exists()) {
                    parent.mkdirs();
                }
                // close each output file; the tar stream only yields the bytes of the current entry
                try (FileOutputStream fos = new FileOutputStream(curfile)) {
                    IOUtils.copy(tin, fos);
                }
            }
        }
    }

    /**
     * Uncompresses a .7z file.
     * @param in          path of the .7z file to read
     * @param destination directory to extract the entries into
     * @throws IOException if reading the archive or writing a file fails
     */
    public static void decompressSevenz(String in, File destination) throws IOException {
        try (SevenZFile sevenZFile = new SevenZFile(new File(in))) {
            SevenZArchiveEntry entry;
            while ((entry = sevenZFile.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                File curfile = new File(destination, entry.getName());
                File parent = curfile.getParentFile();
                if (!parent.exists()) {
                    parent.mkdirs();
                }
                // read() may return fewer bytes than requested, so loop until the entry is exhausted
                try (FileOutputStream out = new FileOutputStream(curfile)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = sevenZFile.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }

    public void run()
    {
        //1) uncompress .tar
        try {
            JThreadTar.decompressTar(RECURSIVE_DIRECTORY_PATH, new File(RECURSIVE_DIRECTORY));
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }

        //2) go through the extracted .tar file directory and look for .7z (recursively?)
        File[] files = new File(RECURSIVE_DIRECTORY).listFiles();

        for (File file : files) {
            if (file.isDirectory()) {

                File[] matches = file.listFiles(new FilenameFilter()
                {
                    public boolean accept(File dir, String name)
                    {
                        return name.endsWith(".7z");
                    }
                });

                for (File element : matches) {
                    System.out.println(element);
                }
            }
            else {
                continue;
            }
        }

        //3) Feed the array above to decompressSevenz method
        //   (does not compile: matches is declared inside the loop in step 2 and is out of scope here)
        for (int i = 0; i < matches.length; i++)
        {
            if (matches[i].isFile())
            {
                try {
                    JThreadTar.decompressSevenz(matches[i].toString(), new File(RECURSIVE_DIRECTORY));
                }
                catch (IOException e2) {
                    // TODO Auto-generated catch block
                    e2.printStackTrace();
                }
            }
        }
    }

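My problem: I can't reference matches[] in step 3; I'm not using it correctly. I just want to create one array, matches[], for the .7z files that are found. Every time I find a .7z file, I want to add it to this array. In step 3 I want to extract each .7z into its relative location.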
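One way around the scope problem is to declare a single list once, before any loop, and fill it from a method that calls itself for every subdirectory (the approach vanje describes in the comments below). A minimal sketch using only java.io.File; the class and method names SevenZFinder, findSevenZ and collect are hypothetical, not part of the original code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class SevenZFinder {

    /** Returns every *.7z file below root, at any depth. */
    static List<File> findSevenZ(File root) {
        List<File> matches = new ArrayList<>(); // declared once, so step 3 can use it
        collect(root, matches);
        return matches;
    }

    /** Recursive helper: adds .7z files in dir and descends into each subdirectory. */
    private static void collect(File dir, List<File> matches) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // listFiles() returns null if dir is not a directory or cannot be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, matches);
            } else if (child.getName().endsWith(".7z")) {
                matches.add(child);
            }
        }
    }
}
```

Step 3 then becomes a plain loop over the returned list, for example calling JThreadTar.decompressSevenz(archive.getPath(), archive.getParentFile()) for each archive so that it is extracted next to itself. A list is used instead of an array because the number of matches is not known in advance.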

I got further:

        //1) uncompress .tar
        try {
            JThreadTar.decompressTar(RECURSIVE_DIRECTORY_PATH, new File(RECURSIVE_DIRECTORY));
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }

        //2) go through the extracted .tar file directory and look for .7z
        //   (note: this only looks one directory level deep; it is not recursive)
        File dir = new File(RECURSIVE_DIRECTORY);
        File[] dirFiles = dir.listFiles();
        ArrayList<File> matches2 = new ArrayList<File>();

        for (File file : dirFiles) {
            if (file.isDirectory()) {
                // list the subdirectory (file), not the top directory (dir) again
                File[] matches = file.listFiles(new FilenameFilter()
                {
                    public boolean accept(File dir, String name)
                    {
                        return name.endsWith(".7z");
                    }
                });
                matches2.addAll(Arrays.asList(matches));
            }
            else if (file.isFile()) {
                if (file.getName().endsWith(".7z")) {
                    matches2.add(file);
                }
            }
        }

        //3) Feed the arraylist above to decompressSevenz method
        for (int counter = 0; counter < matches2.size(); counter++) {
            if (matches2.get(counter).isFile())
            {
                try {
                    JThreadTar.decompressSevenz(matches2.get(counter).toString(), new File(RECURSIVE_DIRECTORY));
                }
                catch (IOException e2) {
                    // TODO Auto-generated catch block
                    e2.printStackTrace();
                }
            }
        }

This is the final form of steps 2 and 3, based on @Joop Eggen's answer:

        Path topDir = Paths.get(RECURSIVE_DIRECTORY);
        // try-with-resources closes the stream; Files.walk keeps directory handles open
        try (Stream<Path> paths = Files.walk(topDir)) {
            paths
                .filter(path -> path.getFileName().toString().endsWith(".7z"))
                .forEach(path -> {
                    try {
                        JThreadTar.decompressSevenz(path.toString(), topDir.toFile());
                    } catch (IOException e2) {
                        e2.printStackTrace();
                    }
                });
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
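If each .7z should end up in its own folder rather than in topDir, and the walk should not run into files that are being extracted while it is still traversing, a variant could collect the matches first and only then extract each archive into archive.getParent(). A sketch under those assumptions; the Extractor interface and the SevenZStep class are hypothetical adapters (a lambda wrapping JThreadTar.decompressSevenz would fit Extractor), not part of Commons Compress:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SevenZStep {

    /** Hypothetical adapter so any extractor can be plugged in. */
    interface Extractor {
        void extract(Path archive, Path destination) throws IOException;
    }

    /**
     * Finds all *.7z files under topDir, then extracts each one into the
     * directory that contains it. Returns the archives that were processed.
     */
    static List<Path> extractAllSevenZ(Path topDir, Extractor extractor) throws IOException {
        List<Path> archives;
        // Phase 1: collect, and close the stream (Files.walk holds open directory handles).
        try (Stream<Path> paths = Files.walk(topDir)) {
            archives = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".7z"))
                    .collect(Collectors.toList());
        }
        // Phase 2: extract outside the walk, each archive next to itself.
        for (Path archive : archives) {
            extractor.extract(archive, archive.getParent());
        }
        return archives;
    }
}
```

Usage might look like SevenZStep.extractAllSevenZ(Paths.get(RECURSIVE_DIRECTORY), (archive, dest) -> JThreadTar.decompressSevenz(archive.toString(), dest.toFile())). Separating the two phases trades a little memory for a traversal that is not affected by newly extracted files.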

Recursive step:

        // walk the directory the outer .tar was extracted into, not the .tar file itself
        Path toptopDir = Paths.get(RECURSIVE_DIRECTORY);
        try (Stream<Path> paths = Files.walk(toptopDir)) {
            paths
                .filter(path -> path.getFileName().toString().endsWith(".tar"))
                .forEach(path -> {
                    try {
                        // extract the .tar that was found (path), not the original archive again
                        JThreadTar.decompressTar(path.toString(), new File(RECURSIVE_DIRECTORY));
                    } catch (IOException e2) {
                        e2.printStackTrace();
                    }
                });
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }

【Question discussion】:

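What is your question? Please state the problem.
In step 2 you write "recursively" in your comment, but there is no recursive call. You should put it into a separate method that can call itself recursively. And why use an array? It is better to store the references to the 7z files in some kind of list: an array has a fixed length, and you don't know in advance how many 7z files you will find.
@vanje Thanks for the remark! :) You are absolutely right. I have now made an ArrayList for step 2. I go through all folders and files in the extracted .tar folder and check whether each one is a .7z file; if it is, I add it to the ArrayList. In step 3 I iterate over that ArrayList. My code is far from perfect, so please point out any improvements you see. But I think it works.
@arcanium0611 I just tried to describe it and wrote some further code. Thanks!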

Tags: java apache

【Solution 1】:

I took the opportunity to use the newer Path and Files APIs. File.listFiles() can return null, and using Arrays.asList and the like creates a lot of intermediate data.

All of this simplifies to:

    Path topDir = Paths.get(RECURSIVE_DIRECTORY);
    try (Stream<Path> paths = Files.walk(topDir)) { // closes the directory handles opened by walk
        paths
            .filter(path -> path.getFileName().toString().endsWith(".7z"))
            .forEach(path -> {
                try {
                    JThreadTar.decompressSevenz(path.toString(), topDir.toFile());
                } catch (IOException e2) {
                    e2.printStackTrace();
                }
            });
    }

【Discussion】:

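Thanks a lot @Joop Eggen, works like a charm! The method only accepts decompressSevenz(String, File), so I changed the second argument to topDir.toFile() and had to add another IOException catch. Does that look OK to you? I added the code above.
Sure; thanks for the feedback. I'll change the answer for other readers.
Hello Joop Eggen, I'd like to ask you about one more change. I now want to extend step 1 of the algorithm above. As before, I always start with one .tar file. This time, however, I want to go through the folders/subfolders inside this .tar file and, if .tar files are found, extract them until no .tar files are left. Then go on to step 2: start from topDir, find and extract .7z files until no .7z files are left. I tried to apply the same logic as above, but with no luck. Do you have any suggestions? I posted the code above.
So any .7z can contain a .tar again? Reminds me of Matryoshkas. I don't think you can walk the directory tree and unpack at the same time. Instead, collect a list of tars, final List<Path> tars = new ArrayList<>(1000);, and process them in a second phase. By the way, I like to use Unix sync for data mining.
Oh no, sorry for the confusion. We take a single .tar file and look for .tar files inside it, then we continue with the .7z files. So what I mean is: 1. untar until no .tar is left, 2. move on to un-7z until no .7z is left.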
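The "until none left" requirement can be read as a loop: walk the tree and collect the archives not yet extracted (phase 1, as suggested above), extract them (phase 2), and repeat until a pass finds nothing new. A sketch assuming extracted archives stay on disk (hence the processed set) and that each archive is extracted into its own directory; ExtractUntilDone, Extractor and extractUntilNoneLeft are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ExtractUntilDone {

    /** Hypothetical adapter around decompressTar / decompressSevenz. */
    interface Extractor {
        void extract(Path archive, Path destination) throws IOException;
    }

    /**
     * Repeatedly extracts every archive ending in suffix under topDir (each into
     * its own parent directory) until a pass finds no new archives.
     * Returns the number of archives extracted.
     */
    static int extractUntilNoneLeft(Path topDir, String suffix, Extractor extractor) throws IOException {
        Set<Path> processed = new HashSet<>(); // originals stay on disk, so remember what was done
        while (true) {
            List<Path> pending;
            // Phase 1: collect archives that have not been extracted yet.
            try (Stream<Path> paths = Files.walk(topDir)) {
                pending = paths
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(suffix))
                        .filter(p -> !processed.contains(p))
                        .collect(Collectors.toList());
            }
            if (pending.isEmpty()) {
                return processed.size();
            }
            // Phase 2: extract them; nested archives are found on the next pass.
            for (Path archive : pending) {
                extractor.extract(archive, archive.getParent());
                processed.add(archive);
            }
        }
    }
}
```

It would be called once with ".tar" and a lambda wrapping JThreadTar.decompressTar, and then once with ".7z" and a lambda wrapping JThreadTar.decompressSevenz. Note that the loop only terminates once the archives stop producing new archives of the same type.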