Putting files in a directory into an array variable: answers

【Question title】: Putting files in directory into array variable
【Posted】: 2020-12-28 08:02:17
【Question description】:

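I'm writing bash code that searches the directory it is run in for specific files and adds them to an array variable. The problem I'm having is formatting the results. I need to find all compressed files in the current directory and display each file's name and size, ordered by last modification time. I'd like to take the output of that command and put it into an array variable, with each element holding a file name and its corresponding size, but I can't work out how to do that. I'm not sure whether I should be using "find" rather than "ls", but this is what I have so far: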

find_files="$(ls -1st --block-size=MB)"
arr=( ($find_files) )

【Question comments】:

mywiki.wooledge.org/ParsingLs
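See also BashFAQ/003
Define "compressed file"? Please provide an example directory listing (with and without some "compressed" files), then the expected array contents (e.g., an associative array where the index is the file name and the value is the size in MB?). What if the MB value isn't a nice whole number (e.g., 1,325.27 MB)? Should it be stored as an integer, as a real value, with the comma included?
arr=( $(anything) ) is an antipattern. See BashPitfalls #50.
BTW, How can I store the "find" command results as an array in Bash is closely related; if you didn't have the size requirement, I'd flag this as a duplicate. It's probably worth a read even so.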
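
To make the antipattern comment concrete: an unquoted `$(...)` inside `arr=( ... )` is word-split (and glob-expanded), so a file name containing a space turns into two elements. Here is a minimal sketch of the usual alternative, assuming bash 4.4+ (for `mapfile -d`) and GNU find. The scratch directory and file names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical scratch directory with two archives, one with a space in its name
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.zip" "$demo_dir/with space.zip"

# Antipattern: the unquoted command substitution is split on whitespace
split_arr=( $(find "$demo_dir" -maxdepth 1 -type f -name '*.zip') )

# Safer: NUL-delimited records read straight into an array
mapfile -d '' -t safe_arr < <(
    find "$demo_dir" -maxdepth 1 -type f -name '*.zip' -print0
)

echo "word-split: ${#split_arr[@]} elements"
echo "mapfile:    ${#safe_arr[@]} elements"
rm -rf "$demo_dir"
```

The two counts differ because the word-split version breaks "with space.zip" in half, while the NUL-delimited version keeps one element per file.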

Tags: arrays linux bash shell ls

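【Solution 1】:

I'm not sure what format you want the array in, but here is a snippet that creates an associative array with file names as keys and their sizes as values: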

$ ls -l test.{zip,bz2}
-rw-rw-r-- 1 user group 0 Sep 10 13:27 test.bz2
-rw-rw-r-- 1 user group 0 Sep 10 13:26 test.zip

$ declare -A sizes; while read SIZE FILENAME ; do sizes["$FILENAME"]="$SIZE"; done < <(find * -prune -name '*.zip' -o -name '*.bz2'  | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")

$ echo "${sizes[@]@A}"
declare -A sizes=(["'test.zip'"]="0" ["'test.bz2'"]="0" )

$

If you just want an array of literal "filename size" entries, that's even easier:

$ while read SIZE FILENAME ; do sizes+=("$FILENAME $SIZE"); done < <(find * -prune -name '*.zip' -o -name '*.bz2'  | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")

$ echo "${sizes[@]@A}"
declare -a sizes=([0]="'test.zip' 0" [1]="'test.bz2' 0")

$
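
Two things stand out in the output above: `%N` makes stat quote the names, so the keys carry literal quote characters, and an associative array does not remember the sorted order. A possible variation, as a sketch only and assuming GNU find and sort, keeps the order in a separate indexed array and uses `%f` for unquoted names. The scratch directory, file names and timestamps are invented for the demo.

```shell
#!/bin/bash
# Hypothetical scratch directory; names, sizes and dates are illustrative only
demo_dir=$(mktemp -d)
printf 'abc' > "$demo_dir/old.zip"
printf 'abcdef' > "$demo_dir/new.bz2"
touch -d '2020-01-01 00:00:00' "$demo_dir/old.zip"
touch -d '2020-06-01 00:00:00' "$demo_dir/new.bz2"

declare -A sizes=()     # file name -> size in bytes
declare -a by_mtime=()  # file names, oldest first

# %T@ = mtime as epoch seconds, %s = size in bytes, %f = base name
while IFS= read -r -d '' record; do
    rest=${record#* }            # drop the mtime, it was only the sort key
    name=${rest#* }              # everything after the size is the name
    sizes["$name"]=${rest%% *}
    by_mtime+=("$name")
done < <(
    find "$demo_dir" -maxdepth 1 -type f \( -name '*.zip' -o -name '*.bz2' \) \
        -printf '%T@ %s %f\0' |
    sort -z -n
)

for name in "${by_mtime[@]}"; do
    printf '%s %s\n' "$name" "${sizes[$name]}"
done
rm -rf "$demo_dir"
```

Adding `-r` to sort would list the most recently modified file first instead.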

【Comments】:

【Solution 2】:

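Both solutions work, and were tested by copy-pasting from this post.

The first is rather slow. One problem is the external program calls inside the loop: date, for example, is invoked once per file. You can speed this up by leaving the date out of the output array (see the notes below). For method 2 in particular, that leaves no external command calls inside the while loop. Method 1 remains the real problem, though: it is orders of magnitude slower.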

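Also, someone may know how to convert an epoch date to another format within awk, which could be faster. Perhaps the sorting could be done in awk as well. Or maybe just keep the epoch date?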
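
One way to drop the per-file date call without reaching for awk, offered as a sketch rather than a tested replacement: bash 4.2+ can format an epoch timestamp with its printf builtin. The `record` value below only mimics one "path size mtime" entry and is made up. TZ is pinned to UTC purely to make the demo reproducible.

```shell
#!/bin/bash
# Format an epoch timestamp with the printf builtin (bash 4.2+),
# so no external `date` process is started per file.
# `record` mimics one "path size mtime" entry; its values are invented.
record='/tmp/example.tar.gz 265.148 1599688417.1234567890'

export TZ=UTC            # only so the demo output is reproducible

mtime=${record##* }      # last field: epoch seconds with a fraction
mtime=${mtime%.*}        # %(...)T accepts whole seconds only

printf -v stamp '%(%Y-%m-%d %H:%M:%S)T' "$mtime"
entry="${record% *} $stamp"
printf '%s\n' "$entry"
```

Inside the while loop, the same two parameter expansions plus `printf -v` could stand in for `$(date -d @...)`, at the cost of losing the sub-second part.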

These solutions are bash/GNU heavy and not portable to other environments (bash process substitution, find -printf). The OP tagged the question linux and bash, so GNU can be assumed.

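Method 1 - capture any compressed file - match using file (slow)

The criterion for "compressed" is whether the file output contains the word compress
Reliable enough, but it could collide with other file type descriptions?
file -l | grep compress (file 5.38, Ubuntu 20.04, WSL) showed no collisions at all for me (every type listed is a compression format)
Beyond that, I couldn't find a way to classify any compressed file
I ran this on a directory containing 1664 files - time (real) was 40 seconds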

#!/bin/bash

# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file(1)
# output to match.

# Initialise variables, and check the target is valid
declare -g c=0 path= TARGET=$1
declare -ga compressed_files=()
[[ -r "$TARGET" ]] || exit 1

# Make the array
# Process substitution (< <(...)) must be used rather than a pipe,
# to keep the array in the current shell environment
# Records are NUL-terminated, so awk and sort are told to split on NUL
# (RS = "\0" works in gawk; other awks may not accept it)
# Fields are space-separated, so paths containing spaces are not handled
while IFS= read -r -d '' path; do
    [[ "$(file --brief "${path%% *}")" == *compress* ]] &&
    compressed_files[c++]="${path% *} $(date -d "@${path##* }")"
done < \
    <(
        find "$TARGET" -type f -printf '%p %s %T@\0' |
        awk 'BEGIN { RS = ORS = "\0" } {$2 = ($2 / 1024); print}' |
        sort -z -n -k 3
    )

# Print results - to test
printf '%s\n' "${compressed_files[@]}"

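Method 2 - using file extensions - orders of magnitude faster

If you know exactly which extensions you're looking for, you can compose them into the find command
This is a lot faster
On the same directory as above, containing 1664 files - time (real) was 200 milliseconds
This example looks for .gz, .zip and .7z (gzip, zip and 7zip respectively)
I'm not sure whether -type f -and -regex '.*[.]\(gz\|zip\|7z\)$' -and -printf would be faster again, now that I think of it. I started with globs because I assumed they were quicker
That might also allow the extension list to be stored in a variable..
This method avoids running a file analysis on every file in the target
It also makes the while loop shorter - you only iterate over matches
Note the repetition of -printf here. This is down to the logic find uses: -printf is "true". If it were included on its own, it would act as a "match" and print every file
It has to be used as the result of a name match being true (using -and)
Maybe someone has a better composition?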

#!/bin/bash

# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.

# Initialise variables, and check the target is valid
declare -g c=0 path= TARGET=$1
declare -ga compressed_files=()
[[ -r "$TARGET" ]] || exit 1

# Records are NUL-terminated, so awk and sort are told to split on NUL
# (RS = "\0" works in gawk; other awks may not accept it)
# Fields are space-separated, so paths containing spaces are not handled
while IFS= read -r -d '' path; do
    compressed_files[c++]="${path% *} $(date -d "@${path##* }")"
done < \
    <(
        find "$TARGET" \
            -type f -and -name '*.gz'  -and -printf '%p %s %T@\0' -or \
            -type f -and -name '*.zip' -and -printf '%p %s %T@\0' -or \
            -type f -and -name '*.7z'  -and -printf '%p %s %T@\0' |
        awk 'BEGIN { RS = ORS = "\0" } {$2 = ($2 / 1024); print}' |
        sort -z -n -k 3
    )

# Print results - for testing
printf '%s\n' "${compressed_files[@]}"
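
On the question of a better composition: one option, sketched here under the assumption of GNU find, is to group the -name tests in escaped parentheses so that a single -printf serves all of them. The extension list can then live in an array. The scratch directory, file names and the `extensions`, `name_tests` and `matches` variables are hypothetical, not part of the script above.

```shell
#!/bin/bash
# Hypothetical scratch directory with three archives and one unrelated file
demo_dir=$(mktemp -d)
touch "$demo_dir/a.gz" "$demo_dir/b.zip" "$demo_dir/c.7z" "$demo_dir/notes.txt"

# Extension list kept in a variable, then expanded into -name tests
extensions=(gz zip 7z)
name_tests=()
for ext in "${extensions[@]}"; do
    name_tests+=(-o -name "*.$ext")
done
name_tests=("${name_tests[@]:1}")   # drop the leading -o

matches=()
while IFS= read -r -d '' path; do
    matches+=("$path")
done < <(
    # The parentheses group the -name tests, so one -printf covers them all
    find "$demo_dir" -type f \( "${name_tests[@]}" \) -printf '%p %s %T@\0' |
    sort -z -n -k 3
)

printf '%s\n' "${matches[@]}"
rm -rf "$demo_dir"
```

Without the parentheses, -printf would bind only to the last -name test, which is the behaviour the repetition in the script works around.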

Sample output (either method):

$ comp-find.bash /tmp
/tmp/comptest/websters_english_dictionary.tmp.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020
/tmp/comptest/What_is_Systems_Architecture_PART_1.tar.gz 1357.06 Thu Sep 10 08:17:47 AEST 2020

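Notes:

You can append a literal K to indicate the block size/unit (kilobytes)
If you want to print only the paths from this array, you can use suffix removal: printf '%s\n' "${compressed_files[@]%% *}"
To leave the date out of the array (it's used for sorting, but by then its job may be done), just delete $(date -d @${path##* }) (including the space).
A bit tangential, but to use a different date format, replace $(date -d @${path##* }) with one of:
$(date -I -d @${path##* }) ISO format - note that the short-option style date -Id @[date] didn't work for me
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S) ISO-like, but with seconds
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S-%N) the same again, but with nanoseconds (find gives you nanoseconds)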
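
The suffix-removal note is easier to see on a small example. This sketch uses two hand-written entries shaped like the script's output (the values are illustrative, not measurements). It shares the script's limitation that a path containing a space would be cut short.

```shell
#!/bin/bash
# Entries shaped like the script's output: "path size date..."
# (values are illustrative, not real measurements)
compressed_files=(
    '/tmp/a.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020'
    '/tmp/b.zip 12.5 Thu Sep 10 08:17:47 AEST 2020'
)

# %% * removes the longest suffix that starts at a space, i.e. everything
# after the path; applied to [@] it acts on every element
paths=("${compressed_files[@]%% *}")
printf '%s\n' "${paths[@]}"
```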

Sorry for the long post, hopefully it's informative.

【Comments】: