Multiple reads from a txt file in bash (parallel processing): answer

【Question title】: Multiple reads from a txt file in bash (parallel processing)
【Posted】: 2017-06-02 12:16:02
【Question】:

This is a simple bash script that checks HTTP status codes:

while IFS= read -r url
do
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
    echo "$url  $urlstatus" >> urlstatus.txt
done < "$1"

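I am reading the URLs from a text file, but the script processes them one at a time, which takes too much time. GNU parallel and xargs also process one line at a time (tested).

How can I process the URLs simultaneously to cut the total time? In other words, I want threads over the lines of the URL file, not over bash commands (which is what GNU parallel and xargs do).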

 Input file is txt file and lines are separated  as
    ABC.Com
    Bcd.Com
    Any.Google.Com

Something  like this

【Discussion】:

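Why not read the file and spawn a separate nohup script for each URL?
Could you elaborate on that?
What exactly is taking too long? Please give an example. A bash loop that reads 10,000 URLs will probably finish before your first 2-3 curl commands do, so the loop is not the bottleneck and is not worth optimizing. Just use GNU Parallel to run the curl commands.
The actual problem is that parallel runs multiple commands in parallel, not multiple URLs.
For example, cat abc.txt | parallel -j100 --pipe /root/bash5.sh abc.txt processes one URL at a time, just like running the bash script normally.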
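The first comment suggests spawning a separate background process per URL. A minimal sketch of that idea in plain bash, assuming bash is the interpreter: every non-empty line gets its own background job, and the script waits after each batch of `max_jobs` jobs (20 is an arbitrary default) so a large file does not start thousands of curl processes at once. The function names `check_url` and `check_file` are hypothetical. The same reading also explains why the `--pipe` command above stays serial: `--pipe` feeds chunks of stdin to the script, but the script reads `abc.txt` from `$1` and ignores stdin, so each job walks the whole file one URL at a time.

```shell
#!/usr/bin/env bash
# Sketch: one background job per URL, in batches of at most max_jobs.

# check_url URL: print "URL  STATUS" using the question's curl call.
check_url() {
  local url="$1" status
  status=$(curl -o /dev/null --silent --head --write-out '%{http_code}' --max-time 5 "$url")
  echo "$url  $status"
}

# check_file FILE [MAX_JOBS]: start a background check per non-empty line,
# and after every MAX_JOBS jobs wait for that whole batch to finish.
check_file() {
  local file="$1" max_jobs="${2:-20}" url count=0
  while IFS= read -r url; do
    [ -z "$url" ] && continue          # skip blank lines
    check_url "$url" < /dev/null &     # background job; keep it off the file's stdin
    count=$((count + 1))
    if [ "$count" -ge "$max_jobs" ]; then
      wait                             # wait for the current batch
      count=0
    fi
  done < "$file"
  wait                                 # wait for the last, partial batch
}

# Only run when a file argument is given, e.g.: ./check.sh urls.txt
if [ "$#" -gt 0 ]; then
  check_file "$1" 20 >> urlstatus.txt
fi
```

Waiting for a whole batch is simpler than a sliding window, at the cost that one slow URL (up to the 5-second `--max-time`) holds up the next batch. Output lines appear in completion order, not input order.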

Tags: bash curl libcurl xargs gnu-parallel

【Solution 1】:

GNU parallel and xargs also process one line at a time (tested)

Can you give an example? If you use -j, you should be able to run more than one process at a time.
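A quick way to check that claim without touching the network, assuming an xargs that supports `-P` (GNU and BSD xargs both do): three stand-in jobs that each sleep one second take about three seconds when xargs runs them one after another, and about one second with `-P 3`. GNU parallel's `-j` option plays the same role as xargs' `-P` here.

```shell
#!/usr/bin/env bash
# Three stand-in jobs that each run "sleep 1".
# `xargs -n 1` alone runs them one after another;
# adding `-P 3` lets xargs run up to 3 of them at the same time.

start=$(date +%s)
printf '1\n1\n1\n' | xargs -n 1 sleep
serial_secs=$(( $(date +%s) - start ))

start=$(date +%s)
printf '1\n1\n1\n' | xargs -n 1 -P 3 sleep
parallel_secs=$(( $(date +%s) - start ))

echo "without -P: ${serial_secs}s, with -P 3: ${parallel_secs}s"
```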

I would write it like this:

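#!/bin/bash
# Print "URL  STATUS" for one URL; STATUS is 000 if curl gets no HTTP response.
doit() {
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
    echo "$url  $urlstatus"
}
# Make doit visible to the bash processes that GNU parallel starts.
export -f doit
# -j0: run as many jobs at once as possible; -k: keep output in input order.
cat "$1" | parallel -j0 -k doit >> urlstatus.txt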

Given this input:

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this
www.google.com
pi.dk

I get this output:

Input file is txt file and lines are separated  as  000
ABC.Com  301
Bcd.Com  301
Any.Google.Com  000
Something  like this  000
www.google.com  302
pi.dk  200

Which looks correct:

000 if there was no HTTP response (e.g. the domain does not exist)
301/302 for a redirect
200 for success
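Since the question also mentions xargs, here is a sketch of the same `doit` check fanned out with `xargs -P` instead of GNU parallel. It assumes bash (for `export -f`) and an xargs with `-0` and `-P`; the helper name `check_all` and the default of 20 jobs are hypothetical choices. Unlike `parallel -k`, xargs prints results in completion order, so sort the output if order matters.

```shell
#!/usr/bin/env bash
# Same per-URL check as the answer's doit.
doit() {
  local url="$1" urlstatus
  urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' --max-time 5 "$url")
  echo "$url  $urlstatus"
}
export -f doit   # the bash processes started by xargs need the function

# check_all FILE [JOBS]: one URL per line in FILE, up to JOBS checks at once.
# Blank lines are dropped, and each remaining line becomes one NUL-terminated
# item, so spaces or quotes inside a line are passed through unchanged.
check_all() {
  grep -v '^$' "$1" | tr '\n' '\0' | xargs -0 -n 1 -P "${2:-20}" bash -c 'doit "$1"' _
}

# Only run when a file argument is given, e.g.: ./check_xargs.sh urls.txt
if [ "$#" -gt 0 ]; then
  check_all "$1" 20 >> urlstatus.txt
fi
```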

【Discussion】:

I'll test it and let you know.
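Hey, I'm getting the same status code 000 everywhere. Can you tell me how you run the script from the terminal? That might help.
I run: cat input.txt | parallel -j0 -k doit >> urlstatus.txt. As you can see, I also get 000 for domains that do not exist. I wonder whether you really gave us an excerpt from your input. If those 6 lines are not actually in your input file, could you post 10 lines from your actual input file?
Here is the whole process: 1. I copied your bash script, saved it as bash.sh and made it executable. 2. My input file is large, but I also tested a small 10-line file. Here is the list: www.yahoo.com, www.google.com, facebook.com, amazon.com, bing.com, apple.com, www.microsoft.com, www.windows.com, ..., one per line, saved as top.txt. 4. Then I went to the terminal and typed ./bash.sh top.txt 5. Now it gives 000 as the result for 6 of them. Can you help me find where I went wrong? Thanks.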
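When real domains come back as 000, `%{http_code}` alone cannot say why, because 000 only means curl received no HTTP response. A troubleshooting sketch: a variant of `doit` that also prints curl's own exit status, which separates the usual causes: 6 (could not resolve host), 7 (could not connect) and 28 (the `--max-time 5` limit was hit). The name `doit_verbose` is made up, and this is a diagnostic, not a fix.

```shell
#!/usr/bin/env bash
# Diagnostic variant of doit: print curl's exit status next to the HTTP code
# so that a 000 can be told apart (DNS failure vs. refused connection vs. timeout).
doit_verbose() {
  local url="$1" urlstatus rc reason
  urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' --max-time 5 "$url")
  rc=$?   # exit status of curl itself (plain assignment keeps it intact)
  case "$rc" in
    0)  reason="ok" ;;
    6)  reason="could not resolve host" ;;
    7)  reason="could not connect" ;;
    28) reason="timed out (--max-time)" ;;
    *)  reason="see curl(1) EXIT CODES" ;;
  esac
  echo "$url  $urlstatus  curl exit $rc ($reason)"
}
export -f doit_verbose

# Same invocation as the answer; only the function differs.
if [ "$#" -gt 0 ]; then
  cat "$1" | parallel -j0 -k doit_verbose
fi
```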