Multiple read from a txt file in bash (threading): answers

【Question title】: Multiple read from a txt file in bash (threading)
【Posted】: 2017-05-30 22:52:44
【Question】:

This is a simple bash script for fetching HTTP status codes:

while IFS= read -r url
do
    urlstatus=$(curl -o /dev/null --silent --head --max-time 5 --write-out '%{http_code}' "${url}")
    echo "$url  $urlstatus" >> urlstatus.txt
done < "$1"

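I am reading the URLs from a text file, but the script handles only one at a time, which takes too long. GNU parallel and xargs also process one line at a time (tested).

How can I process several URLs at the same time to cut the run time? In other words, I want threading over the URL file rather than over the bash command (which is what GNU parallel and xargs do).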

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this


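【Comments】:

Reading a line from the file takes no time. cURL is what takes the time... see whether you can run curl as a background process.
This is something I often do with parallel. If you show what you tried with parallel and xargs, someone may spot a small problem that can be fixed.

Tags: bash curl xargs gnu-parallel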

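【Solution 1】:

"GNU parallel and xargs also process one line at a time (tested)"

Can you give an example? If you use -j, you should be able to run several processes at a time.

I would write it like this: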

doit() {
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
    echo "$url  $urlstatus"
}
export -f doit
cat "$1" | parallel -j0 -k doit >> urlstatus.txt

Given the input:

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this
www.google.com
pi.dk

I get the output:

Input file is txt file and lines are separated  as  000
ABC.Com  301
Bcd.Com  301
Any.Google.Com  000
Something  like this  000
www.google.com  302
pi.dk  200

Which looks about right:

000 if domain does not exist
301/302 for redirection
200 for success
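
The mapping above can be sketched as a small helper. The function name classify_status is hypothetical. Note that curl prints 000 whenever it received no HTTP response at all, so a timeout (for example from --max-time 5) also shows up as 000, not only a nonexistent domain.

```shell
#!/bin/bash
# classify_status CODE: describe a code as printed by curl's %{http_code}.
# (classify_status is a hypothetical helper name, not part of curl.)
classify_status() {
    case "$1" in
        000) echo "no HTTP response" ;;   # DNS failure, timeout, refused connection
        2??) echo "success" ;;
        3??) echo "redirection" ;;
        4??) echo "client error" ;;
        5??) echo "server error" ;;
        *)   echo "unknown" ;;
    esac
}

# Label the codes that appear in the output above.
for code in 000 301 302 200; do
    printf '%s  %s\n' "$code" "$(classify_status "$code")"
done
```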

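【Comments】:

I was running it like: cat any.txt | parallel -j100 /bash.sh filelist.txt
and n1 was tested as well
Ah, that explains it. Consider walking through man parallel_tutorial. It will explain this and give you a better understanding of why it behaves this way.
Actually your code (I tested it) is very fast, but it returns 000 as the status code instead of the real one. Any suggestions?
You need to provide a few sample lines from your input file for testing and debugging. Update your question with them. (And consider keeping -j at 0.)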
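
The invocation quoted in this thread appears to start the whole script once per input line with the file list as its argument, which would explain why nothing got faster. A minimal sketch of the alternative, with a worker that takes a single URL, is shown here with xargs instead of parallel. It assumes an xargs that supports -P (GNU and BSD xargs do); check_one and check_list are hypothetical names, and the default of 8 jobs is arbitrary.

```shell
#!/bin/bash
# check_one URL: print "URL  STATUS" for a single URL (hypothetical helper).
check_one() {
    local url="$1" status
    status=$(curl -o /dev/null --silent --head --max-time 5 --write-out '%{http_code}' "$url")
    echo "$url  $status"
}
export -f check_one   # make the function visible to the bash started by xargs

# check_list FILE [JOBS]: run up to JOBS checks at a time (default 8, an assumption).
check_list() {
    xargs -P "${2:-8}" -I{} bash -c 'check_one "$1"' _ {} < "$1"
}

# Usage (assumes urls.txt holds one URL per line):
#   check_list urls.txt 20 >> urlstatus.txt
```

Output lines arrive in completion order, not input order, so sort the result afterwards if order matters.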

【Solution 2】:

You mentioned that you had no luck with GNU parallel. Could you try this approach?

# %% is a literal % in awk's printf, so curl still receives %{http_code}
format='curl -o /dev/null --silent --head --write-out "%%{http_code}" "%s"; echo "%s"\n'

awk -v fs="$format" '{printf fs, $0, $0}' url-list.txt | parallel

Want, say, 128 concurrent processes?

awk -v fs="$format" '{printf fs, $0, $0}' url-list.txt | parallel -P128
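
To see what the awk step hands to parallel, it can help to print the generated command lines without running them. The sketch below wraps the same idea in a hypothetical function, gen_commands, and feeds it two of the sample URLs. It writes %% in the format so that awk's printf emits a literal %, and it adds --max-time 5 and a space before the URL in the echo, which are choices made here and not part of the answer.

```shell
#!/bin/bash
# gen_commands [FILE...]: print one curl command line per input line.
# (gen_commands is a hypothetical name; it reads stdin when no file is given.)
gen_commands() {
    local format='curl -o /dev/null --silent --head --max-time 5 --write-out "%%{http_code}" "%s"; echo " %s"\n'
    # awk -v expands the \n escape; printf then fills both %s slots with the line
    awk -v fs="$format" '{ printf fs, $0, $0 }' "$@"
}

# Inspect the commands only; pipe them into parallel to actually run them.
printf 'www.google.com\npi.dk\n' | gen_commands
```

Because each input line is pasted into a shell command, a line containing quotes or shell metacharacters would change the command, so this approach suits trusted input only.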

【Comments】:

【Solution 3】:

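#!/bin/bash
while IFS= read -r LINE; do
  # "&" sends curl to the background, so the loop moves on without waiting for it
  curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE" & echo
  echo " $LINE"
done < url-list.txt
wait  # let the remaining background curl jobs finish before the script exits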

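You are reading the file line by line and passing each line to curl, which fetches the content; only when curl finishes does the loop read the next line. To avoid that wait, run curl in the background by adding & echo.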
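
In the loop above, the URL is echoed immediately while curl prints its status code later from the background, so the two may not end up on the same line. One way to keep them together, sketched here under assumptions, is to run both steps inside a function and send the whole function to the background. check_one and check_all are hypothetical names.

```shell
#!/bin/bash
# check_one URL: print "STATUS URL" on a single line (hypothetical helper).
check_one() {
    local url="$1" status
    status=$(curl -o /dev/null --silent --head --max-time 5 --write-out '%{http_code}' "$url")
    echo "$status $url"
}

# check_all FILE: start one background check per non-empty line, then wait for all.
check_all() {
    local url
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        check_one "$url" &
    done < "$1"
    wait   # block until every background check has finished
}

# Usage (assumes url-list.txt exists):
#   check_all url-list.txt >> urlstatus.txt
```

This starts one curl per line with no upper limit, so for a very large file a bounded tool such as parallel -j or xargs -P is likely the safer choice.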

A quick and dirty example:

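file="/tmp/url-list.txt"
echo "hello 1" >> "$file"
echo "hello 2" >> "$file"
echo "hello3" >> "$file"
while IFS= read -r line; do
  # the "sleep && echo" list runs in the background; the second echo runs right away
  sleep 3 && echo "I run after sleep 3 - $line" & echo "I run at the same time as sleep 3"
done < "$file"
wait  # keep the script alive until the background jobs have printed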
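
A variation on the example above makes the overlap measurable. This is only a sketch: it uses 1-second sleeps instead of 3, and the variable names are made up for illustration. Three background jobs that each sleep for one second should finish in roughly one second overall, not three.

```shell
#!/bin/bash
start=$SECONDS            # SECONDS is bash's built-in elapsed-time counter
for n in 1 2 3; do
    # the whole "sleep && echo" list goes to the background
    sleep 1 && echo "job $n done" &
done
wait                      # return only after all three jobs have finished
elapsed=$((SECONDS - start))
echo "elapsed: ${elapsed}s"
```

Replacing the & with a ; would run the jobs one after another, and the elapsed time would grow to about three seconds.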

【Comments】: