How to grep only the directories without any extensions in bash: answers

【Question title】: How to grep only the directories without any extensions in bash
【Posted】: 2021-09-19 21:22:08
【Question description】:

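Suppose I have a list of URLs in a file named URL.txt, and I want to output only the directories, not files or extensions such as .html or .php. If the script finds an extension or a file name in a URL, it should move on to the next URL.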

- https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit
- https://example.com/account/signup/accounts/signin/account.html

I want a result like this:

- https://example.com/tradings/
- https://example.com/tradings/trade/
- https://example.com/account/
- https://example.com/account/signup/
- https://example.com/account/signup/accounts/
- https://example.com/account/signup/accounts/signin/

I tried this command, but it does not produce the full URL endpoint. I want the full URL endpoint without any extension.

cat Urls.txt | rev | cut -d'/' -f 2 | sort -u | rev

【Question comments】:

Tags: grep cut

【Solution 1】:

Perl to the rescue!

perl -lne '@parts = split m{/}; print join "/", @parts[0 .. $_] for 3 .. $#parts - 1' < URL.txt

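-n reads the input line by line and runs the code for each line.
-l strips the newline from each input line and appends one to each print.
Each line is split on /. We then rejoin the parts from index 0 up to each end index in turn, starting at index 3 (the first path segment after the host) and stopping at the second-to-last part, so the trailing file name is never printed.
See split and join for details.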
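The same split-and-rejoin idea can be sketched in awk, for readers who prefer to stay within the shell toolbox. This is only an illustration, not part of the answer above: the function name list_dirs is hypothetical, it reads from stdin rather than URL.txt, and it assumes every line has the shape scheme://host/path. Unlike the Perl one-liner, this sketch also appends a trailing slash and skips duplicates, to match the output requested in the question.

```shell
# list_dirs (hypothetical name): print every ancestor directory of each URL
# read from stdin, with a trailing slash, skipping duplicates.
# Assumes lines look like scheme://host/path, so fields 1-3 are
# "scheme:", "" and "host" when splitting on "/".
list_dirs() {
    awk -F/ '{
        prefix = $1 "/" $2 "/" $3          # rebuild scheme://host
        for (i = 4; i < NF; i++) {         # stop before the last field (file or endpoint)
            prefix = prefix "/" $i
            if (!seen[prefix]++) print prefix "/"
        }
    }'
}

# The two sample URLs from the question.
printf '%s\n' \
    'https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit' \
    'https://example.com/account/signup/accounts/signin/account.html' |
    list_dirs
```

Awk fields are 1-based, so field 4 here corresponds to index 3 in the Perl array.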

【Comments】:

【Solution 2】:

I suggest using awk:

awk 'BEGIN{FS=OFS="/"}{$NF=""}!seen[$0]++' URLS.txt

Explanation:

# Set the input field separator (FS) and the
# output fields separator (OFS) to a forward slash /
BEGIN{
    FS=OFS="/"
}

{
    # NF is a special variable that contains the number of fields.
    # Therefore $NF is the last field. Assign an empty string to it.
    $NF=""
}

# The variable 'seen' is an associative array, initialized on demand
# upon first usage. We are using it as a lookup to prevent printing
# the same url path twice.
!seen[$0]++
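To see the effect concretely, the one-liner can be wrapped in a small function and fed the two sample URLs from the question. This is a minimal sketch: the function name strip_last is made up for illustration, and reading from stdin instead of URLS.txt is an assumption made so the example is self-contained.

```shell
# strip_last (hypothetical name): the awk one-liner from this answer,
# reading URLs from stdin instead of a file.
strip_last() {
    awk 'BEGIN{FS=OFS="/"}{$NF=""}!seen[$0]++'
}

printf '%s\n' \
    'https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit' \
    'https://example.com/account/signup/accounts/signin/account.html' |
    strip_last
# prints:
# https://example.com/tradings/trade/
# https://example.com/account/signup/accounts/signin/
```

Assigning to $NF makes awk rebuild the line with OFS, which is why the trailing slash survives. Only the immediate parent directory of each URL is printed, not every intermediate directory.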

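PS: Your initial command almost works; only the cut invocation is wrong. You used cut -f2, which prints just the second field, but you want cut -f2-, which prints everything from the second field through the last. On the reversed line, that drops the final path component: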

rev Urls.txt  | cut -d'/' -f 2- | sort -u | rev
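The difference between the two field selectors is easiest to see on a single URL. The sketch below is illustrative only; the function names parent_name and parent_path are hypothetical and each takes one URL as its argument.

```shell
# parent_name / parent_path are hypothetical helper names for this sketch.

# -f 2 keeps only the second field of the reversed line,
# which is the name of the parent directory.
parent_name() {
    printf '%s\n' "$1" | rev | cut -d'/' -f 2 | rev
}

# -f 2- keeps the second field onward, which is everything
# except the final path component.
parent_path() {
    printf '%s\n' "$1" | rev | cut -d'/' -f 2- | rev
}

url='https://example.com/account/signup/accounts/signin/account.html'
parent_name "$url"   # prints: signin
parent_path "$url"   # prints: https://example.com/account/signup/accounts/signin
```

Note that the result has no trailing slash, because cut removes the delimiter together with the dropped field.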

【Comments】:

【Solution 3】:

Since your title mentions grep, here is a grep solution:

grep -o '.*/[^a-z0-9A-Z]*' input_file

This matches each URL up to the last occurrence of /; anything after that last / is left out of the output.

$ grep -o '.*/[^a-z0-9A-Z]*' input_file
- https://example.com/tradings/trade/
- https://example.com/account/signup/accounts/signin/

This can also be done with sed:

sed -E 's|(.*/).*|\1|' input_file
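The two commands agree on lines that contain a slash but differ on lines that do not, which may matter if the input is not clean. The sketch below illustrates this; the function names dirs_grep and dirs_sed are hypothetical, and the sample input (including the line README) is invented for the demonstration.

```shell
# dirs_grep / dirs_sed are hypothetical wrappers around the two commands
# from this answer, reading from stdin instead of input_file.
dirs_grep() {
    # Always return success so a no-match run does not look like an error.
    grep -o '.*/[^a-z0-9A-Z]*' || true
}

dirs_sed() {
    sed -E 's|(.*/).*|\1|'
}

sample='https://example.com/account/signup/accounts/signin/account.html
README'

printf '%s\n' "$sample" | dirs_grep
# prints only the URL's directory; the line without a slash is dropped

printf '%s\n' "$sample" | dirs_sed
# prints the URL's directory, then README unchanged
```

grep -o prints only the matching part of matching lines, while sed prints every line and merely leaves non-matching ones untouched.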

【Comments】:

【Solution 4】:

If you want to make it a one-liner,

[gnm]awk 'BEGIN {OFS=FS="/"} (1<NF) && _==__[$(_^--NF)]++'

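To help decipher this:

awk errors out if --NF would push NF below zero (gawk, for instance, rejects a negative NF on an empty line), so (1 < NF) is a safety check. Shortening it to a check on $NF has a pitfall: if the data in the last column looks like the number zero, the condition unintentionally evaluates to false.

_ is a variable that is never initialized, so it is equivalent to 0/false. I wrote it this way because, in my shell scripts, bash is sometimes too eager to expand the "!" character.

__ is the "seen" array.

--NF strips the rightmost column, i.e. the basename.

Since we have already ensured NF >= 2, $(_^--NF) evaluates to $(0) regardless of the input, because zero raised to any positive power is always zero.

The rest works the same way as others have explained in detail above.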
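For comparison, the same logic can be spelled out with conventional names. This is a readability sketch rather than the answer's own code: the function name parent_dirs is hypothetical, it reads from stdin, and it relies on the awk implementation rebuilding $0 when NF is changed, which gawk and mawk do.

```shell
# parent_dirs (hypothetical name): a spelled-out version of the one-liner.
parent_dirs() {
    awk 'BEGIN { FS = OFS = "/" }
    NF > 1 {
        NF--                       # drop the last field; awk rebuilds $0 using OFS
        if (!seen[$0]++) print     # print each parent directory once
    }'
}

printf '%s\n' \
    'https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit' \
    'https://example.com/account/signup/accounts/signin/account.html' |
    parent_dirs
# prints:
# https://example.com/tradings/trade
# https://example.com/account/signup/accounts/signin
```

Because the last field is removed rather than emptied, the output has no trailing slash.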

【Comments】: