How to avoid links being downloaded with wget (answers)

【Question title】: How to avoid links being downloaded using wget
【Posted】: 2012-12-05 10:44:39
【Question description】:

I am trying to download some pages of the site http://computerone.altervista.org, just as a test...

My goal is to download only the pages that match the patterns "*JavaScript*" and "*index*".
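As a rough illustration of how such patterns are applied, the sketch below emulates `-A` matching with a shell `case` statement. It assumes, as the wget manual describes, that an `-A` entry containing wildcards is treated as a glob pattern and matched against the file name (the last path component). The function name `wget_accepts` and the sample paths are hypothetical, and this is not wget's own code. Matching is case-sensitive unless `--ignore-case` is given, so `*Javascript*` does not match `JavaScript`.

```shell
#!/bin/sh
# Hypothetical helper: a rough emulation of how `wget -A "*Javascript*,*index*"`
# decides whether a retrieved file is kept. Not wget's actual code.
wget_accepts() {
    # Only the file name (last path component) is tested, as with wget's -A.
    name=${1##*/}
    case $name in
        *Javascript*|*index*) echo accept ;;
        *) echo reject ;;
    esac
}

# Hypothetical sample paths; the third differs only in letter case.
for url_path in /index.html /blog/Javascript-closures.html \
                /blog/JavaScript-closures.html /rss-articles/feed.xml; do
    printf '%s -> %s\n' "$url_path" "$(wget_accepts "$url_path")"
done
```

Running it prints `accept` for the first two paths and `reject` for the last two, the third being rejected purely because of case.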

Currently, if I try the following options:

wget \
-A "*Javascript*, *index*" \
--exclude-domains http://computerone.altervista.org/rss-articles/ \
-e robots=off \
--mirror -E -k -p -np -nc --convert-links  \
--wait=5 -c  \
http://computerone.altervista.org

it works, but it also tries to download http://computerone.altervista.org/rss-articles/.

My questions are:

Why does it try to download the http://computerone.altervista.org/rss-articles/ page?
How can I avoid it? I tried the --exclude-domains http://computerone.altervista.org/rss-articles/ option, but it still tries to download it.

P.S.
Looking at the page source, I found:

<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="rss-articles/" />

Tags: download wget

【Solution 1】:

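wget -p downloads all page requisites:

man wget:

To finish off this topic, it's worth knowing that Wget's idea of an external document link is any URL specified in an <A> tag, an <AREA> tag, or a <LINK> tag other than <LINK REL="stylesheet">.

The <link rel="alternate" ... href="rss-articles/" /> tag in your page source is therefore an external document link, and the recursive retrieval enabled by --mirror follows it like any other link.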
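The rule quoted from the manual can be sketched as a tiny classifier. This is a hypothetical helper (`link_kind`), not part of wget: it does naive, case-sensitive string matching on a single tag, whereas real HTML parsing is case-insensitive and more involved. It shows which category the `<link rel="alternate" ...>` tag from the question falls into.

```shell
#!/bin/sh
# Hypothetical helper: classify one HTML tag using the rule from `man wget`.
# Naive string matching only; not how wget actually parses HTML.
link_kind() {
    tag=$1
    case $tag in
        # <LINK REL="stylesheet"> is NOT an external document link
        '<link '*'rel="stylesheet"'*) echo stylesheet ;;
        # <A>, <AREA> and any other <LINK> count as external document links
        '<a '*|'<area '*|'<link '*) echo external ;;
        *) echo other ;;
    esac
}

link_kind '<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="rss-articles/" />'
link_kind '<link rel="stylesheet" href="style.css" />'
link_kind '<img src="logo.png" />'
```

The three calls print `external`, `stylesheet` and `other`, in that order; the RSS `<link>` lands in the same bucket as an ordinary `<a href>`.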

To exclude rss-articles, use -X (--exclude-directories):

wget -A "*Javascript*, *index*" -X "rss-articles" -e robots=off --mirror -E -k -p -np -nc -c http://computerone.altervista.org
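To see why `--exclude-domains` had no effect while `-X` does, the sketch below contrasts the two checks. According to the wget manual, `--exclude-domains` takes a list of domain names, so an entry such as `http://computerone.altervista.org/rss-articles/` never matches the host `computerone.altervista.org`. `-X` instead compares the directory part of the URL path. Both functions (`domain_excluded`, `dir_excluded`) are hypothetical approximations, not wget's code: real wget matches domains case-insensitively and also accepts wildcards in `-X` entries.

```shell
#!/bin/sh
# Hypothetical approximations of two wget filters; not wget's actual code.

# Host part of a URL: strip the scheme, then everything from the first '/'.
host_of() {
    rest=${1#*://}
    printf '%s\n' "${rest%%/*}"
}

# Rough --exclude-domains check: the entry must equal the host
# or be a dot-separated suffix of it.
domain_excluded() {   # $1 = URL, $2 = one --exclude-domains entry
    host=$(host_of "$1")
    case $host in
        "$2"|*."$2") echo excluded ;;
        *) echo allowed ;;
    esac
}

# Rough -X check: the directory part of the path must be the excluded
# directory or lie below it. The leading '/' is ignored.
dir_excluded() {      # $1 = URL, $2 = one -X entry (no wildcards)
    rest=${1#*://}
    path=${rest#*/}            # path without the leading '/'
    case $path in
        */*) dir=${path%/*} ;; # drop the file name
        *) dir= ;;             # file at the top level
    esac
    case $dir in
        "$2"|"$2"/*) echo excluded ;;
        *) echo allowed ;;
    esac
}

url=http://computerone.altervista.org/rss-articles/
domain_excluded "$url" http://computerone.altervista.org/rss-articles/
dir_excluded "$url" rss-articles
dir_excluded http://computerone.altervista.org/index.html rss-articles
```

The three calls print `allowed`, `excluded` and `allowed`. Passing a real domain such as `altervista.org` to `domain_excluded` would exclude every URL on that host, which is why a directory filter is the better tool for skipping just `rss-articles/`.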
