<noindex> tag for Google: answers

【Question title】: <noindex> tag for Google
【Posted】: 2013-03-19 02:09:13
【Question】:

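I want to tell Google not to index certain parts of a page. Yandex (a Russian search engine) has a very useful tag for this called <noindex>. How can the same be done for Google?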

【Question comments】:

Tags: seo googlebot yandex noindex

【Answer 1】:

No, Google does not support the <noindex> tag. Virtually no one does.

【Comments】:

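Any approach that Google does not endorse: webmasters.stackexchange.com/questions/16390/…
The exceptions to "virtually no one" include at least Yandex; see my answer. Whether anyone really cares about that one is another question.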

【Answer 2】:

You can keep Google from seeing parts of a page by putting those parts into iframes that are blocked by robots.txt.

robots.txt

User-agent: *
Disallow: /iframes/

index.html

This text is crawlable, but now you'll see 
text that search engines can't see:
<iframe src="/iframes/hidden.html" width="100%" height="300" scrolling="no"></iframe>

/iframes/hidden.html

Search engines cannot see this text.

Instead of using iframes, you could load the contents of the hidden file using AJAX. Here is an example that uses jQuery's AJAX support to do so:

This text is crawlable, but now you'll see 
text that search engines can't see:
<div id="hidden"></div>
<script>
    // Fetch the blocked fragment and inject it into the placeholder div.
    $.get(
        "/iframes/hidden.html",
        function (data) { $('#hidden').html(data); }
    );
</script>

【Comments】:

Note that the AJAX part is no longer correct. Most search engines now evaluate JavaScript and perform XHR calls.
If the URL being loaded via AJAX is disallowed for crawlers, search engines will not see its content even if they execute JavaScript in general.

【Answer 3】:

Create a robots.txt file at the root level of your site and insert something like the following:

Block Google:

User-agent: Googlebot
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/

Block all bots:

User-agent: *
Disallow: /myDisallowedDir1/
Disallow: /myDisallowedPage.html
Disallow: /myDisallowedDir2/
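
To illustrate how rules like these are applied, here is a minimal sketch of a robots.txt check. It is deliberately simplified and is an assumption-laden model, not any crawler's actual implementation: it matches user-agent names exactly, treats each Disallow value as a plain path prefix, and ignores Allow lines and wildcards. The function names parseRobots and isDisallowed are made up for this example.

```javascript
// Simplified robots.txt check: exact user-agent names, plain prefix rules only.
// Wildcards (*, $), Allow lines, and longest-match precedence are NOT handled.
function parseRobots(text) {
  const groups = []; // [{ agents: [...], disallow: [...] }]
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

function isDisallowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Prefer a group naming this agent; otherwise fall back to "*".
  const group =
    groups.find((g) => g.agents.includes(ua)) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return false;
  return group.disallow.some((prefix) => path.startsWith(prefix));
}

const robotsTxt = [
  "User-agent: Googlebot",
  "Disallow: /myDisallowedDir1/",
  "Disallow: /myDisallowedPage.html",
].join("\n");

console.log(isDisallowed(robotsTxt, "Googlebot", "/myDisallowedDir1/page.html")); // true
console.log(isDisallowed(robotsTxt, "Googlebot", "/index.html")); // false
console.log(isDisallowed(robotsTxt, "OtherBot", "/myDisallowedDir1/page.html")); // false
```

The last call shows why the "block all bots" variant uses User-agent: *: a group that names only Googlebot says nothing about other crawlers.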

A handy robots.txt generator:

http://www.mcanerin.com/EN/search-engine/robots-txt.asp

【Comments】:

teslasimus does not want to block the whole page, only "certain parts".
Good point. My answer can be used together with the iframe solution proposed above.

【Answer 4】:

According to Wikipedia¹, there are some rules that some spiders follow:

<!--googleoff: all-->
This should not be indexed by Google. Though its main spider, Googlebot,
might ignore that hint.
<!--googleon: all-->

<div class="robots-nocontent">Yahoo bots won't index this.</div>

<noindex>Yandex bots ignore this text.</noindex>
<!--noindex-->They will ignore this, too.<!--/noindex-->

Unfortunately, they apparently could not agree on a single standard, and as far as I know there is nothing that keeps out all spiders...
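
Since there is no single standard, one workaround is to wrap a fragment in all of the vendor-specific markers at once. The sketch below does that with a hypothetical helper, wrapNoIndex (the name is an assumption for this example). It uses the comment form of the Yandex marker so the output stays valid HTML. Which crawlers actually honor which marker is taken from the list above, not verified here, and the googleoff/googleon pair reportedly has no effect on normal Googlebot.

```javascript
// Wrap an HTML fragment in every vendor-specific "do not index" marker
// mentioned in this answer. Support per crawler is an assumption.
function wrapNoIndex(html) {
  return [
    "<!--googleoff: all-->",
    "<!--noindex-->",
    '<div class="robots-nocontent">',
    html,
    "</div>",
    "<!--/noindex-->",
    "<!--googleon: all-->",
  ].join("\n");
}

console.log(wrapNoIndex("<p>Cookie notice</p>"));
```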

The googleoff: comment seems to support different options, though I am not sure where there is a complete list. There are at least:

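all: the block is ignored entirely
index: the content does not go into Google's index
anchor: anchor text for links will not be associated with the target page
snippet: the text will not be used to create snippets for search results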
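
As a rough model of what an indexer that honors these comments might do (not Google's implementation), the sketch below removes every region bracketed by a matching googleoff/googleon pair for a given option, and also regions marked all. The function name stripGoogleOff is hypothetical, and the option is assumed to be a plain word such as index or snippet.

```javascript
// Remove regions bracketed by matching googleoff/googleon comments for the
// given option; regions marked "all" are removed as well.
// Assumes `option` is a plain word (it is inserted into the regex unescaped).
function stripGoogleOff(html, option) {
  const pattern = new RegExp(
    "<!--\\s*googleoff:\\s*(" + option + "|all)\\s*-->" +
      "[\\s\\S]*?" +
      "<!--\\s*googleon:\\s*\\1\\s*-->",
    "gi"
  );
  return html.replace(pattern, "");
}

const page =
  "Visible. " +
  "<!--googleoff: index-->Hidden from the index.<!--googleon: index--> " +
  "<!--googleoff: snippet-->Not used in snippets.<!--googleon: snippet-->";

// The index-marked region disappears; the snippet-marked region is untouched.
console.log(stripGoogleOff(page, "index"));
```

The backreference requires the closing comment to name the same option as the opening one, so a snippet region is left alone when only index regions are being stripped.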

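Also note that (at least for Google) this only affects the search index, not page ranking and the like. Moreover, as Stephen Ostermiller correctly pointed out in his comment below, googleon and googleoff only work with the Google Search Appliance and have no effect on normal Googlebot, unfortunately.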

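There is also an article on the Yahoo part² (and an article describing that Yandex honors <noindex> as well⁶). On the googleoff: part, also see this answer and the article from which I took most of the related information.³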

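In addition, Google Webmaster Tools recommends using the rel=nofollow attribute for specific links⁴ (for example ads, or links to pages that are inaccessible or useless to the bot, such as login/sign-up). This means the HTML a rel attribute should be respected by the Google bot, although it is mostly related to page ranking rather than to the search index itself. Unfortunately, there seems to be no rel=noindex⁵,⁷. I am also not sure whether this attribute could be used on other elements as well (e.g. <DIV REL="noindex">); but unless crawlers respect "noindex", that would not make sense either.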

Further references:

How to Noindex parts of a web page?
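Excluding crawler from sections of pages (Spiderline crawler; as you can see, other crawlers may use other proprietary markup, see also the AddSearch crawler. I wish they would simply make REL="noindex" a standard instead, to be used with any HTML tag such as DIV/SPAN/P/A!)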
Preventing Google from indexing the contents of a div by reversing the string
Methods for preventing search engines from indexing irrelevant content on a page

¹Wikipedia: Noindex
²Which Sections of Your Web Pages Might Search Engines Ignore?
³Tell Google to Not Index Certain Parts of Your Page
⁴Use rel="nofollow" for specific links
⁵Is it a good idea to use <a href=“http://name.com” rel=“noindex, nofollow”>name</a>?
⁶Using HTML tags — Yandex.Help. Webmaster
⁷Existing REL values

【Comments】:

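googleoff and googleon only work with the Google Search Appliance and have no effect on normal Googlebot.
@StephenOstermiller Yes, I realized that in the meantime as well. Thanks for pointing it out; I completely forgot to update this answer!
Since your answer is long, a comment saying it is wrong may be overlooked. Could you please add a statement at the beginning to warn readers away from this solution?
Roger that, @Frederic, you are right. Done, and thanks for pointing it out!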