Redirect Facebook Scraper to /?_escaped_fragment_= with HTML5 history URLs (no hashbang) for AJAX content

【Question title】: Redirect Facebook Scraper to /?_escaped_fragment_= with HTML5 history URLs (no hashbang) for AJAX content
【Posted】: 2013-11-23 02:09:53
【Question】:

If you use hashbang URLs such as /#!/path/to/content, the Facebook scraper (like Googlebot) automatically forwards to /?_escaped_fragment_=/path/to/content, where you can render the content server-side for the scraper.
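
To make the mapping concrete, here is a minimal sketch of how a hashbang URL turns into its escaped-fragment counterpart. The function name toEscapedFragmentUrl is hypothetical, and the set of escaped characters is an assumption based on my reading of Google's AJAX crawling scheme, not something confirmed for Facebook's scraper.

```javascript
// Hypothetical helper: map a hashbang URL to the URL a crawler would request.
function toEscapedFragmentUrl(url) {
  var bang = url.indexOf('#!');
  if (bang === -1) return url; // no hashbang, nothing to map

  var base = url.slice(0, bang);
  var fragment = url.slice(bang + 2);

  // Assumption: escape only the characters that would break the query string.
  var escaped = fragment.replace(/[%#&+ ]/g, function (ch) {
    return '%' + ch.charCodeAt(0).toString(16).toUpperCase();
  });

  // Append to an existing query string if there is one.
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('/#!/path/to/content'));
```

The server then only has to look at the _escaped_fragment_ query parameter to know which page the crawler wants.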

For Google, if you include the fragment meta tag (<meta name="fragment" content="!">), you can use HTML5 history-style URLs (for example, a plain /path/to/content) and it will still know to request the escaped-fragment URL.
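
For history-style URLs there is no fragment to move, so, as far as I understand Google's scheme, the crawler simply appends an empty _escaped_fragment_ parameter to the page URL. A rough sketch of that mapping (the function name is hypothetical):

```javascript
// Hypothetical helper: the URL a crawler would request for a page that
// carries <meta name="fragment" content="!"> and has no hashbang.
function escapedFragmentUrlForHistoryPage(url) {
  // Ignore any plain hash; it is never sent to the server anyway.
  var hash = url.indexOf('#');
  var base = hash === -1 ? url : url.slice(0, hash);

  // Assumption: the parameter is appended with an empty value.
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=';
}

console.log(escapedFragmentUrlForHistoryPage('/path/to/content'));
```

In this case the server works out which page to render from the request path, since the parameter itself is empty.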

Facebook doesn't seem to support this. It will redirect to whatever you set the og:url meta tag to, but I'm not sure whether that is a proper use of the og:url tag.

Tags: facebook facebook-graph-api facebook-social-plugins

【Solution 1】:

This is untested, but I believe you can sniff the user agent of the Facebook bot and forward it to the /?_escaped_fragment_ URL accordingly.
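
A minimal sketch of that idea, equally untested against Facebook itself. The helper name escapedFragmentRedirect is hypothetical, the 'facebookexternalhit' substring is the user-agent token assumed for the scraper, and dropping the original query string is a simplification.

```javascript
// Hypothetical helper: return the redirect target for Facebook's scraper,
// or null when the request should be handled normally.
function escapedFragmentRedirect(userAgent, requestUrl) {
  // Assumption: the scraper identifies itself with 'facebookexternalhit'.
  var isFacebook = (userAgent || '').indexOf('facebookexternalhit') !== -1;
  if (!isFacebook) return null;

  // Already an escaped-fragment request: do not redirect again.
  if (requestUrl.indexOf('_escaped_fragment_=') !== -1) return null;

  // Use only the path; any original query string is dropped here.
  var queryStart = requestUrl.indexOf('?');
  var path = queryStart === -1 ? requestUrl : requestUrl.slice(0, queryStart);
  return '/?_escaped_fragment_=' + path;
}

// Possible use in an Express app (not run here):
// app.use(function (req, res, next) {
//   var target = escapedFragmentRedirect(req.headers['user-agent'], req.url);
//   if (target) return res.redirect(302, target);
//   next();
// });
```

Keeping the decision in a plain function makes it easy to check without starting a server.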

【Solution 2】:

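After talking with you on Twitter today and doing some research of my own, the only solution I found that works for me is the following:

I'm using node + express. I first check the query string for the Google crawler, but if the user agent is Facebook's, I use the request URL as my fragment variable instead. I then parse the URL and match it to one of the snapshots I created with the grunt-htmlSnapshot plugin.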

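var path = require('path');

app.use(function(req, res, next) {
  // The User-Agent header may be missing, so default to an empty string
  var userAgent = req.headers['user-agent'] || '';

  // Googlebot sends the path in the _escaped_fragment_ query parameter
  var fragment = req.query._escaped_fragment_;

  // Facebook's scraper requests the plain URL, so use the request path
  // (without the query string) as the fragment instead
  if (userAgent.indexOf('facebookexternalhit') >= 0) {
    fragment = req.path;
  }

  // If there is no fragment in the query params
  // then we're not serving a crawler
  if (typeof fragment !== 'string') return next();

  // If the fragment is empty, serve the
  // index page
  if (fragment === '' || fragment === '/')
    fragment = '/.html';

  // If fragment does not start with '/'
  // prepend it to our fragment
  if (fragment.charAt(0) !== '/')
    fragment = '/' + fragment;

  // If fragment does not end with '.html'
  // append it to the fragment
  if (!/\.html$/.test(fragment))
    fragment += '.html';

  // Flatten the path into a file name: '/contact.html' -> '_contact.html'
  fragment = fragment.replace(/\//g, '_');

  // Serve the static html snapshot. sendFile needs an absolute path and
  // reports errors through its callback rather than by throwing
  var file = path.resolve('./snapshots/snapshot_' + fragment);
  res.sendFile(file, function(err) {
    if (err && !res.headersSent) res.sendStatus(404);
  });
});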

All of my snapshots are stored in ./snapshots. For example, the snapshot for the "/contact" page is: ./snapshots/snapshot__contact.html
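
The naming rule can be pulled out into a small function to check which file a given fragment resolves to. This is only a sketch that mirrors the steps of the middleware above; the name snapshotFileFor is hypothetical.

```javascript
// Hypothetical helper: map a fragment (or request path) to a snapshot file.
function snapshotFileFor(fragment) {
  // An empty fragment means the index page
  if (fragment === '' || fragment === '/') fragment = '/.html';

  // Make sure the fragment starts with '/'
  if (fragment.charAt(0) !== '/') fragment = '/' + fragment;

  // Make sure the fragment ends with '.html'
  if (fragment.slice(-5) !== '.html') fragment += '.html';

  // Flatten the path: every '/' becomes '_'
  return './snapshots/snapshot_' + fragment.replace(/\//g, '_');
}

console.log(snapshotFileFor('/contact'));
console.log(snapshotFileFor('about/team'));
```

Note that a trailing slash is kept by these steps, so '/contact/' would resolve to snapshot__contact_.html rather than snapshot__contact.html.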

This is all tested and works well!