template url reversal escaping surt arguments: answer

【Question title】: template url reversal escaping surt arguments
【Posted】: 2014-02-13 20:35:58
【Question】:

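I'm running into a problem where template url reversal escapes colon and parenthesis characters. I want these characters to stay unescaped in the href attribute of the anchor tag. That used to be the behavior when I was on Django 1.3, but after upgrading to 1.6 I noticed it no longer behaves the way I want.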

What I have:

surt = 'http://(gov/'
browse_domain = 'gov'
... in template ...
<a href="{% url 'nomination.views.url_surt' project.project_slug surt %}">{{ browse_domain }}</a>

This produces:

<a href="/nomination/eth2008/surt/http%3A//%28gov/">gov</a>

As you can see, the colon ":" and left parenthesis "(" characters are escaped in the url href attribute. I don't want that.
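The escaped href above is exactly what percent-encoding produces when only "/" is treated as safe. A minimal Python 3 sketch, assuming the SURT argument ends up passing through urllib.parse.quote() (the Python 3 counterpart of urllib.quote()) with its default safe="/"; it reproduces the effect, not the place in Django where it happens:

```python
from urllib.parse import quote

surt = 'http://(gov/'

# Default safe='/': ':' becomes %3A and '(' becomes %28, '/' is kept.
default_quoted = quote(surt)

# Listing ':', '(' and ')' as safe leaves them literal.
relaxed_quoted = quote(surt, safe='/:()')

print(default_quoted)   # http%3A//%28gov/
print(relaxed_quoted)   # http://(gov/
```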

What I want:

surt = 'http://(gov/'
browse_domain = 'gov'
... in template ...
<a href="{% url 'nomination.views.url_surt' project.project_slug surt %}">{{ browse_domain }}</a>

This should produce:

<a href="/nomination/eth2008/surt/http://(gov/">gov</a>

Does anyone know how to keep these characters from being escaped when I reverse a URL in an anchor tag?

【Question comments】:

Tags: django url django-templates escaping

【Answer 1】:

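Note: the answer below is wrong. urllib.quote(safe=':()') does leave those safe characters unescaped. Something else in Django is causing this problem, and I still don't know where it is.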

In Django 1.6, any url reversal in a template is first passed through iri_to_uri() before it is rendered to HTML. The template url reverse tag {% url %}, as it stands, offers no way to override this.

Notice this bit of italicized text detailing the change.

Here is iri_to_uri():

import urllib
from django.utils.encoding import smart_str

def iri_to_uri(iri):
    """
    Convert an Internationalized Resource Identifier (IRI) portion to a URI
    portion that is suitable for inclusion in a URL.

    This is the algorithm from section 3.1 of RFC 3987.  However, since we are
    assuming input is either UTF-8 or unicode already, we can simplify things a
    little from the full method.

    Returns an ASCII string containing the encoded result.
    """
    # The list of safe characters here is constructed from the "reserved" and
    # "unreserved" characters specified in sections 2.2 and 2.3 of RFC 3986:
    #     reserved    = gen-delims / sub-delims
    #     gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    #     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
    #                   / "*" / "+" / "," / ";" / "="
    #     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # Of the unreserved characters, urllib.quote already considers all but
    # the ~ safe.
    # The % character is also added to the list of safe characters here, as the
    # end of section 3.1 of RFC 3987 specifically mentions that % must not be
    # converted.
    if iri is None:
        return iri
    return urllib.quote(smart_str(iri), safe="/#%[]=:;$&()+,!?*@'~")
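To see what this function does to the SURT by itself, here is a minimal Python 3 port. It assumes urllib.parse.quote() stands in for Python 2's urllib.quote() and that its default UTF-8 encoding of str input stands in for smart_str(); the names iri_to_uri_py3 and IRI_SAFE are hypothetical. With this safe list, ":", "(" and ")" pass through unchanged while non-ASCII text is still percent-encoded:

```python
from urllib.parse import quote

# Same safe list as the iri_to_uri() shown above: RFC 3986 reserved
# characters plus '%' and '~'.
IRI_SAFE = "/#%[]=:;$&()+,!?*@'~"

def iri_to_uri_py3(iri):
    """Hypothetical Python 3 port of iri_to_uri()."""
    if iri is None:
        return iri
    # quote() encodes str input as UTF-8 by default (standing in for smart_str()).
    return quote(iri, safe=IRI_SAFE)

print(iri_to_uri_py3('/nomination/eth2008/surt/http://(gov/'))
# /nomination/eth2008/surt/http://(gov/
print(iri_to_uri_py3('/surt/caf\u00e9'))
# /surt/caf%C3%A9
```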

At first glance, ":", "(" and ")" should not be affected by escaping (hex-encoding), because they are passed to urllib.quote() as "safe":

always_safe = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               'abcdefghijklmnopqrstuvwxyz'
               '0123456789' '_.-')
_safe_map = {}
for i, c in zip(xrange(256), str(bytearray(xrange(256)))):
    _safe_map[c] = c if (i < 128 and c in always_safe) else '%{:02X}'.format(i)
_safe_quoters = {}

def quote(s, safe='/'):
    # fastpath
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))
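Tracing the lookup table shows concretely what safe does. Below is a minimal Python 3 re-creation of the same table-driven approach over bytes (a sketch, not the standard-library code; ALWAYS_SAFE and table_quote are names chosen here). Every byte starts out mapped to its %XX escape unless it is an always-safe ASCII character, and each character passed in safe is then remapped to itself, so safe characters come out unquoted:

```python
# Characters that are never quoted (the always_safe string above).
ALWAYS_SAFE = (b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               b'abcdefghijklmnopqrstuvwxyz'
               b'0123456789' b'_.-')

# Base table: byte value -> output text. Always-safe ASCII maps to itself,
# every other byte maps to its upper-case %XX escape.
_BASE_MAP = {
    i: chr(i) if (i < 128 and i in ALWAYS_SAFE) else '%{:02X}'.format(i)
    for i in range(256)
}

def table_quote(s, safe='/'):
    """Percent-encode s (a str, encoded as UTF-8) via a byte lookup table."""
    safe_map = dict(_BASE_MAP)
    # Characters listed in `safe` are remapped to themselves, i.e. NOT quoted.
    safe_map.update((b, chr(b)) for b in safe.encode('ascii'))
    return ''.join(safe_map[b] for b in s.encode('utf-8'))

print(table_quote('http://(gov/'))               # http%3A//%28gov/
print(table_quote('http://(gov/', safe='/:()'))  # http://(gov/
```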

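If you step through the actual urllib.quote() method shown above, "safe" actually means that those characters will be escaped/quoted. At first I thought "safe" meant "don't quote these". That caused me a lot of confusion. I guess what they mean is "safe" as in "safe per sections 2.2 and 2.3 of RFC 3986". A more carefully named keyword argument might be prudent, but then again, I find a whole bunch of things about urllib awkward. ಠ_ಠ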

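After a lot of research, and because we didn't want to modify Django core methods, our team decided to do some hacky url construction in the template (something the very friendly Django docs strongly eschew). It isn't perfect, but it works for our use case.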
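The template-side construction itself isn't given. A comparable approach done in Python (for example in a view, before the value reaches the template) could look like the sketch below. It assumes the /nomination/<project_slug>/surt/ prefix copied from the output in the question; the helper surt_href and the SURT_SAFE set are hypothetical, and other reserved characters such as "," and spaces are still percent-encoded rather than passed through raw:

```python
from urllib.parse import quote

# Hypothetical: characters that should stay literal in the SURT segment.
SURT_SAFE = '/:()'

def surt_href(project_slug, surt):
    """Hypothetical helper: build /nomination/<slug>/surt/<surt> by hand.

    The prefix is hard-coded from the URL seen in the question, so it
    must be kept in sync with the URLconf manually.
    """
    slug = quote(project_slug, safe='')   # the slug is a single path segment
    path = quote(surt, safe=SURT_SAFE)    # keep ':', '(' and ')' literal
    return '/nomination/{}/surt/{}'.format(slug, path)

print(surt_href('eth2008', 'http://(gov/'))
# /nomination/eth2008/surt/http://(gov/
```

The trade-off is the one the docs warn about: the URL pattern now lives in two places, and nothing checks that they still agree.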

【Comments】: