Recursive decoding of a URI component in Python, as in JavaScript: answers

【Question title】: Recursive decoding of URI component in python like javascript
【Posted】: 2013-02-05 07:28:56
【Question description】:

I have an encoded URI component, "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2". I can convert it to "http://www.yelp.com/biz/carriage-house-café-houston-2" by applying the decodeURIComponent function recursively, like this:

function recursiveDecodeURIComponent(uriComponent) {
    try {
        var decodedURIComponent = decodeURIComponent(uriComponent);
        if (decodedURIComponent == uriComponent) {
            return decodedURIComponent;
        }
        return recursiveDecodeURIComponent(decodedURIComponent);
    } catch (e) {
        return uriComponent;
    }
}
console.log(recursiveDecodeURIComponent("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"));

Output: "http://www.yelp.com/biz/carriage-house-café-houston-2".

I want to get the same result in Python. I tried the following:

print urllib2.unquote(urllib2.unquote(urllib2.unquote("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2").decode("utf-8")))

But I get http://www.yelp.com/biz/carriage-house-cafÃ©-houston-2. Instead of the expected character é, I get 'Ã©', no matter how many times I call urllib2.unquote.

I am using Python 2.7.3. Can anyone help me?

【Question comments】:

Tags: javascript python python-3.x python-2.7 urllib2

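【Solution 1】:

I think a simple loop should do the trick. The important part is to decode from UTF-8 only once, after the last unquote, rather than in between unquote calls: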

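import urllib2  # Python 2; in Python 3 the function is urllib.parse.unquote

uri = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"

# Keep unquoting the byte string until another pass changes nothing.
while True:
    dec = urllib2.unquote(uri)
    if dec == uri:
        break
    uri = dec

# Decode the UTF-8 bytes to unicode only once, after all escapes are gone.
uri = uri.decode('utf8')
print '%r' % uri
# u'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'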
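
Since the question is also tagged python-3.x, here is a rough Python 3 sketch of the same loop. It is not part of the original answer. The helper name recursive_unquote and the max_rounds safety cap are made up for illustration. In Python 3, urllib.parse.unquote works on str and decodes the percent-escaped bytes as UTF-8 by default, so no separate decode step should be needed.

```python
from urllib.parse import unquote


def recursive_unquote(uri, max_rounds=10):
    """Unquote repeatedly until the string stops changing.

    max_rounds is a hypothetical safety cap, not something the
    original answer uses.
    """
    for _ in range(max_rounds):
        decoded = unquote(uri)  # percent-escapes are decoded as UTF-8
        if decoded == uri:
            break
        uri = decoded
    return uri


url = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"
print(recursive_unquote(url))
# http://www.yelp.com/biz/carriage-house-café-houston-2
```

Note that, like the JavaScript version, this keeps decoding until nothing changes, so a string that legitimately contains a literal "%20" after one round of decoding would be decoded further.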

【Discussion】:

My bad, I was just testing something else and got thrown off by the mixed encoded/unencoded unquoting... never mind :)
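
For anyone puzzled by the 'Ã©' in the question: it is what the two UTF-8 bytes of é look like when each byte is read as its own Latin-1 character. As far as I can tell, that is effectively what happens in Python 2 when unquote is given a unicode object that still contains %C3%A9. The snippet below is only an illustration of that effect, written in Python 3; the variable names are arbitrary.

```python
from urllib.parse import unquote_to_bytes

# One round of unquoting leaves the raw UTF-8 bytes of "é".
raw = unquote_to_bytes("caf%C3%A9")  # b'caf\xc3\xa9'

good = raw.decode("utf-8")    # the two bytes form one character
bad = raw.decode("latin-1")   # each byte becomes its own character

print(good)  # café
print(bad)   # cafÃ©
```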