【Question title】: Recursive decoding of URI component in Python like JavaScript
【Posted】: 2013-02-05 07:28:56
【Question description】:

I have an encoded URI component, "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2". I can convert it to "http://www.yelp.com/biz/carriage-house-café-houston-2" by applying the decodeURIComponent function recursively, as follows:

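function recursiveDecodeURIComponent(uriComponent) {
    try {
        var decodedURIComponent = decodeURIComponent(uriComponent);
        // Stop once another pass no longer changes the string.
        if (decodedURIComponent === uriComponent) {
            return decodedURIComponent;
        }
        // Otherwise peel off the next layer of encoding.
        return recursiveDecodeURIComponent(decodedURIComponent);
    } catch (e) {
        // decodeURIComponent throws a URIError on malformed sequences
        // (e.g. a lone "%"); keep the last successfully decoded value.
        return uriComponent;
    }
}
console.log(recursiveDecodeURIComponent("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"));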

Output: "http://www.yelp.com/biz/carriage-house-café-houston-2"
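A single decoding call is not enough because the path segment was percent-encoded twice: `%25` is the escape for `%` itself, so `%25C3%25A9` decodes to `%C3%A9`, which in turn decodes to the UTF-8 bytes of `é`. The following Python 3 sketch (using `urllib.parse` rather than the question's `urllib2`; the variable names are illustrative) reproduces that double encoding on the path segment taken from the URL above:

```python
from urllib.parse import quote, unquote

segment = "carriage-house-café-houston-2"

once = quote(segment)   # the UTF-8 bytes of "é" become %C3%A9
twice = quote(once)     # the "%" signs themselves become %25

print(once)   # carriage-house-caf%C3%A9-houston-2
print(twice)  # carriage-house-caf%25C3%25A9-houston-2

# Each decoding pass peels off exactly one layer.
assert unquote(twice) == once
assert unquote(unquote(twice)) == segment
```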

I want to get the same result in Python. I tried the following:

print urllib2.unquote(urllib2.unquote(urllib2.unquote("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2").decode("utf-8")))

But I got http://www.yelp.com/biz/carriage-house-cafÃ©-houston-2. Instead of the expected character é, I get 'Ã©', no matter how many times I call urllib2.unquote.

I'm using Python 2.7.3. Can anyone help?
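The garbled `Ã©` is classic mojibake: the two UTF-8 bytes of `é` (`0xC3 0xA9`) were turned into characters one byte at a time, effectively as Latin-1. In the attempt above, `.decode("utf-8")` runs while the string still contains `%C3%A9`, so the following `unquote` receives a `unicode` object and maps each escape to the code point with the same number. A Python 3 sketch that reproduces the effect by passing `encoding="latin-1"` to `urllib.parse.unquote` (an illustration of the failure mode, not the question's exact Python 2 code path):

```python
from urllib.parse import unquote

single = "caf%C3%A9"  # one layer of encoding left, as after the first unquote

# Correct: the escapes are read as UTF-8 bytes, so C3 A9 -> "é".
right = unquote(single, encoding="utf-8")

# Wrong: the same bytes read as Latin-1 give two characters, "Ã" and "©".
wrong = unquote(single, encoding="latin-1")

print(right)  # café
print(wrong)  # cafÃ©

# Further passes cannot repair it: "cafÃ©" has no % escapes left.
assert unquote(wrong) == wrong
```

This is why extra `unquote` calls change nothing: once the bytes have been mis-decoded, there is no percent-encoding left to remove.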

【Question discussion】:

    Tags: javascript python python-3.x python-2.7 urllib2


    【Solution 1】:

    I think a simple loop should do the trick:

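    import urllib2

    uri = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"

    # Unquote repeatedly until the string stops changing (all layers removed).
    while True:
        dec = urllib2.unquote(uri)
        if dec == uri:
            break
        uri = dec

    # Only now decode the raw UTF-8 bytes into a unicode string.
    uri = uri.decode('utf8')
    print '%r' % uri
    # u'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'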
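On Python 3, where `urllib2` no longer exists, the same loop can be sketched with `urllib.parse.unquote_to_bytes`. The function name `recursive_unquote` is made up for illustration; like the loop above, it strips escapes on raw bytes until nothing changes and decodes UTF-8 only once at the end.

```python
from urllib.parse import unquote_to_bytes


def recursive_unquote(uri: str) -> str:
    """Percent-decode `uri` until it stops changing, then decode UTF-8 once.

    Hypothetical helper mirroring the Python 2 loop: working on raw bytes
    means multi-byte UTF-8 sequences are interpreted only after every
    layer of percent-encoding has been removed.
    """
    data = uri.encode("utf-8")
    while True:
        decoded = unquote_to_bytes(data)
        if decoded == data:  # fixed point: no %XX escape changed anything
            break
        data = decoded
    return data.decode("utf-8")


print(repr(recursive_unquote(
    "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2")))
# 'http://www.yelp.com/biz/carriage-house-café-houston-2'
```

Every pass that changes anything replaces a three-character `%XX` with a single byte, so the loop always terminates. Looping `urllib.parse.unquote` on `str` would also work for this URL, since Python 3's `unquote` already decodes UTF-8, but by default (`errors='replace'`) it silently substitutes U+FFFD for invalid sequences, whereas the final `.decode("utf-8")` here raises `UnicodeDecodeError`.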
    

    【Discussion】:

    • My mistake, I was just testing something else and got thrown off by unquoting a mix of encoded and non-encoded strings... never mind :)