Merging relative paths based on URL template: answers

【Question title】: Merging relative paths based on URL template
【Posted】: 2017-08-31 07:49:34
【Question】:

I am trying to merge the relative path of a user-supplied URL with a file path. For example, given the following items:

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

the desired output is

http://myserver.com/my/path/to/files/foo.txt

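where the path elements common to the URL and the file path have been merged: my/path/to/files and path/to/files/foo.txt combine to give my/path/to/files/foo.txt, which is then appended to the base of the URL.

The closest I have been able to get is: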

# python 2.7
import os
import urlparse
from collections import OrderedDict

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

url = urlparse.urlparse(url_base)
print(url)
# ParseResult(scheme='http', netloc='myserver.com', path='/my/path/to/files', params='', query='', fragment='')

merge_path = os.path.join(url.path, path)
print(merge_path)
# /my/path/to/files/path/to/files/foo.txt

# take an ordered set of the path components
# this is not good because it assumes '/' is the split key
merge_path_set = list(OrderedDict.fromkeys(merge_path.split('/')))
print(merge_path_set)
# ['', 'my', 'path', 'to', 'files', 'foo.txt']

path_joined = os.path.join(*merge_path_set)
print(path_joined)
# my/path/to/files/foo.txt

# THIS DOESN'T WORK:
url_joined = urlparse.urljoin(url.netloc, path_joined)
print(url_joined)
# my/path/to/files/foo.txt

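It seems there should be a better way to do this, one that leverages the built-in libraries instead of manually splitting on '/' and taking an ordered set as I do here. I also haven't figured out how to get the result back into a URL for output. Any ideas?

【Question comments】:

Tags: python url path

【Answer 1】:

urljoin() works fine if you make the second argument consistent with the path component of url_base.

For Python 2.7: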

from urlparse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

final_url = urljoin(url_base, '/my/' + path)

# http://myserver.com/my/path/to/files/foo.txt

For Python 3:

from urllib.parse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

final_url = urljoin(url_base, '/my/' + path)

# http://myserver.com/my/path/to/files/foo.txt
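To see why the leading '/my/' matters, here is a small sketch (Python 3) of how urljoin resolves three forms of the second argument against the same base. The variable names relative, absolute and appended are only for illustration.

```python
from urllib.parse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

# Relative reference, base without a trailing slash: the last segment
# of the base path ('files') is dropped before the reference is added,
# so the overlapping segments end up duplicated.
relative = urljoin(url_base, path)

# Reference starting with '/': it replaces the whole base path.
absolute = urljoin(url_base, '/my/' + path)

# Base with a trailing slash: the reference is appended to the base path.
appended = urljoin(url_base + '/', 'foo.txt')

print(relative)  # http://myserver.com/my/path/to/path/to/files/foo.txt
print(absolute)  # http://myserver.com/my/path/to/files/foo.txt
print(appended)  # http://myserver.com/my/path/to/files/foo.txt
```

Only the second and third forms give the desired URL, and both depend on knowing in advance how the two paths overlap.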

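Assuming that the path/to/files part of path will always match the path/to/files component of url_base, and that you can append a '/' to url_base, you could do this instead, although it does still rely on a form of splitting (os.path.split):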

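import os
from urlparse import urljoin  # Python 2.7; on Python 3 use: from urllib.parse import urljoin

url_base = 'http://myserver.com/my/path/to/files/'  # note the trailing '/'
path = 'path/to/files/foo.txt'

# os.path.split(path)[-1] is the file name, 'foo.txt'
final_url = urljoin(url_base, os.path.split(path)[-1])

print(final_url)
# http://myserver.com/my/path/to/files/foo.txt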
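
If the '/my/' prefix cannot be hardcoded, one possible generalization is to look for the longest run of segments that ends the base URL's path and also begins path, and then join only what is left over. The following is a minimal sketch for Python 3 under that assumption; merge_overlap is a hypothetical helper name, not something provided by urllib. It splits on '/', which is the segment separator in URL paths, and assumes path is also written with forward slashes.

```python
from urllib.parse import urlsplit, urlunsplit


def merge_overlap(url_base, path):
    """Join path onto url_base, collapsing the longest run of segments
    that both ends the base URL's path and starts path."""
    parts = urlsplit(url_base)
    base_segs = [s for s in parts.path.split('/') if s]
    path_segs = [s for s in path.split('/') if s]

    # Try the largest possible overlap first and shrink until one matches.
    overlap = 0
    for n in range(min(len(base_segs), len(path_segs)), 0, -1):
        if base_segs[-n:] == path_segs[:n]:
            overlap = n
            break

    merged = '/' + '/'.join(base_segs + path_segs[overlap:])
    return urlunsplit(parts._replace(path=merged))


print(merge_overlap('http://myserver.com/my/path/to/files',
                    'path/to/files/foo.txt'))
# http://myserver.com/my/path/to/files/foo.txt
```

When there is no overlap at all, this sketch simply appends path to the base path. Whether that is the right fallback depends on your constraints.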

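【Comments】:

In this case '/my/' is hardcoded, so you can't use it in a program with variable input.
Could you add to your question to spell out exactly what your constraints are? As written, it seems to imply that http://myserver.com/my/ is a constant part of url_base. I'll adjust my answer with more information. Cheers!
Updated the question.
I'm still not clear on why final_url = urljoin(url_base, '/my/' + path) doesn't work for you. If it really is hardcoded, then this code seems to be exactly what is needed. All it says is: to merge the paths, keep in mind that /my/ is effectively hardcoded, and merge the paths with that in mind. You are basically telling urljoin to preserve '/my/' while merging the two path/to/files/foo.txt paths. Or, put another way, "merge these two paths as though /my/ weren't there." With your revised question, it seems even more to be exactly what you are asking for.
Yet another way to put it: by using '/my/' + path, you are showing urljoin how you want the "variable input" merged with the URL. Your question seems to imply that you want to take http://myserver.com/my/path/to/files/ and path/to/files/foo.txt and merge them to end up with http://myserver.com/my/path/to/files/foo.txt, with http://myserver.com/my/ staying constant. That is exactly what this code does. Could you clarify your question further to explain what I'm missing here?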