Python script to download XML files on my server: answers

【Question title】: Python script to download XML files on my server
【Posted】: 2010-04-27 09:47:42
【Question description】:

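I need a Python script that does the following:

1. Connect to a URL; that URL will return a number such as 1200.
2. Use that number to download XML files named 1 to x, where x is the number from #1.
3. Store the files in a particular directory.

Sorry, I have never written a Python script before, so it would be great if you could guide me (maybe with some comments).

If it matters, I will be running this as a cron job.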

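【Question comments】:

What is this number the URL returns? Is it a number on the page? Can it be found in the HTML at that URL?
Does it have to be Python? This sounds like a fairly simple bash script with wget.
I want to learn how to do it in Python... The URL will return a number, and that number is the number of pages to download.

Tags: python scripting cron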

【Solution 1】:

An example using urllib:

import os
import urllib.request

URL = 'http://someurl.com/foo/bar'
DIRECTORY = '/some/local/folder'

# Connect to a URL; that URL will return a number like 1200.
number = int(urllib.request.urlopen(URL).read())

# Use the number to download XML files named 1 to x,
# where x is the number from #1, and store the files
# in a particular directory.
for n in range(1, number + 1):
    filename = '%d.xml' % (n,)
    destination = os.path.join(DIRECTORY, filename)
    # On Python 3, urlretrieve lives in urllib.request (plain urllib on Python 2).
    urllib.request.urlretrieve(URL + '/' + filename, destination)
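Calling int() directly on the response assumes the body contains nothing but the number. If the server ever returns an error page instead, it can help to validate the body first and to build the list of downloads separately from the network calls. The following is only a sketch under that assumption: parse_count and plan_downloads are hypothetical helper names, not part of the answer above, and the URL layout (base URL plus '/1.xml', '/2.xml', ...) is taken from the example.

```python
import os


def parse_count(body):
    """Turn the raw response body (bytes or str) into an integer count."""
    if isinstance(body, bytes):
        body = body.decode('ascii', errors='replace')
    text = body.strip()
    if not text.isdigit():
        # Show only the start of the body; it may be a whole HTML page.
        raise ValueError('expected a number, got %r' % text[:50])
    return int(text)


def plan_downloads(base_url, directory, count):
    """Return (url, destination) pairs for 1.xml through count.xml."""
    pairs = []
    for n in range(1, count + 1):
        filename = '%d.xml' % n
        url = base_url.rstrip('/') + '/' + filename
        pairs.append((url, os.path.join(directory, filename)))
    return pairs
```

Each pair returned by plan_downloads can then be passed to whatever download call is in use, for example urllib.request.urlretrieve(url, destination) on Python 3.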

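【Comments】:

【Solution 2】:

If you have never written a Python script, you would do best to find a Python tutorial first.

Once you have a bit of a handle on things, go take a look at

http://docs.python.org/library/

For problem #1, you will want to look at

http://docs.python.org/library/internet.html

For problem #2, you can do something like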

max_number = 10  # assume this came from #1; renamed so it does not shadow the built-in max()
for x in range(1, max_number + 1):
    filename = 'some_file-' + str(x) + '.xml'
    # download the file - see above url for internet protocols
    # see http://docs.python.org/library/stdtypes.html#file-objects
    # for help on files
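The download step left as a comment in the loop above could be filled in along these lines. This is a minimal Python 3 sketch, not something the answer itself provides: download_to and its opener parameter are hypothetical names introduced here, and the opener is made replaceable only so the function can be tried without a network connection.

```python
import shutil
import urllib.request


def download_to(url, path, opener=urllib.request.urlopen):
    """Stream the response for `url` into the local file `path`."""
    # The response is a file-like object, so it can be copied
    # chunk by chunk into a file opened in binary write mode.
    with opener(url) as response, open(path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
```

Inside the loop, a call such as download_to(base_url + '/' + filename, filename) would then save each file, where base_url stands for whatever address the files are actually served from.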

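The details of this question are really vague, and although it does not look like homework, doing this in a language you do not know at all is not a good idea, especially if you are running it in cron.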
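Since a cron job runs unattended, it usually helps if the script logs what happened and exits with a non-zero status when something went wrong. The sketch below shows one possible shape for that, under stated assumptions: run and main are hypothetical names, pairs is assumed to be a list of (url, path) tuples, and download is assumed to be any callable that fetches one file and raises OSError on failure (urllib.error.URLError is a subclass of OSError on Python 3).

```python
import logging


def run(pairs, download):
    """Call download(url, path) for each pair; return the number of failures."""
    failures = 0
    for url, path in pairs:
        try:
            download(url, path)
        except OSError as exc:
            # Keep going so one missing file does not stop the whole batch.
            failures += 1
            logging.error('failed to fetch %s: %s', url, exc)
    return failures


def main(pairs, download):
    """Log a summary and return an exit status: 0 on success, 1 otherwise."""
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(levelname)s %(message)s')
    failures = run(pairs, download)
    logging.info('%d of %d downloads failed', failures, len(pairs))
    return 1 if failures else 0
```

A script would typically end with sys.exit(main(pairs, download)) so that the exit status reaches cron.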

【Comments】:

It is "homework" all right, but not for school, for work! Well, it is actually for my own website, but it still counts. I am too old for school :)