Python Link to File Iterator not Iterating: Answers

【Question title】: Python Link to File Iterator not Iterating
【Posted】: 2013-12-13 14:20:46
【Question description】:

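This problem has been bugging me for days, and I believe I have finally narrowed it down to this piece of code. If anyone can tell me how to fix it, and why it happens, that would be great.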

import urllib2

GetLink = 'http://somesite.com/search?q=datadata#page'
holder = range(1,3)

for LinkIncrement in holder:
    h = GetLink + str(LinkIncrement)
    ReadLink = urllib2.urlopen(h)
    f = open('test.txt', 'w')

    for line in ReadLink:
        f.write(line)  

    f.close()
    main() #calls function main that does stuff with the file
    continue

The problem is that it only ever writes the data from 'http://somesite.com/search?q=datadata#page'. If I do the following instead, the URLs print correctly.

for LinkIncrement in holder:
    h = GetLink + str(LinkIncrement)
    print h

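The links I am copying really do increment this way, and I can open each URL by copying and pasting it into a browser. I also tried this with a while loop, but I always get the same result.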

The code below opens 3 tabs with the incremented URLs /search?q=datadata#page1, /search?q=datadata#page2 and /search?q=datadata#page3. I just can't get the same thing to work in my code.

import webbrowser
import urllib2
h = ''
def tab(passed):
    url = passed
    webbrowser.open_new_tab(url + '/')

def test():

    g = 'http://somesite.com/search?q=datadata#page'
    f = urllib2.urlopen(g)      
    NewVar = 1
    PageCount = 1

    while PageCount < 4:

            h = g + str(NewVar)                  
            PageCount += 1
            NewVar += 1
            tab(h)
test()

Thanks to Falsetru for helping me solve this. The site uses JSON for every page after the first.

【Question comments】:

Tags: python loops python-2.7 urllib2

【Solution 1】:

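The part of the URL after # (the fragment identifier) is not passed to the web server. The server responds with the same content every time because the part before the fragment identifier is the same.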

#something is handled by the browser (JavaScript). You need to look at what the page's JavaScript is doing.
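
To see why every iteration returns the same content, it may help to strip the fragment from each URL and compare what is left. The following is a minimal sketch written for Python 3 (in Python 2.7 the same function lives in the `urlparse` module); the helper name `sent_to_server` is hypothetical and only illustrates the point made above.

```python
from urllib.parse import urldefrag

BASE = 'http://somesite.com/search?q=datadata#page'


def sent_to_server(url):
    """Return the URL without its fragment, i.e. the part the server acts on."""
    request_url, _fragment = urldefrag(url)
    return request_url


for page in range(1, 4):
    url = BASE + str(page)
    # The three URLs differ only after '#', so the stripped form is identical.
    print(url, '->', sent_to_server(url))
```

All three URLs reduce to `http://somesite.com/search?q=datadata`, which is why the file always ends up with the first page's data.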

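【Comments】:

I think this may also be the answer to the OP's previous question, as long as we are sure this time that there really is a #... If so, I expect we already know what the next question will be... :)
Also, the file test.txt is overwritten on every loop iteration, which may or may not be what the OP wants.
I thought you might have nailed it, but I have just updated my question with the code I need this to work in. When I put it in my code, it does not work properly. @Matthias, I don't care about keeping each link as text.
@Timmay, see the JavaScript code included in the page. Or it may be easier to inspect the network information provided by the browser's debugging tools.
@Timmay, the second page is accessed via http://steamcommunity.com/market/search/render/?query=appid%3A570%20common&search_descriptions=0&start=10&count=10. The third page is http://steamcommunity.com/market/search/render/?query=appid%3A570%20common&search_descriptions=0&start=20&count=10 ... I found this in the Network tab of the developer tools (Chrome). As I said in my answer, the URL shown in the location bar is not the actual URL passed to the server.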
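
Based on the two URLs quoted in the last comment, the page number appears to map to the `start` parameter in steps of `count`. The sketch below builds those URLs under that assumption. It is written for Python 3, the function name `render_url` and its `page_size` argument are hypothetical, and the mapping `start = (page - 1) * count` is inferred from the two examples rather than from any documentation.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the URLs quoted in the comment above.
RENDER_ENDPOINT = 'http://steamcommunity.com/market/search/render/'


def render_url(page, page_size=10, query='appid:570 common'):
    """Build the request URL for a 1-based results page (assumed mapping)."""
    params = [
        ('query', query),
        ('search_descriptions', 0),
        ('start', (page - 1) * page_size),  # inferred: page 2 -> 10, page 3 -> 20
        ('count', page_size),
    ]
    # quote_via=quote encodes the space as %20, matching the observed URLs.
    return RENDER_ENDPOINT + '?' + urlencode(params, quote_via=quote)


for page in (2, 3):
    print(render_url(page))
```

Each resulting URL could then be passed to `urllib2.urlopen` in place of the `#page` URLs from the question. The thread does not show the structure of the JSON response, so parsing it is left out here.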