【Question title】: Python threading - unexpected output
【Posted】: 2014-03-09 04:12:10
【Question】:

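I'm new to Python and wrote the threaded script below. It reads each line of a file and passes it to the get_result function, which should print the URL and the status code if the status is 200 or 301.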

The code is as follows:

import requests
import Queue
import threading
import re
import time

start_time = int(time.time())
regex_to_use = re.compile(r"^")


def get_result(q, partial_url):
    partial_url = regex_to_use.sub("%s" % "http://www.domain.com/", partial_url)
    r = requests.get(partial_url)
    status = r.status_code
    #result = "nothing"
    if status == 200 or status == 301:
        result = str(status) + " " + partial_url
        print(result)


#need list of urls from file
file_list = [line.strip() for line in open('/home/shares/inbound/seo/feb-404s/list.csv', 'r')]
q = Queue.Queue()
for url in file_list:
    #for each partial. send to the processing function get_result
    t = threading.Thread(target=get_result, args=(q, url))
    t.start()

end_time = int(time.time())
exec_time = end_time - start_time
print("execution time was " + str(exec_time))

I used a queue and threading, but the "execution time was x" line is printed before the threads have finished outputting their data.

That is, typical output looks like this:

200 www.domain.com/ok-url
200 www.domain.com/ok-url-1
200 www.domain.com/ok-url-2
execution time was 3
200 www.domain.com/ok-url-4
200 www.domain.com/ok-ur-5
200 www.domain.com/ok-url-6

What is going on here, and how can I print the execution time at the very end of the script, i.e. once all URLs have been processed and output?

Thanks to the answer from utdemir below, here is the updated code with join() added.

import requests
import Queue
import threading
import re
import time

start_time = int(time.time())
regex_to_use = re.compile(r"^")


def get_result(q, partial_url):
    partial_url = regex_to_use.sub("%s" % "http://www.domain.com/", partial_url)
    r = requests.get(partial_url)
    status = r.status_code
    #result = "nothing"
    if status == 200 or status == 301:
        result = str(status) + " " + partial_url
        print(result)


#need list of urls from file
file_list = [line.strip() for line in open('/home/shares/inbound/seo/feb-404s/list.csv', 'r')]
q = Queue.Queue()
threads_list = []

for url in file_list:
    #for each partial. send to the processing function get_result
    t = threading.Thread(target=get_result, args=(q, url))
    threads_list.append(t)
    t.start()

for thread in threads_list:
    thread.join()


end_time = int(time.time())
exec_time = end_time - start_time
print("execution time was " + str(exec_time))

【Comments on the question】:

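  • You start the threads and then carry on executing. You don't wait for them to finish, so "execution time was X" is printed before they (or at least some of them) are done. To wait for a thread to finish, use thread.join().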

Tags: python multithreading


【Answer 1】:

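You should join the threads to wait for them; otherwise they keep running in the background while the main thread moves on.

Like this: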

threads = []
for url in file_list:
    ...
    threads.append(t)

for thread in threads:
    thread.join() # Wait until each thread terminates

end_time = int(time.time())
...
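
For anyone who wants to try the pattern without a URL list, here is a minimal self-contained sketch. It is not the asker's script: it assumes Python 3, and the names fake_fetch and run_all are made up for illustration. time.sleep stands in for requests.get so nothing touches the network, and every URL is simply reported as 200.

```python
import threading
import time


def fake_fetch(url, results, lock):
    # Hypothetical stand-in for requests.get(url): sleep instead of doing I/O.
    time.sleep(0.1)
    # Guard the shared list so appends from different threads stay orderly.
    with lock:
        results.append("200 " + url)


def run_all(urls):
    results = []
    lock = threading.Lock()
    threads = []
    start = time.time()

    for url in urls:
        t = threading.Thread(target=fake_fetch, args=(url, results, lock))
        threads.append(t)
        t.start()

    # Block here until every worker has finished; only then stop the clock.
    for t in threads:
        t.join()

    return results, time.time() - start


results, elapsed = run_all(
    ["http://www.domain.com/ok-url-%d" % i for i in range(5)]
)
for line in results:
    print(line)
print("execution time was %.2f" % elapsed)
```

Because the timing line runs only after the join loop, it is always printed last. Removing the join loop should reproduce the interleaved output from the question.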

【Comments】:

  • Thanks very much, that works well. I'll post the new code above and mark this as the answer.