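【Question title】: Caught In Potential Infinite Loop
【Posted】: 2019-11-20 20:47:46
【Question】:

I'm just trying to build a list of district names and District objects from a pandas DataFrame, but for some reason the code never finishes running. I can't see anything that could be an infinite loop, so I don't understand why it hangs every time I run it. Here is the part that gets stuck (specifically the for loop that iterates over j):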

import numpy as np
import pandas as pd

#make dataframe
data = pd.read_csv('gun-violence-data_01-2013_03-2018.csv', header=0, delimiter=',')

#drop data points with null congressional district values
data = data[data.congressional_district != 0]
data.dropna(axis=0,how='any',subset=['congressional_district'],inplace= True)

#constructing working table
table = data[['incident_id','state','congressional_district']]

#list of districts. Formatting in original file must be corrected to analyze data
districtNames = ['filler1','filler2']
districts = []
s = table.shape

#loop thru the rows of the table
for i in range(s[0]):
    check = True

    #build strings for each district
    ds = table.iloc[i,1] + str(table.iloc[i,2])
    #testString = str(table.iloc[i,2])

    #append ds to districtNames if it isnt in already
    #make array of District Objects
    for j in range(len(districtNames)):
        if(ds == districtNames[j]):
            check = False
        if(check):
            districtNames.append(ds)
            districts.append(District(ds,0))

For reference, here is the District class:

class District:
    def __init__(self, name, count):
        self._name = name
        self._count = count
    def get_name(self):
        return name
    def get_count(self):
        return count
    def updateCount(self,amount):
        self._count += amount

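The original .csv file is fairly large; after the two filtering statements on lines 8 and 9 of the code cut out some data points, I'm still left with 227,312 data points. I know that's a lot, but the code doesn't finish even after running for 5 minutes. What am I doing wrong?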

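【Question comments】:

  • Put in some print statements and debug
  • Not a fix, but you can shorten for j in range(len(districtNames)): to for districtName in districtNames:
  • print is your friend: stick one at the start of each loop so you can see what is happening and how far things have got

Tags: python python-3.x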


【Solution 1】:

It's not that it won't terminate; it's that the code is inefficient in its current state. Try something like this:

import numpy as np
import pandas as pd

class District:
    def __init__(self, name, count):
        self._name = name
        self._count = count
    def get_name(self):
        return self._name
    def get_count(self):
        return self._count
    def updateCount(self,amount):
        self._count += amount

#make dataframe
data = pd.read_csv('gun-violence-data_01-2013_03-2018.csv', header=0, delimiter=',')

#drop data points with null congressional district values
data = data[data.congressional_district != 0]
data.dropna(axis=0,how='any',subset=['congressional_district'],inplace= True)

#constructing working table
table = data[['incident_id','state','congressional_district']]

#list of districts. Formatting in original file must be corrected to analyze data
districtNames = (table.state + table.congressional_district.astype(str)).unique()
districts = [District(districtName, 0) for districtName in districtNames]
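
To see why the loop in the question appears to hang, here is a minimal sketch in plain Python, assuming the indentation posted in the question is what actually ran (with if(check) nested inside the j loop). The function names and the sample strings are hypothetical and are not taken from the CSV. Because range(len(districtNames)) is evaluated once per row, the inner loop does end, but every pass made before a match is found appends another copy of ds, so the list roughly doubles for each new district.

```python
def names_as_posted(rows):
    """Replicates the question's inner loop, with `if check` nested inside it."""
    district_names = ['filler1', 'filler2']
    for ds in rows:
        check = True
        # range(len(...)) is evaluated once, so this loop does end, but each
        # pass made before a match is found appends another copy of ds.
        for j in range(len(district_names)):
            if ds == district_names[j]:
                check = False
            if check:
                district_names.append(ds)
    return district_names


def names_deduplicated(rows):
    """Keeps the first occurrence of each name, using a set for fast lookups."""
    seen = set()
    ordered = []
    for ds in rows:
        if ds not in seen:
            seen.add(ds)
            ordered.append(ds)
    return ordered


# Hypothetical sample values, not taken from the CSV.
rows = ['Ohio1.0', 'Ohio2.0', 'Ohio1.0', 'Texas7.0']
print(len(names_as_posted(rows)))  # → 20
print(names_deduplicated(rows))    # → ['Ohio1.0', 'Ohio2.0', 'Texas7.0']
```

Four rows already produce 20 entries, and k distinct names produce 2 ** (k + 1) entries, which is why a table with a few hundred distinct districts cannot finish in any practical time. The set-based loop does the same job as the unique() call above when a plain loop is preferred.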

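【Comments】:

  • Worked like a charm! I didn't know you could even do that in Python. Thank you very much!

【Solution 2】:

You can use the tqdm package to see which loop your code is stuck in.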

from tqdm import tqdm
for i in tqdm(range(s[0]), position=0, leave=True):
    check = True

    #build strings for each district
    ds = table.iloc[i,1] + str(table.iloc[i,2])
    #testString = str(table.iloc[i,2])

    #append ds to districtNames if it isnt in already
    #make array of District Objects
    for j in range(len(districtNames)):
        if(ds == districtNames[j]):
            check = False
        if(check):
            districtNames.append(ds)
            districts.append(District(ds,0))
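
If installing tqdm is not an option, the print-based debugging suggested in the comments on the question can be wrapped in a small generator. This is only a sketch using the standard library; report_progress and its every and out parameters are hypothetical names, not part of tqdm or pandas.

```python
import sys


def report_progress(iterable, every=1000, out=sys.stderr):
    """Yield items unchanged, writing a running count to `out` every `every` items."""
    count = 0
    for count, item in enumerate(iterable, start=1):
        if count % every == 0:
            print(f"processed {count} items", file=out)
        yield item
    print(f"done after {count} items", file=out)


# Hypothetical usage: wrap the outer loop's range the same way tqdm would.
squares = [i * i for i in report_progress(range(5), every=2)]
print(squares)  # → [0, 1, 4, 9, 16]
```

Wrapping range(s[0]) in the outer loop this way shows whether the row counter is still advancing, which helps tell a slow loop apart from one that is truly stuck.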
