【Question Title】: Iterating over a glob set doesn't work with an if condition
【Posted】: 2017-09-10 19:33:13
【Question】:

I have a column made up of a set of strings, like this:

npa = pd.read_csv("file_names.csv", usecols=[3, 5, 6, 7, 8, 9], header=None)
npa.iloc[:,0]
XML_0_1841729699_001
XML_0_1841729699_00nn
XML_0_1841729699_00145
XML_0_1841729699_00145
XML_0_1841729699_00178
XML_0_1841729699_001jklm
XML_0_1841729699_001fjmfd
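A note on the read_csv call above: with header=None, pandas treats the file's first line as ordinary data (row 0) and labels the selected columns by their original positions. So npa[3] and npa.iloc[:, 0] refer to the same column here, which is presumably also why the loops below start at 1. A minimal sketch with made-up CSV content (the column names and values are placeholders, not the asker's file):

```python
import io

import pandas as pd

# Hypothetical CSV content: a header line followed by two data rows.
# Only the column at position 3 holds the XML_... names.
csv_text = (
    "a,b,c,name,e\n"
    "0,0,0,XML_0_1841729699_001,0\n"
    "0,0,0,XML_0_1841729699_00145,0\n"
)

npa = pd.read_csv(io.StringIO(csv_text), usecols=[3], header=None)

print(list(npa.columns))  # [3]: the label is the column's original position
print(npa.iloc[0, 0])     # name: the header text is ordinary data in row 0
print(npa[3].tolist())
```

Passing header=0 instead would turn the first line into column names, so row 0 would already hold real data.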

and my PNG names look like this:

path_img = "/images"
os.chdir(path_img)
images_name = glob.glob("*.png")
set_img = set([x.rsplit('.', 1)[0] for x in images_name])
set_img
set(['XML_0_1841729699_001fjmfd', 'XML_0_1841729699_00145', 'XML_0_1841729699_001', 'XML_0_1841729699_00178'])
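A sketch of the same step that globs on a joined path instead of calling os.chdir, and uses os.path.splitext to drop the extension. The function name png_stems is hypothetical, and the example assumes Python 3 (it uses tempfile.TemporaryDirectory). On Linux, glob matching is case-sensitive, so a file ending in .PNG would not match "*.png".

```python
import glob
import os
import tempfile


def png_stems(path_img):
    """Return the names of the .png files in path_img, without the extension."""
    # Joining the pattern with the directory avoids changing the working directory.
    pattern = os.path.join(path_img, "*.png")
    return {os.path.splitext(os.path.basename(p))[0] for p in glob.glob(pattern)}


# Usage: a throwaway directory with two PNG files and one unrelated file.
with tempfile.TemporaryDirectory() as d:
    for name in ["XML_0_1841729699_001.png", "XML_0_1841729699_00145.png", "notes.txt"]:
        open(os.path.join(d, name), "w").close()
    print(sorted(png_stems(d)))
```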

Before processing, I want to check whether the names in set_img match the names in the dataframe:

for i in range(1, 30):
    for img_name in set_img:
        if (img_name==npa.iloc[i,0]):  # 0 corresponds to the column of strings
            print("it works")

But the if condition is never satisfied. What is wrong?
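The post does not show why the comparison never succeeds. Common causes worth ruling out are invisible whitespace in the CSV values, non-string cells, and the loop only covering rows 1 to 29 (range(1, 30)). A quick check, assuming npa and set_img as defined above, is to intersect the whole column with the set instead of looping. The helper name matching_names and the sample values below are hypothetical.

```python
import pandas as pd


def matching_names(names, set_img):
    """Return the values of a pandas Series that also appear in set_img.

    Values are converted to str and stripped of surrounding whitespace first,
    because a trailing space or a non-string cell makes == fail silently.
    """
    cleaned = names.astype(str).str.strip()
    return set(cleaned) & set_img


# Hypothetical data: the second value carries a trailing space.
names = pd.Series(["XML_0_1841729699_001", "XML_0_1841729699_00145 ", "XML_0_1841729699_00nn"])
set_img = {"XML_0_1841729699_001", "XML_0_1841729699_00145", "XML_0_1841729699_00178"}

print(sorted(matching_names(names, set_img)))
```

With the real data, this would be called as matching_names(npa.iloc[:, 0], set_img); an empty result points at the data rather than the loop.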

EDIT1:

characs = []
cpt = 0
f = open("file_names.csv", 'rt')
reader = csv.reader(f)
for row in reader:
    if cpt >= 1:  # skip header
        characs.append(str(row[5]))
    cpt += 1

path_img = "/images"
os.chdir(path_img)
images_name = glob.glob("*.png")
set_img = set([x.rsplit('.', 1)[0] for x in images_name])
mask = npa.iloc[:,0].isin(set_img)
for img in set_img:

    img = cv2.imread(path_img+'/'+ img +'.png')
    print(img.shape)

    images = []
    images_names = []
    WIDTH=[]
    HEIGHT=[]

    for i in range(1, nb_charac):
        if (img==npa[mask].iloc[i,0]):
            print("hello")
            coords = npa.iloc[[i]]
            charac = characs[i - 1]

I get the following error:

 FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if (img==npa[mask].iloc[i,0]):
Traceback (most recent call last):
  File "/to_test.py", line 186, in <module>
    if (img==npa[mask].iloc[i,0]):
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1225, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1449, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 127, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1417, in _has_valid_type
    return self._is_valid_integer(key, axis)
  File "/usr/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1431, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
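Two separate problems in the EDIT1 code would explain this output. First, img starts as a name from set_img but is immediately rebound to the array returned by cv2.imread, so img==npa[mask].iloc[i,0] compares an image array with a string; that is most likely what triggers the FutureWarning, and such a comparison cannot mean "same name". Second, npa[mask] keeps only the matching rows, so it has fewer rows than nb_charac and iloc[i, 0] eventually asks for a position past the end. A small sketch with a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical frame: four names, only two of which have a matching image.
npa = pd.DataFrame({3: ["a", "b", "c", "d"]})
set_img = {"b", "d"}

mask = npa.iloc[:, 0].isin(set_img)
filtered = npa[mask]

print(len(npa), len(filtered))  # 4 2

# Valid positions for iloc on the filtered frame are 0 .. len(filtered) - 1;
# filtered.iloc[3, 0] raises "single positional indexer is out-of-bounds".
for pos in range(len(filtered)):
    img_name = filtered.iloc[pos, 0]  # keep the name in its own variable...
    # image = cv2.imread(...)         # ...and store the loaded image under another one
    print(pos, img_name)
```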

EDIT2:

I then replaced:

if (img==npa[mask].iloc[i,0]):

with

if (img==npa[mask][3][i]):

It works until it reaches a certain row, where it fails with the following error:

    if (img==npa[mask][3][i]):
  File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 557, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1790, in get_value
    return self._engine.get_value(s, k)
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6525)
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6463)
KeyError: 2035
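The KeyError comes from a different indexing rule: npa[mask][3] is a Series that keeps the original row labels of the surviving rows, and [i] on a Series with an integer index is a label lookup, not a position. As soon as i reaches a label whose row was filtered out (presumably 2035 in the traceback), the lookup fails. A sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame; rows 1 and 3 survive the filter.
npa = pd.DataFrame({3: ["a", "b", "c", "d"]})
mask = npa[3].isin({"b", "d"})
col = npa[mask][3]

print(col.index.tolist())  # [1, 3]: original labels are kept after filtering
print(col.loc[3])          # d: a label that survived the filter can be looked up
# col.loc[2] (or col[2]) raises KeyError because row 2 was filtered out.

renumbered = col.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```

Either iterate over the labels that actually exist, for example with col.items(), or renumber with reset_index(drop=True) and index by position. Note that renumbering loses the link to the original row numbers, which the question's characs[i - 1] relies on.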

【Comments on the question】:

  • In your if statement you dropped iloc; it should be if(img_name==npa.iloc[i,0])
  • Sorry, I wrote it badly. In my actual code I do have if(img_name==npa.iloc[i,0])

Tags: python csv pandas dataframe glob


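【Answer 1】:

Use isin to create a boolean mask, then use that mask to filter the dataframe. This is equivalent to looping over every row and checking whether its first column is in the set.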

mask = npa.iloc[:,0].isin(set_img)
npa[mask]
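One way to use the mask from this answer inside a loop, sketched under the question's assumptions: characs[i - 1] belongs to row label i of npa, and load_image is a hypothetical stand-in for the cv2.imread call (for example lambda n: cv2.imread(path_img + '/' + n + '.png')). The function name process_matching_rows is also hypothetical.

```python
import pandas as pd


def process_matching_rows(npa, set_img, characs, load_image):
    """Visit only the rows whose first column names an existing image.

    load_image is a hypothetical callable that receives a file name stem
    and returns the loaded image. characs is assumed to be aligned so that
    row label i maps to characs[i - 1], as in the question's code.
    """
    mask = npa.iloc[:, 0].isin(set_img)
    results = []
    # items() yields (index label, value); the labels are the original row numbers.
    for i, img_name in npa[mask].iloc[:, 0].items():
        image = load_image(img_name)  # separate variable: img_name stays a string
        coords = npa.loc[[i]]         # label-based, so it is the same row as in npa
        charac = characs[i - 1]
        results.append((img_name, image, coords, charac))
    return results


# Usage with hypothetical data: row 0 is the header line, as with header=None.
npa = pd.DataFrame({3: ["name", "XML_a", "XML_b", "XML_c"]})
characs = ["A", "B", "C"]
out = process_matching_rows(npa, {"XML_a", "XML_c"}, characs, load_image=lambda n: n + ".png")
print([(name, charac) for name, _, _, charac in out])
```

Iterating over the filtered rows replaces the nested loop over set_img and range(1, nb_charac), so no comparison is left to get wrong, and the name and the loaded image never share a variable.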

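【Comments】:

  • Not sure I understand your answer. How do I use it in the if condition of my for loop? Is it if (img_name==npa[mask]): ?
  • I think it is if (img==npa[mask].iloc[i,0]):
  • @vincent Your loop does nothing except print. I'm giving you a way to check the condition for every row at once. You can then reduce the dataframe to only the rows where the condition is true. This is a very typical preprocessing step. Your question isn't entirely clear because you aren't producing anything.
  • Sure, I've updated my code. Please see the update.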