【Question title】: ValueError: Buffer has wrong number of dimensions (expected 1, got 2) on if in statement
【Posted】: 2018-03-19 22:27:56
【Question】:

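I am trying to use an "if" statement inside a "for" loop to check whether the index of the current item (its index in the pandas Series that contains it) matches one of the indices of another Series, but doing so raises a ValueError. This is the line of code that causes the problem: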

if(ICM_items[ICM_items['track_id'] == i].index[0] in ICM_tgt_items.index.values.flatten().tolist()):

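I tried replacing either side of the "in" statement with a random integer or a list, and it works. The two operands are also built correctly on their own, but the error is raised when they are combined in the statement.

I hope someone can give me a hint about where the problem lies, or suggest an alternative way to perform the same task.

ICM_items and ICM_tgt_items are both pandas.Series.

Here is the console error: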

Traceback (most recent call last):
  File "/Users/LucaButera/git/rschallenge/similarity_to_recommandable_builder.py", line 27, in <module>
    dot[ICM_tgt_items[ICM_items[ICM_items['track_id'] == i].index[0]]] = 0
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas/index.c:4194)
  File "pandas/index.pyx", line 280, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:6150)
  File "pandas/src/hashtable_class_helper.pxi", line 446, in pandas.hashtable.Int64HashTable.map_locations (pandas/hashtable.c:9261)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
[Finished in 1.26s]

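【Comments】:

  • The setup of your question is not very clear. It would help if you provided representative sample data. See How to create a Minimal, Complete, and Verifiable Example. Also, are you sure you only want to look at the first index of ICM_items that matches track_id == i? What if multiple indices are returned?

Tags: python pandas valueerror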


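【Solution 1】:

I suggest simplifying the expression, using .loc, and watching for edge cases (for example, no row of track_id matching a given i).
With proper test data, these steps should help you narrow down the search for the bug.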
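
Before building test data, it may also help to confirm what kind of objects are involved. The question says both are pandas.Series, but filtering rows with `ICM_items['track_id'] == i` is a pattern normally used on a DataFrame, so the assumption is worth checking. The sketch below uses a hypothetical helper, `describe`, and stand-in objects rather than the asker's data. It only reports type and dimensionality and does not claim to identify the cause of the error.

```python
import pandas as pd


def describe(name, obj):
    """Summarize the type and dimensionality of a pandas object (hypothetical helper)."""
    return {
        "name": name,
        "type": type(obj).__name__,
        "ndim": obj.ndim,                   # 1 for a Series, 2 for a DataFrame
        "shape": obj.shape,
        "index_levels": obj.index.nlevels,  # greater than 1 means a MultiIndex
    }


# Stand-in objects, not the asker's data.
as_series = pd.Series([0.1, 0.2, 0.3], index=[10, 11, 12])
as_frame = as_series.to_frame(name="value")

series_info = describe("as_series", as_series)
frame_info = describe("as_frame", as_frame)
print(series_info)
print(frame_info)
```

Calling `describe` on the real ICM_items and ICM_tgt_items would show whether either one is two-dimensional where a one-dimensional object was expected.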

Example ICM_items data:

import numpy as np
import pandas as pd

N = 7
max_track_id = 5
idx1 = ['A','B','C']
icm_idx = np.random.choice(idx1, size=N)
icm = {"track_id":np.random.randint(0, max_track_id, size=N)}
ICM_items = pd.DataFrame(icm, index=icm_idx)

ICM_items
   track_id
C         1
A         1
A         2
C         1
B         0
B         0
B         2

Example ICM_tgt_items data:

idx2 = ['A','B']
icm_tgt_idx = np.random.choice(idx2, size=N)
icm = np.random.random(size=N)
ICM_tgt_items = pd.DataFrame(icm, index=icm_tgt_idx)

          0
B  0.785614
A  0.976523
A  0.856821
B  0.098086
B  0.481140
A  0.686156
A  0.851714

Now a simple comparison that also catches the potential edge case:

for i in range(max_track_id):
    mask = ICM_items['track_id'] == i
    try:
        # use .loc for indexing, no need to flatten() or use .values on the right.
        if ICM_items.loc[mask].index[0] in ICM_tgt_items.index:
            print("found")
        else:
            print("not found")
    # catch error if i not found in track_id
    except IndexError as e:           
        print(f"ERROR at i={i}: {e}")

Output:

found
not found
found
ERROR at i=3: index 0 is out of bounds for axis 0 with size 0
ERROR at i=4: index 0 is out of bounds for axis 0 with size 0
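
As a possible variation on the loop above, the empty case can be tested explicitly instead of relying on IndexError, and the question raised in the comments (several rows sharing one track_id) can be handled by checking all matching labels. This is a minimal sketch on fixed, made-up data. The column name `score` and the dictionaries `first_match` and `any_match` are illustrative names, not part of the original code.

```python
import pandas as pd

# Fixed stand-in data (hypothetical), so the result does not depend on a random seed.
ICM_items = pd.DataFrame(
    {"track_id": [1, 1, 2, 1, 0, 0, 2]},
    index=["C", "A", "A", "C", "B", "B", "B"],
)
ICM_tgt_items = pd.DataFrame({"score": [0.7, 0.9, 0.8]}, index=["B", "A", "A"])
max_track_id = 5

first_match = {}  # is the FIRST matching label in the target index? None if no row matches
any_match = {}    # is ANY matching label in the target index?

for i in range(max_track_id):
    mask = (ICM_items["track_id"] == i).to_numpy()
    labels = ICM_items.index[mask]  # all index labels whose track_id equals i

    if len(labels) == 0:
        # No row has this track_id, so there is no first label to look up.
        first_match[i] = None
    else:
        first_match[i] = labels[0] in ICM_tgt_items.index

    # Index.isin returns a boolean array; an empty array yields False for .any().
    any_match[i] = bool(labels.isin(ICM_tgt_items.index).any())

print(first_match)
print(any_match)
```

Whether the first match or any match is the right test depends on what the index labels mean in the real data, which the question does not specify.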
