How to handle "Index out of bounds" while building the data matrix? Answer

【Question title】: How to handle "Index out of bounds" while building the data matrix?
【Posted】: 2020-12-11 18:40:24
【Question description】:

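I am trying to build a utility matrix of size (n_users, n_items), but I get an "index is out of bounds" error. From the error it is clear that I am trying to access an element outside the bounds of the matrix, but I don't know how to build the matrix so that this is handled. Any suggestions would be appreciated.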

Here is my code:

## Import the required libraries
import pandas as pd
import numpy as nm
from scipy import spatial
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.preprocessing import MinMaxScaler

user_artists = pd.read_csv("./user_artists.dat", sep='\t+', engine='python')
#user_artists has three features ['userID','artistID','weight']
n_users = user_artists.userID.nunique()
n_items = user_artists.artistID.nunique()
n_users,n_items
## (1892, 17632)

## Create a user-item matrix that can be used to calculate the similarity between users and items.

data_matrix = nm.zeros((n_users, n_items))
for line in user_artists.itertuples():
    data_matrix[line[1]-1, line[2]-1] = line[3]

Here is the error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-96-f3242d18985b> in <module>
      3 data_matrix = nm.zeros((n_users, n_items))
      4 for line in user_artists.itertuples():
----> 5     data_matrix[line[1]-1, line[2]-1] = line[3]

IndexError: index 18733 is out of bounds for axis 1 with size 17632
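The traceback already narrows down the cause: index 18733 on axis 1 means `line[2]` was 18734, i.e. an `artistID` larger than the 17632 distinct artists counted by `nunique()`. One likely explanation (an assumption, not confirmed in the question) is that the IDs are not the contiguous range 1..n, so `ID - 1` is not a valid column position. A minimal diagnostic sketch, using hypothetical toy data in place of `user_artists.dat` and a hypothetical helper `id_range_report`:

```python
import pandas as pd

def id_range_report(ids: pd.Series) -> dict:
    """Compare how many distinct IDs exist with how large the IDs get."""
    n_unique = int(ids.nunique())
    return {
        "n_unique": n_unique,
        "max_id": int(ids.max()),
        # Rows whose ID - 1 would fall outside an axis of length n_unique.
        "out_of_bounds_rows": int((ids - 1 >= n_unique).sum()),
    }

# Hypothetical toy data: 3 distinct artists, but one of them has ID 9.
user_artists = pd.DataFrame({
    "userID":   [1, 1, 2, 3],
    "artistID": [1, 2, 2, 9],
    "weight":   [10, 20, 30, 40],
})

print(id_range_report(user_artists.userID))
# {'n_unique': 3, 'max_id': 3, 'out_of_bounds_rows': 0}
print(id_range_report(user_artists.artistID))
# {'n_unique': 3, 'max_id': 9, 'out_of_bounds_rows': 1}
```

If `max_id` is larger than `n_unique` on the real data, the matrix shape built from `nunique()` cannot hold every ID as a direct `ID - 1` index.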

【Question discussion】:

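There must be a mistake in your logic, since the index is out of range. But if you still want to handle it, you can use try and except. Is your logic correct?
@GHOSTHUNT Actually, I saw this approach here before, so I'm fairly sure my logic is correct.
I don't recommend doing this, but I have posted an answer.
@GHOSTHUNT Thanks for the comment, but I really want all the elements. If you think this approach isn't appropriate, please give me some advice and recommend a better approach.
Why is line[2] 18734? Your code shows how data_matrix is created, so a size of 17632 is understandable. But we don't know what the logic you are "sure" about actually does.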
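On the try/except suggestion versus wanting all the elements: skipping the failing writes does not repair anything, it only keeps the loop from crashing. The sketch below runs the question's loop on hypothetical toy data and counts how many weights never reach the matrix:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: artistID 9 has no column in a 3-column matrix.
user_artists = pd.DataFrame({
    "userID":   [1, 1, 2, 3],
    "artistID": [1, 2, 2, 9],
    "weight":   [10, 20, 30, 40],
})
n_users = user_artists.userID.nunique()    # 3
n_items = user_artists.artistID.nunique()  # 3

data_matrix = np.zeros((n_users, n_items))
skipped = 0
for line in user_artists.itertuples():
    # line[0] is the row index, line[1:] are userID, artistID, weight.
    try:
        data_matrix[line[1] - 1, line[2] - 1] = line[3]
    except IndexError:
        skipped += 1  # this weight is silently lost

print(skipped)            # 1
print(data_matrix.sum())  # 60.0 instead of 100
```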

Tags: python numpy recommendation-engine

【Solution 1】:

If you are sure of your logic and just want to avoid this error (this is not a real fix), do this:

for line in user_artists.itertuples():
    try:
        data_matrix[line[1]-1, line[2]-1] = line[3]
    except IndexError:
        # Rows whose IDs exceed the matrix size are skipped without notice.
        pass

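If the IDs are indeed non-contiguous, a more robust alternative to skipping rows is to map each raw ID to a contiguous position first. This is a sketch, not the answerer's method: it assumes the columns are named `userID`, `artistID` and `weight` as in the question, uses `pandas.factorize` to produce 0-based codes, and `build_utility_matrix` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def build_utility_matrix(df: pd.DataFrame):
    """Build a dense (n_users, n_items) matrix even when IDs have gaps."""
    # factorize maps each distinct raw ID to a code in 0..n_unique-1;
    # sort=True assigns the codes in ascending ID order.
    user_codes, user_ids = pd.factorize(df["userID"], sort=True)
    item_codes, item_ids = pd.factorize(df["artistID"], sort=True)

    matrix = np.zeros((len(user_ids), len(item_ids)))
    # Vectorised fill; assumes each (userID, artistID) pair occurs only once,
    # since NumPy does not specify which value wins for repeated indices.
    matrix[user_codes, item_codes] = df["weight"].to_numpy()
    return matrix, user_ids, item_ids

# Hypothetical toy data: artistID 9 exists although there are only 3 artists.
user_artists = pd.DataFrame({
    "userID":   [1, 1, 2, 3],
    "artistID": [1, 2, 2, 9],
    "weight":   [10, 20, 30, 40],
})

matrix, user_ids, item_ids = build_utility_matrix(user_artists)
print(matrix.shape)       # (3, 3)
print(item_ids.tolist())  # [1, 2, 9]
print(matrix.sum())       # 100.0, no weight was dropped
```

`item_ids[j]` gives the original `artistID` of column `j`, and `item_ids.get_loc(artist_id)` goes the other way, so the matrix keeps every row while staying at the `nunique()` size.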
【Discussion】: