Posted: 2021-08-14 01:24:27
Question:
```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

# Create random df
df = pd.DataFrame(np.random.randint(1, 10, size=(100, 23)))
test = df[:50]
for i in range(len(test)):
    query_node = test.iloc[i]
    # Find the distance between this node and everyone else
    euclidean_distances = test.apply(lambda row: distance.euclidean(row, query_node), axis=1)
    # Create a new dataframe with distances.
    distance_frame = pd.DataFrame(data={"dist": euclidean_distances, "idx": euclidean_distances.index})
    distance_frame.sort_values("dist", inplace=True)
    smallest_dist = [dist["idx"] for idx, dist in distance_frame.iloc[1:4].iterrows()]
```
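I'm stuck on this and wondering whether anyone can see where I'm going wrong. I'm trying to compute the Euclidean distance between each row and every other row. I then sort those distances and return the index positions of the three "most similar" rows (smallest distances) in the list `smallest_dist`.
The problem is that this returns the most similar index positions only for the last row: `[6.0, 3.0, 4.0]`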
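A side note on the `[6.0, 3.0, 4.0]` output: the indices come back as floats because `iterrows()` yields each row as a Series, and a row that mixes the float `dist` column with the int `idx` column is upcast to `float64`. A minimal sketch on a hypothetical three-row `distance_frame` (toy data, not the question's) shows the difference; reading the `idx` column directly keeps integer labels:

```python
import pandas as pd

# Hypothetical toy frame shaped like distance_frame: float "dist", int "idx"
distance_frame = pd.DataFrame({"dist": [0.0, 2.5, 1.0], "idx": [0, 1, 2]})
distance_frame = distance_frame.sort_values("dist")

# iterrows() returns each row as a Series; float and int values share one dtype,
# so the "idx" values come back as float64
via_iterrows = [row["idx"] for _, row in distance_frame.iloc[1:3].iterrows()]

# Selecting the "idx" column first keeps its int64 dtype; tolist() gives plain Python ints
via_column = distance_frame["idx"].iloc[1:3].tolist()
```

This only changes the type of the returned labels, not which rows are returned.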
The output I want looks like this:
| Original ID | Matches |
|---|---|
| 1 | 4,5,6 |
| 2 | 8,2,5 |
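One alternative to calling `distance.euclidean` row by row is to compute the whole pairwise distance matrix at once with SciPy's `cdist` and pick the three nearest rows with `np.argsort`. This is a swapped-in technique, not the question's code; the seed, the `np.inf` diagonal trick, and the comma-joined `Matches` format are assumptions for illustration. Note that `Original ID` here uses the frame's 0-based index labels:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Random data shaped like the question's df (seeded here only for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, size=(100, 23)))
test = df[:50]

# Pairwise Euclidean distances: dists[i, j] is the distance between row i and row j
values = test.to_numpy()
dists = cdist(values, values, metric="euclidean")

# Exclude each row's match with itself by setting the diagonal to infinity
np.fill_diagonal(dists, np.inf)

# Positions of the 3 nearest rows for every row, closest first
nearest_pos = np.argsort(dists, axis=1, kind="stable")[:, :3]

# Map positions back to index labels and format like the desired table
labels = test.index.to_numpy()
result = pd.DataFrame({
    "Original ID": labels,
    "Matches": [",".join(map(str, labels[row])) for row in nearest_pos],
})
print(result.head())
```

Setting the diagonal to `np.inf` excludes the query row explicitly instead of relying on it sorting first (as `iloc[1:4]` does), which only matters if `test` contains exact duplicate rows.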
I also tried this, but the result is the same:
```python
list_of_mins = []
for i in range(len(test)):
    query_node = test.iloc[i]
    # Find the distance between this node and everyone else
    euclidean_distances = test.apply(lambda row: distance.euclidean(row, query_node), axis=1)
    # Create a new dataframe with distances.
    distance_frame = pd.DataFrame(data={"dist": euclidean_distances, "idx": euclidean_distances.index})
    distance_frame.sort_values("dist", inplace=True)
    smallest_dist = [dist["idx"] for idx, dist in distance_frame.iloc[1:4].iterrows()]
for i in range(len(test)):
    list_of_mins.append(smallest_dist)
```
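In the second attempt, the second `for` loop only starts after the first one has finished, so it appends the same final list 50 times; in the first attempt, `smallest_dist` is simply overwritten on every pass. A minimal sketch of the loop-based fix keeps the question's structure but appends inside the loop and reads the `idx` column directly. The seed and the `result` table layout are assumptions, and it assumes `test` has no exact duplicate rows (otherwise position 0 after sorting may not be the query row itself):

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

np.random.seed(0)  # assumption: fixed seed so runs are repeatable
df = pd.DataFrame(np.random.randint(1, 10, size=(100, 23)))
test = df[:50]

list_of_mins = []
for i in range(len(test)):
    query_node = test.iloc[i]
    # Find the distance between this node and everyone else
    euclidean_distances = test.apply(lambda row: distance.euclidean(row, query_node), axis=1)
    distance_frame = pd.DataFrame(data={"dist": euclidean_distances, "idx": euclidean_distances.index})
    distance_frame.sort_values("dist", inplace=True)
    # Skip position 0 (the query row, distance 0) and keep the next three labels as ints
    smallest_dist = distance_frame["idx"].iloc[1:4].tolist()
    # Append inside the loop so every row's neighbours are kept, not just the last row's
    list_of_mins.append(smallest_dist)

# Hypothetical output layout matching the desired "Original ID | Matches" table
result = pd.DataFrame({
    "Original ID": test.index,
    "Matches": [",".join(map(str, m)) for m in list_of_mins],
})
print(result.head())
```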
Does anyone know what's causing this problem? Thank you!
Discussion: