【Question title】: Comparing number of identical column elements between two rows in Python
【Posted】: 2017-02-09 18:01:04
【Question】:

I'm trying to write a basic script that helps me figure out how many columns are similar between rows. The data is simple, for example:

import numpy as np
array = np.array([[0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 0, 1, 1, 0]])

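I need to run this script over all pairings of rows in the list, so row 1 compared with row 2, row 1 compared with row 3, and so on.

Any help would be greatly appreciated.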

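【Question comments】:

  • Where is the desired output for your example? What row 3 are you talking about? I only see two rows. And your code is invalid.
  • How do you define "similar"?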

Tags: python arrays sorting numpy elements


【Solution 1】:

The question in your title can be solved with basic numpy techniques. Suppose you have a 2-D numpy array a and want to compare rows m and n:

row_m = a[m, :] # this selects row index m and all column indices, thus: row m
row_n = a[n, :]
shared = row_m == row_n # this compares row_m and row_n element-by-element storing each individual result (True or False) in a separate cell, the result thus has the same shape as row_m and row_n
overlap = shared.sum() # this sums over all elements in shared, since False is encoded as 0 and True as 1 this returns the number of shared elements.
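As a quick check, here is the same recipe applied to the two rows from the question. It assumes the question meant a valid 2-D array (nested lists with commas) and that "similar" means "equal value in the same column".

```python
import numpy as np

# The question's two rows, written as a valid 2-D array (assumed intended data).
a = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0]])

m, n = 0, 1
shared = a[m, :] == a[n, :]   # element-wise boolean mask, same shape as one row
overlap = int(shared.sum())   # True counts as 1, False as 0
print(overlap)  # 4
```

The rows agree in columns 0, 3, 4 and 6, so the count is 4.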

The simplest way to apply this recipe to all pairs of rows is broadcasting:

 first = a[:, None, :] # None creates a new dimension to make space for a second row axis
 second = a[None, :, :] # same, but the new axis is inserted first
 # observe that axes 0 and 1 of these two arrays are arranged as in a distance map
 # a binary operation between arrays laid out like this triggers broadcasting, i.e. numpy computes all possible row pairs in the appropriate positions
 full_overlap_map = first == second # has shape nrow x nrow x ncol
 similarity_table = full_overlap_map.sum(axis=-1) # shape nrow x nrow
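A small runnable sketch of the broadcasting approach follows. The third row is invented purely for illustration, and the per-pair dictionary (built with `np.triu_indices` so each unordered pair appears once) is an addition, not part of the answer. For 0/1 data only, the same table can also be obtained with matrix products, which avoids the nrow x nrow x ncol intermediate array.

```python
import numpy as np

# Hypothetical data: the question's two rows plus an invented third row.
a = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1, 0, 1]])

# (nrow, 1, ncol) == (1, nrow, ncol) broadcasts to (nrow, nrow, ncol)
similarity_table = (a[:, None, :] == a[None, :, :]).sum(axis=-1)

# Strict upper triangle: every unordered pair (i < j) exactly once.
i, j = np.triu_indices(len(a), k=1)
pair_counts = {(int(p), int(q)): int(c)
               for p, q, c in zip(i, j, similarity_table[i, j])}
print(pair_counts)  # {(0, 1): 4, (0, 2): 5, (1, 2): 2}

# Binary-only alternative: shared 1s plus shared 0s.
alt = a @ a.T + (1 - a) @ (1 - a).T
```

The diagonal of `similarity_table` is always the number of columns, since each row matches itself everywhere.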

【Comments】:

    【Solution 2】:

    If all rows hold binary values and "similar" means that both entries are 1, the "similar columns" count is simple:

    def count_sim_cols(row0, row1):
        return np.sum(row0*row1)
    

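    If the values can span a wider range, just replace the product with a comparison. Note that this counts every column where the two rows are equal, including columns where both are 0: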

    def count_sim_cols(row0, row1):
         return np.sum(row0 == row1)
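The two versions answer different questions on binary data, which matters given the ambiguity raised in the comments. This sketch runs both on the question's rows (assumed to be the intended data):

```python
import numpy as np

row0 = np.array([0, 1, 0, 0, 1, 0, 0])
row1 = np.array([0, 0, 1, 0, 1, 1, 0])

both_one = int(np.sum(row0 * row1))    # columns where both rows are 1
all_equal = int(np.sum(row0 == row1))  # columns with identical values, 0s included
print(both_one, all_equal)  # 1 4
```

Pick the product form only if shared zeros should not count as similarity.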
    

    If you want a tolerance for "similarity", say some small value tol, it is just:

    def count_sim_cols(row0, row1):
        return np.sum(np.abs(row0 - row1) < tol)
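A runnable variant is sketched below. It passes `tol` as an argument instead of reading a global (an adaptation, not the answer's exact code), and the float rows are made up. `np.isclose` with `rtol=0` behaves almost the same, except it tests `<=` rather than `<`.

```python
import numpy as np

def count_sim_cols(row0, row1, tol):
    # Count columns whose absolute difference is strictly below tol.
    return int(np.sum(np.abs(row0 - row1) < tol))

row0 = np.array([0.10, 0.50, 0.90])  # made-up float data
row1 = np.array([0.11, 0.70, 0.90])

print(count_sim_cols(row0, row1, 0.05))                   # 2
print(int(np.isclose(row0, row1, rtol=0, atol=0.05).sum()))  # 2
```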
    

    You can then use a doubly nested loop to get the counts. Suppose X is a numpy array with n rows:

    sim_counts = {}
    for i in range(n):
        for j in range(i + 1, n):
            sim_counts[(i, j)] = count_sim_cols(X[i], X[j])
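The nested loop visits each pair with i < j once; `itertools.combinations` expresses the same iteration directly. This is a sketch using the equality-based `count_sim_cols` from above, with a third row invented for illustration:

```python
from itertools import combinations

import numpy as np

def count_sim_cols(row0, row1):
    return int(np.sum(row0 == row1))

X = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1, 0, 1]])  # third row is hypothetical

# combinations(range(n), 2) yields (0, 1), (0, 2), (1, 2), ... like the loop above
sim_counts = {(i, j): count_sim_cols(X[i], X[j])
              for i, j in combinations(range(len(X)), 2)}
print(sim_counts)  # {(0, 1): 4, (0, 2): 5, (1, 2): 2}
```

For large arrays, the vectorized broadcasting approach in Solution 1 avoids the Python-level loop.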
    

    【Comments】:
