Julia: Searching for a column in a sorted matrix

【Question title】: Julia: Searching for a column in a sorted matrix
【Posted】: 2018-12-24 09:49:30
【Question】:

I have a matrix that is sorted as shown below

1 1 2 2 3

1 2 3 4 1

2 1 2 1 1

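It's a bit hard for me to describe the ordering, but hopefully it is clear from the example. The rough idea is that the columns are sorted by the first row, with ties broken by the second row, and so on.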
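The ordering described above appears to be lexicographic ordering of the columns. A minimal sketch to check that reading on the example matrix, assuming Julia 1.0 or later (the variable names are illustrative only):

```julia
# The example matrix from the question.
M = [1 1 2 2 3;
     1 2 3 4 1;
     2 1 2 1 1]

# sortslices with dims=2 reorders whole columns lexicographically,
# so a matrix that is already in that order comes back unchanged.
is_fixed_point = sortslices(M, dims=2) == M

# Julia compares vectors lexicographically, so the list of columns
# is itself a sorted sequence.
columns = [M[:, j] for j in axes(M, 2)]
cols_sorted = issorted(columns)
```

Both `is_fixed_point` and `cols_sorted` are `true` for this matrix, which is the property a binary search over columns relies on.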

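I want to find a particular column in the matrix, which may or may not be present.

I tried the following code:

index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[:, index] < x))

The code above works, but it is slow. I profiled the code, and it spends a lot of time in "_get_index". I then tried the following

  @views index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[:, index] < x))

As expected, this helped, probably because of the slices I am taking. But is there a better way to approach this problem? There still seems to be a lot of overhead, and I feel there may be a cleaner way to write it that would be easier to optimize.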

That said, I definitely value speed over clarity.

Here is some code I wrote to compare binary search with linear search.

using Profile

function test_search()
    max_val = 20
    rows = 4
    matrix = rand(1:max_val, rows, 10^5)
    matrix = Array{Int64,2}(sortslices(matrix, dims=2))

    indices = @time @profile lin_search(matrix, rows, max_val, 10^3)
    indices = @time @profile bin_search(matrix, rows, max_val, 10^3)
end
function bin_search(matrix, rows, max_val, repeats)
    indices = zeros(repeats)
    x = zeros(Int64, rows)
    cols = size(matrix)[2]
    for i = 1:repeats
        x = rand(1:max_val, rows)
        @inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))
        indices[i] = index
    end
    return indices
end

function array_eq(matrix, index, y, rows)
    for i=1:rows
        @inbounds if view(matrix, i, index) != y[i]
            return false
        end
    end
    return true
end

function lin_search(matrix, rows, max_val, repeats)
    indices = zeros(repeats)
    x = zeros(Int64, rows)
    cols = size(matrix)[2]

    for i = 1:repeats
        index = cols + 1
        x = rand(1:max_val, rows)
        for j=1:cols
            if array_eq(matrix, j, x, rows)
                index = j;
                break
            end
        end
        indices[i] = index
    end
    return indices
end

Profile.clear()
test_search()

Here is some sample output

0.041356 seconds (68.90 k allocations: 3.431 MiB)
0.070224 seconds (110.45 k allocations: 5.418 MiB)

After adding more @inbounds, linear search appears to be faster than binary search. That seems odd with 10^5 columns.

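【Comments】:

AFAIU, your rows are treated as numbers, and the row ordering is just based on comparing their magnitudes.
If speed matters above all else, why not hand-write a small, fast function (with a loop)?

Tags: julia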

【Solution 1】:

If speed is what matters most, why not simply take advantage of the fact that Julia lets you write fast loops?

julia> function findcol(M, col)                
           @inbounds @views for c in axes(M, 2)
               M[:,c] == col && return c       
           end                                 
           return nothing                      
       end                                     
findcol (generic function with 1 method)       

julia> col = [2,3,2];                          

julia> M = [1 1 2 2 3;                         
           1 2 3 4 1;                          
           2 1 2 1 1];                         

julia> @btime findcol($M, $col)                
  32.854 ns (3 allocations: 144 bytes)         
3

This should be fast enough, even without taking the sorting into account.

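【Comments】:

I should have mentioned that the matrix can be very large (10^3 or more columns). I profiled the code, and binary search is faster. I will update my question with this information.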

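【Solution 2】:

I found two problems. Once they are fixed, both the linear search and the binary search become faster, and the binary search becomes faster than the linear search.

First, there was some kind of type instability. I changed one of the lines to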

matrix::Array{Int64,2} = Array{Int64,2}(sortslices(matrix, dims=2))

This resulted in an order-of-magnitude speedup. Second, it turns out that using @views in the following code did not do anything

@inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))

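I am new to Julia, but my hunch is that matrix[:,index] gets copied inside the anonymous function anyway. That would make sense, since it allows for closures.

If I write a separate, non-anonymous function, the copy goes away. The linear search was never copying slices, so this change really did speed up the binary search.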
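The answer does not show its separate function, so the following is only a sketch of one way such a copy-free binary search could look. The names `col_isless`, `col_equal` and `findcol_sorted` are hypothetical, and the code assumes an ordinary 1-based `Matrix` whose columns are in lexicographic order:

```julia
# Lexicographic test "column j of M is less than x", done element by
# element so that no slice or view of the column is ever created.
# Assumes the caller has already checked length(x) == size(M, 1).
function col_isless(M::Matrix, j::Integer, x::Vector)
    @inbounds for i in 1:size(M, 1)
        a, b = M[i, j], x[i]
        a == b || return a < b
    end
    return false  # the column equals x, so it is not strictly less
end

# Element-wise equality of column j and x, again without slicing.
function col_equal(M::Matrix, j::Integer, x::Vector)
    @inbounds for i in 1:size(M, 1)
        M[i, j] == x[i] || return false
    end
    return true
end

# Binary search for x among the sorted columns of M.
# Returns the column index, or nothing if x is not present.
function findcol_sorted(M::Matrix, x::Vector)
    length(x) == size(M, 1) ||
        throw(DimensionMismatch("x must have one entry per row of M"))
    lo, hi = 1, size(M, 2) + 1      # candidate answers are lo:hi
    while lo < hi
        mid = (lo + hi) >>> 1
        if col_isless(M, mid, x)
            lo = mid + 1            # the answer lies to the right of mid
        else
            hi = mid                # mid itself may be the answer
        end
    end
    # lo is now the first column that is not less than x (or ncols + 1).
    return (lo <= size(M, 2) && col_equal(M, lo, x)) ? lo : nothing
end

M = [1 1 2 2 3;
     1 2 3 4 1;
     2 1 2 1 1]
found = findcol_sorted(M, [2, 3, 2])
missing_col = findcol_sorted(M, [2, 3, 3])
```

On the question's example matrix, `found` is `3` and `missing_col` is `nothing`. Comparing elements directly trades some brevity for avoiding any per-comparison allocation, which matches the stated preference for speed over clarity.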

【Comments】: