【问题标题】:Goroutines overhead and performance analysis when subsetting DataFrames (Gota)子集 DataFrames (Gota) 时的 Goroutines 开销和性能分析
【发布时间】:2017-08-03 22:17:15
【问题描述】:

自 2016 年初以来,我一直致力于为 Go 实现 Pandas/R DataFrame 实现:https://github.com/kniren/gota

最近,我一直专注于提高库的性能,以尝试与 Pandas/Dplyr 匹敌。你可以在这里关注目前的进展:https://github.com/kniren/gota/issues/16

由于更常用的操作之一是 DataFrame 子集,我认为引入并发来尝试提高系统性能可能是个好主意。

之前:

columns := make([]series.Series, df.ncols)
for i, column := range df.columns {
    s := column.Subset(indexes)
    columns[i] = s
}

之后:

columns := make([]series.Series, df.ncols)
var wg sync.WaitGroup
wg.Add(df.ncols)
for i := range df.columns {
    go func(i int) {
        columns[i] = df.columns[i].Subset(indexes)
        wg.Done()
    }(i)
}
wg.Wait()

据我了解,为 DataFrame 的每一列创建一个 goroutine 不会引入太多开销,因此我期望相对于串行版本至少实现 x2 加速(至少对于大型数据集) .但是,当使用不同大小的数据集和索引对这种变化进行基准测试时,结果非常令人失望(NROWSxNCOLS_INDEXSIZE-CPUCoreS):

benchmark                                          old ns/op      new ns/op      delta
BenchmarkDataFrame_Subset/1000000x20_100           55230          109349         +97.99%
BenchmarkDataFrame_Subset/1000000x20_100-2         51457          67714          +31.59%
BenchmarkDataFrame_Subset/1000000x20_100-4         49845          70141          +40.72%
BenchmarkDataFrame_Subset/1000000x20_1000          518506         518085         -0.08%
BenchmarkDataFrame_Subset/1000000x20_1000-2        476661         311379         -34.67%
BenchmarkDataFrame_Subset/1000000x20_1000-4        505023         316583         -37.31%
BenchmarkDataFrame_Subset/1000000x20_10000         6621116        6314112        -4.64%
BenchmarkDataFrame_Subset/1000000x20_10000-2       7316062        4509601        -38.36%
BenchmarkDataFrame_Subset/1000000x20_10000-4       6483812        8394113        +29.46%
BenchmarkDataFrame_Subset/1000000x20_100000        105341711      106427967      +1.03%
BenchmarkDataFrame_Subset/1000000x20_100000-2      94567729       56778647       -39.96%
BenchmarkDataFrame_Subset/1000000x20_100000-4      91896690       60971444       -33.65%
BenchmarkDataFrame_Subset/1000000x20_1000000       1538680081     1632044752     +6.07%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     1292113119     1100075806     -14.86%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     1282367864     949615298      -25.95%
BenchmarkDataFrame_Subset/100000x20_100            50286          106850         +112.48%
BenchmarkDataFrame_Subset/100000x20_100-2          54537          70492          +29.26%
BenchmarkDataFrame_Subset/100000x20_100-4          58024          76617          +32.04%
BenchmarkDataFrame_Subset/100000x20_1000           541600         625967         +15.58%
BenchmarkDataFrame_Subset/100000x20_1000-2         493894         362894         -26.52%
BenchmarkDataFrame_Subset/100000x20_1000-4         535373         349211         -34.77%
BenchmarkDataFrame_Subset/100000x20_10000          6298063        7678499        +21.92%
BenchmarkDataFrame_Subset/100000x20_10000-2        5827185        4832560        -17.07%
BenchmarkDataFrame_Subset/100000x20_10000-4        8195048        3660077        -55.34%
BenchmarkDataFrame_Subset/100000x20_100000         105108807      82976477       -21.06%
BenchmarkDataFrame_Subset/100000x20_100000-2       92112736       58317114       -36.69%
BenchmarkDataFrame_Subset/100000x20_100000-4       92044966       63469935       -31.04%
BenchmarkDataFrame_Subset/1000x20_10               9741           53365          +447.84%
BenchmarkDataFrame_Subset/1000x20_10-2             9366           36457          +289.25%
BenchmarkDataFrame_Subset/1000x20_10-4             9463           46682          +393.31%
BenchmarkDataFrame_Subset/1000x20_100              50841          103523         +103.62%
BenchmarkDataFrame_Subset/1000x20_100-2            49972          62344          +24.76%
BenchmarkDataFrame_Subset/1000x20_100-4            72014          81808          +13.60%
BenchmarkDataFrame_Subset/1000x20_1000             457799         571292         +24.79%
BenchmarkDataFrame_Subset/1000x20_1000-2           460551         405116         -12.04%
BenchmarkDataFrame_Subset/1000x20_1000-4           462928         416522         -10.02%
BenchmarkDataFrame_Subset/1000x200_10              90125          688443         +663.88%
BenchmarkDataFrame_Subset/1000x200_10-2            85259          392705         +360.60%
BenchmarkDataFrame_Subset/1000x200_10-4            87412          387509         +343.31%
BenchmarkDataFrame_Subset/1000x200_100             486600         1082901        +122.54%
BenchmarkDataFrame_Subset/1000x200_100-2           471154         732304         +55.43%
BenchmarkDataFrame_Subset/1000x200_100-4           542846         659571         +21.50%
BenchmarkDataFrame_Subset/1000x200_1000            5926086        6686480        +12.83%
BenchmarkDataFrame_Subset/1000x200_1000-2          5364091        3986970        -25.67%
BenchmarkDataFrame_Subset/1000x200_1000-4          5904977        4504084        -23.72%
BenchmarkDataFrame_Subset/1000x2000_10             1187297        7800052        +556.96%
BenchmarkDataFrame_Subset/1000x2000_10-2           1217022        3930742        +222.98%
BenchmarkDataFrame_Subset/1000x2000_10-4           1301666        3617871        +177.94%
BenchmarkDataFrame_Subset/1000x2000_100            6942015        10790196       +55.43%
BenchmarkDataFrame_Subset/1000x2000_100-2          6588351        7592847        +15.25%
BenchmarkDataFrame_Subset/1000x2000_100-4          7067226        14391327       +103.63%
BenchmarkDataFrame_Subset/1000x2000_1000           62392457       69560711       +11.49%
BenchmarkDataFrame_Subset/1000x2000_1000-2         57793006       37416703       -35.26%
BenchmarkDataFrame_Subset/1000x2000_1000-4         59572261       58398203       -1.97%

benchmark                                          old allocs     new allocs     delta
BenchmarkDataFrame_Subset/1000000x20_100           41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-2         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-4         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000          41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-2        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-4        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-2       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-4       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-2      41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-4      41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     41             43             +4.88%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     41             46             +12.20%
BenchmarkDataFrame_Subset/100000x20_100            41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100-2          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100-4          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000           41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-2         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-4         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-2        41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-4        41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-2       41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-4       41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10               41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10-2             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10-4             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100              41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100-2            41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100-4            41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-2           41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-4           41             42             +2.44%
BenchmarkDataFrame_Subset/1000x200_10              401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_10-2            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_10-4            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100             401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100-2           401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100-4           401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-2          401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-4          401            402            +0.25%
BenchmarkDataFrame_Subset/1000x2000_10             4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-2           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-4           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100            4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-2          4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-4          4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000-2         4001           4010           +0.22%
BenchmarkDataFrame_Subset/1000x2000_1000-4         4001           4003           +0.05%

benchmark                                          old bytes     new bytes     delta
BenchmarkDataFrame_Subset/1000000x20_100           32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-2         32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-4         32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_1000          298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-2        298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-4        298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_10000         2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-2       2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-4       2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000        29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-2      29083520      29083547      +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-4      29083542      29083563      +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000       290121600     290121616     +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     290121600     290121696     +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     290121600     290121840     +0.00%
BenchmarkDataFrame_Subset/100000x20_100            32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_100-2          32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_100-4          32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_1000           298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-2         298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-4         298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_10000          2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-2        2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-4        2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_100000         29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-2       29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-4       29083542      29083553      +0.00%
BenchmarkDataFrame_Subset/1000x20_10               4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_10-2             4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_10-4             4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_100              32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_100-2            32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_100-4            32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_1000             298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-2           298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-4           298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x200_10              49568         49584         +0.03%
BenchmarkDataFrame_Subset/1000x200_10-2            49568         49584         +0.03%
BenchmarkDataFrame_Subset/1000x200_10-4            49568         49585         +0.03%
BenchmarkDataFrame_Subset/1000x200_100             324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_100-2           324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_100-4           324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_1000            2989568       2989584       +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-2          2989568       2989584       +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-4          2989569       2989588       +0.00%
BenchmarkDataFrame_Subset/1000x2000_10             491072        491088        +0.00%
BenchmarkDataFrame_Subset/1000x2000_10-2           491072        491133        +0.01%
BenchmarkDataFrame_Subset/1000x2000_10-4           491072        491088        +0.00%
BenchmarkDataFrame_Subset/1000x2000_100            3243072       3243088       +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-2          3243074       3243102       +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-4          3243076       3243100       +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000           29891072      29891088      +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-2         29891086      29891797      +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-4         29891115      29891167      +0.00%

在这个基准上运行分析器 (cpu/mem) 似乎没有发现任何重要的东西。并发版本似乎在rumtime.match_semaphore_signal 上花费了一些时间,但我想这是在等待 goroutine 完成时可以预料到的。

我已尝试将启动的 goroutine 数量限制为 runtime.GOMAXPROCS(0) 报告的最大内核数,但结果更糟。我在这里做错了什么,还是 goroutines 开销如此之大以至于对性能有如此显着的影响?

【问题讨论】:

  • 对于 ns/op,看起来当数字很大时,性能下降得很快。 Goroutines 在少量线程上维护,并且应该非常快,因为没有上下文切换。但是,可能是,当这个数字真的很大时,性能开始下降。至少你的数据是这样说的。您如何进行更复杂的分而治之?而不是您现在使用的幼稚版本?但是,感谢您做出这样的努力。
  • 也许您不是受 CPU 限制,而是受内存限制?如果访问内存是最慢的部分,那么在问题上投入更多的 CPU 和更多的缓存未命中是没有帮助的。
  • @Volker 我认为你是对的,毕竟Series.Subset 有一个解析索引的部分,然后它几乎是根据索引复制数据。这可以解释为什么当索引长度很大时我们观察到的性能提升不是很显着。那么没有办法同时执行副本吗?
  • 我怀疑我可以在这里给出一个明确的答案,这真的取决于。可能还有改进的余地,但要想办法做到这一点,需要深入了解您的工作量并详细分析应用程序。
  • 你尝试了 2 个 goroutine 而不是 n 个?当过程很长时,我通常会通过并行化来获得好处。此外,您可能希望有一个实时 CPU 监视器来查看二进制文件从头到尾实际在做什么;你没有从 pprof 那里得到。你也可以试试这个:github.com/uber/go-torch

标签: performance go goroutine


【解决方案1】:

Goroutines 很便宜,但不是免费的。

我没有阅读您的代码,但如果您为您处理的 每个 行生成 NCOLS_INDEXSIZE goroutine,那么这是一个非常糟糕的做法。

这可以在您的基准测试中看到,其中您有 2k 列和只有 1k 行 - 您获得了非常大的改进。但在所有其他情况下,当列数

相反,您应该生成一个 goroutine 池(接近您的 CPU 数量)并通过通道在它们之间分配工作 - 这是规范的方式。您可能想阅读https://blog.golang.org/pipelines

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2011-11-18
    • 1970-01-01
    • 2015-12-18
    • 2021-09-10
    • 1970-01-01
    • 2018-07-26
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多