【Question title】: How to bin a Scala collection into subsets based upon bin range values
【Posted】: 2022-11-18 07:29:08
【Question】:

I have a very large collection of case class instances, each with a String attribute and a Double attribute, for example:

case class Sample(id:String, value: Double)

val samples: List[Sample] = List(
  Sample("a", 0), 
  Sample("b", 2), 
  Sample("c", 20), 
  Sample("d", 50), 
  Sample("e", 100), 
  Sample("f", 1000)
)

Given a list of buckets, for example:

val buckets = List(5, 50, 100)

what is the best way to produce a list of subsets, for example:

List(
  List(Sample("a", 0)), // samples with value == 0
  List(Sample("b", 2)), // samples with value > 0 and <= 5
  List(Sample("c", 20), Sample("d", 50)), // samples with value > 5 and <= 50
  List(Sample("e", 100)), // samples with value > 50 and <= 100
  List(Sample("f", 1000)), // samples with value > 100
)

【Question comments】:

    Tags: scala


    【Solution 1】:

    Add 0 explicitly as a bucket boundary, use binary search to find the correct bucket in O(log(numBuckets)) time per sample, and then use groupBy.

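    // Bucket upper bounds, with 0 added explicitly so that exact zeros get their own bucket
    val buckets = List[Double](0, 5, 50, 100)
    
    // Convert to an Array once so that every lookup can binary-search it
    val indexFinder: Double => Int = {
      val arr = buckets.toArray
      // insertionPoint is the index of the first boundary >= value
      (value: Double) => arr.search(value).insertionPoint
    }
    
    // Group the samples by bucket index and print each group
    samples.groupBy(sample => indexFinder(sample.value)).values.toList.foreach(println)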
    
    

    This gives:

    
    List(Sample(a,0.0))
    List(Sample(b,2.0))
    List(Sample(c,20.0), Sample(d,50.0))
    List(Sample(e,100.0))
    List(Sample(f,1000.0))
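
One caveat: groupBy returns a Map, whose iteration order is not guaranteed, and buckets that receive no samples do not appear in it at all. The following is a minimal sketch, assuming Scala 2.13 or later, of one way to return the bins in boundary order and keep empty bins. The names `BinningSketch` and `binByUpperBounds` are hypothetical and not part of the original answer. As with the code above, negative values would land in the first bin together with 0.

```scala
// Sketch only: BinningSketch and binByUpperBounds are illustrative names.
object BinningSketch {
  case class Sample(id: String, value: Double)

  // upperBounds must be sorted ascending. Bin i holds the samples whose value is
  // <= upperBounds(i) and > upperBounds(i - 1); one extra final bin holds
  // everything above the last bound.
  def binByUpperBounds(samples: List[Sample], upperBounds: Vector[Double]): List[List[Sample]] = {
    // Same technique as the answer: binary search for the bin index, then groupBy.
    val grouped: Map[Int, List[Sample]] =
      samples.groupBy(s => upperBounds.search(s.value).insertionPoint)
    // Walk the bin indices in order so the result is sorted and empty bins are kept.
    (0 to upperBounds.length).toList.map(i => grouped.getOrElse(i, Nil))
  }

  def main(args: Array[String]): Unit = {
    val samples = List(
      Sample("a", 0), Sample("b", 2), Sample("c", 20),
      Sample("d", 50), Sample("e", 100), Sample("f", 1000)
    )
    binByUpperBounds(samples, Vector(0.0, 5.0, 50.0, 100.0)).foreach(println)
  }
}
```

Looking up each index with getOrElse costs one extra pass over the bin indices, which is negligible next to grouping a very large sample collection.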
    

    Full code:

    case class Sample(id:String, value: Double)
    
    val samples: List[Sample] = List(
      Sample("a", 0), 
      Sample("b", 2), 
      Sample("c", 20), 
      Sample("d", 50), 
      Sample("e", 100), 
      Sample("f", 1000)
    )
    
    val buckets = List[Double](0, 5, 50, 100)
    
    val indexFinder: Double => Int = {
      val arr = buckets.toArray
      (value: Double) => arr.search(value).insertionPoint
    }
    
    samples.groupBy(sample => indexFinder(sample.value)).values.toList.foreach(println)
    
    
    

    【Comments】:
