【问题标题】:convert wrapped array[long] into vector in scala在scala中将包装的数组[long]转换为向量
【发布时间】:2018-03-24 04:49:31
【问题描述】:

我在数据框标签 (1,2,3,4) 和 Signal 中有 2 列 标签是整数类型,信号是 包裹数组 (信号:数组(可为空=真) | |-- 元素:long (containsNull = true))

signal_list :one wrapped array contain 9k elements (sample data) 

-116, -123, -129, -133, -136, -137, -139, -140, -140, -141, -141, -142, -144, -145, -146, -146, -146, -146, -146, -144, -142, -139, -134, -130, -126, -123, -121, -119, -118, -116, -115, -113, -112, -110, -108, -106, -104, -101, -99, -97, -95, -92, -91, -88, -86, -84, -81, -78, -74, -70, -67, -64, -63, -62, -63, -65, -68, -70, -73, -76, -78, -80, -82, -84, -86, -88, -91, -93, -94, -95, -96, -97, -97, -96, -95, -94, -93, -92, -92, -93, -93, -93, -93, -92, -92, -91, -91, -90, -90, -89, -89, -89, -90, -90, -91, -91, -88, -86, -83, -82, -80, -78, -76, -73, -72, -70, -69, -69, -68, -67, -67, -67, -67, -67, -67, -67, -66, -65, -64, -63, -62, -61, -60, -60, -59, -57, -56, -54, -52, -47, -32, -20, -7, 5, 20, 29, 35, 40, 45, 47, 47, 44, 39, 32, 24, 17, 8, -1, -9


I want to convert this both into the vector.
i have used following 
val convertUDF = udf((array : Seq[Long]) => {
  Vector(array.toArray)
})

it gives wrapped array inside the vector
 [WrappedArray(-116, -123, -129, -133, -136, -137, -139, -140, -140)]

i want the only element in the vector and I want to ask one more thing will vector handle 9k elements.?

【问题讨论】:

    标签: arrays scala apache-spark dataframe vector


    【解决方案1】:
    val convertUDF = udf((array : Seq[Long]) => {
      Vectors.dense(array.toArray.map(_.toDouble))
    })
    

    这对我有用

    【讨论】:

      【解决方案2】:

      你可以试试这个:

      scala> val df = spark.range(10)
      df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
      
      scala> df.show
      +---+
      | id|
      +---+
      |  0|
      |  1|
      |  2|
      |  3|
      |  4|
      |  5|
      |  6|
      |  7|
      |  8|
      |  9|
      +---+
      
      scala> val list = df.agg(collect_list("id").as("id"))
      list: org.apache.spark.sql.DataFrame = [id: array<bigint>]
      
      scala> list.show(false)
      +------------------------------+
      |id                            |
      +------------------------------+
      |[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]|
      +------------------------------+
      
      scala> val convertUDF = udf((array : Seq[Long]) => {
           | array.toVector }
           | )
      convertUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,ArrayType(LongType,false),Some(List(ArrayType(LongType,false))))
      
      scala> list.select(convertUDF(col("id")).as("vector")).show
      +--------------------+
      |              vector|
      +--------------------+
      |[0, 1, 2, 3, 4, 5...|
      +--------------------+
      
      
      scala> list.select(convertUDF(col("id")).as("vector")).printSchema
      root
       |-- vector: array (nullable = true)
       |    |-- element: long (containsNull = false)
      

      【讨论】:

      • 谢谢,我知道 ID 列是可能的,但是对于包含 9k 个元素的包装数组的 signal_list col 我应该怎么做。
      • 不,它不工作,因为包装数组我认为 [Label: int, vector: array]
      猜你喜欢
      • 1970-01-01
      • 2012-09-30
      • 1970-01-01
      • 2010-12-31
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2020-03-11
      • 1970-01-01
      相关资源
      最近更新 更多