【Question Title】: Scala - Converting array-data into table or dataframe?
【Posted】: 2018-01-07 07:19:25
【Question】:

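I want to create and save a table filled with random ints. Everything works so far, but I don't understand how to put the multidimensional array tmp into a DataFrame with the schema defined at the top.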

import org.apache.spark.sql.types.{
StructType, StructField, StringType, IntegerType, DoubleType}
import org.apache.spark.sql.Row

val schema = StructType(
StructField("rowId", IntegerType, true) ::
StructField("t0_1", DoubleType, true) ::
StructField("t0_2", DoubleType, true) ::    
StructField("t0_3", DoubleType, true) ::
StructField("t0_4", DoubleType, true) ::
StructField("t0_5", DoubleType, true) ::
StructField("t0_6", DoubleType, true) ::
StructField("t0_7", DoubleType, true) ::
StructField("t0_8", DoubleType, true) ::
StructField("t0_9", DoubleType, true) ::
StructField("t0_10", DoubleType, true) :: Nil)

val columnNo = 10;
val rowNo = 50;

var c = 0;
var r = 0;

val tmp = Array.ofDim[Double](10,rowNo)

for (r <- 1 to rowNo){
for (c <- 1 to columnNo){
    val temp = new scala.util.Random
    tmp(c-1)(r-1) = temp.nextDouble
    println( "Value of " + c + "/"+ r + ":" + tmp(c-1)(r-1));
}
}

val df = sc.parallelize(tmp).toDF
df.show
dataframe.show

【Comments】:

    Tags: sql scala apache-spark dataframe spark-dataframe


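    【Solution 1】:

    You can't convert an array of arrays into a DataFrame; you need an array of tuples or case classes instead. Here is a variant based on a case class that corresponds to your desired schema: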

    case class Record(
      rowId:Option[Int],
      t0_1:Option[Double],
      t0_2:Option[Double],
      t0_3:Option[Double],
      t0_4:Option[Double],
      t0_5:Option[Double],
      t0_6:Option[Double],
      t0_7:Option[Double],
      t0_8:Option[Double],
      t0_9:Option[Double],
      t0_10:Option[Double]
    )
    
    val rowNo = 50;
    val temp = new scala.util.Random
    
    val data = (1 to rowNo).map(r => 
     Record(
        Some(r),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble),
        Some(temp.nextDouble)
      )
    )
    
    val df = sc.parallelize(data).toDF
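
If you would rather keep the 2D array and the explicit schema from the question, one possible alternative is to turn each row into a `Row` and pass the schema to `createDataFrame`. The following is only a sketch under some assumptions: it is written as a script (spark-shell or a Scala worksheet), it creates its own local `SparkSession` named `spark`, the schema is built in a loop rather than spelled out field by field, and the array uses the question's `tmp(column)(row)` layout, so it has to be transposed first.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}
import scala.util.Random

// Assumption: a local session for illustration. In spark-shell,
// getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-to-dataframe")
  .getOrCreate()

val columnNo = 10
val rowNo = 50

// Same fields as the schema in the question: rowId plus t0_1 .. t0_10.
val schema = StructType(
  StructField("rowId", IntegerType, nullable = true) +:
    (1 to columnNo).map(i => StructField(s"t0_$i", DoubleType, nullable = true))
)

// Column-major layout as in the question: tmp(column)(row).
val random = new Random
val tmp = Array.fill(columnNo, rowNo)(random.nextDouble())

// Transpose to row-major, then prepend a 1-based row id to each row.
// Seq[Any] keeps the id an Int so that it matches IntegerType.
val rows = tmp.transpose.zipWithIndex.map { case (values, i) =>
  Row.fromSeq(Seq[Any](i + 1) ++ values.toSeq)
}

// Build the DataFrame from an RDD[Row] plus the explicit schema.
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows.toSeq), schema)
df.show(5)
```

Compared with the case class variant, this keeps the column names and types in one `StructType`, at the cost of losing compile-time checking of the row contents: a value of the wrong type in a `Row` only fails when Spark evaluates the data.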
    

    【Comments】:

    • Thank you very much! This solved my problem and shortened my code a lot!