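In any case, can you convert this to Scala?
Scala supports case classes, which handle this situation.
In your case, the challenge is that you have a Seq/Array of an inner case class => private java.util.ArrayList<Identifier> secodaryIds;
So it can be done as follows: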
// Inner case class: Identifier
case class Identifier(Id: Integer, uuid: String)
val innerVal = Seq(Identifier(1, "gsgsg"), Identifier(2, "dvggwgwg"))
// Outer case class: MyComplexEntity
case class MyComplexEntity(notes: String, identifierArray: Seq[Identifier])
val outerVal = MyComplexEntity("Hello", innerVal)
Note =>
outerVal: MyComplexEntity contains a List of Identifier objects, as shown below:
outerVal: MyComplexEntity = MyComplexEntity(Hello,List(Identifier(1,gsgsg), Identifier(2,dvggwgwg)))
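Since the original field is a java.util.ArrayList<Identifier>, the list may arrive from Java code rather than being built as a Scala Seq. The following is a minimal sketch of that conversion, assuming the elements are already Identifier instances. The object name JavaListToSeq and the helper fromJava are hypothetical, introduced only for illustration.

```scala
import java.util.{ArrayList => JArrayList}
// On Scala 2.13+, scala.jdk.CollectionConverters._ is the non-deprecated equivalent.
import scala.collection.JavaConverters._

case class Identifier(Id: Integer, uuid: String)
case class MyComplexEntity(notes: String, identifierArray: Seq[Identifier])

// Hypothetical helper: wraps a Java list of identifiers into the outer case class.
object JavaListToSeq {
  def fromJava(notes: String, ids: JArrayList[Identifier]): MyComplexEntity =
    // asScala gives a mutable view of the Java list; toList copies it into an immutable List.
    MyComplexEntity(notes, ids.asScala.toList)
}

object JavaListToSeqDemo {
  def main(args: Array[String]): Unit = {
    val javaIds = new JArrayList[Identifier]()
    javaIds.add(Identifier(1, "gsgsg"))
    javaIds.add(Identifier(2, "dvggwgwg"))
    println(JavaListToSeq.fromJava("Hello", javaIds))
  }
}
```

Copying into an immutable List means later changes to the Java list do not leak into the case class instance.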
Now for the actual Spark way, using a Dataset:
import spark.implicits._
// Convert our input data into the same structure as your MyComplexEntity.
// The only trick is to turn a Seq[(Int, String)] into a Seq[Identifier].
// Hence we need two mappings: one for the outer case class (MyComplexEntity) and one for the inner Seq of Identifier.
// If we just take this input data and convert it to a Dataset (without mapping it to the case classes),
// this is how it looks:
val inputData = Seq(("Some DAY",Seq((210,"wert67"),(310,"bill123"))),
("I WILL BE", Seq((420,"henry678"),(1000,"baba123"))),
("Saturday Night",Seq((1000,"Roger123"),(2000,"God345")))
)
val unMappedDs = inputData.toDS
This gives us =>
// See how the schema is inferred
// unMappedDs: org.apache.spark.sql.Dataset[(String, Seq[(Int, String)])] = [_1: string, _2: array<struct<_1:int,_2:string>>]
But if we map it 'properly' =>
// The second element is a Seq[(Int,String)], and we map it into a Seq[Identifier] with x._2.map(y => Identifier(y._1,y._2))
as follows:
val resultDs = inputData.toDS.map(x =>MyComplexEntity(x._1,x._2.map(y => Identifier(y._1,y._2))))
resultDs.show(20,false)
We get a structure like =>
resultDs: org.apache.spark.sql.Dataset[MyComplexEntity] = [notes: string, identifierArray: array<struct<Id:int,uuid:string>>]
With the data:
+--------------+--------------------------------+
|notes         |identifierArray                 |
+--------------+--------------------------------+
|Some DAY      |[[210,wert67], [310,bill123]]   |
|I WILL BE     |[[420,henry678], [1000,baba123]]|
|Saturday Night|[[1000,Roger123], [2000,God345]]|
+--------------+--------------------------------+
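The snippets above assume a spark value is already in scope, as it is in spark-shell. Outside the shell, the same two-level mapping could be packaged roughly as below. This is only a sketch under assumptions: the object name ComplexEntityApp, the helper toEntities, the app name, and the local[*] master are all hypothetical choices, not part of the original setup.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Case classes are declared at the top level so Spark can derive encoders for them.
case class Identifier(Id: Integer, uuid: String)
case class MyComplexEntity(notes: String, identifierArray: Seq[Identifier])

object ComplexEntityApp {
  // Hypothetical helper: maps raw (notes, Seq[(id, uuid)]) tuples into the nested case classes.
  def toEntities(spark: SparkSession,
                 raw: Seq[(String, Seq[(Int, String)])]): Dataset[MyComplexEntity] = {
    import spark.implicits._ // brings toDS and the case class encoders into scope
    raw.toDS.map(x => MyComplexEntity(x._1, x._2.map(y => Identifier(y._1, y._2))))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session; adjust master and appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("ComplexEntityApp")
      .master("local[*]")
      .getOrCreate()

    val inputData = Seq(
      ("Some DAY", Seq((210, "wert67"), (310, "bill123"))),
      ("I WILL BE", Seq((420, "henry678"), (1000, "baba123"))),
      ("Saturday Night", Seq((1000, "Roger123"), (2000, "God345")))
    )

    val resultDs = toEntities(spark, inputData)
    resultDs.printSchema()
    resultDs.show(20, false)

    spark.stop()
  }
}
```

Keeping the mapping in a small function that takes the SparkSession as a parameter makes it easy to call from a test with a local session.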
It's that simple with Scala.
Thanks.