【发布时间】:2020-07-04 21:38:33
【问题描述】:
我正在使用带有 java8 的 spark-sql 2.3.1v。 我有如下数据框
val df_data = Seq(
("G1","I1","col1_r1", "col2_r1","col3_r1"),
("G1","I2","col1_r2", "col2_r2","col3_r3")
).toDF("group","industry_id","col1","col2","col3")
.withColumn("group", $"group".cast(StringType))
.withColumn("industry_id", $"industry_id".cast(StringType))
.withColumn("col1", $"col1".cast(StringType))
.withColumn("col2", $"col2".cast(StringType))
.withColumn("col3", $"col3".cast(StringType))
+-----+-----------+-------+-------+-------+
|group|industry_id| col1| col2| col3|
+-----+-----------+-------+-------+-------+
| G1| I1|col1_r1|col2_r1|col3_r1|
| G1| I2|col1_r2|col2_r2|col3_r3|
+-----+-----------+-------+-------+-------+
val df_cols = Seq(
("1", "usa", Seq("col1","col2","col3")),
("2", "ind", Seq("col1","col2"))
).toDF("id","name","list_of_colums")
.withColumn("id", $"id".cast(IntegerType))
.withColumn("name", $"name".cast(StringType))
+---+----+------------------+
| id|name| list_of_colums|
+---+----+------------------+
| 1| usa|[col1, col2, col3]|
| 2| ind| [col1, col2]|
+---+----+------------------+
问题: 如上所示,我在“df_cols”数据框中有列信息。 以及“df_data”数据框中的所有数据。 如何从“df_data”动态选择列到“df_cols”的给定ID??
【问题讨论】:
-
我还是不明白。您可以编辑您的问题并在示例中添加您期望的输出数据框吗?
标签: java dataframe apache-spark apache-spark-sql