How to swap the values from one column to another in a Spark DataFrame

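【Question title】: How to swap the values from one column to another in Spark dataframe
【Posted】: 2019-12-01 22:24:13
【Question description】:

I have a DataFrame with 6 columns. I need to assign the values of one column to another: the values in the ROW column need to go into the ItemData column. Note that the columns here are struct types, not plain strings.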

+-----+--------------------+--------------------+-------------------+--------------------------+--------------------+
|index|                 ROW|        Document    |ItemData           | noNamespaceSchemaLocation|                _xsi|
+-----+--------------------+--------------------+-------------------+--------------------------+--------------------+
|    0|[1,1,1018,17.0...   |[[,2001-12-17T09:...|            [,,,,,]|      GetItemMasterSupp...|http://www.w3.org...|
+-----+--------------------+--------------------+-------------------+--------------------------+--------------------+

I tried registering the DataFrame as a temporary table and then swapping the columns, but that did not help.

The final output should look like this:
+--------------------+-------------------+--------------------------+--------------------+
|        Document    |ItemData           | noNamespaceSchemaLocation|                _xsi|
+--------------------+-------------------+--------------------------+--------------------+
|[[,2001-12-17T09:...|  [1,1,1018,17.0...|      GetItemMasterSupp...|http://www.w3.org...|
+--------------------+-------------------+--------------------------+--------------------+

This is the schema printed by df.printSchema():

root
 |-- index: long (nullable = false)
 |-- ROW: struct (nullable = true)
 |    |-- CLTRP: long (nullable = true)
 |    |-- CORP: long (nullable = true)
 |    |-- CORP_ITEM_CD: long (nullable = true)
 |    |-- CTIV: double (nullable = true)
 |    |-- CTLFAC: string (nullable = true)
 |    |-- CTLI: long (nullable = true)
 |-- DocData: struct (nullable = true)
 |    |-- Document: struct (nullable = true)
 |    |    |-- AltementID: string (nullable = true)
 |    |    |-- Creat: string (nullable = true)
 |    |    |-- DataClasion: struct (nullable = true)
 |    |    |    |-- BusinessSeel: struct (nullable = true)
 |    |    |    |    |-- Code: string (nullable = true)
 |    |    |    |    |-- Description: string (nullable = true)
 |    |    |    |-- DataCLevel: struct (nullable = true)
 |    |    |    |    |-- Code: string (nullable = true)
 |    |    |    |    |-- Description: string (nullable = true)
 |    |    |    |-- PCaInd: string (nullable = true)
 |    |    |    |-- PHtaInd: string (nullable = true)
 |    |    |    |-- PPnd: string (nullable = true)
 |    |-- DocumentAction: struct (nullable = true)
 |    |    |-- ActionTypeCd: string (nullable = true)
 |    |    |-- RecordTypeCd: string (nullable = true)
 |-- ItemData: struct (nullable = true)
 |    |-- CorpCd: string (nullable = true)
 |    |-- CorId: string (nullable = true)
 |    |-- DepId: string (nullable = true)
 |    |-- DisrId: string (nullable = true)
 |    |-- DivId: string (nullable = true)
 |    |-- WarId: string (nullable = true)
 |-- _noNamespaceSchemaLocation: string (nullable = true)
 |-- _xsi: string (nullable = true)

Edit 1:

** Updated to show how the DataFrame is created

// XML data reader (XmlReader comes from the spark-xml package)
import com.databricks.spark.xml.XmlReader
import org.apache.spark.sql.functions.monotonically_increasing_id

val supplyData = "Input_File/SCI_Input.xml"
val booksFileTag1 = "ROWSET"

val dataDF = (new XmlReader()).withRowTag(booksFileTag1).xmlFile(sqlContext, supplyData).toDF()

// Add a generated index column to join on
val dataFrame1 = dataDF.withColumn("index", monotonically_increasing_id())

// XML schema reader
val supplySchema = "Input_File/Supply_sample.xml"
val booksFileTag = "GetItemMaster"

val schemaDf = (new XmlReader()).withRowTag(booksFileTag).xmlFile(sqlContext, supplySchema).toDF()

val dataFrame2 = schemaDf.withColumn("index", monotonically_increasing_id())

// Join the data with the schema template on the generated index
val finalDf = dataFrame1.join(dataFrame2, "index")

finalDf.show()



Output for reference for @jxc (the ItemData schema after trying the suggested selectExpr):
 |-- ItemData: struct (nullable = true)
 |    |-- CLTRP: long (nullable = true)
 |    |-- CORP: long (nullable = true)
 |    |-- CORP_ITEM_CD: long (nullable = true)
 |    |-- CTIV: double (nullable = true)
 |    |-- CTLFAC: string (nullable = true)
 |    |-- CTLI: long (nullable = true)

【Question discussion】:

new_df = df.selectExpr('Document', 'ROW AS ItemData', 'noNamespaceSchemaLocation', '_xsi')
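@user1638818 - could you also post the creation statements for your DataFrame?
@vikrantrana -- I edited my question to include the DataFrame creation part.
@jxc -- Thanks for your answer. I tried what you suggested, but there is a problem with this approach: if I select 'ROW AS ItemData', the schema of ItemData is also changed to that of the ROW column, which I don't want. I need ItemData to keep the schema shown in my question (df.printSchema()); only the values should move into the ItemData column. After trying the code above, the ItemData schema looks like the one shown in the original question.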

Tags: scala apache-spark pyspark apache-spark-sql

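【Solution 1】:

You can simply rename the ROW column to ItemData after dropping the old ItemData column, so that two columns never share the same name.

There are several ways to rename a column: https://sparkbyexamples.com/rename-a-column-on-spark-dataframes/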
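A minimal sketch of this approach in Scala, to make the order of operations concrete. The tiny DataFrame built here is a hypothetical stand-in for the question's data (only a couple of fields per struct), and the object name and local SparkSession are assumptions for illustration. The old ItemData column is dropped first and ROW is renamed afterwards.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

object RenameRowToItemData {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-sketch")
      .getOrCreate()

    // Hypothetical stand-in for the question's DataFrame:
    // two struct columns with different field names and types.
    val df = spark.range(1).select(
      col("id").as("index"),
      struct(lit(1L).as("CLTRP"), lit(1L).as("CORP")).as("ROW"),
      struct(lit("a").as("CorpCd"), lit("b").as("CorId")).as("ItemData")
    )

    // Drop the old ItemData first, then rename ROW to ItemData.
    val renamed = df.drop("ItemData").withColumnRenamed("ROW", "ItemData")

    // ItemData now carries ROW's values and also ROW's field names and types.
    renamed.printSchema()

    spark.stop()
  }
}
```

Note that renaming moves the whole column, so the resulting ItemData has the field names and types of ROW rather than those of the original ItemData.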

【Discussion】:

These examples really helped me find what I was looking for. Thanks for providing the link, I appreciate it.

【Solution 2】:

Try this:

from pyspark.sql import functions as F
df = df.withColumn("ItemData", F.col("ROW")).drop("ROW")
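The asker's comment on the question points out that copying ROW into ItemData also carries over ROW's field names and types. Below is a minimal sketch of one possible way to keep ItemData's original schema, written in Scala to match the question's code. It assumes that both structs have the same number of fields and that a positional, field-by-field cast (for example long to string) is acceptable. The sample DataFrame, object name and local SparkSession are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

object CastRowToItemDataSchema {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-sketch")
      .getOrCreate()

    // Hypothetical stand-in: ROW holds the values, ItemData is an empty
    // struct of strings whose field names should be preserved.
    val df = spark.range(1).select(
      col("id").as("index"),
      struct(lit(1L).as("CLTRP"), lit(1018L).as("CORP_ITEM_CD")).as("ROW"),
      struct(
        lit(null).cast("string").as("CorpCd"),
        lit(null).cast("string").as("CorId")
      ).as("ItemData")
    )

    // Capture ItemData's struct type before the column is overwritten.
    val itemDataType = df.schema("ItemData").dataType

    // Cast ROW to that type: fields are matched by position, so the values
    // come from ROW while names and types come from ItemData.
    val result = df
      .withColumn("ItemData", col("ROW").cast(itemDataType))
      .drop("ROW")

    result.printSchema()
    result.show(false)

    spark.stop()
  }
}
```

Because the cast matches fields by position rather than by name, this only makes sense when the n-th field of ROW is really meant to fill the n-th field of ItemData. If the field counts differ, the cast is rejected at analysis time.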

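【Discussion】:

【Solution 3】:

First of all, swapping is not the same as renaming (which has already been answered here).

If you want to swap the values of two columns, say col_A and col_B, do the following: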

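import org.apache.spark.sql.functions.col

df.withColumn("col_A_", col("col_B"))  // stash col_B's values in a temporary column
  .withColumn("col_B", col("col_A"))   // col_B now holds col_A's values
  .withColumn("col_A", col("col_A_"))  // col_A now holds the stashed col_B values
  .drop("col_A_")                      // remove the temporary column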

【Discussion】: