Posted: 2018-02-11 17:41:17
Question:
This is the output of my DataFrame:
finaldf.show(false)
+------------------+-------------------------+---------------------+---------------+-------------------------+--------------+----------+----------+---------+-------------------------+-------------------------+-----------------------+---------------------------+--------------------------+-------------------+-----------------------+--------------------+------------------------+------------+----------------------+-----------+
|DataPartition |TimeStamp |Source_organizationId|Source_sourceId|FilingDateTime |SourceTypeCode|DocumentId|Dcn |DocFormat|StatementDate |IsFilingDateTimeEstimated|ContainsPreliminaryData|CapitalChangeAdjustmentDate|CumulativeAdjustmentFactor|ContainsRestatement|FilingDateTimeUTCOffset|ThirdPartySourceCode|ThirdPartySourcePriority|SourceTypeId|ThirdPartySourceCodeId|FFAction|!||
+------------------+-------------------------+---------------------+---------------+-------------------------+--------------+----------+----------+---------+-------------------------+-------------------------+-----------------------+---------------------------+--------------------------+-------------------+-----------------------+--------------------+------------------------+------------+----------------------+-----------+
|SelfSourcedPrivate|2017-11-02T10:23:59+00:00|4298009288 |80 |2017-09-28T23:00:00+00:00|10K |null |171105584 |ASFILED |2017-07-31T00:00:00+00:00|false |false |2017-07-31T00:00:00+00:00 |1.0 |false |-300 |SS |1 |3011835 |1000716240 |I|!| |
|SelfSourcedPublic |2017-11-21T12:09:23+00:00|4295904170 |364 |2017-08-08T17:00:00+00:00|10Q |null |null |null |2017-07-30T00:00:00+00:00|false |false |2017-07-30T00:00:00+00:00 |1.0 |false |-300 |SS |1 |3011836 |1000716240 |I|!| |
|SelfSourcedPublic |2017-11-21T12:09:23+00:00|4295904170 |365 |2017-10-10T17:00:00+00:00|10K |null |null |null |2017-09-30T00:00:00+00:00|false |false |2017-09-30T00:00:00+00:00 |1.0 |false |-300 |SS |1 |3011835 |1000716240 |I|!| |
|SelfSourcedPublic |2017-11-21T12:17:49+00:00|4295904170 |365 |2017-10-10T17:00:00+00:00|10K |null |null |null |2017-09-30T00:00:00+00:00|false |false |2017-09-30T00:00:00+00:00 |1.0 |false |-300 |SS |1 |3011835 |1000716240 |I|!| |
When I concatenate all the columns with concat_ws, the null values are dropped from the rows. This is my code:
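import org.apache.spark.sql.functions.{col, concat_ws, lit}
// Add every column listed in diff, filled with the literal string "null"
val finaldf = diff.foldLeft(tempReorder){(temp2df, colName) => temp2df.withColumn(colName, lit("null"))}
//finaldf.show(false)
// Build the header by joining the original column names with "|^|"
val headerColumn = data.columns.toSeq
val header = headerColumn.mkString("", "|^|", "|!|").dropRight(3)
// Concatenate all columns of each row into one string separated by "|^|", named after the header
val finaldfWithDelimiter = finaldf
  .select(concat_ws("|^|", finaldf.schema.fieldNames.map(col): _*).as("concatenated"))
  .withColumnRenamed("concatenated", header)
finaldfWithDelimiter.show(false)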
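The header expression can be checked on its own in plain Scala. `mkString("", "|^|", "|!|")` appends `|!|` after the last name, and `dropRight(3)` removes exactly those three characters again, so the header ends up as the names joined with `|^|`; the `|!|` visible at the end of the printed header comes from the column name `FFAction|!|` itself. The `HeaderDemo` object and the shortened column list below are hypothetical, for illustration only.

```scala
object HeaderDemo {
  // Builds the header the same way as the question's code:
  // join with "|^|", append "|!|", then drop those 3 trailing characters again
  def buildHeader(columns: Seq[String]): String =
    columns.mkString("", "|^|", "|!|").dropRight(3)

  def main(args: Array[String]): Unit = {
    // Illustrative subset of the real column names; the last one already ends in "|!|"
    val columns = Seq("DataPartition", "TimeStamp", "FFAction|!|")
    val header = buildHeader(columns)
    println(header)
    // The suffix added by mkString is removed again, so this equals a plain join
    assert(header == columns.mkString("|^|"))
  }
}
```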
I get the output below:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|DataPartition|^|TimeStamp|^|Source_organizationId|^|Source_sourceId|^|FilingDateTime|^|SourceTypeCode|^|DocumentId|^|Dcn|^|DocFormat|^|StatementDate|^|IsFilingDateTimeEstimated|^|ContainsPreliminaryData|^|CapitalChangeAdjustmentDate|^|CumulativeAdjustmentFactor|^|ContainsRestatement|^|FilingDateTimeUTCOffset|^|ThirdPartySourceCode|^|ThirdPartySourcePriority|^|SourceTypeId|^|ThirdPartySourceCodeId|^|FFAction|!||
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|SelfSourcedPrivate|^|2017-11-02T10:23:59+00:00|^|4298009288|^|80|^|2017-09-28T23:00:00+00:00|^|10K|^|171105584|^|ASFILED|^|2017-07-31T00:00:00+00:00|^|false|^|false|^|2017-07-31T00:00:00+00:00|^|1.0|^|false|^|-300|^|SS|^|1|^|3011835|^|1000716240|^|I|!| |
|SelfSourcedPublic|^|2017-11-21T12:09:23+00:00|^|4295904170|^|364|^|2017-08-08T17:00:00+00:00|^|10Q|^|2017-07-30T00:00:00+00:00|^|false|^|false|^|2017-07-30T00:00:00+00:00|^|1.0|^|false|^|-300|^|SS|^|1|^|3011836|^|1000716240|^|I|!| |
|SelfSourcedPublic|^|2017-11-21T12:09:23+00:00|^|4295904170|^|365|^|2017-10-10T17:00:00+00:00|^|10K|^|2017-09-30T00:00:00+00:00|^|false|^|false|^|2017-09-30T00:00:00+00:00|^|1.0|^|false|^|-300|^|SS|^|1|^|3011835|^|1000716240|^|I|!|
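In the output, the null DocumentId value is dropped completely instead of appearing as null (in the second and third rows, the null Dcn and DocFormat values are dropped as well).
I can't figure out what I'm missing.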
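A likely explanation, offered as a sketch rather than a definitive answer: Spark's `concat_ws` skips NULL inputs entirely, so a NULL column contributes neither its value nor its separator. The columns added with `lit("null")` hold the four-character string `"null"` and therefore survive, whereas `DocumentId` (and `Dcn`/`DocFormat` in the later rows) presumably hold real SQL NULLs that `show()` merely prints as `null`. Wrapping each column in `coalesce(..., lit("null"))` before concatenating keeps a placeholder in every slot. The object name `ConcatWsNullDemo`, its helper functions and the three-column sample data are hypothetical; the `cast("string")` is an assumption added so that non-string columns (such as numeric IDs) combine cleanly with the string placeholder.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, concat_ws, lit}

object ConcatWsNullDemo {
  // Plain concat_ws: NULL inputs are skipped, so their separators vanish too
  def joinRaw(df: DataFrame): DataFrame =
    df.select(concat_ws("|^|", df.columns.map(col): _*).as("line"))

  // Replace each NULL with the string "null" first, so every field keeps its slot
  def joinFilled(df: DataFrame): DataFrame = {
    val filled = df.columns.map(c => coalesce(col(c).cast("string"), lit("null")))
    df.select(concat_ws("|^|", filled: _*).as("line"))
  }

  // Hypothetical sample: None becomes a real SQL NULL in the DataFrame
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[String], Option[String])](
      ("SelfSourcedPrivate", None, Some("171105584")),
      ("SelfSourcedPublic", None, None)
    ).toDF("DataPartition", "DocumentId", "Dcn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat_ws-null-demo")
      .getOrCreate()
    val df = sample(spark)
    joinRaw(df).show(false)    // NULL fields disappear from the line
    joinFilled(df).show(false) // NULL fields appear as "null"
    spark.stop()
  }
}
```

With this sample, the raw lines come out as `SelfSourcedPrivate|^|171105584` and `SelfSourcedPublic`, while the filled lines keep all three fields. If the consumer of the delimited output expects empty fields rather than the text `null`, `lit("")` would be the placeholder to use instead.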
Tags: scala apache-spark spark-dataframe