[Posted]: 2020-02-26 13:14:39
[Question]:
I have a CSV file with a multi-line field, and I'm trying to load it into a DataFrame with Spark:
Cust_id, cust_address, city,zip
1, "1289 cobb parkway
Bufford", "ATLANTA",34343
2, "1234 IVY lane
Decatur", "ATLANTA",23435
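val df = spark.read.format("csv")
  .option("multiLine", true)    // let a quoted field span several lines
  .option("header", true)       // first line holds the column names
  .option("escape", "\"")       // a doubled quote ("") inside a quoted field is a literal quote
  .load("/home/SPARK/file.csv")
df.show()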
This gives me the following DataFrame:
+--------+-------------------+-----+----+
| id | address | city| zip|
+--------+-------------------+-----+----+
| 1| "1289 cobb parkway| null|null|
|Bufford"| "ATLANTA"|34343|null|
| 2| "1234 IVY lane| null|null|
|Decatur"| "ATLANTA"|23435|null|
+--------+-------------------+-----+----+
I want output like this:
+---+--------------------+-------+-----+
| id| address| city| zip|
+---+--------------------+-------+-----+
| 1|1289 cobb parkway...|ATLANTA|34343|
| 2|1234 IVY lane Dec...|ATLANTA|23435|
+---+--------------------+-------+-----+
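A minimal sketch of one likely fix, under stated assumptions. The broken rows above suggest the space after each comma is the culprit: the CSV parser appears to treat a field as quoted only when `"` is its first character, so ` "1289 cobb parkway` is read as an unquoted value and the line break ends the record. Spark's CSV reader option `ignoreLeadingWhiteSpace` (default `false` when reading) skips that blank first, and `regexp_replace` then folds the line break kept inside the quoted address into a space, as in the desired output. The object name `MultiLineCsv`, the temporary file and the `local[1]` master are hypothetical choices for illustration. The column name `cust_address` comes from the file's header line; the printed output in the question shows `address` instead, so adjust it to your real header.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

// Hypothetical helper object for illustration; not part of the question.
object MultiLineCsv {
  // The sample rows from the question, including the space after each comma.
  val sample: String =
    """Cust_id, cust_address, city,zip
      |1, "1289 cobb parkway
      |Bufford", "ATLANTA",34343
      |2, "1234 IVY lane
      |Decatur", "ATLANTA",23435
      |""".stripMargin

  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("multiLine", true)
      .option("header", true)
      .option("escape", "\"")
      // Skip blanks before each field so a leading '"' opens a quoted value.
      .option("ignoreLeadingWhiteSpace", true)
      .csv(path)
      // Replace the line break kept inside the quoted address with one space.
      .withColumn("cust_address", regexp_replace(col("cust_address"), "[\\r\\n]+", " "))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("multiline-csv").getOrCreate()
    // Write the sample to a temporary file so the example is self-contained.
    val file = Files.createTempFile("cust", ".csv")
    Files.write(file, sample.getBytes(StandardCharsets.UTF_8))
    load(spark, file.toString).show(false)
    spark.stop()
  }
}
```

With these options the two records should come back as two rows, with `cust_address` holding `1289 cobb parkway Bufford` and `1234 IVY lane Decatur`. If you control the file, removing the blank after each comma lets the original `multiLine` read recognize the quotes without the extra option.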
Tags: csv dataframe apache-spark apache-spark-sql