【Question title】: How to load a CSV file with records on multiple lines in Spark Scala?
【Posted】: 2020-02-26 13:14:39
【Question description】:

I have a CSV file whose fields span multiple lines, and I am trying to load it as a DataFrame through Spark.

Cust_id, cust_address, city,zip
1, "1289 cobb parkway
Bufford", "ATLANTA",34343
2, "1234 IVY lane
Decatur", "ATLANTA",23435


val df = spark.read.format("csv")
  .option("multiLine", true)
  .option("header", true)
  .option("escape", "\"")
  .load("/home/SPARK/file.csv")

df.show()

This shows me the following DataFrame -

+--------+-------------------+-----+----+
| id     | address           | city| zip|
+--------+-------------------+-----+----+
|       1| "1289 cobb parkway| null|null|
|Bufford"|          "ATLANTA"|34343|null|
|       2|     "1234 IVY lane| null|null|
|Decatur"|          "ATLANTA"|23435|null|
+--------+-------------------+-----+----+

I want output like this -

+---+--------------------+-------+-----+
| id|             address|   city|  zip|
+---+--------------------+-------+-----+
|  1|1289 cobb parkway...|ATLANTA|34343|
|  2|1234 IVY lane Dec...|ATLANTA|23435|
+---+--------------------+-------+-----+

【Question discussion】:

    Tags: csv dataframe apache-spark apache-spark-sql


    【Solution 1】:
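    // `delimiter` and `file_name` are assumed to be defined elsewhere,
    // e.g. "," and the path to the CSV file.
    val df = sqlContext.read.format("com.databricks.spark.csv")
      .option("delimiter", delimiter)
      .option("header", "true")
      .option("quote", "\"")
      .option("multiLine", "true")               // let a quoted field span several lines
      .option("inferSchema", "true")
      .option("parserLib", "UNIVOCITY")          // option from the external spark-csv package
      .option("ignoreTrailingWhiteSpace", "true")
      .option("ignoreLeadingWhiteSpace", "true") // skip the space before each opening quote
      .load(file_name)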
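
    The sample in the question has a space between each comma and the opening quote, which is probably why the quotes were kept as literal characters and the records were split at the line break. Below is a minimal sketch of the same idea using the built-in `csv` source, assuming Spark 2.2 or later is on the classpath. The object name `MultiLineCsvSketch` and the helpers `writeSample` and `load` are hypothetical, introduced only for illustration; the option names are the ones used in the answer above.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineCsvSketch {
  // Sample data copied from the question: each record spans two physical
  // lines, and a space sits between the comma and the opening quote.
  val sample: String =
    """Cust_id, cust_address, city,zip
      |1, "1289 cobb parkway
      |Bufford", "ATLANTA",34343
      |2, "1234 IVY lane
      |Decatur", "ATLANTA",23435
      |""".stripMargin

  // Hypothetical helper: writes the sample to a temporary file, returns its path.
  def writeSample(): String = {
    val path = Files.createTempFile("multiline", ".csv")
    Files.write(path, sample.getBytes("UTF-8"))
    path.toString
  }

  // Reads a CSV whose quoted fields may contain line breaks.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv")
      .option("header", "true")
      .option("multiLine", "true")               // a quoted field may span lines
      .option("quote", "\"")
      .option("ignoreLeadingWhiteSpace", "true") // so `, "` still opens a quoted field
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multiline-csv-sketch")
      .getOrCreate()
    try load(spark, writeSample()).show(truncate = false)
    finally spark.stop()
  }
}
```

    Note that the address column keeps its embedded line break, so `show()` may print each address across two lines; replacing the newline with a space afterwards (for example with `regexp_replace`) is a separate step.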
    

    【Discussion】:
