【Question title】: Spark Scala CSV Column names to Lower Case
【Posted】: 2018-03-18 18:20:56
【Question】:

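Please see the code below and tell me how to change the column names to lower case. I tried withColumnRenamed, but I would have to call it for each column and type out every column name. I want to do this for all columns at once without listing their names, because there are too many of them.

Scala version: 2.11, Spark: 2.2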

import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}
import com.datastax


import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import com.datastax.spark.connector._
import org.apache.spark.sql._

object dataframeset {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("Sample1").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    val rdd1 = sc.cassandraTable("tdata", "map3")
    Logger.getLogger("org").setLevel(Level.ERROR)
    Logger.getLogger("akka").setLevel(Level.ERROR)
    val spark1 = org.apache.spark.sql.SparkSession.builder().master("local").config("spark.cassandra.connection.host","127.0.0.1")
      .appName("Spark SQL basic example").getOrCreate()

    val df = spark1.read.format("csv").option("header","true").option("inferschema", "true").load("/Users/Desktop/del2.csv")
    import spark1.implicits._
    println("\nTop Records are:")
    df.show(1)


    val dfprev1 = df.select("sno", "year", "StateAbbr")

    dfprev1.show(1)
}
}

Desired output:

|sno|year|stateabbr|    statedesc|cityname|geographiclevel

All the column names should be in lower case.

Actual output:

Top Records are:
+---+----+---------+-------------+--------+---------------+----------+----------+--------+--------------------+---------------+---------------+--------------------+----------+--------------------+---------------------+--------------------------+-------------------+---------------+-----------+----------+---------+--------+---------+-------------------+
|sno|year|StateAbbr|    StateDesc|CityName|GeographicLevel|DataSource|  category|UniqueID|             Measure|Data_Value_Unit|DataValueTypeID|     Data_Value_Type|Data_Value|Low_Confidence_Limit|High_Confidence_Limit|Data_Value_Footnote_Symbol|Data_Value_Footnote|PopulationCount|GeoLocation|categoryID|MeasureId|cityFIPS|TractFIPS|Short_Question_Text|
+---+----+---------+-------------+--------+---------------+----------+----------+--------+--------------------+---------------+---------------+--------------------+----------+--------------------+---------------------+--------------------------+-------------------+---------------+-----------+----------+---------+--------+---------+-------------------+
|  1|2014|       US|United States|    null|             US|     BRFSS|Prevention|      59|Current lack of h...|              %|      AgeAdjPrv|Age-adjusted prev...|      14.9|                14.6|                 15.2|                      null|               null|      308745538|       null|   PREVENT|  ACCESS2|    null|     null|   Health Insurance|
+---+----+---------+-------------+--------+---------------+----------+----------+--------+--------------------+---------------+---------------+--------------------+----------+--------------------+---------------------+--------------------------+-------------------+---------------+-----------+----------+---------+--------+---------+-------------------+
only showing top 1 row

+---+----+---------+
|sno|year|StateAbbr|
+---+----+---------+
|  1|2014|       US|
+---+----+---------+
only showing top 1 row

【Question comments】:

    Tags: scala csv apache-spark apache-spark-sql


    【Solution 1】:

    Just use toDF:

    df.toDF(df.columns map(_.toLowerCase): _*)
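
The toDF call above only needs a list of new names, so the name computation can be sketched and tried as plain Scala without a Spark session. `HeaderNames`, `lowerCased` and `collisions` are hypothetical helper names, not part of Spark. Using `Locale.ROOT` is an assumption: it keeps the result independent of the JVM's default locale. The collision check is there because two headers that differ only by case (say `ID` and `id`) end up with the same name after lower-casing.

```scala
import java.util.Locale

// Hypothetical helpers (not Spark API) for preparing lower-cased column names.
object HeaderNames {

  // Lower-case every name with a fixed locale, so the result does not
  // depend on the default locale of the JVM running the job.
  def lowerCased(names: Seq[String]): Seq[String] =
    names.map(_.toLowerCase(Locale.ROOT))

  // Names that would occur more than once after lower-casing,
  // e.g. "ID" and "id" both become "id".
  def collisions(names: Seq[String]): Set[String] =
    lowerCased(names)
      .groupBy(identity)
      .collect { case (name, group) if group.size > 1 => name }
      .toSet
}
```

With a DataFrame `df` in scope, the result could then be passed on as `df.toDF(HeaderNames.lowerCased(df.columns): _*)`, ideally after checking that `HeaderNames.collisions(df.columns)` is empty.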
    

    【Comments】:

    • Got it. Thank you.
    【Solution 2】:

    Another way to do it is with the foldLeft method.

    // Rename one column at a time, threading the DataFrame through the fold.
    val myDFcolNames = myDF.columns.toList
    val rdoDenormDF = myDFcolNames.foldLeft(myDF)((acc, c) =>
        acc.withColumnRenamed(c, c.toLowerCase))
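
As a variation on the fold, the same rename can be sketched as a single select that aliases every column, which avoids chaining one withColumnRenamed call per column. This is a different technique from the one in the answer, not a correction of it. `LowerCaseColumns` and `lowerCaseColumns` are hypothetical names, the sample row is made up for illustration, and the backtick quoting assumes that column names may contain dots but not backticks.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, assuming Spark 2.x on the classpath.
object LowerCaseColumns {

  // Alias each column to its lower-cased name in one projection.
  // Backticks stop a name such as "a.b" from being read as field access.
  def lowerCaseColumns(df: DataFrame): DataFrame =
    df.select(df.columns.map(c => col(s"`$c`").as(c.toLowerCase)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lowercase-columns")
      .getOrCreate()
    import spark.implicits._

    // Tiny in-memory stand-in for the CSV from the question.
    val df = Seq((1, 2014, "US")).toDF("sno", "year", "StateAbbr")
    lowerCaseColumns(df).printSchema()

    spark.stop()
  }
}
```

Either approach should produce the same column names; which one reads better is mostly a matter of taste.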
    
