【Question title】: PySpark DataFrame schema is showing string for every column
【Posted】: 2022-11-02 22:38:09
【Question】:

I am reading a CSV file with the code snippet below:

df_pyspark = spark.read.csv("sample_data.csv")
df_pyspark

When I try to print the DataFrame, the output looks like this:

DataFrame[_c0: string, _c1: string, _c2: string, _c3: string, _c4: string, _c5: string]

Every column's data type is shown as "string", even though the columns hold different kinds of data, as shown below:

df_pyspark.show()

+---+----------+---------+--------------------+-----------+----------+
|_c0|       _c1|      _c2|                 _c3|        _c4|       _c5|
+---+----------+---------+--------------------+-----------+----------+
| id|first_name|last_name|               email|     gender|     phone|
|  1|    Bidget| Mirfield|bmirfield0@scient...|     Female|5628618353|
|  2|   Gonzalo|    Vango|    gvango1@ning.com|       Male|9556535457|
|  3|      Rock| Pampling|rpampling2@guardi...|   Bigender|4472741337|
|  4|   Dorella|  Edelman|dedelman3@histats...|     Female|4303062344|
|  5|     Faber|  Thwaite|fthwaite4@google....|Genderqueer|1348658809|
|  6|     Debee| Philcott|dphilcott5@cafepr...|     Female|7906881842|
+---+----------+---------+--------------------+-----------+----------+

How can I print the exact data type of each column?

Thank you!

I'm new to this, so I don't know much about PySpark yet.

【Question discussion】:

    Tags: python dataframe pyspark scheme


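    【Solution 1】:

    Pass the inferSchema parameter when reading the CSV file. Spark then infers each column's data type from the values in that column instead of treating everything as a string. Also pass header=True so that the first line is used for the column names rather than read as a data row.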

        df_pyspark = spark.read.csv("sample_data.csv", header=True, inferSchema=True)
        df_pyspark.show(5)
    
        +---+----------+---------+--------------------+-----------+----------+
        | id|first_name|last_name|               email|     gender|     phone|
        +---+----------+---------+--------------------+-----------+----------+
        |  1|    Bidget| Mirfield|bmirfield0@scient...|     Female|5628618353|
        |  2|   Gonzalo|    Vango|    gvango1@ning.com|       Male|9556535457|
        |  3|      Rock| Pampling|rpampling2@guardi...|   Bigender|4472741337|
        |  4|   Dorella|  Edelman|dedelman3@histats...|     Female|4303062344|
        |  5|     Faber|  Thwaite|fthwaite4@google....|Genderqueer|1348658809|
        +---+----------+---------+--------------------+-----------+----------+
        only showing top 5 rows
    
        df_pyspark.printSchema()
    
        root
         |-- id: integer (nullable = true)
         |-- first_name: string (nullable = true)
         |-- last_name: string (nullable = true)
         |-- email: string (nullable = true)
         |-- gender: string (nullable = true)
         |-- phone: long (nullable = true)
    

    【Discussion】:
