【Posted】: 2022-01-03 13:57:29
【Question】:
I have the following setup code, which is shared between the two scenarios below:
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pyspark.sql.functions as F
import pandas as pd
import numpy as np
spark = SparkSession.builder.getOrCreate()
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [22.0, 88.0, np.nan]})
Now I want to convert df into a PySpark DataFrame (sdf). When I try to implicitly "cast" "col2" to LongType through the schema while creating sdf, it fails:
schema = StructType([StructField("col1", LongType()), StructField("col2", LongType())])
sdf = spark.createDataFrame(df[schema.fieldNames()], schema=schema)
Error:
TypeError: field col2: LongType can not accept object 22.0 in type <class 'float'>
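The error is consistent with what pandas holds for "col2". A minimal pandas-only sketch (no Spark needed) that inspects the column, assuming the same df as above: because the column contains a NaN, pandas stores it as float64, so every value that reaches the LongType check is a Python float.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [22.0, 88.0, np.nan]})

# col2 contains a NaN, so pandas stores the column as float64, not an integer dtype.
print(df.dtypes)

# Collect the Python types of the values that would be handed to Spark row by row.
col2_types = {type(v).__name__ for v in df["col2"].tolist()}
print(col2_types)
```

As far as I can tell, the schema check only verifies types and does not coerce them, so a float is rejected for a LongType field even when it is a whole number like 22.0.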
But if I run the following snippet, it works:
schema_2 = StructType(
[StructField("col1", LongType()), StructField("col2", FloatType())]
)
sdf = spark.createDataFrame(df[schema_2.fieldNames()], schema=schema_2)
cast_sdf = sdf.withColumn("col2", F.col("col2").cast(LongType()))
cast_sdf.show()
Output:
+----+----+
|col1|col2|
+----+----+
|   1|  22|
|   2|  88|
|   3|   0|
+----+----+
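Note that in the output above the NaN in the third row became 0 after the cast. For comparison, here is a minimal sketch of doing the conversion on the pandas side instead, so that every value is either a Python int or None before Spark sees it. The helper name to_optional_int is hypothetical, and int() truncates any fractional part, which is only safe here because the values are whole numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [22.0, 88.0, np.nan]})


def to_optional_int(value):
    """Hypothetical helper: map NaN to None, anything else to a Python int."""
    return None if pd.isna(value) else int(value)


# Build plain Python rows whose values are int or None.
rows = [
    (int(c1), to_optional_int(c2))
    for c1, c2 in zip(df["col1"].tolist(), df["col2"].tolist())
]
print(rows)
```

I would expect rows built this way to be accepted by spark.createDataFrame(rows, schema=schema) with the LongType schema, with the missing value kept as null rather than 0, but I have not confirmed this across Spark versions.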
【Discussion】:
Tags: pandas apache-spark pyspark apache-spark-sql python-3.7