【Posted】: 2020-03-14 05:38:28
【Question】:
I am trying to create a small DataFrame to hold two scalars (doubles) and a string.
Following How to create spark dataframe with column name which contains dot/period?, I tried:
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
input_data = ([output_stem, paired_p_value, scalar_pearson])
schema = StructType([StructField("Comparison", StringType(), False), \
StructField("Paired p-value", DoubleType(), False), \
StructField("Pearson coefficient", DoubleType(), True)])
df_compare_AF = sqlContext.createDataFrame(input_data, schema)
display(df_compare_AF)
This produces the error message:
TypeError: StructType can not accept object 's3://sanford-biofx-dev/con/dev3/dev' in type <class 'str'>
This makes no sense to me, since that column is meant to hold strings.
My other attempt, based on Add new rows to pyspark Dataframe:
columns = ["comparison", "paired p", "Pearson coefficient"]
vals = [output_stem, paired_p_value, scalar_pearson]
df = spark.createDataFrame(vals, columns)
display(df)
But this raises an error: TypeError: Can not infer schema for type: <class 'str'>
All I want is a small DataFrame:
comparison | paired p-value | Pearson Coefficient
-------------------------------------------------
s3://sadf | 0.045 | -0.039
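Both error messages are consistent with Spark reading the flat list of three values as three separate rows, the first of which is a bare string. A minimal pure-Python sketch of the difference in shape follows; the literal values are placeholders taken from the table above, not the real variables.

```python
# Placeholder values standing in for the question's variables (assumed).
output_stem = "s3://sadf"
paired_p_value = 0.045
scalar_pearson = -0.039

# Parentheses around a single expression do not create a tuple,
# so this is still a flat list of three values.
flat = ([output_stem, paired_p_value, scalar_pearson])

# A trailing comma makes a one-element tuple that holds one row.
one_row_tuple = ([output_stem, paired_p_value, scalar_pearson],)

# A list of lists is an equivalent way to spell one row.
one_row_list = [[output_stem, paired_p_value, scalar_pearson]]

# createDataFrame iterates over its data argument and treats each item as a row,
# so the type of the first item is what the schema check sees first.
for name, data in [("flat", flat),
                   ("one_row_tuple", one_row_tuple),
                   ("one_row_list", one_row_list)]:
    first = data[0]
    print(f"{name}: {len(data)} item(s), first item is a {type(first).__name__}")
```

With the flat list, the first "row" is a str, which cannot be matched against a three-field StructType and gives schema inference nothing to work with.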
【Comments】:
- Replace ([output_stem, paired_p_value, scalar_pearson]) with ([output_stem, paired_p_value, scalar_pearson], ) or [[output_stem, paired_p_value, scalar_pearson]].
- @10465355saysReinstateMonica That works. If you post your solution as an answer, I will accept it.
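The fix from the comments can be sketched roughly as below. The helper names build_rows and build_comparison_df are hypothetical, introduced only for illustration, and the literal values are placeholders. The schema is the one from the question, and a live SparkSession (spark, as in a Databricks notebook) is assumed to exist.

```python
def build_rows(output_stem, paired_p_value, scalar_pearson):
    """Wrap the three values in an outer list so they form exactly one row."""
    return [(output_stem, paired_p_value, scalar_pearson)]


def build_comparison_df(spark, rows):
    """Create the DataFrame from a list of rows using the question's schema."""
    # Imported here so build_rows can be used without pyspark installed.
    from pyspark.sql.types import (
        StructType, StructField, StringType, DoubleType,
    )

    schema = StructType([
        StructField("Comparison", StringType(), False),
        StructField("Paired p-value", DoubleType(), False),
        StructField("Pearson coefficient", DoubleType(), True),
    ])
    return spark.createDataFrame(rows, schema)


# Placeholder values (assumed); in the question these come from earlier code.
rows = build_rows("s3://sadf", 0.045, -0.039)
print(rows)
```

In a notebook this would be used as df_compare_AF = build_comparison_df(spark, rows), followed by display(df_compare_AF). Note that DoubleType fields expect Python floats, so integer values would need converting first.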
Tags: python dataframe apache-spark pyspark databricks