【Published】: 2021-04-01 10:02:25
【Question】:
I have the following PySpark DataFrame df:
df.printSchema()
root
|-- yearday: integer (nullable = true)
|-- month: integer (nullable = true)
|-- dayofweek: integer (nullable = true)
|-- year: integer (nullable = true)
When I apply VectorAssembler, the values in features appear as strings instead of the original integer values.
from pyspark.ml.feature import VectorAssembler
vectorAssembler = VectorAssembler(inputCols=['yearday', 'month', 'dayofweek', 'year'], outputCol='features')
df = vectorAssembler.transform(df)
df.select(['features']).show()
This is what the output looks like:
How can I get the integer values in features?
【Discussion】:
Tags: python apache-spark pyspark apache-spark-ml