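【Posted】: 2016-01-15 04:29:51
【Question】:
Spark has no out-of-the-box support for reading Excel files, so I first read the Excel file into a pandas DataFrame and then try to convert that pandas DataFrame into a Spark DataFrame, but I get the error below (I am using Spark 1.5.1):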
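import pandas as pd
from pandas import ExcelFile
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
# Run in the pyspark shell, where sqlContext is already defined
# Read the Excel sheet into a pandas DataFrame
pdf = pd.read_excel('/home/testdata/test.xlsx')
# Converting the pandas DataFrame to a Spark DataFrame raises the error below
df = sqlContext.createDataFrame(pdf)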
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/spark-hadoop/python/pyspark/sql/context.py", line 406, in createDataFrame
    rdd, schema = self._createFromLocal(data, schema)
  File "/opt/spark/spark-hadoop/python/pyspark/sql/context.py", line 337, in _createFromLocal
    data = [schema.toInternal(row) for row in data]
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 541, in toInternal
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 541, in <genexpr>
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 435, in toInternal
    return self.dataType.toInternal(obj)
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 191, in toInternal
    else time.mktime(dt.timetuple()))
AttributeError: 'datetime.time' object has no attribute 'timetuple'
Does anyone know how to fix this?
【Comments】:
- Could you post a link to your test.xlsx?
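A note on the traceback above: the last frame calls dt.timetuple() on a datetime.time value, which suggests that at least one column of the sheet holds time-of-day cells and that Spark is treating them as timestamps. The following is only a minimal sketch of one possible workaround, using the standard library alone. The helper names time_to_datetime and time_to_string, and the choice of 1970-01-01 as an anchor date, are assumptions made for illustration and do not come from the question.

```python
import datetime

# Hypothetical anchor date; any date works if only the time of day matters.
BASE_DATE = datetime.date(1970, 1, 1)


def time_to_datetime(value, base_date=BASE_DATE):
    """Turn a datetime.time into a datetime.datetime; pass other values through."""
    if isinstance(value, datetime.time):
        # datetime.datetime has timetuple(); datetime.time does not.
        return datetime.datetime.combine(base_date, value)
    return value


def time_to_string(value):
    """Alternative: keep the time of day as an ISO string such as '09:30:00'."""
    if isinstance(value, datetime.time):
        return value.isoformat()
    return value


cell = datetime.time(9, 30, 0)
print(hasattr(cell, "timetuple"))                    # False
print(hasattr(time_to_datetime(cell), "timetuple"))  # True
print(time_to_string(cell))                          # 09:30:00
```

Under these assumptions, either helper could be applied to the affected pandas column before calling createDataFrame, for example with pdf[col] = pdf[col].map(time_to_string), where col stands for whichever column actually contains the time values. Which column that is cannot be determined without the spreadsheet itself.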