【Posted】:2019-09-13 04:54:09
【Question】:
My input file contains the following data:
"date","time","size","r_version","r_arch","r_os"
"2012-10-01","00:30:13",35165,"2.15.1","i686","linux-gnu"
"2012-10-01","00:30:15",212967,"2.15.1","i686","linux-gnu"
"2012-10-01","02:30:16",167199,"2.15.1","x86_64","linux-gnu"
My current output looks like this
The output I need is
I tried the code below:
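from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

conf = SparkConf().setMaster("local").setAppName("logfile")
sc = SparkContext(conf=conf)
spark = SparkSession.builder.appName("yuva").getOrCreate()
# Read the file as an RDD of raw text lines
lines = sc.textFile("file:///SaprkCourse/filelog.txt")
# Drop the header row, then split each remaining line on commas
header = lines.first()
lines = lines.filter(lambda row: row != header)
values = lines.map(lambda x: x.split(","))
# A plain split keeps the double quotes in both column names and values
df = values.toDF(header.split(","))
df.show()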
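
The expected output is not shown above, so the following is only a sketch. It assumes the goal is to get column names and values without the surrounding double quotes. Instead of `split(",")`, each line is parsed with Python's standard `csv` module, which understands quoted fields. The helper names `parse_csv_line` and `build_dataframe` are hypothetical and introduced here for illustration only.

```python
import csv


def parse_csv_line(line):
    """Split one CSV line into fields, removing the enclosing double quotes."""
    # csv.reader accepts any iterable of lines; wrap the single line in a list
    return next(csv.reader([line]))


def build_dataframe(spark, path):
    """Read a quoted CSV file through an RDD and return a DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    lines = spark.sparkContext.textFile(path)
    header = lines.first()
    # Skip the header row, then parse every remaining line
    rows = lines.filter(lambda row: row != header).map(parse_csv_line)
    # Column names come from the parsed header, so they carry no quotes
    return rows.toDF(parse_csv_line(header))
```

With the session created in the question, this could be called as `build_dataframe(spark, "file:///SaprkCourse/filelog.txt").show()`. All columns come back as strings, including `size`. If the RDD step is not required, `spark.read.csv(path, header=True, inferSchema=True)` is likely the simpler route, since Spark's CSV reader treats `"` as the quote character by default.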
【Discussion】:
Tags: apache-spark pyspark apache-spark-sql pyspark-sql