[Posted]: 2021-08-09 00:17:02
[Question]:
I am using the following code:
def process_row(row):
    items = row.replace('"', '')
    items2 = items.split(' ')
    for x in items2:
        items2.append(x.replace('-', '0'))
    return [string(items[0]), string(items[1]), string(items[2]),
            string(items[3]), string(items[4]), int(items[5])]

nasa = (
    nasa_raw.map(process_row)
)

for row in nasa.take(5):
    print(row)
on this text file:
in24.inetnebr.com [01/Aug/1995:00:00:01] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt" 200 1839
uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/ksclogo-medium.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/MOSAIC-logosmall.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/USA-logosmall.gif" 304 0
ix-esc-ca2-07.ix.netcom.com [01/Aug/1995:00:00:09] "GET /images/launch-logo.gif" 200 1713
uplherc.upl.com [01/Aug/1995:00:00:10] "GET /images/WORLD-logosmall.gif" 304 0
slppp6.intermind.net [01/Aug/1995:00:00:10] "GET /history/skylab/skylab.html" 200 1687
piweba4y.prodigy.com [01/Aug/1995:00:00:10] "GET /images/launchmedium.gif" 200 11853
slppp6.intermind.net [01/Aug/1995:00:00:11] "GET /history/skylab/skylab-small.gif" 200 9202
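I can see that my replace call is working and that the quotes are being replaced with whitespace. The split call seems to fail: I expect a list of tokens for each row, but that is not the result I get.
What am I missing here?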
[Comments]:
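The split itself is probably not the culprit. Three other things in the posted function stand out: the for loop appends to items2 while iterating over it, so it never terminates; string is not a Python builtin (str is); and the return statement indexes items, which is still the whole string, so items[0] is a single character rather than the first token. Also note that replace('"', '') removes the quotes instead of turning them into spaces. Below is a minimal sketch of what the function seems to be aiming for. It assumes every line has exactly six whitespace-separated fields once the quotes are gone (true for the sample shown), and it assumes the '-' replacement was meant for a size field that is a lone '-', not for hyphens inside hostnames or paths. It is plain Python so it can be tried without a cluster.

```python
def process_row(row):
    """Split one log line into [host, timestamp, method, path, status, size]."""
    # Remove the quotes around the request, then split on runs of whitespace.
    tokens = row.replace('"', '').split()

    # Build a new list rather than appending to the list being iterated.
    # Assumption: only a field that is exactly '-' should become '0'.
    tokens = ['0' if t == '-' else t for t in tokens]

    # Assumption: exactly six fields per line; anything else raises ValueError.
    host, timestamp, method, path, status, size = tokens

    # Same shape as the original return: five strings and one int.
    return [host, timestamp, method, path, status, int(size)]


if __name__ == "__main__":
    sample = [
        'uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0',
        'slppp6.intermind.net [01/Aug/1995:00:00:10] '
        '"GET /history/skylab/skylab.html" 200 1687',
    ]
    # With Spark this would be nasa_raw.map(process_row), assuming nasa_raw
    # is an RDD with one log line per element.
    for parsed in map(process_row, sample):
        print(parsed)
```

If real log lines carry extra fields (for example a protocol version inside the quoted request), the six-way unpacking will fail loudly, which is usually easier to debug than silently mis-indexed fields.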
Tags: python apache-spark pyspark bigdata databricks