【Posted】:2018-01-29 11:48:06
【Question】:
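I have a dataframe with the following columns:

category | category_id | bucket | prop_count | event_count | accum_prop_count | accum_event_count
-------------------------------------------------------------------------------------------------
nation   | nation      | 1      | 222        | 444         | 555              | 6677

This dataframe starts with 0 rows, and each function in my script adds a row to it.
One function needs to modify 1 or 2 cell values based on a condition. How can I do this?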
Code:

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("category", StringType()),
    StructField("category_id", StringType()),
    StructField("bucket", StringType()),
    StructField("prop_count", StringType()),
    StructField("event_count", StringType()),
    StructField("accum_prop_count", StringType()),
])
# Start from an empty dataframe and append the first row.
a_df = sqlContext.createDataFrame([], schema)
a_temp = sqlContext.createDataFrame([("nation", "nation", 1, 222, 444, 555)], schema)
a_df = a_df.unionAll(a_temp)

Rows added from another function:

a_temp3 = sqlContext.createDataFrame([("nation", "state", 2, 222, 444, 555)], schema)
a_df = a_df.unionAll(a_temp3)

Now, to modify a row, I am trying a join with a condition:

a_temp4 = sqlContext.createDataFrame([("state", "state", 2, 444, 555, 666)], schema)
a_df = a_df.join(a_temp4, [(a_df.category_id == a_temp4.category_id) & (some other cond here)], how="inner")

But this code is not working. I am getting an error:
+--------+-----------+------+----------+-----------+----------------+--------+-----------+------+----------+-----------+----------------+
|category|category_id|bucket|prop_count|event_count|accum_prop_count|category|category_id|bucket|prop_count|event_count|accum_prop_count|
+--------+-----------+------+----------+-----------+----------------+--------+-----------+------+----------+-----------+----------------+
|  nation|      state|     2|       222|        444|             555|   state|      state|     2|       444|        555|             666|
+--------+-----------+------+----------+-----------+----------------+--------+-----------+------+----------+-----------+----------------+
How can I fix this? The correct output should have 2 rows, and the second row should contain the updated values.
【Comments】:
Tags: python apache-spark dataframe sql-update