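【Posted】: 2020-05-20 20:11:31
【Question】:
There are several variations of this question on the web, but none is quite what I am looking for. I have a dataframe like this: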
+------+-------+------------+---------------+----------------+--------+---------+
|SEQ_ID|TOOL_ID|isfleetlevel|is_golden_limit|use_golden_limit|New_UL  |New_LL   |
+------+-------+------------+---------------+----------------+--------+---------+
|790026|9160   |0           |1              |0               |26.1184 |23.2954  |
|790026|13509  |0           |0              |1               |Infinity|-Infinity|
|790026|9162   |0           |0              |0               |25.03535|23.48585 |
|790026|13510  |0           |0              |1               |Infinity|-Infinity|
|790048|9162   |0           |0              |0               |33.5    |30.5     |
|790048|13509  |0           |0              |1               |Infinity|-Infinity|
|790048|13510  |0           |0              |0               |NaN     |NaN      |
|790048|9160   |0           |1              |0               |33.94075|30.75925 |
+------+-------+------------+---------------+----------------+--------+---------+
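Within each SEQ_ID, I want to replace the New_UL and New_LL values of the rows where use_golden_limit is 1 with the values from the row where is_golden_limit is 1. So, in this case, the expected result is: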
+------+-------+------------+---------------+----------------+--------+---------+
|SEQ_ID|TOOL_ID|isfleetlevel|is_golden_limit|use_golden_limit|New_UL  |New_LL   |
+------+-------+------------+---------------+----------------+--------+---------+
|790026|9160   |0           |1              |0               |26.1184 |23.2954  |
|790026|13509  |0           |0              |1               |26.1184 |23.2954  |
|790026|9162   |0           |0              |0               |25.03535|23.48585 |
|790026|13510  |0           |0              |1               |26.1184 |23.2954  |
|790048|9162   |0           |0              |0               |33.5    |30.5     |
|790048|13509  |0           |0              |1               |33.94075|30.75925 |
|790048|13510  |0           |0              |0               |NaN     |NaN      |
|790048|9160   |0           |1              |0               |33.94075|30.75925 |
+------+-------+------------+---------------+----------------+--------+---------+
Is this possible?
【Discussion】:
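To pin down the rule being asked for, here is a minimal plain-Python sketch of the per-SEQ_ID replacement, run on the sample rows above. It is not PySpark code and not a proposed solution from the thread; the function name apply_golden_limits and the "first golden row wins" tie-break are assumptions made only for illustration. In PySpark the same rule would typically be expressed with a window partitioned by SEQ_ID or a self-join on SEQ_ID.

```python
import math

# Sample rows copied from the dataframe above, in column order:
# (SEQ_ID, TOOL_ID, isfleetlevel, is_golden_limit, use_golden_limit, New_UL, New_LL)
rows = [
    (790026, 9160, 0, 1, 0, 26.1184, 23.2954),
    (790026, 13509, 0, 0, 1, math.inf, -math.inf),
    (790026, 9162, 0, 0, 0, 25.03535, 23.48585),
    (790026, 13510, 0, 0, 1, math.inf, -math.inf),
    (790048, 9162, 0, 0, 0, 33.5, 30.5),
    (790048, 13509, 0, 0, 1, math.inf, -math.inf),
    (790048, 13510, 0, 0, 0, math.nan, math.nan),
    (790048, 9160, 0, 1, 0, 33.94075, 30.75925),
]


def apply_golden_limits(rows):
    """Hypothetical helper: copy the golden limits onto use_golden_limit rows."""
    # Pass 1: remember (New_UL, New_LL) of the first golden row per SEQ_ID.
    # Assumption: if several rows had is_golden_limit == 1, the first one wins.
    golden = {}
    for seq_id, _tool, _fleet, is_golden, _use, ul, ll in rows:
        if is_golden == 1 and seq_id not in golden:
            golden[seq_id] = (ul, ll)

    # Pass 2: overwrite the limits only where use_golden_limit == 1 and a
    # golden row exists for that SEQ_ID; every other row is left untouched.
    result = []
    for seq_id, tool, fleet, is_golden, use, ul, ll in rows:
        if use == 1 and seq_id in golden:
            ul, ll = golden[seq_id]
        result.append((seq_id, tool, fleet, is_golden, use, ul, ll))
    return result


result = apply_golden_limits(rows)
for row in result:
    print(row)
```

Rows with use_golden_limit = 0 keep their original values, including the NaN row, which matches the expected table.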
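-
Is is_golden_limit = 1 expected in more than one row?
-
@Mitodina, ideally is_golden_limit = 1 should not occur in more than one row. I have code that identifies such cases so they can be handled separately. It is a good question, though: if there were multiple rows with is_golden_limit = 1, would the first value be taken?
-
@thentangler please check my solution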
Tags: pyspark pyspark-sql pyspark-dataframes