【Posted】: 2022-12-14 12:59:48
【Question】:
Input DataFrame:
| class | malecount | femalecount |
|---|---|---|
| A | 2 | 1 |
| B | 3 | 1 |
| C | 0 | 3 |
| D | 2 | 4 |
Expected output DataFrame:
| Class | Gender |
|---|---|
| A | m |
| A | m |
| B | m |
| B | m |
| B | m |
| D | m |
| D | m |
| A | f |
| B | f |
| C | f |
| C | f |
| C | f |
| D | f |
| D | f |
| D | f |
| D | f |
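【Discussion】:
You can create arrays of male and female markers for each class and then explode them into one row per marker.
See the example below.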
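from pyspark.sql import functions as func

# Assumes data_sdf is the input DataFrame (columns: class, malecount, femalecount).
(
    data_sdf
    # Build "m,m,..." and "f,f,..." strings, one letter per counted person.
    .withColumn('male_arr', func.expr('concat_ws(",", array_repeat("m", cast(malecount as int)))'))
    .withColumn('female_arr', func.expr('concat_ws(",", array_repeat("f", cast(femalecount as int)))'))
    # Turn empty strings into null so concat_ws skips them and leaves no stray comma.
    .withColumn('male_female', func.concat_ws(',',
        func.expr('if(male_arr="", null, male_arr)'),
        func.expr('if(female_arr="", null, female_arr)')))
    # Split the combined string back into an array and emit one row per element.
    .selectExpr('class', 'explode(split(male_female, ",")) as gender')
    .show()
)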
# +-----+------+
# |class|gender|
# +-----+------+
# |    A|     m|
# |    A|     m|
# |    A|     f|
# |    B|     m|
# |    B|     m|
# |    B|     m|
# |    B|     f|
# |    C|     f|
# |    C|     f|
# |    C|     f|
# |    D|     m|
# |    D|     m|
# |    D|     f|
# |    D|     f|
# |    D|     f|
# |    D|     f|
# +-----+------+
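
To sanity-check the result on a small sample, the same repeat-then-explode logic can be sketched in plain Python without Spark. The function name `expand_counts` and the tuple layout of the rows are illustrative assumptions, not part of the answer above.

```python
from typing import Iterable, List, Tuple


def expand_counts(rows: Iterable[Tuple[str, int, int]]) -> List[Tuple[str, str]]:
    """Emit one (class, gender) pair per counted person (hypothetical helper)."""
    out: List[Tuple[str, str]] = []
    for cls, malecount, femalecount in rows:
        # Same idea as array_repeat followed by explode: a zero count adds no rows.
        out.extend((cls, "m") for _ in range(malecount))
        out.extend((cls, "f") for _ in range(femalecount))
    return out


# The sample data from the question: (class, malecount, femalecount).
rows = [("A", 2, 1), ("B", 3, 1), ("C", 0, 3), ("D", 2, 4)]
result = expand_counts(rows)
print(len(result))  # 16 rows, the sum of all counts
```

The rows come out grouped by class, as in the `show()` output above. The question lists all `m` rows before the `f` rows, but Spark does not guarantee row order unless you sort explicitly.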