How can I modify the code to handle empty arrays too? - Answer

【Question Title】: How can I modify the code to handle empty arrays too?
【Posted】: 2022-08-09 17:36:24
【Question Description】:

I have the following code:

from pyspark.sql import functions as F

L = {'L1': ['us']}
#df1 = df1.withColumnRenamed("name","OriginalCompanyName")
for key, vals in L.items():
    # regex pattern for extracting vals; \\b is doubled because
    # Spark SQL unescapes backslashes inside the string literal below
    pat = r'\\b(%s)\\b' % '|'.join(vals)

    # extract matching occurrences from the space-joined loc array
    col1 = F.expr("regexp_extract_all(array_join(loc, ' '), '%s')" % pat)

    # Mask the rows with null when there are no matches
    df1 = df1.withColumn(key, F.when((F.size(col1) == 0), None).otherwise(col1))
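To see what happens to an empty loc, here is a plain-Python model of what the loop body computes for a single row. It is not Spark code: the helper name extract_like_spark is hypothetical, ' '.join stands in for array_join(loc, ' '), and re.findall with one capture group stands in for regexp_extract_all (which returns group 1 by default). It assumes Python's \b behaves like Java's for these ASCII strings. An empty loc is joined into an empty string, so nothing can match and the masking step produces null for that row.

```python
import re
from typing import List, Optional


def extract_like_spark(loc: List[str], vals: List[str]) -> Optional[List[str]]:
    """Hypothetical plain-Python model of one row of the question's loop body."""
    # Same alternation as the question; Python's re reads \b directly,
    # so no SQL-style backslash doubling is needed here
    pat = r'\b(%s)\b' % '|'.join(vals)
    # array_join(loc, ' '): an empty array becomes the empty string
    text = ' '.join(loc)
    # regexp_extract_all(text, pat) returns capture group 1 of every match;
    # re.findall with a single group returns the same strings
    matches = re.findall(pat, text)
    # F.when(F.size(col1) == 0, None).otherwise(col1)
    return matches if matches else None


rows = [
    ["this is ,us, better life"],
    ["no one is, in charge"],
    ["I am, very far, from us"],
    [],
]
for loc in rows:
    print(loc, extract_like_spark(loc, ['us']))
```

This also suggests why adding '[]' to vals cannot help: by the time the regex runs, an empty array has already become the empty string, so there is no literal [] text to match.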

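It extracts us from the loc column: the key column contains ["us"] when there is a match and null otherwise. My loc column also contains some empty lists []. When loc is empty, I also want to put us in the key column. If I change L = {'L1': ['us']} to L = {'L1': ['us', '[]']}, it doesn't work.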

For some reason, this code actually eliminates the rows where loc is empty. How can I modify the code?

Hint: the empty loc values can be found with the following code:

df1 = df1.withColumn('empty_country', F.when(F.size('loc') == 0, 'us'))

Data sample:

loc
["this is ,us, better life"]
["no one is, in charge"]
["I am, very far, from us"]
[]

Expected output:

loc                                 L1
["this is ,us, better life"]        ["us"]
["no one is, in charge"]            null
["I am, very far, from us"]         ["us"]
[]                                  ["us"]

Tags: pyspark

【Solution 1】:

Change the last line of the for loop to:

df1 = df1.withColumn(
    key,
    F.when((F.size(col1) == 0) & (F.size('loc') != 0), None)  # no match in a non-empty loc -> null
     .when(F.size('loc') == 0, F.array(F.lit('us')))          # empty loc -> ["us"]
     .otherwise(col1)                                         # otherwise keep the matches
)

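PS: The output of regexp_extract_all is an array, so the value used for an empty loc must also be an array (F.array(F.lit('us'))) for the branches of the when expression to share the same type.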
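As a quick check of the three-way logic, the following plain-Python model reproduces the expected output from the question on the sample rows. It is again not Spark code: fixed_row is a hypothetical name, and its if-statements mirror the when/when/otherwise chain, which Spark evaluates in order, taking the first condition that is true.

```python
import re
from typing import List, Optional


def fixed_row(loc: List[str], vals: List[str]) -> Optional[List[str]]:
    """Hypothetical plain-Python model of the answer's when/when/otherwise chain."""
    # col1: the matches regexp_extract_all would return for this row
    col1 = re.findall(r'\b(%s)\b' % '|'.join(vals), ' '.join(loc))
    # Branches are checked in order; the first true condition wins
    if len(col1) == 0 and len(loc) != 0:
        return None      # non-empty loc with no match -> null
    if len(loc) == 0:
        return ['us']    # empty loc -> F.array(F.lit('us'))
    return col1          # otherwise keep the extracted matches


rows = [
    ["this is ,us, better life"],
    ["no one is, in charge"],
    ["I am, very far, from us"],
    [],
]
for loc in rows:
    print(loc, fixed_row(loc, ['us']))
```

Because an empty loc always yields an empty match list, testing F.size('loc') == 0 first and F.size(col1) == 0 second should give the same result without the & (F.size('loc') != 0) guard. The answer's ordering simply makes both cases explicit.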

【Discussion】: