PySpark: create multiple rows for a record that includes a time range

[Question title]: Pyspark create multiple rows for a record that include a time range
[Posted]: 2023-03-03 13:23:01
[Question description]:

I have a DataFrame like this.

A  Start  End
1  1578   1581
1  1789   1790
2  1800   1802

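Start and End are epoch timestamps. I want to create multiple rows, one for every second in each range.

How can I do this in PySpark? (The order does not need to be preserved.)

Thanks!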

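[Comments]:

You need a non-equi join against a helper DataFrame that contains all the seconds. Is the value of A in the second row really 1, or is that a typo?
@StefanoGallotti It is 1. That is to show you that A may not be distinct in the dataset.

Tags: python pyspark timestamp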

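[Solution 1]:

The idea is to create a list that covers the whole time span by including the intermediate seconds. For example, for Start = 1578 and End = 1581 we create the list [1578, 1579, 1580, 1581]. To build this list we first define a UDF. Once we have the list, we explode it to get the required DataFrame.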

# Creating the DataFrame
values = [(1,1578,1581),(1,1789,1790),(2,1800,1802)]
df = sqlContext.createDataFrame(values,['A','Start','End'])
df.show()
+---+-----+----+
|  A|Start| End|
+---+-----+----+
|  1| 1578|1581|
|  1| 1789|1790|
|  2| 1800|1802|
+---+-----+----+

# Import only the functions and types used below
from pyspark.sql.functions import udf, col, explode
from pyspark.sql.types import ArrayType, IntegerType

# UDF that returns every second from start to end (inclusive) as a list
def make_list(start, end):
    return list(range(start, end + 1))
make_list_udf = udf(make_list, ArrayType(IntegerType()))

#Creating Lists of seconds finally.
df = df.withColumn('my_list',make_list_udf(col('Start'),col('End'))).drop('Start','End')
df.show(truncate=False)
+---+------------------------+
|A  |my_list                 |
+---+------------------------+
|1  |[1578, 1579, 1580, 1581]|
|1  |[1789, 1790]            |
|2  |[1800, 1801, 1802]      |
+---+------------------------+

#Exploding the Lists
df = df.withColumn('time', explode('my_list')).drop('my_list')
df.show()
+---+----+
|  A|time|
+---+----+
|  1|1578|
|  1|1579|
|  1|1580|
|  1|1581|
|  1|1789|
|  1|1790|
|  2|1800|
|  2|1801|
|  2|1802|
+---+----+

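[Comments]:

[Solution 2]:

Assuming your data is in a DataFrame df and you have a helper DataFrame s_df containing the seconds (in a column named time), you can do this: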

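from pyspark.sql.functions import col

# Keep every second b.time that falls inside the range [a.Start, a.End]
df.alias("a").join(s_df.alias("b"), (col("a.Start") <= col("b.time")) & (col("a.End") >= col("b.time")), "inner").select(col("a.A"), col("b.time"))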

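This may be a problem if the ranges for the same "A" overlap. In that case you may want to make "A" unique, so that you can tell which epoch range each row belongs to.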

[Comments]: