【Question title】: Python DataFrame explode running slow
【Posted】: 2022-06-18 00:11:05
【Question description】:

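I have a Python (PySpark) DataFrame and am trying to use explode on the column "transactions_details", which contains arrays of elements. Returning the data takes a very long time. Is there an alternative to 'explode'?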

Here is my code:

from pyspark.sql.functions import col, explode

# Each array field is exploded in its own withColumn call.
df2 = df1.withColumn("transactions_details.itemid", explode(col("transactions.details.itemid")))\
    .withColumn("transactions_details.expiration", explode(col("transactions.details.expiration")))\
    .withColumn("transactions_details.externalitemid", explode(col("transactions.details.externalitemid")))\
    .withColumn("transactions_details.from_sitelocationid", explode(col("transactions.details.from_sitelocationid")))\
    .withColumn("transactions_details.itemDescription", explode(col("transactions.details.itemDescription")))\
    .withColumn("transactions_details.lot", explode(col("transactions.details.lot")))\
    .withColumn("transactions_details.ndcCode", explode(col("transactions.details.ndcCode")))\
    .withColumn("transactions_details.ndcCode10Digit", explode(col("transactions.details.ndcCode10Digit")))\
    .withColumn("transactions_details.ndcDesc", explode(col("transactions.details.ndcDesc")))\
    .withColumn("transactions_details.qty", explode(col("transactions.details.qty")))\
    .withColumn("transactions_details.to_sitelocationid", explode(col("transactions.details.to_sitelocationid")))\
    .drop("transactions")
display(df2)
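
One likely reason the code above is slow: each explode emits one row per array element, so chaining explodes over several independent array columns multiplies the row count at every step. The sketch below is a plain-Python model of that row arithmetic, not Spark code; the sample `row`, its values, and the helper names `chained_explode` and `single_explode` are made up for illustration.

```python
from itertools import product

# Hypothetical record: one input row whose details hold 3 line items,
# stored as parallel arrays (one array per field), as in the question.
row = {
    "itemid": [1, 2, 3],
    "lot": ["A", "B", "C"],
    "qty": [10, 20, 30],
}

def chained_explode(row):
    """Model of exploding each array column separately:
    every explode multiplies the row count, giving a cross product."""
    names = list(row)
    return [dict(zip(names, combo)) for combo in product(*row.values())]

def single_explode(row):
    """Model of exploding once over the zipped arrays:
    one output row per line item, fields aligned by position."""
    names = list(row)
    return [dict(zip(names, combo)) for combo in zip(*row.values())]

chained = chained_explode(row)
aligned = single_explode(row)

print(len(chained))  # 3 ** 3 = 27 rows, most of them mismatched combinations
print(len(aligned))  # 3 rows, one per line item
```

With the eleven fields in the question, an input row holding n detail items would grow to n ** 11 rows under the chained pattern. In PySpark terms, and assuming `transactions.details` is an array of structs (the question does not show the schema), the aligned version would correspond to a single `explode(col("transactions.details"))` followed by selecting the struct's fields. If the arrays live in separate columns instead, applying `arrays_zip` before one `explode` plays the same role.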

【Discussion】:

  • Can you share what your desired output looks like?

Tags: python pyspark

