[Posted]: 2022-10-01 00:57:26
[Question]:
I have a PySpark DataFrame that looks like this:
sdf1 = sc.parallelize([["toto", "tata", ["table", "column"], "SELECT {1} FROM {0}"], ["titi", "tutu", ["table", "column"], "SELECT {1} FROM {0}"]]).toDF(["table", "column", "parameters", "statement"])
+-----+------+---------------+-------------------+
|table|column| parameters| statement|
+-----+------+---------------+-------------------+
| toto| tata|[table, column]|SELECT {1} FROM {0}|
| titi| tutu|[table, column]|SELECT {1} FROM {0}|
+-----+------+---------------+-------------------+
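I am trying to map the elements of the "parameters" array to the columns they name, so that "statement" ends up formatted with the values from those columns.
This is what I expect after the transformation: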
sdf2 = sc.parallelize([["toto", "tata", ["table", "column"], "SELECT {1} FROM {0}", "SELECT tata FROM toto"], ["titi", "tutu", ["table", "column"], "SELECT {1} FROM {0}", "SELECT tutu FROM titi"]]).toDF(["table", "column", "parameters", "statement", "result"])
+-----+------+---------------+-------------------+---------------------+
|table|column| parameters| statement| result|
+-----+------+---------------+-------------------+---------------------+
| toto| tata|[table, column]|SELECT {1} FROM {0}|SELECT tata FROM toto|
| titi| tutu|[table, column]|SELECT {1} FROM {0}|SELECT tutu FROM titi|
+-----+------+---------------+-------------------+---------------------+
Tags: arrays dataframe dictionary pyspark format-string