【Posted】:2020-11-12 13:25:38
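【Question】:
I want to create dummy variables (one-hot encoding) for my categories using BigQuery instead of Pandas. I will end up with about 200 columns, so I cannot write them out by hand and hard-code them.
Test dataset (the real one has many more variables than this):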
WITH table AS (
  SELECT 1001 AS ID, 'blue' AS Color, 'big' AS size UNION ALL
  SELECT 1002 AS ID, 'yellow' AS Color, 'medium' AS size UNION ALL
  SELECT 1003 AS ID, 'red' AS Color, 'small' AS size UNION ALL
  SELECT 1004 AS ID, 'blue' AS Color, 'small' AS size
)
SELECT *
FROM table
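To make the goal concrete, here is a minimal sketch of the hard-coded form that the question wants to avoid, written for the four sample rows only. The CTE name `sample_data` and the output column names (`Color_blue`, `size_big`, and so on) are assumptions chosen for illustration, not something stated in the question. With around 200 distinct values, every `IF(...)` line would have to be written by hand, which is exactly the problem.

```sql
-- Hand-written one-hot encoding for the sample rows.
-- `sample_data` and the output column names are illustrative assumptions.
WITH sample_data AS (
  SELECT 1001 AS ID, 'blue' AS Color, 'big' AS size UNION ALL
  SELECT 1002 AS ID, 'yellow' AS Color, 'medium' AS size UNION ALL
  SELECT 1003 AS ID, 'red' AS Color, 'small' AS size UNION ALL
  SELECT 1004 AS ID, 'blue' AS Color, 'small' AS size
)
SELECT
  ID,
  -- One 0/1 column per distinct Color value
  IF(Color = 'blue', 1, 0) AS Color_blue,
  IF(Color = 'yellow', 1, 0) AS Color_yellow,
  IF(Color = 'red', 1, 0) AS Color_red,
  -- One 0/1 column per distinct size value
  IF(size = 'big', 1, 0) AS size_big,
  IF(size = 'medium', 1, 0) AS size_medium,
  IF(size = 'small', 1, 0) AS size_small
FROM sample_data
ORDER BY ID
```

Each row gets exactly one 1 among the `Color_*` columns and one among the `size_*` columns. The list of `IF(...)` expressions depends on the distinct values in the data, so it has to be generated rather than typed when there are many categories.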
Expected result:
【Comments】:
Tags: google-bigquery one-hot-encoding dummy-variable