[Question title]: Is there a way to alter a column in a Hive table that is stored as ORC?
[Posted]: 2016-11-30 08:01:02
[Question description]:

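There is already a general question about Hive (Is there a way to alter column type in hive table?). The answers to that question show that the schema can be changed with the ALTER TABLE CHANGE command.

However, is this also possible when the files are stored as ORC?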

[Question comments]:

Tags: hive orc

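[Solution 1]:

You can load the ORC files into PySpark and rewrite them with the new column type (spark below is the SparkSession that the pyspark shell provides).

Load the data into a DataFrame:

df = spark.read.format("orc").load("<path-of-file-in-hdfs>")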

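Register the DataFrame as a temporary view so it can be queried with SQL (createOrReplaceTempView returns None, so there is nothing to assign):

df.createOrReplaceTempView('Table')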

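Create a new DataFrame with the converted column. List the other columns explicitly (first_column and second_column are placeholders for the table's real columns); select * combined with a cast aliased as third_column would produce two columns named third_column:

df3 = spark.sql("select first_column, second_column, cast(third_column as float) as third_column from Table")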
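Instead of typing the other column names by hand, the select list can be generated from df.columns. The sketch below is plain Python; build_cast_query is a hypothetical helper (not part of PySpark), and it assumes the view and column names only need backtick quoting. It lists every column exactly once, so the result keeps a single third_column in its original position rather than the duplicate that select * plus an aliased cast would give.

```python
def build_cast_query(columns, target, new_type, view):
    """Build a Spark SQL SELECT that casts one column and keeps the rest as-is.

    Hypothetical helper: every column is listed explicitly (instead of
    `select *`) so the output contains exactly one column named `target`.
    """
    if target not in columns:
        raise ValueError(f"column {target!r} not found")
    select_list = [
        # Cast the target column and alias it back to its own name.
        f"cast(`{c}` as {new_type}) as `{c}`" if c == target else f"`{c}`"
        for c in columns
    ]
    return f"select {', '.join(select_list)} from `{view}`"


print(build_cast_query(["first_column", "second_column", "third_column"],
                       "third_column", "float", "Table"))
# select `first_column`, `second_column`, cast(`third_column` as float) as `third_column` from `Table`
```

In PySpark it would be called as df3 = spark.sql(build_cast_query(df.columns, "third_column", "float", "Table")). The DataFrame API reaches the same result without a temporary view: df.withColumn("third_column", df["third_column"].cast("float")) replaces the existing column in place.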

Save the DataFrame to HDFS:

df3.write.format("orc").save("<hdfs-path-where-file-needs-to-be-saved>")

[Comments]:

[Solution 2]:

I ran a test on an ORC table. Converting a string column to a float column works:

ALTER TABLE test_orc CHANGE third_column third_column float;

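This converts the column named third_column, declared as string, into a float column. Renaming the column in the same statement is also possible.

Side note: I was curious whether other changes to an ORC table would cause problems. I got an exception when trying to reorder columns: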

ALTER TABLE test_orc CHANGE third_column third_column float AFTER first_column;

The exception was: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Reordering columns is not supported for table default.test_orc. SerDe may be incompatible.
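The two statements tested in this answer differ only in the trailing AFTER clause. The small Python sketch below assembles both forms to make the CHANGE syntax explicit (old name, new name, type, optional position). build_change_column is a hypothetical helper; in the test described here, only the form without AFTER succeeded on the ORC table.

```python
def build_change_column(table, old_name, new_name, new_type, after=None):
    """Assemble a HiveQL ALTER TABLE ... CHANGE statement (hypothetical helper).

    Using the same value for old_name and new_name changes only the type;
    a different new_name also renames the column. `after` appends an AFTER
    clause that moves the column, the variant that failed on ORC in this test.
    """
    stmt = f"ALTER TABLE {table} CHANGE {old_name} {new_name} {new_type}"
    if after is not None:
        stmt += f" AFTER {after}"
    return stmt + ";"


# Type change only (worked on the ORC table in this test)
print(build_change_column("test_orc", "third_column", "third_column", "float"))
# ALTER TABLE test_orc CHANGE third_column third_column float;

# Type change plus reorder (raised the DDLTask error above on the ORC table)
print(build_change_column("test_orc", "third_column", "third_column", "float",
                          after="first_column"))
# ALTER TABLE test_orc CHANGE third_column third_column float AFTER first_column;
```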

[Comments]: