Alter Hive table to add or drop a column: answers

【Question title】: Alter hive table add or drop column
【Posted】: 2016-03-15 20:46:29
【Question description】:

I have an ORC table in Hive and I want to drop a column from it:

ALTER TABLE table_name drop  col_name;

But I get the following exception:

Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1' expecting PARTITION near 'drop' in drop partition statement

Can anyone help me or offer any ideas? Note that I am using Hive 0.14.

【Discussion】:

Tags: hadoop hive

【Solution 1】:

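For an external table this is simple. Drop the table (for an external table this removes only the metastore entry, not the data files), edit the CREATE TABLE statement, and then recreate the table with the new schema. Example table: aparup_test.tbl_schema_change, from which the column id will be dropped. Steps: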

------------- show create table to fetch schema ------------------

spark.sql("""
show create table aparup_test.tbl_schema_change
""").show(100,False)

o/p:
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP, id BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
  'parquet.compress' = 'snappy'
)

------------- drop table --------------------------------

spark.sql("""
drop table aparup_test.tbl_schema_change
""").show(100,False)

------------- edit create table schema by dropping column "id"------------------

spark.sql("""
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
  'parquet.compress' = 'snappy'
)
""")

------------- sync up table partitions with parquet files (needed only if the table is partitioned) ------------------

spark.sql("""
msck repair table aparup_test.tbl_schema_change
""").show(100,False)

==================== DONE =====================================
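
The "edit create table schema" step above can also be done programmatically. The following is only a minimal sketch, not part of the original answer: the helper names split_top_level and drop_column_from_ddl are hypothetical, and it assumes the column list is the first parenthesised group in the DDL and that no column comment contains parentheses or commas.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside () or <>."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "(<":
            depth += 1
        elif ch in ")>":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def drop_column_from_ddl(ddl, column):
    """Return ddl with `column` removed from the first parenthesised column list.

    Hypothetical helper: assumes the first "(" in the DDL opens the column list.
    """
    start = ddl.index("(")
    depth = 0
    for end in range(start, len(ddl)):
        if ddl[end] == "(":
            depth += 1
        elif ddl[end] == ")":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced parentheses in DDL")

    columns = split_top_level(ddl[start + 1:end])
    # Compare on the column name only (first token, back-quotes stripped).
    kept = [c for c in columns
            if c.split()[0].strip("`").lower() != column.lower()]
    if len(kept) == len(columns):
        raise ValueError(f"column {column!r} not found")
    return ddl[:start + 1] + ", ".join(kept) + ddl[end:]


if __name__ == "__main__":
    old_ddl = (
        "CREATE EXTERNAL TABLE aparup_test.tbl_schema_change"
        "(name STRING, time_details TIMESTAMP, id BIGINT)\n"
        "LOCATION 'gs://aparup_test/tbl_schema_change'"
    )
    print(drop_column_from_ddl(old_ddl, "id"))
```

The returned string could then be passed to spark.sql(...) after the drop table step. Splitting on top-level commas only is what keeps types such as decimal(10,2) or struct<a:int,b:string> in one piece.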

【Discussion】:

【Solution 2】:

There is also a "dumb" way to reach the end goal: create a new table without the unwanted columns. Hive's regex column matching makes this fairly easy.

Here is what I would do:

-- make a copy of the old table
ALTER TABLE table RENAME TO table_to_dump;

-- make the new table without the columns to be deleted
CREATE TABLE table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;

-- dump the table 
DROP TABLE table_to_dump;

This should work fine as long as the table in question is not too large.

【Discussion】:

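Not sure why this wouldn't work on a very large table, but it is elegant and simple.
To use regex matching in Hive versions after 0.13, the following property must be set: "hive.support.quoted.identifiers=none"
Important: when using CREATE TABLE table AS SELECT, you must also specify (at least) the original partitioning scheme and storage format, otherwise the Hive defaults are used, which is very likely not what you want! Extra warning: specifying a partitioning scheme in CTAS requires Hive 3.2.0+!!! Oh my.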
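
Putting the recipe from Solution 2 together with the property mentioned in the comments, the statement sequence could be generated like this. This is a rough sketch under assumptions: build_ctas_without_columns and the backup_suffix parameter are hypothetical names, the helper only builds strings and does not talk to Hive, and it assumes column names contain no regex metacharacters. As the comment above warns, a plain CTAS falls back to Hive's default storage format and drops partitioning, and this sketch does nothing about that.

```python
def build_ctas_without_columns(table, columns_to_remove, backup_suffix="_to_dump"):
    """Return the HiveQL statements for the rename / CTAS / drop recipe.

    Hypothetical helper: only string building, nothing is executed.
    """
    if not columns_to_remove:
        raise ValueError("at least one column to remove is required")
    backup = f"{table}{backup_suffix}"
    # With quoted identifiers disabled, Hive treats a back-quoted select item
    # as a Java regex. (a|b)?+.+ matches every column name except a and b.
    pattern = "(" + "|".join(columns_to_remove) + ")?+.+"
    return [
        "SET hive.support.quoted.identifiers=none",
        f"ALTER TABLE {table} RENAME TO {backup}",
        f"CREATE TABLE {table} AS SELECT `{pattern}` FROM {backup}",
        f"DROP TABLE {backup}",
    ]


if __name__ == "__main__":
    for statement in build_ctas_without_columns(
            "my_table", ["col_to_remove_1", "col_to_remove_2"]):
        print(statement + ";")
```

Keeping the DROP TABLE as the last statement means the renamed original is still available if the CTAS step fails.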

【Solution 3】:

Suppose you have an external table, organization.employee, defined as follows (TBLPROPERTIES omitted):

hive> show create table organization.employee;
OK
CREATE EXTERNAL TABLE `organization.employee`(
      `employee_id` bigint,
      `employee_name` string,
      `updated_by` string,
      `updated_date` timestamp)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

You want to drop the updated_by and updated_date columns from the table. Follow these steps:

Create a temporary copy of organization.employee:

hive> create table organization.employee_temp as select * from organization.employee;

Drop the main table organization.employee:

hive> drop table organization.employee;

Remove the underlying data from HDFS (you need to leave the hive shell for this):

[nameet@ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*

Create the table again without the columns to be dropped:

hive> CREATE EXTERNAL TABLE `organization.employee`(
  `employee_id` bigint,
  `employee_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

Insert the original records back into the original table:

hive> insert into organization.employee 
select employee_id, employee_name from organization.employee_temp;

Finally, drop the temporary table:

hive> drop table organization.employee_temp;
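
The order of the steps in Solution 3 matters: the temporary copy has to exist before the original files are removed. As a hedged illustration, the steps can be laid out as an ordered plan. The function plan_recreate and its parameters are hypothetical, it assumes the temporary table is named by appending _temp, and it only returns strings for review rather than running anything.

```python
def plan_recreate(table, kept_columns, new_create_ddl, data_location):
    """Return (where, command) pairs for the copy / drop / recreate recipe.

    Hypothetical helper: `where` is "hive" or "shell"; nothing is executed.
    """
    if not kept_columns:
        raise ValueError("at least one column must be kept")
    temp = f"{table}_temp"
    column_list = ", ".join(kept_columns)
    return [
        # 1. Copy the data first, so nothing is lost when files are removed.
        ("hive", f"CREATE TABLE {temp} AS SELECT * FROM {table}"),
        # 2. Dropping an external table leaves its files in place ...
        ("hive", f"DROP TABLE {table}"),
        # 3. ... so they are removed explicitly, outside the hive shell.
        ("shell", f"hadoop fs -rm {data_location}/*"),
        # 4. Recreate the table with the reduced column list.
        ("hive", new_create_ddl),
        # 5. Reload only the kept columns from the temporary copy.
        ("hive", f"INSERT INTO {table} SELECT {column_list} FROM {temp}"),
        # 6. Clean up.
        ("hive", f"DROP TABLE {temp}"),
    ]


if __name__ == "__main__":
    steps = plan_recreate(
        "organization.employee",
        ["employee_id", "employee_name"],
        "CREATE EXTERNAL TABLE `organization.employee`(...)",  # placeholder DDL
        "hdfs://getnamenode/apps/hive/warehouse/organization.db/employee",
    )
    for where, command in steps:
        print(f"[{where}] {command}")
```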

【Discussion】:

【Solution 4】:

ALTER TABLE is not yet supported for non-native tables, i.e. tables you get from CREATE TABLE when a STORED BY clause is specified.

See https://cwiki.apache.org/confluence/display/Hive/StorageHandlers

【Discussion】:

【Solution 5】:

Even the following query worked for me:

Alter table tbl_name drop col_name

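【Discussion】:

Doesn't work for me; maybe he has some other problem.
Doesn't work for me either. I'm using Hive 1.1.0. Maybe in a newer version? (I could get it to work under impala-shell.)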

【Solution 6】:

ALTER TABLE emp REPLACE COLUMNS( name string, dept string);

The statement above only changes the table's schema, not its data. The solution to this problem is to copy the data into a new table.

INSERT INTO TABLE <new_table> SELECT <selected_columns> FROM <old_table>;

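【Discussion】:

【Solution 7】:

You cannot drop a column from a table directly with the command ALTER TABLE table_name drop col_name;

The only way to drop a column is to use the REPLACE command. Say I have a table emp with the columns id, name and dept, and I want to drop the id column. In the REPLACE COLUMNS clause, list every column that should remain part of the table. The command below drops the id column from the emp table.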

 ALTER TABLE emp REPLACE COLUMNS( name string, dept string);
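
For tables with many columns, typing out the remaining column list by hand is error-prone. A small sketch of generating the REPLACE COLUMNS statement from the current schema follows. The function build_replace_columns is a hypothetical name, the schema is passed in by the caller as (name, type) pairs, and the int type for id is an assumption because the answer does not state it.

```python
def build_replace_columns(table, schema, columns_to_drop):
    """Build an ALTER TABLE ... REPLACE COLUMNS statement that omits some columns.

    Hypothetical helper. `schema` is an ordered list of (name, type) pairs
    describing the current table; column order is preserved.
    """
    drop = {c.lower() for c in columns_to_drop}
    known = {name.lower() for name, _ in schema}
    missing = drop - known
    if missing:
        raise ValueError(f"unknown columns: {sorted(missing)}")
    kept = [(name, col_type) for name, col_type in schema
            if name.lower() not in drop]
    if not kept:
        raise ValueError("cannot drop every column of a table")
    column_list = ", ".join(f"{name} {col_type}" for name, col_type in kept)
    return f"ALTER TABLE {table} REPLACE COLUMNS ({column_list})"


if __name__ == "__main__":
    # The type of id is assumed; the original answer only names the column.
    emp_schema = [("id", "int"), ("name", "string"), ("dept", "string")]
    print(build_replace_columns("emp", emp_schema, ["id"]) + ";")
```

This only produces the statement text. Whether Hive accepts it still depends on the table's SerDe, as the error reported in the comments for this answer shows.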

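【Discussion】:

Thanks for the reply, @reena. I'm using an ORC table, and I even tried the REPLACE statement, but it doesn't work here either.
Can you try with Hive 1.2? It works on Hive 1.2.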
I had the same problem. I tried REPLACE COLUMNS as above and it failed: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replacing columns cannot drop columns for table sandbox6.alc_ont_oe_order_headers_all. SerDe may be incompatible
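I have the same problem. How do I drop a column from a partitioned table? The command above doesn't work for me; I get the same error.
I'm having this problem with Parquet. Any suggestions?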