【Question title】: Hive: Need to specify partition columns because the destination table is partitioned
【Posted】: 2017-07-19 23:49:17
【Question】:

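I would like to know whether it is possible in Hive to insert the contents of a non-partitioned table into a partitioned table. The first table looks like this: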

hive> describe extended user_ratings;
OK
userid                  int                                         
movieid                 int                                         
rating                  int                                         
unixtime                int                                         

Detailed Table Information  Table(tableName:user_ratings, dbName:ml, owner:cloudera, createTime:1500142667, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/ml.db/user_ratings, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=    , field.delim=
Time taken: 0.418 seconds, Fetched: 6 row(s)

The new table looks like this:

hive> describe extended rating_buckets;
OK
userid                  int                                         
movieid                 int                                         
rating                  int                                         
unixtime                int                                         
genre                   string                                      

# Partition Information      
# col_name              data_type               comment             

genre                   string                                      

Detailed Table Information  Table(tableName:rating_buckets, dbName:default, owner:cloudera, createTime:1500506879, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null), FieldSchema(name:genre, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/rating_buckets, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:8, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=  , field.delim=
Time taken: 0.46 seconds, Fetched: 12 row(s)

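It seems to list the partition column ("genre") as if it were the same as the other columns... could I have created the table incorrectly?

In any case, this is what happens when I try to run an INSERT OVERWRITE into the new table: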

hive> FROM ml.user_ratings
    > INSERT OVERWRITE TABLE rating_buckets
    > select userid, movieid, rating, unixtime;
FAILED: SemanticException 2:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'rating_buckets'

Should I recreate the first table with partitions? Is there a way to copy the first table while keeping the partitioning intact?

【Comments】:

    Tags: sql hadoop hive


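    【Answer 1】:

    You did not include genre in your select list at all. I believe it needs to come last in the select. You can't leave it empty.

    You also need to specify the partition on the destination table, like this: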

    insert overwrite table rating_buckets partition(genre)
    select
    userid,
    movieid,
    rating,
    unixtime,
    <SOMETHING> as genre
    from
    ...
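
For the dynamic-partition form above to run, Hive usually needs dynamic partitioning enabled, and non-strict mode when every partition column is dynamic. Below is a minimal sketch, not a definitive fix. It assumes the session does not already have these settings and, since `ml.user_ratings` has no genre column, it uses the constant 'action' purely as a placeholder value.

```sql
-- Assumption: these properties are not already set for the session;
-- defaults depend on the Hive version and cluster configuration.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- With PARTITION (genre) and no value, Hive takes the partition value
-- from the last expression in the SELECT list.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre)
SELECT userid,
       movieid,
       rating,
       unixtime,
       'action' AS genre   -- placeholder: the source table has no genre column
FROM ml.user_ratings;
```

Because the DESCRIBE output shows numBuckets:8 for rating_buckets, Hive releases before 2.0 may also need `SET hive.enforce.bucketing=true;` for the rows to be written into the declared buckets.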
    

    【Discussion】:

    • Thanks for the input, but unfortunately it returns the following: hive> insert overwrite table rating_buckets partition(genre) > select > userid, > movieid, > rating, > unixtime, > (action) as genre > from ml.user_ratings; FAILED: SemanticException [Error 10004]: Line 7:1 Invalid table alias or column reference 'action': (possible column names are: userid, movieid, rating, unixtime)
    • Are you trying to insert the word "action" as your genre? If so, you need to enclose it in single quotes rather than parentheses: 'action' as genre
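
The error in the comment above arises because `(action)` is parsed as a column reference rather than a string. When every row should land in one known partition, a static partition spec is an alternative to selecting a quoted literal. This is a sketch under the assumption that 'action' is the intended genre for all rows:

```sql
-- Static partition: the value is fixed in the PARTITION clause,
-- so genre must NOT appear in the SELECT list.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre = 'action')
SELECT userid,
       movieid,
       rating,
       unixtime
FROM ml.user_ratings;
```

With a static partition value, INSERT OVERWRITE replaces only the contents of the genre='action' partition, and the dynamic-partition settings are not required.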