How to row format a column with tsv file with sqlite in athena: answers

【Question title】: How to row format a column with tsv file with sqlite in athena
【Posted】: 2021-02-05 18:56:37
【Question】:

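I want to load these rows from a TSV file into a table in Athena, and I can do that for every column except the last one, genres. I can add it, but I want it to look like ["Comedy", "Mystery"]. Instead it comes out as [Comedy,Mystery], which makes it impossible to access the individual values in any way.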

tconst      genres 

tt0081313   Action

tt0081315   Comedy,Mystery

tt0081349   Comedy,Crime

This is what I did:

CREATE EXTERNAL TABLE `title_basics`(
  `tconst` string, 
  `genres` Array<string>)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (

  'field.delim' = '\t'  -- This separates the fields by tab, which is right, but how can I also
                        -- add the genres to the table the way I want them?

)

STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'mylocation'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'transient_lastDdlTime'='-----')

【Comments on the question】:

Tags: sql amazon-web-services sqlite amazon-athena hive-serde

【Solution 1】:

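The genres column is being interpreted as a single string, because the table only defines a field delimiter. There are two possible solutions: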

select split(genres, ',') -- this will give you an array of genres (with genres declared as string)

Or create the column as an array directly by adding , as the collection delimiter.
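A minimal sketch of the second option, assuming the table and column names from the question and keeping its 'mylocation' placeholder. It uses the ROW FORMAT DELIMITED shorthand instead of spelling out the SerDe properties, and the skip.header.line.count property is an assumption that the file starts with the tconst/genres header row shown in the sample. The queries after the DDL show a few ways the elements could then be read. Athena runs one statement at a time, so each would be submitted separately.

```sql
-- Sketch only: table/column names come from the question, 'mylocation' is its placeholder.
CREATE EXTERNAL TABLE title_basics (
  tconst string,
  genres array<string>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'            -- columns are tab-separated
  COLLECTION ITEMS TERMINATED BY ','   -- array elements are comma-separated
STORED AS TEXTFILE
LOCATION 'mylocation'
TBLPROPERTIES ('skip.header.line.count' = '1');  -- assumption: the first line is a header

-- Read a single element (array indexes start at 1 in Athena queries).
SELECT tconst, genres[1] AS first_genre
FROM title_basics;

-- Filter on membership.
SELECT tconst
FROM title_basics
WHERE contains(genres, 'Mystery');

-- Flatten to one row per (title, genre).
SELECT t.tconst, g.genre
FROM title_basics t
CROSS JOIN UNNEST(t.genres) AS g (genre);
```

Athena displays array<string> values without quotes around the elements, so a correctly split value would still not be shown as ["Comedy", "Mystery"]. What matters is whether the elements can be addressed individually, which the queries above are meant to check.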

【Comments】: