【Posted】: 2020-11-25 15:17:13
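【Problem description】:
I am trying to load a pipe-delimited CSV file into a Hive external table. Pipes that occur inside a data field are enclosed in double quotes. Double quotes that occur in the data are escaped with \. After setting up the external table, I see that the rows containing double quotes are not interpreted correctly.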
test.csv
id|name
105|"Test | pipe delim in field"
107|\" Test Escaped single double quote in HIVE
108|\" Test Escaped enclosed double quote in HIVE \"
109|\\" Test Escaped enclosed double quote in HIVE \"
110|\\" Test Escaped enclosed double quote in HIVE \\"
External table DDL
drop table test_schema.hive_test;
CREATE EXTERNAL TABLE test_schema.hive_test (id string, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES
(
"separatorChar" = "|",
"quoteChar" = "\"",
"escapeChar" = "\\"
)
LOCATION '/staging/test/hive'
tblproperties ("skip.header.line.count"="1");
Output
+---------------+-------------------------------------------------+
| hive_test.id | hive_test.name |
+---------------+-------------------------------------------------+
| 105 | Test | pipe delim in field |
| 107 | NULL |
| 108 | NULL |
| 109 | NULL |
| 110 | " Test Escaped enclosed double quote in HIVE \ |
+---------------+-------------------------------------------------+
Expected output
+---------------+-------------------------------------------------+
| hive_test.id | hive_test.name |
+---------------+-------------------------------------------------+
| 105 | Test | pipe delim in field |
| 107 | " Test Escaped single double quote in HIVE |
| 108 | " Test Escaped enclosed double quote in HIVE " |
| 109 | NULL |
| 110 | NULL |
+---------------+-------------------------------------------------+
OpenCSV version 2.3
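To make the intended interpretation of the sample rows concrete, here is a minimal sketch using Python's standard `csv` module with the same three settings as the table definition (separator `|`, quote character `"`, escape character `\`). This is not OpenCSV and says nothing about how Hive's OpenCSVSerde behaves; it only shows the parsing rule the question expects for rows 105, 107 and 108. The helper name `parse_rows` is made up for this illustration, and the double-backslash rows 109 and 110 are left out.

```python
import csv
import io

# Rows copied from test.csv above (header line omitted).
# Raw strings keep the backslashes exactly as they appear in the file.
SAMPLE = "\n".join([
    '105|"Test | pipe delim in field"',
    r'107|\" Test Escaped single double quote in HIVE',
    r'108|\" Test Escaped enclosed double quote in HIVE \"',
])


def parse_rows(text):
    """Parse pipe-delimited text with '"' as quote and '\\' as escape.

    Hypothetical helper for illustration only; it uses Python's csv
    module, not OpenCSV, so it may differ from Hive's behaviour.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter="|",      # same role as separatorChar
        quotechar='"',      # same role as quoteChar
        escapechar="\\",    # same role as escapeChar
        doublequote=False,  # a quote is escaped with the escape char, not doubled
    )
    return [row for row in reader]


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

With these settings, Python's parser keeps the pipe inside the quoted field of row 105 and turns each `\"` in rows 107 and 108 into a literal double quote, which matches the values in the expected output table for those three rows.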
【Comments】:
-
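I tried the additional rows 109 and 110 with double backslashes, as suggested in one of the solutions you shared. Row 110 now shows a value, but the second double quote is still not displayed correctly. I will try to update the question.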
Tags: hadoop hive opencsv hive-serde