On HDFS, I want to display normal text for a Hive table stored in ORC format: answers

【Question title】: On HDFS, I want to display normal text for a Hive table stored in ORC format
【Posted】: 2020-08-01 23:23:02
【Question description】:

I saved a JSON DataFrame to Hive in ORC format:

jsonDF.write.format("orc").saveAsTable("hiveExamples.jsonTest")

Now I need to display the file on HDFS as normal text. Is there a way to do this?

I tried hdfs dfs -text /path-of-table, but it shows the data in its raw ORC form.

【Discussion】:

Hi! You can follow this link: stackoverflow.com/questions/20847024/…
Hi @Chema, I have looked at the link, but I still cannot view the contents of the ORC file on HDFS.

Tags: hadoop hive apache-spark-sql hdfs orc

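【Solution 1】:

From the Linux shell, Hive provides a utility invoked as hive --orcfiledump.

To view the metadata of an ORC file in HDFS, call the command like this:

[@localhost ~ ]$ hive --orcfiledump <path to HDFS ORC file>;

To view the contents of the ORC file as plain text, call the command with the -d option:

[@localhost ~ ]$ hive --orcfiledump -d <path to HDFS ORC file>;

For example: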

hive> DESCRIBE FORMATTED orders_orc;
Location:  hdfs://localhost:8020/user/hive/warehouse/training_retail.db/orders_orc
# Storage Information        
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde    
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

hive> exit;

[@localhost ~ ]$ hdfs dfs -ls /user/hive/warehouse/training_retail.db/orders_orc
Found 1 items
-rwxrwxrwx   1 training hive     163094 2020-04-20 09:39 /user/hive/warehouse/training_retail.db/orders_orc/000000_0

[@localhost ~ ]$ hdfs dfs -tail /user/hive/warehouse/training_retail.db/orders_orc/000000_0
��+"%ў�.�b.����8V$tߗ��\|�?�xM��
                      *�ڣ�������!�2���_���Ͳ�V���
                                                     r�E(����~�uM�/&��&x=-�&2�T��o��JD���Q��m5��#���8Iqe����A�^�ێ"���@�t�w�m�A ���3|�����NL�Q����p�d�#:}S-D�Wq�_"����

[@localhost ~ ]$ hive --orcfiledump /user/hive/warehouse/training_retail.db/orders_orc/000000_0;
Structure for /user/hive/warehouse/training_retail.db/orders_orc/000000_0
File Version: 0.12 with HIVE_8732
20/04/20 10:19:58 INFO orc.ReaderImpl: Reading ORC rows from /user/hive/warehouse/training_retail.db/orders_orc/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
Rows: 68883
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:int,_col1:string,_col2:int,_col3:string>
....
File length: 163094 bytes
Padding length: 0 bytes
Padding ratio: 0%

[@localhost ~ ]$ hive --orcfiledump -d /user/hive/warehouse/training_retail.db/orders_orc/000000_0 | head -n 5
{"_col0":1,"_col1":"2013-07-25 00:00:00.0","_col2":11599,"_col3":"CLOSED"}
{"_col0":2,"_col1":"2013-07-25 00:00:00.0","_col2":256,"_col3":"PENDING_PAYMENT"}
{"_col0":3,"_col1":"2013-07-25 00:00:00.0","_col2":12111,"_col3":"COMPLETE"}
{"_col0":4,"_col1":"2013-07-25 00:00:00.0","_col2":8827,"_col3":"CLOSED"}
{"_col0":5,"_col1":"2013-07-25 00:00:00.0","_col2":11318,"_col3":"COMPLETE"}
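
The example above dumps a single file, but a table directory can hold several data files. The following is only a minimal sketch of dumping each of them in turn, passing the full file path every time. The helper names orc_data_files and dump_orc_table are hypothetical, and the sketch assumes that hdfs dfs -ls prints the path as the last field, that paths contain no spaces, and that files whose names start with _ or . are markers rather than data.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints the
# path (last field) of every regular file. Directories, the "Found N items"
# header, and files whose names start with "_" or "." are skipped.
# Assumption: paths contain no whitespace.
orc_data_files() {
  awk '$1 ~ /^-/ {
         n = split($NF, parts, "/")
         if (parts[n] !~ /^[_.]/) print $NF
       }'
}

# Hypothetical helper: dumps the rows of every data file directly under a
# table directory, one `hive --orcfiledump -d` call per full file path.
dump_orc_table() {
  local table_dir="$1"
  hdfs dfs -ls "$table_dir" | orc_data_files | while IFS= read -r orc_file; do
    # Redirect stdin so the hive process cannot consume the file list.
    hive --orcfiledump -d "$orc_file" < /dev/null
  done
}
```

With these functions sourced into the shell, dump_orc_table /user/hive/warehouse/training_retail.db/orders_orc | head -n 5 would be one way to preview the table from the example above.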

You can follow this link for details:

How to see contents of Hive orc files in linux

【Discussion】:

Thanks @Chema. I was actually using a partial HDFS path without the full file name, and it threw a NO FILE EXISTS error. Your example helped me understand how it works.