【Question title】: Problem while uploading csv data into hive table
【Posted】: 2018-10-12 12:38:28
【Question】:

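Format used to create the Hive table: [screenshot: serde property]

My CSV data: [screenshot: csv file demo]

Some fields in the CSV data contain line breaks, and this causes a problem: when we select a column from the table, a field that contains a line break ('\n') is split across multiple rows.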
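To make the symptom concrete, here is a minimal Python sketch (not part of the original question) that compares the number of physical lines in a file with the number of records a quote-aware parser sees. The `SAMPLE` string is a shortened, hypothetical stand-in for the real row, and both function names are made up for illustration. A line-oriented text reader would see three rows where the data holds one record.

```python
import csv
import io

# Hypothetical, shortened stand-in for one record whose quoted address
# field contains embedded line breaks.
SAMPLE = (
    '"May 27, 2018",Delivered,"NASSCO TILES\n'
    'POYILTHODI TOWER\n'
    'NEAR TAX CHECK POST",KOZHIKODE\n'
)


def count_physical_lines(text: str) -> int:
    """Count lines the way a line-oriented reader would."""
    return len(text.splitlines())


def count_csv_records(text: str) -> int:
    """Count records with a parser that honors double quotes."""
    return sum(1 for _ in csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print("physical lines:", count_physical_lines(SAMPLE))
    print("csv records:   ", count_csv_records(SAMPLE))
```

If the two counts differ for a file, at least one quoted field spans several lines and the file needs flattening before a line-based load.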

cat of the CSV file (one row of data):

"May 27, 2018",77266157-8b26-46bb-93f0-a1ef20931a,'2124272900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPDZ2, ECHO RED1,Black And Red 07171202,FABOK019000001,NA,5,NA,05/27/18,739,674,65,1,739,Jishad ,Jishad "NASSCO TILES POYILTHODI TOWER NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FM324875856,10,8,6,0.3,NO

This is the row causing the problem: there is a line break between NASSCO TILES and POYILTHODI.

Expected result of the query: [screenshot]

Actual result: [screenshot]

Please help me load the CSV file data into the Hive table accurately so that I get the desired result.

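【Question discussion】:

  • Can you paste the CSV data here, i.e. the output of cat on the file, with a few sample rows?
  • Some fields in the CSV data contain line breaks, and this causes a problem: when we select a column from the table, a field that contains a line break ('\n') is split into more than one row.
  • Please paste it into the question itself.
  • I have attached screenshots of everything in the question.
  • This appears to be a known issue with Hive's CSV handling, but workarounds are available: you can use Perl to reformat the data so that multi-line records become single lines, and then build the Hive table on top of the result. See my workaround.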

Tags: hive


【Solution 1】:

A workaround that uses Perl to merge each multi-line record into a single line. Please check whether this works for you.

> nl kislay_stack.dat
     1  "May 27, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVD
     2  Z2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jisha
     3  d ,Jishad ,"NASSCO TILES
     4  POYILTHODI TOWER
     5  NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
     6  "May 28, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVD
     7  Z2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jisha
     8  d ,Jishad ,"NASSCO TILES2
     9  POYILTHODI TOWER
    10  NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
    11  "May 29, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVD
    12  Z2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jisha
    13  d ,Jishad ,"NASSCO TILES3
    14  POYILTHODI TOWER
    15  NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
    16  "May 30, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVD
    17  Z2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jisha
    18  d ,Jishad ,"NASSCO TILES4
    19  POYILTHODI TOWER
    20  NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
> perl -ne ' if ( /^\"\S+ \d+, \d{4}\"/ && $y++) { $x=~s/\n//g;print "$x\n";$x=$_ } else { $x.=$_ } END { $x=~s/\n//g; print "$x\n";} ' kislay_stack.dat | nl
     1  "May 27, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVDZ2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jishad ,Jishad ,"NASSCO TILES POYILTHODI TOWER NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
     2  "May 28, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVDZ2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jishad ,Jishad ,"NASSCO TILES2 POYILTHODI TOWER NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
     3  "May 29, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVDZ2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jishad ,Jishad ,"NASSCO TILES3 POYILTHODI TOWER NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
     4  "May 30, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,'21242721311900300,OD212427213119003000,62029200,Delivered,NON_FBF,BKPE7CPYUWYFVDZ2,PUMA ECHO RED1,Puma Echo Plus 27 L Medium Backpack Black And Red 07171202,FABOK01900002001,NA,5,NA,05/27/18,739,674,65,1,739,Jishad ,Jishad ,"NASSCO TILES4 POYILTHODI TOWER NEAR TAX CHECK POST FAROKE CHUNGAM",,KOZHIKODE,Kerala,673631,"May 27, 2018 15:59:29","May 29, 2018 10:00:00",,FMPC0324875856,10,8,6,0.3,NO
>
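As an alternative to the date-based regular expression above, the same flattening can be sketched in Python with the standard `csv` module, which decides where a record ends by tracking double quotes rather than by looking for a date at the start of a line. This is a sketch under stated assumptions, not the answerer's method: the function name `flatten_csv` is hypothetical, the input is assumed to use double quotes around any field that contains a comma or line break, and each embedded line break is replaced with a single space.

```python
import csv
import io


def flatten_csv(src, dst) -> int:
    """Copy CSV records from src to dst, replacing line breaks inside
    fields with a single space. Returns the number of records written."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        # splitlines() cuts only at line boundaries, so other whitespace
        # (for example a trailing space in "Jishad ") is left untouched.
        cleaned = [" ".join(field.splitlines()) for field in row]
        writer.writerow(cleaned)
        count += 1
    return count


if __name__ == "__main__":
    # Shortened, hypothetical sample with one multi-line quoted field.
    sample = io.StringIO(
        '"May 27, 2018",Jishad ,"NASSCO TILES\nPOYILTHODI TOWER",,NO\n',
        newline="",
    )
    out = io.StringIO(newline="")
    flatten_csv(sample, out)
    print(out.getvalue(), end="")
```

For real files, the same function could be called with `open("kislay_stack.dat", newline="")` as the source and `open("new.csv", "w", newline="")` as the destination. Fields that still contain a comma, such as the date, stay quoted in the output, so the table definition must still honor quotes.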

Update:

> select col1,col2 from kislay3

+----------+---------+--+
|   col1   |  col2   |
+----------+---------+--+
| "May 27  |  2018"  |
| "May 28  |  2018"  |
| "May 29  |  2018"  |
| "May 30  |  2018"  |
+----------+---------+--+

> select concat(col1,col2)  from kislay3;

INFO  : OK
+----------------+--+
|      _c0       |
+----------------+--+
| "May 27 2018"  |
| "May 28 2018"  |
| "May 29 2018"  |
| "May 30 2018"  |
+----------------+--+
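The output above suggests that the table splits each line on every comma without honoring the double quotes, so `"May 27, 2018"` lands in two columns and `concat` only glues the pieces back together, with the quotes kept and the comma lost. One way to avoid the split, sketched here in Python under the assumption that no field contains a tab, is to re-emit each flattened record with a tab delimiter so that a plain delimited table no longer needs quote handling. The function name `to_tsv_line` is hypothetical.

```python
import csv


def to_tsv_line(csv_line: str) -> str:
    """Parse one single-line CSV record with quote handling and return it
    as a tab-separated line without the surrounding quotes."""
    fields = next(csv.reader([csv_line]))
    for field in fields:
        # Refuse input that would be ambiguous once tabs are the delimiter.
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a tab or line break: %r" % field)
    return "\t".join(fields)


if __name__ == "__main__":
    line = '"May 27, 2018",77266157-8b26-46bb-93f0-a1ef29f0931a,Delivered'
    # A quote-unaware split cuts the date at its comma.
    print(line.split(",")[:2])
    # A quote-aware parse keeps the date in one field.
    print(to_tsv_line(line).split("\t")[0])
```

A table built on the converted file would then declare a tab as its field terminator (for example `FIELDS TERMINATED BY '\t'`), and the date would be selectable as a single column without `concat`.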

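【Discussion】:

  • Redirect the perl output to another file: perl ' { ...... } ' > new.csv
  • Thanks, it really works; my data is now organized the way I need it.
  • One more thing: when I select the date column, it prints the date without the year. It treats 2018 as data for another column. How can I fix this?
  • Select the first 2 columns and then concatenate them.
  • But the problem is that the date is split into two columns. Since the year part ends up in another, wrong column, how do we merge or concatenate it?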