Json Parsing in Apache Pig: Answers

【Question title】: Json Parsing in Apache Pig
【Posted】: 2014-07-24 09:41:12
【Question】:

I have a JSON:

{"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}

I found that we can load JSON into a Pig script:

A = LOAD 'data.json'
USING PigJsonLoader();

But how do I actually parse the JSON in Apache Pig?

--Sampling.pig
--pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig -param delimiter="," -param fraction='0.05'

--Load data
inputdata = LOAD '$input' using PigStorage('$delimiter');

--Group data
groupedByAll = group inputdata all;

--output into hdfs
sampled = SAMPLE inputdata $fraction;
store sampled into '$output' using PigStorage('$delimiter');

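Above is my Pig script. How do I parse the JSON (each element) in Apache Pig? I need to take the JSON above as input, extract its source, delimiter, fraction and output, and pass them into $input, $delimiter, $fraction and $output respectively.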
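Pig substitutes `$input`, `$delimiter`, `$fraction` and `$output` in a preprocessing step before the script is parsed, so one common approach is to read the JSON outside Pig and pass the values on the command line. Below is a minimal Python sketch, assuming the mapping source → input, destination → output and elementInfo.fraction → fraction. The sample JSON has no delimiter key, so the `","` default and the `build_pig_args` name are hypothetical:

```python
import json
import shlex

# Hypothetical default: the sample JSON carries no "delimiter" key.
DEFAULT_DELIMITER = ","


def build_pig_args(config_text, script="Sampling.pig", delimiter=DEFAULT_DELIMITER):
    """Map the JSON config onto the -param names used by Sampling.pig."""
    config = json.loads(config_text)
    params = {
        "input": config["source"],
        "output": config["destination"],
        # Fall back to the hypothetical default when the JSON has no delimiter.
        "delimiter": config.get("delimiter", delimiter),
        # The fraction is nested under elementInfo and stored as a string.
        "fraction": config["elementInfo"]["fraction"],
    }
    args = ["pig", "-x", "mapreduce", "-f", script]
    for name, value in params.items():
        args += ["-param", f"{name}={value}"]
    return args


if __name__ == "__main__":
    sample = ('{"Name":"sampling","elementInfo":{"fraction":"3"},'
              '"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}')
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(build_pig_args(sample)))
```

The returned list can be handed to `subprocess.run(args, check=True)`; keeping it as a list rather than one shell string avoids quoting problems with delimiters such as `|` or `;`.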

How do I parse it? Please advise.

【Discussion】:

Tags: json parsing hadoop mapreduce apache-pig

【Solution 1】:

Try this:

--Load data
inputdata = LOAD '/input.txt' using JsonLoader('Name:chararray,elementInfo:(fraction:chararray),destination:chararray,source:chararray');

--Group data
groupedByAll = group inputdata all;

store groupedByAll into '/OUT/pig' using PigStorage(',');

Your output will now look like this:

all,{(sampling1,(4),/user/sree/OUT1,/user/sree/foo1.txt),(sampling,(3),/user/sree/OUT,/user/sree/foo.txt)}

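In your input file the fraction value is in double quotes ({"fraction":"3"}), so I declared fraction as a chararray. With a chararray the SAMPLE command cannot run, so I used the script above to get a result instead.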

If you want to run the SAMPLE operation, convert the fraction to a numeric type first and you will get the result.
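If the values are passed in as parameters, the string-to-number conversion can also happen on the driver side before Pig starts. The Pig documentation describes the SAMPLE size as a value between 0 and 1 (0.1 for 10%), and the sample JSON's "3" falls outside that range. Whether it was meant as a percentage is not stated, so this hedged sketch (the `parse_fraction` name is hypothetical) simply rejects it:

```python
def parse_fraction(raw):
    """Convert a quoted JSON fraction such as "0.05" into a float for SAMPLE."""
    value = float(raw)  # raises ValueError for non-numeric strings
    # SAMPLE expects a size between 0 and 1, e.g. 0.05 keeps about 5% of rows.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"fraction must be between 0 and 1, got {raw!r}")
    return value


if __name__ == "__main__":
    print(parse_fraction("0.05"))
```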

【Discussion】:

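Thanks MarHserus. But how do I parse those elements to get the fraction and the delimiter? My input is /user/sree/foo.txt
I want the output to be a delimited file (not JSON), the same as the input file (foo.txt).
Thank you for kindly running it and posting the result, but I am still confused. My use case is that I have a JSON that I need to parse, and it holds all the inputs for the sampling code I run in Pig. That means I need to parse out the input file to sample, the delimiter and the output.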
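For this use case, where the JSON only configures the job and the data file itself stays delimited text, Pig can also read parameters from a file passed with `-param_file`, one `name=value` per line. A hedged Python sketch that renders the JSON config in that form. The `to_param_file` name, the `sampling.params` file name and the `","` default delimiter are hypothetical, because the JSON has no delimiter key:

```python
import json

# Hypothetical default: the sample JSON carries no "delimiter" key.
DEFAULT_DELIMITER = ","


def to_param_file(config_text, delimiter=DEFAULT_DELIMITER):
    """Render the JSON config as name=value lines for `pig -param_file`."""
    config = json.loads(config_text)
    params = [
        ("input", config["source"]),
        ("output", config["destination"]),
        ("delimiter", config.get("delimiter", delimiter)),
        ("fraction", config["elementInfo"]["fraction"]),
    ]
    return "".join(f"{name}={value}\n" for name, value in params)


if __name__ == "__main__":
    sample = ('{"Name":"sampling","elementInfo":{"fraction":"3"},'
              '"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}')
    print(to_param_file(sample), end="")
```

Redirecting the output to `sampling.params` would let the unchanged script run as `pig -x mapreduce -param_file sampling.params -f Sampling.pig`, and `store ... using PigStorage('$delimiter')` then writes delimited text like foo.txt rather than JSON.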