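[Posted]: 2018-02-24 03:40:01
[Question]:
I have a requirement to load JSON data into Pig, but something seems to be wrong and I cannot get the data to load. Here is a sample of the data structure -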
[{
"id": 1,
"first_name": "Lakshmi",
"last_name": "P",
"email": "xxx@yyy.com",
"gender": "Female",
"ip_address": "26.58.193.2"
}, {
"id": 2,
"first_name": "Syam",
"last_name": "Prasad",
"email": "sp@yyy.com",
"gender": "Male",
"ip_address": "229.179.4.212"
}, {
"id": 3,
"first_name": "ABC",
"last_name": "CDE",
"email": "abc@cde.com",
"gender": "Female",
"ip_address": "180.66.162.255"
}, {
"id": 4,
"first_name": "FGS",
"last_name": "IJK",
"email": "lmn@opq.com",
"gender": "Male",
"ip_address": "67.76.188.26"
}]
I tried loading the data with JsonLoader, as in the code below -
--inidata1 = load 'inputData1.json' using JsonStorage('\n');
--REGISTER 'piggybank-0.15.0.jar';
inidata = load 'inputData1.json' using JsonLoader('id:int,first_name:chararray,last_name:chararray,email:chararray,gender:chararray,ip_address:$
madata = foreach inidata generate group, FLATTEN(inidata);
dump madata;
--filterdata = foreach inidata generate id,first_name,last_name,email,gender,ip_address;
--dump filterdata;
--filterdata = foreach inidata generate id,gender,first_name,last_name;
--selecteddata = filter inidata by (gender=='Male') OR (last_name=='Prasad');
--dump selecteddata;
--store selecteddata into 'JSON-DATA_input';
If there is a solution, could someone please share it?
[Comments]:
-
What does "cannot load" mean here: an error message, wrong results, or nothing happening at all?
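One possible cause, offered as a guess since the question does not show the actual error: Pig's built-in JsonLoader reads its input line by line and, as far as I know, expects each record to be a complete JSON object on a single line, whereas the sample is one JSON array spread over many lines. Below is a minimal Python sketch of rewriting such a file into one object per line before loading it. The function names `array_to_json_lines` and `convert_file` are hypothetical helpers, not part of Pig or of the question.

```python
import json


def array_to_json_lines(array_text):
    """Turn text holding a single JSON array into one JSON object per line."""
    records = json.loads(array_text)  # parse the whole array at once
    # json.dumps without indent emits each record on a single line
    return "\n".join(json.dumps(record) for record in records) + "\n"


def convert_file(src_path, dst_path):
    """Read a JSON-array file and write the one-object-per-line version."""
    with open(src_path, encoding="utf-8") as src:
        converted = array_to_json_lines(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

For example, `convert_file("inputData1.json", "inputData1_lines.json")` would write a new file (the output name is an assumption) that the `load ... using JsonLoader(...)` statement could then point at. This parses the whole array in memory, so it only suits inputs of modest size.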
Tags: apache-pig