【问题标题】:Flatten inner bags in Pig将猪的内袋压平
【发布时间】:2014-08-26 20:27:47
【问题描述】:

我有这个:

(1,{(1,2,3)})

(4,{(4,2,1),(4,3,3)})

(8,{(8,3,4)})

我想要这个:

(1,1,2,3)

(4,4,2,1)

(4,4,3,3)

(8,8,3,4)

我发现这篇文章很有帮助,但无法让它发挥作用: How to flatten a group into a single tuple in Pig? 这似乎很容易,但我是 Pig 的新手。 提前致谢!

编辑回答 reo 的问题(感谢 reo):

在 SO 上发帖之前,我曾尝试在我的问题上压平那个包,只是得到以下错误(在我的示例中 d 是 C):

    ERROR 2117: Unexpected error when launching map reduce job.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias d
        at org.apache.pig.PigServer.openIterator(PigServer.java:857)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:490)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias d
        at org.apache.pig.PigServer.storeEx(PigServer.java:956)
        at org.apache.pig.PigServer.store(PigServer.java:919)
        at org.apache.pig.PigServer.openIterator(PigServer.java:832)
        ... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:322)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
        at org.apache.pig.PigServer.storeEx(PigServer.java:952)
        ... 14 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:662)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)

        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:631)
        at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)

那是我原来的问题。但是现在我在此处使用的可重现示例上尝试了同样的方法,它可以工作。

grunt> A = load 'my_location_on_HDFS' using PigStorage('|') as (a,b,c);
grunt> B = group A by a;
grunt> describe B;
B: {group: bytearray,A: {(a: bytearray,b: bytearray,c: bytearray)}}
grunt> dump B;
(1,{(1,2,3)})
(4,{(4,2,1),(4,3,3)})
(8,{(8,3,4)})
grunt> C = foreach B generate group, flatten(A);
grunt> dump C;
(1,1,2,3)
(4,4,2,1)
(4,4,3,3)
(8,8,3,4)

我想我必须仔细看看我的原始数据/情况。

【问题讨论】:

标签: apache-pig flatten


【解决方案1】:

如果你把袋子压平,那么你会得到想要的输出让我们假设你上面的输入是 B,你通过对输入 A 分组得到了这个

只需描述 B;

Describe B;    
B: {group: bytearray,A: {()}}

如果你可以尝试类似的东西,A 现在是一个袋子

 C= foreach B generate group, FLATTEN(A);

C 会有你想要的值,压扁一个包会导致单独的行

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2012-05-11
    • 2023-04-03
    • 2015-07-27
    • 1970-01-01
    相关资源
    最近更新 更多