【Posted】: 2016-10-08 01:02:09
【Question】:
I have a file of JSON entries, one object per line, like this:
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
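I want to count how often each distinct JSON object occurs in the file. I have seen other answers that use GROUP BY and the COUNT() function in Pig. I am not sure I am using them correctly, and I am not getting the result I need. My output should look like this: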
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua", "count": "3"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin", "count": "2"}
The order of the output does not matter. Could someone give me some pointers?
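To pin down the expected result independently of Pig, here is a minimal Python sketch of the same group-and-count logic. It is not a Pig solution, only a reference for what the job should produce. The function name `count_json_lines` is hypothetical, and the sketch assumes two objects count as identical when they have the same keys and values, regardless of key order or spacing.

```python
import json
from collections import Counter


def count_json_lines(lines):
    """Count identical JSON objects and attach the tally as a string "count" field.

    Hypothetical helper for illustration; assumes one JSON object per line.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Re-serialize with sorted keys so key order and spacing
        # do not make equal objects look different.
        canonical = json.dumps(json.loads(line), sort_keys=True)
        counts[canonical] += 1

    results = []
    for canonical, n in counts.items():
        obj = json.loads(canonical)
        obj["count"] = str(n)  # the desired output shows the count as a string
        results.append(obj)
    return results


# The five sample records from the question, rebuilt as JSON lines.
fighter = {"child_pos": "NN", "parent_pos": "NN", "parent": "fighter",
           "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
case = {"child_pos": "NN", "parent_pos": "NN", "parent": "case",
        "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
sample = [json.dumps(r) for r in (fighter, case, fighter, fighter, case)]

if __name__ == "__main__":
    for row in count_json_lines(sample):
        print(json.dumps(row))
```

In Pig terms this corresponds to grouping on the whole record and counting the members of each group; the key order in the emitted objects may differ from the input because of the canonicalization step.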
【Comments】:
- Please share what you have tried and why you think it is not working.
Tags: json count apache-pig