[Title]: Store data from Pig relation into Cassandra
[Posted]: 2014-05-20 20:38:47
[Question]:

I have the following Cassandra table:

CREATE TABLE segments (
  b text,
  s int,
  c int,
  PRIMARY KEY (b)
)

and the following Pig relation:

data: {b: chararray,s: long,c: long}

which I load from a file in PigStorage format:

data = LOAD 'some_file' as (b:chararray,s:long,c:long);

I am trying to store the Pig relation into the Cassandra table, without success. This is what I tried:

to_cassandra = FOREACH (GROUP data ALL) 
  GENERATE 
    TOTUPLE(TOTUPLE('b',data.b)),
    TOTUPLE('s',data.s),
    TOTUPLE('c',data.c);
STORE to_cassandra INTO 
  'cql://pv/segments?
    output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
  USING CqlStorage();

where the decoded output query is:

UPDATE pv.segments SET s=?,c=?

But I get the following:

[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - 
  ERROR: java.lang.ClassCastException: 
    org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.DataByteArray

This is rather cryptic. Which field is the offending one, and how can I fix this?

Edit

I ran illustrate to_cassandra; and got:

-----------------------------------------------------------------------------------------------------
| data     | b:chararray                                                  | s:long     | c:long     | 
-----------------------------------------------------------------------------------------------------
|          | 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB | 1          | 1          | 
|          | 0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG | 1          | 1          | 
-----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1-3     | group:chararray     | data:bag{:tuple(b:chararray,s:long,c:long)}                                                                                                  | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         | all                 | {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB, 1, 1), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG, 1, 1)} | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| to_cassandra     | org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_29_30:tuple(org.apache.pig.builtin.totuple_29:tuple(:chararray,:bag{:tuple(b:chararray)}))                         | org.apache.pig.builtin.totuple_31:tuple(:chararray,:bag{:tuple(s:long)})                     | org.apache.pig.builtin.totuple_32:tuple(:chararray,:bag{:tuple(c:long)})                     | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                  | ((b, {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG)}))                                          | (s, {(1), (1)})                                                                              | (c, {(1), (1)})                                                                              | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

[Comments]:

    Tags: cassandra apache-pig datastax-enterprise


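    [Answer 1]:

    The problem is your grouping: it produces a bag of values for each field rather than the single value per field that Cassandra expects. Your output should end up looking like this: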

    ((b, 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB)), (s, 1), (c, 1)
    

    ... in order to match your schema. Since your output schema maps directly onto your input, it is not clear what the grouping is for.
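
A minimal sketch of what dropping the grouping could look like, reusing the relation and query from the question. Two things here are assumptions rather than anything confirmed in this thread: the value tuple follows the `(((name, value)), (value, ...))` layout used in the Pig examples bundled with Cassandra, where the second tuple holds only the values bound to the `?` markers, and the `(int)` casts are added because the table declares `s` and `c` as int while the relation holds long.

```pig
-- Sketch only: emit one output row per input row, with no GROUP ... ALL,
-- so every field is a scalar instead of a bag.
data = LOAD 'some_file' AS (b:chararray, s:long, c:long);

-- Assumed layout: ((key_name, key_value)), (values bound to the ? markers).
-- The (int) casts are an assumption based on the table's int columns.
to_cassandra = FOREACH data GENERATE
    TOTUPLE(TOTUPLE('b', b)),
    TOTUPLE((int) s, (int) c);

STORE to_cassandra INTO
    'cql://pv/segments?output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
    USING CqlStorage();
```

If your CqlStorage version expects the name/value pairs shown in the answer instead, keep `TOTUPLE('s', s)` and `TOTUPLE('c', c)` as separate fields. Either way, running `illustrate to_cassandra;` before the STORE should show plain values and no bags.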

    [Comments]:

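    • Is a DataByteArray a collection of bags? Also, I load the data from a file with N lines, each containing these three fields. Did I set up the schema incorrectly?
    • That is, I load the data with data = load 'some_file' as (b:chararray,s:long,c:long);
    • What does Pig output when you run illustrate to_cassandra? Does Pig complain right after that line, or only when you try to execute the store?
    • I haven't tried illustrate yet, but I did try describe, and that worked. It fails when I actually run it. I'll try running illustrate.
    • Great... what is the output? I'd like to see whether it matches what Cassandra expects, since the problem seems to be a mismatch at store time.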