【Question title】: Connect to Cassandra from Pig
【Posted】: 2017-05-31 16:04:09
【Question description】:

I am trying to connect to Cassandra from Pig. Cassandra is installed on a different cluster, so I need to connect to it remotely from Pig.

I am following a linked example

and get a similar error:

Failed to parse: Can not retrieve schema from loader org.apache.cassandra.hadoop.pig.CqlStorage@1216d9bf
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:484)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

My Pig script is as follows:

A = LOAD 'cql://userName:password/mykeyspace/mycolumnfamily' USING org.apache.cassandra.hadoop.pig.CqlStorage() AS (user_id:long, fname:chararray, last_update_date:chararray, lname:chararray); DUMP A;

Please let me know where to specify the IP address of the system on which Cassandra is installed.

【Comments】:

    Tags: cassandra apache-pig


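    【Answer 1】:

    This is what I found by searching online: http://www.datastax.com/dev/blog/cassandra-and-pig-tutorial

    Querying Cassandra with Pig

    Start the Pig client through DataStax Enterprise.

    No setup is needed other than starting the cluster in analytics mode.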

     (14:52:17)[~/BlogPosts/CassPig_Libraries]dse pig
     2013-08-26 14:52:27,166 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/russellspitzer/BlogPosts/CassPig_Libraries/pig_1377553947163.log
     2013-08-26 14:52:27,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: cfs://127.0.0.1/
     2013-08-26 14:52:27.488 java[64588:1503] Unable to load realm info from SCDynamicStore
     2013-08-26 14:52:28,348 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 127.0.0.1:8012
    grunt>
    
    Next we construct our Pig commands, starting with loading our data from Cassandra. We'll be using the cql:// URL and the CqlStorage() connector. The format of the command is basically load 'cql://keyspace/table'.
    
    
    grunt> libdata = load 'cql://libdata/libout' USING CqlStorage(); 
    grunt> DESCRIBE libdata;
    

    Set the following either as environment variables (uppercase, with underscores) or as Hadoop configuration variables (lowercase, with dots):

     * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
     * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
     * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
    

    For example, for a local node with default settings, you could use:

     export PIG_INITIAL_ADDRESS=localhost
     export PIG_RPC_PORT=9160
     export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
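For the remote case asked about in the question, the same three variables would presumably point at a node of the remote cluster instead of localhost. The sketch below assumes a hypothetical node address of 192.0.2.10 (a documentation-only IP) and that the remote cluster uses the default Thrift port and Murmur3Partitioner; replace all three values with the ones from your own cluster.

```shell
# Hypothetical address of one node in the remote Cassandra cluster.
# 192.0.2.10 is a placeholder from the documentation IP range.
export PIG_INITIAL_ADDRESS=192.0.2.10
# Thrift RPC port on that node (assumed to be the default, 9160).
export PIG_RPC_PORT=9160
# Assumed partitioner; it has to match the one the remote cluster uses.
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Abort with a message if any variable is unset or empty,
# so the problem shows up before Pig is launched.
: "${PIG_INITIAL_ADDRESS:?PIG_INITIAL_ADDRESS must be set}"
: "${PIG_RPC_PORT:?PIG_RPC_PORT must be set}"
: "${PIG_PARTITIONER:?PIG_PARTITIONER must be set}"

echo "Pig will contact Cassandra at ${PIG_INITIAL_ADDRESS}:${PIG_RPC_PORT}"
```

Run these exports in the same shell session that starts Pig, so the variables are visible to the Pig client when it loads the Cassandra storage class.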
    

    If you use different clusters for input and output, these properties can be overridden with the following:

     * PIG_INPUT_INITIAL_ADDRESS : initial address to connect to for reading
     * PIG_INPUT_RPC_PORT : the port thrift is listening on for reading
     * PIG_INPUT_PARTITIONER : cluster partitioner for reading
     * PIG_OUTPUT_INITIAL_ADDRESS : initial address to connect to for writing
     * PIG_OUTPUT_RPC_PORT : the port thrift is listening on for writing
     * PIG_OUTPUT_PARTITIONER : cluster partitioner for writing
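As a minimal sketch of the split input/output case, the exports below read from one cluster and write to another. Both addresses (192.0.2.10 and 198.51.100.20) are hypothetical placeholders from the documentation IP ranges, and the port and partitioner values are assumed defaults that may differ on your clusters.

```shell
# Cluster that Pig reads from (hypothetical address, assumed defaults).
export PIG_INPUT_INITIAL_ADDRESS=192.0.2.10
export PIG_INPUT_RPC_PORT=9160
export PIG_INPUT_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Cluster that Pig writes to (hypothetical address, assumed defaults).
export PIG_OUTPUT_INITIAL_ADDRESS=198.51.100.20
export PIG_OUTPUT_RPC_PORT=9160
export PIG_OUTPUT_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

echo "read from ${PIG_INPUT_INITIAL_ADDRESS}:${PIG_INPUT_RPC_PORT}"
echo "write to ${PIG_OUTPUT_INITIAL_ADDRESS}:${PIG_OUTPUT_RPC_PORT}"
```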
    

    For more details, see the following URL:

    https://github.com/Stratio/stratio-cassandra/tree/master/examples/pig

    Hope this helps!

    【Comments】:
