【Question title】: Java & Cassandra - 2 CQLSSTableWriter Instances
【Posted】: 2015-04-10 18:44:05
【Question】:

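I am trying to find the most efficient way to load a large amount of data from a multithreaded Java program into multiple tables within a single Cassandra keyspace. Here are my keyspace and table declarations:

CREATE KEYSPACE IF NOT EXISTS articles WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};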

CREATE TABLE IF NOT EXISTS articles.bigrams (docid text, bigram text, primary key (docid, bigram));
CREATE TABLE IF NOT EXISTS articles.unigrams (docid text, unigram text, primary key (docid, unigram));

This is the part of the Java program that is giving me trouble. I am trying to create 2 CQLSSTableWriter instances and write to each of them:

package cassandrabulktest.cassandra;

import java.io.IOException;
import java.util.ArrayList;
import org.apache.cassandra.exceptions.InvalidRequestException;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;



public class UnigramLoader {
    private static final String UNIGRAM_SCHEMA = "CREATE TABLE articles.unigrams (" +
                                                      "docid text, " +
                                                      "unigram text, " +
                                                      "PRIMARY KEY (unigram, docid))";

    private static CQLSSTableWriter unigram_writer = CQLSSTableWriter.builder()
                .inDirectory("/tables/articles/unigrams")
                .forTable(UNIGRAM_SCHEMA)
                .using("INSERT INTO articles.unigrams (docid, unigram) VALUES (?, ?)")
                .build();

    private static final String BIGRAM_SCHEMA = "CREATE TABLE articles.bigrams (" +
                                                      "docid text, " +
                                                      "bigram text, " +
                                                      "PRIMARY KEY (bigram, docid))";

    private static CQLSSTableWriter bigram_writer = CQLSSTableWriter.builder()
                .inDirectory("/tables/articles/bigrams")
                .forTable(BIGRAM_SCHEMA)
                .using("INSERT INTO articles.bigrams (docid, bigram) VALUES (?, ?)")
                .build();


    public static void load(String articleId, ArrayList<String> unigrams, ArrayList<String> bigrams) throws IOException, InvalidRequestException {
        // Values are bound in the column order of the INSERT statement: (docid, unigram).
        for (String unigram : unigrams) {
            unigram_writer.addRow(articleId, unigram);
        }

        // Same column order for the bigram table: (docid, bigram).
        for (String bigram : bigrams) {
            bigram_writer.addRow(articleId, bigram);
        }
    }

    public static void closeWriter() throws IOException {
        unigram_writer.close();
        bigram_writer.close();
    }
}

If this worked, it would start creating SSTable files in the 2 directories. However, I get this error at runtime:

Exception in thread "Thread-1" java.lang.ExceptionInInitializerError
    at edu.georgetown.cassandrabulktest.runnables.UnigramRunnable.run(UnigramRunnable.java:69)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
    at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1125)
    at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:337)
    at org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.forTable(CQLSSTableWriter.java:360)
    at edu.georgetown.cassandrabulktest.cassandra.UnigramLoader.<clinit>(UnigramLoader.java:29)
    ... 2 more
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 662e2edf-c864-34a4-bca6-f83b25af6f6a; expected 7247b490-b141-11e4-a8f9-8b65543eda40)
    at org.apache.cassandra.config.CFMetaData.validateCompatility(CFMetaData.java:1208)
    at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:1140)
    at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1121)
    ... 5 more

Is there a way to make this work, or is there a different way to accomplish what I am trying to do? Thanks in advance!

【Question comments】:

    Tags: java cassandra bulk bulk-load


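    【Answer 1】:

    You might want to try building and using a single writer instance, since there appears to be a race condition when multiple writers are used at the same time.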
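
One way to read this suggestion is to keep every write on one dedicated thread, so that worker threads never touch a writer concurrently. The sketch below is only an illustration of that idea and has not been run against Cassandra. `RowSink` and `SingleThreadedLoader` are hypothetical names, not part of the Cassandra API, and `RowSink` stands in for whatever wraps each writer's `addRow` call. It also assumes both writers are built one after the other on a single thread before any worker starts. Whether this avoids the "Column family ID mismatch" error may depend on the Cassandra version.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the one writer method the question uses. */
interface RowSink {
    void addRow(Object... values) throws Exception;
}

/** Hypothetical helper: confines all writes for both tables to one thread. */
final class SingleThreadedLoader {
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();
    private final AtomicReference<Exception> firstFailure = new AtomicReference<>();
    private final RowSink unigramSink;
    private final RowSink bigramSink;

    SingleThreadedLoader(RowSink unigramSink, RowSink bigramSink) {
        this.unigramSink = unigramSink;
        this.bigramSink = bigramSink;
    }

    /** Callable from any worker thread; values follow the column order (docid, unigram). */
    void addUnigram(String docId, String unigram) {
        enqueue(unigramSink, docId, unigram);
    }

    /** Callable from any worker thread; values follow the column order (docid, bigram). */
    void addBigram(String docId, String bigram) {
        enqueue(bigramSink, docId, bigram);
    }

    // Queues the row; the actual write happens later on the writer thread.
    private void enqueue(RowSink sink, Object... values) {
        writerThread.execute(() -> {
            try {
                sink.addRow(values);
            } catch (Exception e) {
                firstFailure.compareAndSet(null, e); // keep only the first error
            }
        });
    }

    /** Waits for queued rows to be written, then reports the first failure, if any. */
    void close() {
        writerThread.shutdown();
        try {
            if (!writerThread.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("writer thread did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining rows", e);
        }
        Exception failure = firstFailure.get();
        if (failure != null) {
            throw new IllegalStateException("at least one row could not be written", failure);
        }
    }
}
```

In the program from the question, each sink would presumably delegate to one of the two `CQLSSTableWriter` instances, and the writers themselves would still need to be closed after `close()` returns. Note that the executor's queue is unbounded, so for very large inputs a bounded queue with back-pressure would likely be a safer choice.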

    【Comments】:
