Slow insert and saveAll performance on Spring Data Cassandra repository: answers

【Question title】: Slow insert and saveAll performance on Spring Data Cassandra repository
【Posted】: 2019-03-13 10:05:10
【Question】:

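I am trying to insert 1,500 records into Cassandra using Spring. I have a list of POJOs holding these 1,500 records, and when I call saveAll or insert on this data, the operation takes 30 seconds to complete. Can someone suggest a way to get this done faster? I am currently running Cassandra 3.11.2 as a single-node test cluster.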

Entity POJO:

package com.samplepoc.pojo;

import static org.springframework.data.cassandra.core.cql.PrimaryKeyType.PARTITIONED;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

@Table("health")
public class POJOHealth
{
    @PrimaryKeyColumn(type=PARTITIONED)
    UUID primkey;
    @Column
    String col1;
    @Column
    String col2;
    @Column
    String col3;
    @Column
    String col4;
    @Column
    String col5;
    @Column
    Date ts;
    @Column
    boolean stale;
    @Column
    String col6;
    @Column
    String col7;
    @Column
    String col8;
    @Column
    String col9;
    @Column
    Map<String,String> data_map = new HashMap<String,String>();

    public POJOHealth(
             String col1,
             String col2,
             String col3,
             String col4,
             String col5,
             String col6,
             String col7,
             String col8,
             String col9,
             boolean stale,
             Date ts,
             Map<String,String> data_map
             )
    {
        this.primkey = UUID.randomUUID();
        this.col1=col1;
        this.col2=col2;
        this.col3=col3;
        this.col4=col4;
        this.col5=col5;
        this.col6=col6;
        this.col7=col7;
        this.col8=col8;
        this.col9=col9;
        this.ts=ts;
        this.data_map = data_map;
        this.stale=stale;
    }

    // getters & setters omitted
}

Persistence service snippet:

public void persist(List<POJOHealth> l_POJO)
{
        System.out.println("Enter Persist: "+new java.util.Date());

        List<POJOHealth> l_POJO_stale = repository_name.findBycol1AndStale("sample",false);
        System.out.println("Retrieve Old: "+new java.util.Date());

        l_POJO_stale.forEach(s -> s.setStale(true));
        System.out.println("Set Stale: "+new java.util.Date());

        repository_name.saveAll(l_POJO_stale);
        System.out.println("Save stale: "+new java.util.Date());

        try 
        {
            repository_name.insert(l_POJO);
        } 
        catch (Exception e) 
        {
            System.out.println("Error in persisting new data");
        }
        System.out.println("Insert complete: "+new java.util.Date());
}

【Question comments】:

Tags: spring cassandra spring-data

【Solution 1】:

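I don't know Spring, but the Java driver it uses can perform inserts asynchronously. When you save one record at a time, your latency to the instance dictates your throughput, not the efficiency of the query. For example, assuming 10 ms of latency to the C* coordinator, saving one at a time would take 30 seconds (10 ms there plus 10 ms back, times 1,500).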
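The arithmetic above can be written down as a rough model. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and it assumes requests go out in waves that each cost one full round trip, ignoring server-side processing time.

```java
final class InsertTimeEstimate {

    /**
     * Idealized wall-clock time in milliseconds when at most {@code inFlight}
     * requests are outstanding at once and each wave costs one round trip.
     * inFlight = 1 models saving one record at a time.
     */
    static long estimateMillis(int requests, long roundTripMillis, int inFlight) {
        long waves = (requests + inFlight - 1) / inFlight; // ceiling division
        return waves * roundTripMillis;
    }
}
```

With the figures from the answer (1,500 requests, 20 ms round trip), `estimateMillis(1500, 20, 1)` gives 30,000 ms. Under the same model, allowing for example 128 requests in flight gives 12 waves, or 240 ms.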

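If you insert them all at once with executeAsync and then block until they have all completed, you should be able to do 1,500 in under a second unless your hardware is very underpowered (pretty much anything but a Raspberry Pi should be able to handle that, at least in bursts). That said, if your app has any concurrency, you don't want each caller sending 1,000 inserts at the same time, so putting in some kind of in-flight throttle (e.g. a Semaphore with a limit of 128) would be a very good idea.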
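A minimal sketch of that throttling idea, using only the JDK. The names `ThrottledAsyncInserts`, `insertAll` and `asyncInsert` are hypothetical, and the asynchronous operation is passed in as a function returning a `CompletableFuture`. In a real application that function would wrap the driver's executeAsync call, whose return type depends on the driver version and would need adapting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class ThrottledAsyncInserts {

    /**
     * Starts asyncInsert for every item while allowing at most maxInFlight
     * unfinished requests, then blocks until all of them have completed.
     * If any request failed, join() rethrows it as a CompletionException.
     */
    static <T> void insertAll(List<T> items,
                              Function<T, CompletableFuture<Void>> asyncInsert,
                              int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>(items.size());

        for (T item : items) {
            // Blocks the submitting thread once maxInFlight requests are outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> request;
            try {
                request = asyncInsert.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // the request never started, so hand the permit back
                throw e;
            }
            // Release the permit whether the request succeeded or failed.
            pending.add(request.whenComplete((result, error) -> permits.release()));
        }

        // Wait for every request that is still in flight.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```

A call might look like `ThrottledAsyncInserts.insertAll(records, r -> saveAsync(r), 128)`, where `saveAsync` is a placeholder for whatever asynchronous insert the application exposes. Sharing one semaphore across all callers, rather than creating one per call as here, would cap the load for the whole application.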

【Comments】:

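Thanks for the guidance, @Chris Lohfnik. After your answer, I have been reading about Spring Data's asynchronous operations.
An asynchronous solution is available here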