How to implement pagination for Cassandra by using keys? Answers

【Question title】: How to implement pagination for cassandra by using keys?
【Posted】: 2019-04-10 20:46:05
【Question】:

I'm trying to implement some kind of pagination feature for my app, which uses Cassandra in the backend.

CREATE TABLE sample (
    some_pk int,
    some_id int,
    name1 text,
    name2 text,
    value text,
    PRIMARY KEY (some_pk, some_id, name1, name2)
)
WITH CLUSTERING ORDER BY(some_id DESC)

I want to query 100 records, then store the key of the last record in memory for later use.

+---------+---------+-------+-------+-------+
| some_pk | some_id | name1 | name2 | value |
+---------+---------+-------+-------+-------+
| 1       | 125     | x     | ''    | ''    |
+---------+---------+-------+-------+-------+
| 1       | 124     | a     | ''    | ''    |
+---------+---------+-------+-------+-------+
| 1       | 124     | b     | ''    | ''    |
+---------+---------+-------+-------+-------+
| 1       | 123     | y     | ''    | ''    |
+---------+---------+-------+-------+-------+

(For simplicity, I left some columns empty. The partition key (some_pk) is not important.)

Assume my page size is 2.

select * from sample where some_pk=1 limit 2;

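This returns the first 2 rows. Now I store the last record of the query result and run the query again to get the next 2 rows.

This query does not work because of the restriction to a single non-EQ relation: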

select * from sample where some_pk=1 and some_id <= 124 and name1>='a' and name2>='' limit 2;

And this one returns wrong results, because some_id is in descending order while the name columns are in ascending order:

select * from sample where some_pk=1 and (some_id, name1, name2) <= (124, 'a', '') limit 2;

So I'm stuck. How can I implement pagination?
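The mismatch between the two orderings can be reproduced in memory. The following is a minimal Java sketch, not Cassandra code: it assumes the tuple relation behaves like a plain lexicographic comparison with every column ascending, and it compares that with "stored after the last key" under some_id DESC, name1 ASC, name2 ASC. `Key` and `TupleComparisonSketch` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of one row's clustering columns (not a driver class).
final class Key {
    final int someId;
    final String name1;
    final String name2;

    Key(int someId, String name1, String name2) {
        this.someId = someId;
        this.name1 = name1;
        this.name2 = name2;
    }

    @Override
    public String toString() {
        return "(" + someId + ", '" + name1 + "', '" + name2 + "')";
    }
}

final class TupleComparisonSketch {
    // The four sample rows, in the order the table stores them.
    static final List<Key> ROWS = Arrays.asList(
            new Key(125, "x", ""), new Key(124, "a", ""),
            new Key(124, "b", ""), new Key(123, "y", ""));

    // Assumption: lexicographic "row <= last" with every column treated as ascending.
    static boolean tupleLessOrEqual(Key row, Key last) {
        if (row.someId != last.someId) return row.someId < last.someId;
        int byName1 = row.name1.compareTo(last.name1);
        if (byName1 != 0) return byName1 < 0;
        return row.name2.compareTo(last.name2) <= 0;
    }

    // "Stored strictly after last" when some_id is DESC and the names are ASC.
    static boolean storedAfter(Key row, Key last) {
        if (row.someId != last.someId) return row.someId < last.someId;
        int byName1 = row.name1.compareTo(last.name1);
        if (byName1 != 0) return byName1 > 0;
        return row.name2.compareTo(last.name2) > 0;
    }

    // Rows the lexicographic tuple filter would keep.
    static List<Key> matchedByTuple(Key last) {
        List<Key> out = new ArrayList<>();
        for (Key row : ROWS) if (tupleLessOrEqual(row, last)) out.add(row);
        return out;
    }

    // Rows that should actually open the next page.
    static List<Key> wantedNextRows(Key last) {
        List<Key> out = new ArrayList<>();
        for (Key row : ROWS) if (storedAfter(row, last)) out.add(row);
        return out;
    }

    public static void main(String[] args) {
        Key last = new Key(124, "a", "");
        System.out.println("tuple <= : " + matchedByTuple(last));
        System.out.println("wanted   : " + wantedNextRows(last));
    }
}
```

With the last key (124, 'a', ''), the lexicographic filter keeps (124, 'a', '') again and skips (124, 'b', ''), which is the row that should come next.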

【Comments】:

Tags: cassandra datastax-java-driver cqlsh spring-data-cassandra

【Solution 1】:

You can run the second query like this:

select * from sample where some_pk =1 and some_id <= 124 limit x;

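After fetching the records, ignore the ones you have already read (you can do this because you stored the last record from the previous select query).

If, after ignoring those records, you end up with an empty list of rows, you have iterated over all the records. Otherwise, keep doing this for your pagination task.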
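The "ignore what you have already read" step can be sketched in Java as below. This is only an illustration over in-memory rows: `SampleRow`, `SkipSeenRowsSketch`, `alreadyRead` and `nextPage` are hypothetical names, and the sketch assumes the fetched rows arrive in the table's clustering order (some_id descending, then name1 and name2 ascending) from a query restricted to some_id <= the last stored some_id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the clustering columns of one fetched row.
final class SampleRow {
    final int someId;
    final String name1;
    final String name2;

    SampleRow(int someId, String name1, String name2) {
        this.someId = someId;
        this.name1 = name1;
        this.name2 = name2;
    }

    @Override
    public String toString() {
        return someId + "/" + name1 + "/" + name2;
    }
}

final class SkipSeenRowsSketch {
    // True when the row is the stored last row, or sorts before it within the same some_id.
    // Rows with a different some_id are treated as new, because the query is assumed to
    // have already restricted some_id <= last.someId.
    static boolean alreadyRead(SampleRow row, SampleRow last) {
        if (row.someId != last.someId) return false;
        int byName1 = row.name1.compareTo(last.name1);
        if (byName1 != 0) return byName1 < 0;
        return row.name2.compareTo(last.name2) <= 0;
    }

    // Drops rows that were already served and returns at most pageSize new rows.
    // Pass last == null for the first page.
    static List<SampleRow> nextPage(List<SampleRow> fetched, SampleRow last, int pageSize) {
        List<SampleRow> page = new ArrayList<>();
        for (SampleRow row : fetched) {
            if (last != null && alreadyRead(row, last)) continue;
            page.add(row);
            if (page.size() == pageSize) break;
        }
        return page;
    }

    public static void main(String[] args) {
        // Rows a query with "some_id <= 124" would return for the sample data.
        List<SampleRow> fetched = Arrays.asList(
                new SampleRow(124, "a", ""), new SampleRow(124, "b", ""), new SampleRow(123, "y", ""));
        System.out.println(nextPage(fetched, new SampleRow(124, "a", ""), 2));
    }
}
```

One consequence of this approach is that the query limit has to cover the skipped rows as well as the page itself, otherwise a page can come back shorter than requested even though more rows exist.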

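【Comments】:

This is presumably how paging is implemented in the drivers (see github.com/datastax/java-driver/tree/3.x/manual/paging and datastax.github.io/python-driver/query_paging.html). If you don't want to implement it yourself, you can use those driver functions.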

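【Solution 2】:

You don't have to store any keys in memory, and you don't need to use limit in your CQL query. Just use the DataStax driver's paging capability in your application code, as in the following code: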

import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import java.util.Iterator;

// Response is this answer's own holder class; session is an already-connected driver Session.
public Response getFromCassandra(Integer itemsPerPage, String pageIndex) {
    Response response = new Response();
    String query = "select * from sample where some_pk=1";
    // The fetch size is the number of rows we want per page.
    Statement statement = new SimpleStatement(query).setFetchSize(itemsPerPage);
    // Page "0" means the first page, so there is no paging state to restore yet.
    if (!"0".equals(pageIndex)) {
        statement.setPagingState(PagingState.fromString(pageIndex));
    }
    ResultSet rows = session.execute(statement); // execute the query
    // Rows already in memory: this should be at most fetchSize (itemsPerPage).
    int numberOfRows = rows.getAvailableWithoutFetching();
    Iterator<Row> iterator = rows.iterator();
    // Consume only the current page so the driver does not fetch the next one.
    while (numberOfRows-- > 0) {
        response.getRows().add(iterator.next());
    }
    PagingState pagingState = rows.getExecutionInfo().getPagingState();
    if (pagingState != null) { // there are still pages remaining
        response.setNextPageIndex(pagingState.toString());
    }
    return response;
}

Note that if you write the while loop like this:

while (iterator.hasNext()) {
    response.getRows().add(iterator.next());
}

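it first fetches a number of rows equal to the fetch size we set; then, as long as the query still matches more rows in Cassandra, it fetches from Cassandra again, until it has retrieved every row that matches the query. That is probably not what you intend if you want to implement paging.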
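The difference between the two loops can be seen with a small stand-in that counts page fetches. This is a simulation only: `FakePagedResult` and `PagingLoopSketch` are hypothetical classes written for this illustration and are not part of the DataStax driver; they merely mimic the idea that `hasNext()` transparently pulls the next page once the current one is exhausted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a paged result set; rows are just the numbers 0..totalRows-1.
final class FakePagedResult implements Iterator<Integer> {
    private final int totalRows;
    private final int fetchSize;
    private int fetchedRows = 0;   // rows pulled from the simulated server so far
    private int consumedRows = 0;  // rows handed to the caller so far
    int pagesFetched = 0;

    FakePagedResult(int totalRows, int fetchSize) {
        this.totalRows = totalRows;
        this.fetchSize = fetchSize;
        fetchNextPage(); // the first page arrives with the initial execute
    }

    private void fetchNextPage() {
        fetchedRows = Math.min(totalRows, fetchedRows + fetchSize);
        pagesFetched++;
    }

    int availableWithoutFetching() {
        return fetchedRows - consumedRows;
    }

    @Override
    public boolean hasNext() {
        // When the current page is used up and more rows exist, fetch another page.
        if (consumedRows == fetchedRows && fetchedRows < totalRows) fetchNextPage();
        return consumedRows < fetchedRows;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return consumedRows++;
    }
}

final class PagingLoopSketch {
    // Bounded loop: reads only what the current page already holds.
    static List<Integer> readOnePage(FakePagedResult result) {
        List<Integer> rows = new ArrayList<>();
        int available = result.availableWithoutFetching();
        while (available-- > 0) rows.add(result.next());
        return rows;
    }

    // Unbounded loop: keeps going until every matching row has been fetched.
    static List<Integer> readEverything(FakePagedResult result) {
        List<Integer> rows = new ArrayList<>();
        while (result.hasNext()) rows.add(result.next());
        return rows;
    }

    public static void main(String[] args) {
        FakePagedResult bounded = new FakePagedResult(5, 2);
        System.out.println(readOnePage(bounded) + " pages=" + bounded.pagesFetched);
        FakePagedResult unbounded = new FakePagedResult(5, 2);
        System.out.println(readEverything(unbounded) + " pages=" + unbounded.pagesFetched);
    }
}
```

In this simulation with 5 rows and a fetch size of 2, the bounded loop stops after one page of 2 rows, while the `hasNext()` loop walks through all 5 rows and triggers 3 page fetches.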

Source: https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/

【Comments】: