Cassandra Data Modeling Design: Answers

【Question title】: Cassandra Data Modeling Design
【Posted】: 2015-07-31 22:56:10
【Question】:

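I am quite new to Cassandra and have been reading a lot about it over the past month.
I am working on a small use case.
Query: the top X players by AmountPlayed within a time range.

So, for any given time range, I want to sum each player's TotalAmountPlayed and derive the top X players.

I followed the approach of creating a UDF (using the C* 2.2.0 release) to aggregate each player's AmountPlayed.

Below is the time-series data model I designed for this use case.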

CREATE COLUMNFAMILY PlayerRating
(
PlayerNumber int,       -- unique account number
GameID text,            -- unique machine ID per slot
AmountPlayed double,    -- amount played by the player
EventTime timestamp,    -- timestamp at which the event was generated
PRIMARY KEY ((PlayerNumber, GameID), EventTime)) WITH CLUSTERING ORDER BY (EventTime DESC);

Please let me know whether this data model design suits my query.

Thanks!!

【Comments】:

Tags: cassandra

【Solution 1】:

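I think it would probably be easier to put all the players for each game in a single partition.

That way you can aggregate all players with one query instead of issuing a separate query per player. You can then total each player's amount played in a map (see the example of how to define a UDF for that here).

So your table would look something like this: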

CREATE TABLE playing_time_by_game (game_id text, event_time int, player_id text, amount_played int, PRIMARY KEY (game_id, event_time));

Then create the UDF that groups the totals by player_id:

CREATE FUNCTION state_group_and_total( state map<text, int>, type text, amount int )
     CALLED ON NULL INPUT
     RETURNS map<text, int>
     LANGUAGE java AS '
         // Look up the running total for this key (null if not seen yet).
         Integer count = (Integer) state.get(type);
         if (count == null) count = amount;
         else count = count + amount;
         state.put(type, count);
         return state;
     ';

Then create the aggregate function:

CREATE OR REPLACE AGGREGATE group_and_total(text, int) 
     SFUNC state_group_and_total 
     STYPE map<text, int> 
     INITCOND {};

Then insert some data. Selecting everything back shows the rows used below:

SELECT * from playing_time_by_game ;

 game_id | event_time | amount_played | player_id
---------+------------+---------------+-----------
   game1 |          0 |             8 |   player1
   game1 |          1 |            12 |   player2
   game1 |          5 |             1 |   player2
   game1 |          8 |            50 |   player1
   game2 |          0 |           200 |   player1

Now you can aggregate by player_id:

SELECT group_and_total(player_id, amount_played) from playing_time_by_game;

 t2.group_and_total(player_id, amount_played)
----------------------------------------------
              {'player1': 258, 'player2': 13}

You can also restrict the query to one game partition and a time range:

SELECT group_and_total(player_id, amount_played) from playing_time_by_game where game_id='game1' and event_time >=0 and event_time <=7;

 t2.group_and_total(player_id, amount_played)
----------------------------------------------
                {'player1': 8, 'player2': 13}
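
To make explicit what the aggregate is doing, here is a minimal plain-Java sketch that folds the same state function over the five sample rows, outside Cassandra. The class name `GroupAndTotalSketch`, the `Row` holder and the `aggregate` helper are illustrative assumptions, not part of the answer; only the body of `stateGroupAndTotal` mirrors the UDF above. Passing a null game ID stands in for a query without a WHERE clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupAndTotalSketch {

    // Hypothetical holder for one row of playing_time_by_game.
    static final class Row {
        final String gameId;
        final int eventTime;
        final String playerId;
        final int amountPlayed;

        Row(String gameId, int eventTime, String playerId, int amountPlayed) {
            this.gameId = gameId;
            this.eventTime = eventTime;
            this.playerId = playerId;
            this.amountPlayed = amountPlayed;
        }
    }

    // Same logic as the body of the state_group_and_total UDF.
    static Map<String, Integer> stateGroupAndTotal(Map<String, Integer> state, String type, Integer amount) {
        Integer count = state.get(type);
        if (count == null) count = amount;
        else count = count + amount;
        state.put(type, count);
        return state;
    }

    // The five sample rows shown in the answer.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("game1", 0, "player1", 8));
        rows.add(new Row("game1", 1, "player2", 12));
        rows.add(new Row("game1", 5, "player2", 1));
        rows.add(new Row("game1", 8, "player1", 50));
        rows.add(new Row("game2", 0, "player1", 200));
        return rows;
    }

    // Start from an empty map (INITCOND {}) and apply the state function to every
    // row that matches the optional game filter and the inclusive time range.
    static Map<String, Integer> aggregate(List<Row> rows, String gameId, int fromTime, int toTime) {
        Map<String, Integer> state = new TreeMap<>();
        for (Row row : rows) {
            boolean sameGame = gameId == null || gameId.equals(row.gameId);
            boolean inRange = row.eventTime >= fromTime && row.eventTime <= toTime;
            if (sameGame && inRange) {
                state = stateGroupAndTotal(state, row.playerId, row.amountPlayed);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        // All partitions, no time restriction.
        System.out.println(aggregate(rows, null, Integer.MIN_VALUE, Integer.MAX_VALUE));
        // Only game1, event_time between 0 and 7 inclusive.
        System.out.println(aggregate(rows, "game1", 0, 7));
    }
}
```

The two printed maps carry the same totals as the two query results above: 258 and 13 across all rows, then 8 and 13 for game1 within the time range, because the row at event_time 8 falls outside it.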

You could probably also define a FINALFUNC that sorts the map and keeps only its top ten entries. See this.
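
The sort-and-truncate step that such a final function would need can be sketched in plain Java as follows. This is only an illustration of the logic, not a tested FINALFUNC: the class `TopPlayersSketch` and the method `topN` are hypothetical names, and the tie-break on player ID is an assumption added to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopPlayersSketch {

    // Hypothetical helper: return the n entries with the largest totals,
    // highest first. Ties are broken by player ID (an assumption).
    static Map<String, Integer> topN(Map<String, Integer> totals, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byTotal = Integer.compare(b.getValue(), a.getValue()); // descending by total
            return byTotal != 0 ? byTotal : a.getKey().compareTo(b.getKey());
        });

        // LinkedHashMap keeps insertion order, so the ranking is preserved.
        Map<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (top.size() >= n) break;
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Totals taken from the unrestricted aggregate result in the answer.
        Map<String, Integer> totals = new LinkedHashMap<>();
        totals.put("player2", 13);
        totals.put("player1", 258);

        System.out.println(topN(totals, 1));  // keeps only player1
        System.out.println(topN(totals, 10)); // fewer than ten players: all are kept, ranked
    }
}
```

Doing this inside the aggregate keeps the ranking on the server; the same few lines could equally run in the client on the map returned by group_and_total.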

【Discussion】: