Aerospike data modeling and querying

【Question title】: Aerospike data modeling and querying
【Posted】: 2016-01-02 14:31:58
【Question】:

Suppose I have the following model in Java:

class Shape {
    String type;
    String color;
    String size;
}

Suppose I have the following data based on the model above:

Triangle, Blue, Small
Triangle, Red, Large
Circle, Blue, Small
Circle, Blue, Medium
Square, Green, Medium
Star, Blue, Large

I want to answer the following questions:

Given the type Circle how many unique colors?
    Answer: 1
Given the type Circle how many unique sizes?
    Answer: 2

Given the color Blue how many unique shapes?
    Answer: 3
Given the color Blue how many unique sizes?
    Answer: 3

Given the size Small how many unique shapes?
    Answer: 2
Given the size Small how many unique colors?
    Answer: 1

I'm wondering whether I should model it as follows...

set: shapes -> key: type -> bin(s): list of colors, list of sizes
set: colors -> key: color -> bin(s): list of shapes, list of sizes
set: sizes -> key: size -> bin(s): list of shapes, list of colors

Or is there a better way? If I do it this way, I need 3x the storage.

I also expect billions of entries in each set. By the way, the model has been redacted to protect the innocent code ;)
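
For illustration, the three-set layout proposed above can be sketched in plain Java. This is a minimal in-memory stand-in, not Aerospike client code: `ShapeIndexSketch`, `put`, `countDistinct`, and `sample` are hypothetical names, nested maps play the role of sets, keys, and bins, and the billions-of-entries scale is not addressed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for the three proposed sets (assumption: no Aerospike calls here).
class ShapeIndexSketch {
    // set name ("shapes", "colors", "sizes") -> key -> bin name -> distinct values
    private final Map<String, Map<String, Map<String, Set<String>>>> sets = new HashMap<>();

    private void add(String set, String key, String bin, String value) {
        sets.computeIfAbsent(set, s -> new HashMap<>())
            .computeIfAbsent(key, k -> new HashMap<>())
            .computeIfAbsent(bin, b -> new TreeSet<>())
            .add(value);
    }

    // Every write fans out to all three sets; this duplication is the extra storage
    // the question worries about.
    void put(String type, String color, String size) {
        add("shapes", type, "colors", color);
        add("shapes", type, "sizes", size);
        add("colors", color, "shapes", type);
        add("colors", color, "sizes", size);
        add("sizes", size, "shapes", type);
        add("sizes", size, "colors", color);
    }

    // Each question becomes one key lookup plus the size of one bin.
    int countDistinct(String set, String key, String bin) {
        return sets.getOrDefault(set, Map.of())
                   .getOrDefault(key, Map.of())
                   .getOrDefault(bin, Set.of())
                   .size();
    }

    // Loads the six sample rows from the question.
    static ShapeIndexSketch sample() {
        ShapeIndexSketch idx = new ShapeIndexSketch();
        idx.put("Triangle", "Blue", "Small");
        idx.put("Triangle", "Red", "Large");
        idx.put("Circle", "Blue", "Small");
        idx.put("Circle", "Blue", "Medium");
        idx.put("Square", "Green", "Medium");
        idx.put("Star", "Blue", "Large");
        return idx;
    }

    public static void main(String[] args) {
        ShapeIndexSketch idx = sample();
        System.out.println(idx.countDistinct("shapes", "Circle", "sizes")); // 2
        System.out.println(idx.countDistinct("colors", "Blue", "sizes"));   // 3
        System.out.println(idx.countDistinct("sizes", "Small", "shapes"));  // 2
    }
}
```

Reads are cheap in this layout, but every insert touches three keys, and deleting a shape would need more than plain sets, since a set alone cannot tell whether another row still contributes the same value.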

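【Comments on the question】:

Is this still unsolved? I agree that your proposed solution is the best approach, provided you are comfortable with throughput being limited because the 3 sets become "hot". To answer your question, could you add: are the results and updates "online" or "offline" (algorithms)? Do you have to handle deletion of Shapes (which would require reference counters)? Are you happy with approximate results, or do they need to be 100% correct? What throughput do you expect on the base model, without the color/size/type indexes?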

Tags: data-modeling aerospike nosql

【Solution 1】:

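Data modeling in NoSQL is always about how you plan to retrieve the data, and at what throughput and latency.

There are several ways to model this data; the simplest is to mirror the class structure, with each field becoming a bin. You can define a secondary index on each bin and use aggregation queries to answer your questions (above).

But that is only one way; you may need a different data model to meet your latency and throughput requirements.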

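【Comments】:

I modeled the class with each field as a bin. Then I run select * from Shape where Shape = ? or color = ? or size = ? This brings all the matching data locally into my application, where I then do the counting. This is faster than sending 3 separate aggregations to the server.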
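
The client-side counting step described in this comment might look roughly like the sketch below. It assumes the matching records have already been fetched into a list; `ClientSideCount`, `Row`, `countDistinct`, and `sample` are hypothetical names, and no Aerospike client calls are shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Counts distinct values in the application after the rows have been fetched.
class ClientSideCount {
    // Mirrors the Shape class from the question: one field per bin.
    static final class Row {
        final String type, color, size;
        Row(String type, String color, String size) {
            this.type = type;
            this.color = color;
            this.size = size;
        }
    }

    // Filter on one field, then count the distinct values of another.
    static long countDistinct(List<Row> rows, Predicate<Row> where, Function<Row, String> field) {
        return rows.stream().filter(where).map(field).distinct().count();
    }

    // Stand-in for the rows a query would return (the sample data from the question).
    static List<Row> sample() {
        return Arrays.asList(
            new Row("Triangle", "Blue", "Small"),
            new Row("Triangle", "Red", "Large"),
            new Row("Circle", "Blue", "Small"),
            new Row("Circle", "Blue", "Medium"),
            new Row("Square", "Green", "Medium"),
            new Row("Star", "Blue", "Large"));
    }

    public static void main(String[] args) {
        List<Row> rows = sample();
        // Given the type Circle, how many unique sizes?
        System.out.println(countDistinct(rows, r -> r.type.equals("Circle"), r -> r.size)); // 2
        // Given the size Small, how many unique colors?
        System.out.println(countDistinct(rows, r -> r.size.equals("Small"), r -> r.color)); // 1
    }
}
```

This trades network transfer for simplicity: every matching row crosses the wire, so whether it remains faster than server-side aggregation at billions of records would need to be measured.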