Amazon Redshift Foreign Keys - Sort or Interleaved Keys

【Question title】: Amazon Redshift Foreign Keys - Sort or Interleaved Keys
【Posted】: 2018-11-05 08:57:54
【Question】:

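We plan to import our OLTP relational tables into AWS Redshift. The CustomerTransaction table joins to multiple lookup tables. I have included only three here, but we have more.

What should the sort key on the CustomerTransaction table be? In regular SQL Server, we have nonclustered indexes on the foreign keys in the CustomerTransaction table. For AWS Redshift, should I use a compound sort key or an interleaved sort key on the foreign key columns in CustomerTransaction? What is the best indexing strategy for this table design? Thanks,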

create table dbo.CustomerTransaction
(
    CustomerTransactionId bigint primary key identity(1,1),
    ProductTypeId bigint,       -- foreign key to the ProductType table
    StatusTypeID bigint,        -- foreign key to the StatusType table
    DateOfPurchase date,
    PurchaseAmount float,
    ....
)

create table dbo.ProductType
(
    ProductTypeId bigint primary key identity(1,1),   -- referenced by CustomerTransaction.ProductTypeId
    ProductName varchar(255),
    ProductDescription varchar(255),
    .....
)

create table dbo.StatusType
(
    StatusTypeId bigint primary key identity(1,1),    -- referenced by CustomerTransaction.StatusTypeID
    StatusTypeName varchar(255),
    StatusDescription varchar(255),
    .....
)

【Comments】:

Tags: sql performance amazon-web-services amazon-redshift

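【Solution 1】:

The general rules of thumb are:

Set the DISTKEY based on what you commonly GROUP BY
Set the SORTKEY based on what you commonly use in WHERE clauses
Avoid interleaved sort keys (they are optimal only in rare cases and require frequent VACUUM runs)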
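
As a rough illustration of these rules, here is a minimal sketch of how the question's fact table might be declared in Redshift. It is not a recommendation for this workload. It assumes (hypothetically) that most queries filter on DateOfPurchase, that they group or join on ProductTypeId, and that ProductTypeId has enough distinct values to spread rows evenly across slices. If that last assumption does not hold, DISTSTYLE EVEN would be the safer choice.

```sql
-- Sketch only: the key choices below are assumptions about the query
-- pattern, not something that can be derived from the DDL in the question.
CREATE TABLE CustomerTransaction (
    CustomerTransactionId BIGINT IDENTITY(1,1),
    ProductTypeId         BIGINT,
    StatusTypeId          BIGINT,
    DateOfPurchase        DATE,
    PurchaseAmount        FLOAT,
    -- Redshift treats primary keys as informational; they are not enforced.
    PRIMARY KEY (CustomerTransactionId)
)
DISTSTYLE KEY
DISTKEY (ProductTypeId)                          -- assumed common GROUP BY / join column
COMPOUND SORTKEY (DateOfPurchase, StatusTypeId); -- assumed common WHERE columns, most selective first
```

A compound sort key helps most when queries filter on its leading column, which is why the column assumed to appear in almost every WHERE clause comes first.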

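From Choose the Best Distribution Style - Amazon Redshift:

Distribute the fact table and one dimension table on their common columns
Choose the largest dimension based on the size of the filtered data set
Choose a column with high cardinality in the filtered result set
Change some dimension tables to use ALL distribution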
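
For the last point, a small lookup table such as StatusType from the question could be declared as below. This is a sketch that assumes the table is small and rarely updated, which is the case where ALL distribution tends to pay off, since a full copy is stored on every node.

```sql
-- Sketch only: assumes StatusType is a small, slowly changing lookup table.
CREATE TABLE StatusType (
    StatusTypeId      BIGINT IDENTITY(1,1),
    StatusTypeName    VARCHAR(255),
    StatusDescription VARCHAR(255),
    PRIMARY KEY (StatusTypeId)   -- informational only, not enforced by Redshift
)
DISTSTYLE ALL                    -- every node keeps a full copy, so joins need no redistribution
SORTKEY (StatusTypeId);
```

The trade-off is extra storage and slower loads for that table, so this style is usually reserved for dimension tables rather than the fact table.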

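So it is not easy to recommend a specific DISTKEY and SORTKEY, because the right choice depends on how you use the tables. Seeing the DDL alone is not enough to recommend the best way to optimize them.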
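
Once a candidate design is loaded with representative data, one way to sanity-check it is to look at the SVV_TABLE_INFO system view. The query below is a sketch; the table name filter assumes the table was created as CustomerTransaction, which Redshift stores in lowercase by default.

```sql
-- Inspect distribution and sort state of the loaded table.
SELECT "table",
       diststyle,   -- distribution style and key in effect
       sortkey1,    -- first column of the sort key
       skew_rows,   -- ratio of rows on the fullest slice to the emptiest slice
       unsorted,    -- percentage of rows not yet in sort order
       tbl_rows
FROM svv_table_info
WHERE "table" = 'customertransaction';
```

A skew_rows value far above 1 suggests the chosen DISTKEY distributes rows unevenly, and a high unsorted percentage suggests the table is due for a VACUUM.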

【Comments】:

There is a new question here; hopefully you, a colleague, or an Amazon employee can answer it. Thanks! stackoverflow.com/questions/50780784/…