【Question title】: Growing data to another table - DB design
【Posted】: 2019-04-11 05:11:28
【Question】:

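My current project is a social media application, somewhat like Facebook. Right now, posts created by users and news posts (a cron job runs every 15 minutes and fetches the latest news from various news channels) are saved in the same table, called the post table. Because of the news posts, the table is growing very fast and the timeline takes longer to load. So we plan to split normal posts (post table) and news posts (news_post table) into separate tables, and then move old news posts into a backup table (news_post_backup table).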

Then, in the post-listing API, we have to union all 3 tables, sort by post creation time, and fetch posts according to the pagination data and other conditions.

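I would like to know whether there is any benefit in doing this. I have my doubts, because I have to take a union, and then the result is effectively the same single-table structure as before.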

The MySQL version on the server is 5.6.

Update: I am adding more information here.
The query I am running is

select CP.id,CP.user_id,post_title,post_content,post_type,new_title,is_spam,spam_reportedby,CP.privacy,CP.link_title,CP.link_content,CP.link_image,CP.is_paid,CP.payment_status,CP.is_breaking,
CUP.id as channel_userspost_id,CUP.parent_id,
SU.full_name as reporteduser_full_name,SU.user_name as reporteduser_user_name,
SU.user_profile_pic as reporteduser_user_profile_pic,
FU.id as from_user_id, FU.full_name as from_user_full_name,
FU.user_name as from_user_name,
FU.user_profile_pic as from_user_profile_pic,
TU.id as to_user_id, TU.full_name as to_user_full_name,
TU.user_name as to_user_name,
TU.user_profile_pic as to_user_profile_pic,
TUA.authentication_status as to_user_authentication_status,
FUA.authentication_status as from_user_authentication_status,
C.verification_status as channel_verification_status,
CUP.created_at,CUP.updated_at,
guid,external_url,
CP.channel_id,CP.rss_channel_id,if(CP.rss_channel_id!=0,RC.rss_name,C.channel_name) as channel_name,
if(CP.rss_channel_id!=0,RC.rss_logo,C.profile_pic) as channel_logo,
C.channel_type,
PCD.like_count as like_count,
PCD.search_count as search_count,
PCD.view_count as view_count,
CM.channel_member_status,C.payment_status as channel_payment_status,C.payment_method as channel_payment_method,
CP.is_live_finished from `channel_users_posts` as `CUP` inner join `channel_posts` as `CP` on `CUP`.`channel_post_id` = `CP`.`id` and `is_spam` = 'N' 
left join `channels` as `C` on `CP`.`channel_id` = `C`.`id` 
left join `rss_channels` as `RC` on `CP`.`rss_channel_id` = `RC`.`id` left join `channel_members` as `CM` on `CM`.`channel_id` = `C`.`id` and `CM`.`user_id` = 427 and `CM`.`channel_member_status` != -1 
left join `test_develop_new`.`users` as `FU` on `FU`.`id` = `CUP`.`shared_from` left join `test_develop_new`.`users` as `SU` on `SU`.`id` = `CP`.`spam_reportedby` 
left join `test_develop_new`.`users` as `TU` on `TU`.`id` = `CUP`.`user_id` left join `common_auth_develop_new`.`user_authentication` as `FUA` on `FUA`.`user_id` = `FU`.`id` 
left join `common_auth_develop_new`.`user_authentication` as `TUA` on `TUA`.`user_id` = `TU`.`id` left join `post_count_details` as `PCD` on `PCD`.`channel_userspost_id` = `CUP`.`id`
where (`CP`.`is_paid` = 'N' or (`CP`.`is_paid` = 'Y' and `CP`.`payment_status` = 'S')) and (`CP`.`channel_id` in (705, 537) or (`CUP`.`user_id` in (8, 12, 427))) and `CUP`.`updated_at` < '2019-04-12 11:09:59.000000' and ((`CP`.`channel_id` != 0 and `CM`.`channel_member_status` is not null) or `CP`.`channel_id` = 0) and ((`CP`.`post_type` != 'BV' or `CP`.`user_id` = 427) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility='PA'))) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_A','CRY_A')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id 
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1 AND channel_member_role = '1'
) AND channels.channel_type != 46
)) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_A','CRY_A')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id 
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1 AND channel_member_role = '1'
) AND channels.channel_type = 46
)) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_S','CRY_S')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id 
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1
) AND channel_members.channel_id IN (SELECT visibility_ids FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_S','CRY_S'))
)) order by `CUP`.`updated_at` desc limit 30


The core post table is named channel_posts. This is the table's schema:

CREATE TABLE `channel_posts` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `channel_id` bigint(20) NOT NULL,
  `rss_channel_id` int(11) NOT NULL,
  `post_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_type` enum('T','L','I','V','Y','G','A','MI','MV','MY','MG','MA','NS_T','NS_I','C_T','BV') COLLATE utf8mb4_unicode_ci DEFAULT 'T',
  `is_spam` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `spam_reportedby` bigint(20) NOT NULL,
  `privacy` int(11) NOT NULL DEFAULT '2',
  `guid` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `external_url` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `link_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `link_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `is_breaking` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `is_paid` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `payment_status` enum('F','S') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'F',
  `link_image` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `is_live_finished` tinyint(1) NOT NULL DEFAULT '0',
  `created_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000',
  `updated_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

There is one more table, channel_users_posts:

CREATE TABLE `channel_users_posts` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `channel_post_id` bigint(20) NOT NULL,
  `parent_id` int(11) NOT NULL DEFAULT '0',
  `user_id` bigint(20) NOT NULL,
  `shared_from` bigint(20) NOT NULL,
  `new_title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `created_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000',
  `updated_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


The channel_posts table has 200,000 records and the channel_users_posts table has 600,000 records; the load time is 48586 ms.

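【Comments】:

  • Use one table with partitions. Internally the table can then be split by date, and MySQL creates an index for each partition; see: mariadb.com/kb/en/library/partitioning-overview
  • @BerndBuffen OK, thank you, I am looking into it.
  • @BerndBuffen The MySQL version we use is 5.6, and I think partitioning is supported from 8.0. Is there any other solution, or is there any benefit in using the approach described above?
  • @Salini I have updated my answer.
  • @Yidna I optimized the speed from 48586 ms to 17443 ms. Now, if I remove the ORDER BY updated_at, the time becomes 5548 ms. There is an index on the updated_at field. Is there anything I am missing?

Tags: mysql database phpmyadmin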


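【Solution 1】:

Another option is to partition the post table by post type and date. It is still one table, with no code changes on the client side. MySQL can then perform partition elimination (pruning) for queries.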
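
MySQL has supported table partitioning since 5.1, so it is available on 5.6. The following is only a minimal sketch of date-based range partitioning, not the asker's schema: the table name post_partitioned_demo, its columns, and the partition boundaries are all hypothetical, and it partitions by date alone rather than by post type and date. Note that every unique key on a partitioned table, including the primary key, has to contain the partitioning column.

```sql
-- Hypothetical demo table; names and boundaries are illustrative only.
CREATE TABLE post_partitioned_demo (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  post_type  VARCHAR(8)      NOT NULL,
  created_at DATETIME        NOT NULL,
  -- The partitioning column must be part of every unique key.
  PRIMARY KEY (id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (created_at) (
  PARTITION p2018 VALUES LESS THAN ('2019-01-01'),
  PARTITION p2019 VALUES LESS THAN ('2020-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- On 5.6, EXPLAIN PARTITIONS reports which partitions a query touches.
EXPLAIN PARTITIONS
SELECT id, post_type, created_at
FROM post_partitioned_demo
WHERE created_at >= '2019-04-01' AND created_at < '2019-05-01';
```

If pruning works as intended, the partitions column of the EXPLAIN output should list only the partition covering April 2019.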

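【Comments】:

  • Partitioning is supported from 8.0; my MySQL version is 5.6.
  • I optimized the speed from 48586 ms to 17443 ms. Now, if I remove the ORDER BY updated_at, the time becomes 5548 ms. There is an index on the updated_at field. Is there anything I am missing?
【Solution 2】: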

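Have you considered paginating the query instead of splitting the table? Assuming the table is ordered by time and has a clustered index on it, you can do

-- newest first: skip 5000 rows, then return the next 50
SELECT id, time, content
FROM post
ORDER BY time DESC
LIMIT 50 OFFSET 5000

to fetch the 5,001st through 5,050th most recent posts.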
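
With a large OFFSET, MySQL still has to read and discard all the skipped rows. A keyset (seek) variant avoids that by continuing from the last row of the previous page, much like the updated_at cutoff in the asker's query. The sketch below assumes that id is the primary key of channel_users_posts (the posted DDL does not show the keys); the timestamp and the id 12345 are hypothetical values taken from the last row of a previous page.

```sql
-- First page: the 30 most recently updated rows.
SELECT id, channel_post_id, user_id, updated_at
FROM channel_users_posts
ORDER BY updated_at DESC, id DESC
LIMIT 30;

-- Next page: continue strictly after the last row already shown.
-- id acts as a tie-breaker for rows sharing the same updated_at.
SELECT id, channel_post_id, user_id, updated_at
FROM channel_users_posts
WHERE updated_at < '2019-04-12 11:09:59.000000'
   OR (updated_at = '2019-04-12 11:09:59.000000' AND id < 12345)
ORDER BY updated_at DESC, id DESC
LIMIT 30;
```

The cost of each page then depends on the page size rather than on how deep into the timeline the user has scrolled.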

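As for insert time, you will most likely have a B-tree index, so inserts are logarithmic.

Additionally, it looks like "content" can be quite large relative to the rest of the data, so you could either make sure the index is on the time column, or split the content into its own table and run a separate query when you actually need the content.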


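Edit

This is a very large query, and I can tell you almost immediately that the reason it is so slow has little to do with the size of the table and much more to do with the amount of data you are processing (10 JOINs plus 11 nested SELECTs that have their own JOINs).

Do you have to return all of this in one go? Or can you fetch the very basic information you need, do some computation in your application, and then run another query? That way the disk and memory do not have to do as much work, and you offload it to the CPU instead.

If this query is required, see this SO post on how to optimize queries with 10+ JOINs. Note, however, that in the end the OP there wound up splitting the query because it still took too long.

The main takeaway here is to write smaller queries that generally do not waste as much time or resources.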
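
The idea of splitting the work could be sketched as two smaller queries. This is a rough illustration only, not a drop-in replacement: it reuses table and column names from the question but leaves out most of the visibility and membership conditions, and the ids in the second query (9001, 9002, 9003) are hypothetical stand-ins for whatever the first query returns.

```sql
-- Step 1: find the 30 timeline entries using only the two core tables.
SELECT CUP.id, CUP.channel_post_id, CUP.updated_at
FROM channel_users_posts AS CUP
INNER JOIN channel_posts AS CP ON CP.id = CUP.channel_post_id
WHERE CP.is_spam = 'N'
  AND (CP.channel_id IN (705, 537) OR CUP.user_id IN (8, 12, 427))
  AND CUP.updated_at < '2019-04-12 11:09:59.000000'
ORDER BY CUP.updated_at DESC
LIMIT 30;

-- Step 2: load the display data only for the rows found in step 1.
SELECT CUP.id, CP.post_title, CP.post_content,
       TU.full_name AS to_user_full_name,
       PCD.like_count
FROM channel_users_posts AS CUP
INNER JOIN channel_posts AS CP ON CP.id = CUP.channel_post_id
LEFT JOIN `test_develop_new`.`users` AS TU ON TU.id = CUP.user_id
LEFT JOIN post_count_details AS PCD ON PCD.channel_userspost_id = CUP.id
WHERE CUP.id IN (9001, 9002, 9003)  -- hypothetical ids from step 1
ORDER BY CUP.updated_at DESC;
```

The wide joins then run against at most 30 rows instead of every candidate row, and the large text columns are read only for the posts that are actually displayed.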

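【Comments】:

  • Yes, there is pagination based on the updated-at time, e.g. 'where updated_at > 000050 limit 30'
  • Is it still slow? What indexes do you have on your table?
  • There are indexes on 5 fields: user_id, channel_id, rss_channel_id, spam_reportedby, privacy
  • Try creating a new index on updated_at and see whether that speeds things up (it would be a non-clustered index)
  • I tried that, but there was no noticeable change in speed. There is actually another table, called users_post, that handles shared posts: when a post is created, an entry with post_id, user_id and from_user_id (plus a few other fields such as the title) is inserted into that table. The updated_at mentioned above comes from this table. I have already set an index on that field, but the speed did not change.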
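
One way to check whether the updated_at index discussed in these comments is actually used for the sort is to look at the query plan. A minimal sketch, assuming the index name idx_cup_updated_at (hypothetical) and a reduced version of the query that keeps only one join and two filters from the original:

```sql
-- List the indexes that currently exist on the table.
SHOW INDEX FROM channel_users_posts;

-- Hypothetical index name; skip this if updated_at is already indexed.
ALTER TABLE channel_users_posts
  ADD INDEX idx_cup_updated_at (updated_at);

-- Inspect how MySQL plans the filter and the sort.
EXPLAIN
SELECT CUP.id, CUP.updated_at
FROM channel_users_posts AS CUP
INNER JOIN channel_posts AS CP ON CP.id = CUP.channel_post_id
WHERE CP.is_spam = 'N'
  AND CUP.updated_at < '2019-04-12 11:09:59.000000'
ORDER BY CUP.updated_at DESC
LIMIT 30;
```

If the Extra column shows "Using filesort" or "Using temporary", the sort is not being served by the index, which would be consistent with the time dropping once the ORDER BY is removed.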