Posted: 2019-04-22 13:55:28
Question:
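I have a CentOS server running a local MemSQL cluster (aggregator and leaf on the same machine). I have a database named offers. For some reason, I can't run any queries against the tables in this database.
Everything worked fine until I tried to add another machine to the cluster. I had the IT team I work with make a complete copy of the server I was using. I went to the copied server, dropped the database in question, and registered the server with the memsql-toolbox-config register-node command. The database then showed as being in a transitional state. I restarted MemSQL using memsql-ops and ended up in this situation.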
Running a simple query gives:
memsql> select * from table;
ERROR 2261 (HY000): Query `select * from table` couldn't be executed because of an in progress failover operation. Check the status of the leaf nodes in the cluster (error 1049:'Leaf Error (172.26.32.20:3307): Unknown database 'offers_5'')
The output of the cluster status command is:
memsql> show cluster status;
+---------+--------------+------+----------+-------------+-------------+----------+--------------+-------------+-------------------------+----------------------+----------------------+---------------+-------------------------------------------------+
| Node ID | Host | Port | Database | Role | State | Position | Master Host | Master Port | Metadata Master Node ID | Metadata Master Host | Metadata Master Port | Metadata Role | Details |
+---------+--------------+------+----------+-------------+-------------+----------+--------------+-------------+-------------------------+----------------------+----------------------+---------------+-------------------------------------------------+
| 1 | 172.26.32.20 | 3306 | cluster | master | online | 0:181 | NULL | NULL | NULL | NULL | NULL | Reference | |
| 1 | 172.26.32.20 | 3306 | offers | master | online | 0:156505 | NULL | NULL | NULL | NULL | NULL | Reference | |
| 2 | 172.26.32.20 | 3307 | cluster | async slave | replicating | 0:180 | 172.26.32.20 | 3306 | 1 | 172.26.32.20 | 3306 | Reference | stage: packet wait, state: x_streaming, err: no |
| 2 | 172.26.32.20 | 3307 | offers | sync slave | replicating | 0:156505 | 172.26.32.20 | 3306 | 1 | 172.26.32.20 | 3306 | Reference | |
+---------+--------------+------+----------+-------------+-------------+----------+--------------+-------------+-------------------------+----------------------+----------------------+---------------+-------------------------------------------------+
4 rows in set (0.00 sec)
It looks like the second node is replicating. Also note the Details column:
stage: packet wait, state: x_streaming, err: no
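To read the status table row by row, here is a small Python sketch. It is a hypothetical helper, not a MemSQL tool, and the names parse_ascii_table and CLUSTER_STATUS are made up for illustration. It turns the pasted show cluster status output, trimmed to a few of its columns, into one dict per row, so each node's role and state for each database can be listed together with the Details column.

```python
from __future__ import annotations

# The "show cluster status" output pasted above, trimmed to a subset of its columns.
CLUSTER_STATUS = """\
+---------+------+----------+-------------+-------------+-------------------------------------------------+
| Node ID | Port | Database | Role | State | Details |
+---------+------+----------+-------------+-------------+-------------------------------------------------+
| 1 | 3306 | cluster | master | online | |
| 1 | 3306 | offers | master | online | |
| 2 | 3307 | cluster | async slave | replicating | stage: packet wait, state: x_streaming, err: no |
| 2 | 3307 | offers | sync slave | replicating | |
+---------+------+----------+-------------+-------------+-------------------------------------------------+
"""


def parse_ascii_table(text: str) -> list[dict[str, str]]:
    """Parse a MySQL-client style ASCII table into a list of row dicts."""
    # Keep only lines that contain cells; "+----+" border lines are skipped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    # Drop exactly one leading and one trailing "|" so empty last cells survive.
    header = [cell.strip() for cell in lines[0][1:-1].split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [cell.strip() for cell in ln[1:-1].split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# One line per (node, database) pair: role, state and any replication details.
for row in parse_ascii_table(CLUSTER_STATUS):
    print(row["Node ID"], row["Port"], row["Database"], row["Role"], row["State"], row["Details"])
```

The table here is just the data from the question. The same helper should work on the full, untrimmed output, because it splits on the "|" separators and does not rely on fixed column widths.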
Running the replication status command gives:
memsql> show replication status;
+--------+----------+------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+---------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
| Role | Database | Master_URI | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Slave_URI | Slave_State | Slave_CommitLSN | Slave_HardenedLSN | Slave_ReplayLSN | Slave_TailLSN | Slave_Commits |
+--------+----------+------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+---------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
| master | cluster | NULL | online | 0:181 | 0:181 | 0:177 | 0:181 | 86 | yes | 172.26.32.20:3307/cluster | replicating | 0:180 | 0:181 | 0:180 | 0:181 | 84 |
| master | offers | NULL | online | 0:156505 | 0:156505 | 0:156505 | 0:156505 | 183 | yes | 172.26.32.20:3307/offers | replicating | 0:156505 | 0:156505 | 0:156505 | 0:156505 | 183 |
+--------+----------+------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+---------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
2 rows in set (0.00 sec)
I never started any failover or replication. Does anyone know why this is happening? How can I fix it?
Edit:
Using memsql-ops I get:
[me@memsql ~]$ memsql-ops memsql-list
ID Agent Id Process State Cluster State Role Host Port Version
33829AF Af13af7 RUNNING CONNECTED MASTER 172.26.32.20 3306 6.5.18
BBA1B61 Af13af7 RUNNING CONNECTED LEAF 172.26.32.20 3307 6.5.18
But with memsql-admin, from the new MemSQL tools:
[me@memsql ~]$ memsql-admin list-nodes
✘ Failed to list nodes on all hosts: failed to list nodes on 1 host:
172.26.32.20
No nodes found
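To make my question clearer: how do I get my server responding to queries again? Once that works, how should I add another host? Should I completely wipe all MemSQL data from the copied server?
Second edit:
I managed to solve this by deleting my database and cluster data and setting up a new cluster with the new MemSQL tools, dropping MemSQL Ops entirely. See my answer.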
Tags: database singlestore