有三台CentOS7服务器安装了ClickHouse
| HostName | IP | 安装程序 | 程序端口 |
| centf8118.sharding1.db | 192.168.81.18 | clickhouse-server,clickhouse-client | 9000 |
| centf8119.sharding2.db | 192.168.81.19 | clickhouse-server,clickhouse-client | 9000 |
| centf8120.sharding3.db | 192.168.81.20 | clickhouse-server,clickhouse-client | 9000 |
clickhouse集群部署简要步骤:
- Install ClickHouse server on all machines of the cluster
- Set up cluster configs in configuration files
- Create local tables on each instance
- Create a Distributed table
- 在群集的所有计算机上安装ClickHouse服务器
- 在配置文件中设置群集配置
- 在每个实例上创建本地表
- 创建一个 分布式表
分布式表 实际上是一种 “view” 到ClickHouse集群的本地表。从分布式表中选择查询使用集群所有分片的资源执行。您可以为多个集群指定configs,并创建多个分布式表,为不同的集群提供视图。
第一步:在群集的所有计算机上安装ClickHouse服务器
参考文章:【ClickHouse】1:clickhouse安装 (CentOS7)
第二步:在配置文件中设置集群配置
2.1: 先在/etc/clickhouse-server/config.xml中新增下面一段内容。不配置的话就默认就在/etc/metrika.xml目录,我这里调整到/etc/clickhouse-server/目录。
2.2: 再配置/etc/clickhouse-server/metrika.xml文件,配置好后同步到其他两台机器。
<yandex>
<!-- 集群配置 -->
<clickhouse_remote_servers>
<!-- 3分片1备份 -->
<cluster_3shards_1replicas>
<!-- 数据分片1 -->
<shard>
<replica>
<host>centf8118.sharding1.db</host>
<port>9000</port>
</replica>
</shard>
<!-- 数据分片2 -->
<shard>
<replica>
<host>centf8119.sharding2.db</host>
<port> 9000</port>
</replica>
</shard>
<!-- 数据分片3 -->
<shard>
<replica>
<host>centf8120.sharding3.db</host>
<port>9000</port>
</replica>
</shard>
</cluster_3shards_1replicas>
</clickhouse_remote_servers>
</yandex>
说明:
- clickhouse_remote_servers与config.xml中的incl属性值对应;
- cluster_3shards_1replicas是集群名,可以随便取名;
- 共设置3个分片,每个分片只有1个副本;
打开clickhouse-client,查看集群:
centf8118.sharding1.db :) select * from system.clusters; SELECT * FROM system.clusters ┌─cluster───────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐ │ cluster_3shards_1replicas │ 1 │ 1 │ 1 │ centf8118.sharding1.db │ 192.168.81.18│ 9000 │ 1 │ default │ │ 0 │ 0 │ │ cluster_3shards_1replicas │ 2 │ 1 │ 1 │ centf8119.sharding2.db │ 192.168.81.19│ 9000 │ 0 │ default │ │ 0 │ 0 │ │ cluster_3shards_1replicas │ 3 │ 1 │ 1 │ centf8120.sharding3.db │ 192.168.81.20│ 9000 │ 0 │ default │ │ 0 │ 0 │ │ test_cluster_two_shards │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ │ test_cluster_two_shards │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │ │ test_cluster_two_shards_localhost │ 1 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ │ test_cluster_two_shards_localhost │ 2 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ │ test_shard_localhost │ 1 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ │ test_shard_localhost_secure │ 1 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 9440 │ 0 │ default │ │ 0 │ 0 │ │ test_unavailable_shard │ 1 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │ │ test_unavailable_shard │ 2 │ 1 │ 1 │ localhost │ 127.0.0.1 │ 1 │ 0 │ default │ │ 0 │ 0 │ └───────────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘ 11 rows in set. Elapsed: 0.004 sec.
可以看到cluster_3shards_1replicas就是我们定义的集群名称,一共有三个分片,每个分片有一份数据。
第三步:在每个实例上创建本地表
3.1:数据准备
前置条件:这里用到官方提供的OnTime数据集,先下载下来,并按照文档建表。
教程: https://clickhouse.yandex/docs/en/single/?query=internal_replication#ontime
有两种方式:
- import from raw data
- download of prepared partitions
第一种方式:import from raw data
先下载数据
# cd /data/clickhouse/tmp for s in `seq 1987 2018` do for m in `seq 1 12` do wget https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_${s}_${m}.zip done done
再创建表 ontime
CREATE TABLE `ontime` ( `Year` UInt16, `Quarter` UInt8, `Month` UInt8, `DayofMonth` UInt8, `DayOfWeek` UInt8, `FlightDate` Date, `UniqueCarrier` FixedString(7), `AirlineID` Int32, `Carrier` FixedString(2), `TailNum` String, `FlightNum` String, `OriginAirportID` Int32, `OriginAirportSeqID` Int32, `OriginCityMarketID` Int32, `Origin` FixedString(5), `OriginCityName` String, `OriginState` FixedString(2), `OriginStateFips` String, `OriginStateName` String, `OriginWac` Int32, `DestAirportID` Int32, `DestAirportSeqID` Int32, `DestCityMarketID` Int32, `Dest` FixedString(5), `DestCityName` String, `DestState` FixedString(2), `DestStateFips` String, `DestStateName` String, `DestWac` Int32, `CRSDepTime` Int32, `DepTime` Int32, `DepDelay` Int32, `DepDelayMinutes` Int32, `DepDel15` Int32, `DepartureDelayGroups` String, `DepTimeBlk` String, `TaxiOut` Int32, `WheelsOff` Int32, `WheelsOn` Int32, `TaxiIn` Int32, `CRSArrTime` Int32, `ArrTime` Int32, `ArrDelay` Int32, `ArrDelayMinutes` Int32, `ArrDel15` Int32, `ArrivalDelayGroups` Int32, `ArrTimeBlk` String, `Cancelled` UInt8, `CancellationCode` FixedString(1), `Diverted` UInt8, `CRSElapsedTime` Int32, `ActualElapsedTime` Int32, `AirTime` Int32, `Flights` Int32, `Distance` Int32, `DistanceGroup` UInt8, `CarrierDelay` Int32, `WeatherDelay` Int32, `NASDelay` Int32, `SecurityDelay` Int32, `LateAircraftDelay` Int32, `FirstDepTime` String, `TotalAddGTime` String, `LongestAddGTime` String, `DivAirportLandings` String, `DivReachedDest` String, `DivActualElapsedTime` String, `DivArrDelay` String, `DivDistance` String, `Div1Airport` String, `Div1AirportID` Int32, `Div1AirportSeqID` Int32, `Div1WheelsOn` String, `Div1TotalGTime` String, `Div1LongestGTime` String, `Div1WheelsOff` String, `Div1TailNum` String, `Div2Airport` String, `Div2AirportID` Int32, `Div2AirportSeqID` Int32, `Div2WheelsOn` String, `Div2TotalGTime` String, `Div2LongestGTime` String, `Div2WheelsOff` String, `Div2TailNum` String, `Div3Airport` String, `Div3AirportID` Int32, `Div3AirportSeqID` Int32, `Div3WheelsOn` String, `Div3TotalGTime` String, `Div3LongestGTime` String, `Div3WheelsOff` String, `Div3TailNum` String, `Div4Airport` String, `Div4AirportID` Int32, `Div4AirportSeqID` Int32, `Div4WheelsOn` String, `Div4TotalGTime` String, `Div4LongestGTime` String, `Div4WheelsOff` String, `Div4TailNum` String, `Div5Airport` String, `Div5AirportID` Int32, `Div5AirportSeqID` Int32, `Div5WheelsOn` String, `Div5TotalGTime` String, `Div5LongestGTime` String, `Div5WheelsOff` String, `Div5TailNum` String ) ENGINE = MergeTree PARTITION BY Year ORDER BY (Carrier, FlightDate) SETTINGS index_granularity = 8192;