有三台CentOS7服务器安装了ClickHouse

HostName IP 安装程序 程序端口
centf8118.sharding1.db 192.168.81.18 clickhouse-server,clickhouse-client 9000
centf8119.sharding2.db 192.168.81.19 clickhouse-server,clickhouse-client 9000
centf8120.sharding3.db 192.168.81.20 clickhouse-server,clickhouse-client 9000

 

 

 

 

clickhouse集群部署简要步骤:

  1. Install ClickHouse server on all machines of the cluster
  2. Set up cluster configs in configuration files
  3. Create local tables on each instance
  4. Create a Distributed table
  1. 在群集的所有计算机上安装ClickHouse服务器
  2. 在配置文件中设置群集配置
  3. 在每个实例上创建本地表
  4. 创建一个 分布式表

分布式表 实际上是一种 “view” 到ClickHouse集群的本地表。从分布式表中选择查询使用集群所有分片的资源执行。您可以为多个集群指定configs,并创建多个分布式表,为不同的集群提供视图。

 

第一步:在群集的所有计算机上安装ClickHouse服务器

参考文章:【ClickHouse】1:clickhouse安装 (CentOS7)

 

第二步:在配置文件中设置集群配置

2.1: 先在/etc/clickhouse-server/config.xml中新增下面一段内容。不配置的话就默认就在/etc/metrika.xml目录,我这里调整到/etc/clickhouse-server/目录。

【ClickHouse】5:clickhouse集群部署

 

2.2: 再配置/etc/clickhouse-server/metrika.xml文件,配置好后同步到其他两台机器。

<yandex>
<!-- 集群配置 -->
<clickhouse_remote_servers>
    <!-- 3分片1备份 -->
    <cluster_3shards_1replicas>
        <!-- 数据分片1  -->
        <shard>
            <replica>
                <host>centf8118.sharding1.db</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- 数据分片2  -->
        <shard>
            <replica>
                <host>centf8119.sharding2.db</host>
                <port> 9000</port>
            </replica>
        </shard>
        <!-- 数据分片3  -->
        <shard>
            <replica>
                <host>centf8120.sharding3.db</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster_3shards_1replicas>
</clickhouse_remote_servers>
</yandex>

 

说明:

  • clickhouse_remote_servers与config.xml中的incl属性值对应;
  • cluster_3shards_1replicas是集群名,可以随便取名;
  • 共设置3个分片,每个分片只有1个副本;

打开clickhouse-client,查看集群:
centf8118.sharding1.db :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster───────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ cluster_3shards_1replicas         │         111 │ centf8118.sharding1.db │ 192.168.81.1890001 │ default │                  │            00 │
│ cluster_3shards_1replicas         │         211 │ centf8119.sharding2.db │ 192.168.81.1990000 │ default │                  │            00 │
│ cluster_3shards_1replicas         │         311 │ centf8120.sharding3.db │ 192.168.81.2090000 │ default │                  │            00 │
│ test_cluster_two_shards           │         111127.0.0.1127.0.0.190001 │ default │                  │            00 │
│ test_cluster_two_shards           │         211127.0.0.2127.0.0.290000 │ default │                  │            00 │
│ test_cluster_two_shards_localhost │         111 │ localhost              │ 127.0.0.190001 │ default │                  │            00 │
│ test_cluster_two_shards_localhost │         211 │ localhost              │ 127.0.0.190001 │ default │                  │            00 │
│ test_shard_localhost              │         111 │ localhost              │ 127.0.0.190001 │ default │                  │            00 │
│ test_shard_localhost_secure       │         111 │ localhost              │ 127.0.0.194400 │ default │                  │            00 │
│ test_unavailable_shard            │         111 │ localhost              │ 127.0.0.190001 │ default │                  │            00 │
│ test_unavailable_shard            │         211 │ localhost              │ 127.0.0.110 │ default │                  │            00 │
└───────────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

11 rows in set. Elapsed: 0.004 sec. 

 

可以看到cluster_3shards_1replicas就是我们定义的集群名称,一共有三个分片,每个分片有一份数据。

 

 

第三步:在每个实例上创建本地表

3.1:数据准备

前置条件:这里用到官方提供的OnTime数据集,先下载下来,并按照文档建表。

教程: https://clickhouse.yandex/docs/en/single/?query=internal_replication#ontime

 有两种方式:

  • import from raw data
  • download of prepared partitions

第一种方式:import from raw data

 先下载数据

# cd /data/clickhouse/tmp

for s in `seq 1987 2018`
do
for m in `seq 1 12`
do
wget https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_${s}_${m}.zip
done
done

 

再创建表 ontime

CREATE TABLE `ontime` (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  `FlightDate` Date,
  `UniqueCarrier` FixedString(7),
  `AirlineID` Int32,
  `Carrier` FixedString(2),
  `TailNum` String,
  `FlightNum` String,
  `OriginAirportID` Int32,
  `OriginAirportSeqID` Int32,
  `OriginCityMarketID` Int32,
  `Origin` FixedString(5),
  `OriginCityName` String,
  `OriginState` FixedString(2),
  `OriginStateFips` String,
  `OriginStateName` String,
  `OriginWac` Int32,
  `DestAirportID` Int32,
  `DestAirportSeqID` Int32,
  `DestCityMarketID` Int32,
  `Dest` FixedString(5),
  `DestCityName` String,
  `DestState` FixedString(2),
  `DestStateFips` String,
  `DestStateName` String,
  `DestWac` Int32,
  `CRSDepTime` Int32,
  `DepTime` Int32,
  `DepDelay` Int32,
  `DepDelayMinutes` Int32,
  `DepDel15` Int32,
  `DepartureDelayGroups` String,
  `DepTimeBlk` String,
  `TaxiOut` Int32,
  `WheelsOff` Int32,
  `WheelsOn` Int32,
  `TaxiIn` Int32,
  `CRSArrTime` Int32,
  `ArrTime` Int32,
  `ArrDelay` Int32,
  `ArrDelayMinutes` Int32,
  `ArrDel15` Int32,
  `ArrivalDelayGroups` Int32,
  `ArrTimeBlk` String,
  `Cancelled` UInt8,
  `CancellationCode` FixedString(1),
  `Diverted` UInt8,
  `CRSElapsedTime` Int32,
  `ActualElapsedTime` Int32,
  `AirTime` Int32,
  `Flights` Int32,
  `Distance` Int32,
  `DistanceGroup` UInt8,
  `CarrierDelay` Int32,
  `WeatherDelay` Int32,
  `NASDelay` Int32,
  `SecurityDelay` Int32,
  `LateAircraftDelay` Int32,
  `FirstDepTime` String,
  `TotalAddGTime` String,
  `LongestAddGTime` String,
  `DivAirportLandings` String,
  `DivReachedDest` String,
  `DivActualElapsedTime` String,
  `DivArrDelay` String,
  `DivDistance` String,
  `Div1Airport` String,
  `Div1AirportID` Int32,
  `Div1AirportSeqID` Int32,
  `Div1WheelsOn` String,
  `Div1TotalGTime` String,
  `Div1LongestGTime` String,
  `Div1WheelsOff` String,
  `Div1TailNum` String,
  `Div2Airport` String,
  `Div2AirportID` Int32,
  `Div2AirportSeqID` Int32,
  `Div2WheelsOn` String,
  `Div2TotalGTime` String,
  `Div2LongestGTime` String,
  `Div2WheelsOff` String,
  `Div2TailNum` String,
  `Div3Airport` String,
  `Div3AirportID` Int32,
  `Div3AirportSeqID` Int32,
  `Div3WheelsOn` String,
  `Div3TotalGTime` String,
  `Div3LongestGTime` String,
  `Div3WheelsOff` String,
  `Div3TailNum` String,
  `Div4Airport` String,
  `Div4AirportID` Int32,
  `Div4AirportSeqID` Int32,
  `Div4WheelsOn` String,
  `Div4TotalGTime` String,
  `Div4LongestGTime` String,
  `Div4WheelsOff` String,
  `Div4TailNum` String,
  `Div5Airport` String,
  `Div5AirportID` Int32,
  `Div5AirportSeqID` Int32,
  `Div5WheelsOn` String,
  `Div5TotalGTime` String,
  `Div5LongestGTime` String,
  `Div5WheelsOff` String,
  `Div5TailNum` String
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;
View Code

相关文章: