0%

ClickHouse 由MergeTree转换为ReplicatedMergeTree(副本集)

ClickHouse 由MergeTree转换为ReplicatedMergeTree(副本集)

测试环境准备

创建测试数据库

1
2
3
create database test;

use test;

创建测试表

1
2
3
4
5
6
7
8
9
CREATE TABLE foo
(
`A` Int64,
`D` Date,
`S` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(D)
ORDER BY A

插入测试数据

1
2
3
4
5
6
7
8
9
10
11
INSERT INTO foo SELECT
number,
today(),
''
FROM numbers(100);

INSERT INTO foo SELECT
number,
today() - 60,
''
FROM numbers(100)

确认测试数据插入OK

1
2
3
4
5
6
7
8
9
10
11
12
localhost :) select count() from foo;

SELECT count()
FROM foo

Query id: 05e4bb12-e307-4613-8e29-5721edbfa1a8

┌─count()─┐
│ 200 │
└─────────┘

1 rows in set. Elapsed: 0.005 sec.

方案

方案一:创建复制表后,直接将原表数据插入,然后再重命名表名

  • 适合小数据量
  • 需保证存储可用,此方案保留”旧”数据

创建复制表

1
2
3
4
5
6
7
8
9
CREATE TABLE foo_replicated
(
`A` Int64,
`D` Date,
`S` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
PARTITION BY toYYYYMM(D)
ORDER BY A

将原表中的数据插入新表

1
2
INSERT INTO foo_replicated SELECT *
FROM foo

分别在复制集节点确认数据

1
2
3
4
5
6
7
8
9
10
11
12
localhost :) select count() from foo_replicated;

SELECT count()
FROM foo_replicated

Query id: 1467d6e7-8e42-4bdd-9fbc-ad2d54f6d39b

┌─count()─┐
│ 200 │
└─────────┘

1 rows in set. Elapsed: 0.005 sec.

按需重命名表名

1
2
3
RENAME TABLE foo TO foo_old;

RENAME TABLE foo_replicated TO foo;

show create foo 可以看到重命名并不会更改ZK中的配置,新表复用原表名的话,建议一开始就规划好

1
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test/foo_replicated', '{replica}')

方案二:将原表数据目录中的数据”拷贝或移动”到新表的数据目录”detached”中,然后执行”ALTER TABLE ATTACH PARTITION”

  • 适合大数据量
  • 存储空间够用的话,建议拷贝数据

建好复制集表后,拷贝数据

为了保持数据一致性,可以先将数据DETACH(或应用层面暂停数据变更操作)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
database,
table,
partition
FROM system.parts
WHERE table = 'foo';

#SYSTEM STOP MERGES test.foo; 如果有此操作,记得迁移完毕后按需SYSTEM STOP MERGES db.table

ALTER TABLE foo
DETACH PARTITION '202212'

ALTER TABLE foo
DETACH PARTITION '202302'

拷贝数据

1
2
3
4
5
cd /var/lib/clickhouse/data/test

cp -r foo/detached/* foo_replicated/detached/ # copy的要注意权限、属主问题,建议rsync

chown -R clickhouse:clickhouse foo_replicated/detached/*

复制集表ATTACH数据

1
2
3
4
5
ALTER TABLE foo_replicated
ATTACH PARTITION '202212';

ALTER TABLE foo_replicated
ATTACH PARTITION '202302'

校验数据

1
2
3
4
5
6
7
8
9
10
11
12
localhost :) select count() from foo_replicated

SELECT count()
FROM foo_replicated

Query id: aa1ceb41-42bd-45fa-b891-48efcb430578

┌─count()─┐
│ 200 │
└─────────┘

1 rows in set. Elapsed: 0.010 sec.

PARTITION过多时操作起来很不优雅,并且容易遗漏

提前生成DETACH | ATTACH语句

DETACH

1
2
3
4
5
6
7
8
9
10
11
12
SELECT DISTINCT concat('ALTER TABLE foo DETACH PARTITION \'', partition_id, '\';')
FROM system.parts
WHERE (table = 'foo') AND active

Query id: ae34e9b9-e278-4af1-a95c-50d21de7161e

┌─concat('ALTER TABLE foo DETACH PARTITION \'', partition_id, '\';')─┐
│ ALTER TABLE foo DETACH PARTITION '202212'; │
│ ALTER TABLE foo DETACH PARTITION '202302'; │
└────────────────────────────────────────────────────────────────────┘

2 rows in set. Elapsed: 0.013 sec.

ATTACH

1
2
3
4
5
6
7
8
9
10
SELECT DISTINCT concat('ALTER TABLE foo_replicated ATTACH PARTITION \'', partition_id, '\';')
FROM system.parts
WHERE (table = 'foo') AND active

Query id: e380689b-4532-430e-904c-c68eba49ce94

┌─concat('ALTER TABLE foo_replicated ATTACH PARTITION \'', partition_id, '\';')─┐
│ ALTER TABLE foo_replicated ATTACH PARTITION '202212'; │
│ ALTER TABLE foo_replicated ATTACH PARTITION '202302'; │
└───────────────────────────────────────────────────────────────────────────────┘

方案三:最便捷、高效的方式ALTER TABLE table ATTACH PARTITION ID partition_id FROM table

生成相关命令

1
2
3
4
5
6
7
8
9
10
SELECT DISTINCT concat('ALTER TABLE foo_replicated ATTACH PARTITION ID \'', partition_id, '\' FROM foo;')
FROM system.parts
WHERE (table = 'foo') AND active

Query id: ff5bf6c5-cd96-4e24-9069-d58b0691928b

┌─concat('ALTER TABLE foo_replicated ATTACH PARTITION ID \'', partition_id, '\' FROM foo;')─┐
│ ALTER TABLE foo_replicated ATTACH PARTITION ID '202212' FROM foo; │
│ ALTER TABLE foo_replicated ATTACH PARTITION ID '202302' FROM foo; │
└───────────────────────────────────────────────────────────────────────────────────────────┘

执行

1
2
ALTER TABLE foo_replicated ATTACH PARTITION ID '202212' FROM foo;
... ...

之所以高效是因为其基于文件系统的硬链接(不额外占用存储),可以对比下Links

1
2
3
4
root@localhost:/var/lib/clickhouse/data/test/foo/202212_2_2_0# stat data.bin
File: data.bin
Size: 503 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 1731864 Links: 1
1
2
3
4
root@localhost:/var/lib/clickhouse/data/test/foo/202212_2_2_0# stat data.bin
File: data.bin
Size: 503 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 1731864 Links: 2