Clickhouse历史数据导入到doris的性能问题

Viewed 21

想把Clickhouse的历史数据导入到doris,通过创建SCHEDULE EVERY job,insert into select分批次导入。

clickhouse.test.A是在Clickhouse的表原
demo_test.A 是在doris的目标表

思路,查出demo_test.A中max time,下次job从max time开始的数据导入。
但是测了一下以下两个语句,第一个语句跨2种数据库,哪怕是查一个常量,性能都不行,出不来,但是换成第二条就没啥问题了。原因在哪里,或者有其他改进方式

select * from clickhouse.test.A as m where m.device_time >=(select '2025-02-21 08:40:00' from demo_test.A) and m.device_time<(select '2025-02-21 08:50:00' from demo_test.A)

select * from clickhouse.test.A as m where m.device_time >='2025-02-21 08:40:00' and m.device_time<'2025-02-21 08:50:00'


EXPLAIN select * from clickhouse.ddps.ts_kv_iot_main as m where m.device_time >=(select '2025-02-21 08:40:54' from demo_test.iot_main_migrate) limit 10

PLAN FRAGMENT 0
OUTPUT EXPRS:
node_id[#44]
uuid[#45]
batch_id[#46]
device_time[#47]
device_timestamp[#48]
key_id[#49]
str_v[#50]
is_delete[#51]
create_time[#52]
PARTITION: UNPARTITIONED

HAS_COLO_PLAN_NODE: false

VRESULT SINK
MYSQL_PROTOCAL

6:VEXCHANGE
offset: 0
limit: 10
distribute expr lists:

PLAN FRAGMENT 1

PARTITION: RANDOM

HAS_COLO_PLAN_NODE: false

STREAM DATA SINK
EXCHANGE ID: 06
UNPARTITIONED

5:VNESTED LOOP JOIN(482)
| join op: INNER JOIN()
| join conjuncts: (cast(device_time as DATETIMEV2(6))[#42] >= cast('2025-02-21 08:40:54' as DATETIMEV2(6))[#43])
| runtime filters: RF000[min] <- cast('2025-02-21 08:40:54' as DATETIMEV2(6))#13
| is output left side only: false
| cardinality=0
| vIntermediate tuple ids: 6
| limit: 10
| final projections: node_id[#33], uuid[#34], batch_id[#35], device_time[#36], device_timestamp[#37], key_id[#38], str_v[#39], is_delete[#40], create_time[#41]
| final project output tuple id: 7
| distribute expr lists:
| distribute expr lists:
|
|----3:VEXCHANGE
| offset: 0
| distribute expr lists:
|
4:VJdbcScanNode(441)
TABLE: "ddps"."ts_kv_iot_main"
QUERY: SELECT "node_id", "uuid", "batch_id", "device_time", "device_timestamp", "key_id", "str_v", "is_delete", "create_time" FROM "ddps"."ts_kv_iot_main"
final projections: node_id[#14], uuid[#15], batch_id[#16], device_time[#17], device_timestamp[#18], key_id[#19], str_v[#20], is_delete[#21], create_time[#22], CAST(device_time[#17] AS datetimev2(6))
final project output tuple id: 5

PLAN FRAGMENT 2

PARTITION: UNPARTITIONED

HAS_COLO_PLAN_NODE: false

STREAM DATA SINK
EXCHANGE ID: 03
UNPARTITIONED

2:VASSERT NUMBER OF ROWS(467)
| assert number of rows: EQ 1
| final projections: CAST('2025-02-21 08:40:54'[#12] AS datetimev2(6))
| final project output tuple id: 3
| distribute expr lists:
|
1:VEXCHANGE
offset: 0
distribute expr lists:

PLAN FRAGMENT 3

PARTITION: RANDOM

HAS_COLO_PLAN_NODE: false

STREAM DATA SINK
EXCHANGE ID: 01
UNPARTITIONED

0:VOlapScanNode(447)
TABLE: demo_test.iot_main_migrate(iot_main_migrate), PREAGGREGATION: ON
PREDICATES: (DORIS_DELETE_SIGN[#9] = 0)
partitions=2/2 (p20240325000000,p20250220000000)
tablets=20/20, tabletList=3000953,3000955,3000957 ...
cardinality=6, avgRowSize=16904.166, numNodes=1
pushAggOp=NONE
final projections: '2025-02-21 08:40:54'
final project output tuple id: 1

========== STATISTICS ==========
planed with unknown column statistics


explain select * from clickhouse.ddps.ts_kv_iot_main as m where m.device_time >='2025-02-21 08:40:54' limit 10

PLAN FRAGMENT 0
OUTPUT EXPRS:
node_id[#0]
uuid[#1]
batch_id[#2]
device_time[#3]
device_timestamp[#4]
key_id[#5]
str_v[#6]
is_delete[#7]
create_time[#8]
PARTITION: UNPARTITIONED

HAS_COLO_PLAN_NODE: false

VRESULT SINK
MYSQL_PROTOCAL

1:VEXCHANGE
offset: 0
limit: 10
distribute expr lists:

PLAN FRAGMENT 1

PARTITION: RANDOM

HAS_COLO_PLAN_NODE: false

STREAM DATA SINK
EXCHANGE ID: 01
UNPARTITIONED

0:VJdbcScanNode(125)
TABLE: "ddps"."ts_kv_iot_main"
QUERY: SELECT "node_id", "uuid", "batch_id", "device_time", "device_timestamp", "key_id", "str_v", "is_delete", "create_time" FROM "ddps"."ts_kv_iot_main" WHERE ("device_time" >= '2025-02-21 08:40:54') LIMIT 10
PREDICATES: (device_time[#3] >= '2025-02-21 08:40:54')
limit: 10

========== STATISTICS ==========
planed with unknown column statistics

1 Answers

应该是plan的问题了。

  1. 需要Doris的具体版本,可以show backend 看version;
  2. 上边的两个sql 例子,取个explain $SQL 看下执行计划,可以贴到帖子中或者您私聊我一下微信,我们一起看下