Doris 单表导入 SQL 文件数据数据很慢

Viewed 50

有一个 SQL 文件,大小 3 个GB,物联网的设备的历史数据。

已经使用主键模型建了的分区表,SQL 文件所有的数据都有对应的分区,主键为(update_time,id)。

SQL 的内容如下,都是 INSERT INTO VALUES 语句,字段名称和 ID 不便展示隐掉了。

INSERT INTO xxxx (id,update_time,fielda,filedb,fieldc,fieldd) VALUES 
('xxx','2023-07-10 05:03:56',NULL,'2000.0',NULL,4)
,('xxx','2023-07-10 05:14:16',NULL,'2000.0',NULL,4)
,('xxx','2023-07-10 05:24:37',NULL,'1137.0',NULL,4)
,('xxx','2023-07-10 05:34:57',NULL,'1405.0',NULL,4)
,('xxx','2023-07-10 05:45:20',NULL,'2037.0',NULL,4)
,('xxx','2023-07-10 05:55:42',NULL,'2118.0',NULL,4)
,('xxx','2023-07-10 06:06:06',NULL,'1172.0',NULL,4)
,('xxx','2023-07-10 06:16:29',NULL,'1334.0',NULL,4)
,('xxx','2023-07-10 06:26:50',NULL,'2062.0',NULL,4)
,('xxx','2023-07-10 06:37:11',NULL,'2183.0',NULL,4)
;
INSERT INTO xxxx (id,update_time,fielda,filedb,fieldc,fieldd) VALUES 
('xxx','2023-07-11 05:03:56',NULL,'2000.0',NULL,4)
,('xxx','2023-07-11 05:14:16',NULL,'2000.0',NULL,4)
,('xxx','2023-07-11 05:24:37',NULL,'1137.0',NULL,4)
,('xxx','2023-07-11 05:34:57',NULL,'1405.0',NULL,4)
,('xxx','2023-07-11 05:45:20',NULL,'2037.0',NULL,4)
,('xxx','2023-07-11 05:55:42',NULL,'2118.0',NULL,4)
,('xxx','2023-07-11 06:06:06',NULL,'1172.0',NULL,4)
,('xxx','2023-07-11 06:16:29',NULL,'1334.0',NULL,4)
,('xxx','2023-07-11 06:26:50',NULL,'2062.0',NULL,4)
,('xxx','2023-07-11 06:37:11',NULL,'2183.0',NULL,4)
;

INSERT INTO xxxx (id,update_time,fielda,filedb,fieldc,fieldd) VALUES 
('xxx','2023-07-12 05:03:56',NULL,'2000.0',NULL,4)
,('xxx','2023-07-12 05:14:16',NULL,'2000.0',NULL,4)
,('xxx','2023-07-12 05:24:37',NULL,'1137.0',NULL,4)
,('xxx','2023-07-12 05:34:57',NULL,'1405.0',NULL,4)
,('xxx','2023-07-12 05:45:20',NULL,'2037.0',NULL,4)
,('xxx','2023-07-12 05:55:42',NULL,'2118.0',NULL,4)
,('xxx','2023-07-12 06:06:06',NULL,'1172.0',NULL,4)
,('xxx','2023-07-12 06:16:29',NULL,'1334.0',NULL,4)
,('xxx','2023-07-12 06:26:50',NULL,'2062.0',NULL,4)
,('xxx','2023-07-12 06:37:11',NULL,'2183.0',NULL,4)
;

使用 mysql -u name -p dbname < data.sql 导入数据,导入过程持续 4 个小时了没有结束,想请教一下有什么能够快速导入的办法吗?

  1. Doris 集群,一个 fe,三个 be。
  2. 集群服务器的配置均为 16 核心,32GB,导入任务并没有导致资源占用很高。
1 Answers

可以开启group commit导入,可加上global 设置为全局:
set global group_commit = async_mode;