[Resolved] 2.0.4: automatic statistics collection on a large table fails because the query uses too much memory. Why does it use so much memory?


Statistics collection on a very large table of mine keeps failing. I have set a global query memory limit, and it is quite large. According to the documentation (https://doris.apache.org/zh-CN/docs/query/nereids/statistics), a few million rows should not hit the limit, so I don't understand why the limit keeps being exceeded.

The statistics collection job itself consumes some system resources. To keep this overhead as low as possible, Doris collects by sampling; automatic collection samples 4194304 (2^22) rows by default, so as to minimize the load on the system and finish the job quickly. If you want to sample more rows for a more accurate picture of the data distribution, you can increase the sample size via the huge_table_default_sample_rows parameter. Other parameters control behaviors such as full collection for small tables and the collection interval for large tables. See section 3.1 for detailed configuration.
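As a sketch, the sample size can be raised at runtime through the FE configuration. ADMIN SET FRONTEND CONFIG is standard Doris syntax, but the value 8388608 below is only an illustrative choice, and you should verify the parameter name against your version:

-- Raise the sample size from the default 2^22 to 2^23 rows (example value).
ADMIN SET FRONTEND CONFIG ("huge_table_default_sample_rows" = "8388608");

-- Confirm the new value took effect.
ADMIN SHOW FRONTEND CONFIG LIKE "huge_table_default_sample_rows";

Note that this trades memory and runtime for accuracy: a larger sample makes the collection query scan and aggregate more rows.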

Is this the default query?
Why does statistics collection issue this query? It looks like it reads more than 4 million rows.

The error:

2024-05-27 15:38:42,450 WARN (Analysis Job Executor-0|590) [Coordinator.getNext():1175] Query 19400d22ea734c53-940ada39180fac22 failed: (192.168.73.217)[MEM_LIMIT_EXCEEDED]PreCatch error code:11, [E11] Allocator mem tracke

The query that was run:

SELECT
    CONCAT('370740', '-', '-1', '-', 'time') AS id,
    0 AS catalog_id,
    370739 AS db_id,
    370740 AS tbl_id,
    -1 AS idx_id,
    'time' AS col_id,
    NULL AS part_id,
    54594308155 AS row_count,
    SUM(t1.count) * COUNT(1) / (SUM(t1.count) - SUM(IF(t1.count = 1, 1, 0))
        + SUM(IF(t1.count = 1, 1, 0)) * SUM(t1.count) / 54594308155) AS ndv,
    IFNULL(SUM(IF(t1.column_key IS NULL, t1.count, 0)), 0) * 10.638664575052658 AS null_count,
    SUBSTRING(CAST('2021-04-13 20:31:40' AS STRING), 1, 1024) AS min,
    SUBSTRING(CAST('2024-05-27 15:32:42' AS STRING), 1, 1024) AS max,
    SUM(t1.count) * 16 * 10.638664575052658 AS data_size,
    NOW()
FROM (
    SELECT t0.time AS column_key, COUNT(1) AS count
    FROM (
        SELECT time
        FROM internal.default_cluster:test_db.test_table
        TABLET(3670726, 3670758, 3670778, 3670842, 3670894, 4594031, 3670946,
               3670974, 3671030, 3671070, 3671154, 3671194, 3671242, 4368127,
               3671266, 3671350, 3671398, 3671422, 4231304, 4172842, 3759423,
               3671478, 3786165, 3671518, 3849385, 4518637, 3671562, 3671606,
               4198033, 3671650, 3671702, 4091931, 3671778, 3735589, 3671878,
               3671882, 3993104, 3671938, 3671986, 3672066, 3688065, 3672102,
               4124324, 4399905, 3672146, 3672178, 4482706, 3897478, 4017035,
               4148488, 3672246, 3824406, 3969157, 3672266, 3672330, 3672374,
               3672410, 3672474, 3672522, 4335789, 3672586, 3672614, 3711721,
               4263844, 3921549, 3672678, 4040312, 3672702, 3672762, 3672822,
               3672878, 3672906, 4561202, 4296784, 3672938, 3672994, 3873491,
               3945459, 4066006, 3673062, 3673102, 3673162, 3673218, 3673230,
               3673294, 3673346, 3673406, 4431951, 3673458, 3673494, 3673522,
               3673586, 3673630)
    ) AS t0
    GROUP BY t0.time
) AS t1
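The odd-looking ndv expression is a sample-scaled distinct-value estimator: it counts the distinct values in the sample and scales up based on how many values appeared exactly once (the singletons f1), using the full table's row count (54594308155 here). A minimal Python sketch of the same arithmetic, with hypothetical function and variable names:

```python
from collections import Counter

def estimate_ndv(sample, total_row_count):
    """Mirror the SQL: SUM(count) * COUNT(1)
       / (SUM(count) - f1 + f1 * SUM(count) / total_row_count)."""
    counts = Counter(sample)           # per-value counts within the sample
    n = sum(counts.values())           # SUM(t1.count): number of sampled rows
    d = len(counts)                    # COUNT(1): distinct values in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return n * d / (n - f1 + f1 * n / total_row_count)
```

When the sample covers the whole table (n equals total_row_count), the formula collapses to d, the exact distinct count; when every sampled value is a singleton, it extrapolates up toward the full row count. This is why a column like time, with many distinct values, needs a large GROUP BY over the sample, which is where the memory goes.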

1 Answer

For partition columns, bucket columns, and key columns, the sample may exceed the 4-million-row limit. Partition columns in particular require sampling at least one tablet per partition to keep the statistics accurate. Could this time column be a partition column?
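As a possible workaround while the automatic job keeps failing, you could trigger a one-off manual collection with an explicit sample budget. The WITH SAMPLE ROWS clause exists in Doris 2.0's ANALYZE statement, but the column-list form and the row budget below are illustrative, so check them against your version's syntax:

-- Sampled collection for just the problematic column,
-- reading roughly 1,000,000 rows instead of the default sample.
ANALYZE TABLE test_db.test_table (time) WITH SAMPLE ROWS 1000000;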

We plan to support partition-level statistics collection later on, which should alleviate this problem.