fe.warn.log中持续输出查询oom,为收集统计信息sql,如下
SELECT
CONCAT('370740', '-', '-1', '-', 'time') AS `id`,
0 AS `catalog_id`,
370739 AS `db_id`,
370740 AS `tbl_id`,
-1 AS `idx_id`,
'time' AS `col_id`,
NULL AS `part_id`,
54572517117 AS `row_count`,
SUM(`t1`.`count`) * COUNT(1) / (SUM(`t1`.`count`) - SUM(IF(`t1`.`count` = 1, 1, 0)) + SUM(IF(`t1`.`count` = 1, 1, 0)) * SUM(`t1`.`count`) / 54572517117) as `ndv`,
IFNULL(SUM(IF(`t1`.`column_key` IS NULL, `t1`.`count`, 0)), 0) * 11.279462585593992 as `null_count`,
SUBSTRING(CAST('2021-04-13 20:31:40' AS STRING), 1, 1024) AS `min`,
SUBSTRING(CAST('2024-05-27 10:40:11' AS STRING), 1, 1024) AS `max`,
SUM(t1.count) * 16 * 11.279462585593992 AS `data_size`,
NOW()
FROM
(
SELECT
t0.`time` as `column_key`,
COUNT(1) as `count`
FROM
(
SELECT
`time`
FROM
`internal`.`default_cluster:db`.`mytable` TABLET(3670690,
3670746,
3670790,
3670834,
3670882,
4594011,
3673494,
.......,
3673542,
3673602,
3673630) ) as `t0`
GROUP BY
`t0`.`time` ) as `t1`
表mytable为大表(几百亿),time为表dupkey的首字段,类型为datetime,TABLET()内约有百个,省略显示全部。百个tablets的数量肯定达到亿级,即使是group by后,所以肯定会oom。
问题:
- 根据官网说明,默认所有表都是大表?huge_table_lower_bound_size_in_bytes默认值为0
- 大表统计不是采样huge_table_default_sample_rows(400万)吗,上面的sql查一部分tablets是怎么来的
- 关闭收集统计信息对性能有什么影响?
- 能否只针对某些表关闭收集。