I was testing TPC-H Q1 and noticed an operator named STREAMING_AGGREGATION_OPERATOR in the QueryProfile:
STREAMING_AGGREGATION_OPERATOR (id=1):
- BlocksProduced: sum 48, avg 1, max 1, min 1
- CloseTime: avg 1.206us, max 8.20us, min 568ns
- ExecTime: avg 10s203ms, max 10s838ms, min 9s518ms
- MemoryUsage: sum , avg , max , min
  - HashTable: sum 7.88 KB, avg 168.00 B, max 168.00 B, min 168.00 B
  - PeakMemoryUsage: sum 72.76 MB, avg 1.52 MB, max 1.52 MB, min 1.52 MB
  - SerializeKeyArena: sum 72.75 MB, avg 1.52 MB, max 1.52 MB, min 1.52 MB
- OpenTime: avg 42.458us, max 199.827us, min 28.44us
- ProjectionTime: avg 0ns, max 0ns, min 0ns
- RowsProduced: sum 192, avg 4, max 4, min 4
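In Velox, a streaming aggregation is defined as an aggregation with no aggregate functions, i.e. one that has only grouping keys; put simply, it is a deduplication (see [1]). TPC-H Q1 does not seem to fit that definition: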
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '120' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
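To make the distinction concrete, here is a minimal Python sketch (not Doris or Velox code) contrasting an aggregation with only grouping keys, which reduces to deduplication, with a Q1-style aggregation that also computes aggregate functions. The toy rows and the function names distinct_keys and grouped_sum_count are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows: (l_returnflag, l_linestatus, l_quantity)
ROWS = [
    ("A", "F", 10.0),
    ("N", "O", 5.0),
    ("A", "F", 2.0),
    ("R", "F", 7.0),
    ("N", "O", 1.0),
]


def distinct_keys(rows):
    """Grouping keys only, no aggregate functions: equivalent to SELECT DISTINCT.
    A key is yielded the first time it is seen, so output can be emitted as input streams in."""
    seen = set()
    for flag, status, _qty in rows:
        key = (flag, status)
        if key not in seen:
            seen.add(key)
            yield key


def grouped_sum_count(rows):
    """Grouping keys plus aggregate functions (like Q1's sum/count): a group's
    result is only final after every input row has been consumed."""
    acc = defaultdict(lambda: [0.0, 0])  # key -> [running sum, running count]
    for flag, status, qty in rows:
        state = acc[(flag, status)]
        state[0] += qty
        state[1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


if __name__ == "__main__":
    print(list(distinct_keys(ROWS)))
    print(grouped_sum_count(ROWS))
```

Running it prints the three distinct (l_returnflag, l_linestatus) pairs, then the per-group sum of l_quantity and row count for each pair.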
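However, I couldn't find any Doris documentation that explicitly describes Streaming Aggregation, so I'd like to ask what this operator means in Doris.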
[1]. https://facebookincubator.github.io/velox/develop/aggregations.html