我在其他帖子已经提了这个问题了,但是因为追问没人回答,所以再开一个帖子询问。
分区是取决于物化视图的分区,还是也取决于原表的分区呢。如果是多表异步物化视图,多表又是取决哪张表的分区呢?
我在其他帖子已经提了这个问题了,但是因为追问没人回答,所以再开一个帖子询问。
分区是取决于物化视图的分区,还是也取决于原表的分区呢。如果是多表异步物化视图,多表又是取决哪张表的分区呢?
物化视图增量分区更新的核心原理是:识别原表变化数据所在的分区,识别出物化视图对应变化的分区,
针对物化视图对应的分区进行刷新,从而达到增量分区刷新的目的。
所以基于以上原理,分区字段在物化视图中定义,需要刷新的物化视图分区取决于物化视图使用的原表的分区。
如果物化视图想要分区增量更新,物化视图的定义SQL,需要满足如下的条件。
举例如下
表定义:
CREATE TABLE IF NOT EXISTS lineitem (
L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMALV3(15, 2) NOT NULL,
L_EXTENDEDPRICE DECIMALV3(15, 2) NOT NULL,
L_DISCOUNT DECIMALV3(15, 2) NOT NULL,
L_TAX DECIMALV3(15, 2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL
) DUPLICATE KEY(
L_ORDERKEY, L_PARTKEY, L_SUPPKEY,
L_LINENUMBER
) PARTITION BY RANGE(L_SHIPDATE) (
PARTITION `day_1`
VALUES
LESS THAN ('2017-02-01')
) DISTRIBUTED BY HASH(L_ORDERKEY) BUCKETS 3 PROPERTIES ("replication_num" = "1") "
CREATE TABLE IF NOT EXISTS orders (
O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE DECIMALV3(15, 2) NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL
) DUPLICATE KEY(O_ORDERKEY, O_CUSTKEY) PARTITION BY RANGE(O_ORDERDATE) (
PARTITION `day_2`
VALUES
LESS THAN ('2017-03-01')
) DISTRIBUTED BY HASH(O_ORDERKEY) BUCKETS 3 PROPERTIES ("replication_num" = "1")
如下定义的物化视图可以分区增量更新,分区字段是 L_SHIPDATE
,使用了非 null 产生端的字段。
CREATE MATERIALIZED VIEW mv_10
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by(L_SHIPDATE)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT
t1.L_SHIPDATE,
t2.O_ORDERDATE,
t1.L_QUANTITY,
t2.O_ORDERSTATUS,
count(
distinct case when t1.L_SUPPKEY > 0 then t2.O_ORDERSTATUS else null end
) as cnt_1
from
(
select
*
from
lineitem
) t1
left join (
select
*
from
orders
) t2 on t1.L_ORDERKEY = t2.O_ORDERKEY
group by
t1.L_SHIPDATE,
t2.O_ORDERDATE,
t1.L_QUANTITY,
t2.O_ORDERSTATUS;
如下定义的物化视图则不可以分区增量更新,使用的分区字段在 join 的 null 产生端
CREATE MATERIALIZED VIEW mv_11
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by(O_ORDERDATE)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT
t1.L_SHIPDATE,
t2.O_ORDERDATE,
t1.L_QUANTITY,
t2.O_ORDERSTATUS,
count(
distinct case when t1.L_SUPPKEY > 0 then t2.O_ORDERSTATUS else null end
) as cnt_1
from
(
select
*
from
lineitem
) t1
left join (
select
*
from
orders
) t2 on t1.L_ORDERKEY = t2.O_ORDERKEY
group by
t1.L_SHIPDATE,
t2.O_ORDERDATE,
t1.L_QUANTITY,
t2.O_ORDERSTATUS;