The filter condition is event_day = '20240618' and event_min <= 1115.
At first I used this pom dependency:
<dependency>
<groupId>org.apache.doris</groupId>
<artifactId>spark-doris-connector-2.3_2.11</artifactId>
<version>1.2.0</version>
</dependency>
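It returned this error: spark type is DATEV2 but array type is dateday
If I remove the event_day and event_min conditions, the data comes back.
Other people on the forum said a newer jar is needed, so I changed the dependency to: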
<dependency>
<groupId>org.apache.doris</groupId>
<artifactId>spark-doris-connector-2.3_2.11</artifactId>
<version>1.3.2</version>
</dependency>
Now there is no error, but no data is returned.
Table creation SQL:
CREATE TABLE `xxxxx` (
`event_day` DATE NOT NULL,
`event_min` SMALLINT NULL,
`os` VARCHAR(30) NOT NULL,
`version` VARCHAR(200) NOT NULL,
`udid` VARCHAR(200) NOT NULL,
`phone_mode` VARCHAR(200) NOT NULL,
`phone_band` VARCHAR(200) NOT NULL,
`client_ip` VARCHAR(200) NOT NULL,
`chn_id` VARCHAR(200) NOT NULL,
`chn_name` VARCHAR(200) NOT NULL,
`client_id` VARCHAR(200) NOT NULL,
`province` VARCHAR(200) NOT NULL,
`page_id` VARCHAR(200) NOT NULL,
`group_id` VARCHAR(200) NOT NULL,
`comp_id` VARCHAR(200) NOT NULL,
`click_index` VARCHAR(200) NOT NULL,
`mgbd_id` VARCHAR(200) NOT NULL,
`program_id` VARCHAR(200) NOT NULL,
`page_session_id` VARCHAR(200) NOT NULL,
`sp_name` VARCHAR(200) NOT NULL,
`uuid` VARCHAR(200) NOT NULL
) ENGINE=OLAP
AGGREGATE KEY(`event_day`, `event_min`, `os`, `version`, `udid`, `phone_mode`, `phone_band`, `client_ip`, `chn_id`, `chn_name`, `client_id`, `province`, `page_id`, `group_id`, `comp_id`, `click_index`, `mgbd_id`, `program_id`, `page_session_id`, `sp_name`, `uuid`)
COMMENT 'OLAP'
PARTITION BY RANGE(`event_day`)()
DISTRIBUTED BY HASH(`page_session_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-7",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p"
);
Spark code that reads from Doris:
val dfPositionClick: DataFrame = readTable(nacosUtil.dorisConfig.value, tablePositionClick)
.where(s"event_day = '20240619' and event_min = 1000")
I can now be sure that the event_day = '20240619' filter is what goes wrong: if I drop the event_day condition and keep only event_min = 1000, data is returned.
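One thing that may be worth ruling out (this is only a guess, not a confirmed cause) is the compact date literal: event_day is a DATE column, and '20240619' may not be compared the way I expect once the filter is pushed down to Doris. Below is a minimal Scala sketch that rewrites the literal as '2024-06-19' before building the filter string. The names DateFilterSketch, toIsoDate and buildFilter are hypothetical helpers made up for illustration and are not part of the connector.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of spark-doris-connector.
object DateFilterSketch {
  // Parse a compact yyyyMMdd string and render it as ISO yyyy-MM-dd.
  // Throws DateTimeParseException if the input is not a valid date.
  def toIsoDate(compact: String): String =
    LocalDate.parse(compact, DateTimeFormatter.BASIC_ISO_DATE).toString

  // Build the same filter as in the question, but with an ISO date literal.
  def buildFilter(eventDay: String, eventMin: Int): String =
    s"event_day = '${toIsoDate(eventDay)}' and event_min = $eventMin"
}
```

With this, the read would become .where(DateFilterSketch.buildFilter("20240619", 1000)), which passes event_day = '2024-06-19' and event_min = 1000. If that still returns nothing, the date format is probably not the cause.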