[Solved] Version 2.1.3: reading Doris from Spark returns no data


The filter condition is event_day = '20240618' and event_min <= 1115.
The POM dependency I started with was:

<dependency>
    <groupId>org.apache.doris</groupId>
    <artifactId>spark-doris-connector-2.3_2.11</artifactId>
    <version>1.2.0</version>
</dependency>

It returned this error: spark type is DATEV2 but array type is dateday

If I remove the event_day and event_min conditions, I can see the data.
Other people on the forum said the latest jar is needed, so I changed it to:

<dependency>
    <groupId>org.apache.doris</groupId>
    <artifactId>spark-doris-connector-2.3_2.11</artifactId>
    <version>1.3.2</version>
</dependency>

Now there is no error, but no data comes back.

Table DDL:

CREATE TABLE `xxxxx` (
 `event_day` DATE NOT NULL,
 `event_min` SMALLINT NULL,
 `os` VARCHAR(30) NOT NULL,
 `version` VARCHAR(200) NOT NULL,
 `udid` VARCHAR(200) NOT NULL,
 `phone_mode` VARCHAR(200) NOT NULL,
 `phone_band` VARCHAR(200) NOT NULL,
 `client_ip` VARCHAR(200) NOT NULL,
 `chn_id` VARCHAR(200) NOT NULL,
 `chn_name` VARCHAR(200) NOT NULL,
 `client_id` VARCHAR(200) NOT NULL,
 `province` VARCHAR(200) NOT NULL,
 `page_id` VARCHAR(200) NOT NULL,
 `group_id` VARCHAR(200) NOT NULL,
 `comp_id` VARCHAR(200) NOT NULL,
 `click_index` VARCHAR(200) NOT NULL,
 `mgbd_id` VARCHAR(200) NOT NULL,
 `program_id` VARCHAR(200) NOT NULL,
 `page_session_id` VARCHAR(200) NOT NULL,
 `sp_name` VARCHAR(200) NOT NULL,
 `uuid` VARCHAR(200) NOT NULL
) ENGINE=OLAP
AGGREGATE KEY(`event_day`, `event_min`, `os`, `version`, `udid`, `phone_mode`, `phone_band`, `client_ip`, `chn_id`, `chn_name`, `client_id`, `province`, `page_id`, `group_id`, `comp_id`, `click_index`, `mgbd_id`, `program_id`, `page_session_id`, `sp_name`, `uuid`)
COMMENT 'OLAP'
PARTITION BY RANGE(`event_day`)()
DISTRIBUTED BY HASH(`page_session_id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-7",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p"
);

Spark code that reads from Doris:

val dfPositionClick: DataFrame = readTable(nacosUtil.dorisConfig.value, tablePositionClick)
  .where(s"event_day = '20240619' and event_min = 1000")

I'm now certain the problem is the event_day = '20240619' filter: if I drop the event_day condition and keep only event_min = 1000, I do get data.

1 Answer

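There is a problem with your date format: event_day is a DATE column, so instead of 20240618 you need to write the value in date form, 2024-06-18 (for the code above, event_day = '2024-06-19').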
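If the day value arrives as a compact yyyyMMdd string (as in the question), one way to apply the fix above is to normalize it before building the where clause. The following is a minimal sketch, assuming the day is always supplied as yyyyMMdd; the object `DorisDateFilter` and its methods are hypothetical helpers, not part of spark-doris-connector. It uses only `java.time`, so it should also work on the Scala 2.11 / Java 8 setup implied by the `_2.11` artifact.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper (NOT part of spark-doris-connector): turns a compact
// yyyyMMdd day string into the yyyy-MM-dd form expected for a DATE column.
object DorisDateFilter {

  // Parses e.g. "20240619" and renders it as "2024-06-19".
  // Throws java.time.format.DateTimeParseException on malformed input,
  // so a bad day string fails fast in the driver.
  def toIsoDate(compactDay: String): String =
    LocalDate.parse(compactDay, DateTimeFormatter.BASIC_ISO_DATE).toString

  // Builds the same where-clause string as in the question,
  // but with the event_day literal in yyyy-MM-dd form.
  def buildFilter(compactDay: String, eventMin: Int): String =
    s"event_day = '${toIsoDate(compactDay)}' and event_min = $eventMin"
}
```

With the question's own (hypothetical-to-us) `readTable` helper, usage would look like `readTable(nacosUtil.dorisConfig.value, tablePositionClick).where(DorisDateFilter.buildFilter("20240619", 1000))`, which produces the filter `event_day = '2024-06-19' and event_min = 1000`.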