Writing to a Doris table with a BITMAP column via a UDAF in spark-doris-connector fails


I use the ToBitmapUDAF function and write to a Doris table that has a bitmap column through the spark-connector. The load fails with the following error:
load.DorisStreamLoad: Streamload Response RES STATUS Error:status: 200, resp msg: OK, resp content: {
"TxnId": 351544,
"Label": "spark_streamload_20240311_160031_c06e3bc2614f477f93d6334d8c83b244",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = bitmap column user_id require the function return type is BITMAP\n\n\t0# doris::Status doris::Status::create(doris::TStatus const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187\n\t1# doris::StreamLoadAction::_process_put(doris::HttpRequest*, std::shared_ptr) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:445\n\t2# doris::StreamLoadAction::on_header(doris::HttpRequest*, std::shared_ptr) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701\n\t3# doris::StreamLoadAction::on_header(doris::HttpRequest*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701\n\t4# doris::EvHttpServer::on_header(evhttp_request*) at /home/zcp/repo_center/doris_release/doris/be/src/http/ev_http_server.cpp:255\n\t5# ?\n\t6# bufferevent_run_readcb\n\t7# ?\n\t8# ?\n\t9# ?\n\t10# ?\n\t11# std::_Function_handler<void (), doris::EvHttpServer::start()::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/atomicity.h:98\n\t12# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/util/threadpool.cpp:0\n\t13# doris::Thread::supervise_thread(void*) at /var/local/ldb_toolchain/bin/../usr/include/pthread.h:562\n\t14# start_thread\n\t15# __clone\n",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 0,
"LoadTimeMs": 0,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 1,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 0,
"CommitAndPublishTimeMs": 0
}

The SQL used for the write looks roughly like this:
create temporary function to_bitmap as 'xxx.ToBitmapUDAF';
insert into test.test_tbl
select
col1,
col2,
to_bitmap(user_id) as user_id
from tbl
group by col1, col2;

The table is defined as follows:
create table test.test_tbl(
col1 varchar(512) NULL,
col2 varchar(512) NULL,
user_id bitmap BITMAP_UNION NOT NULL
) ENGINE=OLAP
AGGREGATE KEY(col1, col2)
DISTRIBUTED BY RANDOM BUCKETS 6;

1 Answer

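Looking at the UDAF source, it returns a binary[] (byte array) type. Does the byte array need to be converted to a bitmap? If so, which function does that? I couldn't find a suitable one in the documentation.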

Comment: Found it. Use Spark SQL's base64() function to convert the byte[] into a Base64 string, then use Doris's bitmap_from_base64() function to convert that string into a bitmap.
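
The fix described in the comment might look roughly like the sketch below. It is not taken from the thread and rests on several assumptions: the sink is registered as a temporary view (the name doris_sink is hypothetical), the connection values are placeholders, and the connector's doris.write.fields option is assumed to be where the load column mapping goes. Option names and behavior can differ between connector versions, so check the documentation for the version in use.

```sql
-- Hypothetical sink view: the view name and all connection values are placeholders.
CREATE TEMPORARY VIEW doris_sink
USING doris
OPTIONS (
  "table.identifier" = "test.test_tbl",
  "fenodes" = "<fe_host>:<http_port>",
  "user" = "<user>",
  "password" = "<password>",
  -- Assumption: doris.write.fields carries the column mapping for the load.
  -- user_id_b64 is a temporary name for the Base64 string sent by Spark;
  -- Doris converts it to BITMAP with bitmap_from_base64().
  "doris.write.fields" = "col1,col2,user_id_b64,user_id=bitmap_from_base64(user_id_b64)"
);

CREATE TEMPORARY FUNCTION to_bitmap AS 'xxx.ToBitmapUDAF';

-- Spark side: the UDAF returns a byte array, so encode it as a Base64 string
-- with Spark SQL's built-in base64() before it is sent to Doris.
INSERT INTO doris_sink
SELECT
  col1,
  col2,
  base64(to_bitmap(user_id)) AS user_id_b64
FROM tbl
GROUP BY col1, col2;
```

The idea is that Spark sends a plain string column, and the conversion to BITMAP happens on the Doris side, which gives the bitmap column the BITMAP-typed function result that the error message asks for.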