Window function query fails with the error "tablet does not exist"

The SQL is as follows:

select row_number() over (partition by vin order by cloud_time desc) as rm, vin, odometer_master_value, cloud_time
from dcsp_prod.o_tcu_xev_data_monitor
where date_format(cloud_time, '%Y-%m-%d') >= '2024-01-17' limit 10;

The error log is as follows:

[HY000][1105] errCode = 2, detailMessage = (doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local)[CANCELLED][INTERNAL_ERROR]failed to get tablet: 1441865, reason: tablet does not exist. doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local

    0#  doris::TabletManager::get_tablet_and_status(long, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1147
    1#  doris::vectorized::NewOlapScanNode::_init_scanners(std::__cxx11::list<std::shared_ptr<doris::vectorized::VScanner>, std::allocator<std::shared_ptr<doris::vectorized::VScanner> > >*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
    2#  doris::vectorized::VScanNode::_prepare_scanners(int) at /root/src/doris-2.0/be/src/common/status.h: ...

2 Answers

First run show tablet 1441865 to check the tablet's status and see which backends own it, then confirm whether this node (doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local) actually holds the tablet.
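
A rough sketch of that check, using the tablet ID from the error message. The SHOW PROC path is not filled in here because its IDs are specific to your cluster; copy it from the DetailCmd column that SHOW TABLET returns.

```sql
-- Step 1: look up the tablet. The result shows the database, table and
-- partition it belongs to, plus a DetailCmd column.
SHOW TABLET 1441865;

-- Step 2: run the statement from the DetailCmd column of step 1. It is a
-- SHOW PROC '...' statement whose path contains cluster-specific IDs, so it
-- is not reproduced here. Its output lists every replica of the tablet with
-- the BackendId that holds it.

-- Step 3: map each BackendId to a host name and check whether
-- doris-be-cluster1-3 is among the backends holding a replica.
SHOW BACKENDS;
```

If doris-be-cluster1-3 is not in the replica list but queries are still routed to it, that points to the FE and BE disagreeing about where the tablet lives.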

My guess is that the tablet was being rebalanced while the query ran: the replica on this BE node (doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local) was moved to another node, which produced this error.
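
One way to test this guess, sketched under the assumption that your account has admin privileges, is to look at the tablet scheduler's recent activity and search the output for tablet 1441865:

```sql
-- Tablets the scheduler is currently repairing or balancing.
SHOW PROC '/cluster_balance/running_tablets';

-- Tablets the scheduler finished processing recently; a row for 1441865
-- with a balance-type task would support the rebalancing explanation.
SHOW PROC '/cluster_balance/history_tablets';
```

The history list only keeps a limited number of recent entries, so an empty result does not rule out an earlier migration.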

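If that turns out to be the cause, check whether high disk usage triggered the tablet migration, and whether too much garbage has built up in trash.

To inspect:
show trash;
To clean up:
admin clean trash;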
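
A slightly fuller version of the same check, as a sketch. The port 9050 below is only the default BE heartbeat port and is an assumption; use the HeartbeatPort value that SHOW BACKENDS reports for your cluster.

```sql
-- Disk usage per backend; compare the MaxDiskUsedPct column across nodes.
SHOW BACKENDS;

-- Trash size on all backends, then on the suspect backend only.
-- ASSUMPTION: 9050 is the heartbeat port of this BE.
SHOW TRASH;
SHOW TRASH ON "doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local:9050";

-- Clean trash on that backend only instead of the whole cluster.
ADMIN CLEAN TRASH ON ("doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local:9050");
```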

I ran show tablet 1441865 and found that this tablet is not on doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local. I also ran admin clean trash, but the query still fails when I run it again.
I ran the SQL again and found the following in the fe.INFO log on the doris-be-cluster1-3 node:

l, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d45f
W0719 02:24:58.222904 400 fragment_mgr.cpp:481] report error status: [CANCELLED] to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d45f
W0719 02:24:58.223091 390 fragment_mgr.cpp:481] report error status: [CANCELLED][INTERNAL_ERROR]failed to get tablet: 1441845, reason: tablet does not exist. doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local

    0#  doris::TabletManager::get_tablet_and_status(long, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1147
    1#  doris::vectorized::NewOlapScanNode::_init_scanners(std::__cxx11::list<std::shared_ptr<doris::vectorized::VScanner>, std::allocator<std::shared_ptr<doris::vectorized::VScanner> > >*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
    2#  doris::vectorized::VScanNode::_prepare_scanners(int) at /root/src/doris-2.0/be/src/common/status.h:435
    3#  doris::vectorized::VScanNode::alloc_resource(doris::RuntimeState*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295
    4#  doris::pipeline::StreamingOperator<doris::pipeline::ScanOperatorBuilder>::open(doris::RuntimeState*) at /root/src/doris-2.0/be/src/common/status.h:435
    5#  doris::pipeline::PipelineTask::_open() at /root/src/doris-2.0/be/src/common/status.h:435
    6#  doris::pipeline::PipelineTask::execute(bool*) at /root/src/doris-2.0/be/src/common/status.h:430
    7#  doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/src/doris-2.0/be/src/common/status.h:351
    8#  doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0
    9#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
    10# start_thread
    11# clone

to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d44c
W0719 02:24:58.223234 400 fragment_mgr.cpp:481] report error status: [CANCELLED] to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d45f
W0719 02:24:58.225312 387 fragment_mgr.cpp:481] report error status: [CANCELLED] to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d45f
W0719 02:24:58.225337 395 fragment_mgr.cpp:481] report error status: [CANCELLED][INTERNAL_ERROR]failed to get tablet: 24794562, reason: tablet does not exist. doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local

    0#  doris::TabletManager::get_tablet_and_status(long, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1147
    1#  doris::vectorized::NewOlapScanNode::_init_scanners(std::__cxx11::list<std::shared_ptr<doris::vectorized::VScanner>, std::allocator<std::shared_ptr<doris::vectorized::VScanner> > >*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
    2#  doris::vectorized::VScanNode::_prepare_scanners(int) at /root/src/doris-2.0/be/src/common/status.h:435
    3#  doris::vectorized::VScanNode::alloc_resource(doris::RuntimeState*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295
    4#  doris::pipeline::StreamingOperator<doris::pipeline::ScanOperatorBuilder>::open(doris::RuntimeState*) at /root/src/doris-2.0/be/src/common/status.h:435
    5#  doris::pipeline::PipelineTask::_open() at /root/src/doris-2.0/be/src/common/status.h:435
    6#  doris::pipeline::PipelineTask::execute(bool*) at /root/src/doris-2.0/be/src/common/status.h:430
    7#  doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/src/doris-2.0/be/src/common/status.h:351
    8#  doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0
    9#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
    10# start_thread
    11# clone

to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d44e
W0719 02:24:58.225345 399 fragment_mgr.cpp:481] report error status: [CANCELLED][INTERNAL_ERROR]failed to get tablet: 1441865, reason: tablet does not exist. doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local

    0#  doris::TabletManager::get_tablet_and_status(long, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1147
    1#  doris::vectorized::NewOlapScanNode::_init_scanners(std::__cxx11::list<std::shared_ptr<doris::vectorized::VScanner>, std::allocator<std::shared_ptr<doris::vectorized::VScanner> > >*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
    2#  doris::vectorized::VScanNode::_prepare_scanners(int) at /root/src/doris-2.0/be/src/common/status.h:435
    3#  doris::vectorized::VScanNode::alloc_resource(doris::RuntimeState*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295
    4#  doris::pipeline::StreamingOperator<doris::pipeline::ScanOperatorBuilder>::open(doris::RuntimeState*) at /root/src/doris-2.0/be/src/common/status.h:435
    5#  doris::pipeline::PipelineTask::_open() at /root/src/doris-2.0/be/src/common/status.h:435
    6#  doris::pipeline::PipelineTask::execute(bool*) at /root/src/doris-2.0/be/src/common/status.h:430
    7#  doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/src/doris-2.0/be/src/common/status.h:351
    8#  doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0
    9#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
    10# start_thread
    11# clone

to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d44d
W0719 02:24:58.225351 394 fragment_mgr.cpp:481] report error status: [CANCELLED][INTERNAL_ERROR]failed to get tablet: 24796549, reason: tablet does not exist. doris-be-cluster1-3.doris-be-cluster1.prod.svc.cluster.local

    0#  doris::TabletManager::get_tablet_and_status(long, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1147
    1#  doris::vectorized::NewOlapScanNode::_init_scanners(std::__cxx11::list<std::shared_ptr<doris::vectorized::VScanner>, std::allocator<std::shared_ptr<doris::vectorized::VScanner> > >*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
    2#  doris::vectorized::VScanNode::_prepare_scanners(int) at /root/src/doris-2.0/be/src/common/status.h:435
    3#  doris::vectorized::VScanNode::alloc_resource(doris::RuntimeState*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295
    4#  doris::pipeline::StreamingOperator<doris::pipeline::ScanOperatorBuilder>::open(doris::RuntimeState*) at /root/src/doris-2.0/be/src/common/status.h:435
    5#  doris::pipeline::PipelineTask::_open() at /root/src/doris-2.0/be/src/common/status.h:435
    6#  doris::pipeline::PipelineTask::execute(bool*) at /root/src/doris-2.0/be/src/common/status.h:430
    7#  doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/src/doris-2.0/be/src/common/status.h:351
    8#  doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0
    9#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
    10# start_thread
    11# clone

to coordinator: TNetworkAddress(hostname=doris-follower-cluster1-0.doris-follower-cluster1.prod.svc.cluster.local, port=9020), query id: 1b4aae430ae54e5b-9022ebc06936d442, instance id: 1b4aae430ae54e5b-9022ebc06936d44b
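
The log excerpt above reports four distinct tablet IDs as missing on doris-be-cluster1-3 (1441845, 1441865, 24794562 and 24796549), not just the one from the original error. A possible next step, sketched here as a suggestion and not a confirmed fix, is to look up each tablet and then check replica health for the whole table from the query:

```sql
-- Look up each tablet named in the BE log.
SHOW TABLET 1441845;
SHOW TABLET 1441865;
SHOW TABLET 24794562;
SHOW TABLET 24796549;

-- List replicas of the queried table whose status is not OK.
ADMIN SHOW REPLICA STATUS FROM dcsp_prod.o_tcu_xev_data_monitor
WHERE STATUS != "OK";

-- Show how the table's replicas are spread across backends.
ADMIN SHOW REPLICA DISTRIBUTION FROM dcsp_prod.o_tcu_xev_data_monitor;
```

The two larger tablet IDs may belong to a different partition or table than 1441865; the SHOW TABLET output will tell you which.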