Doris 2.1.7: push_type=DELETE|error=[E-235]failed to push data. version count: 2002, exceed limit: 2000, tablet: 45663756


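Doris version: 2.1.7
I write data to Doris by deleting first and then inserting. The DELETE statement hangs in the executing state, and the log shows these errors: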
W20250113 18:57:15.218466 68976 task_worker_pool.cpp:1487] failed to execute push task|signature=96487867|tablet_id=45663756|push_type=DELETE|error=[E-235]failed to push data. version count: 2002, exceed limit: 2000, tablet: 45663756. Please reduce the frequency of loading data or adjust the max_tablet_version_num in be.conf to a larger value.
W20250113 18:57:15.218521 68977 task_worker_pool.cpp:1487] failed to execute push task|signature=96470942|tablet_id=45663756|push_type=DELETE|error=[E-235]failed to push data. version count: 2002, exceed limit: 2000, tablet: 45663756. Please reduce the frequency of loading data or adjust the max_tablet_version_num in be.conf to a larger value.
W20250113 18:57:19.567021 69615 fragment_mgr.cpp:1252] Could not find the query id:c19967d422d240d4-817e0f86cc8a40d4 fragment id:0 to cancel
W20250113 18:57:19.567051 69629 fragment_mgr.cpp:1252] Could not find the query id:c19967d422d240d4-817e0f86cc8a40d4 fragment id:1 to cancel
W20250113 18:57:21.139274 68176 status.h:413] meet error status: [TIMEOUT]Query tiemout

    0#  doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
    1#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
    2#  start_thread
    3#  clone

W20250113 18:57:28.139823 68176 status.h:413] meet error status: [TIMEOUT]Query tiemout

    0#  doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
    1#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
    2#  start_thread
    3#  clone
1 Answer

This is error -235. It looks like versions have piled up on the tablet.

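In Doris, a delete is essentially also a kind of load: it writes a delete marker rather than removing data in place. That is why a DELETE can hit this limit too.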
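As a rough illustration, the key numbers can be pulled out of the warning line with a small script. This is only a sketch: the regular expression is based solely on the message text shown in the log excerpt in the question, and the names `E235_PATTERN` and `parse_e235` are made up for this example, not part of Doris.

```python
import re

# Hypothetical pattern, derived only from the E-235 message text in the question's log.
E235_PATTERN = re.compile(
    r"push_type=(?P<push_type>\w+)\|error=\[E-235\].*?"
    r"version count: (?P<count>\d+), exceed limit: (?P<limit>\d+), "
    r"tablet: (?P<tablet>\d+)"
)

def parse_e235(line):
    """Return the push type, version count, limit and tablet id, or None if no match."""
    m = E235_PATTERN.search(line)
    if m is None:
        return None
    return {
        "push_type": m.group("push_type"),
        "version_count": int(m.group("count")),
        "limit": int(m.group("limit")),
        "tablet_id": int(m.group("tablet")),
    }

# Sample taken from the log excerpt in the question (timestamp prefix omitted).
sample = (
    "failed to execute push task|signature=96487867|tablet_id=45663756|"
    "push_type=DELETE|error=[E-235]failed to push data. version count: 2002, "
    "exceed limit: 2000, tablet: 45663756."
)
print(parse_e235(sample))
```

Here the parsed `push_type` is DELETE, which matches the point above: the delete is being counted against the same version limit as a regular load.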

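You can troubleshoot this kind of problem as follows:

  1. Stop writes to this table, then run show tablet 45663756; and check whether Version goes down.
  2. If it goes down, compaction is too slow: rowsets are not being merged as fast as loads create them. See: FAQ
  3. If it does not go down, a transaction is probably stuck, or compaction is failing. You can message me privately via the WeChat on my profile page and send the logs for a look. A best-practices guide on this will be put together later.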
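The decision in steps 2 and 3 can be sketched as a small helper. This assumes you have paused writes and noted the version count by hand a few times, following step 1. The function name `classify_version_trend` and the returned labels are hypothetical, chosen only for this illustration.

```python
def classify_version_trend(samples):
    """Classify version counts sampled over time (oldest first) while writes are paused.

    Returns "compaction_lagging" if the count dropped (step 2),
    otherwise "stuck_or_failing" (step 3).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    first, last = samples[0], samples[-1]
    if last < first:
        # Versions are being merged once writes stop: compaction was just too slow.
        return "compaction_lagging"
    # No progress even without new writes: suspect a stuck transaction
    # or failing compaction, and look at the logs.
    return "stuck_or_failing"

# Made-up sample values for illustration only.
print(classify_version_trend([2002, 1800, 1500]))
print(classify_version_trend([2002, 2002, 2002]))
```

Comparing only the first and last sample keeps the check simple. Leave enough time between samples for compaction to have a chance to run.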