Stream Load never finishes and stays in Publish Timeout state


curl --location-trusted -u tuser:tuser -H "Expect:100-continue" -H "column_separator:," -T aa.csv -XPUT http://192.168.36.50:8030/api/test/report_t1/_stream_load

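Doris version: doris-1.2.3-rc02. I ran a Stream Load with the command above; aa.csv contains a single record. The command returns, but the data cannot be queried and the load stays in Publish Timeout state. The full output of the command: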
{
"TxnId": 57501102,
"Label": "56779530-ccff-4fb1-9cc7-6bfd907e8c33",
"TwoPhaseCommit": "false",
"Status": "Publish Timeout",
"Message": "[PUBLISH_TIMEOUT]transaction commit successfully, BUT data will be visible later",
"NumberTotalRows": 1,
"NumberLoadedRows": 1,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 92,
"LoadTimeMs": 7013,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 2008,
"CommitAndPublishTimeMs": 0
}
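The Message field says the transaction committed successfully but the data will become visible later, so "Publish Timeout" is not the same as a failed load. A minimal shell sketch for classifying a Stream Load response by its Status field (no jq needed) might look like this. The function names are hypothetical, and it only assumes Status values like "Success" and "Publish Timeout" as they appear in responses such as the one above.

```shell
# Sketch: classify a Stream Load JSON response (read on stdin) by "Status".
# Function names are hypothetical helpers, not part of Doris.

stream_load_status() {
  # Print the value of the first "Status" key; works for one-line or pretty JSON.
  sed -n 's/.*"Status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

classify_stream_load() {
  local status
  status=$(stream_load_status)
  case "$status" in
    "Success")         echo "visible" ;;                # committed and published
    "Publish Timeout") echo "committed-not-visible" ;;  # committed, publish still pending
    "")                echo "unknown" ;;                # no Status field found
    *)                 echo "failed: $status" ;;        # any other status
  esac
}
```

Usage: pipe the curl command into it, e.g. curl ... /_stream_load | classify_stream_load. A result of committed-not-visible that never changes points at the publish step on the BEs rather than at the data itself.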
Contents of fe.warn.log on FE 192.168.36.50:
2024-12-18 09:25:53,822 WARN (thrift-server-pool-95|575) [MasterImpl.finishTask():93] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:192.168.36.8, be_port:9060, http_port:8040), task_type:PUBLISH_VERSION, signature:57389683, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(192.168.36.8)[E-914]]), report_version:17095600157901, error_tablet_ids:[51131254])
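The FE warning names the BE that reported the failed PUBLISH_VERSION task (host:192.168.36.8). To see which BEs report such failures most often, a minimal shell sketch could count them per host. It assumes every relevant line follows the "finish task reports bad ... TBackend(host:..." layout shown above, and the function name failing_publish_backends is hypothetical.

```shell
# Sketch: count failed PUBLISH_VERSION reports per BE host in fe.warn.log.
# Assumes the TFinishTaskRequest log layout shown in the warning above.
failing_publish_backends() {
  local log_file="$1"
  grep "task_type:PUBLISH_VERSION" "$log_file" \
    | grep "finish task reports bad" \
    | grep -oE 'TBackend\(host:[^,]+' \
    | sed 's/TBackend(host://' \
    | sort | uniq -c | sort -rn \
    | awk '{print $2, $1}'   # print "host count", most frequent first
}
```

Usage: failing_publish_backends fe/log/fe.warn.log. The hosts at the top of the list are the first candidates to inspect on the BE side.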

-- Could anyone please advise how to resolve this? Thanks.

Also, I noticed that the FE hostnames in the cluster are not written consistently. Could this have any effect?

1 Answer

On version 1.2, this is probably the issue where the publish thread gets stuck. We recommend upgrading to 2.1 as soon as possible.

For now, the only way to resolve it is to find the BE node where publish tasks are piling up and restart that BE.

Run: grep "PUBLISH_VERSION" be/log/be.INFO | grep queue | tail

Check the queue size. If it is greater than 50, tasks are backing up, and you need to restart the corresponding BE node.
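The check above can be scripted per BE. This is only a sketch: the wording of the queue-size field in be.INFO differs between Doris versions, so the regex assumes something like "queue_size=123" or "queue size: 123" on the PUBLISH_VERSION lines and may need adjusting. The function names are hypothetical; the threshold of 50 comes from the answer above.

```shell
# Sketch: read the latest PUBLISH_VERSION queue size from a BE log and
# compare it with a threshold. ASSUMPTION: the log line contains
# "PUBLISH_VERSION" and a field like "queue_size=N" or "queue size: N".

publish_queue_size() {
  local log_file="$1"
  grep "PUBLISH_VERSION" "$log_file" \
    | grep -oE 'queue[_ ]?size[^0-9]*[0-9]+' \
    | tail -n 1 \
    | grep -oE '[0-9]+$'
}

check_publish_backlog() {
  local log_file="$1" threshold="${2:-50}" size
  size=$(publish_queue_size "$log_file") || true
  if [ -z "$size" ]; then
    echo "no PUBLISH_VERSION queue size found in $log_file"
    return 2
  fi
  if [ "$size" -gt "$threshold" ]; then
    echo "backlog: queue size $size > $threshold, consider restarting this BE"
    return 1
  fi
  echo "ok: queue size $size"
  return 0
}
```

Usage: check_publish_backlog be/log/be.INFO 50 on each BE. Return code 1 means the latest reported queue size is above the threshold; 2 means the log format did not match the assumed pattern.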