Routine load fails

After a routine load job has been running for a while (roughly 3 days), it reports the following error. What is the cause?
ReasonOfStateChanged: ErrorReason{code=errCode = 4, msg='failed to get latest partition offset. {}errCode = 2, detailMessage = Failed to get latest offsets of kafka topic: ltr_param_infer. error: Waited 5 seconds (plus 506632 nanoseconds delay) for io.grpc.stub.ClientCalls$GrpcFuture@64e75698[status=PENDING, info=[GrpcFuture{clientCall=ClientCallImpl{method=MethodDescriptor{fullMethodName=doris.PBackendService/get_info, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@7f20a62f, responseMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@2ff9a260, schemaDescriptor=org.apache.doris.proto.PBackendServiceGrpc$PBackendServiceMethodDescriptorSupplier@131ff35a}}}]]'}
ErrorLogUrls:
OtherMsg: 2024-06-20 16:23:56:errCode = 2, detailMessage = failed to send task: java.net.SocketException: Broken pipe (Write failed)

1 Answer

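This may be a bug in a third-party library. To check whether you are hitting it:
1. The BE log contains no metadata-fetch errors, i.e. searching for "failed to get partition meta:" or "failed to get latest offset for partition:" finds nothing.
2. Take a pstack dump, search for the routine_load/clean_idle_consumer threads, and see whether they are stuck in a stack inside the consumer destructor.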
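
The two checks above can be roughly automated. The following is a minimal sketch, not an official Doris tool: it assumes you have the BE log lines and the pstack output available as text, that pstack blocks start with "Thread " (the usual gdb/pstack layout), and that a destructor frame looks like "~...Consumer". That last pattern and all function names here are assumptions for illustration; adjust them to what your dump actually shows.

```python
import re

# Substrings quoted in the answer; check 1 passes when neither appears in the BE log.
META_ERROR_MARKERS = (
    "failed to get partition meta:",
    "failed to get latest offset for partition:",
)


def has_metadata_errors(log_lines):
    """Return True if any BE log line contains one of the metadata error markers."""
    return any(marker in line for line in log_lines for marker in META_ERROR_MARKERS)


def split_threads(pstack_text):
    """Split pstack output into per-thread blocks (assumes each starts with 'Thread ')."""
    blocks, current = [], None
    for line in pstack_text.splitlines():
        if line.startswith("Thread "):
            current = [line]
            blocks.append(current)
        elif current is not None:
            current.append(line)
    return ["\n".join(block) for block in blocks]


def suspicious_threads(pstack_text, keywords=("routine_load", "clean_idle_consumer")):
    """Return thread blocks that mention a keyword and a destructor-like consumer frame.

    The '~...Consumer' pattern is a heuristic assumption, not taken from Doris source.
    """
    dtor = re.compile(r"~\w*Consumer", re.IGNORECASE)
    return [
        block
        for block in split_threads(pstack_text)
        if any(k.lower() in block.lower() for k in keywords) and dtor.search(block)
    ]
```

If has_metadata_errors returns False for the BE log and suspicious_threads returns at least one block, the symptoms match the description above; inspect the returned stacks by hand before concluding anything.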
Ways that may avoid the bug:
1. Pause or stop the routine load before deleting the topic.
2. Set the BE parameter routine_load_consumer_pool_size = 0.
How to recover once it has happened:
Restart the BE.