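Doris is deployed on k8s in compute-storage decoupled (disaggregated) mode. After the cluster had been in use for a while, queries started failing with timeout errors. The pods 'test-disaggregated-cluster-ms-*' log the following errors: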
RuntimeLogger W20250115 02:17:44.783914 363 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=011076657273696f6e00011031383134383031373133000110706172746974696f6e000112000000000000271212000000000000355e120000000000003565
RuntimeLogger I20250115 02:17:44.796782 325 meta_service_helper.h:81] begin get_obj_store_info from 192.164.11.22:39088 request=cloud_unique_id: "1:1814801713:uXZDfzBa"
RuntimeLogger W20250115 02:17:46.505126 360 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=01106d657461000110313831343830313731330001107461626c65745f696e64657800011200000000000561aa
RuntimeLogger W20250115 02:17:46.752558 330 txn_kv.cpp:431] Operation aborted because the transaction timed out
RuntimeLogger W20250115 02:17:46.820328 363 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=0110696e7374616e6365000110313831343830313731330001
RuntimeLogger I20250115 02:17:46.820472 363 meta_service_resource.cpp:238] get instance_key=0110696e7374616e6365000110313831343830313731330001
RuntimeLogger I20250115 02:17:46.820581 363 meta_service_helper.h:147] finish get_obj_store_info from 192.164.248.254:49466 response=status {
code: KV_TXN_GET_ERR
msg: "failed to get instance, instance_id=1814801713 err=Timeout"
}
RuntimeLogger I20250115 02:17:47.380452 355 meta_service_helper.h:81] begin get_obj_store_info from 192.164.248.254:49466 request=cloud_unique_id: "1:1814801713:FxU0_gTN"
RuntimeLogger W20250115 02:17:47.413942 360 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=0110696e7374616e6365000110313831343830313731330001
RuntimeLogger I20250115 02:17:47.414059 360 meta_service_resource.cpp:238] get instance_key=0110696e7374616e6365000110313831343830313731330001
RuntimeLogger I20250115 02:17:47.414126 360 meta_service_helper.h:147] finish get_obj_store_info from 192.164.248.234:53870 response=status {
code: KV_TXN_GET_ERR
msg: "failed to get instance, instance_id=1814801713 err=Timeout"
}
RuntimeLogger W20250115 02:17:47.562582 156 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=021073797374656d0001106d6574612d7365727669636500011072656769737472790001
RuntimeLogger W20250115 02:17:47.972550 330 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=011076657273696f6e00011031383134383031373133000110706172746974696f6e000112000000000000271212000000000000355e120000000000003565
RuntimeLogger I20250115 02:17:48.119001 355 meta_service_helper.h:81] begin get_obj_store_info from 192.164.248.234:53870 request=cloud_unique_id: "1:1814801713:ZHVqa89N"
RuntimeLogger I20250115 02:17:48.208321 323 main.cpp:296] Periodically log for recycler
RuntimeLogger W20250115 02:17:48.482789 363 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=0110696e7374616e6365000110313831343830313731330001
RuntimeLogger I20250115 02:17:48.482911 363 meta_service_resource.cpp:238] get instance_key=0110696e7374616e6365000110313831343830313731330001
RuntimeLogger I20250115 02:17:48.482997 363 meta_service_helper.h:147] finish get_obj_store_info from 192.164.11.24:34020 response=status {
code: KV_TXN_GET_ERR
msg: "failed to get instance, instance_id=1814801713 err=Timeout"
}
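The `key=` values in these warnings are hex-encoded, which makes it hard to see which records are timing out. A minimal sketch for making them readable is below. It assumes only that the human-readable parts of a key are plain ASCII; it does not try to parse the actual key encoding used by Doris, and the helper name `printable_runs` is made up for illustration.

```python
def printable_runs(hex_key: str, min_len: int = 3) -> list[str]:
    """Return the runs of printable ASCII found in a hex-encoded key.

    Hypothetical helper: it only extracts readable text and ignores
    every other byte (separators, binary-encoded integers, and so on).
    """
    raw = bytes.fromhex(hex_key)
    runs, current = [], []
    for b in raw:
        if 0x20 <= b <= 0x7E:          # printable ASCII range
            current.append(chr(b))
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:        # flush a run that ends the key
        runs.append("".join(current))
    return runs


# Keys copied verbatim from the log lines above.
instance_key = "0110696e7374616e6365000110313831343830313731330001"
registry_key = ("021073797374656d0001106d6574612d73657276696365"
                "00011072656769737472790001")

print(printable_runs(instance_key))
print(printable_runs(registry_key))
```

For the instance key this yields `instance` and `1814801713`, which matches the `instance_id=1814801713` in the response message.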
Restarting the FoundationDB components at this point has no effect.
If I restart the pod 'test-disaggregated-cluster-ms-*', it fails to start and logs the following errors:
LIBHDFS3_CONF=
starts doris_cloud with args:
Wed Jan 15 02:17:12 UTC 2025
process working directory: "/opt/apache-doris/ms"
pid=149 written to file=./bin/doris_cloud.pid
RuntimeLogger I20250115 02:17:12.232849 149 main.cpp:214] try to start doris_cloud
RuntimeLogger I20250115 02:17:12.233088 149 main.cpp:215] version:{doris-3.0.3-rc04-release} code_version:{commit=62a58bff4c2f640f1afcba8c754058d5f77d420f time=2024-12-08 05:42:14 +0800} build_info:{initiator=root@vm-70 build_at=2024-12-08 05:42:14 +0800 build_on=PRETTY_NAME="Ubuntu 22.04.4 LTS" NAME="Ubuntu" }
version:{doris-3.0.3-rc04-release} code_version:{commit=62a58bff4c2f640f1afcba8c754058d5f77d420f time=2024-12-08 05:42:14 +0800} build_info:{initiator=root@vm-70 build_at=2024-12-08 05:42:14 +0800 build_on=PRETTY_NAME="Ubuntu 22.04.4 LTS" NAME="Ubuntu" }
RuntimeLogger I20250115 02:17:12.233106 149 main.cpp:221] meta_service and recycler are both not specified, run doris_cloud as meta_service and recycler by default
run doris_cloud as meta_service and recycler by default
RuntimeLogger I20250115 02:17:12.233132 149 main.cpp:243] begin to init txn kv
RuntimeLogger I20250115 02:17:12.235663 149 main.cpp:251] successfully init txn kv, elapsed milliseconds: 2
RuntimeLogger W20250115 02:17:22.239355 149 txn_kv.cpp:389] virtual TxnErrorCode doris::cloud::fdb::Transaction::get(std::string_view, std::string *, bool) failed to fdb_future_get_error err=Operation aborted because the transaction timed out key=021073797374656d0001106d6574612d73657276696365000110656e6372797074696f6e5f6b65795f696e666f0001
RuntimeLogger W20250115 02:17:22.239964 149 encryption_util.cpp:560] failed to get key of encryption_key_info err=Timeout
RuntimeLogger W20250115 02:17:22.240048 149 encryption_util.cpp:708] failed to generate random root key
RuntimeLogger W20250115 02:17:22.240077 149 main.cpp:255] failed to init global encryption key map
RuntimeLogger W20250115 02:17:22.240223 149 txn_kv.cpp:253] fdb_stop_network
RuntimeLogger W20250115 02:17:22.240303 153 txn_kv.cpp:248] exit fdb_run_network
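To see how long the first read waited before failing, the timestamps in the startup log can be subtracted. The sketch below assumes the `YYYYMMDD HH:MM:SS.ffffff` layout seen in the lines above; `parse_ts` is a hypothetical helper, not part of any Doris tooling.

```python
from datetime import datetime


def parse_ts(log_line: str) -> datetime:
    """Parse the timestamp of a log line such as
    'RuntimeLogger I20250115 02:17:12.235663 149 main.cpp:251] ...'.
    """
    fields = log_line.split()
    date_part = fields[1][1:]          # drop the severity letter (I/W/E)
    time_part = fields[2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y%m%d %H:%M:%S.%f")


# Prefixes of two lines copied from the startup log above.
init_done = "RuntimeLogger I20250115 02:17:12.235663 149 main.cpp:251] successfully init txn kv"
first_fail = "RuntimeLogger W20250115 02:17:22.239355 149 txn_kv.cpp:389] failed to fdb_future_get_error"

elapsed = (parse_ts(first_fail) - parse_ts(init_done)).total_seconds()
print(f"{elapsed:.3f} s between txn kv init and the first timed-out read")
```

Here the gap is about 10 seconds: the txn kv client initializes almost instantly, but the very first read against FoundationDB never completes.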
The Doris image version is 3.0.3 and the FoundationDB image version is 7.1.65.
The FE, BE, MS, and FDB configurations are essentially the defaults.
How can I resolve this problem?