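45 machines (128 cores, 750G memory each), with a 30T Hive table (accessed through hive_catalog). Running SQL occasionally fails with the error below (single-table query, no join):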
query_timeout = 60000;
parallel_pipeline_task_num = 150;
exec_mem_limit = 250G;
ERROR 1105 (HY000) at line 5: errCode = 2, detailMessage = (10.xxxx)[CANCELLED]failed to send brpc when exchange, error=Host is down, error_text=[E110]Fail to connect Socket{id=688 addr=10.xxxxx:8060} (0x0x7fafa37df8c0): Connection timed out [R1][E112]Not connected to 10.xxxxxx:8060 yet, server_id=688 [R2][E112]Not connected to 10.xxxxxx:8060 yet, server_id=688 [R3][E112]Not connected to 10xxxxxx:8060 yet, server_id=688 [R4][E112]Not connected to 10xxxxxx:8060 yet, server_id=688 [R5][E112]Not connected to 10
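The error seems to point to a connection problem between BE nodes, but when I checked, all services were still alive and the ports were reachable.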
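Because the failure is intermittent (`Connection timed out` / `Not connected ... yet`), a one-off port check can pass even if connections to 8060 fail during short windows. The following is a minimal sketch of a repeated TCP-connect probe, assuming it runs on one BE host and that you supply the other BEs' addresses yourself (the function names and parameters are hypothetical, not part of Doris). It only tests whether a plain TCP connection to the brpc port succeeds. It does not exercise brpc itself.

```python
import socket
import time
from collections import Counter


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False


def probe_repeatedly(hosts, port=8060, rounds=10, interval=1.0, timeout=3.0):
    """Probe every host `rounds` times and return a Counter of failed attempts per host.

    Hosts that never fail do not appear in the result.
    """
    failures = Counter()
    for i in range(rounds):
        for host in hosts:
            if not can_connect(host, port, timeout):
                failures[host] += 1
        # Sleep between rounds, but not after the last one.
        if i < rounds - 1:
            time.sleep(interval)
    return failures
```

For example, `probe_repeatedly(be_hosts, port=8060, rounds=600, interval=1.0)` run during the window when the query fails would show whether any BE intermittently stops accepting connections on 8060. Here `be_hosts` is a hypothetical list of the BE IPs.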