doris2.1.2升级到2.1.6后,通过catlog查询oracle变慢很多

Viewed 95

看下图,平时执行耗时6m左右的任务,26号耗时1h12m
![]
截图了任务的某个流程的sql,获取profile(profile见后面),这个sql耗时14m,平时应该也就几十秒
(https://ask.selectdb.com/uploads/post/5gBHt9Tt6mu.jpg)

以下是sql的profile,由于超长,分几段发:
Summary:
- Profile ID: ba6b36e6f8ec4712-86dddd01f3934aeb
- Task Type: QUERY
- Start Time: 2024-09-26 14:51:27
- End Time: 2024-09-26 15:06:25
- Total: 14m57s
- Task State: EOF
- User: root
- Default Catalog: internal
- Default Db: xh_spxl
- Sql Statement: /* ApplicationName=DBeaver for YashanDB 23.1.1 - SQLEditor <Script-6.sql> */ select b.starttime,a.id,b.groupid,b.endtime,a.groupid orgid,d.sname orgname,d.sfid orgfullid,b.userid,c.scode usercode,c.sname username,c.user_identity useridentity,
e.commoncust_type,b.camera_num,0 exempt_num,a.patrol_type,b.patrol_num,b.patrol_sec,1 user_num,a.rowcreatetime createtime,a.rowupdatetime updatetime,if(a.patrol_mode=2,2,1) patrol_mode
from jdbc_oracle_20.xh_base.xh_commoncust_patrol_log
inner join jdbc_oracle_20.xh_base.xh_commoncust_log b on a.id=b.patrol_id
inner join jdbc_oracle_20.xh_base.xh_sa_opperson c on b.userid=c.sid
inner join jdbc_oracle_20.xh_base.xh_sa_oporg d on a.groupid=d.sid
inner join jdbc_oracle_20.xh_base.xh_commoncust_group e on b.groupid=e.id
where a.patrol_type=2 and a.rowcreatetime>=('2024-09-25 00:00:00') and a.rowcreatetime<=('2024-09-25 23:59:59')

Execution Summary:
- Parse SQL Time: 3ms
- Nereids Analysis Time: 2s217ms
- Nereids Rewrite Time: 10ms
- Nereids Optimize Time: 8ms
- Nereids Translate Time: 3ms
- Workload Group: normal
- Analysis Time: 2s217ms
- Plan Time: 25ms
- JoinReorder Time: N/A
- CreateSingleNode Time: N/A
- QueryDistributed Time: N/A
- Init Scan Node Time: N/A
- Finalize Scan Node Time: N/A
- Get Splits Time: N/A
- Get Partitions Time: N/A
- Get Partition Files Time: N/A
- Create Scan Range Time: N/A
- Schedule Time: 37ms
- Fragment Assign Time: 3ms
- Fragment Serialize Time: 4ms
- Fragment RPC Phase1 Time: 25ms
- Fragment RPC Phase2 Time: 5ms
- Fragment Compressed Size: 49.24 KB
- Fragment RPC Count: 8
- Schedule Time Of BE: {"phase1":{"190.15.144.149: 8060":{"RPC Work Time":"14ms","RPC Latency From FE To BE":"10ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-3ms"},"190.15.145.35: 8060":{"RPC Work Time":"5ms","RPC Latency From FE To BE":"11ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-1ms"},"190.15.144.180: 8060":{"RPC Work Time":"7ms","RPC Latency From FE To BE":"15ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-2ms"},"190.15.144.147: 8060":{"RPC Work Time":"4ms","RPC Latency From FE To BE":"13ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-5ms"}},"phase2":{"190.15.144.149: 8060":{"RPC Work Time":"1ms","RPC Latency From FE To BE":"3ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"0ms"},"190.15.145.35: 8060":{"RPC Work Time":"0ms","RPC Latency From FE To BE":"3ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-1ms"},"190.15.144.180: 8060":{"RPC Work Time":"0ms","RPC Latency From FE To BE":"9ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-5ms"},"190.15.144.147: 8060":{"RPC Work Time":"1ms","RPC Latency From FE To BE":"6ms","RPC Work Queue Time":"0ms","RPC Latency From BE To FE":"-3ms"}}}
- Wait and Fetch Result Time: 14m55s
- Fetch Result Time: 14m55s
- Write Result Time: 0ms
- Doris Version: doris-2.1.6-rc04-653e315ba5
- Is Nereids: Yes
- Is Pipeline: Yes
- Is Cached: No
- Total Instances Num: 9
- Instances Num Per BE: 190.15.144.147:8060:1,190.15.144.149:8060:6,190.15.144.180:8060:1,190.15.145.35:8060:1
- Parallel Fragment Exec Instance Num: 16
- Trace ID:
- Transaction Commit Time: N/A
- Executed By Frontend: N/A

篇幅有限,截取一些关键的

          Fragment  4:
              Pipeline  :  0(instance_num=1):
                  DATA_STREAM_SINK_OPERATOR  (id=9,dst_id=9):
                        -  BlocksProduced:  sum  138,  avg  138,  max  138,  min  138
                        -  CloseTime:  avg  17.57us,  max  17.57us,  min  17.57us
                        -  ExecTime:  avg  73.102ms,  max  73.102ms,  min  73.102ms
                        -  InitTime:  avg  22.807us,  max  22.807us,  min  22.807us
                        -  InputRows:  sum  140.269K  (140269),  avg  140.269K  (140269),  max  140.269K  (140269),  min  140.269K  (140269)
                        -  MemoryUsage:  sum  ,  avg  ,  max  ,  min  
                            -  PeakMemoryUsage:  sum  0.00  ,  avg  0.00  ,  max  0.00  ,  min  0.00  
                        -  OpenTime:  avg  34.725us,  max  34.725us,  min  34.725us
                        -  RowsProduced:  sum  140.269K  (140269),  avg  140.269K  (140269),  max  140.269K  (140269),  min  140.269K  (140269)
                        -  WaitForDependencyTime:  avg  0ns,  max  0ns,  min  0ns
                            -  WaitForRpcBufferQueue:  avg  0ns,  max  0ns,  min  0ns
                      JDBC_SCAN_OPERATOR  (id=8.  table  name  =  "XH_BASE"."XH_COMMONCUST_LOG"):
                            -  PlanInfo
                                  -  TABLE:  "XH_BASE"."XH_COMMONCUST_LOG"
                                  -  QUERY:  SELECT  "PATROL_ID",  "STARTTIME",  "ENDTIME",  "USERID",  "GROUPID",  "CAMERA_NUM",  "PATROL_NUM",  "PATROL_SEC"  FROM  "XH_BASE"."XH_COMMONCUST_LOG"
                            -  BlocksProduced:  sum  138,  avg  138,  max  138,  min  138
                            -  CloseTime:  avg  47.71us,  max  47.71us,  min  47.71us
                            -  ExecTime:  avg  15m,  max  15m,  min  15m
                            -  InitTime:  avg  295.948us,  max  295.948us,  min  295.948us
                            -  MemoryUsage:  sum  ,  avg  ,  max  ,  min  
                                -  PeakMemoryUsage:  sum  0.00  ,  avg  0.00  ,  max  0.00  ,  min  0.00  
                            -  OpenTime:  avg  117.720us,  max  117.720us,  min  117.720us
                            -  ProjectionTime:  avg  0ns,  max  0ns,  min  0ns
                            -  RowsProduced:  sum  140.269K  (140269),  avg  140.269K  (140269),  max  140.269K  (140269),  min  140.269K  (140269)
                            -  WaitForDependency[JDBC_SCAN_OPERATOR_DEPENDENCY]Time:  avg  14m59s,  max  14m59s,  min  14m59s
                          VScanner:
                                -  MemoryUsage:  sum  ,  avg  ,  max  ,  min  
                                    -  FreeBlocks:  sum  10.41  MB,  avg  10.41  MB,  max  10.41  MB,  min  10.41  MB
1 Answers

是相同的数据量的情况下出现的吗?

这个可能得取个 explain 和 jstack 了。执行的时候监控有异常不,看看cpu和内存这些指标。

  1. explain $SQL
  2. jstack fe_pid > fe_js.txt