Checkpoint failures when syncing to a Doris database via CDC


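Several tables in Doris database A are synced via CDC. To merge the small files that CDC produces, I pause the CDC job and take a savepoint, create a new table table_tmp, write the data from table into table_tmp, drop table, rename table_tmp to table, and finally restart the CDC job from the savepoint.
After doing this a few times, the checkpoints of every CDC job syncing into database A get stuck in IN_PROGRESS and then fail.
Doris version 2.0.11, connector version 1.17-1.6.1
After changing the sync mode to --mysql-conf scan.startup.mode=latest-offset, checkpoints complete normally.
The source table has 40 million rows.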
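
For readers trying to reproduce this, the copy-and-swap step described above can be sketched as a short list of SQL statements. This is only an illustration: the question does not show the exact statements used, so the `CREATE TABLE ... LIKE` and `ALTER TABLE ... RENAME` forms, the helper name `build_swap_statements`, and the `_tmp` suffix parameter are assumptions. Stopping the job with a savepoint and restarting from it are not covered here.

```python
from typing import List


def build_swap_statements(table: str, tmp_suffix: str = "_tmp") -> List[str]:
    """Return SQL for the copy-and-swap in the order the question describes.

    Hypothetical helper: the statement forms are assumed, not taken from the post.
    """
    tmp = f"{table}{tmp_suffix}"
    return [
        # 1. Create an empty table with the same schema (assumed statement form).
        f"CREATE TABLE `{tmp}` LIKE `{table}`",
        # 2. Rewrite all rows in one load so they land in fewer, larger files.
        f"INSERT INTO `{tmp}` SELECT * FROM `{table}`",
        # 3. Drop the original table.
        f"DROP TABLE `{table}`",
        # 4. Give the new table the original name.
        f"ALTER TABLE `{tmp}` RENAME `{table}`",
    ]


if __name__ == "__main__":
    for stmt in build_swap_statements("table"):
        print(stmt + ";")
```

Note that after step 4 the table the sink writes to is a newly created one that only shares its name with the table that existed when the savepoint was taken.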

1 Answer

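Dropping the original database with DROP DATABASE FORCE and then creating a new one with the same name does not help either.
With CDC sync, the first two checkpoints succeed, then the third gets stuck here and eventually fails with an error.
Meanwhile, the job info shows busy max(100%).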

2024-07-10 12:53:41,653 WARN  org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to trigger or complete checkpoint 2 for job 59fa6119ea8f55a9c8066210386fb825. (0 consecutive failed attempts so far)
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint expired before completing.
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:2216) [flink-dist-1.17.2.jar:1.17.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_351]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_351]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_351]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_351]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_351]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_351]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_351]
2024-07-10 12:53:41,672 INFO  org.apache.flink.runtime.checkpoint.CheckpointRequestDecider [] - checkpoint request time in queue: 590028
2024-07-10 12:53:41,680 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Trying to recover from a global failure.
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold. The latest checkpoint failed due to Checkpoint expired before completing., view the Checkpoint History tab or the Job Manager log to find out why continuous checkpoints failed.
	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.checkFailureAgainstCounter(CheckpointFailureManager.java:212) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:169) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:122) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2155) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2134) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$700(CheckpointCoordinator.java:101) ~[flink-dist-1.17.2.jar:1.17.2]
	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:2216) ~[flink-dist-1.17.2.jar:1.17.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_351]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_351]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_351]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_351]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_351]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_351]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_351]
2024-07-10 12:53:41,686 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - 1 tasks will be restarted to recover from a global failure.
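
A small script can pull the relevant numbers out of a JobManager log like the one above. This is a rough sketch: the function name `summarize` is made up for illustration, and it assumes the "time in queue" value is in milliseconds. Under that assumption, 590028 is roughly 9.8 minutes, which is close to the 10-minute checkpoint timeout that Flink uses by default.

```python
import re
from typing import List, Tuple

# Patterns copied from the log lines in this post.
FAILED_RE = re.compile(r"Failed to trigger or complete checkpoint (\d+) for job (\w+)")
QUEUE_RE = re.compile(r"checkpoint request time in queue: (\d+)")


def summarize(log_text: str) -> Tuple[List[Tuple[int, str]], List[float]]:
    """Return (failed checkpoints as (id, job id), queue times in minutes).

    Assumes the queue time is logged in milliseconds.
    """
    failed = [(int(m.group(1)), m.group(2)) for m in FAILED_RE.finditer(log_text)]
    queue_minutes = [int(m.group(1)) / 60_000 for m in QUEUE_RE.finditer(log_text)]
    return failed, queue_minutes


if __name__ == "__main__":
    sample = (
        "Failed to trigger or complete checkpoint 2 for job "
        "59fa6119ea8f55a9c8066210386fb825. (0 consecutive failed attempts so far)\n"
        "checkpoint request time in queue: 590028\n"
    )
    failed, minutes = summarize(sample)
    print(failed)
    print([round(m, 2) for m in minutes])
```

If the stall really is the sink being unable to finish within the timeout, raising the checkpoint timeout would likely only postpone the failure, since the task is reported as 100% busy.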

TASK MANAGER LOG

2024-07-10 13:05:34,820 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:05:36,771 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:05:36,791 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Trigger checkpoint 1@1720587936687 for 21494372e2d638922d748eef6a87a723_cbc357ccb763df2852fee8c4fc7d55f2_0_0.
2024-07-10 13:05:44,828 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:05:46,773 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:05:54,823 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:05:56,773 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:06:04,825 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:06:06,774 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:06:14,828 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:06:16,776 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:06:24,828 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:06:26,777 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:06:28,952 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:06:28,953 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:06:31,410 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:06:31,410 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:06:34,827 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:06:35,178 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received file upload request for file LOG
2024-07-10 13:06:35,182 DEBUG org.apache.flink.runtime.blob.BlobClient                     [] - PUT BLOB stream to /192.168.0.131:57314.
2024-07-10 13:06:36,782 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:06:44,828 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:06:46,781 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:06:54,832 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:06:56,780 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:07:04,829 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:07:06,781 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:07:10,812 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received file upload request for file LOG
2024-07-10 13:07:10,815 DEBUG org.apache.flink.runtime.blob.BlobClient                     [] - PUT BLOB stream to /192.168.0.131:57414.
2024-07-10 13:07:14,830 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:07:16,782 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:07:24,831 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:07:26,783 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:07:28,954 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:07:28,954 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:07:31,411 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:07:31,411 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:07:34,831 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:07:36,784 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:07:44,833 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:07:46,785 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:07:54,833 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:07:56,629 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received file upload request for file LOG
2024-07-10 13:07:56,632 DEBUG org.apache.flink.runtime.blob.BlobClient                     [] - PUT BLOB stream to /192.168.0.131:57536.
2024-07-10 13:07:56,786 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:08:04,834 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:08:06,788 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:08:14,838 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:08:16,788 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:08:24,837 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:08:26,788 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:08:28,955 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:08:28,956 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:08:31,412 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:08:31,412 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:08:34,837 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:08:36,788 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:08:44,837 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:08:46,790 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:08:54,838 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:08:56,791 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:09:04,839 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:09:06,791 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:09:14,839 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:09:16,792 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:09:24,840 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:09:26,793 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:09:28,956 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:09:28,956 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:09:31,413 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:09:31,413 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:09:34,841 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:09:36,794 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:09:44,842 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:09:46,795 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:09:54,842 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:09:56,796 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:10:04,843 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:10:06,796 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:10:14,844 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:10:16,797 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:10:24,845 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:10:26,798 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:10:28,957 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:10:28,957 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:10:31,414 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:10:31,414 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:10:34,845 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:10:36,799 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:10:44,846 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:10:46,800 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:10:54,847 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:10:56,801 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:11:04,848 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:11:06,804 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:11:14,848 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:11:16,802 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:11:24,849 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 842a3b7aa5ada486fee37b13badebd87.
2024-07-10 13:11:26,803 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received heartbeat request from 0e6b97944135d5137e19a21b33259c7a.
2024-07-10 13:11:28,959 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing expired connections
2024-07-10 13:11:28,959 DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager [] - Closing connections idle longer than 60000 MILLISECONDS
2024-07-10 13:11:29,091 DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Received file upload request for file LOG
2024-07-10 13:11:29,094 DEBUG org.apache.flink.runtime.blob.BlobClient                     [] - PUT BLOB stream to /192.168.0.131:58112.