Fix counting of new jobs in scan-pbs (!945) · Merge requests · nordugrid / arc

Andrii Salnikov requested to merge andrii/arc:scanpbs-miss-jobs into master Feb 13, 2020

I have just discovered a bug that has been there for ages!

Our current PBS backend is losing 1 job per 10 minutes if the cluster is loaded. And if you do not run at least 1 job in 10 minutes, then your losses start to increase :) Sometimes it is a local job, so it is not that visible. But if there are no local jobs, you are obviously always losing a grid job.

Details:

Each time scan-pbs-job runs, the number of processed jobs is increased by 1, which should not be the case.

The reason is that echo "$exited_killed_jobs" | wc -l also counts the extra \n appended by echo, which can be avoided with the -n flag.
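
The off-by-one can be reproduced outside the backend with a minimal sketch. The helper names count_with_echo and count_with_echo_n and the sample job list are hypothetical and are not taken from scan-pbs-job. The sketch also assumes that every entry in the list ends with its own newline and that the shell's echo accepts -n (as the bash and dash builtins do).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not the actual scan-pbs-job code.

# echo appends a trailing newline, so wc -l sees one line too many.
count_with_echo() {
    echo "$1" | wc -l | tr -d ' '    # tr strips padding some wc versions add
}

# echo -n suppresses the trailing newline, so only the newlines
# already present in the value are counted.
count_with_echo_n() {
    echo -n "$1" | wc -l | tr -d ' '
}

# Assumption: each entry is terminated by its own newline.
exited_killed_jobs="job1
job2
"

count_with_echo ""                       # prints 1 although there are no jobs
count_with_echo_n ""                     # prints 0
count_with_echo "$exited_killed_jobs"    # prints 3 for two jobs
count_with_echo_n "$exited_killed_jobs"  # prints 2
```

With an empty list the plain echo variant still reports one line, which would match the counter advancing by 1 on every run.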

The code was the same for both PBS and PBSPro backends.

In addition, the waiting loop of this log processing, which was meant to wait 60 seconds, was actually waiting for 600 seconds because of another typo, which is now fixed as well. That other bug increased PBS latency 10 times, but it also decreased the job losses 10 times :-)
