PL5 DeepPrime with fast GPU is very CPU bound

12-core Ryzen 3900X with all cores at 4.1 GHz and RTX 3080 Ti with 1695 MHz base clock.

Test set of 20 images at 20 MP each. Export-to-disk times for the set:
No corrections applied: 10 s
Basic corrections and DeepPrime: 66 s
Basic corrections and DeepPrime with GPU forced to 765 MHz: 70 s

Roughly halving the GPU clock added only 4 seconds of processing time, which indicates that only about 4 of the 66 s was GPU processing time, or at least time spent waiting for the GPU.
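
For anyone who wants to check that arithmetic, here is a rough sketch. It assumes GPU-bound time scales inversely with GPU clock and that the card actually ran at the clocks quoted above, neither of which I verified; the function name is just something made up for the illustration.

```python
def gpu_seconds(added_seconds, full_clock_mhz, reduced_clock_mhz):
    """Estimate GPU-bound time at full clock.

    Assumption: GPU-bound time is proportional to 1 / clock, so lowering
    the clock by a factor k adds (k - 1) * gpu_time to the total.
    """
    slowdown = full_clock_mhz / reduced_clock_mhz
    return added_seconds / (slowdown - 1.0)


total_s = 66.0            # export time at full GPU clock (from the test above)
added_s = 70.0 - total_s  # extra time with the GPU forced to 765 MHz

# Treating the change as an exact halving of the clock:
halved_estimate = gpu_seconds(added_s, 2.0, 1.0)

# Using the actual clock figures quoted (1695 MHz vs 765 MHz):
clock_estimate = gpu_seconds(added_s, 1695.0, 765.0)

print(f"Exact halving: {halved_estimate:.1f} s of GPU time")
print(f"1695 -> 765 MHz: {clock_estimate:.1f} s of GPU time "
      f"({100 * clock_estimate / total_s:.0f}% of the export)")
```

Either way the GPU share comes out at only a few seconds of the 66 s, so the conclusion does not depend on whether the clock was exactly halved.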

As long as it is 2 or more, the ‘Performance | simultaneously processed images’ setting makes almost no difference. An instance of DxO.PhotoLab.ProcessingCore.exe is run for each simultaneously processed image, but only one instance at a time takes significant processor cycles (presumably the one that ‘owns’ the GPU), and that instance uses at most about 15% CPU (roughly 2 cores’ worth). During export, overall CPU usage doesn’t get much above 20%.

Reducing the CPU clock increased processing times almost exactly in proportion.
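
A simple way to picture this is a two-part model: a CPU-bound part that scales with 1 / CPU clock plus a GPU part that does not. The sketch below uses the 66 s total and the roughly 4 s GPU share from the test above; the 3.0 GHz clock in the example is a made-up value for illustration, not one of my measurements, and the function name is hypothetical.

```python
def predicted_export_seconds(baseline_s, gpu_s, baseline_ghz, new_ghz):
    """Predict export time at a different CPU clock.

    Assumptions: the CPU-bound part scales as 1 / CPU clock, and the
    GPU-bound part (gpu_s) is unaffected by the CPU clock.
    """
    cpu_s = baseline_s - gpu_s
    return cpu_s * (baseline_ghz / new_ghz) + gpu_s


baseline_s = 66.0    # measured export time at 4.1 GHz
gpu_s = 4.0          # approximate GPU share estimated from the GPU clock test
baseline_ghz = 4.1

# Hypothetical reduced CPU clock, chosen only to illustrate the model.
example_ghz = 3.0
example_s = predicted_export_seconds(baseline_s, gpu_s, baseline_ghz, example_ghz)

print(f"Clock ratio: {baseline_ghz / example_ghz:.2f}x")
print(f"Predicted time ratio: {example_s / baseline_s:.2f}x")
```

Because the GPU share is so small, the predicted time ratio stays close to the clock ratio, which matches the near-proportional slowdown I saw.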

I don’t know whether PL5 can be better optimized and/or made to use more CPU cores. If not, it looks like high-end RTX GPUs do not offer much extra performance for the money.
At about a third of the price, would an RTX 3060 Ti, with half the cores of a 3080 Ti, perform the same as a 3080 Ti at half clock speed?

I found that setting the number of simultaneously processed images to half the number of cores gives the best performance.

Also, you need to restart PL after turning on the GPU for the change to take effect.

