About GPU acceleration

No idea about Mac actually. Let me ask @akarlovsky to reply.

Thank you,
Svetlana G.

Not long ago, I asked the DxO guys about GPU processing and optimisation in general. I wanted to build a dedicated machine just for image processing.

I was told that the GPU is not used for processing images, only for previews. There is also no optimisation for any particular CPU, only for general instruction sets such as SSE4.

DxO struggles with SMP as well. I have an old workstation with 2x Xeon CPUs with four cores. It is slower than my 2016 Apple MBP with a 4-core 2.9 GHz i7, because DxO struggles with multiple CPUs.

It can pretty much hog any single CPU with many threads. It rarely matters whether you process 1 image at a time or 4 in parallel on a CPU with 4 physical cores (at least on Mac OS X). It either uses one CPU with many threads and finishes each image quicker, or it shares resources across the cores and each image takes longer.

It would be interesting to get GPU support, because Apple finally supports eGPUs.

Build 2.3.1

  • all sliders in action
  • prime
  • convert to full jpg sRGB

i9 9900k 5.0/4.6
32GB 4000/17
m.2 NVMe

16x CR3 from Canon EOS R
2 pics simultaneously = 350 sec.
4 pics simultaneously = 299 sec.
4 pics sim + OpenCL = 293 sec.

GPU utilization doesn't appear to increase, but OpenCL did take a few seconds off.

PS: any other level of parallelization (1x, 8x, 16x) gave worse results.
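
For easier comparison with other machines, the timings above can be converted to images per minute. This is only a small sketch using the numbers from this post; the function name is made up for illustration.

```python
# Timings copied from the post above: 16 CR3 files exported with PRIME.
BATCH_SIZE = 16
timings_s = {
    "2 in parallel": 350,
    "4 in parallel": 299,
    "4 in parallel + OpenCL": 293,
}


def images_per_minute(n_images, total_seconds):
    """Throughput of a batch export in images per minute."""
    return 60.0 * n_images / total_seconds


for label, seconds in timings_s.items():
    rate = images_per_minute(BATCH_SIZE, seconds)
    print(f"{label:24s} {seconds:4d} s  {rate:.2f} images/min")
```

Throughput figures like these are only comparable between posts when the settings match (PRIME on or off, RAW size, output format).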


See here: https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+RTX+2060&id=4037

I have found similar results to Rauf above, i.e. on a high-end workstation the GPU does nothing or is even detrimental, and 4 photos in parallel is the optimal point.

My station:
Dual CPU Xeon E5-2667 V2 for a total of 32 cores at 3.3 / 3.6 GHz
64 GB ECC Ram
GTX 1080 ti 11 GB
DXO Photo Lab build 2.3.1

12 photos CR2 21 mpix, ISO 3200 with prime and lens correction:
2 photos in parallel, no GPU: 175s
3 photos in parallel, no GPU: 148s
4 photos in parallel, no GPU: 134s (so +23% better than 2 in parallel)
4 photos in parallel with OpenCL: 170s

Note that DXO ramps up at the start, so with larger batches the advantage of 4 photos in parallel is even greater:
30 photos CR2 21 mpix, ISO 3200 with prime and lens correction:
2 photos in parallel, no GPU: 472s
4 photos in parallel, no GPU: 350s (so +26% better than 2 in parallel)

2 photos in parallel only use about 58% of the cores. More than 2 photos in parallel use all cores, only dropping during transition from photo to photo.
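
The percentages quoted above are the share of wall-clock time saved relative to 2 photos in parallel. A quick sketch to reproduce them from the timings in this post (the helper name is just for illustration):

```python
def percent_less_time(baseline_s, new_s):
    """Wall-clock time saved relative to the baseline, in percent.

    A negative result means the new run was slower than the baseline.
    """
    return 100.0 * (baseline_s - new_s) / baseline_s


# (batch size, seconds at 2 in parallel, seconds at 4 in parallel), no GPU
runs = [(12, 175, 134), (30, 472, 350)]
for n_photos, t2, t4 in runs:
    saved = percent_less_time(t2, t4)
    print(f"{n_photos} photos: 4 in parallel takes {saved:.1f}% less time than 2")

# 12-photo batch, 4 in parallel: CPU only (134 s) vs OpenCL enabled (170 s)
opencl = percent_less_time(134, 170)
print(f"OpenCL vs CPU only: {opencl:.1f}% (negative means slower)")
```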

When editing my photos with OpenCL enabled I never see GPU usage above 3%, so clearly not much is happening there. I have now disabled OpenCL completely in DXO.


Did you use PRIME for your tests? Based on my own tests I’d have expected 8 to be optimal for 32 cores. But PRIME does more multi-threading by default so could saturate your cores quicker.

But right, the GPU does nothing for export. The only case when I see more GPU utilization on MacOS is when I do local adjustments.


Good point @Calle
Yes, as indicated my tests included PRIME.
I ran a new test with 20 photos with 4 and then 8 photos in parallel.
8 photos in parallel is 6% faster on 32 cores than 4 photos in parallel. So great!

For general information, on my Windows 10 machine with 64 GB, memory usage at idle is 13 GB.
Running 4 photos in parallel, memory usage peak is 16 GB
Running 8 photos in parallel memory usage peak is 19GB.
So DXO does not use much memory.
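
A rough way to read these numbers is memory added over idle, divided by the number of images in flight. This is only a sketch: it uses system-wide figures from this post, so anything else running on the machine is included.

```python
IDLE_GB = 13                  # system memory in use before exporting
peaks_gb = {4: 16, 8: 19}     # images in parallel -> peak system memory (GB)


def extra_gb_per_image(peak_gb, n_parallel, idle_gb=IDLE_GB):
    """Memory added over idle, averaged per image processed in parallel."""
    return (peak_gb - idle_gb) / n_parallel


for n_parallel, peak in peaks_gb.items():
    per_image = extra_gb_per_image(peak, n_parallel)
    print(f"{n_parallel} in parallel: +{peak - IDLE_GB} GB over idle, "
          f"{per_image:.2f} GB per image")
```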

In Windows 10 do the following:
Go to

  • Windows Settings
  • System
  • Display
  • Graphics Settings (at the bottom)
  • Classic app
  • Browse (the button)
  • C:\Program Files\DxO\DxO PhotoLab 3\Dxo PhotoLab.exe
  • Add
  • Select High Performance

Restart PhotoLab; you can now select OpenCL.

It will certainly speed up full-screen previews.
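
For those who prefer to script the steps above: as far as I know, the Graphics Settings page stores its per-app choice under the current user's `Software\Microsoft\DirectX\UserGpuPreferences` registry key, with the executable path as the value name and `GpuPreference=2;` meaning High Performance. Treat that key, the value format and the helper names below as assumptions, not something confirmed by DxO or in this thread, and back up your registry first. The executable path is the one given in the steps.

```python
import sys

# Assumption: per-app GPU preferences from Windows Graphics Settings live here.
KEY_PATH = r"Software\Microsoft\DirectX\UserGpuPreferences"
# Assumed meaning: 0 = let Windows decide, 1 = power saving, 2 = high performance
HIGH_PERFORMANCE = 2

# Path from the steps above; adjust it for your PhotoLab version.
PHOTOLAB_EXE = r"C:\Program Files\DxO\DxO PhotoLab 3\Dxo PhotoLab.exe"


def gpu_preference_value(level=HIGH_PERFORMANCE):
    """Build the string stored for one application."""
    return f"GpuPreference={level};"


def set_gpu_preference(exe_path, level=HIGH_PERFORMANCE):
    """Write the preference for exe_path under HKEY_CURRENT_USER (Windows only)."""
    if sys.platform != "win32":
        raise OSError("This setting only exists on Windows")
    import winreg  # standard library, available on Windows only

    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        winreg.SetValueEx(key, exe_path, 0, winreg.REG_SZ,
                          gpu_preference_value(level))


if __name__ == "__main__":
    if sys.platform == "win32":
        set_gpu_preference(PHOTOLAB_EXE)
        print("Preference written; restart PhotoLab.")
    else:
        print("Not on Windows, nothing changed.")
```

Doing it through the Settings UI as described above remains the safer route.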


That's a useful tip. It's a pity PL doesn't do this itself, given how long it spends setting itself up on each install. I take it this will need to be redone with new versions. I had OpenCL selected, but your changes weren't already in place.

Thank you for that!

OMG, why are all of yours so slow, even the guy with the i9 processor?

I'm on an i7 processor and turning out 17-22 RAWs a minute, running 12 simultaneously.

I just did another test on the i7 with OpenCL on and turned out 189 RAWs in 5 min 31 s. This is all at stock speeds; I have 3 overclocking profiles I could use to boost it further if needed. So I really don't get why those with i9 processors are turning out images so slowly.

Are these results with or without PRIME noise reduction?

Those are with PRIME noise reduction off, as I always shoot at ISO 100 no matter the situation. When I tested PRIME back when I bought DxO it wasn't much slower, but I'll re-test tomorrow with PRIME and see what I'm hitting. I do know it still manages at least 12 PRIME images every 3 to 6 minutes, but I will test and confirm. I just don't use PRIME as I have no need for it.

My typical turnover is 1000 RAWs a day, and that is all done in under 30 minutes. If I remember correctly, when I first used PRIME it pushed that to around 58 minutes, but I will update properly once I have actually re-tested and have the facts.

What size are your RAW files, and do you use the lossy CRAW compression?

This is a very interesting thread. I have a Ryzen 7 3700X CPU, an RX-5700 XT graphics card and 16 GB of RAM. I tested PL3 with 10 Nikon D750 RAW files, all around 28 MB. I had PRIME noise reduction enabled on all 10 files, with ISOs varying from 3200 to 12800.

Processing 4 files simultaneously took 1 minute and 37 seconds.
Processing 8 files simultaneously took 1 minute and 50 seconds.

The processing times were exactly the same with OpenCL enabled or disabled.

When processing 4 files simultaneously about 8 GB of RAM were used.
When processing 8 files simultaneously about 10 GB of RAM were used.

Does anyone know if PL3 would process faster with 32 GB of RAM?

Thank you,


I don't think so, as each of the XPCCor processes consumes only 600 MB of RAM. (At least when I last checked it with PL1.)

Your CPU has 8 cores and 16 threads. Based on my tests so far, processing more than four pictures in parallel will not lead to better results, as each picture already uses four threads.
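
Put as a rule of thumb, that is the number of hardware threads divided by four. A minimal sketch, assuming the four-threads-per-picture figure from my own tests holds (it is an observation, not a documented DxO value):

```python
import os

# Assumption from the tests described above, not an official DxO figure.
THREADS_PER_IMAGE = 4


def suggested_parallel_images(logical_threads,
                              threads_per_image=THREADS_PER_IMAGE):
    """Rule-of-thumb setting for the number of images processed in parallel."""
    return max(1, logical_threads // threads_per_image)


# Ryzen 7 3700X: 8 cores / 16 threads
print("Ryzen 7 3700X:", suggested_parallel_images(16))

# The machine this script runs on (os.cpu_count() can return None)
print("This machine:", suggested_parallel_images(os.cpu_count() or 1))
```

For a machine with 32 hardware threads the same rule gives 8, which is in line with the 4 vs 8 test reported earlier in this thread. It is a starting point only; measure with your own files.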

GPU has no impact on performance in exports.

I did a test on my 8-core 2019 iMac (40 GB RAM) and checked how the number of images processed in parallel influenced the total processing time. I used my test set of 60 images (1.3 GB in total) and got the following results:
1 -> 190s
2 -> 148s
3 -> 142s
4 -> 140s
5 -> 144s
12 -> 138s
As we can see, overall processing time is almost equal with 3 and more images processed in parallel. I did not use the Intel Power Gadget but I suppose that the limit has also to do with the thermal design of the PC, together with diminishing returns of parallel processing.
Note that the test with 12 images processed in parallel was done a few minutes after the test with 6 images while the tests before that were pretty close to each other.
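
To make the diminishing returns easier to see, here is a small sketch that turns the timings above into speedups over the single-image run and finds the smallest setting that gets within 5% of the best time. The 5% tolerance is an arbitrary choice for illustration.

```python
# images processed in parallel -> total seconds for the 60-image test set
timings_s = {1: 190, 2: 148, 3: 142, 4: 140, 5: 144, 12: 138}


def speedup(n_parallel):
    """Speedup relative to processing one image at a time."""
    return timings_s[1] / timings_s[n_parallel]


def smallest_setting_within(tolerance=0.05):
    """Smallest parallel setting whose time is within `tolerance` of the best."""
    best = min(timings_s.values())
    for n_parallel in sorted(timings_s):
        if timings_s[n_parallel] <= best * (1 + tolerance):
            return n_parallel


for n_parallel in sorted(timings_s):
    print(f"{n_parallel:2d} in parallel: {timings_s[n_parallel]} s, "
          f"{speedup(n_parallel):.2f}x")

print("Within 5% of the best time from:", smallest_setting_within(), "in parallel")
```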

While I did these tests, I discovered a bug that I’ve also reported elsewhere. Setting the number of parallel images, I get 1-2-3-4-5-6-7-7-9-9-11-12. Note: 8 and 10 don’t show and are replaced by 7 and 9 respectively.

This post is the most useful I've read here: I never understood why my computer was so slow, asked support about it a couple of times, and didn't get this answer.
Thanks a lot.

Thank you for the information. It is really helpful.