PL4 GPU benchmarking?

All my export times were for jpg output. I amended the spreadsheet to suggest that.

I added results with my RTX 3080.
I used an NVMe SSD to make sure storage has no impact on the numbers.

Some general observations.
Canon 6DII file of the night sky ISO 6400.

Export a 16-bit TIFF, PRIME at 60, in PL3.
Export a 16-bit TIFF, DeepPRIME at 60, in PL4.
PL3: 28 seconds; PL4: 17 seconds. The fans rev up with PL3 but not with PL4.
The screenshot shows the external GPU usage during export.

I’ve added my DeepPrime times to the spreadsheet.

So far, results have been interesting - it looks like most of my CPU cores are being used, albeit at less than 100% utilization, and the GPU is definitely in play given my fast render times (approximately 8.5 20-megapixel Sony images per minute on a 1080Ti), but weirdly Task Manager is showing my GPU utilization as basically idle for the duration. My laptop, with a 2060, is a bit slower (approximately 6 images per minute), and I can see similar levels of CPU usage, with a GPU that, again, shows as idle most of the time but spikes to 100% a few times per minute.
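To compare these rates with the Mpix/sec figures others are reporting, images per minute can be converted to megapixels per second. A minimal Python sketch; the function name is made up for illustration, and it assumes every image in the batch has the same nominal 20-megapixel size:

```python
def images_per_minute_to_mpix_per_sec(images_per_minute, megapixels_per_image):
    """Convert a batch export rate to megapixels per second."""
    return images_per_minute * megapixels_per_image / 60.0


# Rates reported above: about 8.5 images/min (1080Ti) and 6 images/min (2060),
# both with 20-megapixel files.
desktop = images_per_minute_to_mpix_per_sec(8.5, 20)
laptop = images_per_minute_to_mpix_per_sec(6, 20)

print(f"1080Ti: {desktop:.2f} Mpix/sec")
print(f"2060:   {laptop:.2f} Mpix/sec")
```

That works out to roughly 2.8 Mpix/sec for the 1080Ti and 2.0 Mpix/sec for the 2060.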

I have no idea if any of the utilization numbers I’m seeing are accurate. I’m guessing I’m missing some of the GPU spikes on the 1080Ti if they’re really short, but given that I’m not seeing either CPU or GPU running flat out, RAM is plentiful, and all of my data is stored on an SSD, I have to wonder where the bottleneck is. I’m inclined to think GPU, but it’s certainly not obvious from the numbers that I’m seeing.

interesting…
only 36 sec (DeepPrime, Egypte) on Nvidia P2000 - so that’s around 3Mpix/sec.

It should not be that fast.
CPU - i9 7900
64GB RAM (DDR4).
I guess the amount of RAM made all the difference here??

DXO performance settings:
Maximum cache size: 20000 MB
Maximum number of simultaneously processed images: 16

P2000 is a humble performer, something is not adding up here.

Would developers be in a position to comment?

Task Manager in Windows is not accurate for GPU usage; it also shows near 0% for me when running DeepPRIME.
Afterburner, which is more accurate, shows around 50% usage for the RTX 3080, but power draw is around 80% of the limit.
I also need at least 2 simultaneously processed images to reach max Mpix/sec.
I’ll do some more tests to see if I can reach higher speeds by changing some parameters.

That makes sense. I’ve just fired up a test run with Afterburner monitoring, and that’s showing GPU spikes up to around 65% in places. GPU usage definitely looks like it’s in short bursts, so I’m guessing there’s still room for optimization somewhere, whether in terms of hardware or software.
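For NVIDIA cards, one way to look for short bursts without relying on Task Manager is to poll `nvidia-smi` at a short interval and check the peak and the fraction of busy samples, not just the average. A rough Python sketch, assuming `nvidia-smi` is on the PATH and only the first GPU matters; the function names and the 50% "busy" threshold are made up for illustration. As far as I know, the utilization figure `nvidia-smi` reports is itself averaged over a short sample period, so very brief spikes may still be smoothed out.

```python
import subprocess
import time

# Real nvidia-smi query options; which fields are available depends on the card.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def _to_float(text):
    """Return a float, or None for values such as '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one 'utilization, power' CSV line into (percent, watts)."""
    util, power = (field.strip() for field in line.split(","))
    return _to_float(util), _to_float(power)


def sample_gpu(duration_s=60.0, interval_s=0.2):
    """Poll the first GPU for duration_s seconds and return the samples."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        samples.append(parse_sample(out.splitlines()[0]))
        time.sleep(interval_s)
    return samples


def summarize(samples, busy_threshold=50.0):
    """Mean, peak, and fraction of samples above an arbitrary busy threshold."""
    utils = [u for u, _ in samples if u is not None]
    return {
        "mean_util": sum(utils) / len(utils),
        "peak_util": max(utils),
        "busy_fraction": sum(u > busy_threshold for u in utils) / len(utils),
    }
```

Running `summarize(sample_gpu(duration_s=60))` while an export is in progress should give a rough picture of how bursty the load is. A low mean with a high peak would fit the short bursts described above.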

I was planning on upgrading from the 6700K when Zen 3 ships, and going up from 16 to 32GB RAM as well, so I guess it’ll be interesting to see if a faster CPU with more cores can help keep the GPU loaded, or whether I’m purely GPU bound by the 1080Ti at this point.

I added times for my Surface Book 2 notebook w/ GTX1050, both for running on AC and on battery. Not surprisingly, times are better on AC (D850 images 196 sec, Egypte image 64 seconds) than when on battery (D850 images 252 sec, Egypte image 89 seconds). The integrated Intel 620 GPU processor is marked with an asterisk by DxO (i.e., don’t use); sure enough, it creates errors when trying to run.
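To put the AC versus battery difference in relative terms, here is a small Python sketch using the times quoted above; the helper name is made up for illustration:

```python
# Export times in seconds, as reported above for the Surface Book 2 (GTX 1050).
times = {
    "D850 images": {"ac": 196, "battery": 252},
    "Egypte image": {"ac": 64, "battery": 89},
}


def slowdown_percent(ac_seconds, battery_seconds):
    """How much longer the battery run took, as a percentage of the AC time."""
    return (battery_seconds / ac_seconds - 1.0) * 100.0


results = {
    name: slowdown_percent(t["ac"], t["battery"]) for name, t in times.items()
}

for name, pct in results.items():
    print(f"{name}: {pct:.0f}% longer on battery")
```

That comes to roughly 29% longer for the D850 images and 39% longer for the Egypte image when running on battery.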

This was mentioned in the German DSLR-Forum. So some of the forum names are from there.

Thanks; interesting. Google Translate is quite useful.

A Mac check:

I have an iMac Pro (Base) with a Radeon Pro Vega 56 (8GB) and an Intel Xeon W CPU (8 core, 3.2GHz). Processing a very noisy high ISO Sony A7RM4 image (61.7 MB) it took 20 seconds - about 3 mb/sec. I am used to running batches with Prime so I just walk away and let them run. But the GPU performance for Deep Prime is substantially better than what I used to see.

One note: in my experience, Deep Prime is valuable for high ISO noisy A7RM4 images only - no improvement at all in good light at ISO 100. But, I also have lots of old images shot with Canon Digital Rebel cameras, and these really benefit from the Deep Prime processing.

David

Hi
It can also help in recovering shadows

I added my results to the spreadsheet. Surprised to see that no other Mac user has shared results so far.

I’ve added my results to the spreadsheet.

To use the GPU I had to expressly select it in the Preferences, as I’d already established that the Auto setting for Preferences / Performance / Deep Prime Acceleration resulted in the GPU not being used.

During the EA Beta testing for PhotoLab 4 I was advised that forcing the use of this GPU might cause errors. So far, I’ve exported tens of images with DeepPrime NR (using the EA and commercial releases) without issue.

Is that 61.7 megaBYTES and 3megaBYTES/sec, or are those megaPIXELS?

Added results from my 2013 Mac Pro with 3.3GHz 8-core Xeon, 1TB SSD, 64GB RAM and dual FirePro D500 3GB GPUs. Too bad PhotoLab 4 can’t use both of the GPUs at once. With these images as well as my own, CPU and GPU times have been identical or very nearly so. My MP/s calculation is based on actual pixel dimensions, not camera spec. The D850 wedding images are 33MP, not the camera’s 45MP max - must have been cropped.
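For anyone who wants to compute MP/s the same way, from actual pixel dimensions instead of the camera spec, a minimal Python sketch follows. The function name, the 7000 x 4700 crop size, and the 110-second export time are placeholders for illustration only, not measured values:

```python
def mpix_per_sec(dimensions, total_seconds):
    """Throughput in megapixels per second from (width, height) pairs."""
    total_pixels = sum(width * height for width, height in dimensions)
    return total_pixels / 1e6 / total_seconds


# HYPOTHETICAL batch: five images cropped to 7000 x 4700 px (about 33 MP each),
# exported in a made-up 110 seconds.
batch = [(7000, 4700)] * 5
rate = mpix_per_sec(batch, 110.0)

print(f"{rate:.2f} MP/s")
```

Using the real pixel dimensions matters for cropped files: the same export time divided into the camera's 45MP spec would overstate the throughput.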

Thinking my 8-core CPU might manage better average times if I fed it more images (to utilize all available cores and threads), I downloaded the next 11 images from PhotographyBlog (those are all 45MP) and exported all 16 images at once. Again, CPU and GPU results were identical, but this time MP/s was 0.69, down from 1.5 for just the 5 wedding images. Looks like the larger 45MP images reduce MP/s performance by more than half. Ouch.
Interestingly, when running GPU, the load was handed off from one GPU to the other partway through. Each ran consistently at 86% while under load. When running CPU, the CPU load was very spiky, remaining low most of the time and peaking up to 1100% for 5 seconds about every 30-40 seconds, all while the GPU continued to crank along at 86% with the load again switched from one GPU to the other partway through. Huh.

My conclusion is that larger batches aren’t handled any more efficiently by the 8-core CPU, and the larger 45MP images of birds substantially reduced the MP/s performance. I’m also left to wonder why the GPU is working just as hard when PL4 is set to CPU-only as when it’s set to GPU-only. The CPU seems to add virtually nothing.
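The two MP/s figures above can be turned back into approximate export times, which gives a rough idea of the rate for the 45MP images on their own. A Python sketch of the arithmetic; it ASSUMES the five wedding images took the same time in the 16-image run as they did alone, which the post does not state, so the last number is only an estimate:

```python
WEDDING_MP, BIRD_MP = 33, 45  # per-image sizes reported above

small_batch_mp = 5 * WEDDING_MP                 # 5 wedding images
large_batch_mp = small_batch_mp + 11 * BIRD_MP  # all 16 images

# Export times implied by the reported rates (1.5 and 0.69 MP/s).
small_batch_s = small_batch_mp / 1.5
large_batch_s = large_batch_mp / 0.69

# ASSUMPTION: the wedding images took the same time in both runs,
# so the remainder is attributed to the eleven 45MP images.
bird_only_s = large_batch_s - small_batch_s
bird_only_rate = 11 * BIRD_MP / bird_only_s

print(f"5 images:  {small_batch_mp} MP in about {small_batch_s:.0f} s")
print(f"16 images: {large_batch_mp} MP in about {large_batch_s:.0f} s")
print(f"45MP images alone: about {bird_only_rate:.2f} MP/s (estimate)")
```

Under that assumption the 45MP images alone come out a little below 0.6 MP/s, which fits the observation that they cut throughput by more than half.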

Hi there,
It looks strange to me that export times are absolutely identical between CPU and GPU…
Just to be sure, after you have changed your settings for DeepPRIME, did you reboot PhotoLab? Otherwise, any change will not be taken into account…

Steven.

Feel free to open a feature request about this. You’re not the first one with such a hardware configuration, so this may get enough votes for DxO to consider it.

Um, no, I didn’t reboot. Doh!

…then it’s time to run another test I guess :wink: