PL4 GPU benchmarking?

I’ve added my results to the spreadsheet.

To use the GPU I had to expressly select it in the Preferences, as I’d already established that the Auto setting for Preferences / Performance / Deep Prime Acceleration resulted in the GPU not being used.

During the EA Beta testing for PhotoLab 4 I was advised that forcing the use of this GPU might cause errors. So far, I’ve exported tens of images with DeepPrime NR (using the EA and commercial releases) without issue.

Is that 61.7 megaBYTES and 3 megaBYTES/sec, or are those megaPIXELS?

Added results from my 2013 Mac Pro with 3.3GHz 8-core Xeon, 1TB SSD, 64GB RAM and dual FirePro D500 3GB GPUs. Too bad PhotoLab 4 can’t use both of the GPUs at once. With these images as well as my own, CPU and GPU times have been identical or very nearly so. My MP/s calculation is based on actual pixel dimensions, not camera spec. The D850 wedding images are 33MP, not the camera’s 45MP max - they must have been cropped.

Thinking my 8-core CPU might manage better average times if I fed it more images (to utilize all available cores and threads), I downloaded the next 11 images from PhotographyBlog (those are all 45MP) and exported all 16 images at once. Again, CPU and GPU results were identical, but this time MP/s was 0.69, down from 1.5 for just the 5 wedding images. Looks like the larger 45MP images reduce MP/s performance by more than half. Ouch.
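
For reference, here’s the throughput metric I’m using - a minimal Python sketch (the dimensions and elapsed time below are illustrative placeholders; the real dimensions come from the exported files):

```python
# Throughput metric: total megapixels exported / wall-clock seconds.
# Dimensions are illustrative placeholders, not the actual file values.
wedding = [(7008, 4672)] * 5      # ~33 MP each (cropped D850 frames)
birds   = [(8256, 5504)] * 11     # full-resolution 45 MP D850 frames

total_mp  = sum(w * h for w, h in wedding + birds) / 1e6
elapsed_s = 955                   # placeholder batch export time

print(f"{total_mp:.0f} MP / {elapsed_s} s = {total_mp / elapsed_s:.2f} MP/s")
# -> ~0.69 MP/s for the 16-image batch
```
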
Interestingly, when running on the GPU, the load was handed off from one GPU to the other partway through. Each ran consistently at 86% while under load. When set to CPU, the CPU load was very spiky, remaining low most of the time and peaking up to 1100% for 5 seconds about every 30-40 seconds, all while the GPU continued to crank along at 86%, with the load again switched from one GPU to the other partway through. Huh.

My conclusion is that larger batches aren’t handled any more efficiently by the 8-core CPU, and the larger 45MP images of birds substantially reduced the MP/s performance. I’m also left to wonder why the GPU is working just as hard when PL4 is set to CPU-only as when it’s set to GPU-only. The CPU seems to add virtually nothing.

Hi there,
It looks strange to me that export times are absolutely identical between CPU and GPU…
Just to be sure, after you changed your settings for DeepPRIME, did you restart PhotoLab? Otherwise, the change will not be taken into account…

Steven.

Feel free to open a feature request about this. You’re not the first one with such a hardware configuration, so it may get enough votes for DxO to consider it.

Um, no, I didn’t restart it. Doh!

…then it’s time to run another test I guess :wink:

It’s megabytes. The A7RM4 has a 61 megapixel sensor, and the ARW file is compressed RAW - the uncompressed file is larger.
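
To put rough numbers on the bytes-vs-pixels distinction, here’s a back-of-the-envelope Python sketch (the ~1 and ~2 bytes/pixel figures are approximations for Sony’s compressed and uncompressed ARW, not exact values):

```python
# Rough file-size arithmetic for a 61 MP A7R IV frame.
mp = 9504 * 6336 / 1e6        # ~60.2 megapixels
compressed_mb   = mp * 1      # compressed ARW: roughly 1 byte/pixel
uncompressed_mb = mp * 2      # uncompressed ARW: roughly 2 bytes/pixel
print(f"{mp:.1f} MP -> ~{compressed_mb:.0f} MB compressed, "
      f"~{uncompressed_mb:.0f} MB uncompressed")
```
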

I checked my (2019) MacBook Pro 16 (Radeon Pro 550M) and it does it in 32 seconds. But I use the iMac Pro for most of my post-processing.

David

OK. Disregard everything I wrote above. I’ve retested, relaunching PL4 every time I switched DP prefs, and I’ve updated my spreadsheet entry. The upshot: Whether processing the five 33MP wedding shots, the mix of those with the next 11 45MP shots from the PhotographyBlog gallery, or just the one 51MP Egypte image, I got roughly 0.66MP/s running DeepPRIME on one FirePro D500 GPU. DP with GPU took 1.60x-2.37x longer than PRIME.
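
As a quick sanity check on that rate (the Egypte file is roughly 51MP):

```python
# At ~0.66 MP/s, predicted export time for the ~51 MP Egypte image:
print(f"{51 / 0.66:.0f} s")   # ~77 s, i.e. about 1m17s
```
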

Hi. I’ve added my 2010 Mac Pro figures. CPU times are unsurprisingly slow, but GPU times are 14.5 times faster using a Radeon RX 580 8GB.

I’ve just added my results from my 2018 MacBook Pro, with and without a Vega 64 eGPU.
I also added my Windows machine with a Vega 64 GPU and a GTX 970 (just swapped out for the Vega 64) and an (ancient) i7-6700K.

I added the DeepPRIME times from my Windows 10 computer: Intel i7-4790K, 32GB RAM, Nvidia GeForce GTX 1050Ti.

The D850 image batch export took 137 seconds (1.22 MP/s); the Egypte image export took 47 seconds (1.09 MP/s). No surprises here.

Is DeepPRIME using TensorFlow or a similar framework for acceleration? I’m specifically thinking about whether it would take advantage of the Neural Engine in Apple’s M1 chip once PL4 is ported to it.

Currently PL4 does not use the Neural Engine. So on the new Macs it would use the GPU.

The winner is… my old Core i7-3770 3.4 GHz, 32 GB RAM, AMD Radeon 5400, Win 10 machine.
The 5 NEF files took 1100 seconds.
Or isn’t the highest value the best score?
:innocent: :joy: :question:
For God’s sake, I only own an OM-D E-M5 Mark II

Wishing all forum members a healthy and nice weekend.

Guenter

Well, clearly the GPU makes the difference. I also have an i7-3770 (only 16GB RAM) but with a GTX950. Check the times in the spreadsheet.

Yes, it is. I checked with ExifTool :+1:t3:


I made the test with my MacBook Pro 16" 2019:
i9 8-core 2.3 GHz, 32 GB RAM, with a Radeon Pro 5500M 4GB GPU.

Egypte (CPU): 3m33s
Egypte (GPU): 24s
Wedding, 5 pictures (GPU): 1m15s

Quite impressive for a laptop. The CPU did warm up a bit :hot_face:

Thanks for sharing. My 8-core 3.3GHz 2013 Mac Pro with 64GB RAM and FirePro D500 GPU managed only:
Egypte (CPU) - 4:30
Egypte (GPU) - 1:17
Wedding (CPU) - 14:21
Wedding (GPU) - 4:09
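
Put as speedups (quick arithmetic on the times above):

```python
# CPU-vs-GPU speedup from my D500 times (m:ss converted to seconds)
def secs(t):
    m, s = t.split(":")
    return int(m) * 60 + int(s)

for name, cpu, gpu in [("Egypte", "4:30", "1:17"),
                       ("Wedding", "14:21", "4:09")]:
    print(f"{name}: {secs(cpu) / secs(gpu):.1f}x faster on GPU")
# Egypte: 3.5x, Wedding: 3.5x -- vs ~8.9x (213 s / 24 s) on your 5500M
```
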

Dual GPU support would help, but that’s probably a very low priority. Apple Silicon support would seem to be top priority now. Sure wish DxO would say something about it, as I’ve got a new mini on the way and am dithering about whether to cancel and get an i5/i7 mini + eGPU…

What I was curious about is whether you’re using frameworks (Metal, TensorFlow) that will allow for (relatively) easy porting to the Neural Engine, or whether you rolled your own GPU code. I can understand taking either approach, but I really hope you used the former and that adding Neural Engine support isn’t too painful.
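
For what it’s worth, with Core ML the compute target is just a load-time option, which is why I’m hoping you went that route. A purely hypothetical Python sketch (the model name is invented; DxO hasn’t said what they actually use):

```python
import coremltools as ct

# Hypothetical: if the DeepPRIME network were exported to Core ML,
# choosing CPU/GPU/Neural Engine would be a configuration flag,
# not per-device code.
model = ct.models.MLModel(
    "deepprime.mlmodel",                  # invented placeholder name
    compute_units=ct.ComputeUnit.ALL,     # CPU + GPU + Neural Engine
)

# For a GPU-only comparison you'd reload the same model with
# compute_units=ct.ComputeUnit.CPU_AND_GPU instead.
```
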

Well, I guess you are right: for the Mac, Apple Silicon is the future.
Once you have your new mini you can test it out, and if you are not happy, send it back to Apple for a refund and get an older mini. But be aware that eGPUs no longer work with Apple Silicon.