PL5 - DeepPrime and performance gain on Windows?

As Lucas said, you have to be careful which metrics you are monitoring.

Modern GPUs expose several queues (engines), and Windows in particular isn’t great at displaying them properly.
The default “GPU Usage” metric is very 3D-specific and mostly useful only for games or other real-time graphics.
To make things worse, which queues are present also depends on the vendor, the GPU generation and sometimes the drivers.

The best tool for monitoring this is still HWiNFO.

Another thing to be aware of is that DeepPrime tends to generate rather short “bursts” of load, first on the CPU, then on the GPU, then on the CPU again, which aren’t necessarily displayed properly either. (The faster the CPU and GPU, and the smaller the data to be processed, the less obvious this is.)
This is especially true on the fastest systems, and it can be really hard to notice during a batch run, where this behaviour is interwoven and overlapping.
The result is high peak loads, but only for very short amounts of time.
That can look as if there isn’t much load at all, simply because the sampling rate of the monitoring tool isn’t fast enough to catch the bursts.
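A toy simulation makes the sampling-rate problem concrete. All numbers here are invented for illustration (they are not measured DeepPrime figures): a hypothetical load that bursts to 100% for 50 ms every half second looks almost idle to a monitor that only polls once per second.

```python
def load_at(t_ms: int) -> int:
    """Hypothetical load in percent: a 100% burst for 50 ms every 500 ms, else 5%."""
    return 100 if (t_ms % 500) < 50 else 5

def peak_seen(sample_interval_ms: int, duration_ms: int = 5000) -> int:
    """Highest load a monitor polling at the given interval would ever display."""
    return max(load_at(t) for t in range(100, duration_ms, sample_interval_ms))

fast = peak_seen(10)    # 10 ms polling catches every 50 ms burst -> reports 100%
slow = peak_seen(1000)  # 1 s polling (typical monitor refresh) with this phase -> reports 5%
print(fast, slow)       # 100 5
```

Whether the slow poll happens to land on a burst depends on the sampling phase, which is exactly why short spikes show up so inconsistently in monitoring tools.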

Fun fact:
This can also lead to stability issues when the PSU isn’t really sufficient for this kind of load.
I had to replace my be quiet! 600 W PSU with an 800 W one to stop the overcurrent protection from tripping, which only happened when processing batches with DeepPrime!
That load was harder on the PSU than any other synthetic or non-synthetic load I’m aware of, because most games aren’t nearly as demanding on the CPU, and most synthetic benchmarks or “power viruses” used for PSU testing tend to generate a steady load.

If DxO is using such high-end cards to make the numbers look good, that is one thing; but if they are completely ignoring the fact that many users are not running such “exotic” hardware, that is another!

Well, I don’t know what you are expecting them to do.
PL4 probably already runs the neural network about as well as possible on older hardware (pre-2017/2018 stuff).

Optimizing consumer hardware for the execution of ML algorithms is quite a recent development, and we’re still somewhat at the beginning here.
It’s just like any other novelty.
You also wouldn’t expect a 10+ year old PC to display 4K or HDR video without any problems, would you?


Lucas, thank you for your feedback. I understand the need to embrace new technology, particularly when it helps solve, or actually solves, a specific problem.

In addition DxO is in competition with other manufacturers of editing software and the “arms” race dictates using every means at your disposal to elevate the position of your product with respect to speed and quality, particularly with respect to areas such as noise reduction.

Unfortunately such an “arms race” can leave some of us “out in the cold”. Graphics cards are a particular area where this has happened big time, partly because of the use of the technology they contain for AI and partly because the cryptocurrency boom has caused a huge price hike.

I programmed my first computer in 1964 (an ICT 1301) at the start of my degree in ‘Computing and Data Processing’, went on to do some research and then teaching before joining Burroughs Machines (later Unisys), where I worked for 36 years before retiring in 2009. I have first-hand experience of the changes that computing has undergone, but retirement mostly means careful husbanding of resources, and hence I am typically behind the curve with respect to the hardware at my disposal…

I have never been a fan of Apple for various reasons (that should generate a load of comments), although my youngest son uses Macs in his freelance videography business and recently bought an M1 laptop. My other son built a heavyweight Ryzen system at the beginning of 2020, fully armed with a heavyweight graphics card, so that he could do architectural modelling and rendering at home as well as at work (which came in handy given the lockdowns we experienced)!!

I have only ever bought one manufactured machine in my lifetime (excluding laptops, Ataris and an Alan Sugar CPC or two); all the rest have been home (own) builds, and that effectively precludes using the Apple operating system (I have no desire to try to build a Hackintosh). Currently my systems consist of 2 x i7-4790K, 1 x i7-3770 and 1 x i7-2700K, all running Windows 10, with 12 TB of storage on all but the i7-3770, which is effectively in retirement. I am now looking to build a new AMD system of some description and retire the i7-2700K, but that won’t happen for some months yet, and buying a manufactured system may be the only way of getting a new graphics card included without paying all of the current premium.

So my gripe is affordability: the only reason I currently have for any graphics card at all is DeepPrime. In your response you indicate that to use some of the older graphics cards more effectively the algorithms would have to change, which might harm the rendering quality. From a DxO standpoint such a (sub)project may not be acceptable; from a user’s perspective I might well be amenable to paying a bit more for an additional licence if it means not spending £500 - £1,000 just to buy a better card which still won’t be the pinnacle of excellence and will be “yesterday’s” technology even before I buy it!!

If you made it this far, thank you, if you didn’t I fully understand but of course you won’t get to read that bit!!


Abgestumft:
Yesterday I ran a test while preparing material for a digital portfolio and blog. I batch processed 200 DNG images with a PhotoLab 5 DeepPrime export to JPEG and compared the result with old figures from doing the same when PhotoLab 4 was new.

With PhotoLab 4 it took my machine (a five-year-old Intel i5-6400 with 8 GB and an NVIDIA GeForce 960) an average of 33 seconds per image with DeepPrime. With PhotoLab 5 the average was 24 seconds. It is absolutely faster.

PhotoLab 4 needs 37.5% more time on average with DeepPrime than PhotoLab 5 on my ASUS machine. It’s not that much, but it might make up for my anticipated upgrade from a Sony A7 III (24 MP) to an A7 IV with 33 MP. It will preserve the status quo exactly, to the decimal, which I am glad for :-). So thanks for that, DxO :-).
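The arithmetic behind those figures checks out, including the megapixel coincidence; a quick sketch using the times quoted in the post:

```python
# Sanity-check the percentages quoted above (times are from the post).
pl4_avg, pl5_avg = 33.0, 24.0          # seconds per image, PL4 vs PL5
extra = (pl4_avg - pl5_avg) / pl5_avg  # PL4 needs this fraction more time
print(f"{extra:.1%}")                  # 37.5%

# The coincidence: the A7 IV / A7 III pixel-count ratio (33 MP / 24 MP)
# is the same 1.375, so the PL5 speed-up roughly offsets the larger files.
print(33 / 24 == pl4_avg / pl5_avg)    # True
```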

Alec: For what it’s worth: my machine and graphics card are about 5 years old. I still got some gain.

Stenis, I am glad you experienced a performance boost going from PL4 to PL5. As I stated in an earlier post, my results were less dramatic than yours.

I repeated the tests on my two machines, which are very similar: i7-4790K @ 4.4 GHz, 24 GB of memory, SATA SSDs for the operating system etc. and HDDs for storage. The main difference is that one has a GTX 1050 2 GB graphics card (A) and the other a GTX 1050 Ti 4 GB graphics card (B).

The Ti card is similar in performance to your GTX 960, I believe; the GTX 960 sits just in front of the Ti cards in the Google spreadsheet column for processing D850 photos.

The versions of software installed differ: the PL5 trial is on both machines, but PL 4.3.1 is on B and 4.3.3 on A.

The tests were individual exports (repeated a number of times) and then a group of 11 photos, all from a Panasonic G9 (Olympus lens); the RW2 files are 20 megapixels (23 MB) in size.

Export on B (GTX 1050 Ti 4 GB) took 20 s, 21 s for PL5 and 23 s, 22 s for PL4 (v4.3.1).

Export on A (GTX 1050 2 GB) took 28 s, 29 s for PL5, but then 54 s, 31 s, 32 s for PL4 (v4.3.3). At this point I realised that the photos were on a SATA SSD, because I had been testing for any difference between SATA SSD and HDD (a small difference), but I still cannot account for the “high” times encountered.

Reverting back to HDD on A gave PL5 - 27 s, PL4 - 28 s.

So a very small difference between PL4 and PL5 (1 second) but a bigger difference between the two graphics cards (6 seconds).

For the group tests I exported 11 photos on A, which took 3 min 43 s (20.27 s/photo) for PL5 and 3 min 49 s (20.82 s/photo) for PL4 (v4.3.3). On B this took 3 min 9 s (17.18 s/photo) for PL5 and 3 min 13 s (17.55 s/photo) for PL4 (v4.3.1).
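For reference, the per-photo averages above follow directly from the batch times; a small sketch to reproduce them:

```python
# Reproduce the per-photo averages quoted above from the raw batch times.
# Times are as reported in the post; 11 photos per batch.

def per_photo(minutes: int, seconds: int, count: int = 11) -> float:
    """Average seconds per photo for a batch that took minutes:seconds."""
    return (minutes * 60 + seconds) / count

runs = {
    "A PL5": per_photo(3, 43),  # 223 s / 11
    "A PL4": per_photo(3, 49),  # 229 s / 11
    "B PL5": per_photo(3, 9),   # 189 s / 11
    "B PL4": per_photo(3, 13),  # 193 s / 11
}
for name, t in runs.items():
    print(f"{name}: {t:.2f} s/photo")
```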

So there is less than a second of difference between the two releases, but about 3 to 3.5 seconds per photo between the two machines, and better performance per photo when pushing many images through the machine in a batch. Is the difference down to the nature (type/scene) of the photos being taken? As I indicated in a previous post, DeepPrime helps me reduce noise in skies after my (possibly over-enthusiastic) application of ‘ClearView’, amongst other PL edits.


I feel the same as you… U Point is fun, but the level of extra control you gain over PL4 is modest… and to top it off, for me the upgrade, combined with my MacBook Air, has made processing with DeepPrime in both PL4 and PL5 so freaking arduous…


This last week I have been post-processing old slide positives of pretty poor quality that I had repro-photographed earlier. When I processed them I had to use all sorts of tricks to get them usable. It took double the time, and sometimes more, to process them compared with digitally-born images, and I wonder if that’s because I emulated Kodak T-Max 100 in PhotoLab in order to suppress the really terrible grain these ORWO images have, especially in skies. I bought the film in Syria in 1973 (Syria had ties to Russia and East Germany at that time), and some of the film had deteriorated because of improper, too-hot storage, I believe.

I took the images when I visited Petra in Jordan. You can look for yourselves: the image quality is far from today’s average digital image quality, but they gain some other, strange qualities instead. The images are very soft, with low dynamics. I was very close to throwing them away before I started to use PhotoLab intensively.

Petra - Den glömda staden - Fotosidan

Lucas, I was curious about this, but my experience on my machine is that the real test is not the speed to process one image, but the speed to process 100 images. I don’t believe the processing-time spreadsheet really captures that full-load (and potentially throttled-system) scenario.
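A generic timing harness along these lines can expose the difference between single-image and sustained batch performance. Note that `export_one` here is just a hypothetical placeholder for whatever export step you want to measure, not a real DxO API:

```python
import time

def export_one(path: str) -> None:
    """Placeholder for a single export; swap in the real work you want to time."""
    time.sleep(0.001)

def time_batch(paths: list[str]) -> float:
    """Average wall-clock seconds per image over a whole batch."""
    start = time.perf_counter()
    for p in paths:
        export_one(p)
    return (time.perf_counter() - start) / len(paths)

# Comparing a short and a long batch exposes sustained-load effects
# (thermal throttling etc.) that a single-image benchmark hides.
short = time_batch([f"img_{i}.rw2" for i in range(5)])
long_run = time_batch([f"img_{i}.rw2" for i in range(100)])
print(f"{short:.4f} s/img vs {long_run:.4f} s/img")
```

On real hardware the long-batch average creeping up over the short-batch average is the signature of throttling.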

Anyway, I’m running on an RTX3070 based laptop (Ryzen 5900HS 8 core CPU) and the performance is really good.

Stenis (@Stenis), I replied to an earlier post of yours in this topic and congratulated you on finding an improvement in DeepPrime processing time between PL4 and PL5, which was not present with my 1050 Ti and 1050 cards. I also came across your post above about the processing of your old images of Petra, and the fact that processing the photographs you took of them with your current A7 III took double the time of “normal” images taken with that camera.

With respect to the performance improvement, I am glad that you seemed to have gained an improvement but on reflection that leaves me confused! I am confused because the first post in this thread (topic) described performance from a GTX1060 card and referred to a post elsewhere involving a GTX1080 card which actually showed a slight decrease in performance!?

The GTX 960 you used is a generation before my 1050 Ti (and 1050) and faster than the 1050, but very slightly slower than the 1050 Ti, although in the Google DxO spreadsheet the 960 is just above the 1050 Ti!? My processors (i7-4790K) are a bit faster than yours, so I cannot understand where your performance improvement comes from!? In fact the PL4-versus-PL5 entry from @Savay for the GTX 970 shows only a very slight improvement (1 second for processing the test batch).

It is possible that Lucas (@Lucas) can help because he has stated that the performance improved with the RTX2000 cards onwards, i.e. with cards equipped with Tensor cores.

With respect to your photos of Petra, I am glad that you managed to salvage your old images using PL5 and FilmPack (for the Kodak T-Max 100 emulation). I ran a test with an image I took at a very high ISO last year (ISO 20,000 in good light, because I had left the camera with the wrong settings from an “experiment” the night before), with and without the emulation, and the times were essentially the same. I have attached the photo and PL5 DOP (in a zip file) with a Virtual Copy (for the Kodak emulation) so that you can test that image on your machine if you wish.

I would guess that the increased processing time is caused by DeepPrime having to process the noise from the original media alongside the noise from the A7III. If you would like to share a single image (+DOP) via the forum or via a direct message then I will run a test on my system and see what that shows up.

In the blog you describe the process more fully and indicate that ‘Bicubic’ may well have helped, implying that the images were resized on export. I have not applied other fixes to my tulip image as your blog suggests nor have I resized the output.

Personally I always export from PL at full size and then use FastStone Image Viewer to batch resize (I maintain a 1920 x 1443 library alongside the original images; those are now the only images on our NAS and provide faster access for tablets and smartphones, plus a much more portable library). FIV offers Bicubic, Lanczos 2 (sharper), Lanczos 3 (default) and other resizing algorithms, and can be left unattended to resize large numbers of images.

My personal choice for resizing has been Lanczos 2, and given that all my images prior to 2018 were JPGs, I generally refrained from applying the PL ‘Lens Sharpness’ correction, because that plus Lanczos 2 tended to produce “oversharp” images! I need to revisit my strategy with respect to RAW processing, e.g. use ‘Lens Sharpness’ but change to Lanczos 3 or Bicubic or …for the library images. I am sure other forum members have their own resizing favourites.
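For anyone scripting a similar reduced-size library, the dimension maths for fitting an image inside a 1920 x 1443 box while preserving aspect ratio is straightforward. This sketch only computes the target size; the actual resampling (Lanczos, Bicubic, etc.) is done by whatever viewer or tool you use:

```python
def fit_within(width: int, height: int,
               box_w: int = 1920, box_h: int = 1443) -> tuple[int, int]:
    """Largest size with the same aspect ratio that fits in the box; never upscales."""
    scale = min(box_w / width, box_h / height, 1.0)
    return round(width * scale), round(height * scale)

print(fit_within(5184, 3888))  # 4:3 landscape frame -> limited by width
print(fit_within(4000, 6000))  # portrait frame -> limited by height
```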

@MikeR and @Lucas, if you want some “boring” general landscape photos taken while wandering a golf course with my wife during the UK lockdown earlier this year, I should be able to provide 50, 100, 150 etc. RAW (20-megapixel MFT ORF) images from an Olympus EM1 MkII, showing landscape shots with sky, trees, wind turbines etc.; I will remove any images with people, dogs and so on. This would then require a new column or an additional sheet for anyone who wants to run long batch tests.

My issue with this would be actually getting the photos uploaded “somewhere” when I have an almost non-existent upload speed (though that could be done e.g. via Flickr and then “publishing” a Flickr album). This then poses the logistics issue of the downloads required by those (if any) wanting to take part etc.

However, it provides a consistent batch of images (others may have better ideas with respect to how representative such a group of images would be) to compare and contrast both the GPU and processor performance and additional elements like storing such images on NVME, versus SATA SSD versus HDD and other “nerdy” issues that might well be important for tuning current hardware and when selecting future upgrades!

PL5 Release 5.0.2.zip (21.3 MB)