PhotoLab and Adobe DNG files - archiving old formats

Hi Peter,

I had forgotten that XnView allows “companion” files (sidecars) to be associated with a raw “lead” file and then moved and deleted along with it.

I am using “NeoFinder”, which does not seem to have the same option, but I am able to write my own utility apps to clean up any .xmp / .dop orphans.
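A minimal Python sketch of that kind of orphan check, assuming two naming styles: an .xmp named after the raw's base name (IMG_0001.xmp) and a .dop named after the full raw name (IMG_0001.NEF.dop, which is how I understand DxO names them). `find_orphans` and `SIDECAR_EXTS` are hypothetical names; the function only lists candidates and deletes nothing.

```python
from pathlib import Path

# Sidecar types to check: standard XMP and DxO's .dop (assumed naming, see above).
SIDECAR_EXTS = {".xmp", ".dop"}

def find_orphans(folder: Path, raw_exts: set[str]) -> list[Path]:
    """Return sidecars in `folder` whose raw "lead" file no longer exists.

    `raw_exts` are lower-case extensions such as {".nef", ".orf"}.
    """
    files = [p for p in folder.iterdir() if p.is_file()]
    raws = [p for p in files if p.suffix.lower() in raw_exts]
    # Accept both IMG_0001.xmp (base name) and IMG_0001.NEF.dop (full name) styles.
    known = {p.name.lower() for p in raws} | {p.stem.lower() for p in raws}
    return sorted(
        p for p in files
        if p.suffix.lower() in SIDECAR_EXTS and p.stem.lower() not in known
    )
```

Something like `find_orphans(Path("2020_03"), {".nef", ".orf"})` would list the candidates; reviewing that list before deleting anything seems wise.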

One of the strengths of DxO PL3 is that it creates .dop files, which means I can avoid relying on a database to hold my metadata and corrections, with the added advantage that I can create my own tools to glue my DAM app to DxO.

One negative of PL2 and PL3 is that if an “Open With” command is issued on a raw file in a folder of thousands of images, PhotoLab adds all the images in that folder to its database, which takes an age.

The solution is to drag and drop the image into an album, or to use a tool that relies on the Lightroom -> DxO command line tool that DxO provides.

My present plan is to stick with original raw files plus .xmp and .dop files, and to store my images in folders based on month of capture. This should mean that there are never more than a couple of thousand images, even in a busy month, so I can allow PL3 to catalogue a complete month, or even use Capture One in session mode. Neither DxO nor Phase One is anywhere near producing a Digital Asset Management solution that comes close to the power and reliability of XnView, NeoFinder or the other stand-alone DAM applications.
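Assuming the capture dates have already been read (e.g. from Exif), a quick way to sanity-check the "couple of thousand per month" estimate is to count images per month folder. The YYYY_MM folder name and both function names are my assumptions, not a prescribed layout.

```python
from collections import Counter
from datetime import datetime

def month_folder(capture: datetime) -> str:
    # Hypothetical folder name for one capture month, e.g. "2020_03".
    return f"{capture:%Y_%m}"

def busiest_months(captures: list[datetime], top: int = 3) -> list[tuple[str, int]]:
    """Number of images per month folder, largest first."""
    return Counter(month_folder(c) for c in captures).most_common(top)
```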

As for raws, the majority are tiff-based files, so I think developers could be writing metadata to them.
However, I accept that it is complex and difficult to test, especially when compared to editing a simple XML file. And there are advantages to having .xmp and .dop files: they are tiny in comparison to the raw files they refer to, meaning that changes to metadata and edits get backed up very quickly, and any write errors do not damage the original raw image file.

best wishes
Simon


Hmm, I am ABSOLUTELY not an expert :slight_smile:, not even at first base (I know some basic things, that's it), but isn't the difference between a tiff and a raw file the demosaicing? So is it possible that cameras which deliver tiff-based raw files process the sensor data, including the demosaicing, before storing it?
In that case, does PRIME still work with those files?

Neither am I anything like an expert, although I recently wrote an application that extracts the JPEG preview from the raw files I have on my computer (crw, nef, orf, rw2).

While a tiff and a raw look like single files they are actually a series of blocks of data. The first few bytes of either type include a pointer to a “table” of data. This table of data is a list of records, each of a fixed length (12 bytes), each record includes a numeric ID known as a tag, an indication of how the binary data should be read e.g. Ascii text, where the data is in the file and how long it is. These tables are known in tiff speak as “Image File Directories” or IFDs.
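The header-plus-table layout described above can be sketched in Python with the standard `struct` module. This assumes the classic TIFF layout: 2 bytes of byte order (II or MM), a 2-byte identifier, a 4-byte offset to the first IFD, then per IFD a 2-byte record count, the 12-byte records (tag, type, count, value-or-offset) and a 4-byte pointer to the next IFD. `read_ifds` is a hypothetical helper; it does not follow sub-IFDs or maker notes, where raw formats often keep much of their data.

```python
import struct

def read_ifds(data: bytes) -> list[list[tuple[int, int, int, int]]]:
    """Walk the top-level IFD chain of a TIFF-structured file.

    Returns one list per IFD of (tag, type, count, value_or_offset) records.
    value_or_offset is the record's last 4 bytes read as an unsigned 32-bit
    number: an offset for large data, or the value itself if it fits (a single
    short value in a big-endian "MM" file then needs shifting right by 16 bits).
    """
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])   # Intel or Motorola order
    if endian is None:
        raise ValueError("not a TIFF-structured file")
    # Bytes 2-3: identifier (42 in a real tiff); bytes 4-7: offset of IFD0.
    _identifier, offset = struct.unpack(endian + "HI", data[2:8])
    ifds, seen = [], set()
    while offset and offset not in seen:   # an offset of 0 ends the chain
        seen.add(offset)
        (count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        records = []
        for i in range(count):
            start = offset + 2 + 12 * i    # every record is 12 bytes long
            records.append(struct.unpack(endian + "HHII", data[start:start + 12]))
        ifds.append(records)
        end = offset + 2 + 12 * count      # 4-byte pointer to the next IFD
        (offset,) = struct.unpack(endian + "I", data[end:end + 4])
    return ifds
```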

So to find the jpeg preview I first had to find the start of the table and then look for the tag that is used to identify jpeg data. Next the code reads the location and size information and then extracts the block of bytes from the file. In this example this block of data is a jpeg image.
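Finding the preview can be sketched the same way. This assumes the preview is referenced by the standard TIFF tags 0x0201 (JPEGInterchangeFormat, the offset) and 0x0202 (JPEGInterchangeFormatLength, the size) in one of the top-level IFDs; camera makers may store previews elsewhere (sub-IFDs, maker notes), so a real extractor needs per-format knowledge. `extract_jpegs` is an illustrative name.

```python
import struct

JPEG_OFFSET_TAG = 0x0201   # JPEGInterchangeFormat: where the JPEG starts
JPEG_LENGTH_TAG = 0x0202   # JPEGInterchangeFormatLength: how many bytes it has

def extract_jpegs(data: bytes) -> list[bytes]:
    """Return the JPEG blocks referenced by the top-level IFD chain of `data`."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("not a TIFF-structured file")
    (offset,) = struct.unpack(endian + "I", data[4:8])
    jpegs, seen = [], set()
    while offset and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        values = {}
        for i in range(count):
            start = offset + 2 + 12 * i
            tag, _type, _n, value = struct.unpack(endian + "HHII", data[start:start + 12])
            values[tag] = value
        if JPEG_OFFSET_TAG in values and JPEG_LENGTH_TAG in values:
            pos, length = values[JPEG_OFFSET_TAG], values[JPEG_LENGTH_TAG]
            block = data[pos:pos + length]
            if block.startswith(b"\xff\xd8"):      # JPEG start-of-image marker
                jpegs.append(block)
        end = offset + 2 + 12 * count
        (offset,) = struct.unpack(endian + "I", data[end:end + 4])
    return jpegs
```

Usage might look like `extract_jpegs(Path("DSC_0001.NEF").read_bytes())`, though for some formats the largest preview may sit in a sub-IFD this sketch does not follow.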

The latest update to the Tiff specification is dated 1992 which is well before the invention / adoption of digital raw files so the specification is unlikely to include information on how raw data should be written. So if raw data were stored in a tiff structure it would no longer be a tiff because it would not meet the spec.

So most of the camera manufacturers took the tiff structure, changed its internal ID, and saved their raw data and a collection of their own private metadata within a tiff-like structure. This includes their private data along with jpeg previews, IPTC metadata and Exif data. So reading a raw file (or a tiff) is a little like being given a book that starts with an index. Some of the pages referred to by the index are in a foreign language while others are in English. I can read the index and the pages that are in English, but while I can see the foreign-language pages, I don't understand them. If I add extra pages midway through the book, I have to update the index page because the page numbers will have changed. Both raw and tiff files may include multiple indexes (IFDs).
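The "changed its internal ID" point shows up in the first few bytes of a file. The signatures below are the ones I believe ExifTool and similar readers use for the formats mentioned in this thread; treat them as assumptions to verify against real files. NEF and DNG keep the standard tiff identifier (42), so they can only be told apart by reading the IFDs (a DNG carries a DNGVersion tag). `sniff_container` is a hypothetical name.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container from a file's first 16 bytes (a heuristic sketch)."""
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        # Standard tiff identifier 42. Canon CR2 adds "CR" at byte 8;
        # NEF, DNG and plain tiff need their IFD contents to tell apart.
        if header[8:10] == b"CR":
            return "Canon CR2 (TIFF-based)"
        return "TIFF-structured (TIFF, NEF, DNG, ...)"
    if header[:4] in (b"IIRO", b"IIRS", b"MMOR"):
        return "Olympus ORF (TIFF-like, changed identifier)"
    if header[:4] == b"IIU\x00":
        return "Panasonic RW2 (TIFF-like, changed identifier)"
    if header[:2] == b"II" and header[6:14] == b"HEAPCCDR":
        return "Canon CRW (CIFF, not TIFF-based)"
    return "unknown"
```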

Getting back to raw files. The tiff structure could be used to store any binary data, such as a spreadsheet or word processor file. However, if this type of data is included, then it's not a tiff, just a data structure. So you are correct in suggesting that tiffs only include demosaiced data; it's just that any tiff-structured file holding undemosaiced sensor data is called a raw file. Also, a tiff file may include any number of images, as may a raw file (my raw files seem to hold two jpegs plus the raw data).

Lastly, why not change the tiff format to include raw data? Well, Adobe owns tiff, so they did, and it's called DNG (I'm probably being simplistic here!).

From a programmer's point of view, it is far simpler to test code against xmp, tiff, dng and, say, dop files than against three hundred odd raw formats. So we are stuck with xmp unless we adopt DNG.

Sorry for yet another long post,

best wishes
Simon

That sounds like an expert to me! :grinning:
About the text: good explanation, I can follow you but not swim on my own, so to speak.
I can see/understand that raw files are just blocks of data placed in an IFD and controlled by numeric IDs. And I know (read somewhere) that raw uses 12 bits for storing the exposure data and the others for something else (I forgot what, probably a redundancy check).

(And you are a clever programmer to decode those files for your own use.)

Well, I know there are "linear" DNGs, which are tiffs (demosaiced) with a floating WB (every raw developing app can produce them), and Adobe's DNG is maybe a real raw in a DNG container:

Quote:
Because DNG is a file format based on the TIFF format, it can not only be used to store RAW data, but also RGB data. So you can have a DNG file that is a RAW file, but you can also have a DNG file that is not a RAW file but a so-called “linear” RGB file. That is quite confusing, certainly because some RAW converters use that option. What you actually get is a TIFF file in a DNG envelope.
(end of quote)
Some? Most DNG export in raw converters is linear. So that's why I hesitate to say Adobe's DNG Converter produces fully raw DNGs.

But then again, who am I to claim this knowledge?

That is true. Otherwise you need an independent raw-to-DNG converter with DAM functionality (adding tags and such) which can read almost all types of raw files. I believe Adobe Bridge can be this, if it can add tagging metadata to the DNG.
(Help text: Adobe DNG Converter)
For my use, I hope that DxO PL evolves its DAM functionality and its DNG support:
reading real DNGs (if I encounter raw files which aren't in its list) and linear DNGs, and writing linear DNGs which you can bring back into DxO to create semi-finished products.

Better a long, clear text than a short one that needs extra explanation :slight_smile:
Regards
Peter

Whatever the ifs and possibilities might be, the question boils down to this i.m.o.

  1. Trust Adobe and convert
  2. Don’t trust Adobe and keep your originals

As with all either-or choices, this choice brings in risks that we probably cannot fathom satisfactorily.

To get the best of both worlds, keep the originals and convert to dng too. Bringing edit history, keywords etc. into these files depends on what converter we use. Being able to read meta and development data and interpreting them correctly can limit the selection of apps though. DPL might be a wallflower in this dance, I’m afraid.


No matter how good option 1 is it is an additional processing stage that can go wrong or introduce errors. Whereas option 2 means that it is always possible to start processing from the beginning.

The problem is really one of terminology: a tiff file conforms to both the structure rules and the data types laid out in the tiff specification. A raw file follows the structure rules but not the data types. A dng is an extension of tiff that allows raw data and requires certain metadata, but it also allows demosaiced data, as do raw files, because the programmer can place anything they want into them.

What I find perplexing is that DxO PL is unable to open dngs based on raw files from cameras it has no knowledge of and also the reports that Capture One makes a better conversion from camera raws than it does from dngs based on the same raw files. These two issues seem to contradict the major “selling” point of dng files which is future proofing the raw data.

I think that I am going to ignore dng for the time being and possibly look at using Exiftool to read xmp keyword data and write it into the original raws. I will have backups before I try this!
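If it helps, the ExifTool route could look like the sketch below, which only builds (and optionally runs) a command line. `-tagsFromFile` and the `-DSTTAG<SRCTAG` copy syntax are documented ExifTool options; the choice to copy XMP-dc:Subject and mirror it into IPTC:Keywords, and the assumption that the sidecar is IMG_0001.xmp next to IMG_0001.NEF, are mine. By default ExifTool keeps the untouched file as IMG_0001.NEF_original, which fits the "backups first" plan; testing on copies from each camera model still seems prudent.

```python
import subprocess
from pathlib import Path

def keyword_copy_command(raw: Path) -> list[str]:
    """ExifTool command copying keywords from raw's .xmp sidecar into the raw.

    Assumes the sidecar is the raw's base name + ".xmp" (IMG_0001.xmp).
    """
    sidecar = raw.with_suffix(".xmp")
    return [
        "exiftool",
        "-tagsFromFile", str(sidecar),
        "-XMP-dc:Subject",                # copy the XMP keyword list as-is
        "-IPTC:Keywords<XMP-dc:Subject",  # and mirror it into IPTC keywords
        str(raw),
    ]

def copy_keywords(raw: Path) -> None:
    """Run the command; needs ExifTool installed and on the PATH."""
    subprocess.run(keyword_copy_command(raw), check=True)
```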

best wishes
Simon


While camera manufacturers add their own proprietary ‘makernotes’, there are three main metadata interchange standards in use by the photography industry - IPTC, EXIF and XMP. And even though different camera manufacturers have their own RAW formats, to the best of my knowledge they all follow very similar principles, so the risk of anything going wrong is very, very low, provided you stick to well regarded software.

IMHO the very much bigger risk, long term, is that the sidecar files become misplaced or deleted - just like old printed photographs that have become separated from the albums that once held the hand-written metadata to explain them, compared to those that have the metadata safely written on the reverse of the actual photo. Probably not a problem while you’re in charge of them, but perhaps when your collection is eventually passed on to someone else, especially if they aren’t a photographer and have no idea what a sidecar file does…

A very good analogy!

Exactly! My “spring clean” of my image collection has revealed horrors, including orphan .xmp files where the image had been deleted but the .xmp missed and, worse, two or more .xmp files in different folders referring to a single image and containing different IPTC data. Also, all those .xmp files increase the workload when trying to sort things out. The jump from seven and a half thousand images in a folder to fifteen thousand files in a folder is a significant burden when those extra files have to be checked.
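The second case (several .xmp files with the same base name in different folders) can be flagged by grouping sidecars by name. This is a hypothetical helper; because cameras reuse file names (see the next paragraph), a match only marks a candidate and does not prove the sidecars describe the same image.

```python
from collections import defaultdict
from pathlib import Path

def duplicate_sidecars(root: Path) -> dict[str, list[Path]]:
    """Map each .xmp base name found in more than one folder to its paths."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file() and p.suffix.lower() == ".xmp":
            groups[p.stem.lower()].append(p)
    # Keep only names that occur in at least two different folders.
    return {stem: sorted(paths) for stem, paths in groups.items()
            if len({q.parent for q in paths}) > 1}
```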

Other problems included some early digital images that had lost their Exif data, and the same camera-generated file name being used several times for different images.

My simple solution to the xmp keyword issue is to add principal keywords to the file name. In the longer term I shall also investigate using a tool to copy keywords from the xmp files into the raw file (or perhaps a dng copy). This also means that the xmps have to be renamed as well if I decide to keep them. A further problem is the size of my collection: some 80,000 images and 1 TB means that any operation applied to every image takes many, many hours.

Spot on. I am now thinking about how my collection of family snaps can be passed to the next generation. It's quite ironic that there is no issue with images taken before 1999, yet we have to have these discussions about modern image formats. Progress!

best wishes
Simon


Hi,

This page may be of interest: https://exiftool.org/idiosyncracies.html#raw

Simon

I was trying to write a program in Pascal to add keywords to my nef files using ExifTool. I read the keywords from about 65,000 images and stored them in a text file on disk. It took me about 22 minutes. It was my first try. I stopped since I discovered that I could use ViewNX 2 with my camera, a D750. I think it's the latest model that is supported. I will continue with my program, but slowly. :blush:

Correct me if I’m wrong.
The raw contains the sensor data as 12- or 14-bit digital values (Nikon). It is written to disk as 12- or 14-bit values. To keep it simple, forget any compression.
A tiff file contains RGB data at 8- or 16-bit depth, resulting in 24- or 48-bit pixels.
A dng file contains…?
A linear RGB file is an image converted from the sensor data but with no color space corrections?
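Those bit depths translate into rough file-size arithmetic. This sketch ignores compression, headers and embedded previews, as suggested; the 6016 x 4016 photosite count used below is a hypothetical example of a roughly 24 MP sensor.

```python
def raw_bytes(width: int, height: int, bits: int) -> int:
    """Uncompressed mosaic data: one value of `bits` bits per photosite."""
    return width * height * bits // 8

def rgb_tiff_bytes(width: int, height: int, bits_per_channel: int) -> int:
    """Uncompressed RGB pixels: three channels per pixel (24 or 48 bits in total)."""
    return width * height * 3 * bits_per_channel // 8
```

For 6016 x 4016 this gives about 42 MB of 14-bit mosaic data against about 145 MB for a 16-bit RGB tiff, i.e. 48/14, roughly 3.4 times as much.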

George

Hi George, I think that you are correct: the tiff file specification details how an image is to be stored as a series of bytes that represent colour in some way. Whatever the spec details, it does not include raw data from camera sensors. However, the structure of a tiff can be thought of a little like a mini file system, meaning the tiff structure can be used to store any data. Of course, as soon as other data is stored inside the structure, the file is no longer a proper tiff file. This means that if raw image data is stored within a tiff structure, it has to be named something else, e.g. .nef. In the case of .nef it will be something like 14-bit values of RGGBRGGBRGGB and so on, which have to be processed to generate an image.
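Strictly, an RGGB sensor alternates R and G along one row and G and B along the next, so the repeating unit is a 2x2 block of RG over GB. A small sketch of which colour each photosite records, assuming an RGGB layout (real cameras may start the pattern at a different corner, and Nikon's 14-bit values may be stored packed or compressed). `cfa_colour` is a hypothetical name.

```python
def cfa_colour(row: int, col: int, pattern: str = "RGGB") -> str:
    """Colour recorded by the photosite at (row, col) for a 2x2 Bayer pattern.

    `pattern` lists the top-left 2x2 block row by row: "RGGB" means
    R G on even rows and G B on odd rows.
    """
    return pattern[2 * (row % 2) + (col % 2)]
```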

A dng is an extension of the tiff format so may contain any or indeed all of the formats.

No idea! I believe that the linear refers to the fact that the bytes are read left to right in a series of pixel values RGBA RGBA RGBA… So a linear dng is a file that is very similar to a tiff’s image content just with a modified or extended internal structure. However, please note that I have no real idea about how any images are stored in an array of bytes inside an image file. Rather I have approached raw, tiff and dng files much like a book with an index where I can read and understand the index and some of pages. If I add or subtract from a file then the index has to be updated otherwise the part that points at the raw data will point to the wrong location.

best wishes
Simon

Raw file formats are popular in digital photography workflows because they offer greater creative control. However, cameras use many different raw formats, the specifications for which are not publicly available. This means that not every raw file can be read by every software application.

A linear rgb file is an image converted out of the sensor data but with no color space corrections?

No idea! I believe that the linear refers to the fact that the bytes are read left to right in a series of pixel values RGBA RGBA RGBA… So a linear dng is a file that is very similar to a tiff’s image content just with a modified or extended internal structure.

A linear DNG means that the file has been demosaiced and an RGB image produced from the raw data. This is the same as a tif file and is not a raw (mosaiced) file…
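A linear DNG holds the output of a demosaicing step: three values per pixel instead of one per photosite. The crudest possible version of that step, shown here only to illustrate the idea, merges each RGGB 2x2 block into one RGB pixel (halving the resolution). Real converters interpolate far more cleverly, and white balance and colour conversion are separate steps not shown; `superpixel_demosaic` is a hypothetical name.

```python
def superpixel_demosaic(mosaic: list[list[int]]) -> list[list[tuple[float, float, float]]]:
    """Turn an RGGB mosaic into RGB pixels by merging each 2x2 block."""
    rgb = []
    for r in range(0, len(mosaic) - 1, 2):
        row = []
        for c in range(0, len(mosaic[r]) - 1, 2):
            red = float(mosaic[r][c])                           # top-left R
            green = (mosaic[r][c + 1] + mosaic[r + 1][c]) / 2   # average of two G
            blue = float(mosaic[r + 1][c + 1])                  # bottom-right B
            row.append((red, green, blue))
        rgb.append(row)
    return rgb
```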
Ian

Yes, that was what I was suggesting. A dng is a published extension of the tiff file format and is designed to store raw (mosaiced) camera data in a standard published way. It is also able to store demosaiced data i.e. final (or near final) image.

The change from camera raw to dng relies on an Adobe application converting the raw data and maker notes from the camera manufacturer's closed (unpublished) format to the Adobe-designed .dng format. I say unpublished, but some camera manufacturers admit to sharing their file design with some unidentified software houses.

The original purpose of this thread was to explore the options for storing and reading old raw formats. The thread expanded to include how IPTC data such as keywords may be saved and also some of the myths that surround raw file formats. My conclusions, as the OP, are as follows.

  1. Should older raw formats be converted to .dng for future proofing?

It can't hurt, but given that storage is cheap, keep the original raw file. This may be as a separate file or embedded within the .dng wrapper format.

  2. What are the advantages of converting from raw to dng?

The dng file “may” be compatible with a wider range of processing applications. It will be compatible with Adobe products. IPTC and processing data can be stored within the dng structure, meaning that there is no need for sidecar files. (Note: not many apps seem able to write metadata to dng.)

  3. Are there disadvantages when converting from raw to dng?

Yes. The conversion takes processing time and the new file requires storage. Some thought needs to be given to how to manage the two files if both are kept. Many processing and DAM applications fail to use the ability to store metadata within the dng format. The conversion relies on another application working without errors. A minor change, for example adding a keyword, will require megabytes to be rewritten to disk and probably to backup.

  4. Are raw files ever modified?

To say that a raw file is never modified is an urban myth. The raw data is not changed, but the metadata may be. It is possible to write IPTC data directly into most raw file formats. However, if using a third-party tool, it is advisable to conduct tests on each camera model that you use. Photo Mechanic and ExifTool are examples of applications that can write IPTC data into a raw file.

  5. What about sidecar files?

The principal problem with sidecar files is that your DAM system must keep them in the same folder as the raw files they refer to. If you keep tiff or jpeg “final versions” of your images, be cautious about file names. The specification for xmp sidecars requires that the xmp file share the same name as the raw file, less the extension. If your tiff, dng or jpeg also shares the same name as the raw file and your software writes all metadata to xmp, you may overwrite the xmp that contains the processing instructions for the raw file (not a problem with DxO, as these are held in a .dop file). A single xmp file may refer to more than one image file, but I am not sure many applications are capable of processing such xmp files. Remember that IPTC data should be saved inside jpeg/dng/tiff files.
However, the use of xmp/dop sidecar files has the advantage of speeding up file saves and backups, as only a small text file is written instead of a large dng or raw file. This is significant when adding, say, keywords to thousands of files and backing up the changes to slow media (e.g. the cloud).

  6. What is your workflow?

It is evolving! I have decided to stay with original raw file formats, although I may convert my old Canon and Nikon raw files to dng at some point in the future. While I have experimented with writing keywords directly into the raw files, I have decided to stay with xmp files. However, to protect against lost xmp files, I now write my keywords into the file names. I am working on a utility app that keeps the names of all the files that refer to a single raw file synchronised. I have also spent days changing how my files are named and stored. All image files now conform to the following naming convention: Year_Month_Day_CaptureTime_CameraFileName_Keyword1_KW2_KWn.Ext
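The sidecar naming caveat in the answer about sidecar files can be checked mechanically: list the image files that would all claim the same BASENAME.xmp. A minimal sketch assuming the base-name-plus-.xmp rule; `sidecar_collisions` is a hypothetical name.

```python
from collections import defaultdict

def sidecar_collisions(filenames: list[str]) -> dict[str, list[str]]:
    """Find image files in one folder that would share the same .xmp sidecar."""
    by_sidecar: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        base, dot, ext = name.rpartition(".")
        if not dot or ext.lower() in {"xmp", "dop"}:
            continue  # skip extensionless files and the sidecars themselves
        by_sidecar[base + ".xmp"].append(name)
    return {xmp: sorted(names) for xmp, names in by_sidecar.items() if len(names) > 1}
```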

I use underscores as opposed to dashes because Apple Spotlight treats the dash as an ignore character in searches, i.e. if you try to search for -Keyword it will ignore it. The simplest solution was to use the underscore character. I only use basic characters in volume/folder/file names to ensure the widest compatibility across operating systems. For example, spaces and brackets can be a nightmare on servers.
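The naming convention can be sketched as a small function. Assumptions: CaptureTime is written as HHMMSS, the camera file name is kept verbatim, and each keyword is reduced to ASCII letters and digits joined by underscores (so accented letters are dropped; a real tool might transliterate instead). This matches the “Cat and Dog” to “_Cat_and_Dog” example given later in the thread, but the code is illustrative, not Simon's utility.

```python
import re
from datetime import datetime

def keyword_part(keyword: str) -> str:
    """'Cat and Dog' -> 'Cat_and_Dog': each run of non-ASCII-alphanumerics becomes '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", keyword).strip("_")

def build_name(capture: datetime, camera_name: str, keywords: list[str], ext: str) -> str:
    """Year_Month_Day_CaptureTime_CameraFileName_Keyword1_KW2_KWn.Ext"""
    parts = [f"{capture:%Y_%m_%d_%H%M%S}", camera_name]
    parts += [p for p in (keyword_part(k) for k in keywords) if p]  # drop empty keywords
    return "_".join(parts) + "." + ext
```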

Images are stored in folders by month of capture. My aim is to be flexible. I use a DAM application, but if all else fails I can use Apple’s Finder with Spotlight to locate images.

best wishes
Simon56

Good logical thinking and explanation. There is one thing which can break your chain:

  • long file names can run into the limits of network transfers or of what older applications can read.

One thing I did not investigate is the possibility of binding raw files and xmps together for actions in Windows 10 Explorer. In XnView MP, among others, you can bind files to a “master”, so moving, deleting and renaming also affect the xmp. But if you are in Explorer or in another application, that binding is unlikely to be respected.
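Outside a DAM that binds companions, the same effect can be scripted. This sketch renames a raw file and any .xmp/.dop sidecars found next to it, in either IMG_1.xmp or IMG_1.NEF.dop style, and refuses to overwrite existing files; the helper names and the two naming styles are assumptions.

```python
from pathlib import Path

SIDECAR_SUFFIXES = (".xmp", ".dop")

def companions(raw: Path) -> list[Path]:
    """Sidecars that exist next to `raw`, in IMG_1.xmp or IMG_1.NEF.dop style."""
    candidates = [raw.with_suffix(s) for s in SIDECAR_SUFFIXES]
    candidates += [raw.with_name(raw.name + s) for s in SIDECAR_SUFFIXES]
    return [p for p in candidates if p.exists()]

def rename_with_companions(raw: Path, new_stem: str) -> Path:
    """Rename a raw file and its sidecars together; returns the new raw path."""
    moves = []
    for side in companions(raw):
        tail = side.name[len(raw.stem):]          # ".xmp" or ".NEF.dop"
        moves.append((side, side.with_name(new_stem + tail)))
    moves.append((raw, raw.with_name(new_stem + raw.suffix)))
    # Refuse to clobber anything before touching the first file.
    for _, target in moves:
        if target.exists():
            raise FileExistsError(target)
    for source, target in moves:
        source.rename(target)
    return moves[-1][1]
```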

In some ways there is only so much you can do. I have run into the problem you describe, but it was some time ago. In that case file names were truncated to the DOS-style 8.3 form.

I use NeoFinder as my DAM and it does not have the ability to nominate sidecar files to be treated the same way as the raw files they describe but it is a feature that may be added soon.

Isn’t that too much info in a name? What if you want to add a keyword later? Can you use keywords consisting of more than just one word?
I store my files in directories by date, with a hint of the subjects in the directory name. Files are named GW_YYYYMMDD-number, GW being my initials. Keywords are stored in the IPTC data of the files. A directory may contain a jpg subdirectory containing jpgs from that directory. A must, since I can’t use Capture NX2 anymore.
Any image browser can deal with that.
I’ve a NAS installed and, since I mostly work from two PCs, I synchronize them with SyncBackFree. After two sync runs, both PCs and the NAS contain exactly the same files. Not perfect, but enough for me.

George

Rather than all that confusion in the file name, why not use tags, either in the macOS Finder or in Windows?

Hi, the first thing is that you should use whatever system works for you. In answering your questions, I should mention that these file names are my “belt and braces” solution. If my DAM software becomes unavailable, I can use Apple’s Finder and Spotlight to locate images while I seek a new DAM application. Has this ever happened? Yes, three times so far: iView MediaPro, Apple Aperture, and Adobe Lightroom going rental.

Is that too much info? Some of my images have been stripped of EXIF data due to a combination of unreliable software and finger trouble, so storing the capture date and time is useful and means that the Finder can sort the images in chronological order. I keep the original file name to ensure that the names are unique (remember, cameras may capture many images every second) and also to enable me to identify duplicates.

What if you want to add a keyword later? I am using standard XMP keywording as my prime method of using keywords. If I want to add or remove keywords from the file names I just run my application to synchronise the XMP keywords to the words in the file name. It runs at a thousand files per minute.
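Reading the keywords back out of an XMP sidecar can be done with Python's standard XML parser. This sketch assumes the keywords live in the standard dc:subject bag (an rdf:Bag of rdf:li items), which is where XMP keywords are normally written; hierarchical keyword fields that some applications add are ignored. It is not Simon's application, just an illustration of the reading step.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def xmp_keywords(xmp_text: str) -> list[str]:
    """Return the dc:subject keywords stored in an XMP sidecar's text."""
    root = ET.fromstring(xmp_text)
    return [li.text or "" for li in root.iterfind(".//dc:subject/rdf:Bag/rdf:li", NS)]
```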

Can you use keywords consisting of more than just one word? Yes. The XMP keyword “Cat and Dog” will be converted to “_Cat_and_Dog”, as I do not like spaces in any file names.

First, I have found that adding the keywords has added clarity rather than caused confusion, but that may just be me. The principal reason for not using tags is that tags can be lost from files. For example, on a Mac, commands to reset the Spotlight database can remove both tags and comments from the files.

If you look at the screen shot above I think you can gain a great deal of information about the images with nothing more complicated than the list file command. This is true across almost all operating systems (bound to be an exception) as they all can list file names even if they can not process them. Look at the last image file and take a wild guess when and where it was taken and what the subject was.
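Going the other way, a file listing can be parsed back into fields. The pattern below is a hypothetical one for the convention in this thread, assuming an HHMMSS capture time and a camera file name made of letters, an optional underscore and digits (e.g. DSC_0042, P1010001). Multi-word keywords come back as separate words, because Cat_and_Dog cannot be told apart from three keywords.

```python
import re
from datetime import datetime

# Hypothetical pattern: date and time, camera name, optional words, extension.
NAME_RE = re.compile(
    r"^(?P<stamp>\d{4}_\d{2}_\d{2}_\d{6})"
    r"_(?P<camera>[A-Za-z]+_?\d+)"
    r"(?P<words>(?:_[^.]+)?)"
    r"\.(?P<ext>\w+)$"
)

def parse_name(name: str) -> dict:
    """Split a file name back into capture time, camera name, words and extension."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not in the expected form: {name}")
    return {
        "captured": datetime.strptime(m["stamp"], "%Y_%m_%d_%H%M%S"),
        "camera_name": m["camera"],
        "words": [w for w in m["words"].split("_") if w],
        "ext": m["ext"],
    }
```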

best wishes
Simon
I’ll post details of the images later.