PL5 Adds Legacy (i.e., obsolete) IPTC Metadata

When you look at other DAMs you see the range they cover, which will be the range PL users will be asking for. It will be a never-ending cycle of demands, as it has now become, I suspect, at the cost of the core program.

Hello!

@jch2103 the issue is claimed as fixed but to have a clear test of it I need your raw+xmp. Could you, please, provide them for me via upload.dxo.com?

Thank you
Regards,
Svetlana G.

Thank you for checking back.

Better yet, I’ll kill two birds with one stone. Here’s a link to a raw and output image with .dop and .xmp sidecars (on Microsoft OneDrive).
I set this up for @artmax for a time zone issue.

The ExifTool dump for the output JPG suggests that DxO has indeed cleaned up most of the Legacy IPTC tags, but that IPTC Application Record tags for By-line and Copyright Notice are still being created in the output image (but not present in the NEF).

Thank you!

Testing the issue.

Update: I confirm, the issue has been fixed, here is the output after the fix DSC_2725_DxO.zip (6.5 MB)

The fix will be delivered with the next release.

Regards,
Svetlana G.

Excellent! Thanks, Svetlana.

John

Additional comments (apologies - I didn’t take a detailed look at the updated image you posted until today):

  • Yes, ‘old’ IPTC data is gone. Thank you!
  • There’s still an issue with exported hierarchical keywords: the top level of the hierarchical keyword is duplicated in the export file (i.e., an extra ‘National Park’).
  • The ‘00:00’ time zone bug in Original Date/Time is still present in this exported image.

Hopefully these bugs will also be fixed in the next release.

Good morning!

Yes, the last two are still in a development stage.

Regards,
Svetlana G.

Hi John

I just downloaded your samples - a couple of questions to sate my curiosity…

The original NEF file contains both the exif:imageDescription and exif:userComment tags. I take it these are coming from the Image Comment menu item on the camera? My D850 only writes the exif:userComment tag.

I notice that the dc:description tag in the XMP file contains the value from that/those tag(s)? This is something I didn’t think of for my keywording/etc app but it looks like more cameras are now putting more metadata in the EXIF. Something, to my mind, that both DxO and other DAM apps are going to have to take account of. This also means that, if apps don’t allow writing to RAW files, we are going to find more and more synchronisation errors when moving files from one DAM to another.


The XMP file was apparently authored through ExifTool 12.39, which is the very latest version. What did you use to create the XMP file: an app that wraps it or the command line?


The lr:hierarchicalSubject tag contains a non-hierarchical, or at least, single level keyword…

  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>National Park|Chaco Culture National Historical Park</rdf:li>
      <rdf:li>Vacation</rdf:li>
      <rdf:li>National Park</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

From what I have read from the MWG guidelines, this is superfluous, so it would be interesting to see what put it there. Not all apps do this.
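For anyone who wants to check their own sidecars for the same thing, here is a minimal Python sketch that lists the single-level entries in lr:hierarchicalSubject. The wrapper elements around the fragment and the helper names (`read_bag`, `single_level_entries`) are assumptions made so the sample parses on its own; a real .xmp file contains far more.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the prefixes used in the fragment above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Hypothetical minimal sidecar: just enough wrapper to make the fragment parse.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>National Park|Chaco Culture National Historical Park</rdf:li>
     <rdf:li>Vacation</rdf:li>
     <rdf:li>National Park</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_bag(xmp_text, tag):
    """Return the rdf:li values under a prefixed tag such as 'lr:hierarchicalSubject'."""
    root = ET.fromstring(xmp_text)
    items = root.findall(f".//{tag}/rdf:Bag/rdf:li", NS)
    return [li.text for li in items]


def single_level_entries(entries):
    """Entries without a '|' separator, i.e. keywords with no hierarchy."""
    return [e for e in entries if "|" not in e]


entries = read_bag(SAMPLE_XMP, "lr:hierarchicalSubject")
print(single_level_entries(entries))
```

To run it against a real file, read the sidecar's text and pass it to `read_bag` instead of `SAMPLE_XMP`.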


Finally, where did all the photoshop: and tiff: stuff come from? Lightroom by any chance?

@Joanna
Excellent questions. I’m likely tied up today on other things, so probably won’t be able to answer until tomorrow. In the meantime, you may find this topic in the IMatch Help system relevant/interesting: IMatch Help System
Also of interest: IMatch Help System You’re certainly not a beginner, but this link discusses how IMatch handles metadata internally.

As I said above, good questions!

By way of background, I use IMatch as my DAM; IMatch in turn uses ExifTool to manage metadata. IMatch creates an XMP sidecar for files, as described in IMatch Help System (again, this link is just to explain how IMatch processes metadata). Internally, they use the ExifTool .args files to copy/translate tags from one group/tag to another group/tag. These .args files are part of the full ExifTool installation. One example from exiftool Application Documentation is

exiftool -@ iptc2xmp.args -iptc:all= a.jpg

Translate IPTC information to XMP with appropriate tag name conversions, and delete the original IPTC information from an image. This example uses iptc2xmp.args, which is a file included with the ExifTool distribution that contains the required arguments to convert IPTC information to XMP format. Also included with the distribution are xmp2iptc.args (which performs the inverse conversion) and a few more .args files for other conversions between EXIF, IPTC and XMP.

IMatch uses the full set of ExifTool .args files.

This is interesting. My Z6 includes an option for storing a user comment in the image file, but I don’t use it. So in this case, the information in the dc:description tag in the .XMP file was created via IMatch/ExifTool (using those .args files) when I used IMatch to add the description tag XMP::dc\description\Description\0. But I don’t find exif:imageDescription in the NEF file, using Notepad++.

Yes, as noted above I was using IMatch/ExifTool (12.30) to create the XMP file.

I’m not sure what happened here. PL5 does create superfluous top-level keywords (‘National Park’) in the output JPG file, which I’ve reported as a bug; a fix is ‘in a development stage’ PL5 Adds Legacy (i.e., obsolete) IPTC Metadata - #16 by jch2103. However, I don’t know how this got copied to the NEF. I’ll need to monitor things to see if it happens again or if it was due to user error on my part.

I did set up a new copy of the unedited NEF, added basic metadata (caption, hierarchical keywords) and processed it with PL5. No issues with the NEF. The output JPG demonstrated the bug (extraneous ‘National Park’). So the issue with the original NEF that I had posted may have been a one-off.

The ‘photoshop’, etc., tags are a result of the ExifTool .args files. I didn’t use PS or LR or any other Adobe product on these files.

Let me know if you have questions.

OK, that explains a lot.

I understand what you are saying, but do you have a choice of which ones get executed? Your XMP file shows stuff that one normally wouldn’t find, like the photoshop: and tiff: stuff.

In any case, I would not expect PL to take such tags into account, unless it ever aspires to being a full metadata management tool - something I would hope DxO don’t waste time and money on as there are already plenty of other apps that do it and it would involve all sorts of standards compatibility battles that a RAW image editor really need not get distracted into when there are already too many much needed image editing areas that need either fixing or adding.

I used exiftool -G -a filename to get to it

It didn’t get copied to the NEF file, nor did it get written to the sample JPG file I downloaded.

We seem to be talking at cross purposes here…

  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>National Park|Chaco Culture National Historical Park</rdf:li>
      <rdf:li>Vacation</rdf:li>
      <rdf:li>National Park</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

The extraneous line is not National Park - that is a legitimate part of the hierarchical context of Chaco Culture National Historical Park

If any line is extraneous, it is Vacation, which only exists as a standalone keyword.

From a great deal of research, I have found that there are two “permitted” ways of describing the National Park > Chaco Culture National Historical Park hierarchical context…

  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>National Park|Chaco Culture National Historical Park</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

… and…

  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>National Park|Chaco Culture National Historical Park</rdf:li>
      <rdf:li>National Park</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

Neither is wrong.

I wrote my app to use the second (full) version and, if you look at the spreadsheet in this post in a long conversation we had with Bryan (@BHAYT), you will see that CaptureOne uses the second version

Strictly, all that is needed to describe the hierarchical context of Chaco Culture National Historical Park is the shorter example but, by including the context for each keyword in the hierarchy, to my mind, it makes things just that bit clearer.
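To make the two forms concrete, here is a rough Python sketch that converts between them. The function names (`expand_levels`, `full_form`, `short_form`) are made up for illustration and this is not how any particular app does it; it only assumes that `|` is the level separator, as in the snippets above.

```python
def expand_levels(path, sep="|"):
    """'A|B|C' -> ['A', 'A|B', 'A|B|C'] (every level from the root down)."""
    parts = path.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def full_form(paths, sep="|"):
    """Second (full) version: write every level of every hierarchy once."""
    seen = []
    for path in paths:
        for level in expand_levels(path, sep):
            if level not in seen:
                seen.append(level)
    return seen


def short_form(paths, sep="|"):
    """First (short) version: drop entries that are only a parent of another entry."""
    return [p for p in paths if not any(q.startswith(p + sep) for q in paths)]


short = ["National Park|Chaco Culture National Historical Park"]
full = full_form(short)
print(full)
print(short_form(full))
```

Going from the short form to the full form and back again gives the original entry, which is another way of saying that neither form carries more hierarchy information than the other.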

Let me try to explain.

Here we have three keywords…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Chaco Culture National Historical Park</rdf:li>
    <rdf:li>National Park</rdf:li>
    <rdf:li>Vacation</rdf:li>
   </rdf:Bag>
  </dc:subject>

We need to describe the hierarchical context of any and all keywords mentioned in dc:subject, not just those that we decide warrant “explaining”.

Any keyword not included in lr:hierarchicalSubject should be read as purely standalone so, by omitting National Park, we are saying that we are using it as just a standalone keyword and not the root of the National Park > Chaco Culture National Historical Park hierarchy.

Of course, National Park could also be used as a standalone keyword, but not if that use is combined with its role as the parent of Chaco Culture National Historical Park

PL5 is correct in including National Park in the lr:hierarchicalSubject tag, but it is not correct in including Vacation, which is strictly a standalone keyword.

Now does that help or further confuse?

There were over 13,000 metadata tags a few years ago that ExifTool knew about and I’m sure the number is much larger today. Most are irrelevant to almost all users, of course. IMatch has a limited standard set of tags that it imports, and the ability to add other tags based on user needs. For details, see IMatch Help System

Re photoshop tags, several of these are referenced in MWG guidance, e.g.

Representation
Information for Date/Time (Original) is available in the following properties:
Original Date/Time – Creation date of the intellectual content (e.g. the photograph),
rather than the creation date of the content being shown
Exif DateTimeOriginal (36867, 0x9003) and SubSecTimeOriginal (37521, 0x9291)
IPTC DateCreated (IIM 2:55, 0x0237) and TimeCreated (IIM 2:60, 0x023C)
XMP (photoshop:DateCreated)
Digitized Date/Time – Creation date of the digital representation
Exif DateTimeDigitized (36868, 0x9004) and SubSecTimeDigitized (37522, 0x9292)
IPTC DigitalCreationDate (IIM 2:62, 0x023E) and DigitalCreationTime (IIM 2:63, 0x023F)
XMP (xmp:CreateDate)
Modification Date/Time – Modification date of the digital image file
Exif DateTime (306, 0x132) and SubSecTime (37520, 0x9290)
XMP (xmp:ModifyDate)

I agree. That’s why I was satisfied with PL4’s handling of metadata and the fact that it passed metadata unchanged from raw to output files. And why I think PL5 should provide options to do the same thing. If DxO wants to provide some metadata management for its users, that’s fine with me as long as it doesn’t interfere with my own metadata management by modifying metadata in output files.

Ah, OK. I had used a more limited option (-G1). I believe the reason both EXIF tags are populated is due to the ExifTool .args files.

Probably.

Re ‘Vacation’ - it’s a valid hierarchical keyword; in this instance, it doesn’t have any subordinate parts, but it’s ready when I want to use Vacation|Mars

But I don’t see that repeating parts of hierarchical keywords provides any actual benefits, even if it’s not inconsistent with MWG. That simply duplicates information already present in the ‘full’ hierarchical keyword. It shouldn’t take any extraordinary programming to design searches, etc., that would satisfy any foreseeable need. As you said,

Yes. So it seems unnecessarily complicated to distinguish ‘purely standalone’ keywords as you do. A root keyword in lr:hierarchicalSubject can exist by itself or can become a ‘full-fledged’ hierarchical keyword as soon as subordinate keywords are added. So ‘Vacation’ can exist by itself in lr:hierarchicalSubject or can instantly become the root of ‘Vacation|Mars’ when the user adds this subordinate information. If you need to describe the hierarchical content of keywords mentioned in dc:subject, it would seem more logical to simply look in lr:hierarchicalSubject. To do otherwise does seem unnecessary.

But back to DxO - What I’m objecting to is PL5 changing my hierarchical keywords when it creates output files. I think you would agree that PL5 should not do this, whichever interpretation of ‘correct’ hierarchical keyword handling you prefer.

According to the ExifTool site today, there are 25764 tags, with 16462 unique tag names.

Mainly in the context of reconciliation and not something that needs to be or should be propagated.

OK, are you ready for a fight over this? :stuck_out_tongue_winking_eye: :crazy_face: :roll_eyes:

There is a difference between what is true in a dictionary of keywords and hierarchies and what needs to be written to the metadata of a file.

To be (hopefully) clear, if a keyword is not used in a particular context within a given file, there is absolutely no requirement for any “possible” hierarchies to be inferred. If that were the case, every keyword must be considered as potentially hierarchical and that is not the intent of the lr:hierarchicalSubject tag.

It is the job of the dc:subject tag to include all keywords mentioned in lr:hierarchicalSubject but there is no obligation for the inverse case, otherwise one could argue that using dc:subject is redundant.

If I may repeat, and clarify, something I said a while ago…

  1. The dc:subject tag is for searching
  2. The lr:hierarchicalSubject tag is for transmission of hierarchical context where such a context exists for the file in question

If a user has more than one image library, but only one keyword dictionary, and a certain keyword only ever exists in a hierarchical context for one of those libraries, it would not make sense to include irrelevant context in another library that has no requirement for that context.

To go back to my (in)famous example…

Orange
Fruit
Colour
Enterprise
Telecommunications
Satsuma

… are all valid standalone keywords.

But we can also construct them into a couple of simple hierarchies…

Fruit > Orange > Satsuma
Colour > Orange
Enterprise > Telecommunications > Orange

Now I want to search for all orange fruits.

In saying that, do I mean all fruits that are orange in colour, or all varieties of the fruit known as an Orange, or all fruits that are not just varieties of Orange but are orange in colour?

Asking such a question reveals that constructing a search predicate based on these hierarchical contexts involves a lot more effort than constructing such a predicate based on purely the keywords we want to locate and, for searching purposes, hierarchy becomes irrelevant and only the dc:subject tag is required.

Let’s say we want to keyword an image of an Apricot…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Apricot</rdf:li>
    <rdf:li>Colour</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Orange</rdf:li>
   </rdf:Bag>
  </dc:subject>
  …
  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>Colour</rdf:li>
      <rdf:li>Colour|Orange</rdf:li>
      <rdf:li>Fruit</rdf:li>
      <rdf:li>Fruit|Apricot</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

… and an image of a Satsuma would look like…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Colour</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Orange</rdf:li>
    <rdf:li>Satsuma</rdf:li>
   </rdf:Bag>
  </dc:subject>
  …
  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>Colour</rdf:li>
      <rdf:li>Colour|Orange</rdf:li>
      <rdf:li>Fruit</rdf:li>
      <rdf:li>Fruit|Orange</rdf:li>
      <rdf:li>Fruit|Orange|Satsuma</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

… but an image of a Blood Orange would look like…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Blood Orange</rdf:li>
    <rdf:li>Colour</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Orange</rdf:li>
    <rdf:li>Red</rdf:li>
   </rdf:Bag>
  </dc:subject>
  …
  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>Colour</rdf:li>
      <rdf:li>Colour|Red</rdf:li>
      <rdf:li>Fruit</rdf:li>
      <rdf:li>Fruit|Orange</rdf:li>
      <rdf:li>Fruit|Orange|Blood Orange</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

Constructing a predicate to look for images of fruits that are both Oranges in type and colour, based on lr:hierarchicalSubject becomes really difficult.

If we specify

 `Colour|Orange` AND `Fruit|Orange`

… then we preclude both Blood Orange and Apricot

If we specify

 `Colour|Orange` OR `Fruit|Orange`

… then we include everything that is orange in colour, even if they are not fruits

But if we base the predicate on dc:subject all we need is…

 `Colour` AND `Fruit` AND `Orange`

… then we get exactly what we expected.
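The three predicates can be tried out with a few lines of Python. This is only a sketch over the three example images above, with the keyword sets copied from the XMP snippets; `match_all` and `match_any` are hypothetical helpers, not any DAM's search API.

```python
# Keywords for the three example images, copied from the snippets above.
IMAGES = {
    "Apricot": {
        "subject": {"Apricot", "Colour", "Fruit", "Orange"},
        "hier": {"Colour", "Colour|Orange", "Fruit", "Fruit|Apricot"},
    },
    "Satsuma": {
        "subject": {"Colour", "Fruit", "Orange", "Satsuma"},
        "hier": {"Colour", "Colour|Orange", "Fruit", "Fruit|Orange",
                 "Fruit|Orange|Satsuma"},
    },
    "Blood Orange": {
        "subject": {"Blood Orange", "Colour", "Fruit", "Orange", "Red"},
        "hier": {"Colour", "Colour|Red", "Fruit", "Fruit|Orange",
                 "Fruit|Orange|Blood Orange"},
    },
}


def match_all(images, field, terms):
    """Images whose tag set contains every term (AND)."""
    return sorted(n for n, tags in images.items() if set(terms) <= tags[field])


def match_any(images, field, terms):
    """Images whose tag set contains at least one term (OR)."""
    return sorted(n for n, tags in images.items() if set(terms) & tags[field])


# AND on lr:hierarchicalSubject: only the Satsuma survives.
print(match_all(IMAGES, "hier", ["Colour|Orange", "Fruit|Orange"]))
# OR on lr:hierarchicalSubject: anything orange in colour or in the Orange family.
print(match_any(IMAGES, "hier", ["Colour|Orange", "Fruit|Orange"]))
# AND on dc:subject: all three images.
print(match_all(IMAGES, "subject", ["Colour", "Fruit", "Orange"]))
```

With only fruit images in the sample, the OR query happens to return the same three images as the dc:subject query; add a non-fruit image tagged `Colour|Orange` and the OR query would pick it up as well.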

Imagine the problems you are going to face if you were asked to write a magazine article on the colour Orange and its relevance to the influence of the word in any context, hierarchical or not. How are you going to create a predicate for images that might contain only any of…

Fruit > Orange > Satsuma
Colour > Orange
Enterprise > Telecommunications > Orange
etc, etc

???

If you create a predicate based on dc:subject all you need is the one word.

I can confirm that the ‘old’ IPTC issue is gone with PL 5.1.3. Thanks!

The other two are still present.

To add a note of levity: What we’ve got here is a failure to communicate

[This discussion is probably in the wrong thread, because the original topic was about Legacy IPTC metadata, but…]

You said

and I agree. If one repeats parts of a ‘shorter example’ hierarchical keyword, that violates SSOT (single source of truth). Using your example:

If a user decides to change ‘National Park’ (or misspells a word in it) in the hierarchical keywords in this example, they have two places to correct, right? If they only change/correct one, they will now have two different hierarchical keywords. If one only uses the ‘shorter example’ hierarchical keyword, this possibility is avoided.

On the theory that a picture is worth a thousand words, I created your colors/fruits example in IMatch.
Some background:

  • The IMatch UI uses the lr:hierarchicalSubject tag for entry, editing and processing of hierarchical keywords.
  • IMatch also copies the components of the hierarchical keyword into the dc:subject tag, where they’re available for searches, etc.
  • Example: color|orange; fruit|apricot in lr:hierarchicalSubject is copied as orange; color; apricot; fruit into dc:subject
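The copy step in the last bullet can be sketched in a few lines of Python. This is an illustration of the idea only, not IMatch's actual implementation; `flatten_to_subject` is a made-up name, and the output order here follows the path order rather than the order shown in the bullet.

```python
def flatten_to_subject(hierarchical, sep="|"):
    """Split each hierarchical keyword into its parts and collect them once each."""
    subject = []
    for path in hierarchical:
        for part in path.split(sep):
            if part not in subject:
                subject.append(part)
    return subject


print(flatten_to_subject(["color|orange", "fruit|apricot"]))
```

Because dc:subject is an unordered bag, the difference in ordering should not matter to anything that reads it.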

Your examples show up in IMatch as:

If I run your filter color AND fruit AND orange in IMatch, I get the expected results:

IMatch (transparently to the user) handles the details of mapping lr:hierarchicalSubject to dc:subject and the internal details of doing searches and other manipulations of hierarchical keywords. It also thus provides a single point of user access to this information (lr:hierarchicalSubject), thereby providing SSOT. I’m not providing IMatch as a recommended solution (especially to a Mac user), but pointing out an approach that works.

So I think we generally agree, except with respect to implementation details?

My bigger point, which may have gotten lost in all the words, is that PL5 needs a user option to skip DxO’s DAM and simply replicate PL4’s handling of metadata, i.e., pass it through unchanged from the raw file to the output files. As I said earlier, I’m fine with DxO’s efforts to implement some level of DAM for its users, as long as it doesn’t interfere with people who want to manage metadata on their own. Agree?

By the way, the reason for my original post in this thread was that PL5 was creating Legacy IPTC tags in output files when they didn’t exist in the raw file - this can easily lead to violations of SSOT through duplicate or near-duplicate sets of tags. I’m delighted that DxO has now fixed this issue with 5.1.3.

ps - If you want to add another example of ‘color’ in keywords, there used to be a Ford automobile dealer in Albany NY named Orange Ford; their slogan was ‘What color Orange Ford do you want?’

@jch2103

I understand your requests for a “straight” pass through of metadata except for the fact that in my post PL 5 Keywords Handling: Output Files - #9 by BHAYT I raise the issue that I cannot replicate your evidence in the tests I did earlier today.

I just cleared the IMatch database and reran the addition of keywords to a RAW photo and this time I got yet another variant on possible combinations, i.e.

Added another RAW and assigned the keywords in IMatch but with the ‘Write path elements’ set and got what I expected. Putting both through PL5.1.3 did not produce any extraneous keywords in the ‘hr’ group.

In the test that I did where I stated that I was confused, IMatch inserted the additional keys automatically, I did not enter any flat keys but IMatch did seem to be “second guessing” me as I entered the keywords!?

I’ll take a closer look. In the meantime, I replied to you in a different thread: PL 5 Keywords Handling: Output Files - #10 by jch2103

This should never be possible. Any good DAM should have a dictionary and a separate “manager” for that dictionary. When you add a hierarchical keyword, you should not have to, or be able to, write multiple levels to the hierarchical subject tag.

Take Fruit > Orange > Satsuma as an example

If I choose it…

[Four screenshots]

Then what gets written, automatically, is…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Couleur</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Orange</rdf:li>
    <rdf:li>Satsuma</rdf:li>
   </rdf:Bag>
  </dc:subject>
  …
  <lr:hierarchicalSubject>
    <rdf:Bag>
      <rdf:li>Couleur</rdf:li>
      <rdf:li>Couleur|Orange</rdf:li>
      <rdf:li>Fruit</rdf:li>
      <rdf:li>Fruit|Orange</rdf:li>
      <rdf:li>Fruit|Orange|Satsuma</rdf:li>
    </rdf:Bag>
  </lr:hierarchicalSubject>

Couleur is the French word but let’s change it to the English Colour. I go to the manager pane and select the word…

[Two screenshots]

For the non-Francophones, that basically says that renaming this keyword will also update any images that contain it. Everywhere that word exists, including in any hierarchies, will also be changed.
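In code terms, a rename driven from the dictionary boils down to something like this Python sketch. It is not the app's real code; `rename_keyword` and the dict layout are assumptions for illustration, using the Couleur example above.

```python
def rename_keyword(image, old, new, sep="|"):
    """Return a copy of the image's keywords with `old` replaced by `new` everywhere."""
    subject = [new if kw == old else kw for kw in image["subject"]]
    hier = [
        sep.join(new if part == old else part for part in path.split(sep))
        for path in image["hier"]
    ]
    return {"subject": subject, "hier": hier}


satsuma = {
    "subject": ["Couleur", "Fruit", "Orange", "Satsuma"],
    "hier": ["Couleur", "Couleur|Orange", "Fruit", "Fruit|Orange",
             "Fruit|Orange|Satsuma"],
}

renamed = rename_keyword(satsuma, "Couleur", "Colour")
print(renamed["subject"])
print(renamed["hier"])
```

Note that this naive version matches on the word alone, so renaming Orange would change it under both Colour and Fruit; a real manager would presumably rename a specific dictionary entry rather than every keyword with the same spelling.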

If a DAM app allows direct editing of the component parts of the lr:hierarchicalSubject tag, it should be avoided like the plague.

I think the only real difference of opinion is whether to add all levels of a hierarchy or just the single path to the bottom level. As I’ve said before, both are valid, it’s just that I prefer to write all levels because it also allows the definition of intermediate levels, which might suit some DAMs or search engines better.

Since the last PL5 update, my IPTC tags are no longer included in the exported JPG file, only XMP metadata!!!

Are you talking about the same thing? Because some website CMSs and hosting services still use the “old” IPTC format. There’s no problem keeping both standards side by side, so why this regression?

Please check my original post above: PL5 Adds Legacy (i.e., obsolete) IPTC Metadata

There can be differences between Legacy IPTC and XMP IPTC data that lead to data inconsistencies.

My complaint was about PL5 creating Legacy IPTC tags in output files when they don’t exist in the original raw file. I’d agree that if you have Legacy IPTC data in your raw files, they should be propagated to output files.