Keywords and IPTC

Stenis · November 23, 2022, 10:58pm

Today you don’t have any choice in much software. When it comes to metadata editing tools for images, these seem to be stuck with IPTC-IIM in their interfaces for historical reasons. You are supposed to enter keywords in the IPTC Keywords element and then the automagic populates the XMP.

BHAYT · November 23, 2022, 11:48pm

@Joanna we had these discussions about a year ago and I raised ACDSee, Zoner and other software that do not “understand” ‘hr’ fields, and they were dismissed then as “IPTC legacy” etc. But they exist and will take “Root|Branch|Leaf” as user input and treat it as an acceptable string, and the only place that they “can” put it is in the ‘dc’ fields, so that is where it goes!

Now ‘hr’ “aware” products can elect to see that as a single string and not try to “read” anything into it and take anything out of it, i.e. it is a string so leave it alone and don’t try to parse it. You didn’t say what would happen if you presented the same sidecar file to your software - it would be interesting to know.

I have written on numerous occasions about the dangers of what DxPL does, because there is a situation where “Root|Branch|Leaf” was put into an image (sidecar) and the next thing that the program sees is “Root”, “Branch” and “Leaf”. Even worse, deleting those keywords in a ‘dc’ only program will leave the ‘hr’ keywords (the actual string that was entered by the program originally) intact, so DxPL will continue to treat the image as having keywords!

I am sorry but I don’t understand this at all!? The data will only stay within the database (and the DOP) if it is not written back to the sidecar file automatically with AS(ON) or manually with AS(OFF)!

But the issue of searchability does not spring from anything other than the fact that DxPL has parsed the data, moved the hierarchical keyword to the ‘hr’ fields and, post PL5.2.0, “thrown” all but the leaf (which happens to be “Leaf” in this case) keyword out; that is definitely an error. Unless the searchability you are referring to is the ability to search on the original full string, i.e. on “Root|Branch|Leaf”!

So, to see if DxPL is worse than other programs, and knowing how you and @platypus absolutely adore my long posts, I decided to do some tests. Sorry @YvPL5!

All software has a decision to take when handling ‘dc’ keywords, i.e. to parse or not to parse.

With respect to hierarchical keywords in the ‘hr’ fields and simple keywords in ‘dc’ and/or ‘hr’ fields then any and all packages have to make up their minds what to do with the following elements

1. Simple keywords from ‘dc’ fields (None in this particular case)
2. Hierarchical (structured) keywords from ‘hr’ fields (in this case we have that in the ‘dc’ field, which the software should, arguably, ignore)
3. Simple keywords from ‘hr’ fields (None in this particular case)
4. Hierarchical components (HCs) from the ‘hr’ fields, e.g. “Root|Branch|Leaf” yields 3 HCs, namely “Root”, “Branch” and “Leaf”, if the software treats this ‘dc’ field key as an “acceptable” hierarchical keyword and acts as if it was in the correct place, i.e. in the ‘hr’ fields.

and the difference between the packages is how long have they been doing what they are doing (because their users have a lot of photographs keyworded in a particular way), when were they created with respect to the emerging standards, including the “standards” from that most important standards body - Adobe (!?) and how they then choose to put items 1 to 4 together to create the ‘dc’ and ‘hr’ fields!

I conducted two sets of tests,

One where the key was discovered in the ‘dc’ field having been put there by ACDSee
Where I entered the keyword into the software via the UI

Test 1:-
Adobe Bridge (AB) - leaves the ACDSee keyword intact:-

Capture One (C1) - Moves the keyword to ‘hr’ and puts “Leaf” in ‘dc’ (the only result that troubles me because I expected all 3 component keywords of the hierarchical keyword to be present in the ‘dc’ area)

IMatch (IM) with both options selected - moves the keyword to ‘hr’ and fully populates the ‘dc’ with the hierarchical components.

Lightroom (LR) - essentially does what PL5.1.4 used to do and moves the hierarchical keyword to the ‘hr’ area and the hierarchical components to the ‘dc’ area

PhotoLab (PL) 5.1.4 - Did what LR and IM do, as DxPL did before DxO changed things for the worse on PL5.2.0!!

Photo Mechanic (PM) - moves the hierarchical key to the ‘hr’ area and leaves the ‘dc’ area empty

Photo Supreme (PS) - same as DxPL with PL5.2.0 and later

So some consistency but not a lot and I am not suggesting that because other software does accept an hierarchical keyword in the ‘dc’ field that it makes DxPL right or wrong but this is the market in which they “compete”.
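The four outcomes seen in Test 1 can be put side by side in a rough Python sketch. The policy names are hypothetical labels for the behaviours reported above, not anything the vendors document, and simple keywords are just left in ‘dc’ (the sketch does not model programs that also copy them into ‘hr’):

```python
from typing import List, Tuple

# Hypothetical labels for the behaviours reported in Test 1:
#   "leave"   - keep the string untouched in 'dc' (Adobe Bridge)
#   "all"     - path to 'hr', every component to 'dc' (LR, IMatch, PL5.1.4)
#   "leaf"    - path to 'hr', only the last component to 'dc' (C1, PL5.2.0+, Photo Supreme)
#   "hr_only" - path to 'hr', nothing left in 'dc' (Photo Mechanic)
POLICIES = {"leave", "all", "leaf", "hr_only"}


def migrate(dc: List[str], policy: str, delimiter: str = "|") -> Tuple[List[str], List[str]]:
    """Return the (dc, hr) keyword lists after a program has rewritten them."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    new_dc: List[str] = []
    hr: List[str] = []
    for keyword in dc:
        parts = [p for p in keyword.split(delimiter) if p]
        if policy == "leave" or len(parts) < 2:
            # Not parsed, or a simple keyword: it stays in 'dc' as it is.
            new_dc.append(keyword)
            continue
        hr.append(keyword)
        if policy == "all":
            new_dc.extend(parts)
        elif policy == "leaf":
            new_dc.append(parts[-1])
        # "hr_only": nothing is written back to 'dc'.
    return new_dc, hr


for name in ("leave", "all", "leaf", "hr_only"):
    print(name, migrate(["Root|Branch|Leaf"], name))
```

The “leaf” and “hr_only” rows are the ones that lose the “Root” and “Branch” components for any program that only reads ‘dc’.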

Test 2:-

Adobe Bridge (AB) - Bridge has options to select all the elements in the hierarchy, as does DxPL, and so with only the leaf node (“Leaf”) selected the result is the same as in Test 1, but if all elements in the hierarchy are selected then you get

 <dc:subject>
    <rdf:Bag>
     <rdf:li>Branch</rdf:li>
     <rdf:li>Root</rdf:li>
     <rdf:li>Leaf</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Root|Branch</rdf:li>
     <rdf:li>Root</rdf:li>
     <rdf:li>Root|Branch|Leaf</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

the same as you get with Capture One when the keyword is entered directly and PL5.1.4 and PL6 when all elements of the hierarchy are selected!

Capture One (C1) - as indicated above you get

  <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Root</rdf:li>
     <rdf:li>Root|Branch</rdf:li>
     <rdf:li>Root|Branch|leaf</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>
   <dc:subject>
    <rdf:Bag>
     <rdf:li>Root</rdf:li>
     <rdf:li>Branch</rdf:li>
     <rdf:li>leaf</rdf:li>
    </rdf:Bag>
   </dc:subject>

IMatch (IM) with both options selected - same as Test 1

Lightroom (LR) - Same as Test 1

PhotoLab (PL) 5.1.4 - Same as Test 1 and if all elements in the hierarchy are selected

This will give the same results as Capture One, and Bridge etc…

Photo Mechanic (PM) - Same as Test 1

Photo Supreme (PS) - Same as Test 1
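The “all levels selected” output from Bridge and Capture One in Test 2 follows a simple pattern: one ‘hr’ entry per level of the path and one ‘dc’ entry per component. A minimal Python sketch of that pattern, assuming “|” as the delimiter (the function name is made up for illustration):

```python
from typing import List, Tuple


def expand_hierarchy(path: str, delimiter: str = "|") -> Tuple[List[str], List[str]]:
    """Return (dc, hr) lists for one hierarchical keyword with every level selected."""
    parts = path.split(delimiter)
    # One 'hr' entry per level: "Root", "Root|Branch", "Root|Branch|Leaf".
    hr = [delimiter.join(parts[:depth]) for depth in range(1, len(parts) + 1)]
    return parts, hr


dc, hr = expand_hierarchy("Root|Branch|Leaf")
print(dc)  # ['Root', 'Branch', 'Leaf']
print(hr)  # ['Root', 'Root|Branch', 'Root|Branch|Leaf']
```

Since an rdf:Bag is an unordered container, the different item order written by Bridge and Capture One should not matter to a reader of the file.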

I agree with this but the palette is there, just not as well organised as it might (should) be and while comparing any other software with your creation might be illuminating and a good example of how it could/should be done, it is essentially academic because it is not available on the Windows platform at all and I suspect there is limited access to the beta on the Mac platform and how well will it interwork with the packages I have looked at above!?

I really don’t understand this?

@YvPL5 Some of the software tested here cost a lot of money so my recommendation is to use DxPL but if possible use the option to select all levels of a hierarchy. Currently this has to be done for every image individually! We have been “promised” an option to make this automatic on release PL6.?.? (and I hope that promise will be kept) because although I own IMatch I find DxPL easier to use!

You will “only” be setting xmp keywords but …

PS @Joanna you keep going on at me about using the “|” symbol but if I enter “Root>Branch>Leaf” into IMatch, which allows a choice of symbols to denote hierarchies | , \ / : ; * or @ then I get this

i.e. it has been taken as a string as you keep suggesting will happen to “Root|Branch|Leaf”!?

Joanna · November 24, 2022, 10:32am

Unfortunately, not all software synchronises the XMP tags when IPTC tags are written.

So, @YvPL5 , even if images contain the IPTC:Keywords tag, they will not be readable in PL.

I just wrote something only to the IPTC:Keywords tag using ExifTool…

% exiftool -IPTC:Keywords=IPTC _HLN0032.NEF

Nothing shows up in PL.

The only way to make it visible is to use the following command…

% exiftool '-xmp-dc:subject<IPTC:Keywords' _HLN0032.NEF

There is no implicit update from one tag to the other, so that will leave you with the problem of having to manually synchronise between the two domains.

And then you will still have the problem of hierarchical keywords, which not only require synchronising xmp-dc:subject, but also lr:hierarchicalSubject at the same time.
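To illustrate what keeping the two XMP tags in step involves, here is a small Python sketch that only builds an ExifTool argument list and does not run anything. The helper name is hypothetical, the “|” delimiter is an assumption, and repeated `-TAG=value` assignments replace whatever list is already in the file, so treat it as a starting point rather than a safe batch tool:

```python
from typing import List


def build_exiftool_args(keywords: List[str], image: str, delimiter: str = "|") -> List[str]:
    """Build an argument list that writes the same keywords to both XMP tags."""
    dc: List[str] = []
    hierarchical: List[str] = []
    for keyword in keywords:
        if delimiter in keyword:
            hierarchical.append(keyword)
        # Every component of the path (or the simple keyword itself) goes to 'dc' once.
        for part in keyword.split(delimiter):
            if part and part not in dc:
                dc.append(part)
    args = ["exiftool"]
    args += [f"-xmp-dc:subject={kw}" for kw in dc]
    args += [f"-xmp-lr:hierarchicalSubject={path}" for path in hierarchical]
    args.append(image)
    return args


print(build_exiftool_args(["Root|Branch|Leaf", "Solo"], "_HLN0032.NEF"))
```

Passing the list to `subprocess.run(args)` instead of pasting it into a shell avoids having to quote the pipe characters.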

@YvPL5 your initial post seems to indicate that you are not trying to read IPTC metadata from existing files but, rather, write it to new files.

Can I ask why? Are you confusing the term IPTC with metadata in general?

No they are not. At least in the most recent versions.

Once again, this is not the case. They are written, first and foremost, to the xmp-dc:subject tag and, if they are hierarchical, to the lr:hierarchicalSubject tag as well.

PL only writes IPTC tags explicitly connected to the fields in the IPTC palette.

Aaaah! This tells me that you are confusing IPTC with metadata in general.

PL writes metadata to:

XMP sidecars in the case of RAW files
the image files themselves if they are not RAW files

In addition, PL writes the metadata to its internal database, but only for indexing and searching purpose within PL.

PL only writes to XMP sidecars for RAW files and, as long as the sidecars are kept alongside the RAW files if/when they are moved, most other software will read the metadata as if it were stored in the image file.

You can choose to update the metadata either automatically or manually. In both cases, either the XMP file will be updated for RAW files or the image file itself will be updated for all other file types.

As I have already explained, PL doesn’t store either keywords or hierarchies in IPTC tags but it does store both of them in appropriate tags in the XMP metadata.

As you can see, this thread digressed significantly, discussing all sorts of esoteric stuff, which has nothing to do with your original question.

Clarifying that you really mean metadata in general and not just IPTC - yes

If you only have PL, then use PL. As @platypus and others will confirm, using more than one metadata manager is the quickest way to a whole world of hurt and anguish.

I personally use my own metadata manager (Mac only) but, as I just mentioned, as long as you only use PL, why not use it instead of spending out and having to learn another manager, which might not be 100% compatible with PL.

Stenis · November 24, 2022, 4:40pm

Joanna:

My point is that using delimiters in single keywords, which is what you are doing, might “imply” hierarchy but doesn’t get interpreted as such. What you seem to be getting is a list of keywords that happen to contain delimiters.

About standards
There is a Windows standard too: comma separation is used in all sorts of lists in the US, while in Sweden, for some reason, the standard is semicolon. When exporting hierarchical keywords from PhotoLab to Photo Mechanic, the whole structure also ended up in the IPTC Keywords element with pipe delimiters (so BHAYT is not that odd), and that was what indicated to me that I had to use a flat keyword structure instead.

So when I use comma separation in my images it’s because I have realized that not all countries speak Swedish or adapt to Swedish standards on the Internet, so I, like many others, use the English/US standard instead (I refuse the word “American” in these contexts since I see that as an insult to all Latin Americans) because that’s the only thing that works throughout the Internet today, despite what the Russians around Putin think of that. Russia’s soft power today is 0.

Joanna · November 24, 2022, 5:56pm

That might be the case for Windows files for other purposes but, for metadata, the delimiters are universal.

The xmp-dc:subject tag should only contain a list of separate words. Any non-text characters in that tag have no significance, apart from as part of each single “word”.

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Root</rdf:li>
               <rdf:li>Branch</rdf:li>
               <rdf:li>Leaf</rdf:li>
               <rdf:li>Solo</rdf:li>
            </rdf:Bag>
         </dc:subject>

You can see clearly that the XMP layout is marked as a “Bag”, which is another word for a list. Each item in that list is marked as “li” (list item). There is absolutely no indication of hierarchy.

If the tag is written like this…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Root|Branch|Leaf</rdf:li>
               <rdf:li>Solo</rdf:li>
            </rdf:Bag>
         </dc:subject>

… then there are only two “li” (list item) tags. It’s just that the first one happens to contain pipe symbols. Or it could equally well contain any other “delimiter”.
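The same point can be checked with any XML parser. A minimal Python sketch using only the standard library, with a cut-down fragment (the namespace declarations are added so that it parses on its own; a real sidecar has more wrapping around it):

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Cut-down stand-in for the relevant part of a sidecar file.
SIDECAR_FRAGMENT = """
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:subject>
    <rdf:Bag>
      <rdf:li>Root|Branch|Leaf</rdf:li>
      <rdf:li>Solo</rdf:li>
    </rdf:Bag>
  </dc:subject>
</rdf:Description>
"""


def read_dc_subject(xml_text: str) -> list:
    """Return the text of every list item in dc:subject, exactly as stored."""
    root = ET.fromstring(xml_text.strip())
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return [item.text or "" for item in items]


keywords = read_dc_subject(SIDECAR_FRAGMENT)
print(len(keywords), keywords)  # 2 ['Root|Branch|Leaf', 'Solo']
```

The parser hands back two strings. Anything further, such as splitting on the pipe, is a decision made by the application reading them.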

If I edit the XMP sidecar to include…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Root;Branch;Leaf</rdf:li>
               <rdf:li>Solo</rdf:li>
            </rdf:Bag>
         </dc:subject>

… then PL, correctly, reads that as just two keywords…

[Screenshot 2022-11-24 at 18.25.44]

It is only when certain managers decide to interpret that single list item as something other than just the collection of characters that it contains, that we end up with compatibility problems between managers.

If I want to use any of the five “reserved” characters in an XMP sidecar, I need to “escape” them…

	escaped
"	`&quot;`
'	`&apos;`
<	`&lt;`
>	`&gt;`
&	`&amp;`

… so you can do things like this…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Root &amp; Branch &amp; Leaf</rdf:li>
               <rdf:li>Solo</rdf:li>
            </rdf:Bag>
         </dc:subject>

… to give you…

[Screenshot 2022-11-24 at 18.28.53]

But, if I want to include < or > as directional hierarchy delimiters, since they would normally have to be escaped, I would have to write them as…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Root&gt;Branch&gt;Leaf</rdf:li>
               <rdf:li>Solo</rdf:li>
            </rdf:Bag>
         </dc:subject>

But, even if I disobey the rules and don’t escape them, PL still reads that list item as…

[Screenshot 2022-11-24 at 18.45.46]
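For anyone generating sidecar content from a script rather than by hand, the escaping can be left to a library. A short Python sketch using the standard library (the bare `<li>` element is a simplification for illustration, not the full `rdf:li` markup):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quote characters are added here.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_keyword(keyword: str) -> str:
    """Escape the five reserved characters in a keyword."""
    return escape(keyword, QUOTES)


def list_item(keyword: str) -> str:
    """Wrap an escaped keyword in a simplified list-item element."""
    return f"<li>{escape_keyword(keyword)}</li>"


item = list_item("Root>Branch>Leaf")
round_trip = ET.fromstring(item).text

print(item)        # <li>Root&gt;Branch&gt;Leaf</li>
print(round_trip)  # Root>Branch>Leaf
```

Reading the item back returns the original characters, so the escaping is only a storage detail and says nothing about hierarchy.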

Likewise, commas and semi-colons are not interpreted as delimiters in PL…

[Screenshot 2022-11-24 at 18.53.10]

The only delimiter that PL interprets as hierarchical is the pipe symbol, but I believe that is only the case because someone pushed for it as a “compatibility” feature request, even though it breaks all the rules.

No matter what “interpretation” you try to put on the contents of list items in the xmp-dc:subject tag, what appears between starting and ending labels should only ever be regarded as one single keyword.

The XMP standard states that hierarchical context should only be explicitly stated in the lr:hierarchicalSubject tag.

It is because software authors have chosen to break that standard that we have the unholy mess of incompatibility that is metadata today.

Stenis · November 24, 2022, 7:05pm

I’m really not interested at all in hierarchical keywords, since there are still so many problems with them to solve, and I wouldn’t be even if there weren’t, since they are so inefficient to use.

You don’t seem to take in that a whole industry, the world-wide museum world, uses the “Subject” element in XML for something completely different from an individual’s personal keywords. The “Subject” element is used not just for common keywords but also for standardized vocabularies, which the Dublin Core Subject element is part of together with a couple of other elements. “These are sets of designated terms for populating the IPTC Subject, Genre, and Scene fields, respectively. All versions of the current Controlled Vocabulary Keyword Catalog (CVKC) contain these three sets of terms within the hierarchical catalog, however they should not be applied to the Keyword field. These are designed to be placed into the specific field for which they are named, but are included in the CVKC as this is the easiest way to store them for use.” Controlled Vocabulary is the key, Joanna - a thing you seem to totally neglect.

From my own experience of using the XMP schemas to populate both forms and communication between SQL databases and XMP-aware DAM systems, I see no problem placing keywords from legacy IPTC-only flat metadata systems in the IPTC namespace of XMP. You see, the IPTC namespace is still a part of XMP, so there is still a place in XMP for flat keywords. What you are heading for is a clash between the Keywords in IIM and the XMP that doesn’t need to take place at all. What you are doing is contaminating the Subject element. That will be a problem if you want those elements to be used in “Semantic Web” scenarios.

It’s not at all as simple as you think, because the Dublin Core element Subject is a core element even in the “Semantic Web” and in the RDF standard. It’s not just something you play around with, as I have shown you above in my example from Sweden, and I think the same goes for the rest of the Cultural Heritage world.

YvPL5 · November 24, 2022, 9:13pm

OK, it is clear now. I thought the only way to store keywords in a JPG file was in the IPTC labels, but now that you tell me there is another place for that in the file, named XMP, it’s OK for me. What I did not want is keywords placed in another file alongside.
It seems this is the case with raw files, but I don’t care a lot as there will systematically be at least one JPG file for each of my raw files …
Another problem is that a lot of my photos have been tagged in ACDSee and I have found quite a mess in the keywords palette, but it seems possible to restore all this…

BHAYT · November 24, 2022, 11:35pm

@YvPL5 Now I understand part of the “confusion” with respect to ‘IPTC Keywords’

the above is from ACDSee Ultimate 2022 (V15.1 Build 2922). I use ACDSee for importing my images from the memory card and have done since version 2.5. Periodically I buy a new licence which I did last year, a little unfortunate because there seems to be some new masking features on offer in the 2023 version!?

Regardless of what ACDSee calls it, the keyword data ends up in an XMP sidecar file for RAW images and in the embedded XMP metadata for JPGs. Similar to DxPL except that (at least up to the 2022 version) ACDSee only uses the ‘dc’ fields.

So any hierarchical keywords will be located in there (the ‘dc’ fields) and regardless of “guidelines” all the software I tested and documented above read that and treated it as Hierarchical keywords (with the exception of Adobe Bridge!?)

The result, however, is that the hierarchical keywords will move to the ‘hr’ keywords and the simple keyword components will reside in the ‘dc’ keywords (to a greater or lesser extent, explanation below).

DxPL also shares another “unfortunate” trait with most but not all products in putting simple keywords into the ‘hr’ fields.

So if you have keywords x, y, z, a|m|b entered via ACDSee you will start with

but after DxPL has read and updated the sidecar it will look like this, but only after any metadata update in DxPL (which could be as “simple” as a ‘Rating’ change)

and in the UI of PL6 it looks like this

whereas in PL5.1.4 it looked like this and may return to that (optionally) if DxO keep their promise!

and if all levels of the hierarchy are selected like this in DxPL

then the sidecar looks like this

but returning to ACDSee it is only going to see this

Hence, the warning about mixing your metadata managers, particularly between products that only work with ‘dc’ fields versus those that work with both ‘dc’ and ‘hr’!!
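One way to spot the damage after a ‘dc’ only program has edited a file is to look for ‘hr’ entries whose leaf no longer appears in ‘dc’. A minimal Python sketch, where the keyword lists are assumed values based on the x, y, z, a|m|b example rather than copied from a real sidecar:

```python
from typing import List


def orphaned_paths(dc: List[str], hr: List[str], delimiter: str = "|") -> List[str]:
    """Return 'hr' entries whose last component is missing from 'dc'."""
    present = set(dc)
    return [path for path in hr if path.split(delimiter)[-1] not in present]


# Assumed state after DxPL has rewritten the sidecar (leaf only in 'dc').
dc_after_dxpl = ["x", "y", "z", "b"]
hr_after_dxpl = ["x", "y", "z", "a|m|b"]
print(orphaned_paths(dc_after_dxpl, hr_after_dxpl))  # []

# The user then deletes "b" in a 'dc' only program; 'hr' is left untouched.
dc_after_delete = ["x", "y", "z"]
print(orphaned_paths(dc_after_delete, hr_after_dxpl))  # ['a|m|b']
```

A non-empty result means a program that reads ‘hr’ will still treat the image as carrying that keyword.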

Before choosing to move to DxPL for metadata management please test the search functions on offer compared to what you are used to with ACDSee!

I may download ACDSee 2023 to see what is on offer, but if you want any more help with the transition from ACDSee to PL6 then I have ACDSee 2022 installed alongside PL6 on one machine and alongside PL5.1.4 on another, but both are Win 10 systems.

I am not sure what platform you are using Windows or Mac!?

Joanna · November 24, 2022, 11:51pm

Excellent Bryan. That certainly clarifies where that pipe syntax comes from.

And it also reinforces that keywords are actually stored in the XMP tags and not the IPTC section.

Joanna · November 25, 2022, 10:22am

BHAYT:

Joanna:

And the constituent keywords may not be searchable from software other than PhotoLab, which seems to use its own database rather than the XMP sidecar, due to xmp-dc:subject not being completely defined in the sidecar.

I am sorry but I don’t understand this at all!? The data will only stay within the database (and the DOP) if it is not written back to the sidecar file automatically with AS(ON) or manually with AS(OFF)!

But the issue of searchability does not spring from anything other than the fact that DxPL has parsed the data, moved the hierarchical keyword to the ‘hr’ fields and, post PL5.2.0, “thrown” all but the leaf (which happens to be “Leaf” in this case) keyword out; that is definitely an error. Unless the searchability you are referring to is the ability to search on the original full string, i.e. on “Root|Branch|Leaf”!

I forgot to answer this issue.

I use Mac computers and Finder uses the Spotlight indexing and searching mechanism to maintain a system-wide database of useful search information for all sorts of files and even some app data, in the background, all the time the computer is running.

When writing keywords directly to image files (I do that for RAW files as well as others) Spotlight allows me to specify searching explicitly for Keywords and will return all images that contain a (or more than one) specified keyword. This is not a text search, Spotlight uses the macOS metadata framework to parse out all sorts of metadata from files. For example, I can search for all images taken at a specific aperture, speed or ISO, amongst a wide range of other attributes.

But if keywords are only written to the lr:hierarchicalSubject tag, they do not get found.

Strictly, I don’t need any software at all to search for keyworded files, I can just use Spotlight - provided the keywords are marked in the xmp-dc:subject tag.

Of course, using XMP sidecars makes life more difficult because the Spotlight search for such files is purely text based and, having found a matching sidecar, I then have to locate the related image file. Which is where my software, amongst others, handles that linking. But my software uses only the Spotlight database, so whatever changes in whatever file, isn’t reliant on a third-party database to maintain an index of the metadata from the files, for searching purposes.