Win 10 PL5.3.1 - Use Keyword Format Templates instead of just reverting to the pre-PL5.2.0 keyword format

@uncoy or a case for actually how straightforward it is to use them?

When it can be “boiled” down to 6 simple parameters and arguably column 1 can be taken as implied (=A) and the F in column 4 is actually a remnant from my initial analysis and is encompassed within the AC (that it actually only ever exists with anyway)!

So we are down to 4 parameters controlling the formats (I believe/hope) of all the software I was able to test!

Is it really that complicated or is it just that different software adopting similar but not identical keyword formats seems to make it that way!

I believe that if DxO chose to adopt this approach they could be compatible with any package and even with any format that @joanna believes is “ideal”. To be honest it would actually be “simple” to write a program to take the keyword data in and spit it out in any format, until someone proves me wrong that is!?

To be honest the testing was the easy part; it has taken many days (not full time) to write, refine, re-write and repeat that sequence to get to the post etc… I don’t mind people finding minor holes, I just hope the principle mostly holds water!?

You are absolutely right:
A program is meant to spit out whatever you want or need, depending on input :wink:

MWG et al. set up ways that allow a provider to transfer keywords to a consumer in a structure (bunch of tags). Even though such “ways” (standards) exist, they are not implemented in exactly the same way by everyone and I suppose that they never will.

Your proposal for using keyword format templates that allow a user to make keyword records interoperable with a target app sounds good, but who will a) be able and willing to write such templates and b) what will the interface to such templates be? I think we’d replace one issue with another…

Note to @BHAYT: Please post future proposals here, so that we can vote for them. No need to add win and bug flags and, oh, give us a few lines of what you propose, no need to spread PhotoLab’s innards all over the table, please.

@platypus Thank you, I suspected there was something like that but didn’t know where it was.

But perhaps I like the sight of “innards” - actually I need to understand the innards (as much as is possible from the outside) to be sure I am on the right track and I was always told to show your “workings out”, not least because someone might actually spot where I went wrong.

Although you are partly correct with “out of the frying pan into the fire”, the current templates that I have provided would (should) work for all the programs identified, i.e. they can be used tomorrow once they have been ratified and once I have my “pound of flesh”, of course.

I would estimate a fairly short amount of development is required to actually modify the current code to pick up a template entry as designated by the user. e.g. #3 or "CO " or "C1 " or “BHT1” or “JO01” or “PL01” etc. and “spit” out the keyword metadata in the desired format.
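A rough sketch of that lookup in Python, just to show how little is involved. The ids `C1` and `LIB1` are of the kind suggested in this thread, but the two formatter functions are only one reading of what those layouts produce, and `Layout`, `TEMPLATES` and `format_keywords` are made-up names, not anything that exists in DxPL.

```python
from typing import Callable, Dict, List, NamedTuple


class Layout(NamedTuple):
    dc: List[str]  # entries destined for dc:subject
    hr: List[str]  # entries destined for lr:hierarchicalSubject


def _unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def all_levels(paths: List[str]) -> Layout:
    """Assumed 'all assigned' layout: every level appears in both fields."""
    dc, hr = [], []
    for path in paths:
        parts = path.split("|")
        dc.extend(parts)
        hr.extend("|".join(parts[:i]) for i in range(1, len(parts) + 1))
    return Layout(_unique(dc), _unique(hr))


def flat_only(paths: List[str]) -> Layout:
    """Assumed 'library' layout: flat keywords only, no hierarchy written."""
    dc = [part for path in paths for part in path.split("|")]
    return Layout(_unique(dc), [])


# Hypothetical table: template id -> formatter function.
TEMPLATES: Dict[str, Callable[[List[str]], Layout]] = {
    "C1": all_levels,
    "LIB1": flat_only,
}


def format_keywords(template_id: str, paths: List[str]) -> Layout:
    """Look up the template chosen by the user and apply it."""
    return TEMPLATES[template_id.strip()](paths)
```

Calling `format_keywords("C1 ", ["animal|bird|crow"])` would give all three words in `dc` and all three paths in `hr`, while `"LIB1"` would leave `hr` empty. Adding a new layout is then one more entry in the table.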

This is not so much about standards but choice, which can include the strictest adherence to standards or compatibility with the “worst” program and everything in between, as required.

All the real work has already been done by DxO, by the look of things it has been functioning since PL3, i.e. the parsing of the image metadata into the database structures and then re-formatting them on output, with PL3 and PL4 only for exports but with PL5 that was extended to outputting to the image metadata as well.

There is design work to be completed to agree how the table should be maintained and, hopefully, accessible to the users (please see the “For what “price” can this change be achieved”:- item in the original post) but this is way easier than much of the coding DxPL already contains!

The difference (if it can be implemented, which I believe it can) is transformative both in what it does to the keyword data format (literally) and what it does to the product and what it does for those users who want a “compatible” keyword handler as part of their “favourite” editor.

Very well put. I know I seem somewhat OCD about the MWG Guidance document but I thought, if there were a guiding standard, that should be it.

However, since even Adobe, who helped draw up MWG, can’t be bothered to stick to it, then the idea of “brew your own” for compatibility seems an eminently sensible idea.

Not forgetting that hierarchical keywords are only normally used for transmission of such structures from one DAM to another, never for use by image libraries and agencies - all they care about is the dc:subject tag. Which means that, if all words from any hierarchies are not included, an author’s images are going to be difficult to find.

@Joanna and this shows which will or will not be “friendly” to this principle

Update: i.e. those with an “A” (ALL) in column 3 will contain all component keywords from an hierarchical keyword.

The “yellow” rows are non-hierarchical packages and if they ever contain hierarchical keywords they will be in the ‘dc’ fields. Typically they will be “stolen” and placed in the ‘hr’ fields but to maintain compatibility with the package DxPL would need to put any hierarchical keyword back into the ‘dc’ field as per the template!

But my proposal is for the format templates to be attached to export profiles so we can have one for “return” to or compatibility with ACDSee and one that meets the standards e.g. one designated AC and one designated e.g. JO01 or BHT1 etc.!

I do hope DxO realises the potential of this approach @Musashi!

The interesting thing would be to see what the “unfriendly” programs make of format #3 (Capture One and PL5 both releases with ALL assigned).

But the “proposal” is compatible with “good”, “bad” or “ugly” and flexible enough to take “excellent” etc. as well, this should be an inclusive design as much as possible!

I’m not sure about that, supposing that they use many other tags from IPTC’s set. CapOne provides a separate set for one of the larger providers, just to mention one thing.

Back to templates: Imagine having to support the templates, every little change that e.g. Adobe comes up with, would need to be dealt with quickly in order to not be buried in complaints. Multiply this by the number of apps out there and load will grow accordingly. Add issues like duplicate “Title” tags and and and…but from a technical point of view, templates might be a viable option.

@platypus the templates are only used for keyword metadata not the rest of the metadata and if there are major changes caused by changing/evolving standards then DxO will need to keep abreast of them with respect to the input process, so adding or amending an existing template is hardly going to be a major item.

In truth if the various packages start changing keyword layouts on a regular basis their users will be as vocal as they were with DxO, it is not something that is going to happen on a routine basis; DxO has continued with the same format since PL3 and would still be using it if they hadn’t over-reacted!

Currently some users are using a “DAM sandwich” to “re-align” the keyword data of exported images; wouldn’t it be better if that data simply didn’t need any “adjustment”?

The proposed scheme applies not only to the keyword data taken from the image but also any keyword data added in DxPL, in fact if something as simple as a ‘Rating’ is changed in PL5 the user currently won’t want to write that back to the image for fear of “damaging” the keyword format!

The use of the templates, providing they work as I believe they will, removes barriers to making DxPL a fully integrated element of the work flow; the benefits far outweigh any negatives by a long, long way!

Thank you very much for your deep analysis and comparative study with other applications. This will be helpful for us.

So if we understand correctly, you want:
1/ to control if all hierarchical levels are written in DC subject
2/ to control if the KW search uses all hierarchical levels to return results
3/ to control if all hierarchical levels should be assigned when assigning a KW from PL

That is all we have planned to introduce in an upcoming PL6.X version.

Are we missing a use case here?

In parallel, I just want to recall that our team has plans to improve the KW and Database management, this takes time and is done step by step (as the 3 previous proposed solutions). As already stated by @CaptainPO in the previous post, we do put efforts in this part of the app but cannot answer to all requests.

Best regards

@musashi the answer is yes and no: yes, these are the items that you identified in your response and yes, they were the items that we responded to, but no, that is not what I am proposing here.

Item 1 returns DxPL to the pre-PL5.2.0 formatting rules that have been in use since PL3 as far as I can tell. At the time of the closing of that post I had not completed the work shown here and what was proposed was better than what is currently available by a long way!

But item 1 can and should, in my opinion, be improved upon using the keyword format tables (templates) described in this topic.

However, in spite of the copious complaints about PL5 damaging keywords etc. from users this topic and Keyword Format Templates - A more flexible way of working with keywords in DxPL have almost zero comments from any of those complainants so I can only think that they are all completely happy with what happens with DxPL keywording!?

In the meantime I have chosen to try my hand at coding, the first time since 2009, and have started writing my own keyword (re-)formatting utility in Python to make use of the work that I did!?

While I suggested incorporating the table into the database and @platypus “hinted” that selecting formats from a drop-down list would be good, I considered that I had already asked for a lot (maybe a few days of coding) and I left drop-down lists out of the request, nice though they would be!?

2022-07-26_104223_Original versus Format Template pseudo-code_W.pdf (4.9 MB)

The table could be embedded in DxPL initially, as shown above from my Python utility, and eventually opened up to users for additional templates e.g. “LIB1 8A-A----” which would “flatten” hierarchical keywords into the ‘dc’ fields for Library use etc.

However, with Item 3 in the above list implemented I will be able to assign “all” easily (rather than photo by photo), which produces the same keyword layout as Capture One, with both the pre and post PL5.2.0 formats, according to my table. In the meantime I will be able to use my own utility!

@Musashi and @CaptainPO what I offered here was something no other software offers but …

This morning, I tested hierarchical keywords (again) and found that both Capture One and Lightroom

  • include the complete hierarchy flattened under the dc:subject tag, no matter if I add the complete hierarchical path or just the leaf keyword
  • include, in the hierarchical tag, one line per keyword that is shown in the keyword field, e.g.
    root|branch|leaf…if only the leaf is visible in the keyword field of the GUI
    or
    root
    root|branch
    root|branch|leaf…if all levels are shown.

This seems to be “industry standard” notation and I propose that DPL should go along with that.
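In Python terms, what I observed could be modelled roughly like this. The function names are mine and this only describes the output I saw in the sidecars, not how either app works internally.

```python
def flattened_subject(path, sep="|"):
    """dc:subject entries: every level of the path, whatever is visible."""
    return path.split(sep)


def hierarchical_entries(path, all_levels_visible, sep="|"):
    """Hierarchical tag entries: one line per keyword shown in the GUI."""
    parts = path.split(sep)
    if all_levels_visible:
        # root, root|branch, root|branch|leaf
        return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]
    # only the leaf is visible: a single full path
    return [path]


print(flattened_subject("root|branch|leaf"))
print(hierarchical_entries("root|branch|leaf", all_levels_visible=False))
print(hierarchical_entries("root|branch|leaf", all_levels_visible=True))
```

The point is that the flattened list is the same in both cases; only the number of lines in the hierarchical tag changes.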


Summary:- @platypus if that is all you want then you have it already with the pre PL5.2.0 release with all keywords in a hierarchy assigned. With Post PL5.2.0 you are “light” all but the leaf in the ‘dc’ fields.

With the proposed PL6 release the feature will be fully available alongside everything else in that release (and in the PL5 releases from PL5.2.0 onwards)!

@platypus from my table I did not see that in my tests of Lightroom but did for Capture One, i.e. my tests of Lightroom did not invoke the feature, meaning that there is more than one option in Lightroom, one that conforms to PL5.1.4 and another that conforms to Capture One and PL5.1.4 with all keywords assigned. So thank you for adding to the table, a snapshot of how you achieved that would be useful.

But the whole point of my “design” is that you are doing to hierarchical keywords what PL5.1.4 did to ‘dc’ keywords and you are not matching the user’s system in any way, shape or form. Why is that any better than anything else? There will be groans that there are “too” many hierarchical keys cluttering up their keywords etc. etc.

In addition it actually requires nothing other than the option to return to the PL5.1.4 layout to populate the ‘dc’ keys; hence, this populating of the ‘hr’ fields is currently achievable with all versions of DxPL5 with all items assigned (but with PL5.2.0 onwards the ‘dc’ keys do not conform). On Windows in particular this is a chore that must be repeated for every image until the changes highlighted by @Musashi are implemented!

Had DxO not changed the format it would be correct right now!

If the table I have suggested is embedded in DxPL with additional entries for Photo Supreme and the alternative for Lightroom that you have described, and any other possibilities that we discover, then I believe that we are talking about days of work! I have retained the identity of the package rather than distilled the formats down to #1 to #7 because users are more likely to be comfortable using the id. of their own package.

The output you are proposing would fit with your use of Capture One and the PL5.1.4 format would fit with the Lightroom format I encountered, but how is a Photo Mechanic user going to react to more ‘dc’ entries and more ‘hr’ entries? Given the reaction to this topic there will be no reaction whatsoever, but I don’t believe that for one minute, and what happens with ACDSee users I dread to think!

EDIT:-

Getting Lightroom to include ALL combinations is achieved in much the same way as in DxPL, i.e. by assigning from the keyword tree. When I was testing I was looking for options in the ‘preferences’, i.e. the “wrong” place!


and I still don’t believe that “one size fits all” is the right approach when a fully “tailored” solution is a bit of coding away, why swap one problem for another when a better solution is within reach!?

as far as I’ve seen in my tests, we have to consider four aspects.

  1. how keywords are embedded in XMP
  2. what that embedding does to finding assets
  3. how apps add keywords (user->gui->xmp)
  4. how apps display keywords (xmp->gui)

CapOne does the thing I find most logical: When I add crow in the keywords panel, C1 automatically adds bird and animal. If I wanted to remove animal and bird, I’d have to act in the keyword list.

If I want DPL and LrC to behave like C1, I have to enter keywords using the keyword list panel, because adding crow in the keyword panel will not automatically add bird and animal.

The difference of what is displayed in the keyword panel (on import of updated metadata) is based on what is in the hierarchical subject tag and not based on the dc:subject tag, provided all levels turn up, which was always the case when keywords are added using the keyword list or by C1 respectively.

Other keyword managers might do different things, so some testing would be required by someone having those apps…

@platypus let’s start at the end and work backwards! With respect to the layout of keywords in the ‘dc’ fields and ‘hr’ fields there is no real mystery as you seem to imply!? The formats that I determined are as shown in the tables and I will add “Lightroom ALL assigned” to the spreadsheet, thank you for that.

You seem intent on casting doubt on what is actually a very straightforward process.

I originally stated that I might have missed options and that involving other users was essential but the “rules” that I have provided will work with the packages and options I have tested and “Lightroom ALL assigned” falls into category #3, I tested it a little earlier.

The feature I am suggesting does nothing about improving DxPL keyword entry, it is “simply” about transforming what would have been output into a slightly different format. If the capability existed in DxPL or any of the other packages we would be wondering what all the “fuss” was about, but it doesn’t, either you have to accept what the package provides or you…!?

and what is this statement supposed to “teach” us!? Of course it is to do with the hierarchical keywords, except that @Joanna would suggest that if the correct data is not in the ‘dc’ fields then searching is not going to work. This is the part that the format templates are designed to handle, basically stopping DxPL from using its own format and having it adopt one that “coincides” with another package or one that achieves something desired by the user.

For example, I don’t like the automatic transfer of simple keys to the ‘hr’ fields, but all the packages, except Photo Mechanic, do just that. With a format of ‘A-AAF-C’ (which should really be ‘A-AA--C’) I would get all the features of Capture One and other #3 formats but without the ‘dc’ fields being copied to the ‘hr’ fields!

The use of the #1 formats might break some “rules” if hierarchical keywords wind up in the ‘dc’ fields but at least ACDSee etc. would be able to work with them!

With a format of ‘A-A—’ this would convert an hierarchical set of keys to a flat set of simple keys more suited to museums and libraries etc.

While I am glad that Capture One helps you complete your keyword correctly the feature proposed by @Musashi will automatically assign all keywords in an hierarchy rather than just the ‘Leaf’ or ‘Last’ item or allow you to miss a selection in the hierarchy.

However the “C” (shortened to “C” from “AC” in the spreadsheet) in column 6 of the table will automatically do the same thing!

PS:-

@platypus I don’t mind a critique if it actually adds anything to the story and if you had made the following point then I would know that you understand what I have written but it is up to me to make it instead but then it has only just occurred to me! I have been concentrating on the exports and I do not believe that there is an issue with them!

But I have also proposed this for the ‘Write to image’ and therein lies a “problem”. DxPL will potentially be using a transform to reformat the data going to the image which will (potentially) no longer match the database (or will it, mostly an issue with AC?) and the DOP, i.e. there is a possible need to read the reformatted image keyword data after it has been written to keep the image and database (and DOP) in line.

This poses another potential problem which I need to review on Sunday, what happens if reformatted data is passed through the reformatting process again (I don’t think there is a problem but I need to look at that more closely)!

I’m sorry Bryan but all your spreadsheets do is to confuse me.

I think I have discovered something that might “throw the cat among the pigeons”.

I start by using my app to create an XMP file for an image, which then contains…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Mammal</rdf:li>
     <rdf:li>Bear</rdf:li>
     <rdf:li>Black Bear</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Animal|Mammal</rdf:li>
     <rdf:li>Animal|Mammal|Bear</rdf:li>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

I then manually edit that sidecar to remove the dc:subject tag (as I see in your table PhotoMechanic can write this)

   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Animal|Mammal</rdf:li>
     <rdf:li>Animal|Mammal|Bear</rdf:li>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

Next, I open CaptureOne 12, create a session and add just that image. This has the side effect of writing an XMP sidecar for every image in the same folder, which is very annoying.

But, not only does it do that, it also rewrites the dc:subject tag that I had removed back again…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Mammal</rdf:li>
     <rdf:li>Bear</rdf:li>
     <rdf:li>Black Bear</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Animal|Mammal</rdf:li>
     <rdf:li>Animal|Mammal|Bear</rdf:li>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>

If I strip the sidecar down to just…

   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

… then CaptureOne “updates” this to…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>Black Bear</rdf:li>
    </rdf:Bag>
   </dc:subject>
   …
   <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>

So, it seems that CaptureOne takes it upon itself to “normalise” keyword metadata if it deems it is non-standard.

The only problem with the second example is that the dc:subject tag doesn’t contain all of the flattened hierarchical tags.
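To make the two results easier to compare, here is a small Python sketch. It assumes (and this is only a guess that happens to fit both sidecars above) that the dc:subject written back is simply the last component of each hierarchical entry. `leaf_of_each` and `missing_from_dc` are names I made up for the illustration.

```python
def leaf_of_each(hier_entries, sep="|"):
    """Last component of every hierarchical entry, duplicates removed."""
    leaves = [entry.split(sep)[-1] for entry in hier_entries]
    return list(dict.fromkeys(leaves))


def missing_from_dc(hier_entries, dc_subject, sep="|"):
    """Hierarchy components that a flat search on dc:subject would not find."""
    wanted = []
    for entry in hier_entries:
        wanted.extend(entry.split(sep))
    return [word for word in dict.fromkeys(wanted) if word not in dc_subject]


full = [
    "Animal",
    "Animal|Mammal",
    "Animal|Mammal|Bear",
    "Animal|Mammal|Bear|Black Bear",
]
sparse = ["Animal|Mammal|Bear|Black Bear"]

print(leaf_of_each(full))    # ['Animal', 'Mammal', 'Bear', 'Black Bear']
print(leaf_of_each(sparse))  # ['Black Bear']
print(missing_from_dc(sparse, leaf_of_each(sparse)))  # ['Animal', 'Mammal', 'Bear']
```

Under that assumption the first sidecar ends up complete only because every level had its own hierarchical entry; the second one leaves three words unsearchable.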

What are your thoughts on that?

@Joanna
In my recent tests, I found the following:

As long as there are entries in the hierarchical subject tags, apps can (and will) add entries in dc:subject as you discovered. BTW, Lightroom does it too.

Here’s an archive of XMP files showing how keywords are written by Lightroom with an example hierarchy of AAAnimal>BBird>Crow (silly notation to make it easily visible in my keyword list) and combinations of keywords saved, you’ll find what was added by the respective names of the files. Note that I’ve relieved the files from anything not related to keywords.

Archiv.zip (5.3 KB)

You’ve found a “feature” in Capture one!

The original formats (templates) that I “discovered” were all from the use case of entering the data directly into the package and forcing the package to write that data back to the image (i.e. create an xmp sidecar for RAW).

The tests were repeated multiple times even when I was using the extended syntax of animals|mammals|bear|black bear and then yet again when I used the abbreviated syntax of as|ms|b|bb.

Throughout those tests (the original tests used 4 JPGs and 4 RAWs) the results were always consistent; there were so many tests because I was reluctant to “publish” anything that might be “flawed” or too badly “flawed”.

I was also concerned about my lack of experience with the various packages and certainly missed the Lightroom All assigned situation that @platypus found!

But the results were consistent and appear to work for the principle of matching the output that is equivalent to the given use case, i.e. the creation of keywords afresh by entering the data directly into each package.

So I believe that all is O.K. for using the templates to create exported files that match the outputs that the respective packages would create if the data was entered into those packages, via their own UI.

My intention was to keep users “happy” because their keyword layout was intact in the exports, I was not trying to create an emulator for the other packages, with respect to keyword handling!

It is rather sad that Capture One doesn’t seem to stick to its own “rules” when it encounters the situation where there is an hierarchical field that has no data in the ‘dc’ field. I repeated a number of tests and Capture One mostly but not completely leaves the originals alone, i.e. it adds “bb” to the ‘dc’ fields.

Adding an “x” does nothing surprising and is added to the ‘dc’ and ‘hr’ fields as nearly all the packages do, but which Capture One didn’t do when it decided to “liberate” “bb” from the hierarchical keyword, if it was following any rules shouldn’t the “bb” also have wound up in the ‘hr’ fields!

Deleting and re-inputting the “as|ms|b|bb” keyword into C1 results in the return of the original format!

Similar “anomalies” in behaviour might exist with other packages when confronted by data that has been externally generated. In fact on Friday I put my development of a Python “converter” to one side and started to look at the development of an xmp sidecar generator.

The “generator” would take any keyword combination and generate a “labelled” xmp sidecar file for each of the formats I have currently documented. These could then be associated with any RAW image to undertake the kind of test you undertook using manual intervention, in order to speed up the testing process! I need part of that code for the convertor anyway!
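A minimal sketch of what the core of such a generator might look like, using only the Python standard library. The namespace URIs are the usual ones for Dublin Core, RDF and Lightroom; the helper names (`q`, `add_bag`, `build_sidecar`) are invented for this example, and the output is deliberately bare: it omits the `xpacket` wrapper and everything else a real sidecar carries, so whether a given package accepts it as-is would still need testing.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes on output


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{tag}"


def add_bag(parent, prefix, tag, values):
    """Append <prefix:tag><rdf:Bag><rdf:li>...</rdf:li></rdf:Bag></prefix:tag>."""
    bag = ET.SubElement(ET.SubElement(parent, q(prefix, tag)), q("rdf", "Bag"))
    for value in values:
        ET.SubElement(bag, q("rdf", "li")).text = value


def build_sidecar(dc_values, hr_values):
    """Return a bare-bones XMP document holding only keyword fields."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"))
    desc.set(q("rdf", "about"), "")
    if dc_values:  # an empty list means the field is left out entirely
        add_bag(desc, "dc", "subject", dc_values)
    if hr_values:
        add_bag(desc, "lr", "hierarchicalSubject", hr_values)
    return ET.tostring(xmpmeta, encoding="unicode")
```

For example, `build_sidecar([], ["as|ms|b|bb"])` gives the “hierarchy only, no ‘dc’” case, and the returned string can be written next to a RAW file with the same base name and an `.xmp` extension.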

My coding skills are still raw (pun intended) but now that I have abandoned the IDEs for a combination of Hippo Edit and IDLE I am not trying to fix errors flagged by the IDEs that are perfectly acceptable to Python!!??

That brings me to my biggest concern, or rather biggest concern after the complete indifference of most of the users who complained in the first place, namely using this technique to format the metadata written back to the image.

With AS(OFF) such a ‘Write to image’ would have to be user initiated. The existing DOP and database would continue to generate the same formatted keywords without being updated, but if a KFT (Keyword Format Template) formatted ‘Write to Image’ is made then that should be followed either by an ‘S’ icon or by an automatic ‘Read from Image’ as appropriate. Now we have the situation of a changed database structure, in line with the KFT “rules”, then being subjected to the KFT rules again in any subsequent export or ‘Write to image’, which bears a slight resemblance to the Capture One case you cited @Joanna.

Arguably users would only ever need to write back to the image if they had used DxPL to make metadata changes but even a simple ‘Rating’ change that the user wanted to keep will currently change the format of the image keywords, the whole crux of the problem (or so I thought)!

So to summarise

  1. I don’t think this “sinks” the KFT “model” for exports at all.
  2. I am concerned about the lack of consistency with Capture One and what it might tell us about the reliability of software to have rules that they consistently use.
  3. The issue of exporting using KFT formatting back to the image and the implications on the process either immediately or when such images are “discovered” anew by DxPL (I think I am overthinking the dangers but …)

Once again @Joanna and @platypus thank you for your testing, I still believe that KFT “has legs” and do intend to complete the sidecar generator and the format convertor because they will be useful tools should I ever bother to test keywords again and getting back into coding is interesting, but me and IDEs don’t seem compatible!?

The second example reflects a situation, in which you have added Black Bear and then removed its forebears

The hierarchical keywords structure contains the relevant information about keywords and their relations, dc:subject seems irrelevant - at least as far as keyword display is concerned in the apps we tested with. Maybe we need to take on a new view?

Nope, my four bears are still there…

It might be all well and good for those apps which use the hierarchy to build the display but, it really doesn’t help with those apps which rely on the dc:subject tag for searching - like macOS Spotlight, which knows nothing about lr:hierarchicalSubject.

…then it’s welcome that C1, Lr etc. rebuild the dc:subject entries.

I’ve not yet come across an app that does not write dc:subject and modifying it “externally” is not a way to manage that tag regularly imo.

Making sure that animal>mammal>bear is added together should provide the best possible foundation for exchange of keywords between apps…which means that I’d have to enter keywords from the keyword lists, rather than typing them in, which does the fourbears in DPL and LrC.

@Joanna, @platypus the Capture One “interpretation” of the Photo Mechanic output appears to show C1 avoiding any radical changes to the original layout but adding the “leaf” keyword to allow for searching generally or because it needs that for its own internal management.

I think my concerns are more philosophical than real with respect to keeping DxPL in line with any external reformatting, because if we changed the ‘Write to image’ to C1 format then it is the equivalent of one of the DxPL formats and would show up as All assigned in DxPL anyway.

If using a “sparse” layout like the PM layout then the output would be restricted but the additional keywords are contained in the component keywords in the hierarchical keywords anyway and the format templates indicate how those can be translated.

Hence with a “sparse layout”, DxPL can fill in the “gaps” using the hierarchical keyword components and remove excess duplicate keywords using de-duplication logic to trim the more voluminous layouts, which could become even more voluminous when they are then passed through some of the templates!
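As a sketch of what I mean by filling gaps and trimming (the function names `fill_gaps` and `trim` are just for illustration, and this is not taken from DxPL or from my table):

```python
def fill_gaps(hr, sep="|"):
    """Add every ancestor path so a sparse layout becomes 'all levels'."""
    out = []
    for path in hr:
        parts = path.split(sep)
        out.extend(sep.join(parts[:i]) for i in range(1, len(parts) + 1))
    return list(dict.fromkeys(out))  # de-duplicate, keep first-seen order


def trim(hr, sep="|"):
    """Keep only the paths that are not an ancestor of another path."""
    unique = list(dict.fromkeys(hr))
    return [
        path for path in unique
        if not any(other.startswith(path + sep) for other in unique)
    ]


sparse = ["as|ms|b|bb"]
full = fill_gaps(sparse)
print(full)                     # ['as', 'as|ms', 'as|ms|b', 'as|ms|b|bb']
print(fill_gaps(full) == full)  # True: a second pass changes nothing
print(trim(full))               # ['as|ms|b|bb']
```

Because the duplicates are removed on each pass, running already reformatted data through the same step again is a no-op in this sketch, which is what I would hope for from a real implementation. Note that `trim` cannot tell a deliberately assigned intermediate level from one that was only filled in, so that information would be lost.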

Hence I think it is better to update DxPL with the new keyword layout when it has been written to the image to ensure that DxPL is tracking what is on disk but it would probably have been useful from the beginning to allow the original image data to be (optionally) preserved (always a useful feature when implementing something new!?).

Plus @Musashi can we have a feature to enable the “standard” launch background to be replaced by a user image, I was bored with the current one before we even finished Beta testing, e.g. @Joanna’s fourbears or

More than four bears plus associated “friends” and their fans! My old teddy and my wife’s are somewhere in the loft, Fredbear and threadbear!