PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords

BHAYT · June 20, 2022, 8:55am

During discussions with @Joanna it emerged that searches for items in an hierarchy didn’t necessarily find the “expected” keywords in the hierarchy and I think this is happening because of a “bug” as follows @sgospodarenko.

In a previous post I had suggested to @Joanna that searching Keywords should be simple with PL5, but she responded that this was simply not the case because it didn’t seem to work correctly. I do not understand exactly what DxO think they are doing: when I said the searching was simple to do, that was based on the database structure(s) available to execute such a search. So for scenario 3 (“animals|mammals|bear|black bear”) we have

2022-06-19_110821_PL5db Keywords available to search

where it is clear that all the keywords are available for searching and the hierarchical keys can be recreated using the ‘ParentId’.

The ItemsKeywords structure is effectively the link back to the image(s) (‘Items’) using ‘ItemId’ that contain the keywords and a forward link from the ‘Items’ to their keywords using ‘KeywordId’.

2022-06-19_110931_PL5db ItemsKeywords available to limit a search

But actually it isn’t and therein lies the reason why the searches don’t always work!

I tested the searching and in this particular test where “animals” is checked, “bear” is checked along with “black bear” but “mammals” is left unchecked and the searches return the following responses

2022-06-19_111217_PL5 search for black
2022-06-19_111307_PL5 search for bear
2022-06-19_111357_PL5 search for mammals
2022-06-19_111452_PL5 search for animals

Although a search would find “mammals” in ‘Keywords’ (Id=7), there is no entry in ‘ItemsKeywords’ to tie it back to an ‘Items’ image (no ‘KeywordId’ = 7 exists)! I believe the problem is as “subtle” as this: the ‘ItemsKeywords’ structure is misnamed and should be ‘ItemsKeywordsSelected’, i.e. it does not contain all the keywords associated with an image but only those that have the check box marked.

So for scenario 3 the hierarchy is re-established for display and output using the ‘Keywords’ structure “black bear” (9) < “bear” (8) < “mammals” (7) < “animals” (6), and the qualifying data for the “selected” (checked) items comes from ‘ItemsKeywords’, where Image 3 is related to ‘Keywords’ entries 9 (“black bear”), 8 (“bear”) and 6 (“animals”) but NOT 7 (“mammals”); hence 7 (“mammals”) is not found in a search.

Hence, whenever a search is made for a keyword that is not “selected”, it is not present in ‘ItemsKeywords’ and the search terminates as “NOT Found”.
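A minimal sketch of the behaviour described above, using Python's built-in `sqlite3` with an in-memory database. The table names and the `ParentId`, `ItemId` and `KeywordId` columns follow the screenshots; the `Value` column name, the exact schema and the query itself are assumptions made for illustration and are not DxO's actual code. The real search also appears to match partial words (e.g. “black”), whereas this sketch uses an exact match.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database mimicking scenario 3 (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT, ParentId INTEGER);
        CREATE TABLE ItemsKeywords (ItemId INTEGER, KeywordId INTEGER);
        INSERT INTO Keywords VALUES
            (6, 'animals', NULL), (7, 'mammals', 6),
            (8, 'bear', 7), (9, 'black bear', 8);
        -- Image 3: "animals", "bear" and "black bear" checked, "mammals" unchecked
        INSERT INTO ItemsKeywords VALUES (3, 6), (3, 8), (3, 9);
    """)
    return con

def search_selected_only(con, term):
    """Return item ids whose *checked* keywords match the term exactly."""
    rows = con.execute(
        "SELECT DISTINCT ik.ItemId FROM Keywords k "
        "JOIN ItemsKeywords ik ON ik.KeywordId = k.Id "
        "WHERE k.Value = ? ORDER BY ik.ItemId", (term,))
    return [r[0] for r in rows]

con = build_demo_db()
for term in ("animals", "mammals", "bear", "black bear"):
    print(term, "->", search_selected_only(con, term))
```

With this data, every term returns image 3 except “mammals”, which returns an empty list because no ‘ItemsKeywords’ row carries ‘KeywordId’ 7.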

The simplest way to resolve this problem would be to have an entry for every keyword for each image and carry a flag in the record that identifies whether the item is “selected”; the searches would then work (I believe), the checkboxes could be correctly reconstructed for display and the items included for output!

Either the structure is being “overloaded”, i.e. pressed into service for a purpose it was not intended (searching) or someone thought it could be used for both purposes but that is simply not true in the way that it is currently designed. Unfortunately getting from what is currently in the database to what is required needs migration software as well as a fix.

EDIT 01:- While I was explaining this issue to my wife at breakfast (we met when I was assigned to help a customer build their Mortgage and Investment systems and my “wife to be” was their Database Administrator), she pointed out that it would be good if all items were selected, which does happen when “bear” in the hierarchy is selected, but I feel that it is wrong for the search to include only “selected” items!

Either the structure needs to have a new field added or it could be handled by creating another structure, the “real” ‘ItemsKeywords’, e.g. ‘ItemsKeywordsAll’ which would then be used as part of the search but the first time it is created it would need to be populated using the “Leaf” pointers from ‘Keywords’ to ‘ItemsKeywords’, arguably the only pointer currently “guaranteed” to exist and then create an entry for every keyword located using the 'ParentId’s in ‘Keywords’
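The population step suggested above could look roughly like this. It is only a sketch under stated assumptions: `parent_of` stands in for the ‘Id’ to ‘ParentId’ mapping in ‘Keywords’, `selected` stands in for the ‘KeywordId’ values currently in ‘ItemsKeywords’ for one image, and the function name and the “selected” flag layout are hypothetical.

```python
def expand_with_ancestors(parent_of, selected_ids):
    """Return the selected keyword ids plus every ancestor reached via ParentId."""
    expanded = set()
    for keyword_id in selected_ids:
        current = keyword_id
        # Walk up until the root (parent None) or an already-visited node.
        while current is not None and current not in expanded:
            expanded.add(current)
            current = parent_of.get(current)
    return expanded

# Keywords table from scenario 3: Id -> ParentId
parent_of = {6: None, 7: 6, 8: 7, 9: 8}
# Default Win 10 state: only the leaf "black bear" (9) is selected
selected = {9}

all_ids = expand_with_ancestors(parent_of, selected)
# Each row keeps a flag saying whether the box was actually checked.
items_keywords_all = sorted((kid, kid in selected) for kid in all_ids)
print(items_keywords_all)
# [(6, False), (7, False), (8, False), (9, True)]
```

A search could then run against every row, while the checkboxes and the metadata output would use only the rows flagged as selected.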

BHAYT · June 20, 2022, 10:57am

Placeholder to allow post stats to be monitored and for anything I think of pertinent to the topic!

platypus · June 20, 2022, 1:23pm

Tried to test the behaviour of DPL 5.1.3.55 on macOS Monterey on iMac 2019 and got the following:

Test 1

added the “animal > mammal > bear > black bear” hierarchical keywords
one image got “black bear”
one image got “black bear” + “bear”
one image got “black bear” + “bear” + “mammal”
one image got “black bear” + “bear” + “mammal” + “animal”

Search then gave me 4 entries each for the four keywords

When I added the keywords to additional images, the search results would not update. Search results were updated when I restarted DPL.

Test 2

added “insect > bug > junebug” below “animal”
added the new keywords to a few images and restarted DPL

This is what I got:

recent searches remember 5 searches
the number of search results (pink box) follows the logic above (8 animals, 4 mammals etc.)
what the heck are the numbers left of the keywords in the tree (green box) ?

I’ll now delete the database and ignore keywords for a while…

For those interested: XMP sidecars at the end of the test:
XMP.zip (7.3 KB)

Joanna · June 20, 2022, 2:01pm

I had not looked at the database from this angle before because I regularly trash it and rely on either metadata embedded directly in RAW files or XMP sidecars.

But what you point out is a serious flaw, which, if not corrected, will prevent the search mechanism from doing anything other than the simplest of searches. It is already pretty limited, only being able to create predicates for keywords in precise contexts and, then, only being able to AND multiple sub-predicates rather than being able to search for wider contexts and use OR predicates.

What you have noticed in the database is pretty bad but, from an XMP, cross-application, compatibility point of view, it can be a total disaster.

I say “can be” because, if a user ensures that all the parents of a selected keyword are checked, as well as the desired keyword, everything works absolutely fine and I haven’t found any compatibility problems reading and searching for PL5 written XMP in other apps.

However, if a user tries to remove any of the parents of a selected keyword, the whole situation changes and reading and searching compatibility with other apps is totally “screwed”.

If I take the case of someone only selecting the leaf node of a hierarchy…

Capture d’écran 2022-06-20 à 15.18.46

… everything might seem fine, but only as long as you only use PL5 to read or search.

Using macOS Finder’s powerful search on the folder that contains the above marked file, it can certainly find Black Bear…

Capture d’écran 2022-06-20 à 15.36.00

… but, because Black Bear is also the name of an excellent craft beer, I want to ensure I only find Black Bears that are Mammals but not Beers. So, I add Mammal into a combined AND predicate…

Capture d’écran 2022-06-20 à 15.37.11

Lo and behold, macOS’s powerful Spotlight search mechanism is totally defeated.

And, if I take the simple example of trying to find all images that contain Mammal, regardless of their species…

Capture d’écran 2022-06-20 à 15.37.39

… once again Spotlight fails to find our file. Why? Because the only metadata tag that most software relies on for searching for keywords is dc:subject.

If we take a look at the XMP sidecar for this file, it instantly becomes apparent why Finder and many other apps can’t find anything other than the leaf node…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Black Bear</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

In complete violation of the MWG Guidance document, the constituent parts of the lr:hierarchicalSubject tag are not written to the dc:subject, as required - effectively blocking everything apart from PL5 from searching for anything other than the leaf node keyword, since lr:hierarchicalSubject is not normally searchable.
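A quick way to spot this in a sidecar is to compare the two tags. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URIs are the usual ones for `rdf`, `dc` and `lr`, but the bare `rdf:Description` wrapper, the `SAMPLE` fragment and the function name are assumptions for illustration (a real sidecar wraps this in further elements and carries many more tags).

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Simplified stand-in for the sidecar content shown above.
SAMPLE = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
  <dc:subject><rdf:Bag><rdf:li>Black Bear</rdf:li></rdf:Bag></dc:subject>
  <lr:hierarchicalSubject><rdf:Bag>
    <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
  </rdf:Bag></lr:hierarchicalSubject>
</rdf:Description>"""

def missing_from_dc_subject(xmp_fragment):
    """List hierarchy nodes that are not repeated in dc:subject."""
    root = ET.fromstring(xmp_fragment)
    flat = {li.text for li in root.findall("dc:subject/rdf:Bag/rdf:li", NS)}
    nodes = set()
    for li in root.findall("lr:hierarchicalSubject/rdf:Bag/rdf:li", NS):
        nodes.update(li.text.split("|"))
    return sorted(nodes - flat)

print(missing_from_dc_subject(SAMPLE))
# ['Animal', 'Bear', 'Mammal']
```

Any keyword reported here is one that software relying on dc:subject alone cannot find.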

Is it any wonder I only use my own keywording app and regularly delete the PL database?

As for hoping to include meaningful keyword metadata to an image file exported from PL5 - dream on. ExifTool reveals that such files are equally as unsearchable…

[XMP]           Subject                         : Black Bear
[XMP]           Hierarchical Subject            : Animal|Mammal|Bear|Black Bear

It would be really nice if someone from DxO would comment on all this work that Bryan (@BHAYT) and I have done on this subject?

BHAYT · June 20, 2022, 2:05pm

@platypus the answer lies in the selection boxes beside the keywords

When the hierarchical keyword “animals|mammals|bear|black bear” or “animals>mammals>bear>black bear” or “black bear<bear<mammals<animals” is entered (all three syntaxes are accepted by Win 10 PL5), the entry you get in the table is as shown, with only the “leaf” ticked. You can move up the tick boxes, and if I tick “bear” then all above will also be selected, but by default what I have shown is what I get: only “black bear” will be found because only “black bear” has been selected and gets an entry in ‘ItemsKeywords’.
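For illustration only, the three input syntaxes can be reduced to one root-to-leaf list like this. How PL5 actually parses its keyword field is not known; the function name is hypothetical and mixed delimiters in one entry are not handled.

```python
def parse_hierarchy(text):
    """Return keyword levels from root to leaf for any of the three syntaxes."""
    if "|" in text:
        levels = text.split("|")
    elif ">" in text:
        levels = text.split(">")
    elif "<" in text:
        levels = text.split("<")[::-1]  # leaf-first input, so reverse it
    else:
        levels = [text]  # a simple, non-hierarchical keyword
    return [level.strip() for level in levels]

for entry in ("animals|mammals|bear|black bear",
              "animals>mammals>bear>black bear",
              "black bear<bear<mammals<animals"):
    print(parse_hierarchy(entry))
```

All three entries give the same list, with “black bear” last as the leaf.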

Why all your entries are selected I do not know? But as it comes out of the box with no further input from me I will get one ‘ItemsKeywords’ entry and nothing in my search for “animals”, “mammals” or “bear”.

If on the MAC all items are selected automatically then the system is working differently from Win10 but if you uncheck one of those items and suddenly don’t get a search response then the underlying software is the same and the same “bug”/“feature” exists

Your first xmp shows the following, which is what was produced by Win 10 PL5 prior to PL5.2.0

     <dc:subject>
        <rdf:Bag>
           <rdf:li>animal</rdf:li>
           <rdf:li>bear</rdf:li>
           <rdf:li>black bear</rdf:li>
           <rdf:li>mammal</rdf:li>
        </rdf:Bag>
     </dc:subject>
     <exif:DateTimeOriginal>2007-10-24T16:51:31</exif:DateTimeOriginal>
     <lr:hierarchicalSubject>
        <rdf:Bag>
           <rdf:li>animal|mammal|bear|black bear</rdf:li>
        </rdf:Bag>
     </lr:hierarchicalSubject>

Since PL5.2.0 the ‘dc’ keywords have been reduced to

     <dc:subject>
        <rdf:Bag>
           <rdf:li>black bear</rdf:li>
        </rdf:Bag>
     </dc:subject>
     <dc:format>image/x-panasonic-rw2</dc:format>
     <exif:DateTimeOriginal>2022-05-30T16:54:28.885+00:00</exif:DateTimeOriginal>
     <lr:hierarchicalSubject>
        <rdf:Bag>
           <rdf:li>animal|mammal|bear|black bear</rdf:li>
        </rdf:Bag>
     </lr:hierarchicalSubject>

taken from my final test of your scenarios. To get the rest of the ‘hr’ keys that you show, the tick boxes need to be selected, and then the search will find the keywords (because they are “selected”).

Edit:- I cannot get both sets of data to look the same no matter what I try, hmmm - fixed!

PS I can’t remember if it was PL5.2.0 or 5.2.1 that made the transition to the reduced ‘dc’ keys @Marie can you please refresh my aging memory.

Joanna · June 20, 2022, 2:22pm

This is acceptable and correct. I obviously should have kept an older copy of PL5 to compare my current (5.3) results to.

As I have said countless times before, if dc:subject doesn’t contain all keywords mentioned in the hierarchy, nothing outside of PL5 will be able to search for anything other than the leaf node of a hierarchy.

Whichever version it was, it needs putting back PDQ!

Joanna · June 20, 2022, 2:30pm

P.S. My app doesn’t have a dedicated database - instead it uses the built-in macOS Spotlight metadata database, that is running all the time, being constantly updated as soon as anything changes in a file.

As a result, both my app and macOS Finder have no trouble at all finding anything I ever wrote using my app - in any combination of keywords or hierarchies.

When it comes to using a dedicated, private, database for PL5 for Mac, the expression re-inventing the wheel springs to mind. Or doesn’t Windows have such a mechanism that could be leveraged?

BHAYT · June 20, 2022, 3:06pm

Absolutely, or made optional and while we are at it please create a metadata preferences section @Musashi and add the options for “short” or “long” ‘dc’ and for auto selection of all elements of the hierarchy to give the mega detailed ‘hr’ entries that @platypus wound up with (not included in my snapshot above), which come close to the “ideal” that Capture One creates!? If you are worried about changing the ‘Preferences’, attach it as the last item in the ‘Files’/‘Metadata’ menu, i.e. ‘Read…’, ‘Write …’ and then ‘Options’.

As far as I know there is no OS equivalent in Win10, so it’s an SQLite database or …

Arguably, even if there was such an equivalent it would drive the two versions even further apart because they would be so different! In truth it doesn’t really matter because SQLite is a perfectly good system but only as good as the designers and coders who use it. I actually don’t feel that DxO have made a particularly bad job except for the precipitous action taken with respect to the changes in ‘dc’ keywording and the reversion to the DOP processing; both are fine by me but only via options that preserve the previous methods while allowing users to switch to the new (old in the case of the DOP) as and when or not at all @Musashi.

If I had tried that with my customers without signed approval I would have been looking for a new job!

I have preserved PL5.1.4 on my “Test” machine and it will remain there for some time I feel!

EDIT:-

The following are the exports from an image with a keyword of “L1|L2|L3|L4|L5|L6|L7|L8” where the default after adding such keyword is that only the last item is selected, i.e. “L8”. The intermediate images are what I got when I set each checkbox until all were selected!

and the final sidecar file was

     <dc:subject>
        <rdf:Bag>
           <rdf:li>L1</rdf:li>
           <rdf:li>L2</rdf:li>
           <rdf:li>L3</rdf:li>
           <rdf:li>L4</rdf:li>
           <rdf:li>L5</rdf:li>
           <rdf:li>L6</rdf:li>
           <rdf:li>L7</rdf:li>
           <rdf:li>L8</rdf:li>
        </rdf:Bag>
     </dc:subject>
     <dc:format>image/x-panasonic-rw2</dc:format>
     <exif:DateTimeOriginal>2022-05-30T16:54:28.885+00:00</exif:DateTimeOriginal>
     <lr:hierarchicalSubject>
        <rdf:Bag>
           <rdf:li>L1</rdf:li>
           <rdf:li>L1|L2</rdf:li>
           <rdf:li>L1|L2|L3</rdf:li>
           <rdf:li>L1|L2|L3|L4</rdf:li>
           <rdf:li>L1|L2|L3|L4|L5</rdf:li>
           <rdf:li>L1|L2|L3|L4|L5|L6</rdf:li>
           <rdf:li>L1|L2|L3|L4|L5|L6|L7</rdf:li>
           <rdf:li>L1|L2|L3|L4|L5|L6|L7|L8</rdf:li>
        </rdf:Bag>
     </lr:hierarchicalSubject>
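The ‘hr’ entries above are simply every prefix of the full path. A small sketch of that pattern (the function name is hypothetical, and this only reproduces the listing, it is not PL5's code):

```python
def hierarchy_prefixes(path, sep="|"):
    """Return every prefix of a hierarchical path, shortest first."""
    levels = path.split(sep)
    return [sep.join(levels[:i]) for i in range(1, len(levels) + 1)]

print(hierarchy_prefixes("L1|L2|L3"))
# ['L1', 'L1|L2', 'L1|L2|L3']
```

Called with “L1|L2|L3|L4|L5|L6|L7|L8” it yields the eight entries shown in the sidecar.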

platypus · June 21, 2022, 11:41am

I’ve tested adding hierarchical keywords in Capture One (version 12 on macOS Monterey on iMac 2019) and found the following in conjunction with DPL5 and LrC 11.4

First test: Add hierarchical keyword

I can add a hierarchy like A>B>C>D
→ same in DPL and LrC
I can add keywords like “A”, “A,B”, “A,B,C” and “A,B,C,D”
→ I can’t add e.g. “C” alone, CapOne will only do “C<B<A”
→ LrC can add a single “C”, unless it’s in the keyword list several times with different parents, in which case, LrC automatically proposes/adds a parent level (I suppose it will add all necessary levels to completely define “C”)
→ DPL can add a single “C”, no questions asked.

Second test: Hierarchical nonsense “F>F>F>F”

DPL, C1 and LrC let me do that, no questions asked
When I then look at what I get in the other apps, the displayed keywords vary greatly
LrC displays F<F<F<F if F is written by LrC - or
F, F<F, F<F<F, F<F<F<F for keywords entered and stored in the other apps
C1 does not display nonsense from DPL and LrC
DPL displays F F F F if F is written by DPL or C1
Just F if it’s written by LrC
Note: Tooltips display the complete tree and or F, F>F, F<F<F, F<F<F<F respectively

Test it yourself if you want to see other combinations

BHAYT · June 21, 2022, 1:45pm

@platypus this is what the manual says

I started this topic because “assigned” (“selected”) keywords (or rather the reverse) had an “unexpected” (by me) consequence on the ‘Search’ command or rather on the results of the ‘Search’ command. The count should relate to the number of check boxes that have been “checked”, i.e. “assigned” which currently affects both the outcome of a search and also the contents of the metadata that will be output back to an image (AS(ON) or AS(OFF) + ‘Write to image’) and/or exported!

DxO may (will probably) respond that the product is working as designed, whereas I consider (as usual) that it could do much better and in this case that it absolutely should do better!

The checkboxes are created/recreated from the contents of the ‘ItemsKeywords’ structure which also provides the counts shown in green in your snapshot (I believe/guess/…). It could be argued that if a keyword is not selected (assigned) then it doesn’t “belong” to an image and therefore should/would not be found in a search!

By default, only the “leaf” keyword of “Tree|branches|leaf” is assigned (on Win 10 PL5) and there will be one entry in ‘ItemsKeywords’ for the addition of that hierarchical keyword. BUT by virtue of the fact that it is an hierarchical keyword all the other simple keywords that make up that hierarchy are also effectively used by the image, albeit as part of the hierarchy.

Hence, I do not believe it unreasonable for users to expect to find all the images that have “branches” and “Trees” assigned as part of an hierarchical keyword entered into PL5 but currently unless the user explicitly “assigns”/“selects” them (with the consequences upon the metadata output) then the other keys will not be found.

The scenario that I tested this morning is what happens when PL5 “discovers” an image with the metadata rather than the addition of the metadata taking place within PL5.

Typically the image will contain the metadata “Tree|branch|leaf” but accompanied by the simple keywords “Tree”, “branch” and “leaf” in the ‘dc’ fields and also in the ‘hr’ fields in varying combinations. With the latest version of Win 10 PL5 the ‘dc’ keywords will only consist of “leaf”!

I will write up what I discovered later today or early tomorrow and also undertake your new tests at the same time.

Regards

Bryan

Joanna · June 21, 2022, 3:58pm

Can I just chip in with a test I have just made?

Delete PL5 database and any DOP and XMP files from test folder.

Open PL5 and select a RAW image with no keywords assigned.

Type A > A > A into the keywords field…

Capture d’écran 2022-06-21 à 17.14.15

Press Enter to accept…

Capture d’écran 2022-06-21 à 17.14.27

Note that all levels of the hierarchy are assigned.

Now enter A > B > C into the keywords field…

Capture d’écran 2022-06-21 à 17.16.28

Press Enter to accept…

Capture d’écran 2022-06-21 à 17.16.39

Note that only C gets assigned, in addition to the three levels of A. The middle B is ignored, whereas, with the A > A > A, the middle A was assigned.

Write the metadata to the file…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>C</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>A|A</rdf:li>
               <rdf:li>A|A|A</rdf:li>
               <rdf:li>A|B|C</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

My question, at this point is, why did PL5 assign all three levels when I type A > A > A, but not when I typed A > B > C?

Using Adobe Bridge, if I add the same values, I get an XMP file that looks like this…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>A</rdf:li>
     <rdf:li>C</rdf:li>
    </rdf:Bag>
   </dc:subject>
   …
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>A|A|A</rdf:li>
     <rdf:li>A|B|C</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

Using Capture One 12, if I type the same hierarchies into the keywords field, I get…

Capture d’écran 2022-06-21 à 17.50.43

… in the UI and…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>A</rdf:li>
     <rdf:li>B</rdf:li>
     <rdf:li>C</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>A</rdf:li>
     <rdf:li>A|A</rdf:li>
     <rdf:li>A|A|A</rdf:li>
     <rdf:li>A|B</rdf:li>
     <rdf:li>A|B|C</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>

So, Capture One seems to be more compliant with MWG than both PL5 and Adobe Bridge, although the UI doesn’t show me which A is the parent of which other A without having to invoke the tooltip on each token.

Reading this back into PL5, I then get…

Capture d’écran 2022-06-21 à 17.53.41

But in PL5.1.3b55, the dc:subject used to conform to the MWG guidance in doing the same as Capture One.

This is just the kind of thing that is getting so frustrating for users who want to use an external DAM in addition to PL5 and is why several of us have ended up saying Don’t do it to anyone thinking of so doing.

Now, in a recent discussion with @platypus we seem to be in agreement that, not only does PL5 rely on the lr:hierarchicalSubject tag, or more likely on its own internal database, to power the search, other apps may be doing the same and my guess is that DxO are following the crowd, rather than “doing the right thing”

If, as we suspect, some apps (Adobe included) are relying on their own database/catalogue instead of the actual XMP for searching and refusing to follow the MWG Guidance they help write, this would explain the missing keyword B in the dc:subject tag.

Now, this is all well and good, as long as users only play with software written by the “gang”, which seems to be led by Adobe.

But, if a user decides to use any other software, they will not be able to search for intermediate level keywords, since they’re not mentioned in the dc:subject tag, as recommended by the MWG.

It’s not exactly rocket science to write the dc:subject tag correctly, as Capture One proves.

Franky · June 21, 2022, 4:32pm

This is what I get with XnViewMp if I want parent keywords to be used for ranking only:

And what if I want parent keywords to be added to my keywords:

platypus · June 21, 2022, 6:37pm

Accessing the database should be much quicker than searching umpty files. That’s why a) search is based on the database and b) metadata changes should be taken into account, possibly without having to restart the app…

BHAYT · June 21, 2022, 11:30pm

Not on Windows 10!

I get the following

Adding A>A>A:-

Please note that only the last “A” is selected

’Items’:-

’Keywords’:-

2022-06-21_232929_

ItemsKeywords:-

2022-06-21_233044_

Adding A>B>C:-

’Keywords’:-

2022-06-21_234110_

ItemsKeywords:-

2022-06-21_234131_

The xmp looks like this because not all the boxes have been checked, i.e assigned/selected. The assignment process (selection) drives the contents of the ‘ItemsKeywords’ structure which determines the Keywords layout for display, the contents of the metadata in the sidecar (or embedded) and the eligibility to be found in a search and Win 10 only “assigns” the “leaf” keyword of an hierarchy not all the keywords as seems to be the case on the Mac!?

         <dc:subject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>C</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <dc:format>image/x-panasonic-rw2</dc:format>
         <exif:DateTimeOriginal>2022-05-30T16:33:30.294+00:00</exif:DateTimeOriginal>
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>A|A|A</rdf:li>
               <rdf:li>A|B|C</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

For me to get something to compare I would need to assign the intermediate levels myself, however, we are not comparing like with like and why PL5 left “B” unassigned on the Mac I do not know.

I will try to find time over the next few days to look at @platypus’s tests but I have actually got a lot to do at the moment and it is clear that the two products are not the same with respect to keyword “assignments”, so we are in danger of trying to compare fruits that both happen to be round but there the similarity ends!

However, please continue to use this topic to investigate what is happening on the MAC and I will replicate as much as possible on Win 10 (when I have the time). While we wait for DxO to enlighten us all @sgospodarenko.

platypus · June 22, 2022, 5:33am

…assignment is one thing, putting keywords into XMP is another.

DPL’s way to express keywords in XMP has changed between versions 5.1 and 5.2. @Joanna and I have checked that yesterday and found that the Mac version 5.1.3 build 55 is the one that closely corresponds to MWG guidelines.

I appreciate Capture One’s sensible approach to add a hierarchy only from top down, which means that, from A>B>C>D, I can’t add a lonely D, which again prevents ambiguities, e.g. in @Joanna’s “orange” example.

BHAYT · June 22, 2022, 7:16am

@platypus I have been “moaning” about the change between Win 10 PL5.1.4, which I still have installed on my Test machine, and PL5.2.?. I believe that it was the first PL5.2 release where it was mentioned by @Marie that hopefully things were now improved; see PL5 completely messes up my Hierarchical Keywords when exporting - #19 by Marie and PL5 completely messes up my Hierarchical Keywords when exporting - #23 by Marie plus XMP-files gets F*cked up due hierachical mismanagement in dual management - #26 by Marie and XMP-files gets F*cked up due hierachical mismanagement in dual management - #28 by Marie.

When I tested the product I immediately commented on the ridiculous new ‘dc’ keyword(s) which I considered both wrong and wrongly executed, i.e. changing the product with no option to use the original feature which I consider to be better than the new one!

However, because the Win 10 PL5 default is to only assign (select) the “leaf” keyword, I had never used the other checkboxes and have only now discovered that they will assign all the combinations that I had only seen with Capture One up to that time! Those assignments control the keyword combinations included in the display and also those included in the xmp, until DxO decided to “block” all but the “leaf” keyword in the ‘dc’ data!

I had been “saving” this for a new topic but now is as good a time as any @sgospodarenko, @Marie, @Musashi to publish part of my updated spreadsheet; the additional sheets then take the outputs from the various programs tested and push them through PL5.1.4 and PL5.3.0, this needs checking.

What is missing is a row for PL5 when all combinations are assigned but to do that in Windows is a real “pain” because it would need to be done for every image where the “full” set is required, hence, my requests for options @Musashi to

Select 5.1.4 methodology
Select 5.2.0 methodology
Select all elements to be assigned automatically because we don’t get that in the Win 10 version!

Taking the following 4 scenarios, I entered the data into a number of packages to see what xmp data I got!?

“animal”, “mammal”, “bear”
“animal|mammal|bear” or “animal>mammal>bear” or “bear<mammal<animal” whichever you prefer!?
“animals|mammals|bear|black bear” or your preferred syntax variant
“animal”, “mammal”, “bear”, “animal|mammal|bear” or the alternative syntax variant of the hierarchical keyword

The first page of the spreadsheet attached below is the result from various packages when I enter each keyword with the appropriate delimiter (, or ; etc) into the package. Because the packages have preference options that can change the structure of the keywords that the package then places in the image metadata I am trying to include the options as well?

Capture One makes no direct changes to any image file always creating an xmp sidecar file regardless of the image type (I believe) so the only way of externalising keywords is via the export option!

I feel that the biggest mistake (by DxO) was the lack of communication about the DOP usage change, closely followed by a lack of realisation about the position that PL5 takes in the user’s work flow where users simply do not want their (DAM) metadata changed in any way, certainly not in the image itself but also not in the export from PL5, hence the proposal of a “DAM sandwich” from some!

PL5 is not really worse than most of the other packages which doesn’t make it right just not as wrong as it is being painted!

What is the “perfect” metadata configuration for my little collection?

I am particularly concerned about how many (if any) simple keywords should be in the ‘hr’ but with my “horror” combination (4) if there are no simple keys in the ‘hr’ field you wind up with the same combination for 2 and 3!

Copy of meta data setting _07-01.8 (first sheet only).zip (3.6 KB)

I also believe that there is a bug in the way that PL5 handles the storage of my “horror” combination internally (more tests + a new bug report if appropriate), which was why I started further investigation prompted by a comment about search issues with PL5 by @Joanna.

PS:-

The IMatch options:-

The first selected (not a good choice), versus both selected versus neither selected! Please note the similarity between IMatch options and the PL5.1.4 and PL5.2.0 alternatives!?

Joanna · June 22, 2022, 7:41am

And this is something I have been trying to put across since before PL5 was released.

I really don’t care what DxO do in their database (since I regularly scrap it). What I do care about is compatibility with other apps.

XMP is a “universal” means of communicating metadata between apps, not just for use by a single app. In this regard, it is important that PL writes XMP in a way that is the most compatible with everything if at all possible. Which is why the MWG Guidance was drawn up.

This paragraph is absolutely fundamental to the correct transmission of compatible metadata.

Note the phrase…

MUST write the XMP dc:subject property to store the individual keywords

The problem is that it would seem that some software authors have read this part of the paragraph and are duly storing leaf keywords selected by a user but, at the same time, ignoring any antecedents because such things are managed in their database and it may be seen as unnecessary duplication to also write them to the XMP.

However, the second part of the paragraph goes on to say…

Hierarchical path elements MUST be flattened, which means that each hierarchy node needs to be stored as a separate keyword entry to XMP dc:subject

So, whether the hierarchy contains a single entry for something like…

A|B|C|D

… or the more complete…

A
A|B
A|B|C
A|B|C|D

… the MWG guidance is absolutely clear that every keyword mentioned in lr:hierarchicalSubject MUST also be mentioned in dc:subject.
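As a rough sketch of that flattening rule (the function name is hypothetical and this is one reading of the quoted guidance, not a reference implementation), both forms of the hierarchy reduce to the same flat keyword list:

```python
def flatten_hierarchy(paths, sep="|"):
    """Collect every node named in the hierarchical paths, first occurrence first."""
    seen = []
    for path in paths:
        for node in path.split(sep):
            if node not in seen:
                seen.append(node)
    return seen

single = ["A|B|C|D"]
complete = ["A", "A|B", "A|B|C", "A|B|C|D"]
print(flatten_hierarchy(single))
print(flatten_hierarchy(complete))
# both print ['A', 'B', 'C', 'D']
```

Whichever way the hierarchy is stored, dc:subject would then carry A, B, C and D.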

As @platypus says, this used to be the case in PL5.1.3 - something that made PL perfectly compatible with all other software but, for some unknown reason, this was changed in subsequent versions - something that has now rendered more recent versions to be incompatible with several apps.

The MWG Guidance document includes a very clear example of how a user might not select a complete hierarchy…

Note how Animals has not been selected, even though it is the parent of Mammals. This is analogous to the hierarchy UI in PL5.

But then the document goes on to clearly state that the correct way of writing this to XMP should be to include Animals in the dc:subject tag.

Capture d’écran 2022-06-22 à 09.34.15

Note the comment in green…

flat keyword list for interoperability

I really don’t know how clear this has to be to be seen as important. Certainly even DxO thought it important enough to adopt this behaviour in PL5.1.3 - only to revert to a breach of this guidance in subsequent versions.

As @platypus says, Capture One manages to do this with no problem at all - thus preventing all sorts of problems like restricted search in other software and ambiguities in the PL5 UI.

BHAYT · June 22, 2022, 7:41am

@Musashi & @sgospodarenko can you please point me to the documentation with respect to the reversion to the “old” PL4 DOP handling method where presenting a PL5 DOP with an image with embedded metadata will always take the DOP metadata with AS(OFF)! When it was PL4 the only blocking of image metadata was ‘Rating’ and ‘Rotation’ now it is essentially all the metadata will be blocked.

Users want to be able to present the DOP for their preserved edits but now we have “old” DOPs potentially getting in the way of newer metadata all because DxO didn’t want a new option. PL5.3.0 doesn’t work the same as PL5.2 which doesn’t work the same as PL5.1.4 (Win 10 numbering) and there is no way of going back and preserving the rest of the later features/bug fixes that may be useful, this is truly …

Musashi · June 22, 2022, 8:41am

Hi,

I need to sync with the team before answering properly to this thread.
I’ll get back to you.

Best regards

platypus · June 22, 2022, 8:46am

Just did a little test with Capture One on macOS Monterey on M1 MacBook Air 2020.

While C1 still writes keywords in what seems to be MWG compliant ways, C1 also added an information popup that appears when I try to delete an intermediate level keyword:

The more I look into this topic in C1, the better I like what C1 is doing.