PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords

BHAYT · June 26, 2022, 1:22pm

I guessed from their reaction to the various things I have suggested over time. The trouble with “automatic” is “one size fits all” and it is predicated on the gurus at DxO understanding/predicting what users want and being right every time.

Hell, I was very good at my job, but even I didn’t believe that! However, I designed my UIs with more options than you could shake a very large stick at! One screen got a lot of flak from the developers and stayed with a single configuration for a long time, until the customer proposed a change in the protocol. Into the test bed it went, we changed the configuration and we were ready for testing within minutes!

Joanna · June 26, 2022, 1:30pm

Heheheh. In our dreams! And that is not to denigrate the engineers at DxO, just to say that that kind of thing is a thankless task, because there’s always going to be someone who wants something that hasn’t been thought of, like another new DAM that does it differently.

Excellent! I worked on a major business management suite which, by the time I had finished designing the undergirding framework, allowed for dynamic editing of database tables, and the UI could be customised by a (non-programming) consultant - in other words, we designed a UI that designed the UI. Sounds like we think alike on things like that.

platypus · June 26, 2022, 2:01pm

This option was last seen in DPL4.

As for how DxO should do their programming, I’ll be silent, it’s their responsibility after all…

BHAYT · June 26, 2022, 3:43pm

@platypus I am afraid that statement is incorrect, at least when presented with the exported output from Capture One, where the following appears to be the outcome (if I did the tests correctly, of course) for releases 4.3.6, 5.1.4 and 5.3.1.

In this case it appears to be “Good stuff in”, then “Good stuff out”. Incidentally, I wasn’t really expecting the results to be quite as consistent, @Joanna! I changed no assignments; I just created directories, copied some “old” Capture One outputs, exported from the PL4 and PL5 releases and consolidated the resulting JPGs!!

I need to think about this and arguably run some tests with outputs from other programs to see what DxPL makes of them (some of that is on pages 2 and 3 of the spreadsheet, but not including PL4 outputs)!

PS:-

and us who pay for its development (and give our time to test and re-test and re-test and … for free - my company charged me out at £1,000 a day, of which I received a small fraction) and have to live with the “consequences” - so silence is not an option!

platypus · June 26, 2022, 5:09pm

Well, @BHAYT, there are a few things that might have escaped you in the heat of the moment.

DPL 4 has a preference/setting that was supposed to GIGO metadata to exported files.
Please note that I’ve not written that GIGO actually worked correctly.
Please note that the statement is about how DxO is programming their stuff. It’s not about what they program. I’m not going to tell DxO that they should use this or that database or that their sprints should be 10 or 20 days long.
Nevertheless, the what should take into account user requirements - provided that they are known.

BHAYT · June 26, 2022, 6:22pm

I am sorry, but there does not appear to be any such option in PL4 that I can see. It may have been an “unwritten” or even “written” “rule”, but it does not have a corresponding entry in the ‘Preferences’, which is not really surprising given what has been written above!

There does not appear to be such an option anywhere in the product, on Win 10.

I don’t believe that PL4 was any better at preserving the metadata than PL5. In truth, I have believed that the code in both is identical once the metadata has been “imported” into DxPL; it is just that the immutability constraints of PL4 (and earlier) meant that nothing went back into the image metadata, but that doesn’t apply to exports!

However, the key point was the notion that PL4 offered some form of preservation that is missing from PL5, but according to my tests the XMP metadata created by all three releases in this case is the same!?

I will use my spreadsheet to find a product where PL5 appears to be particularly “destructive” and then see if PL4 was better behaved with the particular keywords combination than PL5.

Providing

The users actually know exactly what they want and can successfully enunciate it
What options might be open to them (the user) with a new product rather than products which have a long history which might actually be constraining the product and the user
That DxO actually understand the requirements when enunciated by the user.
That the conflicting requirements of those users who enunciate their requirements/desires/prejudices etc. can actually be met in one product, and if they can all be met at what point in a development timescale they might become available.

I have explained before why I “design” a solution, including the underlying database: it is simply to assess whether what I am requesting/suggesting can potentially be fulfilled “easily” or is a real “drains-up”/“if I was going there I certainly wouldn’t want to start from here” type situation.

If you are “irritated” by my “presumption”, then that might also (sadly) be the case with DxO personnel, but I am not actually trying to “win a popularity contest”; I am trying to get a product I can use for editing my photos and doing a bit of reliable metadata management at the same time.

PS:- sorry for the use of “enunciate”, I suddenly panicked and thought that I might have used the wrong word, but the word is correct, if somewhat “pompous” and certainly well and truly overused!

platypus · June 26, 2022, 10:29pm

…another mac/win difference, apparently.

…that DxO has asked around, also outside of the forum.

Anyway, we’ve gotten off the OP subject: search, and, most of all, finding the right stuff.

BHAYT · June 26, 2022, 11:06pm

@platypus It is my topic, and we have only just strayed off topic and discovered another difference that I knew nothing about, although I have been asking for such a feature for some time. The search issue is driven by the assignment issue, which also leads into the way the keyword metadata is populated, which leads into the “ideal” format of the keywords metadata, which certainly leads into the problems created by the user reaction to the way PL5 “adjusts” the image keywords, which leads into me suggesting a way around the problem which …

This topic is tracking the original intent way better than some I have seen, and as the topic author it is I who should be upset, but I feel that we are covering useful ground, albeit for the nth time as n tends to a large finite number (but I am not sure about the use of the word finite)!?

I understand how busy the DxO developers, support staff etc. are, and it does appear that they monitor the forum, but I do object to situations like the “Unwanted Virtual Copies” topic, which ran for almost a year without anyone from DxO thinking it appropriate to bring it to an end with the facts two posts after it started!?

The whole change in the use of the DOP, with the lost ‘Rating’, lost ‘Rotation’ and lost ‘Tag’ (the latter being an actual bug), was allowed to “fester” until we eventually worked out what was going on, which then tied in with the chart that DxO produced and circulated via the forum. While I like the challenge, I would rather these things were laid to rest much sooner.

Why the apparent “secrecy”? Are we (users) being tested, is DxO embarrassed, or are the bugs actually considered “features” until a user expresses some cogent reason that casts doubt on that assertion!?

I am concerned that there is an underlying problem with the mindset. Having pride in one’s work is as it should be, but not to the point of blinding oneself to shortcomings. Revelling in the many positives of the product doesn’t entirely excuse the missed opportunities and definitely doesn’t excuse any real bugs, even when they are “only” caused by a lack of communication!

Joanna · June 27, 2022, 8:47am

I’m wondering why I am bothering to raise the issues I have been raising. After all, I have written my own app and use it daily for keywording, tagging and rating, as well as searching. To quote a famous Apple catchphrase - “it just works”.

My app is set up so that I can search, create Smart Folders from saved searches, and then simply double-click on selected thumbnails to have them opened in PL, with a project created by PL if the images come from more than one folder.

The major problem with PL5, at present, is that keywords are being recorded incorrectly, thus crippling the search functionality. Added to which, the search cannot even cope with ORing two predicates.

So why am I bothering to pester DxO to get metadata right in PL if I don’t even use the DAM functionality?

Simple. I adore the image editing functionality of PL, FP and VP and desperately want DxO to gain a reputation as a company that produces superb software that I can happily recommend to more and more people.

I participate in beta tests because I am in the position of also teaching PL to people, which then means I get to see other use cases that I might not even have dreamt of, which can provoke otherwise undiscovered bugs.

Do I get paid for participating in beta tests? Well, despite any rewards, I still purchase an upgrade myself, because I feel that DxO is a company worth supporting financially and I want to be able to continue to use their product for years to come.

Why do I keep on badgering for MWG compliant keyword handling? Because I have proven that, in so doing, every other DAM can then easily read any metadata written by PL.

Unfortunately, some other DAMs do not comply with MWG and allow users to enter all sorts of rubbish, which then goes on to produce non-standard metadata, which PL cannot cope with. Equally unfortunately, DxO then responds to folks who use these other “non-standard” DAMs and produce non-standard metadata by trying their best to cater for these anomalies at the cost of breaking compatibility with other apps which do play nicely with MWG. As a result, instead of complaining to companies like Adobe and others, they insist that all their ills are the fault of DxO and bleat until PL changes to suit their needs, regardless of the effect any changes might have on other use cases.

From a compatibility point of view, PL5.1.3 was perfect in how it stored and exported metadata - then someone changed it - why?

It is a futile exercise to try and read non-standard metadata from other DAMs because there will always be yet another DAM that will do things in an, as yet, unforeseen way.

It is my opinion that what we are seeing in DxO’s approach to metadata handling, which is leading to searching not working, is not “bugs” but simply poorly thought through design that works as intended - unfortunately though, not as desired.

Please DxO, revert to PL5.1.3 handling, stop reacting to folks requesting “input compatibility” and, instead, offer options for “passthrough” for those who don’t use MWG compliant DAMs.

BHAYT · June 27, 2022, 2:09pm

@Joanna keywords are not being recorded at all unless the Mac DxPL works very differently to the Win 10 version. There is no notion of ‘dc’ or ‘hr’ in the Win 10 database. There is no notion of hierarchical keywords being stored in a particularly searchable way!

All keywords are stored as simple keywords, i.e. any hierarchical structure is gone (except that it can be reconstructed by following the pointers). The search is of a table of “simple”/“flattened” keywords in the ‘Keywords’ structure (I believe it can’t really be of anything else) and when a match is located in that structure the ‘KeywordId’ is used to see if the keyword is actually in-use and if so by what image(s).

But (on Win 10) the entries in ‘ItemsKeywords’ (which links a ‘KeywordId’ to an ‘Items’ id) are only populated for keywords that have been “assigned”, and the Win 10 default setting is that “only” the “leaf” item is assigned (selected). I believe that the default on Mac is that all items are assigned (selected). For Win 10 I have the following:

so for the keyword “animals|mammals|bear|black bear” there will be 4 entries in the ‘Keywords’ Table, one for each simple keyword in the hierarchy but only one entry in the ‘ItemsKeywords’ Table for the “black bear” keyword which has been assigned by default.
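To make that storage model concrete, here is a minimal Python/SQLite sketch. The tables and columns (`Keywords.Id`, `Value`, `ParentId`, `ItemsKeywords.ItemId`, `KeywordId`) are a simplified, hypothetical stand-in for the real DxPL schema, and the function names are illustrative only. Each level of the hierarchy becomes its own row pointing at its parent, and with the Win 10 default only the leaf is linked to the image.

```python
import sqlite3

# Simplified, hypothetical stand-in for the Win 10 'Keywords' and 'ItemsKeywords'
# tables; the real DxPL schema has more columns than this.
SCHEMA = """
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL, ParentId INTEGER);
CREATE TABLE ItemsKeywords (ItemId INTEGER NOT NULL, KeywordId INTEGER NOT NULL);
"""

def keyword_id(db, value, parent_id):
    """Return the Id of 'value' under 'parent_id', inserting a new row if it is not there yet."""
    row = db.execute(
        "SELECT Id FROM Keywords WHERE Value = ? AND ParentId IS ?", (value, parent_id)
    ).fetchone()
    if row is not None:
        return row[0]
    return db.execute(
        "INSERT INTO Keywords (Value, ParentId) VALUES (?, ?)", (value, parent_id)
    ).lastrowid

def add_keyword_path(db, item_id, path, assign_all=False):
    """Store every level of 'a|b|c' as its own row; link only the leaf unless assign_all."""
    ids, parent_id = [], None
    for value in path.split("|"):
        parent_id = keyword_id(db, value, parent_id)
        ids.append(parent_id)
    linked = ids if assign_all else ids[-1:]
    db.executemany(
        "INSERT INTO ItemsKeywords (ItemId, KeywordId) VALUES (?, ?)",
        [(item_id, kid) for kid in linked],
    )
    return ids

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
ids = add_keyword_path(db, 1, "animals|mammals|bear|black bear")
print(ids)                                                             # [1, 2, 3, 4]
print(db.execute("SELECT COUNT(*) FROM ItemsKeywords").fetchone()[0])  # 1
```

Four rows land in `Keywords` but only one in `ItemsKeywords`, which is the asymmetry the rest of this post is about.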

BUT it turns out (something I did not appreciate when I started this topic) that the assignment controls not only what may or (mostly, @sgospodarenko) may not be turned up by a search, but also the keywords inserted into the metadata of an exported image and/or the original image. The “subtlety” of this element of the assignment had been completely lost on me, mainly because I left things as they were created by PL5.

So here, in tortuous detail, are the related snapshots for the following test scenario:-

Clear database again!!
Select two RAWs and place in a directory and open PL5 and navigate to that directory.
Enter “animals|mammals|bear|black bear” into the keywords for photo 1 and assign all keywords.
Enter “animals|mammals|bear|black bear” into the keywords for photo 2 and retain the default assignment.
Undertake searches for “animals”, “mammals”, “bear” and “black bear”.
Export both photos and compare keywords with Capture One metadata for the same hierarchical keyword.
Retire from testing because it is excruciatingly repetitive and I need to get a life!

Photo 1 - with all keywords marked as “assigned”:-

Photo 2 - with the default Windows 10 DxPL assignment:-

Photo 3 - Search for “animals”:-

Photo 4 - Search for “mammals”:-

Photo 5 - Search for “bear”:-

Photo 6 - Search for “black bear”:-

Reconstructing Hierarchical Keywords from ‘Keywords’:-

black bear (5) < bear (4) < mammals (2) < animals (1)
black bear (3) < mammals (2) < animals (1) (this exists because of my attempt to solve a typo!?)

Reconstructing an “animals” search:-

“animals” located at (1) in ‘Keywords’
Locate “KeywordId”=1 in ‘ItemsKeywords’ and this returns ‘ItemsId’=1, i.e. active count of 1

Reconstructing a “mammals” search:-

“mammals” located at (2) in ‘Keywords’
Locate “KeywordId”=2 in ‘ItemsKeywords’ and this returns ‘ItemsId’=1, i.e. active count of 1

Reconstructing a “bear” search:-

“bear” located at (4) in ‘Keywords’
Locate “KeywordId”=4 in ‘ItemsKeywords’ and this returns ‘ItemsId’=1, i.e. active count of 1

Reconstructing a “black bear” search:-

“black bear” located at (3) in ‘Keywords’ and at (5)
Locate “KeywordId”=3 in ‘ItemsKeywords’ and this returns not found (keyword not in use)
Locate “KeywordId”=5 in ‘ItemsKeywords’ and this returns ‘ItemsId’=1 and =2, i.e. active count of 2 (1 assignment for image 1 and 1 assignment for image 2)
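These lookups can be replayed against a toy copy of the two tables. This is a sketch on a simplified, hypothetical schema (not DxO's actual one); it reuses the Ids from the reconstruction above and, like the current search appears to do, consults only the explicit links in `ItemsKeywords`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL, ParentId INTEGER);
CREATE TABLE ItemsKeywords (ItemId INTEGER NOT NULL, KeywordId INTEGER NOT NULL);
-- Ids as in the reconstruction; (3) is the stray 'black bear' left by the typo fix.
INSERT INTO Keywords VALUES
    (1, 'animals', NULL), (2, 'mammals', 1), (3, 'black bear', 2),
    (4, 'bear', 2), (5, 'black bear', 4);
-- Image 1: every level assigned. Image 2: Win 10 default, leaf only.
INSERT INTO ItemsKeywords VALUES (1, 1), (1, 2), (1, 4), (1, 5), (2, 5);
""")

def search(db, term):
    """Items linked to any keyword whose text equals 'term' (explicit links only)."""
    rows = db.execute(
        """SELECT DISTINCT ik.ItemId
           FROM Keywords AS k JOIN ItemsKeywords AS ik ON ik.KeywordId = k.Id
           WHERE k.Value = ?
           ORDER BY ik.ItemId""",
        (term,),
    ).fetchall()
    return [item_id for (item_id,) in rows]

for term in ("animals", "mammals", "bear", "black bear"):
    print(term, search(db, term))
```

It reproduces the counts above: one hit each for “animals”, “mammals” and “bear”, two for “black bear”.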

But the keywords in images 1 and 2 are identical, so why the difference in search “discovery”? This difference is caused by the assignment checkbox, incorrectly I believe, @sgospodarenko!

The assignment then goes on to determine (populate) the keywords of the exported image!

Photo 7 - Exported JPG keywords compare to Capture One:-

Hence, I don’t feel that they are being “recorded” incorrectly, but rather that they are being used incorrectly to “populate” the output metadata (unless all the keywords in the hierarchy are assigned/selected) and to influence the searching, ignoring all keywords that are implicitly assigned by including only those that are explicitly assigned via the checkbox.
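One possible way to get the behaviour argued for here, sketched on the same hypothetical simplified tables: treat every ancestor of an explicitly linked keyword as implicitly assigned by walking `ParentId` upwards in a SQLite `WITH RECURSIVE` query, so that an image with only the leaf ticked is still found by a search for any level. This is an illustration of the idea, not how DxPL actually searches.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL, ParentId INTEGER);
CREATE TABLE ItemsKeywords (ItemId INTEGER NOT NULL, KeywordId INTEGER NOT NULL);
INSERT INTO Keywords VALUES
    (1, 'animals', NULL), (2, 'mammals', 1), (3, 'bear', 2), (4, 'black bear', 3);
-- Image 1: all levels ticked. Image 2: only the leaf ticked.
INSERT INTO ItemsKeywords VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 4);
""")

# Explicit links, plus every ancestor of a linked keyword ("implicit" assignment).
IMPLIED_SEARCH = """
WITH RECURSIVE implied(ItemId, KeywordId) AS (
    SELECT ItemId, KeywordId FROM ItemsKeywords
    UNION
    SELECT implied.ItemId, k.ParentId
    FROM implied JOIN Keywords AS k ON k.Id = implied.KeywordId
    WHERE k.ParentId IS NOT NULL
)
SELECT DISTINCT implied.ItemId
FROM implied JOIN Keywords AS k ON k.Id = implied.KeywordId
WHERE k.Value = ?
ORDER BY implied.ItemId
"""

def search_implied(db, term):
    return [item_id for (item_id,) in db.execute(IMPLIED_SEARCH, (term,))]

print(search_implied(db, "bear"))     # [1, 2]
print(search_implied(db, "animals"))  # [1, 2]
```

`UNION` (rather than `UNION ALL`) discards rows already produced, so the recursion stops once every ancestor has been reached.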

Please show me a snapshot of the assignment “table” as I have included above when you enter the “animals|mammals|bear|black bear” keyword to an image on the Mac.

BHAYT · June 27, 2022, 2:34pm

I don’t believe that all the fuss made in the forum was actually correct but that might vary between Win 10 users and Mac users.

Here is the third sheet from my spreadsheet BUT it arguably needs some more work because I used the default assignment for Win 10 which always just assigns the “leaf” keyword! I need to repeat the tests with all keywords in the hierarchy assigned to see what we get then!

In the meantime, this third sheet compares Package output (which is then input to PL5) with the PL5.1.4 export and the PL5.2.0 export for that imported image, to help put the various complaints into perspective, @sgospodarenko.

Copy of meta data setting _07-01.8.pdf (30.4 KB)

Joanna · June 28, 2022, 9:33am

The Mac database is based on something called Core Data, which is an OPF (Object Persistence Framework). The idea is that you design the relationships between “objects” and the framework constructs the database for you.

This means that, in the case of something like an Invoice, you would have a model where the Invoice object contains a list of Invoice Lines. However, that then gets mapped into a normalised database model where the Invoice doesn’t know about its Lines but, instead, each Line contains a reference to its owning Invoice.

This then leads to an object model for keywords where a Keyword contains a list of child Keywords, which is then mapped to a database model where, instead, each Keyword contains a reference to its parent Keyword.

So, in the case of our test hierarchy, the Keywords table contains a recursive reference from a “child” to its “parent”
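The inversion described here (the object holds its children, the table row holds its parent) can be illustrated without Core Data at all. This plain-Python sketch is only an analogy for what the framework does for you; the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Keyword:
    """Object-model view: a keyword owns a list of child keywords."""
    title: str
    children: list[Keyword] = field(default_factory=list)

def to_rows(node, parent_pk=None, rows=None):
    """Table view: flatten the tree into (pk, title, parent_pk) rows, child -> parent."""
    if rows is None:
        rows = []
    pk = len(rows) + 1  # sequential primary keys, assigned depth-first
    rows.append((pk, node.title, parent_pk))
    for child in node.children:
        to_rows(child, pk, rows)
    return rows

tree = Keyword("Animals", [Keyword("Mammals", [Keyword("Bear", [Keyword("Black Bear")])])])
for row in to_rows(tree):
    print(row)
```

Each printed row carries only a reference to its parent, e.g. `(4, 'Black Bear', 3)`; the list of children is recovered by querying for rows whose parent is a given key.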

These relationships exist regardless of whether any of these keywords are assigned to an Item (image) or not.

In the case of only assigning the leaf node of our hierarchy…


The “link” table only contains one link between the leaf keyword and an item. So, if I write a SQL query like…

SELECT ZDOPITEM.ZNAME FROM ZDOPITEM, ZDOPKEYWORD, Z_10KEYWORDS
 WHERE ZDOPITEM.Z_PK = Z_10KEYWORDS.Z_10ITEMS
 AND Z_10KEYWORDS.Z_12KEYWORDS = ZDOPKEYWORD.Z_PK
 AND ZDOPKEYWORD.ZTITLE = 'Black Bear';

… it will return…

… as expected. But if I want to find all items that contain Bear regardless of colour or type, and write an SQL query like…

SELECT ZDOPITEM.ZNAME FROM ZDOPITEM, ZDOPKEYWORD, Z_10KEYWORDS
 WHERE ZDOPITEM.Z_PK = Z_10KEYWORDS.Z_10ITEMS
 AND Z_10KEYWORDS.Z_12KEYWORDS = ZDOPKEYWORD.Z_PK
 AND ZDOPKEYWORD.ZTITLE = 'Bear';

… it returns nothing.

Why? Because the link table only contains one reference to the Black Bear keyword…

However, if I alter the structure of the link table by adding an Applied column…

Then the same query on Bear returns…

From the other direction, finding which keywords have been applied to an image now becomes simple.

If I want all keywords - for searching…

… but, should I only want to know which keywords to check in either the keywords field or the hierarchy tree view, then modifying the SQL to include filtering on the Applied column, gives just what we want…
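A runnable reproduction of the Applied-column idea in SQLite, reusing the Core Data table and column names from the queries above. The `ZPARENT` and `APPLIED` column names and the sample file name are hypothetical, and this is a toy copy rather than PhotoLab's real store: every level of the branch gets a link row, and the new flag records which level the user actually ticked.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ZDOPITEM (Z_PK INTEGER PRIMARY KEY, ZNAME TEXT);
CREATE TABLE ZDOPKEYWORD (Z_PK INTEGER PRIMARY KEY, ZTITLE TEXT, ZPARENT INTEGER);
-- Link table with the proposed APPLIED column (hypothetical name).
CREATE TABLE Z_10KEYWORDS (
    Z_10ITEMS INTEGER, Z_12KEYWORDS INTEGER, APPLIED INTEGER NOT NULL DEFAULT 0
);
INSERT INTO ZDOPITEM VALUES (1, 'bear.nef');
INSERT INTO ZDOPKEYWORD VALUES
    (1, 'Animals', NULL), (2, 'Mammals', 1), (3, 'Bear', 2), (4, 'Black Bear', 3);
-- Every level of the branch is linked; only the ticked leaf is marked as applied.
INSERT INTO Z_10KEYWORDS VALUES (1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 4, 1);
""")

def items_with_keyword(title):
    """Search: uses every link, applied or not."""
    sql = """SELECT ZDOPITEM.ZNAME FROM ZDOPITEM
             JOIN Z_10KEYWORDS ON ZDOPITEM.Z_PK = Z_10KEYWORDS.Z_10ITEMS
             JOIN ZDOPKEYWORD ON Z_10KEYWORDS.Z_12KEYWORDS = ZDOPKEYWORD.Z_PK
             WHERE ZDOPKEYWORD.ZTITLE = ?"""
    return [name for (name,) in db.execute(sql, (title,))]

def keywords_of_item(item_pk, applied_only=False):
    """UI: all linked keywords, or only the ones to show as checked."""
    sql = """SELECT ZDOPKEYWORD.ZTITLE FROM Z_10KEYWORDS
             JOIN ZDOPKEYWORD ON Z_10KEYWORDS.Z_12KEYWORDS = ZDOPKEYWORD.Z_PK
             WHERE Z_10KEYWORDS.Z_10ITEMS = ?"""
    if applied_only:
        sql += " AND Z_10KEYWORDS.APPLIED = 1"
    sql += " ORDER BY ZDOPKEYWORD.Z_PK"
    return [title for (title,) in db.execute(sql, (item_pk,))]

print(items_with_keyword("Bear"))              # ['bear.nef']
print(keywords_of_item(1))                     # ['Animals', 'Mammals', 'Bear', 'Black Bear']
print(keywords_of_item(1, applied_only=True))  # ['Black Bear']
```

The search ignores the flag while the keyword panel filters on it, so one table serves both purposes.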

Would you agree that this could solve all sorts of problems, or am I dreaming?

BHAYT · June 28, 2022, 10:35am

@Joanna I agree, because that is exactly what I stated in my opening “submission” (sorry, the first post, shown in the quotes at the end of this post). The problem is “easy” to spot on Win 10 because only the ‘leaf’ is assigned and then found in a search, i.e. the rest are missing!!

The structure needs a new field, as you state and as I stated initially, and it needs to contain all the keywords: those explicitly assigned via the checkbox (the current system) and those implicitly assigned by virtue of belonging to the hierarchical keyword that has effectively been “assigned” by the inclusion of the “leaf” keyword.

In a post I was writing but parked to respond to you I had written the following

"I would consider the following might be useful, but the exact way in which the data would be used remains to be discussed.

‘ItemId’ - pointer to ‘Items’ (Images)
‘KeywordId’ - pointer to the ‘Keywords’
‘Assigned’ - whether the keyword has been marked as ‘assigned’ (or ‘selected’ or ‘Applied’) (Y or N, 1 or 0 etc.)
‘Child’ - whether the keyword is a ‘child’ in an hierarchy: any value other than 0 or ‘NULL’ means the keyword is a ‘child’; a 0 or ‘NULL’ identifies either the head of an hierarchy or a standalone keyword. Better would be ‘NULL’ for a simple keyword and 0 for the head of an hierarchy, perhaps!
‘FieldOrigin’ - whether the keyword comes from the ‘dc’ or ‘hr’ fields, arguably (only) for keywords from the image. I had thought that it could have a ‘both’ value but wonder if it might be better to store two entries, one for the ‘keyword’ located in the ‘dc’ fields and another for an entry in the ‘hr’ fields. One problem with this is that it relates only to keywords that come from ‘Source’ = ‘I’
‘Source’ - whether the keyword came from the image (‘I’) or was entered in DxPL (‘D’)

So for “animal|mammal|bear” entered in DxPL there would be

‘Keywords’ - entries for “animal”, “mammal” and “bear” with pointers to ‘ItemsKeywords’

‘ItemsKeywords’(new) would have the following:-

‘Items’ = ItemId ‘KeywordsId’ = “animal” ‘Assigned’ = N ‘Child’ = 0 ‘FieldOrigin’ = ‘NULL’ ‘Source’=DxPL
‘Items’ = ItemId ‘KeywordsId’ = “mammal” ‘Assigned’ = N ‘Child’ = 1 ‘FieldOrigin’ = ‘NULL’ ‘Source’=DxPL
‘Items’ = ItemId ‘KeywordsId’ = “bear” ‘Assigned’ = Y ‘Child’ = 1 ‘FieldOrigin’ = ‘NULL’ ‘Source’=DxPL"
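The proposed row layout can be written down as a small Python record, purely as an illustration of the fields listed in the quote. The class and function names are hypothetical, the keyword text stands in for the ‘KeywordsId’ pointer, and ticked levels are matched by text for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ItemKeyword:
    """One row of the proposed extended 'ItemsKeywords' structure (field names are illustrative)."""
    item_id: int
    keyword: str                 # stands in for the 'KeywordsId' pointer
    assigned: bool               # ticked ('selected'/'Applied') in the UI
    child: Optional[int]         # None = simple keyword, 0 = head of hierarchy, 1 = child
    field_origin: Optional[str]  # 'dc', 'hr' or None; only meaningful when source == 'I'
    source: str                  # 'I' = read from the image, 'D' = entered in DxPL

def rows_for(item_id, path, assigned, source="D", field_origin=None):
    """One row per level of 'a|b|c'; levels in 'assigned' are matched by text for simplicity."""
    levels = path.split("|")
    if len(levels) == 1:  # simple, non-hierarchical keyword
        return [ItemKeyword(item_id, levels[0], levels[0] in assigned, None, field_origin, source)]
    return [
        ItemKeyword(item_id, kw, kw in assigned, 0 if depth == 0 else 1, field_origin, source)
        for depth, kw in enumerate(levels)
    ]

for row in rows_for(1, "animal|mammal|bear", assigned={"bear"}):
    print(row)
```

For “animal|mammal|bear” with only “bear” ticked this yields the three rows of the quoted example: Assigned N/N/Y and Child 0/1/1.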

An alternative to changing ‘ItemsKeywords’ is to create a new structure, sometimes referred to as “co-structure expansion” and frequently used when there is a lot of data pertinent to certain items but not all (with databases that don’t manage variable length well or at all and with databases of potentially millions of records, sorry Table entries).

In this case the technique would be used to avoid a database re-organisation leaving ‘ItemsKeywords’ as it currently is and creating ‘ItemsKeywordsAll’, for example.

The risk in this case is that there would be total duplication when all items are assigned. Plus, in either case, there has to be a re-organisation, because all the “implicitly” assigned keywords are actually “missing”. But it has the advantage that all the existing GUI code remains the same and “just” the Search code needs to change, @Musashi and @sgospodarenko, once the new structure has been populated!

Not only does there need to be a change in the database structures (extended ‘ItemsKeywords’ or co-structure), but every “leaf”-only entry needs to be used to traverse up the ‘Keywords’ structure, checking for entries in ‘ItemsKeywords’ (the explicitly assigned) and creating an entry for the “implicitly” assigned items that don’t currently have an entry at all. With a new structure this is about as easy as it can get, because ‘ItemsKeywords’ is left alone and the new structure is populated from an analysis of ‘Keywords’ in association with the “unchanging” ‘ItemsKeywords’; any failure during the expansion process simply means clearing the structure and starting again!
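A sketch of that co-structure rebuild, again on simplified hypothetical tables (‘ItemsKeywordsAll’ is the name suggested in this post, the function name is illustrative). ‘ItemsKeywords’ is only read; ‘ItemsKeywordsAll’ is emptied and refilled inside one transaction, so a failure part-way through is rolled back and the rebuild can simply be rerun. The recursive CTE performs the “traverse up the ‘Keywords’ structure” step.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL, ParentId INTEGER);
CREATE TABLE ItemsKeywords (ItemId INTEGER NOT NULL, KeywordId INTEGER NOT NULL);
CREATE TABLE ItemsKeywordsAll (
    ItemId INTEGER NOT NULL, KeywordId INTEGER NOT NULL, PRIMARY KEY (ItemId, KeywordId)
);
INSERT INTO Keywords VALUES
    (1, 'animals', NULL), (2, 'mammals', 1), (3, 'bear', 2), (4, 'black bear', 3);
-- Existing data is left untouched: item 1 has only the leaf, item 2 only 'bear'.
INSERT INTO ItemsKeywords VALUES (1, 4), (2, 3);
""")

def rebuild_items_keywords_all(db):
    """Rebuild the co-structure from scratch; ItemsKeywords itself is only read."""
    with db:  # one transaction: committed on success, rolled back if anything fails
        db.execute("DELETE FROM ItemsKeywordsAll")
        db.execute("""
            WITH RECURSIVE up(ItemId, KeywordId) AS (
                SELECT ItemId, KeywordId FROM ItemsKeywords
                UNION
                SELECT up.ItemId, k.ParentId
                FROM up JOIN Keywords AS k ON k.Id = up.KeywordId
                WHERE k.ParentId IS NOT NULL
            )
            INSERT INTO ItemsKeywordsAll (ItemId, KeywordId)
            SELECT ItemId, KeywordId FROM up
        """)

rebuild_items_keywords_all(db)
print(db.execute("SELECT COUNT(*) FROM ItemsKeywordsAll").fetchone()[0])  # 7
```

Item 1 expands to four rows and item 2 to three. Because the table is cleared first, running the rebuild again gives the same result.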

My original assertions:-

PS:- If the two structures are employed the ‘Assigned’ field is not actually used! ‘ItemsKeywords’ becomes ‘ItemsKeywords’ (assigned) either implicitly or explicitly (by being renamed or replaced by a new structure) and ‘ItemsKeywordsAll’ is new and identical to the current ‘ItemsKeywords’ but contains all entries for keywords elements that are going to wind up in the image or export and is used for the ‘search’ function (or remapped to the old ‘ItemsKeywords’ if that structure becomes ‘ItemsKeywordsAssigned’).

One nasty implication of such re-organisation is the issue of loading an old database!

Joanna · June 28, 2022, 5:05pm

I can’t see why this is necessary if the recursive Parent relationship is already present in the Keywords table.

It is possible to query the recursive keywords table to extract a chosen leaf node and all of its parents…
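A recursive query of this kind can be sketched with SQLite's `WITH RECURSIVE`, using Joanna's `ZDOPKEYWORD`/`ZTITLE` names and a hypothetical `ZPARENT` column for the parent reference (the actual Core Data column name is not shown in the thread). It assumes the parent chain has no cycles.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ZDOPKEYWORD (Z_PK INTEGER PRIMARY KEY, ZTITLE TEXT, ZPARENT INTEGER);
INSERT INTO ZDOPKEYWORD VALUES
    (1, 'Animals', NULL), (2, 'Mammals', 1), (3, 'Bear', 2), (4, 'Black Bear', 3);
""")

# Start at the chosen leaf and follow ZPARENT until a root (NULL parent) is reached.
BRANCH = """
WITH RECURSIVE branch(pk, title, parent, depth) AS (
    SELECT Z_PK, ZTITLE, ZPARENT, 0 FROM ZDOPKEYWORD WHERE Z_PK = ?
    UNION ALL
    SELECT k.Z_PK, k.ZTITLE, k.ZPARENT, branch.depth + 1
    FROM branch JOIN ZDOPKEYWORD AS k ON k.Z_PK = branch.parent
)
SELECT title FROM branch ORDER BY depth DESC
"""

def branch_of(leaf_pk):
    """Return the leaf and all its parents as a root-first 'a|b|c' path."""
    return "|".join(title for (title,) in db.execute(BRANCH, (leaf_pk,)))

print(branch_of(4))  # Animals|Mammals|Bear|Black Bear
```

The recursion ends naturally at the root, because joining on a NULL parent matches no row.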

Unless I am misunderstanding you?

Now this is becoming quite complex. Strictly, the only purpose of the dc:subject tag is to record all keywords (uniqued and flattened) from the lr:hierarchicalSubject tag so, as long as all hierarchies are complete and correct, the subject can be derived from them.
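That derivation is mechanical. A minimal Python sketch, assuming the Lightroom-style `|` level separator and exact, case-sensitive matching (the function name is hypothetical): split each hierarchicalSubject path into its levels and keep each distinct level once, in first-seen order.

```python
def subjects_from_hierarchies(hierarchical_subjects):
    """Derive a flattened, de-duplicated dc:subject list from lr:hierarchicalSubject paths.

    Assumes '|' as the level separator and exact (case-sensitive) matching.
    """
    flattened = (level for path in hierarchical_subjects for level in path.split("|"))
    return list(dict.fromkeys(flattened))  # dict keeps first-seen order while removing duplicates

print(subjects_from_hierarchies([
    "animals|mammals|bear|black bear",
    "animals|birds|owl",
]))
# ['animals', 'mammals', 'bear', 'black bear', 'birds', 'owl']
```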

At the moment, the only change I see as necessary, to the database, is to add the Applied column to the link table.

In fact, in providing the above test material, even though I added the Applied column, PL5 ran fine.

Unless you mean the issue of “upgrading” existing records to take account of the Applied column? But this could be done lazily on re-reading either the XMP or the DOP as the images are viewed.

Either way, this is a fairly important restructuring of the UI logic to take account of the database changes.

Addenda

What about what should happen to the DOP files - for those who rely on them and regularly scrap the database?

BHAYT · June 29, 2022, 10:21am

You are right: when I came to “design” this I was “thinking aloud” and not “value judging” anything that I thought. After attending a brainstorming session at ICI Paint division many years ago, I remember some of the moderators’ initial words, “Free wheeling”, “Volume”, “Suspend judgement” etc., as they wrote down every idea on flipchart sheets and pinned them everywhere!

But for the current design we have

The default when adding an hierarchical keyword in PL5 on Win 10 is that the ‘leaf’ keyword is assigned and any above that keyword are not by default, i.e. for A|B|C|D (A>B>C>D) only D will be stored in the ‘ItemsKeywords’ Table but the ‘Keywords’ will be stored as D-C-B-A where the - represents a pointer (the ‘ParentId’), from D to C to B to A.
To retrieve the keyword list (and structure) for presentation in the UI etc., where D will then be marked with a ‘tick’, “all” that needs to happen is
1- the ‘ItemsKeywords’ entry is accessed using the ‘ItemId’ (via idx_ItemsKeywords_ItemId)
2- the ‘KeywordId’ from that entry is used to access the ‘Keywords’ Table to give the lowest level in the hierarchy (via Primary Key).
3- For each pointer found in the ‘Keywords’ entry the next keyword up will be located using that ‘ParentId’ pointer until all keywords have been found (via idx_Keywords_ParentId).
The reconstructed keyword can be displayed or the reconstructed keyword can be used to decide what is going into the metadata.
This process must be repeated for every keyword associated with the image!
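The three steps can be sketched in Python against the same kind of simplified, hypothetical tables (names illustrative, not the real DxPL schema). Note that in this sketch each upward hop in step 3 is a lookup on the parent's primary key; an index on `ParentId` is what helps when going in the opposite direction, from a keyword to its children.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL, ParentId INTEGER);
CREATE TABLE ItemsKeywords (ItemId INTEGER NOT NULL, KeywordId INTEGER NOT NULL);
CREATE INDEX idx_ItemsKeywords_ItemId ON ItemsKeywords (ItemId);
INSERT INTO Keywords VALUES
    (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2), (4, 'D', 3), (5, 'flower', NULL);
-- Win 10 default: only the leaf D is linked; 'flower' is a simple keyword.
INSERT INTO ItemsKeywords VALUES (1, 4), (1, 5);
""")

def keyword_lines(db, item_id):
    """Rebuild (hierarchy, ticked keyword) pairs for one image, following steps 1 to 3."""
    lines = []
    # Step 1: every ItemsKeywords entry for the image (uses the ItemId index).
    links = db.execute(
        "SELECT KeywordId FROM ItemsKeywords WHERE ItemId = ? ORDER BY KeywordId", (item_id,)
    ).fetchall()
    for (keyword_id,) in links:
        # Step 2: the linked keyword itself, fetched by primary key; this is the ticked one.
        ticked, parent_id = db.execute(
            "SELECT Value, ParentId FROM Keywords WHERE Id = ?", (keyword_id,)
        ).fetchone()
        levels = [ticked]
        # Step 3: follow ParentId upwards; each hop is another primary-key lookup.
        while parent_id is not None:
            value, parent_id = db.execute(
                "SELECT Value, ParentId FROM Keywords WHERE Id = ?", (parent_id,)
            ).fetchone()
            levels.insert(0, value)
        lines.append(("|".join(levels), ticked))
    return lines

print(keyword_lines(db, 1))  # [('A|B|C|D', 'D'), ('flower', 'flower')]
```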

But this is the ideal case, the default for Win 10, but if I decide to un-assign D but assign C how is that going to work? Using the above procedure DxPL will be able to re-construct A>B>C quickly which is sufficient (I think) for populating the export or image metadata and the UI list of keywords but not sufficient to recreate the whole tree structure.

Finding levels below the ones assigned requires an access via ‘idx-Keywords-ParentidIndex’ using ‘ParentId’ = the ‘KeywordId’ of C (in this case), which will return the D entry pointing to the C entry!! Using this technique it is possible to work backwards down the hierarchical keyword, terminating when nothing further is found. The worst situation is when A is the only item assigned, when the whole tree needs to be re-assembled using this backwards approach!
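The downward walk can be sketched the same way, on a hypothetical simplified table with an index on `ParentId` (function name illustrative): repeatedly ask “which keywords have this one as their parent?” until a level returns nothing.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL, ParentId INTEGER);
CREATE INDEX idx_Keywords_ParentId ON Keywords (ParentId);
INSERT INTO Keywords VALUES (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2), (4, 'D', 3);
""")

def levels_below(db, keyword_id):
    """Collect every keyword under keyword_id by repeatedly querying on ParentId."""
    found = []
    frontier = [keyword_id]
    while frontier:  # stops when a level returns no children
        next_frontier = []
        for parent_id in frontier:
            for child_id, value in db.execute(
                "SELECT Id, Value FROM Keywords WHERE ParentId = ? ORDER BY Id", (parent_id,)
            ).fetchall():
                found.append(value)
                next_frontier.append(child_id)
        frontier = next_frontier
    return found

print(levels_below(db, 3))  # ['D']            (C assigned)
print(levels_below(db, 1))  # ['B', 'C', 'D']  (worst case: only A assigned)
```

Descending from A returns every branch stored under A, not only the one attached to the image in question.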

This was hinting at an alternative way of leaving the original metadata intact.

I have already proposed that DxO can achieve what (some) users want by providing an export option that indicates that all keywords should be sourced from the original image unchanged for the chosen export option.

An alternative would be to store keywords and their original locations in the structure, but an option would then be required in the UI to indicate that the original format should be used in preference to the DxPL way of doing things! Hence these fields would only be used if the keyword data came from the image and had not been entered directly via DxPL.

The fields would only be used for recreating the input metadata keywords directly and exactly, rather than using my alternative of an export option that forces the keyword data for the export to come directly from the image.

I do mean that, when coming up with a new design to fix an issue the best time for that to happen is before the product ever “escapes” out the door. We are way too late for that but at least it is a re-structuring of the logic not the actual UI!?

There are three alternatives (at least), I believe,

Use a revised structure with an ‘assignment’ field added. I am not sure if this is even possible without losing all the data in the structure, my database experience is way before SQL.
Add the same structure to the database and use the existing ‘ItemsKeywords’ and ‘Keywords’ Tables to populate this structure with an entry for every keyword using the procedure I outlined above and use ‘ItemsKeywords’ to mark the appropriate entries as ‘Assigned’ and switch all processing to the new structure.
Use two structures, the original ‘ItemsKeywords’, marking the assignments as it does now, and a new ‘ItemsKeywordsAll’ Table that starts life empty but which should have an entry for every keyword item associated with an image, whether assigned or not. When a search is undertaken then use the above procedure to reconstruct the list of keywords and add them to ‘ItemsKeywordsAll’ if they are not present already but the actual search must be conducted on ‘ItemsKeywordsAll’ not on ‘ItemsKeywords’.

With potential variations on the above, I am sure.

By using two structures instead of one they could both be identical in data structure but one would be the current ‘ItemsKeywords’ with entries for all “assigned” keywords (i.e. no change required) and the other ‘ItemsKeywordsAll’ would contain an entry for every keyword “associated” with an ‘Items’ entry (image) and is used to provide a more fruitful search experience!

I thought there was a potential issue with this “design” with respect to the way that data is currently returned for a search when hierarchical keys are involved, so I ran more tests.

I created 3 VCs and assigned D to the [M]aster (i.e. left it with the Win10 default setting), [1] had C assigned, [2] had B assigned and [3] had A assigned for the DOP test below. I then added a number of images to the directory, and the final images and assignments were as follows:

Mallow [M] A>B>C>D
Mallow [1] A>B>C>D
Mallow [2] A>B>C>D
Mallow [3] A>B>C>D
Rose 1. A>B>C>D
Rose 2. A>B>C>D
Rose 3. A>B>C>D
Rose 4. A>B>C>D
Rose 5. animal, mammal, bear
Lavatera. animal>mammal>bear
Rose 6. animals>mammals>bear>black bear
Perovskia. animal, mammal, bear, animal>mammal>bear

i.e.

The issue revolves around removing ambiguity with respect to results for a search hence

Search for D from A>B>C>D:-

Personally, I feel that D should still be shown as part of an hierarchy: even if the two occurrences are identical, they are also part of the same hierarchy, @Musashi, @sgospodarenko and @Joanna.

Search for bear:-

Where there is a difference between “bear” found as a simple keyword and as part of an hierarchy, both are shown, with the distinction between the simple keyword and the hierarchical element clearly shown; but the current search only returns “bear” when it has been assigned.

Using either the single structure with an ‘Assigned’ flag but with all keyword entries or two separate structures the look-up can still be done to preserve the current feature of showing the partial hierarchy where relevant but now potentially showing far more matches overall!

If no item is searched on then the second structure would remain empty if a “provision as you search” approach is used!

Currently on Win 10 I have seen no data in the DOP relating to the assignments. In another post I also made a comment about projects and whether the DOP should hold that as a means of recovering lost project data.

So an excellent point but tests with VCs showed how it is done.

I created 3 VCs and assigned D to the [M]aster (i.e. left it with the Win10 default setting), [1] had C assigned, [2] had B assigned and [3] had A assigned, and the exports showed the following:

I then copied the image with the DOP to a new test location and opened the directory in PL5 and the assignments were all intact

The DOP contains the following

Incidentally, it has occurred to me that there was a “hole” in the pre-PL5.3.0 releases, in which the metadata for the [M]aster would never be taken from the DOP @sgospodarenko: if the default assignment was not used for the [M]aster, it would be lost unless the metadata was written back to the image.

But then we knew that, so no change there; the only new implication is that an assignment that could affect exports would be lost!

platypus · June 29, 2022, 11:51am

Splitting hairs: only the leaf keyword is shown, therefore, only the leaf is allocated in the database…

Basically, the complete tree is stored in the DB, and only showing the leaf is a choice rather than a necessity. Speaking of choice: what can you find in the DB if you check keywords with an application like Capture One, in which no keyword can be selected without all its higher levels?
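To illustrate the difference between keeping only the leaf and keeping the complete tree, the hypothetical helper `expand` below turns one hierarchical keyword into the list of paths you would get if every higher level were selected along with it, as described for Capture One. The “>” separator is an assumption for illustration.

```python
def expand(path, sep=">"):
    """Return every ancestor path of a hierarchical keyword plus the keyword itself,
    root first, as if each higher level were selected along with the leaf."""
    levels = path.split(sep)
    return [sep.join(levels[:i + 1]) for i in range(len(levels))]

leaf = "animals>mammals>bear>black bear"
leaf_only = [leaf]   # one entry, like a leaf-only assignment
full = expand(leaf)  # one entry per level, like selecting the whole branch
print(len(leaf_only), len(full))  # 1 4
print(full)
```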

Joanna · June 29, 2022, 12:30pm

Do you mean if we write them using Capture One and then see what happens in the PL5 database - or in the Capture One database?

Joanna · June 29, 2022, 12:32pm

This proves that the DOP is completely useless for storing and reconstructing hierarchies unless all members of a branch are included.
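The claim above can be sketched under stated assumptions: if all that can be read back is a flat set of keyword names, one plausible rule is to rebuild a branch only when every one of its levels is present. The `known_paths` reference list and the subset rule are hypothetical; no real DOP file is parsed here.

```python
def rebuild(recovered, known_paths, sep=">"):
    """Return the known hierarchical paths whose every level was recovered.

    `recovered` is a flat set of keyword names, as if read back from a sidecar;
    `known_paths` is a hypothetical reference list of hierarchies.
    """
    return [p for p in known_paths if set(p.split(sep)) <= recovered]

known = ["animals>mammals>bear>black bear", "animal>mammal>bear"]
print(rebuild({"animals", "mammals", "bear", "black bear"}, known))  # whole branch present
print(rebuild({"black bear"}, known))                                # leaf only: []
```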

BHAYT · June 29, 2022, 12:33pm

@platypus thank you for your comments.

No, it is not entirely a choice; it is a choice that PL5 on Win10 makes by default, and it has a number of consequences, as I have stated from the beginning of this post.

The keyword search uses the ‘ItemsKeywords’ structure, which by default contains only one entry per hierarchical keyword, so a search can only ever find one element of a hierarchical keyword (the one that was assigned), which I consider wrong!
It also influences the keywords that PL5 writes back into the image (if allowed) and uses for any exports.
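The effect on searching can be reproduced with a small in-memory SQLite database. Only the table name `ItemsKeywords` comes from this thread; the `Keywords` table, the `parent_id` and `keyword_id` columns and the file names are assumptions for illustration, not DxO's actual schema. `direct_search` only matches keywords that have their own row, while `hierarchy_search` also walks down from the searched keyword with a recursive CTE.

```python
import sqlite3

# Hypothetical schema: only the name ItemsKeywords is taken from the thread.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Keywords (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE ItemsKeywords (item TEXT, keyword_id INTEGER);
INSERT INTO Keywords VALUES
    (1, 'animals', NULL), (2, 'mammals', 1), (3, 'bear', 2), (4, 'black bear', 3);
-- Leaf-only assignment: a single row for 'black bear'.
INSERT INTO ItemsKeywords VALUES ('leaf_only.jpg', 4);
-- Whole branch assigned: one row per level.
INSERT INTO ItemsKeywords VALUES
    ('full.jpg', 1), ('full.jpg', 2), ('full.jpg', 3), ('full.jpg', 4);
""")

def direct_search(name):
    """Match only keywords that have their own row in ItemsKeywords."""
    rows = db.execute("""
        SELECT DISTINCT ik.item FROM ItemsKeywords ik
        JOIN Keywords k ON k.id = ik.keyword_id
        WHERE k.name = ? ORDER BY ik.item""", (name,))
    return [r[0] for r in rows]

def hierarchy_search(name):
    """Also match items whose assigned keyword sits below `name` in the tree."""
    rows = db.execute("""
        WITH RECURSIVE below(id) AS (
            SELECT id FROM Keywords WHERE name = ?
            UNION
            SELECT k.id FROM Keywords k JOIN below b ON k.parent_id = b.id
        )
        SELECT DISTINCT ik.item FROM ItemsKeywords ik
        JOIN below b ON b.id = ik.keyword_id ORDER BY ik.item""", (name,))
    return [r[0] for r in rows]

print(direct_search("bear"))     # ['full.jpg']
print(hierarchy_search("bear"))  # ['full.jpg', 'leaf_only.jpg']
```

With only the leaf stored, the direct search misses the image even though the complete tree is still in the `Keywords` table, so a look-up that walks the hierarchy can still find it.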

So I am doing nothing “wrong” at all; the “fault” or “feature” is entirely of DxO’s making, and I understand what is going on only too well!

I have indicated the differences that I believe occur in PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #30 by BHAYT.

The only problem with that post is that I didn’t include all the possible options! Here are some more; the original table contained 1, 2, 3, 4, 5, 6, 8 and 10 (and no, I have not tested the missing items).


Test	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
animals	X	-	-	-	-	X	X	X	X	X	-	X	X	-	-
mammals	X	X	-	-	-	X	X	-	-	-	X	X	-	X	-
bear	X	X	X	-	-	-	X	X	X	-	-	-	-	-	X
black bear	X	X	X	X	-	X	-	X	-	X	X	-	-	-	-
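The columns of the table are on/off combinations of the four levels. A small sketch that enumerates all of them (2^4 = 16, including selecting nothing) and counts how many a search for “bear” would find, assuming one ‘ItemsKeywords’ row per selected level and a search that only matches terms with their own row. `LEVELS`, `combinations` and `found_by` are hypothetical names.

```python
from itertools import product

LEVELS = ["animals", "mammals", "bear", "black bear"]

def combinations():
    """Yield every on/off selection of the four levels (2**4 = 16 in total)."""
    for flags in product([True, False], repeat=len(LEVELS)):
        yield [lvl for lvl, on in zip(LEVELS, flags) if on]

def found_by(selected, term):
    # Assumption: one row per selected level, and a search only matches
    # a term that has its own row.
    return term in selected

combos = list(combinations())
print(len(combos))                               # 16
print(sum(found_by(c, "bear") for c in combos))  # 8
print(found_by(["black bear"], "bear"))          # False: leaf-only selection is missed
```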

But the default option on Win 10 after entering an hierarchical keyword is test 4 from the table, and this results in a single entry in the ‘ItemsKeywords’ table. Every test that I have conducted gives the same result, whether this is intentional or accidental on DxO’s part!

If I de-select “black bear” and then re-select it, PL5 automatically selects all the levels (i.e. scenario 1, effectively the same as Capture One). There are then four entries in the ‘ItemsKeywords’ table, a lot more metadata is included in the image and exports, and the searches work successfully.

This is from a previous post PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #50 by BHAYT.

BHAYT · June 29, 2022, 12:40pm

But when the DOP is read back into the system it produces the same situation as it did previously, so what is the problem @Joanna?

Sorry, I re-read your post; yes, you are correct, but that is because the assignment process influences the contents of the keyword fields. My complaint was always that it also corrupted the searches.

The correct assignment might/should be to assign all elements of the keyword hierarchy to the image, which will result in a “full house”. My VCs look at what happens when you select only some of them, and that results in rubbish choices = rubbish output = faithful return of the rubbish later!?

Admittedly, if only some are recovered later you will never retrieve the original hierarchy, but does that matter if the image was never assigned that hierarchy in the first place? I.e. you cannot retrieve what you never assigned!?

EDIT:-

It might be possible to persuade DxO to store the full set of original keywords in the DOP with the assignment tables so that the full scenario could be “rescued” from the DOP?
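A sketch of what such a record might contain, using JSON purely as a stand-in (this is not the real DOP format); the `full_paths` and `assigned` fields and the helper names are hypothetical.

```python
import json

def make_sidecar(paths, assigned):
    """Hypothetical sidecar record: keep every full hierarchical path alongside
    the subset of levels that were actually assigned."""
    return {
        "full_paths": sorted(paths),   # e.g. "animals>mammals>bear>black bear"
        "assigned": sorted(assigned),  # levels the user ticked, possibly only the leaf
    }

def restore(text):
    record = json.loads(text)
    # The complete hierarchy survives even if only the leaf was assigned.
    return record["full_paths"], set(record["assigned"])

text = json.dumps(make_sidecar(["animals>mammals>bear>black bear"], ["black bear"]))
paths, assigned = restore(text)
print(paths)     # ['animals>mammals>bear>black bear']
print(assigned)  # {'black bear'}
```

Keeping the two lists separate means the original assignment is reproduced faithfully, while the full hierarchy remains available to be “rescued”.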