PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords

platypus · June 29, 2022, 1:25pm

I was thinking of choice in more general terms.

Of course choices are made by the user (us) and by the provider (DxO etc.).

C1 will not add bear to an image without automatically adding mammal and animal to dc:subject and the GUI/view. C1’s choice is to always include what’s above, at least in version 12, which I use.

DPL’s current implementation has its shortcomings. As a user, I simply say that search does not work as expected (compared to industry/market standards like Lightroom et al.) and that I don’t care what DxO is thinking and doing, as long as “it works”.

Joanna · June 29, 2022, 3:41pm

Excuse me for not quoting you: @BHAYT, @platypus, but I can’t find a convenient starting point.

In mentioning storing keywords in DOP files, I was trying to point out that what they hold is decidedly minimal and that, whereas it is possible to reconstruct a keyword hierarchy from the database, it is not always possible so to do from a DOP file.

Which leads me to pose the question - why does DxO put anything to do with metadata in DOP files?

At the moment, metadata can be found in up to four places…

An image file itself (RAW or not) although PL doesn’t write it there.
An XMP sidecar for a RAW file
A DOP file
The database

At which point, I need to dig up the spectre of SPOD.

In my app, I can say that I only ever store metadata in one place - either the image file or an XMP sidecar. I do not manage a database.

Well, I might not manage a database but macOS does - it’s the Spotlight search engine, freely provided as part of the operating system. It is constantly being updated and indexed all the time that the OS is running.

So, when I write metadata to files, be they images or XMP sidecars, Spotlight duly, automatically, adds those metadata to its database. When I want to search for files that match (a) certain keyword(s), my app submits a predicate to the Spotlight engine and it returns every single file that references it/them.
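
As a rough illustration of submitting a keyword query to Spotlight, here is a minimal Python sketch. It uses the `mdfind` command-line tool as a stand-in for the predicate API an app would normally use. It assumes the keywords end up indexed under the `kMDItemKeywords` attribute, which depends on the Spotlight importer for the file type. The helper names `build_keyword_query` and `find_files` are hypothetical.

```python
import subprocess


def build_keyword_query(keywords):
    """Build a Spotlight query string that matches files carrying every keyword."""
    clauses = []
    for kw in keywords:
        # Escape backslashes and double quotes so they stay inside the quoted literal.
        escaped = kw.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'kMDItemKeywords == "{escaped}"')
    return " && ".join(clauses)


def find_files(keywords):
    """Run mdfind (macOS only) and return the matching file paths."""
    result = subprocess.run(
        ["mdfind", build_keyword_query(keywords)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

On a Mac, `find_files(["Bear", "Mammal"])` would return the paths Spotlight has indexed with both keywords. No database has to be maintained by the calling app.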

My point is that, in order to be able to search for files, one first has to add metadata to search for, to some kind of database.

Now, if they wanted, DxO could leverage the same database technology for the Mac version and completely do away with overloading their image editing database but, unfortunately, Windows doesn’t have such a powerful mechanism.

So, it would appear that DxO have decided to implement their own database, presumably so that they can use the same one for both Mac and Windows - except it is not the same database, it is actually two different databases, one for each platform, as can be seen from the differences in table and column names in screenshots made by Bryan and me.

Both databases are SQLite. The Mac database is, in fact, autogenerated from a Core Data object model, but I can’t tell what was used to create the Windows one.

All this raises several questions like…

1. Regardless of platform, why integrate metadata in the same database as image editing data, when separating it out would provide for greater flexibility and speed?
2. Why store metadata in any database, for macOS, given that macOS already provides a comprehensive metadata engine?
3. Since the metadata stored in DOP files is so inadequate, why bother writing it there in the first place? It certainly can’t be used for searching.
4. It is now possible to turn off automatic synchronisation of editing data to DOP files and metadata to XMP files, meaning that, if someone were to trash the database, they would lose absolutely everything - both editing data and metadata.
5. The majority of users who want keywording will use XMP sidecars so, apart from needing a metadata database for indexing and searching, why is there any need at all to put metadata in DOP files?

Which leads me to answer question 5.

At present, DOP files are used for metadata because they contain separate data for virtual copies and, in principle, each virtual copy can have its own set of keywords. This is managed by using GUIDs to link the DOP reference to the VC to the database entry for it.

But this is unnecessary duplication; as I have mentioned before, the DOP isn’t used for searching. Its only use seems to be as a safeguard. Or it would be if it actually worked.

I set up a RAW file with two virtual copies.

I then applied a “matching” hierarchy of keywords to the master and the two VCs…

Assuming that the DOP file would “rescue” me from database deletion, I then closed PL5, deleted the database and restarted PL5.

Well the two VCs were restored fine but the master wasn’t so fortunate…

… even though the DOP contained the Master keyword…

			Keywords = {
				{
					"Master",
				},
				{
					"Master",
					"Copy 1",
				},
			},
            …
			Keywords = {
				{
					"Master",
				},
				{
					"Master",
					"Copy 2",
				},
			},
            …
			Keywords = {
				{
					"Master",
				},
			},
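
To make the restore step concrete, here is a minimal Python sketch of how keyword lists like those above could be read back into a hierarchy. It assumes each inner list is a path from the root keyword down to the applied keyword. That is only a reading of this excerpt, not DxO’s documented format or code. The function names are hypothetical.

```python
def build_tree(paths):
    """Merge root-to-keyword paths into a nested dict representing the hierarchy."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            # Create the child on first sight, then descend into it.
            node = node.setdefault(name, {})
    return tree


def applied_keywords(paths):
    """Return the keyword actually applied by each path (its last element)."""
    return [path[-1] for path in paths if path]


# Keyword lists transcribed from the DOP excerpt.
copy_1 = [["Master"], ["Master", "Copy 1"]]
copy_2 = [["Master"], ["Master", "Copy 2"]]
master = [["Master"]]

print(build_tree(copy_1))        # hierarchy for the first virtual copy
print(applied_keywords(master))  # the single top-level keyword on the master
```

Under this reading, a one-element path such as `["Master"]` is an ordinary case that needs no special handling, so there is nothing in the stored data that should stop the master’s keyword from being restored.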

On the other hand, instead of “fixing” bugs like this in metadata held in the DOP files, why not take advantage of the fact that XMP stands for eXtensible Metadata Platform? That means DxO could add their own metadata, such as for virtual copies, and dispense with polluting DOP files with metadata.

This still leaves the problem that, outside of PL5, it is not possible to search for keywords applied to virtual copies.

I may have answered some of @BHAYT’s and @platypus’s remarks here but I daren’t make this post any longer and will address anything specific in another post.

Prem · June 29, 2022, 4:14pm

I don’t think that you are going to like me, @Joanna. While I think your SPOD would be ideal, in the case of DxO they have two points of data entry: one is the DOP file and the other should be the XMP file. The XMP file should contain the metadata/EXIF data only, and the DOP file should contain image editing data only. On next opening the image file, the database should then read the XMP file and DOP file and update itself if there are any changes. This database should then only be used for DxO’s own housekeeping and should not be the main file until it is updated. As far as the XMP file goes, to me, DxO must follow the MWG standards at least until they have the DAM working correctly and reliably; any improvements they make could then be discussed with the MWG and possibly made standard.

I learnt very early in my programming life: if you get a bug or a difference in operating systems, squash it before going on to the next major update. Otherwise it can come back and bite you as much as fivefold.

Joanna · June 29, 2022, 4:25pm

Oh, but I do, I do

Absolutely. But, as I mentioned, it would mean adding custom keys in the XMP file, for virtual copies - not exactly rocket science and would solve all sorts of problems.

Agreed. There is absolutely no reason why DxO shouldn’t follow MWG to the letter. The problem seems to be that they are trying to appease folks using other DAM solutions that don’t follow MWG

Only fivefold?

I once acted as consultant/designer for a project that “connected” a Windows app, written in Delphi, with a Java data provider. I started by laying out the “interface” between the “provider” and the consumer and wrote a JNI wrapper for the Windows client to talk and listen to.

The Windows team created dummy data behind a proxy data provider, for testing and, when the time came that the two sides were both ready, it took changing only three lines of code to point the client to the real provider instead of the test one. It’s simple if you follow the rules

platypus · June 29, 2022, 6:10pm

and I think that it’s not necessary. DxO has more than enough food for thought and adding more of the same will not help imo.

If DxO should have questions about what to do and why, they can ask their questions here, like they have done in the past on a few occasions. I feel that a lifelike discussion should be initiated between DxO and people on the Forum and others, but I fear that a) it’s difficult by messaging, b) it’s not going to happen, in spite of all the possibilities like Webex, Zoom etc., or c) it’s already taking place behind the scenes.

KeithRJ · June 30, 2022, 12:10am

It seems to me that DxO could do with some external IT consultants to help with their design, so why not offer your services to them, @Joanna and others, especially since you, @Joanna, are already in France?

Joanna · June 30, 2022, 8:10am

From the lack of enthusiastic response, or even dialog, here, I think there might be a tad of “not invented here” syndrome and I gather DxO use Agile methodology - something, from my experience with other companies, that can severely affect flexibility of thinking. But I hope I am wrong.

BHAYT · June 30, 2022, 8:45am

Although I am not a fanatic about not updating multiple copies of data at the same time, multiple databases is a whole new thing entirely. Why bother when a well designed and well programmed single database is fine. Some metadata is going to exist in the editing database anyway as part of the editing process so why split it other than to take advantages of Mac OS features that are not available in Windows which make maintenance even more difficult!

More in common between version than not, perhaps.

Because the DOP is effectively an audit record of the database. It can be carried between systems, if done carefully. It can be used as long term storage of edits and metadata. But if inadequate it needs to be improved not dispensed with altogether (please remember Virtual Copies - which you did in a later comment).

The DOP is not intended to be an active component of the system just an audit at a point in time. Some are copying DOPs off and keeping a history of editing attempts/possibilities!?

But they wouldn’t lose everything @Joanna because the data will be recovered with AS(OFF) from the DOP and thanks to the latest change with AS(OFF) the DOP will be used in preference to the metadata even if the metadata is actually later than the DOP. I don’t hate the change I just hate the way it was implemented and I hate the lack of any attempt to use metadata in preference to the DOP.

Because DxO appear to consider the metadata from the DOP as a DxO addition to the database the so called ‘Conflict Resolution’ code is not triggered. Herein lies my real gripes about DxO development, half-hearted or half-cocked or both but certainly inadequate (at times).

Adding metadata to PL5 does not trigger the ‘S’ icon when it conflicts with the image metadata because according to one response I received from DxO it was never intended to!
The same applies with the 5.3.0 release with the data taken from the DOP, no check with the real metadata and no ‘S’ flagged under any circumstances!

If you don’t put it there then the only other place is back into the image. I like the two-stage process it means that if I discover that DxO have got it wrong it is up to me to not write it back to the image. On a serious note when I started PL5 Beta testing I did envisage a situation where there might be shoot progress data in DxPL that was kept separate from the main (real metadata) and was used to progress work with the customer.

This would require a greater metadata separation but the export option I suggested to take metadata from the image directly to the exported image to avoid the “my metadata has been corrupted” situation would actually do that nicely @Musashi @sgospodarenko!

The scenario you described worked fine for me did you have AS(ON) or AS(OFF), mine were done with AS(OFF) when the keyword metadata will be taken from the DOP!

@Musashi and @sgospodarenko please start talking to us because the implementation of the IMatch alternative keyword strategy of PL5.2.0 and the reversion to the DOP strategy of PL4 but now with a huge amount of metadata effectively blocking the metadata from the image on first discovery or not based upon the AS(ON) or AS(OFF) settings are just … words fail me!

This one seed sorry seems (how and why did I type seed) to work (on Win 10)

Musashi · June 30, 2022, 10:56am

Dear testers,

After internal discussion, here are the actions we will work on for a future version (PL 6.0.X).

1/ Preference to apply all KW hierarchy to an image or just the selected child
We will add a preference to apply all KW hierarchy or just the selected one, this preference will be added in the Preferences menu.

2/ Compatibility mode
We will add a preference to add the keywords hierarchy in “DC subject”

3/ Option to search for whole KW hierarchy or just selected KW
We will add an option to choose whether to search the whole KW hierarchy or just the selected one (via contextual menu on the token or option button in search field with list of options, this is work in progress)

These additional developments will be planned for a release close to the PL6.0 launch.

We hope this will suit you and will answer positively to the majority of questions here.

Best regards,

cc @sgospodarenko @CaptainPO @kettch @alex @Marie

BHAYT · June 30, 2022, 11:28am

I wrote this but had not completed and posted when the welcome post from @musashi copied to @sgospodarenko @CaptainPO @kettch @alex @Marie was posted. I have made some amendments to the posts but feel that whatever points I was trying to make are still valid.

@KeithRJ Why “pay” for what is on offer for free!?

To be honest the “metadata” not meeting guidelines is an “easy fix” but without that and a means for keyword pass through the product is currently worse than useless for me (to start using it for metadata) and for others who want their metadata preserved @Musashi @sgospodarenko.

If DxPL is the last link in the workflow chain then the export mechanism must work the way that the users wants/needs it to work regardless of whether that meets any guidelines or not (other than the users “guidelines”)!

Otherwise the metadata must then be (re-)adjusted after export, the so called “DAM sandwich”, or I’m DAMned if I do but well and truly DAMned if I don’t or something like that.

The major problems are

Bugs of a minor or major form that “litter” (“litter” not “flood” or “blanket”) the PL5 build and have become evident during close scrutiny of the keyword implementation in particular.
Unnecessary and “crudely” implemented reactions to perceived user complaints without proper consultation and evaluation which then lead to an even worse situation.
Unfinished good ideas that are going nowhere, deliberately it appears, namely the so called ‘Conflict Resolution’ the major part being ‘Conflict Detection’ followed by a simple way of the resolving any conflicts easily. Or rather detecting any conflicts between external metadata and DxPL but not the other way around because who needs that feature. That element of the product is half-way to being useful now please finish it properly rather than provide some excuse that it was never intended to be a complete well-rounded solution!
A development team that appears to be snowed under with work and I sometimes feel that the support staff at DxO are not held in high enough esteem and that the software engineers are “revered” as “gods”. The support staff should be the “champions of the user” and the “guardians” of the company’s reputation and not someone that needs to go “cap in hand” to anyone (that was the way I was treated in my own company, as the “champion of the customer” where I was front line support for most of my career, albeit I started as UK support for the database product for the first 2 years). Is there something profoundly wrong inside the development infrastructure that is leading to such a patchy reliability record and a complete absence from the forum when topics of importance are being discussed which directly impact the usability of the product in one particular area or another? I stand by this statement even after the welcome post from @Musashi and others including @CaptainPO.

Agile is the last thing that the DxO development seems to have become! It appears to be mired in the legacy of the metadata implementation which I believe has actually exposed issues that have been in the product for some time.

Some (most) are actually minor fixes but I do not want DxO to monitor this or any other topic and then quietly go away and repeat the PL5.2.0 (change to the ‘dc’ fields) or PL5.3.0 (revert to the PL4 DOP handling methodology but by hijacking a PL5 feature) without discussing it openly and actually developing a solution for all, does the new post from DxO change this statement!?

If that means adding more options to the sacrosanct ‘Preferences’ @Musashi then do that or add another ‘Metadata Preferences’ to the ‘Edit’ menu. In the world of metadata handling most packages have some options that allow users to tailor the keywords in particular. Some improvement here then!

In truth there are “hidden” options that “came to light” (to me at least) while investigating this topic namely the impact of this (which are arguably a powerful set of capabilities albeit ones that may well infringe the odd guideline or two @Joanna)

or this

Resulting in this from the now “damaged” PL5.3.1 versus the DxPL5.1.4

The reason that I discuss implementation strategy at such length in some of my posts is not to show what a good designer I am (that’s a given - ) but to try to open the eyes/minds of the DxO developers to the possible routes to a “better” or “fixed” product at the very least, since I am concerned about some of the decisions that have been made and the thinking behind them as evidenced by this topic in particular.

I have just seen the post from DxO and need to digest exactly what it is promising; other than we will need to upgrade to get access to these “new” features!

But @Musashi I am very puzzled about the issue I “tripped” over, prompted by a discussion with @Joanna, as to the original intent of the design I considered a “bug” in this topic!?

KeithRJ · June 30, 2022, 11:33am

This is great news and will hopefully allow users to configure how they work with keywords to match their needs, well done!

nwboater · June 30, 2022, 12:45pm

It seems they will implement these changes in PL5.

BHAYT · June 30, 2022, 1:09pm

@nwboater I understand that but want to assess whether that goes far enough with respect to the issues raised in this topic and others and once again we have a DxO reaction to forum posts that may be “well formed” or “badly informed” by other posts upon which I and others then base our comments.

Actually they are promising them for PL6.? rather than PL5. So they will not be around until after September/October and only if you purchase an upgrade to PL6 (which I am likely to do anyway)

I could promise my wife that I am going to “fix” the leaking tap but does that mean take it apart and replace the washer (difficult if it is ceramic), replace the leaking tap with a new one that may or may not match the old one, rip out the sink in a fit of pique and replace all the taps and the sink and discover the new sink doesn’t fit the cut-out in the work-top and anything in between.

Exactly what do DxO think they mean, the “devil” is in the detail?

We float ideas and make complaints and hurl insults in the forum but so far DxO’s reactions have been “misinformed” either because we are not being explicit enough (I tend to overdo the explicit enough by a huge amount), or DxO is misunderstanding or both!

Listening and promising action is good but what I would like to see is DxO actually reaching out to users and actively interacting before deciding on the final design and the implications of that design, with these changes DxO is not exactly exposing the “crown jewels” of the product and NDA’s can cover a wide variety of things not just a Beta test !

Before DxO offer us a beautifully crafted “chocolate teapot”, good to look at, good to eat (perhaps) but useless as a teapot and prone to melting in a hot room etc.

Prem · June 30, 2022, 2:32pm

Hopefully DxO will stick to the MWG guidelines until they get everything working correctly and no bugs.
That should mean easier maintenance and less code writing and most of all keep the meta data in the XMP file and not the DOP file. That will make one SPOD for the meta data in the XMP file and one SPOD for the imaging editing in the DOP file. Again making coding easier to maintain.

BHAYT · June 30, 2022, 2:49pm

@Prem they can get all the advice they want but if you look at the spreadsheet pages I included some posts back you will see that the packages do not all agree with Capture One but DxO comes close but only with all elements of the hierarchical keyword assigned!

But many users don’t want that, they want exactly what their DAM put into the image so DxO needs to accommodate those users as well or risk continued dissent (a polite word for the actual reaction contained in the topic with f… etc in the title).

Prem · June 30, 2022, 3:01pm

If DxO follow the MWG guidelines, at least their DAM will be correct. If you are expecting DxO to program for compatibility with all other DAM programs, you are expecting them to do an awful lot of programming, which should not be their responsibility, as you can see from some of these postings. Some customers are expecting the nigh-on impossible. I still think DxO should get their software running properly before worrying about being compatible with software that does not meet the standards.

Joanna · June 30, 2022, 3:18pm

Yes, I think that idea is left over from the thought that the PL DAM might possibly have been better packaged as a “plugin” like FP or VP. Not all users want DAM functionality in much the same way as other users might not want the FP and VP functionality. Developing it “in isolation” would have meant that folks who feel it necessary to delete the main database to overcome problems moving files didn’t feel so wary, in case they also lost their metadata.

As others have said, over the years, DOP files have become their fundamental record of changes, with the database simply acting as an indexing and caching mechanism.

As someone who regularly trashes their database to fix file reorganisation issues I would strongly contest that assertion.

What I am getting at is that it is possible to run PL with no DOPs or XMP files, just the database - something that scares me silly.

I guess this is because DxO regard any changes made by them as “the new truth”, replacing all past truth. From what others have said, this appears to be quite a pain point because some want the metadata format imposed by another DAM to be respected.

If by saying “back into the image” you mean into the XMP, why not?

I always have AS(ON) otherwise I could not scrap the database when I needed to. And it was testing this situation that revealed that a single, top-level keyword does not get restored from the DOP if the database is scrapped (@Musashi please note)

As others have said, thank you for letting us know that you are taking notice of our discussions but could I please ask that you don’t make any knee-jerk reactions to what we have said, including my 2¢ worth.

This needs clarifying. Do you mean in the (non-RAW) file, the DOP, the XMP file or the database?

This I don’t quite understand, unless you mean ensuring complying with MWG guidance by ensuring all keywords mentioned in lr:hierarchicalSubject are recorded.

Failure to record all such keywords will impact heavily on your item 3/ - in that, although you might be able to work something out for a search in PL, without all keywords present in dc:subject means that other software, like the Spotlight search in macOS, will not be able to search effectively.
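
The difference can be sketched in a few lines of Python. This assumes an external tool only looks at the flat dc:subject list, while a hierarchy-aware search can split the `|`-separated lr:hierarchicalSubject entries. The function names are hypothetical and the lists stand in for values read from an XMP file.

```python
def flat_search(dc_subject, term):
    """Search as a tool that only knows the flat dc:subject list would."""
    return term in dc_subject


def hierarchy_search(hierarchical_subject, term):
    """Search every level of each 'A|B|C' entry in lr:hierarchicalSubject."""
    return any(term in entry.split("|") for entry in hierarchical_subject)


hierarchical = ["Animal|Mammal|Bear"]
leaf_only = ["Bear"]                        # only the selected keyword written
full_hierarchy = ["Animal", "Mammal", "Bear"]  # every level written

print(flat_search(leaf_only, "Mammal"))
print(flat_search(full_hierarchy, "Mammal"))
print(hierarchy_search(hierarchical, "Mammal"))
```

With only the leaf keyword in dc:subject, the flat search for “Mammal” finds nothing, even though the hierarchy clearly contains it. A tool that cannot read lr:hierarchicalSubject has no way to recover from that.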

Something inside my brain keeps telling me that 2/ and 3/ are just two sides of the same coin and only one option may be necessary (discuss).

Exactly. And I think the answer is that DxO would much rather get all this free consultancy from us, as experienced developers and analysts, as well as possibly rather autistic or OCD

As of PL5.1.3, metadata handling seemed fine and to guidelines. I could confidently work with it, Capture One and my app without any conflict. As I have often (possibly too often) mentioned, I have absolutely no need for DAM functionality as part of PL and it would seem that it is only really when PL’s DAM is used in tandem with certain other software that it starts to fall down.

I think I mostly agree with this.

… and this.

I don’t have any problem with infringing guidelines, as long as it doesn’t cripple functionality - as evidenced by the mess it causes to external searching and lack of compatibility with other DAM providers.

That’s what I like to see - a bit of humility

I was employed as a consultant by a major international business software house to “rationalise” their codebase as it had gotten somewhat “spaghetti-like”. Fortunately, the managing director, who also wrote the initial version, wasn’t too precious about his legacy code and, much to my surprise, decided to start from scratch on a completely new codebase, whilst part of the development team maintained the legacy app in the meanwhile. We mutually agreed that constructive argument was “a good thing™”, sometimes having multi-day “arguments”, where we would both try to prove the other wrong. What we ended up with was solutions that blew us both away with the sheer genius that could have only come from more than one brain.

I couldn’t agree more with your post. To my mind, any options should be based on

Total MWG Guidance adherence (the default)
You’re on your own buddy

BHAYT · June 30, 2022, 3:31pm

I am not suggesting compatibility with other DAMs. From the start of the furore I have asked for only 2 things:

1. Compliance with standards
2. A pass through for all users who effectively want DxPL to keep its “paws” off their lovely metadata.

Item 2 has some “nasty” implications if a user breaks their own rules and starts adjusting the ‘Rating’ in DxPL for example (the most likely potential infringement) what should be carried forward and how do you allow the user to configure such a “pick and mix” scenario, if you allow it at all, but that is what design is all about:

enumerate the possible scenarios
enumerate the solutions
pick the cheapest or those that offer the most “bang for the buck”
block those scenarios you won’t/can’t handle or offer limited value for the investment

or something like that!

But don’t leave a giant hole for unsuspecting users to fall into, i.e. document what can and can’t be done with the product don’t leave it to “idiots” like me to waste huge amounts of time doing “boundary” tests when they or other users vanish down “rabbit” holes!

I am not sure that they want or expect anything except their favourite RAW editor, they simply don’t like what is on offer because it bends their metadata out of shape…

Now as for me, but that is a whole different story. I’m afraid that I don’t think that any of this is that complicated which makes me irritated when it seems to require so much effort to start the ball rolling.

The table shows more similar if not identical scenarios (right or wrong) than it shows radically different ones. Photo Mechanic and Capture One stand out for different reasons and Photo Supreme is not there because my tiny brain cannot handle the way it demands categories!

I think that PL5 drew flak from PM and IMatch users and the change in PL5.2.0 was to effectively implement the output from IMatch except that the output in question to the ‘dc’ fields is controlled by a pair of options from which there are 3 different combinations.

The original PL5 correctly followed one scenario, another scenario is similar to other packages and not my choice under any circumstances and the other scenario is the same as the one implemented on PL5.2.0 where it becomes the mandatory choice (no option at all).

I purchased a license for IMatch when it was on offer at Christmas but it is a bit of a behemoth and I would welcome something simpler. But the author, the frequency of releases, the release discipline and release notes are exactly what any developer should aspire to and light years away from DxO who don’t …

Joanna · June 30, 2022, 3:33pm

Can I just clarify something to see that we are all agreed on it?

MWG Guidance indicates that the following is the correct way of recording a hierarchy…

     <dc:subject>
        <rdf:Bag>
           <rdf:li>Animal</rdf:li>
           <rdf:li>Mammal</rdf:li>
           <rdf:li>Bear</rdf:li>
           <rdf:li>Black Bear</rdf:li>
        </rdf:Bag>
     </dc:subject>
     …
     <lr:hierarchicalSubject>
        <rdf:Bag>
           <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
        </rdf:Bag>
     </lr:hierarchicalSubject>

This indicates that the “selected” keyword is Black Bear.

If a user were to select Mammal as well, then I believe the lr:hierarchicalSubject should then change to…

     …
     <lr:hierarchicalSubject>
        <rdf:Bag>
           <rdf:li>Animal|Mammal</rdf:li>
           <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
        </rdf:Bag>
     </lr:hierarchicalSubject>

… or, if all nodes were selected…

     …
     <lr:hierarchicalSubject>
        <rdf:Bag>
           <rdf:li>Animal</rdf:li>
           <rdf:li>Animal|Mammal</rdf:li>
           <rdf:li>Animal|Mammal|Bear</rdf:li>
           <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
        </rdf:Bag>
     </lr:hierarchicalSubject>

This then means that selected (applied) nodes can be identified from the XMP.
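
As a sketch of how a reader of the XMP could recover the selected nodes, the Python below parses a fragment shaped like the examples above using the standard library. It treats the last element of each lr:hierarchicalSubject entry as a selected keyword and checks that dc:subject holds every level. The wrapper elements around the two bags and the helper name `bag_items` are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Fragment modelled on the "Mammal and Black Bear selected" example.
XMP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
  <rdf:Description>
    <dc:subject><rdf:Bag>
      <rdf:li>Animal</rdf:li><rdf:li>Mammal</rdf:li>
      <rdf:li>Bear</rdf:li><rdf:li>Black Bear</rdf:li>
    </rdf:Bag></dc:subject>
    <lr:hierarchicalSubject><rdf:Bag>
      <rdf:li>Animal|Mammal</rdf:li>
      <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag></lr:hierarchicalSubject>
  </rdf:Description>
</rdf:RDF>"""


def bag_items(root, tag):
    """Return the text of every rdf:li inside the rdf:Bag of the given property."""
    return [li.text for li in root.findall(f".//{tag}/rdf:Bag/rdf:li", NS)]


root = ET.fromstring(XMP)
paths = [entry.split("|") for entry in bag_items(root, "lr:hierarchicalSubject")]

# The last element of each path is the keyword the user actually selected.
selected = [path[-1] for path in paths]

# Every level of every path, de-duplicated in first-seen order.
expected_flat = list(dict.fromkeys(name for path in paths for name in path))
flat = bag_items(root, "dc:subject")

print(selected)
print(set(flat) == set(expected_flat))
```

The same two derived values, `selected` and `expected_flat`, could serve as a quick consistency check on files written by any of the packages being compared.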

@Musashi does this make sense to you as well?

BHAYT · June 30, 2022, 3:47pm

This is the output from a test currently running on my Test machine for A = “Animal”, B = “Mammal”, C = “Bear” and D = “Black bear” with PL5.1.4.4728 on Windows 10.

PL5-1-4_4 is the result of selecting B (“Mammal”) but automatically selected A as well and PL5-1-4_5 is the scenario with “Mammal” and “Black Bear” selected and PL5-1-4_3 is all selected which correspond exactly to what you are showing