Duplicate keywords appear at top of hierarchy

Do you realise that a keyword can exist at more than one level? For example:

Orange
Fruit | Orange | Satsuma
Colour | Orange
Enterprise | Telecommunications | Orange

The way that DxO have decided to implement this is that each keyword is identified in its hierarchical context. So looking for Orange alone will only return those images that have been tagged with the standalone keyword.

In order to look for Colour | Orange you will have to specify that exact hierarchy in the search.

At the moment, the search engine cannot do sophisticated searches for “Orange, wherever it appears” and you need to take care to check how you are marking images in order to get things to work according to the present implementation.
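
To make the difference concrete, here is a minimal Python sketch of the two kinds of search. It only illustrates the behaviour described above and says nothing about how PL is actually implemented; the `TAGS` data and both function names are made up for the example.

```python
# Each keyword is stored as its full path, from the top level down to the leaf.
TAGS = {
    "img1.tif": [("Orange",)],
    "img2.tif": [("Fruit", "Orange", "Satsuma")],
    "img3.tif": [("Colour", "Orange")],
    "img4.tif": [("Enterprise", "Telecommunications", "Orange")],
}

def search_exact(path):
    """Match only images tagged with exactly this hierarchy (the present behaviour)."""
    return sorted(name for name, paths in TAGS.items() if tuple(path) in paths)

def search_anywhere(word):
    """Match images where the word appears at any level (the search that is missing)."""
    return sorted(name for name, paths in TAGS.items()
                  if any(word in p for p in paths))

print(search_exact(["Orange"]))            # only the standalone keyword
print(search_exact(["Colour", "Orange"]))  # needs the exact hierarchy
print(search_anywhere("Orange"))           # every image that mentions Orange
```

With this toy data, searching for Orange alone finds only `img1.tif`, while the "anywhere" search finds all four images.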

What you are seeing is not a bug, more an unexpected behaviour. Could you post a screenshot of the keywords section of PL, showing the hierarchies listed?

What are the things that you do before the keywords are shifted up?

  • Customising details?
  • Export and review?
  • Interactions with other applications?
  • Other?

I tried to reproduce - without success - the issue you found.

This is exactly what I am trying to figure out! Thank you for your interest in helping me. I still haven’t found a way to knowingly reproduce the problem. I do have one odd behavior that might be related but I am still working to clarify this.

The odd behavior is this: there are two ways to add a keyword, best as I can tell. 1) Begin typing into the “Add keywords” box and then arrow down to the desired keyword and hit Enter. 2) Open all nodes by clicking twisties until the desired keyword is displayed, followed by checking the box.

In the first case, only the leaf keyword is added. In the second case, all intervening keywords are added. As a real example using places, the first case assigns one keyword, such as “Bertha K. Russel Preserve” to the image. The second case adds: “Places”, “Bertha K. Russel Preserve”, “North America”, “Portsmouth”, “Rhode Island”, “United States” because the keyword I clicked on is actually Places > North America > United States > Rhode Island > Portsmouth > Bertha K. Russel Preserve.

I don’t understand why those two gestures behave so differently and I wonder if it is related to the issue I encounter where top-level keywords are errantly created that duplicate the spellings of leaf keywords.

I should note that the second case adds keywords correctly. In this case “Places”, “Places > North America”, etc. Hence this does NOT duplicate the problem.
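
Here is a rough Python sketch of what the two gestures appear to do, going only by what I observe. The function names and the ` > ` separator are my own choices for illustration, not anything taken from PL.

```python
SEP = " > "

def add_by_typing(path):
    """Gesture 1: only the leaf keyword (kept in its hierarchy) is assigned."""
    return [path]

def add_by_checkbox(path):
    """Gesture 2: the leaf and every ancestor are assigned, each with its full path."""
    parts = path.split(SEP)
    return [SEP.join(parts[:i]) for i in range(1, len(parts) + 1)]

path = ("Places > North America > United States > Rhode Island"
        " > Portsmouth > Bertha K. Russel Preserve")

print(len(add_by_typing(path)))    # one keyword assigned
for keyword in add_by_checkbox(path):
    print(keyword)                 # six keywords, from "Places" down to the leaf
```

Note that in the second gesture every ancestor keeps its own parent chain, which is why it does not create stray top-level keywords.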

Thanks so much for taking an interest. I do understand that the spelling of a keyword is not unique across the list of keywords. This is exactly the problem because PL creates duplicates of my leaf keywords at the top level. Now I am cleaning up 9 images that got messed up. What was I doing when the mess occurred? I wish I knew!

I wonder if this is important: The images with messed up keywords are old. They were shot with a Pentax *istDS back in 2009. I was working on them to get past the idiotic problem that PL doesn’t support the camera. I could not open the DNG files so I went back into my subscription software and did a mass export to TIFF format. The messed up images are all tiff files displaying an icon that states that “No DxO Optics Module available for this image”.

As far as I read from your answers, the issue came up when you opened the TIFFs that you exported from Lightroom?

My guess 1: Keywords added in Lr: YES
My guess 2: Keywords added in Lr: NO

Hint: Be as specific as possible in your answers…

I had found this problem between the 2 ways to add keywords, here.

Oy! You want specific? I’ll try my best.

The original keywords were definitely added in Lr. However, I just remembered what I was doing when the error occurred. Well, not exactly what I was doing but generally.

I had misspelled a keyword and I was working on the affected images, adding the correct spelling and removing the incorrect one. I was at the Newport Folk & Jazz Festival and tagged many images with:

People > Celebrities > Pete Seger

I already had a keyword with the correct spelling of Pete’s name, so I was adding the correct keyword to a bunch of images and removing the incorrectly spelled keyword. Well after I completed the task, I noticed that all my place tags for Newport, RI had become top-level tags.

Places > North America > Rhode Island > Newport

had become:

Places
North America
Rhode Island
Newport

I have been playing around with these types of actions on recent images and I cannot reproduce the problem. I will compare and contrast PL’s behavior with new vs. old images.

Eureka! The hierarchical keywords were added in Lr. The duplicated keyword problem occurs when I visit the folder containing the files. This is a disaster for me. What do I do? Can I fix this in Lr? I’ve cancelled my subscription (hooray!) but I have it for another two weeks.

Ooookayyy…let me think about it…

  1. Create an empty folder on your desktop.
  2. Direct Lightroom, PL5, PL4 etc. to that empty folder
  3. Quit out of Lightroom, PL5, PL4 etc.
  4. Do something else!

Wait.

Mark, can you possibly either post one of your affected files here or send it to me via a DM? Assuming they are RAW files and have XMP sidecars, both would be useful. I have a couple of ideas about sorting this mess out and need to see what you have at the moment.

Here is what I’d do to restore keyword sanity if I were in your shoes

READ through all before acting. Ask questions before acting.

  1. Open Lightroom, keeping it pointed at the empty folder (preventing further oddities with your files)
  2. Expand the Keyword List - stay away from everything else during all that follows
  3. Check which of your keyword hierarchies are still intact
  4. Fix damaged hierarchies (make sure to stay in the keyword list) and don’t touch anything correct
  5. Repeat steps 3 and 4 until the keyword list is cured
  6. Wait a while
  7. Find any rogue keywords (that might have been created during your “excursion”)
  8. Select all images that have a rogue keyword
  9. Delete the rogue keyword and fill in the correct keyword from the keyword list
  10. Repeat steps 7 to 9 until the compromised files are cured
  11. Tell Lightroom to write metadata to files (if you use Lr in its default settings)
  12. Wait until finished, quit Lightroom.
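
If the keyword list is long, step 7 can be narrowed down mechanically. The following is a minimal Python sketch, assuming you have somehow got your keyword list out as a list of paths (the `keywords` data and the function name are invented for the example; neither Lightroom nor PhotoLab provides this function). It flags top-level keywords whose name also appears deeper in a hierarchy.

```python
def find_rogue_candidates(paths):
    """Return top-level keywords whose name also occurs below the top level."""
    top_level = {p[0] for p in paths if len(p) == 1}
    nested = {name for p in paths if len(p) > 1 for name in p[1:]}
    return sorted(top_level & nested)

keywords = [
    ("Places",),
    ("Places", "North America"),
    ("Places", "North America", "Rhode Island"),
    ("Places", "North America", "Rhode Island", "Newport"),
    ("North America",),   # rogue copy
    ("Rhode Island",),    # rogue copy
    ("Newport",),         # rogue copy
    ("People",),
]

print(find_rogue_candidates(keywords))
```

These are only candidates: a legitimate standalone keyword that happens to share its spelling with a nested one would be flagged too, so check each hit by eye before deleting anything.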

Trash the PhotoLab 5 database (look for files named DOPDatabaseV5)

  1. Open PhotoLab 5 and let it rest on the empty folder
  2. Expand the Keyword List and delete the rogue keywords from the database
  3. Tell PhotoLab to index the folder(s) that had images with rogue keywords
  4. When done, check the keyword list, it should now be free of rogue keywords
  5. Quit out of DPL if things look okay

If you’ve not customised images in PL5, that’s it.

If you have changed settings under the customize tab, redo the changes tomorrow.
If you have changed settings in PL4, make yourself noticed here.

I’ll check in again in a few hours.

Thanks for hanging in there with me. (I just split several trees for winter stove fuel so I’m not doing anything I may regret!)

I believe that I understand your advice. I have created an empty folder and PL 5 and LrC are both active there. I have quit both PL 5 and LrC. When I restart LrC, I should fix all rogue keywords. (Rogue keywords are the ones that have been duplicated in error. I have verified that these rogue keywords are visible in LrC and can be corrected there.) I must note the folder(s) containing images with rogue keywords to use when I am back in PL later in the process.

When I am done eliminating the rogue keywords and re-assigning the proper hierarchical keywords, I then set LrC to write all metadata to images. I think this step requires me to select all images in the LrC database and perform a Metadata - Save Metadata to File (Ctrl-S) command. (This will take a boatload of time for 100,000 images.)

When LrC is done writing metadata to files, then I quit LrC and say goodbye to Adobe forever.

Then I will delete the PL5 database to start over with a new database.

Next I will restart PL and delete any rogue keywords. I will then revisit the folders that contained the images with (now removed) rogue keywords and re-index those folders. I don’t exactly understand this but I’ll look up how to re-index folders.

At this point, all should be good. If I understand this properly, it occurred because I did not instruct LrC to write all metadata to files.

Hi Joanna, I am planning to follow platypus’s advice, although I’m not sure why the problem wouldn’t recur. Do you have any insight? Basically platypus would fix keywords in LrC, write all metadata to images in LrC, then start up with a new PL database. Thoughts?

Mark, having originally used LR for keywording and then migrated to PL without issues, I would do what platypus suggests as it makes perfect sense. PL reads LR keywords, including hierarchical keywords, just fine.

I think that the issue was brought in, in PL5, when you added keywords (in the keyword tool) that were already present in a hierarchy as shown in the keyword list tool. Maybe going to and fro between Lr and PL, both set to automatically sync sidecar files, added to the mess too.

I set Lr and PL to NOT sync metadata automatically. This puts me in charge of metadata transfers.
My keyword master is Lightroom and I only use PL to edit metadata with copies of images for testing. Automatic metadata sync will add issues rather than solve them…

@markinlcri, have you used PL4 for extensive customising, before installing PL5?
If so, we’ll also have to consider bringing PL4 settings data over to PL5…

I shoot Fuji and only switched to PL 5 recently. Never used earlier versions. I am spent so will pick this up tomorrow. It is clear that I have been a bit naughty with updating keywords with PL and LrC, though I intended to shut down my use of LrC as soon as my evaluation of PL was over.

I do have another question that might be relevant. In LrC I added leaf nodes but never added the parent keywords in the hierarchy. Should I have? LrC had a setting to allow searches based on parents that would include all children. I used that feature and decided to avoid tagging parent keywords. I suspect that I should get LrC to add all parents to all images before switching to PL.

Another way to phrase my previous question: LrC implicitly assigns parent keywords. Should parent keywords be explicitly assigned before migrating over to PL?

I don’t think that is necessary as PL reads the hierarchy just fine with only the leaf nodes assigned. You can assign parents and all intermediate nodes in PL later if you wish.

I personally prefer to just have the leaf nodes assigned and I do not like the addition of all combinations of the hierarchy cluttering my keywords.

…you’d also want to delete the .dop files before indexing again. PL5 stores keywords in these files too and might therefore reintroduce the rogue keywords…
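
For anyone who prefers not to hunt for these files by hand, here is a small Python sketch that lists the .dop sidecars under a folder and only deletes them when explicitly told to. The function names are my own and the folder is whatever you pass in. Keep in mind that .dop files also hold your PL edits, so try it on a backup copy first.

```python
from pathlib import Path

def find_dop_files(folder):
    """Return all .dop sidecar files below the given folder, sorted by path."""
    return sorted(p for p in Path(folder).rglob("*.dop") if p.is_file())

def remove_dop_files(folder, dry_run=True):
    """List the .dop files; delete them only when dry_run is False."""
    files = find_dop_files(folder)
    for p in files:
        print(("would delete: " if dry_run else "deleting: ") + str(p))
        if not dry_run:
            p.unlink()
    return files
```

Calling `remove_dop_files(folder)` with the default `dry_run=True` only prints what would be removed; pass `dry_run=False` once the list looks right.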

This is indeed a problem.

As software developers, we are taught the golden rule of SPOD (Single Point Of Definition). This means that important information should only be stored in one place and always referred to and searched for from that place.

Unfortunately, DxO have chosen to keep keywords in, at least, three places: their internal database, DOP files and, for RAW images, XMP sidecar files. Which then leads to questions about what is the truth.

If an image has already been keyworded in another app, PL can read the XMP sidecar file and know what keywords come with it. In theory, that should remain the SPOD that everything depends on.

The problem is that searching through thousands of files to see which ones contain certain keywords is a slow process - hence the reason why most “DAM” solutions resort to keeping that information in a database. And this is true not just for PL but for most Windows based DAM solutions.

Why do I single out “Windows based” instead of including Mac as well? Because macOS includes a comprehensive metadata indexing system as part of the everyday operation of the operating system. They even include their own metadata “tags”, which can serve perfectly well as the basis for searching for files, of any type, by any of approximately 160 metadata terms, which include those automatically retrieved from image files, like aperture, shutter speed, ISO, GPS, keywords in XMP files, etc. All of these metadata are automatically indexed by the Spotlight mechanism and all it takes is to use Finder to search for files that contain them - no external database required.

I am in the final stages of writing my own lightweight keywording solution for Mac and, even though I don’t maintain a database, it can do lightning fast searches in tens of thousands of images.

Unfortunately, until only relatively recently, Windows did not possess such a metadata indexing system, hence why most DAM writers had to fall back on creating and maintaining their own databases, for a “one size fits all” solution for both Windows and Mac, even though it isn’t necessary for Mac.

So now we have two places where keywords have to be stored and updated when changes are made, creating reconciliation and synchronisation problems, which is what you are seeing when using another DAM to do some metadata manipulations and PL to do other manipulations. If it were a simple matter of everything being stored in only the XMP sidecars, life would be simple but, as it stands, something like Lr holds its own database and PL holds its own, so now we have three “sources of truth”.

But it doesn’t end there! DxO use DOP sidecar files to keep a record of edits made to the image, including virtual copies, which used to be fine until they decided to put keywords in there as well. So, now you have a third record of those keywords to keep in sync in PL alone.

And there’s more! If you have more than one virtual copy of an image, the DOP file contains a record of the keywords, which can be different, for both copies, but an XMP is automatically created only for the master copy. Other copies have their keywords written only when an image is exported, when they are written to the exported file - yet another source of truth :woozy_face:
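
To show what the reconciliation problem looks like in practice, here is a minimal Python sketch that compares the keyword sets held by several sources for one image. The source names and the sample data are purely illustrative; reading the real database or DOP files is not shown, and I make no claim about their formats.

```python
def compare_sources(sources):
    """Map each keyword to the sorted list of sources that are missing it."""
    all_keywords = set().union(*sources.values())
    report = {}
    for keyword in sorted(all_keywords):
        missing = sorted(name for name, kws in sources.items() if keyword not in kws)
        if missing:
            report[keyword] = missing
    return report

# Hypothetical state of one image after edits in two different applications.
sources = {
    "database": {"Places|North America|Rhode Island|Newport", "Newport"},
    "dop": {"Places|North America|Rhode Island|Newport", "Newport"},
    "xmp": {"Places|North America|Rhode Island|Newport"},
}

print(compare_sources(sources))
```

Here the standalone "Newport" exists in two sources but not in the XMP, and nothing in the data itself says which side is right. That is the whole problem with multiple sources of truth.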

Not forgetting that Adobe provides alternative formats for storing hierarchical keywords, which may or may not match other DAM solutions. And this is possibly the underlying cause of Mark’s problems.
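
As an illustration of those parallel formats, here is a short Python sketch that reads both the flat `dc:subject` list and Lightroom's pipe-separated `lr:hierarchicalSubject` list from an XMP packet. The sample XMP is hand-written and trimmed for the example (a real sidecar contains much more), and `read_keywords` is my own helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Hand-written sample, not taken from a real file.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
   <dc:subject>
    <rdf:Bag><rdf:li>Newport</rdf:li></rdf:Bag>
   </dc:subject>
   <lr:hierarchicalSubject>
    <rdf:Bag><rdf:li>Places|North America|Rhode Island|Newport</rdf:li></rdf:Bag>
   </lr:hierarchicalSubject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def read_keywords(xmp_text):
    """Return (flat keywords, hierarchical keyword paths) found in an XMP packet."""
    root = ET.fromstring(xmp_text)
    flat = [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]
    hier = [li.text.split("|")
            for li in root.findall(".//lr:hierarchicalSubject/rdf:Bag/rdf:li", NS)]
    return flat, hier

flat, hier = read_keywords(SAMPLE_XMP)
print(flat)  # flat list: the leaf on its own
print(hier)  # hierarchical list: the full path
```

An application that reads only the flat list sees "Newport" with no parents, which is one conceivable way a leaf could turn up at the top level. Whether that is what happened in Mark's case is a guess on my part.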

In theory, the MWG (Metadata Working Group) provide guidelines for metadata storage but a lot of DAM writers take them as just that - guidelines. Even though Adobe participated in drawing them up, they then went ahead and contravened some of them themselves, setting up their own “standards”, which most DAM writers are now afraid of breaking, even though the XMP standards are now deemed to be the more “universal”.

All in all, it’s a mess. I for one feel that DxO should not have stuck their head into this particular Hornet’s nest. But, having said that, they haven’t done too bad a job for a first foray into the field.