Obsidian nested tag search caveats

I took a break from my vault for a while, but eventually came crawling back—I realized how much I missed it. And in updating my vault plugins and getting it set back up on devices I hadn’t yet installed it on, I ended up going back over some of my notes. In the process, I ran into a curious situation: What if I want to search for a tag, without returning nested ones?

This isn’t a new ask, and while I think this use-case goes against how tags are designed, Obsidian and Zettelkasten (if you adhere to it) are tools that should be customizable to the user. So, if you really want to do it, you should be able to.

Well then, how do you do it?

The tag search operator tag: in Obsidian performs non-fuzzy, non-substring matches to complete tags, so tag:#hello/worl does not match the tag #hello/world. The operator will also automatically include all nested tags, so tag:#hello includes #hello/world.
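
As a rough mental model of that behavior, here is a small sketch. It is not Obsidian's actual implementation; tagOperatorMatches is a hypothetical helper name, and the whole-segment comparison is an assumption based on the two examples above.

```javascript
// Hypothetical model of how the tag: operator decides a match.
// Assumption: a query matches when it equals the tag, or when it is a
// prefix of the tag that ends exactly at a "/" segment boundary.
function tagOperatorMatches(query, tag) {
  return tag === query || tag.startsWith(query + "/");
}

console.log(tagOperatorMatches("#hello/worl", "#hello/world")); // false
console.log(tagOperatorMatches("#hello", "#hello/world"));      // true
console.log(tagOperatorMatches("#hello", "#hello"));            // true
```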

If you want to exclude nested tags from such a search, tags are just text, so you could use text search to negate nested tags as they might appear in text:

tag:#hello -#hello/

But this has a major flaw: it will exclude notes that contain the non-tag text you excluded, so if the tag you’re excluding is, say, used as the anchor in a hyperlink, that note will be excluded too. So for the above example query, a note containing this content would not be returned in the search results:

#hello <-- found in the search
...
[hello](https://example.com/world#hello/) <-- causes exclusion
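
To make the failure mode concrete, here is a minimal sketch that models the query in plain JavaScript. The note object shape and the matchesQuery helper are assumptions for illustration only: tags stands in for whatever Obsidian extracted as tags, and text is the raw note body.

```javascript
// Hypothetical model of the query `tag:#hello -#hello/`.
// The tag test runs against extracted tags; the exclusion runs against raw text.
function matchesQuery(note) {
  const hasTag = note.tags.some((t) => t === "#hello" || t.startsWith("#hello/"));
  const hasExcludedText = note.text.includes("#hello/");
  return hasTag && !hasExcludedText;
}

const plainNote = { tags: ["#hello"], text: "#hello" };
const linkedNote = {
  tags: ["#hello"],
  text: "#hello\n...\n[hello](https://example.com/world#hello/)",
};

console.log(matchesQuery(plainNote));  // true
console.log(matchesQuery(linkedNote)); // false, the URL fragment triggers the exclusion
```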

For those with generic tags, such as #todo or #work, this could be a real concern. But if you think unintentional matches will be rare, there should be no problem using this solution. Using emojis and other unique characters and phrases in your tags that would be rare to find in natural text or links will help as well.

Regular expressions (regex)

Some may suggest using regex, and skipping the tag: operator entirely, but that comes with its own difficulties:

  • Regex is slow for large vaults
  • Obsidian’s regex engine might be missing features depending on your platform
  • Tags can be inserted in places in note text that confuse simple regexes
  • Tags can be represented in frontmatter properties (tags), meaning your regex should be able to also fully parse YAML values if you want to be particular

However, you will need regex searches if you want to match substrings of tags since there is no other native way, meaning you’re forced to deal with the above complications as they present themselves.

For example, if you want to search for all nested tags ending with usage, you might write something like this:

/(^|\s)#\S+\/usage(\/?$|[^\/\w-])/

This will match tags like #lotus-notes/usage and #kde-kontact/usage, but not #microsoft-office/usage/workarounds.
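
One way to sanity-check this pattern outside Obsidian is to run it in any JavaScript runtime. The sketch below tests one tag per string, which is a simplification compared to searching whole notes.

```javascript
// The regex from above, tested against the three example tags.
const endsWithUsage = /(^|\s)#\S+\/usage(\/?$|[^\/\w-])/;

console.log(endsWithUsage.test("#lotus-notes/usage"));                  // true
console.log(endsWithUsage.test("#kde-kontact/usage"));                  // true
console.log(endsWithUsage.test("#microsoft-office/usage/workarounds")); // false
```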

Or if you want to find tags containing the nested tag tasks in any position:

/(^|\s)#(|\S+\/)tasks(\/|$|[^\/\w-])/

This will match all of #car-repair/tasks, #home/tasks/done, and #tasks/urgent.
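
The same kind of quick check works here. The last three samples are made-up negative cases, added only to show that tasks has to be a whole segment.

```javascript
// The regex from above: "tasks" as a complete segment in any position.
const containsTasks = /(^|\s)#(|\S+\/)tasks(\/|$|[^\/\w-])/;

for (const tag of ["#car-repair/tasks", "#home/tasks/done", "#tasks/urgent"]) {
  console.log(tag, containsTasks.test(tag)); // true for all three
}

// Hypothetical negative cases: "tasks" is only part of a segment.
for (const tag of ["#multitasks", "#home/subtasks", "#tasks-old"]) {
  console.log(tag, containsTasks.test(tag)); // false for all three
}
```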

But these regexes aren’t perfect:

  • To match the leftmost part of the nested tag, I used the inverted whitespace character class, \S, which matches any character that is not whitespace (in JavaScript, \s covers newlines and Unicode spaces as well as ASCII ones). This is quite broad, so text like #hello/world%/usage would match the first example, despite containing only the tag #hello/world. The trailing /usage is plain text and not part of the tag, since the % separates them.
  • I used the word character class, \w, to ensure the tag isn’t continued, and that we really are matching the end of the tag. In JavaScript-flavored Regular Expressions (JSRE), this is equivalent to [A-Za-z0-9_], so the technically-valid tag #obsidian/usage→success would also match the first example: the regex considers → to be the end of the tag.
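
Both false positives are easy to reproduce. This sketch reuses the first regex and then probes \w directly.

```javascript
const endsWithUsage = /(^|\s)#\S+\/usage(\/?$|[^\/\w-])/;

// \S happily runs across the "%", so the plain-text "/usage" is treated as part of the tag.
console.log(endsWithUsage.test("#hello/world%/usage"));     // true

// "→" is not a \w character, so the regex treats it as the end of the tag.
console.log(endsWithUsage.test("#obsidian/usage→success")); // true

// Without the u and i flags together, \w is ASCII-only.
console.log(/\w/.test("→")); // false
console.log(/\w/.test("A")); // true
```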

There isn’t really a reasonable alternative to the above, since tags can contain what seems like any character except for a specific subset of punctuation. For example, Japanese punctuation (e.g. the full stop “。”) does not end tags, while the French single guillemets (“‹›”) do, and the double guillemets (“«»”) don’t. 🤷

The only real solution is to dig into Obsidian and see how it works, or individually test characters… And in my individual character testing, this seems to be a decent list of punctuation that ends tags:

[^`~!@#$%‰^&*()=+{}[\];:'‘‚’"“‌„”‹›\\|.‥…,<>?\s]

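Note that the ] and \ are escaped, and that the leading ^ negates the class, so as written it matches any character that does not end a tag.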
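
As a sketch of how that class could be slotted into the earlier "ends with usage" regex: the negated class replaces \S for the body of the tag, and the non-negated version replaces [^\/\w-] for the end-of-tag check. The variable names are made up, and the character list is only as complete as the testing described above.

```javascript
// Characters that may appear inside a tag (the negated class from above).
const tagChar = /[^`~!@#$%‰^&*()=+{}[\];:'‘‚’"“„”‹›\\|.‥…,<>?\s]/.source;
// Characters that end a tag: the same list without the negation.
const tagEnd = tagChar.replace("[^", "[");

const strictEndsWithUsage = new RegExp(
  "(^|\\s)#" + tagChar + "+\\/usage(\\/?$|" + tagEnd + ")"
);

console.log(strictEndsWithUsage.test("#lotus-notes/usage"));                  // true
console.log(strictEndsWithUsage.test("#microsoft-office/usage/workarounds")); // false
console.log(strictEndsWithUsage.test("#hello/world%/usage"));                 // false
console.log(strictEndsWithUsage.test("#obsidian/usage→success"));             // false
```

With this variant, both false positives from the first regex go away, at the cost of a much longer pattern.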

Tags can be written differently to how Obsidian sees them

Another issue with using regex or plain text searches is how you can write tags in your notes or, more realistically, how automated tools and plugins can add tags to your notes. This is an extreme example, but have a look at this:

"asd\0asd" is a valid tag? okay?

In this example, the tag has been added to the note using YAML frontmatter. The issue with frontmatter is that its parsing rules differ from those for plain-text notes. In this extreme case, an escape sequence embedded in the tag is not only accepted but also stripped from the tag as it appears in Obsidian. The tag is also surrounded by quotes and is missing its leading hashtag, which Obsidian adds automatically. The end result is that a plain text or regex search will not find this tag if you give it the same input you would give the tag: search operator. The only way to have certainty is to somehow make a regex that can parse any valid YAML scalar, and that ain’t happening.
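
The mismatch can be illustrated with a heavily simplified sketch. This is not a YAML parser and not Obsidian's code: displayedTag is a hypothetical helper that only mimics the behavior described above for this one case, and the exact frontmatter line is an assumption.

```javascript
// Hypothetical model: how a double-quoted frontmatter scalar might end up displayed.
// Only handles surrounding double quotes and the \0 escape.
function displayedTag(yamlScalar) {
  const unquoted = yamlScalar.replace(/^"|"$/g, "");
  const unescaped = unquoted.replace(/\\0/g, "\0"); // YAML \0 escape becomes a NUL character
  return "#" + unescaped.replace(/\0/g, "");        // NUL stripped, leading "#" added
}

// Assumed raw frontmatter line, exactly as it sits in the note file.
const frontmatterLine = String.raw`tags: ["asd\0asd"]`;
const shown = displayedTag(String.raw`"asd\0asd"`);

console.log(shown);                                // "#asdasd"
console.log(frontmatterLine.includes(shown));      // false
console.log(frontmatterLine.includes("asdasd"));   // false, even without the "#"
```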

In conclusion

You see why I didn’t recommend using nested tags like this? And these regexes, even with the punctuation fix, still suffer from yet more gotchas, which are also shared with text search:

  • Matches will still be found inside contexts where tags don’t apply, like inside code blocks or comments
  • Invalid tags could erroneously be matched, like ones made up entirely of numbers (unless immediately followed by subtags)

The intricacies of the parser should not be our concern, yet here we are.

Sure, I’m overthinking this. Crafting your own regexes that work for your own unique situations will allow them to stay of a reasonable length, and none of this really matters at a personal level. But if you want to provide a working solution for every Obsidian user, these edge-cases must be considered—such as for plugin authors, or those with shared vaults.

Ultimately, to avoid all of these edge-cases and headache, I would recommend each nested child tag be a strict subset of its parent tag, and that the child tag’s purpose (and optionally text, if you still plan to rely on substring tag search) be globally unique. If it isn’t, make a new tag instead. So, for a situation where you might tag things like this:

  • #travel/china/pictures
  • #travel/finland/pictures

You instead do something like this:

  • #travel/china + #pictures/travel
  • #travel/finland + #pictures/travel

This way, you can search for travel pictures from China using the query tag:#travel/china tag:#pictures/travel, while still being able to easily see all travel pictures with tag:#pictures/travel. You also get the added benefit of being able to easily see all pictures with tag:#pictures, without having to do any advanced search queries. To repeat the wisdom of the ancestors: KISS!
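
To see how those three queries behave under this scheme, here is a small simulation. The note names and the #pictures/pets tag are invented for the example, and tagMatches is the same hypothetical model of the tag: operator as before, not Obsidian's real implementation.

```javascript
// Hypothetical model of tag: (exact match, or prefix ending at a "/" boundary).
const tagMatches = (query, tag) => tag === query || tag.startsWith(query + "/");

// Space-separated terms in a query are ANDed together.
const search = (notes, ...queries) =>
  notes
    .filter((n) => queries.every((q) => n.tags.some((t) => tagMatches(q, t))))
    .map((n) => n.name);

const notes = [
  { name: "great-wall.md", tags: ["#travel/china", "#pictures/travel"] },
  { name: "helsinki.md", tags: ["#travel/finland", "#pictures/travel"] },
  { name: "cat.md", tags: ["#pictures/pets"] },
];

console.log(search(notes, "#travel/china", "#pictures/travel")); // [ 'great-wall.md' ]
console.log(search(notes, "#pictures/travel")); // [ 'great-wall.md', 'helsinki.md' ]
console.log(search(notes, "#pictures"));        // all three notes
```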

Further reading

See how I accomplished nested tags in my own vault here.

I opened a Feature Request on the forum for better tag: searching that would help support nested tags here.

P.S.: I highly recommend learning regular expressions for your daily life, if you are prone to fun situations like this. A lot of things utilize user-provided regex, like search queries and filters. But even without that, it’s otherwise still a great skill to have in your pocket. Being able to hammer out regex at a moment’s notice for whatever purpose can be a huge time-saver.
