Bug(?): pathological edge case for the tag-matching regex


#1

This is a completely pathological edge case, and if it were never fixed, I don’t think you’d have many detractors. But having found it, I thought I’d share. :smiley:


Create a new item with this text:

This is a task @a(\) @b(0)

and click on anything after the first @a(. You get the following text in the search field, which matches no items, and is highlighted in red:

@a = \) @b(0

I think this should be treated as two separate tags:

  • a tag a with value \
  • a tag b with value 0

Looking at birch.js, the problem seems to be this regex on line 31946:

tagValueRegex = /\(((?:\\\)|[^\)])*)\)/;

I haven’t found a fix that works with TaskPaper’s JavaScript interpreter.

The problem doesn’t occur if you only have @a(\) at the end of a line.
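
For anyone who wants to reproduce this outside TaskPaper, here is a minimal sketch in Python. It assumes Python's `re` module handles this particular pattern the same way TaskPaper's JavaScript engine does, which has not been checked against TaskPaper itself. `TAG_VALUE` and `tag_value` are names made up for this example, not anything from birch.js.

```python
import re

# Python transcription of the tagValueRegex quoted above (assumption:
# the pattern behaves the same under Python's `re` as in JavaScript).
TAG_VALUE = re.compile(r'\(((?:\\\)|[^\)])*)\)')


def tag_value(text):
    """Return the first parenthesised tag value in `text`, or None."""
    match = TAG_VALUE.search(text)
    return match.group(1) if match else None


# Mid-line: the escaped-paren branch is tried first and succeeds,
# so the value runs on to the final ")".
print(tag_value(r'This is a task @a(\) @b(0)'))

# End of line: there is no later ")" to close on, so the engine
# backtracks and the value is just the backslash.
print(tag_value(r'This is a task @a(\)'))
```

The first call prints the same `\) @b(0` that shows up in the search field; the second prints a lone backslash, which fits the end-of-line observation.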

I’m using TaskPaper 3.1 (191).


Sidebar: how did I find this?

I’m writing a Python module for interacting with TaskPaper-formatted documents. I want to be able to script these documents, but the built-in JS isn’t too useful for me, because I’m also using this stuff on Windows, for work. I would use Matt Gemmell’s Ruby library, but I know approximately zilch Ruby, and I thought it would be easier for me to write my own from scratch than to learn Ruby.

This is obviously hugely useful in terms of actually doing stuff on my todo list.

I threw my tag-parsing code through a Python fuzzing library called Hypothesis, and it stumbled upon this edge case.

If I reverse the order of the match-an-escaped-closing-paren and match-anything-but-a-closing-paren groups, and make the repetition lazy (*?), like so:

r'((?:[^\)]|\\\))*?)'

then it seems to work correctly. The same fix doesn’t seem to work for TaskPaper – if I edit the JS this way, it doesn’t recognise that @b(0) is a tag.
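
A rough sketch of what the reordered pattern does in Python, for comparison. Wrapping the group in literal parentheses is an assumption, since the snippet shows only the group itself; `REORDERED` and `tag_values` are hypothetical names, and the `@a(x \) y)` input is an invented example.

```python
import re

# The reordered, lazy group, wrapped in literal parentheses.
# The wrapping is an assumption: the snippet above shows only the group.
REORDERED = re.compile(r'\(((?:[^\)]|\\\))*?)\)')


def tag_values(text):
    """Return every parenthesised tag value found in `text`."""
    return REORDERED.findall(text)


# The edge case now splits into two values: a backslash and "0".
print(tag_values(r'This is a task @a(\) @b(0)'))

# An escaped ")" inside a value now ends the value early as well.
print(tag_values(r'This is a task @a(x \) y)'))
```

One caveat, as far as I can tell: because the repetition is lazy and nothing follows the closing paren in the pattern, the match always stops at the first `)`. That gives the two-tag result for the edge case, but it also means escaped close-parens inside a value are no longer honoured, so this is a trade-off rather than a clean fix.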

Aren’t regexes horrible? :stuck_out_tongue:


ETA: AFAIK, most regex engines scan from left to right. I suspect this might be fine if you worked from right to left, finding the last match first and working backwards. No idea how you’d do that, though.


Proposal: require that both ( and ) are always escaped in tag values?
#2

Pathological is right! I’d have to say TP is working as intended there, given the @search tag’s value format explicitly includes escaped close-parens, so the remainder of the line really is within the @a tag.


#3

Great to see your project and thanks for sharing your findings. As @mattgemmell says, once you escape the ), then everything that follows is gobbled up as a tag value until the ) at the end. Correct as far as that goes, but certainly confusing. This behavior can also break TaskPaper’s internal API in some cases. I’m looking at this possible fix: