TaskPaper (BNF) Grammar

Is there a publicly available BNF grammar for TaskPaper syntax? I am interested in making a ProseMirror schema for the TaskPaper format.

It should be simple to figure out the grammar on my own, but it would be nice to start with something that matches the reference implementation.

Sorry, I don’t have a BNF. The best I have is the parser code, linked from here:

In particular, parsing of TaskPaper-formatted text starts with deserializeItems here:

Here’s my attempt at an annotated TaskPaper grammar. Before I turn it into a ProseMirror schema, can I get some feedback on:

  • Anything I missed or got wrong
  • Better terminology to match TaskPaper convention. (Specifically, I didn’t know if there was a better term for “content.”)

Interesting things I discovered while writing the grammar:

  • ‘:’ is a valid project name (however it messes up parsing of bodyContentString)
  • “@” and “@()” are valid tags. (And you can even retrieve the value of 'data-' via the API)


///--- GRAMMAR ---///
// A taskpaper document is a series of items.
taskpaperDocument : {item} ;

// An item starts with optional indents followed by a project, task, or note.
// An item ends with a newline preceded by optional white space.
item : {INDENT} (project | task | note) [WHITESPACE] NEWLINE ;

// A project starts with a project name (that ends in a colon).
// A project line may optionally end with tags.
project: PROJECT_NAME {tag} ;

// A task starts with optional indents followed by a dash and white space.
// Any text after that forms the content of the task.
task: {INDENT} TASK_OPEN WHITESPACE content ;

// A note starts with an optional indent followed by content.
note: {INDENT} content ;

// A tag starts with whitespace followed by the @ symbol and tag name.
// A tag may end with an optional value enclosed in parentheses.
tag: WHITESPACE TAG_NAME [VALUE_OPEN VALUE VALUE_CLOSE] ;

// Content is a series of tags and/or text.
content: {tag | TEXT} ;

///--- TERMINAL TOKENS ---///
// Project names are 0 or more non-colon characters terminated by a colon.
PROJECT_NAME: /[^:]*:/ ;

// Tasks are indicated by starting with the special dash character.
TASK_OPEN: '-' ;

// Tag names are the @ symbol followed by 0 or more non-tag-specific characters.
TAG_NAME: /@[^@()]*/ ;

// Parentheses enclose the value of a tag.
VALUE_OPEN:  '(' ;
VALUE_CLOSE: ')' ;

// The value can be anything except for the value close.
// If the VALUE_CLOSE is escaped, it is treated as part of the value.
VALUE: /(\\\)|[^)])*/ ;

// TaskPaper only indents with tabs.
INDENT: '\t' ;

TEXT: /./ ;
NEWLINE: '\n' ;
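
As a rough illustration of how the tag terminals above could be exercised, here is a minimal Python sketch. The name find_tags and the combined TAG pattern are made up for this example and are not taken from the TaskPaper source. It also assumes that tag names stop at whitespace, which the posted TAG_NAME pattern does not require.

```python
import re

# Hypothetical combined pattern for: WHITESPACE '@' name [ '(' value ')' ].
# Assumption: a tag name stops at whitespace, '@', '(' or ')'. The posted
# TAG_NAME pattern allows spaces, which would swallow the text after a tag.
TAG = re.compile(
    r"(?<!\S)"                      # start of line or preceded by whitespace
    r"@([^\s@()]*)"                 # '@' plus a possibly empty name
    r"(?:\(((?:\\.|[^)\\])*)\))?"   # optional value; an escaped '\)' stays inside it
)


def find_tags(line):
    """Return (name, value) pairs; value is None when the tag has no parentheses."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(line)]
```

With this pattern, a bare "@" comes back as an empty name with no value, and "@()" as an empty name with an empty value, which matches the two oddities noted earlier in the post.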

It’s hard for me to say without being able to test various cases and compare them against TaskPaper. That said, I think you’ve got most of the oddball cases that I can think of, including ones that many TaskPaper parsers skip, such as tags allowed after a project and escapes in tags.

Would love to be able to test this once you’ve got it working in ProseMirror. Looks good!

@leftium I’d like to add some points:

  1. Project names are allowed to have colons in the middle of the name. So e.g. Project: My new project: and even Project:: are all valid project names (See https://github.com/jessegrosjean/birch-outline/blob/master/src/serializations/taskpaper/types.coffee).

  2. Tasks may begin with -, +, or * character followed by a space (See https://github.com/jessegrosjean/birch-outline/blob/master/src/serializations/taskpaper/types.coffee).

  3. Tag names may only have characters from certain Unicode ranges (See https://github.com/jessegrosjean/birch-outline/blob/master/src/serializations/taskpaper/tags.coffee).
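
Points 1 and 2 could be folded into a line classifier roughly like the Python sketch below. The names TASK, PROJECT and classify are hypothetical, the trailing-tag matching is deliberately simplified, and the Unicode restriction from point 3 is not modeled, so treat this as an approximation rather than what birch-outline actually does.

```python
import re

# Assumption (point 2): a task is '-', '+' or '*' followed by a space.
TASK = re.compile(r"^[-+*] ")
# Assumption (point 1): a project is any text ending in ':' with optional
# trailing tags; colons inside the name are allowed. Tag matching is simplified.
PROJECT = re.compile(r"^.*:(?:\s+@[^\s@()]*(?:\([^)]*\))?)*\s*$")


def classify(line):
    """Return (depth, kind) where kind is 'task', 'project' or 'note'."""
    body = line.rstrip("\n")
    stripped = body.lstrip("\t")
    depth = len(body) - len(stripped)  # indentation is counted in tabs only
    if TASK.match(stripped):
        return depth, "task"
    if PROJECT.match(stripped):
        return depth, "project"
    return depth, "note"
```

Checking for a task before a project is a design choice in this sketch, so a line such as "- call Bob:" is treated as a task.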

Thanks, I forgot about #2.

I found out that tags with an escaped \) are quite hard to handle with regex alone, and the ProseMirror schema wasn’t exactly what I thought it was.
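
One way to sidestep the regex difficulty is a small character-by-character scanner. The Python sketch below is only an illustration: scan_tag_value is a hypothetical helper, and it assumes that "\)" is the only escape and that the backslash is dropped from the resulting value, neither of which has been checked against TaskPaper.

```python
def scan_tag_value(text, start):
    """Scan a parenthesized tag value that begins at text[start] == '('.

    Returns (value, end), where end is the index just past the closing ')',
    or None when there is no '(' at start or the value is never closed.
    """
    if start >= len(text) or text[start] != "(":
        return None
    chars = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == ")":
            chars.append(")")  # escaped close paren becomes part of the value
            i += 2
        elif ch == ")":
            return "".join(chars), i + 1
        else:
            chars.append(ch)
            i += 1
    return None  # ran out of text before an unescaped ')'
```

Returning None for an unterminated value lets the caller decide how to treat a tag whose closing parenthesis has not been typed yet.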

It doesn’t cover all the different syntax edge cases, but I got something working here: https://taskmirror.glitch.me/

You can view the source code here: https://glitch.com/edit/#!/taskmirror

Maybe the definition of the tagValueRegex variable in https://github.com/jessegrosjean/birch-outline/blob/master/src/serializations/taskpaper/tags.coffee can help.


Thanks for the great pointer to the API source! CoffeeScript makes regex a little less painful~

If I end up trying to cover all the syntax cases (after a while, there are diminishing returns), I might try PrismJS.

PrismJS might also help with the case where tags are not fully highlighted until the final closing ) in my simple version.

At least in the OS X application, it seems that a tag can also appear before the project name, etc.