Looking for help: How to track individual lines in a plain text file?

I’m in the process of trying to make it so that TaskPaper remembers folded items even when the document is edited outside of TaskPaper.

I think to do this I’ll need to track items by line number and text content in a fuzzy way. I want to restore the “folded” items even if another editor inserts lines or edits some text.

In this case errors aren’t the end of the world; the collapsed state will just be off. But it should generally work for reasonable edits. I think my solution would look like this:

  1. On save

    • Hash each folded item’s line number, type, and text content
  2. On load

    • For each saved hash find the closest matching item in the loaded document and collapse it.
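A minimal Python sketch of that save/load plan, under assumptions not in the post. A standard hash can only say "equal" or "not equal", so to find the *closest* match this sketch stores each folded item's text itself and scores candidates with `difflib.SequenceMatcher` minus a small penalty for how far the line moved. The item-type rules (`- ` for tasks, trailing `:` for projects), the names `FoldRecord`, `save_fold_records`, `restore_by_similarity`, and the weights `min_score`/`line_penalty` are all hypothetical.

```python
import difflib
from dataclasses import dataclass

@dataclass
class FoldRecord:
    # Hypothetical record saved for each folded item.
    line: int  # 0-based line number at save time
    kind: str  # "project", "task" or "note"
    text: str  # stripped text of the item

def item_kind(line):
    # Simplified TaskPaper-style typing (an assumption of this sketch):
    # "- " starts a task, a trailing ":" marks a project, anything else is a note.
    s = line.strip()
    if s.startswith("- "):
        return "task"
    if s.endswith(":"):
        return "project"
    return "note"

def save_fold_records(lines, folded):
    # On save: one record per folded line.
    return [FoldRecord(n, item_kind(lines[n]), lines[n].strip()) for n in sorted(folded)]

def restore_by_similarity(lines, records, min_score=0.5, line_penalty=0.01):
    # On load: greedily give each record the best-scoring unclaimed line of the same type.
    restored = set()
    for rec in records:
        best, best_score = None, min_score
        for n, line in enumerate(lines):
            if n in restored or item_kind(line) != rec.kind:
                continue
            similarity = difflib.SequenceMatcher(None, rec.text, line.strip()).ratio()
            score = similarity - line_penalty * abs(n - rec.line)
            if score > best_score:
                best, best_score = n, score
        if best is not None:
            restored.add(best)
    return restored
```

For example, if another editor inserts two lines at the top and renames `Work:` to `Work stuff:`, both folded projects are still found. The greedy, one-record-at-a-time assignment keeps the sketch short; a global matching would be more robust but is probably overkill for fold state.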

I’m looking for that magic hash function.

I see “Locality-sensitive hashing” on Wikipedia, and the description sounds hopeful. But it’s a complex subject and I’d like to get pointed in the right direction. Can anyone suggest the simplest, most straightforward solution to this problem: how to track individual lines in a plain text file that may be edited by others?

Thanks,
Jesse

You’ve probably looked at this kind of thing:

http://research.microsoft.com/en-us/um/people/sdumais/ECIR07-MetzlerDumaisMeek-Final.pdf

(the literature seems to focus on measuring the similarity of two strings)

Levenshtein distance?

(or perhaps longest common subsequence?)

(as in the Unix diff algorithm; see the Wikipedia article on diff)
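For reference, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal Python sketch of the classic dynamic-programming version; the `similarity` normalization is an illustrative addition, not something from the thread:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j];
    # only two rows are kept in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    # Normalize to 0..1 so one threshold can be shared by short and long lines.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For the two task lines used as examples in this thread, `levenshtein("Hello this is my task.", "Hello this is my favorite task.")` is 9 (the inserted `"favorite "`), giving a similarity of about 0.71. Python's standard `difflib.SequenceMatcher` offers a related, longest-matching-block comparison if you would rather not hand-roll this.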

Thanks for the ideas. I think most of those options require two human-readable texts and then compare them directly. I was hoping for something more like this, from a task:

- Hello this is my task.

To a hash:

DAsd2wrd

And then from the hash fuzzy match:

- Hello this is my favorite task.

So then, for example, you could create a URL like taskpaper:DAsd2wrd and have a good chance of getting back the linked-to task even if it was somewhat modified.
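What is described here is close to what locality-sensitive hashing provides. As one illustration (not TaskPaper's code), here is a minimal sketch of Charikar-style SimHash: each lower-cased character trigram is hashed, every bit position takes a vote, and the sign of each vote becomes one bit of a 64-bit fingerprint. Similar lines share most trigrams, so their fingerprints tend to differ in few bits (small Hamming distance). The trigram size, BLAKE2b as the per-shingle hash, 64 bits, and the `max_distance` of 12 are all arbitrary assumptions.

```python
import hashlib

def shingles(text, k=3):
    # Overlapping character k-grams of the lower-cased text; short strings yield themselves.
    t = text.lower()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def simhash(text, bits=64):
    """Charikar-style SimHash: similar shingle sets give fingerprints that
    differ in few bit positions."""
    votes = [0] * bits
    for sh in shingles(text):
        digest = hashlib.blake2b(sh.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming(x, y):
    # Number of bit positions where the two fingerprints differ.
    return bin(x ^ y).count("1")

def find_line(fingerprint, lines, max_distance=12):
    # Index of the line whose fingerprint is nearest, or None if nothing is close enough.
    best = min(range(len(lines)),
               key=lambda n: hamming(fingerprint, simhash(lines[n])),
               default=None)
    if best is None or hamming(fingerprint, simhash(lines[best])) > max_distance:
        return None
    return best
```

`format(simhash(text), "016x")` turns a fingerprint into a 16-character hex token that could be embedded in a `taskpaper:` URL. The match is only probabilistic, though: short or heavily edited lines can land on the wrong item or on none.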

But I’ve come to the conclusion that this isn’t really what I want. It would be cool and would work magically sometimes, and thus also break magically at other times. Instead, I think I’m just going to encode collapsed items with a line number and a standard string hash. That means if an item is edited at all (outside of TaskPaper), its collapsed state will be lost. But it should generally work to track collapsed items in a document even if another editor adds or deletes tasks. Anyway, I’m going to try this approach today. Much simpler. 🙂
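A minimal sketch of the line-number-plus-standard-hash scheme, assuming SHA-256 of the stripped line as the "standard string hash" and 0-based line numbers (both illustrative choices; the function names are hypothetical). An edited item simply finds no matching hash, and when identical text appears more than once, the copy nearest the saved line number wins.

```python
import hashlib

def line_hash(text):
    # Any stable string hash works; SHA-256 of the stripped line is used here.
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def save_folds(lines, folded):
    # On save: record (line number, hash) for every folded item.
    return [(n, line_hash(lines[n])) for n in sorted(folded)]

def restore_folds(lines, records):
    # Index every line of the reloaded document by its hash.
    by_hash = {}
    for n, text in enumerate(lines):
        by_hash.setdefault(line_hash(text), []).append(n)
    restored = set()
    for saved_line, h in records:
        candidates = [n for n in by_hash.get(h, []) if n not in restored]
        if candidates:
            # Identical text may occur several times; prefer the copy nearest the old position.
            restored.add(min(candidates, key=lambda n: abs(n - saved_line)))
    return restored
```

Because only exact text matches count, inserted or deleted lines elsewhere are harmless, while an externally edited item quietly loses its collapsed state, which is exactly the trade-off described above.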

Sounds sensible 🙂

(I was even wondering about something like http://textbundle.org/ with a .bml serialization (encoding fold state) saved into the bundle alongside the .taskpaper version.)

I think that would still have the same problem. If another editor edits the TaskPaper data without also keeping the collapsed-lines data in sync (which I wouldn’t expect any but the most specialized TaskPaper editors to do), then there needs to be some sort of “fuzzy” matching to get things back in sync.


That’s a wise compromise; fuzzy matching would be far too fragile to help in pathological situations (like external line edits), so I wouldn’t bother trying.

As a related point, I think you could remove the need for preserving tree state if TaskPaper were to have a multi-document/single-window interface, as some others have mentioned (like Ulysses, say). People would reopen documents much less frequently in that scenario, and you’d presumably have the data’s native behind-the-scenes graph representation in memory as long as a document remained present in the single-window UI.


Alternative approach: use a plus-sign bullet point for folded items and a hyphen for unfolded ones. It can be rendered on-screen as a hyphen in either case, if that matters. No hashes needed, and the behavior is deterministic.
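A minimal sketch of the plus-sign idea, assuming tab or space indentation and that only task lines (`- ` bullets) carry the marker; projects and notes would need some other convention. The function names are hypothetical. On load, `+ ` lines are marked folded and displayed with the usual hyphen; on save, the marker is written back.

```python
def parse_fold_markers(lines):
    """Return (display_lines, folded_indices). A line whose bullet is '+ '
    is folded; it is displayed with the usual '- ' bullet."""
    display, folded = [], set()
    for n, line in enumerate(lines):
        body = line.lstrip("\t ")
        indent = line[: len(line) - len(body)]
        if body.startswith("+ "):
            folded.add(n)
            line = indent + "- " + body[2:]
        display.append(line)
    return display, folded

def write_fold_markers(lines, folded):
    # Inverse of parse_fold_markers: swap folded task bullets back to '+ ' on save.
    out = []
    for n, line in enumerate(lines):
        body = line.lstrip("\t ")
        indent = line[: len(line) - len(body)]
        if n in folded and body.startswith("- "):
            line = indent + "+ " + body[2:]
        out.append(line)
    return out
```

Parsing and then writing back an unchanged document round-trips exactly, and the fold state travels with the text, so no side data has to be kept in sync.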

Yeah, this would definitely be easiest, but a number of TaskPaper users share a single TaskPaper file through Dropbox: shared grocery lists and the like. If I kept the fold state embedded in the file, it would create lots of fun (and confusion) as the edits went back and forth!

Someday maybe! But still quite a ways to go before I get that far.