Thanks for this really interesting thread. I have some work in progress (not ready for prime time) that is related to the work here: a rough draft of a Python library that converts .bike documents into the Pandoc JSON representation of its AST using lxml and panflute. I’ve put the draft in my own rdhyee_utils Python library, so it’s rough and has not been packaged to be useful to others yet: https://github.com/rdhyee/rdhyee_utils/blob/master/rdhyee_utils/bike/bikeformat.py. You can also see my work so far on a Python library that talks to Bike using AppleScript, or more precisely the deprecated, though still very useful, appscript: https://github.com/rdhyee/rdhyee_utils/blob/master/rdhyee_utils/bike/__init__.py. More later once I get my work polished up (and after I get a draft of a pandoc Bike writer).
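For readers who want a feel for what such a conversion involves, here is a minimal sketch. It is not the code in the linked repository: it uses only the standard library instead of lxml and panflute. It assumes a Bike file is an HTML-like XML document whose rows are li elements holding a p and an optional nested ul. The function names are hypothetical. The API_VERSION value is also an assumption and would need to match the installed pandoc.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: this must match the pandoc-types version of your pandoc install.
API_VERSION = [1, 23, 1]

def local(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]

def text_to_inlines(text):
    """Split plain text into Pandoc Str and Space inlines."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines

def li_to_blocks(li):
    """One row: its <p> becomes a Para, a nested <ul> becomes a BulletList."""
    blocks = []
    for child in li:
        name = local(child.tag)
        if name == "p":
            # itertext() flattens inline markup, so formatting is ignored here.
            text = "".join(child.itertext())
            blocks.append({"t": "Para", "c": text_to_inlines(text)})
        elif name == "ul":
            blocks.append(ul_to_block(child))
    return blocks

def ul_to_block(ul):
    """A <ul> becomes a BulletList whose items are lists of blocks."""
    items = [li_to_blocks(li) for li in ul if local(li.tag) == "li"]
    return {"t": "BulletList", "c": items}

def bike_to_pandoc(xml_text):
    """Build a Pandoc JSON document (as a dict) from Bike-like XML text."""
    root = ET.fromstring(xml_text)
    body = next(el for el in root if local(el.tag) == "body")
    blocks = [ul_to_block(el) for el in body if local(el.tag) == "ul"]
    return {"pandoc-api-version": API_VERSION, "meta": {}, "blocks": blocks}

# Hypothetical sample outline, written by hand for this sketch.
sample = """<html><body><ul id="root">
<li id="a"><p>Hello <mark>world</mark></p>
<ul><li id="b"><p>Child row</p></li></ul></li>
</ul></body></html>"""

doc = bike_to_pandoc(sample)
print(json.dumps(doc))
```

The printed JSON could then be piped into pandoc with -f json. A real converter would also need to map inline markup such as mark or strong to Pandoc inlines instead of flattening it.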
@jonsterling I work in IT support at the University of Rochester Medical Center, and while I’m not an academic, I work with a ton of them. I’ve also recently discovered the world of Markdown, and I love it. I worked on the production side of graphic design for about ten years in the 90s and early 2000s, when XML tagging for text was just taking off. Markdown and HTML are both simplified versions of that, with CSS taking the concept further. That, or XML is generalized HTML, but I digress.
Anyway, I plan to read your blog post and share it with my geekier academic friends.
As an academic I found this extremely helpful. Thank you @jonsterling for your efforts and sharing!
As I wanted to retain highlighted words in the output Markdown file using <mark> tags, I attempted to modify the Pandoc command. ChatGPT suggested using a Lua filter to preserve the mark tags. After several prompts and some troubleshooting of errors, I ended up with the following Lua filter, which meets my needs:
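-- Raw blocks pass through unchanged (same effect as defining no handler).
function RawBlock(el)
  return el
end
-- Raw inlines also pass through: returning nil leaves an element as it is,
-- so both branches of this function keep the element.
function RawInline(el)
  if el.text:match("<mark>") then
    return el -- Pass through without alteration
  end
end
-- This handler does the real work. It relies on <mark> arriving as a Span
-- with class "mark" and re-emits it as raw Markdown so the tag survives.
-- Note: stringify drops any formatting nested inside the highlight.
function Span(el)
  if el.classes:includes("mark") then
    local content = pandoc.utils.stringify(el.content)
    return pandoc.RawInline("markdown", "<mark>" .. content .. "</mark>")
  end
end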
So the terminal command would be:
cat /path/to/your.bike | saxon -xsl:/path/to/bike-to-html.xsl - | pandoc -f html -t markdown-raw_html --lua-filter=/path/to/preserve_mark.lua | sed 's/\\//g' > /path/to/output.md
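In case it helps anyone who prefers Python to Lua, here is a rough sketch of the same Span-to-raw-Markdown idea applied directly to Pandoc's JSON AST with only the standard library. The function names mark_spans_to_raw and stringify are hypothetical. This stringify is a simplified stand-in for pandoc.utils.stringify, so it may differ in edge cases such as quotes or math.

```python
import json

def stringify(node):
    """Flatten Pandoc inlines to plain text (simplified; formatting is dropped)."""
    if isinstance(node, list):
        return "".join(stringify(x) for x in node)
    if isinstance(node, dict) and "t" in node:
        if node["t"] == "Str":
            return node["c"]
        if node["t"] in ("Space", "SoftBreak"):
            return " "
        if node["t"] == "Code":
            return node["c"][1]  # c is [attr, text]
        return stringify(node.get("c", []))
    return ""  # attribute strings, URLs, numbers, and so on

def mark_spans_to_raw(node):
    """Return a copy of the AST with Span.mark replaced by raw <mark> Markdown."""
    if isinstance(node, list):
        return [mark_spans_to_raw(x) for x in node]
    if isinstance(node, dict):
        if node.get("t") == "Span":
            (_ident, classes, _attrs), inlines = node["c"]
            if "mark" in classes:
                raw = "<mark>" + stringify(inlines) + "</mark>"
                return {"t": "RawInline", "c": ["markdown", raw]}
        return {k: mark_spans_to_raw(v) for k, v in node.items()}
    return node

# Hand-written example document: "A" followed by a highlighted "key *point*".
doc = {
    "pandoc-api-version": [1, 23, 1],  # assumption: match your pandoc version
    "meta": {},
    "blocks": [{"t": "Para", "c": [
        {"t": "Str", "c": "A"},
        {"t": "Space"},
        {"t": "Span", "c": [["", ["mark"], []], [
            {"t": "Str", "c": "key"},
            {"t": "Space"},
            {"t": "Emph", "c": [{"t": "Str", "c": "point"}]},
        ]]},
    ]}],
}

result = mark_spans_to_raw(doc)
print(json.dumps(result["blocks"][0]["c"][-1]))
```

To use this as an actual JSON filter, you would read the document with json.load(sys.stdin) and write the result to stdout with json.dump, since pandoc's --filter option passes the AST to the filter that way. As with the Lua version, emphasis inside a highlight is lost because the content is flattened to plain text.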