XQuery over a folder-full of Bike files

macOS includes a built-in XQuery engine (NSXMLNode's objectsForXQuery method) and supports XInclude, so we can apply a query like:

 //p[ matches(., $Search_Pattern, "i") ]

not just over one Bike document, but over a whole folder-full, and generate clickable reports like the following, in which the links take you straight to the Bike document, and to the specific line that contains a match.

with a Keyboard Maestro macro like:

XQuery search over Bike folder.kmmacros.zip (5.3 KB)

which requires you to have the following Keyboard Maestro subroutine in any active KM macro group.

XQuery over XML SUBROUTINE.kmmacros.zip (3.2 KB)


Useful.

You may notice, however, that:

  1. my Bike files (one per day) have ISO 8601 dates in their names, and
  2. the most recent matches returned above are from 2023. No matches from 2024 or 2025 (or even late 2023) are found.

The problem seems to be that files written out by Bike now (since a request made in Aug 2023) decorate their enclosing <html> tag with the XHTML namespace declaration:

xmlns="http://www.w3.org/1999/xhtml"

At the time, it looked as if this very reasonable request entailed no side effects, but I should have realised a long time ago why:

  1. more recent Bike files seemed invisible to XQuery over an XIncluded folder
  2. older bike files that did show up in the queries suddenly vanished from them if re-saved by current builds of Bike 1.0

It seems that adding xmlns="http://www.w3.org/1999/xhtml" to the <html> tag may shadow the xmlns:xi="http://www.w3.org/2003/XInclude" namespace declaration of a tag that wraps the filepaths of the files in the folder, allowing them to be treated as a single NSXMLDocument by XQuery.

In other words, XInclude appears to stop including the XML at the point where it finds a competing xmlns namespace declaration.

XInclude source:
<?xml version="1.0" encoding="utf-8"?>
<group xmlns:xi="http://www.w3.org/2003/XInclude">
	<doc path="/Users/houthakker/DayNotes/" file="welcome.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/welcome.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-04-06.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-04-06.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-04-05.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-04-05.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-04-04.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-04-04.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-04-03.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-04-03.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-04-02.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-04-02.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-04-01.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-04-01.bike"/>
	</doc>
	<doc path="/Users/houthakker/DayNotes/" file="notes2025-03-29.bike">
	<xi:include href="file:///Users/houthakker/DayNotes/notes2025-03-29.bike"/>
	</doc>


</group>
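
A wrapper like the one above can be generated rather than typed. A rough Python sketch, not the code in the Keyboard Maestro subroutine: the function name build_wrapper is made up for illustration, and the default namespace URI is an assumption (the 2001 one used further down this thread), so pass whichever URI your XInclude processor expects. ET.indent needs Python 3.9 or later.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Assumption: the XInclude namespace URI from the W3C Recommendation.
XI_NS = "http://www.w3.org/2001/XInclude"


def build_wrapper(folder, xi_ns=XI_NS):
    """Return XML text with one <doc><xi:include/></doc> per .bike file in folder."""
    ET.register_namespace("xi", xi_ns)
    folder = Path(folder).resolve()
    group = ET.Element("group")
    # Reverse name order puts the newest ISO 8601-dated files first.
    for path in sorted(folder.glob("*.bike"), reverse=True):
        doc = ET.SubElement(group, "doc", {"path": f"{folder}/", "file": path.name})
        # as_uri() yields a file:/// URL and percent-encodes awkward characters.
        ET.SubElement(doc, f"{{{xi_ns}}}include", {"href": path.as_uri()})
    ET.indent(group, space="\t")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(group, encoding="unicode")
```

Calling print(build_wrapper("/path/to/DayNotes")) (a placeholder path) should emit a <group> element of the same shape as the listing above.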

Any chance of making the xmlns="http://www.w3.org/1999/xhtml" optional when the <html> tag is written out ?

It seems a pity not to be able to use standard XML tools, built into the Apple Foundation classes and available to JXA (and AppleScript), like XQuery and XInclude.


UPDATE: Same problem for other versions and implementations of XQuery

I’ve just tested .bike files with and without the late 2023 addition of the namespace

xmlns="http://www.w3.org/1999/xhtml"

in Saxon implementations of XQuery, and with XQuery 3.1 (rather than the XQuery 1.0 in Apple’s NSXMLDocument).

Same result:

  1. XIncluded bike files vanish from XQuery results if their <html> tag contains that xmlns namespace declaration, and
  2. reappear as soon as the tag is restored to a vanilla <html>

Tested: Saxon-PE XQuery 12.5, running in Oxygen XML,
and also Saxon-HE and Saxon-EE equivalents.
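
The same pattern can be reproduced with no XInclude step at all, which may help narrow down the cause. A small Python sketch using the standard-library ElementTree (not the macOS or Saxon engines tested above). The two sample documents are made up, not real Bike output. A bare path step like p only matches elements in no namespace, so the <p> under a namespaced <html> is skipped unless the namespace is named.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up miniature documents, differing only in the <html> tag.
VANILLA = "<html><body><ul><li><p>Mnemosyne</p></li></ul></body></html>"
NAMESPACED = VANILLA.replace("<html>", f'<html xmlns="{XHTML_NS}">')


def count_plain_p(xml_text):
    """Count <p> elements in no namespace (what a bare `p` step matches)."""
    return len(ET.fromstring(xml_text).findall(".//p"))


def count_xhtml_p(xml_text):
    """Count <p> elements that are in the XHTML namespace."""
    return len(ET.fromstring(xml_text).findall(f".//{{{XHTML_NS}}}p"))
```

Here count_plain_p finds the paragraph only in VANILLA, and count_xhtml_p finds it only in NAMESPACED.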


Perhaps this sheds light ?


Perhaps writing out xmlns="http://www.w3.org/1999/xhtml"
might work better as an option than as a default ?

In the meantime, while one could theoretically use a command line to remove all those xmlns declarations before performing a query, doing so would, I think, break the path-independent bike:// row links to those files by losing the com.apple.metadata:kMDItemIdentifier extended file attribute.

(And a command-line pre-process takes time, alarms me, and is not something I would recommend to anyone 🙂)
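
One possible alternative to editing files on disk is to drop the namespace in memory after parsing, so that the .bike files (and their extended attributes) are only ever read. A hedged Python sketch with the standard-library ElementTree. Both function names are hypothetical and not part of the macros above, and it handles element names only, not namespaced attributes.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite '{uri}name' element tags to bare 'name', in place, in memory only."""
    for el in root.iter():
        # Comment and processing-instruction nodes have non-string tags; skip them.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


def load_without_namespaces(path):
    """Parse a file read-only and return its root element with namespaces removed."""
    return strip_namespaces(ET.parse(path).getroot())
```

After load_without_namespaces(path), a plain ".//p" search matches paragraphs whichever <html> variant the file was saved with, and the file itself is untouched.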


PS note that for Bike links from a query report (back to a document and line) to work, you need to add the enclosing folder of Bike files to:

Bike > Settings > Sandbox > Extended Sandbox Folders

My head is spinning, maybe this is why people like JSON so much. I guess I’ll change to preserving this attribute, but won’t default to writing it out. I’m uncertain on adding a preference … seems too technical for settings panel.


Many thanks !

Perfect.


This is very strange! What happens if you replace xmlns="http://www.w3.org/1999/xhtml" with xmlns:xhtml="http://www.w3.org/1999/xhtml" and then prefix all html elements accordingly?

@jonsterling

( Jumping on a train - may not be able to get to this for a week )


UPDATE (did take a look en route to wilderness)

Good experimental question – prefixing every html-enclosed element in all files with xhtml: (after <html xmlns:xhtml="http://www.w3.org/1999/xhtml">) does let XInclude find those elements,

but perhaps, in practice, the costs of that change might include retrofitting (and increasing the size of) all historical files, while risking new incompatibilities when (in one software context or another) an <xhtml:p> is not immediately recognized as equivalent to a <p> ?

The existing XQuery clauses

let $matches := //p[ matches(., $Search_Pattern, "i")]

don’t find the prefixed tag names, so we would either have to add a little complexity and extra work to XQuery expressions, in the form of "or" clauses or local-name() function applications (to obtain equivalence between <p> and <xhtml:p>):

declare namespace xhtml = "http://www.w3.org/1999/xhtml";

let $matches := //*[local-name() = "p" and matches(., $Search_Pattern, "i")]

(not unmanageable, but turning a 1s query into a 1.5s query here, FWIW)

or generally move to a simpler pattern like:

declare namespace xhtml = "http://www.w3.org/1999/xhtml";

let $matches := //xhtml:p[ matches(., $Search_Pattern, "i")]

and risk running some tool like XUpdate or sed over all historical files, inflating their size a bit, and hoping for the best on their compatibility with other contexts.

(sed would be a little crazy – recursion eludes it – but it can at least edit “in-place”, whereas anything like XUpdate would, I think, shed all .bike file attributes, including those on which row links, and fold state memory, depend)


I think my own vote is probably to stay with simplicity in Bike files and XQuery expressions, but I’m very grateful for the improved question, and experimental insight, about interactions between XInclude and xmlns.

Thank you !

More to the point (on reflection)

If we have a folder of .bike files with slightly mixed namespace annotations to the <html> element – perhaps, for example:

  • vanilla <html>
  • the late 2023 <html xmlns="http://www.w3.org/1999/xhtml">
  • today’s experimental <html xmlns:xhtml="http://www.w3.org/1999/xhtml">

all XInclude-wrapped:

At folder level:
<group xmlns:xi="http://www.w3.org/2001/XInclude">

and at file level:
<xi:include href="file:///....bike" xmlns:xi="http://www.w3.org/2001/XInclude"/>

Then an XPath like:

//*[local-name() = "p" and matches(., $Search_Pattern, "i")]

(without needing any explicit namespace annotation of the descendants of <html>)

will harvest <p> (-ish) elements with a set of differing namespace patterns, inherited both from the outer XInclude annotations and from any inner annotations to the <html> tags in each file:

<xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml"
         xmlns:xi="http://www.w3.org/2001/XInclude">Mnemosyne</xhtml:p>
<p xmlns:xi="http://www.w3.org/2001/XInclude">Note the points V briefly (just a mnemonic keyword)</p>
<p xmlns="http://www.w3.org/1999/xhtml" xmlns:xi="http://www.w3.org/2001/XInclude">Mnemosyne</p>

whereas the simpler and more optimistic XPath which I started with:

//p[ matches(., $Search_Pattern, "i")]

only finds the plainest pattern (inheriting from the XInclude tags, but nothing from the <html> tag):

<p xmlns:xi="http://www.w3.org/2001/XInclude">...</p>

i.e.

  1. There is no need to retrospectively normalise legacy .bike files on any side of the Aug 2023 watershed, but
  2. we probably do need, since the Aug 2023 period of annotating <html>, to take the 30% query performance hit, and match
    //*[local-name() = "p"], rather than just plain indexed //p (without dynamic evaluation of each tag name)
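
For anyone who wants to experiment with the two patterns away from macOS, here is a rough Python analogue (standard-library ElementTree, Python 3.8 or later for the "{*}p" wildcard, which plays the part of local-name() = "p"). The three sample files and the names build_group and search are invented for illustration. Python's re module stands in for XQuery's matches(), so the regex dialects differ slightly.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up files, one per <html> variant, as they would look once XIncluded.
SAMPLES = {
    "vanilla.bike": "<html><body><p>Note the mnemonic keyword</p></body></html>",
    "default-ns.bike": f'<html xmlns="{XHTML_NS}"><body><p>Mnemosyne</p></body></html>',
    "prefixed.bike": (
        f'<html xmlns:xhtml="{XHTML_NS}"><xhtml:body>'
        "<xhtml:p>Mnemosyne</xhtml:p></xhtml:body></html>"
    ),
}


def build_group(samples):
    """Wrap each parsed sample in a <doc file="..."> under a single <group>."""
    group = ET.Element("group")
    for name, text in samples.items():
        doc = ET.SubElement(group, "doc", {"file": name})
        doc.append(ET.fromstring(text))
    return group


def search(group, pattern, tag="{*}p"):
    """Return (file, namespace, text) for each paragraph whose text matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for doc in group.findall("doc"):
        for p in doc.iterfind(f".//{tag}"):
            text = "".join(p.itertext())
            if regex.search(text):
                ns = p.tag[1:].split("}")[0] if p.tag.startswith("{") else ""
                hits.append((doc.get("file"), ns, text))
    return hits
```

With these samples, search(build_group(SAMPLES), "mnemo") reports a hit in all three files, while passing tag="p" reports only the one in vanilla.bike.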

Lighter namespacing (a vanilla <html> tag) seems to offer a quieter and lighter life 🙂


On performance:

We could recoup query performance (if it proved to be a source of palpable friction) by creating a coherent file base, either with no <html> annotation and plain

//p

or with the prefix pattern, if there really were any practical advantages to offset the costs of specifying xhtml

<html xmlns:xhtml="http://www.w3.org/1999/xhtml">

declare namespace xhtml = "http://www.w3.org/1999/xhtml";

let $matches := //xhtml:p[ matches(., $Search_Pattern, "i")]

but vanilla works pretty well …

Footnote:

If we want to establish a coherent set of files (vanilla <html> tag), to take advantage of the simpler (and indexed – faster) query //p

then we can identify the names of the files which need attention by writing something like:

for $match in (//*[local-name() = "p"] except (//p))
return <doc>{$match/ancestor::doc/@file}</doc>

or get a fuller sense of their whole content and context by writing the simpler:

for $x in (//*[local-name() = "p"] except (//p))
return $x/ancestor::doc
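
The same check can be sketched in Python for anyone without an XQuery engine to hand. This assumes the folder has already been expanded into a single <group> of <doc file="..."> elements, as above. The WRAPPED sample and the function name files_needing_attention are invented for illustration, and the "{*}p" wildcard needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Made-up, already-expanded wrapper: one vanilla file and one namespaced file.
WRAPPED = """<group>
  <doc file="old.bike"><html><body><p>plain</p></body></html></doc>
  <doc file="new.bike"><html xmlns="http://www.w3.org/1999/xhtml"><body><p>namespaced</p></body></html></doc>
</group>"""


def files_needing_attention(group):
    """List the file attribute of each <doc> that holds a <p> in some namespace."""
    names = []
    for doc in group.findall("doc"):
        # Namespaced tags are stored as '{uri}p'; plain ones as 'p'.
        if any(p.tag.startswith("{") for p in doc.iterfind(".//{*}p")):
            names.append(doc.get("file"))
    return names
```

Unlike the XQuery versions, which return one result per matching paragraph, this lists each file once.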