I think there is a bug in the handling of HTML (public.html) pasted into Bike. It seems that Bike assumes HTML is ISO-8859-1 encoded when it doesn’t know what the encoding actually is.
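For anyone unfamiliar with the symptom, here is a minimal Node.js sketch of what UTF-8 bytes look like when each byte is read as one ISO-8859-1 character. The helper name `misdecodeAsLatin1` is made up for illustration, and this says nothing about how Bike is actually implemented; it only reproduces the kind of garbling that such an assumption would cause.

```javascript
// Sketch only: simulate reading UTF-8 bytes as ISO-8859-1.
// misdecodeAsLatin1 is a hypothetical helper, not part of Bike or macOS.
const misdecodeAsLatin1 = text =>
    // Encode the string as UTF-8, then decode every byte
    // as a single ISO-8859-1 (Latin-1) character.
    Buffer.from(text, "utf8").toString("latin1");

const garbled = misdecodeAsLatin1("<b>fran\u00E7ais</b>");

// "ç" is two bytes in UTF-8 (0xC3 0xA7), so it comes back as two characters.
console.log(garbled); // <b>franÃ§ais</b>
```

Plain ASCII characters survive unchanged, which is why only the accented letter is affected.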
When I put
<b>français</b>
on the clipboard (in various ways, including a PyObjC program that writes a public.html string to the clipboard) and paste it into Bike, I get
The JavaScript for Automation code below seems to retrieve <p>Français</p> intact as UTF-8 from the public.html pasteboard after setting it, but if we then paste it into Bike, we see:
Français
JS source:
(() => {
    "use strict";

    ObjC.import("AppKit");

    const main = () =>
        either(x => x)(x => x)(
            (
                setClipOfTextType("public.html")(
                    "<p>Français</p>"
                ),
                clipOfTypeLR("public.html")
            )
        );

    // --------------------- GENERIC ---------------------

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });

    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });

    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
        // Application of the function fl to the
        // contents of any Left value in e, or
        // the application of fr to its Right value.
        fr => e => "Left" in e
            ? fl(e.Left)
            : fr(e.Right);

    // ----------------------- JXA -----------------------

    // clipOfTypeLR :: String -> Either String String
    const clipOfTypeLR = utiOrBundleID => {
        const
            clip = ObjC.deepUnwrap(
                $.NSString.alloc.initWithDataEncoding(
                    $.NSPasteboard.generalPasteboard
                    .dataForType(utiOrBundleID),
                    $.NSUTF8StringEncoding
                )
            );

        return 0 < clip.length
            ? Right(clip)
            : Left(
                "No clipboard content found " + (
                    `for type '${utiOrBundleID}'`
                )
            );
    };

    // setClipOfTextType :: String -> String -> IO String
    const setClipOfTextType = utiOrBundleID =>
        txt => {
            const pb = $.NSPasteboard.generalPasteboard;

            return (
                pb.clearContents,
                pb.setStringForType(
                    $(txt),
                    utiOrBundleID
                ),
                txt
            );
        };

    return main();
})();
I’m not sure whether this is the defined behaviour, i.e. what default encoding is assumed when the pasteboard HTML lacks an explicit charset="utf-8" declaration.
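One thing that might be worth trying, as an experiment rather than a confirmed fix, is to declare the charset explicitly in the HTML before it goes onto the pasteboard. The sketch below is plain JavaScript; `withUtf8Charset` is a hypothetical helper name, and whether Bike or TextEdit actually honours the declaration is an untested assumption.

```javascript
// Hypothetical helper: prefix an HTML fragment with an explicit
// charset declaration, unless one already appears to be present.
// withUtf8Charset :: String -> String
const withUtf8Charset = html =>
    /<meta[^>]+charset\s*=/i.test(html)
        ? html
        : `<meta charset="utf-8">${html}`;

const declared = withUtf8Charset("<p>Fran\u00E7ais</p>");

console.log(declared);
```

The result could then be passed to setClipOfTextType("public.html") in the script above in place of the bare fragment.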
(macOS does have a murky MacRoman encoding inheritance from its pre-history, which still occasionally shows through.
For example, the same thing happens if you paste your HTML pasteboard into TextEdit.
If we put <p><i>Français</i></p> into the public.html pasteboard and paste it into TextEdit, the italics get through, but the text is not decoded as UTF-8.)
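To tell which legacy encoding a receiving app assumed, it can help to look at how the two UTF-8 bytes of "ç" (0xC3 0xA7) were rendered. The Node.js sketch below maps only those two bytes in each encoding, so it is a diagnostic aid rather than a real decoder, and `guessAssumedEncoding` is a hypothetical name. Windows-1252 renders these two bytes the same way as ISO-8859-1, so the sketch cannot distinguish those two.

```javascript
// Partial tables: only the two bytes that encode "ç" in UTF-8.
const legacyViews = {
    "ISO-8859-1": { 0xc3: "\u00C3", 0xa7: "\u00A7" }, // Ã §
    "MacRoman": { 0xc3: "\u221A", 0xa7: "\u00DF" }    // √ ß
};

// The UTF-8 bytes of "ç": [0xC3, 0xA7]
const utf8Bytes = [...Buffer.from("\u00E7", "utf8")];

// guessAssumedEncoding :: String -> String
const guessAssumedEncoding = pasted => {
    for (const [name, table] of Object.entries(legacyViews)) {
        // How "ç" would look if its bytes were read in this encoding.
        const garbledCedilla = utf8Bytes.map(b => table[b]).join("");

        if (pasted.includes(garbledCedilla)) return name;
    }

    return pasted.includes("\u00E7") ? "UTF-8" : "unknown";
};

console.log(guessAssumedEncoding("Fran\u00C3\u00A7ais")); // ISO-8859-1
console.log(guessAssumedEncoding("Fran\u221A\u00DFais")); // MacRoman
```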
I’m able to paste my HTML into at least MS Word and Obsidian without problems – they seem to default to UTF-8 when the encoding is uncertain. Or does the macOS clipboard not actually track the encoding of strings, so that Bike is assuming ISO-8859-1 when the HTML doesn’t explicitly say UTF-8?
(() => {
    "use strict";

    ObjC.import("AppKit");

    const main = () =>
        either(x => x)(x => x)(
            (
                setClipOfTextType("public.html")(
                    "<p><i>Français</i></p>"
                ),
                clipOfTypeLR("public.html")
            )
        );

    // --------------------- GENERIC ---------------------

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });

    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });

    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
        // Application of the function fl to the
        // contents of any Left value in e, or
        // the application of fr to its Right value.
        fr => e => "Left" in e
            ? fl(e.Left)
            : fr(e.Right);

    // ----------------------- JXA -----------------------

    // clipOfTypeLR :: String -> Either String String
    const clipOfTypeLR = utiOrBundleID => {
        const
            clip = ObjC.deepUnwrap(
                $.NSString.alloc.initWithDataEncoding(
                    $.NSPasteboard.generalPasteboard
                    .dataForType(utiOrBundleID),
                    $.NSUTF8StringEncoding
                )
            );

        return 0 < clip.length
            ? Right(clip)
            : Left(
                "No clipboard content found " + (
                    `for type '${utiOrBundleID}'`
                )
            );
    };

    // setClipOfTextType :: String -> String -> IO String
    const setClipOfTextType = utiOrBundleID =>
        txt => {
            const pb = $.NSPasteboard.generalPasteboard;

            return (
                pb.clearContents,
                pb.setStringForType(
                    $(txt),
                    utiOrBundleID
                ),
                txt
            );
        };

    return main();
})();
Pasting to MS Word (unlike pasting to TextEdit) seems to involve a default to UTF-8.
I’m not sure if it’s relevant, but I notice that if we supply a null or void value for the encoding argument of a method like NSString’s initWithData:encoding:, a message tells us:
Incorrect NSStringEncoding value 0x0000 detected. Assuming NSASCIIStringEncoding. Will stop this compatibility mapping behavior in the near future.
(Perhaps Apple would do better, in a macOS Unix context, to assume NSUTF8StringEncoding)
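As a rough illustration of the "assume UTF-8 first" behaviour that Word and Obsidian appear to show, here is a Node.js sketch that tries strict UTF-8 and only falls back to ISO-8859-1 when the bytes are not valid UTF-8. The function name `decodeUnlabelled` is hypothetical, and this is just one possible heuristic, not a description of what any of these apps actually do.

```javascript
// decodeUnlabelled :: Uint8Array -> { text :: String, encoding :: String }
const decodeUnlabelled = bytes => {
    try {
        // fatal: true makes the decoder throw on invalid UTF-8
        // instead of silently inserting replacement characters.
        const text = new TextDecoder("utf-8", { fatal: true })
            .decode(bytes);

        return { text, encoding: "utf-8" };
    } catch (e) {
        // Not valid UTF-8: read each byte as one ISO-8859-1 character.
        return {
            text: Buffer.from(bytes).toString("latin1"),
            encoding: "iso-8859-1"
        };
    }
};

// UTF-8 bytes are decoded as UTF-8 ...
const fromUtf8 = decodeUnlabelled(
    Buffer.from("<p>Fran\u00E7ais</p>", "utf8")
);

// ... while a lone 0xE7 byte (ç in ISO-8859-1) triggers the fallback.
const fromLatin1 = decodeUnlabelled(
    Uint8Array.from([0x46, 0x72, 0x61, 0x6e, 0xe7, 0x61, 0x69, 0x73])
);

console.log(fromUtf8.encoding, fromLatin1.encoding);
```

The appeal of this ordering is that text which really is ISO-8859-1 and contains accented letters is rarely valid UTF-8 by accident, so trying UTF-8 first seldom misfires.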
@jessegrosjean I wanted to flag this thread for your attention – I think there’s a bug in how Bike handles the pasting of HTML with ambiguous character encoding.