Possible bug: encoding of HTML pasted into Bike

I think there is a bug in the handling of HTML (public.html) pasted into Bike. It seems that Bike is assuming that HTML is iso-8859-1 encoded when it doesn’t know what the encoding actually is.

When I put

<b>français</b>

on the clipboard (various ways, including writing a pyobjc program to write a public.html string to the clipboard) and paste it to Bike, I get

franÃ§ais

which matches

print('français'.encode('utf-8').decode('iso-8859-1'))
franÃ§ais

Interestingly, if I put

<meta charset="utf-8"><b>français</b>

as public.html on my clipboard, the pasting into Bike works properly.
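Both observations are consistent with a decoder that honours an explicit charset declaration and otherwise falls back to ISO-8859-1. Here is a minimal Python sketch of that hypothesis. The `decode_html` helper and its `default` parameter are made up for illustration; this is not Bike's actual code.

```python
import re

# Hypothetical helper (an assumption, not Bike's implementation):
# look for a <meta charset=...> declaration in the raw bytes and
# fall back to `default` when none is found.
_META_CHARSET = re.compile(
    rb"""<meta\s+charset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE
)

def decode_html(data: bytes, default: str = "iso-8859-1") -> str:
    match = _META_CHARSET.search(data)
    encoding = match.group(1).decode("ascii") if match else default
    return data.decode(encoding)

plain = "<b>français</b>".encode("utf-8")
tagged = '<meta charset="utf-8"><b>français</b>'.encode("utf-8")

print(decode_html(plain))                   # UTF-8 bytes read as ISO-8859-1
print(decode_html(tagged))                  # charset declared, decoded as UTF-8
print(decode_html(plain, default="utf-8"))  # same bytes with a UTF-8 default
```

With the ISO-8859-1 default, only the undeclared snippet comes out garbled, which is the pattern I see when pasting.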

(I’m using the latest Bike version, 1.18.2 Preview (176). I love Bike!)

Seen here, too, I think.

The JavaScript for Automation code below seems to retrieve <p>Français</p> intact as UTF-8 from the public.html pasteboard after setting it, but if we then paste it into Bike, we see:

FranÃ§ais

(() => {
    "use strict";

    ObjC.import("AppKit");

    const main = () =>
        either(x => x)(x => x)(
            (
                setClipOfTextType("public.html")(
                    "<p>Français</p>"
                ),
                clipOfTypeLR("public.html")
            )
        );

    // --------------------- GENERIC ---------------------

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });


    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });


    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
    // Application of the function fl to the
    // contents of any Left value in e, or
    // the application of fr to its Right value.
        fr => e => "Left" in e
            ? fl(e.Left)
            : fr(e.Right);

    // ----------------------- JXA -----------------------

    // clipOfTypeLR :: String -> Either String String
    const clipOfTypeLR = utiOrBundleID => {
        const
            clip = ObjC.deepUnwrap(
                $.NSString.alloc.initWithDataEncoding(
                    $.NSPasteboard.generalPasteboard
                    .dataForType(utiOrBundleID),
                    $.NSUTF8StringEncoding
                )
            );

        return clip && 0 < clip.length
            ? Right(clip)
            : Left(
                "No clipboard content found " + (
                    `for type '${utiOrBundleID}'`
                )
            );
    };

    // setClipOfTextType :: String -> String -> IO String
    const setClipOfTextType = utiOrBundleID =>
        txt => {
            const pb = $.NSPasteboard.generalPasteboard;

            return (
                pb.clearContents,
                pb.setStringForType(
                    $(txt),
                    utiOrBundleID
                ),
                txt
            );
        };


    return main();
})();

I’m not sure whether this is the defined behaviour, i.e. which default encoding is assumed when the pasteboard HTML has no explicit charset="utf-8".

(macOS does have a murky MacRoman encoding inheritance from its pre-history, which still occasionally shows through 🙂)

For example, the same thing happens if you paste your HTML pasteboard into TextEdit: if we put <p><i>Français</i></p> into the public.html pasteboard and paste into TextEdit, the italics get through, but the text is not decoded as UTF-8.


Thanks for confirming what I’m seeing!

I’m able to paste my HTML into at least MS Word and Obsidian without problems – they seem to default to UTF-8 when the encoding is uncertain. Or is it that the macOS clipboard doesn’t actually track the encoding of strings, and Bike is assuming iso-8859-1 when the HTML doesn’t explicitly say UTF-8?
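One common way to "default to UTF-8 when the encoding is uncertain" is to try a strict UTF-8 decode first and only fall back to a single-byte encoding if that fails. The sketch below illustrates that general technique. It is an assumption on my part, not a description of what Word, Obsidian, or Bike actually do, and `decode_with_fallback` is a hypothetical name.

```python
def decode_with_fallback(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode as strict UTF-8; use `fallback` only if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)

utf8_bytes = "<b>français</b>".encode("utf-8")
latin1_bytes = "<b>français</b>".encode("iso-8859-1")

# Both inputs decode to the same readable text.
print(decode_with_fallback(utf8_bytes))
print(decode_with_fallback(latin1_bytes))
```

Valid UTF-8 has a strict byte structure, so legacy single-byte text containing accented characters rarely passes as UTF-8 by accident. That is why trying UTF-8 first tends to be a safe default.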

Confirmed here too – after:

(() => {
    "use strict";

    ObjC.import("AppKit");

    const main = () =>
        either(x => x)(x => x)(
            (
                setClipOfTextType("public.html")(
                    "<p><i>Français</i></p>"
                ),
                clipOfTypeLR("public.html")
            )
        );

    // --------------------- GENERIC ---------------------

    // Left :: a -> Either a b
    const Left = x => ({
        type: "Either",
        Left: x
    });


    // Right :: b -> Either a b
    const Right = x => ({
        type: "Either",
        Right: x
    });


    // either :: (a -> c) -> (b -> c) -> Either a b -> c
    const either = fl =>
    // Application of the function fl to the
    // contents of any Left value in e, or
    // the application of fr to its Right value.
        fr => e => "Left" in e
            ? fl(e.Left)
            : fr(e.Right);

    // ----------------------- JXA -----------------------

    // clipOfTypeLR :: String -> Either String String
    const clipOfTypeLR = utiOrBundleID => {
        const
            clip = ObjC.deepUnwrap(
                $.NSString.alloc.initWithDataEncoding(
                    $.NSPasteboard.generalPasteboard
                    .dataForType(utiOrBundleID),
                    $.NSUTF8StringEncoding
                )
            );

        return clip && 0 < clip.length
            ? Right(clip)
            : Left(
                "No clipboard content found " + (
                    `for type '${utiOrBundleID}'`
                )
            );
    };

    // setClipOfTextType :: String -> String -> IO String
    const setClipOfTextType = utiOrBundleID =>
        txt => {
            const pb = $.NSPasteboard.generalPasteboard;

            return (
                pb.clearContents,
                pb.setStringForType(
                    $(txt),
                    utiOrBundleID
                ),
                txt
            );
        };


    return main();
})();

Pasting into MS Word (unlike pasting into TextEdit) seems to default to UTF-8.



As a footnote, we can reproduce this behaviour by replacing $.NSUTF8StringEncoding with $.NSASCIIStringEncoding in this JXA expression:

ObjC.deepUnwrap(
    $.NSString.alloc.initWithDataEncoding(
        $.NSPasteboard.generalPasteboard
        .dataForType("public.html"),
        $.NSUTF8StringEncoding
    )
)

I’m not sure if it’s relevant, but I notice that if we supply a null or void value for the encoding argument of a method like NSString’s initWithData:encoding:, a message tells us:

Incorrect NSStringEncoding value 0x0000 detected. Assuming NSASCIIStringEncoding. Will stop this compatibility mapping behavior in the near future.

(Perhaps Apple would do better, in a macOS Unix context, to assume NSUTF8StringEncoding)
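For comparison, the same bytes can be inspected in Python. Python's `ascii` codec is strict and rejects bytes above 0x7F, which may not match how Foundation treats NSASCIIStringEncoding. The sketch only illustrates why a single-byte reading of UTF-8 data turns `ç` into two characters, and that the damage is reversible:

```python
data = "Français".encode("utf-8")
print(data)  # b'Fran\xc3\xa7ais' : the ç is stored as two bytes

# A strict 7-bit decoder refuses the two high bytes.
try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii decode failed:", err.reason)

# A single-byte decoder maps each of the two bytes to its own character.
garbled = data.decode("iso-8859-1")
print(garbled)

# Re-encoding with the same single-byte codec restores the original bytes.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```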


@jessegrosjean I wanted to flag this thread for your attention – I think there’s a bug in how Bike handles the pasting of HTML with ambiguous character encoding.

Ah, thanks! I saw this while I was away… and then got onto other things and forgot. Will look into this today.


I’ve just posted a new preview release which I think fixes this problem:


👍 Working well here now.

(Bike 1.18.2 (177) – Sonoma 14.4.1)


Bike 1.18.2 (177) Preview seems to have fixed the problem for me too. Thanks @jessegrosjean !
