
On Mon, 18 Sep 2023 08:37:50 +1200, Peter Reutemann quoted:
'... in July 2023â¦ ...'
I was trying to decipher how that “â¦” came about. Most commonly, this kind of random junk appears when UTF-8-encoded text is misinterpreted as ISO-8859-1. If you do a View Source on the original page, you see this:

    ... in July 2023â¦ ...

In hex, these codes are 0xE2 and 0xA6. If you look at the UTF-8 spec <https://www.rfc-editor.org/rfc/rfc2044.txt>, a byte of the form 0b1110xxxx (0xE2 = 0b11100010) must be followed by two more bytes of the form 0b10xxxxxx, of which 0xA6 (= 0b10100110) is one. So one byte is missing. Presumably it was something unprintable in ISO-8859-1, so it simply got lost along the way.

After trying a few things (not exhaustively), I came up with this candidate sequence: 0xE2 0x80 0xA6, which is the UTF-8 encoding of U+2026 HORIZONTAL ELLIPSIS, in other words the “…” character. That fits the theory: 0x80 lies in the 0x80 to 0x9F range, which ISO-8859-1 leaves to the non-printing C1 control codes, so it would never have shown up on screen.
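For anyone who wants to play along, here is a rough Python sketch of both directions. It assumes the theory above: the missing middle byte vanished because ISO-8859-1 shows bytes 0x80 to 0x9F as invisible control characters. The helper `candidates` is my own hypothetical name, not from any library. Note that it only narrows things down to 32 possible characters; picking the ellipsis out of that list still comes down to context, as above.

```python
import unicodedata

# Forward direction: encode the (presumed) intended character as UTF-8.
original = "\u2026"                      # U+2026 HORIZONTAL ELLIPSIS
raw = original.encode("utf-8")
print(raw.hex(" "))                      # e2 80 a6

# Misread those bytes as ISO-8859-1: each byte becomes one code point.
misread = raw.decode("latin-1")
print([unicodedata.category(c) for c in misread])  # ['Ll', 'Cc', 'So']

# The 0x80 byte maps to a C1 control (category Cc), which displays as nothing.
visible = "".join(c for c in misread if unicodedata.category(c) != "Cc")
print(visible)                           # â¦


def candidates(lead: int, trail: int) -> list[tuple[str, str]]:
    """Try every middle byte that ISO-8859-1 would render as a C1 control (0x80-0x9F)."""
    found = []
    for mid in range(0x80, 0xA0):
        # lead 0xE2 + any continuation byte is always a valid 3-byte UTF-8 sequence
        ch = bytes([lead, mid, trail]).decode("utf-8")
        found.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


# Reverse direction: list the characters that could have produced "â¦".
for code, name in candidates(0xE2, 0xA6):
    print(code, name)
```

The first line of that listing is `U+2026 HORIZONTAL ELLIPSIS`; the rest are mostly arrows, letterlike symbols and the like, which seem far less likely after "in July 2023".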