
On Wed, 9 May 2018 09:58:44 +1200, Peter Reutemann wrote:
The code wasn't the problem... The levels of management to go through to approve changes to a core component of Windows, then the same for applying for budget, approving budget (each time several rounds) ... [etc etc]
Well, guess what: flushed with that previous success, those daring folks at Microsoft have gone even further <https://arstechnica.com/gadgets/2018/12/latest-windows-insider-build-makes-a-major-upgrade-to-uh-notepad/>:

The new and improved Notepad now has better Unicode support, defaulting to saving files as UTF-8 _without_ a Byte Order Mark ...

You may or may not know, but “Byte Order Mark” is Unicode character U+FEFF, while the code point with the bytes swapped, U+FFFE, is a “noncharacter”, and will forever remain so. The usefulness of this pair dates back to the era when Unicode was only 16 bits, so what is now “UTF-16” encoding was equivalent to fixed-length “UCS-2” encoding.

You may also know about the “big-endian” versus “little-endian” issue between different processor architectures. So text encoded in UCS-2 or UTF-16 is supposed to begin with a Byte Order Mark, and any program reading that text can check that the first character is indeed U+FEFF. If it sees U+FFFE instead, then it knows that the text comes from a machine with the opposite endianness, and can automatically apply a corresponding byte-swap adjustment to the rest of it.

Since UCS-2 is no longer sufficient to represent current versions of Unicode, and UTF-16 is a pain to deal with, UTF-8 is widely considered a much superior encoding. Furthermore, it is defined as a sequence of bytes, so it is endianness-independent: software running on different architectures always agrees about how the bytes are ordered.

However, Microsoft in their wisdom decided that their version of UTF-8 text should still begin with a Byte Order Mark (UTF-8-encoded, of course, as the three bytes EF BB BF). Which is completely pointless, and ends up introducing a garbage character at the start when the text is read by software that does not expect it, which covers most non-Windows software.
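
For anyone who wants to see the mechanics, here is a rough Python sketch of the BOM check described above. The function name decode_with_bom is made up for illustration, the fallback to UTF-8 when no BOM is present is my own assumption, and UTF-32 is deliberately ignored (its little-endian BOM begins with the same two bytes as the UTF-16 one). The only library pieces used are the BOM constants in the standard codecs module and the "utf-8-sig" codec.

```python
import codecs


def decode_with_bom(data: bytes) -> str:
    """Decode text, using a leading Byte Order Mark (if any) to pick the encoding.

    Hypothetical helper for illustration only; UTF-32 is not handled.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF: U+FEFF, big-endian
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE: looks like U+FFFE to a
        return data[2:].decode("utf-16-le")     # big-endian reader, so swap bytes
    # Assumption: no BOM means plain UTF-8.
    return data.decode("utf-8")


# What a Notepad-style "UTF-8 with BOM" file looks like on disk.
with_bom = codecs.BOM_UTF8 + "hi".encode("utf-8")

# A reader that does not expect the BOM keeps U+FEFF as a stray first character.
naive = with_bom.decode("utf-8")

# Python's "utf-8-sig" codec strips the BOM if present and is harmless otherwise.
clean = with_bom.decode("utf-8-sig")
```

Here naive comes out one character longer than clean, and that extra leading U+FEFF is the garbage character in question.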