It’s Not A Bug, It’s A Feature!

The IndexByDate.txt files available in the various Bitsavers directories (e.g. <http://bitsavers.trailing-edge.com/pdf/>) list the items available in order of date. But if you want them ordered by name, so that items in the same directory come together, the logical thing to do is run the sort(1) <https://linux.die.net/man/1/sort> command on this file.

You specify the sort key by a “field definition”. The man page says that, by default, each field starts at a non-whitespace character after whitespace. The file has a very regular format: splitting by whitespace, you get the date, the time, and then the pathname. So a command like

    sort -k3 IndexByDate.txt

should work. Except I was seeing a sequence like this:

    2006-08-17 22:17:12 ge/MarkI_Timesharing/IPC-202646_FORTRAN_Aug66.pdf
    2012-08-29 02:26:21 generalAutomation/18_30/88A00026A-A_GA1830_Industrial_Supervisory_System_Reference_Manual_May69.pdf
    ...
    2018-05-16 00:33:35 georgiaTech/GTL_Programmers_Reference_Manual_for_the_Burroughs_B_5500_Aug1974.pdf
    2006-08-27 11:36:57 ge/PAC-4000/GET-3201B_PAC-4000_SysMan_Mar66.pdf

As you can see, the “ge” section is interrupted by “generalAutomation” and others, and then resumes after them. What was going on?

It took me a while to realize that the sorting is done according to the locale, and the collation rules of the locale in effect ignore the slashes (and do not distinguish upper from lower case on the first pass). Thus, the above sequence is effectively “gem” (from “ge/MarkI”) followed by “gen”, then “geo” and “gep” (from “ge/PAC”).

What I wanted was a straight ASCII sort on the full pathname, without regard to any locale settings. But there is no hint in the man page as to how to specify this.

Then I realized: the locale is controlled by environment variables. If you want the “non-localized” locale, then specify the locale called “C”, thus:

    LC_ALL=C sort -k3 IndexByDate.txt

and this gives the order I want.
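To try the fix without downloading a full index, here is a minimal sketch that feeds the four lines quoted above through the same command. The variable name sample and the helper sort_by_path are made up for this illustration, and the cut at the end is only there to trim the output to the pathnames.

```shell
#!/bin/sh
# Four of the index lines quoted above, in the interleaved order that the
# locale-aware sort produced.
sample='2006-08-17 22:17:12 ge/MarkI_Timesharing/IPC-202646_FORTRAN_Aug66.pdf
2012-08-29 02:26:21 generalAutomation/18_30/88A00026A-A_GA1830_Industrial_Supervisory_System_Reference_Manual_May69.pdf
2018-05-16 00:33:35 georgiaTech/GTL_Programmers_Reference_Manual_for_the_Burroughs_B_5500_Aug1974.pdf
2006-08-27 11:36:57 ge/PAC-4000/GET-3201B_PAC-4000_SysMan_Mar66.pdf'

# Hypothetical helper: sort stdin on field 3 through end of line, comparing
# raw bytes. In the C locale "/" (0x2F) sorts below every letter, so all
# "ge/..." paths land before "generalAutomation/...".
sort_by_path() {
    LC_ALL=C sort -k3
}

# Show only the pathname column of the sorted result.
printf '%s\n' "$sample" | sort_by_path | cut -d' ' -f3
```

With the C locale forced, both “ge/” entries come out first, followed by “generalAutomation” and then “georgiaTech”. Setting LC_ALL on the command line affects only that one invocation of sort, so the rest of the shell session keeps its usual locale.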
Lawrence D'Oliveiro