Scientists Rename Human Genes To Stop Microsoft Excel From Misreading Them as Dates

'There are tens of thousands of genes in the human genome: minuscule twists of DNA and RNA that combine to express all of the traits and characteristics that make each of us unique. Each gene is given a name and alphanumeric code, known as a symbol, which scientists use to coordinate research. But over the past year or so, some 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates. From a report: The problem isn't as unexpected as it first sounds. Excel is a behemoth in the spreadsheet world and is regularly used by scientists to track their work and even conduct clinical trials. But its default settings were designed with more mundane applications in mind, so when a user inputs a gene's alphanumeric symbol into a spreadsheet, like MARCH1 -- short for "Membrane Associated Ring-CH-Type Finger 1" -- Excel converts that into a date: 1-Mar. This is extremely frustrating, even dangerous, corrupting data that scientists have to sort through by hand to restore. It's also surprisingly widespread and affects even peer-reviewed scientific work. One study from 2016 examined genetic data shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.' -- source: https://science.slashdot.org/story/20/08/06/147226 This auto-interpreting and getting things wrong has always annoyed me about Excel and other spreadsheet applications. Not sure whether there ever was an option to turn off this "auto-detect of content type"... Cheers, Peter -- Peter Reutemann Dept. of Computer Science University of Waikato, NZ +64 (7) 858-5174 http://www.cms.waikato.ac.nz/~fracpete/ http://www.data-mining.co.nz/

On Fri, 7 Aug 2020 09:13:47 +1200, Peter Reutemann quoted:
'... when a user inputs a gene's alphanumeric symbol into a spreadsheet, like MARCH1 -- short for "Membrane Associated Ring-CH-Type Finger 1" -- Excel converts that into a date: 1-Mar.'
Presumably localized versions of Excel will perform corresponding (mis)conversions on names that happen to look like dates in those locales. In other words, renaming the genes to accommodate Excel’s quirks seems like it’s only going to backfire in some other way.

On Fri, 7 Aug 2020 09:13:47 +1200, Peter Reutemann wrote:
This auto-interpreting and getting things wrong has always annoyed me about Excel and other spreadsheet applications. Not sure whether there ever was an option to turn off this "auto-detect of content type"...
According to a reader comment in this article <https://www.theregister.com/2020/08/06/excel_gene_names/>, Excel lets you change the format, but it seems a lot of scientists, experts in their field (which presumably includes expertise in the tools they use for their job), can’t seem to figure this out. Also I tried creating a text file with the following lines, using the examples mentioned in the article: DEC1 SEPT2 MARCH1 and see what LibreOffice Calc would do with that. Before loading the file, it presented me with a dialog to specify the import format--tab or comma delimited, character set etc. And also to let me choose the type for each column. The default “Standard” type kept the cell contents as text. Actually, even explicitly choosing a “Date” type didn’t work: they still displayed as the original text!

This auto-interpreting and getting things wrong has always annoyed me about Excel and other spreadsheet applications. Not sure whether there ever was an option to turn off this "auto-detect of content type"...
According to a reader comment in this article <https://www.theregister.com/2020/08/06/excel_gene_names/>, Excel lets you change the format, but it seems a lot of scientists, experts in their field (which presumably includes expertise in the tools they use for their job), can’t seem to figure this out.
Also I tried creating a text file with the following lines, using the examples mentioned in the article:
DEC1 SEPT2 MARCH1
and see what LibreOffice Calc would do with that. Before loading the file, it presented me with a dialog to specify the import format--tab or comma delimited, character set etc. And also to let me choose the type for each column. The default “Standard” type kept the cell contents as text. Actually, even explicitly choosing a “Date” type didn’t work: they still displayed as the original text!
Good to know! :-) I always thought that the explicit import dialog for CSV files in LibreOffice was far superior to Excel's "let's just try loading the CSV file and then the user can fix things that I screwed up"... CHeers, Peter -- Peter Reutemann Dept. of Computer Science University of Waikato, NZ +64 (7) 858-5174 http://www.cms.waikato.ac.nz/~fracpete/ http://www.data-mining.co.nz/

On Fri, 7 Aug 2020 13:39:34 +1200, Peter Reutemann wrote:
I always thought that the explicit import dialog for CSV files in LibreOffice was far superior to Excel's "let's just try loading the CSV file and then the user can fix things that I screwed up"...
This comment <https://forums.theregister.com/forum/all/2020/08/07/army_promotion_excel_snafu/#c_4085999> mentions something called RIKEN identifiers, which can look like numbers. LO Calc does automatically format them as numbers in exponential format with its “Standard” column type, but changing that to “Text” fixes it.
participants (3)
-
david@rebirthing.co.nz
-
Lawrence D'Oliveiro
-
Peter Reutemann