Text processing in the shell

Hi everyone,

Just came across this article, which also has real-world examples:

'One of the things that makes the shell an invaluable tool is the amount of available text processing commands, and the ability to easily pipe them into each other to build complex text processing workflows. These commands can make it trivial to perform text and data analysis, convert data between different formats, filter lines, etc. When working with text data, the philosophy is to break any complex problem you have into a set of smaller ones, and to solve each of them with a specialized tool.'

-- source: https://blog.balthazar-rouberol.com/text-processing-in-the-shell

Cheers, Peter

--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

On Wed, 18 Mar 2020 15:21:54 +1300, Peter Reutemann wrote:
> https://blog.balthazar-rouberol.com/text-processing-in-the-shell

Text processing is useful, but it is also prone to some pitfalls.

Watch out for locale settings. As the grep(1) man page <http://man7.org/linux/man-pages/man1/grep.1.html> points out, you might expect that writing “[a-d]” is equivalent to “[abcd]”, but in some locales it might actually give you “[aAbBcCdD]” instead. Doing

    export LC_ALL=C

switches the current shell, and every command started from it, to the plain C locale, where ranges follow byte order.

Also watch out for special characters in filenames. What if any of the filenames returned by the «filter» in

    for file in $(«filter»); do «action»; done

contains spaces or other funnies? By default, the shell splits the output of a command substitution into words at space, tab and newline characters. The characters it uses are the value of the “IFS” special shell variable. Spaces are very common in filenames these days; you can avoid the shell tripping over these with an assignment like

    IFS=$'\n'

but this still leaves newlines as a potential problem. If you can be sure these will never occur in filenames, you’re fine. Otherwise, slightly more elaborate measures must be taken.
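
To make both pitfalls concrete, here is a minimal sketch in plain POSIX sh. It is only an illustration under some assumptions: the scratch directory and the three file names are made up, mktemp(1) is assumed to be installed, and "find -exec sh -c ... {} +" is just one possible reading of the “more elaborate measures” mentioned above, not necessarily the one the poster had in mind.

```shell
#!/bin/sh
# Sketch only: the scratch directory and file names are made up for illustration.

# Pitfall 1: locale-dependent ranges. With LC_ALL=C, [a-d] means exactly the
# bytes a, b, c and d, so the uppercase "B" line is not selected.
matches=$(printf 'b\nB\nz\n' | LC_ALL=C grep '[a-d]')
echo "matched in the C locale: $matches"

# Pitfall 2: awkward filenames. Create three files, one with a space and one
# with a newline in its name.
nl='
'
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt" "$workdir/with${nl}newline.txt"

# Unsafe: the output of find is split at every space, tab and newline, so the
# loop body runs once per fragment rather than once per file.
unsafe_count=0
for file in $(find "$workdir" -type f); do
    unsafe_count=$((unsafe_count + 1))
done

# Safer: let find pass each name to a child shell as a separate argument.
# Nothing is ever re-split, so "$f" is always one complete filename.
# The printf stands in for a real action; counting its output counts the files.
safe_count=$(find "$workdir" -type f -exec sh -c '
    for f in "$@"; do printf x; done
' sh {} + | wc -c | tr -d ' ')

echo "unsafe loop iterations: $unsafe_count, files actually present: $safe_count"
rm -rf "$workdir"
```

The unsafe loop runs more often than there are files, because each name containing whitespace arrives in pieces. Passing names as arguments avoids the problem at the source rather than trying to pick a separator that can never appear in a filename.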

Participants (2): Lawrence D'Oliveiro, Peter Reutemann