The tr command performs transforms on a stream of text, producing a new stream as its output. You can substitute, delete, or convert characters according to rules you set on the command line.
Do you need a no-frills method for manipulating a stream of text in Linux? Look no further than the tr command, which can save you time in replacing, removing, combining, and compressing input text. This is how it’s done.
What Is the tr Command?
The Linux tr command is a fast and simple utility for stripping out unwanted characters from streams of text, and for other neat manipulation tricks. It gets its name from the word “translate,” and tr’s roots run deep in the Unix tradition.
As we all know, Linux is an open-source rewrite of Unix. It adds its own stuff into the mix, too. It isn’t a byte-for-byte clone, but it clearly takes many of its design principles and much of its engineering steerage from the Unix operating system.
Only two Linux distributions, EulerOS and Inspur K-UX, have so far been certified against the Single UNIX Specification and officially accepted as implementations of Unix. Even so, Linux has almost completely supplanted Unix in the business world.
All Linux distributions, at least in their core utilities, adhere to the Unix philosophy. The Unix philosophy encapsulates the vision the Unix pioneers had for their new operating system. It’s often paraphrased as “Write programs that do one thing well.” But there’s more to it than that.
One of the most powerful innovations was that programs should generate output that could be used as the input to other programs. The ability to daisy chain command line utilities together, using the output stream from one program as the input stream to the next program in line, is massively powerful.
Sometimes you’ll want to fine-tune or tweak the output from one program before it reaches the next program in line. Or perhaps you’re not taking your input from a Linux command line tool at all, and you’re streaming text out of a file that wasn’t created with your particular needs in mind.
This is where tr comes into its own. It allows you to perform a set of simple transformations on its input stream to produce its output stream. That output stream can be redirected into a file, fed into another Linux program, or even fed into another instance of tr to have multiple transforms applied to the stream.
Replacing Characters
The tr command operates on its input stream according to rules. Used without any command line options, the default action of tr is to replace characters in the input stream with other characters.
Commands to tr usually require two sets of characters. The first set holds the characters that will be replaced if they are found in the input stream. The second set holds the characters that they will be replaced with.
The way this works is occurrences of the first character in set one will be replaced by the first character in set two. Occurrences of the second character in set one will be replaced by the second character in set two, and so on.
This example will look for the letter “c” in the input stream to tr, and replace each occurrence with the letter “z.” Note that tr is case-sensitive.
We’re using echo to push some text into tr.
echo abcdefabc | tr 'c' 'z'
All occurrences of “c” are replaced with “z” and the new string is written to the terminal window.
This time we’ll search for two letters, “a” and “c.” Note that we’re not searching for “ac.” We’re looking for “a”, then looking for “c.” We’re going to replace any occurrence of “a” with “x” and any occurrence of “c” with “z.”
echo abcdefabc | tr 'ac' 'xz'
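Because each character in set one is mapped independently to its counterpart in set two, the same characters can appear in both sets without the substitutions interfering with each other. A minimal sketch that swaps “a” and “c” in a single pass (the variable name swapped is only for illustration):

```shell
# Each input character is translated exactly once, so "a" becomes "c"
# and "c" becomes "a" without either change undoing the other.
swapped=$(echo abcdefabc | tr 'ac' 'ca')
echo "$swapped"
```

This should print cbadefcba. As in the examples above, both sets contain the same number of characters.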
For this to work you must have the same number of characters in both sets. If you don’t, you’ll get predictable, but probably unwanted, behavior.
echo 'call me Ishmael.' | tr 'abcdjklm' '123'
There are more characters in set one than in set two. The letters “d” to “m” have no corresponding character in set two. They’ll still get replaced, but they’re all replaced with the last character in set two.
It’s just about possible that this could be useful in some cases, but if you want to prevent this you can use the -t (truncate) option. This only replaces those characters contained in set one that have a matching character in set two.
echo 'call me Ishmael.' | tr -t 'abcdjklm' '123'
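To make the difference easier to see, the two commands can be run side by side. This is a small sketch that assumes the GNU coreutils version of tr (other implementations may not offer -t); the variable names padded and truncated are illustrative only.

```shell
# Default behavior: set two is padded with its last character ("3"),
# so d, j, k, l, and m are all replaced with "3".
padded=$(echo 'call me Ishmael.' | tr 'abcdjklm' '123')

# With -t, set one is cut down to the length of set two ("abc"),
# so only a, b, and c are replaced.
truncated=$(echo 'call me Ishmael.' | tr -t 'abcdjklm' '123')

echo "padded:    $padded"
echo "truncated: $truncated"
```

With GNU tr, the first line should read 3133 3e Ish31e3. and the second 31ll me Ishm1el.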
Using Ranges and Tokens
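Set one and set two can contain ranges of characters. For example, [a-z] represents all the lowercase letters, and [A-Z] represents all the uppercase letters. Strictly speaking, the range itself is just a-z or A-Z. The square brackets are treated as ordinary characters that map to themselves, so including them is harmless here. We can make use of ranges to change the case of a stream of text.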
This will convert the input stream to uppercase.
echo "How-To Geek" | tr '[a-z]' '[A-Z]'
To flip the case in the other direction, we can use the same command but with the uppercase and lowercase ranges swapped on the command line.
echo "How-To Geek" | tr '[A-Z]' '[a-z]'
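Ranges don’t have to cover a whole alphabet, and several ranges can sit side by side in one set. As an illustration, here is a sketch of the classic ROT13 letter rotation built from four half-alphabet ranges. The brackets are left out because tr recognizes a-z as a range on its own, and the variable names encoded and decoded are just for this example.

```shell
# Set one: A-Z followed by a-z (52 characters).
# Set two: the same letters rotated by 13 places (also 52 characters).
encoded=$(echo "How-To Geek" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$encoded"

# Applying the same translation a second time restores the original text.
decoded=$(echo "$encoded" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$decoded"
```

The first line should read Ubj-Gb Trrx and the second How-To Geek. The hyphen and the space are in neither set, so they pass through unchanged.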
There are tokens that we can use for some of the common cases that we might want to match with.
- [:alnum:]: Letters and digits.
- [:alpha:]: Letters only.
- [:digit:]: Digits only.
- [:blank:]: Tabs and spaces.
- [:space:]: All whitespace, including newline characters.
- [:graph:]: All characters including symbols, but not spaces.
- [:print:]: All characters including symbols, including spaces.
- [:punct:]: All punctuation characters.
- [:lower:]: Lowercase letters.
- [:upper:]: Uppercase letters.
We can perform our lowercase to uppercase and uppercase to lowercase conversions just as easily, using tokens.
echo "How-To Geek" | tr '[:lower:]' '[:upper:]'
echo "How-To Geek" | tr '[:upper:]' '[:lower:]'
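The other tokens work the same way in set one. The sketch below masks every digit in a made-up string. It relies on the padding behavior described earlier, where a shorter set two is extended with its last character, so it assumes an implementation such as GNU tr that behaves that way.

```shell
# Every digit in the input is replaced with "#". Set two holds a single
# character, which is reused for all ten digits.
masked=$(echo "Card 1234 expires 09/27" | tr '[:digit:]' '#')
echo "$masked"
```

This should print Card #### expires ##/##.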
Inverting the Matches
The -c (complement) option matches all characters apart from those in the first set. This command converts everything apart from the letter “c” to a hyphen “-”.
echo abcdefc | tr -c 'c' '-'
This command adds the letter “a” to the first set. Anything apart from “a” or “c” is converted to a hyphen “-” character.
echo abcdefc | tr -c 'ac' '-'
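One detail worth knowing: “everything apart from” includes the newline that echo adds to the end of its output, so that gets converted too. A short sketch of the effect, and of one way to avoid it by adding \n to the first set (variable names are illustrative):

```shell
# The newline that echo adds is not "c", so it becomes a hyphen as well.
with_newline_replaced=$(echo abcdefc | tr -c 'c' '-')
echo "$with_newline_replaced"

# Adding \n to set one leaves the line ending untouched.
with_newline_kept=$(echo abcdefc | tr -c 'c\n' '-')
echo "$with_newline_kept"
```

The first result should be --c---c- with an extra trailing hyphen, and the second --c---c.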
Deleting and Squeezing Characters
We can use tr to remove characters altogether, without any replacement.
This command uses the -d (delete) option to remove any occurrence of “a”, “d”, or “f” from the input stream.
echo abcdefc | tr -d 'adf'
This is one instance where we only have one set of characters on the command line, not two.
Another is when we use the -s (squeeze-repeats) option. This option reduces repeated characters to a single character.
This example will reduce repeated sequences of the space character to a single space.
echo "a  b   c    de  f     c" | tr -s '[:blank:]'
It’s a little confusing that the [:blank:] token represents only spaces and tabs, while the [:space:] token represents all forms of whitespace, including newline characters.
In this case, we could replace [:blank:] with [:space:] and get the same result.
echo "a  b   c    de  f     c" | tr -s '[:space:]'
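Squeezing isn’t limited to spaces. As a further sketch, squeezing newline characters collapses runs of blank lines. Here printf stands in for a file that contains empty lines, and the variable name squeezed is illustrative.

```shell
# Three consecutive newlines are squeezed into one, which removes
# the two blank lines between "one" and "two".
squeezed=$(printf 'one\n\n\ntwo\nthree\n' | tr -s '\n')
echo "$squeezed"
```

The output should be the three words on three consecutive lines.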
Deleting Characters
The differences between [:blank:] and [:space:] become apparent when we delete characters. To do this, we use the -d (delete) option, and provide a set of characters that tr will look for in its input stream. Any that it finds are removed.
echo "a b c de f c" | tr -d '[:blank:]'
The spaces are deleted. Note that we get a newline after the output stream is written in the terminal window. If we repeat that command and use [:space:] instead of [:blank:], we’ll get a different result.
echo "a b c de f c" | tr -d '[:space:]'
This time we don’t start a new line after the output, so the command prompt is butted right up against it. This is because [:space:] includes newlines. Any spaces, tabs, and newline characters are removed from the input stream.
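Since the missing newline is hard to see on screen, one way to confirm the difference is to count the bytes that survive each deletion. This is a sketch using printf to build a short test string containing a space, a tab, and a newline; the variable names are illustrative.

```shell
# Input: "a", space, "b", tab, "c", newline (6 bytes in total).
# [:blank:] removes the space and the tab, leaving "abc" plus the newline.
blank_bytes=$(printf 'a b\tc\n' | tr -d '[:blank:]' | wc -c)

# [:space:] also removes the newline, leaving just "abc".
space_bytes=$(printf 'a b\tc\n' | tr -d '[:space:]' | wc -c)

echo "after -d [:blank:]: $blank_bytes bytes"
echo "after -d [:space:]: $space_bytes bytes"
```

The counts should be 4 and 3 respectively. The one-byte difference is the newline.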
Of course, you could use an actual space character as well.
echo "a b c de f c" | tr -d ' '
We can just as easily delete digits.
echo abcd123efg | tr -d '[:digit:]'
By combining the -c (complement) and -d (delete) options we can delete everything apart from digits.
echo abcd123efg | tr -cd '[:digit:]'
Note that “everything apart from digits” means all letters and all whitespace, so once again we lose the terminating newline.
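If you’d rather keep the line ending, one option is to add \n to the set of characters being preserved. A minimal sketch, with illustrative variable names:

```shell
# \n is part of the set being kept, so only the letters are deleted.
digits_only=$(echo abcd123efg | tr -cd '[:digit:]\n')
echo "$digits_only"

# Counting bytes confirms the newline survived: three digits plus "\n".
byte_count=$(echo abcd123efg | tr -cd '[:digit:]\n' | wc -c)
echo "bytes: $byte_count"
```

The byte count should be 4, compared with 3 when the newline is deleted along with the letters.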
Combining and Splitting Lines
If we substitute newline characters for spaces, we can split a line of text and place each word on its own line.
echo 'one two three four' | tr ' ' '\n'
We can change the delimiter that separates words, too. This command substitutes colons “:” for spaces.
echo 'one two three four' | tr ' ' ':'
We can find whatever delimiter is in use and replace it with newline characters, splitting difficult-to-read text into easier-to-manage output.
The PATH environment variable is a long string of many directory paths. A colon “:” separates each path. We’ll change them to newline characters.
echo "$PATH"
echo "$PATH" | tr ":" "\n"
That’s much easier to visually parse.
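Once each entry is on its own line, other line-oriented tools can work with the list. The sketch below counts the entries with wc. It uses a made-up variable, sample_path, in place of the real $PATH so that the result is predictable.

```shell
# sample_path is a made-up value standing in for $PATH.
sample_path='/usr/local/bin:/usr/bin:/bin'

# One directory per line.
echo "$sample_path" | tr ':' '\n'

# The same output piped into wc to count the entries.
entry_count=$(echo "$sample_path" | tr ':' '\n' | wc -l)
echo "entries: $entry_count"
```

For this sample value the count should be 3.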
If we have output that we want to reformat into a single line, we can do that too. The file “lines.txt” contains some text, with one word on each line. We’ll feed that into tr and convert it to a single line.
cat lines.txt
cat lines.txt | tr '\n' ' '
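Bear in mind that every newline is converted, including the last one, so the joined line ends with a space rather than a newline. A small sketch that makes this visible, with printf standing in for the contents of a file that has one word per line:

```shell
# printf stands in for a file with one word on each line.
joined=$(printf 'one\ntwo\nthree\n' | tr '\n' ' ')

# The brackets make the trailing space visible.
echo "[$joined]"
```

This should print [one two three ] with a space before the closing bracket.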
Using tr With Pipes
We can use the output from tr as the input for another program, or even to tr itself.
This command uses tr four times.
- The first tr deletes any hyphens “-” from the input.
- The second tr squeezes any repeated spaces into single spaces.
- The third tr replaces spaces with underscore “_” characters.
- The fourth and final tr converts the string to lowercase.
echo "Mangled  FiLE-nAMe.txt" | tr -d '-' | tr -s ' ' | tr ' ' '_' | tr '[:upper:]' '[:lower:]'
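If you find yourself reusing a chain like this, it can be wrapped in a shell function. The sketch below uses a hypothetical helper named clean_name. It also merges the squeeze and replace steps: when -s is given two sets, tr translates first and then squeezes repeats of the characters in set two.

```shell
# clean_name is a hypothetical helper, not a standard command.
clean_name() {
    # 1. Delete hyphens.
    # 2. Turn spaces into underscores and squeeze repeated underscores.
    # 3. Convert uppercase letters to lowercase.
    printf '%s\n' "$1" | tr -d '-' | tr -s ' ' '_' | tr '[:upper:]' '[:lower:]'
}

clean_name "Mangled  FiLE-nAMe.txt"
```

This should print mangled_filename.txt. One difference from the four-stage version is that runs of underscores already present in the input are squeezed as well.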
Simple Is as Simple Does
The tr command is great because it is simple. There’s not much to learn or remember. But its simplicity can be its downfall, too.
Make no mistake, frequently you’ll find that tr lets you do what you need without having to reach for more complicated tools like sed.
However, if you’re struggling to do something with tr, and you find yourself building long daisy chains of commands, you probably should be using sed.