
Linux Comes With a Free Dictionary and It’s More Useful Than You Think

October 10, 2024

Most Linux systems come with a dictionary file: a huge list of words in your language. This “words” file is useful for spell-checking, but you can put it to use in many other ways. Learn more about Linux commands with some wordy diversions.


1 Find Words With a Certain Length

The dictionary file—usually located at /usr/share/dict/words—is particularly useful for fans of word games and similar puzzles. Imagine you’re looking for a word to complete a crossword puzzle, or one to fit in a particular space in a design. You might want a list of words that are exactly a specific length. With the words file and some common Linux utilities like grep or awk, this is an easy problem to solve.

You’re probably more familiar with grep than awk, so let’s start with that tool. Here’s how to use a regular expression to get all the words of a certain length:

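grep -E '^.{22}$' /usr/share/dict/words

The -E flag tells grep to use extended regular expressions, which allow the cleaner {22} syntax used here. You may also see this written as egrep, an older shorthand for “grep -E”; recent versions of GNU grep warn that egrep is obsolescent.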

This example is pretty straightforward once you’re familiar with regular expressions:

The anchor characters, circumflex (^) and dollar ($), represent the beginning and end of the line. They ensure that each match is a complete word, not just part of one.
The period (.) is a wildcard that matches any single character.
The {22} repeats the previous item 22 times.

So the full expression gives us words exactly 22 characters long:

Output from a Linux grep command showing words of an exact length.

awk is another tool that makes heavy use of regular expression patterns, but it’s much more powerful than grep—a full-blown language, in fact. This means it has various shortcuts that are invaluable in cases such as this.

Here’s the awk equivalent of the previous grep command:

awk 'length($0) == 22' /usr/share/dict/words

This uses $0, which holds the entire current line, and awk’s built-in length function to count the number of characters in it. Because the pattern has no action attached, awk simply prints every line for which it is true.

You can use either of these readily available tools for the job. Your choice will depend on factors like efficiency, ease of use, and personal preference. If you’re familiar with regular expressions, the grep approach may be easier, but the awk command is a bit more readable.
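If you run this kind of search often, it may help to wrap the awk command in a shell function. The sketch below is one way to do that; the function names words_of_length and crossword_match are hypothetical, as is the convention of using “?” for an unknown crossword letter.

```shell
# Hypothetical helpers (names are illustrative, not standard commands).

# Print every word that is exactly LENGTH characters long.
# Usage: words_of_length LENGTH [FILE]   (FILE defaults to /usr/share/dict/words)
words_of_length() {
    # -v copies the shell argument into the awk variable n;
    # n + 0 forces a numeric comparison with length($0).
    awk -v n="$1" 'length($0) == n + 0' "${2:-/usr/share/dict/words}"
}

# Crossword-style lookup in which "?" stands for one unknown letter.
# Usage: crossword_match 'c?t' [FILE]
crossword_match() {
    # tr turns each "?" into the regex wildcard "."; -x only accepts
    # whole-line matches, so the word length is fixed; -i ignores case.
    grep -ix -- "$(printf '%s' "$1" | tr '?' '.')" "${2:-/usr/share/dict/words}"
}
```

For example, “words_of_length 22” should print the same list as the grep and awk commands above, while “crossword_match 'c?t'” lists three-letter words such as “cat” or “cut” if your dictionary has them.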

2 Discover the Longest Word

Building on the previous example, how about finding out what the longest word is? You can do so very easily with the wc (word count) program and its -L flag, which reports the length of the longest line:

wc -L /usr/share/dict/words

The output of wc showing the length of the longest line.

The output tells you the longest word is 28 letters long, which is pretty impressive. To find out what this word actually is, simply reuse the process from the word length example:

awk 'length($0) == 28' /usr/share/dict/words

The output from an awk command showing the longest word from the dictionary file.

Dictionary files vary greatly. Of course, different languages will have totally different dictionaries, but even the same language can have a very different set of words from one system to another. For example, macOS tells me that “antidisestablishmentarianism” is the longest word, while a remote Ubuntu system I log in to says it’s “electroencephalograph’s.” Maybe macOS is just that little bit more “book smart.”
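The two-step approach reads the file twice: once for wc and once for awk. As a sketch, a single awk pass can track the longest length seen so far and print every word that reaches it; the function name longest_words is hypothetical.

```shell
# Hypothetical helper: print the longest word(s) in one pass.
# Ties are all printed, in file order.
# Usage: longest_words [FILE]   (FILE defaults to /usr/share/dict/words)
longest_words() {
    awk '
        length($0) > max  { max = length($0); n = 0 }   # longer word found: restart the list
        length($0) == max { words[++n] = $0 }           # collect every word of the current maximum
        END { for (i = 1; i <= n; i++) print words[i] }
    ' "${1:-/usr/share/dict/words}"
}
```

Unlike wc -L, which prints only a number, this prints the word itself, and it lists every word that ties for the top spot.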

3 Get Inspiration for Naming Things

The dictionary file has a surprising number of proper nouns, and we can use this to our advantage. Maybe you need a character name for that novel you’re working on or you’re looking for unorthodox suggestions for the name of a newborn.

For whatever reason, if you want a list of possible names, the words file has you covered. Simply search for every line that begins with a capital letter (the trailing period requires at least one more character, which skips single-letter entries):

grep '^[A-Z].' /usr/share/dict/words

A list of proper nouns obtained by searching the Linux words file.
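Depending on your system, the words file may also contain possessive forms and all-caps abbreviations that begin with a capital letter. The sketch below assumes a “plain name” is one uppercase letter followed only by lowercase letters; name_ideas is a hypothetical helper, and its optional first argument restricts the initial.

```shell
# Hypothetical helper: capitalized entries that look like plain names.
# Possessives (an apostrophe) and all-caps entries are filtered out.
# Usage: name_ideas [INITIAL] [FILE]   (defaults: any uppercase letter, /usr/share/dict/words)
name_ideas() {
    # An empty or missing INITIAL falls back to the [[:upper:]] class.
    grep -E "^${1:-[[:upper:]]}[[:lower:]]+\$" "${2:-/usr/share/dict/words}"
}
```

For example, “name_ideas Z” would list only the names starting with “Z.”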

4 Get a Random Word

Getting a random line from a file sounds useful, but POSIX doesn’t define a command for it (GNU coreutils includes shuf, but macOS doesn’t ship it). Still, this demonstrates the power of the Linux pipeline: you can chain a couple of simple commands together to do the job.

First of all, you need to know about the sort command and its -R flag. sort usually orders a set of lines alphabetically or numerically, but the -R flag randomizes the order instead. This simplifies the task: randomize the words and pick the first, which is a job for head:

sort -R /usr/share/dict/words | head -n1

A Linux pipeline using sort and head to get a random word from the dictionary.

Be sure to use -R, not -r, which sorts in reverse order.

This may not be the most efficient solution, but it’s fast enough on a typical dictionary file and a modern computer—and it’s easy to remember!
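If you’d rather not sort the entire file just to pick one line, here is a hedged sketch of two alternatives: GNU shuf where it is installed, and a portable awk fallback that uses single-item reservoir sampling. The function name random_word is hypothetical.

```shell
# Hypothetical helper: print one random line from FILE.
# Usage: random_word [FILE]   (FILE defaults to /usr/share/dict/words)
random_word() {
    file=${1:-/usr/share/dict/words}
    if command -v shuf >/dev/null 2>&1; then
        # GNU coreutils: pick one line without sorting the whole file.
        shuf -n 1 "$file"
    else
        # Portable fallback (reservoir sampling with one slot): line number
        # NR replaces the current pick with probability 1/NR, which leaves
        # every line equally likely once the whole file has been read.
        awk 'BEGIN { srand() } rand() * NR < 1 { pick = $0 } END { print pick }' "$file"
    fi
}
```

Note that awk’s srand() seeds from the time of day, usually in whole seconds, so two fallback calls within the same second may return the same word.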

5 Find Words Without Any Vowels

Curious about the English language? You may have heard the old adage that every word contains a vowel. Let’s disprove that with some simple grep’ing:

grep -E '^[^aeiou]{2,}$' /usr/share/dict/words

Output from a grep command that searches the dictionary file for words without vowels.

This expression uses the character class syntax, square brackets ([ and ]), to restrict characters to a given set. The second circumflex (^), placed just inside the opening bracket, negates the set so it matches any character that is not listed. So each character must not be “a,” “e,” and so on. The {2,} restricts matches to words with at least two letters; single-letter results aren’t very interesting! Note that the set only lists lowercase vowels, so a capitalized name such as “Ed” would also match.
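Because the character class is case-sensitive and treats “y” as a consonant, you might want to control exactly which letters count as vowels. This sketch, built around a hypothetical words_without function, takes the excluded set as an argument; it assumes the set contains plain letters only, since characters like ] or - have special meaning inside brackets.

```shell
# Hypothetical helper: words of 2+ characters containing none of the
# characters in SET (plain letters only).
# Usage: words_without SET [FILE]   (FILE defaults to /usr/share/dict/words)
#   words_without aeiouAEIOU      # vowels in either case
#   words_without aeiouyAEIOUY    # also rule out y
words_without() {
    grep -E "^[^$1]{2,}\$" "${2:-/usr/share/dict/words}"
}
```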

6 Find Words With All the Vowels in Order

We word nerds love our trivia, and a common challenge involves finding a word that contains all vowels in order: a, e, i, o, and u. Here’s a quick grep command that looks for words containing the five vowels, with any number of characters in between:

grep 'a.*e.*i.*o.*u' /usr/share/dict/words

A search of the Linux dictionary file for words with each vowel in order.

You may notice a slight flaw with this regex: it returns words like “abietineous,” which contain extra vowels. If you want to be very strict, you can modify the regex, although it gets a bit messy:

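grep '^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$' /usr/share/dict/words

The [^aeiou]* runs rule out vowels between the ones we’re searching for, and the ^ and $ anchors make the pattern cover the whole word, so no other vowel can sneak in before the “a” or after the “u.” It gives some more satisfying results, including words like “abstemious” and “facetious”: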

A search of the Linux dictionary file for words with each vowel in order, appearing only once.
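The same strict check can also be written in awk without any anchors: strip out every character that isn’t a vowel and compare what remains with “aeiou.” The sketch below (vowels_in_order is a hypothetical name) lowercases each word first, so unlike the grep version it also accepts words that start with a capital vowel.

```shell
# Hypothetical helper: words whose vowels are exactly a, e, i, o, u,
# each appearing once and in that order (case-insensitive).
# Usage: vowels_in_order [FILE]   (FILE defaults to /usr/share/dict/words)
vowels_in_order() {
    awk '{
        v = tolower($0)          # work on a lowercase copy
        gsub(/[^aeiou]/, "", v)  # keep only the vowels, in order
        if (v == "aeiou") print  # print the original line
    }' "${1:-/usr/share/dict/words}"
}
```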

7 Master Back-References for Smarter Search

Back to grep, and this time we’ll see how to search for a more complicated pattern, like the same letter twice in a row. This requires back-references, which let you match text that an earlier part of the pattern has already matched. You can find words with a double-letter sequence like this:

grep -E '(.)\1' /usr/share/dict/words

Here, the parentheses create a group, and the back-reference \1 matches whatever the first (and only) group matched. So this regular expression means “any character followed by the same character”:

Output from a grep command showing words with double letters in the Linux dictionary file.

There are many words in the English language with two identical letters in a row. But what about three?

grep -E '(.)\1\1' /usr/share/dict/words

On macOS, my dictionary contains seven interesting examples:

Output from a grep command showing words with triple letters in the Linux dictionary file.
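The extended-regex form works with both GNU and BSD grep, but POSIX only guarantees back-references in basic regular expressions, where groups are written \( and \). This sketch uses that portable form, and also shows two groups with separate back-references, which finds doubled letters back to back as in “balloon” or “coffee.” The function names are hypothetical.

```shell
# Hypothetical helpers using basic regular expressions (plain grep),
# where POSIX defines back-references.
# Usage: double_letters [FILE]; back_to_back_doubles [FILE]

# Any character immediately followed by the same character.
double_letters() {
    grep '\(.\)\1' "${1:-/usr/share/dict/words}"
}

# Two doubled letters in a row: group 1 doubled, then group 2 doubled.
back_to_back_doubles() {
    grep '\(.\)\1\(.\)\2' "${1:-/usr/share/dict/words}"
}
```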

8 Discover Which Letters Are Used the Most

OK, this is a bit of a cheat since we’ll actually discover which letter begins the most words, but this is still interesting information. It’s also another useful demonstration of a pipeline, this time using four separate utilities:

cut -b1 /usr/share/dict/words | tr '[:upper:]' '[:lower:]' | uniq -c | sort -n

“cut -b1” returns the first byte of each line, which is the first letter for plain ASCII words. “tr '[:upper:]' '[:lower:]'” converts everything to lowercase. “uniq -c” collapses adjacent identical lines into one, counting them as it goes. Finally, “sort -n” sorts the results numerically so the most common leading letter appears at the bottom.

Your results will vary according to your language. In English, it looks like “s” starts the most words, narrowly beating out “p.”

Output from a Linux command that determines initial letter frequency from words in the dictionary file.

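Because uniq only merges identical lines that are adjacent, the counts will be split if your dictionary lists all capitalized words first, followed by all lowercase words. In that case, add a “| sort” just before the “| uniq -c” in the pipeline.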
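To answer the heading’s question more directly, you can count every letter rather than just the first one. This sketch (letter_frequency is a hypothetical name) lowercases the file, deletes everything except a to z, splits the rest into one letter per line with fold, and then counts as before. Running the tr steps in the C locale is an assumption that limits the counts to the 26 unaccented letters.

```shell
# Hypothetical helper: frequency of every letter in the word list.
# Usage: letter_frequency [FILE]   (FILE defaults to /usr/share/dict/words)
letter_frequency() {
    # LC_ALL=C keeps the classes and the a-z range to plain ASCII;
    # accented letters and apostrophes are dropped.
    LC_ALL=C tr '[:upper:]' '[:lower:]' < "${1:-/usr/share/dict/words}" |
        LC_ALL=C tr -cd 'a-z' |   # delete everything except a-z (newlines too)
        fold -w 1 |               # one letter per line
        sort | uniq -c | sort -n  # count each letter, most common last
}
```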

These are just some of the fun and interesting things you can do in your Linux terminal. You may also want to check out how to use Spotify in your terminal, or how to create artistic works in the command line.
