How to Exclude Patterns, Files, and Directories With grep


Since the early 1970s, the grep command has been helping people find strings in files. But sometimes grep is just too thorough. Here are several ways to tell grep to ignore different things.

The grep Command

The grep command searches text files looking for strings that match the search patterns you provide on the command line. The power of grep lies in its use of regular expressions. These let you describe what you’re looking for, rather than have to explicitly define it.

The birth of grep predates Linux. It was developed in the early 1970s on Unix. It takes its name from the g/re/p key sequence in the ed line editor (incidentally, pronounced “ee-dee”). This stood for global, regular expression search, print matching lines.

grep is famously—perhaps, notoriously—thorough and single-minded. Sometimes it’ll search files or directories you’d rather it didn’t waste its time on, because the results can leave you unable to see the wood for the trees.

Of course, there are ways to rein grep in. You can tell it to ignore patterns, files, and directories so that grep completes its searches faster, and you’re not swamped with meaningless false positives.

Excluding Patterns

To search with grep you can pipe input to it from some other process such as cat, or you can provide a filename as the last command line parameter.

We’re using a short file that contains the text of the poem Jabberwocky, by Lewis Carroll. In these two examples, we’re searching for lines that match the search term “Jabberwock.”

cat jabberwocky.txt | grep "Jabberwock"
grep "Jabberwock" jabberwocky.txt

Two different ways to search through the same text file with grep

The lines that contain matches to the search clue are listed for us, with the matching element in each line highlighted in red. That’s straightforward searching. But what if we want to exclude lines that contain the word “Jabberwock” and print the rest?

We can accomplish that with the -v (invert match) option. This lists the lines that don’t match the search term.

grep -v "Jabberwock" jabberwocky.txt

Using the -v inverted search option with grep

The lines that don’t contain “Jabberwock” are listed to the terminal window.

All of the lines that don't contain the word jabberwock
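To try the inverted match without the poem file to hand, here is a minimal sketch. The scratch directory, the file name sample.txt, and its four lines are assumptions made for illustration, and the -c (count) option is an extra that the text above doesn't cover.

```shell
# Build a tiny sample file in a scratch directory (hypothetical content).
workdir=$(mktemp -d)
printf '%s\n' \
  'Beware the Jabberwock, my son!' \
  'The jaws that bite, the claws that catch!' \
  'Beware the Jubjub bird, and shun' \
  'The frumious Bandersnatch!' > "$workdir/sample.txt"

# -v prints only the lines that do NOT match the pattern.
kept=$(grep -v "Jabberwock" "$workdir/sample.txt")
printf '%s\n' "$kept"

# -c counts lines instead of printing them, so -vc counts the non-matching lines.
kept_count=$(grep -vc "Jabberwock" "$workdir/sample.txt")
echo "non-matching lines: $kept_count"

# Tidy up the scratch directory.
rm -r "$workdir"
```

Counting first can be a quick way to check how much an exclusion removes before you read through the output.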

We can exclude as many terms as we wish. Let’s filter out any lines that contain “Jabberwock” and any lines that contain “and.” To achieve this we’ll use the -e (expression) option. We need to use it for each search pattern we’re using.

grep -v -e "Jabberwock" -e "and" jabberwocky.txt

Using multiple search clauses with grep

There’s a corresponding drop in the number of lines in the output.

The lines from the text that do not match either search term

If we use the -E (extended regexes) option, we can combine the search patterns with “|”. In this context it doesn’t indicate a pipe; it’s the logical OR operator.

grep -Ev "Jabberwock|and" jabberwocky.txt

Using the logical OR operator with grep

We get exactly the same output as we did with the previous, longer-winded command.

The lines from the text that do not match either search term
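The two forms can be checked against each other. This is a small sketch using an assumed four-line sample file rather than the full poem. It also shows a side effect worth knowing about: “and” is matched as a substring, so a line containing “Bandersnatch” is excluded too.

```shell
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'Beware the Jabberwock, my son!' \
  'The jaws that bite, the claws that catch!' \
  'Beware the Jubjub bird, and shun' \
  'The frumious Bandersnatch!' > "$workdir/sample.txt"

# One -e per pattern: a line is dropped if it matches either pattern.
with_e=$(grep -v -e "Jabberwock" -e "and" "$workdir/sample.txt")

# The same exclusion written as a single extended regex with the OR operator.
with_E=$(grep -Ev "Jabberwock|and" "$workdir/sample.txt")

# Both commands leave only the "jaws" line. The "Bandersnatch" line is
# dropped as well, because "and" appears inside that word.
printf '%s\n' "$with_e"

if [ "$with_e" = "$with_E" ]; then
  echo "both forms give the same output"
fi

rm -r "$workdir"
```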

The format of the command is the same if you want to use a regex pattern instead of an explicit search clue. This command will exclude all lines that start with any letter in the set “ACHT.” The letters go inside square brackets so that grep treats them as a set of alternatives rather than as the literal string “ACHT.”

grep -Ev "^[ACHT]" jabberwocky.txt

Excluding lines that start with particular letters
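The difference the square brackets make is easy to see on a small file. The sketch below assumes a six-line sample file, one line per starting letter, and compares a bracketed set with the same letters written without brackets.

```shell
# Hypothetical sample file: each line starts with a different letter.
workdir=$(mktemp -d)
printf '%s\n' \
  'All mimsy were the borogoves,' \
  'Beware the Jubjub bird, and shun' \
  'Came whiffling through the tulgey wood,' \
  'He took his vorpal sword in hand;' \
  'One, two! One, two! And through and through' \
  'The frumious Bandersnatch!' > "$workdir/sample.txt"

# [ACHT] matches exactly one character from the set, and ^ anchors it to the
# start of the line, so -v keeps only the lines starting with B and O.
not_acht=$(grep -Ev "^[ACHT]" "$workdir/sample.txt")
printf '%s\n' "$not_acht"

# Without brackets, ^ACHT means the literal text "ACHT" at the start of a
# line. No line starts that way, so -v keeps all six lines.
literal_count=$(grep -Evc "^ACHT" "$workdir/sample.txt")
echo "lines kept by the unbracketed pattern: $literal_count"

rm -r "$workdir"
```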

To see lines that contain a pattern but which also don’t contain another pattern, we can pipe grep into grep. We’ll search for all lines that contain the word “Jabberwock” and then filter out any lines that also contain the word “slain.”

grep "Jabberwock" jabberwocky.txt | grep -v "slain"

Piping grep into grep to filter twice
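Here is the same two-stage filter as a self-contained sketch. The sample lines are assumptions chosen so that one line contains both words.

```shell
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'Beware the Jabberwock, my son!' \
  'The Jabberwock, with eyes of flame,' \
  'And hast thou slain the Jabberwock?' \
  'Come to my arms, my beamish boy!' > "$workdir/sample.txt"

# First grep keeps lines containing "Jabberwock"; the second removes any of
# those lines that also contain "slain".
result=$(grep "Jabberwock" "$workdir/sample.txt" | grep -v "slain")
printf '%s\n' "$result"

rm -r "$workdir"
```

The order of the two stages doesn't change the result here, but putting the more selective filter first means the second grep has less to read.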

Excluding Files

We can ask grep to look for a string or pattern in a collection of files. You could list each file on the command line, but with many files that approach doesn’t scale.

grep "vorpal" verse-1.txt verse-2.txt verse-3.txt verse-4.txt verse-5.txt verse-6.txt

Searching through a list of named files

Note that the name of the file containing the matching line is displayed at the start of each line of output.

To reduce typing we can use wildcards. But that can be counterintuitive. This appears to work.

grep "vorpal" *.txt

Using wildcards to search a collection of files

However, in this directory there are other TXT files, with nothing to do with the poem. If we search for the word “sword” with the same command structure, we get a lot of false positives.

grep "sword" *.txt

Searching for "sword" through a collection of TXT files

The results we want are masked by the deluge of false results from the other files that have the TXT extension.

A large results set of false positives

The word “vorpal” didn’t match anything in those other files, but “sword” is contained in the word “password,” so it was found many times in some pseudo-logfiles.

We need to exclude these files. To do that we’ll use the --exclude option. To exclude a single file called “vol-log-1.txt” we’d use this command:

grep --exclude=vol-log-1.txt "sword" *.txt

In this instance, we want to exclude multiple log files with names that start with “vol.” The syntax we need is:

grep --exclude=vol*.txt "sword" *.txt

Excluding files with wildcards
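The following sketch reproduces the situation with three small files whose names and contents are assumptions, and it assumes GNU grep. It quotes the --exclude glob so that the shell passes it to grep untouched. The unquoted form usually works too, because no file name starts with “--exclude=”, but some shells, zsh among them, report an error when a glob matches nothing.

```shell
# Hypothetical files: one verse and two pseudo-logfiles.
workdir=$(mktemp -d)
echo 'He took his vorpal sword in hand;' > "$workdir/verse-1.txt"
echo 'user dave: password accepted'      > "$workdir/vol-log-1.txt"
echo 'user mary: password rejected'      > "$workdir/vol-log-2.txt"

# Without an exclusion, "sword" also matches inside "password".
unfiltered=$(cd "$workdir" && grep "sword" *.txt)

# The quoted glob reaches grep as-is, and grep skips the matching files.
filtered=$(cd "$workdir" && grep --exclude='vol*.txt' "sword" *.txt)
printf '%s\n' "$filtered"

rm -r "$workdir"
```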

When we use the -R (dereference-recursive) option grep will search entire directory trees for us. By default, it will search through all files in those locations. There may well be multiple types of files we wish to exclude.

Beneath the current directory on this test machine, there are nested directories containing logfiles, CSV files, and MD files. These are all types of text files that we want to exclude. We could use an --exclude option for each file type, but we can achieve what we want more efficiently by grouping the file types.

This command excludes all files that have CSV or MD extensions, and all TXT files whose names start with either “vol” or “log.”

grep -R --exclude=*.{csv,md} --exclude={vol*,log*}.txt "sword" /home/dave/data/

Using multiple --exclude clauses and filename groupings
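The groupings in curly braces are expanded by the shell, not by grep, so grep actually receives four separate --exclude options. A quick way to preview that is to echo the command instead of running it. This sketch assumes bash is installed, since brace expansion is a bash feature that plain POSIX sh doesn't provide.

```shell
# Ask bash explicitly, because brace expansion is not part of POSIX sh.
# set -f turns off filename globbing so the * characters print literally;
# it does not affect brace expansion.
expanded=$(bash -c 'set -f; echo grep -R --exclude=*.{csv,md} --exclude={vol*,log*}.txt')
echo "$expanded"
```

Because each group becomes its own option, the braces must stay outside any quotes. Quoting the whole argument would stop bash from expanding them.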

Excluding Directories

If the files we want to ignore are contained in directories and there are no files in those directories that we do want to search, we can exclude those entire directories.

The concept is very similar to that of excluding files, except we use the --exclude-dir option and name the directories to ignore.

grep -R --exclude-dir=backup "vorpal" /home/dave/data

Excluding a directory from the search

We’ve excluded the “backup” directory, but we’re still searching through another directory called “backup2.”

It’ll come as no surprise that we can use the --exclude-dir option multiple times in a single command. Note that --exclude-dir takes a directory name or glob pattern, not a path. When GNU grep searches recursively, it compares the pattern with the base name of each subdirectory it meets, so a value containing a slash, or an absolute path from the root of the file system, won’t match any of them.

grep -R --exclude-dir=backup --exclude-dir=backup2 "vorpal" /home/dave/data

Excluding two directories from the search

We can use groupings too. We can achieve the same thing more succinctly with:

grep -R --exclude-dir={backup,backup2} "vorpal" /home/dave/data

Excluding directories with grouping
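To watch directory exclusion at work, this sketch builds a small tree that mirrors the example. The directory layout and file names are assumptions, and it assumes GNU grep. The -l option, which isn't covered above, lists the names of matching files instead of the matching lines, and sort keeps the order predictable.

```shell
# Hypothetical tree: data/ with two backup directories beneath it.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/backup" "$workdir/data/backup2"
echo 'He took his vorpal sword in hand;' > "$workdir/data/verse.txt"
echo 'He took his vorpal sword in hand;' > "$workdir/data/backup/old.txt"
echo 'He took his vorpal sword in hand;' > "$workdir/data/backup2/older.txt"

# Excluding "backup" leaves "backup2" in the search.
one_excluded=$(cd "$workdir" && grep -Rl --exclude-dir=backup "vorpal" data | sort)
printf '%s\n' "$one_excluded"

# Naming both directories leaves only the file at the top of the tree.
both_excluded=$(cd "$workdir" && grep -Rl --exclude-dir=backup --exclude-dir=backup2 "vorpal" data | sort)
printf '%s\n' "$both_excluded"

rm -r "$workdir"
```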

You can combine file and directory exclusions in the same command. If you want to exclude all files from a directory and exclude certain file types from the directories that are searched, use this syntax. Name the directory by its base name, because GNU grep matches --exclude-dir against the name of each subdirectory rather than its path:

grep -R --exclude=*.{csv,md} --exclude-dir=archive "frumious" /home/dave/data

Excluding file types and directories in the same command
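A sketch of a combined exclusion on an assumed tree follows. It assumes GNU grep, whose manual describes --exclude-dir as matching the base name of each subdirectory during a recursive search. The second command is included for comparison: on that basis, a value containing a slash should not skip the nested directory.

```shell
# Hypothetical tree: a verse, a Markdown note, and a nested archive directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/backup/archive"
echo 'The frumious Bandersnatch!' > "$workdir/data/verse.txt"
echo 'The frumious Bandersnatch!' > "$workdir/data/notes.md"
echo 'The frumious Bandersnatch!' > "$workdir/data/backup/archive/old.txt"

# Skip Markdown files everywhere, and skip any directory named "archive".
by_name=$(cd "$workdir" && grep -Rl --exclude='*.md' --exclude-dir=archive "frumious" data | sort)
printf '%s\n' "$by_name"

# For comparison: the same search with a path-style value. Compare its output
# with the one above to see whether your grep treats it as a base name.
by_path=$(cd "$workdir" && grep -Rl --exclude='*.md' --exclude-dir=backup/archive "frumious" data | sort)
printf '%s\n' "$by_path"

rm -r "$workdir"
```

One consequence of matching on base names is that --exclude-dir=archive skips every directory called “archive” in the tree, not only the one under “backup.”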

Sometimes It’s What You Leave Out

Sometimes with grep it can feel like you’re trying to find a needle in a haystack. It makes a big difference to remove the haystack.

RELATED: How to Use Regular Expressions (regexes) on Linux