You’re Risking Data Loss By Using This Linux Wildcard Wrong


Linux wildcards let you type a single command that acts on whole groups of files at the same time. That’s a great time saver, unless things go wrong. And they can. Destructively.




What Wildcards Are For

The well-known wildcards are the question mark, ?, and the asterisk, *. These can be used to create filename patterns. The question mark represents any single character, and the asterisk represents any sequence of characters, including zero characters.

Knowing this, we can construct patterns that match multiple filenames. Instead of typing all the filenames on the command line, we type the pattern instead. All files that match the pattern are acted on by the command.
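To see the difference between the two wildcards for yourself, here is a minimal sketch that works in a throwaway directory. The filenames are made up for illustration, and echo is used only because it prints whatever the shell hands it.

```shell
# Build a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch note1.txt note2.txt note10.txt notes.md

# ? matches exactly one character: note1.txt and note2.txt, but not note10.txt.
single=$(echo note?.txt)

# * matches any run of characters, including none: all three .txt files.
any=$(echo note*.txt)

echo "note?.txt -> $single"
echo "note*.txt -> $any"
```

The file notes.md is never listed, because neither pattern allows anything other than “.txt” at the end of the name.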

If we have a collection of files in a directory like this:

A directory listing of a collection of various file types, in a terminal window.

We can select groups of files that match the patterns we provide.


ls taf_*  
Using ls in a terminal window, to select files that start with taf_.

That gives us all files with “taf_” at the start of their names.

ls *.sh
ls s*.sh
Using ls in a terminal window to select different groups of files, by using wildcards.

The first command lists all the shell script files in the directory. The second command lists only files that start with “s” that are also shell script files.

That all seems simple enough, and with ls, it is. But other commands can make use of this type of pattern matching. Problems arise when the shell tries to help by pattern matching before the command gets a chance.


Using the Asterisk With the find Command

The action of expanding a pattern into a list of matching files is called globbing.

It started out as a standalone program in the earliest versions of Unix, up to Version 6, then became a library that could be linked into other programs, and nowadays the shell does the job itself. The expansion of the pattern is performed by the shell before the command runs, and the results of the expansion are passed to the command as command line parameters.
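A quick way to convince yourself that the shell does the expanding is to count what a command actually receives. This is only a sketch: count_args is a hypothetical helper function, and the three C files are invented for the test.

```shell
# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.c b.c c.c

# The shell replaces *.c with three filenames before count_args runs.
unquoted=$(count_args *.c)

# Quoted, the pattern arrives as a single literal argument.
quoted=$(count_args '*.c')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

The function never sees the asterisk in the unquoted case. By the time it runs, the pattern has already been swapped for the matching filenames.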

We’ll look at two examples using the find command. One does what you might expect, but the second one may well surprise you.

For this example, we’re going to use a directory with a single file in it, called readme.txt. There are two directories, src and inc. They contain a mix of C, H, MD and TMP files.

ls -R 
A recursive ls directory listing showing subdirectories and files, in a terminal window.


We can use find to recursively find files (-type f) with names that match our pattern (-name *.c), giving us a list of the C files.

find . -type f -name *.c 
Using find in a terminal window to recursively find files with a C extension.

We can add the -not option to invert the search, showing us everything apart from the C files.

find . -type f -not -name *.c 
Using find in a terminal window to recursively find files without a C extension.

Having reviewed this list, we choose to delete everything apart from the C files. We can do this by adding the -delete option.


find . -type f -not -name *.c -delete
find .
Using find in a terminal window to recursively delete files without a C extension.

The second find command recursively lists everything in and below the current directory. All that remains are our C files.

That worked the way most of us would have expected. Now we’ll do the exact same thing, but this time the file in the current directory isn’t a text file, it’s a C file.

ls -R 
Using ls in a terminal window to recursively list files. There is a file called main.c in the current directory.

We’ll use the same find command and options to delete everything but the C files.


find . -type f -not -name *.c -delete
find .
Attempting to use find in a terminal window to recursively delete files without a C extension. There is a file called main.c in the current directory.

That’s not what we wanted at all. The command has blithely deleted every single file in the directory tree, apart from the one C file in the current directory.

We’ll reset the files once more, and issue the command in the way we’re supposed to use it.

All the files are in place, and we have a C file in the current directory, just as we did before.

ls -R 
Using ls recursively in a terminal window to show the files and subdirectories. There is a file called main.c in the current directory.

This time, we’ll wrap the wildcard pattern in single quotes.


find . -type f -not -name '*.c' -delete
find .
Using find with the wildcard enclosed in single quote marks, to recursively delete all files without a C extension.

That is what we wanted. Everything’s gone apart from our C files.

OK, So What Went Wrong?

The single quotes stop the shell from expanding the filename pattern. It’s passed to the command or program as is, for the command to act upon.

In the example that worked, we had a readme.txt file in the current directory. The shell couldn’t find a match to *.c, so it passed *.c to find to act upon.

In the example that deleted everything except main.c, we had that file in the current directory. The shell matched the pattern to main.c, and passed the name of the file to the find command in place of the pattern. So find’s instructions were to delete everything that wasn’t called main.c.
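One harmless way to preview what the shell will do is to put echo in front of the command, so the expanded command line is printed instead of run. The sketch below assumes a scratch directory with made-up files (app.c, app.h) rather than the exact tree used in the screenshots.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir src inc
touch src/app.c inc/app.h

# No .c file in the current directory: the pattern has nothing to match,
# so the shell leaves it alone and find would receive *.c.
no_match=$(echo find . -type f -not -name *.c -delete)

# Add main.c and the same command line expands differently.
touch main.c
with_match=$(echo find . -type f -not -name *.c -delete)

echo "$no_match"
echo "$with_match"
```

The second line printed shows -name main.c where the pattern used to be, which is exactly the instruction that wiped out the other files.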


We can illustrate this with a small C program that does no more than display its command line parameters in the terminal window.

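#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) 
{
    int i;

    /* argc counts argv[0], the program name, so subtract one */
    printf("You supplied %d arguments.\n", argc-1);

    /* print each argument the shell passed in, skipping argv[0] */
    for (i=1; i<argc; i++)
        printf("%-2d) \"%s\"\n", i, argv[i]);

    exit (0);
}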

I saved this as a file called glob.c, and compiled it with:

gcc -o glob glob.c 

The variable argc holds the number of arguments passed to the program, counting the program’s own name, which is why we print argc-1. A for loop runs through the list of arguments and prints each one to the terminal window.

The for loop starts at argument one, not zero. There is an argument zero. It always holds the name of the binary itself. To avoid muddying the water, I’ve avoided printing it. The only arguments that get printed are ones we provide on the command line.

./glob one two 3 ant beetle cockroach
Using the example program to count and list command line arguments, in a terminal window.

Let’s try that with *.c as the command line parameter.


ls *.c
./glob *.c
Using the example program with a command parameter of *.c, without a C file in the current directory.

Without any C files in the current directory, the shell passes *.c to our program untouched, just as it passed *.c to find, which then acted upon the wildcard pattern itself. But, when we have a C file in the current directory, the shell passes the name of the matching C file to the program.

ls *.c
./glob *.c
Using the example program with a command parameter of *.c, with a C file in the current directory.

Our program receives the name of the C file as its parameter, and the same is true for the find command. So actually, find was doing what it was told to do: delete all files except for the main.c file.


This time, we’ll wrap the wildcard pattern in single quotes.

ls *.c 

./glob '*.c'

Using the example program to count and display command parameters, when *.c is enclosed in single quotes.

The shell ignores the chance to apply its globbing to the wildcard pattern, and passes it straight to the command for further processing.
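Single quotes are not the only way to protect a pattern. Double quotes and a backslash before the asterisk also stop the shell from globbing it. The sketch below compares all four spellings. The helper first_arg is hypothetical, and it simply echoes back the first argument it receives.

```shell
# first_arg is a hypothetical helper that echoes back its first argument unchanged.
first_arg() { printf '%s\n' "$1"; }

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch main.c

bare=$(first_arg *.c)        # expanded by the shell
single=$(first_arg '*.c')    # single quotes
double=$(first_arg "*.c")    # double quotes
escaped=$(first_arg \*.c)    # backslash

printf 'bare=%s single=%s double=%s escaped=%s\n' "$bare" "$single" "$double" "$escaped"
```

Only the bare pattern turns into main.c. Keep in mind that double quotes still let the shell expand variables, so single quotes are the safer habit for a fixed pattern.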

A Simple Fix, You Can Quote Me

As a general rule, quote wildcard patterns that you’re passing to commands like find. That’s all it takes to prevent this type of potentially disastrous mishap.
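If you want to try the quoted command without risking real files, a scratch directory works well. This is a sketch under assumptions: the directory layout and filenames are invented, and the -not and -delete options are taken to be available, as they are in the find used throughout this article.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir src inc
touch main.c readme.md src/app.c src/app.tmp inc/app.h

# Step 1: dry run. -print only lists what the quoted pattern selects.
to_delete=$(find . -type f -not -name '*.c' -print | wc -l)

# Step 2: same expression, with -delete in place of -print.
find . -type f -not -name '*.c' -delete

remaining=$(find . -type f | wc -l)
echo "deleted $to_delete file(s); $remaining C file(s) remain"
```

Even with main.c sitting in the top-level directory, the quoted pattern reaches find intact, so both C files survive and only the other three files are removed.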


