Apple

Linux Built-In Tools Are So Powerful, You Build a Database With Them. Here’s How

November 3, 2024

Table of Contents

Quick Links

What Kind of Database Can I Build in Linux?

How to Create and Use a Database With Linux Tools

A database is a core part of many apps, from full-blown enterprise websites to simple tools like shopping lists and finance trackers. Relational databases driven by SQL are popular, but Linux offers a simpler, more transparent alternative.

What Kind of Database Can I Build in Linux?

In Linux, the text file is king. With many robust tools and an ecosystem that encourages chaining them together, you can achieve a lot with plain text files.

This includes building a database, a structured set of data that you can use for many purposes. Using text files and command line tools, you can create simple prototypes, inspect your data quickly, and version-control your data as easily as any of your code.

There are many useful Linux commands and many of them work with text. Many of these commands are filters that receive data via standard input, perform some actions, and produce standard output.

grep searches through input and selects lines that match one or more patterns.
cut extracts selected parts of each line and writes them to standard output.
awk is a more powerful pattern scanning and processing language.
sort does exactly what you’d expect, but it can sort specific columns and handles numeric/alphabetic sort correctly.
head and tail let you extract a slice of rows from the output.
join supports related data in multiple files.

In this example, you’ll build a simple database for a todo app. You can build all the basic functionality using standard Linux tools. Eventually, you can extend the app by using a scripting language or migrating to a relational database.

Create Tables as Flat Files

One of the simplest structured text formats is DSV, or delimiter-separated values. This is a general case of the CSV—comma-separated values—format. On Linux, structured text files often use the space or colon (:) character to separate fields. The /etc/passwd file is a classic example:

Example content from the Linux /etc/passwd file shows one record per line with each field separated by a colon character.

You can store many kinds of data, such as a todo list, with this kind of format:

Buy milk:2024-10-21:2:open
Call bank:2024-10-20:1:closed

You can use any text editor to update your database, one more advantage of the plain text format. You can even add items directly from the command line by redirecting the output from echo to a file:

echo "Take out the trash:$(date -I):3:open" > tasks

This is the equivalent of the SQL:

INSERT INTO tasks VALUES('Take out the trash', CURDATE(), '3', 'open')

Note that this command uses a subcommand to get the current date. It’s a bit awkward if you’re typing it out, but in a script it can be much more convenient.

Fetch an Entire Table

Selecting data is probably the most common database task. The basic case is to select everything from a table, i.e.

SELECT * FROM tasks

Which fetches all the columns from every row of a database. With a file-based database, the equivalent is trivial:

cat tasks

The contents of a DSV file storing tasks from a todo app.

Select Columns With cut

Getting a bit more sophisticated, you can narrow down your selection to specific columns. In SQL, this goes like:

SELECT task FROM tasks

Using the cut tool, you can achieve just about the same functionality:

cut -d':' -f1 tasks

The d option specifies a delimiter, the character that separates the fields on each line of your file. The f option lets you choose particular fields. This will give you a simple list of all the tasks in your database:

A Linux cut command shows the first field from a data file.

Select Rows With grep or awk

Rather than fetching every row from your database, you’ll usually want to restrict the results. Filtering based on field values is the most obvious requirement, as in:

SELECT * FROM tasks WHERE status=open

In this case, grep is the perfect substitute. Use it to match rows by a regex pattern and you can find, for example, every task with an “open” status:

grep 'open$' tasks

A Linux grep command shows two records from a data file.

This particular case takes advantage of the fact that status is the final field on each line; $ matches the end of a string. For fields in the middle of a line, you might need to use a more complicated regex. For example, getting every row with a priority of 2:

grep ':2:[^:]*$' tasks

But grep can only match against text patterns, it can’t handle more complicated expressions like this:

SELECT status, task FROM tasks WHERE date2024-10-21

This SQL uses a logical comparison to get tasks before a certain date. You could try to construct a more complicated regex, but this starts to get beyond grep’s capabilities.

What you need is a more powerful tool, like awk:

awk -F':' '$2 tasks

Output from a Linux awk command shows records from a data file before a certain date.

Awk can do the job of both grep and cut. In this example, the part that reads

$2"2024-10-21"

Is a precondition which means only rows with an earlier date will match. The command then prints the first two columns of each row.

Paginate Results With tail and head

SQL’s LIMIT clause lets you fetch a specific number of results. To fetch the first two rows:

head -2 tasks

You can use tail to get the last n rows. Together with head, this lets you approximate the LIMIT clause to include offsets. For example, to get rows 2-3:

head -3 tasks | tail -2

The linux head command piped to the tail command to extract a slice of rows from a file.

Sort Rows With sort

The “ORDER BY” clause is an important part of many SQL statements. Fortunately, Linux has a great equivalent command: sort. Like cut and awk, you can specify a delimiter and field by number, although the flags use different letters. This time, t specifies the delimiter and k the field number:

sort -t':' -k2 tasks

This will show all records sorted by date:

The output from a Linux sort command showing a data file sorted by date.

Join Tables With join

Relational databases are all about the relationships between different tables, where a field in one references a field in another. You may not be aware of it, but Linux has an equivalent to SQL’s JOIN clause called, unsurprisingly, join.

Let’s extend the todo data to cater for several people’s tasks. Start by adding a new name column to the original tasks file so that it looks like this:

The contents of a text file storing data about tasks.

And create a file, people, to store details about each person we’ll store tasks for:

The contents of a text file storing data about users.

You can now use the join command with a delimiter specified via the t option:

join -t':' -1 5 -2 1 tasks people

The -1 and -2 options specify the number of the field to join on from each file. Here, they are the 5th and 1st field respectively. join will use the first field by default, so you can simplify this to:

join -t':' -1 5 tasks people

And it will produce:

The linux join command showing data from two files.

To clean the output up a bit, you can pipe your joined tables to cut and omit the name field:

join -t':' -1 5 tasks people | cut -d':' -f2-

A linux join command piped to the cut command to clean up the output.

And you can combine the two names into one using awk:

join -t':' -1 5 tasks people | awk -F':' '{print $2":"$3":"$4":"$5":"$6" "$7}'

A linux join command piped to the awk command to clean up the output.

Putting Everything Together

To wrap up, consider a much more complicated SQL expression. Here’s one that joins both tables to get names, selects specific columns, and fetches rows with a certain priority. It sorts by date and finishes by fetching just the first matching row:

SELECT task,date,priority,status,first_name,last_name
FROM tasks t
LEFT JOIN people p ON t.name=p.name
WHERE priority=2
ORDER BY date
LIMIT 1

The equivalent command pipeline is perhaps trickier to understand, but it’s no more complicated once you’re familiar with these core tools:

join -t':' -1 5 -2 1 tasks people \
  | awk -F':' '{print $2":"$3":"$4":"$5":"$6" "$7}' \
  | grep ':2:' \
  | sort -t ':' -k2 \
  | head -1

A pipeline of linux commands emulating a complicated SQL database query.

Source link