Docker images are created by building Dockerfiles. The build process executes the instructions in the Dockerfile to create the filesystem layers that form the final image.
What if you already have an image? Can you retrieve the Dockerfile it was built from? In this article, we’ll look at two methods that can achieve this.
The Objective
When you’re building your own Docker images, you should store your Dockerfiles as version controlled files in your source repository. This practice ensures you can always retrieve the instructions used to assemble your images.
Sometimes you won’t have access to a Dockerfile though. Perhaps you’re using an image that’s in a public registry but has an inaccessible source repository. Or you could be working with image snapshots which don’t directly correspond to a versioned Dockerfile. In these cases, you need a technique that can create a Dockerfile from an image on your machine.
Docker has no built-in command for this, because a built image keeps no reference to the Dockerfile it was created from. However, you can reverse engineer the build process to produce a good approximation of an image’s Dockerfile on demand.
The Docker History Command
The docker history command reveals the layer history of an image. It shows the command used to build each successive filesystem layer, making it a good starting point when reproducing a Dockerfile.
Here’s a simple Dockerfile for a Node.js application:
FROM node:16
COPY app.js .
RUN ./app.js --init
CMD ["app.js"]
Build the image using docker build:
$ docker build -t node-app:latest .
Now inspect the image’s layer history with docker history:
$ docker history node-app:latest
IMAGE          CREATED          CREATED BY                                      SIZE   COMMENT
c06fc21a8eed   8 seconds ago    /bin/sh -c #(nop) CMD ["app.js"]                0B
74d58e07103b   8 seconds ago    /bin/sh -c ./app.js --init                      0B
22ea63ef9389   19 seconds ago   /bin/sh -c #(nop) COPY file:0c0828d0765af4dd... 50B
424bc28f998d   4 days ago       /bin/sh -c #(nop) CMD ["node"]                  0B
<missing>      4 days ago       /bin/sh -c #(nop) ENTRYPOINT ["docker-entry...  0B
...
The history includes the complete list of layers in the image, including those inherited from the node:16 base image. Layers are ordered so the most recent one is first. You can spot where the layers created by the sample Dockerfile begin based on the creation time. These show Docker’s internal representation of the COPY and CMD instructions used in the Dockerfile.
The docker history output is more useful when the table is limited to each layer’s command. You can also disable truncation to view the full command associated with each layer:
$ docker history node-app:latest --format "{{.CreatedBy}}" --no-trunc
/bin/sh -c #(nop) CMD ["app.js"]
/bin/sh -c ./app.js --init
/bin/sh -c #(nop) COPY file:0c0828d0765af4dd87b893f355e5dff77d6932d452f5681dfb98fd9cf05e8eb1 in .
/bin/sh -c #(nop) CMD ["node"]
/bin/sh -c #(nop) ENTRYPOINT ["docker-entrypoint.sh"]
...
From this list of commands, you can gain an overview of the steps taken to assemble the image. For simple images like this one, this can be sufficient information to accurately reproduce a Dockerfile.
Automating Layer Extraction with Whaler and Dfimage
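Copying commands out of docker history is a laborious process. You also need to strip out the /bin/sh -c at the start of each line. Entries marked #(nop) are instructions such as CMD and COPY that Docker recorded without running a shell command, while entries without the marker are RUN commands that were executed through the shell.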
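The manual clean-up described above can be sketched as a small shell filter. This is only an illustration and not part of Docker: the function name history_to_dockerfile is hypothetical, and the patterns assume the /bin/sh -c prefixes shown in the sample output. Images produced by other builders may record their entries differently.

```shell
# Hypothetical helper: read `docker history` CREATED BY lines (newest first)
# on stdin and print Dockerfile-style instructions (oldest first).
history_to_dockerfile() {
  # Reverse the line order so the oldest layer comes first.
  awk '{ lines[NR] = $0 } END { for (i = NR; i > 0; i--) print lines[i] }' |
    sed -E \
      -e 's|^/bin/sh -c #\(nop\) +||' \
      -e 's|^/bin/sh -c |RUN |'
}

# Possible usage (requires a local Docker daemon, so it is left as a comment):
#   docker history --no-trunc --format '{{.CreatedBy}}' node-app:latest | history_to_dockerfile
```

The first sed expression drops the prefix from entries marked #(nop), leaving the instruction itself. The second turns the remaining shell entries back into RUN instructions. The result still lacks a FROM line and real COPY sources, so treat it as a starting point.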
Fortunately, there are community tools that automate Dockerfile creation from an image’s layer history. For the purposes of this article, we’ll focus on Whaler, which is packaged as the alpine/dfimage (Dockerfile-from-Image) image, published under the alpine namespace on Docker Hub.
Running the dfimage image and supplying an image tag will output a Dockerfile that can be used to reproduce the referenced image. You must bind your host’s Docker socket into the dfimage container so it can access your image list and pull the tag if needed.
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock alpine/dfimage node-app:latest
Analyzing node-app:latest
Docker Version: 20.10.13
GraphDriver: overlay2
Environment Variables
|PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|NODE_VERSION=16.14.2
|YARN_VERSION=1.22.18
Image user
|User is root
Dockerfile:
...
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["node"]
COPY file:bcbc3d5784a8f1017653685866d30e230cae61d0da13dae32525b784383ac75f in .
    app.js
RUN ./app.js --init
CMD ["app.js"]
The created Dockerfile contains everything you need to go from scratch (an empty filesystem) to the final layer of the specified image. It includes all the layers that come from the base image. You can see these in the first ENTRYPOINT and CMD instructions in the sample output above (the other base image layers have been omitted for brevity’s sake).
With the exception of COPY, the instructions specific to our image match what was written in the original Dockerfile. You can now copy these instructions into a new Dockerfile, either using the whole dfimage output or by taking just the part that pertains to the final image. The latter option is only a possibility if you know the original base image’s identity so you can add a FROM instruction to the top of the file.
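As a rough sketch of the second option, the filter below keeps only the trailing lines of a reconstructed Dockerfile and puts a FROM line on top. The function name rebase_dockerfile is hypothetical. Both the base image and the number of lines to keep are assumptions you have to supply yourself, since nothing in the output marks where the base image ends.

```shell
# Hypothetical helper: read a reconstructed Dockerfile on stdin, keep its
# last N lines, and prepend a FROM line for a base image you already know.
rebase_dockerfile() {
  base_image=$1   # known base image, e.g. node:16
  keep=$2         # number of trailing lines that belong to your own image
  printf 'FROM %s\n' "$base_image"
  tail -n "$keep"
}

# Possible usage, assuming the reconstructed instructions were saved to a
# file named Dockerfile.full (a made-up name):
#   rebase_dockerfile node:16 3 < Dockerfile.full > Dockerfile
```

Counting lines by hand is crude but keeps the sketch short. Check the result before building, because COPY entries in the dfimage output can span more than one line.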
The Limitations
In many cases, dfimage will be able to assemble a usable Dockerfile. Nonetheless, it’s not perfect and an exact match is not guaranteed. The extent of the discrepancies compared to the image’s original Dockerfile will vary depending on the instructions that were used.
Not all instructions are captured in the layer history. Ones that aren’t recorded are lost and there’s no way to determine what they were. The best accuracy is obtained with command and metadata instructions like RUN, ENV, WORKDIR, ENTRYPOINT, and CMD. A RUN instruction is recorded even when its command makes no filesystem changes, as the 0B entry for ./app.js --init in the earlier docker history output shows, so an empty layer does not mean an instruction is missing.
COPY and ADD instructions present unique challenges. The history doesn’t contain the host file path that was copied into the container. You can see that a copy occurred, but in place of the source path the history shows a hash identifying the content that was copied from the build context.
As you do get the final destination, this can be enough to help you work out what’s been copied and why. You can then use this information to interpolate a new source path into the Dockerfile which you can use for future builds. In other cases, inspecting the file inside the image might help reveal the copy’s purpose so you can determine a meaningful filename for the host path.
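Once you have decided on a source path, putting it back can be sketched as a small filter. The function name restore_copy_source is hypothetical, the source path is your own guess, and the pattern assumes the "file:<hash> in <dest>" form seen in the sample output with a destination that contains no spaces.

```shell
# Hypothetical helper: read Dockerfile lines on stdin and replace the
# "file:<hash>" placeholder in COPY/ADD lines with a source path you chose.
restore_copy_source() {
  awk -v src="$1" '
    # Match lines such as: COPY file:<hash> in <dest>
    /^(COPY|ADD) file:[0-9a-f]+ in / {
      print $1, src, $4
      next
    }
    # Leave every other line untouched.
    { print }
  '
}

# Possible usage, assuming the reconstructed instructions were saved to a
# file named Dockerfile.raw (a made-up name):
#   restore_copy_source app.js < Dockerfile.raw > Dockerfile
```

Every matching line receives the same source path, so this only suits a Dockerfile with a single copied file. Edit each line by hand when several files were copied.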
Summary
Docker images don’t include a direct way to work backwards to the Dockerfile they were built from. It’s still possible to piece together the build process though. For simple images with few instructions, you can often work out the instructions manually by looking at the CREATED BY column in the docker history command’s output.
Larger images with more complex build processes are best analyzed by tools like dfimage. It does the hard work of parsing the verbose docker history output for you, producing a new Dockerfile that’s a best-effort match for the likely original.
Reverse engineering efforts aren’t perfect and some Dockerfile instructions are lost or mangled during the build process. Consequently, you shouldn’t assume Dockerfiles created in this way are an accurate representation of the original. You might have to make some manual adjustments to ADD and COPY instructions too, resurrecting host file paths that were converted to build context references.