How to use PDF Debugger to inspect a PDF file’s code





Apple ended direct PostScript file support in macOS Sonoma, but you can still peek inside PDF files to see what they contain by using PDF Debugger.

PDF, the web’s ubiquitous document format, grew out of PostScript, which was developed at Adobe Systems in the early 1980s. Around that time, laser printers had just come of age with Apple’s LaserWriter printer and the Macintosh Plus, which together served as one of the world’s first commercial desktop publishing systems.

PostScript – PDF’s origins

PostScript is a language that describes how a page is to be laid out on screen or on paper. Although PostScript was initially used in the ROMs of laser printers, it was later used in the computers made by Steve Jobs’ second company, NeXT Inc.

NeXT’s operating system NeXTStep (later called OpenStep) overcame early onscreen limitations by using Display PostScript for the onscreen display of text, shapes, and images.

An early, third-generation Apple LaserWriter, which included PostScript in its ROMs.

While Adobe’s original PDF file standard wasn’t technically pure PostScript, it was derived from it. In version 1.3, published in 2000, Adobe added support for the PostScript Language Level 3 imaging model.

It also supported the original and now defunct Adobe font standard, Type 1, which we’ll cover in a future article.

PDF and .ps files

Adobe introduced the Portable Document Format, or PDF, in 1993, and it has since become a document and web standard. PDF was originally a proprietary Adobe format but was standardized as ISO 32000 in 2008.

The standard was again revised in 2020.

PDF nearly didn’t see the light of day as much of Adobe’s management at the time didn’t see any demand for it, and PostScript was still the dominant page description language in the graphic design, desktop publishing, and printing world.

PDF can also embed forms, digital signatures, 3D objects, video, and a host of other content. PDF files can be encrypted and protected with passwords. Separately, Adobe recently announced it has ended support for the original PostScript font format, Type 1 fonts.

When you open a PDF file on a modern computer, the application uses the operating system or library code to read the PDF file’s instructions. It converts the commands into native drawing routines for display on the OS.
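To make that conversion step concrete, here is a minimal Python sketch of how a reader might turn a handful of PDF path operators into drawing calls. The operators shown (m, l, re, S, f) are real PDF content stream operators, but the call names such as move_to and the interpret function are hypothetical, and a real renderer handles far more (text, colors, transforms, nested arrays and strings).

```python
# Sketch only: map a few PDF path operators to hypothetical drawing calls.
# In a content stream, operands come first and the operator follows them,
# so operands are collected on a stack until an operator name appears.

# operator -> (hypothetical drawing call name, number of operands)
OPERATORS = {
    "m": ("move_to", 2),     # begin a new subpath at x y
    "l": ("line_to", 2),     # straight line to x y
    "re": ("rectangle", 4),  # rectangle x y width height
    "S": ("stroke", 0),      # stroke the current path
    "f": ("fill", 0),        # fill the current path
}


def interpret(content: str) -> list:
    """Return a list of (call_name, *operands) tuples for simple content."""
    calls = []
    stack = []
    for token in content.split():
        if token in OPERATORS:
            name, argc = OPERATORS[token]
            if len(stack) < argc:
                raise ValueError(f"operator {token!r} needs {argc} operands")
            split_at = len(stack) - argc
            args = stack[split_at:]
            del stack[split_at:]
            calls.append((name, *args))
        else:
            stack.append(float(token))  # this sketch only accepts numbers
    return calls


for call in interpret("10 10 m 100 10 l S 20 20 50 30 re f"):
    print(call)
```

In a real viewer each tuple would become a call into the platform's drawing layer rather than a printed record.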

For macOS and iOS, this is the Quartz framework, which contains an API for handling PDF files, and the Core Graphics framework, which provides graphics drawing contexts for PDF display. Apple split the original Quartz framework functionality so that Core Graphics handles most drawing primitives and contexts, while Quartz takes care of images, PDF operations, and Quick Look preview functionality.

Preview, printing, and viewing

Apple’s Preview app and printing system used to be able to open, display, and print PostScript directly in the form of .ps files, which contain PostScript code, but Apple ended this support in macOS 14 Sonoma. Preview has supported PDF files for decades.

You can still view the raw PostScript contents of .ps files on the Mac simply by dropping them on the TextEdit app. They will open as text files and you can read the PostScript directly.

PDF and PostScript files in Finder.


Although most modern laser printers no longer contain PostScript interpreters in their ROMs, some consumer-level laser printers contain PostScript emulators, such as BR-Script from Brother, which can receive, decode, and print .ps files using the printer’s native rendering.

You can look inside a PDF file to see its raw content using a Mac hex editor utility such as Hex Fiend or HexEdit. Hex editors are designed to show code and binary file contents but can be used to view any kind of file content if you know the file format.
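If you don't have a hex editor handy, a few lines of Python give a rough idea of what one shows. This is only a sketch of the classic offset / hex / text layout, not a reproduction of any particular app; the hex_dump function name and the sample bytes are made up for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Illustrative sample: a PDF header line followed by a binary comment line.
sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"
print(hex_dump(sample))
```

To try it on a real file, read the first bytes in binary mode, for example with open(path, "rb").read(64), where path is whatever PDF you want to inspect.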

Viewing PDF contents in Hex Fiend.


But for many files, including PDF files, the raw data can be encoded or stored in such a way that it’s not human-readable. For this reason, to understand what you’re looking at in hex editors you’ll need to be familiar with the internals of the file format.
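One common reason PDF data looks like noise is compression: streams are often stored with the FlateDecode filter, which is zlib/deflate. The Python sketch below builds a tiny, made-up stream object in memory and inflates it again. The inflate_streams helper is hypothetical, relies on a naive pattern match, and ignores other filters, encryption, and predictors.

```python
import re
import zlib

# Naive pattern: the bytes between "stream" and "endstream".
# The lookbehind keeps "endstream" from being read as a stream start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the decompressed body of every stream zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes after the data.
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-compressed, or uses a filter this sketch skips
    return results


# Made-up example object: page content that draws the text "Hello, PDF".
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET"
compressed = zlib.compress(content)
sample = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

print(compressed)               # what a hex editor would show: unreadable bytes
print(inflate_streams(sample))  # the readable drawing instructions
```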

In the case of PDF files, they usually begin with the marker “%PDF-” followed by a version number, and end with “%%EOF”.
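A quick sanity check along these lines can be sketched in Python. Note that the end marker is written with two percent signs in the file. The looks_like_pdf function is a hypothetical helper, and the 1024-byte search window for the end marker is an assumption made for this sketch; passing the check does not prove a file is valid.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Rough check: PDF header at the start, %%EOF marker near the end."""
    has_header = data.startswith(b"%PDF-")
    # The end marker may be followed by a newline or other trailing bytes,
    # so search the tail of the file rather than requiring an exact ending.
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof


print(looks_like_pdf(b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF\n"))
print(looks_like_pdf(b"%!PS-Adobe-3.0\n/Helvetica findfont\n"))
```

For a file on disk, read it in binary mode first, for example open(path, "rb").read(), where path is the file you want to test.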

Using PDF Debugger

Most PDF files consist of a hierarchical tree-like structure. Some nodes contain child nodes that further describe their parent, while other nodes (leaf nodes) only contain information about the file, such as the number of pages, type, length, and creator details.
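The tree idea can be illustrated with a short Python sketch that scans for numbered objects, notes each one's /Type, and follows "N 0 R" references from parent to child. This is not how PDF Debugger works internally (the article doesn't say). The object_graph and tree_lines helpers and the sample file are invented for illustration, and the approach only handles simple, uncompressed objects.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")


def object_graph(pdf_bytes: bytes) -> dict:
    """Map object number -> (type name or None, referenced object numbers)."""
    graph = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        number = int(match.group(1))
        # Ignore stream data so binary bytes are not mistaken for references.
        body = match.group(3).split(b"stream", 1)[0]
        found = TYPE_RE.search(body)
        type_name = found.group(1).decode("ascii") if found else None
        refs = [int(ref.group(1)) for ref in REF_RE.finditer(body)]
        graph[number] = (type_name, refs)
    return graph


def tree_lines(graph: dict, root: int, depth: int = 0, seen=None) -> list:
    """Indented listing starting at root; 'seen' stops /Parent loops."""
    seen = set() if seen is None else seen
    if root in seen or root not in graph:
        return []
    seen.add(root)
    type_name, refs = graph[root]
    lines = ["  " * depth + f"obj {root}: {type_name or 'untyped'}"]
    for child in refs:
        lines.extend(tree_lines(graph, child, depth + 1, seen))
    return lines


# Made-up, heavily simplified file body: a catalog, a page tree, one page.
SAMPLE = b"""1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >> endobj
3 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj
"""

# The root is passed by hand here; a real reader finds it via the trailer.
print("\n".join(tree_lines(object_graph(SAMPLE), root=1)))
```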

PDF files can become corrupted and contain invalid tree data, which in most cases renders them unreadable. If you think you have a corrupted PDF file or just want to see a PDF’s tree info, now there’s an easy way.

PDF Debugger, a simple web tool by Ukrainian developer Yevhenii Hyzyla, lets you do just that. It’s easy to use: just drag and drop any PDF file from the Mac’s Finder onto the drop pane on the page and it will read and display the file’s PDF tree info.

Drop a PDF onto the PDF Debugger page to display its tree info.


Although PDF Debugger doesn’t display a PDF’s entire contents, you can still use a hex editor or other raw data reader app to do that.

Other utilities can convert PDFs into .ps files so you can read the PostScript directly.

PDF Debugger is a quick and easy way to verify the basic structure of PDF files.

Hyzyla also has a wrapper library for the Node.js JavaScript runtime on his GitHub page, which uses Google’s high-performance PDFium library compiled to WebAssembly.


