Before starting, let us review the basics of PDF files so that we have a better understanding of what follows. Portable Document Format (PDF) is a file format used to represent documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it. Most of the time, we read PDF files in Adobe Reader. Anyone may create applications that read and write PDF files without paying royalties to Adobe Systems; Adobe holds patents on PDF but licenses them royalty-free for developing software that complies with its PDF specification. PDF combines three technologies:
- A subset of the PostScript page description programming language, for generating the layout and graphics.
- A font-embedding/replacement system to allow fonts to travel with the documents.
- A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
Note: PostScript is a page description language that runs in an interpreter to generate an image, a process that requires many resources. Besides graphics, it supports standard programming-language features such as if and loop commands. PDF is largely based on PostScript but simplified to remove flow-control features like these, while graphics commands such as lineto remain.
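Because PDF drops flow control, any repetition has to happen in the program that generates the file. The following minimal sketch (plain Java, not PDFOne; the class and method names are our own) draws a zigzag line: the loop runs in Java, and what it emits is a flat sequence of PDF path operators, `m` (moveto), `l` (lineto) and `S` (stroke), with one `l` per vertex.

```java
import java.util.Locale;

public class UnrolledPath {
    // Builds a PDF content-stream fragment for a zigzag polyline.
    // PostScript could express the repetition with a loop; PDF cannot,
    // so the loop runs here and emits one lineto ("l") per vertex.
    static String zigzag(double x0, double y0, double step, double height, int segments) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT, "%.1f %.1f m\n", x0, y0)); // moveto start point
        for (int i = 1; i <= segments; i++) {
            double x = x0 + i * step;
            double y = (i % 2 == 1) ? y0 + height : y0; // alternate up and down
            sb.append(String.format(Locale.ROOT, "%.1f %.1f l\n", x, y)); // lineto
        }
        sb.append("S\n"); // stroke the path
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(zigzag(100, 100, 20, 30, 4));
    }
}
```

For four segments this prints one moveto line, four lineto lines, and a final `S`; a longer zigzag simply produces more `l` lines, since the PDF itself never contains the loop.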
Now let us look at the elements we always come across when reading PDF files. A PDF file consists primarily of objects, of which there are eight types: Boolean values (true or false), numbers, strings, names, arrays (ordered collections of objects), dictionaries (collections of objects indexed by names), streams (which usually contain large amounts of data), and the null object. There are two types of layout in PDF files.
They are as follows:
- Non-linear layouts, also known as non-optimized layouts. Non-linear PDF files can be smaller than their linear counterparts, but they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the file.
- Linear layouts, also known as optimized layouts. Linear PDF files are constructed so that a Web browser plugin can display them without waiting for the entire file to download, because their data is written in a linear (page-order) fashion.
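A linearized file announces itself near the very start: the PDF specification requires its linearization parameter dictionary, which contains a `/Linearized` key, to lie entirely within the first 1024 bytes of the file. The sketch below (plain Java; the class name is our own) uses that rule as a quick heuristic. It only looks for the key, so it does not validate the rest of the linearization data, and a file that was linearized and later updated incrementally may still be reported as linearized.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinearizationCheck {
    // Heuristic: a linearized PDF must place its linearization dictionary
    // (which holds the /Linearized key) within the first 1024 bytes.
    static boolean looksLinearized(byte[] head) {
        int n = Math.min(head.length, 1024);
        // ISO-8859-1 maps every byte to one char, so binary data is safe to scan.
        String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        return text.contains("/Linearized");
    }

    // Reads at most the first 1024 bytes of the file and applies the heuristic.
    static boolean looksLinearized(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLinearized(in.readNBytes(1024));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + looksLinearized(Path.of(arg)));
        }
    }
}
```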
Text in PDF is represented by text elements in page content streams. A text element specifies that characters should be drawn at certain positions; the characters are specified using the encoding of a selected font resource.
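To see where text elements live, the following sketch writes a one-page PDF by hand in plain Java, without PDFOne. It is a minimal illustration under stated assumptions, not a production writer: it uses the standard Helvetica font (one of the base fonts viewers are expected to provide, so nothing is embedded), a US Letter page, an uncompressed content stream, and object numbers we chose ourselves. The page's content stream holds a single text element, `BT /F1 24 Tf 72 720 Td (...) Tj ET`, which selects font resource F1 at 24 points, moves to (72, 720), and shows the string. Along the way the file uses several of the object types listed earlier (dictionaries, arrays, names, numbers, a string, and a stream) plus a cross-reference table recording each object's byte offset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MinimalPdf {
    // Escapes the characters that are special inside a PDF literal string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // Builds a one-page PDF whose content stream contains one text element.
    static byte[] build(String text) {
        // Text element: begin text, select F1 at 24 pt, move to (72, 720), show, end.
        String content = "BT /F1 24 Tf 72 720 Td (" + escape(text) + ") Tj ET";
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",                 // 1: document catalog
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",         // 2: page tree
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                + "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>", // 3: page
            "<< /Length " + content.length() + " >>\nstream\n"
                + content + "\nendstream",                        // 4: content stream
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>" // 5: font
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset where "N 0 obj" starts
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }
        // Cross-reference table: every entry is exactly 20 bytes long.
        int xrefStart = out.size();
        write(out, "xref\n0 " + (objects.length + 1) + "\n");
        write(out, "0000000000 65535 f \n");
        for (int off : offsets) {
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", off));
        }
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n");
        write(out, "startxref\n" + xrefStart + "\n%%EOF\n");
        return out.toByteArray();
    }

    // One byte per char, so the /Length value above matches the stream's byte count.
    static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) throws IOException {
        Files.write(Path.of("minimal.pdf"), build("Hello, World!"));
    }
}
```

Running `main` writes `minimal.pdf` (a file name chosen for illustration) to the working directory; a PDF viewer should show the greeting one inch from the top-left corner of the page.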
Now that we understand PDF files, let us learn to create, write, and read them using PDFOne. The PdfDocument class is the main class in PDFOne Java. It represents a PDF document and allows users to create, read, and enhance PDF documents. It provides numerous methods for rendering PDF elements such as text, images, shapes, forms, watermarks, and document annotations.
Let us first create a PDF document; this is the first step before writing any new PDF elements. To create a new document, the user should create a PdfDocument object using a PdfWriter object. When creating the PdfWriter object, the user can specify the stream or the file pathname to which the PDF document will be saved.
Listing 1: Shows the code for creating a new PDF document
```java
// Create a PdfWriter instance
PdfWriter w = PdfWriter.fileWriter("sample_doc1.pdf");
// Create a PdfDocument instance with the PdfWriter
PdfDocument d = new PdfDocument(w);
// Write some text on page 1
d.writeText("Hello, World!");
// Write document to file
d.write();
// Close all I/O streams associated with the PDF writer
w.dispose();
```
After the PDF document object is created, the writeText() method lets the user write text on a page, which is created by default in the document. When text is written this way, it is rendered at the top-left corner of the page. Each subsequent call to the method continues from where the previous writeText() call left off.
Figure 1: Shows the image in which text is rendered at default location using default font.
Now that we have created a PDF document, the next step is to read one. To read an existing PDF document, the user needs to create a PdfDocument object using a PdfReader object. When creating the PdfReader object, the user can specify the stream or the file pathname from which the PDF document will be read.
Listing 2: Shows the code to read the PDF document
```java
// Create a PdfReader instance
PdfReader r = PdfReader.fileReader("sample_doc1.pdf");
// Create a PdfDocument instance with the reader
PdfDocument d = new PdfDocument(r);
// Get page count and display on console
System.out.println(
    "Number of pages in sample_doc1.pdf is " + d.getPageCount());
// Close all I/O streams associated with the PDF reader
r.dispose();
```
Once the PdfDocument object is created, the user can call its methods to read from the loaded document. Next, we will learn how to write more content to an existing PDF document, which is how we make changes to it. In this case, the user only needs to specify an output file or stream when creating the PdfReader object. Any changes made to the loaded PDF document are saved to this output file or stream.
Note: PDFOne Java will not overwrite the input file or stream. This is the advantage of this technique: the original document stays intact.
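PDFOne takes care of this for us, but the same read-from-one-file, write-to-another pattern is easy to express in plain Java. The sketch below is a hypothetical helper (`SafeUpdate.update` is our own name, not a PDFOne API): it reads the input, applies a caller-supplied transformation, writes the result to the output, and refuses to run if the output path refers to the input file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

public class SafeUpdate {
    // Hypothetical helper: transforms the bytes of `input` and writes the
    // result to `output`, refusing to overwrite the input file itself.
    static void update(Path input, Path output, UnaryOperator<byte[]> transform) throws IOException {
        // isSameFile also catches different paths (e.g. relative vs absolute)
        // that point to the same file; it compares existing files, so it is
        // only called when the output already exists.
        if (Files.exists(output) && Files.isSameFile(input, output)) {
            throw new IllegalArgumentException("Output would overwrite input: " + input);
        }
        byte[] original = Files.readAllBytes(input);
        Files.write(output, transform.apply(original));
    }

    public static void main(String[] args) throws IOException {
        // Example: copy args[0] to args[1] unchanged.
        update(Path.of(args[0]), Path.of(args[1]), bytes -> bytes);
    }
}
```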
Listing 3: Shows the code for writing more or making changes in an existing PDF document
```java
// Create a PdfReader instance
PdfReader r = PdfReader.fileReader(
    "sample_doc1.pdf",  // read from (input file)
    "sample_doc2.pdf"); // write to (output file)
// Create a PdfDocument instance with the reader
PdfDocument d = new PdfDocument(r);
// Write text at position (100, 50) on page 1
d.writeText("Hello again, World!",
    100, // x-coordinate
    50); // y-coordinate
// Set output file to be displayed after it is written
d.setOpenAfterSave(true);
// Write to output file
d.write();
// Close all I/O streams associated with the PDF reader
r.dispose();
```
In the code above, a PDF document is loaded and another text string, "Hello again, World!", is written at a different location (specified by x-y coordinates) on the first page using an overloaded PdfDocument.writeText() method. The modified PDF document is then saved to the output file that was specified when creating the PdfReader object.
Figure 2: Shows the image of an example for using the code of Listing 3
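In PDF's own user space, one unit is 1/72 inch by default and the origin is the bottom-left corner of the page, with y increasing upward. Whether PDFOne's writeText() measures y from the top or the bottom is not stated here, so the sketch below simply assumes a top-left convention (consistent with text appearing at the top-left by default) and shows the conversion into PDF user space. The class and method names are our own.

```java
public class PageCoordinates {
    // PDF user space: 72 units per inch by default.
    static final double POINTS_PER_INCH = 72.0;

    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    // Converts a y-coordinate measured downward from the top edge (an assumed
    // top-left convention) into PDF user space, where y grows upward from the bottom.
    static double topLeftToPdfY(double yFromTop, double pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        double letterHeight = inchesToPoints(11); // US Letter: 8.5 x 11 in = 612 x 792 pt
        System.out.println(topLeftToPdfY(50, letterHeight)); // prints 742.0
    }
}
```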
We can also change fonts and colors. Text can be rendered in different fonts and colors using PdfFont objects, which can be created by specifying either the name of an installed font or the pathname of a font file.
Listing 4: Shows the code for making changes in terms of font and color to an existing PDF document
```java
import java.awt.Color; // for Color.RED and Color.ORANGE below

// Create a PdfReader instance
PdfReader r = PdfReader.fileReader(
    "sample_doc1.pdf",  // read from (input file)
    "sample_doc3.pdf"); // write to (output file)
// Create a PdfDocument instance with the reader
PdfDocument d = new PdfDocument(r);
// Create font objects
PdfFont fontArialItalic = PdfFont.create(
    "Arial", // name of installed font
    PdfFont.ITALIC | PdfFont.STROKE_AND_FILL,
    18,
    PdfEncodings.WINANSI);
PdfFont fontTahomaNormal = PdfFont.create(
    "tahoma.ttf", // pathname of a font file
    PdfFont.STROKE_AND_FILL,
    48,
    PdfEncodings.WINANSI);
// Write text on page 1 using the Arial font created above
d.writeText("Hello again, World!",
    fontArialItalic, // font
    100, 50);
// Set font properties
fontTahomaNormal.setStrokeWidth(2);
fontTahomaNormal.setStrokeColor(Color.RED);
fontTahomaNormal.setColor(Color.ORANGE);
// Write more text on page 1 using Tahoma
d.writeText("Hello again, World!",
    fontTahomaNormal, // font
    100, 100);
// Set output file to be displayed after it is written
d.setOpenAfterSave(true);
// Write to output file
d.write();
// Close all I/O streams associated with the PDF reader
r.dispose();
```
In the code above, text is rendered using specific fonts. This is made possible by another overloaded PdfDocument.writeText() method, which takes a PdfFont object along with the text and its coordinates.
Figure 3: Shows the image of an example for using the code of Listing 4
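At the PDF level, styling like that in Listing 4 ends up as operators in the page content stream. The sketch below (plain Java; the names are our own) emits them directly for the Tahoma example: `rg` and `RG` set the fill and stroke colors as RGB values between 0 and 1, `w` sets the line width used to stroke glyph outlines, and text rendering mode `2 Tr` fills and then strokes each glyph. We assume, without confirmation from PDFOne documentation, that STROKE_AND_FILL corresponds to this rendering mode, and that the font is registered in the page resources under the hypothetical name F2.

```java
import java.awt.Color;
import java.util.Locale;

public class StyledText {
    // Converts a 0-255 color component to a PDF number in the range 0..1.
    static String comp(int v) {
        return String.format(Locale.ROOT, "%.3f", v / 255.0);
    }

    static String rgb(Color c) {
        return comp(c.getRed()) + " " + comp(c.getGreen()) + " " + comp(c.getBlue());
    }

    // Escapes characters that are special inside a PDF literal string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // Emits content-stream operators that draw `text` filled with `fill`
    // and outlined with `stroke`, positioned at (x, y) in PDF user space.
    static String styledText(String fontResource, int size, double x, double y,
                             String text, Color fill, Color stroke, double lineWidth) {
        return String.join("\n",
            rgb(fill) + " rg",                                 // fill (nonstroking) color
            rgb(stroke) + " RG",                               // stroke color
            String.format(Locale.ROOT, "%.1f w", lineWidth),   // outline width
            "BT",                                              // begin text object
            "/" + fontResource + " " + size + " Tf",           // select font and size
            "2 Tr",                                            // rendering mode 2: fill, then stroke
            String.format(Locale.ROOT, "%.1f %.1f Td", x, y),  // text position
            "(" + escape(text) + ") Tj",                       // show the string
            "ET");                                             // end text object
    }

    public static void main(String[] args) {
        System.out.println(styledText("F2", 48, 100, 100, "Hello again, World!",
            Color.ORANGE, Color.RED, 2));
    }
}
```

Java's `Color.ORANGE` is (255, 200, 0), so the fill line comes out as `1.000 0.784 0.000 rg`.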
So, we have learnt the basics of PDF files, which give users a foundation for understanding the topic in detail. After that, we learnt how to create, read, write to, and update an existing PDF document with Java.