Data Compression with Zip Files in Java

We all know how important ZIP files are for data compression and decompression. In this tutorial, we look at how to handle ZIP files in Java.

Many data sources contain redundant information, that is, data that adds nothing to what is already stored. As a result, large amounts of data are often generated and then transferred between server and client systems. The most obvious remedy is to add storage devices and expand the existing communication facilities. However, doing so increases the organization’s operating costs.

An alternative is to reduce the amount of data that has to be stored and transferred in the first place. This is where data compression and decompression are effective, as they are both efficient and convenient. In Java applications, we can implement them with the java.util.zip package.

Why can’t we use a file compression application to compress files?

It is well known that data can be compressed and decompressed with existing tools such as gzip, WinZip, or Java Archive (jar), but these are standalone applications. You can invoke them from a Java application, but doing so is neither straightforward nor efficient. This is even more of a problem if you wish to compress or decompress data on the fly.

We shall now discuss data compression, the java.util.zip package and then learn how to compress and decompress data.

Data compression – A basic overview

Usually, the simplest form of redundancy in a file is the repetition of characters. For example, take a look at the string given below:

AAAGGCCDDDDLLLLLRRRSSSSS

This string can be encoded more compactly by replacing each run of repeated characters with the number of repetitions followed by the character. After encoding, the string becomes:

3A2G2C4D5L3R5S

Here, the 3A represents three A’s, 4D represents four D’s and so on. This method of compressing a string is known as run-length encoding.
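The encoding described above can be sketched in a few lines of Java. This is only an illustration: the class name RunLengthDemo and its encode and decode methods are hypothetical, and the decoder assumes that the original text contains no digits.

```java
// Minimal run-length encoder/decoder sketch (names are illustrative only).
class RunLengthDemo {

    // Replaces each run of a repeated character with "<count><char>".
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            while (i + run < input.length() && input.charAt(i + run) == c) {
                run++;
            }
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    // Reverses encode(); assumes the original text contained no digits.
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        for (char c : encoded.toCharArray()) {
            if (Character.isDigit(c)) {
                count = count * 10 + (c - '0');
            } else {
                for (int k = 0; k < count; k++) {
                    out.append(c);
                }
                count = 0;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "AAAGGCCDDDDLLLLLRRRSSSSS";
        String packed = encode(text);
        System.out.println(packed);          // 3A2G2C4D5L3R5S
        System.out.println(decode(packed));  // AAAGGCCDDDDLLLLLRRRSSSSS
    }
}
```

Note that a string without repeated characters, such as ABCD, grows to 1A1B1C1D under this scheme, so run-length encoding only pays off when runs are common.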

As another example, consider a bitmapped image of a rectangle outline, in which every pixel is either 0 or 1.

Using run-length encoding, we can compress this representation of the rectangle into the following value,count pairs, with one line per row of pixels.

0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40

Here, the first line represents a row of 40 0s. Similarly, the fourth line consists of 10 0s, followed by a single 1, 18 0s, another 1 and then a further 10 0s.

Another approach is to store the image as a graphics metafile.

Rectangle 11, 3, 20, 5

This states that the starting coordinates of the rectangle are (11, 3). The length is 20 and the width is 5.
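To connect the two representations, the following sketch draws the rectangle from its metafile description and then run-length encodes each row. It assumes a grid of 40 columns and 8 rows (inferred from the encoded listing above) and 1-based coordinates. The class BitmapRunLength and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: rasterize "Rectangle x, y, length, width" and run-length encode the rows.
class BitmapRunLength {

    // Draws the outline of a rectangle on a blank grid of 0s.
    // x and y are 1-based, as in "Rectangle 11, 3, 20, 5".
    static int[][] drawRectangle(int cols, int rows, int x, int y, int length, int width) {
        int[][] grid = new int[rows][cols];
        int left = x - 1;
        int top = y - 1;
        int right = left + length - 1;
        int bottom = top + width - 1;
        for (int c = left; c <= right; c++) {
            grid[top][c] = 1;
            grid[bottom][c] = 1;
        }
        for (int r = top; r <= bottom; r++) {
            grid[r][left] = 1;
            grid[r][right] = 1;
        }
        return grid;
    }

    // Encodes one row as space-separated "value,count" pairs.
    static String encodeRow(int[] row) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < row.length) {
            int run = 1;
            while (i + run < row.length && row[i + run] == row[i]) {
                run++;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(row[i]).append(',').append(run);
            i += run;
        }
        return out.toString();
    }

    static List<String> encode(int[][] grid) {
        List<String> lines = new ArrayList<>();
        for (int[] row : grid) {
            lines.add(encodeRow(row));
        }
        return lines;
    }

    public static void main(String[] args) {
        int[][] grid = drawRectangle(40, 8, 11, 3, 20, 5);
        for (String line : encode(grid)) {
            System.out.println(line);
        }
    }
}
```

Under these assumptions the program prints the same eight lines as the run-length listing, while the metafile form needs only a single line to describe the image.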

An important point about run-length encoding is that it requires separate representations for the file and for its encoded version. This method is therefore not suitable for all files. Other techniques, such as Huffman coding (a form of variable-length encoding), may be used instead.

Figure 1. Block diagram for data compression

What are the benefits of data compression?

Data compression offers several benefits. The main advantage is that it reduces storage requirements. In addition, when compressed data is transferred over a communication medium, the effective rate of information transfer increases, because fewer bytes are needed to carry the same information.

Note also that data can be compressed both in software and by special-purpose hardware.

What is the java.util.zip package?

Java provides the java.util.zip package for ZIP-compatible data compression. It contains classes for creating, reading, and modifying files in the ZIP and GZIP formats. It also offers utility classes for computing checksums of arbitrary input streams, which can be used to validate input data.
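As a first taste of the package, the sketch below compresses and decompresses a byte array entirely in memory with GZIPOutputStream and GZIPInputStream. The class name GzipRoundTrip, its helper methods, and the sample text are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: GZIP round trip in memory, with no files involved.
class GzipRoundTrip {

    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw, 0, raw.length);
        } // closing the stream finishes the GZIP data
        return buffer.toByteArray();
    }

    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[2048];
            int n;
            while ((n = gzip.read(chunk, 0, chunk.length)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive sample text, so it should compress well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("AAAGGCCDDDDLLLLLRRRSSSSS");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);
        System.out.println("original: " + raw.length + " bytes, compressed: " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

Because the streams wrap plain byte arrays here, the same two helpers could equally wrap a socket or file stream, which is the basis of compressing data on the fly.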

The types described here are one interface, 14 classes, and 2 exception classes, outlined below. Later Java releases add a few more types to the package, such as CRC32C.

  • Checksum – This interface represents a data checksum. It is implemented by the Adler32 and CRC32 classes.
  • Adler32 – This class computes the Adler-32 checksum of a data stream.
  • CRC32 – This class computes the CRC-32 checksum of a data stream.
  • CheckedInputStream – This class is an input stream that maintains a checksum of the data being read.
  • CheckedOutputStream – This class is an output stream that maintains a checksum of the data being written.
  • Deflater – This class supports general-purpose compression using the ZLIB compression library.
  • DeflaterOutputStream – This class is an output stream filter that compresses data in the deflate compression format.
  • GZIPInputStream – This class is an input stream filter for reading compressed data in the GZIP file format.
  • GZIPOutputStream – This class is an output stream filter for writing compressed data in the GZIP file format.
  • Inflater – This class supports general-purpose decompression using the ZLIB compression library.
  • InflaterInputStream – This class is an input stream filter that decompresses data in the deflate compression format.
  • ZipEntry – This class represents a single ZIP file entry.
  • ZipFile – This class is used for reading entries from a ZIP file.
  • ZipOutputStream – This class is an output stream filter for writing files in the ZIP file format.
  • ZipInputStream – This class is an input stream filter for reading files in the ZIP file format.
  • ZipException – This exception class is thrown to signal ZIP errors.
  • DataFormatException – This exception class signals a data format error.
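The Checksum interface from the list above lets the same code work with either implementation. The following sketch computes both checksums for a short test string; the class name ChecksumDemo and the helper checksumOf are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch: computing CRC-32 and Adler-32 through the common Checksum interface.
class ChecksumDemo {

    // Works with any Checksum implementation (CRC32, Adler32, ...).
    static long checksumOf(Checksum algorithm, byte[] data) {
        algorithm.reset();
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CRC32:   " + Long.toHexString(checksumOf(new CRC32(), data)));   // cbf43926
        System.out.println("Adler32: " + Long.toHexString(checksumOf(new Adler32(), data))); // 91e01de
    }
}
```

A receiver can recompute the checksum over the data it received and compare it with the sender's value to detect corruption.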

Decompression and Extraction of data from a ZIP file

As stated above, the java.util.zip package provides classes for both compression and decompression. Decompressing a ZIP file essentially means reading data from an input stream, and the ZipInputStream class is designed for reading ZIP files. It is created like any other input stream in Java:

FileInputStream filein = new FileInputStream("fins.zip");
ZipInputStream zins = new ZipInputStream(new BufferedInputStream(filein));

Once the ZIP input stream has been created, you can read the ZIP entries with the getNextEntry method, which returns a ZipEntry object. When the end of the file is reached, this method returns null.

ZipEntry enter;
while((enter = zins.getNextEntry()) != null) {
   // extract the data
   // open the output streams
}

Next, set up an output stream that will receive the decompressed data.

int BUFR = 2048;
FileOutputStream fileos = new FileOutputStream(enter.getName());
BufferedOutputStream destin = new BufferedOutputStream(fileos, BUFR);

You might have noticed that we have used BufferedOutputStream rather than ZipOutputStream. ZipOutputStream and GZIPOutputStream use an internal buffer size of 512 bytes, so wrapping in a BufferedOutputStream is only worthwhile when its buffer is considerably larger than 512, such as the 2048 used in our case. ZipOutputStream does not allow the buffer size to be set, whereas GZIPOutputStream accepts the internal buffer size as a constructor argument.

In this part of the code, a file output stream is created using the entry’s name, which is retrieved with enter.getName(). The zipped source data is then read and written to the decompressed stream.

int counter;
byte data[] = new byte[BUFR];
// read() returns -1 at the end of the current entry
while ((counter = zins.read(data, 0, BUFR)) != -1) {
   destin.write(data, 0, counter);
}

Finally, flush and close the output stream, and close the input stream.

destin.flush();
destin.close();
zins.close();
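The fragments above can be combined into one reusable method. The sketch below is one possible arrangement, not the listing used in this article: the class UnzipSketch and its unzip method are hypothetical names, try-with-resources replaces the explicit close calls, and two precautions are added as assumptions of this sketch (directory entries are created rather than written as files, and entries whose names would resolve outside the target directory are rejected).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch: extract every entry of a ZIP stream below a target directory.
class UnzipSketch {
    static final int BUFR = 2048;

    static void unzip(InputStream source, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        byte[] data = new byte[BUFR];
        try (ZipInputStream zins = new ZipInputStream(new BufferedInputStream(source))) {
            ZipEntry enter;
            while ((enter = zins.getNextEntry()) != null) {
                Path dest = root.resolve(enter.getName()).normalize();
                // Assumption of this sketch: refuse names that escape the target directory.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + enter.getName());
                }
                if (enter.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                Files.createDirectories(dest.getParent());
                try (OutputStream destin =
                         new BufferedOutputStream(Files.newOutputStream(dest), BUFR)) {
                    int counter;
                    // read() returns -1 at the end of the current entry
                    while ((counter = zins.read(data, 0, BUFR)) != -1) {
                        destin.write(data, 0, counter);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UnzipSketch <zip file> <target directory>
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            unzip(in, Paths.get(args[1]));
        }
    }
}
```

Taking an InputStream rather than a file name means the same method can extract an archive arriving over a network connection.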

The following figure shows the typical structure of a ZIP file.

Figure 2. ZIP files structure

Now, we shall take a look at a sample that shows the decompression and extraction process. To test it, compile and run it, passing a compressed file in ZIP format.

prompt> java UNZIPExample afile.zip

Listing 1. Program showing unzip functionality

import java.io.*;
import java.util.zip.*;
 
public class UNZIPExample {
   // static so that it can be used from main
   static final int BUFR = 2048;
   public static void main (String args[]) {
      try {
         FileInputStream filein = new 
            FileInputStream(args[0]);
         ZipInputStream zins = new 
            ZipInputStream(new BufferedInputStream(filein));
         ZipEntry enter;
         byte data[] = new byte[BUFR];
         // note: assumes entries are plain files with trusted names
         while((enter = zins.getNextEntry()) != null) {
            System.out.println("Extracting: " +enter);
            int counter;
            // writing of the files to the disk
            FileOutputStream fileos = new 
               FileOutputStream(enter.getName());
            BufferedOutputStream destin = new 
              BufferedOutputStream(fileos, BUFR);
            while ((counter = zins.read(data, 0, BUFR)) 
              != -1) {
               destin.write(data, 0, counter);
            }
            destin.flush();
            destin.close();
         }
         zins.close();
      } catch(Exception exp) {
         exp.printStackTrace();
      }
   }
}

One important point to note is that the ZipInputStream class reads ZIP files sequentially. The ZipFile class, however, reads the contents using a random access file internally, so the entries of the ZIP file do not have to be read in sequence.

Another basic difference between the two is that ZIP entries are not cached when a combination of FileInputStream and ZipInputStream is used, whereas a file opened with ZipFile is cached internally.
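The random access offered by ZipFile can be sketched as follows. The class name ZipFileLookup and the readEntry helper are hypothetical; the sketch looks up one entry by name and reads only that entry, and its main method also lists all entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Sketch: reading a single named entry through ZipFile.
class ZipFileLookup {

    // Returns the bytes of the named entry, or null if the archive has no such entry.
    static byte[] readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry enter = zip.getEntry(entryName);
            if (enter == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(enter)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] data = new byte[2048];
                int counter;
                while ((counter = in.read(data, 0, data.length)) != -1) {
                    out.write(data, 0, counter);
                }
                return out.toByteArray();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipFileLookup <zip file> [entry name]
        try (ZipFile zip = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                System.out.println(e.getName() + " (" + e.getSize() + " bytes)");
            }
        }
        if (args.length > 1) {
            byte[] content = readEntry(args[0], args[1]);
            System.out.println(content == null ? "no such entry" : content.length + " bytes read");
        }
    }
}
```

ZipFile needs a file on disk, so ZipInputStream remains the choice when the archive arrives as a stream.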

Compression and Archiving of data in ZIP files

The ZipOutputStream class is used to compress data into a ZIP file. It writes data to an output stream in the ZIP format.

The following program illustrates the process.

Listing 2. Program showing zip functionality

import java.io.*;
import java.util.zip.*;
 
public class ZIPExample {
   static final int BUFR = 2048;
   public static void main (String args[]) {
      try {
         BufferedInputStream orig = null;
         FileOutputStream destin = new 
           FileOutputStream("c:\\zipped\\figs.zip");
         ZipOutputStream output = new ZipOutputStream(new 
           BufferedOutputStream(destin));
         //output.setMethod(ZipOutputStream.DEFLATED);
         byte data[] = new byte[BUFR];
         // getting a list of files from the current directory
         File fil = new File(".");
         String file[] = fil.list();
 
         for (int a=0; a<file.length; a++) {
            // skip subdirectories; only regular files are added
            if (!new File(file[a]).isFile()) continue;
            System.out.println("Adding: "+file[a]);
            FileInputStream fi = new 
              FileInputStream(file[a]);
            orig = new 
              BufferedInputStream(fi, BUFR);
            // each file becomes one entry in the archive
            ZipEntry enter = new ZipEntry(file[a]);
            output.putNextEntry(enter);
            int counter;
            while((counter = orig.read(data, 0, 
              BUFR)) != -1) {
               output.write(data, 0, counter);
            }
            orig.close();
         }
         output.close();
      } catch(Exception exp) {
         exp.printStackTrace();
      }
   }
}

Note that entries can be added to a ZIP file either in compressed (DEFLATED) form or in uncompressed (STORED) form. The setMethod method of ZipOutputStream sets the default storage method for subsequent entries, and the setMethod method of ZipEntry sets it for an individual entry.
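The difference between the two storage methods can be seen in the sketch below, which writes the same content once as a DEFLATED entry and once as a STORED entry. The class name StoredEntryDemo, the buildArchive helper, and the entry names are hypothetical. A STORED entry needs its size and CRC-32 set before putNextEntry is called, which is why the checksum is computed up front.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch: one DEFLATED and one STORED entry in the same in-memory archive.
class StoredEntryDemo {

    static byte[] buildArchive(byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream output = new ZipOutputStream(bytes)) {
            // DEFLATED: sizes and CRC are filled in by the stream.
            ZipEntry deflated = new ZipEntry("deflated.txt");
            deflated.setMethod(ZipEntry.DEFLATED);
            output.putNextEntry(deflated);
            output.write(content, 0, content.length);
            output.closeEntry();

            // STORED: size and CRC-32 must be known before putNextEntry.
            CRC32 crc = new CRC32();
            crc.update(content, 0, content.length);
            ZipEntry stored = new ZipEntry("stored.txt");
            stored.setMethod(ZipEntry.STORED);
            stored.setSize(content.length);
            stored.setCompressedSize(content.length);
            stored.setCrc(crc.getValue());
            output.putNextEntry(stored);
            output.write(content, 0, content.length);
            output.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "AAAGGCCDDDDLLLLLRRRSSSSS".getBytes(StandardCharsets.US_ASCII);
        byte[] archive = buildArchive(content);
        System.out.println("archive size: " + archive.length + " bytes");
    }
}
```

STORED is mainly useful for data that is already compressed, such as JPEG images, where deflating again would cost time without saving space.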

Conclusion

This article has shown the API that can be used to compress and decompress data from your Java applications, along with an overview of what the java.util.zip package has to offer.

With code samples for the two major processes, you can learn how to handle ZIP files in Java applications and apply the techniques when the need arises. Doing so saves disk space and increases the effective file transfer rate between server and client machines. Besides handling files, you may also look further and learn how to compress and decompress data on the fly.



Website: www.techalpine.com Have 16 years of experience as a technical architect and software consultant in enterprise application and product development. Have interest in new technology and innovation area along with technical...
