Parallel compression and decompression of files on multi-core, multi-threaded CPUs

Abstract

With the current generation of multi-core CPUs it is possible to use multiple cores and processors simultaneously to achieve faster compression and decompression rates. Using the following methods to create compressed backups of your files is considerably less time consuming.

Introduction

The most widely used compression tools are gzip, bzip2, and xz. It is common practice to create a tar file of a number of folders containing various files and then compress it with one of these tools; this way we save bandwidth and hard disk space. In this article we will install the parallel tools, create a set of test files, tar them, compress the archive with the traditional single-threaded tools, and then compare the results against the parallel compression methods, focusing on CPU utilization and time consumed.
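To make the comparison concrete, here is a minimal sketch of the two workflows being compared; my-folders and backup.tar.gz are just placeholder names, and the parallel variant simply swaps gzip for pigz:

# Traditional: stream the tar archive through single-threaded gzip
tar cf - my-folders | gzip > backup.tar.gz

# Parallel: the same archive, compressed with pigz so all cores are used
tar cf - my-folders | pigz > backup.tar.gz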

Materials and Methods

Most Linux distributions come with the gzip, bzip2 and xz compression tools preinstalled. We only need to install their parallel counterparts.

The tests were performed on a Dell Inspiron 14 with:

  • Intel Celeron Processor (Dual Core)
  • 2GB Memory

Install Required Packages

Open a terminal and install the required packages. On Debian/Ubuntu systems you can do that as follows:

sudo apt-get install pxz pigz pbzip2

Test files

Now we will create some files filled with random data inside RAM (the tmpfs mounted at /dev/shm). Then we will create a tar archive out of these files and start using the compression tools.

cd /dev/shm
mkdir testing-folder
cd testing-folder
dd if=/dev/urandom of=hundredmegfile1 bs=1024 count=102400
dd if=/dev/urandom of=hundredmegfile2 bs=1024 count=102400
dd if=/dev/urandom of=hundredmegfile3 bs=1024 count=102400

We run dd three times, changing only the name of the output file, to generate three 100 MB files of random bits (a loop version is sketched below).
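If you prefer not to type the command three times, the same files can be generated with a small loop; this is only a convenience and produces exactly the same hundredmegfile1, hundredmegfile2 and hundredmegfile3:

for i in 1 2 3; do
    dd if=/dev/urandom of=hundredmegfile$i bs=1024 count=102400
done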

Now let's create a tar archive and remove the folder to free memory:

cd ..
tar cvf testingfiles.tar testing-folder && rm -rf testing-folder

Benchmark

Now let's compare the methods using the time command.

Please note that we are comparing only TIME and CPU utilization between the traditional and parallel methods, not the resulting size (although we provide it for reference) or the speed/compression ratio of each tool.
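If you want to repeat the whole comparison in one go, a minimal sketch could look like the loop below; it assumes all six tools are installed and reuses the -k flag so the original archive is kept between runs:

for tool in gzip pigz bzip2 pbzip2 xz pxz; do
    rm -f testingfiles.tar.gz testingfiles.tar.bz2 testingfiles.tar.xz   # remove the output of the previous run
    echo "== $tool =="
    time $tool -k testingfiles.tar   # -k keeps testingfiles.tar so the next tool can reuse it
done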

Traditional methods

Gzip

Compress, show used space, show some file attributes and then decompress with gzip.

time gzip -k testingfiles.tar; du -ksh testingfiles.tar*; ls -lh testingfiles.tar*; time gzip -d testingfiles.tar.gz

The output should look like:

gzip -k testingfiles.tar  19.06s user 0.34s system 99% cpu 19.453 total
301M    testingfiles.tar
301M    testingfiles.tar.gz
-rw-r--r-- 1 user user 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 user user 301M Sep 23 03:30 testingfiles.tar.gz
gzip: testingfiles.tar already exists; do you wish to overwrite (y or n)? y
gzip -d testingfiles.tar.gz  2.27s user 0.34s system 11% cpu 23.113 total

Bzip2

Compress, show used space, show some file attributes and then decompress with bzip2.

time bzip2 -k testingfiles.tar; du -ksh testingfiles.tar*; ls -lh testingfiles.tar*; time bunzip2 -f testingfiles.tar.bz2

The output should look like:

bzip2 -k testingfiles.tar  151.70s user 0.67s system 100% cpu 2:32.30 total
301M    testingfiles.tar
302M    testingfiles.tar.bz2
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 302M Sep 23 03:30 testingfiles.tar.bz2
bunzip2 -f testingfiles.tar.bz2  56.43s user 0.75s system 100% cpu 57.162 total

XZ

Compress, show used space, show some file attributes and then decompress with xz.

time xz -k testingfiles.tar; du -ksh testingfiles.tar*; ls -lh testingfiles.tar*; time unxz -f testingfiles.tar.xz

The output should look like:

xz -k testingfiles.tar  261.58s user 1.19s system 93% cpu 4:41.12 total
301M    testingfiles.tar
301M    testingfiles.tar.xz
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar.xz
unxz -f testingfiles.tar.xz  0.66s user 0.50s system 99% cpu 1.164 total

Parallel methods

Pigz

Compress, show used space, show some file attributes and then decompress with pigz.

time pigz -k testingfiles.tar; du -ksh testingfiles.tar*; ls -lh testingfiles.tar*; time pigz -d -f testingfiles.tar.gz

The output should look like:

pigz -k testingfiles.tar  21.92s user 0.62s system 173% cpu 12.970 total
301M    testingfiles.tar
301M    testingfiles.tar.gz
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar.gz
pigz -d -f testingfiles.tar.gz  0.57s user 0.64s system 109% cpu 1.111 total

PBzip2

Compress, show used space, show some file attributes and then decompress with pbzip2.

time pbzip2 -k testingfiles.tar; du -ksh testingfiles.tar*; ls -lh testingfiles.tar*; time pbzip2 -d -f testingfiles.tar.bz2

The output should look like:

pbzip2 -k testingfiles.tar  174.41s user 2.49s system 170% cpu 1:43.63 total
301M    testingfiles.tar
302M    testingfiles.tar.bz2
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 302M Sep 23 03:30 testingfiles.tar.bz2
pbzip2 -d -f testingfiles.tar.bz2  58.96s user 2.12s system 172% cpu 35.443 total

Pxz

Compress, show used space, show some file attributes and then decompress with pxz.

time pxz -k testingfiles.tar; du -ksh testingfiles.tar*; ls -lh testingfiles.tar*; time pxz -d -f testingfiles.tar.xz

The output should look like:

pxz -k testingfiles.tar  254.53s user 2.59s system 166% cpu 2:34.03 total
301M    testingfiles.tar
301M    testingfiles.tar.xz
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar.xz
pxz -d -f testingfiles.tar.xz  0.72s user 0.43s system 98% cpu 1.165 total

Results and Discussion

Here are the results, summarized in table form and represented graphically, based on the time needed to complete each task.

Compression

              gzip    pigz    bzip2   pbzip2  xz      pxz
CPU (avg. %)  99%     173%    100%    170%    93%     166%
Time (m:ss)   0:19    0:12    2:32    1:43    4:41    2:34

[Figure: compression benchmark, time per tool]

Decompression

              gzip    pigz    bzip2   pbzip2  xz      pxz
CPU (avg. %)  11%     109%    100%    172%    99%     98%
Time (m:ss)   0:23    0:01    0:57    0:35    0:01    0:01

[Figure: decompression benchmark, time per tool]

Commentary

Let's try to interpret the results:

  • Gzip: compression used just one core, fully, for 19 seconds; decompression used one core, only partially, for 23 seconds
  • Pigz: compression used both cores for about 12 seconds; decompression used both cores, only partially, for about 1 second
  • Bzip2: compression used just one core for about 2.5 minutes; decompression fully used one core for 57 seconds
  • Pbzip2: compression used both cores for about 1.5 minutes; decompression used both cores for 35 seconds
  • Xz: compression used just one core for about 4.5 minutes; decompression used one core for about 1 second
  • Pxz: compression used both cores for about 2.5 minutes; decompression used one core for about 1 second

As we can see, parallel compression and decompression:

  • Utilize all the cores of a multi-core CPU when compressing, and in most cases when decompressing
  • Complete the task in half the time, and in some cases even less

It is important, though, to understand that these numbers can vary depending on the types of files you compress.
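The parallel tools also let you cap the number of threads they use, which can be handy when the machine is busy with other work. A quick sketch follows; the flags are the ones documented in each tool's man page, and 2 is just an example value, so check your versions if unsure:

pigz -k -p 2 testingfiles.tar    # pigz: -p sets the number of compression threads
pbzip2 -k -p2 testingfiles.tar   # pbzip2: -p# sets how many processors to use
pxz -k -T 2 testingfiles.tar     # pxz: -T sets the maximum number of threads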
