Abstract
With the current generation of multi-core CPUs it is possible to use multiple cores and processors simultaneously to achieve faster compression and decompression rates. Using the following methods to create compressed backups of your files will be less time consuming.
Introduction
The most widely used compression tools are gzip, bzip2, and xz. It is common practice to create a tar file out of a number of folders containing various files and then compress it with one of the aforementioned tools. This way we can save bandwidth and hard disk space. In this article we will install the parallel compression tools, create test files, tar them, compress the archive with the traditional tools, and then compare those runs against the parallel methods, focusing on CPU utilization and time consumption.
Materials and Methods
Most Linux distributions come with the gzip, bzip2, and xz compression tools preinstalled. We just need to install the parallel compression tools.
The tests were performed on a Dell Inspiron 14 with:
- Intel Celeron processor (dual core)
- 2 GB of memory
Install Required Packages
Open a terminal and install the required packages. On Debian/Ubuntu systems you can do that as follows:
sudo apt-get install pxz pigz pbzip2
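Optionally, you can verify that the tools are in place and check how many cores they will be able to use. This is just a quick sanity check, not part of the benchmark (nproc is part of GNU coreutils and should already be installed):
which pigz pbzip2 pxz
nproc   # prints the number of CPU cores available to the parallel tools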
Test files
Now we will create some random files in RAM (under /dev/shm, which is memory-backed). Then we will create a tar archive out of these files and start using the compression tools.
cd /dev/shm
mkdir testing-folder
cd testing-folder
dd if=/dev/urandom of=hundredmegfile1 bs=1024 count=102400
dd if=/dev/urandom of=hundredmegfile2 bs=1024 count=102400
dd if=/dev/urandom of=hundredmegfile3 bs=1024 count=102400
We run the dd command three times, changing only the name of the output file, to generate three files full of random bits. A loop variant is sketched below.
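The same files can also be generated with a small loop instead of repeating the command. This is just a convenience sketch that produces the same three 100 MB files as above:
for i in 1 2 3; do
    dd if=/dev/urandom of=hundredmegfile$i bs=1024 count=102400
done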
Now let's create a tar archive and remove the folder to save memory:
cd ..
tar cvf testingfiles.tar testing-folder && rm -rf testing-folder
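Since /dev/shm lives in RAM and the test machine only has 2 GB of memory, remember to delete the test archives once you are done benchmarking, for example:
rm -f /dev/shm/testingfiles.tar*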
Benchmark
Now let's compare the methods using the time command.
Please note that we are comparing only TIME and CPU utilization between the traditional and the parallel methods, not the resulting size (even though we provide that info for reference) or the speed/compression ratio of each tool. A CPU value above 100% means that more than one core was being used.
Traditional methods
Gzip
Compress, show used space, show some file attributes and then decompress with gzip.
time gzip -k testingfiles.tar;du -ksh testingfiles.tar*;ls -lh testingfiles.tar*;time gzip -d testingfiles.tar.gz
The output should look like:
gzip -k testingfiles.tar 19.06s user 0.34s system 99% cpu 19.453 total
301M testingfiles.tar
301M testingfiles.tar.gz
-rw-r--r-- 1 user user 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 user user 301M Sep 23 03:30 testingfiles.tar.gz
gzip: testingfiles.tar already exists; do you wish to overwrite (y or n)? y
gzip -d testingfiles.tar.gz 2.27s user 0.34s system 11% cpu 23.113 total
Bzip2
Compress, show used space, show some file attributes and then decompress with bzip2.
time bzip2 -k testingfiles.tar;du -ksh testingfiles.tar*;ls -lh testingfiles.tar*;time bunzip2 -f testingfiles.tar.bz2
The output should look like:
bzip2 -k testingfiles.tar 151.70s user 0.67s system 100% cpu 2:32.30 total
301M testingfiles.tar
302M testingfiles.tar.bz2
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 302M Sep 23 03:30 testingfiles.tar.bz2
bunzip2 -f testingfiles.tar.bz2 56.43s user 0.75s system 100% cpu 57.162 total
XZ
Compress, show used space, show some file attributes and then decompress with xz.
time xz -k testingfiles.tar;du -ksh testingfiles.tar*;ls -lh testingfiles.tar*;time unxz -f testingfiles.tar.xz
The output should look like:
xz -k testingfiles.tar 261.58s user 1.19s system 93% cpu 4:41.12 total
301M testingfiles.tar
301M testingfiles.tar.xz
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar.xz
unxz -f testingfiles.tar.xz 0.66s user 0.50s system 99% cpu 1.164 total
Parallel methods
Pigz
Compress, show used space, show some file attributes and then decompress with pigz.
time pigz -k testingfiles.tar;du -ksh testingfiles.tar*;ls -lh testingfiles.tar*;time pigz -d -f testingfiles.tar.gz
The output should look like:
pigz -k testingfiles.tar 21.92s user 0.62s system 173% cpu 12.970 total
301M testingfiles.tar
301M testingfiles.tar.gz
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar.gz
pigz -d -f testingfiles.tar.gz 0.57s user 0.64s system 109% cpu 1.111 total
PBzip2
Compress, show used space, show some file attributes and then decompress with pbzip2.
time pbzip2 -k testingfiles.tar;du -ksh testingfiles.tar*;ls -lh testingfiles.tar*;time pbzip2 -d -f testingfiles.tar.bz2
The output should look like:
pbzip2 -k testingfiles.tar 174.41s user 2.49s system 170% cpu 1:43.63 total
301M testingfiles.tar
302M testingfiles.tar.bz2
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 302M Sep 23 03:30 testingfiles.tar.bz2
pbzip2 -d -f testingfiles.tar.bz2 58.96s user 2.12s system 172% cpu 35.443 total
Pxz
Compress, show used space, show some file attributes and then decompress with pxz.
time pxz -k testingfiles.tar;du -ksh testingfiles.tar*;ls -lh testingfiles.tar*;time pxz -d -f testingfiles.tar.xz
The output should look like:
pxz -k testingfiles.tar 254.53s user 2.59s system 166% cpu 2:34.03 total
301M testingfiles.tar
301M testingfiles.tar.xz
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar
-rw-r--r-- 1 semin semin 301M Sep 23 03:30 testingfiles.tar.xz
pxz -d -f testingfiles.tar.xz 0.72s user 0.43s system 98% cpu 1.165 total
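By default the parallel tools use all available cores. If you want to leave some cores free for other work, each tool accepts an option to limit the number of threads. The value 1 below is only an example, and the exact flag syntax can differ slightly between versions, so check the man pages:
pigz -p 1 -k testingfiles.tar     # -p: number of compression threads
pbzip2 -p1 -k testingfiles.tar    # -p#: number of processors to use
pxz -T 1 -k testingfiles.tar      # -T: number of threads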
Results and Discussion
Here are the results, summarized in tables based on the CPU utilization and the time needed to complete each task.
Compression
| | gzip | pigz | bzip2 | pbzip2 | xz | pxz |
|---|---|---|---|---|---|---|
| CPU (avg. %) | 99% | 173% | 100% | 170% | 93% | 166% |
| Time (min:sec) | 0:19 | 0:12 | 2:32 | 1:43 | 4:41 | 2:34 |
Decompression
| | gzip | pigz | bzip2 | pbzip2 | xz | pxz |
|---|---|---|---|---|---|---|
| CPU (avg. %) | 11% | 109% | 100% | 172% | 99% | 98% |
| Time (min:sec) | 0:23 | 0:01 | 0:57 | 0:35 | 0:01 | 0:01 |
Commentary
Let's interpret the results:
- gzip: compression used a single core, fully utilized, for 19 seconds; decompression used one core, only partially utilized, for 23 seconds
- pigz: compression used both cores for 12 seconds; decompression used one to two cores, partially utilized, for about 1 second
- bzip2: compression used a single core for about 2.5 minutes; decompression fully used one core for 57 seconds
- pbzip2: compression used both cores for about 1.5 minutes; decompression used both cores for 35 seconds
- xz: compression used a single core for about 4.5 minutes; decompression used one core for about 1 second
- pxz: compression used both cores for about 2.5 minutes; decompression used one core for about 1 second
As we can see, parallel compression and decompression:
- Utilize all the cores of a multi-core CPU when compressing and, in most cases, when decompressing
- Need half the time, and in some cases even less, to complete the task
It is important to understand, though, that these numbers can vary depending on the types of files you compress. For creating backups, the parallel tools can also be used directly through tar, as sketched below.
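A minimal sketch of that approach, assuming GNU tar (which provides the -I/--use-compress-program option) and a hypothetical folder named my-folder to back up:
tar -I pigz -cf backup.tar.gz my-folder       # gzip-compatible archive, using all cores
tar -I pbzip2 -cf backup.tar.bz2 my-folder    # bzip2-compatible archive
tar -I pxz -cf backup.tar.xz my-folder        # xz-compatible archive
The resulting archives stay compatible with the standard gzip, bzip2, and xz tools, so they can be decompressed anywhere.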