Archiving and Compressing Data

Archiving with tar

This subject falls under the general “housekeeping” of your files. You can gather all of your related files and store them as one larger, separate file called a “tarfile.”

Creating a file archive (tarfile) can help you organize your files for storage or transfer.

You can use a file archive (tarfile) when you:
- want to keep files for later reference but are done with them at the moment.
- want to compress groups of files for storage or transfer to other hosts.

Tar (tape archive) is one UNIX command we use to accomplish this. Tar can also send files directly to a magnetic tape, but for our purposes we will use it to make file archives. We used it on the first day to extract your UNIX_class subdirectory structure.

Creating a tarfile
Suppose we want to archive a directory. Let’s take dir2 from our UNIX_class examples. It is always a good idea to verify the name of the directory you want to archive.

tar Practice
Start by moving into your UNIX_class subdirectory.
$ ls -l
drwxrwxr-x  2 userid userid   512 Jun 28 17:52 Animal
drwxrwxr-x  2 userid userid   512 Jun 18 21:31 Shakespeare
drwxrwxr-x  2 userid userid   512 Jun 18 20:10 Wildcards
drwxrwxr-x  2 userid userid   512 Jun 24 15:38 dir1
drwxrwxr-x  4 userid userid   512 Jun 24 13:55 dir2
drwxrwxr-x  4 userid userid   512 Jun 28 15:19 dir3

We will keep the example simple and create the archive file in the same directory.

Type: $ tar -cvf dir2.tar dir2

a dir2/ 0K
a dir2/.DS_Store 7K
a dir2/address_list 1K
a dir2/final.paper 1K
a dir2/history.txt 7K
a dir2/picts/ 0K
a dir2/picts/unixbutton.JPG 24K
a dir2/cats/ 0K
a dir2/cats/catsup/ 0K
a dir2/cats/cathode/ 0K
a dir2/cats/caterpillar/ 0K
a dir2/cats/caterpillar/butterfly 1K
a dir2/cats/caterpillar/larva 1K
a dir2/cats/catalyst 1K

If we list our directory we should see the dir2.tar at the same level in hierarchy as dir2.
$ ls -l
drwxrwxr-x  2 userid userid   512 Jun 28 17:52 Animal
drwxrwxr-x  2 userid userid   512 Jun 18 21:31 Shakespeare
drwxrwxr-x  2 userid userid   512 Jun 18 20:10 Wildcards
drwxrwxr-x  2 userid userid   512 Jun 24 15:38 dir1
drwxrwxr-x  4 userid userid   512 Jun 24 13:55 dir2
-rw-rw-r--  1 userid userid 49664 Jul  8 16:28 dir2.tar
drwxrwxr-x  4 userid userid   512 Jun 28 15:19 dir3

You will see that you have a file named dir2.tar and it is not a directory, but a regular file.

What we did here is create a tarfile called dir2.tar, which is a file archive of all the files and directories under dir2.
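The same steps can be tried in a scratch directory without touching the class files. This is only a sketch: the sample file names and the use of `mktemp -d` are assumptions for illustration, and the `-t` option (list the archive's table of contents) is not covered in these slides, although it is a standard tar option.

```shell
# Work in a throwaway directory (assumption: mktemp -d is available).
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small stand-in for dir2 (file names are illustrative only).
mkdir -p dir2/cats
echo "draft" > dir2/final.paper
echo "notes" > dir2/cats/catalyst

# Create the archive, then list its contents without extracting.
tar -cvf dir2.tar dir2
listing=$(tar -tf dir2.tar)
echo "$listing"
```

Listing the archive right after creating it is a quick way to confirm that everything under the directory was picked up.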

Let’s look at tar more closely. There are many options and different ways to use the command, but we will focus on the ones you will use most often.

$ tar -cvf filename.tar directoryname

The options: -c tells tar to “create” a new archive.


The options: -v tells tar to be verbose in the output it provides. This is not necessary, but it is helpful to see what is going on. It tells you what you are doing (appending or extracting) and the file names and sizes.

The options: -f tells tar that we are using a tarfile and that the next [operand] specifies the name of the tarfile. If we did not specify the -f option, tar would write to its default device, which traditionally is a tape drive.
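To see what -v adds, the sketch below creates the same archive twice, once without -v and once with it, and captures whatever tar prints. The directory and file names are made up for the demonstration, and the exact wording of the verbose lines differs between tar implementations.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir demo
echo "hello" > demo/a.txt

# Without -v, tar creates the archive and prints nothing on success.
quiet_output=$(tar -cf quiet.tar demo 2>&1)

# With -v, tar names each entry as it is added.
verbose_output=$(tar -cvf verbose.tar demo 2>&1)

echo "quiet:   [$quiet_output]"
echo "verbose: [$verbose_output]"
```

In both commands the word immediately after the option letters is taken as the tarfile name, because f is the last option in the group.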


We use certain things in UNIX by convention and not by any rule. When we specify a name for our tarfile (file archive), it is good practice to use the .tar extension. I also recommend, if you are tarring a whole directory as in our example, that the new filename be the same as the original directory name.
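One way to stick to this naming convention is to let a small shell function build the tarfile name from the directory name. The function name `archive_dir` is hypothetical, not a standard command, and this is only a sketch of the idea.

```shell
# archive_dir is a hypothetical helper, not a standard command.
archive_dir() {
    dir=${1%/}                 # drop a trailing slash, if any
    tar -cf "$dir.tar" "$dir"  # dir2 -> dir2.tar
}

# Try it in a scratch directory with a made-up sample file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir2
echo "sample" > dir2/history.txt

archive_dir dir2/
ls
```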

tar -cvf /home/jsmith/tarfiles/dir2.tar dir2
tar -cvf ~/tarfiles/dir2.tar dir2
Remember that the “tarfiles” directory has to already exist.

tar -cvf /home/jsmith/tarfiles/dir2.tar dir2
pathname: /home/jsmith/tarfiles/
tarfile: dir2.tar
original directory: dir2

If you want the tarfile to be stored somewhere else in your directory structure, you can specify an absolute or relative pathname in front of the tarfile name. It can be stored anywhere you have write permission.
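A minimal sketch of storing the archive in a separate directory follows. A scratch directory stands in for the home directory here, and the names `home_dir` and `dir2_copy.tar` are assumptions made for the example.

```shell
# Assumption: a scratch directory stands in for your home directory.
home_dir=$(mktemp -d)
cd "$home_dir" || exit 1
mkdir dir2
echo "sample" > dir2/address_list

# The destination directory must exist before tar can write into it.
mkdir -p "$home_dir/tarfiles"

# Absolute pathname in front of the tarfile name.
tar -cvf "$home_dir/tarfiles/dir2.tar" dir2

# A relative pathname works the same way.
tar -cvf tarfiles/dir2_copy.tar dir2
```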

Extracting with tar
Extracting a tarfile: At some point in time you will probably need access to individual files that reside in your archive. You can extract the file individually or the whole archive. For the purpose of our example we will extract the contents of dir2.tar to another location.

First, from our home directory, let’s do a listing of our UNIX_class directory.
$ cd (takes us to the top of our home directory)
$ ls -l UNIX_class
drwxrwxr-x  2 userid userid   512 Jun 28 17:52 Animal
drwxrwxr-x  2 userid userid   512 Jun 18 21:31 Shakespeare
drwxrwxr-x  2 userid userid   512 Jun 18 20:10 Wildcards
drwxrwxr-x  2 userid userid   512 Jun 24 15:38 dir1
drwxrwxr-x  4 userid userid   512 Jun 24 13:55 dir2
-rw-rw-r--  1 userid userid 49664 Jul  8 16:28 dir2.tar
drwxrwxr-x  4 userid userid   512 Jun 28 15:19 dir3

We want to extract the tarfile to a different location: in this case, the top of our home directory. In this particular example, we need to be in the directory in which we want the files to end up when we extract the tarfile. For more tar options, see the man page on tar.
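The idea of moving to the destination first and then extracting can be sketched as follows. A scratch directory named by the hypothetical variable `top` stands in for the home directory, and the sample file is made up.

```shell
# Assumption: "top" is a scratch stand-in for the home directory.
top=$(mktemp -d)
mkdir -p "$top/UNIX_class/dir2"
echo "sample" > "$top/UNIX_class/dir2/final.paper"

# Create the archive inside UNIX_class.
cd "$top/UNIX_class" || exit 1
tar -cf dir2.tar dir2

# Move to where the files should end up, then extract.
cd "$top" || exit 1
pwd
tar -xvf UNIX_class/dir2.tar
```

Because the archive stores the names relative to dir2, the extracted copy appears as dir2 under whatever directory you are in when you run tar.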

Make sure you are in your home directory (not in UNIX_class).
$ pwd (always a good habit to get into)

Type: $ tar -xvf UNIX_class/dir2.tar
The -x option tells tar to extract.

x dir2, 0 bytes, 0 tape blocks
x dir2/.DS_Store, 6148 bytes, 13 tape blocks
x dir2/address_list, 455 bytes, 1 tape blocks
x dir2/final.paper, 716 bytes, 2 tape blocks
x dir2/history.txt, 7091 bytes, 14 tape blocks
x dir2/picts, 0 bytes, 0 tape blocks
x dir2/picts/unixbutton.JPG, 24125 bytes, 48 tape blocks
x dir2/cats, 0 bytes, 0 tape blocks
x dir2/cats/catsup, 0 bytes, 0 tape blocks
x dir2/cats/cathode, 0 bytes, 0 tape blocks
x dir2/cats/caterpillar, 0 bytes, 0 tape blocks
x dir2/cats/caterpillar/butterfly, 325 bytes, 1 tape blocks
x dir2/cats/caterpillar/larva, 103 bytes, 1 tape blocks
x dir2/cats/catalyst, 174 bytes, 1 tape blocks

The verbose output tells you that it is extracting (the “x” at the beginning), and lists the file names and sizes.

Verify dir2 is there in your home directory:
$ ls
dir2 UNIX_class

Remember that when you extract, your original tarfile is still there where you put it in the first place. You can extract it as many times as you like to any other locations in which you have write permissions.
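The sketch below extracts the same tarfile into two different locations: once in full, and once naming a single file so that only that file is pulled out. The directory names `copy1` and `copy2` and the sample files are assumptions for illustration.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir2
echo "paper" > dir2/final.paper
echo "addresses" > dir2/address_list
tar -cf dir2.tar dir2

# Extract the whole archive into one location...
mkdir copy1 copy2
(cd copy1 && tar -xf ../dir2.tar)

# ...and only one named file into another.
(cd copy2 && tar -xf ../dir2.tar dir2/final.paper)

ls copy1/dir2 copy2/dir2
```

The name given after the tarfile must match the path as it is stored in the archive, which here includes the leading dir2/. The original dir2.tar is left in place after both extractions.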


Compressing and Uncompressing
Compressing reduces the file size using a special encoding. Tarring files and compressing often go hand in hand. You can compress a whole tarfile without having to compress each individual file.

This will be helpful if you have large directories that you’ve tarred and will not need for some time and you want to save some disk space.

As you can imagine, compressing a file before sending it to another host will also save on upload/download times with sftp.

The compression command we recommend is gzip. It was designed to replace the older UNIX command simply called compress. It is more efficient and free, so it is widely supported on other platforms.

- gzip is used to compress (zip) a file or files
- gunzip is used to expand (unzip) a file or files
- gzip produces files with a .gz extension
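As a rough illustration of tarring and compressing going hand in hand, the sketch below archives a sample directory, compresses the tarfile, and then peeks inside the compressed archive. The sample file is made up, and the pipe (`|`) and the `-c` option of gunzip go slightly beyond what these slides cover: `gunzip -c` writes the expanded data to standard output, and `tar -tf -` reads an archive from standard input and lists it.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir2
echo "sample text" > dir2/history.txt

tar -cf dir2.tar dir2   # archive first
gzip dir2.tar           # then compress; dir2.tar becomes dir2.tar.gz
ls

# Peek inside the compressed archive without changing it on disk.
contents=$(gunzip -c dir2.tar.gz | tar -tf -)
echo "$contents"
```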

Using gzip to compress a file.

Compressing Practice
Let’s find our previous tarfile (dir2.tar) that should reside at the top of your UNIX_class directory and try this handy utility.
$ cd UNIX_class
$ ls -l
drwxrwxr-x  2 userid userid   512 Jun 28 17:52 Animal
drwxrwxr-x  2 userid userid   512 Jun 18 21:31 Shakespeare
drwxrwxr-x  2 userid userid   512 Jun 18 20:10 Wildcards
drwxrwxr-x  2 userid userid   512 Jun 24 15:38 dir1
drwxrwxr-x  4 userid userid   512 Jun 24 13:55 dir2
-rw-rw-r--  1 userid userid 49664 Jul  8 16:28 dir2.tar
drwxrwxr-x  4 userid userid   512 Jun 28 15:19 dir3

Type: $ gzip dir2.tar

Let’s take a look at what we did.

$ gzip dir2.tar
$ ls -l
drwxrwxr-x  2 userid userid   512 Jun 28 17:52 Animal
drwxrwxr-x  2 userid userid   512 Jun 18 21:31 Shakespeare
drwxrwxr-x  2 userid userid   512 Jun 18 20:10 Wildcards
drwxrwxr-x  2 userid userid   512 Jun 24 15:38 dir1
drwxrwxr-x  4 userid userid   512 Jun 24 13:55 dir2
-rw-rw-r--  1 userid userid 29601 Jul  8 16:28 dir2.tar.gz
drwxrwxr-x  4 userid userid   512 Jun 28 15:19 dir3

$ gzip filename
In its simplest form, the only [operand] you have to supply to gzip is the name of the file you want to compress (zip). In this case we used dir2.tar.
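To measure the effect of gzip on a file of your own, you can record the byte count before and after. The sketch below generates a made-up, highly repetitive text file (an assumption chosen because repetitive data compresses well) and uses `wc -c` to count bytes. How much a file shrinks depends heavily on what it contains.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Repetitive text compresses well; the content is made up for the demo.
i=0
while [ "$i" -lt 2000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# wc -c counts bytes; tr strips padding some systems add to the number.
before=$(wc -c < sample.txt | tr -d ' ')
gzip sample.txt
after=$(wc -c < sample.txt.gz | tr -d ' ')

echo "before: $before bytes"
echo "after:  $after bytes"
```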


Notice the file size difference between the unzipped file and the zipped one.
before:
-rw-rw-r-- 1 userid userid 49664 Jul 8 16:28 dir2.tar
after:
-rw-rw-r-- 1 userid userid 29601 Jul 8 16:28 dir2.tar.gz
Also notice the .gz extension.

Using gunzip to uncompress a file.

Here again, the command can be very simple. Always look to the man page for more options. The basic usage of the gunzip command is to reverse the operation of the gzip command.
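A quick way to convince yourself that gunzip really reverses gzip is to compare a checksum of the tarfile before compressing and after uncompressing. This is only a sketch: the sample directory is made up, and `cksum` is a standard utility that these slides do not otherwise cover.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir2
echo "sample text" > dir2/history.txt
tar -cf dir2.tar dir2

sum_before=$(cksum < dir2.tar)   # checksum of the original tarfile

gzip dir2.tar        # produces dir2.tar.gz
gunzip dir2.tar.gz   # restores dir2.tar and drops the .gz extension

sum_after=$(cksum < dir2.tar)
[ "$sum_before" = "$sum_after" ] && echo "round trip OK"
```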

Uncompressing Practice
Type: $ gunzip dir2.tar.gz

Let’s take a look at what we did.

$ gunzip dir2.tar.gz
$ ls -l
drwxrwxr-x  2 userid userid   512 Jun 28 17:52 Animal
drwxrwxr-x  2 userid userid   512 Jun 18 21:31 Shakespeare
drwxrwxr-x  2 userid userid   512 Jun 18 20:10 Wildcards
drwxrwxr-x  2 userid userid   512 Jun 24 15:38 dir1
drwxrwxr-x  4 userid userid   512 Jun 24 13:55 dir2
-rw-rw-r--  1 userid userid 49664 Jul  8 16:28 dir2.tar
drwxrwxr-x  4 userid userid   512 Jun 28 15:19 dir3

Now we are back to our original file size, and gunzip removed the .gz extension for us. Simple!

The End…
Next …
Pipes and Redirects
