Introduction to Linux Archiving & Compression
Archiving is the process of combining multiple files and directories (of the same or different sizes) into one file. Compression, on the other hand, is the process of reducing the size of a file or directory.
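To see the difference in practice, here is a minimal shell sketch. It assumes GNU tar and gzip are installed, and the file names are hypothetical. tar only bundles the files, so the archive is slightly larger than its inputs because of the headers tar adds. gzip then compresses that single file.

```shell
set -e
cd "$(mktemp -d)"                 # work in a throwaway directory

# Create three small, highly repetitive text files (hypothetical names).
for n in 1 2 3; do
    yes "line from file $n" | head -n 2000 > "file$n.txt"
done

# Archiving: bundle the files into one file, without compressing them.
tar cf notes.tar file1.txt file2.txt file3.txt

# Compression: shrink the single archive with gzip (notes.tar is kept).
gzip -c notes.tar > notes.tar.gz

# Compare the sizes in bytes.
wc -c file1.txt file2.txt file3.txt notes.tar notes.tar.gz
```

The same two steps are what tar's -z and -j options combine into one command later in this article.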
I hope the difference between archiving and compression is now clear. So now let's get into the topic.
Archives and Directories
The most common programs for archiving and compressing files and directories are:
- zip
- bzip2
- tar
ZIP command in Linux with examples
zip is a compression and file packaging utility for Unix. Each archive is stored in a single file with the extension .zip.
- The zip program puts one or more compressed files into a single zip archive, along with information about the files (name, path, date and time of last modification, protection, and check information to verify file integrity). An entire directory structure can therefore be packed into a zip archive with a single command.
- Compression ratios of 2:1 to 3:1 are common for text files. zip has one compression method (deflation) and can also store files without compression; it automatically chooses the better of the two for each file to be compressed.
The program is useful not only for packaging a set of files for distribution but also for archiving files and for saving disk space by temporarily compressing unused files or directories.
Syntax :
zip [options] zipfile files_list
Syntax for creating a zip file:
zip myfile.zip filename.txt
Extracting files from zip file
unzip lists, tests, or extracts files from a ZIP archive, a format commonly found on Unix systems. The default behavior (with no options) is to extract all files from the specified ZIP archive into the current directory (and the subdirectories below it).
unzip myfile.zip
Options :
1. -d Option:
Removes a file from the zip archive. After creating a zip file, you can remove a file from the archive using the -d option.
Suppose myfile.zip was created from the following files in the current directory:
unixcop1.c
unixcop2.c
unixcop3.c
unixcop4.c
unixcop5.c
unixcop6.c
unixcop7.c
unixcop8.c
Syntax :
zip -d filename.zip file.txt
Command :
zip -d myfile.zip unixcop7.c
After unixcop7.c has been removed from myfile.zip, the remaining files can be extracted with the unzip command; unixcop7.c is no longer part of the archive.
unzip myfile.zip
ls
2. -u Option:
Updates files in the zip archive. An existing entry is replaced only if the file on disk has been modified more recently than the copy in the archive, and files that are not yet in the archive are added.
Syntax:
zip -u filename.zip file.txt
Similarly, suppose the following files are in the current directory:
unixcop1.c
unixcop2.c
unixcop3.c
unixcop4.c
Command :
zip -u myfile.zip unixcop5.c
After unixcop5.c has been added to myfile.zip, the archived files, including unixcop5.c, can be extracted with the unzip command.
Commands:
unzip myfile.zip
ls
3. -m Option:
Moves the specified files into the zip archive: the original files and directories are deleted after the specified zip archive has been created. If a directory becomes empty after removal of the files, the directory is also removed.
Syntax :
zip -m filename.zip file.txt
Suppose the following files are in the current directory:
unixcop1.c
unixcop2.c
unixcop3.c
unixcop4.c
Command :
zip -m myfile.zip *.c
After the command runs, listing the directory shows that the .c files have been moved into myfile.zip:
ls
4. -r Option:
To zip a directory recursively, use the -r option with the zip command; it zips all the files in the specified directory and in its subdirectories.
Syntax:
zip -r filename.zip directory_name
Suppose the directory docs in the current directory contains the following files:
unix.pdf
oracle.pdf
linux.pdf
zip -r mydir.zip docs
5. -x Option:
Equally important is the ability to exclude files when creating a zip archive. Say you are zipping all the files in the current directory and want to leave out some unwanted ones: you can exclude them with the -x option, which goes after the list of files to add.
Syntax :
zip filename.zip files_list -x file_to_be_excluded
Also suppose the files in the current directory are listed below:
unixcop1.c
unixcop2.c
unixcop3.c
unixcop4.c
Command :
zip myfile.zip *.c -x unixcop3.c
This command compresses all the .c files except unixcop3.c into myfile.zip. The excluded file stays in the directory untouched; listing the archive's contents shows that it was left out.
Command:
unzip -Z1 myfile.zip
Output :
unixcop1.c
unixcop2.c
unixcop4.c
6. -v Option:
Verbose mode, or print diagnostic version info. Normally, when applied to real operations, this option enables the display of a progress indicator during compression and requests verbose diagnostic info about zip file structure oddities.
When -v is the only command-line argument, and either stdin or stdout is not redirected to a file, a diagnostic screen is printed. In addition to the help screen header with program name, version, and release date, some pointers to the Info-ZIP home and distribution sites are given. It then shows information about the target environment (compiler type and version, OS version, compilation date, and the enabled optional features used to create the zip executable).
Syntax :
zip -v filename.zip file1.txt
Suppose the files in the current directory are listed below:
unixcop1.c
unixcop2.c
unixcop3.c
unixcop4.c
zip -v file1.zip *.c
bzip2 command in Linux with examples
The bzip2 command in Linux is used to compress and decompress files. Unlike zip or tar, it does not bundle several files into one: each file is compressed on its own and replaced by a compressed version of itself that takes less storage space, named after the original file with the extension .bz2 appended. bzip2 uses the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding. Compared with gzip, it usually compresses better but has slower decompression and higher memory use.
Syntax for compressing:
bzip2 [OPTIONS] filenames ...
Syntax for decompressing:
bunzip2 [OPTIONS] filenames ...
Options:
1. -z Option: This option forces compression.
bzip2 -z input.txt
Note: As with bzip2's default behavior, the original file is deleted and only input.txt.bz2 remains.
2. -k Option: This option compresses the file but does not delete the original.
bzip2 -k input.txt
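Here is a minimal sketch contrasting the two options above. It assumes bzip2 is installed, and a.txt and b.txt are hypothetical files.

```shell
set -e
cd "$(mktemp -d)"
echo "some text" > a.txt   # hypothetical input files
echo "some text" > b.txt

bzip2 -z a.txt             # a.txt is replaced by a.txt.bz2
bzip2 -k b.txt             # b.txt.bz2 is written and b.txt is kept
ls
```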
3. -d Option:
This option is used to decompress compressed files.
bzip2 -d input.txt.bz2
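The following sketch runs a full round trip and checks that -d restores the file byte for byte. It assumes bzip2 and cmp are available, and the test data generated with seq is arbitrary.

```shell
set -e
cd "$(mktemp -d)"
seq 1 50000 > input.txt          # arbitrary test data
cp input.txt original.txt        # reference copy for comparison

bzip2 -k input.txt               # creates input.txt.bz2, keeps input.txt
rm input.txt                     # keep only the compressed file
bzip2 -d input.txt.bz2           # restores input.txt, removes input.txt.bz2

cmp input.txt original.txt && echo "round trip OK"
```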
4. -t Option :
This option checks the integrity of the compressed file without writing the decompressed data to disk, so it tells you whether the file is corrupt or not.
bzip2 -t input.txt.bz2
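This minimal sketch shows how -t reports damage through its exit status. A truncated copy of a valid .bz2 file stands in for a corrupted download. bzip2 and head -c are assumed to be available, and broken.txt.bz2 is a hypothetical name.

```shell
set -e
cd "$(mktemp -d)"
seq 1 100000 > input.txt
bzip2 input.txt                           # produces input.txt.bz2

bzip2 -t input.txt.bz2 && echo "intact"   # exit status 0: the file is OK

# Simulate a damaged file by keeping only its first 1000 bytes.
head -c 1000 input.txt.bz2 > broken.txt.bz2
if bzip2 -t broken.txt.bz2; then
    echo "broken.txt.bz2 passed (unexpected)"
else
    echo "broken.txt.bz2 is corrupt"      # bzip2 exits with a non-zero status
fi
```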
5. -v Option: Verbose mode; shows the compression ratio for each file processed.
bzip2 -v input.txt
- -h --help : Display the help message and exit.
- -L --license, -V --version : Display the software version and the license terms and conditions.
- -q --quiet : Suppress non-essential warning messages. Messages pertaining to I/O errors and other critical events are not suppressed.
- -f --force : Force overwrite of output files.
tar command in Linux with examples
The Linux ‘tar’ command, short for tape archive, is used to create archives and to extract archive files. tar is one of the most important commands for archiving in Linux: it can create compressed or uncompressed archive files and also maintain and modify them.
Syntax:
tar [options] [archive-file] [file or directory to be archived]
Options:
-c : Creates an archive
-x : Extracts the archive
-f : Uses the given archive file name
-t : Displays or lists the files in an archive
-u : Appends files that are newer than their copies in an existing archive
-v : Displays verbose information
-A : Concatenates (appends) other tar files to an archive
-z : Filters the archive through gzip (for .tar.gz files)
-j : Filters the archive through bzip2 (for .tar.bz2 files)
-W : Verifies an archive after writing it
-r : Appends files or directories to the end of an existing .tar file
What is an Archive file?
An archive file is a file composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files so that they use less storage space.
Examples:
1. Creating an uncompressed tar Archive using option -cvf :
This command creates a tar file called file.tar which is the Archive of all .c files in current directory.
tar cvf file.tar *.c
2. Extracting files from Archive using option -xvf :
This command extracts files from Archives.
tar xvf file.tar
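Here is a minimal sketch that combines examples 1 and 2. It assumes GNU tar and uses placeholder .c files. The archive is created and listed, then extracted into a separate directory with -C so that the result can be compared with the originals.

```shell
set -e
cd "$(mktemp -d)"
for n in 1 2 3; do echo "/* file $n */" > "unixcop$n.c"; done

tar cvf file.tar *.c          # create an uncompressed archive
tar tf file.tar               # list what went in

mkdir restore
tar xvf file.tar -C restore   # extract into restore/ instead of here
cmp unixcop2.c restore/unixcop2.c && echo "identical"
```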
3. gzip compression on the tar Archive, using option -z :
This command creates a tar file called file.tar.gz which is the Archive of .c files.
tar cvzf file.tar.gz *.c
4. Extracting a gzip tar archive (*.tar.gz) using option -xvzf : This command extracts files from the gzip-compressed archive file.tar.gz.
tar xvzf file.tar.gz
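This sketch of examples 3 and 4 assumes GNU tar and gzip and uses placeholder .c files. The file produced by -z is an ordinary gzip stream, so gzip -t can check it. The archive can be listed with tzf without extracting anything.

```shell
set -e
cd "$(mktemp -d)"
for n in 1 2 3; do seq 1 5000 > "unixcop$n.c"; done   # placeholder content

tar cvzf file.tar.gz *.c      # archive and gzip in one step
gzip -t file.tar.gz           # the result is a valid gzip file
tar tzf file.tar.gz           # list without extracting

mkdir out
tar xvzf file.tar.gz -C out   # extract into out/
```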
5. Creating a bzip2-compressed tar archive in Linux using option -j :
This command creates a compressed archive that is usually smaller than the equivalent gzip archive, but both compression and decompression take more time than with gzip.
tar cvjf file.tar.bz2 example.cpp
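To compare the two compressors on your own data, the following sketch builds both archive types from the same input and prints their sizes. It assumes GNU tar with gzip and bzip2 available, and example.cpp only holds placeholder text. The output is not shown here because the exact byte counts depend on the input and the tool versions.

```shell
set -e
cd "$(mktemp -d)"
seq 1 100000 > example.cpp          # placeholder content, not real C++

tar cvzf file.tar.gz  example.cpp   # gzip-compressed archive
tar cvjf file.tar.bz2 example.cpp   # bzip2-compressed archive

# Compare the resulting sizes in bytes.
wc -c file.tar.gz file.tar.bz2
```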
6. Untar a single tar file, optionally into a specified directory, in Linux : This command extracts the archive into the current directory; with the -C option it extracts into the specified directory instead (the directory must already exist).
tar xvf file.tar
or
tar xvf file.tar -C /path/to/directory
7. Extract specific files from .tar, .tar.gz and .tar.bz2 files in Linux
These commands extract only the named files from a tar, tar.gz or tar.bz2 archive. For example, each command below extracts “fileA” and “fileB” from its archive.
tar xvf file.tar "fileA" "fileB"
or
tar zxvf file1.tar.gz "fileA" "fileB"
or
tar jxvf file2.tar.bz2 "fileA" "fileB"
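This minimal sketch shows that naming members extracts only those members. It assumes GNU tar with bzip2, and fileA, fileB and fileC are hypothetical files.

```shell
set -e
cd "$(mktemp -d)"
echo A > fileA; echo B > fileB; echo C > fileC

tar cjf file2.tar.bz2 fileA fileB fileC
rm fileA fileB fileC

# Name the members to extract; fileC stays inside the archive.
tar jxvf file2.tar.bz2 "fileA" "fileB"
ls
```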
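8. Check the size of an existing tar, tar.gz or tar.bz2 file in Linux
wc -c displays the size of an archive file in bytes (not kilobytes).
wc -c file.tar or wc -c file.tar.gz or wc -c file.tar.bz2
To check how large a gzip-compressed archive of some files would be without writing it to disk, write the archive to standard output with -f - and count the bytes:
tar czf - *.c | wc -c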
9. Update existing tar file in Linux
tar rvf file.tar *.c
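The -r option appends files to the end of an existing archive. Here is a minimal sketch, assuming GNU tar and placeholder files. GNU tar refuses to append to a compressed archive such as file.tar.gz, so this only works on plain .tar files.

```shell
set -e
cd "$(mktemp -d)"
echo one > unixcop1.c
tar cf file.tar unixcop1.c

echo two > unixcop2.c
tar rvf file.tar unixcop2.c   # append to the end of the uncompressed archive
tar tf file.tar               # unixcop1.c, then unixcop2.c
```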
10. List the contents of a tarfile using option -tf
This command lists every file in the archive. You can also list specific contents of a tarfile, as the following examples show.
tar tf file.tar
11. Piping the listing through grep to find what we are looking for
This command lists only the archive entries that match the text (for example, a file or image name) given to grep.
tar tvf file.tar | grep "text to find"
or
tar tvf file.tar | grep "filename.file extension"
Example:
tar tvf file.tar | grep "unixcop1"
12. We can pass a file name as an argument to search a tarfile
This command views the archived files along with their details.
tar tvf file.tar filename
Example:
tar tvf file.tar unixcop1.c
13. Viewing the Archive using option -tvf
tar tvf file.tar
What are wildcards in Linux
Also referred to as a ‘wild character’ or ‘wildcard character’, a wildcard is a symbol used to replace or represent one or more characters. The most common wildcards are the asterisk (*), which represents zero or more characters, and the question mark (?), which represents exactly one character.
Example :
14. To search for an image in .png format
This lists only the files with the extension .png in the archive file.tar (use x instead of t to extract them). The --wildcards option tells tar to interpret wildcards in the names of the members to match, and the pattern (*.png) is enclosed in single quotes to protect the wildcard (*) from being expanded by the shell.
tar tvf file.tar --wildcards '*.png'
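Here is a minimal sketch of listing and then extracting with the same pattern. It assumes GNU tar, because --wildcards is a GNU option, and the image files are hypothetical placeholders. They are stored at the top level of the archive so that '*.png' matches them directly.

```shell
set -e
cd "$(mktemp -d)"
echo png1 > logo.png; echo png2 > icon.png; echo txt > notes.txt
tar cf file.tar logo.png icon.png notes.txt

tar tvf file.tar --wildcards '*.png'          # list the .png members only

mkdir out
tar xvf file.tar -C out --wildcards '*.png'   # extract just those members
ls out
```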
Note: In the commands above, * is used in place of a file name to match all the files present in the current directory (for example, *.c matches every .c file).