• Aucun résultat trouvé

Creating Archives of Your Data

Dans le document URLs Referenced in This Book (Page 147-151)

Part II: The Bioinformatics Workstation

SEQRES 1 357 GLU VAL LEU ILE THR GLY LEU ARG THR ARG ALA VAL ASN 2MNR 106 SEQRES 2 357 VAL PRO LEU ALA TYR PRO VAL HIS THR ALA VAL GLY THR 2MNR 107

5.9 Playing Nicely with Others in a Shared Environment

5.9.4 Creating Archives of Your Data

So, after months of your time, hundreds of megabytes of files, and several layers of subdirectories, the otter project is finally complete. Time to move on to the next project with a clean slate. But as refreshing as it may sound, you can't just type:

% rm -rf otter/

Other people may need to look back at your findings or use them as a starting point for their own research. At the other extreme, you can't leave your files lying around or laboriously copy them a few at a time to another location.

Not every file needs to be accessible at all times; some files are replaced, while others are more conveniently stored elsewhere. This section covers the tools provided by Unix for archiving your data so you don't have to worry about it on a day-to-day basis but can find things later when you need them.

5.9.4.1 tar: Hold the feathers

Safari | Developing Bioinformatics Computer Skills -> 5.9 Playing Nicely with Others in a Shared Environment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=58 (6 of 9) [6/2/2002 8:59:45 AM]

Usage: tar functions [options] [arguments] filenames

After going through all the effort of setting up your filesystem rationally, it seems like a waste to lose that structure in the process of storing it away, like hastily packed dishes in an unexpected cross-country move. Fortunately, there is a Unix command that lets you work with whole directories of files while retaining the directory structure. tar compacts a directory and all its component files and (if you ask for it) subdirectories into a single file with the name of the compacted directory and a .tar extension. The options for tar break down into two types: functions (of which you must choose one) and options. tar is short for "tape archive," since the utility was originally designed to read and write archives stored on magnetic tape. Another common use of tar is to package software in a form that can be easily transferred over the Internet.

To run tar, you must choose one of the following functions:

c

Creates a new tape archive r

Appends the files to an existing archive u

Adds files to the archive if they aren't present or are modified x

Extracts files from an existing archive t

Prints a table of contents of the archive The options for tar are as follows:

f archive

Performs the specified operation on archive, which can either be a device (such as a tape drive or a removable disk) or a tar file

v

(verbose mode) Prints the name of each file archived or extracted with a character to indicate the function (a for archived; x for extracted)

w

(whiny mode) Asks for confirmation at every step

Note that neither functions nor options require the hyphen that usually precedes Unix command options.

If you type:

% tar cvf otter/

the otter/ directory and all its subdirectories are rolled into a single file called otter.tar. It's good practice to use the v option, so you can see if something is going horribly wrong while the archive is being processed.

If, on the other hand, you want to make an archive of the otter/ directory on the tape drive nftape, you can type:

% tar cvf /dev/nftape otter/

A couple of warnings about tar are in order. First, before you use tar on your system, you should use which to find out whether the GNU or the standard version is installed. Several of the options mean different things to each version;

the ones listed earlier are the same in each version.

Second, the tar file you create will be as large as all the contents of the directory and subdirectories beneath it. This condition has dire implications if your archived directory is large and you have limited disk space, or you need to transfer large amounts of tar 'd data. In these cases, you should break down the directory into subdirectories of a more manageable size, and tar those instead.

Safari | Developing Bioinformatics Computer Skills -> 5.9 Playing Nicely with Others in a Shared Environment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=58 (7 of 9) [6/2/2002 8:59:45 AM]

If you don't have the space on your current filesystem or partition for your files and the archive you are creating to exist simultaneously, or you wish to download a whole archive file and unpack it just to retrieve a few files, you can transfer your archive over the network or even just to another partition using a combination of ftp and tar commands.

Sending an archive this way and then extracting it at the destination can be less time-consuming than a cp -r if a large number of files are involved. The ftp program recognizes a form in which a command replaces the input filenames.

The command is executed in a subshell on the local machine and operates on files on the local filesystem. The construct is:

ftp command "| command" filename

Inside the ftp program, here's how to send the output of the tar command, enclosed in quotes, into the filename specified as the target on the remote machine:

put "|tar cvBf - *" filename

Here's how to direct the downloaded archive through the tar command, resulting in extraction of only the files in the specified directory within the archive:

get filename.tar "|tar xvf - dirname"

Finally, here's how to list the contents of the remote archive:

get filename.tar "|tar t - *"

5.9.4.2 compress

Usage: compress -[options] filenames

Ultimately, you don't want to be left with large—if more manageable—tar files cluttering up your filesystem. In this situation, data-compression utilities are important, since they allow you to cheat and reduce the amount of space that files take up on your hard disk. compress is the standard Unix file-compression command. It's the opposite of

uncompress, the command used in Chapter 3 to open compressed papers and software. compress adds a .Z to the end of the filename.

Here are the most useful options for compress : -f

Forces compression; even if there is already a compressed version of the file, the main effect is to not overwrite an existing compressed file

-v

(verbose mode) Prints percentage compression achieved by the file -r

(recursive mode) If compress is applied to a directory that contains subdirectories, compresses their contents as well as those of the original directory

If you have a text file named stoat.txt and the tar file of the otter/ directory from the last section, and you want to compress both and look at the resulting compression ratio achieved, type:

% compress -v stoat.txt otter.tar

This command produces two files stoat.txt.Z and otter.tar.Z. The files can be uncompressed using the uncompress command or gzip -d (described next). In case you were wondering, natural languages (the kind humans use) end up with a compression ratio around 60%, and programming languages get around 40%. Try compressing the sequences of some of your favorite proteins to see what sort of ratio you get: the values can be wildly variable, depending on whether there are repeats in the sequence.

5.9.4.3 gzip

Usage: gzip -[options] filenames

As usual, in addition to the standard Unix compress, there's a faster and more efficient GNU utility: gzip. gzip behaves in much the same way as compress, except that it gets better compression on average, since it uses a superior

Safari | Developing Bioinformatics Computer Skills -> 5.9 Playing Nicely with Others in a Shared Environment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=58 (8 of 9) [6/2/2002 8:59:45 AM]

algorithm. gzip adds the suffix .gz to a file that it compresses. It emulates the compress options described earlier and adds a few of its own:

-N

(default setting) Preserves the original name and timestamp from the file being compressed -q

(quiet mode) Suppresses warnings when running -d

Returns a file that has been compressed by gzip to its uncompressed state; gzip can also recognize and uncompress files produced by compress

Delivered for Maurice ling Swap Option Available: 7/15/2002

Last updated on 10/30/2001 Developing Bioinformatics Computer Skills, © 2002 O'Reilly

< BACK Make Note | Bookmark CONTINUE >

Index terms contained in this section

archives, creating for data

© 2002, O'Reilly & Associates, Inc.

Safari | Developing Bioinformatics Computer Skills -> 5.9 Playing Nicely with Others in a Shared Environment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=58 (9 of 9) [6/2/2002 8:59:45 AM]

Show TOC | Frames My Desktop | Account | Log Out | Subscription | Help

Programming > Developing Bioinformatics Computer Skills > III: Tools for Bioinformatics See All Titles

< BACK Make Note | Bookmark CONTINUE >

158127045003020048038218232180015152050067001135112006120214154184001152226225090070035

Dans le document URLs Referenced in This Book (Page 147-151)