• Aucun résultat trouvé

Performing and Documenting a Computational Experiment

Dans le document URLs Referenced in This Book (Page 65-68)

Chapter 2. Computational Approaches to Biological Questions

2.5 A Computational Biology Experiment

2.5.6 Performing and Documenting a Computational Experiment

When managing results for a computational project, you should make a distinction between primary results and results of subsequent analyses. You should include a separate section in your results directory for any analysis steps you may perform on the data (for instance, the results of statistical tests or curve fitting). This section should include any observations you may have made about the data or the data collection. Keep separate the results, which are the data you get from executing the experiment, and the analysis, which is the insight you bring to the data you have collected.

One tendency that is common to users of computational biology software is to keep data and notes about positive results while neglecting to document negative results. Even if you've done a web-based BLAST search against a database and found nothing, that is information.

And if you've written or downloaded a program that is supposed to do something, but it doesn't work, that information is valuable too—to the next guy who comes in to continue your project and wastes time trying to figure out what works and what doesn't.

2.5.6.1 Documentation issues in computational biology

Many researchers, even those who do all their work on the computer, maintain paper laboratory notebooks, which are still the standard recording device for scientific research.

Laboratory notebooks provide a tangible physical record of research activities, and maintenance of lab records in this format is still a condition of many research grants.

However, laboratory notebooks can be an inconvenient source of information for you or for others who are trying to duplicate or reconstruct your work. Lab notebooks are organized linearly, with entries sorted only by date. They aren't indexed (unless you have a lot more free time than most researchers do). They can't be searched for information about particular subjects, except by the unsatisfactory strategy of sitting down and skimming the whole book beginning to end.

Computer filesystems provide an intuitive basis for clear and precise organization of research records. Information about each part of a project can be stored logically, within the file hierarchy, instead of sequentially. Instead of (or in addition to) a paper notebook on your bookshelf, you will have electronic record embedded within your data. If your files are named systematically and simple conventions are used in structuring your electronic record, Unix tools such as the grep command will allow you to search your documentation for occurrences of a particular word or date and to find the needed information much more quickly than you would reading through a paper notebook.

2.5.6.2 Electronic notebooks

While you can get by with homegrown strategies for building an electronic record of your work, you may want to try one of the commercial products that are available. Or, if you're looking for a freeware implementation of the electronic notebook concept, you can obtain a

Safari | Developing Bioinformatics Computer Skills -> 2.5 A Computational Biology Experiment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=34 (4 of 6) [6/2/2002 8:54:17 AM]

copy of the software being developed by the DOE2000 Electronic Notebook project. The eNote package lets you input text, upload images and datafiles, and create sketches and annotations. It's a Perl CGI program and will run on any platform with a web server and a Perl interpreter installed. When installed, it's accessible from a web URL on your machine, and you can update your notebook through a web form. The DOE project is designed to fulfill federal agency requirements for laboratory notebooks, as scientific research continues to move into the computer age.

The eNote package is extremely simple to install. It requires that you have a working web server installed on your machine. If you do, you can download the eNote archive and unpack it in its own directory, for example /usr/local/enote. The three files enote.pl, enotelib.pl, and sketchpad.pl are the eNote programs. You need to move enote.pl to the /home/httpd/cgi-bin directory (or wherever your executable CGI directory is; this one is the default on Red Hat Linux systems) and rename it enote.cgi. If you want to restrict access to the notebook, create a special subdirectory just for the eNote programs, and remember that the directory will show up in the URL path to access the CGI script. The sketchpad.pl file should also be moved to this directory, but it doesn't have to be renamed. Move the directories gifs and new-gifs to a web-accessible location. You can create a directory such as /home/httpd/enote for this purpose. Leave the file enotelib.pl and the directory sketchpad where you unpacked them.

Finally, you need to edit the first line in both enote.cgi and sketchpad.pl to point to the location of the Perl executable on your machine. Edit the enote.cgi script to reflect the paths where you installed the eNote script and its support files. You also need to choose a directory in which you want eNote to write entries. For instance, you may want to create a

/home/enote/notebook directory and store eNote write files there. If so, be sure that directory is readable and writable by other users so the web server (which is usually identified as user nobody) can write there.

The eNote script also contains parameters that specify whether users of the notebook system can add, delete, and modify entries. If you plan to use eNote seriously, these are important parameters to consider. Would you allow users to tear unwanted pages out of a laboratory notebook or write over them so the original entry was unreadable? eNote allows you to maintain control over what users can do with their data.

The eNote interface is a straightforward web form, which also links to a Java sketchpad applet. If you want only specific users with logins on your machine to be able to access the eNote CGI script, you can set up a .htaccess file in the eNote subdirectory of your CGI directory. A .htaccess file is a file readable by your web server that contains commands to restrict access to a particular directory and/or where it can be accessed from. For more

information on creating a .htaccess file, consult the documentation for the web server you are using—most likely Apache on most Linux systems.

If you do begin to use an electronic notebook for storing your laboratory notes, remember that you must save backups of your notebook frequently in case of system failures.

Delivered for Maurice ling Swap Option Available: 7/15/2002

Last updated on 10/30/2001 Developing Bioinformatics Computer Skills, © 2002 O'Reilly

< BACK Make Note | Bookmark CONTINUE >

Safari | Developing Bioinformatics Computer Skills -> 2.5 A Computational Biology Experiment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=34 (5 of 6) [6/2/2002 8:54:17 AM]

Index terms contained in this section

computational biology experimenting in data sets, selecting

documentation of experiments

DOE2000 Electronic Notebook project electronic notebooks

eNote script (electronic notebook) experiments

carrying out and documenting data sets, selecting

problems, identifying and solving separating into simpler components hypothesis

Java

sketchpad applet (electronic notebook) nonredundant subset of proteins

notebooks (paper/electronic) paper notebooks

Perl language

electronic notebooks and

© 2002, O'Reilly & Associates, Inc.

Safari | Developing Bioinformatics Computer Skills -> 2.5 A Computational Biology Experiment

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=34 (6 of 6) [6/2/2002 8:54:17 AM]

Show TOC | Frames My Desktop | Account | Log Out | Subscription | Help

Programming > Developing Bioinformatics Computer Skills > II: The Bioinformatics Workstation See All Titles

< BACK Make Note | Bookmark CONTINUE >

158127045003020048038218232180015152050067001135112006120215207095124197094065241251111

Dans le document URLs Referenced in This Book (Page 65-68)