Image Acquisition

From the document Advanced Information and Knowledge Processing (pages 149-153)

Kai Huang and Robert F. Murphy

8.2.1 Image Acquisition

Cell biologists evaluating subcellular location patterns may view anywhere from a few to dozens of cells under a microscope, but the number of images saved is usually limited to a few examples for presentation and publication.

In contrast, development and evaluation of methods for location proteomics have required the collection of a sufficient number of digital images to permit statistically meaningful results to be obtained. Our group has used four collections of images for this purpose, which are described here and summarized in Table 8.1. Links for downloading the image collections can be found at http://murphylab.web.cmu.edu/data.

2D Protein Fluorescence Microscope Image Collections

148 Data Mining in Bioinformatics

Table 8.1. Summary of the image collections discussed in this chapter.

Data set   No. of classes   Microscopy method        Objective   Pixel size in original field (µm)   No. of colors per image   Ref.
2D CHO     5                Deconvolution            100X        0.23                                1                         [45]
2D HeLa    10               Deconvolution            100X        0.23                                2                         [46]
3D HeLa    11               Confocal scanning        100X        0.0488                              3                         [414]
3D 3T3     46               Spinning disk confocal   60X         0.11                                1                         [73]

2D CHO. For the initial demonstration of the feasibility of automating subcellular pattern classification, we collected five sets of fluorescence microscope images of Chinese hamster ovary (CHO) cells. In one set, nuclear DNA was labeled with Hoechst 33258; in the other four sets, a specific protein was labeled by immunofluorescence [45]. Antibodies against the Golgi protein giantin, the yeast nucleolar protein NOP4, the lysosomal protein LAMP2, and the microtubule protein tubulin were used. The number of images collected for each protein ranged from 33 to 97. For each field (chosen to include primarily one cell), three optical slices separated by 0.23 micron in focus position were taken. Nearest-neighbor deconvolution was used to remove out-of-focus fluorescence from the central slice [4].
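The general idea behind nearest-neighbor deconvolution can be sketched in a few lines of Python: blurred copies of the slices above and below are subtracted from the central slice, on the assumption that most out-of-focus light in the central slice comes from its two neighbors. This is only an illustrative sketch, not the implementation of reference [4]. The 3x3 mean filter stands in for the microscope's out-of-focus point spread function, and the function names and the weight of 0.45 are hypothetical choices made for this example.

```python
def box_blur(image):
    """Blur a 2D image (list of rows) with a 3x3 mean filter.

    Assumption: this filter is a stand-in for the real out-of-focus
    point spread function, which the text does not specify.
    """
    rows, cols = len(image), len(image[0])
    blurred = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Average over the neighbors that fall inside the image.
            window = [image[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            out_row.append(sum(window) / len(window))
        blurred.append(out_row)
    return blurred


def nearest_neighbor_deblur(below, central, above, weight=0.45):
    """Estimate the in-focus central slice from three optical slices.

    The blurred neighbors are scaled by `weight` (a hypothetical value)
    and subtracted from the central slice; negatives are clipped to zero.
    """
    blur_below = box_blur(below)
    blur_above = box_blur(above)
    return [[max(central[r][c] - weight * (blur_below[r][c] + blur_above[r][c]), 0.0)
             for c in range(len(central[0]))]
            for r in range(len(central))]
```

Only the central slice of each three-slice stack is kept after this step, which matches the single-plane images described for the 2D collections.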

The resulting images were cropped to remove any partial cells along the edge so that only a single-cell region remained. The most common pixel intensity in an input image was subtracted from all pixels in the image to remove the background fluorescence (the most common pixel value was used as background based on the assumptions that cells typically occupy less than half of the total area of the image and that variation in pixel intensity is greater in the cell than in the background). The images were then thresholded so that all pixels below four times the background value were set to zero. Figure 8.1 shows example processed images taken from the 2D CHO collection.
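The background subtraction and fixed-multiple thresholding described above can be sketched as follows. This is a minimal, dependency-free illustration in which an image is a list of rows of integer pixel values; the function names are hypothetical. One detail is an assumption: the text does not say whether the threshold is applied before or after subtraction, and this sketch applies it to the background-subtracted image.

```python
from collections import Counter


def subtract_background(image):
    """Subtract the most common pixel value from every pixel.

    Returns the subtracted image (clipped at zero) and the background
    value that was used.
    """
    counts = Counter(pixel for row in image for pixel in row)
    background = counts.most_common(1)[0][0]
    subtracted = [[max(pixel - background, 0) for pixel in row] for row in image]
    return subtracted, background


def threshold_fixed_multiple(image, background, multiple=4):
    """Set to zero every pixel below `multiple` times the background value."""
    cutoff = multiple * background
    return [[pixel if pixel >= cutoff else 0 for pixel in row] for row in image]


# Example: a small "cell" of bright pixels on a background of value 10.
raw = [
    [10, 10, 10, 12],
    [10, 60, 55, 10],
    [10, 70, 45, 10],
    [10, 10, 10, 10],
]
subtracted, background = subtract_background(raw)
processed = threshold_fixed_multiple(subtracted, background)
```

Using the most common value rather than the mean keeps the estimate from being pulled upward by the bright cell pixels, which is consistent with the assumption that the cell covers less than half of the image.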

Fig. 8.1. Example images from the 5-class CHO cell image collection depicting five major subcellular location patterns: (A) giantin, (B) LAMP2, (C) NOP4, (D) tubulin, and (E) DNA. From reference [45].

2D HeLa. For a more complete test of the capabilities and limits of automated subcellular location analysis, a similar immunofluorescence approach was used to acquire a larger image collection from the human cell line HeLa. This line was used because more antibodies are available for human than for hamster cells and because the cells are larger and better spread than CHO cells. Nine different proteins located in major subcellular organelles were labeled, and DNA was labeled in each sample using a distinguishable fluorescent probe. The proteins included a protein located in the endoplasmic reticulum membrane, the Golgi proteins giantin and Gpp130, a protein on the mitochondrial outer membrane, the nucleolar protein nucleolin, the lysosomal protein LAMP2, transferrin receptor (primarily localized in endosomes), and the cytoskeletal proteins beta-tubulin and F-actin. The two Golgi proteins were included to test the ability of the automated methods to distinguish similar patterns. The number of images per protein ranged from 73 to 98. The same fluorescence microscope and nearest-neighbor deconvolution method used for the CHO set were applied to the HeLa images. The same preprocessing steps of cropping, background subtraction, and thresholding were applied, but an automated threshold selection method [331] was used rather than the fixed-multiple method used previously. A separate DNA class was created by making synthetic two-color images in which both colors contained the same DNA image (the DNA images collected in parallel with giantin were used).

Figure 8.2 shows example processed images taken from the 2D HeLa image collection.

3D Protein Fluorescence Microscope Image Collections

3D HeLa. To acquire the 3D image collection, we used a three-laser confocal laser scanning microscope, which removes out-of-focus fluorescence while the image is being taken. The same nine proteins used in the 2D HeLa set were imaged. For each protein, parallel images of DNA and total protein were obtained using additional fluorescent probes, so that three-color images were obtained. Every 3D image in the set consisted of 14 to 24 2D slices, and the dimensions of each voxel in the resulting 3D stack were 0.049 × 0.049 × 0.2 microns. Between 50 and 58 3D images were collected for each protein. Automatic cell segmentation was performed by applying a seeded watershed algorithm to the total protein channel, using the centers of the nuclei (calculated from the DNA channel) as seeds. After segmentation, the images were further processed by background subtraction and automated thresholding. Figure 8.3 shows an example image taken from the 3D HeLa image collection. A synthetic DNA class was created by copying the DNA image, and a synthetic “cytoplasmic” class was created by copying the total protein image.
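To illustrate how a seeded watershed assigns pixels to cells, the following Python sketch floods a 2D intensity grid outward from seed points, visiting brighter pixels first so that region borders tend to settle along the dim valleys between cells. It is a simplified 2D priority-flood written for this example, not the segmentation code used for the 3D HeLa collection. The function name, the 4-connected neighborhood, and the brightest-first flooding order are all assumptions, and the seeds are simply passed in rather than computed from a DNA channel.

```python
import heapq


def seeded_watershed(intensity, seeds):
    """Label each pixel with the seed whose flood reaches it first.

    intensity: 2D list of pixel values (e.g. a total protein channel).
    seeds:     list of (row, col) positions, e.g. nuclear centers.
    Returns a 2D list of labels, where seed i (0-based) gets label i + 1.
    """
    rows, cols = len(intensity), len(intensity[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = []
    order = 0  # insertion counter, breaks ties between equal intensities

    for label, (r, c) in enumerate(seeds, start=1):
        labels[r][c] = label
        heapq.heappush(heap, (-intensity[r][c], order, r, c))
        order += 1

    while heap:
        # Pop the brightest pixel on the current flood front.
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                # An unlabeled neighbor joins the region that reached it.
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (-intensity[nr][nc], order, nr, nc))
                order += 1
    return labels


# Example: two bright cells separated by a dim valley in a single row.
profile = [[5, 9, 5, 1, 5, 9, 5]]
regions = seeded_watershed(profile, [(0, 1), (0, 5)])
```

Seeding from the nuclei guarantees one region per cell, which is what makes the approach suitable for splitting fields that contain several touching cells.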

Fig. 8.2. Example images from the 2D HeLa cell image collection depicting ten major subcellular location patterns. The targets include (A) an ER protein, the Golgi proteins (B) giantin and (C) Gpp130, (D) the lysosomal protein LAMP2, (E) a mitochondrial protein, (F) the nucleolar protein nucleolin, (G) filamentous actin, (H) the endosomal protein transferrin receptor, (J) tubulin, and (K) DNA. Scale bar = 10 µm. From reference [46].

Fig. 8.3. An example image from the 3D HeLa cell image collection. Each image comprises three channels labeling (A) a specific target protein (tubulin), (B) total DNA, and (C) total protein. Summed projections onto the X–Y and X–Z planes are shown. From reference [414].

3D 3T3. As mentioned previously, random tagging can provide an image library of all proteins expressed in a cell type. For example, CD tagging [200] introduces a CD cassette into the genome through a genetically engineered retroviral vector. The coding sequence of a green fluorescent protein (GFP) is enclosed in the CD cassette, and a GFP-tagged fusion protein will be expressed if (and only if) the CD cassette is inserted into an intron of a gene. Cell lines expressing properly tagged genes can be isolated by selection for GFP expression. The sequence of the tagged gene can be determined later and identified by sequence homology search. We have used a library of CD-tagged lines derived from 3T3 cells [201] to build a collection of single-color 3D images for 46 tagged clones [73]. Between 16 and 33 3D images were collected for each clone, where each voxel represents 0.11 × 0.11 × 0.5 microns in space. Since no DNA or total protein images were available to permit automated cropping, the images were cropped manually and then processed by background subtraction and thresholding. Figure 8.4 shows example processed images taken from the 3D 3T3 image collection.

Fig. 8.4. Example 3D images from the 3T3 image collection. Summed projections onto the X–Y and X–Z planes are shown for proteins representing six of the major subcellular patterns found by cluster analysis [73].