A Novel Metric for Bone Marrow Cells Chromosome Pairing

ABSTRACT
In this project we present a new metric for comparing chromosomes and pairing them automatically for leukemia diagnosis. Karyotyping is a set of procedures, applied to cells in the metaphase stage of cell division, that produces a visual representation of the 46 chromosomes, paired and arranged in decreasing order of size. The procedure is difficult because the chromosomes appear distorted and overlapped, and their images are usually blurred with undefined edges. The proposed method uses a new mutual information measure to increase the discriminative power of the G-banding pattern dissimilarity between chromosomes and to improve the performance of the classifier. The pairing is formulated as a combinatorial optimization problem in which the distances between homologous chromosomes are minimized and the distances between nonhomologous ones are maximized. A new bone marrow chromosome dataset, Lisbon-K1 (LK1), with 9200 chromosomes, was used in this project.

The chromosomes contain approximately 30 000 genes (genotype) and large tracts of noncoding sequences. The analysis of genetic material can involve the examination of specific chromosomal regions using DNA probes, e.g., fluorescent in situ hybridization (FISH), called molecular cytogenetics, and comparative genomic hybridization (CGH), or the morphological and pattern analysis of entire chromosomes, called conventional cytogenetics, which is the focus of this paper. These cytogenetic studies are very important in the detection of acquired chromosomal abnormalities, such as translocations, duplications, inversions, deletions, monosomies, or trisomies. These techniques are particularly useful in the diagnosis of cancerous diseases and are the preferred ones in the characterization of the different types of leukemia, which is the motivation of this paper.
The pairing of chromosomes is one of the main steps in conventional cytogenetic analysis, where a correctly ordered karyogram is produced for the diagnosis of genetic diseases based on the patient karyotype. The karyogram is an image representation of the stained human chromosomes with the widely used Giemsa stain metaphase spread (G-banding), where the chromosomes are arranged in 22 pairs of somatic homologous elements plus two sex-determinative chromosomes (XX for the female or XY for the male), displayed in decreasing order of size. A karyotype is the set of characteristics extracted from the karyogram that may be used to detect chromosomal abnormalities.

INTRODUCTION
The study of chromosome morphology and its relation with some genetic diseases is the main goal of cytogenetics. Normal human cells have 23 classes of large linear nuclear chromosomes, in a total of 46 chromosomes per cell.

The metaphase is the step of the cellular division process where the chromosomes are in their most condensed state. This is the most appropriate moment for their visualization and for abnormality recognition because the chromosomes appear well defined and clear. The pairing and karyotyping procedure, usually done manually by visual inspection, is time consuming and technically demanding. The application of the G-banding procedure to the chromosomes generates a distinct transverse banding pattern characteristic for each class, which is the most important feature for chromosome classification and pairing. The International System for Cytogenetic Nomenclature (ISCN) provides standard diagrams/ideograms of band profiles, as for all the chromosomes of a normal human, and the clinical staff is trained to pair and interpret each specific karyogram according to the ISCN information. Other features, related to the chromosome dimensions and shape, are also used to increase the discriminative power of the manual or automatic classifiers.

Fig 1 Metaphase plate of a normal male
Fig 2 Normal male karyotype
Fig 3 Difference between the chromosome quality in the Edinburgh, Copenhagen, and Philadelphia datasets
Fig 4 Human karyotype

THE HUMAN KARYOTYPE

Most (but not all) species have a standard karyotype. The normal human karyotypes contain 22 pairs of autosomal chromosomes and one pair of sex chromosomes. Normal karyotypes for females contain two X chromosomes and are denoted 46,XX; males have both an X and a Y chromosome, denoted 46,XY. Any variation from the standard karyotype may lead to developmental abnormalities.

SLIDE PREPARATION
Cells from bone marrow, blood, amniotic fluid, cord blood, tumor, and tissues (including skin, umbilical cord, chorionic villi, liver, and many other organs) can be cultured using standard cell culture techniques in order to increase their number. A mitotic inhibitor (colchicine, colcemid) is then added to the culture. This stops cell division at mitosis, which allows an increased yield of mitotic cells for analysis. The cells are then centrifuged, and the media and mitotic inhibitor are removed and replaced with a hypotonic solution. This causes the white blood cells or fibroblasts to swell, so that the chromosomes will spread when added to a slide, and also lyses the red blood cells. After the cells have been allowed to sit in the hypotonic solution, Carnoy's fixative (3:1 methanol to glacial acetic acid) is added. This kills the cells and hardens the nuclei of the remaining white blood cells. The cells are generally fixed repeatedly to remove any debris or remaining red blood cells. The cell suspension is then dropped onto specimen slides. After aging the slides in an oven or waiting a few days, they are ready for banding and analysis.

ANALYSIS
Analysis of banded chromosomes is done at a microscope by a clinical laboratory specialist in cytogenetics (CLSp(CG)). Generally 20 cells are analyzed, which is enough to rule out mosaicism to an acceptable level. The results are summarized and given to a board-certified cytogeneticist for review and to write an interpretation taking into account the patient's previous history and other clinical findings. The results are then reported using the International System for Human Cytogenetic Nomenclature 2009 (ISCN2009).

ALGORITHM DESCRIPTION
The algorithm described in this paper is composed of the following three sequential steps.
1) Image processing: In this step, the chromosome images, extracted from the unordered karyogram, are processed by histogram equalization, geometric distortion compensation, and dimensional scaling normalization (see Section II-A).
2) Feature extraction: In this step, discriminative features are extracted from the processed images, e.g., dimensions, normalized area, G-banding profile, and mutual information (MI) [26] between each pair of chromosomes in the karyogram (see Section II-B). The features extracted in this step are organized in a distance matrix containing the distances (using a given metric described later) between every two chromosomes in the karyogram.
3) Pairing: In this step, a combinatorial optimization problem is solved in order to obtain a permutation matrix that establishes the right correspondence between the chromosomes of each pair.
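Step 2 above relies on the mutual information (MI) between the band profiles of two chromosomes and collects pairwise distances in a matrix. The Python sketch below illustrates one way this could look, assuming each band profile has already been quantized to a few discrete gray levels. The function names, the toy profiles, and the conversion of MI into a dissimilarity as 1 / (1 + MI) are illustrative assumptions, not the metric defined in the paper.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram of paired gray levels
    pa = Counter(a)             # marginal histogram of the first profile
    pb = Counter(b)             # marginal histogram of the second profile
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # p(x, y) / (p(x) p(y)) written with raw counts
        mi += pxy * math.log2(pxy * n * n / (pa[x] * pb[y]))
    return mi


def distance_matrix(profiles):
    """Symmetric matrix of dissimilarities; 1 / (1 + MI) is a hypothetical choice."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mi = mutual_information(profiles[i], profiles[j])
            d[i][j] = d[j][i] = 1.0 / (1.0 + mi)
    return d


# Toy band profiles quantized to two gray levels (0 = dark band, 1 = light band).
profiles = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],  # same banding pattern as profile 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # statistically independent of profile 0
]
D = distance_matrix(profiles)
```

In this toy case profiles 0 and 1 share the same banding pattern, so their MI is 1 bit and their entry in D is the smallest, while profile 2 is independent of profile 0 and yields an MI of 0.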

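Step 1 above names histogram equalization as part of the image processing. A minimal Python sketch of the textbook form of that operation follows, assuming an 8-bit grayscale image stored as a list of rows. The function name and the toy image are hypothetical, and the implementation used for the paper may differ.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in [0, levels)."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return []
    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # Constant image: there is no contrast to spread, so return a copy.
        return [row[:] for row in image]
    scale = (levels - 1) / (n - cdf_min)
    # Lookup table mapping each input level to its equalized level.
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Toy low-contrast image: all gray levels lie between 50 and 54.
dark = [
    [50, 50, 51],
    [51, 52, 52],
    [53, 53, 54],
]
equalized = equalize_histogram(dark)
```

The narrow input range is stretched over the full 0 to 255 scale while the ordering of the gray levels is preserved, which is the effect the intensity compensation is after.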
MATERIALS AND METHODS

Slide preparation
The slide is aged using a salt solution usually consisting of 2X SSC (salt, sodium citrate). The slides are then dehydrated in ethanol, and the probe mixture is added. The sample DNA and the probe DNA are then co-denatured using a heated plate and allowed to re-anneal for at least 4 hours. The slides are then washed to remove excess unbound probe, and counterstained with 4',6-diamidino-2-phenylindole (DAPI) or propidium iodide.

Analysis
Analysis of FISH specimens is done by fluorescence microscopy by a clinical laboratory specialist in cytogenetics. For oncology, generally a large number of interphase cells are scored in order to rule out low-level residual disease; generally between 200 and 1000 cells are counted and scored. For congenital problems, usually 20 metaphase cells are scored.

Future of cytogenetics
Advances now focus on molecular cytogenetics, including automated systems for counting the results of standard FISH preparations and techniques for virtual karyotyping, such as comparative genomic hybridization arrays, CGH, and single nucleotide polymorphism arrays.

A new chromosome dataset LK1 [29] was created in collaboration with the Institute of Molecular Medicine, Lisbon, to test the classification and pairing algorithms on this type of "low" quality chromosomes for leukemia diagnosis purposes. The bone marrow cell chromosomes in this new dataset were manually segmented, correctly oriented, ordered, and annotated by the clinical staff to be used as ground truth data in the conducted tests. The images were acquired with a Leica Optical Microscope DM 2500. Some image preprocessing tasks, namely, noise reduction and chromosome segmentation, were manually performed with the Leica continuous wave (CW) 4000 Karyo software used by the clinical staff. The pairing ground truth was obtained manually by the technical staff of the Institute of Molecular Medicine, Lisbon, and used to assess the accuracy of the proposed pairing algorithms. To further validate the proposed algorithm, experiments were made by using the Grisan et al. dataset [30]. This dataset is of the same nature and quality as the Philadelphia, Edinburgh, and Copenhagen datasets because the images are based on cells extracted from the amniotic fluid and choroidal villi (prenatal cytogenetics).

Fig 4 a) Two different metaphase plates containing bone marrow chromosomes b) chromosomes from the Copenhagen dataset

The automatic pairing algorithm is composed of four main steps: 1) chromosome image extraction from the unordered karyogram and image processing, 2) feature extraction, 3) classifier training, and 4) pairing. In the next sections, these components are described in detail.

Fig 5 Very low quality karyogram

IMAGE PROCESSING
The image processing step aims at image contrast enhancement and compensation of geometric distortions observed in each chromosome that are not related to its intrinsic shape or size. The image brightness and contrast depend on the specific tuning of the microscope, and the particular geometric shape of each chromosome depends on the specific metaphase plaque from which the chromosomes were extracted. These effects must be compensated to improve the results of the pairing algorithm. The image processing step is composed of the following operations.
1) Chromosome extraction: Each chromosome is isolated from the unordered karyogram.
2) Geometrical compensation: The geometric compensation is needed to obtain chromosomes with a vertical medial axis. This compensation algorithm is composed of the following main steps:
a) chromosome and medial axis segmentation
b) axis smoothing
c) interpolation along orthogonal lines to the smoothed medial axis
d) border regularization
3) Shape normalization: The features used in the comparison of chromosomes are grouped into two classes: 1) geometric based and 2) pattern based (G-banding). To compare chromosomes from a band pattern point of view, geometrical and dimensional differences must be removed, or at least attenuated. Therefore, a dimensional scaling is performed before the pattern features are extracted, so that all the chromosomes have the same size and aspect ratio, by interpolating the original images.
4) Intensity compensation: The metaphase plaque from which the chromosomes are extracted does not present a uniform brightness and contrast. To compensate for this inhomogeneity, the spatially scaled images are histogram equalized.

Fig 6 Geometrical compensation. (a) Original image. (b) Chromosome and medial axis segmentation. (c) Axis smoothing. (d) and (e) Interpolation along orthogonal lines to the smoothed medial axis. (f) Border regularization
Fig 7 Dimension and shape normalization and intensity equalization. (a) Geometrically compensated image. (b) Spatial normalization. (c) Histogram equalization. (d) Band profile

CONCEPTS USED IN THIS PHASE
1) Image conversion
2) Denoising
3) Edge detection
4) Two dimensional convolutions

Image conversion
The toolbox includes many functions that you can use to convert an image from one type to another, listed in the following table. For example, if you want to filter a color image that is stored as an indexed image, you must first convert it to true color format. When you apply the filter to the true color image, MATLAB filters the intensity values in the image, as is appropriate. If you attempt to filter the indexed image, MATLAB simply applies the filter to the indices in the indexed image matrix, and the results might not be meaningful.

You can perform certain conversions just using MATLAB syntax. For example, you can convert a grayscale image to true color format by concatenating three copies of the original matrix along the third dimension:
RGB = cat(3,I,I,I);
The resulting true color image has identical matrices for the red, green, and blue planes, so the image displays as shades of gray.

In addition to these image type conversion functions, there are other functions that return a different image type as part of the operation they perform. For example, the region of interest functions return a binary image that you can use to mask an image for filtering or for other operations.

Denoising
We may define noise to be any degradation in the image signal caused by external disturbance. If an image is being sent electronically from one place to another, via satellite or wireless transmission, or through networked cable, we may expect errors to occur in the image signal. These errors will appear on the image output in different ways depending on the type of disturbance in the signal. Usually we know what type of errors to expect, and hence the type of noise on the image; hence we can choose the most appropriate method for reducing the effects. Cleaning an image corrupted by noise is thus an important area of image restoration.

Edge detection
Edges contain some of the most useful information in an image. We may use edges to measure the size of objects in an image, to isolate particular objects from their background, and to recognize or classify objects. There is a large number of edge finding algorithms in existence, and we shall look at some of the more straightforward of them.
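One of the more straightforward edge finders is the Sobel operator, which thresholds the magnitude of a 3 x 3 gradient estimate. The sketch below is a plain Python illustration under simplifying assumptions: a small grayscale image stored as a list of rows, border pixels left unmarked, and a hand-picked threshold. It is not the toolbox implementation, and the function name is hypothetical.

```python
def sobel_edges(image, threshold):
    """Binary edge map: 1 where the Sobel gradient magnitude exceeds threshold."""
    rows, cols = len(image), len(image[0])
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal intensity change
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical intensity change
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):          # border pixels stay 0
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = image[r + dr][c + dc]
                    gx += gx_kernel[dr + 1][dc + 1] * value
                    gy += gy_kernel[dr + 1][dc + 1] * value
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges


# Toy image: dark left half, bright right half, so there is one vertical edge.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(image, threshold=100)
```

Only the two interior columns that straddle the dark-to-bright transition are marked; a different threshold or kernel would give a thicker or thinner edge.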

The general Matlab command for finding edges is
edge(image,'method',parameters...)
where the parameters available depend on the method used.

Two dimensional convolutions
C = conv2(A,B) computes the two-dimensional convolution of matrices A and B. If one of these matrices describes a two-dimensional finite impulse response (FIR) filter, the other matrix is filtered in two dimensions. The size of C in each dimension is equal to the sum of the corresponding dimensions of the input matrices, minus one. That is, if the size of A is [ma,na] and the size of B is [mb,nb], then the size of C is [ma+mb-1,na+nb-1]. The indices of the center element of B are defined as floor(([mb nb]+1)/2).

C = conv2(hcol,hrow,A) convolves A first with the vector hcol along the rows and then with the vector hrow along the columns. If hcol is a column vector and hrow is a row vector, this case is the same as C = conv2(hcol*hrow,A).

C = conv2(...,'shape') returns a subsection of the two-dimensional convolution, as specified by the shape parameter.

CLASSIFIER
The support vector machine (SVM) is a classifier with good overall performance, insensitivity to feature dimensionality, and avoidance of over-fitting. Its training process guarantees a globally optimized solution, mapping inputs into a higher dimensional space wherein an optimized linear division with least errors and maximal margin is sought. A complete description of the theory of SVMs for pattern recognition is given in the SVM literature. An Online Support Vector Classifier (OSVC), which keeps removing support vectors from the old model and assigning new training examples weighted according to their importance, is thus proposed. In contrast to conventional off-line learning algorithms for classification, the on-line adaptivity is incorporated into the algorithmic design to accommodate the ever-changing experimental conditions.

The pairing process is a computationally hard problem because the optimal pairing must minimize the overall distance, i.e., the solution is the global minimum of the cost function. This problem can be stated as a combinatorial optimization problem. Moreover, it can be formulated as an integer programming problem, thus allowing for very efficient optimization methods. To do so, the cost function, as well as the constraints, have to be expressed by linear functions of the variables.

Considering n chromosomes (for n even), a pairing assignment P is defined as a set of ordered pairs (i, j), such that: 1) i ≠ j holds for any pair and 2) any given index i appears in no more than one pair of the set. A pairing assignment is said to be total if and only if, for any i = 1, . . . , n, there is exactly one pair (r, s) in the set such that either i = r or i = s. The sum of distances implied by a pairing P can be written as
C(P) = Σ D(i, j), summed over all pairs (i, j) in P    (11)
and the goal of the pairing process is to find a total pairing P that minimizes C(P). Note that the cost function (11) can be reformulated as a matrix inner product between the distance matrix D and a pairing matrix X = {x(i, j)}.
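To make these definitions concrete, the sketch below computes C(P) for a pairing, finds the minimizing total pairing by exhaustive search on a toy 4 x 4 distance matrix, and then checks that the same cost is obtained as half the inner product of D with a 0/1 pairing matrix. It is an illustration in Python with 0-based indices, not the optimization method of the paper. The function names and the toy distances are assumptions, and the pairing matrix is filled with x(i, j) = x(j, i) = 1 for each pair, which is the reading implied by a symmetric D and a factor of 1/2.

```python
def pairing_cost(D, pairing):
    """C(P): sum of D[i][j] over the pairs (i, j) of the pairing."""
    return sum(D[i][j] for i, j in pairing)


def best_total_pairing(D):
    """Exhaustively search all total pairings of indices 0..n-1 (n even)."""
    n = len(D)
    if n % 2:
        raise ValueError("a total pairing needs an even number of chromosomes")

    def search(remaining):
        if not remaining:
            return 0.0, []
        i, rest = remaining[0], remaining[1:]
        best = (float("inf"), [])
        for k, j in enumerate(rest):
            # Pair i with j, then pair up whatever is left.
            cost, pairs = search(rest[:k] + rest[k + 1:])
            cost += D[i][j]
            if cost < best[0]:
                best = (cost, [(i, j)] + pairs)
        return best

    return search(list(range(n)))


def pairing_matrix(n, pairing):
    """Symmetric 0/1 matrix X with X[i][j] = X[j][i] = 1 when i and j are paired."""
    X = [[0] * n for _ in range(n)]
    for i, j in pairing:
        X[i][j] = X[j][i] = 1
    return X


def inner_product(A, B):
    """Matrix inner product: sum of element-wise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Toy symmetric distance matrix: 0 is close to 1, and 2 is close to 3.
D = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
cost, pairing = best_total_pairing(D)
X = pairing_matrix(len(D), pairing)
half_inner = 0.5 * inner_product(D, X)
```

Exhaustive search visits (n - 1)!! total pairings, which is only workable for very small n; that growth is the practical reason for seeking a linear formulation that standard integer programming solvers can handle.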

The pairing matrix X is defined as follows:
x(i, j) = 1 if (i, j) ∈ P or (j, i) ∈ P, and x(i, j) = 0 otherwise.
The cost function then becomes linear with the pairing matrix X. It can be rewritten as
C(P) = (1/2) D · X    (13)
where "·" denotes the usual matrix inner product. The entries of this matrix are the parameters with respect to which (13) is to be minimized.

In order for the matrix X to represent a valid total pairing, this matrix has to satisfy constraints 1) and 2) mentioned earlier, which can be expressed in linear form as follows: constraint 1) is equivalent to stating that the main diagonal of X is all zeros, and constraint 2) corresponds to having one and only one entry equal to 1 in each row, as well as in each column. Constraining the domain of the matrix entries to be Boolean (i.e., x(i, j) ∈ {0, 1}), the latter is the same as requiring every row sum and every column sum of X to equal 1. The combinatorial optimization problem can then be restated as an integer programming problem, consisting of
minimize D · X
subject to the linear constraints above, with x(i, j) ∈ {0, 1}.

Biologists hope that after labelling only one or two "movies" of microscopy images, the updated classifiers will automatically classify new examples with better accuracy. Due to the ever-changing experimental conditions, the model has to be updated periodically. Recently, various online SVM algorithms have been proposed to extend the SVM to the online setting. However, before applying online SVM to the task of cell phase identification, three difficulties have to be circumvented. First, the classification problem should be addressed in the multiclass setting. Second, the data sets are critically imbalanced. For example, prophase plays an important role for the identification of the starting point of the mitosis process, but there are only about 140 examples of prophase in a movie of 200 frames of microscope images. The classification accuracy will be undesirably biased toward the classes with more samples. Last, sometimes, the classes with fewer samples may be more important than other classes. The standard online SVM algorithms are discussed in the binary classification setting without addressing the issues that ensue when different classes are of different levels of importance.

CONCLUSION
An array of algorithms has been developed to solve the SVM quadratic programming (QP) problem. They work by continually searching from the current support vector α and extending along the specified feasible direction u. The Sequential Minimal Optimization (SMO) algorithm chooses the direction with only two non-zero elements, which are determined by the so-called pair. However, data is supplied to all these algorithms in batches and thus a large amount of computation is involved.

