• Aucun résultat trouvé

Evaluate dynamic Kmeans algorithm for automatically segmented different breast regions in mammogram based on density by using seed region growing technique

N/A
N/A
Protected

Academic year: 2021

Partager "Evaluate dynamic Kmeans algorithm for automatically segmented different breast regions in mammogram based on density by using seed region growing technique"

Copied!
10
0
0

Texte intégral

(1)

automatically segmented different breast regions in mammogram based on density by using...

Article · January 2015

CITATION

1

READS

42

4 authors:

Some of the authors of this publication are also working on these related projects:

Policy-based ManagementView project

Segmentation et classification des images mammaires pour l’aide au diagnostic du cancer du sein View project

Abdelali Elmoufidi

Université Hassan II de Casablanca 12PUBLICATIONS 19CITATIONS

SEE PROFILE

Khalid El Fahssi

Faculté des Sciences Ain Chock - Casablanca 13PUBLICATIONS 19CITATIONS

SEE PROFILE

Said jai andaloussi

Université Hassan II de Casablanca 34PUBLICATIONS 70CITATIONS

SEE PROFILE

Abderrahim Sekkaki

Université Hassan II de Casablanca 84PUBLICATIONS 134CITATIONS

SEE PROFILE

(2)

EVALUATE DYNAMIC KMEANS ALGORITHM FOR AUTOMATICALLY SEGMENTED DIFFERENT BREAST

REGIONS IN MAMMOGRAM BASED ON DENSITY BY USING SEED REGION GROWING TECHNIQUE

1ABDELALI ELMOUFIDI, 2KHALID EL FAHSSI, 2SAID JAI ANDALOUSSI, 2ABDERRAHIM SEKKAKI

1Department of Mathematics & Computer Science, Faculty of sciences, Hassan II University, Casablanca, Morocco

2Department of Mathematics & Computer Science, Faculty of sciences, Hassan II University, Casablanca, Morocco

E-mail: 1[email protected], 2[email protected], [email protected], [email protected]

ABSTRACT

This paper presents a method for segment and detects the boundary of different breast tissue regions in mammograms by using dynamic Kmeans clustering algorithm after evaluate it by using Seed Based Region Growing (SBRG) techniques. Firstly, the K-means clustering is applied for dynamically and automatically divides mammogram into homogeneous regions according to the intensity of the pixel. From where automatically selected of seeds point and determined the threshold value for each region. Secondly, the region growing algorithm is applied with previously generated seeds point and threshold values as input parameter of SBRG. The main goal of this study is to evaluate the dynamic k-means clustering algorithm in the detection and segmentation of different breast tissue regions, which correspond to the density in mammograms. Segmentation of the mammogram into different mammographic densities is useful for risk assessment and qualitative or/and quantitative evaluation of density changes. In order to evaluate our proposed method, we used the well-known Mammographic Image Analysis Society (MIAS) database. The obtained qualitative and quantitative results demonstrate the efficiency of this method and confirm the possibility of its use in improving the computer-aided detection / diagnosis.

Keywords: Breast Density; Breast Segmentation; Medical Image Processing; Kmeans Clustering Algorithm; Region Growing Technique

1. INTRODUCTION

Breast cancer is one of the commonest types of cancer contributing to the increase in mortality among women worldwide. Recent statistics show that breast cancer affects one of every ten women in Europe and one of every eight in the United States [1]. Early detection and diagnosis of breast cancer is one of the most important factors affecting the possibility of recovery from the disease. For that reason, the mammography remains the best and most accurate tool in detecting breast cancer.

Mammography is an X-ray technique that was developed specifically for soft tissue radiography of the breast. It is based on the differential absorption

of X-rays between the various tissue components of the breast such as fat, fibroglandular tissue, extremely dense tissue (can be a tumor) and calcifications. This appearance is described as 'mammographic density'. The mammographic density part contains ducts, lobular elements, and fibrous connective tissue of the breast [2]. In medical community, breast tissue density is an important risk indicator for growth of breast cancer [2-4]. The proportion of fatty and fibroglandular tissue of the breast region is evaluated by a relatively small number of radiologists in the interpretation of mammographic images. The result is subjective and varies from one radiologist to another. Radiologists can miss a significant

(3)

proportion of abnormalities [4] and a large number of mammographic abnormalities may turn out to be benign after biopsy [2, 5]. In order to improve the accuracy of interpreting mammograms, a variety of computer-aided diagnosis (CAD) systems that perform computerized mammographic analysis have been proposed, as stated by Rangayyan et al.

[3, 4]. These systems are usually employed as supplement to the radiologist’s assessment. Thus, their role in modern medical practice is considered to be significant and important in the early detection of breast cancer.

The strong relationship between breast density and the risk of developing breast cancer has been proposed, firstly by Wolfe [3]. It has been confirmed by other researchers, such as Boyd et al [6] and Karssemeijer [7]. Breast Imaging Reporting and Data System (BI-RADS) of the American College of Radiology (ACR) aims to provide a standardized classification system for reporting mammographic breast densities [8]. BI-RADS 4th Edition classifies breast density into four major categories: (1) predominantly fat; (2) fat with some fibroglandular tissue; (3) heterogeneously dense;

and (4) extremely dense.

Breast density is an important factor in the interpretation of a mammogram. Good reading breast density plays a very important role in the early detection of breast cancer. However in a dense breast, the sensitivity of mammography for the early detection of breast cancers is reduced because of the difficulty in detection ill-defined cancers. Quantification of breast density remains a topical issue. Major studies published have shown that high breast density "endogenous" correlated to cancer risk breast. Therefore, segmentation of the mammogram into different mammographic densities is useful for risk assessment and quantitative evaluation of density changes.

Suckling et al. [8], automated segmentation of the glandular tissue can provide as an initial step in mammographic lesions detection.

Many researchers have published on segmentation of breast regions according to differences in density. For example, Saidin et al. [2]

proposed a method to segment the breast into four regions: background, skin-air boundary, fatty boundary and pectoral muscle boundary.

Karssemeijer [7] used an approach to subdivide a mammogram into three distinct regions: breast tissue, pectoral muscle and background. Adel et al [9]. Proposed a method to segment the breast regions into three distinct areas are: pectoral muscle, fatty and fibroglandular regions by using Bayesian techniques with adaptation of Markov

random field for detecting regions of different tissues on mammograms. El-Zaart et al [10]

segmented mammogram image into three regions, which are fibroglandular disc, breast region and background. Camilus et al [11]. Proposed an approach based the graph cuts algorithm to identify the pectoral muscle automatically.

In this paper, we propose a method by using dynamic K-means algorithm and seed region growing technique is able to automatically segment the breast tissue into two family areas: The first family is static, contains the background and the pectoral muscle. The second family is dynamic, contains a variable number of regions selected dynamically and automatically as breast density, shape, size, etc. But still belongs to the following list (Fatty, Fatty-Glandular and Dense-Glandular) in most areas we can have the same breast density but segmented separately because we do not have the spatial continuity of regions in the image mammogram. The minimum number of regions in a mammogram is three (background, pectoral muscle, homogeneous breast profile (100% Fatty or 100%

Fatty-Glandular or 100% Dense)), see more than three regions (four regions, five regions ...) in most cases (see fig.5, 6 and 7).

The setup of the paper is as follows: Section II-A describes the database that we used for demonstration and evaluation.

Section II-B describes the k-means algorithm.

In section II-C describes the region growing algorithm. In section III we present our image preprocessing techniques.

Section IV-A describes method of Segmentation and detection the boundary of

Figure 1: shows of different components in the image mammography.

(4)

different breast tissue regions. Results and discussion are presented in section V. We end by conclusion in section VI.

2. MATERIALS AND METHODS 2.1 Database

In this paper we used the mini-MIAS database [20], which contains 322 digitized mammogram images consisted of left and right breast images. The acquired mammogram images are classified into three distinct classes: Fatty (F) (106 images), Fatty-Glandular (G) (104 images) and Dense-Glandular (D) (112 images), similar to Mavroforakis et al. [22]. Size of these images is 1024 x 1024 pixels in Portable Greymap (PGM) format. Each pixel in the images is represented as an 8-bit word, where the images are in gray scale format with a pixel intensity of range [0, 255] [2].

2.2 K-means Algorithm

The k-means algorithm is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The principle of k-means algorithm is given below:

1. Define K cluster centers, either randomly or based on some heuristics.

2. Assigning each pixel to the nearest cluster is based on the minimum Euclidean distance between the point and the k cluster centers.

3. Re-compute the cluster centers.

4. Repeat step 2 and 3, a loop has been generated is the criterion stops the loop when the center does not move.

For a given set of N observation {S1, S2,...,SN} k- means algorithm segments the observation into k cluster {C1, C2,..., Ck}, their cluster center are {µ12,..., µk} (k<N) So as to minimize the within cluster sum of squares in equation (1).

Where:

K : Is the number of cluster centers;

ci : Is the number of data points in ith cluster;

x μ: Is the Euclidean distance between xi and µj;

μ : Is the mean of ith in c during each iteration; it

is as follows:

2.3 Seed Based Region Growing (SBRG)

Seeded region growing (SRG) algorithm for segmentation introduced by R. Adams et al [13], is a simple method of segmentation which is rapid and free of tuning parameters. It a better choice for easy implementation and applying it on a larger dataset. The only drawback of SRG algorithm is the difficulty in automating seed generation and dependency of output on order sorting of pixel as

Figure 2: Organization chart of the proposed method.

(5)

different order of processing pixels during region grow process leads to different final segmentation results [14].

Seeded region growing approach to image segmentation is to segment an image into regions with respect to a set of n seeds as presented in [14, 15] is discussed here. Given the set of seeds {S1, S2, …, Sn} each step of SRG involves identifying one additional pixel to one of the seed sets.

Moreover, these initial seeds are further replaced by the centroids of these generated homogeneous regions, {R1, R2, …, Rn} by involving the additional pixels step by step. The pixels in the same region are labeled by the same symbol and the pixels in variant regions are labeled by different symbols. All these labeled pixels are called the allocated pixels, and the others are called the unallocated pixels.

Where: N(x, y) is the second-order neighborhood of the pixel (x,y).For the unlabeled pixel (x; y)∈ T, N(x, y) meets just one of the labeled image region

R

i and define ψ(x; y) ∈ (1, 2,…; n) to be that index such that N(x; y) ∩ R (x; y) ≠ ∅

(x, y,Ri) is defined as the difference between the testing pixel at (x,y) and its adjacent labeled region Ri. (x, y, Ri) is calculated as:

Where: g(x, y) indicates the values of the pixel intensity at location (x, y) and gX , Y represents the average values of pixels from homogeneous region Ri, with gX , Y as the centroid of Ri. If N(x, y) meets two or more of the labeled regions, ψ(x; y) takes a value of i such that N(x, y) meets Ri and δ(x, y, Ri) is minimized.

Above seeded region growing process is repeated until all pixels in the image have been allocated to the corresponding regions. The above criterions ensure that the final partition of the image is divided into a set of regions as homogeneous as possible on the basic assumptions made initially.

3. OUR PROPOSED APPROACH

This section describes the details of our proposed method for segmentation and detection the boundary of different breast tissue regions in mammograms. The proposed method consists of three phases; firstly, we start by preprocessing the mammogram images, in order to:

1. Remove the digitization noise, 2. Enhance the contrast of breast profile.

3. Remove the additional background borders,

Secondly, used k-means clustering algorithm to automatically generate the number of clusters and classify pixels of the mammogram images into homogeneous sets plus determine of seeds points and thresholds values for each clusters.

Thirdly applied Seed Based Region Growing (SBRG) Techniques to segment and detect the boundary of different breast tissue regions in mammograms for object is evaluate the dynamic kmeans clustering in segmentation of mammograms.

2.1. Mammogram image pre-processing

As in typical film scanned, the digitization of the mammography images can cause some noises at the result image. In the MIAS database several types of noise and imaging artifacts are present [12]. Therefore, computer image processing techniques will be applied to enhance the quality of images:

Figure 3: Noise suppression and contrast enhancement: (a) Original image, (b) Noise removal and breast contrast enhancement, (c) Separation breast profile from background.

(6)

284 2.2. Breast profile extracted

In this phase, the aim is to extract breast profile

region from background, firstly a threshold value is used to transform gray mammogram image to binary mammogram image. The value of this threshold is calculated from the minimum intensity values between the initial two most significant peaks of mammogram image histogram; these peaks represent the background and the breast [1].

Secondly, connected component is used to extract the largest component which is the breast (see fig.3 (c)).

2.3. Digitization Noise Removal and Breast Contrast Enhancement

In this phase we used a two-dimensional (2D) median filtering in a 3-by-3 neighborhood connection to remove noise. Additionally, the mammogram is usually basically low contrast [1], therefore a step of enhancement of contrast is necessary for this purpose we used filter means.

(See fig.4).

4. SEGMENTATION AND DETECTION THE OUNDARY OF DIFFERENT BREAST TISSUE REGIONS

3.1. Automatically generated the input parameter’s region growing

One of the drawbacks of the k-means algorithm is the number of clusters which will divide the data set, which must be determined by the user as an input parameter. In database of mammogram images, the intensity, texture and shape are changing from one image to another. Therefore, taking a fixed number of regions and the same regions for all images of database is irrelevant. For this reason we have proposed two approaches for dynamically and automatically determined the regions of interests for each mammogram based only on these characteristic (the intensity, texture and shape ...) "Detection of Regions of Interest's in Mammograms by Using Local Binary Pattern, Dynamic K-Means Algorithm and Gray Level Co-occurrence Matrix" Elmoufidi et al [16]

[17]. So the main goal of this study is to evaluate the dynamic k-means clustering algorithm in the segmentation of different breast tissue regions in mammogram by using Seed Based Region Growing as an evaluation technique. Firstly; we started by using kmeans clustering to divide a mammogram to different density tissue region, automatically selected the seeds point and determinate threshold values of each regions. Secondly applied Seed Region Growing technique for redetect and Figure 5 : Segmentation and detection the

boundary of different breast tissue regions of mdb184 and mdb028: (a) Original image, (b) Mammography image after preprocessing step, (c) Detect the different breast tissue regions by Figure 4: Separation of breast profile region from

background:

(7)

segmented the boundary of different breast tissue region in mammogram based on input parameter selected by kmeans algorithm. Our object is to evaluate the dynamic kmeans algorithm we used in our previous study.

3.2. EXPERIMENTAL RESULTS 4.2.1. Experiment 1: Fatty Tissue

The first example deals with a mammogram which is predominantly comprised of fatty tissues. Fig.5 respectively shows the results for MIAS images mdb184 and mdb028.

4.2.2. Experiment 2: Fatty-Fibroglandular Tissue

The second example deals with a mammogram which is predominantly comprised of glandular tissue. Fig.6 respectively shows the results for MIAS images mdb019 and mdb120.

4.2.3. Experiment 3: Dense -Fibroglandular

Tissue

The third example deals with a mammogram which is predominantly comprised of dense tissue. Fig.7 respectively shows the results for MIAS images mdb063 and mdb107.

5. RESULTS AND DISCUSSION

In some cases the elements of the discontinuity is located in the same region is a drawback of the algorithm kmeans. But in image segmentation mammograms this discontinuity is an advantage and a highlight of the kmeans algorithm, because we are not interested in the spatial information, our goal is the detection of suspicious regions even if they are discontinuous especially in the case of micro-calcifications pixels belong to the same region are still separated. Then, according to the treatment of many classes of disadvantage freeze user in previous studies [16] by the motion of the methods for automatic and dynamic generation number class. Now we are faced with an effective method for distributed image mammogram different region according to his breast density to extract the suspect regions more small areas that can be represent a micro calcification or metastasis which cannot be detected with other methods such as growing seed region, for example (see FIG 4, 5 and 6).

The most essential requirement from a radiologist point of view for image processing algorithms is the ability to achieve enhanced visualizations of anatomical structure, while preserving the detail of the structure [12]. There are many researches, which worked on the segmentation and classification of glandular tissues. Each segmentation and classification result needs evaluation of its performance. There are three types of performance evaluations of algorithms and approaches proposed for medical imaging processing (detection of regions of interest, segmentation and classification): The first type involves qualitative assessment, the second is quantitative assessment involving the ground truth evaluation, and the third is a statistical evaluation [12]. Performance evaluation for research on classification of breast density involves comparison of research result with density class that has been given by radiologist, while performance evaluation for segmentation of breast density usually is done in qualitative analysis. This is because of the difficulty in obtaining the ground truths from radiologist [12]. The quantitative analysis is performed only by a small number of researches.

For the quantitative analysis, usually the performance of the segmentation results is compared with the ground truth by the radiologist.

Ground truth in these density based research means, a correct marking of the glandular tissue or density area by the radiologist in a digital mammogram.

Figure 6: Segmentation and detection the boundary of different breast tissue regions of mdb019 and mdb120:

(a) Original image, (b) Mammography image after preprocessing step, (c) Detect the different breast tissue regions by using kmeans algorithm, (d) Segment and detect the boundary of different breast tissue regions by

(8)

Normally the evaluation of our algorithm is qualitative, because we do not seek a well-defined

object in the image mammogram, but our goal is the segmentation of different breast tissue. Our method for a comprehensive utilization and for any image mammogram regardless of the intensity levels, and whether that contains of lesions or normal. But to have quantitative results, we took pictures contains lesions and calculate the rate overlapped between the regions segmented by our algorithm and critical regions segmented by a radiologist. Like that quantitative results are obtained. The evaluation criterion is the overlapped area ratio [21]. The proposed method is tested by using the Mini-MIAS database [20] as previously mentioned in section II-(A). Also, we used some information offered by the (MIAS) database, such as:

1. Class of abnormality;

2. Image-coordinates of center of abnormality;

3. Radius (in pixels) of a circle enclosing the abnormality.

In the case of (MIAS) database, the criterion region is the circle formed by the coordinates of center and radius. Overlap criteria have been used by several mass segmentation techniques [18, 19] and is described by equation 6.

CDR Overlap% ∗ 100%

(6) Where:

TP: True Positives, FN: False Negatives, FP: False Positives and CDR: Correct Detection Ratio

6. CONCLUSION

In this work we presented an approach of the detection and segmentation of the different breast tissue regions in mammograms based on dynamic Figure 8: Segmentation and detection the boundary of

different breast tissue regions of mdb063 and mdb107:

(a) Original image, (b) Mammography image after preprocessing step, (c) Detect the different breast tissue regions by using kmeans algorithm, (d) Segment and detect the boundary of different breast tissue regions by using SBRG.

Figure 9 : Notion of: True Positive, False Positive and False Negative.

Figure 7 : Region detected by our algorithm (marked by a yellow line) and region mentioned by radiologist (marked by a white circle)

(9)

k-means clustering algorithm plus evaluate it by using region growing technique. This approach consists of three main stages which are: (1): Image preprocessing steps (2): Dynamically and automatically detected of different breast tissue regions in mammograms. (3) Redetected and segmented of boundary of different breast tissue regions by using region growing technique.

Applied dynamic k-means clustering algorithm for automatically and dynamically segmented and detected of boundary of different region breast tissue has an important role. The results of the algorithm can contribute to solving the main problem in mammography image processing such as detection, diagnostic and classification. The Efficiency of the proposed method confirms the possibility of its use in improving the computer-aided detection / diagnosis.

REFRENCES:

[1] Ali El-Zaart et al, "Breast Segmentation using K- means Algorithm with A Mixture of Gamma Distributions", IEEE 3rd SBNFI, Lebanon. May 28 and 29, 2012.

[2] N. Saidin et al, "Density based breast segmentation for mammograms using graph cut and seed based region growing techniques", inProceedings of the 2nd International Conference on Computer Research and Development (ICCRD '10), pp. 246-250, Kuala Lumpur, Malaysia, May 2010.

[3] J. N. Wolfe, "Risk for breast cancer development determined by mammographic parenchymal pattern", Cancer, vol. 37, no. 5, pp. 2486-2492, May 1976.

[4] F. M. Hall et al, "Nonpalpable breast lesions:

Recommendations for biopsy based on suspicion of carcinoma at mammography", Radiology, vol. 167, 1988, 353-358.

[5] R. L. Birdwell et al, "Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection", Radiology, vol. 219, 2001, 192-202.

[6] N. F. Boyd et al, "Quantitative classification of mammographic densities and breast cancer risk:

results from the Canadian National Breast Screening study", J. Nat. Cancer Inst., vol.

87(9), 1995, pp. 670-675.

[7] N. Karssemeijer, "Automated classification of parenchymal patterns in mammograms", Phys.

Med. Biol., vol. 43, 1998, pp. 365-378.

[8] American College of Radiology. American College of Radiology Breast Imaging Reporting and Data System (BIRADS). 4th ed, American College of Radiology, Reston, VA (2003).

[9] Adel et al , "Statistical Segmentation of Regions of Interest on a Mammographic Image", EURASIP Journal on Advances in Signal Processing 2007, Article ID 49482, 1-8 (2007).

[10] A.El-Zaart et al, "Expectation Cmaximization technique for fibro-glandular discs detection in mammography images", Comput. Biol. Med.

40(4), 392-401 (2010).

[11] K. S. Camilus et al, "Computer- Aided Identification of the Pectoral Muscle in Digitized Mammograms", Journal of digital imaging, October 2009, pp. 1-19.

[12] Stylianos.D et al, "A fully automated scheme for mammographic segmentation and classification based on breast density and asymmetry", computer methods and programs in biomedicine 2011, 47-63.

[13] R. Adams and L. Bischof. "Seeded region growing", IEEE Trans, Pattern Analysis Machine Intelligence, 16(6):641-647, Jun 1994.

[14] Harikrishna Rai G.N et al, "Gradient Based Seeded Region Grow method for CT Angiographic Image Segmentation", InterJRI Computer Science and Networking, Vol. 1, Issue 1, July 2009.

[15] Jianping Fan, Guihua Zeng, Mathurin Body, Mohand-Said Hacid, "Seeded region growing:

an extensive and comparative study", Pattern Recognition Letters 26 (2005), 1139-1156.

[16] A. Elmoufidi et al, "Detection of Regions of Interest in Mammograms by Using Local Binary Pattern and Dynamic K-Means Algorithm", International Journal of Image and Video Processing: Theory and Application Vol.

1, No. 1, 30 April 2014 ISSN: 2336-0992.

[17] Khalid El Fahssi, Abdelali Elmoufidi, Abdenbi Abenaou, Said Jai-Andaloussi, Abderrahim Sekkaki, "Feature extraction of the lesion in mammogram images using segmentation by minimizing the energy and orthogonal transformation adaptive", WSEAS

TRANSACTIONS on BIOLOGY and

BIOMEDICINE , Volume 11, 2014, E-ISSN:

2224-2902.

[18] A. Mencattini et al, "Automatic breast masses boundary extraction in digital mammography using spatial fuzzy c-means clustering and active contour models", IEEE International Workshop on Medical Measurements and Applications (MEMEA '11), 2011, Bari ,Italy.

(10)

288 [19] R. Ferrari, "Segmentation of the fibro-glandular

disc in mammogrms using Gaussian mixture modelling", Medical and Biological Engineering and Computing 42 (3) (2004) 378- 387.

[20] Suckling, J. et al, "The Mammographic Image Analysis Society digital mammogram database", Exerpta Medica, International Congress Series 1069 pp. 375-378., 1994.

[21] J.Sethian, "Theory, algorithms, and applications of level set methods for propagating interfaces", Acta Numerica, Volume 5 January 1996, pp 309-395.

[22] M. Mavroforakis et al, "Significance analysis of qualitative mammographic features, using linear classifiers", neural networks and support vector machines, European Journal of Radiology 54 (1) (2005) 80-89.

Références

Documents relatifs

Our method consists of three phases: firstly, preprocessing images by using thresholding and filtering methods; secondly, generating range of number of clusters by using Local

While the broadcast flooded the network with 8.4✂N messages, the geographic routing with its location service sent to the network 5✂N messages, and PTH sent 2.4✂N messages. Not

[r]

We apply a clustering approach based on that algorithm to the gene expression table from the dataset “The Cancer Cell Line Encyclopedia”.. The resulting partition well agrees with

Keywords: breast image extraction from a mammogram, binarization, quanti- zation, threshold processing, scaling, fast wavelet transform, arithmetic mean

If we join centers of all clusters and the starting position of a drone by an area of cells of double size (e.g, using minimal spanning tree (MST) or Steiner tree, Fig. 6) and then

Qian [17], developed a graph based semi-supervised learning (SSL) scheme using deep convolutional neural net- work (CNN) for breast cancer diagnosis with a small portion of labeled

R´ esum´ e. Ce TP a pour but de mettre en pratique deux m´ ethodes couramment utilis´ ees en ap- prentissage statistique, ` a savoir l’algorithme K-means et l’ACP. Dans la