
Enterprise Data Mining: A Review and Research Directions


4.6 Process Related

Manufacturing process related documents originate from different sources and exist in various formats, such as digital text, paper, and audio. Huang et al. (2006) proposed a rough-set-based approach to improve document representation and to induce classification rules. The proposed approach was shown to achieve higher user satisfaction than the vector space method. Other process related data mining studies are reviewed by industry area in the following subsections.

4.6.1 For the Semi-Conductor Industry

Saxena (1993) described how Texas Instruments isolated faults during semiconductor manufacturing using automated discovery from wafer tracking databases. Associations were first generated from prior wafer grinding and polishing data to identify interrelationships among processing steps. To reduce the search space of the discovered associations, domain filters were incorporated. In addition, an interestingness evaluator tried to identify patterns such as outliers, clusters, and trends; only those patterns with an interestingness value above a set threshold were retained.

Table 5. Summary of logistics related data mining studies. (Columns: Reference; Goal; Databases/Data Description; Data size actually used; Preprocessing; Data Mining Algorithm.)

Bertino et al. (1999) reported their experience in using data mining techniques, particularly association rules and decision trees, to analyze data about the wafer production process with the goal of determining possible causes of production errors in less time (from several days down to a few hours). They showed that two commercial tools, Mineset and Q-Yield, were inadequate for this fault detection problem and thus developed a new graph-based algorithm.

Significant combinations of process attributes and the interest order are represented as a directed graph, called the interest graph. As a result of the interest order, the visiting of nodes corresponds to a slightly modified breadth-first search of the graph. The algorithm returns a set of certain causes.

It is thus not always easy to determine right away which data mining technique works best for the problem at hand.

Chen et al. (2004) proposed an integrated processing procedure, RMI (Root cause Machine Identifier), to discover the root cause of defects. The procedure consists of three sub-procedures: data preprocessing to transform raw data into the records to be considered, Apriori-based candidate generation, and interestingness ranking based on a newly proposed measure called continuity. Continuity is used to evaluate the degree of continuity of defects in the products in which a target machine-set is involved. A higher continuity value means that the frequency of defect occurrence in the involved products is higher and that the corresponding machine-set is more likely to be the root cause.
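The candidate-generation step of such a procedure can be illustrated with a short, hedged sketch: machine-set candidates are generated with a standard Apriori implementation (here, the mlxtend package), and a simple run-length-based score stands in for the continuity measure, whose exact definition belongs to Chen et al. (2004) and is not reproduced here. All data and thresholds below are hypothetical.

```python
# Hedged sketch of Apriori-based machine-set candidate generation plus an
# *illustrative* continuity-like score (a stand-in, not the authors' measure).
import pandas as pd
from mlxtend.frequent_patterns import apriori

# Hypothetical lot records: which machines touched each lot, in processing order.
lots = pd.DataFrame({
    "M1": [1, 1, 0, 1, 1, 0],
    "M2": [1, 1, 1, 1, 0, 1],
    "M3": [0, 1, 1, 0, 1, 1],
}).astype(bool)
defective = pd.Series([1, 1, 0, 1, 0, 0])   # 1 = defective lot, in time order

# Apriori-based candidate generation: frequent machine-sets among all lots.
candidates = apriori(lots, min_support=0.5, use_colnames=True)

def continuity_like(machine_set):
    """Illustrative score: longest run of consecutive defective lots among the
    lots processed by every machine in the set, normalized by their count."""
    mask = lots[list(machine_set)].all(axis=1)
    flags = defective[mask].tolist()
    best = run = 0
    for f in flags:
        run = run + 1 if f else 0
        best = max(best, run)
    return best / len(flags) if flags else 0.0

candidates["continuity_like"] = candidates["itemsets"].apply(continuity_like)
print(candidates.sort_values("continuity_like", ascending=False))
```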

Karim et al. (2006) proposed some modifications to the original growing self-organizing map for manufacturing yield improvement by clustering. The modifications include the introduction of a clustering quality measure to evaluate the performance of the program in separating good from faulty products and a filtering index to reduce noise in the dataset. To investigate the huge amount of semiconductor manufacturing data and infer possible causes of faults and manufacturing process variations, Chien et al. (2007) developed a data mining and knowledge discovery framework that consists of the Kruskal-Wallis test, k-means clustering, and the ANOVA F-test as the variance reduction splitting criterion for building a decision tree. The viability of the proposed framework was demonstrated using a case study, which involved the analysis of some low CP yield lots in order to find the root causes of a low yield problem. Readers are referred to Chapter 8 for another work of Dr. Chien and his associate.
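As a rough illustration of the first two stages of such a framework, the following sketch screens hypothetical process parameters with the Kruskal-Wallis test and then clusters lots with k-means; the ANOVA-F-based decision tree induction of Chien et al. (2007) is not reproduced. All data, thresholds, and parameter names are assumptions.

```python
# Minimal sketch: Kruskal-Wallis screening of process parameters across
# low- vs. normal-yield lots, followed by k-means clustering of the lots.
import numpy as np
from scipy.stats import kruskal
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_lots, n_params = 200, 10
X = rng.normal(size=(n_lots, n_params))          # hypothetical process parameters
low_yield = rng.integers(0, 2, size=n_lots)      # 1 = low CP yield lot
X[low_yield == 1, 3] += 1.5                      # parameter 3 secretly drives low yield

# Kruskal-Wallis screening: keep parameters whose distributions differ by yield group.
pvals = np.array([kruskal(X[low_yield == 0, j], X[low_yield == 1, j]).pvalue
                  for j in range(n_params)])
selected = np.where(pvals < 0.01)[0]
print("suspect parameters:", selected)

# k-means on the suspect parameters groups lots with similar process conditions.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, selected])
for c in range(3):
    print(f"cluster {c}: low-yield rate = {low_yield[clusters == c].mean():.2f}")
```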

Cunningham and MacKinnon (1998) discussed statistical methods used to distill large quantities of defect data into the information needed for a quick understanding of low yield. In particular, they proposed a spatial pattern recognition algorithm that employs defect parsing and data transformation via the Hough transform to detect collinear spatial patterns.
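A minimal sketch of Hough-based detection of collinear (scratch-like) defect patterns is shown below, using scikit-image's hough_line on a synthetic binary wafer map; the defect-parsing and data-transformation details of the cited work are not reproduced.

```python
# Hedged sketch: defects on a synthetic binary wafer map vote for lines in
# (angle, distance) space; strong peaks indicate collinear (scratch) patterns.
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

rng = np.random.default_rng(1)
wafer = np.zeros((64, 64), dtype=bool)           # hypothetical binary defect map
rows = np.arange(10, 50)
wafer[rows, rows + 5] = True                     # a diagonal scratch of defects
wafer[rng.integers(0, 64, 30), rng.integers(0, 64, 30)] = True   # random defect noise

accumulator, angles, dists = hough_line(wafer)
peaks = hough_line_peaks(accumulator, angles, dists,
                         threshold=0.5 * accumulator.max())

for votes, angle, dist in zip(*peaks):
    print(f"collinear pattern: {votes} defects on a line at angle "
          f"{np.rad2deg(angle):.1f} deg, distance {dist:.1f}")
```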

Gardner and Bieker (2000) presented three case studies of Motorola semiconductor wafer manufacturing problems. Self-organizing neural networks and rule induction were used together, implemented in CorDex, a tool developed in-house by Motorola, to identify the critical poor-yield factors from normally collected wafer manufacturing data, to explain the wild variation in transistor beta of the bipolar devices manufactured for an automotive application, and to find the cause of an intermittent yield problem in a high-yield wafer line that manufactures discrete power devices used in automobile ignition applications. Using the data mining technology, wafer yield problems were solved ten times faster than with the standard approach; yield increases ranged from 3% to 15%; and endangered customer product deliveries were saved. Li et al. (2006b) presented a genetic programming approach for predicting and classifying product yields and for uncovering the significant factors that might cause low yield in semiconductor manufacturing processes. They tested their approach on a real dataset from a DRAM fab and compared it with C4.5.

Chen and Liu (2000) used an adaptive resonance theory network (ART1) to recognize spatial defect patterns on wafers. This information could then be used to aid in the diagnosis of failure causes. Because the total number of dies for this wafer product was 294, the ART1 network had 294 input nodes. The number of output nodes was seven, corresponding to the number of defect patterns. A self-organizing map (SOM) was also used for comparison. The training data consisted of 35 wafers, five for each defect pattern. The results showed that ART1 recognized similar spatial defect patterns more easily and more accurately.

Han et al. (2005) described the use of the decision tree technique to automatically recognize and classify failure patterns from fail bit maps.

Wang et al. (2006) proposed an on-line diagnosis system based on denoising and clustering techniques to identify spatial defect patterns for semiconductor manufacturing. First, a spatial filter was used to determine whether the input data contained any systematic cluster and to extract it from the noisy input. Then, an integrated clustering scheme that combined fuzzy c-means with hierarchical linkage was applied to distinguish different types of defect patterns. Furthermore, a decision tree based on two cluster features (convexity and eigenvalue ratio) was applied to separate the patterns and provide decision support for quality engineers. Hsu and Chien (2007) proposed a framework that integrated spatial statistics, ART1 networks, and domain knowledge to improve the efficiency of wafer-bin-map clustering.
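To illustrate the final decision-tree step, the sketch below computes two cluster features loosely corresponding to convexity and eigenvalue ratio for synthetic blob-like and scratch-like defect clusters and trains a small tree on them. The feature definitions (fraction of points on the convex hull, ratio of covariance eigenvalues) are plausible stand-ins rather than the authors' exact formulas.

```python
# Hedged sketch: two geometric features per defect cluster, fed to a decision tree.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.tree import DecisionTreeClassifier

def cluster_features(points):
    """points: (n, 2) defect die coordinates belonging to one extracted cluster."""
    hull = ConvexHull(points)
    convexity = len(hull.vertices) / len(points)         # stand-in convexity feature
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(points.T)))
    eig_ratio = eigvals[-1] / max(eigvals[0], 1e-9)      # elongation of the cluster
    return [convexity, eig_ratio]

rng = np.random.default_rng(0)
blob_clusters, line_clusters = [], []
for _ in range(10):
    blob_clusters.append(rng.normal(size=(60, 2)))                  # blob-like cluster
    t = rng.uniform(0, 10, 60)
    line_clusters.append(np.c_[t, t + rng.normal(0, 0.1, 60)])      # scratch-like cluster

X = np.array([cluster_features(p) for p in blob_clusters + line_clusters])
y = np.array([0] * 10 + [1] * 10)    # 0 = blob pattern, 1 = scratch/line pattern

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print("predicted pattern types:", tree.predict(X))
```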

Lee et al. (2001) applied data mining techniques, including a SOM neural network for clustering, a statistical homogeneity test to merge clusters, and interactive exploratory data analysis of SOM weight vectors, to wafer bin map data in order to design an effective in-line measurement sampling method. Rietman et al. (2001) presented a large system model capable of producing Pareto charts for several yield metrics through sensitivity analysis, for a fab devoted to manufacturing a transistor structure known as the gate. These Pareto charts were then used to target specific processes, among the twenty-two modeled, for improvement of the yield metrics.
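A minimal SOM-clustering sketch in the spirit of the first step of Lee et al. (2001) is given below, using the MiniSom package as an illustrative SOM implementation on hypothetical binary wafer bin maps; the statistical homogeneity test used to merge clusters is omitted.

```python
# Hedged sketch: assign hypothetical binary wafer bin maps to SOM nodes,
# which act as candidate clusters for further merging and inspection.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
maps = rng.integers(0, 2, size=(200, 300)).astype(float)   # hypothetical bin maps

som = MiniSom(4, 4, 300, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(maps)
som.train_random(maps, 1000)

# Each wafer is assigned to its best-matching SOM node (a candidate cluster).
assignments = [som.winner(m) for m in maps]
print("wafers in node (0, 0):", sum(1 for a in assignments if a == (0, 0)))
```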

Bergeret and Le Gall (2003) proposed a Bayesian method to identify the process stage at which a yield drift, as seen at electrical or class-probe tests, has occurred. The approach is based only on the process dates of all the process stages. They demonstrated the efficiency of their approach using two real yield issues where the defective stage was known. Note that this method requires sufficient lot mixing along the process flow and some minimal number of defective lots. Besse and Le Gall (2005) used two change detection methods to identify a defective stage within a manufacturing process. One was a Bayesian method with the use of a reversible-jump Markov Chain Monte Carlo computation (Green, 1995) and the other was based on an optimal segmentation of a random process (Lavielle, 1998). To prevent false alarms, two complementary approaches were used, one based on the theory of shuffling a deck of cards and another based on bagging and hypothesis testing. Three examples with known solutions were presented to show that the Bayesian method was efficient in highlighting the defective stage but more prone to false alarms. Optimal segmentation was also efficient, but it required more parameters to be fixed than the Bayesian method did.
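The core change-detection idea can be made concrete with a deliberately simple sketch: a single least-squares change point in a lot-yield series. This is far simpler than the Bayesian (reversible-jump MCMC) and optimal-segmentation methods cited above and is shown only for illustration; the data are synthetic.

```python
# Simplified stand-in for yield-drift change detection: find the split index
# of a lot-yield series that minimizes the within-segment squared error.
import numpy as np

rng = np.random.default_rng(0)
yield_series = np.concatenate([rng.normal(0.92, 0.02, 80),    # before the drift
                               rng.normal(0.85, 0.02, 40)])   # after a defective stage

def single_change_point(x):
    """Return the split index minimizing total within-segment squared error."""
    n = len(x)
    costs = [np.var(x[:k]) * k + np.var(x[k:]) * (n - k) for k in range(5, n - 5)]
    return int(np.argmin(costs)) + 5

cp = single_change_point(yield_series)
print(f"estimated drift starts around lot {cp} "
      f"(mean yield {yield_series[:cp].mean():.3f} -> {yield_series[cp:].mean():.3f})")
```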

Last and Kandel (2002) presented a novel, fuzzy-based method for automating the cognitive process of comparing frequency histograms obtained from an engineering experiment for process improvement at a semiconductor factory. The method involves first calculating the membership grades of per-interval proportion differences in the “small” and the “bigger” fuzzy sets and then evaluating the overall shift between the compared distributions, with three possible outcomes: positive shift, negative shift, and no shift. The method was found to provide a more accurate representation of the experts’ domain knowledge than several statistical tests.
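A hedged sketch of this kind of fuzzy histogram comparison is given below; the membership functions, the aggregation rule, and the decision thresholds are illustrative assumptions, not the ones defined by Last and Kandel (2002).

```python
# Hedged sketch: grade per-bin proportion differences against "small"/"bigger"
# fuzzy sets and aggregate them into a shift verdict.
import numpy as np

def fuzzy_shift(before, after, bins=10):
    p1, edges = np.histogram(before, bins=bins)
    p2, _ = np.histogram(after, bins=edges)
    d = p2 / p2.sum() - p1 / p1.sum()            # per-interval proportion differences

    mu_small = np.exp(-np.abs(d) / 0.02)         # grade of "difference is small"
    mu_bigger = 1.0 - mu_small                   # grade of "difference is bigger"

    # Weight the "bigger" differences by bin position: mass moving to higher
    # bins pulls the score positive, mass moving to lower bins pulls it negative.
    centers = (edges[:-1] + edges[1:]) / 2
    score = np.sum(mu_bigger * d * np.sign(centers - centers.mean()))
    if mu_small.mean() > 0.8:
        return "no shift", score
    return ("positive shift", score) if score > 0 else ("negative shift", score)

rng = np.random.default_rng(1)
print(fuzzy_shift(rng.normal(0, 1, 500), rng.normal(0.4, 1, 500)))
```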

Considering time series to be composed of segments between change points, Ge and Smyth (2000) formulated the problem of change point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. This segmental semi-Markov model is an extension of the standard hidden Markov model (HMM), from which learning and inference algorithms are extended. The semi-Markov part of the model allows for an arbitrary distribution of the location of the change point (equivalently, state duration) whereas the segmental part allows for flexible modeling of the data within individual segments. The proposed method was shown to be useful to detect the end of the plasma etch process which is quite important for reliable wafer manufacturing.
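As a much-simplified stand-in, the sketch below segments a synthetic end-point signal with a plain two-state Gaussian HMM (using the hmmlearn package) and reads the change point off the decoded state sequence; the segmental and semi-Markov extensions of the original model are not reproduced.

```python
# Hedged sketch: a plain two-state Gaussian HMM segments an etch end-point
# signal; the change point is where the decoded state switches.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(1.0, 0.05, 120),    # plasma etch in progress
                         rng.normal(0.6, 0.05, 60)])    # emission drops at end-point
X = signal.reshape(-1, 1)

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=100, random_state=0)
model.fit(X)
states = model.predict(X)
change_points = np.where(np.diff(states) != 0)[0] + 1
print("detected end-point near sample:", change_points)
```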

Braha and Shmilovici (2002) applied three classification-based data mining methodologies to better understand the laser cleaning mechanisms for removing micro-contaminants harmful to wafer manufacturing, and to identify the attributes that are significant in the cleaning process. Two groups of input variables were considered: energy factors with 7 variables and gaseous flow factors with 4 variables. The performance of the cleaning process was measured by the percentage of particles moved from the original location (%Moval) and the percentage of particles removed from the target wafer (%Removal). The two performance indices were continuous and were converted into a finite number of discrete classes before applying the three methodologies: decision trees, neural networks, and composite classifiers. Some experimental data were used in the study, and the results indicated that the strategy of building a diverse set of classifiers from different model classes performed better than other strategies.
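The "diverse model classes" strategy can be sketched as follows: a continuous response standing in for %Removal is discretized into classes, and a composite (voting) classifier is built from a decision tree, a neural network, and a nearest-neighbor model. The data, class cut-offs, and choice of component models are hypothetical.

```python
# Hedged sketch: discretize a continuous cleaning-performance response and
# combine classifiers from different model classes in a voting ensemble.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 11))                     # 7 energy + 4 gaseous-flow factors
removal = 50 + 10 * X[:, 0] - 8 * X[:, 7] + rng.normal(0, 5, 300)   # % removed
y = np.digitize(removal, bins=[40, 60])            # low / medium / high classes

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("mlp", make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="hard",
)
print("composite accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```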

Braha and Shmilovici (2003) performed an exploratory data mining study of an actual lithographic process comprising 45 sub-processes. Based on the records of 1,054 unique lots of 13 different 0.7-micron products, a decision tree induction algorithm called C4.5, implemented in the KnowledgeSEEKER environment, was employed to enhance the understanding of the intricate interactions between different processes and to extract high-level knowledge that can be used to enhance the overall process quality. Given a historical dataset of wafer input variables and their corresponding critical dimension (CD) classes, a decision tree was induced that could identify the CD class to which a new set of input variables most likely belongs. Braha et al. (2007) developed a model for evaluating classifiers in terms of their value in decision-making. Based on this decision-theoretic model, they proposed two robust ensemble classification methods that construct composite classifiers which are at least as good as any of the component classifiers for all possible payoff functions and class distributions. They showed how these two robust ensemble classification methods could be used to improve the prediction accuracy of the yield and flow time of every batch in a real-world semiconductor manufacturing environment.

Lada et al. (2002) proposed a general procedure for detecting faults in a time-dependent rapid thermal chemical vapor deposition (RTCVD) process based on a reduced-size dataset, with the following steps: (a) data reduction; (b) construction of the nominal (in-control) process data model; (c) development of the process fault detection statistic; and (d) application of the test statistic to detect potential process faults. Furthermore, the data reduction step consists of two sub-steps: (1) selecting wavelet coefficients by working with a single dataset, based on a method that effectively balances model parsimony against data reconstruction error; and (2) deciding on a data-reduction strategy for all replicates, where each replicate is a different set of signals collected from an independent, identically distributed instance of the same in-control process. The nominal process model is approximated using the selected coefficients. If the original data are normally distributed, a variant of the classical two-sample Hotelling’s T2-statistic adapted to the reversed jackknife sampling scheme is used to test whether the estimated wavelet coefficient vector for the new process is in control. A nonparametric procedure is applied when the original datasets are non-normal; interested readers are referred to the original paper for more details.
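A simplified sketch of the overall structure, wavelet-based reduction followed by a Hotelling T2 test, is shown below using the PyWavelets package. The coefficient-selection rule (keep the largest-magnitude positions of a nominal signal) and the plain T2 test for a single new run are simplifications; they are not the parsimony-balancing selection or the reversed-jackknife variant of Lada et al. (2002).

```python
# Hedged sketch: wavelet-based data reduction of process signals, then a
# Hotelling T^2 check of a new run against the nominal coefficient cloud.
import numpy as np
import pywt
from scipy.stats import f as f_dist

def reduced_coeffs(signal, keep_idx=None, k=12, wavelet="db4", level=4):
    coeffs = np.concatenate(pywt.wavedec(signal, wavelet, level=level))
    if keep_idx is None:                               # choose positions once,
        keep_idx = np.argsort(np.abs(coeffs))[-k:]     # on a nominal signal
    return coeffs[keep_idx], keep_idx

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
nominal = [np.sin(6 * t) + rng.normal(0, 0.05, 256) for _ in range(30)]   # in-control runs
test = np.sin(6 * t) + 0.3 * (t > 0.7) + rng.normal(0, 0.05, 256)         # faulty run

_, idx = reduced_coeffs(nominal[0])
X0 = np.array([reduced_coeffs(s, idx)[0] for s in nominal])   # nominal coefficient vectors
x1 = reduced_coeffs(test, idx)[0]

# Hotelling T^2 of the new run against the nominal coefficient distribution.
n, p = X0.shape
diff = x1 - X0.mean(axis=0)
T2 = n / (n + 1) * diff @ np.linalg.solve(np.cov(X0.T), diff)
F = T2 * (n - p) / (p * (n - 1))
print("fault signalled:", F > f_dist.ppf(0.99, p, n - p))
```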

Jeong et al. (2006) experimented with a tree-based classification procedure, CART, for identifying process fault classes from semiconductor fabrication data, reduced with a new data reduction method based on the discrete wavelet transform to handle potentially large and complicated non-stationary data curves. Their data reduction method minimized an objective function which balances the trade-off between data reduction and modeling accuracy.

Gibbons et al. (2000) discussed research involving the implementation of data mining techniques to achieve a greater level of process control using a predictive model. They first carried out principal component analysis on over one hundred wafer process parameters and then built a predictive model using partial least squares regression and a three-layer feed-forward backpropagation neural network. For fault detection and operation mode identification in processes with multimode operations, Chu et al. (2004b) proposed a method that employed a SVM as a classification tool together with an entropy-based variable selection method. They gathered a dataset of 1,848 batches from a rapid thermal annealing process in which a wafer is processed for about 2 minutes. To monitor the process condition, seven process variables were measured once every 3 seconds (43 measurements per batch run), resulting in 301 variables in total. Sixty-two variables were selected to build 3 SVM classifiers (one for each mode) using 1,000 batches of data. Considerably lower errors than those of the traditional PCA-based fault detection method were reported.
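A hedged sketch of the variable-selection-plus-SVM idea follows, with mutual information used as a stand-in for the entropy-based selection criterion and synthetic data in place of the rapid thermal annealing dataset; it builds a single fault classifier rather than one per operation mode, keeping 62 variables only to echo the cited study.

```python
# Hedged sketch: information-based screening of unfolded batch variables,
# then an SVM fault classifier on the retained variables.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_batches, n_vars = 600, 301        # e.g., 7 variables x 43 time points, unfolded per batch
X = rng.normal(size=(n_batches, n_vars))
y = (X[:, 10] + 0.8 * X[:, 150] + rng.normal(0, 0.5, n_batches) > 0).astype(int)  # fault flag

# Screening stand-in: keep the most informative variables.
mi = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(mi)[-62:]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("fault-detection accuracy:", cross_val_score(clf, X[:, keep], y, cv=5).mean())
```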

Kot and Yedatpre (2003) described how e-diagnostics capabilities combined with proven enterprise data mining technology could help pinpoint the specific critical process conditions and variables that affect process control.

Kusiak (2001) presented a rough set theory based rule-restructuring algorithm to extract decision rules from datasets of different types generated by different sources, in support of making predictions in the semiconductor industry. The structural quality of extracted knowledge was evaluated with three measures. They are: (1) a decision support measure (DSM), (2) a decision redundancy factor (DRF), and (3) a rule acceptance measure (RAM). DSM is the total number of rules or the number of objects from the training set supporting a decision. DRF is the number of mutually exclusive feature sets associated with the same decision. RAM reflects the user confidence in the extracted rules. The prediction quality such as classification accuracy of a rule set was evaluated with one of the following three methods: partitioning, bootstrapping, and cross validation. Table 6 summarizes all of the process related data mining studies for the semiconductor industry reviewed above.
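The decision support measure lends itself to a small illustration: given a rule set and a set of training objects, DSM can be computed by counting the training objects matched by at least one rule for each decision. The rule representation and data below are invented for illustration only.

```python
# Hedged sketch: count training objects supporting each decision (a DSM-style
# tally), given hypothetical extracted rules.
from collections import defaultdict

# Hypothetical extracted rules: (conditions, decision).
rules = [
    ({"temp": "high", "pressure": "low"}, "fail"),
    ({"temp": "low"}, "pass"),
    ({"pressure": "high"}, "pass"),
]
training = [
    {"temp": "high", "pressure": "low", "decision": "fail"},
    {"temp": "low", "pressure": "low", "decision": "pass"},
    {"temp": "low", "pressure": "high", "decision": "pass"},
    {"temp": "high", "pressure": "high", "decision": "pass"},
]

def matches(conditions, obj):
    return all(obj.get(feat) == val for feat, val in conditions.items())

dsm = defaultdict(int)                 # decision -> number of supporting objects
for obj in training:
    for conditions, decision in rules:
        if matches(conditions, obj) and obj["decision"] == decision:
            dsm[decision] += 1
            break                      # count each object at most once
print(dict(dsm))
```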

4.6.2 For the Electronics Industry

Apté et al. (1993) employed five classification methods (k-nearest neighbors, linear discriminant analysis, decision trees, neural networks, and rule induction) to predict defects in hard drive manufacturing. Error rates at a critical step of the manufacturing process were used as input to induce knowledge for two classes (fail or pass), providing further assistance to engineers. Büchner et al. (1997) described three case studies of the successful use of data mining in fault diagnosis. The first study involved building a model from process data using the C4.5 algorithm in order to identify a lapse in the production of recording heads. The second and third studies were identical to those reported by Saxena (1993) and Apté et al. (1993), respectively.

Table 6. Summary of process related data mining studies for the semi-conductor industry. (Columns: Reference; Goal; Databases/Data Description; Data size actually used; Preprocessing; Data Mining Algorithm; Software.)


Kusiak and Kurasek (2001) used rough set theory to identify the causes of soldering defects in a printed-circuit board assembly process. Special attention was paid to feature selection, data collection, extraction of three rule sets (rules for defect occurrence, rules for defect non-occurrence, and approximate rules for the occurrence of ambiguous outcomes under the same set of conditions), and knowledge validation. The presence of approximate rules indicates that the feature set considered was insufficient and that additional features needed to be defined.

Tseng et al. (2004) presented a new heuristic algorithm, called extended rough set theory, for simultaneously identifying the most significant features and deriving a set of decision rules that explain the cause of soldering ball defects. Zhang and Apley (2003) proposed an MLPCA (maximum-likelihood principal component analysis) logistic regression clustering algorithm and applied it to identify the two underlying variation sources that govern the variation pattern among more than 3,000 soldering joints in a selected region of printed circuit boards (PCBs).

Maki and Teranishi (2001) developed an automated data mining system designed for quality control in manufacturing and discussed three characteristic functions of the system: (a) periodical data feeding and mining involving data transformation, discretization, and rule induction by the CHRIS algorithm; (b) storage and presentation of data mining results through the Web on the factory intranet; and (c) extraction of temporal variance of data mining results, which involves comparing the rank of each rule in the newer rule lists with that of the corresponding rule in the older lists in terms of their u-measure values and recognizing a change in rank as a “rise”, a “fall”, or a “stay”. The u-measure evaluated the significance of each rule. The system was applied to liquid crystal display fabrication to show its usefulness for rapid recovery from
