
Enterprise Data Mining: A Review and Research Directions

4. Overview of the Enterprise Data Mining Activities

4.4 Production Planning and Control Related

Sun and Kuo (2002) proposed a visual exploration approach for mining abstract, multi-dimensional data stored in a relational database and applied it to generate visual images from which users could quickly and easily compare the machine idle cost performance of alternative master production plans. Their approach used the small multiples design and automatically generated a non-uniform color mapping.
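Sun and Kuo's exact mapping scheme is not described here, so the sketch below only illustrates the general idea of a non-uniform color mapping, assuming quantile-based binning: cut points are chosen so that each color covers roughly the same number of values, which keeps a few extreme idle costs from washing out the rest of the image. The palette, the `quantile_breaks` and `color_index` helpers, and the idle-cost values are all hypothetical.

```python
from bisect import bisect_right


def quantile_breaks(values, n_colors):
    """Cut points chosen so that each of n_colors bins holds roughly equal numbers of values."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_colors] for i in range(1, n_colors)]


def color_index(value, breaks):
    """Index of the bin (0..len(breaks)) that value falls into."""
    return bisect_right(breaks, value)


PALETTE = ["#f7fbff", "#c6dbef", "#6baed6", "#2171b5", "#08306b"]  # hypothetical light-to-dark palette
idle_costs = [1, 2, 2, 3, 5, 8, 13, 40, 90, 400]  # hypothetical machine idle costs, one per cell

breaks = quantile_breaks(idle_costs, len(PALETTE))
colors = [PALETTE[color_index(cost, breaks)] for cost in idle_costs]
```

With a uniform (equal-width) mapping over the same range, eight of the ten values would share the first color; the quantile breaks spread them across all five, which is what makes small differences between alternative plans visible side by side.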

Table 3. Summary of product related data mining studies.

Reference | Goal | Databases/Data Description | Data size actually used | Preprocessing | Data Mining Algorithm

Warranty data | 684,038 records of 88 attributes

Product routings | 250,000 routings

Data are filtered and transformed

Factor analysis to identify the most important

Feature weighting using the AHP method and data normalization

unwanted text, stop words, and word stemming

To cluster bills of materials (BOMs)

Questionnaires | 1,472 records, each having 29 attributes about the product

Data cleaning | k-means, SOM, and FuzzyART

170 records with 12 condition attribute of

Koonce and Tsai (2000) used an attribute-oriented induction methodology to extract a set of rules from data generated by a genetic algorithm (GA) that was implemented to perform a scheduling operation.

Specifically, the GA was designed to solve a 6×6 benchmark job shop scheduling problem and was run 1,000 times. Of the 1,000 optimal sequences obtained, 264 were unique; these were mined together with some characteristics of the operations using attribute-oriented induction to determine a set of 24 distinct characteristic rules that replicate the GA's performance.

Before the induction, GA sequences were mapped into a relation and numerical attributes were divided into a number of intervals.
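The mapping-and-generalization steps can be illustrated with a simplified sketch of attribute-oriented induction. It assumes the GA sequences have already been flattened into relation tuples; numeric attributes are replaced by interval labels, identical generalized tuples are merged, and each merged tuple with enough support becomes a characteristic rule. Full attribute-oriented induction also climbs concept hierarchies for categorical attributes, which is omitted here, and the attribute names (`machine`, `proc_time`, `position`), cut points, and support threshold are hypothetical rather than Koonce and Tsai's.

```python
from collections import Counter


def discretize(value, cut_points, labels):
    """Label of the interval containing value (len(labels) == len(cut_points) + 1)."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


def induce_characteristic_rules(relation, numeric_cuts, min_support):
    """Generalize tuples, merge identical ones, keep those covering >= min_support of the relation."""
    generalized = []
    for row in relation:
        g = dict(row)
        for attr, (cuts, labels) in numeric_cuts.items():
            g[attr] = discretize(row[attr], cuts, labels)
        generalized.append(tuple(sorted(g.items())))
    counts = Counter(generalized)
    total = len(relation)
    return [(dict(t), n / total) for t, n in counts.most_common() if n / total >= min_support]


# Hypothetical relation: one tuple per operation taken from the mined GA sequences.
relation = [
    {"machine": "M1", "proc_time": 2, "position": 1},
    {"machine": "M1", "proc_time": 3, "position": 1},
    {"machine": "M2", "proc_time": 9, "position": 5},
    {"machine": "M1", "proc_time": 4, "position": 2},
]
numeric_cuts = {
    "proc_time": ([5], ["short", "long"]),
    "position": ([3], ["early", "late"]),
}
rules = induce_characteristic_rules(relation, numeric_cuts, min_support=0.5)
```

Here three of the four tuples generalize to the same description, so the single rule returned (machine M1, short processing time, early position) carries 75% support.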

Kwak and Yih (2004) presented a data-mining-based production control approach, called the competitive decision selector (CDS), for the testing-and-rework cell in a dynamic and stochastic computer-integrated manufacturing (CIM) system. To build the CDS, training data were generated by simulation models developed in SIMAN.

Features were selected iteratively with a hybrid feature-selection approach: a filter approach prescreens promising features, and a wrapper approach then determines the final set of features. The data were then transformed and partitioned according to the system congestion level, and a knowledge base was constructed within each sub-partition using the C4.5 decision tree algorithm. The proposed CDS consists of two algorithms. At each decision point, it observes the status of the system and the jobs and decides on job preemption and dispatching rules in real time by activating the corresponding group of knowledge bases. CDS dynamic control was shown to outperform static control rules, particularly when the static rules are closely competitive with one another. Readers are referred to Chapter 6 for other related work by Yih and her associates.
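A hybrid filter-wrapper selection of the kind described above can be sketched as follows. The source does not specify Kwak and Yih's scoring functions, so this sketch assumes absolute Pearson correlation with a 0/1 class label as the filter and leave-one-out accuracy of a 1-nearest-neighbor classifier as the wrapper criterion, with greedy forward selection that stops when accuracy no longer improves. All function names and the toy data are hypothetical; the partitioning by congestion level and the C4.5 knowledge bases are not reproduced.

```python
import math


def abs_correlation(xs, ys):
    """Filter score: absolute Pearson correlation between one feature and the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def loo_1nn_accuracy(X, y, features):
    """Wrapper score: leave-one-out accuracy of 1-nearest-neighbor on the given features."""
    correct = 0
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def hybrid_select(X, y, keep_top, max_features):
    """Filter prescreening followed by greedy forward wrapper selection."""
    scores = {f: abs_correlation([row[f] for row in X], y) for f in range(len(X[0]))}
    candidates = sorted(scores, key=scores.get, reverse=True)[:keep_top]
    selected, best_acc = [], 0.0
    while len(selected) < max_features:
        trials = {f: loo_1nn_accuracy(X, y, selected + [f]) for f in candidates if f not in selected}
        if not trials:
            break
        best_f = max(trials, key=trials.get)
        if trials[best_f] <= best_acc:
            break  # no remaining candidate improves the wrapper score
        selected.append(best_f)
        best_acc = trials[best_f]
    return selected, best_acc


# Hypothetical data: feature 0 tracks the label, feature 1 is noise, feature 2 is constant.
X = [[0.1, 5, 1], [0.2, 3, 1], [0.3, 9, 1], [0.9, 4, 1], [1.0, 8, 1], [1.1, 2, 1]]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = hybrid_select(X, y, keep_top=2, max_features=3)
```

The filter step keeps the wrapper affordable by limiting the expensive model-based search to the `keep_top` most promising features, which is the usual reason for combining the two approaches.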

Li et al. (2006) proposed a hybrid approach that combined mega-fuzzification, data trend estimation, and ANFIS to learn FMS scheduling rules from a small dataset. The predictor attributes were the sizes of the input/output buffers of each machine, the arrival rate of parts, and the speed of the automated guided vehicle (AGV). The dispatching rules considered were first come, first served; shortest processing time; and earliest due date.

Li and Olafsson (2005) introduced a framework for using data mining, specifically decision tree models, to discover dispatching rules from production data. They also developed methods that use frequent item set generation to construct composite attributes, which, in combination with attribute selection, improve the performance of the predictive models.
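The composite-attribute idea can be illustrated with a reduced sketch that mines only frequent pairs of attribute-value items and turns each pair into a new Boolean attribute that a decision tree could then split on. This is a simplification assuming pairwise item sets and a plain support threshold; Li and Olafsson's full method, its integration with attribute selection, and the tree learner are not reproduced, and the job attributes (`due`, `ptime`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(records, min_support):
    """Attribute=value pairs that co-occur in at least min_support of the records."""
    counts = Counter()
    for rec in records:
        counts.update(combinations(sorted(rec.items()), 2))
    n = len(records)
    return [pair for pair, c in counts.items() if c / n >= min_support]


def add_composite_attributes(records, pairs):
    """Add one Boolean attribute per frequent pair: True when both items hold."""
    augmented = []
    for rec in records:
        new = dict(rec)
        for (a1, v1), (a2, v2) in pairs:
            new[f"{a1}={v1}&{a2}={v2}"] = rec.get(a1) == v1 and rec.get(a2) == v2
        augmented.append(new)
    return augmented


# Hypothetical discretized job attributes recorded at dispatching decisions.
jobs = [
    {"due": "tight", "ptime": "short"},
    {"due": "tight", "ptime": "short"},
    {"due": "loose", "ptime": "long"},
    {"due": "tight", "ptime": "long"},
]
pairs = frequent_pairs(jobs, min_support=0.5)
augmented = add_composite_attributes(jobs, pairs)
```

A single composite attribute such as `due=tight&ptime=short` lets a tree capture an interaction in one split that would otherwise take two levels to express.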

Estimating the cycle time of a product in a factory, especially one with complicated processes such as semiconductor manufacturing, is necessary to assess customer due dates, to schedule resources and actions around anticipated job completions, and to monitor operations. To forecast the cycle time of a lot or a product, Yu and Huang (2002) proposed a production learning system based on the tool model. The tool model divides the flow of a lot or a product into basic elements, or steps, rather than stages, and builds a model that determines the time a lot requires at each step.

At each step, the tool model is divided into two parts, a waiting part and a processing part, so each step involves both a waiting time and a processing time. The cycle time of a lot is the sum of the waiting and processing times over all of its steps. To estimate each (waiting or processing) time, a backpropagation neural network was used to establish the relationship between the model's inputs and its output (time).
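The tool-model decomposition amounts to summing per-step estimates, CT = Σ_s (W_s + P_s), over the steps of a lot's route. The sketch below shows only that aggregation; the per-step estimators are simple hypothetical stand-ins for the trained backpropagation networks, and the step names and the `queue` input are illustrative assumptions.

```python
from typing import Callable, Dict, List

# An estimator maps the current lot/shop state to a time (hours, assumed).
Estimator = Callable[[Dict[str, float]], float]


def predict_cycle_time(
    route: List[str],
    state: Dict[str, float],
    wait_models: Dict[str, Estimator],
    proc_models: Dict[str, Estimator],
) -> float:
    """Cycle time = sum over steps of (predicted waiting time + predicted processing time)."""
    return sum(wait_models[step](state) + proc_models[step](state) for step in route)


# Hypothetical stand-ins for the per-step neural network models.
wait_models = {"litho": lambda s: 0.5 * s["queue"], "etch": lambda s: 0.25 * s["queue"]}
proc_models = {"litho": lambda s: 2.0, "etch": lambda s: 1.5}

estimate = predict_cycle_time(["litho", "etch"], {"queue": 4.0}, wait_models, proc_models)
```

Because each step has its own pair of models in this sketch, a lot's estimate can be refreshed as it moves along its route by summing only over the remaining steps.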

Sha and Liu (2005) presented a rule-based total work content (RTWK) model that incorporated a decision tree for mining job-scheduling knowledge about due date assignment in a dynamic job shop environment. The decision tree, induced by C4.5, adjusted the allowance factor k according to the condition of the shop at the instant a job arrived, thereby reducing the due date prediction errors of the TWK method. Simulation results showed that the proposed RTWK model was significantly better than its static and dynamic counterparts (i.e., the TWK and dynamic TWK methods). Several studies seek to predict the cycle time of an individual lot by comparing key characteristics of a lot in progress with those of lots that have already completed the target operation for which predictions are to be made. These studies assume that the production process is approximately stable over the prediction time frame.
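The TWK rule that RTWK refines sets a job's due date to its arrival time plus an allowance proportional to its total work content, d = r + k·p. A minimal sketch follows; the hand-written `rule_based_k` function and its thresholds are hypothetical and merely stand in for the decision tree that Sha and Liu induced with C4.5 from shop-state data.

```python
def twk_due_date(arrival_time: float, total_work_content: float, k: float) -> float:
    """Total work content (TWK) rule: due date = arrival + k * total processing time."""
    return arrival_time + k * total_work_content


def rule_based_k(jobs_in_shop: int, utilization: float) -> float:
    """Hypothetical tree-like rules choosing the allowance factor k from the shop state at arrival."""
    if jobs_in_shop <= 10:
        return 2.0  # lightly loaded shop: tight allowance
    if utilization < 0.85:
        return 3.0
    return 4.5  # congested shop: generous allowance


static_due = twk_due_date(100.0, 20.0, k=3.0)                  # static TWK with a fixed k
rtwk_due = twk_due_date(100.0, 20.0, rule_based_k(15, 0.92))   # k chosen from the shop state
```

The only difference between the two calls is where k comes from: a constant in static TWK, a shop-state lookup in the rule-based variant.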

Öztürk et al. (2006) explored the use of data mining for lead time estimation in make-to-order manufacturing. They chose the regression tree approach as the data mining method. To select a small subset of features with high predictive power, they also devised an empirical attribute selection procedure, which starts with the set of all attributes and then eliminates attributes based on a criterion called the weighted attribute usage ratio (WAUR).
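The overall shape of such a backward elimination can be sketched as below. The exact WAUR formula and stopping rule are not given here, so `fit` is a hypothetical callback that fits a model on the given attributes and returns its error together with a per-attribute usage score standing in for WAUR; the error tolerance, attribute names, and `toy_fit` are likewise assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: fits a regression tree on the given attributes and returns
# (validation error, usage score per attribute). The usage score stands in for WAUR.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def usage_based_elimination(attributes: List[str], fit: FitFn, tolerance: float = 0.0) -> List[str]:
    """Start from all attributes; repeatedly drop the least-used one while error stays acceptable."""
    current = list(attributes)
    best_error, usage = fit(current)
    while len(current) > 1:
        weakest = min(current, key=lambda a: usage.get(a, 0.0))
        trial = [a for a in current if a != weakest]
        error, trial_usage = fit(trial)
        if error > best_error + tolerance:
            break  # dropping the weakest attribute costs too much accuracy
        current, usage = trial, trial_usage
        best_error = min(best_error, error)
    return current


def toy_fit(attrs: List[str]) -> Tuple[float, Dict[str, float]]:
    """Toy stand-in for model fitting, with fixed usage scores."""
    usage = {"load": 0.6, "priority": 0.3, "shift": 0.1, "operator": 0.0}
    error = 0.4 + (0.1 if "priority" not in attrs else 0.0) + (0.5 if "load" not in attrs else 0.0)
    return error, {a: usage[a] for a in attrs}


kept = usage_based_elimination(["load", "priority", "shift", "operator"], toy_fit)
```

Raising `tolerance` trades a little accuracy for a smaller attribute set, which mirrors the goal of keeping only a few features with high predictive power.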

Chang et al. (2002, 2005b) applied a partition-based fuzzy modeling method to build a prediction model that estimates the flow time of an order by taking into account a number of dynamic characteristics of a wafer fabrication factory. The number of fuzzy terms for each attribute is optimized by a genetic algorithm. Tests on data generated from a simulated wafer factory showed that the proposed method outperformed both case-based reasoning and back-propagation neural networks. In another study, Chang and Liao (2006) showed that even higher prediction accuracy could be achieved by combining SOM with fuzzy rules. Backus et al. (2006) compared three data mining methods (k-nearest neighbors, neural networks, and regression trees, each with and without prior clustering) for learning a predictive model of cycle time from historical manufacturing data. Regression trees built with CART after clustering produced the best predictive model.

The model's residuals were then monitored to signal whether the process had changed from the conditions under which the model was built.
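The nearest-neighbor variant that Backus et al. compared, together with residual monitoring, can be sketched as follows. Predictions average the cycle times of the k most similar completed lots; new residuals are then checked against control limits of mean ± 3 standard deviations computed from residuals collected when the model was built. The source does not say which monitoring rule was used, so the 3-sigma limits are an assumption, as are the feature vectors (a utilization level and a queue length) and all numbers; the clustering step and the other learners are omitted.

```python
import math
import statistics


def knn_cycle_time(history, query, k):
    """Mean cycle time of the k completed lots whose feature vectors are closest to query."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    return sum(cycle_time for _, cycle_time in nearest) / len(nearest)


def residual_limits(baseline_residuals, width=3.0):
    """Control limits: mean +/- width * sample std dev of residuals from model-building data."""
    mu = statistics.mean(baseline_residuals)
    sigma = statistics.stdev(baseline_residuals)
    return mu - width * sigma, mu + width * sigma


def drift_alarms(actual, predicted, limits):
    """Indices of lots whose residual (actual - predicted) falls outside the limits."""
    low, high = limits
    return [i for i, (a, p) in enumerate(zip(actual, predicted)) if not low <= a - p <= high]


# Hypothetical history: ((utilization, queue length), observed cycle time in hours).
history = [((0.2, 10.0), 30.0), ((0.3, 12.0), 34.0), ((0.9, 40.0), 80.0), ((0.8, 35.0), 72.0)]
estimate = knn_cycle_time(history, (0.25, 11.0), k=2)

limits = residual_limits([-1.0, 0.5, 0.0, 1.0, -0.5])
alarms = drift_alarms([32.5, 30.0, 40.0], [32.0, 32.0, 32.0], limits)
```

In practice the features should be scaled to comparable ranges before computing distances; in this toy history the queue length dominates the Euclidean distance.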

Last and Kandel (2001) applied an information-theoretic fuzzy approach, the Information-Fuzzy Network (Maimon and Last, 2000), to a real-world dataset provided by a semiconductor company. The dataset contains about 110,000 records, each characterized by 8 attributes, from which 58,076 records related to one product family were selected for the study. The objective was to predict the yield and the flow time of each manufacturing batch. The method produced a compact and reasonably accurate prediction model that could be converted into a small set of interpretable rules.
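The information-theoretic criterion behind such networks can be illustrated by ranking candidate attributes by their mutual information with the target. This is a simplified sketch: the Information-Fuzzy Network grows its layers using conditional mutual information together with statistical significance tests, whereas the code below only scores attributes unconditionally for a first split, and the attributes (`tool`, `shift`) and records are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def best_first_attribute(records, attributes, target):
    """Score each attribute by mutual information with the target; return the best and all scores."""
    target_values = [r[target] for r in records]
    scores = {a: mutual_information([r[a] for r in records], target_values) for a in attributes}
    return max(scores, key=scores.get), scores


# Hypothetical batches: the tool fully determines the yield class, the shift is irrelevant.
records = [
    {"tool": "A", "shift": "day", "yield": "high"},
    {"tool": "A", "shift": "night", "yield": "high"},
    {"tool": "B", "shift": "day", "yield": "low"},
    {"tool": "B", "shift": "night", "yield": "low"},
]
best, scores = best_first_attribute(records, ["tool", "shift"], "yield")
```

Here `tool` carries one full bit of information about the yield class and `shift` carries none, so `tool` would be chosen for the first layer.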

Subsequently, they presented a novel, perception-based method, called the Automated Perceptions Network (APN), for the automated construction of compact and interpretable models from highly noisy datasets (Last and Kandel, 2004). They evaluated the method on yield data of two semiconductor products. Accurate estimation of the actual yield is of interest to planning personnel because an “optimistic” estimate would cause delays in the delivery of a customer order, and a “pessimistic” estimate would lead to a waste of precious resources.

Readers are referred to Chapter 7 for a recent data mining project carried out by Dr. Last and his associates on the prediction of wine quality based on agricultural data. Table 4 summarizes all of the production planning and control related data mining studies reviewed above.

Source: Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications (pages 73-81)