4.3 Product Related

Adams (2002) discussed two case studies of industrial data mining. The first case study describes how Intuitive Surgical used the Datasweep Advantage software to detect trends in both the manufacturing history and field use of a given subassembly or unit of their product, the da Vinci system, the most technologically sophisticated robot-assisted surgery system on the market today. The company is regulated by the U.S. Food and Drug Administration (FDA) and is required to track the manufacturing history in detail for every unit shipped. The second case study involves how Cymer used StatServer to analyze shop floor and field data in order to uncover problems associated with critical components (called consumables) of their excimer laser, which is the essential light source for the deep-ultraviolet photolithography systems used in semiconductor manufacturing. Data analysis operations include Pareto, Shewhart, and CUSUM charts to check whether the data are within specified standards, and regression and variance analysis to uncover problems.
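To make the chart-based screening concrete, the following minimal Python sketch implements a tabular CUSUM check of the kind mentioned above; the target mean, allowance, decision interval, and simulated readings are illustrative assumptions rather than values from the cited case study.

```python
import numpy as np

def cusum_flags(x, mu0, k, h):
    """Tabular CUSUM: flag samples whose upper/lower cumulative sums
    drift more than h away from the target mean mu0.
    k is the allowance (slack), often taken as 0.5 * sigma."""
    s_hi = s_lo = 0.0
    flags = []
    for xi in x:
        s_hi = max(0.0, s_hi + (xi - mu0) - k)
        s_lo = max(0.0, s_lo - (xi - mu0) - k)
        flags.append(s_hi > h or s_lo > h)
    return np.array(flags)

# Simulated consumable-lifetime readings with an upward drift halfway through.
rng = np.random.default_rng(0)
readings = np.concatenate([rng.normal(100, 2, 50), rng.normal(103, 2, 50)])
out_of_control = cusum_flags(readings, mu0=100, k=1.0, h=8.0)
print("first out-of-control sample:", np.argmax(out_of_control))
```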

[Table 2. Summary of sales related data mining studies. Columns: Reference; Goal; Databases/Data Description; Data size actually used; Preprocessing; Data Mining Algorithm.]

Cymer's CymerOnline is an e-diagnostic system that provides light source performance monitoring capabilities, stores data to enable data mining, and delivers easy-to-interpret charts and reports.

Buddhakulsomsiri et al. (2006) presented a rough set theory-based association rule generation algorithm to uncover relationships between product attributes and causes of failure from warranty data. They applied the algorithm to an automotive warranty dataset collected over two years. To simplify the product quality evaluation process, Zhai et al. (2002) proposed an integrated feature extraction approach based on rough set theory and genetic algorithms. Using historical data gleaned from the manufacturer of an electronic device, the prototype system was able to identify the significant attributes for product quality evaluation, leading to a 58% cost reduction. Strobel and Hrycej (2006) presented a framework for the association analysis of quality data, with the goal of finding relationships between the assembly and testing process and failures in the field. They performed a case study on quality control of electronic units in automotive assembly with 3,789 field failures and 3,310 process attributes.
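As an illustration of the attribute-to-failure association mining described in these studies, the following sketch uses the Apriori implementation in the mlxtend library; the claim attributes, failure labels, and thresholds are hypothetical and not taken from the cited papers.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Illustrative one-hot table: each row is a warranty claim, columns are
# product attributes and observed failure modes (names are hypothetical).
claims = pd.DataFrame({
    "engine_v6":       [1, 1, 0, 1, 0, 1],
    "climate_north":   [1, 0, 0, 1, 1, 1],
    "fail_corrosion":  [1, 0, 0, 1, 1, 1],
    "fail_electrical": [0, 1, 0, 0, 0, 1],
}).astype(bool)

itemsets = apriori(claims, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)

# Keep rules whose consequent is a failure mode, mirroring the
# attribute -> cause-of-failure direction sought in the warranty studies.
failure_rules = rules[rules["consequents"].apply(
    lambda c: any(str(i).startswith("fail_") for i in c))]
print(failure_rules[["antecedents", "consequents", "support", "confidence"]])
```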

Menon et al. (2004) presented two successful implementations of text data mining for the purpose of quality and reliability improvement in the product development process within two large multi-national companies.

The first case study involved the use of association analysis to analyze a service center database. This database contained records of the repair actions, customer complaints, and individual product details of inkjet printers. The database was a hybrid of fixed-format fields and free-form text fields. The fields relevant to the analysis were first extracted from the database before association analysis was applied. In the second case, classification analysis was performed on a collection of 'voice of the customer' data from call centers using support vector machines (SVMs).

Text preprocessing was undertaken to remove "unwanted" text and stop words, and to stem the remaining words.
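The following minimal sketch illustrates such a preprocess-then-classify pipeline, assuming NLTK for stop-word removal and stemming and scikit-learn's LinearSVC as the SVM; the sample complaint texts and category labels are invented.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

nltk.download("stopwords", quiet=True)
stops = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    # Drop stop words and non-alphabetic tokens, stem what remains.
    return " ".join(stemmer.stem(w) for w in text.lower().split()
                    if w.isalpha() and w not in stops)

# Invented 'voice of the customer' snippets with complaint-category labels.
docs = ["the printer jams on every page", "ink cartridge leaked everywhere",
        "paper jammed again after repair", "streaks of ink on all printouts"]
labels = ["paper_jam", "ink", "paper_jam", "ink"]

X = TfidfVectorizer().fit_transform(preprocess(d) for d in docs)
clf = LinearSVC().fit(X, labels)
print(clf.predict(X[:1]))  # predicted category of the first complaint
```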

It is important to consider the relationship between the product market and technical diversity early in the product life cycle, ideally at the product development stage. To this end, Agard and Kusiak (2004a) developed a three-step methodology for the design of product families based on the analysis of customers' requirements using a data mining approach. In the first step, data mining algorithms were used for customer segmentation. Once a set of customers was selected, an analysis of the requirements for the product design was performed and association rules were extracted. The second step created a functional structure that identifies the source of the requirements' variability. The last step elaborated a product structure and distinguished modules to support the product variability.

Agard and Kusiak (2004b) discussed the selection of subassemblies for manufacturing by a supplier based on customers' requirements. A data mining algorithm together with an integer programming model was used to determine a candidate set of modules and options to be considered for building subassemblies. Cunha et al. (2006) presented a data mining approach based on the learning and inference of association rules to determine the sequence of assemblies that minimizes the risk of producing faulty products.

Shao et al. (2006) proposed an effective architecture to discover customer group-based configuration rules in configuration design. Fuzzy clustering and variable precision rough sets were integrated to analyze the dependency between customer groups and product specification clusters. The Apriori algorithm was implemented as the mining method to obtain configuration association rules between clusters of product specifications and configuration alternatives.
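As a sketch of the fuzzy clustering step only (the variable precision rough set analysis and the Apriori stage are omitted), the following from-scratch fuzzy c-means routine illustrates how soft customer-group memberships can be computed; the requirement vectors and parameter choices are illustrative assumptions.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: returns cluster centers and the soft
    membership matrix u (n_samples x c), rows summing to 1."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        um = u ** m
        centers = um.T @ X / um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / (d ** (2 / (m - 1)))   # standard FCM membership update
        u /= u.sum(axis=1, keepdims=True)
    return centers, u

# Illustrative customer-requirement vectors (e.g., rated importance of specs).
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5]])
centers, u = fuzzy_cmeans(X, c=2)
print(np.round(u, 2))  # soft memberships would feed the dependency analysis
```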

To explore the opportunity for Taiwan's hi-tech industry to penetrate a new market, in particular the automobile telematics computer market, Su et al. (2006) proposed an E-CKM model with a methodology for precisely delineating the process of customer knowledge management (CKM). In the E-CKM model, the CKM process comprises four stages: identification of product features, categorization of customers' needs, segmentation of the markets, and extraction of patterns of customers' needs, each supported by the application of different methods in information technology. After data cleaning, 1,472 effective questionnaires, each with 29 attributes, were obtained through a survey posted on a website. Three clustering methods, k-means, SOM, and FuzzyART, were applied to segment the markets, with the number of 'natural' clusters determined by locating the 'elbow' point in the plot of R-squared values versus the number of clusters.
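A minimal sketch of this elbow heuristic, using scikit-learn's k-means on simulated questionnaire data (the cited study also used SOM and FuzzyART, which are not shown):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Simulated questionnaire responses drawn from three latent need segments.
X = np.vstack([rng.normal(mu, 0.5, (50, 4)) for mu in (0, 3, 6)])

total_ss = ((X - X.mean(axis=0)) ** 2).sum()
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    r2 = 1 - km.inertia_ / total_ss  # between-cluster share of total variance
    print(f"k={k}: R^2={r2:.3f}")
# The 'elbow' is where R^2 gains flatten; here around k=3.
```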

Many engineering artifacts such as space shuttle fuel tanks and offshore drilling platforms are joined by welding. Perner et al. (2001) empirically compared the performance of neural networks and decision trees on a dataset for the detection of defects in welding seams. Each digitized weld image was decomposed into Regions of Interest (ROIs) of 50×50 pixels, and 36 features were computed for each ROI. A parameter significance analysis was used for feature selection to reduce the number of features to seven before training four neural networks (BP, RBF, fuzzy ARTMAP, and LVQ). Numerical attributes were discretized before decision trees were induced using Decision Master. BP and RBF were found to produce lower error rates, but their models are not comprehensible to humans.
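The accuracy-versus-interpretability trade-off can be illustrated with scikit-learn stand-ins, an MLP for the backpropagation network and a CART tree in place of Decision Master, on simulated data with seven features; none of this reproduces the original experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Simulated stand-in for the 7 selected ROI features (defect vs. no defect).
X, y = make_classification(n_samples=400, n_features=7, n_informative=5,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(Xtr, ytr)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xtr, ytr)

print("MLP error: ", 1 - mlp.score(Xte, yte))
print("Tree error:", 1 - tree.score(Xte, yte))
print(export_text(tree))  # the tree, unlike the MLP, can be read as rules
```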

On the other hand, Liao (2003) reported that a GA-enhanced fuzzy rule approach outperformed both fuzzy k-nearest neighbors and MLP neural networks when all three methods were tested on 147 records of six different weld flaw types, each characterized by 12 numeric features.

Ceramics are the material of choice for many applications, such as cutting tools, due to their desirable properties, but they are known to be brittle. Dengiz et al. (2006) investigated the effects of three ceramic powder preparation methods on the growth and characteristics of microstructure flaws and damage on the ceramic surface, using a two-stage procedure. In the first stage, digital microstructural images were mined to characterize the flaws and surface damage. In the second stage, an extreme value probability distribution was fitted using the information from the first stage.
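The second stage can be sketched with SciPy's generalized extreme value distribution; the flaw-size sample below is simulated, and the specific distribution family used in the cited study is assumed here rather than confirmed.

```python
import numpy as np
from scipy import stats

# Simulated maximum flaw sizes (one per imaged specimen), in micrometers.
rng = np.random.default_rng(2)
max_flaw_sizes = rng.gumbel(loc=20.0, scale=4.0, size=200)

shape, loc, scale = stats.genextreme.fit(max_flaw_sizes)
# Probability that a specimen's largest flaw exceeds 35 um:
p_exceed = stats.genextreme.sf(35.0, shape, loc=loc, scale=scale)
print(f"shape={shape:.3f} loc={loc:.2f} scale={scale:.2f} "
      f"P(flaw > 35um)={p_exceed:.4f}")
```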

Hsu and Wang (2005) applied decision tree-based approaches to develop systems for sizing pants for soldiers in Taiwan. Samples that contained missing or abnormal data were first deleted. Domain experts were consulted to determine eight anthropometric variables that are strongly associated with garment production. Factor analysis was performed to select waist girth and outside leg length as the two most important sizing variables. Finally, taking the body mass index as the target variable, the CART technique was used to model the data.
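A minimal sketch of the final modeling step, using scikit-learn's CART implementation on simulated anthropometric data; the variable names and the synthetic relationship are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(3)
n = 300
waist = rng.normal(80, 8, n)   # waist girth, cm
leg = rng.normal(100, 5, n)    # outside leg length, cm
bmi = 0.05 * waist + 0.02 * leg + rng.normal(0, 0.8, n)  # synthetic target

X = np.column_stack([waist, leg])
cart = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20).fit(X, bmi)

# Each leaf corresponds to a candidate size group over the two variables.
print(export_text(cart, feature_names=["waist_girth", "outside_leg_length"]))
```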

Romanowski et al. (2006) developed a similarity measure that can be used to cluster bills of materials (BOMs) into product families and subfamilies. In their formulation, each BOM was depicted as a rooted, unordered tree. They argued that different engineers may build completely identical end items with very different BOM structures. They distinguished three ways that BOM trees may differ: (i) structural differences such as the number of intermediate parts, parts at different levels, and parts with different parents, (ii) differences in component labels, and (iii) differences in both components and structures.

Computing the similarity of BOMs was formulated as an NP-hard tree bundle matching problem, and several heuristic approaches were suggested for its solution. In Romanowski and Nagi (2005), 75 BOMs with known product family classifications were collected from an electronics manufacturer and the Decomposition and Reduction (DeRe) algorithm was used to compute the pairwise distances between them. A k-medoid clustering algorithm, CLARANS, was used to group similar BOMs into product families.
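For illustration, the following naive PAM-style k-medoids routine clusters an invented pairwise BOM distance matrix; CLARANS searches the medoid space by randomized exchanges rather than the exhaustive updates used here.

```python
import numpy as np

def k_medoids(D, k, iters=50, seed=0):
    """Naive k-medoids over a precomputed distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(iters):
        new = []
        for c in range(k):
            members = np.where(labels == c)[0]
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(members[np.argmin(within)])  # most central member
        new = np.array(new)
        if set(new) == set(medoids):
            break
        medoids = new
        labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Invented pairwise distances between 5 bills of materials.
D = np.array([[0, 1, 9, 8, 2],
              [1, 0, 8, 9, 2],
              [9, 8, 0, 1, 7],
              [8, 9, 1, 0, 7],
              [2, 2, 7, 7, 0]], dtype=float)
medoids, labels = k_medoids(D, k=2)
print("medoids:", medoids, "product-family labels:", labels)
```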

Product portfolio planning has a far-reaching impact on a company's competitive success. In general, product portfolio planning has two major stages: portfolio identification, and portfolio evaluation and selection. Portfolio identification aims to capture and understand customer needs effectively and to transform them into specifications of product offerings. Portfolio evaluation and selection deals with determining an optimal configuration of these identified offerings with the objective of achieving the best profit performance for the company. Jiao and Zhang (2005) developed explicit decision support to improve product portfolio identification through efficient knowledge discovery from past sales and product records using an association rule mining system. The system involves four consecutive stages, data preprocessing, functional requirements clustering, association rule mining, and rule evaluation and presentation, which interact with one another to achieve the goals. They applied the methodology and system to a consumer electronics company in order to generate a vibration motor portfolio for mobile phones.

To tackle the problem of product assortment analysis, Brijs et al. (2004) introduced a microeconomic integer programming model for product selection, called the PROFSET model, based on the use of frequent itemsets. The objective was to maximize the overall profitability of the hit list of products. Basic products can be specified by forcing the model to select certain products. The size of the hit list was also specified as a constraint. They carried out an empirical study based on sales data. The study involved two phases: discovery of the frequent itemsets of products, and selection of a hit list of products using the PROFSET model.
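A much-simplified PROFSET-style sketch, formulated with the PuLP library: an itemset contributes its margin only if every product in it is kept on the hit list, and the hit list size is capped. The itemsets, margins, and size limit are invented, and the original model includes cost terms omitted here.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

products = ["A", "B", "C", "D", "E"]
# Invented frequent itemsets with their aggregate margins.
itemsets = {("A", "B"): 120.0, ("B", "C"): 90.0, ("D",): 60.0, ("C", "E"): 80.0}
hit_list_size = 3

prob = LpProblem("profset_sketch", LpMaximize)
x = {p: LpVariable(f"x_{p}", cat=LpBinary) for p in products}  # product kept
y = {s: LpVariable(f"y_{i}", cat=LpBinary)
     for i, s in enumerate(itemsets)}                          # itemset covered

prob += lpSum(margin * y[s] for s, margin in itemsets.items())
for s in itemsets:
    for p in s:                  # an itemset only earns its margin if
        prob += y[s] <= x[p]     # every product in it is selected
prob += lpSum(x.values()) <= hit_list_size

prob.solve()
print("hit list:", [p for p in products if x[p].value() == 1])
```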

Wong et al. (2005) studied the problem of Maximal-Profit Item Selection (MPIS) with the cross-selling effect, which involves finding a set of items, taking the cross-selling effect into consideration, such that the total profit from the selection is maximized. They modeled the cross-selling factor with a special kind of association rule called a loss rule. The rules take the form I → ◊d, where I is an item, d is a set of items, and ◊d means the purchase of any item in d. Such a rule is used to estimate the loss in profit of item I if all items in d are missing after the selection; the rule corresponds to the cross-selling effect between I and d. They proposed a quadratic programming method, a heuristic method called MPIS_Alg, and a genetic algorithm approach to solve the problem.

A comparison was also made with a naïve approach, which simply calculates the profit generated by each item over all transactions and selects the J items with the greatest profit, and with the HAP approach, which applies "hub-authority" profit ranking. Table 3 summarizes all of the product related data mining studies reviewed above.
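For concreteness, the naïve baseline can be sketched in a few lines (the transactions and unit profits are invented; the loss-rule based methods require substantially more machinery):

```python
from collections import Counter

# Invented transactions and per-unit profits.
transactions = [["A", "B"], ["B", "C"], ["A", "B", "C"], ["C"], ["A", "C"]]
unit_profit = {"A": 2.0, "B": 1.5, "C": 1.0}
J = 2

# Naive approach: total profit per item over all transactions, ignoring
# cross-selling; then pick the J most profitable items.
counts = Counter(item for t in transactions for item in t)
profit = {i: counts[i] * unit_profit[i] for i in unit_profit}
hit_list = sorted(profit, key=profit.get, reverse=True)[:J]
print(hit_list, profit)
```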