
Deep Learning for Automated Visual Inspection of

Uncured Rubber

by

James Thomas Howard Smith

B.S. Mechanical Engineering, Georgia Institute of Technology, 2013

Submitted to the Department of Mechanical Engineering and the MIT Sloan School of Management in partial fulfillment of the requirements for the degrees of

Master of Science in Mechanical Engineering and

Master of Business Administration

in conjunction with the Leaders for Global Operations Program at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2018

© James Thomas Howard Smith, MMXVIII. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Signature redacted
Department of Mechanical Engineering and the MIT Sloan School of Management
May 11, 2018

Certified by: Signature redacted
David Hardt, Thesis Supervisor
Ralph E. and Evelyn F. Cross Professor of Mechanical Engineering

Certified by: Signature redacted
Roy Welsch, Thesis Supervisor
Eastman Kodak Leaders for Global Operations Professor of Management

Approved by: Signature redacted
Rohan Abeyaratne
Chair, Mechanical Engineering Graduate Program Committee

Approved by: Signature redacted
Maura Herson
Director, MBA Program, MIT Sloan School of Management


Deep Learning for Automated Visual Inspection of Uncured

Rubber

by

James Thomas Howard Smith

Submitted to the Department of Mechanical Engineering and the MIT Sloan School of Management on May 11, 2018, in partial fulfillment of the requirements for the degrees of

Master of Science in Mechanical Engineering and

Master of Business Administration

Abstract

This thesis proposes a data-driven approach to automate the visual inspection of uncured rubber tire assemblies. Images collected from a machine vision system are used to develop proof of concept predictive models to automate the visual inspection step for tread caps. The developed binary model exhibits an AUC of 0.91 on the test set and a simulated business scenario shows this performance can reduce manual inspection time by 16-70%, depending on the selected decision threshold determined by business needs. This appears to be the first study to develop a method that successfully detects and locates a wide range of uncured rubber nonconformities. The multiclass model also exhibits promising ability to distinguish between different nonconformity types. The results of this study can be used to inform the investment decisions required to fully automate the process. It will be straightforward to adapt the models to predict nonconformities on the rest of the uncured assembly surface when that data becomes available.

Of interest to the machine learning community, the empirical work required to develop these models highlights several key insights. A comparison is made of techniques used to address class imbalance in neural network training. For our problem, a penalized loss function is superior for binary classification, while oversampling performs best for the multiclass problem. The study also highlights the importance of analyzing to what extent a pretrained network should be transferred. For our problem, removing the final convolutional layers of the pretrained network significantly improves performance. While the specifics of these findings are likely unique to our problem, this study highlights the importance of these decisions when training neural networks on relatively small and imbalanced training sets.

Thesis Supervisor: David Hardt

Title: Ralph E. and Evelyn F. Cross Professor of Mechanical Engineering

Thesis Supervisor: Roy Welsch
Title: Eastman Kodak Leaders for Global Operations Professor of Management


Acknowledgments

The author wishes to acknowledge the Leaders for Global Operations Program for its support of this work.

This thesis was made possible by the support of many people. I would like to thank all of my teammates at the sponsoring company that helped make this work possible in one way or another. I also want to thank my academic advisors Professor Dave Hardt and Professor Roy Welsch for the guidance they provided along the way. Finally, I want to thank all of my family and friends for their encouragement, care and support. A special thanks to my parents for always encouraging me to pursue my interests and goals, wherever they may take me.


Contents

Glossary 15

1 Introduction 17
  1.1 Project Motivation . . . . 17
  1.2 Problem Statement . . . . 18
  1.3 Approach . . . . 18
  1.4 Contributions . . . . 19
  1.5 Thesis Outline . . . . 20

2 Problem Background 21
  2.1 Tire Manufacturing and Inspection Process . . . . 21
    2.1.1 Manufacturing Process . . . . 21
    2.1.2 Need for Visual Inspection . . . . 22
    2.1.3 Manual Inspection Procedure . . . . 22
  2.2 Automated Inspection . . . . 23
    2.2.1 Automated Inspection Goals . . . . 23
    2.2.2 Existing Equipment . . . . 24
  2.3 Chapter Summary . . . . 24

3 Overview of Available Data 25
  3.1 Original Available Data . . . . 25
  3.2 Image Sorting . . . . 25
  3.3 Rescaling . . . . 27
  3.4 New Data . . . . 28
  3.5 Chapter Summary . . . . 28

4 Related Work in Image Classification 29
  4.1 Tire and Rubber Industry Image Classification . . . . 29
  4.2 General Image Classification . . . . 31
  4.3 Deep Learning Overview . . . . 31
    4.3.1 Learning Methods . . . . 31
    4.3.2 Artificial Neural Networks . . . . 32
    4.3.3 Convolutional Neural Networks . . . . 33
  4.4 Chapter Summary . . . . 34

5 Problem Formulation 35
  5.1 Supervised Learning Approach . . . . 35
  5.2 Challenges . . . . 36
    5.2.1 Small Dataset . . . . 36
    5.2.2 Class Imbalance . . . . 36
    5.2.3 Concept Drift . . . . 37
  5.3 Performance Metrics . . . . 37
    5.3.1 Binary Classification Metrics . . . . 37
    5.3.2 Multiclass Metrics . . . . 40
    5.3.3 Localization Performance . . . . 41
  5.4 Chapter Summary . . . . 41

6 Model Design 43
  6.1 Transfer Learning and Model Architecture . . . . 43
  6.2 Data Augmentation . . . . 45
  6.3 Optimization . . . . 46
  6.4 Regularization . . . . 48
  6.5 Addressing Class Imbalance . . . . 49
  6.7 Chapter Summary . . . .

7 Experimentation 55
  7.1 Methodology . . . . 55
    7.1.1 Cross-Validation Procedure . . . . 55
    7.1.2 Hyperparameter Selection . . . . 57
  7.2 Binary Classification . . . . 57
    7.2.1 Binary Classification Model Selection . . . . 58
      7.2.1.1 Addressing Class Imbalance: Oversampling vs Penalized Loss Function . . . . 58
      7.2.1.2 How Much of the Pretrained Network Should be Transferred? . . . . 60
      7.2.1.3 Selected Final Configuration . . . . 62
    7.2.2 Binary Classification Test Set Results . . . . 63
    7.2.3 Business Impact Scenario . . . . 68
  7.3 Multiclass Classification . . . . 71
    7.3.1 Multiclass Model Selection . . . . 71
    7.3.2 Multiclass Test Set Results . . . . 71
  7.4 Results Summary and Discussion . . . . 73

8 Conclusions 77
  8.1 Summary of Results and Contributions . . . . 77
  8.2 Future Work . . . . 78

A Tables 81


List of Figures

2-1 General Process Flow for Tire Manufacturing Process . . . . 22
3-1 Labeled Tread Cap Examples . . . . 27
5-1 Example ROC Curve . . . . 39
6-1 Model Architecture . . . . 45
6-2 Example Binary Classification Heatmap Prediction . . . . 53
7-1 Cross-validation procedure . . . . 56
7-2 Oversampling vs. Penalized Loss to Address Class Imbalance . . . . 59
7-3 Pretrained Model Output Comparison . . . . 61
7-4 Selected Binary Classification Model . . . . 62
7-5 Binary Model Test Set ROC Curve . . . . 64
7-6 Sample of Binary Model True Positives . . . . 65
7-7 Sample of Binary Model True Negatives . . . . 66
7-8 All Binary Model False Negatives . . . . 67
7-9 Sample of Binary Model False Positives . . . . 68
7-10 Business Impact of Implementing Binary Model . . . . 70
7-11 Oversampling vs. Penalized Loss to Address Class Imbalance for Multiclass Problem . . . . 72
7-12 Selected Multiclass Model . . . . 72
7-13 Sample of Correct Multiclass Model Test Set Predictions . . . . 74
7-14 Sample of Incorrect Multiclass Model Test Set Predictions . . . . 75


List of Tables

3.1 Original Image Data (Collected Prior to 01/19/2018) . . . . 26
3.2 New Image Data (Collected After 01/19/2018) . . . . 28
5.1 Binary Model Confusion Matrix . . . . 38
6.1 Data Augmentation Transformations . . . . 46
7.1 Multiclass Model Test Set Confusion Matrix . . . . 73
A.1 Binary Model 5-Fold Cross-Validation Experimental Results . . . . 82
A.2 Multiclass Model 5-Fold Cross-Validation Experimental Results . . . . 83


Glossary

area under the curve (AUC) Model comparison metric determined by calculating the area underneath the receiver operating characteristic curve.

Artificial neural networks (ANN) Computing models based on a collection of connected units, loosely inspired by biological neural networks.

binary classification The task of classifying the elements of a dataset into two separate classes.

class The specific category to which a data observation belongs.

class imbalance Situation where there is a significant difference between prior probabilities of the different classes in the dataset.

convolutional layer The core building block of a CNN, consisting of a set of learnable kernels.

Convolutional neural network (CNN) A type of artificial neural network that utilizes a shared-weights architecture.

hyperparameter A machine learning model parameter whose value has to be set before the learning process begins.

kernel A matrix of weights used to perform a convolution operation on an image.

multiclass classification The task of classifying the elements of a dataset into one of three or more classes.


oversampling A data analysis sampling technique that is used to include repeated samples from the minority classes to adjust the class distribution of the training set.

receiver operating characteristic (ROC) curve A plot that illustrates a binary classifier's diagnostic ability as its decision threshold is varied.

transfer learning A machine learning technique where knowledge gained from one problem is applied to a different but related problem.


Chapter 1

Introduction

With recent advances in data storage, algorithm development and computing hardware, there has been an explosion of data-driven applications throughout society. Businesses are racing to implement these advances to solve their challenges. Manufacturing companies hope to use this progress to improve quality and reduce costs in their plants. Our work focuses on using data-driven techniques to automate a visual inspection process for a tire manufacturing company. This thesis develops data-driven models to automate visual quality inspection of uncured rubber tire assemblies. This chapter introduces the project motivation, outlines the approach and contributions of this work, and provides an overview of the thesis structure.

1.1 Project Motivation

The visual inspection of uncured tire assemblies can serve an important purpose in the tire building process, but using a human to perform this step has disadvantages. The information gained from this inspection provides a valuable feedback signal for the process, and the decision to scrap assemblies based off this inspection can prevent the waste of resources. However, the overwhelming majority of tires are in conformance and most of the manual inspector's time is spent inspecting passing tires. Hypothetically, if only 4% of tires in a shift exhibit nonconformities for the inspector to recognize and record, 96% of the inspector's shift time is spent inspecting passing tires and adding no value to the process. This time could


instead be used by the worker to perform other value added tasks within the plant. In addition, the manual inspection process is a task subject to human judgment.

Thus, tire companies desire to automate this visual inspection step. For the sponsoring company of this project, the initial goal is to automate the pass/fail decision, a binary classification problem. An effective binary classifier can significantly increase the time spent by the inspector performing other valuable tasks within the plant. Of course, the company's eventual goal for the system is to fully automate the process. This requires a system that can recognize, locate and classify nonconformities without a human in the loop, a multiclass classification problem. This thesis makes significant progress in moving the company toward both of these goals.

1.2 Problem Statement

The company sponsoring this work is currently developing a machine vision system that collects image data and performs digital dimensional measurements of uncured tire components. A hypothesis of our project is that this image data can also be utilized to automate the visual inspection of the uncured tire assembly. At the time of this study, image data is only available for the tread cap component of the tire. As a result, this study focuses on automating the visual inspection of the tread cap component. A key requirement is that the approach must be general enough to be applied to the rest of the tire once the vision system is capable of collecting image data of the entire assembly surface. Challenges for this problem include a relatively small dataset (~2250 images), class imbalance in the dataset (only ~5% of the images have nonconformities), noisy data and the variable deformation characteristics of uncured rubber. This thesis presents methods to address each of these challenges.

1.3 Approach

The variability of uncured rubber makes it difficult to design explicit rules to describe the nonconformities that can occur. This leads to the need for machine learning approaches that


can learn the nonconforming features from the data without explicit programming. The availability of image labels from the manual inspection allows this problem to be modeled as a supervised learning image classification task. Convolutional neural network (CNN) models are developed for both the binary and multiclass classification problems. The models use transfer learning and data augmentation techniques to overcome difficulties with the small dataset. Gradient class activation mapping is used to create a localization heatmap to gauge each model's ability to localize nonconformities. Experiments are performed to determine the best model configurations for this problem. In addition, time was spent in the plant observing the manufacturing process and inspecting tires with the manual inspector to better understand the real-world context in which these models must operate.
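The central idea of transfer learning used here — freeze a pretrained feature extractor and train only a small classification head on the new data — can be sketched in a few lines of NumPy. Everything below is illustrative, not the thesis's actual implementation: a fixed random projection stands in for the pretrained CNN features, and the toy data, learning rate and iteration count are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor: in the thesis the
# features come from a pretrained CNN; here a fixed random projection
# plays that role so the sketch stays self-contained.
W_frozen = rng.normal(size=(64, 16))

def extract_features(images):
    # images: (n, 64) flattened "pixels" -> (n, 16) frozen features
    return np.maximum(images @ W_frozen, 0.0)  # ReLU

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy labeled data: positive examples have a shifted pixel distribution.
n = 200
X = rng.normal(size=(n, 64))
y = (rng.random(n) < 0.5).astype(float)
X[y == 1] += 0.5

# Only the small classification head (w, b) is trained; the extractor stays fixed.
feats = extract_features(X)
w, b = np.zeros(16), 0.0
lr = 0.1
for _ in range(300):
    p = sigmoid(feats @ w + b)
    grad_w = feats.T @ (p - y) / n   # gradient of mean cross-entropy
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((sigmoid(feats @ w + b) > 0.5) == y)
```

Freezing the extractor keeps the number of trainable parameters small, which is what makes training feasible on a dataset of this size.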

1.4 Contributions

The binary and multiclass models we have developed for the tread cap exhibit impressive performance despite the small and imbalanced dataset. The binary classification model has an area under the receiver operating characteristic curve (AUC) of 0.91 for the test set. Assuming this performance can generalize to the rest of the uncured assembly surface once that data becomes available, this can immediately reduce the manual time spent inspecting passing tires by ~16-70%, depending on the required decision threshold. This performance is expected to improve as the training set size grows. The multiclass model also shows promising ability to discriminate between different class labels, though more data for each class is required to become production ready. The localization heatmaps confirm the models are highly capable of locating nonconformities. Results of this work can help the company clarify the economics of automated inspection and guide future investment decisions.

Beyond the tire industry, the methodology presented in this thesis has applicability for researchers performing image classification tasks with small, imbalanced datasets. Transfer learning is a useful technique to overcome the challenges of small datasets for image recognition tasks, and our empirical work demonstrates the importance of experimentation to determine how much of a pretrained network should be transferred. For our problem, removing the final convolutional layer of the transferred pretrained network significantly outperforms transferring all of the convolutional layers. In addition, imbalanced data is prevalent in practical applications, and our empirical work compares the performance of a penalized loss function to an oversampling technique to address class imbalance in neural networks. For our problem, a penalized loss function provides higher performance for the binary classification problem, but oversampling outperforms the penalized loss function for the multiclass classification problem. While these findings are likely problem specific, they highlight the importance of considering the different approaches in different settings.
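The two imbalance-handling techniques compared in this work can be sketched concretely. The snippet below is a minimal NumPy illustration, not the thesis's training code: a penalized (class-weighted) binary cross-entropy, and an oversampling routine that repeats minority-class samples until the classes are balanced. The ~5% positive rate mirrors the imbalance described in Section 1.2; the weight value is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_bce(y_true, y_pred, pos_weight):
    """Penalized loss: errors on the rare positive class cost pos_weight times more."""
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    return -np.mean(pos_weight * y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

def oversample_indices(y, rng):
    """Repeat minority-class sample indices until both classes are equally represented."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority), replace=True)
    return np.concatenate([majority, extra])

# Simulated labels with a ~5% positive rate, mirroring the dataset imbalance.
y = (rng.random(1000) < 0.05).astype(int)
idx = oversample_indices(y, rng)  # balanced index set for training batches
```

Both techniques push the model to pay attention to the rare class; which one works better is an empirical question, as the binary versus multiclass results here show.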

1.5 Thesis Outline

Chapter Two provides background on the tire manufacturing process and automated inspection goals. Chapter Three provides an overview of the available data to give additional context for the problem. Chapter Four discusses related image classification work in the tire/rubber industry as well as more general applications. A general overview of deep learning is also provided. Chapter Five discusses the problem formulation and related challenges as well as the metrics used to evaluate model performance. Chapter Six presents an overview of the design decisions used to develop the model. Chapter Seven presents the experimental results for both the binary and multiclass models. Chapter Eight summarizes the findings and proposes recommendations and future work.


Chapter 2

Problem Background

This chapter provides background information and context related to the tire manufacturing process and the automated inspection goals.

2.1 Tire Manufacturing and Inspection Process

2.1.1 Manufacturing Process

Figure 2-1 outlines the general tire manufacturing process. A tire is a complex assembly of various uncured rubber components and steel or fabric reinforcements. The components are built up on a tire building drum to form the uncured assembly. A manual visual inspection is performed on the uncured assembly prior to the curing stage. Assemblies that do not pass this visual inspection are scrapped. Passing assemblies are sent on to the curing press where they are subjected to heat and pressure to cure the rubber and give the tire its final shape. Final finish operations are then performed on the cured tire to verify the quality of the final product. These operations include force and balance testing and a final visual inspection. The manual visual inspection prior to the curing step presents an opportunity for automation and is the focus of this thesis.


Figure 2-1: General Process Flow for Tire Manufacturing Process. Once the individual components are assembled, the uncured assembly undergoes a manual visual inspection. Assemblies that pass this stage go on to curing and final finish testing. This project seeks to automate the manual visual inspection step.

2.1.2 Need for Visual Inspection

The visual inspection of the uncured tire prior to the curing step serves at least two important functions. First, nonconformity information captured by the visual inspection serves as an important feedback mechanism for the process. Nonconformities in the uncured rubber components can occur due to variations in the materials, process equipment and environment. Information gained from the visual inspection can be used to recognize and correct issues early. The second function of the visual inspection is to scrap nonconforming uncured assemblies prior to curing and final finish. The presence of nonconformities leads to issues in the curing process and poor performance in final finish testing. Scrapping nonconforming assemblies prior to these steps prevents the waste of resources on tires that would need to be scrapped after final testing anyways. These two outcomes of the visual inspection step, process feedback and prevention of wasted resources, explain why this step is important to the tire manufacturing process.

2.1.3 Manual Inspection Procedure

A description of the manual visual inspection procedure helps illustrate the desire to automate this step. In the current state, the inspector manually examines the uncured assembly surface searching for nonconforming features in the rubber. Assemblies that pass the inspection are sent to the curing step. If a nonconformity is found, the inspector records the type


of nonconforming feature and its location. Most assemblies with nonconformities cannot be repaired, so these assemblies are automatically scrapped. A few types of nonconformities can be repaired, so the inspector repairs these assemblies and sends them on to the curing step. This inspection procedure is only valuable to the manufacturing process when nonconformities occur. Since the significant majority of uncured assemblies in a given shift do not have nonconformities, most of the inspector's time is spent inspecting passing tires. Manual time spent inspecting passing tires prevents the inspector from performing useful tasks within the plant. A system that can automate this visual inspection step can increase productivity within the plant and reduce any potential variability in the inspection process.

2.2 Automated Inspection

2.2.1 Automated Inspection Goals

The move towards an automated visual inspection system can be divided into two stages. The goal of the first stage is to automate the binary passing/nonconforming decision. In this stage, the automated system inspects the assemblies and only alerts the human inspector when a nonconformity is detected. When alerted, the human locates the suspected nonconformity, determines the nonconformity type and performs the repair procedure if necessary. An automated system that distinguishes between passing and nonconforming assemblies can significantly reduce the human time spent inspecting passing tires, increasing productivity within the plant.

The second automation stage requires a system that not only recognizes a nonconformity is present, but is also able to locate and label the type of nonconformity. Now, the human is only alerted when a nonconformity that can be repaired is detected or if the system is unsure about the nonconformity type. The latter case may be due to a new or rare type of nonconformity. This second automation stage further increases productivity within the plant and reduces the need for human involvement in distinguishing between nonconformity types.
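The second-stage goals above amount to a simple routing rule for each inspected assembly. The sketch below is hypothetical: the thresholds, the class names and the set of repairable classes (`REPAIRABLE`) are illustrative placeholders, not values from the sponsoring company.

```python
REPAIRABLE = {"B"}  # hypothetical: which nonconformity classes can be repaired

def route_stage2(p_nonconforming, class_probs, detect_thresh=0.5, conf_thresh=0.8):
    """Second-stage routing: alert a human only for repairable or uncertain cases."""
    if p_nonconforming < detect_thresh:
        return "send to curing"                     # passing assembly
    label = max(class_probs, key=class_probs.get)   # most likely nonconformity type
    if class_probs[label] < conf_thresh:
        return "alert human: uncertain type"        # possibly a new or rare nonconformity
    if label in REPAIRABLE:
        return "alert human: repair " + label
    return "scrap"                                  # known, unrepairable type
```

For example, a confident prediction of an unrepairable class is scrapped with no human involvement, while a low-confidence prediction still falls back to the inspector.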


2.2.2 Existing Equipment

Within a manufacturing plant of the sponsoring company, a machine vision digital measurement system is under development to automate dimensional measurements of uncured rubber components. This system uses laser triangulation to collect data that can be used to construct an image of the uncured rubber component. Our hypothesis is that this image data can also be used to automate the visual inspection process. Since the machine vision system is still in development, the tread cap is currently the only component of the uncured assembly surface with image data available. This thesis focuses on developing models to automate the visual inspection of the tread cap, using methods that are general enough to be used on the rest of the uncured assembly once this data becomes available.

2.3 Chapter Summary

The visual inspection step prior to the curing process can provide valuable feedback for the manufacturing process. In addition, scrapping nonconforming assemblies at this step can reduce wasted curing and final finish resources. However, manual time spent inspecting passing assemblies does not add value to the process. We can think of the path to automation for this process in two stages. The first stage attempts to automate the binary passing/nonconforming decision. The second stage fully automates the inspection process by also automating the ability to distinguish between different nonconforming types. Tread cap image data collected from an available machine vision system is used for this study.


Chapter 3

Overview of Available Data

This chapter describes the available dataset and the steps taken to prepare it for analysis.

3.1 Original Available Data

The original tread cap image data available for this study was captured between September 6, 2016 and January 18, 2018. Only a single tire building station in the plant had an image acquisition system installed prior to November 2017. This station accounts for ~50% of the available data. Image acquisition systems were installed on additional tire building stations in November 2017, resulting in these stations producing the remaining data in a roughly even manner. Inspection labels are available for the barcode of each uncured tire assembly inspected by the manual operator.

3.2 Image Sorting

The image data and inspection labels were matched based off their barcodes, and each image was then sorted by its inspection label. An initial review of the images revealed that some were filled with excessive noise that prevents observation of the tread cap. This is mainly a problem for data captured at earlier dates when the image acquisition system was early in its development. These extremely noisy images were manually removed from the dataset. This resulted in a total of 1592 labeled images initially available for the training/validation/test set split.

Table 3.1: Original Image Data (Collected Prior to 01/19/2018)

  Label Name                    Binary Label   Multiclass Label   # of Examples
  A (Passing Images)            0              [1, 0, 0, 0]       1487
  B (Nonconformity Present)     1              [0, 1, 0, 0]         23
  C (Nonconformity Present)     1              [0, 0, 1, 0]         28
  D (Nonconformity Present)     1              [0, 0, 0, 1]         54

Reviewing the images also revealed that some were mislabeled. While models developed from large datasets can be robust to a few mislabeled examples, performance of models developed on small datasets can be sensitive to this noise. This fact led to a manual review of every image to ensure correct labels. The plant quality team was consulted for questionable images. While the manual review was labor intensive, it provides confidence the data represents the correct probability distributions we want to model. Recommendations were made to convey the importance of properly labeled data to prevent the need for this manual review in the future.

The images are sorted into four labeled classes: Class A, Class B, Class C and Class D. Class A represents the "passing" class, while the other three classes represent images with specific types of nonconformities. Table 3.1 displays the number of examples in each class for the original data. Though it is certainly possible for multiple nonconformities to occur on a single tread cap, this situation was not present in the available data. For this reason, the multi-label case was not considered and the problem is modeled as a mutually exclusive multiclass problem instead. Addressing the prediction of multiple labels in the same image is left to future work when more data is available. For the binary classification problem, Class B, Class C and Class D are combined into a single class (the positive case) to represent all examples with nonconformities present. This results in an initial total of 105 nonconforming examples for the binary classification task.
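The label scheme just described — one-hot multiclass labels, with Classes B, C and D collapsing into a single positive class for the binary task — can be expressed directly in code. This is a minimal sketch using the class counts from Table 3.1.

```python
# One-hot multiclass labels for the four classes in Table 3.1.
MULTICLASS = {"A": [1, 0, 0, 0],
              "B": [0, 1, 0, 0],
              "C": [0, 0, 1, 0],
              "D": [0, 0, 0, 1]}

def binary_label(cls):
    """B, C and D collapse into the positive (nonconforming) class."""
    return 0 if cls == "A" else 1

# Class counts from Table 3.1: 1487 passing + 23 + 28 + 54 nonconforming.
labels = ["A"] * 1487 + ["B"] * 23 + ["C"] * 28 + ["D"] * 54
n_positive = sum(binary_label(c) for c in labels)  # 105 nonconforming examples
```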


(a) Class A (b) Class B

(c) Class C (d) Class D

Figure 3-1: Labeled Tread Cap Examples. Image (a) is an example of a passing image. Images (b), (c) and (d) are examples of images with specific types of nonconformities.

3.3 Rescaling

The image data is represented by a two-dimensional matrix of raw pixel intensities. The lengths and widths of these raw images vary from ~700-1200 pixels depending on the diameter, width and rotation speed of the assembly at the time of data acquisition. Each image is rescaled to 299x299 pixels to standardize the image size. This reduces memory requirements, increases calculation speeds and prepares the data to be used with the pretrained architectures and weights discussed in Chapter 6. The rescaled images are then saved as JPEG files. The rescaling and conversion to JPEG format does not appear to cause meaningful loss of information in the available data. Figure 3-1 shows an example image from each class.
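As one illustration of the rescaling step, a nearest-neighbor resize of a raw intensity matrix to 299x299 might look like the following. In practice an image library (e.g. Pillow) with higher-quality interpolation would be used and the result saved as JPEG; this NumPy version, with arbitrary example dimensions, is only a sketch of the idea.

```python
import numpy as np

def rescale_nearest(img, size=299):
    """Nearest-neighbor rescale of a 2-D intensity matrix to size x size."""
    h, w = img.shape
    rows = (np.arange(size) * h / size).astype(int)  # source row for each output row
    cols = (np.arange(size) * w / size).astype(int)  # source column for each output column
    return img[np.ix_(rows, cols)]

# Raw image dimensions vary (~700-1200 px); this example shape is arbitrary.
raw = np.random.default_rng(2).integers(0, 256, size=(1040, 870))
small = rescale_nearest(raw)
```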

Table 3.2: New Image Data (Collected After 01/19/2018)

  Label Name                    Binary Label   Multiclass Label   # of Examples
  A (Passing Images)            0              [1, 0, 0, 0]       644
  B (Nonconformity Present)     1              [0, 1, 0, 0]         4
  C (Nonconformity Present)     1              [0, 0, 1, 0]         0
  D (Nonconformity Present)     1              [0, 0, 0, 1]         7

3.4 New Data

After the model training commenced, additional data became available between January 19, 2018 and February 19, 2018. This data was added to the test set and is presented in Table 3.2.

3.5 Chapter Summary

A total of 1592 tread cap images are available in the training/validation/test set split. After sorting the images by their inspection label, a manual review was performed to ensure correct labels were applied to each image. The images are rescaled to 299x299 pixels. An additional 655 images became available after model training started, so this data has been added to the test set.


Chapter 4

Related Work in Image Classification

This chapter provides an overview of previous image classification work in the tire and rubber industry and for general image classification tasks. A brief overview of neural network methods is also included.

4.1 Tire and Rubber Industry Image Classification

There appears to be little published work related to image classification of uncured rubber. Most of the relevant published work focuses on anomaly detection for cured rubber. Guo and Wei [12] propose a method using image component decomposition (ICD) to detect nonconformities in cured tire sidewall X-ray images. They apply successive filtering techniques to separate the image into three components: texture, background and nonconformity. A threshold function is applied after this separation to detect and locate the nonconformity. In another related work [54], Zhang et al. combine geometric transforms with edge detection methods to detect edges in cured tire laser shearography images. Additionally, Xiang et al. use a dictionary representation method in [51] to detect nonconformities in cured tire sidewall X-ray images.

Guo et al. recognize in [13] that the above methods are designed for the smooth surface of cured tire sidewalls and fail to perform effectively for more complex components such as tread caps. They instead propose a weighted texture dissimilarity method to be used for X-ray images of finished tire tread caps and sidewalls. This method creates an anomaly


map by calculating the weighted average of the dissimilarity between image pixels and their neighbors in a local window. Nonconformities are then located by segmenting this anomaly map using a threshold function. Pixels in the anomaly map are set to one or zero based on whether they exceed predetermined intensity and local variance thresholds. This creates a binary image meant to highlight the location of nonconformities. The feature engineering for this method is difficult since it requires the selection of global thresholds and a local window size that will work for all of the data. The test images they use in this work appear to have relatively simple nonconforming shapes. For our problem, the complex nonconforming patterns with varying degrees of subtlety make it difficult to determine global thresholds and select a local window size that works for all types and sizes of nonconformities.
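The segmentation step described above — binarizing an anomaly map where pixels exceed both a global intensity threshold and a local-variance threshold — can be sketched as follows. The window size, thresholds and toy anomaly map are illustrative placeholders, not values from Guo et al.

```python
import numpy as np

def segment_anomalies(anomaly_map, intensity_thresh, var_thresh, win=3):
    """Set a pixel to 1 only where its intensity AND its local variance
    both exceed their (global) thresholds, else 0."""
    h, w = anomaly_map.shape
    pad = win // 2
    padded = np.pad(anomaly_map, pad, mode="edge")
    local_var = np.empty_like(anomaly_map, dtype=float)
    for i in range(h):
        for j in range(w):
            local_var[i, j] = padded[i:i + win, j:j + win].var()
    return ((anomaly_map > intensity_thresh) & (local_var > var_thresh)).astype(np.uint8)

# A flat map with one bright, non-uniform blob: only the blob survives thresholding.
m = np.zeros((20, 20))
m[8:12, 8:12] = np.arange(16).reshape(4, 4) + 5.0
mask = segment_anomalies(m, intensity_thresh=1.0, var_thresh=0.0)
```

The difficulty noted in the text is visible here: a single pair of global thresholds and one window size must work for every nonconformity type and size, which breaks down for subtle or complex patterns.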

Bharathi et al. attempt to overcome the issues of global thresholding methods in their work on the detection of rubber oil seal nonconformities [40]. They propose a statistical approach using gray level co-occurrence matrices (GLCM). This method characterizes the image texture by calculating the frequency with which specific pairs of pixels in certain spatial relationships occur in an image. Statistics can be calculated from the GLCM to describe features and make detection decisions. The authors conclude this method is somewhat effective though not sufficient for accurate nonconformity detection. Furthermore, the spatial orientations chosen for the construction of the GLCMs and the statistics used as features have to be specific to the data, so again a substantial amount of feature engineering is required for this method.
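A minimal GLCM computation along these lines might look as follows. The toy image, the offset and the contrast statistic are illustrative choices, not those of [40]:

```python
import numpy as np

def glcm(image, offset=(0, 1), levels=4):
    """Gray level co-occurrence matrix: counts how often a pixel with gray
    level a occurs at the given spatial offset from a pixel with level b,
    normalized to co-occurrence probabilities."""
    di, dj = offset
    m = np.zeros((levels, levels), dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w:
                m[image[i, j], image[ni, nj]] += 1
    m /= m.sum()
    return m

# Tiny 4-level image; offset (0, 1) means "immediate horizontal neighbor".
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img, offset=(0, 1), levels=4)
# A texture statistic such as contrast can then serve as a feature:
contrast = sum(P[a, b] * (a - b) ** 2 for a in range(4) for b in range(4))
```

As the text notes, both the offset(s) and the chosen statistics are feature-engineering decisions that must be tuned to the data.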

In addition to the difficulty of defining features in order to apply the above methods to our problem, it is important to note that all of the previously mentioned methods are only designed for the binary problem of detecting whether a nonconformity is present or not. They are not capable of distinguishing between different types of nonconformities for the multiclass problem. There also does not appear to be a clear path for adapting these methods for this purpose due to the increased feature engineering complexity this would entail. A different approach is required for our problem since the second automation stage requires a model that can discriminate between different nonconformity types.

4.2 General Image Classification

Traditional image analysis and classification tasks have relied on handcrafted rules and feature engineering [29]. The methods mentioned in Section 4.1 reflect a few examples of these types of approaches. Researchers would also like to use algorithms to learn the important features from the raw image pixels without the need for specific rules and feature descriptions. LeCun achieved the first successful application of Convolutional Neural Networks (CNNs) for this purpose in his work on hand-written digit recognition [25]. However, it took several years before advances in data collection, computing resources and algorithm development allowed deep learning networks to be designed and achieve significant performance levels. The AlexNet algorithm [23] was the major contribution that significantly outperformed previous methods in the benchmark ImageNet image recognition competition [32]. This success caused the use of CNNs to gain momentum in the research community. Numerous advances in algorithms and training methods have since been proposed [44, 50, 42]. Contributions such as these continue to improve image classification capabilities in numerous domains. An important application area related to our problem is the use of deep CNNs in the medical imaging community [36, 48, 37]. The grayscale X-ray images used in these studies and the complex shapes they attempt to recognize are relatively similar to the complex features of uncured rubber images.

4.3 Deep Learning Overview

This section provides a general overview of deep neural networks to provide intuition and context for the rest of the thesis. The reader is referred to [25, 11] for further information.

4.3.1 Learning Methods

Two main types of machine learning methods are supervised and unsupervised learning algorithms. In supervised learning, there is a dataset with training examples X along with corresponding training labels Y from the population (X, Y). The goal is to learn a predictive model f that approximates the mapping from X to Y:

ŷ_i = f(x_i; Θ)    (4.1)

where ŷ_i is the model prediction for the ith example x_i from X and Θ represents the model parameters to be learned from the training data. This learning is typically achieved by iteratively updating Θ to minimize a loss function L(y_i, ŷ_i) during training. Here, y_i represents the true label. In unsupervised learning, the training labels Y are not available. Instead, the goal is to learn the approximate underlying structure of X from the training data without labels. Supervised learning is the method applicable to our classification problem and will be the focus of this work.
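As a toy illustration of this supervised loop, the sketch below iteratively updates the parameters Θ of a simple logistic model to minimize a loss over labeled pairs. The 1-D synthetic data, learning rate and iteration count are illustrative, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-2, 2, 100)]   # bias column + one feature
Y = (X[:, 1] > 0).astype(float)                    # true labels y_i

theta = np.zeros(2)                                # parameters Theta, to be learned
lr = 0.5
for _ in range(200):
    y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))     # predictions y_hat_i = f(x_i; Theta)
    grad = X.T @ (y_hat - Y) / len(Y)              # gradient of the log loss L(y_i, y_hat_i)
    theta -= lr * grad                             # update Theta to reduce the loss

predictions = 1.0 / (1.0 + np.exp(-(X @ theta))) > 0.5
accuracy = np.mean(predictions == Y)
```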

4.3.2 Artificial Neural Networks

Artificial neural networks (ANN) have become an effective method for supervised learning tasks [11]. The output of a particular hidden unit (neuron) z_j is calculated by applying a nonlinear activation function a_j to the dot product of the hidden unit inputs z_{j-1} and the corresponding input connection parameters Θ_j:

z_j = a_j(Θ_j z_{j-1})    (4.2)

where z_{j-1} has been augmented with an additional 1 to take the bias term into account. The basic fully connected feedforward ANN architecture is formed by stacking and connecting J layers with k_j hidden units in each layer l_j. The output of every hidden unit in layer l_j serves as an input to every hidden unit in the next layer l_{j+1}. Given example x_i, the model output ŷ_i can be obtained by:

ŷ_i = a_J(Θ_J a_{J-1}(Θ_{J-1} ... a_1(Θ_1 z_0)))    (4.3)

where z_0 represents the input vector x_i augmented with an additional 1 to take the bias term into account. To train the network, we define a loss function L, randomly initialize Θ, and then iterate through the training set updating Θ in the direction that minimizes the loss. Gradient-based optimization methods are typically used for this minimization problem.
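The forward pass of Equation 4.3 can be sketched directly. The layer sizes, random weights and the choice of ReLU as every activation a_j are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, thetas):
    """Feedforward pass: each layer applies its parameter matrix to the
    previous layer's bias-augmented output, then a nonlinearity (Eq. 4.3)."""
    z = x
    for theta in thetas:
        z = np.append(z, 1.0)   # augment with 1 for the bias term
        z = relu(theta @ z)     # z_j = a_j(Theta_j z_{j-1})
    return z

# Tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(1)
thetas = [rng.normal(size=(4, 4)), rng.normal(size=(1, 5))]
y_hat = forward(np.array([0.5, -1.0, 2.0]), thetas)
```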

4.3.3 Convolutional Neural Networks

For high dimensional input data such as images, the number of parameters needed for a fully connected architecture such as the one described above can grow large. Fortunately, for data with a known grid-like input structure such as images, convolutional neural networks (CNN) [26] use a modified architecture which significantly decreases the number of parameters that need to be trained.

In image processing, a convolution operation consists of sliding a matrix of weights (parameters) known as a kernel across the image input x_i and multiplying the kernel parameters Θ_k by the image pixels to produce a feature map Z_k. By cleverly defining Θ_k, the convolution operation can be used to produce feature maps that represent useful image processing outputs such as edge detection, blurring and sharpening, among others. With CNNs, the convolution operation for kernel parameters Θ_k in layer j can be simulated by sharing Θ_k's parameters across all spatial locations of the input, giving Θ_{j,k}. This parameter sharing implies that if a feature is useful to detect in one spatial region of the image, it is also useful to detect in another spatial region of the image. This assumption allows for a significant reduction in required parameters compared to a fully connected network. Performing the matrix multiplication between Θ_{j,k} and x_i creates the feature map Z_{j,k}, which is then used as an input to the next layer. By stacking K_j kernels in layer l_j, K_j feature maps can be created and used as inputs to the next layer. Stacking multiple layers together allows for a hierarchy of feature maps to be created. In addition, pooling layers are oftentimes inserted between convolutional layers to downsample the size of the input to the next layer for dimensionality reduction purposes. CNNs also generally have fully connected layers at the end of the network to compute the class prediction. As with ANN training, by randomly initializing the kernel parameters, the network can be trained using similar gradient-based optimization procedures.
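The basic operation can be illustrated with a hand-defined edge-detection kernel (strictly, the cross-correlation form that CNN libraries typically implement; the image and kernel here are made up):

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' convolution: slide the kernel over the image and take the
    elementwise product-sum at each position to build a feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 2x2 horizontal-difference kernel responds strongly at vertical edges.
image = np.zeros((5, 5))
image[:, 2:] = 1.0                         # dark left half, bright right half
edge_kernel = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])
fmap = convolve2d(image, edge_kernel)      # large values only at the edge
```

Note that the same four kernel weights are reused at every spatial position, which is exactly the parameter sharing described above.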

The CNN architecture allows the model to learn a hierarchy of features. As a simplified example, the earlier layers of the network may learn several types of edge detectors. The output of these edge detection layers can then be used as inputs to later layers in the network. These later layers may learn to combine the edges into various shapes. The final layers of the network may learn to combine these shapes to recognize the target classes.

4.4 Chapter Summary

There is little published work related specifically to image classification of uncured rubber. The published work that exists focuses on cured rubber applications. These methods rely on handcrafted rules and features that can be difficult to define for the complex nonconformities that occur in uncured rubber. In addition, the previous methods are only focused on binary classification and do not offer a clear path to the development of a multiclass model. For these reasons, deep learning methods are explored since they have achieved state-of-the-art image classification performance in a variety of domains. A brief overview of artificial neural networks (ANNs) and convolutional neural networks (CNNs) is provided.


Chapter 5

Problem Formulation

This chapter outlines the supervised learning approach used for this problem, the challenges related to this approach and the performance metrics used to evaluate effectiveness.

5.1 Supervised Learning Approach

With supervised learning, our goal is to learn a mapping function f and its parameters Θ from a training set of labeled examples. We will then use this function to make a prediction ŷ_i of the labels of new examples. Given (x_i, y_i), the ith input image and its corresponding label from (X, Y) sampled from the true population (X, Y), we seek to find a function ŷ_i = f(x_i; Θ) that approximates the mapping from X to Y. This can be thought of probabilistically as learning a function that models P(y_i | x_i, Θ). For the binary classification problem, y_i = 0 for the passing case (negative case) and y_i = 1 for the case in which a nonconformity is present (positive case). For the multiclass classification problem, y_i is a one-hot vector over the classes [A, B, C, D], where the entry of the corresponding class equals one and the other entries equal zero. The label for each class is provided explicitly in Table 3.1 of Chapter 3. Modeling the problem as a supervised learning problem allows for the opportunity to probabilistically learn the important features and structure from the available data, without attempting to explicitly define these complex features manually.
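The two label encodings can be sketched as follows (the helper names are hypothetical, used only for illustration):

```python
# Label encodings for the two problem formulations.
CLASSES = ["A", "B", "C", "D"]

def binary_label(nonconformity_present):
    # Passing tread cap -> 0 (negative case); nonconformity present -> 1 (positive case).
    return 1 if nonconformity_present else 0

def multiclass_label(cls):
    # One-hot vector over [A, B, C, D]: the entry of the true class equals one.
    return [1 if c == cls else 0 for c in CLASSES]

y = multiclass_label("C")
```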

5.2 Challenges

The available dataset presents multiple challenges that need to be addressed to create a useful image recognition model.

5.2.1 Small Dataset

The state of the art deep learning architectures have millions of parameters that need to be learned from the data. This makes it difficult to avoid overfitting these parameters on small datasets. The seminal architecture AlexNet [23] has 60 million parameters and was trained on 1.2 million images from the ImageNet database. This leads to the conclusion that the 2247 images available for this study will not be enough to train a custom CNN architecture from scratch. To overcome this challenge, the concept of transfer learning can be used by adapting the parameters from models previously trained on large image databases [52, 8]. This allows for progress to be made even with relatively small datasets. The use of this technique for our problem is discussed in detail in Section 6.1. Even with the use of transfer learning, however, we still need more data to model the countless ways objects can appear in images due to shifts in lighting, position, size or orientation [33]. Data augmentation methods [41] are employed to perform transformations on the training images. This helps improve the generalization performance of the model. The specific transformations used during training are presented in Section 6.2.

5.2.2 Class Imbalance

Only 5% of the available images have nonconformities present. This situation is known as class imbalance. It is characterized by a significant difference in prior probabilities between classes [19]. The optimization methods of neural networks generally assume balanced classes, so class imbalance can significantly affect the performance of these models. Steps must be taken to either adjust the optimization objective or balance the data. Methods used to address class imbalance for this problem are described in Section 6.5.
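One common way to adjust the optimization objective for imbalance is to weight each class inversely to its frequency, so that each class contributes equally to the loss in expectation. This particular weighting scheme is an illustration, not necessarily the exact method described later in Section 6.5:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: weight = n / (k * count),
    where n is the dataset size and k the number of classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# 95 passing examples vs. 5 nonconforming ones (the ~5% imbalance described).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
```

With these weights, the total weighted contribution of each class to the loss is equal (here, 95 x 0.526 = 5 x 10.0 = 50), which is the balancing behavior described above.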

5.2.3 Concept Drift

It is possible for the underlying distributions of the data to shift over time due to changes in the manufacturing process, materials, environment or equipment settings. This phenomenon is known as concept drift and can be problematic for classification models trained on previous data [47]. Concept drift combined with imbalanced data can compound the difficulty of the problem [6, 17]. Due to the size of the dataset and the limited number of nonconforming examples available, this work does not explicitly address the issue of concept drift. The tests of the final models include new test data acquired over an additional month after the original training/validation/test split was performed. The results of these tests did not indicate a noticeable issue with concept drift, though the additional data was collected over a relatively short timescale. It is reasonable to suspect that the nature of the tire manufacturing process can lead to concept drift; the investigation of its impact and mitigation solutions is left to future work when more images are available.

5.3 Performance Metrics

During the training/validation phase, we will select models that best minimize the loss function that will be defined in Section 6.3. However, metrics that translate to business impact need to be used when evaluating the models on test sets. The metrics used to evaluate the binary and multiclass models are now discussed.

5.3.1 Binary Classification Metrics

As previously defined, a tread cap image with a nonconformity present, y_i = 1, is the positive case and a passing tread cap image, y_i = 0, is the negative case. The confusion matrix built from these definitions is shown in Table 5.1. A true positive corresponds to the model correctly labeling a nonconforming example as nonconforming. A true negative indicates the model correctly labeled a passing example as passing. A false positive means the model incorrectly flagged a passing example as nonconforming. A false negative occurs when the model incorrectly applies a passing label to a nonconforming example.

Table 5.1: Binary Model Confusion Matrix

True Class \ Predicted Class     | Negative (Passing)   | Positive (Nonconformity Present)
Negative (Passing)               | True Negatives (TN)  | False Positives (FP)
Positive (Nonconformity Present) | False Negatives (FN) | True Positives (TP)

The binary model has a single output that can be interpreted as P(y_i = 1 | x_i, Θ). To transform this probabilistic score to a label, a decision threshold d must be chosen. The model labels examples that receive a score above d as nonconforming and below d as passing. The selection of this operating point depends on business needs. The tradeoff between the true positive rate (TPR) and false positive rate (FPR) as d varies can be visualized with a receiver operating characteristic (ROC) curve, as presented in Figure 5-1. A perfect classifier has an operating point at [0,1] in the upper left corner of the ROC plot. The diagonal line from [0,0] to [1,1] represents the performance expected from random guessing.

The area under the curve (AUC) can be a useful way to reduce the information from the ROC curve into a single scalar metric for model comparison. The AUC is equivalent to the probability that the model will output a higher probabilistic score for a randomly chosen positive example over a randomly chosen negative example [4]. As noted in [35] however, only considering the AUC can be misleading. While a model with an ROC curve that strictly dominates another will have a higher AUC, the converse is not necessarily true and different models may dominate in different operating regions. For this reason, it is important to still compare the ROC curves of models to understand how they differ at different operating points.
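This probabilistic interpretation gives a direct, if inefficient, way to compute the AUC from raw scores (the scores below are made up):

```python
def auc_pairwise(scores_pos, scores_neg):
    """AUC via its probabilistic interpretation [4]: the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score (ties count as half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy model scores for 3 nonconforming and 4 passing examples.
auc = auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])  # 11 of 12 pairs ranked correctly
```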

In practice, the business information required to determine the optimal decision threshold is not currently available. Since the current state process requires the manual inspector to inspect every tire, this question was previously not crucial. Asking this question during model development has had the positive benefit of spurring the plant to consider this more deeply. When presenting the simulated scenario in Chapter 7, we show the results at multiple decision thresholds to illustrate the expected performance with different business needs. From an optimization viewpoint, for the model training presented in Chapter 6, we take actions to evenly balance the classes despite their imbalanced representation in the data. This implies that the loss value of a single missed minority example is equivalent to the loss value of many missed majority examples (the exact number depends on the degree of imbalance). The loss values and decision thresholds can easily be adjusted as more granular information becomes available.

[ROC plot: the model's curve (AUC = 0.79) against the diagonal random-guessing baseline; axes are false positive rate versus true positive rate.]

Figure 5-1: Example ROC Curve. The true positive rate of a binary model is plotted against the false positive rate. This allows a visualization of the tradeoffs involved depending on the decision threshold. The diagonal line represents random guessing.

5.3.2 Multiclass Metrics

The output of the multiclass model will be a vector where each entry represents the normalized probability that the image belongs to the class corresponding to that entry position. In a multilabel setting where an example image can contain more than one possible class, a decision threshold can be chosen for each individual class, similar to the binary decision threshold discussed above. While this scenario is possible for our problem, the available data does not contain examples with more than one class. Due to this, for simplicity we restrict ourselves to the multiclass situation where it is assumed an image can only belong to a single class. With this assumption, we will use a rule that labels an example based on the class with the highest predicted probability. We can then evaluate the test results by using a multiclass confusion matrix and calculating the recall and precision for each class. We can consider class i true positives (TP_i) as correct class i predictions, class i false positives (FP_i) as examples from another class incorrectly predicted as class i, and class i false negatives (FN_i) as examples from class i that are incorrectly predicted as another class. Using these definitions we can calculate recall and precision for class i as:

Recall_i = TP_i / (TP_i + FN_i)    (5.1)

Precision_i = TP_i / (TP_i + FP_i)    (5.2)

Recall is the same as the true positive rate for class i and provides a measure of how many of the total relevant class examples are labeled correctly. Precision provides a measure of how often the model is correct when it predicts a particular class.
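Equations 5.1 and 5.2 can be computed directly from raw predictions (the labels below are synthetic):

```python
def per_class_metrics(y_true, y_pred, classes):
    """Recall (Eq. 5.1) and precision (Eq. 5.2) for each class, computed
    from per-class TP, FN and FP counts."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        metrics[c] = (recall, precision)
    return metrics

y_true = ["A", "A", "B", "B", "C", "D"]
y_pred = ["A", "B", "B", "B", "C", "C"]
m = per_class_metrics(y_true, y_pred, ["A", "B", "C", "D"])
```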

5.3.3 Localization Performance

Beyond a pure probability prediction, it is important for the automated inspection system to be capable of locating the position of nonconformities when they are present. This is also important during model development to understand if the model makes its predictions for the correct reasons. A localization heatmap is generated to accomplish these goals using techniques discussed in Section 6.6. The heatmaps are reviewed during evaluation to gain a qualitative sense of how well the models localize nonconformities.
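As one common construction of such a heatmap (in the style of class activation mapping; the specific techniques used here are the subject of Section 6.6), the final-layer feature maps can be combined using weights reflecting their importance for the predicted class. The feature maps and weights below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
feature_maps = rng.random((8, 8, 4))              # H x W x K final-layer feature maps
class_weights = np.array([0.1, 0.7, 0.15, 0.05])  # per-map importance for the class

# Weighted sum of feature maps -> spatial evidence map for the class.
heatmap = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
heatmap = np.maximum(heatmap, 0)                  # keep positive evidence only
heatmap /= heatmap.max()                          # normalize to [0, 1] for display
```

The normalized map can then be upsampled and overlaid on the input image to show where the model believes the nonconformity is located.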

5.4 Chapter Summary

The models for this problem are developed using a supervised learning approach. Challenges related to this approach include the small dataset, class imbalance and concept drift. For the binary classification problem, an ROC curve is the main performance metric used to evaluate the model's effectiveness on the test set. For the multiclass model, a multiclass confusion matrix along with the calculation of precision and recall for each class are used for evaluation. In addition, the models produce a localization heatmap that can be used to evaluate localization effectiveness.


Chapter 6

Model Design

Machine learning algorithms have a variety of design decisions and hyperparameters that cannot be determined by the learning algorithm itself. This chapter discusses these decisions and hyperparameters and the method or reasoning used for their selection. The models were developed using a combination of the TensorFlow [30] and Keras [7] machine learning frameworks.

6.1 Transfer Learning and Model Architecture

As described in Chapter 5, a major challenge when working with small datasets is the tendency for learning algorithms to overfit. We employ transfer learning to help combat this issue. Donahue et al. show in [8] that after pretraining a deep convolutional architecture on a large labeled database, the learned features tend to cluster into interesting categories the network was never trained for. This demonstrates the potential generality of some of the features learned by CNNs. Yosinski et al. show higher performance in [52] when pretrained weights are used to initialize a network compared to randomly initialized weights. They also state that the initial layers of most deep neural networks trained on images learn features that resemble Gabor filters or color blobs. In [53], it is shown that a CNN pretrained on a large ImageNet dataset can achieve state-of-the-art performance on small datasets from a different task by simply retraining the final classification layer on the new task's data. They show this model significantly outperforms the same CNN architecture trained from scratch on only the new task's data. Essentially, the lower level features learned by CNNs from images in different domains can be similar, and it is often only the higher level features and how they are combined that may significantly differ. While dog and cat images are quite different from uncured rubber images to a human eye, the lower level edges and shapes are not that different in pixel form. We can take advantage of this and use a network pretrained on a large general image database to learn the lower level features. This allows us to bypass the large amount of data required to learn these lower level features. We can then focus on training the model to learn the higher level features required to classify images for our task.
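This idea can be sketched in miniature: hold a "pretrained" feature extractor fixed and train only a new final classification layer on the small target dataset. The random extractor and synthetic data below are conceptual stand-ins, not the actual Inception-V3 pipeline:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for pretrained lower layers: fixed weights, never updated.
W_frozen = rng.normal(size=(16, 8))

def extract_features(x):
    return np.maximum(0.0, x @ W_frozen)  # frozen nonlinear feature extractor

# Small labeled "target task" dataset, separable in the frozen feature space.
X = rng.normal(size=(200, 16))
feats = extract_features(X)               # computed once; no gradients flow here
true_w = rng.normal(size=8)
y = (feats @ true_w > 0).astype(float)

# Only the new final layer's weights are trained.
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    w -= 0.5 * feats.T @ (p - y) / len(y)  # gradient step on the new layer only

accuracy = np.mean(((1.0 / (1.0 + np.exp(-(feats @ w)))) > 0.5) == y)
```

Freezing the extractor means only 8 weights are fit here instead of the full 136, which is the overfitting-avoidance argument made above, scaled down from millions of parameters.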

There are a number of readily available sets of pretrained model weights such as VGG19 [42], ResNet50 [50] and Inception-V3 [45], among others. The weights from the Inception-V3 model were selected as the baseline for this work due to this model's high performance on the ImageNet validation dataset and its relatively low memory requirements at 92 MB. This model was originally trained on the ImageNet Large Scale Visual Recognition Challenge data to classify images into 1000 different classes. A decision point when designing the convolutional layers of a CNN is the kernel size of the convolutional operation, such as 3x3, 5x5 or 7x7. The Inception architecture combines multiple sized convolutions within a single layer, or module, increasing the width of the network. These modules are then stacked together to increase the network depth. It also takes advantage of 1x1 convolutions for dimensionality reduction, as suggested in [28]. For Inception-V3, the authors implemented additional modifications for increased performance, such as replacing convolutions greater than 3x3 with a series of smaller convolutions. A detailed explanation of the original Inception architecture can be found in [44] and of the Inception-V3 modifications in [45].

There are two main approaches when transfer learning with CNNs. The first is to freeze the transferred weights when training new layers on the target task to ensure the transferred weights are not updated. The other approach is to fine-tune the transferred weights by allowing them to be updated when training on the new task. A variation includes freezing only certain layers and fine-tuning others. It is important to avoid overfitting the training data, so the best approach depends on the number of weights in the network and the amount of training data for the target task. For our model, we remove the classification layers of the original Inception-V3 model and then experiment to determine how much of the pretrained network to transfer. Specifically, we treat the Inception Module number as a hyperparameter and experiment with the use of Inception Modules eight, nine and ten as the output of the pretrained network. Due to the small training set size, we opt to freeze all of the transferred Inception weights to prevent overfitting. As a result, the pretrained architecture can be treated as if it were a single layer. We then add a global average pooling layer (like the original Inception-V3 model) and n fully connected hidden layers (FCHL), where n > 0 is a hyperparameter to be determined by experimentation. The number of hidden units for each FCHL is also treated as a hyperparameter. The hidden units are activated using Rectified Linear Unit (ReLU) [31] activation functions, and the weights are initialized using He normal [16] initialization. Finally, we add a classification layer which will output the model's probability estimate. The weights of the final classification layer are initialized using Glorot uniform [10] initialization. Batch normalization [18] is used to add robustness to the initialization process and to increase the speed of training. The model architecture is presented in Figure 6-1.

[Architecture diagram: a 299x299x3 input image passes through the frozen Inception-V3 layers (output size 8x8x2048), then a global average pooling layer (output 1x1x2048), then the fully connected hidden layers (output 1x1x(# of hidden units in the final FCHL)), and finally the prediction layer: a single sigmoid unit (1x1x1) in the binary case or a four-unit softmax (1x1x4) in the multiclass case.]

Figure 6-1: Model Architecture

6.2 Data Augmentation

We also use data augmentation to help improve the model's ability to generalize despite data limitations. By transforming images in the training set, we can increase the number of different images observed during training and reduce the tendency of the model to overfit specific training example features. Data augmentation has been shown experimentally to improve CNN performance [49]. In practice, it is recommended for CNNs in most image recognition scenarios. The data augmentation transformations we use during training are presented in Table 6.1. They are performed in real time by the CPU during training. The transformations are performed randomly within a specified range of values. The ranges were chosen to ensure the class labels are maintained after the transformation. This method of real time data augmentation ensures the model will essentially never observe the exact same example more than once during training.

Table 6.1: Data Augmentation Transformations

Transformation  | Value     | Description                                | Simulated Effect on Features
Horizontal Flip | 0.5       | Probability of random horizontal flip      | Locations/orientations
Vertical Flip   | 0.5       | Probability of random vertical flip        | Locations/orientations
Width Shift     | 0.98-1.02 | Range for random horizontal shifts         | Locations
Height Shift    | 0.98-1.02 | Range for random vertical shifts           | Locations
Zoom            | 0.96-1.04 | Range for random image zoom                | Size
Shear           | -0.2-0.2  | Range for random shear angle (radians)     | Size/orientation
Pixel Intensity | 230-255   | Range for random pixel intensity rescaling | Image contrasts
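A few of these transformations can be sketched as follows; the ranges and implementation details are illustrative of the idea, not the exact augmentation pipeline used in training:

```python
import numpy as np

def augment(image, rng):
    """Apply random transformations in the spirit of Table 6.1:
    flips, a small width shift, and pixel-intensity rescaling."""
    if rng.random() < 0.5:                  # horizontal flip, p = 0.5
        image = image[:, ::-1]
    if rng.random() < 0.5:                  # vertical flip, p = 0.5
        image = image[::-1, :]
    shift = rng.integers(-2, 3)             # small random width shift (pixels)
    image = np.roll(image, shift, axis=1)
    scale = rng.uniform(230, 255) / 255.0   # random intensity rescaling
    return image * scale

rng = np.random.default_rng(4)
original = np.arange(100, dtype=float).reshape(10, 10)
augmented = augment(original, rng)          # a slightly different image each call
```

Because the transformations are sampled fresh for every training batch, repeated calls on the same source image yield distinct training examples, which is the "never observe the exact same example twice" property described above.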

6.3 Optimization

For the training optimization, we define a loss function L that allows for a comparison between the model's output and the ground truth label. For a single example x_i, we can define L_i as the negative log-likelihood:

L_i = -log P(y_i | x_i, Θ)    (6.1)
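A minimal sketch of this loss for both label formats (the helper names are hypothetical):

```python
import math

def nll_binary(y_true, y_prob):
    """Negative log-likelihood of a single binary example: -log of the
    probability the model assigned to the true label (Eq. 6.1)."""
    p = y_prob if y_true == 1 else 1.0 - y_prob
    return -math.log(p)

def nll_multiclass(y_onehot, probs):
    """Cross-entropy form for the multiclass case: -log of the predicted
    probability of the true (one-hot) class."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs) if y)

loss_confident = nll_binary(1, 0.9)  # confident correct prediction -> small loss
loss_wrong = nll_binary(1, 0.1)      # confident wrong prediction -> large loss
```

The loss grows without bound as the model becomes confidently wrong, which is what drives the gradient updates toward better-calibrated predictions.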
