A case-based system to support electronic circuit diagnosis

(1)

(2)

.^

(3)

(4)

(5)

HD28

.M414

no.

WORKING

PAPER

ALFRED

P.

SLOAN

SCHOOL

OF

MANAGEMENT

A

Case-based

System

to

Support

Electronic

Circuit

Diagnosis

Thilo

Semmelbauer,

Anantaram

Balakrishnan,

and

Haruhiko

Asada

WP#

3497-92-MSA

October,

1992

MASSACHUSETTS

INSTITUTE

OF

TECHNOLOGY

50 MEMORIAL

DRIVE

CAMBRIDGE,

MASSACHUSETTS

02139

(6)

(7)

A

Case-based

System

to

Support

Electronic

Circuit

Diagnosis

Thilo

Semmelbauer,

Anantaram

Balakrishnan,

and

Haruhiko

Asada

WP#

3497-92-MSA

October, 1992

(8)

(9)

A

Case-based

System

to

Support

Electronic

Circuit

Diagnosis

^

7/7/70

Semmelbauer

LeadersforManufacturing M. I. T., Cambridge,

MA

Anantaram

Balakrishnan

Sloan Scfiool of

Management

M. I.T., Cambridge,

MA

Haruhiko

Asada

Department of Mechanical Engineering M. I. T.,

Cambndge,

MA

(10)

(11)

Abstract

Diagnosis

and

repairoperationshave traditionally been

major

bottlenecksin electronics

circuit

assembly

operations because they arelabor intensive and highly vanable.

Rapid

technological advances, shrinking product lifecycles, and increasingcompetition inthe

electronics industry have

made

quick

and

efficientdiagnosis critical forcostcontroland quality

improvement,

while simultaneously increasing the difficulty ofthe diagnosis task.

This paperdescribes acase-baseddiagnosis support system to

improve

the effectiveness

and

efficiencyofcircuitdiagnosis by automatically updatingthe setof candidatedefects,

and

prioritizingtestsduring thesequential testing process.

The

case-based

system

stores

individual diagnostic instancesratherthan generalrulesand algorithmic procedures; its

knowledge

base

grows

as

new

faults aredetected

and

diagnosed

by

theanalyzers.

The

system

providesdistributed access to multiple users,

and

incorporates on-line updating features that

make

itquick toadapt tochanging circumstances.

Because

itis easy to install,

update,

and

integrate withexisting systems, this

method

is well-suitedfor real

manufacturingapplications.

We

have

implemented

a prototype version,

and

tested the

approach

in anactual electronics

assembly

environment.

We

describethe system's underlyingprinciples, discuss

methods

to

improve

diagnostic effectivenessthrough

principledtest selection

and

sequencing,

and

discuss managerial implicationsfor successful implementation.

(12)

1.

Introduction

1.1

Overview

Electronic circuitdiagnosis is theprocess ofidentifying the

component(s)

thatis

responsible for the malfunction ofacircuitboard so thatcorrective (repair)actioncan be taken.

Subsequent

exploration ofthe problem's rootcauses motivates process

and

product

improvement

efforts.

The

diagnosis

and

repairloophas traditionally

been

a

major

bottleneckincircuit board assembly. Recenttrends inthe electronics industry such as the

rapid technological advances,increasing product complexity,

and

shon

product lifecycles,

have considerably increasedthe

imponance

ofquick

and

efficient diagnosis, while

simultaneouslyincreasing the difficultyofthe diagnosis task.

Reducing

diagnosistime

and

variability,

and improving

diagnosisaccuracy

becomes

critical as electronics

companies

strive towards lean manufacturing, quickresponse,

and

greater flexibility.

This paperdescribes acase-based diagnosissupport system that

we

developed

in

consultation with a

high-volume

consumer

electronics

assembly

plant.

The

plant, faced witharapid turnoverofproducts

and

an increasing

shonage

ofskilledtechnicians,

wanted

a

means

to

improve

the efficiency

and

effectivenessofcircuitdiagnosis, reduce the

workload

on

its

few

expert analyzers,

and

develop a tool to train

new

technicians, without

devoting significant

system development

resources.

An

informal survey ofavailable systems

and

current practice revealed that, althoughthe literaturedescribes several

rule-based

and model-based

systems to supportdiagnosis, these systems

were

not widely used by theelectronics industry partlybecause they require considerable time

and

special

expertise toinstall (e.g., for

knowledge

acquisition)

and

update.

The

case-based approach

proposed

in this paperoffers apractical solution to thediagnosis support needsof industry.

Itaccumulates

knowledge

on-line asanalyzers detect

and

diagnose

new

faults.

Using

this

information about priordiagnostic instances, the systemeliminates candidate defectsduring

thesequential testing process,

and

prioritizes subsequenttests.

We

have

implemented

a

prototype versionofthe system,

and

testeditin anactual electronics

assembly

environment

We

describe the system's underlying principles,outline

methods

to

improve

diagnostic effectiveness

by

reducing the

number

ottests

needed

to

complete

the diagnosis,

and

discussimplementationaspects.

1.2 Why

is circuit

diagnosis important?

Rapid changes

inelectronics product

and

process technology, such as smaller,

surface-mounted

components,

higherboarddensities, and

more

complex

circuitry,

have

made

circuitdiagnosis

more

challenging.

With fewer

accessibleconnections

and

testpads, in-line functional testing can only identify theapproximate area orblock of

components on

the

(13)

boardthat isresponsible for the malfunction. Detaileddiagnosis

must

be

done

off-line

throughextensive probing

and measurement.

Each

component

and

connection on a circuitboard isa potential defect site,

and

every

sitecan

have

severalpossible defects such asan

open

or shortcircuit(due topoorsoldering or

component

misalignment),

component

malfunction, inherentflaw inthe circuit design,

placement

ofthe

wrong

component,

or defect in the board'sinternal wiring.

The

failure rate ofa board containing hundreds of

components

is, therefore, severalordersof

magnitude

higher than theprobability ofa singledefective

component

or operation,

making

testing

and

diagnosis inevitable

even

with highly capable placement

and

soldering

operations. Isolating the rootcausein such

complex and

densecircuitsis arduous

and

time

consuming.

Annual

expensesdirectly related todiagnosis

and

repaircan

exceed

several

hundreds

ofthousands ofdollars

even

in a

medium-sized

facility.

Work

inprocess atthe

diagnosis stage

and

delayed feedbackforprocess control further increase this cost.

More

importandy, the time

and

effortrequiredtodiagnose a faultyboard is highly variable(from a

few

minutes toover

4

hours)

depending

on

the type ofdefect, the analyzer'sskill, the

effectivenessofdiagnostic tools,

and

the incentive

and performance

evaluation systems.

Consequendy,

testing, faultdiagnosis

and

repairoperations areoften

major

bottlenecks in

medium

or high

volume

printedcircuit board

assembly

facilities.

The

cost

and

effort to train techniciansis also

becoming

a significant concernfor

electronics companies.

A

new

technician requires several

months

oftraining

and

experience with a particularboard to understand thecircuit'scause-effect behavior

and

the

potential rootcausesin itsmanufacturing process.

As

productlifecycles shrink, the

training costs

must

beamortizedoversmaller

volumes

foreach board. Diagnosis

bottlenecksareespecially acute during thecritical production

ramp-up

period following a

new

productintroduction. Capturing marketshare earlyis essential forcompetitive survivalin the electronics industry, butproduction facilitiesoften cannot achieve the

necessaryrapid

ramp-up

because yields are low,

and

analyzers are least productive since they

have

not

had

significantexperience with the

new

product.

Fast

and

accuratediagnosis notonly reducesdirectmanufacturingcost

and

flow time, butisalsoan essentialprerequisite forsuccessful quality

improvement

efforts tiirough

improved

product design (e.g.,

Soederberg

[1992]), better process understanding

and

characterization, tighterprocesscontrol,

and

close supplierpannership (e.g.,

Wenstrup

[1991]).

The

ability toprovidequick information feedback

from

the diagnosis stage to

previous stages (includingvendors) is particularly important in

medium

and

high

volume

environments.

For

instance, a snick glue pin ora

worn

placement

head, ifnotdetected

(14)

1.3 Computer

support

for circuit

diagnosis

Manual

diagnosisofcircuit boards can be both inefficient

and

error-prone. Empirical evidence

(Rouse and

Morris [1985],

Rouse and Hunt

[1986]) suggests that

humans

are not optimaJ diagnosticians since they

have

difficulty ineffectively utilizing all the available test

data toprune the listof candidate defects

and

systematically selectthe testsequence.

On

the

other hand, building acompletely

automated

system that iscapable ofanticipating and diagnosing all possibledefectscan be prohibitively expensive; sucha system

might

also require extensive time

and

effortby specialists (design and process engineersor software experts) toinstall

and

maintain. Therefore, a system that assists, ratherthan replaces,

human

diagnosticians is

more

practicable.

To

cope

with the

dynamic

electronics

environment, the

system

must

berobust

and

flexible, i.e., it shouldeasily adapt todesign

and

process changes. Practitionersprefera flexible

system

that possiblydiagnoses only

the

most

common

defect types (thataccountfor avastmajorityofboard failures) to

one

that offers

comprehensive

diagnostic coveragebutcannot easily adapttochanges.

The

case-basedapproach offers the potential to

meet

these requirements. Unlike other

knowledge-based

systems, the case-based system accumulates

knowledge

on-line

by

storingdiagnostic instances (asanalyzers identify

and

diagnose

new

faults)ratherthan generalrulesoralgorithmic procedures.

Each

diagnostic instanceorcasecorrespondsto a distinctdefect; itis represented

by

the setoftestsconducted

and

theirobserved

outcomes

for that defect.

By

comparing

the testresults for the currentboard with previous instances,

the

system

can eliminatecandidatedefects

and

determine the effectivenessof subsequent

tests.

Recording

diagnostic instances rather than general rules has the p)otential

disadvantageof requiring a large database.

However,

our prototype implementation experience suggests that thevastmajority offaultsencountered in practicecorrespond toa

very small fractionofthe setofall possible instances.

Because

ofits simplicity, the system can be maintained

by

the analyzers themselves, avoiding long knowledge-acquisition delays

and

the consequentsystem obsolescence. Furthermore, thestandardformat ofacase

promotes

learning

and

information sharing

among

technicians (fromdifferent shiftsor productionlines),

and

enables the systemto

serve also as a trainingtool for

new

technicians.

These

features

make

the case-based

system

viable

and

usefulfor real manufacturing applications.

Although

case-based

reasoning has

been

appliedtoother

domains

such as legal

argument and customer

service,

we

arenot

aware

ofits previousappHcation to electroniccircuitdiagnosis.

We

enhance

the

basiccase-based approachtoincorporate historical data

and

other priorinformation for

systematically selectingtests

and

optimizingthe search process.

To

validate the overall approach,

we

implemented and

testeda prototype versionofthe system in a

high-volume

printedcircuit board (surface-mount)

assembly

line producinga hybrid (analog-Kiigital) circuit board for a state-of-the-art

consumer

electronics product.

Our

implementation

(15)

permitsdistributedusage via a local access network. Within five

weeks

of installation,the

system

achieved

90%

coverage, i.e., the system had acquired

enough

diagnostic instances

tocorrectlydiagnose

90%

ofthedefects occuring in the fifth week.

Based on

our implementation

expenence,

we

offersuggestions to

improve

the

management

ofthe

diagnosis function; in particular,

we

propose a proactive

management

approach and incentive

schemes

that

emphasize

the analyzer's process

improvement

rather than

fault-findingrole.

The

restofthis paperisorganized as follows. Section 2 describes the circuitdiagnosis process in

more

detail,

and

develops thegoals foran idealdiagnosis

suppon

system. Section 3 discusses the underlyingstructure

and

operation ofa basiccase-based

system

for

verifying defects based

on

testoutcomes. Section

4

presents system

enhancements

to

optimizetest selection usinghistorical dataand otherpriorinformation on the likelihoodof

different defects.

We

present a

dynamic

programming

formulationoftheoptimal test

sequencing

problem

to

minimize

theexpected

number

oftestsper board, andoutline

heuristic

methods

foron-line testselection. Section 5 briefly

descnbes

ourprototype implementation

and

the lessons

we

learned

from

this implementation exercise,

and

identifies directions forfurtherwork.

2. The

Diagnosis

Process

2.1 In-line

testing:

Capabilities

and

tradeoffs

Circuitboard

assembly

linestypically contain in-line inspection

and

testing stages, after

placement

and

soldering operations (see, forinstance.

Noble

[1989] for adescription ofthe

processing steps in printedcircuit board assembly).

The

main

purpose ofthese in-line

screening operationsis toidentifydefective boards,although they also

have

limited diagnosticcapabilities.

For

instance,visual

and X-ray

inspectioncan identify missingor misplaced

components and

possiblycertain solderdefects, but are not effectiveorreliable fordense boards with small

components.

Furthermore,

complete

manual

orX-ray inspection

can

be very time

consuming and

expensive.

Two

typesofin-line electrical tests areavailable

—

in-circuit testing

and

functional

testing. In-circuittestsverify the functionalityofindividual

components on

the boardusing

automated

test

equipment

(see, for instance.

Maunder

and

Tulloss [1990]).

They

can

isolate

and

precisely identifycertain typesofdefects (e.g.,

open

or shortcircuits,defective passive

components),

but require hardfixturing

and

programming

that is difficult to justify forshort production runs.

Moreover,

the increasing useof surface-mount technology

and

the high costofboard"realestate"

make

in-circuit testing infeasibleorlimit theirdiagnostic

capabilities(Garcia-Rubio [1988]). In-circuit testsare alsoprone to

measurement

errors (see, for

example.

Chevalier [1992]) especiallyfor small, high

performance components.

(16)

leadingto false acceptanceor rejectionofboards. Functional tests

measure

the circuit's

response (atthe board's output

pons

or

edge

connectors) todifferentcombinations ofinput

signals. Functional testscan confirm

whether

the interactions

between

various

components

meet

the specifications; typically, they can trace aboard's failure to the responsible circuit,

but not to aparticularcom]X)nentorconnection in this circuit.

Selecting the testing strategyfor aproduct involves addressingtradeoffs

between

higher

assembly

cycletimes (forall boards) toconductextensive in-linediagnostic tests

versus higheroff-tinediagnosis effon (fordefective boards)

when

the in-line tests

do

not

have

highresolution.

As

theboarddefect rate decreases

and

component

density increases, this tradeoffincreasinglyfavors the strategyofusingin-line testingonly to identify

defective boards; detaileddiagnosis

and

defectverification istiien

performed

off-line by

skilledtechnicians.

We

next describethisoff-line diagnosis process,

and

identify

opportunities tosupport this process usinga

computer-based

system. Forbrevity,

we

will

use the

word

"diagnosis" to referto the detailed, off-line diagnosisprocedure after a board hasfailed in-line (functional) tests.

2.2 Sequential

testing

procedure

for off-line

diagnosis

Diagnosing

acircuitboardthat has failed functional testsentailsidentifying tiie source ofthe malfunction (e.g., acompxDnentorconnection

on

the board)so thatappropriate corrective action (e.g.,replacing or resoldering the

component)

can betaken torepair the

board

Diagnostic information also serves as the input forrootcause analysis

and

continuous

improvement

of vendors, processes,

and

product design.

We

define a dfifg^las the finest grain source ofmalfunction thatcan be

unambiguously

identified

by

the analyzer.

Each

defect hasan associated corrective action; different

defects

might

require the

same

correctiveaction. For instance, adefective resistorcanact

aseithera shortcircuitoran

open

circuit. Ineachcase, the

symptoms

and

the

means

to

identify the

problem

are different; therefore,

we

view

these

two

possibilitiesas separate defects

even though

bothrequirethe

same

corrective action (namely, replacingthe

resisitor).

Each

defect has asetofassociated

symptoms

thatthe analyzer

must

discover

by

applying

one

or

more

tests.

A

testis an action that

must

be

performed

either

by

a

technician or

by automated

testequipment, resultingin an observation aboutthe behavior of the board.

A

test

might

consistof applying a specified setofelectrical inputs, adjusting

certain circuitelements (e.g.,tuningcapacitors),

and

probing selectedlocations

on

the circuit boardto

measure

thevoltage or

view

the electrical signal

on

anoscilloscope.

Each

test has

two

or

more

possible resultsor

outcomes

(e.g.. voltage is greater than 4.5 volts orless than 4.5 volts); observe thatif theoutput

measurement

iscontinuous (e.g., voltage)

we

discretize it by definingtherange of continuous values associated with each outcome. For convenience,

and

without loss ofgenerality, allofour subsequent discussions

assume

(17)

thatevery test has

two

possible

outcomes which

we

denote as or 1.

The

outcome

ofa test

depends on

the defectin the board.

We

permitdefects tohave indeterminate

outcomes

for certain tests, e.g., foracertain defect, the voltage at a particularlocation

may

beeither

or 1

depending

on, say, the state ofa tlip-flopdevice.

We

also recognize thatthe analyzer

might have

onlypartial priorinformation regarding

which

defect produces

what outcome

foreach test;

however,

we

assume

thatthe available information is adequate to uniquely

identifyeachdefect.

To

identify the truedefect in a malfunctioning board, the analyzeruses a sequential testing

procedure

.

The

process starts with a candidate set ofdefects

(which

depends

on

the board'sfailure

mode

during functionaltesting)

and

a setoftests thatcan distinguish

between

thesedefects.

The

analyzersuccessively selects

and

applies varioustests,

observes the

outcome

ateach stage,

and

eliminates (fromthe candidatedefect set)all

defectsthatcannot

produce

theobserved outcomes. Ifthe initialcandidatedefect set is

complete

(i.e.,the setcontains all possible defects),

and

the setofavailabletests is

comprehensive

(i.e.,

some

combination of

known

test

outcomes

uniquely identifieseach candidatedefect), theprocedure terminateswith the truedefect as the only remaining

member

ofthecandidatedefect set.

Inpractice, analyzers

might

not

always

followa systematic procedure to selecttests

and

eliminate defects. Diagnosis is oftenconsidered an "art",

and

analyzers rely largelyon

intuition,

some

circuit

knowledge,

memory,

and experience as they

perform

diagnosis. Often, they

do

notformally maintain

and

update thesetof candidatedefects

by comparing

the defects' expected test

outcomes

with the observedoutcomes. Indeed, the setof candidatedefects

and

theirexpected

outcomes

may

noteven be

documented,

and

might

vary

from

technician totechnician.

A

common

"myopic

"

approach fortroubleshooting consistsofhypothesizing aparticular defect,attempting toverify it by performingthe

appropriate tests, revising theguess ifthe testsdisprove the first hypothesis,

and

soon.

Alternatively, theanalyzer

might

follow apredeterminedtestsequencerepresented, for

instance, as afault tree.

A

purely

manual

diagnosis process can be both inefficient and inaccurate especially

when

itisnotsupported

by

clearprocedures and documentation. Forinstance, using a staticfault treecanresult in unnecessary tests and unproductive search (relativetoan

approach

that usesrecent defect history to guide the search). Likewise, diagnosis without adequate

documentation

and

formal recordingprocedures canlead to replicationoftestsand

falseidentification. Also, these informal approachesare not

conducive

to learning,

transfering

knowledge

toother analyzers, or generating appropriate information forquality

improvement.

To

identifythedesirablecharacteristicsofa

computer-based system

toassist

(18)

2.3 Diagnosis decision

support

requirements

Consider an intermediate stage in thediagnosis process

when

the analyzerhas applied severaltests

and

observed the resultsof each test.

The

analyzer

now

faces fourrelated

questions,all

aimed

atdeciding

what

to

do

Given

the

outcomes

ofall previoustests, can

we

concludethat the boardcontainsa specific

(known)

defect?

(2)

Does

the boardcontain apreviously

unknown

defect?

(3) If

more

than

one (known)

defectremain ascandidates,

which

available test(s)can distinguish

between

these defects?

(4)

What

testtoapply next?

We

will refer to the first

two

questionsas Defect Identification,

and

the

remaining

two

as

Test

Selection.

Consider

the information needs and actionsfor thedefect identificationquestions.

An

affirmative

answer

toeitherquestion takes the boardout ofthe regulardiagnosisiteration.

In the firstcase, the action consists of repairing the identified defect,possibly after

confirmingthe diagnosis. If

no

known

defectcauses the

observed

cumulative test

outcomes

(i.e.,the

answer

toquestion 2 is affirmative),

we

have three possibilities: (i) the

boardcontains a

new

defect,or(ii) the

assumed

test

outcomes

forthe

known

defectsare

inaccurate,or(iii) the actualtest

outcomes were measured

incorrectly. In all tiiree

simations, the boardrequires extraordinary actions such asconfirming previous test

outcomes,

consulting with design

and

process engineers, checking with suppliers,

and

updatingcurrent

knowledge.

Ifmultiple

(known)

defects

remain

in thecandidate set, thetestselection questions first

seektoidentifyrelevant tests,not yetperformed, thatcandistinguish

between

the

remaining

possibledefects. Conceptually, thisprocess corresponds to determining,for

eachavailable test, ifthattesthasdifferentexpected

outcomes

for the candidatedefects. If

no

suchtest is available,theanalyzer

must

develop

and

apply

one

or

more new

tests to

distinguish thedefects. Otherwise,

we

must

decide

which

among

the availableteststo

perform

next.

As we

shall see later, thisdecision impacts the

number

ofiterations

needed

to

complete

thediagnosis.

Computer suppon

for the defect identification questions

might

consistofa databaseor

logical reasoning system thatautomatically

compares

the observedtest

outcomes

with the

expected

outcomes

forall candidatedefects,

and

eliminates defects with inconsistent outcomes. Similarly, to supporttestselection, a

computer-based system

can identify the

relevanttestsateachstage,and apply a systematic

method

to select the nexttest.

To

be

(19)

initializing

and

updating,

and

quick response.

We

list

below

these

and

other desirable featuresofa

computer-based system

to

suppon

electronic circuitdiagnosis:

•

_Development

_effort:

_Given

the short productlifecyclesofelectronic products,

we

require a

system

that

becomes

operational quickly and withoutsignificantexpense.

•Maintainability/ robustness:

The

system

must

be easyto update, preferably by the users themselves; relying

on

external resources to update the system (e.g., software

expens

and

knowledge

engineers) often introduces delays, causing inconsistencies

and

system obsolescence.

Another

important requirement isthe system'sability to adaptgracefully

tothe frequentdesign

and

process

changes

thatcharacterize theelectronics industry.

• Compatibility: Techniciansoften

view

structured tools and methodologiesas

impediments

to creativity; compatibility with current practices

and

the cultureofthe

organization is, therefore, critical tothe success

and

effectivenessofthe system.

System

ergonomics, easeofuse, flexibility,

and manual

ovemde

options are

some

of the

ways

to

improve

compatibility.

•

_Response

_{time/efficiency:} Since

each

boardrequires severaltest iterations toprune the

candidate defect set, the

system must

respondrapidly tothesequestions so thatdiagnosis

does

not

become

the

botdeneck

activity.

• _{Effectiveness:}

The

main

purpose ofadiagnosis support

system

is toreduce thediagnosis time per board. Partofthissavings

comes

from

eliminating errors

and

automating

some

oftheroutine

bookkeeping and

lookup functions.

Two

other features can reduce

diagnosistime:

• _reducing the (expected)

number

oftests through systematic search

and

principledtest

selection

and

sequencing; and,

• providing

good

"coverage", i.e.,

when

the

system

has stabilized, it

must

correctly

diagnosea large percentageofthe defects.

Other

desirable

system

featuresincludeabilities to: interface with existing databases, prepareperiodic

summary

reports for process

improvement,

highlight inconsistencies

and

faciUtate learning,diagnose bothdigital

and

analog (e.g., radiofrequency) circuits,

and

operatein multi-userenvironments.

The

next section uses these

performance

criteriato

motivate thecase-based

method.

3.

The

Case-based

Approach

for Circuit

Diagnosis

3.1

Knowledge

base

for

diagnostic

systems

Circuitdiagnosisrequires three broadclassesof

knowledge

—

historic, heuristic,

and

fundamental

knowledge.

Historic

and

otherpriorinformation

on

the likelihoodofdefects can help theanalyzer prioritizecandidatedefects, thusenabling an effective choiceoftest

(20)

analyzer mightfirst

examine

that

pan

before pert'orming othertests. Heuristic orempirical

knowledge, sometimes

termed "shallow"

knowledge,

refers to an

empincal

association

between

observed

symptoms

and diagnostic conclusions. This category

might

include rules

and

procedures,as well astricks orshoncuts thatthe analyzer has

found

to be effectiveafter

some

troubleshootingexpenence.

Fundamental

or"deep"

knowledge

impliesan understanding ofthe underlying physicsofcircuitelements

which

we

use to

predictcircuit response.

Design and

testengineersrely

on

such fundamental

knowledge,

often

encoded

ina circuit simulator, todesign and troubleshoot electronic circuits.

Depending on

the primary sourceof diagnostic

knowledge,

we

can distiguish

between

experience-based (empirical

knowledge) and model-based

(fundamental

knowledge)

systemstosupportcircuitdiagnosis.

The

model-based

approach uses ananalytical

model

ora circuit simulatortodynamicallyidentify the potential causes forthe observed

symptoms

duringthe sequential testingprocess. Since diagnosisentails "backtracking"

(i.e.,given ceruin

observed

symptoms

or outputs,

what

valuesofcircuitelements can

produce

these outputs),

we

need

a translator to

conven

traditional circuitdesign

models

(thatcharacterize the output foragivencircuitconfiguration) intocorrespondingdiagnostic models.

Davis and

Hamscher

[1988] illustrate the three basic steps

—

hypothesis

generation, hypothesistesting,

and

hypothesisdiscrimination

—

in

model-based

diagnosis. Gensereth [1984],

deKleer

and Williams

[1987],

and

Hamscher

[1988] describe various

model-based

systems todiagnosedigital circuits.

The

most

common

experience -based a

pproach

is arule-based (expen)

system

that relies

on

the experience of

human

expertsto

develop if-then rules thatenablethe systemto reason aboutthe behavioroftiie

malfunctioning board (see

Semmelbauer

[1992] for an illustration ofthis approach).

One

ofthe pioneering apphcations ofthe rule-based

approach

was

for medical diagnosis (e.g.,

see

Harmon

and King

[1985]). Unlike the fieldof medicine,

however,

electroniccircuits

and

manufacturing processes vary widely,

and

thetechnologies

undergo

frequentand radical changes. Ingeneral, the literature

on knowledge-based

diagnosticsystems focusses

on

knowledge

representation issues

and

inference

mechanisms

to identify the underlying source oftheproblem,

and does

not adequately addressissuesofoptimizing the sequential search process.

For

electroniccircuitdiagnosis, both the

model-based and

rule-basedapproachesoffer certain advantages, but also

impose

limitations.

By

consideringa decision suppxsrt

framework

that

combines

both approaches ina

complemetary

fashion,

we

can exploittheir

respective strengths

and

overcome

the disadvantages.

Model-based

systems offerthe

advantage ofdirecdy integratingdesignand diagnosis:

when

the designer

changes

the circuitschematic (and addsrelevant

sub-models

for

new

components), the diagnosis

model

is automatically updated, permitting instantaneous troubleshooting capability without

(21)

they arevery computationally intensive,

model-based

diagnostic systems are not well-suited foron-lineapplications. Furthermore, technicaldifficulties still remain in building

suitablediagnostic

models

for analogcircuits (see, forexample, Bennetts [1981], Kritz and

Sugaya

[1986],

Tong

etal. [1989])

due

to bi-directional signal propagation,

and

the impact ofphysical layout

and

tolerances

on

the performanceof high speed radio-frequency

circuits. Dealing with bridge defectsand

improper

(open) connections alsoposes

problems

since the basiccircuitstructure itself

changes

(Priestand

Welham

[1990]).

On

theother

hand, therule-based approach can incorporate heuristic

knowledge

to deal with these complexities,

and

provide quickon-lineresponse during diagnosis; the infrastructure of

tools to

implement

a rule-basedsystem isalso quite well-established.

However,

traditional

expert systems

might

require long

development

times (Priest

and

Welham

[1990]), especially for initial

knowledge

acquisition (e.g.,

Newstead

and

Pettipher [1986]). Furthermore, since each

component

and connection

on

the board can cause malfunction, anticipating

and encoding

rules forevery possibledefectis almost impossible. For instance, the initial

knowledge

that

we

acquired for

one

board tobuild aprototype system covered less than

20%

ofall the defecttypes that

were

encountered during the firstfive

weeks

of

system

operation. Finally, a

system

that relies

on

software experts or

knowledge

engineers to updatethe

knowledge

base (forinstance,

when

the circuit designor manufacturing process changes) can introduceexpensive delays.

To

cope

withthe rapidpace of

change

inthe electronics industry,

we

requirea

diagnostic

system

thatcan both adapteasily tothe changing environment,

and

reduce diagnosis time in production

by

providingquick on-line response. Next,

we

describe a case-based

approach

thatis similarto rule-basedsystems in itson-line operation, butcan exploit thecapabilities of

model-based

systemstoadapt tochanging circuitdesigns

and

new

defects.

3.2 Elements

of

case-based

reasoning

The

goalofourproject

was

todevelop

and

testa diagnostic support systemthatis

viable inapracticalmanufacturing environment.

Two

importantobservations regarding practice motivated our

proposed

case-based approach:

1.

The

ubiquitous

80-20 law

applies also to electroniccircuitdiagnosis, i.e., less than

20%

ofallpossible defects

occur

in over

80%

ofthe defective boards. Figure 1,

which

shows

the cumulativerelative frequencies ofvarious defects (arrangedin decreasingorder of occurence)forour prototype system,clearly illustrates this

phenomenon.

Given

this

80-20

characteristic, a

system

thatcanoperate with incomplete

knowledge

, progressively

addingdefects asthey are encountered, is preferable toenumerating,apriori, all possible

defects. This strategyhas fourimportant advantages: (a) the developmentaleffort and timeislikely to besignificantly lower; (b)focusing

on

the relatively

few

defectsthat have

(22)

occured inthepastcan significantlyreduce theresponse time; (c) since it cantolerate

incomplete

knowledge and

can be updatedon-line, the system can easily adaptto

changes

inprocess

and

productdesign;

and

(d)

by

relying

on

human

expens

todiagnose

new

defects, the systemis non-threatening and hence likely to be well-accepted.

Our

experience (reportedin Section 5)indicates that the strategyofprogressivelyadding

new

defects as they are encountered provides over

90%

coverage within justa

few

weeks

of operation.

2.

A

simple

way

to "learn"

from

an expertanalyzer is by recordingthe actions

—

the tests

performed

and

theirrespective

outcomes

—

as the analyzerexplores

and

identifies a

new

defect.

The

observed

outcomes

forall the teststhat the analyzerapplied todiagnose the

boardcharacterize the

new

defect,

making

ita convenient

knowledge

representation for

system

maintainability,

and

as a

means

to

communicate

among

expens.

Case-based

reasoning

(CBR)

originated in the areaoflegal

argument

(hence, the term

"case").

An

early system,

HYPO,

operated in the

common

law

domain

oftrade-secret law;

the

system

combines

reasoning about the statutes (rules) with the precedents (cases) that

concern them.

The

problem

offindingthe case(s) appropriate to the currentsituation

amounts

toclever searching ofthe case base

on

a

number

ofsignificant attributes.

The

"closest" cases withrespectto these attributes are selected forperusal

by

the researcher. Legal

argument

continuesto

dominate

the

most

recentcase-based reasoning literature (see, for

example,

Rissland

and

Skalak [1988]). Applications of

CBR

tothe fault diagnosisare limited.

Lee and

Liu [1988] apply

CBR

to robotic

assembly

cell diagnosis,

and

Ishida

and

Tokumaru

[1990] present ageneralized approach tocase-based diagnosis using the frame theory.

For

electroniccircuitdiagnosis, the case-based approachoffers the attractive

featuresofcompatibility

and

incremental,

on-hne

knowledge

acquisition; the system can

tolerate incomplete

knowledge, and

its coverageincreases withtime.

We

enhance

the

conventional case-based approach

by

addingfeatures to incorporate:

(i) robust

knowledge

acquisition: the systemidentifies incomplete

and

inconsistent

knowledge

(e.g.,

due

to

measurement

errors ordesign changes),

and

incorporates

verification

and

updating steps tocorrect these errors;

(ii)distributed usage:

by implementing

the system inadistributed client-server

environment

with a shareddatabase, several userscan simultaneously access

and

use thesystem; and,

(iii)effectivetestsequencing: the system reduces the

number

oftests

by

applyingtest

selection heuristics that accountfor priorprobabilitiesofdifferent defects.

In theelectronic circuitdiagnosis context, acase is an instance ofadiagnostic

experience, represented by itstestoutcomes.

(A

rule,

by

contrast, is a generalization about diagnostic experience gathered overtime.)

Each

casecorresponds toa particular defect.

(23)

The

case

base

consistsofthecollection of instances correspondingto different defects observed inthe past. Recall thateachdefect haseithera deterministic or indeterminate

outcome

forevery test; funher, the valueofa deterministic

outcome

might

be

known

(0or

1)or

unknown.

For

agiven defect,

we

r. ^r toits setofexpected

outcomes

for all testsas the

signature

of that defect.

Suppose

we

have n

defect types

and

m

available tests.

We

visualize the case base as a n x

m

matrix, with each

row

corresponding to a previously

diagnosed

caseor defect.

The

i

row

in the case base represents the signature ofthe i defect type, i.e., thej

element

ofthis

row

has a 0, 1, 1, or

U

depending

on whether

the

expected

outcome

oftestj fordefect iis 0, 1, indeterminate, or

unknown.

We

permit the

userto

dynamically

add rows

(cases)

and

columns

(tests) tothe case base as

new

defects aredetected

and

diagnosed (incremental

knowledge

acquisition).

3.3 Diagnosing

a

defective

board:

Successive

refinement

through

case

matching

Diagnosing

adefective board correspondsto gathering

enough

information (by

performing

tests

and

observing their

outcomes)

about the board's signature, while

simultaneously searchingthe case base, to identify a

matching

case. If

no matching

case is

found, the boardcontains a

new

defector an inaccuratecase. In either situation, thecase base

must

beupdated toreflectthis latest experience.

We

remark

that the general case-based

approach goes beyond

case

matching

and database updatingfunctions topossibly includereasoning

and

extrapolation.

Our

implementation doesnot exploit these additional

capabilities. Figure 2

shows

the

complete

flowchart forour case-basedcircuitdiagnosis support

system

(CDSS).

We

firstexplian

how

the

system

operates beforedescribing

how

toupdate thecase base (Section3.4). Section

4 descnbes

enhancements

to

improve

the

system'stest sequencing capabilities,

and

Section 5 discusses

how

tointegratethis system witha

model-based

approach.

At

each stageofthe search process, thesystemmaintains: (i) a candidatedefectset

consistingofall

known

defects

whose

signatures

match

theobserved

outcomes

for all the tests

performed

thusfar,

and

(ii) theavailable testsetcontainingall tests that havenot yet

been

perfonned

but

whose

outcome

differentiates

one

or

more

candidatedefects

from

the

remaining

defects in thecandidate set. In the basic versionofthe case-basedsystem, the

analyzer

chooses

atest

from

the availabletest set, performsthis test,

and

observesthe

outcome

ofthetest. (In Section

4 we

describesystem

enhancements

tooptimally select the

nexttest).

When

theanalyzerenters the observed

outcome

into thesystem, thecandidate

defect set

and

available test set areupdated as follows:

•

_remove

_from

_thecandidatedefectsetalldefects

whose

expected

outcome

for thelatest testdiffers

from

the

observed outcome.

Note

that

we

retain all defects

whose

outcome

(24)

• _delete the testthat

was

just

performed from

the available test set; also,

remove

any test

whose outcome

is the

same

(or

unknown

or indeterminate) for all the remaining defects

in theupdatedcandidate defectset.

Ifthe initialcase base is

comprehensive

(i.e.,containsall possible defecttypes, and has

enough

information regarding expected

outcomes

touniquely identify eachdefect)

and

accurate, the process

must

terminate with the candidate defectsetcontainingonly the true defect.

To

illustrate this process, considerthe simple

5-component,

series

network

shown

in

Figure 3.

Suppose

we

have

a board

whose

abnormal

outputsignal for the functional test

indicatesmalfunction in

one

ofthe 5

components

(all the intermediateconnections

on

the

board are good); fori

=

1,2 5,defect i refers to amalfunction in

component

i.

We

have

four availableteststhatapply a stimulusattheinput terminal,

and

measure

thevoltage at

each ofthefour intermediateconnections (see Figure 3).

An

outcome

of0, corresponding to

abnormal

voltage, indicates malfunctionof

one

ofthe upstream

components.

Thus,

we

can representthe signature foreachdefect asa vectorcontaining

4

binary elements. Defect

1 has the signature

{0000},

defect 2 has signature {I

000),

and

so on. Table 1

shows

thecasebase containing the

complete

signatures forall 5 defects. This

example

does nothave

any

indeterminateoutcomes, and

we

initially

assume complete

information

(i.e.,

no

"unknown"

outcomes).

For

simplicity,

assume

thatthe technician always selectsthe testsinreverse sequence,

i.e., he firstapplies test 4 (i.e.,

measures

thevoltage

between

components

4 and

5), then

test 3,

and

so

on

until the defect isconclusively identified.

Suppose

a board contains defect

3 (i.e.,

component

3 isdefective). Table 2 tracesthe successive statesofthe system, i.e.,

the candidate defectset

and

theavailable test set at each stage; the

columns

ofthis table

answer

thedefectidentification

and

test selection questionsofSection 2.

At

the

end

ofthe third iteration, the candidatedefectsetcontainsa single defect (defect 3), proving conclusively that

component

3

on

the board isdefective. For thisexample, the

chosen

testsequence eliminates

from

the available testsetonly

one

test(the testthat

was

justapplied)at

each

step

However,

if

we

had

firstapplied test 3 insteadoftest 4. its "0"

outcome

(forthis

example)

automatically eliminates test4 since allremaining candidate

defects {1, 2, 3}

have

the

same outcome

"0" _for_{test 4.}

This

example

motivates the following three

imponant

observations:

1. Exploiting prior information:

Suppose

past historyor a priori information (e.g.,

knowledge

about

vendor

quality

problems) suggeststhatthe likelihoodofdefect 3 is significantly higher than theother

(25)

perform

these

two

tests first. For our

sample

board containingdefect 3, this strategy

would

require only

two

iterations tocomplete thediagnosis

(compared

to the 3 iterations

requiredfor the static reverse-order sequence).

For

complex

circuits, this savings in

number

oftestscan be significant. Section4discusses

how

to exploithistorical

and

other prior

knowledge

todynamically select an effective testsequence.

2.

Incomplete knowledg

e:

Our

example assumes

that

we know

the

complete

signature for each defect, i.e.,

we

know

theexpected

outcome

ofeach testforevery defect.

However,

the partial signature {0

U U

U}

issufficient to isolate defect 1 (the

element

"U"

denotes an

unknown

outcome)

if

we know

thatall other defects producea "1"

outcome

fortest 1; similarly,

(U

1

U}

isolates defect 3, i.e., tests 2

and

3 are the only essential tests to uniquely identify defect 3. Table 3

shows

an "incomplete" casebase containing adequate

knowledge

todistinguish

between

the 5defects.

Using

thiscase base

and

ouroriginal reversetest sequence, the setof candidatedefects for the first3 iterations

remain

the

same

as before

(shown

in Table 2).

However,

observing a "1"

outcome

fortest2 during the third iteration eliminates defect 2 butdoes not eliminate defect 1 (since the

outcome

oftest 2 fordefect 1 is

unknown).

Therefore, at the start ofthe fourth iteration, the candidate defectsetcontains

two

defects (defects 1

and

3),

and

we

have

one

available test(test 1) thatcan eliminate defect 1.

We

must

necessarily

perform

test 1 (in thiscase,

we

willobserve a "1"

outcome,

eliminating defect

1) in orderto conclusively provethat the board contains defect 3. Thus,

complete

knowledge

about theexpected

outcomes

foreachdefectcanaccelerate the pruning

process

and

reducethe

number

oftests.

Also

note that,

even though

we

staned with only the partial signatures,

we

can

improve

our

knowledge

afterthisdiagnostic exercise. In particular,

we

can

change

defect I's expected

outcome

for test2

from "U"

to"0" in the

case base, potentially

improving

future performance.

3.Multipledefects:

The

signatureofadefect represents the anticipatedtest

outcomes assuming

thatthe board containsonlythatdefect.

What

happens

ifa boardcontains multiple defects? Suppose,

inour

example,

theboard contained defects 3

and

1.

The

diagnosis, illustrated in Table

2, is stillcorrect, i.e.,

component

3 isdefective.

However,

after

we

verify defect 3

and

replace

component

3,

we

must

againapply the functional test to

confum

proper

functioning ofthe board. Ifthe board malfunctions,the diagnosis process

must

be repeated; in this example, the

second round

willdetectdefect 1.

Although

our

example

useda verysimplecircuit, it has servedtoillustrate several generic

concepts relating to effectivediagnosis. Realcircuits contain

many

complexities such as

(26)

measurements,

stimuli, and

component

settings,

and

so on.

Our

methodology

applies

even

tothese

complex

circuits.

3.4 Updating

the

case

base:

Incomplete

knowledge

and

inaccuracies

We

have

already seen

one

opportunity to update thecasebase

and

improve

subsequent performance: suppose

we

identify defect iat the end ofthe diagnosis process,

and suppose

the currentcase basecontainsonly the partial signature fordefect i. Then, forevery test

j

performed

during thecurrentdiagnosis

whose

outcome

fordefecti

was

previously

unknown,

we

canreplace the

unknown

entry in the case base with the actual observed

outcome

for thecurrent board (assuming testj is

known

to

have

a deterministic

outcome

for

defect i).

We

will refer to this updating step as signature

augmentation

.

Note

that this

updating step is not valid ifthe boardcontains other defects besides defect i; hence,

we

can

augment

defecti'ssignature onlyafterrepairing itand verifying the properfunctioning of

the board.

Our

discussion in Section 3.4

assumed

that the case base is

comprehensive

(i.e., it

covers allpossible defect types)

and

accurate. Ifthe case base isincomplete or inaccurate, the

matching and

successivedefect/testelimination process mightnot necessarilyterminate witha single (truej -efectin thecandidate defect set.

There

are

two

possibilities:

(i)

The

process terminates because thecandidate defectsetis

empty

,i.e., the

combination

of

observed

test

outcomes

does not

match

with the signature ofany

known

defect.

This situation couldariseeither because:

(a) theboardcontains a

new

defectthat had notoccuredpreviously;

(b) atleast

one

ofthedefect signaturesrecordedin the case base is inaccurate (due to

dataentryerrors,design changes, etc.); or,

(c) theobservation

and

measurement

ofthe acaialtestresults

was

erroneous.

In all three cases, the boardrequires "extraordinary" actionsto identify the true

symptoms

and

cause oftheproblem; these actions

might

entaildetailed probing

and

experimentation

by

theanalyzeras well as consultations withdesign

and

testengineers, production staff,

and

component

vendors. Ifthe board is

found

tocontain a

known

defect, itssignature in thecase base

must

be

checked and

corrected,if necessary.

We

refer to this ujxiating step as signature correction.

On

theother hand, if the board

contains a

new

defect,

we

must add

a

new

case (row) corresponding to that defect.

We

referto this step ascase addition.

The

new

case contains the expected test

outcomes

for that defect,

and

possibly

some

auxiliary information such as the

name

ofthe defect, theanalyzerorengineer

who

diagnosedthe defector revisedthe signature, the

corrective (repair) action required, thedefect category,

and

who

to notify(or

how

to

(27)

new

defect

from

the previously

known

defects (theobserved test

outcomes

that led to the

empty

defectset provide adequate

"proof

forthis defect), the analyzer

might

wish

to

add one

or

more

new

teststhatcan quickly isolatethis defect.

(ii)

The

diagnosisprocess terminatesbecause the available test setis

empty

, butthe

candidate defect setcontains

more

than

one

possibledefect. Thisoccurrence indicates

erroneoustest

measurements,

inaccurate defect signatures, orinadequate tests.

"Inadequatetests" refers tothe situation

when

the currenttests cannotdistinguish

between

the candidate defectseither because

many

test

outcomes

are

unknown

or

new

testsare required. Recall that

we

deletea test

from

the available testset

when

we

perform

that testorif itsexpected

outcome

isthe

same

(same

known

value, indeterminateor

unknown)

forall theremaining candidatedefects. Again, this

stoppingcondition (with multiple candidate defects

and no

remaining tests) requires extraordinary actions toverify the test

measurements,

check

thedefect signatures stored

inthecase base, complete the partialsignatures oftheremainingcandidatedefects, or

add

new

teststhatdistinguish

between

these defects.

We

refer to thisprocess as test

addition.

Figure 2 contains aflow chart

summarizing

the case

matching and

updatingprocess. This process includes a defectverification step atthe

end

ofa successful diagnosisexercise.

Verificationentailsconfirming previousobserved

outcomes and

applying any teststhat

remain

intheavailabletest set; if

one

or

more

outcomes

conflict with the hypothesized

defect, a

new

case

must

be added. Thisoptional step is especially important afterdesign

changes

and

during production

ramp-up

when

many

new

casesare

added

to thecase base.

During normal

operation, defectsare automatically verifiedif, afterrepairing the identified defect,the boardpassesthe functionaltest.

A

board failingthe testagain requiresexplicit

defectverification todetermine

whether

the originaldiagnosis

was

erroneousor iftheboard contains multipledefects.

In

summary,

thecase-based approach tosupportcircuit diagnosispermitsincremental, robust,

and

distributed

knowledge

acquisition.

Unhke

other

knowledge-

based approaches,

thecase-based

system

employs

on-line

and

incremental, ratherthan batch-oriented,

knowledge

acquisition. Allcases havethe

same

format; a case contains information

on

the

symptoms

thatthe useridentifies to be relevantin definingand distinguishing

between

defects.

Because

ofthe standardizedformat, the case baseis easy to understand

and

update.

The

system

begins with incomplete

knowledge and

poordefect coverage, buteach unsuccessful diagnosisinitiatesa process ofcase-building toensure subsequent coverage ofthe

same

defect.

Our

experience suggeststhat the casecoverage increasesvery rapidly during the first

few

weeks.

By

only maintaining the diagnostic

knowledge

that is needed,

(28)

baseareeasyto detect: they result in unusual termination ofthe diagnosis process(empty candidate defect setor

empty

availabletest set).

The

various correction

and

updating steps

—

signature augmentation, signature correction, case addition,

and

test addition

—

make

the

knowledge

acquisition processrobust

and

adaptive.

The

case-based

system

can be

implemented

using a sharedrelationaldatabase in a multi-user

computing

environment, thus providing

wide

access to analyzers atdifferent locations as well asdesign, testand

process engineers and production supervisors. This implementation alsoallows the case baseto be integrated with other factory information systems,

conveying

current vendor, design,

and

process

problems

to the diagnosisoperation,

and

feeding backdefect informationforquality

improvement.

The

system can alsokeep trackofthe cumulative

relative frequencies

and

trendsofdifferent defects;

we

next describe

how

touse these

relativefrequencies toguide

and

optimize the sequential testing process.

4. Optimal

Test Selection

By

automating thesearch

and

updating functions,the basic

CDSS

reducesthe time for

each iteration ofthediagnosis process. Decreasingthe

number

ofiterations provides another

means

to significantiyreduce the total diagnosis time per board.

As

our

example

of Section 3 illustrates,the

number

ofiterations

depends

on

thetest sequence.

We,

therefore,

require aneffective testselection policyorrule: ateach stage ofthe diagnosis process, the

method

helps us decide

which

available testtoapply next (based

on

theobserved

outcomes

thus far).

We

refer toa testselectionrule that

minimizes

the expected

number

oftests as

the optimaltestselection policy. This section presents a

dynamic

programming

formulation

for the optimaltest selection problems,

and

describes on-line heuristic rules that the basic

CDSS

can incorporate toreduce the expected

number

oftests.

These

rules rely

on

prior

informationconcerningthe likelihoodofdifferentdefecttypes,estimated

from

historical

data

on

defects (relative frequencies

and

trends),

knowledge

about specific

problems

atthe

vendor

site orin the

assembly

process, inputs

from

design engineers

on

new

engineering changes, etc.

4.1

Test

selection

example

We

firstconsidera simple

example

to betterunderstand the impact ofteststructureand

sequencing

on

the

number

oftests requiredto prove the defect.

Suppose

aboardcontains

one

of npotentialdefects,

and

eachdefect has a singleassociatedtest thatis both necessary

and

sufficient to "prove" that defect, i.e.,test i has a "1"

outcome

fordefect i,

and

a

"0"

outcome

forall other defects(orviceversa).

We

will refer to this type oftestas n

focused

test (test 1 in Figure 3 is a focused test). Thus, the

complete

case basefor n focused tests

corresponds toa n x n identity matrix.

Suppose

n

=

5,

and

the five possible defects have

the following probabilities of occurence: 0.1, 0.1, 0.2, 0.2,

and

0.4. Intuitively, applying

(29)

For

ourexample, the "decreasing probability sequence" correspondsto applying the tests in

reverse indexorder(i.e., 5-4-3-2-1 ) untilthe defect is identified.

The

expected

number

of tests, say Ej, using this strategy is:

Ej

=

1x0.4

+

2x0.2

+

3x0.2

+

4x0.1+5x0.1

=

2.3 tests per board.

On

theotherhand, fora static policy that appliestests in the forward sequence 1-2-3-4-5.

the expected

number

oftests

E2

is:

E2

=

1x0.1+2x0.1+3x0.2

+

4x0.2

+

5x0.4

=

3.7 tests per board.

The

optimal strategy saves,

on

average, 1.4 testsperboardrelative tothe

second

sequence, a

43%

savings.

(A

more skewed

probability distribution can give

even

greater savings.) Thus, optimaltest sequencing hasconsiderable potential toreduce diagnosiscost,

and

provide quickfeedback forquality

improvement.

Because

we

assumed

thateachdefecthas a focusedtest,our

example

hasa simple optimal testing policy thatis static (i.e.,

does

notvary withthe observed

outcomes)

and

easy to implement,i.e., sequence the tests indecreasing order ofdefect probabilities.

Since focusedtestsare infeasiblefor certaindefects (because ofconstraints

on

probing,

measurement,

etc.)

and

eliminateonlyone defectatatime (exceptatthe final step),

analyzersrely

on

a

combination

offocuseda.ndjointtests. Unlike a focused testthatis

both necessary

and

sufficient,joint testscannotisolatedefects individually but their

combined

outcomes

canjointly identify

and

prove thedefect (e.g., tests2 through 4in

Figure 3). Joint testsreduce the

number

ofrequired tests todiagnose all defects,

and

can,

under

certaincircumstances(depending

on

their structure and the relative likelihoodof variousdefects), reduce theexpected

number

ofiterations per board.

The

"test

configuration

problem"

of deciding

what

combination of focused

and

joint tests toinclude

intheavailable testsetposes challenging tradeoffs

between

the

number

ofavailable tests

and

theexpected

number

ofiterations per board;

we

do

notaddressthis tradeoffin this

paper,

assuming

instead thatthe available testsare predetermined.

When

the available test setcontainsjointtests,theoptimal test selection policy

might

be

dynamic,

i.e., the decision

on what

test toapply next

depends on

the current "state" of the system defined by the

previoustest

outcomes;

we

nextformulate this testselection problem.

4.2 Test

selection:

Problem

formulation

and

optimal

solution

The

testselection

problem

seeks an optimalornear-optimaltestingpolicythat

minimizes

theexpected

number

oftests oriterationstodiagnose amalfunctioning board.

We

distinguish

between dynamic and

static policies; a

dynamic

policy uses information regardingprevioustest

outcomes

to select thenexttest, while a staticpolicy selectsthe

same

sequence

regardlessofthe testoutcomes.

Dynamic

policies

perform

bener(i.e., they require

fewer

tests

on

average) since they use

more

information.

The

optimizationof

(30)

sequenrial search processes has been previously

modeled

(see, forexample,

Whinston and

Moore

[1986]),

and

applied, forinstance, to

computer

file search

(Moore,

Richmond,

and

Whinston

[1988]).

The

reliability literature has also addressed optimal testsequencing

issues forcertain specialcircuit

and

test structures (e.g., Nachlas,

Loney,

and

Binney

[1990], Butler

and Lieberman

[1984] studyseries systems with focused tests).

However,

unlikeour

dynamic knowledge

acquisition context, this stream ofliterature often

assumes

thatalldefect types

and

theirprobabilities are

known, and

the requisitetests are available,

withoutprovisionsfor

unknown

orindeterminate outcomes. Furthermore, the

knowledge-baseddiagnosis systems described in the literature

do

not appear toincorporate optimal search principles.

4.2.1

Dynamic

Programming

Formulation

This section presents a standard

dynamic

programming

formulation to clarifythe

ingredients

and

tradeoffsin test selection duringcircuitdiagnosis. First,

we

introduce

some

notation.

We

use the indices i

=

1,2,. ..,n

and

j

=

1,2,...,m torepresent the n defect types

and

m

available tests.

We

assume, without lossofgenerality, thatthe casebase contains all possible defecttypes. Otherwise,

we

can introducea

dummy

defect type called "unidentified"

(which

includes "no trouble found"),

and

a fictitiousfocused test with a"1"

outcome

iftheboard has an unidentified defect,and a "0"

outcome

forallother

known

defect types

(we perform

this fictitious testonly

when

thesetof

known

candidate defectsis

empty).

We

permitincomplete

knowledge

ofdefect signatures.

However,

the

known

partialsignature for

each

defect

must

be accurate,

and

we

must

have

enough

information

(i.e., theanalyzer

must

known

the expecteddeterministic

outcomes

for

enough

defect-test

combinations) to

unambiguously

identify the true defect.

We

also

assume

thatthe test

observations

and

measurements

are error-free.

These

assumptions ensure thatthe diagnosis processis accurate,

and always

terminates "properly" with the candidate defect

setcontainingonlythe true defect.

At

any

intermediate iteration ofthesequential search process, tiieobserved

outcomes

forthe tests

we

have performed

thus fardetermine the remaining candidatedefects (i.e.,

defects

whose

expected

outcomes

areconsistent with the observed

outcomes)

and

available

tests (thatcan distinguish

between

the remainingdefacts); hence, tiie previouslyobserved

outcomes

completely captureourcurrent

knowledge

about die board.

We

let

X

denote the state vectorrepresentingthiscurrent

knowledge.

We

specify the state

X

asthe following m-vector (where

m

isthe

number

ofinitially available tests): tiiej elementx.can take three values: X:

=

iftestj

was

previously

performed and

itsobserved

outcome was

"0",

Xj

=

1 iftestj's

outcome

was

"1",

and

X:

=

N

iftestj has not yet

been

applied.

The

initial

statevector

X^

haselementsx^'

=

N

for allj

=

l,2,...,m. Let

D(X)

and

T(X)

denote,

respectively,the indexsets of candidate defects

and

available testscorresponding to any

stateX. Initially,

D(X^)

=

( l,2,...,n) is the set ofall possible defects,