


Educational Training Manual No: 16

EDUCATION STAFF TRAINING DEVELOPMENT PROGRAMME

EVALUATION IN CURRICULUM DEVELOPMENT

UNITED NATIONS

Economic Commission for Africa September 1991


You should be able to:

1. Define and explain the purpose of evaluation
2. Describe the types and models of evaluation
3. Explain and describe the phases of evaluation
4. Describe the nature of the evaluation process
5. Explain what is meant by formative and summative evaluation
6. Discuss the role and functions of formative and summative evaluation
7. Write an evaluation proposal
8. (a) Conduct a simple evaluation study
   (b) Write an evaluation report
9. Critically analyze an evaluation report of a selected educational programme


EVALUATION IN CURRICULUM DEVELOPMENT

The study of educational evaluation, as Professor Kingsfield says, is "something new and unfamiliar to most of you, unlike any other schooling that you have ever known before". Therefore, in order to meaningfully learn about and perform components of the evaluation process, it is first necessary for you to develop a cognitive structure into which such experiences can be

integrated. The purpose of this chapter is to provide an

overview of the evaluation process and of the various types and phases of evaluation. The goal of this chapter is for you to acquire an understanding of the evaluation process and evaluation methodology which will facilitate acquisition of specific

evaluation knowledge and skills.

The Nature of Evaluation

Definition and purpose

Evaluation is the systematic process of collecting and analyzing data in order to make decisions. Although some definitions seem to equate measurement with evaluation, most recognize that measurement is but one essential component of

evaluation.

Answers to questions regarding degree of objective achievement and relative worth require the collection and analysis of data and interpretation of that data with

respect to one or more criteria. The purpose of evaluation is not to determine whether something is "good", or

worthwhile, as opposed to "bad" or worthless, per se. It is to determine the current status of the object of the

evaluation, to compare that status with a set of standards, or criteria, and to select an alternative from among two or

more to make a decision.

The purpose of evaluation is the same regardless of the type of evaluation or the nature of the objectives, whether instructional, curriculum, or project objectives, explicit or implicit, process-oriented or product-oriented, short-term or long-term. In all cases the evaluation process is basically the same and involves determination of the types of data which need to be collected, determination of the individual, group, or groups from whom data will be obtained, collection of the data, analysis of the data,

interpretation of the data, and decision making. The person or persons responsible for data collection may or may not be the persons who make the subsequent decisions.


Such decisions are often best supported by an independent evaluator: a person with evaluation skills, who is not associated with the programme being evaluated, and who collects and analyzes appropriate data or verifies existing data. The validity of decisions is a function of the validity of the data collection and analysis procedures.

Measurement

Basing decisions on valid procedures is critical because although all decisions are not equally important, each one has a consequence which directly or indirectly affects students. The data collected during the evaluation process are only as good as the measurement upon which they are based.

Measurement is the process of quantifying the degree to which someone or something possesses a given trait, i.e.

quality, characteristic, or feature. Measurement permits more objective description concerning traits and facilitates

comparisons. Expressing a trait numerically considerably reduces ambiguity and provides much more useful information. Although, theoretically, all traits of interest can be measured,

educational measurement is complicated by the fact that almost all measurement is indirect; there are no yardsticks for

measuring traits such as intelligence.

The term measurement is not synonymous with the administration of a test; data may also be collected via

processes such as observation or may already be available and retrievable from records. A test is a means of measuring knowledge, skills, feelings, intelligence, or aptitude of an individual or group. The very best instruments available should be used and results should be interpreted with appropriate

caution.

Good evaluation usually involves decisions based on data obtained from a number of sources, not just on the results of a single test. It is clearly better to base judgements and decisions on the best data available rather than on subjective impressions. The whole notion of accountability requires the availability and use of valid indices of performance.

Evaluation and Accountability

To varying degrees, any person or programme which receives funds from one or more sources is accountable to each source;

this means that each is responsible for providing evidence

related to accomplishments and successes.

What a person or programme is accountable for depends upon agreed-upon function and objectives. The manner in which the accountability principle is implemented depends upon who is


demanding accountability and from whom. The increased interest in, and demand for, accountability is not difficult to understand given the rising costs of the education enterprise and the apparently declining test scores of high school students.

Even persons who express negative attitudes about

objectives, evaluation, and accountability engage in evaluation all the time and demand accountability in their everyday

activities with respect to purchased goods and services. It is the nature of accountability, e.g. in business, that it leads to increased efforts to produce quality products and efficient procedures; likewise, increased demands for accountability in education should result in increased efforts to improve the educational experiences of all children.

Everyone can benefit from feedback; the evaluation process can provide the feedback which will identify those things that we are doing right and those things that need improvement, and help us to make decisions about our objectives, strategies, and

activities. Evaluation and accountability are worthwhile

concepts only if related procedures are competently conducted and results are interpreted carefully.

It is critical to the whole process that people be evaluated on, and be held accountable for, only those things over which they have control and only to the degree that they do. A teacher is only partially responsible for the achievement of students.

Types of Evaluation

Types of evaluation versus models

When we speak of "types" of evaluation, we are

referring to the different processes, products, and persons subject to evaluation - students, curricula, schools, school systems, large populations, special programmes and projects, and personnel. The basic evaluation process is the same regardless of what is being evaluated; what differs is what is being evaluated, how the evaluation process is applied, and the types of decisions made.

All valid systems and models for evaluation involve the same essential components, namely: specification of goals and objectives; selection and/or development of measurement tools; delineation of strategies for objective attainment;

process and product procedures; and analysis and

interpretation of results. There is often a fine line

between research and evaluation, and an evaluation may very easily utilize a research design; both research and

evaluation involve decision making and both are based on the scientific method.


The approach one adopts in collecting evaluation data depends upon the type of questions one wishes to answer.

Although it is usual to select a model which suits the evaluation objectives, an evaluator can develop his own model specifically geared to the questions and problems he is dealing with. We shall nevertheless describe some of the available models hoping that the reader will be able to acquire some guidelines to help him in designing his own models if he ever has to.

Discrepancy Evaluation Model

The term Discrepancy Evaluation was used by Malcolm Provus to refer to his evaluation model. Provus states that "Evaluation is primarily a comparison of programme performance with expected or designed programme, and secondarily, among many other things, comparison of client performance with expected client outcomes."

Evaluation, in other words, involves comparing the programme as designed with the programme as it operates in real-life situations.

It further involves a comparison between intended or planned outcomes with actual measures of student performance. The

discrepancies noted in such an evaluation serve as feedback for improving the education programme. In general, the Discrepancy Evaluation model involves the following:

Design of the programme/project
Installation of the programme
Evaluation of interim products
Evaluation of products
Cost-benefit analysis of the programme
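At its core, the model rests on comparing planned standards with observed performance. A minimal Python sketch of that comparison, using hypothetical standards and measures:

    # Hypothetical planned standards and observed measures for a programme.
    planned = {
        "reading_mastery_rate": 0.80,   # expected proportion of pupils reaching mastery
        "teacher_training_hours": 40,   # planned in-service training per teacher
    }
    observed = {
        "reading_mastery_rate": 0.62,
        "teacher_training_hours": 40,
    }

    def discrepancies(planned, observed):
        """Gap between each planned standard and what was actually observed."""
        return {key: observed[key] - planned[key] for key in planned}

    for aspect, gap in discrepancies(planned, observed).items():
        status = "meets standard" if gap >= 0 else "falls short"
        print(f"{aspect}: discrepancy {gap:+.2f} ({status})")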

Individually Prescribed Instruction (IPI)

The Individually Prescribed Instruction model was developed by the Learning Research and Development Centre of the University of Pittsburgh. The model was actually developed for student instruction but has features that make it adaptable for evaluating educational programmes in general. The IPI programme attempts to suit education and instruction to the particular abilities of the child. The model has the following components: goals, plan, operation, and assessment.


Goals:

Instruction in a subject area is divided into the above four components. For instance, instruction in English has a set of goals. These goals refer to the objectives the English programme sets out to achieve. The objectives are then arranged in the following manner:

They are sequenced in a logical manner. They are arranged in a hierarchical or pre-requisite order as follows:

Read words
Read sentences
Comprehend sentences
Interpret sentences

A student who first enters the programme is given a

placement test which helps to locate the appropriate level from which he has to start his learning.
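A minimal Python sketch of such a placement decision, assuming the four-objective hierarchy above and a hypothetical 80 per cent mastery threshold:

    # Objectives in pre-requisite order, lowest first.
    HIERARCHY = ["read words", "read sentences", "comprehend sentences", "interpret sentences"]
    MASTERY = 0.80   # assumed cut-off for "already mastered"

    def placement(test_scores):
        """Return the first objective the student has not yet mastered."""
        for objective in HIERARCHY:
            if test_scores.get(objective, 0.0) < MASTERY:
                return objective
        return None   # every objective already mastered

    # Hypothetical placement-test result for one child.
    print(placement({"read words": 0.95, "read sentences": 0.70}))   # -> read sentences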

Plan:

This sector refers to the procedures and materials the child will have to use to achieve the goals of the programme.

Operation:

When a plan has been put into operation, i.e. when a child starts using a plan for achieving certain instructional goals, there is a need to monitor the plan to see how well it is working. This sector refers to the system for evaluating the plan.

Assessment:

The last component of the IPI model refers to the evaluation of the achievement of the student. Data obtained at this level is used for making improvements in the student's learning.

In the IPI programme, the child works on an individually prescribed assignment at his pace and according to his own abilities. Concepts like standard one, standard two, or standard three, standard four, are irrelevant in the IPI

system.

The course of instruction in a subject area has been well planned out and a child starts at the point where his


ability is suited and works through the programme on his

own.

The Stake Model

Stake views evaluation as a system of describing an educational programme as well as judging the worth of the programme in relation to some external criteria. In the process of evaluation, the evaluator needs three sources of data, relating to Antecedent, Transaction, and Outcome behaviours.

Figure 1

STAKE: CONGRUENCE-CONTINGENCY MODEL

[Matrix figure: a Rationale feeds rows for Antecedents, Transactions, and Outcomes; the columns Intents and Observations form the description matrix, and the columns Standards and Judgements form the judgement matrix.]

Antecedent data refer to the entry behaviours of the student before the student goes through the curriculum. These behaviours include aptitude, attitude, interest, previous experiences, etc.

Transactions refer to the educational processes that occur in the classroom. These may involve teacher-student


relationships, discussions, etc. Transactions are hence seen as dynamic processes.

Outcomes refer to those abilities, attitudes, and achievements that result from the educational experience, and should also include application and transfer of knowledge over time.

Secondly, outcomes refer to the effect of education on teachers, administrators, and counsellors and, at a third level, also refer to data on wear and tear of equipment, effects on the learning environment, and the costs of the programme.

In an evaluation study the evaluator has two responsibilities: first, he describes the Antecedents, Transactions, and Outcomes of the programme; secondly, he must judge the appropriateness or merit of these three categories. The Stake Model is shown in Figure 1. A description of the various aspects of the model is as follows:

Antecedents:

Conditions existing prior to the teaching-learning situation as they relate to outcomes. These conditions are known as Entry Behaviours and involve:

Aptitude
Previous experience
Interest
Willingness to learn, etc.

Antecedent behaviours are entry behaviours and are relatively static.

Transactions:

These are the encounters between teachers and students during the process of education. These may take various forms, seminars, class discussions, etc.

Transactions are normally dynamic processes.


Outcomes

Evaluation of outcomes takes place at three levels: primary level, secondary level, and tertiary level. The components involved are as follows:

1. Achievement, abilities and attitudes.
2. Impact on Teachers, Administrators and Counsellors.
3. Wear and tear on equipment.
4. Effects on the learning environment and on the community.
5. Costs incurred.
6. Intended and unintended effects.
7. Immediate and long-range effects.

Data should be entered in the twelve cells for the

appropriate programme aspects under consideration bearing in

mind that:

Intents refer to the intended student outcomes, i.e. the goals and objectives.

Observations refer to test results, direct

observations, interviews, checklists, etc, which provide descriptive data about the student.

Standards are prescribed by educational experts, i.e.

teachers, administrators, students, parents, spokesmen for the society, etc. Standards however depend upon particular situations. They differ from school to school, from instruction to instruction, from student to student, and hence they are not absolute.

Judgements are the values of how people feel about the programme being judged. Judgements may be in terms of good, fair, poor, etc. A detailed version of the Stake Model is shown in Figure 3.

The evaluator, according to Stake, may collect descriptive or judgement data. In collecting descriptive data, he has to collect data on what was intended and on what he actually observes over the three categories. The evaluator then has to relate the intended and observed descriptive data in terms of congruence. When the evaluator has to judge the worth of a programme, he first has to collect criterion data on

prescribed "standards," relating to antecedent, transaction and outcome behaviours. With these standards the evaluator


can then make judgements as to the worth or merit of the data and of the programme.

The CIPP Model

This model was developed by Daniel Stufflebeam and others. The details of the process involved are shown in Figure 2.

Figure 2: The CIPP Model of Evaluation

[Table figure: for each of Context Evaluation, Input Evaluation, Process Evaluation, and Product Evaluation, the objective, the approach, and the relation to decision making in the change process are set out.]

As shown in Figure 2, the CIPP Model involves four types of evaluation, namely Context Evaluation, Input Evaluation, Process Evaluation, and Product Evaluation.


Figure 3

CONGRUENCE-CONTINGENCY MODEL: DATA FOR EVALUATION OF AN EDUCATIONAL PROGRAMME

[Matrix figure relating the programme rationale and three data categories to four data sources: Intents (A), Observations (B), Standards (C), and Judgements (D).

Antecedents: student characteristics, teacher characteristics, curricula content, curricula context, instructional materials.
Transactions: communication flow, time allocation, sequence of events, reinforcement schedule, social climate.
Outcomes: student achievements, student attitudes, student motor skills, effects on teachers, instructional effects.]

Example A: Manufacturer's specification of an instructional material kit
Example B: Teacher description of student understanding
Example C: Expert opinion on cognitive skills needed for a class of problems
Example D: Administrator judgement of feasibility of a field trip arrangement

Source: Reproduced from J.G. Saylor and W.M. Alexander, Planning Curriculum for Schools, Holt, Rinehart and Winston, New York, 1973, p. 360.


Context evaluation involves the identification of needs, statement of programme objectives, and the development or selection of criterion measures through interviews, expert opinion, research, and surveys.

Input evaluation focuses on an examination of various input strategies, evaluating their strengths and weaknesses and then selecting the best strategy for achieving the project goals.

Process evaluation is geared to monitoring the change process to detect defects and thereby institute corrective measures for the success of the programme. Product evaluation involves an assessment of project outcomes as they relate to

project objectives, context, input, and processes of the project.

The CIPP Model is closely related to the Stake Model and the reader is urged to compare and relate the two.

Student evaluation

Achievement is but one of many variables on which a student is assessed; other major variables include aptitude, intelligence, personality, attitudes, and interests.

In order to assess achievement, tests, both standardized and teacher-made, are administered; projects, procedures, and oral presentations are rated; and formal and informal observations are made. A teacher uses performance data not only to evaluate

student progress but also to evaluate his or her own instruction.

Feedback on current student progress also gives direction to current and future instructional activities. Teachers make both short-term and cumulative decisions concerning students, based on a variety of academic and personal-social data.

There are numerous other decisions that are made concerning students in which the teacher is either not involved or only makes recommendations. Of all the types of evaluation, student evaluation is probably the most critical (with the possible exception of evaluation of personnel) because decisions made affect individuals as individuals, as opposed to, for example, curricular decisions, which affect groups.

Curriculum evaluation

Curriculum evaluation involves the evaluation of any

instructional programme or instructional materials, and includes evaluation of such factors as instructional strategies,

textbooks, audiovisual materials, and physical and organizational arrangements. Curriculum evaluation may involve evaluation of a total package or evaluation of one small aspect of a total

curriculum such as a film.


Although ongoing programmes are subject to evaluation, a curriculum evaluation is usually associated with innovation, a new or different approach; the approach may be general, i.e., applicable to many curriculum areas, or specific to a given subject area. Curriculum evaluation usually involves both internal and external criteria and comparisons. Internal evaluation is concerned with whether it does what it purports to do, as well as with evaluation of the objectives themselves.

External evaluation is concerned with whether the process or product in question does whatever it does better than some other process or product. Student evaluation is almost always a part of curriculum evaluation, but we are concerned with group, not individual, performance. Some persons advocate what is called

"goal-free" evaluation, and suggest that attention should be focused on actual outcomes, not on pre-specified outcomes.

In addition to student achievement, there are a number of other factors which are generally recommended for inclusion in curriculum evaluation; attitudes are one such factor. Research has demonstrated a relationship between teacher attitude towards a curriculum and its ultimate effectiveness. Determining the reasons for any teacher dissatisfaction may suggest remedies which, when implemented, will bring about a change of teacher attitude and subsequently increase the effectiveness of the curriculum. For similar reasons, attitudes of other involved groups, such as students, administrators, and parents, are also important. Other factors being equal, a new programme is usually considered to be cost-effective if one of the following is true (see the sketch after this list):

1. it costs essentially the same as other programmes but results in greater achievement (if achievement is the same, why bother?!);

2. it costs less and results in equal or greater achievement (unlikely);

3. it costs more but results in significantly greater achievement (assuming it is affordable to the group investigating effectiveness).
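The three conditions can be read as a simple decision rule. A minimal Python sketch, with hypothetical cost and achievement figures (the judgement of what counts as "significantly greater" achievement is left to the evaluator):

    def is_cost_effective(new_cost, old_cost, new_achievement, old_achievement,
                          affordable=True, same_cost_tolerance=0.02):
        """Apply the three conditions listed above to one new programme."""
        roughly_same_cost = abs(new_cost - old_cost) <= same_cost_tolerance * old_cost
        if roughly_same_cost and new_achievement > old_achievement:
            return True    # condition 1: same cost, greater achievement
        if new_cost < old_cost and new_achievement >= old_achievement:
            return True    # condition 2: lower cost, equal or greater achievement
        if new_cost > old_cost and new_achievement > old_achievement and affordable:
            return True    # condition 3: higher cost, (significantly) greater achievement
        return False

    # Hypothetical comparison of a new curriculum against the existing one.
    print(is_cost_effective(new_cost=105_000, old_cost=100_000,
                            new_achievement=78, old_achievement=70))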

Curriculum evaluation has one major well-known problem associated with it: it is very difficult to compare fairly the effectiveness of one programme or approach with another. Even if two programmes deal with the same subject area, they may deal with objectives which are very different, and it is very difficult to find a test or other measure which is equally fair,

or valid for both programmes.

If one curriculum is to be compared to another, the

objectives of each may be examined carefully; if no measure can


be located which is equally appropriate to both, then one must be developed that is.

Matrix sampling basically involves dividing a test into a number of sub-tests, using random sampling techniques on the

items, and administering each sub-test to a subgroup of students, also randomly selected from a total group. Matrix sampling

considerably reduces both testing time and cost.
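A minimal Python sketch of the procedure, using hypothetical item and student identifiers:

    import random

    def matrix_sample(items, students, n_subtests, seed=0):
        """Randomly deal items into sub-tests and students into matching subgroups."""
        rng = random.Random(seed)
        items, students = items[:], students[:]
        rng.shuffle(items)
        rng.shuffle(students)
        subtests  = [items[i::n_subtests] for i in range(n_subtests)]
        subgroups = [students[i::n_subtests] for i in range(n_subtests)]
        return list(zip(subtests, subgroups))

    items = [f"item_{i}" for i in range(1, 41)]          # a 40-item test
    students = [f"student_{i}" for i in range(1, 121)]   # 120 students in the total group
    for subtest, subgroup in matrix_sample(items, students, n_subtests=4):
        print(f"{len(subtest)} items administered to {len(subgroup)} students")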

School evaluation

Evaluation of a school involves evaluation of the total educational programme of the school and entails the collection of data on all aspects of its functioning. The purpose of school evaluation is to determine the degree to which school objectives are being met and to identify areas of strength and weakness in the total programme. Information from school evaluation provides feedback which gives direction to the future activities of the school and results in decisions concerning the allocation of

school resources.

One major component of school evaluation is the school

testing programme; the more comprehensive the testing programme, the more valuable are the resulting data. A school testing

programme should include measurement of achievement, aptitude, personality, and interest. Tests selected for a school evaluation must match the objectives of the schools and be

appropriate for the students to be tested. The question of the appropriateness of norms is a critical issue; when available, system norms are more useful as a basis of comparison than

national norms.

A school evaluation involves more than the administration of tests to students; it may require any combination of

questionnaires, interviews and observations, with data being collected from all persons in the school community, including administrators, teachers and counsellors. School evaluation is a total effort and all relevant groups must be involved in all

phases - planning, execution, interpretation of results, and subsequent decision making.

Evaluation of large Populations

Evaluation of large populations involves assessing the current status and the educational progress of large numbers of students, typically distributed over a large geographic region.

State wide assessment programmes are generally based on the premise that the state system of education is responsible for student achievement of certain basic skills required for

effective functioning in society, and that programmes designed to promote achievement of the basic skills should be as effective and economical as possible.


One of the major purposes of state assessment is to provide information to the state and local decision makers about the adequacy of the basic educational programme. State wide

assessment typically involves measurement of minimum educational objectives and selected optional objectives using both criterion referenced and norm-referenced tests.

Evaluation of special projects and programmes

Special projects and programmes include all those organized efforts which are not, strictly speaking, part of the regular school programme; they are typically innovative in nature and the duration of their existence is dependent upon their success.

Whether it is required or not, conducting an evaluation is in the best interest of a project since it is the only valid way to verify its effectiveness. One problem unique to project

evaluation is that by its very nature it is likely to be concerned with objectives for which there are no corresponding standardized instruments; all needed instruments must be

developed, validated, field-tested and revised until acceptability criteria are met.

Evaluation of personnel

Evaluation of personnel (staff evaluation) includes evaluation of all persons responsible, either directly or indirectly, for educational outcomes, i.e., teachers, administrators, counsellors, and so forth.

One of the major reasons that this area of evaluation has been so slow to progress is that it is so complicated; it is difficult to determine what behaviours are to be evaluated.

Although the degree of responsibility is debatable, teachers are in some way accountable for the achievement of their students.

The solution to the personnel evaluation problem, at least for the present, involves collecting the best data possible, from as many sources as possible.

Phases of Evaluation

The continuity of evaluation

Evaluation is a continuous process; contrary to public opinion, it is not what you do "at the end". Evaluation should be planned for prior to execution of any effort and should be involved throughout its duration - at the beginning, in the middle, and at the end, if there is an end; there are typically a series of temporary "ends" in a continuous cycle. Every stage of every process is subject


to evaluation, beginning with the objectives. Evaluation must be planned for before execution of any effort.

When evaluation is put off until the end, it is very difficult to provide valid evidence concerning effectiveness; also, evaluation feedback, which would have come from ongoing evaluation and might well have resulted in more positive results, is missed.

Some basic problems associated with "at the end" evaluation include the following:

(a) objectives are frequently worthy ones but unmeasurable

as stated;

(b) data are likely to be insufficient and inappropriate;

(c) there may have been too few participants or data may have been collected on too few of them; what data were collected may not be the data needed for determining achievement of the objectives; and it is often too late to develop or obtain appropriate measures or to conduct reasonable observations. Careful planning for

evaluation does not guarantee that you will have no problems but it certainly does increase your chances.

The evaluation process entails decision making (decision about objectives, strategies, measurements, and so on); these various decisions can be classified in terms of when they are made, that is, during what stage of the effort of interest.

Each of the three phases of evaluation involves different kinds of decisions; the planning phase deals with "What will we do?"; the process phase asks "How are we doing?"; and the

product phase is concerned with "How did we do, at least so

far?".

The planning phase

The planning phase of evaluation takes place prior to actual implementation and involves making decisions about what course of action will be taken towards what end.

Situation analysis

The first step in the planning phase is to analyze the present situation in order to establish the parameters of

the effort.

Situation analysis includes activities such as the collection of background information and assessment of existing constraints.


After the parameters of the situation have been established, more realistic goals and objectives can be formulated.

Specification of objectives

Goals are general statements of purpose, or desired

outcomes, and are not as such directly measurable; each goal must be translated into one or more specific objectives. Objectives are specific statements of what is to be accomplished and how well, and are expressed in terms

of quantifiable, measurable outcomes.

Process objectives describe outcomes desired during the execution of the effort; in other words, they relate to development and execution. Product objectives describe outcomes intended as a result of the effort.

Process objectives do not typically involve student behaviour; they deal with those strategies and activities which are intended to result in achievement of the product

objectives.

Objectives give direction to all subsequent activities, and achievement of objectives is ultimately measured. It is critical that objectives themselves be evaluated in terms of relevance and measurability, substance, and technical accuracy.

Specification of prerequisites

Specification of a given set of instructional objectives is based on the assumption that students have already acquired certain skills and knowledge; if the assumption is

incorrect, then the objectives are not appropriate. These assumed behaviours are referred to as prerequisites, or

entry behaviours.

Systematic instruction and evaluation require that prerequisites be specified and measured (with a test of entry behaviour), especially at the beginning of a school year or a new unit of study. To arrive at prerequisites, you simply ask yourself: What must my students know or be able to do prior to instruction in order that they may benefit from instruction and achieve my objectives?

Selection and development of measuring instruments

More often than not, collection of data to determine the degree of achievement of objectives requires administration of one or more instruments, but not always; certain data, such as attendance figures, will normally be a matter of

record.


Selection of an instrument is not simply a matter of

locating a test which appears to measure what you want to measure; it involves examination of those that are available and selection of the best one available, best being defined in terms of being most appropriate for your objectives and

your user group.

If an acceptable test cannot be located, then one must be developed; training at least equivalent to a course in measurement is necessary in order to acquire the skills needed for good instrument development.

At a very minimum, post-tests of desired behaviour are required in order to measure achievement of product objectives. It is also almost always a good idea to administer a pretest of the same material prior to

instruction or implementation; assessment of entry behaviour

is also recommended.

While all needed instruments are identified during the

planning phase, they are not necessarily all available prior to the process phase, especially if one or more must be

developed.

Delineation of strategies

Strategies are general approaches to promoting achievement of one or more objectives; we may speak of instructional strategies, curriculum strategies, programme strategies, and the like.

Each strategy generally entails a number of specific

activities, and there are typically a number of strategies to choose from. Execution of these strategies must be planned for, in order to ensure the availability of

necessary resources.

Task analysis is a process for determining the order (sequence) in which objectives should be taught, which also facilitates the translation of goals into objectives. Task analysis involves ordering a set of knowledge and skills into a hierarchy in terms of which ones are subordinate, or prerequisite, to which others.

To perform a task analysis, you start with a terminal, or cumulative, objective at the top of the hierarchy; you then move down the hierarchy by determining what capabilities students need in order to achieve that objective following instruction; this process is repeated for each identified objective until a point is reached at which the necessary prerequisite capabilities are considered to be entry behaviours.
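Because a task analysis is essentially a prerequisite hierarchy, the teaching order can be derived mechanically from it. A minimal Python sketch, using hypothetical objectives and prerequisite links:

    from graphlib import TopologicalSorter   # standard library, Python 3.9+

    # Each objective is mapped to the objectives that must be mastered first.
    prerequisites = {
        "interpret sentences":  {"comprehend sentences"},
        "comprehend sentences": {"read sentences"},
        "read sentences":       {"read words"},
        "read words":           set(),   # treated as an entry behaviour
    }

    teaching_order = list(TopologicalSorter(prerequisites).static_order())
    print(teaching_order)
    # ['read words', 'read sentences', 'comprehend sentences', 'interpret sentences']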


There is considerable research evidence which indicates that systematic review of concepts promotes retention; provision of practice and feedback are also variables which have been shown to promote learning.

There are a number of instructional strategies, such as sequencing, review, feedback and practice, which are generalizable across subject areas and objectives.

Other instructional strategies, such as grouping, role playing, and use of a given medium, will be appropriate

only for certain objectives.

Selection of design

Use of a research design is not always appropriate,

necessary, or feasible; when called for, however, the very best one that is feasible should be selected.

Experimental design refers to the basic structure of a process; its purpose is to assist in making valid decisions concerning the effectiveness of an approach (or product) and

usually involves comparisons between or among groups using different approaches.

What differentiates a "good" design from a "poor" design is the degree to which the design assures initial group

equivalency and controls for unwanted factors. The best single way to ensure initial group equality is to take one large group (randomly selected if possible) and randomly divide them in half, each half receiving a different

treatment, e.g., curriculum. If random assignment is not feasible (and sometimes even when it is), at least pretest data should be collected; such data allow us to see how

similar the groups actually are.
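A minimal Python sketch of this approach, forming two randomly assigned halves of one large group and comparing their pretest means (all student identifiers and scores are hypothetical):

    import random
    import statistics

    def random_halves(students, seed=1):
        """Shuffle one large group and split it into two treatment groups."""
        pool = list(students)
        random.Random(seed).shuffle(pool)
        mid = len(pool) // 2
        return pool[:mid], pool[mid:]

    # Hypothetical pretest scores for 60 students.
    pretest = {f"s{i}": random.Random(i).randint(40, 90) for i in range(60)}
    group_a, group_b = random_halves(pretest)

    print("Group A pretest mean:", statistics.mean(pretest[s] for s in group_a))
    print("Group B pretest mean:", statistics.mean(pretest[s] for s in group_b))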

In many evaluation studies, randomization is not feasible because they are conducted in real-world settings such as schools, where students are already in classes. When existing groups are compared, a quasi-experimental design is required; such designs are not as good as true experimental designs (which require randomly formed groups) but they do a reasonable job of controlling extraneous factors (and

remember, we always do the best we can).

You select the best design you possibly can which will yield the data you need, given the constraints under which you are operating. Student evaluation, per se, does not require application of an experimental design. Without carefully controlled comparisons, it is almost impossible to make

valid decisions concerning the relative effectiveness of two

or more alternatives.


Preparation of a time schedule

Preparation of a realistic time schedule is important for all types of evaluation; rarely do we have as long as we please to conduct an evaluation. Basically, a time schedule includes a listing of the major activities of the proposed evaluation effort and corresponding expected initiation and completion times for each activity.

You should allow yourself enough time so that if an unforeseen minor delay occurs, you can still meet your final deadline. The Gantt chart method is a very useful approach for constructing a time schedule. A Gantt chart lists the activities to be completed down the left-hand side of the page and the time to be covered by the entire project across the top of the page; a bar graph format is used to indicate the beginning and ending dates for each activity.
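A minimal Python sketch of a text-only Gantt chart of this kind, with hypothetical activities and week numbers:

    # (activity, start week, end week) - all figures hypothetical
    activities = [
        ("Situation analysis",   1,  3),
        ("Specify objectives",   3,  5),
        ("Develop instruments",  4, 10),
        ("Collect data",        10, 16),
        ("Analyze and report",  16, 20),
    ]

    total_weeks = max(end for _, _, end in activities)
    for name, start, end in activities:
        bar = " " * (start - 1) + "#" * (end - start + 1)   # one '#' per week of activity
        print(f"{name:<20} |{bar:<{total_weeks}}|")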

The process phase

The process phase involves making decisions based upon events which occur during actual implementation of the

planned instruction, programme, or project. The first step in the process phase is to administer pretests, if such are appropriate, and in the case of pupil evaluation, tests of entry behaviour. Based on the pretest results, decisions may be made concerning the appropriateness of the already specified objectives.

Following initial testing, planned strategies and activities are executed in the predetermined sequence. The basic purposes of this phase are to determine the degree of

achievement of process objectives, and to identify ways in which improvements can be made.

If several strategies are being used simultaneously, then at various points in time decisions will be made as to which ones are working and which are not. Very few efforts work out exactly as planned; there is nothing wrong with making changes in midstream if the end result will be improved because of them.

The process phase of evaluation is often referred to as

formative evaluation and is typically defined in discrepancy

terms, that is, differences between intended and actual

outcomes.

The product phase

The product phase involves making decisions at the end or, more likely, at the end of one cycle of instruction (e.g., a unit), a programme, or project. Decisions made during the


product phase are based on the results of post-tests (of

achievement, attitude, behaviour, and the like) and on other

cumulative types of data.

The major purpose of the product phase is to collect data in order to make decisions concerning the overall effectiveness of instruction, a programme, or project (or whatever). During this phase it is determined whether and/or to what degree intended product objectives were achieved; unanticipated

outcomes are also analyzed.

If an experimental or quasi-experimental design involving group comparisons was applied to the evaluation, final group performance is now compared to determine whether differences are significant, i.e., whether they are probably due to chance factors or probably due to differential treatments.
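A minimal Python sketch of such a comparison, using an independent-samples t test on hypothetical post-test scores (scipy is assumed to be available):

    from scipy import stats

    # Hypothetical post-test scores for two randomly formed groups.
    new_curriculum = [72, 75, 80, 68, 77, 83, 74, 79]
    old_curriculum = [65, 70, 66, 72, 61, 69, 73, 64]

    t_statistic, p_value = stats.ttest_ind(new_curriculum, old_curriculum)
    print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")
    if p_value < 0.05:
        print("Difference is unlikely to be due to chance alone.")
    else:
        print("Difference could plausibly be due to chance.")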

An independent evaluator (or evaluation team) either

verifies procedures and results as reported by the programme personnel or actually collects and analyzes appropriate

data. The major function of an independent evaluator is not to "check up" on anyone but rather to provide skills and knowledge related to evaluation which are often not

possessed by programme or project personnel.

Data analysis and interpretation is almost always followed by the preparation of a report which describes the

objectives, procedures and outcomes of the effort. Whether a report goes to parents, the school board, or an outside funding agency, it should be written understandably - it should communicate as clearly and concisely as possible.

The results of the product phase of evaluation are used in

at least three major ways:

1. They provide feedback and direction to all who were involved in the effort, e.g., students, teachers, programme directors, and thus, each cycle (of

instruction, for example) can benefit from analysis of the outcomes of the previous cycle;

2. They provide feedback to outside decision makers, such as parents, principals, school board members, and

funding sources; and,

3. Depending upon the type of evaluation involved, there are a number of groups who can utilize the results, i.e., guidance and counselling personnel, school administrators, educational researchers, and other

evaluators.


Results of the product phase need to be interpreted with care. Failure to meet objectives, for example, is not necessarily fatal; degree of achievement needs to be

considered. Just as the process phase is often referred to as formative evaluation, so also the product phase is frequently referred to as summative evaluation.

The Nature of the Evaluation Process

In general, the process of evaluation consists of

determining the degree and character of the value of something.

In education, evaluation refers to the process of determining the degree to which the objectives of an educational activity or

enterprise have been achieved. The activity may be a large one, in terms of its duration and scope, such as an entire course or even a curriculum for producing persons with certain kinds of professional or general skills, or the activity may be relatively restricted, such as the work of a single instructor with his group of adults over a period of a week or an hour. In either case, the evaluation process determines the degree to which the 'objectives' have been achieved.

As it has come to be practised in education, the process of evaluation has usually consisted of the following steps, here listed in the logical, and typically also the chronological, order in which these steps are carried out. Different writers would vary in the specific ways in which these steps are clustered or

grouped, but the essentials are well established.

Step 1. Identify the objectives of the educational activity or enterprise. Often, perhaps usually, the objectives are stated in terms of desired changes in the ways in which students can behave (their capabilities) or will typically behave (their habits or tendencies). This way of stating objectives has several advantages over such alternatives as stating objectives in terms of teacher activities or educational facilities. These advantages result from focusing on the ultimate objectives

(student achievement of various capabilities and tendencies) rather than the various means (e.g. teacher activities or

educational facilities) by which these ends will be sought. It is a distinction between ultimate and intermediate objectives.

Once the intermediate objectives have been attained and, for example, educational facilities have been constructed and teachers have been trained, evaluation must become concerned with whether their purposes - certain kinds of student achievement,

capability, and habits - are being attained.

Thus, in the "appraisal of a fourth education project in Tanzania", it is stated in the summary and conclusions (paragraph v) that "Immediate needs of the education system are the

expansion of teacher training, medical education, training of skilled craftsmen, and the integration of various types of rural


education and training schemes (including) work-oriented education and training for youths and adults..." This kind of statement of objectives should be converted, as much as possible, into statements of desired changes in students' abilities and tendencies to behave in certain ways, e.g., perform such tasks as are entailed in various kinds of farming and skilled crafts.

Step 2. Specify the types of student behaviour that will be considered to reflect achievement of the objectives.

In the cognitive domain, the student behaviour would consist of ability to remember, comprehend, apply, analyze, synthesize, or evaluate ideas, facts, concepts, principles, problems, and the like. These various kinds of abilities should be defined in observable terms. Thus, instead of resting with such terms as

"knowledge, comprehension, critical thinking, ore grasping the significance of", one should attempt to use terms that imply observable behaviour, such as "stating, recognizing,

distinguishing true statements from false, matching, putting into onexs own words, computing, naming, stating relationships

between, or listing the consequences of".

In the affective domain, the student behaviour would consist of various degrees of internalization of, and commitment to,

attitudes and values considered desirable. Here again, an effort to maximize observability should be made. Thus, instead of such terms as 'appreciate, have an interest in, respect, etc.', one might use such terms as 'on his own initiative, he seeks out, tries, approaches, speaks favourably of, spends his own money for, pays attention to, etc.'.

In the psychomotor domain, the student behaviour readily takes such observable forms as ways of moving large and small limbs, communicating and expressing non-verbally, and motor aspects of ways of speaking and writing.

Step 3. Construct situations in which the student will be

required or expected to demonstrate the desired ways of behaving.

The situation may range from simple and short questions (or 'items') that will elicit a kind of knowledge to more elaborate problems that will call for a higher mental process (e.g., mechanical, spatial, aesthetic, social, or whatever), the main criterion being relevance to the objectives. The situations may be contrived entirely or be of the kind that occur regularly in 'real life'.

In devising such situations, it is often desirable to make surveys of representative samples of the real-life conditions and circumstances under which the desired ways of behaving would be exhibited. Thus, when it comes to certain kinds of farming skills, the objectives should be defined on the basis of surveys of the kinds of soils, crops, terrain, and the like, in which the farming will be done. For various kinds of skilled crafts,


surveys should be made of representative samples of the kinds of carpentry, wood, joints, structures, etc., that a carpenter would need to deal with in the work for which he was being trained.

Similarly, the kinds of habits of accuracy, neatness, punctuality, and dependability entailed in the successful performance of the work would be specified. All of the specifications should be based on carefully designed surveys of representative samples of the real-life situations in which the student or trainee will subsequently work.

Step 4. Determine the criteria or standards that will be used to assess the value - correctness or desirability - of the

behaviour elicited. In simple paper-and-pencil tests, this step consists of determining the 'scoring key', i.e., the list of correct answers. In more complex or non-symbolic situations, such as those in which a real-life performance is to be evaluated - e.g., a dance, a lathe operation, a meal preparation, or a discussion-group participation - this step consists of:

(a) a list of important dimensions of the performance and (b) an accompanying set of scales for use by observers,

measurers, or judges of the performance.

Needless to say, the criteria to be used should be determined by the values, customs, and standards prevalent in the culture of the society in whose education system the evaluation is being performed. Tests, criteria, and standards appropriate in one culture will in many instances be inappropriate in another. Only curriculum experts in any particular country can judge and ensure this appropriateness of criteria and standards.

Step 5. Apply the measuring instruments - the tests, rating scales, situational performance tests, etc. - to the students whose achievement of the objectives is being determined. This step refers to the administration of the tests. It can take the form of individual or group testing under highly or loosely

standardized conditions, depending on what is necessary and feasible for the purpose of the evaluation.

Individual testing is, of course, much more expensive than group testing. It requires much more time on the part of the examiners, and frequently it is more important in individual testing that the examiner be highly trained. Group tests, by definition, can be given to groups ranging in size from 2 to 2,000 students at a time depending on the size of the room in which the testing is being conducted. The cost of testing is thereby reduced materially. Most individual tests usually

require hand scoring by experts, whose time is expensive. Group tests, on the other hand, can be scored by clerks or by machines.


Thus, in general, group testing is to be preferred except under special conditions. These conditions are those in which the testing must be carefully adjusted to the child or adult being tested. If highly idiosyncratic behaviours are being

looked for - behaviours of the kind that might be exhibited by exceptional children or adults (either retarded or highly

gifted), or are to be examined for signs of creativity and

originality - then individual testing may be necessary. But, for the vast majority of educational objectives for the vast majority of students, group tests will serve. Individual tests can take the form of oral examinations and interviews, and their high

expense is justifiable when the importance of the examinee or the position to be filled is correspondingly high. But for the

purposes of selecting a considerable number of persons from a large number of applicants, the less expensive group testing

procedures usually turn out to be at least equally valid and much

less costly.

Step 6. Score the behaviour - the responses or performances - elicited by the measuring instruments. The score will take such form as 'number of correct responses' or 'total rating on the various dimensions' or 'accuracy of the product prepared in relation to specified attributes'. The score should tell the degree to which the objectives embodied in the questions or items of the measuring instrument have been achieved by the student. The degree to which the examination or test does yield accurate information on achievement depends on its reliability and

validity.
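A minimal Python sketch of this scoring step for a simple objective test, with a hypothetical answer key and response set:

    # Hypothetical answer key for a four-item test.
    answer_key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c"}

    def score(responses, key=answer_key):
        """Score as the number of correct responses."""
        return sum(1 for item, correct in key.items() if responses.get(item) == correct)

    print(score({"q1": "b", "q2": "a", "q3": "a", "q4": "c"}))   # -> 3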

Reliability is the degree to which the measuring instrument yields consistent results, regardless of what it is actually measuring. Thus, if a test yields the same score from one occasion to another - whether the occasions are separated by minutes, hours, days, or weeks - it is said to be reliable. But if memory is likely to make the student respond the same way to the same tests on two different occasions, the reliability of the test is better estimated by using two equivalent forms of the same test. Then the agreement between scores on the two

equivalent forms - an agreement which is often measured by means of a coefficient of correlation - is used as an estimate of the test's reliability. Or, reliability can be estimated by

determining the degree to which two equivalent halves of the same test yield results that agree, or correlate, with one another.
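A minimal Python sketch of a split-half estimate along these lines, correlating scores on the odd- and even-numbered items (the item scores are hypothetical; statistics.correlation requires Python 3.10 or later):

    import statistics

    # Hypothetical item scores (1 = correct, 0 = wrong) for five students on an eight-item test.
    item_scores = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 1, 1, 0, 1],
    ]

    odd_half  = [sum(row[0::2]) for row in item_scores]   # score on items 1, 3, 5, 7
    even_half = [sum(row[1::2]) for row in item_scores]   # score on items 2, 4, 6, 8
    print(f"Split-half correlation: {statistics.correlation(odd_half, even_half):.2f}")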

Finally, reliability can be estimated in terms of 'internal consistency', or the degree to which the items of a test yield scores that agree, or correlate, with one another. Statistical formulas have been developed and are well established for use in estimating reliability in these various senses. They are used in estimating an important characteristic of the measuring

instrument - the degree to which it yields measures that are


'accurate', apart from the degree to which they measure what the test is actually intended to measure.

Validity refers to the latter characteristic - the degree to which the test measures what it is intended to measure. What it is intended to measure is, in the case of achievement tests, defined by the educational or instructional objectives whose achievement the test is designed to measure. For most achievement tests, validity can be determined primarily by inspection of the test's content and of the psychological or behavioural processes that it elicits from the student. Thus, if a test is intended to measure higher mental processes, such as reasoning ability or the ability to synthesize information of various kinds, but inspection reveals that it calls upon the student merely to recall or recognize information that he has received from books or lectures, then the test would be judged to be low in validity. Another example of low validity would be a test which is designed to determine the student's attitudes towards, or appreciation of, some of the literature he has been studying. If the student knows that the teacher or school desires him to express favourable attitudes, and if he knows that his responses will not be anonymous and that unfavourable responses will redound to his disadvantage, then again the test can be judged to be low in validity. It is simply not measuring what it was intended to measure.

The validity of tests used for selection or research purposes takes different forms from the judgemental procedures appropriate for achievement tests. When a test is used for selection purposes, its predictive validity is estimated by the degree to which scores on the test correlate with subsequent measures of criteria of performance on a job. Thus, tests used to select draughtsmen would have high predictive validity if scores on the test correlated substantially with ratings of the draughtsmen obtained some months later from their supervisors. If so, the tests would have succeeded in predicting success on the job and would thus be considered to have predictive validity. If the criterion measures of success on the job are obtained at the same time as the test scores, the resulting correlation between the two sets of measures is termed the test's concurrent validity. Thus, a test used to measure student attitude toward a given author which correlated substantially with whether or not the students borrowed books by that author from the school library would be considered to have high concurrent validity.

Step 7. Judge the degree to which the score obtained by the students reflects achievement of the objective of the educational

activity or enterprise. Two major approaches to making such

judgements are termed the norm-referenced and the criterion-referenced approaches. In the former, each student's score is compared with that of other students constituting a norm group. Such a norm group might be 'other students who have just taken


the same course', or it might be 'a representative sample of students of the same grade-level throughout the nation'. The given student is then found to have a certain percentile rank in relation to the norm group, i.e. to equal or exceed a certain percentage of the students in the norm group.
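A minimal Python sketch of this norm-referenced interpretation, with hypothetical norm-group scores:

    def percentile_rank(score, norm_group):
        """Percentage of the norm group that the score equals or exceeds."""
        at_or_below = sum(1 for s in norm_group if s <= score)
        return 100.0 * at_or_below / len(norm_group)

    norm_group = [48, 52, 55, 59, 61, 63, 66, 70, 74, 81]   # hypothetical scores
    print(percentile_rank(63, norm_group))   # -> 60.0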

The success of a norm-referenced test is judged by the

degree to which it discriminates among students. The purpose of the test is to spread students out - to put them in a rank order ranging from the highest achieving to the lowest. A test that does not discriminate in this way is judged to be unsuccessful in creating variance, and hence it cannot be reliable or valid, if norm-referenced approaches are being used. Norm-referenced measurement imposes a kind of competition among students. Each student's achievement is evaluated primarily by being compared with that of the other students in his class, school, or

community. By this approach, some students are forced to be inferior, because some students must by definition be below

average; indeed, half of the students must always fall below the median, of course, and suffer the corresponding implications of

inferiority.

For many years, it was considered necessary to use the norm-referenced approach because it was assumed that no other way of judging educational achievement - other than by comparing that of one student with that of other students - was possible. More recently, it has been realized that achievement measurement can be referred to objectives rather than to other students. If it is possible to set up educational or instructional objectives, and to define observable behaviours that indicate achievement of those objectives, then it is possible to evaluate achievement by eliciting and evaluating those behaviours directly, without reference to the performance of other students.

In criterion-referenced interpretation of test scores, the judgement is made directly with reference to the pre-specified types of behavioural or performance objectives of the educational activity. If the objective referred to 'ability to read a typical newspaper article', then the interpretation of the student's score refers directly to whether such an ability has been demonstrated. If the objective referred to 'ability to

produce a cylinder', with dimensions of a certain accuracy, on a lathe, then the criterion-referenced interpretation refers

directly to whether such an ability has been demonstrated.
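A minimal Python sketch of such a criterion-referenced judgement, with a hypothetical objective and cut-off:

    def meets_criterion(observed_score, cut_score):
        """Mastery decision: does the observed performance reach the pre-set standard?"""
        return observed_score >= cut_score

    # Hypothetical objective: answer at least 80% of comprehension questions
    # on a typical newspaper article correctly.
    print(meets_criterion(observed_score=0.85, cut_score=0.80))   # -> True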

Other examples could be found in the objectives of programmes for training vehicle drivers, subsistence farmers, etc. So-called 'minimal learning packages' intended to develop basic skills can be evaluated by criterion-referenced approaches.

Norm-referenced evaluation has been used frequently in the past. It is closely linked to 'grading on the curve', the practice of distributing grades according to a pre-set pattern of marks. To the degree that such a pattern is not achieved, one looks toward differences in aptitude or intelligence among students for the explanation. The basic distinction between norm-referenced and criterion-referenced measurement therefore matters for how scores are interpreted: a score that merely ranks a student against others may not be valid, in the sense that the student's achievement score does not consistently reflect achievement of the objectives.

The formative and summative functions of evaluation

In the simplest case, evaluation serves only to establish whether the stated objectives have been achieved. There are, however, numerous occasions when this view of its function proves too narrow.

Misunderstandings about the function of evaluation arise from the failure to distinguish between summative and formative evaluation. The former comes at the end of an educational activity or project, while the latter occurs continually during the activity or project, with the aim of producing information which can be fed back to ensure that the appropriate aims and objectives are being attained. The information fed back from formative evaluation may necessitate changes in any of the first four steps listed earlier. It may call for changes in the situation or procedures developed in order that specific objectives will be achieved, or in the determination of the criteria or standards used to assess the value of the required behaviour. There are occasions, however, when modifications may be required in the objectives themselves.

In theory, whether an evaluation is to be summative or formative, the objectives to be achieved should be clearly stated at the outset of a new project. In practice, it is often the case that summative evaluation is added on to a project almost as an afterthought. In such instances, formulations of objectives may not be produced until the project is almost completed. The danger here is that the evaluation may then become more concerned with those objectives which have in fact been achieved, rather than with those which the project was originally intended to achieve.

The uses of summative evaluation

Two kinds of circumstances can be identified in which summative evaluation may legitimately be carried out. The first is in a project, usually a short-term one, in which the circumstances will hardly permit any changes being introduced during the course of the project. An example would be a project which required the building of a number of additional secondary schools, or further teacher-training colleges, without any initial specification that the curricula in the new establishments should be changed to meet new requirements. The summative evaluation would then be largely concerned with quantitative aspects, or perhaps with ensuring that there were no differences in the average levels of achievement of students in the new institutions as compared with those already existing.

In making a summative evaluation, it is often desirable to have a baseline against which the results of a given educational activity can be measured. Such a baseline can be obtained from the performance of a 'control group' - a group of students who have not received the kind of educational experience or training that is being evaluated. Sometimes such a control group receives no training at all. In other instances, the control group may receive the traditional or regular kind of training that one is trying to improve upon. Ideally, the experimental group - the one that receives the new kind of training - and the control group - the one that receives the old kind of training or none at all - are 'randomly equivalent'. That is, students in these two groups are assigned at random from the total group, by some mechanism such as a table of random numbers or the tossing of a coin. Such randomization, if it is feasible, and if the control and experimental groups are large enough, is sufficient to ensure that any significant post-training differences between the two groups cannot be attributed to extraneous factors, such as differences in age, social class, home background, or whatever. For randomization makes the two groups non-significantly different in all conceivable factors other than the experimental variable - the difference between the new and old kinds of education, curriculum, or instruction.
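As a small, purely illustrative sketch (the class list is invented and the example is not part of the manual), random assignment to experimental and control groups might look like this.

```python
# Hypothetical illustration: random assignment of a class of 40 students
# to an experimental group (new training) and a control group.
import random

students = [f"student_{i}" for i in range(1, 41)]
random.shuffle(students)             # plays the role of the coin toss or random-number table
experimental_group = students[:20]   # receive the new kind of training
control_group = students[20:]        # receive the old kind of training, or none
```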

Sometimes it is possible to introduce additional refinements by using statistical methods to adjust for whatever differences remain between the experimental and control groups. Thus, if the two groups are found to differ in scholastic aptitude, even after randomization, this difference can be used to adjust the post-instructional achievement test scores so as to eliminate such aptitude differences as an explanation of the achievement differences. If the adjusted achievement scores still differ significantly, i.e. to a degree greater than can be accounted for by chance fluctuation in random sampling, then it must be inferred that the instructional differences made a genuine difference in achievement that cannot be attributed to differences in aptitude.
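A simplified sketch of such an adjustment is given below; it is not from the manual and uses invented data. Achievement is first residualized on aptitude and the groups are then compared on the adjusted scores (a full analysis of covariance would fit group and aptitude together, so treat this only as an outline of the idea).

```python
# Hypothetical illustration: adjusting achievement scores for aptitude
# before comparing the experimental (1) and control (0) groups.
import numpy as np
from scipy import stats

aptitude = np.array([100, 95, 110, 105, 98, 102, 96, 108, 101, 99])
achievement = np.array([62, 58, 75, 70, 60, 68, 57, 74, 66, 61])
group = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Remove the part of achievement that is predictable from aptitude.
slope, intercept = np.polyfit(aptitude, achievement, 1)
adjusted = achievement - (slope * aptitude + intercept)

# Do the aptitude-adjusted scores still differ between the groups?
t_stat, p_value = stats.ttest_ind(adjusted[group == 1], adjusted[group == 0])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```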

The second circumstance in which summative evaluation may be legitimately required occurs at the end of a project during the course of which formative evaluation had been carried out. The primary aim of formative evaluation is to ensure, through changes brought about by the feedback process, that the originally stated objectives of a project are being achieved. In many instances, however, the feedback process causes changes to be made in the initially specified objectives, because some may prove to be unattainable in their original form. Where this happens, a summative evaluation should report on the extent to which both original and modified objectives have been achieved. An instance of this second type of summative evaluation might occur in a project which requires revisions of the secondary school structure and curriculum in order to improve the standard of secondary education. Any originally planned objective for bringing about the improvements required might well have undergone revision as a result of feedback from a formative evaluation carried out during the course of the project.

It has already been emphasized that the translation of general goals into appropriate sets of behavioral objectives forms the first important step in the evaluation process. Detailed behavioral objectives take time to develop.
