

6.2 Experiment Methodology

The general goal of the experiment series discussed in the following sections was to collect data on students’ “as-is” state, which served as a baseline condition; to then ask them to train their L2 competences with the CALL-SLT tool; and finally to collect a second set of data on their post-experimental status for comparison. On this basis we intended to analyze the effect that using CALL-SLT has on a student’s post-experimental L2 competences. By additionally collecting information on each student’s personal background, we also intended to relate our analyses to variables such as the students’ L1, their tech-savviness, or their general interest in learning a second language. In addition to the quantitative data collected pre- and post-experimentally (in both written and oral form), a feedback questionnaire allowed us to collect students’ qualitative perceptions of the usefulness of the proposed exercises.

6.2.1 Experiment Setup

In order to collect the above-mentioned data, we conducted a series of experiments in 15 different school classes, at seven schools, in six cantons of German-speaking Switzerland (namely the cantons of Basel-Stadt, Basel-Landschaft, Aargau, Solothurn, Bern and Nidwalden).

The diversity of this group helped us to get a better picture of the situation across various cantons and schools. In addition, it reduced the risk of obtaining results biased by uncontrollable variables, such as the individual teacher or the school setting.

The setup of the experiments required the participating students to use the eight CALL-SLT lessons (as introduced in Chapter 5, Table 5.2) to train their L2 communication skills during a four-week period in a home-study environment (cf. Chapter 5, section 5.3); we then measured the students’ L2 competences pre- and post-experimentally in order to evaluate a potential improvement in their skills through the active use of CALL-SLT. For the experiment series described in this thesis, all eight lessons were accessible during the entire experiment phase, meaning that students could freely decide which lessons they wanted to play and how often they wanted to repeat them. In total there were 622 different prompts available across the eight lessons.

For objective evaluation purposes, both written and spoken L2 competences were measured throughout the experiment phase. The written competences were evaluated on the basis of the scores achieved in two written placement tests (administered pre- and post-experimentally), and the oral L2 competences were measured on the basis of the students’ spoken interactions with the system. Since every interaction was recorded and saved together with additional information (such as the system prompt, a timestamp, an automatic transcription of the student’s utterance, whether or not the help function was used, and whether or not the utterance was accepted by the system), the students’ performance at the beginning of the experiment phase could be compared to their performance at the end, in order to determine whether an improvement in their oral L2 competences could be observed.
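To make the structure of these logged interactions concrete, the following is a minimal sketch of a per-interaction record in Python. All field names (student_id, prompt_id, audio_path, etc.) are hypothetical and only mirror the attributes mentioned above; the actual CALL-SLT log format is not specified here.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class InteractionRecord:
    """One logged CALL-SLT interaction (hypothetical schema for illustration)."""
    student_id: str       # anonymized student identifier
    prompt_id: str        # which of the 622 prompts was presented
    timestamp: datetime   # time of the recording
    transcription: str    # automatic transcription of the student's utterance
    help_used: bool       # whether the help function was consulted
    accepted: bool        # whether the system accepted the utterance
    audio_path: str       # path to the saved audio recording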

In order to guarantee validity across experiments, the experiment setup always followed the same three phases (pre-intervention, intervention and post-intervention), which are described in more detail below.

Pre-intervention phase An experiment always started with a briefing of the teacher, followed by informing the students and their parents about the experiment (cf. subsection 6.2.2). The actual experiment then began with the students taking a placement test during class time and under the supervision of the teacher. Additionally, all students filled in a questionnaire providing information on their personal background. More details on data collection are given in subsection 6.2.3. The students then received comprehensive instructions (in the form of instruction sheets, a dedicated webpage and video tutorials, cf. subsection 6.2.2) and a personal login for CALL-SLT. In addition to the instructions handed out, the teachers also introduced the CALL-SLT tool during class time in order to ensure that all students understood how to use the program.

Intervention phase Once these preparatory measures were taken, the students were instructed to use the CALL-SLT tool on their own as a homework assignment. In order to guarantee the appropriate use of our technology, the teachers were asked to regularly encourage their pupils to use the tool and, if necessary, to dedicate a small amount of their weekly class time to troubleshooting CALL-SLT problems. Apart from the class time dedicated to CALL-SLT instructions and troubleshooting, the pupils were instructed to use CALL-SLT regularly at home, where they were supposed to train their L2 skills by playing the eight proposed scenarios over a period of four weeks.

All eight lessons were available throughout the experiment phase and students were free to choose which lesson they wanted to follow. As a guideline, “on a regular basis” was defined as using the system at least twice a week for approximately 10 minutes. During this phase, the usage (number of logins and interactions per student and week) was regularly communicated to the teachers, so that they could take appropriate measures if students did not use the tool as instructed. If desired, the teachers were also given access to the log files of their pupils’ recorded utterances in order to monitor their performance at any time during the experiment phase.
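As an illustration of how such weekly usage figures could be derived from the interaction logs, the sketch below counts interactions and distinct usage days per student and week. The InteractionRecord fields and the use of “at least two distinct days per week” as a stand-in for the twice-a-week guideline are assumptions for illustration only.

from collections import defaultdict

def weekly_usage(records):
    """Count interactions and distinct usage days per (student, ISO week)."""
    interactions = defaultdict(int)
    days = defaultdict(set)
    for r in records:
        week = r.timestamp.isocalendar()[1]   # ISO week number
        key = (r.student_id, week)
        interactions[key] += 1
        days[key].add(r.timestamp.date())
    return interactions, days

def used_regularly(days, student_id, week, min_days=2):
    """Rough check against the guideline of at least two sessions per week."""
    return len(days.get((student_id, week), set())) >= min_days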

Post-intervention phase This active phase, during which the students used the system, was followed by a third and final experiment phase. In this final phase, all students took the same written placement test as they had before using CALL-SLT. This was again done under the teacher’s supervision during class time. This time, the students additionally filled in a questionnaire providing qualitative feedback on CALL-SLT and the experiment. At the end of the experiment, the teachers received a final report with all user statistics and the logged recognition results, individually listed per student.

6.2.2 Information Policy

As we discussed in Chapter 5, the integration of ICT into the traditional classroom is a delicate issue. It is hence of particular importance that a close collaboration with the school, the language teachers and the pupils is in place at all stages of the experiment, from the preparation through the realization to the evaluation phase. In order to inform all parties about the experiment at the appropriate points throughout its course, different measures were taken.

We informed the participating English teachers ourselves during an in-person or telephone meeting, at which point all organizational and technical details of the experiment were discussed and all of the teachers’ questions were addressed. During the experiment phase itself we were in constant contact with the teachers, providing them with user statistics and assisting them in case of instructional or technical problems.

A second step was to inform the pupils’ parents about the experiment. A few weeks before the start of the experiments, the parents received an information letter describing the intervention and what was expected of the students in terms of workload during the experiment phase. The parents were additionally given a login for CALL-SLT so that they had the opportunity to become familiar with the application before the experiment started. Furthermore, all parents were asked to fill in a consent form, giving us permission to use their children’s collected data in an anonymized form for research purposes. The information letter and the consent form can both be found in Appendix D.

In a third step, the students themselves had to be informed and instructed. This was done by the English teacher in order to avoid a halo effect in the experiment evaluation. Here the proven strategy was to conduct an introductory session in the school’s computer lab, where the application and its functioning were demonstrated and the pupils had the chance to practice with the tool in a classroom environment before using it on their own at home. Since not all lower secondary school children are familiar with using information technology in their everyday life, it was essential that they had the possibility of testing the CALL-SLT tool in an assisted environment when engaging in the game for the first time. In addition to this introductory session, they also received a printed instruction sheet (cf. Appendix E), as well as the link to a dedicated Internet page where they could find all important information (cf. Figure 6.1). This page included (1) the link to the CALL-SLT application, (2) a detailed instructions page on which the most frequently asked questions were answered in the students’ L1 (German), (3) a series of two video instructions (a short and a longer version) demonstrating a “live session” of a young teenager using CALL-SLT, (4) a contact form via which students could address any urgent matters to us directly, as well as (5) a guestbook where students could leave their comments concerning CALL-SLT or the experiment. According to the statistics, participants made regular use of this webpage, with a total of 2,095 visitors and almost 10,000 page views throughout the project. As far as urgent matters are concerned, we received emails from roughly 5% of the participating students, mainly concerning microphone or firewall issues.

Figure 6.1: CALL-SLT Internet page

6.2.3 Data Collection

As mentioned in subsection 6.2.1, we collected a series of different data sets that served as a basis for our evaluations. More precisely, the following data collections were used to analyze a possible improvement of students’ L2 competences through the use of CALL-SLT, as well as students’ subjective appreciation of various CALL-SLT components.

Written placement test The content of the written placement test was largely based on the content covered by the CALL-SLT lessons, as described in Chapter 5, section 5.4. From the written placement tests, which were filled in twice by all students (pre- and post-experimentally), we collected the following scores on written L2 skills.

Table 6.1 gives an overview of the four exercises with examples (more details on the written placement test can be found in Appendix F).

1. The first exercise served as an indicator of the students’ syntactic grammar skills, as it consisted of building a grammatical sentence out of a number of non-inflected words.

2. The second exercise gave information on the students’ vocabulary skills, as it consisted of a simple translation task of individual terms.

3. The third exercise provided information on the students’ (written) conversational skills. In this exercise the students had to fill in the empty speech bubbles of a dialogue so that their answers matched the utterances that were already provided.

4. The fourth exercise gave the most relevant information on the knowledge trained by the CALL-SLT tool: it consisted of translating entire sentences exactly as they were used in the exercises covered by the CALL-SLT lessons.

The written placement test was developed in close collaboration with the subject matter expert, who gave valuable advice on the types of exercises that were suitable for the target group. In order to guarantee consistent scores (pre- and post-experimentally), all placement tests were corrected at the same time by the course designer (a German native speaker with native-like knowledge of the L2) in a blind scenario, i.e. without knowing which test was taken pre- and which post-experimentally.

Table 6.1: Written placement test exercise overview

N°  Example of exercise                                   Example of a correct answer
1   (I / want / room / 4 nights)                          I want a room for 4 nights.
2   das Doppelzimmer                                      the double room
3   “How are you?” [STUDENT] “I’m fine, thanks!”          I’m good, thanks! And how are you?
4   Ich möchte ein Musical-Ticket für Dienstagabend.      I would like a musical ticket for Tuesday evening.

From a pedagogical point of view, it might be argued that the comparison of scores in a written placement test does not truthfully reflect students’ improvement, since the intervention variable, the use of CALL-SLT, mainly concentrated on the training of oral productive L2 skills. However, we argue that the written placement test nevertheless provides a useful benchmark with which to measure students’ knowledge of vocabulary and grammatical structures, considering that some parts of the CALL-SLT game, such as the help examples, were delivered in written form. In order to further reduce the emphasis on the written nature of the measured competences, the reviser was instructed not to penalize minor spelling errors in her corrections.

The risk of a memory effect in such a test-retest procedure must be kept in mind; however, since the time between test and retest administration was neither too short (which would foster the memory effect) nor too long (which would bear the risk of differential learning, in which students learn different amounts during the intervening period), we argue that this effect is negligible. An interval of one month between the two tests rules out both effects mentioned above (Ary et al., 2013). In addition to the time span between the administration of the pre- and post-placement test, the procedure was discussed with the collaborating teacher, who approved of this methodology. Another condition that strengthens our approach is that students did not know that they would take an identical placement test at the end of the experiment, which means that they had no interest in memorizing the content of the placement test administered pre-experimentally.

A detailed evaluation of students’ improvement in post-experimental written competences will be discussed in Chapter 7, section 7.2.
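As a sketch of how such a pre/post comparison can be quantified, the snippet below computes the mean score gain and a paired t-test over students who took both tests. The dictionary inputs are hypothetical, and this is only one plausible analysis, not necessarily the one reported in Chapter 7.

from scipy import stats

def written_improvement(pre_scores, post_scores):
    """Paired pre/post comparison of written placement test scores.

    pre_scores and post_scores map (anonymized) student IDs to total scores;
    only students who took both tests are considered."""
    students = sorted(set(pre_scores) & set(post_scores))
    pre = [pre_scores[s] for s in students]
    post = [post_scores[s] for s in students]
    mean_gain = sum(b - a for b, a in zip(post, pre)) / len(students)
    t_stat, p_value = stats.ttest_rel(post, pre)  # paired t-test on score pairs
    return mean_gain, t_stat, p_value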

Logged interactions The recordings of the students’ interactions with CALL-SLT provide information on students’ vocabulary and grammar competences, as well as on their oral productive skills, such as pronunciation and fluency. Since recordings are available for all interactions, they allowed us to compare students’ performance at the beginning of the experiment with their performance at the end of the four-week experiment phase. This served as a basis for comparing the students’ oral productive competences pre- and post-experimentally, as was done with the above-mentioned written competences. Those results will be discussed in Chapter 7, section 7.3.
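One simple proxy for such a comparison, sketched below, is each student’s acceptance rate (the share of utterances accepted by the recognizer) in the early versus the late part of the experiment phase. The field names and the split criterion are illustrative assumptions, and acceptance rate captures only one facet of oral performance.

def acceptance_rate(records):
    """Share of utterances accepted by the system, or None if no data."""
    return sum(r.accepted for r in records) / len(records) if records else None

def early_vs_late(records_by_student, split_date):
    """Compare each student's acceptance rate before and after split_date,
    as a rough early-vs-late proxy over the four-week experiment phase."""
    comparison = {}
    for student, recs in records_by_student.items():
        early = [r for r in recs if r.timestamp < split_date]
        late = [r for r in recs if r.timestamp >= split_date]
        comparison[student] = (acceptance_rate(early), acceptance_rate(late))
    return comparison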

Pre-questionnaire The questionnaire on personal background that was filled in before the experiment started provided information on various aspects that might have influenced a student’s performance. The questions were based on the questionnaire proposed by the PISA study, which can be found at https://nces.ed.gov/surveys/pisa/questionnaire.asp. Below is a list of the types of information provided in the pre-questionnaire (for a more detailed overview, cf. Appendix G). The information provided in this first questionnaire was mainly used to analyze whether students’ personal backgrounds have an influence on their performance in our project. Those results will be discussed in Chapter 7, section 7.4.

• age

• gender

• place of parents’ origin

• mother tongue

• other languages spoken at home

• previous stays in Anglophone countries

• subjective assessment of local language skills

• subjective assessment of L2 (English) skills


• subjective motivation to learn the L2 (English)

• number of years learning the L2 (English)

• number of years learning other L2s

• access to a computer at home

• access to Internet at home

• affinity with working with computers

• time spent on the Internet

• previous contact with CALL systems

• previous contact with speech recognition technology
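To illustrate how background variables like those listed above could be related to performance, the sketch below correlates one numerically encoded background variable with students’ score gains. The encoding, the variable choice and the use of Spearman’s rank correlation are assumptions for illustration; the actual analyses are presented in Chapter 7, section 7.4.

from scipy.stats import spearmanr

def background_effect(background, gains):
    """Correlate a background variable (e.g. self-assessed L2 motivation,
    encoded numerically) with students' pre/post score gains.

    background and gains are dicts mapping student IDs to values; only
    students present in both are used. Returns Spearman's rho and p-value."""
    students = sorted(set(background) & set(gains))
    x = [background[s] for s in students]
    y = [gains[s] for s in students]
    rho, p_value = spearmanr(x, y)
    return rho, p_value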

Post-questionnaire The second questionnaire, administered at the end of the experiment, provided qualitative information on the students’ appreciation of CALL-SLT. It covered the questions listed below (a copy of the post-questionnaire can be consulted in Appendix H).

• Have you learned English outside of school since the experiment started?

• What was your motivation to use CALL-SLT?

• Did you like using the tool?

• Did you consider the tool to be a useful supplement to traditional teaching methods?

• Did the tool realistically imitate a conversation with a native speaker?

• Were the exercises too difficult / easy?

• Did you easily understand the videos?

• Did you easily understand the help examples?

• Did you always know what to respond?

• Was the (written / spoken) help function helpful?

• Did the tool understand your utterances?

• Did the tool help you improve your pronunciation competences?

• Did the tool help you improve your vocabulary competences?

• Did the tool help you improve your grammar competences?

• Did the tool help you improve your overall L2 competences?

• Did you prefer the gamified or non-gamified version of CALL-SLT?

• How did you like the GUI and prompt design?

• How much do you like learning the L2 (English)?

• Would you use CALL-SLT again in the future?

• Would you recommend CALL-SLT to your friends?

The answers to the questions listed above served as a basis for evaluating students’ subjective appreciation of various CALL-SLT components, such as the help function (cf. Chapter 7, section 7.6) or the prompt design (cf. Chapter 7, section 7.7). They also provided information on students’ overall appreciation of CALL-SLT (cf. Chapter 7, section 7.8), as well as on their subjective perception of the tool’s usefulness in improving their L2 competences (cf. Chapter 7, sections 7.2 and 7.3).

Teacher’s post-questionnaire In addition to the data collected about the participating students, we also asked all teachers to fill in a feedback questionnaire at the end of the experiment. This questionnaire provided information on the teachers’ reasons for participating in the experiment, their satisfaction with their students’ engagement, as well as their appreciation of the experiment and the CALL-SLT tool. An overview of these results can be consulted in Chapter 7, section 7.9.

6.2.4 Participants

Table 6.2: Overview of participating school classes

School Class Teacher Canton Level Total n° students

As already mentioned, we used participants from various schools and cantons. We discussed in Chapter 1 that the Swiss school system is not uniform; Table 6.2 gives an overview of the participating school classes and their respective competence level.

Some cantons, such as Basel-Stadt or Nidwalden, do not distinguish the competence level of their classes at the lower secondary level, which is why we had to assume that the level within those classes was of a mixed nature. Out of the 15 classes, a third belonged to the highest level (level C), a third to the intermediate level (level B), two classes belonged to the lowest level (level A), and the remaining classes were of mixed level. Whereas the level B and C classes were in their first or early second year of learning English, the two A classes were both one year ahead, in order to compensate for their lower overall competence level.

Figure 6.2: Gender distribution across classes

In total we had 310 potential participants across the 15 school classes, of which roughly two thirds, namely 207 students, participated in our experiment. As can be seen in Table 6.2, some classes had a smaller percentage of participants than others, because some parents did not want their children to participate; their reasons were unfortunately not communicated to us.

Of the 207 participants, 112 were male (54%) and 95 were female (46%). A breakdown per class can be seen in Figure 6.2. In roughly half of the classes the gender distribution was balanced; however, there are six classes in which the proportion of male participants was higher (classes 1, 2, 3, 5, 12 and 14), as well as two classes with higher female participation (classes 6 and 7).

The age distribution is displayed in Table 6.3. As expected, most students were between 13 and 15 years old (88.4%), with 6.2% who were younger (12 years) and 5.4% who were older (16-17 years).

6.2.5 Activity Groups

For our evaluations, we divided all users into different activity categories. As we will see in section 6.3.2, these groups then served for inter-group comparisons, as well as for pre- and post-treatment comparisons, where each group was used as its own control.
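A minimal sketch of such a grouping is given below; the cut-off values and group labels are purely illustrative placeholders, since the actual group definitions are given in section 6.3.2.

def assign_activity_group(total_interactions, low=50, high=150):
    """Assign a student to an activity group by total number of logged
    interactions. The cut-offs are illustrative, not those used in the thesis."""
    if total_interactions == 0:
        return "non-user"
    if total_interactions < low:
        return "low activity"
    if total_interactions < high:
        return "medium activity"
    return "high activity"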

Table 6.3: Overall age distribution