• Aucun résultat trouvé

TUTORIAL 1: USING NATURAL LANGUAGE PROCESSING TOOLS IN EDUCATIONAL DATA MINING

N/A
N/A
Protected

Academic year: 2022

Partager "TUTORIAL 1: USING NATURAL LANGUAGE PROCESSING TOOLS IN EDUCATIONAL DATA MINING"

Copied!
1
0
0

Texte intégral

(1)

Using Natural Language Processing Tools in Educational Data Mining

Scott Crossley

Georgia State U.

Atlanta, GA 30303

scrossley@gsu.edu

Laura K Allen

Arizona State Univ.

Tempe, AZ, 85287

LauraKAllen@asu.edu

Danielle S. McNamara

Arizona State Univ.

Tempe, AZ, 85287

dsmcnama@asu.edu

ABSTRACT

The workshop will cover the development, use, and educational data mining applications of a number of freely available natural language processing (NLP) tools such as Coh-Metrix, the Writing Assessment Tool (WAT), the Simple NLP (SiNLP) tool, the Tool for the Automatic Analysis of Lexical Sophistication (TAALES), and the Tool for the Automatic Analysis of Cohesion (TAACO). The workshop will provide the participants with an overview of the types of linguistic features that can be measured with these NLP tools. Additionally, it will describe how these features have been (and can be) used in text analyses that are of importance to the educational data mining community. Participants will receive hands-on training with the tools using data from computerized learning environments.

Participants will also be shown how the output from these tools can be used to develop machine learning algorithms that can aid in predicting educational outcomes.

Keywords

Natural language processing, Data mining, machine learning

1. Description of the Tutorial Content and Themes

In this workshop, participants will be introduced to a number of freely available natural language processing tools. The primary focus of the tutorial will be to familiarize participants with the linguistic constructs measured by the tools, including lexical sophistication, text cohesion, rhetorical style, syntactic

complexity, and text organization. The constructs that will be discussed have all been shown to correlate with student success in various educational settings. The remainder of the tutorial will focus on how these constructs can be applied in educational data mining research to predict educational outcomes. The basic outline for the tutorial is an introduction to linguistic constructs, an overview of available NLP tools, a description of how the linguistic constructs are calculated in the tools, and a discussion of the links between these linguistic constructs and educational outcomes (e.g., performance, attitudes, etc.).

1.1 Justification for Importance of Topic

Natural language processing tools have been widely used to better inform research in a number of fields including cognitive science, medical discourse, literary studies, language learning, and the social sciences. However, large-scale use of NLP tools in educational data mining is much less common. Recently, the strength of NLP has begun to be recognized, as evidenced by a number of funding opportunities, and research findings in fields related to EDM (e.g., MOOCS, ITSs, etc.). This suggests that NLP is poised to become an important element of educational research, particularly when used in combination with more traditional measurements of student success (i.e., psychometric data, system interaction data, and sensor data). The convergence of readily available NLP tools, large-scale educational data sets, and data mining techniques can provide EDM researchers with new approaches to better understand and predict variables related to educational success.

Références

Documents relatifs

This module searches a file of code or documentation for blocks of text that look like an interactive Python session, of the form you have already seen many times in this book.

More specifically, we deployed on a virtual cloud environment a virtual network function running a panel of three applications, namely a proxy, a router, and a database, to which

Here, we conduct a comparison of four state of the art NLP tools at the task of recognizing Gene Ontology concepts from biomedical literature using the Colorado Richly

In Pierpaolo Basile, Franco Cutugno, Malvina Nissim, Viviana Patti, and Rachele Sprugnoli, editors, Proceedings of the 5th Evaluation Campaign of Natural Language Processing and

For this research, we have applied our system to create the initial version of the Anthology of the Spanish Society for Natural Language Processing 1 –the SEPLN Anthology: a

Keywords: Natural Language Processing, Compositional Distributional Model of Meaning, Quantum Logic, Similarity Measures..

In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of Third

SAS® Analytics U is a new initiative for making SAS data analysis and mining tools available for free to educational researchers and instructors.. T hese tools