An institutional content developer’s view on resources and repositories

Grégoire, Robert

Publisher’s version: Learning Content Registries and Repositories Summit Submitted Position Papers, 2010-04-14

NRC Publications Record:
https://nrc-publications.canada.ca/eng/view/object/?id=869dba1b-b03f-42b1-92f0-007d04e7e5d7

This publication could be one of several versions: author’s original, accepted manuscript or the publisher’s version.

An Institutional Content Developer’s View

on Resources and Repositories

Position Paper presented to ADL’s Learning Content Registries and Repositories Summit, Apr. 13-14, 2010

Robert Grégoire

National Research Council of Canada, Institute for Information Technology
Moncton, Canada
robert.gregoire@nrc-cnrc.gc.ca

Abstract – From the perspective of a content development team, the management of learning objects must be automated whenever possible, particularly with regard to extracting metadata at every step of the development process. Selective exposure of resources to a Web service should thereafter trigger rule-based processes such as harvesting by enterprise systems and publishing to the appropriate audience based on resource rights, aggregation level and other conditions.

I. CONTEXT

A. An Institutional Content Development Team

The Groupe des technologies de l’apprentissage (GTA) [1] at the Université de Moncton is a team created in 1999, comprising some 20 project managers, instructional designers and media specialists. This learning technologies group has been creating online educational content for academia and for the New Brunswick Department of Education, among others, as well as training content for large institutions such as the Justice Knowledge Network, a national consortium of health practitioners, and the province’s order of nurses. The GTA has also been involved in a number of research projects in the past few years and has recently branched out into parallel services such as Web site development and learning technologies consulting.

II. A BRIEF HISTORY OF GTA’S R&D WORK WITH LEARNING OBJECTS

The Université de Moncton’s involvement in the management of learning objects has evolved in three distinct phases. Its Groupe des technologies de l’apprentissage first participated in the pan-Canadian eduSource project. Then it helped develop the eduRafael federation of learning object repositories. It currently collaborates with the National Research Council of Canada (NRC) on two distinct research projects, SynergiC3 being its major involvement and MDXtract a smaller but more targeted initiative, both of which explore issues relevant to the management of learning content repositories.

A. The eduSource project

The Université de Moncton’s GTA contributed learning objects to Canada’s eduSource [2] initiative, circa 2003. The goal was to reach a critical mass of resources to demonstrate the validity of the learning objects model for education. This initial effort entailed the adoption of the CanCore metadata application profile and the IMS Content Packaging specification along with the development of a description methodology designed to identify resources deemed of educational value from a file server, and document them according to standards.

A team of indexers under the supervision of a library specialist was in charge of identifying learning objects within the course aggregations of a few selected GTA partners, and of managing those resources in order to describe and expose them. The indexing was done through a partner’s Web interface, one resource at a time, and the learning objects were kept locally on a Web server. By project’s end, some 1,200 learning objects were available in a catalogue of provincial resources. This infrastructure was not maintained for long, as the GTA’s repository partner lost its funding. All that remained of this first experience was a large XML file of learning object descriptions, a collection of resources disaggregated from their original context, and a publication [3].

Among the lessons learned in the eduSource project was the fact that it was not effective to rely on the manual description of resources, not only in terms of the time and costs involved, but also in terms of the quality and management of the metadata. Local control of metadata in a relational database architecture appeared preferable in order to provide the flexibility needed to manage the GTA partners’ assets independently of repositories or other content management solutions.

B. The eduRafael project

More recently (circa 2007), the GTA partnered with the Université du Québec à Montréal’s Télé-université (TELUQ) and the University of Ottawa to follow up on the eduSource project and create a network of federated repositories of French learning objects. The project was anchored by TELUQ’s Paloma open source repository solution, which underwent, as part of the project, a significant upgrade to implement a recommender system and other Web 2.0 commenting and filtering mechanisms.

Apart from the learning modules it contributed to eduRafael [4], the GTA was once again tasked with describing the new collections of learning objects it had developed since eduSource. The approach differed quite drastically this time around. A copy of the file server containing the source files and all other documentation related to the projects was created, freezing in time a view of the production server. Then a simple extraction procedure captured the properties of every file: Date, Author, Format, Size, etc. Further cleaning up of the file server provided some LOM Relation information of interest. This was done by standardising folder names and structures to reflect the learning content organisation (program courses, course modules, module pages, etc.).
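The extraction procedure described above can be sketched as a simple directory walk. This is not the GTA’s actual script; it is a minimal illustration that captures date, format and size from the file system (author information would in practice come from format-specific tools):

```python
import os
import time

def extract_file_properties(root):
    """Walk a frozen copy of a file server and capture basic
    properties (date, format, size) for every file, keyed by
    its path relative to the root."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            _, ext = os.path.splitext(name)
            records.append({
                "path": os.path.relpath(path, root),
                "format": ext.lstrip(".").lower() or "unknown",
                "size": stat.st_size,
                "date": time.strftime("%Y-%m-%d",
                                      time.localtime(stat.st_mtime)),
            })
    return records
```

Because the records are plain dictionaries, they can be bulk-loaded into a relational database, which is where the batch description commands discussed next come in.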

The next and most resource-intensive step was to create a Title and Description for every resource deemed of “asset” interest in the repository. Keywords and Classification were a bit easier, as they could already be attributed to sets of resources. Then complete folders of content could be tagged with appropriate LifeCycle, Meta-Metadata, Educational, Rights and other descriptive elements with simple database commands, hugely accelerating the description process. All of the mandatory metadata elements of the Normetic application profile [5] were treated this way, leaving only Title and Description to the library specialist, and those could fairly easily be extracted from the course context.
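The batch tagging by folder can be illustrated with a small in-memory database. The table layout, column names and sample paths below are purely illustrative, not the actual GTA schema:

```python
import sqlite3

# One row per resource; LOM-derived elements as columns (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE resource (
    path TEXT PRIMARY KEY,
    title TEXT, description TEXT,
    rights TEXT, classification TEXT)""")
conn.executemany(
    "INSERT INTO resource (path) VALUES (?)",
    [("bio1001/module1/cell.svg",),
     ("bio1001/module2/quiz.html",),
     ("chem2003/module1/intro.html",)])

# A single command tags every resource under a course folder --
# this is what made the approach so much faster than describing
# resources one at a time through a Web interface.
conn.execute(
    "UPDATE resource SET rights = ?, classification = ? WHERE path LIKE ?",
    ("Licensed to partner", "Biology", "bio1001/%"))
tagged = conn.execute(
    "SELECT COUNT(*) FROM resource WHERE rights IS NOT NULL").fetchone()[0]
```

Standardised folder names are what make the `LIKE 'course/%'` pattern reliable, which is why cleaning up the file server structure had to come first.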

One interesting improvement of this latter methodology came from clarifying one of the inherent limitations of a file server structure, where many-to-many relationships are impossible. When such a structure is used by production teams, media files are inevitably duplicated at different levels of the tree and, over time, confusion sets in. A simple cyclic redundancy check made it possible to identify all the unique files and their associations with different courses and programs, providing at the same time another means to help solve the versioning problem (although this particular aspect was not given much thought beyond this initial capability).
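The duplicate-detection idea amounts to grouping files by a checksum of their content. A minimal sketch, assuming file contents are available as bytes (in practice they would be read from the frozen copy of the server):

```python
import zlib
from collections import defaultdict

def find_duplicates(files):
    """Group file paths by the CRC-32 of their content, so that one
    physical asset copied into several course folders is recognised
    as a single unique resource.
    `files` maps path -> bytes content."""
    by_crc = defaultdict(list)
    for path, content in files.items():
        by_crc[zlib.crc32(content)].append(path)
    # Each group with more than one path is the same asset reused
    # (or duplicated) across different courses and programs.
    return {crc: paths for crc, paths in by_crc.items() if len(paths) > 1}
```

The resulting groups recover the many-to-many relationships (one asset, many courses) that the tree structure itself cannot express.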

The Paloma repository solution [6] supports batch imports, so a month’s worth of work could be uploaded in no time at all. From the GTA’s perspective, the beauty of this methodology, compared with previous approaches, was the ability to apply major changes and corrections to large sets of resources at a time using simple database commands.

III. AUTOMATING THE MANAGEMENT OF METADATA

The GTA is currently involved with NRC in automating the extraction of metadata throughout the content development process. Research has already produced a vast number of metadata extraction tools for almost all file formats. Although most existing tools present significant limitations, we know that it is relatively easy to recognize a file type, select the appropriate extraction tool, and present the metadata to an author for review and augmentation. Improving this process is the work of the MDXtract project. Two further considerations are examined more specifically by the Université de Moncton and its partners in the SynergiC3 [7] project.
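The recognize-select-present step can be sketched as a dispatch table keyed on file type. The extractor functions below are stubs standing in for real tools (an EXIF reader, an Office document parser, etc.); names and record fields are illustrative:

```python
import os

def extract_html(path):
    # Stub: a real tool would parse <title>, <meta> tags, etc.
    return {"format": "text/html"}

def extract_pdf(path):
    # Stub: a real tool would read the PDF document information dictionary.
    return {"format": "application/pdf"}

EXTRACTORS = {".html": extract_html, ".htm": extract_html,
              ".pdf": extract_pdf}

def extract_metadata(path):
    """Recognise the file type from its extension, select the
    appropriate extraction tool, and return a draft metadata
    record for the author to review and augment."""
    ext = os.path.splitext(path)[1].lower()
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        return {"format": "unknown", "needs_review": True}
    record = extractor(path)
    record["needs_review"] = True  # always presented to the author
    return record
```

Each record is deliberately marked for review: automation produces the draft, but the author remains responsible for validation and augmentation.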

The first consideration stems from the production environment of a content development team like the GTA’s. From the moment a new contract is signed, relevant metadata are already available for harvesting. Then, throughout the analysis, design, development, implementation and evaluation cycle, additional metadata become available from the contribution of each actor in the creation process. This led to thinking of metadata extraction as a workflow. More specifically, if a data point changes in a rule-based system, something as simple as “a new learning object was added to the repository”, then functions can be triggered, for example: “determine the learning object format, select the appropriate tool and extract metadata, save the metadata record in the database and send a request for validation to the author”. This concept is currently being explored in the SynergiC3 project.
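The rule-based trigger described above can be sketched as a tiny event engine: a data-point change fires an event, and the registered functions run in sequence. Event names, handlers and field names are illustrative, not the SynergiC3 implementation:

```python
class RuleEngine:
    """Map events (data-point changes) to chains of actions."""
    def __init__(self):
        self.rules = {}

    def when(self, event, *actions):
        self.rules.setdefault(event, []).extend(actions)

    def fire(self, event, payload):
        # Each action receives and returns the (augmented) payload.
        for action in self.rules.get(event, []):
            payload = action(payload)
        return payload

log = []

def determine_format(obj):
    obj["format"] = obj["path"].rsplit(".", 1)[-1]
    return obj

def extract_and_save(obj):
    log.append(f"saved metadata record for {obj['path']}")
    return obj

def request_validation(obj):
    log.append(f"validation request sent to {obj['author']}")
    return obj

engine = RuleEngine()
engine.when("learning_object_added",
            determine_format, extract_and_save, request_validation)
result = engine.fire("learning_object_added",
                     {"path": "module3/quiz.html", "author": "rgregoire"})
```

The point of the sketch is the decoupling: the production workflow only announces that something changed, and the rules decide which extraction and notification steps follow.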

The other consideration is that metadata should no longer be bound by a one-to-one relationship between the resource and its metadata file. The fact of the matter is that the “truth” of a resource’s description is usually asserted from the perspective of the use envisioned for that resource by the metadata creator. But if we think of person metadata, for example, not as a set of fixed data entries for a specific context, but rather as a multifaceted entity that can be described from various points of view and constantly augmented by various contributors in an open environment, then we break that one-to-one relationship and describe resources more like logical entities, or sets of RDF triples, that can apply to various contexts. This idea is summarised in a distinct position paper [8].
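Breaking the one-to-one relationship can be pictured as a growing set of (subject, predicate, object) assertions, each attributed to a contributor. The URN, predicate names and contributor identifiers below are hypothetical:

```python
RESOURCE = "urn:gta:resource/cell-division-animation"

# The same resource described from two perspectives: the original
# indexer's, and a later contributor's who targets another audience.
triples = [
    (RESOURCE, "dc:title", "Cell Division Animation", "indexer"),
    (RESOURCE, "lom:context", "higher education", "indexer"),
    (RESOURCE, "lom:context", "secondary school", "teacher-nb"),
    (RESOURCE, "dc:subject", "mitosis", "teacher-nb"),
]

def describe(resource, predicate):
    """Return every asserted value for a predicate, across all
    contributors: the description is open-ended, not a fixed record."""
    return [(obj, who) for s, p, obj, who in triples
            if s == resource and p == predicate]

contexts = describe(RESOURCE, "lom:context")
```

Where a single metadata file would force one value for `lom:context`, the triple set holds both assertions, and each consuming audience can filter by the perspective it trusts.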

IV. CONCLUSION

It has long been posited that users should not be concerned with metadata, as this information about data should rather be handled behind the scenes by automatic agents. This is truer now than ever with the latest Web-of-data emphasis [9]. As a content development team of professionals, the Université de Moncton’s GTA views the process of learning object creation as a workflow throughout which information can be extracted. A rules-based environment ensures that this can be done behind the scenes, without human intervention. When learning content is stable and ready to be published, it should simply be uploaded to a staging environment that can be exposed to services and disseminated as per the needs and constraints of the client, for example to a learning object repository, a learning content management system or a content registry. And metadata should be thought of not as static records but rather as dynamic logical entities that can be described in, and retrieved from, semantic entities open to myriad audience interpretations, and served in any desired standard.


ACKNOWLEDGEMENT

This work is part of the MDXtract project which is funded by the National Research Council of Canada and based on initial work by the Université de Moncton’s Groupe des technologies de l’apprentissage. Robert Grégoire is seconded to the NRC by the Université de Moncton for the duration of the project.

REFERENCES

[1] Université de Moncton’s Groupe des technologies de l’apprentissage - http://www.umoncton.ca/umcm-dgt/node/91
[2] eduSource Canada - http://edusource.netera.ca/english/home_eng.html

[3] Robert Grégoire, Rose-Marie Racine-April, Angèle Clavet, and Joanne Roy, "La description des ressources d’enseignement et d’apprentissage : une méthode développée dans le cadre du projet eduSource Canada," Journal of Distance Education, 20 (2005), 58-77. Accessed March 25, 2010: http://www.jofde.ca/index.php/jde/article/viewFile/82/63

[4] eduRafael - www.edurafael.org

[5] Normetic application profile - http://www.normetic.org/

[6] Paloma suite - http://en.sourceforge.jp/projects/sfnet_paloma-suite/
[7] SynergiC3 - http://www2.umoncton.ca/cfdocs/synergic3/Index.html

[8] Stephen Downes, "Resource Profiles" (2003). Accessed March 25: http://www.downes.ca/files/resource_profiles.htm

[9] Tim Berners-Lee on the Next Web (2009), TED: Ideas Worth Spreading. Accessed March 25: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
