here

(1)

9te Konferenz

Professionelles Wissensmanagement 5. – 7. April 2017

FZI Forschungszentrum Informatik | Haid-und-Neu-Straße 10-14 | 76131 Karlsruhe

mit Unterstützung von

Karlsruher Institut für Technologie (KIT) FZI Forschungszentrum Informatik

Universität Siegen Universität Rostock IT University of Copenhagen Auckland University of Technology Johann Wolfgang Goethe-Universität Frankfurt Norwegian University of Science and Technology (NTNU)

Know-Center: Forschungszentrum für Data-driven Business und Big Data Analytics

(2)

WORKSHOPS VII

PROGRAMMKOMMITEE XI

ORGANISATIONSKOMMITEE XII

FLEXIBLE WISSENSPRAKTIKEN UND DER DIGITALE ARBEITPLATZ (EN) 1 1 Determining the Outdatedness Level of Knowledge in Collaboartion Spaces Using

A Machine Learning-Based Approach 2

Zahra Grosser¹, Andreas Schmidt¹, Martin Bachl¹, und Christine Kunzmann²

1 Karlsruhe University of Applied Sciences, Germany

2 Pontydysgu Ltd., Remchingen, Germany

2 Structuring Digital Options towards reducing the Struggle of Finding the Needle in the

Digital Haystack 18

Michael Leyer¹, Susanne Durst²

1 University of Rostock, Institute for Business Administration, Germany

2 University of Skövde, School of Business, Sweden

3 Towards Multiagent-Based Simulation of Knowledge Management in Teams 25

Ingo J. Timm¹, Jan Ole Berndt¹, Lukas Reuter¹, Thomas Ellwart², Conny H. Antoni³, Anna-Sophie Ulfert³

1 Trier University, Business Informatics 1, Germany

2 Trier University, Business Psychology, Germany

3 Trier University, Work, Industrial and Organizational Psychology, Germany

4 Context Modeling for Knowledge Management Systems 41

Kurt Sandkuhl¹

1 Rostock University, Institute of Computer Science, Germany

5 Tackling Knowledge Gaps in Digital Service Delivery 55

Florian Bär^1,2

1 University of Rostock, Germany

2 Munich University of Applied Sciences, Germany

GERMAN WORKSHOP ON EXPERIENCE MANAGEMENT (DE) 61

(3)

1 University of Kassel, Research Center for Information System Design, Germany

5 Case Base Indexing Using Unique Attribute Combinations Based on

Parallel Power Set Tree Algorithm 107

Kareem Amin^1,4, Khaled El-Bahnasy², Klaus-Dieter Althoff^1,3, Andreas Dengel^1,4

1 German Research Center for Artificial Intelligence, Knowledge Management, Kaiserslautern, Germany

2 Ain Shams University, Cairo, Egypt

3 Institute of Computer Science, Intelligent Information Systems Lab, University of Hildesheim, Germany

4 Kaiserslautern University, Germany

HARDWARE-ORIENTIERTES SITUIERTES WISSENSMANAGEMENT (DE) 108 1 Herausforderungen des Wissensmanagements im Rahmen betrieblicher

Rüstprozesse 109

Sven Hoffmann¹, Nils Darwin Abele², Aparecido Fabiano Pinatti de Carvalho¹

1 Lehrstuhl für Wirtschaftsinformatik und Neue Medien, Siegen, Germany

2 Lehrstuhl für Technologiemanagement, Siegen, Germany

2 Wissensvermittlung als Mittel zur Vermeidung von Konflikten bei der

110 Produktionsplanung eines KMU

Christoph Kotthaus¹, Thomas Ludwig¹, Volkmar Pipek¹

1Universität Siegen, Germany

LERN- UND WISSENSMANAGEMENT IM ZEITALTER DER INDUSTRIE 4.0 (DE) 111 1 Knowledge Management 4.0 – lessons learned from IT trends 112

René Peinl¹

1 Hof University of Applied Sciences, Institute of Information Systems, Germany

2 Wissensmanagement in Industrie 4.0 – Supply Chains 118

Oliver Möllenstädt¹

1 Hauptgeschäftsführer Gesamtverband Kunstoffverarbeitende Industrie e.V. (GKV), Germany

3 Adaptive and Reflective Training Support for Improving Search Behaviour

in Industry 4.0 125

Angela Fessl¹, Stefan Thalmann^1,2, Viktoria Pammer Schindler^1,2

(4)

work process tool? 137

Ludger Deitmer¹, Lars Heinemann¹

1 University of Bremen, Institut Technik + Bildung, Germany

WISSENSMANAGEMENT IM KONTEXT DES DEMOGRAFISCHEN WANDELS (DE) 140 1 Vom operativen Wissenstransfer zum strategischen Wissensmanagement –

Erfahrungen aus der Praxis der Freien und Hansestadt Hamburg 141

Katharina Dahrendorf¹, Christina-Maria Schröder²

1 Freien und Hansestadt Hamburg, Personalamt, Germany

2 Freie und Hansestadt Hamburg, Behörde für Soziales, Familie und Integration, Germany

2 Wissenserwerb als Teil der „Produktionsumgebung Wissensmanagement“ der ÖV 151

Horst Friedrich¹, Dietmar Glachs², Sabine Moebs³, Jan Pawlowski⁴, Eric Ras⁵, Peter Schilling⁶, Juliane Schmeling¹, Petra Steffens¹, Julia Stoffregen⁴, Sonja Trapp³, Philippe Vallogia⁵

1 Fraunhofer Institut Fokus, Berlin, Germany

2 Salzburg Research Forschungsgesellschaft, Austria

3 DHBW, Heidenheim, Germany

4 Hochschule Ruhr West, Bottrop, Germany

5 Luxembourg Institute of Science and Technology, Luxemburg

6 Prof. i.R. Projektpartner FhI Fokus, Backnang, Germany

3 Soziale Lerntechnologien für die Weiterentwicklung des beruflichen

Selbstverständnisses am Beispiel europäischer Arbeitsagenturen 167

Christine Kunzmann¹, Andreas P. Schmidt²

1 Pontydysgu Ltd., Remchingen, Germany

2 Hochschule Karlsruhe, Institut für Lernen und Innovation in Netzwerken, Germany

4 Kompetenzen erkennen, dokumentieren und bewahren für ein bedarfsgerechtes

Wissensmanagement im demografischen Wandel 174

Nadine Ogonek¹, Michael Räckers¹, Jörg Becker¹

1 Westfälische Wilhelms-Universität Münster, Germany

5 Transformationsbedarf in der öffentlichen Verwaltung – kompetenzorientiert

den demografischen Wandel gestalten 186

Emanuel Zimmerling¹, Steffen Gilge², Eric Schoop¹ and Michael Breidung³

(5)

insbesondere deren methodische und technische Umsetzung mittels Informations- und Kommunikationstechnik.

Der Schwerpunkt der Tagung in 2017 liegt auf “Wissensmanagement im digitalen Wandel”.

Die zweijährlich ausgerichtete Konferenz bringt Vertreter/-innen aus Forschung und Praxis in eingeladenen Vorträgen, Workshops, Tutorials und einer begleitenden Industrieausstellung zusammen, um Erfahrungen, professionelle Anwendungen und Visionen zu diskutieren.

Wie in den vergangenen Jahren besteht das technisch-wissenschaftliche Hauptprogramm der Konferenz aus mehreren thematisch eigenständigen Workshops, die den wesentlichen Bestandteil des Tagungs-Programms darstellen. Durch das offene Workshop-Format ist es in besonderer Weise möglich, neue Themengebiete des Wissensmanagements und verwandten Gebieten zu fördern und eine starke Vernetzung und Kooperation von Wissenschaft und Praxis zu erreichen. Prinzipiell sind alle Themen mit klarem Wissensmanagementbezug mögliche Workshop-Themen.

DANKSAGUNG

Wir bedanken uns bei allen Workshop-Organisatorinnen und -Organisatoren, Autorinnen und Autoren, die mit Ihrem Engagement und Ihren Beiträgen dem vorliegenden Band eine besondere Qualität verleihen. Jede Einreichung wurde von zwei bis drei Experten der beteiligten Disziplinen in teilweise mehreren Überarbeitungsphasen in einem double-blind- Verfahren begutachtet. Daher gilt unser Dank auch den Gutachterinnen und Gutachtern aus Wissenschaft und Wirtschaft.

Ihnen als Leserinnen und Lesern wünschen wir eine spannende Lektüre!

Karlsruhe im März 2017

York Sure-Vetter, Stefan Zander, Andreas Harth

(6)

methodological and technical implementation of these factors with the help of information and communications technology.

Focus of the 2017 conference is on “knowledge management in the digital era”. The biennial event brings together representatives of research and practice to participate in presentations, workshops, tutorials and an exhibition by companies, to share experiences, professional applications and discuss future visions.

Similar to previous years, the main technical-scientific program consists of several thematic workshops that make up most of the conference program. The open workshop format allows facilitating new areas of knowledge management and related areas as well as connecting and achieving cooperation between science and practice. Generally, all topics with a clear relation to knowledge management can be workshop themes.

ACKNOWLEDGMENTS

We would like to thank all the workshop organisers and to the authors, who have enriched the quality of this volume by their initiatives and their contributions. Each submission has been evaluated by two or three experts from the disciplines involved. In some cases, with several revision stages in a double-bling procedure. Therefore, our special acknowledgement goes also to all the reviewers and evaluators from science and industry.

We wish our readers a beneficial experience!

Karlsruhe, March 2017

York Sure-Vetter, Stefan Zander, Andreas Harth

(7)

Dr. Rolf Zajonc and Dr. Marco Gärtler | tts GmbH

Kaleidoscope of Challenges in Business-Related Knowledge Management

PANEL

Prof. Dr. Ronald Maier | Leopold-Franzens-Universität Innsbruck, School of Management Prof. Dr. Klaus North | Wiesbaden Business School, Hochschule Rhein Main

Dr. Daniel Bachlechner | Fraunhofer ISI, Karlsruhe Hans-Peter Schnurr | semedy AG

Dr. Stefan Thalmann | Know-Center, Graz

Wissen 4.0 – Perspektiven des Wissensmanagements im digitalen Wandel

Die Digitalisierung ermöglicht Disruption und fordert in allen Sektoren den Wandel und die Neuerfindung der Organisation. Dabei haben die technischen Möglichkeiten große Mengen von Daten und Informationen verfügbar zu machen, zu vernetzen, intelligent auszuwerten, zu kommunizieren und zu visualisieren erhebliche Auswirkungen auf das Management der Ressource Wissen im Kontext von Organisationen, Wertschöpfungsketten, Geschäftsmodellen und Wissensarbeit. Zur Gestaltung eines „Wissensmanagement 4.0“ sind u.a. folgende Fragen zu beantworten: Welche Implikationen hat die Digitalisierung für den Umgang mit Daten, Informationen und Wissen in Organisationen? Wie verändern Digitalisierung und intelligente Systeme wissensbasierte Wertschöpfung und Wissensarbeit?

Was bedeutet dies für Führung/Unternehmensführung? Im Vortrag werden im Dialog beider

Forscher die Zusammenhänge von wissensbasierter Wertschöpfung im digitalen Wandel mit

der Weiterentwicklung der „Wissenstreppe“ zur „Wissenstreppe 4.0“ verdeutlicht.

(8)

4. German Workshop on Experience Management (GWEM2017) 5. Lern- und Wissensmanagement im Zeitalter Industrie 4.0 (LeWIn4.0)

HARDWARE-ORIENTIERTES SITUIERTES WISSENSMANAGEMENT (DE)

Kurzbeschreibung

Der Erfolg eines Unternehmens basiert auf der Wissensbasis der Mitarbeiter. Diesen Erfolg gilt es langfristig sicherzustellen, wodurch dem Wissensmanagement eine entscheidende Bedeutung zukommt. Dieser Workshop hat zum Ziel, Möglichkeiten der Wissensvermittlung auf Basis von Hardware zu beleuchten. Dabei sollen nicht allumfänglich Ziele verfolgt werden, vielmehr wird Wert auf einen situierten Ansatz gelegt. Dem Menschen als Wissensträger und -konsument kommt eine besondere Rolle zu. Um die Effektivität eines Wissensmanagements sicherzustellen, müssen sowohl die Bedürfnisse der Menschen, als auch kontextuelle Rahmenbedingungen einbezogen werden. Situiert bedeutet in diesem Kontext die nutzerzentrierte Gestaltung des Wissensmanagements direkt an der (Produktions-)Maschine.

Im Zuge der Digitalisierung gilt es, den Einsatz von technischen Lösungskonzepten und deren Integration in das Arbeitsumfeld ebenfalls vor dem Hintergrund der Nutzerzentrierung zu erörtern.

Zielsetzung

Innerhalb dieses Workshops gilt es die Probleme und Herausforderungen von situiertem Wissensmanagement zu thematisieren und neue Lösungsansätze zu präsentieren, welche die verschiedenen Sichtweisen (Arbeitgeber, Arbeitnehmer, rechtliche Standpunkte und Wissenschaft) aufbereitet. Dabei werden folgende Ergebnisse erwartet:



Aktuelle Chancen und Herausforderungen bei der Vermittlung von Wissen an der Maschine.



Technische Lösungskonzepte, z.B. durch mobile Technologien (Smartphone, Tablet)

oder Wearables (Smartwatch, Smart Glasses).

(9)

how those tools should be designed taking the different knowledge practices and their context into account. This covers the devices used (e.g. smartphones), the kind of knowledge provided and the way employees access relevant knowledge. Moreover, it is important to shed light on the question how employees should be connected and empowered with digital knowledge support solutions.

Zielsetzung

Arbeitsziele des Workshops sind die Herausforderungen im Themengebiet zu erörtern und auf Basis von bisherigen Forschungsergebnissen eine Agenda für ein verhaltensorientiertes Design von Wissensmanagementslösungen am Arbeitsplatz zu entwickeln.

WISSENSMANAGEMENT IM KONTEXT DES DEMOGRAFISCHEN WANDELS (DE)

Kurzbeschreibung

Der demografische Wandel stellt den Öffentlichen Bereich vor große Herausforderungen.

Abgang von Erfahrungswissen trifft auf Haushaltskonsolidierung und Stellenstreichung.

Kompensationsmaßnahmen des E-Government greifen oft zu kurz. Dokumentenmanagement und Prozessautomatisierung genügen nicht mehr. Eine tiefergreifende Transformation ist erforderlich. Der Workshop will Methoden und Instrumente eines professionellen Wis- sensmanagements für den Öffentlichen Bereich diskutieren und Bezüge zur privaten Wirtschaft ziehen. Gewünscht werden theoretische und/oder prakti- sche Beiträge. Mögliche Themen sind:

1. Erforderliche Kompetenzen des modernen Staats und Besonderheiten des Wissensmanagements im Öffentlichen Bereich

2. Auswirkung der Digitalen Transformation (Industrie 4.0) auf den Öffentlichen Bereich 3. Öffnung traditioneller Verfahren des E-Governments für Ansätze des

Wissensmanagements

4. Methoden zur Identifikation, Dokumentation, Bewertung und für den Transfer von Wissen im Öffentlichen Bereich

5. Kulturwandel im Öffentlichen Bereich: Wissensbewahrung durch Bindung von

(10)



Fachlicher Austausch der interessierten Communities, Identifikation von Kooperationspotentialen, z. B. für gemeinsame Förderanträge

GERMAN WORKSHOP ON EXPERIENCE MANAGEMENT (DE)

Kurzbeschreibung

Efficient utilization of experience is gaining in importance since expertise, "know how", applied knowledge can be assessed as one of the most important resources and the key issue for successfully competing organizations in almost every sector of economy. In the context of this workshop as well as the workshops previously held in this series, 'experience' is mainly considered as a form of knowledge obtained during the solution of problems. Therefore, similar but not identical issues from approaches developed in Knowledge Management, all aspects of the whole lifecycle of 'experience' are in the focus of this workshop: Analysing, modelling, storing, retrieving, developing, and reusing experience are major challenges to Experience Management (EM). Besides the traditionally strong focus of this workshop series on the design, development and integration of intelligent systems and methods for managing experience, subjects relevant in EM may be contributed by different communities from computer science, mathematics, social science, and business administration/economics.

Zielsetzung

Ziel des Workshops ist es Wissenschaftler aus dem Bereich Erfahrungsmanagement zusammenzubringen, um sich gegenseitig auszutauschen.

LERN- UND WISSENSMANAGEMENT IM ZEITALTER DER INDUSTRIE 4.0 (DE)

Kurzbeschreibung

Die Industrie 4.0 verändert unsere Arbeitswelt radikal und stellt Unternehmen und ihre

Mitarbeiter vor neue Herausforderungen. Integrierte Produktions- und Lieferketten erfordern

(11)

Anwendung eines konzeptionell-theoretischen, design- orientierten oder verhaltenswissenschaftlichen Ansatzes in diesem Kontext vorstellen.

Zielsetzung



Ein besseres Verständnis der Herausforderungen für und der Anforderungen an das Lern- und Wissensmanagement im Zeitalter der Industrie 4.0.



Entwicklung der Grundlage für ein gemeinsames Positionspapier über die

Wechselwirkungen zwischen der Industrie 4.0 einerseits und dem Lern- und

Wissensmanagement andererseits.

(12)

PhD Shahper Vodanovich | Auchland University of Technology

EXPERIENCE MANAGEMENT (GWEM2017)

Prof. Dr. Mirjam Minor | Johann Wolfgang Goethe-Universität Frankfurt Dr. Kerstin Bach | Norwegian University of Science and Technology (NTNU)

HARDWARE-ORIENTIERTES SITUIERTES WISSENSMANAGEMENT Dr. Thomas Ludwig | Universität Siegen

M.Sc. Sven Hoffmann | Universität Siegen

LERN- UND WISSENSMANAGEMENT IM ZEITALTER INDUSTRIE 4.0 (LeWIn4.0)

Dr. Stefan Thalmann | Know-Center: Forschungszentrum für Data-driven Business und Big Data Analytics, Graz, Österreich

Univ.-Prof. Dr. Stefanie Lindstaedt | Know-Center: Forschungszentrum für Data-driven Business und Big Data Analytics, Graz, Österreich

Dr. Daniel Bachlechner | Fraunhofer Institut für System- und Innovationsforschung, Karlsruhe, Germany

WISSENSMANAGEMENT IM KONTEXT DES DEMOGRAFISCHEN WANDELS:

TRANSFORMATIONSBEDARF IM ÖFFENTLICHEN BEREICH (WMTransÖff) Prof. Dr. Eric Schoop | Technische Universität Dresden

Dr. Steffen Gilge | Hochschule für öffentliche Verwaltung und Rechtspflege (FH),

(13)

FZI Forschungszentrum Informatik am KIT Förderverein Semantische Technologien e.V.

KOMMITEE

Prof. Dr. York Sure-Vetter | Karlsruher Institut für Technologie (KIT) Prof. Dr. Stefan Zander | FZI Forschungszentrum Informatik am KIT Prof. Dr. Ulrich Reimer | University of Applied Sciences St. Gallen PD Dr. Andreas Harth | Karlsruher Institut für Technologie (KIT), AIFB Dr. Dominik Riemer | FZI Forschungszentrum Informatik am KIT Dr. Maria Maleshkova | Karlsruher Institut für Technologie (KIT), AIFB Matthias Frank | FZI Forschungszentrum Informatik am KIT

Valentina Pavlova | Karlsruher Institut für Technologie (KIT), AIFB Suad Sejdovic | FZI Forschungszentrum Informatik am KIT

Steffen Thoma | Karlsruher Institut für Technologie (KIT), AIFB

Ignacio Traverso | FZI Forschungszentrum Informatik am KIT

Tobias Weller | Karlsruher Institut für Technologie (KIT), AIFB

Robert Orzanna | FZI Forschungszentrum Informatik am KIT

Beate Kühner | Karlsruher Institut für Technologie (KIT), AIFB

Gisela Schillinger | Karlsruher Institut für Technologie (KIT), AIFB

Heike Döhmer | FZI Forschungszentrum Informatik am KIT

(14)

FLEXIBLE KNOWLEDGE PRACTICES

AND THE DIGITAL WORKPLACE (FKPDW)

Prof. Dr. Michael Leyer Universität Rostock Prof. Dr. Alexander Richter IT University of Copenhagen PhD Shahper Vodanovich Auckland University of Technology

Zahra Grosser, Andreas Schmidt, Martin Bachl, und Christine Kunzmann

Determining the Outdatedness Level of Knowledge in Collaboration Spaces Using A Machine Learning-Based Approach

Michael Leyer, Susanne Durst

Structuring Digital Options towards reducing the Struggle of Finiding the Needle in the Digital Haystack

Ingo J. Timm, Jan Ole Berndt, Lukas Reuter, Thomas Ellwart, Conny H. Antoni, Anna-Sophie Ulfert Towards Multiagent-Based Simulation of Knowledge Management in Teams

Kurt Sandkuhl

(15)

(16)

even outdated. Problem can arise by using such documents particularly for new and peripheral members. The Knowledge Maturing Model proposed by Maier

& Schmidt [15] offers a developmental perspective on collective knowledge by identifying different phases and corresponding characteristics of knowledge ma

turing. Indicators for such phases can be of great help for enhancing collaborative spaces with visual cues or filtering mechanisms. One particular research area of knowledge maturing is to capture the phenomenon of outdated knowledge, also referred to as outdatedness. Therefore, we focus our research activity on using indicators to explore the outdatedness of knowledge. Uncertainty about whether knowledge in collaboration spaces is up-to-date, constitutes a major barrier, es

pecially when those spaces become larger in terms of number of documents, number of individuals, and usage time. Although an alternate approach to deal with this problem is regular review of documents, the marking of content for periodic review, as well as performing the actual review tasks in larger networks requires considerable effort. Therefore, automated outdatedness indicators are necessary and appear promising by creating awareness when sharing and using documents in such collaboration spaces. From the design point of view we are interested to employ visual cues as indicators.

Following this idea, the main contribution of this paper is to propose a model and workflow to automatically determine the outdatedness level of collective knowledge in collaboration spaces. There are two distinct concepts associated to the phenomenon of outdatedness of knowledge. First, knowledge can become irrelevant in a specific context, for example when the focus and the directions are changing for a professional community. Second, knowledge can become invalid with new evidence, for example the documents no longer reflect the current status of a conversation. In this paper, we cover mainly invalidity.

In our experiments, we focus on an active collaboration space of a research project with over 600 articles, called the EmployID wiki. At the current state of our experiments, the project was active for 2.5 years and there are still 1.5 years to go. More specifically, the contributions of this paper are to:

- investigate the value of using a supervised machine learning tech

nique, known as Support Vector Machines (SVM), to determine the outdatedness level of wiki articles through classification.

- evaluate the performance of the model using a number of features extracted from attributes of wiki articles including content, edit his

tory, and contributor authoritativeness.

- introduce novel features, namely category-dependent features, with respect to specific characteristics of groups of articles in a same cat

egory by applying text analysis techniques.

- evaluate the classification performance by employing a feature selec

tion method.

The remainder of this paper is structured as follows. Section 2 discusses related work. Section 3 introduces the EmployID wiki. Section 4 formally defines the problem of wiki article outdatedness classification. Section 5 describes our

(17)

proposed model and workflow and explains how we approach the wiki article outdatedness classification problem. Section 6 presents our experimental setup.

Section 7 demonstrates our experimental results of the wiki article outdatedness classification. Section 8 concludes this paper and sketches some future work.

2 Related Work

This section briefly surveys the research areas related to our work, which are knowledge maturing, outdatedness determination, feature extraction, and clas

sification using machine learning techniques. As a first study related to the un

derlying Knowledge Maturing Model [15], Braun & Schmidt [3] perform analysis of Wikipedia articles with respect to different article features to identify mea

sures for maturity. In this work, they use the excellent article badge as a gold standard.

Regarding the outdatedness determination, to the best of our knowledge, there is no work dedicated for automatically determining the outdatedness of wiki articles. However, a number of approaches address automated assessment of the quality of User Generated Content (UGC). Quality assessment is a multi

dimensional concept, which comprises several criteria such as trustworthiness, accuracy, reliability, and relevancy assessment. In the following, we review a number of UGC quality assessment models that have been developed for social media platforms, including web forums and Wikipedia. These research works are related to our study as they employ machine learning techniques to train classi

fication functions, which map sets of object features to sets of class attributes.

The literature provides numerous approaches to assess the quality of Wiki

pedia articles. All of these approaches rely on feature extraction. What differ

entiates the approaches from each other is the variety of feature sets and their complexity. Hu et al. [10] evaluate the effectiveness of combining contributor authoritativeness features with the character count feature to measure the qual

ity of articles in Wikipedia, which results in performance improvements of their model. Both these features are also employed in our model.

Anderka et al. [2] provide a comprehensive overview of several features, or

ganized along the four dimensions content, structure, network, and edit history to predict quality flaws in UGC. Additionally, they develop a flaw-specific view that combines feature selection, expert rules, and multilevel filtering to explore essential features for the prediction of a certain quality flaw. Our research is re

lated to this work in that we employ feature selection techniques to find effective subsets of features with respect to specific categories of related articles for which multiple classifiers are trained in parallel.

There is also another study, which applies domain-specific features for quality evaluation of Wikipedia medical articles [7]. This work creates a model, which combines some baseline structural and linguistic features with specific features extracted from the medical domain by employing Natural Language Processing (NLP) techniques. This study reports that applying domain-specific features,

(18)

such as domain informativeness and category, can significantly improve the au

tomatic classification results.

In a case study on web forums, Weimer et al. [20] is one of the first efforts.

They present a binary SVM classifier to predict the quality of forum posts by using content and lexical features. They observed that the classification perfor

mance is dependent on specific topics discussed. Their model generates better performance on a dataset containing discussions about software when compared to another dataset containing content from miscellaneous topics. This is due to the general assumption, that by grouping related content within categories be

fore classification, better classification performance is achieved. This idea is also adopted in our work. Wanas et al. [19] extend this work by proposing a model to classify the quality of forum posts into three quality classes (i.e., low, medium, and high) on a larger dataset. This work is different from previous works in that they evaluate edit history features, which are also represented in our study. The effectiveness of these features for forum posts quality classification is demon

strated in their work.

Chai et al. [4] propose a model that assesses post usage behavior of the fo

rum community to measure the quality of forum posts. Regarding our research, usage-based features, such as view count or dwell time, don't seem promising.

That's because an article could be opened several times randomly while its view count increases, and at the same time it does not reflect the popularity of the page. This work is further extended in [5] by adding content, reputation (con

tributor authoritativeness), and temporal features, which are employed by dif

ferent classifiers. Furthermore, they conduct a number of feature preprocessing tasks such as normalization, discretization and feature selection. Their work per

forms a number of classification experiments. Their proposed model is validated on three forum datasets and existing models known from [19, 20] are outper

formed in their experiments. Utilizing different classifiers and applying feature discretization techniques appear to be promising for future investigation.

3 The Employ ID Wiki

The EmployID wiki is a wiki-based collaboration space based on Media Wiki with Semantic MediaWiki extensions. lt is used by the EmployID research project¹ team for organizing knowledge and all project information related to workplace learning and technologies supporting public employment services. The EmployID wiki consists of roughly 50 project members and 600 content pages from about 2.5 years ( out of a total of four years project duration) of project work. The wiki allows individuals to create and share knowledge, collaborate on projects, and manage projects more effectively. Articles of the EmployID wiki cover various aspects of the project, including project news, brainstorming and ideas, project reports, guidelines, to-do tracking, meeting minutes, and scrum team overview pages. In addition to the content, metadata has been added to articles, such

1 www.employid.eu

(19)

as categories. Individuals categorize articles and files by adding one or more category tags as wiki markup. That makes it easy to browse related articles with same characteristics.

The data of the wiki pages, including the entire editing history, is stored in a database. The wiki database layout contains dozens of tables including pages and their content, individuals and their preferences, metadata, etc.

The EmployID wiki is a real-life and vibrant data source. lt is unknown how many pages are up-to-date, valid, or relevant. For example, there are articles, which should be updated and there are other articles, which are not valid any

more, describing what was important in the past. That makes the EmployID wiki a valuable resource for the purpose of our research to learn how informa

tion is getting outdated over time. Therefore, providing indicators for project members enables them to stay informed about outdated information or to be aware of articles, which should be reviewed from time to time.

4 Wiki Article Outdatedness Classification Definition

The phenomenon of outdatedness of knowledge in a wiki-based collaboration space is defined as follows:

Definition 1. An outdated wiki article is an article whose content is no langer considered to be true, good, useful, or has become invalid¹.

That is, definition 1 is based on facts and evidence as opposed to the perceived outdatedness of wiki articles, which leads to definition 2.

Definition 2. An outdated wiki article is an article whose content is marked as outdated by the ratings of the majority of individuals.

Invalidity and outdatedness of wiki articles need to be introduced as concepts, so that wiki members become aware of them. Therefore, the main contribution of this paper is to provide a model and workflow that automatically determines the outdatedness of wiki articles based on features extracted from various aspects of wiki articles. The generated feature vector is then passed into a supervised machine learning algorithm that learns from article outdatedness ratings derived from the feedback of individuals.

The problem of determining the outdatedness of wiki articles is formally defined as a multi-class classification problem. We focus on the definition of the wiki article outdatedness classification, the wiki article structure, and the representation of the wiki article.

Let A

=

^{a^1,^a^{2, ... ,}alAI} be a set of wiki articles to be classified. Based on ratings of articles, we evaluate three article outdatedness classes, which can be defined as C

= {

^c¹

=

^low,^c²

=

^medium,^c³

=

^high}.Each article is represented as a set of features we use for classifying the outdatedness of wiki articles as defined for ai in ( 1):

(1)

1 http:// results .learning-layers.eu/ scenarios /know ledge-maturing/

(20)

Then, we define F(ai, ck) be a decision matrix which represents whether ai belongs to Ck where k

=

{l, 2, 3} as defined in (2).

F(ai,ck): A x C-+ {True,False}

(2)

The task of the wiki article outdatedness classification is to approximate this unknown function for all articles by means of a rnachine learning classifier over of training sarnples of the wiki article dataset.

5 Model Definition

The general workflow of the wiki article outdatedness deterrnination is based on the standard workflow for classification problerns, and in our case it consists of five rnajor phases, narnely data collection, data preprocessing, feature extraction, feature preprocessing, and classification using rnachine learning techniques, as depicted in Figure l.

Data Collection Data Preprocesslng Classification using

Machine Leamin

Feature Extraction Feature

Fig. 1: The wiki article outdatedness determination model workflow

During the first phase large arnounts of raw data are extracted frorn the ErnploylD wiki dataset such as MediaWiki cornrnon tables.

The data preprocessing phase refers to preparing and cleaning data before running the feature extraction. This is an irnportant task, which can assist in the improving classification perforrnance. Removing unnecessary and irrelevant inforrnation is an exarnple of the data preprocessing.

The feature extraction phase is the rnost irnportant phase in our rnodel as the classification is based on features extracted during this phase. Features provide valuable information about different aspects or attributes of the wiki article, which are abstracted into a feature vector for the purpose of classification. The next section elaborates on the cornpact set of the proposed features for the outdatedness deterrnination rnodel.

The feature preprocessing phase refers to a process of representing the feature vector in a way that the rnachine learning classifier can process it.

In the last phase of our rnodel, the generated feature vector is fed into a supervised rnachine learning algorithm in order to automatically classify wiki articles into their corresponding outdatedness classes. We adopt one of the best known supervised machine learning techniques, known as SVM, to evaluate the effectiveness of the feature vector we constructed [4, 9, 14].

(21)

5.1 Features

As previously discussed in the outdatedness determination model, a number of features are derived from various aspects of wiki articles by applying different feature extraction techniques. These features, as illustrated in Figure 2, fall into one of the following three categories: (i) article--based features, (ii) contributor authoritativeness features, and (iii) category-dependent features

Content & Edit History ..---, Evaluation

Text Categorization &

Text Analysis

Article-Based Features

Category

Dependent Features Fig. 2: Feature extraction workflow and categories

5.1.1 Article-Based Features

Article-based features, as depicted in Figure 2, are evaluated from a subset of two dimensions content and edit history.

Content features focus on the plain text of the article and are extracted from the textual content. In addition, content features (fi - f5) also contain structural items of an article, as presented in Table l. Dalip et al. [8] discover the capa

bility of content features for predicting the quality of Wikipedia articles. Their work shows that applying content features, which are actually simple (low-level) indicators, can effectively improve the classifier performance when compared to other feature sets containing more complex features.

Edit history features represent the evolution of articles with respect to the timing of revisions, frequency of revisions, and community of editors. The fea

tures describing the timing of revisions are referred to as temporal features [5].

Edit history features are valuable features, which address different aspects of the Knowledge Maturi,ng Model [15] such as stability, maturity, and collaboration.

The edit history features (h - fi4) are described as follows:

h: ( Age) [2, 5, 16] This feature refers to the number of days since the articles creation time. At first glance, this feature can provide an indicator of outdatedness of the content of the article. On the other hand, it is observed that old articles are still valid as well.

(22)

IDI Feature Name

1

fi Character count

h

Internal link count

h

External link count

f4 In-link count

f5 Image count

f5 Category count

Tahle 1: Content features

Feature Description

1

^Ref.

Total numher of characters and ratio to the aver- [2, 7]

age character count of all articles within wiki

Numher of outgoing internal links (to other wiki [2, 16]

articles)

Numher of outgoing external links [2, 16]

N umher of incoming internal links (from other [2]

wiki articles)

N umher of images [2, 5]

N umher of wiki categories assigned to an article [2, 16]

f8: (Continuance of Edits) [14, 17] This feature calculates the time passed between the creation of an article and the last revision, which shows the continuance of edit contributions.

f9: (Recent Changes Tracking) The objective of this feature is to get an assessment of whether an article has been changed recently or not. This feature describes a numeric representation of performed revisions for the last four quarters.

f10 : (Duration of Revisions to the First Revision) [14] This feature refers to the time passed between the creation of an article and the creation of each revision. We consider the average duration as total duration divided by the number of revisions.

!11 : (Edit Count) [2, 16] This feature describes the total number of revi

sions.

fi2 : (Edit Rates) [2] This feature calculates the ratio between continuance of edits f 8 and edit count f 11.

fi3: (Author Count) [2, 14, 17] This feature simply counts the number of distinct authors who contributed to the article.

fi4 : (Collaborative Feature) The collaborative feature is a novel feature introduced in our work to investigate whether an article has been generated by relatively equal collaboration of authors or most gener

ated by few of authors. We measure the percentage contribution of each author, then the standard deviation of these numbers is com

puted. This feature provides a valuable indicator regarding collabo

rative activities of authors.

5.1.2 Contributor Authoritativeness Features

The category of contributor authoritativeness features, as shown in Figure 2, evaluates the authority and activeness of wiki contributors. A number of studies [1, 11] have shown that articles with significant contributions from authoritative contributors are likely to be of high quality in terms of trustworthiness and reliability. The features (!15 -fi8) are represented in Table 2.

(23)

Table 2: Contributor authoritativeness features

IDI Feature Name Feature Description

1

^Ref.

f15 Initiated pages count Number of articles created by authors of an article [14]

f15 Participated pages count Total number of contributions within wiki by au- [2, 14]

thors

fi7 Last contribution time The last edit time of authors within wiki ^-

!1s Membership age Time passed since registration of authors [5]

5.1.3 Category-Dependent Features

As previously mentioned, articles in the EmployID wiki are correlated to cat

egories. Categories are used to group related articles. In this section, we investi

gate novel feature sets to assess the outdatedness of articles within categories. In a different context, Weimer & Gurevych [20] suggest that predicting the quality of web forums within a specific topic can lead to a significant improvement in their classification performance. For instance, the category Report of the Em

ploy ID wiki, contains articles related to status reports of the project at different points in time. All articles in this category have the same structure such as an outline, review time, and probably an attached document. Considering articles within categories, we introduce the following features:

f19: {Category Indicator) This feature indicates the category an article belongs to. We extract this feature by grouping related articles within categories. In the EmployID wiki, the most appropriate categories are assigned manually to wiki articles by authors. However, there are articles that are not assigned to a category ( uncategorized articles).

Therefore, we apply a text categorization task to group uncategorized articles referring to categorized articles into one predefined category.

ho: {Category Domain-Specific Feature) This feature is defined as a nu

meric representation of domain-dependent entities in an article, con

sidering the particular characteristics of the category it belongs to.

This feature is calculated using text analysis techniques as part of the workflow shown in Figure 2.

A detailed description of all category domain-specific features would go beyond the scope of this paper. Preferably, we explain it for the Report category by way of example. Let f R

= {

e1, e2, e3} be a feature vector with three entities assigned to a report article in the Report category, where

• e₁defines whether the report was submitted due to an upcoming report or not.

• e2 defines whether the outline was created for the report or not.

• e₃defines whether the report notes are embedded in the wiki article as a document or not.

(24)

6 Experiment

In this section, we present the experimental setup and performed experiments to evaluate the performance of the outdatedness determination model.

6.1 Setup

The data of the EmployID Wiki is based on a real-life wiki in a project envi

ronment. A total number of 892 wiki articles are obtained, which were created within a time period of 2.5 years. First, a number of data preprocessing tasks are conducted before the dataset can be used effectively for classification. This includes removal of test pages, removal of all redirects referring to one particular page by multiple names or spelling, and the removal of wiki page records whose namespace is not zero. Wiki pages are grouped into collections called names

paces, which differentiate the purpose of pages at a higher level. Namespaces numbered zero contain real content, i.e., pages where main wiki articles reside.

The final dataset after data preprocessing consists of 600 content articles.

In our experiment setup, two EmployID project members (as moderators) were asked to manually rate the outdatedness of articles, using scores between 1- 5. This range of scoring allows moderators to have more flexibility in judgments, as opposed to defining an absolute range of low, medium, and high classes. In order to measure whether the ratings are in agreement, Pearson correlation [13]

is used, which is 0.625. This result shows that there is a positive association ( agreement) between ratings of articles. The scores from the two moderators are averaged for the final score on each article. There are 9 possible average article rating scores, which are then distributed equally into three outdatedness classes so that low

=

{1, 1.5, 2}, medium= {2.5, 3, 3.5}, and high= { 4, 4.5, 5}.

This manually rated dataset is used to train the classifier by learning im

portant correlations between a set of features representing articles and their corresponding outdatedness classes. Table 3 shows statistics of the final dataset along with the number of rated articles in their respective outdatedness dass.

Table 3: T he EmploylD wiki dataset

Total number of articles 600

Number of authors 50

Total number of users 79

Number of articles with low outdatedness 96 (16%) Number of articles with medium outdatedness 126 (21%) Number of articles with high outdatedness 378 (63%)

The SVM classifier is trained using LibSVM [6] to classify wiki articles into the three outdatedness classes. The SVM kernel is set to radial basis function

(25)

(RBF) with cost C and gamma 'Y parameters. A grid search is applied on C and 'Y parameters using cross-validation. The cross-validation is an iterative process, which can prevent the overfitting problem. The n-fold cross-validation evaluates the prediction accuracy of the model by partitioning the training set into n subsets of equal size. Sequentially, each subset is tested using the classifier trained on the remaining n-1 subsets. Thus, each instance of the whole training set is predicted once.

We improve the performance of the SVM classifier by performing data nor

malization. That is, all feature values are scaled to a range of [ü, 1].

6.2 Feature Selection

Sung & Mukkamala [18] show that the feature selection improves classification performance by searching for an optimal feature subset, which best classifies the training samples. In order to assess the outdatedness of articles within its categories, we need to identify the most important and relevant features, which efficiently construct the model describing articles in their related category. To this end, we employ a feature selection approach known as Sequential Forward Feature Selection (SFFS) [12]. This method selects the best single feature and then adds one feature at a time, which in combination with the selected features, minimizes the classification error.

6.3 Wiki Article Outdatedness Classification

We conduct three experiments to study the effect of using different feature sets on the classification performance of our outdatedness determination model. For all following experiments, the dataset is split into two disjoint subsets where 75%

of the data is used for training, and 25% of the data is used as the test dataset.

In the first experiment, we employ a single SVM classifier, which is trained on all articles with different topics in the training dataset. For this experiment, we apply the feature set containing all features from article-based and contributor authoritativeness features (fi - fi₈) as described in subsections 5.1.1 and 5.1.2.

In the second experiment, we investigate the effect of assessing the outdat

edness of articles within categories by applying an alternate feature selection method, based on the category. F irstly, a text categorization task is applied to associate uncategorised articles into a fitting category. Secondly, for each cat

egory, the feature selection is used to select a subset of relevant features from article-based and contributor authoritativeness features (fi - fi₈). Lastly, multi

SVM classifiers are employed in that the training of each classifier is clone in parallel on articles of the same category using the selected feature set. The over

all classification accuracy is computed as the average of the individual LibSVM classifier accuracy. All these steps are performed in an automated fashion. For example, once the category of an input article is determined, the related feature set is computed. Then, a classifier is invoked to evaluate the outdatedness dass of that article.

(26)

The last experiment is set up as the previous experiment and in addition category-dependent features are added for articles in the same category. That is, for each category the selected feature subset from article-based and contributor authoritativeness features (11 - fi8) is combined with the category-dependent features (119 - ho). This results in category-dependent features and leads to increased performance of our model as shown in the following section.

7 Results and Discussion

In this section, we show and discuss the results of the three performed exper

iments for the classification of the EmployID wiki articles into the three out

datedness classes low, medium, and high. The performance for each experiment is presented in Table 4, which also shows the classifier performance in correctly determining the low, medium and high outdatedness classes.

Table 4: Summary of the classification performance of experiments

Features Overall High" Medium" Low"

Accuracy

1st experiment Ji

-

^fis ^74% 92.70% 43.33% 37.5%

2nd experiment (fi - fis)' 85.33% 91.92% 70.37% 75%

3rd experiment (Ji

-

!isY and f19 - ho 92% 95.60% 87.88% 84.61%

1: Subset of features fi - fis is selected.

2: Accuracy for articles with high/medium/low outdatedness.

In the first experiment, we evaluate the accuracy of the proposed model with 18 features (fi - fis), which are collected from article content, article edit his

tory, and contributor authoritativeness. After training the LibSVM classifier on the normalized training set of 450 articles with the kernel parameters C = 32 and, = 0.5 and 5-fold cross validation with the rate of 75.15%, the overall clas

sification accuracy of 7 4 % is obtained. That is, the classifier correctly classifies 111 articles from 150 articles of the test set. The results show that the classifier performs well in classifying articles with high outdatedness classes (92.70%, 89 out of 96), but relatively poor for articles with medium ( 43. 33%, 13 out of 30) and low outdatedness classes (37.5%, 9 out of 24). After evaluating the misclas

sified articles, it is discovered that some articles, which were rated as low and medium outdatedness classes, are misclassified as highly outdated articles. In the following, we discuss reasons for this phenomenon. For example, the model may need to consider additional feature preprocessing tasks such as discretization, or to adopt another classifier such as decision trees, which can assist in improving classification performance. However, we believe that the misclassification prob

lem, which occurs on the first experiment, is due to two main reasons. Firstly, as shown in Table 3, many articles fall into the class of highly outdated articles,

(27)

which can cause the classifier to overfit this class, leading to misclassification.

Secondly, the proposed feature set (!1 -f¹⁸⁾can not effectively distinguish be

tween outdatedness classes, because the feature set does not cover the category of articles.

In order to improve the performance of our model we conduct the second experiment, in which a feature selection method is applied on articles associated to the same category. Table 5 shows the most significant categories recognized in our experiment along with the subset of the selected features for each, which are identified by the sequential forward feature selection.

Table 5: Predominant categories in the EmploylD wiki Category namelArticles in category1Selected feature subset Meeting 272 fi, fs, h, fs, Jg, ^!11

Report 60 fi, fs, 14, Jg, fio, ft2, f13, f14

Person 39 fi,h,f4,fi5 - fis

Partner 10 h, f4, f13, fi5 - fis

Tool 36 !1, fs, 14, h, Jg, ^!10

Task 10 h,fs,fg,fi2

Work packages 9 fi,h,fg,f13,ft4

Other 164 !1, h, h, h, Jg, !10, ft2, f13, f14

In this experiment the overall accuracy of 85.33% is achieved. The results of the classifiers show that the average accuracy of an article being correctly classified as high outdatedness class is 91.92% (91 out of 99), as medium out

datedness class is 70.37% (19 out of 27) and as low outdatedness class is 75%

(18 out of 24). This result indicates that assessing the outdatedness of articles within categories leads to significantly better classification performance, when compared to the first experiment. lt is also observed that performing feature selection by removing irrelevant features of particular categories avoids the over

fitting problem, which occurred in the first experiment. Moreover, these results are promising due to the reduction of the model complexity in terms of com

pute runtime for training and memory consumption because of smaller subsets of features.

In the following, we provide some concrete examples of correctly classified articles, in order to get a better understanding of how effective the selected fea

ture subsets are for determining the outdatedness level of articles. These articles are selected from the Meeting category, which is the most popular category con

taining 272 articles. Table 6 shows six articles along with values of the selected feature subsets.

The articles represented in Table 6 are some instances of a monthly online meeting or annual review meeting of project members. These pages contain in

formation about the meeting agenda and minutes, location information, links to materials, and lists of the participants. A discussion about the impact of

(28)

Table 6: Values of proposed features for the outdatedness classification of some articles from the Meeting category

fi lhl h I

^fs

1

^fg ^!11

1

Predicted Class

1

1st Example 7175 8 287 73 [-,1,1,0] 10.04.16 Highly outdated 2nd Example 1784 2 311 59 [-,1,0,0] 03.03.16 Highly outdated 3rd Example 280 1 41 0 [-,-,-,1] 30.09.16 Medium outdated 4th Example 279 2 296 51 [-,1,0,0] 10.03.16 Medium outdated 5th Example 2210 0 105 7 [-,-,1,1] 04.08.16 Up-to-date 6th Example 341 2 41 34 [-,-,-,1] 03.11.16 Up-to-date

individual features is beyond the scope of this paper. Therefore, we only discuss some observations of manually analyzed features. After comparing the evaluated features of the first and the second examples with the features of the forth exam

ple, it is observed that these are old articles (h), which have not been changed at least in the last quarter (Jg). However, the first two examples are classified as highly outdated, and the forth one as medium outdated. We believe that there is a positive correlation between highly outdated articles and the character count feature (fi). That means, an old article with long text, which has different sec

tions, is more likely to be highly outdated because of the outdatedness of some of those sections. Another interesting observation is made here, when comparing the third example with the sixth one. Both articles are almost new

(h,

Jg), ^but their continuance of edits (1₈) differs significantly. While the third example has not been changed since the date of creation (1₈

=

0), the sixth example had some edits indicated by f ⁸

=

34. When comparing these two values, it is found that the sixth example can be considered more up-to-date, while the third example appears to be abandoned, and therefore is more likely to be outdated.

In the following, the results of the third experiment are discussed. In this ex

periment, we investigate the impact of category-dependent features (119 -h_o) on the performance of the model, which provides the performance comparison between the results of this experiment and the previous experiment. The overall accuracy of 92% is obtained, which shows the best performance overall. As de

picted in Table 4, this experiment performs best in classifying low and medium outdatedness classes when compared to the previous experiments. The results indicate that this sort of features can significantly contribute to the wiki article outdatedness determination.

8 Conclusions and Future Directions

In this paper, we propose a model and workflow to automatically determine the outdatedness level of knowledge in a wiki-based collaboration space. We make use of a supervised machine learning algorithm known as SVM to classify wiki articles into three outdatedness classes low, medium and high. Three different experiments are conducted to evaluate the impact of different feature sets on the

(29)

classification performance. After the initial evaluation of article-based, and con

tributor authoritativeness features, which resulted in a modest performance, it was discovered that the groups of articles in a same category have similar charac

teristics. This observation brings us to the assessment of outdatedness of articles within categories. To this end, the sequential forward feature selection method is applied on articles associated to the same category and multi-svm classifiers are trained in parallel. We introduce novel category-dependent features, which extract specific characteristics from groups of articles in categories. The overall classification accuracy of 92% is obtained, which shows the best performance overall.

Our results indicate that the proposed model and workflow can determine the outdatedness level of knowledge in wiki articles with significant accuracy. This provides an automated perspective on providing support for collaborative spaces.

Based on that, we can think of presenting visual cues, advanced search ranking or similar support measures for users of collaborative spaces. An important insight is that this needs some limited domain knowledge on the category of articles, but then yields very good classification performance.

Our results suggest that certain features are good indicators for outdated

ness, but collaboration practices that involve multiple different collaboration tools make it diflicult to capture the phenomenon of outdatedness. The more functionality a single collaboration space can offer, the better the results will be.

That means our proposed model and workflow is practically applicable to any collaboration space data that allows access to the document edit history, and user activities. Moreover, the documents of such collaboration spaces need to be rated ones manually for the training purpose.

Regarding future work, there are several axes can be suggested for further investiation. We plan to validate the results through user feedback. Moreover, we intend to apply the proposed model to other collaborative spaces. One inter

esting research direction would be to apply other supervised machine learning techniques, such as decision trees, neural networks, and Bayesian networks to classify the outdatedness of wiki articles and compare their performance. W ith regard to knowledge maturing, we intend to apply our model and workflow to the classification of relevancy in addition to the determination of outdatedness.

Acknowledgements

This work has been supported by the European Unions Seventh Framework Program for research, technological development and demonstration under grant agreement no. 619619.

(30)

References

1. Adler, B.T., Alfaro, L., Pye, 1., Raman, V.: Measuring author contributions to the Wikipedia. In Proc. the 4th Int. WikiSym (2008)

2. Anderka, M., Stein, B., Lipka, N.: Predicting quality flaws in user-generated con

tent: the case of Wikipedia. In Proc. SIGIR, 981-990 (2012)

3. Braun, S., Schmidt, A.: Wikis as a technology fostering knowledge maturing: what we can learn from wikipedia. In Proc. 7th Int. Conf. IKNOW 07, 321-329 (2007) 4. Chai, K., Hayati, P., Potdar, V., Wu, C., Talevski, A.: Assessing post usage for

measuring the quality of forum posts. In Proc. the 4th DEST, 233-238 (2010) 5. Chai, K., Wu, C., Potdar, V., H_ayati, P.: Automatically measuring the quality of

user generated content in forums. In Pore. 24th Int. Conf. Adv. Artif. Intell., 51-60 (2011)

6. Chang, C., Lin, C.: LibSVM: a library for support vector machines. Software avail

able at http://www.csie.ntu.edu.tw/ cjlin/libsvm (2001)

7. Cozza, V., Petrocchi, M., Spognardi, A.: A matter of words: NLP for quality eval

uation of Wikipedia medical articles. arXiv:1603.01987 [es.IR] (2016)

8. Dalip, D., Goncalves, M., Cristo, M., Calado, P.: Automatie quality assessment of content created collaboratively by Web communities: a case study of Wikipedia.

In Proc. JCDL09, 295-304 (2009)

9. Dang, Q-V., lgnat, C-L.: Measuring quality of collaboratively edited documents:

the case of Wikipedia. In Proc. the 2nd IEEE CIC (2016)

10. Hu, M., Lim, E., Sun, A., Lauw, H., Vuong, B.: Measuring article quality in Wiki

pedia: models and evaluation. In Proc. CIKM07, 243-252 (2007)

11. Karre, Gerald C.: A Multimethod study of information quality in wiki collaboration.

ACM Trans. Manage. Inf. Syst., vol. 2, no. 1, 4:1-4:16 (2011)

12. Kohavi, R., John, G.: Wrappers for feature subset selection. Artificial intelligence, 97, 273-324 (1997)

13. Lane, D. M.: Introduction to Statistics: An Interactive eBook. Chap.4, Values of the Pearson Correlation (2013)

14. Lee, J-T., Yang, M-C., Rim, H-C.: Discovering high-quality threaded discussions in online forums. J. Comput. Sei. Technol., 29:519-531 (2014)

15. Maier, R., Schmidt, A.: Explaining organizational knowledge creation with a knowl

edge maturing model. Knowledge Management Research & Practice, no. 1, 1-20 (2014)

16. Marzini, E., Spognardi, A.: lmproved automatic maturity assessment of Wikipedia medical articles. In ODBASE Springer, 612-622 (2014)

17. Rizos, G., Papadopoulos, S., Kompatsiaris, Y.: Predicting News Popularity by Mining Online Discussions. SNOW /WWW, 737-742 (2016)

18. Sung, A.H., Mukkamala, S.: ldentifying important features for intrusion detection using support vector machines and neural networks. In Proc. SAINT '03, 209-216 (2003)

19. Wanas, N., El-Saban, M., Ashour, H., Ammar, W.: Automatie scoring of online discussion posts. In Proc. the 2nd WICOW, 19-26 (2008)

20. Weimer, M., Gurevych, 1.: Predicting the perceived quality of Web forum posts.

In Proc. the Con. RANLP (2007)

(31)

Structuring Digital Options towards reducing the Struggle of Finding the Needle in the Digital Haystack

Michael Leyer¹, Susanne Durst²

1 University of Rostock, Institute for Business Administration, Ulmenstr. 69, 18057 Rostock

2 University of Skövde, School of Business, Högskolevägen, 541 28 Skövde

Abstract. Employees are confronted with more and more different kinds of digital support in their workplace. However, not every support of such kind is perceived positively, but there are also downsides related to the increase of information and knowledge requirements that come along with it. Hence, employees are often overstrained, but differences and interdependencies between different kinds of digital options have to be considered. In this position paper, we propose a framework to categorize digital options in the workplace and call for further research regarding the analysis of downsides of these options from a knowledge perspective.

Keywords: digital options, workplace, information overload, digitalization, downsides

1 Introduction

With the increase of digital options that are used in the workplace, knowledge is not only required regarding the execution of tasks but also regarding the systems used [1].

Digital options in this paper refer to any kind of system that can support the execution of business processes in workplaces of employees. The main advantage of digital options in the workplace is to provide more information, in a faster and ubiquitous way [2]. As a consequence, more information is provided to employees in the same time period. While this provision of information is potentially reducing the uncertainty of decisions in daily work execution, employees have a limited ability to take cognitive load [3]. Thus, information overload can occur, employees will be stressed and have to spend more time in finding the needed information needle in the haystack [4]. At the same time, more knowledge is required to use the different kinds of digital systems.

Prior research has considered specific types of digital options separately thereby focus- ing on either knowledge gaps [e.g. 5] or information overload [e.g. 6]. We argue that