Dmitry I. Ignatov, Sergei O. Kuznetsov, Jonas Poelmans (Eds.)
CDUD 2012 – Concept Discovery in Unstructured Data
Workshop co-located with the 10th International Conference on Formal Concept Analysis (ICFCA 2012)
May 2012, Leuven, Belgium
i
Volume Editors
Dmitry I. Ignatov
School of Applied Mathematics and Information Science
National Research University Higher School of Economics, Moscow, Russia
Sergei O. Kuznetsov
School of Applied Mathematics and Information Science
National Research University Higher School of Economics, Moscow, Russia
Jonas Poelmans
Faculty of Business and Economics Katholieke Universiteit Leuven, Belgium
Printed in Belgium by the Katholieke Universiteit Leuven with ISBN 978-9-08- 140991-9.
The proceedings are also published online on the CEUR-Workshop website in volume Vol-871 of a series with ISSN 1613-0073.
Copyright c 2012 for the individual papers by papers’ authors, for the Volume by the editors. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means without the prior permission of the copyright owners.
Preface
Concept discovery is a subarea of Knowledge Discovery in Databases (KDD) where concept models, such as Formal Concept Analysis (FCA), multimodal clustering, conceptual graphs and other, are used for gaining insight into the underlying conceptual structure of data. Traditional machine learning techniques are mainly focusing on structured data given by object-attribute tables, whereas most data available nowadays are given in unstructured, often textual, form. As compared to traditional data mining techniques, human-centered instruments of concept discovery actively engage domain experts in the discovery process.
This volume contains the papers presented at the 2nd International Workshop on Concept Discovery in Unstructured Data (CDUD 2012) held on May 10, 2012 at the Katholieke Universiteit Leuven, Belgium. This workshop welcomes papers describing innovative research on data discovery in complex data. Moreover, this workshop provides a forum for researchers and developers of data mining instruments, working on issues associated with analyzing unstructured data.
This year the committee decided to accept 11 papers for publication in the proceedings. Each submission was reviewed by on average 3 program committee members.
A. Mestrovic presents an application of concept lattices to semantic match- ing in Croatian language. A. Chepovskiy et al. propose a method for auto- matic language identification for transliterated texts. X. Naidenova describes a novel neural network based data structure for inferring classification tests.
A. Kravchenko et al. introduce an approach for expert search which is based on analyzing e-mail communication patterns. D. Ustalov et al. propose an ontology- based approach for text-to-picture synthesis. A. Skabin presents a computerized recognition system for hand-written historical manuscripts. A. Panchenko et al.
extract semantic relations between concepts from Wikipedia using KNN algo- rithms. D. Fedyanin uses parameter identification methods for Markov models and applies them to influence analysis in social networks. S. Milyaev et al. dis- cuss a new method for self-tuning semantic image segmentation. A. Vorobev pro- poses a probabilistic model for evaluating the quality level of projects, authors and experts in collaborative innovation platforms. D. Gnatyshak et al. present a novel pseudo-triclustering algorithm and applied it to online social network data.
A. Bozhenyuk et al. discuss methods for maximum flow and minimum cost flow finding in fuzzy setting.
We would like to express our gratitude to all contributing authors and re- viewers. We also want to thank our sponsors Amsterdam-Amstelland police, IBM Belgium, Research Foundation Flanders, Vlerick Management School, OpenCon- nect Systems and Higher School of Economics (Moscow, Russia). Finally, we should thank the authors of the EasyChair system which helped us to manage the reviewing process.
May 10, 2012 Leuven
Dmitry I. Ignatov Sergei O. Kuznetsov Jonas Poelmans iii
Organization
The 2nd International Workshop on Concept Discovery in Unstructured Data (CDUD 2012) was held on May 10, 2012 at the Katholieke Universiteit Leuven, Belgium. The workshop was co-located with the 10th International Conference on Formal Concept Analysis (ICFCA-2012). The inaugural edition of CDUD was held on June 25, 2011 at the Higher School of Economics in Moscow, Russia.
Program Chairs
Dmitry I. Ignatov National Research University Higher School of Eco- nomics, Russia
Sergei O. Kuznetsov National Research University Higher School of Eco- nomics, Russia
Jonas Poelmans Katholieke Universiteit Leuven, Belgium
Program Committee
Simon Andrews Sheffield Hallam University, United Kingdom Guido Dedene Katholieke Universteit Leuven, Belgium Florent Domenach University of Nicosia, Cyprus
Irina Efimenko National Research University Higher School of Eco- nomics, Russia
Paul Elzinga Amsterdam-Amstelland Police, The Netherlands Boris Galitsky University of Girona, Spain
Bernhard Ganter Technische Universit¨at Dresden, Germany
Yury Katkov National Research University of Information Tech- nologies, Mechanics and Optics, Russia
Natalia Loukachevitch Moscow State University, Russia
Dmitry Mouromtsev National Research University of Information Tech- nologies, Mechanics and Optics, Russia
Xenia Naidenova Military Medical Academy, Russia
Alexey A. Neznanov National Research University Higher School of Eco- nomics, Russia
Sergei A. Obiedkov National Research University Higher School of Eco- nomics, Russia
Simon Polovina Sheffield Hallam University, United Kingdom Uta Priss Edinburgh Napier University, United Kingdom Dominik Slezak University of Warsaw and Infobright, Poland Rustam Tagiew Technische Universit¨at Freiberg, Germany Stijn Viaene Katholieke Universiteit Leuven, Belgium
Johanna Voelker University of Mannheim, Germany Rostislav Yavorsky Witology, Russia
Additional Reviewers
Ekaterina Cherniak, National Research University of Higher School of Economics, Russia Alexandr Vorobev, Moscow State University and Witology, Russia
Sponsoring Institutions
Amsterdam-Amstelland police, The Netherlands IBM, Belgium
OpenConnect Systems, USA
Research Foundation Flanders, Belgium Vlerick Management School, Belgium
National Research University Higher of School Economics, Russia
v
Table of Contents
The Methods of Maximum Flow and Minimum Cost Flow Finding in
Fuzzy Network . . . 1 Alexandr Bozhenyuk, Evgeniya Gerasimenko and Igor Rozenberg
Language Identification for Texts Written in Transliteration . . . 13 Andrey Chepovskiy, Sergey Gusev and Margarita Kurbatova
On Parameter Identification Methods for Markov Models Applied to
Social Networks. . . 21 Denis Fedyanin
Analysing Online Social Network Data with Biclustering and Triclustering 30 Dmitry Gnatyshak, Dmitry Ignatov, Alexander Semenov and Jonas Poelmans
Term Weighting in Expert Search Task: Analyzing Communication
Patterns . . . 40 Anna Kravchenko and Dmitry Romanov
Semantic Matching Using Concept Lattice . . . 49 Ana Mestrovic
Self-Tuning Semantic Image Segmentation . . . 59 Sergey Milyaev and Olga Barinova
A Neural Network-Like Combinatorial Data Structure for Inferring
Classification Tests. . . 67 Xenia Naidenova
Extraction of Semantic Relations between Concepts with KNN
Algorithms on Wikipedia . . . 78 Alexander Panchenko, Sergey Adeykin, Alexey Romanov and Pavel Ro- manov
Computerized Recognition System for Historical Manuscripts. . . 87 Artem Skabin
An Ontology-Based Approach to Text-to-Picture Synthesis Systems . . . 94 Dmitry Ustalov and Aleksander Kudryavtsev
Evaluating the Quality Level of Projects, Authors and Experts . . . 102 Alexandr Vorobev
Author Index. . . 107