• Aucun résultat trouvé

Business intelligence Using the Knowledge Graph Built over the Russian Legal Entities Registry

N/A
N/A
Protected

Academic year: 2022

Partager "Business intelligence Using the Knowledge Graph Built over the Russian Legal Entities Registry"

Copied!
2
0
0

Texte intégral

(1)

Business Analytics Using The Knowledge Graph Built Over The Russian Legal Entities Registry

Eugene Hlyzov1, Sergey Isaev1, Yury Emelyanov2, Dmitry Pavlov2, Olga Belyaeva2, Dmitry Mouromtsev3, Olga Parkhimovich3, and Maxim Kolchin3

1 DataFabric Ltd., St.Petersburg, Russia {eugene.hlyzov, isaev}@datafabric.cc

2 Vismart Ltd., St.Petersburg, Russia

{yury.emelyanov, dmitry.pavlov,olga.belyaeva}@vismart.biz

3 ITMO University, St.Petersburg, Russia

mouromtsev@mail.ifmo.ru,olya.parkhimovich@gmail.com,kolchinmax@niuitmo.ru

Basic facts

URL of the live web application:http://tree.datafabric.cc1(launched on 17.07.2017).

Product type: subscription-based commercial web application.

Application domains: business intelligence, data analytics.

Semantic technologies (ST) employed:RDF, OWL, knowledge graphs, SPARQL.

Volume: 2.8 bil. triples, 10 mil. companies, 12 mil. individual entrepreneurs, 27 mil. persons, 333 Gb of raw unstructured data.

1 Motivation And The Product

Throughout the last 3-4 years Russian authorities have made great volumes of data collected and maintained by government services open and accessible to the people. The knowledge concealed in these data is of a great business value and open a new prominent opportunities for services that offer convenient and efficient ways of finding, exploring and discovering the hidden relations and structures that the data contain. In our case, the published open data required a tremendous effort in merging various bits, checking and distilling it before it could be used in an application. We demonstrate how the linked data and vi- sual analytics are able to satisfy the information needs for official registry data exploration and save the user’s time in the process of business investigation. Fi- nally, from the market perspective, despite the existence of well-established tools for business intelligence over the open government data, we observe an unsatis- fied demand for the affordable, easy-to-use and lightweight services targeted to individuals and small and medium businesses.

Our web service allows users a) to search the registry for the entity of inter- est b) create a graph and explore visually its connections to other companies, persons, etc. c) produce a user-generated report about entity’s background. The service requires subscription fee to be paid up front. To access the full function- ality of the service for reviewing purposes please use the following credentials:

login: iswc2017review@example.com, password: demo.

1 English version (for demo purposes only): http://tree-i8n.apps.datafabric.cc

(2)

2 ST In The Product

The added value of ST

Expected business value from utilizing ST:a) simpler and cheaper integration of additional sources of data; b) cheaper custom development; c) richer user expe- rience - discovery-provoking visual exploration of company context.

The role of ST in the architecture of the application: The application is com- prised of 4 major parts: data extract-transform-load (ETL) pipeline hosted on google cloud, RDF-triple store deployed on open source version of Blazegraph2, server-side logic and request handling service running on Kotlin and Node.js and frontend user interface logic and rendering implemented using ReactJS and open source Ontodia library3. On all stages of data transformation from storing the data to visualizing it we employ ST: data is stored in RDF graph with OWL ontology applied to it in ETL process, data is queried with SPARQL, data is visualized in the form of a graph by conversion of RDF into nodes and edges.

Advantages of using ST in our application

– We apply ST standards to explicate the knowledge concealed in the raw data by transforming the semi-structured XML files into cumulative RDF graph with well-defined data schema expressed in a form of OWL ontology . – On the basis of ST we align the internal graph-based data structures with

their graph-based diagrammatic visualization in Ontodia library[1] thus re- ducing the amount of data transformation steps and simplifying our code base.

– With the help of ST we extend the graph with additional data sources, which carry their own schemas, by investing minimum effort into it.

– By employing the underlying ontology we are empowered by an effective and automated data publishing mechanism assuring convenient data consump- tion, which contributes greatly to visibility of our product.4

Challenges that arise from locking on ST

– To introduce the regular software developers to the tools of semantic web by selecting the most mature technologies and utilities.

– To solve the mismatch between the ontologies that come from knowledge engineers and requirements of our use case regarding querying the data.

– To adjust the queries and system architecture to achieve the expected com- mercial product stability and performance on complex data calls.

References

1. Mouromtsev, D., Pavlov, D., Emelyanov, Y., Morozov, A., Razdyakonov, D., Galkin, M.: The simple web-based tool for visualization and sharing of semantic data and ontologies. In: Int-l Semantic Web Conf. (Posters & Demos) (2015)

2 https://github.com/blazegraph/database

3 https://github.com/ontodia-org/ontodia

4 https://github.com/DataFabricRus/ontology-fts

Références

Documents relatifs

One of the main characteristics of the NIST evaluation cam- paigns has been to propose different kinds of environment to evaluate the speaker diarization systems: telephone

We contend that despite the potential of using open data to promote co-innovation between government agencies and external stakeholders, there is a lack of

Abstract. This paper is dedicated to using of LodView RDF browser for naviga- tion on the Linguistic Linked Open Data cloud in Russian and languages of Rus- sia. We reveal several

In this paper, we present CRIKE, a data-science approach to automatically detect concrete applications of legal abstract terms in case-law decisions. To this purpose, CRIKE relies

In the WebIsALOD knowledge graph, we provide provenance information on statement level, i.e., for each single triple with a skos:broader relation, meta- data has to e attached to

Therefore, Section 3.1 defines the role, description and requirements of the end- users and stakeholders for their participation in the KAP phase; Section 3.2 describes the

This project’s aim is to provide country-level monitoring and analysis of the labour market basing on the publicly available data on job offers, CVs and

For a plurality of papers by a scholar sorted by the number of quotes, g-index is the largest number that g most cited articles received a total of at least g 2 citations (see