• Aucun résultat trouvé

Efficient Application of Complex Graph Analytics on Very Large Real World RDF Datasets

N/A
N/A
Protected

Academic year: 2022

Partager "Efficient Application of Complex Graph Analytics on Very Large Real World RDF Datasets"

Copied!
2
0
0

Texte intégral

(1)

Efficient Application of Complex Graph Analytics on Very Large Real World RDF Datasets

Zhe Wu, Jay Banerjee

{alan.wu, jayanta.banerjee}@oracle.com Oracle USA

RDF [1] Graph modeling is a foundational technology in the whole semantic web (SW) technology stack. Since its debut in 2004, RDF graph has enjoyed many applications in the enterprise domain. Examples of these applications include, but certainly not limited to, integration and federated query of heterogeneous data sources, flexible and extensible representation of enterprise knowledge base, ad- hoc query and navigation on top of schema-less graph model of enterprise data, social network representation and link analysis, and metadata processing in the context of master data management (MDM). In the past decade, many mature open source and commercial RDF platforms and solutions [6] have been developed to store and index RDF graph data (triples and quads), edit and manage OWL [2]

ontologies, perform logical inference, execute pattern matching and graph navigation (SPARQL [3]), visualize RDF graph data and OWL ontologies, and link data in RDF format and also other data types including relational (RDB2RDF [4,5]). As a graph modeling language, RDF provides great flexibility for enterprise applications and it adds precision, through the use of URI and formal semantics, to enterprise data. SPARQL query and OWL inference have been two key functions for semantic web applications. A somewhat less obvious application of RDF is that such a graph model is also a great candidate for graph analytics.

Large-scale graph analytics [7-10] (page ranking, community detection, etc) for enterprise applications have many challenges, including efficient and scalable graph data management, high-performance implementation of graph algorithms, user- and operation-friendly management interfaces, and tight integration with high quality tools. To effectively address such challenges, we propose a comprehensive graph analytics architecture built upon the Oracle platform. Key components of

(2)

this platform include Oracle Database 12c, SQL-based graph analytics, parallel in- memory graph processing engine, and the RDF Semantic Graph capabilities. We show why complex graph-based analytics matter for large enterprise-scale RDF datasets, and we share our experiences in implementing several graph analytical functions, such as page ranking, clustering, path analysis, and so on. We apply and evaluate our implementation on several large-scale real world RDF graph datasets, including graphs from social networks, social media domain and the linked data domain. We make best practice recommendations based on our experiences.

References

[1] http://www.w3.org/TR/rdf11-concepts/

[2] http://www.w3.org/TR/owl2-syntax/

[3] http://www.w3.org/TR/rdf-sparql-query/

[4] http://www.w3.org/TR/rdb-direct-mapping/

[5] http://www.w3.org/TR/r2rml/

[6] http://www.w3.org/wiki/LargeTripleStores [7] http://graphlab.org/projects/index.html

[8] http://www.oracle.com/technetwork/oracle-labs/parallel-graph- analytics/overview/index.html

[9] http://spark.apache.org/graphx/

[10] http://giraph.apache.org/

Références

Documents relatifs

In order to illustrate our language of abstract queries, and show its expressive power, we show below the table query T for the complex use case (C) of Section 2. ) T 0 is the

The proliferation of digital data sources in many domains brings a new urgency to the need for tools which allow to flexibly query heterogeneous data (relational, JSON,

Data Inte- gration and VISualization (DIVIS) : from large heterogeneous datasets to interpretable visualisations in plant science... Data Integration and VISualization

This interface allows DBA visualizing the state of his/her database that concerns three aspects: (i) tables of the target data warehouse (their descriptions, denition and domain

In this paper, based on SPARQL CONSTRUCT query, we present a new query form called SPARQL Analyze Query, which inte- grates graph analytics with graph querying.. SPARQL Analyze

For instance, although using a SPARQL endpoint to evaluate a complete query is the best in terms of the number of requests and the amount of transferred data, having multiple

The Mediator receives the set of keywords specified by the user and executes the federated translation algorithm, which: 1 uses the Storage Component to find the data and metadata

In this article, we first introduce N EO EMF, a multi-database model persistence framework able to store very large models in an efficient way according to specific