• Aucun résultat trouvé

A Semantically-Based Big Data Processing System Using Hadoop and Map-Reduce

N/A
N/A
Protected

Academic year: 2021

Partager "A Semantically-Based Big Data Processing System Using Hadoop and Map-Reduce"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-01646637

https://hal.inria.fr/hal-01646637

Submitted on 23 Nov 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License

A Semantically-Based Big Data Processing System Using Hadoop and Map-Reduce

Wang Wanting, Qin Zheng

To cite this version:

Wang Wanting, Qin Zheng. A Semantically-Based Big Data Processing System Using Hadoop and

Map-Reduce. 17th International Conference on Informatics and Semiotics in Organisations (ICISO),

Aug 2016, Campinas, Brazil. pp.246-247. �hal-01646637�

(2)

A Semantically-Based Big Data Processing System Using Hadoop and Map-Reduce

Wang Wanting1and Qin Zheng1,2

1 School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China

2014311051@live.sufe.edu.cn, qinzheng@mail.shufe.edu.cn

2 South University of Science and Technology of China, Shenzhen, China

Abstract. Infinancial industry, a wide range offinancial systems generate vast amount of data in different structures, which change with compliance rules change and hard to manage due to their heterogeneity. This paper introduces a semantically-based big data processing system to integrate the data from dif- ferent sources, which realizes the query and computation in semantic layer. The system provides a new data management way for thefinancial industry. With Semantic Web, the information can be managed, integrated, and collaborated in a more fluent way than it in traditional ETL. In order to clear the complex logical relationship among data, the system uses SPARQL to query. Through Map-Reduce, this system, based on Hadoop and Hbase can improve the pro- cessing speed for big data.

Keywords: Big data

Semantics

Data integration

Distributed computation

1 Introduction

There are some characteristics infinancial big data, which bring lots of problems of data management infinancial field, including cross-regional and cross-system distri- bution [1], multiple structured and non-standardized data formats, and rapid change in the analysis strategy of big data [2]. This paper presents a new semantically-based big data processing system, which connects data through linked data to integrate the heterogeneous data.1

2 Model Design

The big data infinancial field is so huge-scaled and multi-structured that traditional ETL cannot integrate data efficiently. On the one hand, by using semantic analysis the data can be connected and the system can realize data sharing with RDF based on semantic data queries. On the other hand, distributed computation can provide efficient

1This research is supported by National Natural Science Fund of China (71302080) and Ministry of Education Research of Social Science Youth Foundation Project (13YJC630149).

©IFIP International Federation for Information Processing 2016

Published by Springer International Publishing Switzerland 2016. All Rights Reserved M.C.C. Baranauskas et al. (Eds.): ICISO 2016, IFIP AICT 477, pp. 246–247, 2016.

DOI: 10.1007/978-3-319-42102-5

(3)

big data processing [3]. An experiment of distributed semantics queries is tested in order to validate the practicability of semantic mapping of relational data. The semantically-based big data processing system is a vertical structure (Fig. 1), which is based on Hbase storage. The data collected from the data sources are uploaded to central system and permanently stored in Hbase after corresponding conversions. All the data can be connected through Linked Data, and then generate RDF data sets. With Hadoop platform, after semantic analysis and distributed computation, the results can be delivered to every application terminal in cloud.

3 Conclusion

The experiment illustrates that the system based on semantics can solve the problem of the integration of heterogeneous data to some extent. With semantics and distributed computation, the efficient process and integration of complex big data are able to be realized.

References

1. Madnick, S.E.: From VLDB to VMLDB (Very MANY Large Data Bases): dealing with large-scale semantic heterogeneity. In: Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland, pp. 11–15 (1995)

2. Rabl, T., Gómez-Villamor, S., Sadoghi, M., Muntés-Mulero, V., Jacobsen, H.A., Mankovskii, S.: Solving big data challenges for enterprise application performance manage- ment. Proc. Vldb Endowment5(12), 1724–1735 (2012)

3. Bizer, C., Heath, T., Idehen, K., Berners-Lee, T.: Linked Data on the Web (LDOW2008). In:

Proceedings of The 17th International Conference on World Wide Web, Beijing, China, pp. 21–25 (2008)

Fig. 1. The semantically-based big data processing system

A Semantically-Based Big Data Processing System 247

Références

Documents relatifs

Because of that metagraph processing framework can read data stored in the form of textual predicate representation and import it into flat graph data structure handled by

The collection and processing system of the medical data based on cloud computing, IoT in medicine and anticipation system for the dete- rioration of the patient's state on the

The diagram and data source sharing features lay the foundation for collab- orative work enabling multiple users to construct the views based on common data together and in

In this work, we present how we can apply the RDF Data Cube specification to se- mantically enrich longitudinal clinical study data to allow users to query the

All five tasks combined created a semantic stylesheet, where each task should cover an essential part of the whole stylesheet creation process – property selection (T1), data

Despite the success of the above-mentioned systems, very few individuals and research groups publish their bibliographic data on their websites in a structured format,

Moreover, online spreadsheets are increasingly popular and have the potential to boost the growth of the Semantic Web by providing well-formed and publicly shared data

When analysing large RDF datasets, users are left with two main options: using SPARQL or using an existing non-RDF-specific big data language, both with its own limitations.. The