• Aucun résultat trouvé

Evaluation de performances d'Apache Kafka en support aux applications Big Data de traitement de flux

N/A
N/A
Protected

Academic year: 2021

Partager "Evaluation de performances d'Apache Kafka en support aux applications Big Data de traitement de flux"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01647229

https://hal.archives-ouvertes.fr/hal-01647229

Submitted on 24 Nov 2017

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

A Performance Evaluation of Apache Kafka in Support

of Big Data Streaming Applications

Paul Le Noac’H, Alexandru Costan, Luc Bougé

To cite this version:

Paul Le Noac’H, Alexandru Costan, Luc Bougé. A Performance Evaluation of Apache Kafka in Support of Big Data Streaming Applications. IEEE Big Data 2017, Dec 2017, Boston, United States. 2017. �hal-01647229�

(2)

Da

te

Bibliographie / sources

• 1.

https://data-artisans.com/blog/extending-the-yahoo-streaming-benchmark

• 2. http://www.tutorialspoint.com/apache_kafka/

3. Kafka Architecture

1. Context

5. Results

A Performance Evaluation of Apache Kafka in

Support of Big Data Streaming Applications

Producer performances when modifying batch size for several

number of nodes and a message size of 50B

7. Take-aways

The variation of the batch size shows that there is

a range of batches with a better performance.

When varying the number of nodes in some scenarios:

a sudden performance drop (probably due to the

internal Kafka synchronizations as well as the

underlying network).

Future work : evaluating reference processing

frameworks (Apache Spark and Flink)

Parameters

:

Message size

Batch size

Acquirement strategy

Network and disk I/O

threads

Message replication

Hardware

2. Contribution

Isolate the performance of each Kafka component

Separated tests for Producers and Consumers

Make correlations between configuration parameters,

resource usage and performance metrics

Experiments executed on Grid5000

Up to 32 nodes (16 cores per nodes, 28 GB RAM, 10

Gigabit Ethernet)

6. Key metrics

Stream computing: a new paradigm

enabling real-time Big Data processing

through 3 steps

Ingestion: Apache Kafka

Processing: Apache Spark / Flink

Storage: HDFS, Cassandra

Ingestion can be a bottleneck for stream

processing

Identify the impact of different parameter settings on Kafka’s overall

performance

Experiment evaluation of several configurations and performance

metrics of Kafka

Allow users to avoid bottlenecks and achieve good practice for stream

processing

Paul LE NOAC’H

1

, Alexandru COSTAN

2

, Luc BOUGE

3

1

INSA Rennes,

2

INRIA / INSA Rennes,

3

ENS Rennes

Contacts

paul.lenoach@irisa.fr

alexandru.costan@irisa.fr

luc.bouge@ens-rennes.fr

4. Methodology

Performance

Metrics:

Throughput

(MB/s, items/s)

Latency

CPU usage

Disk usage

Memory usage

Network usage

Références

Documents relatifs

I Sequential text classification offers good performance and naturally uses very little information. I Sequential image classification also performs

Apache Samza [9] is an open source distributed processing framework created by Linkedin to solve various kinds of stream processing requirements such as tracking data, service

Real time ana- lytics is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when prob-

Deep Learning on Big Data Sets in the Cloud with Apache Spark and Google TensorFlow.. December

expérimentation in vivo et in vitro Intervention sur le monde extérieur?. "  Science

with the collection of data to uncover hidden trends and patterns, (2) big data storage, which offers database management systems to store data at reduced cost, (3) big data

this is exactly the time the american this is exactly the time the american people need to hear from the person who in approx 40 days will be responsible for dealing with this

Using four rounds of Delphi study, followed by a verification survey, this research aims to identify the core characteristics of Big Data as determined by practitioners