• Aucun résultat trouvé

NIM: Scalable Distributed Stream Processing System on Mobile Network Data

N/A
N/A
Protected

Academic year: 2022

Partager "NIM: Scalable Distributed Stream Processing System on Mobile Network Data"

Copied!
2
0
0

Texte intégral

(1)

NIM: Scalable Distributed Stream Processing System on Mobile Network Data

Wei Fan

IBM T.J. Watson Research, Hawthorne, NY, USA

weifan@us.ibm.com

Abstract

As a typical example of New Moore’s law, the amount of 3G mobile broad- band (MBB) data has grown from 15 to 20 times in the past two years (30TB to 40TB per day on average for a major city in China), real-time processing and mining of these data are becoming increasingly necessary. The overhead of storage and file transfer to HDFS, delay in processing, etc are making offline analysis on these datasets obsolete. Analysis of these datasets are non-trivial, examples include mobile personal recommendation, anomaly traffic detection, and network fault diagnosis. In this talk, we describe NIM - Network Intelli- gence Miner. NIM is a scalable and elastic streaming solution that analyzes MBB statistics and traffic patterns in real-time and provides information for real-time decision making. The accuracy of statistical analysis and pattern recognition of NIM is identical to that of off line analysis, while NIM can pro- cess data at line rate. The design and the unique features (e.g., balanced data grouping, aging strategy) of NIM will be helpful not only for the network data analysis but also for other applications.

Short Bio

Dr. Wei Fan is the associate director of Huawei Noah’s Ark Lab. Prior to join- ing Huawei, he received his PhD in Computer Science from Columbia University in 2001 and had been working in IBM T.J. Watson Research since 2000. His main research interests and experiences are in various areas of data mining and database systems, such as, stream computing, high performance computing, extremely skewed distribution, cost-sensitive learning, risk analysis, ensemble methods, easy-to-use nonparametric methods, graph mining, predictive feature discovery, feature selection, sample selection bias, transfer learning, time series

(2)

analysis, bioinformatics, social network analysis, novel applications and com- mercial data mining systems. His co-authored paper received ICDM’2006 Best Application Paper Award, he lead the team that used Random Decision Tree to win 2008 ICDM Data Mining Cup Championship. He received 2010 IBM Outstanding Technical Achievement Award for his contribution to IBM Infos- phere Streams. He is the associate editor of ACM Transaction on Knowledge Discovery and Data Mining (TKDD).

Références

Documents relatifs

Multigent system of safety of cyber-physical system includes: Agent_Image_An – agent of Video Stream Flow Analyzer, Agent_Dy_Objects – agent of Dynamic Objects

We configure KerA similarly to Kafka: the number of active groups is 1 so the number of streamlets gives a number of active groups equal to the number of partitions in Kafka (one

More modern solutions intend to exploit the edges of the Internet (i.e. edge computing) for performing certain data processing tasks and hence: reduce the end-to-end latency and

The region hierarchy allows us to determine the data paths and operator de- ployment priorities, based on which two strategies are applied: Response Time Rate (RTR) that iterates

In the last decade, various distributed stream processing en- gines (DSPEs) were developed in order to process data streams in a flexible, scalable, fast and resilient manner..

Then, we fix the makespan objective as con- straint and a random local search algorithm that pro- motes voltage scaling is used to reduce the energy consumption of the

We considered as input the utility scores of the Twitter FSD application when no compression algorithm is applied (see the Default approach in Figure 9(c)) and we examined how

We discuss eight challenges: making models simpler, protecting privacy and confidentiality, dealing with legacy systems, stream preprocessing, timing and availability of