Workload models and generation for wide-area Content-Based Publish/Subscribe
Publish/Subscribe is a data-dissemination model that enables the selective delivery of publications to subscribers on the basis of their registered interests.
Typically, publish/subscribe applications are deployed over a set of brokers or mediation routers. Publish/subscribe service models can be classified according to their expressiveness. The content-based publish/subscribe model lets subscribers express each subscription as a predicate over the content of publications, which enables complex subscriptions.
The scalability of content-based publish/subscribe solutions is usually limited by the communication costs incurred. These costs involve: (1) the traffic required to compute the set of subscriptions matching a publication, (2) the cost of disseminating publications to subscribers, and (3) the control traffic necessary to enable the dissemination method. The costs and benefits of a solution generally vary with the underlying workload assumptions. Communication costs are usually sensitive to the volume and popularity of publications, the similarity among subscriptions, the matching distribution, and the locality of interests. Because no datasets for content-based publish/subscribe systems are publicly available, there is no consensus about which assumptions are the most realistic for assessing solutions.
We believe that researchers need support to evaluate their solutions not only under realistic workload models, but also under fine-grained workload assumptions.
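As an illustration of content-based matching, the following minimal sketch represents each subscription as a predicate over a publication's attributes. The attribute names (symbol, price, volume), the subscriber names, and the helper matching_subscribers are hypothetical and chosen only for the example; they are not taken from any particular system.

```python
from typing import Any, Callable, Dict, List

# A publication is modeled as a set of attribute/value pairs (an assumption).
Publication = Dict[str, Any]
# A subscription is a predicate that accepts or rejects a publication.
Predicate = Callable[[Publication], bool]


def matching_subscribers(publication: Publication,
                         subscriptions: Dict[str, Predicate]) -> List[str]:
    """Return the sorted names of subscribers whose predicate accepts the publication."""
    return sorted(name for name, pred in subscriptions.items() if pred(publication))


# Hypothetical subscriptions: each one is a conjunction of attribute constraints.
subscriptions: Dict[str, Predicate] = {
    "alice": lambda p: p.get("symbol") == "ACME" and p.get("price", 0) > 100,
    "bob": lambda p: p.get("volume", 0) >= 1000,
}

publication = {"symbol": "ACME", "price": 120, "volume": 500}
matched = matching_subscribers(publication, subscriptions)
print(matched)
```

A topic-based system would correspond to the special case where every predicate tests a single topic attribute for equality.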
The objectives of this internship are to:
1. Develop a framework that captures and models the most impactful workload properties.
2. Infer workload properties from real-world topic-based systems such as Google Groups or Twitter.
3. Develop a workload generator that would enable experimenters to validate their scenarios using realistic workload models or to tune them to meet a set of assumptions.
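To make the third objective more concrete, here is a minimal sketch of what a tunable workload generator could look like. It is an assumption-laden toy, not the design expected from the internship: publications are single integer values drawn from a Zipf-like popularity distribution, subscriptions are integer ranges, and the parameter names (skew, width, seed) are hypothetical.

```python
import random
from typing import List, Tuple


def zipf_weights(n: int, skew: float) -> List[float]:
    """Normalized Zipf-like weights: the value of rank r gets weight 1 / r**skew."""
    raw = [1.0 / (rank ** skew) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def generate_workload(num_pubs: int, num_subs: int, num_values: int = 50,
                      skew: float = 1.0, width: int = 5,
                      seed: int = 0) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Draw publications and range subscriptions over one integer attribute.

    skew tunes the popularity of publication values (0 means uniform),
    width tunes how broad each subscription is.
    """
    rng = random.Random(seed)  # seeded so that experiments are reproducible
    values = list(range(num_values))
    weights = zipf_weights(num_values, skew)
    publications = rng.choices(values, weights=weights, k=num_pubs)
    lower_bounds = rng.choices(values, weights=weights, k=num_subs)
    subscriptions = [(lo, lo + width) for lo in lower_bounds]
    return publications, subscriptions


def matching_distribution(publications: List[int],
                          subscriptions: List[Tuple[int, int]]) -> List[int]:
    """For each publication, count the subscriptions whose range contains it."""
    return [sum(1 for lo, hi in subscriptions if lo <= p <= hi)
            for p in publications]


pubs, subs = generate_workload(num_pubs=100, num_subs=20, skew=1.2, seed=42)
counts = matching_distribution(pubs, subs)
print(sum(counts) / len(counts))  # average number of matches per publication
```

Varying skew and width while observing the resulting matching distribution is one simple way an experimenter could tune a synthetic workload toward a target set of assumptions.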
Candidates should demonstrate good modeling and programming skills.
Experience in interface design would be particularly useful. The internship will be held at LIP6 and will be funded for a period of 5 to 6 months.
Contacts
• Mohamed Diallo, mohamed.diallo@lip6.fr
• Serge Fdida, serge.fdida@lip6.fr
References
[1] S. Fdida and M. Diallo. The network is a database. In Proceedings of the 4th Asian Conference on Internet Engineering, pages 1–6. ACM, 2008.
[2] A. Yu, P. Agarwal, and J. Yang. Generating Wide-Area Content-Based Publish/Subscribe Workloads.