Large-Scale Distributed Systems

Top PDF Large-Scale Distributed Systems:

Analysis of the Propagation Time of a Rumour in Large-scale Distributed Systems

dissemination of information in large-scale distributed networks through pairwise interactions. This problem, originally called rumor mongering and then rumor spreading, has mainly been investigated in the synchronous model. This model relies on the assumption that all the nodes of the network act in synchrony, that is, at each round of the protocol, each node is allowed to contact a random neighbor. In this paper, we drop this assumption on the grounds that it is not realistic in large-scale systems. We thus consider the asynchronous variant, where at each time unit a single node interacts with a randomly chosen neighbor. We perform a thorough study of T_n, the total number of interactions
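The asynchronous model described in this excerpt can be illustrated with a small simulation. This is only a sketch under assumptions that the excerpt does not state: the network is a complete graph, the initiating node is chosen uniformly at random, and an interaction informs both nodes if either one already knows the rumor. The function name `async_rumor_spreading` is hypothetical.

```python
import random


def async_rumor_spreading(n, seed=None):
    """Count pairwise interactions until all n nodes know the rumor.

    Assumptions (not from the paper): complete graph, uniform choice of
    the initiating node, and both nodes end up informed if either was.
    """
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True          # node 0 starts the rumor
    informed_count = 1
    interactions = 0
    while informed_count < n:
        # One time unit: a single node i contacts a random other node j.
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1              # skip i so that j != i
        interactions += 1
        if informed[i] != informed[j]:
            informed[i] = informed[j] = True
            informed_count += 1
    return interactions
```

Averaging the returned value over many seeds gives an empirical estimate of the expected number of interactions under these assumptions.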

Multi-scale analysis of large distributed computing systems

5. CONCLUSION Large-scale distributed systems usually rely on monitoring tools that gather high-level statistics about resource utilization. The main reason for using such techniques is scalability: as a system gets larger, the amount of trace data becomes an important issue. In this paper, we have investigated what kind of analysis would be possible if detailed traces were available about the resource utilization of large-scale platforms. The analysis we present relies on data aggregation in space and time scales, coupled with elaborate visualization techniques. This combination of techniques, from tracing to analysis, provides interesting insights about the behavior of the whole distributed system. The results we present are based on two scenarios related to scheduling of bag-of-tasks in the BOINC volunteer computing platform. The analysis of the first scenario enabled us to detect a problem in the fairness of the scheduling algorithm of the simulated BOINC clients. This problem appeared mainly on clients with low availability. The treemap visualization of Triva, combined with temporal integration over the whole simulated time, allowed the problem to be spotted immediately.

Speed for the elite, consistency for the masses: differentiating eventual consistency in large-scale distributed systems

B. Problem Statement The key feature of update consistency lies in the ability to define precisely the nature of the convergence state reached once all updates have been issued. However, the nature of temporary states also has an important impact in practical systems. This raises two important challenges. First, existing systems address the consistency of temporary states by implementing uniform constraints that all the nodes in a system must follow [18]. But different actors in a distributed application may have different requirements regarding the consistency of these temporary states. Second, even measuring the level of inconsistency of these states remains an open question. Existing systems-oriented metrics do not take into account the ordering of update operations (append in our case) [3], [19], [20], [21], while theoretical ones require global knowledge of the system [22], which makes them impractical at large scale. In the following sections, we address both of these challenges. First, we propose a novel broadcast mechanism that, together with Algorithm 1, satisfies update consistency while supporting differentiated levels of consistency for read operations that occur before the convergence state. Specifically, we exploit the evident trade-off between speed of delivery and consistency, and we target heterogeneous populations consisting of an elite of Primary nodes that should receive fast, albeit possibly inconsistent, information, and a mass of Secondary nodes that should only receive stable consistent information, albeit more slowly. Second, we propose a novel metric to measure the level of inconsistency of an append-only queue, and use it to evaluate our protocol.

Coding for resource optimization in large-scale distributed systems

Yet, in spite of their appealing properties, distributed systems tend to be hard to predict and unreliable. Indeed, to reconcile the need for performance with the scale of distributed systems, distributed algorithms trade global guarantees for efficiency, thus introducing unpredictability. For example, in large-scale distributed systems, it may not be reasonable for every single device to maintain complete knowledge of every other device. Hence, most proposed algorithms are local and run using only partial knowledge of the whole system. Local algorithms generally also rely on randomized decisions, thus leading to non-deterministic algorithms. Moreover, distributed systems are unreliable due to temporary or permanent failures. More specifically, in large-scale distributed systems built from commodity hardware, failures may be frequent. Likewise, in peer-to-peer systems deployed on end users' computers, peers may disconnect on a daily basis. These failures obviously reduce the guarantees (availability, durability, performance, ...) that can be offered. Yet, these distributed systems are now used for running services requiring a high quality of service (storage, telephony, software updates, ...). Clearly, the unreliability and unpredictability of the underlying devices of a distributed system are not trivially compatible with the guarantees required for running such services on top of it.

Towards a holistic construction of opportunistic large-scale distributed systems

behavior of individual node or class of nodes, rather than on the high-level functions delivered by the distributed system as a whole. Challenge #2: Deploy and maintain a live system with a very large number of components. Deploying a large-scale system is a complex task: how do you properly configure each individual component? How do you bootstrap? Moreover, even after a successful initial deployment, large-scale systems are highly likely to experience some crashes or other run-time issues, simply due to the large number of components involved: one of them is bound to have a problem. As a consequence, managing a large-scale distributed system and keeping it operational can be daunting. Challenge #3: Make the system able to react to changing circumstances and to evolve over time. In realistic situations, no single configuration is appropriate to all circumstances, but tweaking the configuration of a large, live system is extremely complex: the number of separate actions needed can be very large, and unexpected conflicts or unforeseen dependencies may arise. Also, due to the heterogeneous context mentioned above, systems deployed for any significant length of time will see their environment evolve around them, with new infrastructure and new features. A good system should be able to evolve in parallel, to adapt to and leverage its new environment, but building in this kind of forward-looking adaptability to evolutions that are still unknown can be extremely tricky.

A Survey on Techniques for Improving the Energy Efficiency of Large-Scale Distributed Systems

4. CONCLUSION This survey discussed techniques for improving the energy efficiency of computing and networking resources in large-scale distributed systems. As discussed, during the past decade, solutions have been proposed for improving the energy efficiency of computing and networking resources. For computing resources, the solutions work at different levels, from individual nodes to entire infrastructures where they take advantage of recent advanced functionalities such as virtualization. In parallel, for wired networks, shutdown techniques have been extensively studied and evaluated to limit the number of resources that can remain idle and consume energy unnecessarily. There are also techniques for adapting the performance of both computing and network resources (and their energy usage) to the needs of applications and services. These approaches are often combined and applied in a coordinated way in large-scale distributed systems.

Census: Location-Aware Membership Management for Large-Scale Distributed Systems

Many large-scale distributed systems employ ad-hoc solutions to track dynamic membership. A common approach is to use a centralized server to maintain the list of active nodes, as in Google’s Chubby lock service [5]. Such an approach requires all clients to communicate directly with a replicated server, which may be undesirable from a scalability perspective. An alternative, decentralized approach seen in Amazon’s Dynamo system [12] is to track system membership using a gossip protocol. This approach provides only eventual consistency, which is inadequate for many applications, and can be slow to converge. These systems also typically do not tolerate Byzantine faults, as evidenced by a highly publicized outage of Amazon’s S3 service [1].

Re-ranking Approach to Classification in Large-scale Power-law Distributed Category Systems

1.2 Related work and our contributions The work by [4] is among the pioneering studies in classification of power-law distributed web-scale directories such as the Yahoo! directory, consisting of over 100,000 target classes. For similar category systems, classification techniques based on refined experts and deep classification have been proposed in [1] and [6] respectively. More recently, recursive regularization based SVM (HR-SVM) has been studied in [3], wherein the optimization problem for learning the discriminant functions exploits the given taxonomy of categories. This approach represents the current state of the art, as it performs better than most techniques on large-scale datasets released as part of the Large Scale Hierarchical Text Classification Challenge in the last few years. However, the drawback of this method is that its improvement in the Micro-F1 (same as accuracy for mono-label problems) and Macro-F1 measures is not substantial over a flat SVM classifier, for which ready-to-use packages such as Liblinear are available. As shown in Table 3 of [3], the improvement over the SVM baseline is less than 1% (in absolute terms) on most datasets. As a result, a natural question to ask is:

Distributed MPC of wide-area electromechanical oscillations of large-scale power systems

However, since wide-area power system oscillations tend to appear in very large-scale systems, ranging over thousands of kilometers and involving many different subsystems managed by different transmission system operators (TSOs), it is often not feasible in practice to handle these problems with a fully centralized approach. On the other hand, reliability/vulnerability considerations may suggest that even in a system where a fully centralized control scheme would be feasible, it is not necessarily desirable to do so. Consequently, it is of interest to study distributed MPC schemes addressing various decompositions of the global control problem. In this setting, local MPC systems could determine optimal inputs for a subset of controllers under their authority, based on a model of their subsystem and a local control objective [7], [8].

Distributed control of electromechanical oscillations in very large-scale electric power systems

5.3 Related works Various distributed MPC schemes have been proposed to replace the centralized MPC in large-scale power systems [76, 77, 96–99]. In paper [96], control agents are placed at each generator and load to control power injections to eliminate operating-constraint violations before the protection system acts. They use detailed models of their surrounding areas and simplified models of remote areas to predict system states. The agents cooperate with each other by sharing their objectives and exchanging solutions and measurements. By cloning the boundary nodes of neighboring areas, paper [98] breaks the whole power grid into relatively independent subsystems that only interact through consistency constraints on shared variables; each local MPC controller calculates optimized supplementary inputs for automatic voltage regulators and static var compensators in its area, and coordinates with neighboring MPC controllers by exchanging Lagrange multipliers.

BlobSeer: Towards efficient data storage management for large-scale, distributed systems

The second experiment considers the complementary case to the one presented in the previous section, namely when the whole content of the virtual image needs to be read by each VM instance. This represents the most unfavorable read-intensive scenario, which corresponds to applications that need to read input data stored in the image. In this case, the time to run the VM corresponds to the time to boot and fully read the whole virtual image disk content (by performing a “cat /dev/hda1”, where hda1 is the virtual disk corresponding to the image). Again, the evaluation is performed for three chunk sizes: 1 MB, 512 KB and 256 KB. The average time to boot and fully read the initial disk content is represented in Figure 10.7. As expected, in the case of pre-propagation, this time remains constant as no read contention exists. In the case of our approach, almost perfect scalability is also noticeable up to 50 storage nodes, despite read concurrency. This is because there are enough storage providers among which the I/O workload can be distributed. Once the number of instances exceeds the number of storage nodes, the I/O pressure increases on each storage node, which makes the average read performance degrade in a linear fashion. On a general note, the read performance is obviously worse in our approach, as the data is not available locally.

Mignon: A Fast Decentralized Content Consumption Estimation in Large-Scale Distributed Systems

5 Conclusion In this paper, we have proposed Mignon, a new protocol to rapidly estimate the aggregate affinity of a newly uploaded video in a community of users in a fully decentralized manner. Our proposal avoids an explicit and costly aggregation by relying on the properties of similarity-based self-organizing overlay networks, and can be used to decide where to place videos in a decentralized UGC system. By eschewing the need for a central support infrastructure, our approach hints at the possibility of fast reactive aggregate analytics in decentralized systems. This may be useful both to promote alternatives to the cloud-centered model of current UGC video services and to improve hybrid P2P/cloud architectures [23,38] by offloading complex adaptive tasks to the P2P part of a hybrid system.

Visualization and Detection of Resource Usage Anomalies in Large Scale Distributed Systems

Centre de recherche INRIA Paris – Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex Centre de recherche INRIA Rennes – Bretagne Atlantique : IRISA, Camp[r]

On Line Trace Synchronization for Large Scale Distributed Systems

Tracing and monitoring tools, and other similar analysis tools, add new requirements to the old problem of coping with asynchronous clocks in distributed systems. Existing approaches based on the convex hull can achieve excellent accuracy for a posteriori analysis, but impose a significant cost and latency when used in live mode and over large clusters. We propose a novel method, LIANA (Live Incremental Asynchronous Network Analysis), for incrementally computing the clock offset, and updating it as the network evolves, along each communication link, as well as selecting the best synchronization paths and time reference node. Each connection in a network requires message exchanges to compute the clock skew and offset between two connected nodes. This method relies on the trace events recorded for the existing TCP/IP traffic between nodes. After computing the offset and its accuracy for every connection in the network graph, a minimum spanning tree is computed. The edges with the best accuracy are selected and form the spanning tree. Then, a central node is selected as the time reference to optimally compute the offset from any node to this reference node. LIANA is efficient, both in terms of synchronization accuracy and time complexity. The method, which is used for online distributed trace synchronization, has been evaluated in realistic scenarios with a diverse set of network topologies and traffic. We show that LIANA generates precise results highly efficiently, which makes it suitable for large cloud-distributed systems.
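The spanning-tree step described in this abstract can be sketched as follows. This is not LIANA itself: the link tuple layout `(u, v, offset, error)`, the sign convention (`offset` is taken to mean clock of `v` minus clock of `u`), the use of Kruskal's algorithm, and the rule for picking the reference node (minimizing the worst accumulated error along tree paths) are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def spanning_tree(n, links):
    """Kruskal's algorithm over links (u, v, offset, error).

    A smaller error is assumed to mean better accuracy, so the tree keeps
    the most accurate links that connect the n nodes.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, offset, error in sorted(links, key=lambda link: link[3]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, offset, error))
    return tree


def propagate(tree, ref):
    """Return {node: (clock_node - clock_ref, accumulated error)} via BFS."""
    adjacency = defaultdict(list)
    for u, v, offset, error in tree:
        adjacency[u].append((v, offset, error))
        adjacency[v].append((u, -offset, error))  # reversed link flips the sign
    result = {ref: (0.0, 0.0)}
    queue = deque([ref])
    while queue:
        u = queue.popleft()
        for v, offset, error in adjacency[u]:
            if v not in result:
                result[v] = (result[u][0] + offset, result[u][1] + error)
                queue.append(v)
    return result


def pick_reference(nodes, tree):
    """Assumed criterion: the node whose worst accumulated error is smallest."""
    return min(nodes, key=lambda r: max(e for _, e in propagate(tree, r).values()))
```

With the tree and a reference node in hand, `propagate(tree, ref)` gives, for every node, the offset to apply to its timestamps to express them in the reference clock.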

Markov Chains Competing for Transitions: Application to Large-Scale Distributed Systems

Centre de recherche INRIA Rennes – Bretagne Atlantique IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex France Centre de recherche INRIA Bordeaux – Sud Ouest : Domaine Univer[r]

A Distributed and Parallel Asynchronous Unite and Conquer Method to Solve Large Scale Non-Hermitian Linear Systems

the iteration number. These methods are already well implemented in parallel to profit from the great number of computation cores on large clusters. Basic iterative methods do not always converge quickly on complicated linear systems. The convergence rate depends on the properties of the operator matrix. Researchers have therefore introduced preconditioners that combine stationary methods and iterative methods to improve the spectral properties of the operator A and to accelerate convergence. This kind of preconditioner includes the incomplete LU factorization preconditioner (ILU) [6], the Jacobi preconditioner [5], the successive over-relaxation preconditioner (SOR) [1], etc. Meanwhile, there are deflated preconditioners, which use the eigenvalues approximated during the solving procedure to form a new initial vector for the next restart, which speeds up further computation. Erhel [11] studied a deflation technique for the restarted GMRES algorithm, based on an invariant subspace approximation which is updated at each cycle. Lehoucq [15] introduced a deflation procedure to improve the convergence of an Implicitly Restarted Arnoldi Method (IRAM) for computing the eigenvalues of large matrices. Saad [21] presented a deflated version of the conjugate gradient algorithm for solving linear systems. Implementations of these iterative methods have been a good tool for solving linear systems over the past decades.

Comparison of centralized, distributed and hierarchical model predictive control schemes for electromechanical oscillations damping in large-scale power systems

© 2014 Elsevier Ltd. All rights reserved. 1. Introduction Some characteristics of modern large-scale electric power systems, such as long transmission distances over weak grids, highly variable generation patterns and heavy loading, tend to increase the probability of appearance of sustained wide-area electromechanical oscillations. The term "wide-area" is used here to emphasize the possible co-existence of local and inter-area oscillation modes of different frequencies that might appear simultaneously in different parts of large-scale systems. Such oscillations threaten the secure operation of power systems and, if not controlled efficiently, can lead to generator outages, line tripping and large-scale blackouts [1–3]. Current automatic control systems, designed to address low-frequency oscillations, are mostly based on very local control strategies realized through Power System Stabilizers (PSSs) and FACTS devices.

Testing Architectures for Large Scale Systems

Abstract. Typical distributed testing architectures decompose test cases into actions and dispatch them to different nodes. They use a central test controller to synchronize the action execution sequence. This architecture is not fully adapted to large-scale distributed systems, since the central controller does not scale up. This paper presents two approaches to synchronize the execution of test case actions in a distributed manner. The first approach organizes the testers in a B-tree, synchronizing through messages exchanged among parents and children. The second approach uses gossiping messages, synchronizing through messages exchanged among consecutive testers. We compare these two approaches and discuss their advantages and drawbacks.

Performance Evaluation of Large-scale Dynamic Systems

Rennes, France bruno.sericola@inria.fr ABSTRACT In this paper we present an in-depth study of the dynamicity and robustness properties of large-scale distributed systems, and in particular of peer-to-peer systems. When designing such systems, two major issues need to be faced. First, the population of these systems evolves continuously (nodes can join and leave the system as often as they wish without any central authority in charge of their control), and second, since these systems are open, one needs to defend against the presence of malicious nodes that try to subvert the system. Given robust operations and adversarial strategies, we propose an analytical model of the local behavior of clusters, based on Markov chains. This local model provides an evaluation of the impact of malicious behaviors on the correctness of the system. Moreover, this local model is used to evaluate analytically the performance of the global system, making it possible to characterize the global behavior of the system with respect to its dynamics and to the presence of malicious nodes, and then to validate our approach.

Interactive Analysis of Large Distributed Systems with Topology-based Visualization

6 Conclusion With the advent of very large-scale distributed systems built on complex interconnection networks, phenomena such as locality and resource congestion have become more and more critical to studying and understanding the performance of parallel applications. Yet, no visualization tool makes it possible either to handle such workloads in a scalable way or to provide deep insight into such issues. We think such a tool should also allow for an interactive and exploratory analysis, adapting to the variety of situations that can be investigated. In this article, we explain how we have built a graph-based visualization meeting the previous requirements. We have implemented the technique in an open-source visualization tool called VIVA, which enables the study of the correlation between quantities at a spatial and temporal level. The multiscale capability of this visualization allows the analyst to select the adequate level of detail and should be put in relation to what has been done for treemaps [31]. Our new visualization has the same aggregation features but also displays topological information. Such multiscale capability is also essential to achieve a visualization that is scalable both in terms of fluidity and meaning. Yet, it is not sufficient in itself. The ability to dynamically aggregate and disaggregate groups of resources requires adjusting the layout of the graph, which we have done using a dynamic force-directed graph layout mechanism. Such an algorithm allows the analyst to easily reorganize as well as readjust the layout in a very efficient way.