dissemination of information in large-scale distributed networks through pairwise interactions. This problem, originally called rumor mongering and later rumor spreading, has mainly been investigated in the synchronous model. This model relies on the assumption that all the nodes of the network act in synchrony, that is, at each round of the protocol, each node is allowed to contact a random neighbor. In this paper, we drop this assumption on the grounds that it is not realistic in large-scale systems. We thus consider the asynchronous variant, where at each time unit, a single node interacts with a randomly chosen neighbor. We perform a thorough study of T_n, the total number of interactions
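The asynchronous model described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the network is taken to be a complete graph, an interaction informs both endpoints if either one knows the rumor, and the counter is read as the number of interactions until every node is informed. None of these choices is confirmed by the text, and the function name `async_rumor_spreading` is hypothetical.

```python
import random


def async_rumor_spreading(n, seed=None):
    """Count pairwise interactions until all n nodes know the rumor.

    Assumptions (not taken from the source): complete graph, node 0 starts
    informed, and an interaction informs both endpoints if either is informed.
    """
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True
    informed_count = 1
    interactions = 0

    while informed_count < n:
        # At each time unit a single node is activated...
        u = rng.randrange(n)
        # ...and contacts a uniformly random neighbor other than itself.
        v = rng.randrange(n - 1)
        if v >= u:
            v += 1
        interactions += 1
        # The rumor spreads only when exactly one endpoint is informed.
        if informed[u] != informed[v]:
            informed[u] = informed[v] = True
            informed_count += 1

    return interactions


if __name__ == "__main__":
    samples = [async_rumor_spreading(100, seed=s) for s in range(20)]
    print("mean interactions for n=100:", sum(samples) / len(samples))
```

Averaging the returned count over many seeds gives an empirical estimate that can be compared with analytical results for the same assumptions.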
Large-scale distributed systems usually rely on monitoring tools that gather high-level statistics about resource utilization. The main reason for using such techniques is scalability: as a system gets larger, the amount of trace data becomes an important issue. In this paper, we have investigated what kind of analysis would be possible if detailed traces were available about the resource utilization of large-scale platforms. The analysis we present relies on data aggregation in space and time scales, coupled with elaborate visualization techniques. This combination of techniques, from tracing to analysis, provides interesting insights about the behavior of the whole distributed system. The results we present are based on two scenarios related to the scheduling of bag-of-tasks applications in the BOINC volunteer computing platform. The analysis of the first scenario enabled us to detect a problem in the fairness of the scheduling algorithm of the simulated BOINC clients. This problem appeared mainly on clients with low availability. The treemap visualization of Triva, combined with temporal integration over the whole simulated time, allowed the problem to be spotted immediately.
B. Problem Statement
The key feature of update consistency lies in the ability to define precisely the nature of the convergence state reached once all updates have been issued. However, the nature of temporary states also has an important impact in practical systems. This raises two important challenges. First, existing systems address the consistency of temporary states by implementing uniform constraints that all the nodes in a system must follow. But different actors in a distributed application may have different requirements regarding the consistency of these temporary states. Second, even measuring the level of inconsistency of these states remains an open question. Existing systems-oriented metrics do not take into account the ordering of update operations (append in our case), while theoretical ones require global knowledge of the system, which makes them impractical at large scale.

In the following sections, we address both of these challenges. First, we propose a novel broadcast mechanism that, together with Algorithm 1, satisfies update consistency while supporting differentiated levels of consistency for read operations that occur before the convergence state. Specifically, we exploit the evident trade-off between speed of delivery and consistency, and we target heterogeneous populations consisting of an elite of Primary nodes that should receive fast, albeit possibly inconsistent, information, and a mass of Secondary nodes that should only receive stable, consistent information, albeit more slowly. Second, we propose a novel metric to measure the level of inconsistency of an append-only queue, and use it to evaluate our protocol.
Yet, in spite of their appealing properties, distributed systems tend to be hard to predict and unreliable. Indeed, to reconcile the need for performance with the scale of distributed systems, distributed algorithms trade global guarantees for efficiency, thus introducing unpredictability in distributed systems. For example, in large-scale distributed systems, it may not be reasonable for every single device to maintain complete knowledge of every other device. Hence, most proposed algorithms are local and run using only partial knowledge of the whole system. Local algorithms generally also rely on randomized decisions, thus leading to non-deterministic algorithms. Moreover, distributed systems are unreliable due to temporary or permanent failures. More specifically, in large-scale distributed systems built from commodity hardware, failures may be frequent. Similarly, in peer-to-peer systems deployed on end-user computers, peers may disconnect on a daily basis. These failures obviously reduce the guarantees (availability, durability, performance, etc.) that can be offered. Yet, these distributed systems are now used for running services requiring a high quality of service (storage, telephony, software updates, etc.). Clearly, the unreliability and unpredictability of the underlying devices of a distributed system are not trivially compatible with the guarantees required for running such services on top of it.
1.3. OUR VISION: OPPORTUNISTIC SYSTEMS WITH A HOLISTIC APPROACH
behavior of an individual node or class of nodes, rather than on the high-level functions delivered by the distributed system as a whole.
Challenge #2: Deploy and maintain a live system with a very large number of components. Deploying a large-scale system is a complex task: how do you properly configure each individual component? How do you bootstrap? Moreover, even after a successful initial deployment, large-scale systems are highly likely to experience some crashes or other run-time issues, simply due to the large number of components involved: one of them is bound to have a problem. As a consequence, managing a large-scale distributed system and keeping it operational can be daunting.

Challenge #3: Make the system able to react to changing circumstances and to evolve over time. In realistic situations, no single configuration is appropriate to all circumstances, but tweaking the configuration of a large, live system is extremely complex: the number of separate actions needed can be very large, and unexpected conflicts or unforeseen dependencies may arise. Also, due to the heterogeneous context mentioned above, systems deployed for any significant length of time will see their environment evolve around them, with new infrastructure and new features. A good system should be able to evolve in parallel, to adapt to and leverage its new environment, but building in this kind of forward-looking adaptability to evolutions that are still unknown can be extremely tricky.
This survey discussed techniques for improving the energy efficiency of computing and networking resources in large-scale distributed systems. As discussed, many solutions have been proposed over the past decade. For computing resources, the solutions work at different levels, from individual nodes to entire infrastructures, where they take advantage of recent advanced functionalities such as virtualization. In parallel, for wired networks, shutdown techniques have been extensively studied and evaluated to limit the number of resources that can remain idle and consume energy unnecessarily. There are also techniques for adapting the performance of both computing and network resources (and their energy usage) to the needs of applications and services. These approaches are often combined and applied in a coordinated way in large-scale distributed systems.
Many large-scale distributed systems employ ad-hoc solutions to track dynamic membership. A common approach is to use a centralized server to maintain the list of active nodes, as in Google’s Chubby lock service. Such an approach requires all clients to communicate directly with a replicated server, which may be undesirable from a scalability perspective. An alternative, decentralized approach seen in Amazon’s Dynamo system is to track system membership using a gossip protocol. This approach provides only eventual consistency, which is inadequate for many applications, and can be slow to converge. These systems also typically do not tolerate Byzantine faults, as evidenced by a highly-publicized outage of Amazon’s S3 service.
1.2 Related work and our contributions
The work by  is among the pioneering studies in classification of power-law distributed web-scale directories such as the Yahoo! directory, which consists of over 100,000 target classes. For similar category systems, classification techniques based on refined experts and deep classification have been proposed in  and  respectively. More recently, a recursive-regularization-based SVM (HR-SVM) has been studied in , wherein the optimization problem for learning the discriminant functions exploits the given taxonomy of categories. This approach represents the current state of the art, as it performs better than most techniques on the large-scale datasets released as part of the Large Scale Hierarchical Text Classification Challenge in the last few years. However, the drawback of this method is that its improvement in the Micro-F1 (same as accuracy for mono-label problems) and Macro-F1 measures over a flat SVM classifier, for which ready-to-use packages such as Liblinear are available, is not substantial. As shown in Table 3 of , the improvement over the SVM baseline is less than 1% (in absolute terms) on most datasets. As a result, a natural question to ask is:
However, since wide-area power system oscillations tend to appear in very large-scale systems, ranging over thousands of kilometers and involving many different subsystems managed by different transmission system operators (TSOs), it is often practically not feasible to handle these problems with a fully centralized approach. On the other hand, reliability/vulnerability considerations may suggest that even in a system where a fully centralized control scheme would be feasible, it is not necessarily desirable to do so. Consequently, it is of interest to study distributed MPC schemes addressing various decompositions of the global control problem. In this setting, local MPC systems could determine optimal inputs for a subset of controllers under their authority, based on a model of their subsystem and a local control objective.
5.3 Related works
Various distributed MPC schemes have been proposed to replace the centralized MPC in large-scale power systems [76, 77, 96–99]. In one scheme, control agents are placed at each generator and load to control power injections so as to eliminate operating-constraint violations before the protection system acts. They use detailed models of their surrounding areas and simplified models of remote areas to predict system states. The agents cooperate with each other by sharing their objectives and exchanging solutions and measurements. By cloning the boundary nodes of neighboring areas, another scheme breaks the whole power grid into relatively independent subsystems that only interact through consistency constraints on shared variables; each local MPC controller calculates optimized supplementary inputs for automatic voltage regulators and static var compensators in its area, and coordinates with neighboring MPC controllers by exchanging Lagrange multipliers.
The second experiment considers the complementary case to the one presented in the previous section, namely when the whole content of the virtual image needs to be read by each VM instance. This represents the most unfavorable read-intensive scenario, which corresponds to applications that need to read input data stored in the image. In this case, the time to run the VM corresponds to the time to boot and fully read the whole virtual image disk content (by performing a “cat /dev/hda1”, where hda1 is the virtual disk corresponding to the image). Again, the evaluation is performed for three chunk sizes: 1 MB, 512 KB and 256 KB. The average time to boot and fully read the initial disk content is represented in Figure 10.7. As expected, in the case of pre-propagation, this time remains constant as no read contention exists. In the case of our approach, almost perfect scalability is also noticeable up to 50 storage nodes, despite read concurrency. This is because there are enough storage providers among which the I/O workload can be distributed. Once the number of instances exceeds the number of storage nodes, the I/O pressure increases on each storage node, which makes the average read performance degrade in a linear fashion. On a general note, the read performance is obviously worse in our approach, as the data is not available locally.
In this paper, we have proposed Mignon, a new protocol to rapidly estimate the aggregate affinity of a newly uploaded video in a community of users in a fully decentralized manner. Our proposal avoids an explicit and costly aggregation by relying on the properties of similarity-based self-organizing overlay networks, and can be used to decide where to place videos in a decentralized UGC system. By eschewing the need for a central support infrastructure, our approach hints at the possibility of fast reactive aggregate analytics in decentralized systems. This may be useful both to promote alternatives to the cloud-centered model of current UGC video services and to improve hybrid P2P/cloud architectures [23,38] by offloading complex adaptive tasks to the P2P part of a hybrid system.
Tracing and monitoring tools, and other similar analysis tools, add new requirements to the old problem of coping with asynchronous clocks in distributed systems. Existing approaches based on the convex hull can achieve excellent accuracy for a posteriori analysis, but impose a significant cost and latency when used in live mode and over large clusters. We propose a novel method, LIANA (Live Incremental Asynchronous Network Analysis), for incrementally computing the clock offset, and updating it as the network evolves, along each communication link, as well as selecting the best synchronization paths and time reference node. Each connection in a network requires message exchanges to compute the clock skew and offset between two connected nodes. This method relies on the trace events recorded for the existing TCP/IP traffic between nodes. After computing the offset and its accuracy for every connection in the network graph, a minimum spanning tree is computed. The edges with the best accuracy are selected and form the spanning tree. Then, a central node is selected as the time reference to optimally compute the offset from any node to this reference node. LIANA is efficient, both in terms of synchronization accuracy and time complexity. The method, which is used for online distributed trace synchronization, has been evaluated in realistic scenarios with a diverse set of network topologies and traffic. We show that LIANA generates precise results highly efficiently, which makes it suitable for large cloud-distributed systems.
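The spanning-tree and reference-node steps described above can be sketched as follows. This is an illustration under assumptions, not the LIANA implementation: each link is assumed to carry a single non-negative error bound on its offset estimate (smaller is better), errors are assumed to add up along a path, and the reference node is chosen to minimize the worst accumulated error to any other node. The names `minimum_spanning_tree` and `pick_reference` are hypothetical.

```python
def minimum_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm. edges is a list of (u, v, error) tuples,
    where error is an assumed bound on the offset estimate of that link."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Consider the most accurate links first.
    for u, v, err in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, err))
    return tree


def pick_reference(num_nodes, tree):
    """Return the node whose worst accumulated error to any node is smallest
    (an assumed selection criterion)."""
    adjacency = {i: [] for i in range(num_nodes)}
    for u, v, err in tree:
        adjacency[u].append((v, err))
        adjacency[v].append((u, err))

    def worst_error(root):
        worst = 0.0
        stack = [(root, -1, 0.0)]
        while stack:
            node, previous, accumulated = stack.pop()
            worst = max(worst, accumulated)
            for neighbor, err in adjacency[node]:
                if neighbor != previous:
                    stack.append((neighbor, node, accumulated + err))
        return worst

    return min(range(num_nodes), key=worst_error)


if __name__ == "__main__":
    links = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 5.0), (0, 2, 4.0)]
    tree = minimum_spanning_tree(4, links)
    print("tree:", tree)
    print("reference node:", pick_reference(4, tree))
```

In the example, the two inaccurate links are left out of the tree, and an interior node of the resulting chain is chosen as the reference because it keeps the longest synchronization path short.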
the iteration number. These methods are already well implemented in parallel to profit from the great number of computation cores on large clusters. Basic iterative methods do not always converge fast when solving complicated linear systems. The convergence rate depends on the properties of the operator matrix. Researchers have therefore introduced preconditioners that combine stationary methods and iterative methods, in order to improve the spectral properties of the operator A and accelerate convergence. These preconditioners include the incomplete LU factorization preconditioner (ILU), the Jacobi preconditioner, the successive over-relaxation preconditioner (SOR), etc. There are also deflated preconditioners, which use the eigenvalues approximated during the solving procedure to form a new initial vector for the next restart, which speeds up further computation. Erhel studied a deflation technique for the restarted GMRES algorithm, based on an invariant subspace approximation which is updated at each cycle. Lehoucq introduced a deflation procedure to improve the convergence of an Implicitly Restarted Arnoldi Method (IRAM) for computing the eigenvalues of large matrices. Saad presented a deflated version of the conjugate gradient algorithm for solving linear systems. The implementation of these iterative methods has been a good tool for solving linear systems over the past decades.
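The effect of a preconditioner can be illustrated with the simplest one named above, the Jacobi preconditioner M = diag(A). The sketch below applies it inside a basic Richardson iteration, x_{k+1} = x_k + M^{-1}(b - A x_k). This is a minimal illustration and not one of the cited methods: the function name and the small test matrix are hypothetical, and convergence is only guaranteed under assumptions such as strict diagonal dominance of A.

```python
def jacobi_preconditioned_richardson(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Richardson iteration preconditioned by diag(A).

    A is a list of rows and b a list of floats. Assumes non-zero diagonal
    entries; convergence is assumed, e.g. for strictly diagonally dominant A.
    Returns (x, iterations_performed).
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        # Residual r = b - A x for the current iterate.
        residual = [
            b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)
        ]
        if max(abs(r) for r in residual) < tol:
            return x, iteration - 1
        # Preconditioned update: x <- x + M^{-1} r with M = diag(A).
        x = [x[i] + residual[i] / A[i][i] for i in range(n)]
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, iterations = jacobi_preconditioned_richardson(A, b)
    print("solution:", x, "after", iterations, "iterations")
```

On this matrix the unpreconditioned update x_{k+1} = x_k + (b - A x_k) would diverge, because the eigenvalues of A are larger than 2, whereas scaling the residual by the inverse diagonal brings the spectral radius of the iteration matrix below 1.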
© 2014 Elsevier Ltd. All rights reserved.
Some characteristics of modern large-scale electric power systems, such as long transmission distances over weak grids, highly variable generation patterns and heavy loading, tend to increase the probability of appearance of sustained wide-area electromechanical oscillations. The term "wide-area" is used here to emphasize the possible co-existence of local and inter-area oscillation modes of different frequencies that might appear simultaneously in different parts of large-scale systems. Such oscillations threaten the secure operation of power systems and, if not controlled efficiently, can lead to generator outages, line tripping and large-scale blackouts [1–3]. Current automatic control systems, designed to address low-frequency oscillations, are mostly based on very local control strategies realized through Power System Stabilizers (PSSs) and FACTS devices.
Abstract. Typical distributed testing architectures decompose test cases into actions and dispatch them to different nodes. They use a central test controller to synchronize the action execution sequence. This architecture is not fully adapted to large-scale distributed systems, since the central controller does not scale up. This paper presents two approaches to synchronize the execution of test case actions in a distributed manner. The first approach organizes the testers in a B-tree, synchronizing through messages exchanged among parents and children. The second approach uses gossiping, synchronizing through messages exchanged among consecutive testers. We compare these two approaches and discuss their advantages and drawbacks.
Rennes, France firstname.lastname@example.org
In this paper we present an in-depth study of the dynamicity and robustness properties of large-scale distributed systems, and in particular of peer-to-peer systems. When designing such systems, two major issues need to be faced. First, the population of these systems evolves continuously (nodes can join and leave the system as often as they wish without any central authority in charge of their control), and second, since these systems are open, one needs to defend against the presence of malicious nodes that try to subvert the system. Given robust operations and adversarial strategies, we propose an analytical model of the local behavior of clusters, based on Markov chains. This local model provides an evaluation of the impact of malicious behaviors on the correctness of the system. Moreover, this local model is used to evaluate analytically the performance of the global system, allowing us to characterize the global behavior of the system with respect to its dynamics and to the presence of malicious nodes, and then to validate our approach.
With the advent of very large-scale distributed systems connected through complex interconnection networks, phenomena such as locality and resource congestion have become more and more critical to studying and understanding the performance of parallel applications. Yet no visualization tool can either handle such workloads in a scalable way or provide deep insight into such issues. We think such a tool should also allow for an interactive and exploratory analysis that adapts to the variety of situations that can be investigated. In this article, we explain how we have built a graph-based visualization meeting the previous requirements. We have implemented the technique in an open-source visualization tool called VIVA, which makes it possible to study the correlation between quantities at a spatial and temporal level. The multiscale capability of this visualization allows the analyst to select the adequate level of detail and should be put in relation to what has been done for treemaps. Our new visualization has the same aggregation features but can also display topological information. Such multiscale capability is also essential to achieve a scalable visualization, both in terms of fluidity and meaning. Yet, it is not sufficient in itself. The ability to dynamically aggregate and disaggregate groups of resources requires adjusting the layout of the graph, which we have done using a dynamic force-directed graph layout mechanism. This algorithm allows the analyst to easily reorganize as well as readjust the layout in a very efficient way.