High Performance Data Management

High-Performance Big Data Management Across Cloud Data Centers

Dedicating Cloud Compute Nodes for Advanced Data Stewarding Services

The “data deluge” calls for scalable and reliable storage for cloud applications and for a diversification of the associated data-related functionalities. Running large experiments reveals the limitations of cloud-provided storage in coping with all of an application's needs: it trades performance for durability and provides only basic put/get storage functions. However, executing Big Data applications requires more advanced functionalities such as logging mechanisms, compression or transfer capabilities. As an alternative to the cloud-provided storage service, building such functionalities in the application nodes, on top of local collocated storage, can be rather intrusive and can hurt application performance. Therefore, we propose a different approach, called DataSteward, that combines the advantages of traditional cloud-provided storage (isolation of data management from computation) with those of TomusBlobs (high performance through the use of the nodes' free local disks). To this end, we dedicate a subset of the compute nodes and build a data management system on top of them. Thanks to the separation from computation, this approach provides a higher degree of reliability while remaining non-intrusive. At the same time, applications perform efficient I/O operations, as data is kept in the proximity of the compute nodes through a topology-aware clustering algorithm that selects the storage nodes. To capitalize further on this separation, we introduce a set of scientific data processing services on top of the storage layer that address the functionality needs of Big Data applications. Similar to the concept of a file in a traditional system, which is a generic object associated with a set of operations (e.g., move, view, edit, delete, compress), DataSteward endows a “cloud file” with its own set of actions.
The evaluation results show that this approach improves performance by 3 to 4 times over cloud-provided storage, and that the topology-aware selection of storage nodes can bring significant improvements to the management of application data. The work was published at the TrustCom/ISPA ’13 conference.
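The topology-aware clustering used to select storage nodes is not detailed in this excerpt. As a rough illustration only, the sketch below assumes the pairwise latencies between compute nodes are known and greedily picks k storage nodes that minimize the total latency from every node to its nearest storage node (a k-medoids-style objective). The function `select_storage_nodes` and its latency-matrix input are hypothetical; this is not DataSteward's actual algorithm.

```python
from typing import List, Sequence


def select_storage_nodes(latency: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily choose k storage nodes that minimize the summed latency from
    every node to its nearest chosen storage node (k-medoids-style objective)."""
    n = len(latency)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    chosen: List[int] = []
    # nearest[i]: latency from node i to its closest storage node chosen so far
    nearest = [float("inf")] * n
    for _ in range(k):
        candidates = [c for c in range(n) if c not in chosen]
        # Pick the candidate whose addition gives the lowest total latency.
        best = min(
            candidates,
            key=lambda c: sum(min(nearest[i], latency[i][c]) for i in range(n)),
        )
        chosen.append(best)
        nearest = [min(nearest[i], latency[i][best]) for i in range(n)]
    return sorted(chosen)
```

With two racks of three nodes each (latency 1 within a rack, 10 across racks), asking for two storage nodes places one in each rack, so every compute node keeps a nearby storage node.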

JetStream: Enabling High Performance Event Streaming across Cloud Data-Centers

Data Stream Management Systems (DSMS). These systems focus primarily on how queries can be executed and support data transfers only as a side effect, usually through rudimentary mechanisms (e.g., simple event transfer over HTTP, TCP or UDP), or ignore transfers completely by delegating them to the data source. D-Streams [48] provides tools for scalable stream processing across clusters, building on the idea of handling small batches that can be processed using MapReduce, an idea also discussed in [36]. Here, data acquisition is event-driven: the system simply collects the events from the source. Comet [23] enables batch processing across streams of data. It is built on top of Dryad [26], and its data management relies on Dryad's channel concept to transfer a finite set of items over shared memory, TCP pipes or files. However, it is designed to work within a single cluster and does not address the issues of sending continuous streams of events between data-centers. In [13], the authors propose a store manager for streams that exploits access patterns. The streams are cached in memory or on disk and shared between the producers and the consumers of events; there is no support for transfers. RIP [10] is another example of a DSMS, which scales query processing through partitioning and event distribution. Middleware solutions such as IBM's System S [18] were proposed for single-cluster processing, with processing elements connected via typed streams. The ElasticStream system [27] migrates this solution to Amazon EC2, taking into account cloud-specific issues such as SLAs, VM management and the economic aspects of performance. In [14], the authors provide accurate cost and latency estimation for complex event processing operators, validated with Microsoft StreamInsight [4]. Other work on cloud-based streaming, such as Sphere [21], proposes GPU-like processing on top of a high-performance infrastructure.
In [38], the authors propose using web services to create processing pipelines, with data passed via the web service endpoints. Other systems, such as Aurora and Medusa [7], consider exploiting the geographically distributed nature of stream data sources. However, these systems have a series of limitations despite their strong requirements on the underlying infrastructure (e.g., naming schema, message routing): Aurora runs on a single node and Medusa has a single administrator entity. All these cloud-based solutions focus on query processing, with no specific improvements or solutions for cloud streaming itself, and, furthermore, they do not support multi-site applications.

Scalable data-management systems for Big Data

DStore: an in-memory document-oriented store

As a result of continuous innovation in hardware technology, each generation of computers is more powerful than the last. Modern servers can have 1 terabyte (TB) of main memory or more. Since memory accesses are at least 100 times faster than disk accesses, keeping data in main memory becomes an attractive design principle for increasing the performance of data-management systems. We design DStore, a document-oriented store residing in main memory, to fully exploit high-speed memory accesses for high performance. DStore scales up by increasing memory capacity and the number of CPU cores rather than scaling out horizontally as distributed data-management systems do. This design decision lets DStore support fast, atomic complex transactions while maintaining high throughput for analytical processing (read-only accesses), a goal that (to our knowledge) is hard to achieve with high performance in distributed environments. DStore is built on several design principles: a single-threaded execution model, parallel index generation, delta-indexing and bulk updating, versioning concurrency control, and trading freshness for performance in analytical processing. This work was carried out in collaboration with Dushyanth Narayanan at Microsoft Research Cambridge, as well as Gabriel Antoniu and Luc Bougé, INRIA Rennes, France.
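To make the delta-indexing and bulk-updating principle more concrete, here is a minimal single-threaded sketch, assuming plain dictionaries as indexes. The class `DeltaIndexedStore`, its methods and the `merge_threshold` parameter are hypothetical names, not DStore's API, and the version counter only hints at versioning; it is not a full concurrency-control scheme.

```python
from typing import Any, Dict, Optional


class DeltaIndexedStore:
    """Toy single-threaded store: writes land in a small delta index and are
    merged into the main index in bulk, so read-only analytical queries can
    use the main index alone and trade freshness for speed."""

    def __init__(self, merge_threshold: int = 1000) -> None:
        self.main: Dict[str, Any] = {}   # bulk-updated index read by analytics
        self.delta: Dict[str, Any] = {}  # recent writes not yet merged
        self.version = 0                 # incremented on every bulk merge
        self.merge_threshold = merge_threshold

    def put(self, key: str, doc: Any) -> None:
        self.delta[key] = doc
        if len(self.delta) >= self.merge_threshold:
            self.merge()

    def get_fresh(self, key: str) -> Optional[Any]:
        # Transactional read: the delta holds the newest value, if any.
        if key in self.delta:
            return self.delta[key]
        return self.main.get(key)

    def get_stale(self, key: str) -> Optional[Any]:
        # Analytical read: main index only; may miss writes not yet merged.
        return self.main.get(key)

    def merge(self) -> None:
        # Bulk update: apply all buffered writes at once, then bump the version.
        self.main.update(self.delta)
        self.delta.clear()
        self.version += 1
```

Analytical reads through `get_stale` never see the delta, which is the freshness-for-performance trade-off named above; transactional reads through `get_fresh` always do.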

High performance stucco to optimum moisture management in wood-frame stucco wall

Accidental Moisture Entry, Quantity and Location

hygIRC-2D can inject a given quantity of accidentally entered moisture at any location in the wall and at any time (hourly). The quantity of moisture that entered the wall accidentally, and its location, were determined from the output of full-scale and small-scale laboratory tests carried out in a separate project at NRC-IRC [26], and from external weather data (rainfall, wind speed and wind direction). In other words, the rate of accidental moisture entry is a function of the rate of wind-driven rain, the air-pressure difference and the type of deficiency in the wall system. In this study, the accidentally entered moisture was assumed to settle at the bottom of the insulated stud cavity. For further information on the accidental moisture entry feature of hygIRC-2D, interested readers may refer to other relevant NRC-IRC publications [2, 6].

Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations

To our knowledge, Damaris is the first middleware available to the community that offers the use of dedicated cores or dedicated nodes to serve data management tasks ranging from I/O to in situ visualization. This work paves the way for a number of new research directions with high potential impact. Our study of in situ visualization using Damaris and CM1 revealed that in some simulations, such as climate models, a significant fraction of the data produced by the simulation does not contain any part of the phenomenon of interest to scientists. When visualizing this data in situ, one can thus lower the resolution of the uninteresting parts to increase the performance of the visualization process, an approach we call “smart in situ visualization.” Challenges in implementing smart in situ visualization include automatically discriminating between relevant and non-relevant data within the simulation while the data is being produced. This detection should require no user intervention and should be fast enough not to diminish the overall performance of the visualization process. The plugin system of Damaris, together with its existing connection to the VisIt visualization software, provides an excellent basis for implementing and evaluating smart in situ visualization.
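The text leaves open how relevant data should be detected. Purely as an illustration, the sketch below substitutes a simple stand-in detector: a block of a 1D field counts as interesting when its value range exceeds a threshold, and every other block is collapsed to one averaged sample. The function name, block size and threshold are assumptions, not part of Damaris or its VisIt connection.

```python
from statistics import fmean
from typing import List, Sequence


def smart_downsample(field: Sequence[float], block: int, threshold: float) -> List[float]:
    """Keep full resolution only in blocks whose value range exceeds
    `threshold`; collapse every other block to a single averaged sample."""
    out: List[float] = []
    for start in range(0, len(field), block):
        chunk = field[start:start + block]
        if max(chunk) - min(chunk) > threshold:
            out.extend(chunk)         # "interesting" block: keep every sample
        else:
            out.append(fmean(chunk))  # flat block: one coarse sample suffices
    return out
```

A real detector would need to be cheap enough to run alongside the simulation, which is exactly the constraint stated above; a per-block range test is one inexpensive option among many.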

Data stream management and mining

Another interesting (artificial) application is the “Linear Road Benchmark” (see [9]), a road-traffic simulator designed to compare the performance of different Data Stream Management Systems. It simulates road traffic on an artificial highway network. Vehicles continuously transmit their position and speed; all data are received by the Data Stream Management System, which supervises traffic by computing for each vehicle a toll that depends on traffic density: heavy traffic on a highway segment leads to a high price. Computed tolls are sent to vehicles in real time so that they can adapt their route to pay less. This system, though only imaginary, is interesting because it falls into both of the categories described above: the result of real-time supervision is directly re-injected into the operational system.
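The toll computation described above can be sketched as a windowed aggregation over position reports. The congestion threshold, the quadratic pricing and the function name `segment_tolls` are hypothetical choices for illustration, not the benchmark's specification; the sketch also ignores speed and accidents, which a real toll rule would likely consider.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical parameters, not taken from the Linear Road specification.
CONGESTION_THRESHOLD = 50  # vehicles per segment before any toll applies
TOLL_FACTOR = 2            # price multiplier on the squared excess traffic


def segment_tolls(reports: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """reports: (vehicle_id, segment) position reports from one time window.
    Returns a toll per segment that grows with traffic above the threshold."""
    vehicles: Dict[int, Set[int]] = defaultdict(set)
    for vehicle_id, segment in reports:
        vehicles[segment].add(vehicle_id)  # count each vehicle once per segment
    tolls: Dict[int, int] = {}
    for segment, ids in vehicles.items():
        excess = len(ids) - CONGESTION_THRESHOLD
        tolls[segment] = TOLL_FACTOR * excess * excess if excess > 0 else 0
    return tolls
```

Under these hypothetical parameters, 60 distinct vehicles reporting from one segment yield a toll of 2 × 10² = 200, while a segment with 10 vehicles is free.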

Message Scheduling for Data Redistribution through High Performance Networks

All algorithms have been implemented using MPI. Each communication is performed synchronously, and communication steps are synchronized using barriers. The algorithms are compared to a raw approach in which all communications are issued simultaneously (and therefore asynchronously) in a single communication step. We take as input an all-to-all communication (i.e., a complete bipartite graph) with random data sizes uniformly distributed between 10 MB and n MB. We plot the total communication time obtained as n increases from 10 to 80. The results for k = 3, k = 5 and k = 7 are displayed in Figures 31, 32 and 33, respectively.
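To illustrate what scheduling a redistribution into synchronized steps means, the sketch below packs the edges of a bipartite communication graph into steps with a simple largest-first greedy rule. It assumes that within a step each sender and each receiver takes part in at most one communication, and it reads k as the maximum number of communications per step; the excerpt does not define k, so that reading is an assumption. The cost model (each step lasts as long as its largest message because of the barrier) is likewise a simplification, and the heuristic is not one of the paper's algorithms.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, float]  # (sender, receiver, message size in MB)


def greedy_schedule(edges: List[Edge], k: int) -> List[List[Edge]]:
    """Pack communications into barrier-separated steps. Within a step each
    sender and each receiver is used at most once and at most k messages are
    sent; messages are placed largest first so similar sizes share a step."""
    steps: List[List[Edge]] = []
    used: List[Tuple[Set[int], Set[int]]] = []  # (senders, receivers) per step
    for edge in sorted(edges, key=lambda e: e[2], reverse=True):
        src, dst, _ = edge
        for step, (senders, receivers) in zip(steps, used):
            if len(step) < k and src not in senders and dst not in receivers:
                step.append(edge)
                senders.add(src)
                receivers.add(dst)
                break
        else:
            # No existing step can take this message: open a new step.
            steps.append([edge])
            used.append(({src}, {dst}))
    return steps


def schedule_cost(steps: List[List[Edge]]) -> float:
    """Each step lasts as long as its largest message (steps end with a barrier)."""
    return sum(max(size for _, _, size in step) for step in steps)
```

Grouping messages of similar size in the same step keeps the barrier from making small messages wait on large ones, which is the intuition behind sorting by size first.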

High-Performance Transactional Event Processing

An interesting question is what advantages these programming models bring compared to the RTSJ's NoHeapRealtimeThread, which is, after all, supported by all RT JVMs. Experience implementing [5,13,21,2] and using [8,20,7,22,24] the RTSJ revealed a number of serious deficiencies. In the RTSJ, interference from the garbage collector is avoided by allocating the data needed by time-critical real-time tasks from a part of the virtual machine's memory that is not subject to garbage collection: dynamically checked regions known as scoped memory areas. Individual objects allocated in a scoped memory area cannot be deallocated; instead, an entire area is torn down as soon as all threads exit it. Dynamically enforced safety rules check that a memory scope with a longer lifetime does not hold a reference to an object allocated in a memory scope with a shorter lifetime, and that a NoHeapRealtimeThread does not attempt to dereference a pointer into the garbage-collected heap.
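The first safety rule can be modelled in a few lines. This sketch assumes scopes form a simple parent chain in which an inner scope is always torn down before its parent; a reference store is then allowed only when the target's scope is the holder's scope or one of its ancestors. The classes are hypothetical and only the error name is borrowed from the RTSJ's `IllegalAssignmentError`; the NoHeapRealtimeThread heap-access check is not modelled.

```python
from typing import Dict, Optional


class Scope:
    """A memory area; a child scope is always torn down before its parent."""

    def __init__(self, name: str, parent: Optional["Scope"] = None) -> None:
        self.name = name
        self.parent = parent

    def lives_at_least_as_long_as(self, other: "Scope") -> bool:
        # True iff self is `other` or one of its ancestors.
        scope: Optional[Scope] = other
        while scope is not None:
            if scope is self:
                return True
            scope = scope.parent
        return False


class Obj:
    """An object allocated in a given scope, with named reference fields."""

    def __init__(self, scope: Scope) -> None:
        self.scope = scope
        self.fields: Dict[str, "Obj"] = {}


class IllegalAssignmentError(Exception):
    """Raised when a store would create a reference into a shorter-lived scope."""


def store_ref(holder: Obj, field: str, target: Obj) -> None:
    # Dynamic check: the target must live at least as long as the holder.
    if not target.scope.lives_at_least_as_long_as(holder.scope):
        raise IllegalAssignmentError(f"{holder.scope.name} -> {target.scope.name}")
    holder.fields[field] = target
```

Because the check runs on every reference store, it adds overhead to ordinary assignments, one reason such dynamic rules are counted among the deficiencies mentioned above.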

Approaches To Crisis Prevention In Lean Product Development By High Performance Teams And Through Risk Management

Contra: might get “overloaded”; appropriate IT support might be necessary.

Application of Problem Solving Cycle to Definition of Measures

A general problem solving cycle (see e.g. [Lindemann 2002]) can be applied to the definition of actions ([Hall 1998], pp. 110-111). In the first step, the problem is defined by selecting, according to the assessment done, the risks for which actions are to be developed. If necessary, high-priority risks can be assessed in more detail using the methods described in this section. At this stage, scenario analysis can be very helpful for understanding the dependencies among the risks under investigation. In the next step, alternative actions to treat a risk are developed, e.g. according to the different categories of actions defined above. After possible actions have been defined, the desired actions are selected. This selection can be based on an integrative approach that tries to resolve as many risks as possible with as few actions as possible. The Risk Reduction Leverage (see below) also yields valuable input for judging an action.
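Resolving as many risks as possible with as few actions as possible is essentially a set-cover problem. A common heuristic for it, shown here as one possible approach rather than the method of the cited sources, repeatedly picks the action that treats the most still-untreated risks. The risk and action identifiers below are hypothetical, and the Risk Reduction Leverage is not taken into account.

```python
from typing import Dict, List, Set


def select_actions(risks: Set[str], actions: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly choose the action that treats the largest
    number of still-untreated risks; stop when no action helps any more."""
    remaining = set(risks)
    chosen: List[str] = []
    while remaining:
        best = max(actions, key=lambda a: len(actions[a] & remaining), default=None)
        if best is None or not actions[best] & remaining:
            break  # the remaining risks are not treated by any listed action
        chosen.append(best)
        remaining -= actions[best]
    return chosen
```

For risks `r1` to `r4` and actions `A = {r1, r2, r3}`, `B = {r1}`, `C = {r4}`, `D = {r3, r4}`, the heuristic selects `A` and then `C`, treating all four risks with two actions.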

Compiling High Performance Recursive Filters

We then provide the split operator (Sec. 6). It applies our tiling transformations to filters defined on the full image, converting them into a series of filters that operate within image tiles, followed by filters across tiles, and then a final filter that assembles these two intermediate results and computes the final output. This tiling transformation exploits the linearity and associativity of IIR filters. Internally, our compiler also performs a critical performance optimization: rather than communicating back and forth between intra- and inter-tile computation, it fuses computation by granularity level; e.g., we fuse all intra-tile stages because they have the same dependence pattern as the original non-tiled pipeline. This results in a compact internal graph of operations. We implement these transformations by mutating the internal Halide representation of the pipeline: our split operator mutates the original filters into a series of Halide functions corresponding to the intermediate operations. We introduce automatic scheduling (Sec. 7), i.e., automatically mapping all the generated operations efficiently onto the hardware. We identify common patterns across the different operations and use heuristics to ensure memory coalescing, minimal bank conflicts for GPU targets, ideal thread pools, and unrolling/vectorization options. We can do this without the hand-tuning or expensive autotuning required by general Halide pipelines because we can aggressively restrict our consideration to a much smaller space of “sensible” schedules for tiled recursive filters. Our heuristics schedule the entire pipeline automatically, which we show performs on par with manual scheduling. We also expose high-level scheduling operators that let the user easily write manual schedules, if needed, by exploiting the same restricted structure in the generated pipelines.
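The intra-tile, inter-tile and final-assembly structure can be illustrated on the simplest case, a 1D first-order causal recursive filter y[n] = x[n] + a·y[n-1]. This is a plain-Python sketch of the underlying idea under that assumption, not the compiler's Halide transformation: by linearity, each tile is first filtered with a zero initial state, a short scan over tiles then propagates the true state entering each tile, and a final pass adds the decayed contribution of that state.

```python
from typing import List, Sequence


def recursive_filter(x: Sequence[float], a: float) -> List[float]:
    """Reference first-order causal IIR filter: y[n] = x[n] + a * y[n-1], y[-1] = 0."""
    y: List[float] = []
    prev = 0.0
    for v in x:
        prev = v + a * prev
        y.append(prev)
    return y


def tiled_recursive_filter(x: Sequence[float], a: float, tile: int) -> List[float]:
    # 1) Intra-tile: filter each tile independently with a zero initial state.
    local = [recursive_filter(x[s:s + tile], a) for s in range(0, len(x), tile)]
    # 2) Inter-tile: scan over tiles to get the true output just before each tile.
    carries: List[float] = []
    carry = 0.0
    for t in local:
        carries.append(carry)
        carry = t[-1] + a ** len(t) * carry  # true last output of this tile
    # 3) Final pass: by linearity, the incoming state decays as a**(i+1) in a tile.
    y: List[float] = []
    for t, c in zip(local, carries):
        y.extend(v + a ** (i + 1) * c for i, v in enumerate(t))
    return y
```

Steps 1 and 3 are independent across tiles and could run in parallel; only the scan in step 2 is sequential, and it touches a single value per tile.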

Creating High Performance Extended Enterprises

High Performer: contract structure between customer and system integrator (e.g. fixed price) was mirrored between system integrator and suppliers. Mediocre: contracts with both…

Long-term exposure data analysis of residential high performance wall assemblies exposed to real climate

Data management in a cloud federation

Recent research shows their respective advantages and disadvantages in various areas of the cloud environment. Chapter 2 presents solutions for clouds; however, only the open-source IReS platform considers the heterogeneity problem and can be extended to solve the MOP. Chapter 3 presents various approaches to optimizing the data storage configuration of medical data. However, only [97] considers the sparsity and workload characteristics of DICOM data. Nevertheless, the authors do not provide an optimal solution for the hybrid data storage configuration of DICOM. Chapter 4 describes recent research on cost-value estimation and optimization approaches for MOP. Historical information should be used efficiently in machine learning approaches. In addition, the quality of NSGAs for MOP should be improved when the number of objectives is large. The next chapters present our approaches to solving the problem of medical data in cloud federations. First, heterogeneity must be addressed when the system connects data from various database engines in the clouds. Second, the variability of the cloud environment requires the estimation process to be more efficient. Third, searching for and optimizing a solution to the MOOP requires an efficient MOO algorithm. This will make it possible not only to process queries but also to find an optimal hybrid data storage configuration.
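Searching for solutions to a multi-objective problem (MOP) rests on Pareto dominance, which NSGA-style algorithms use to rank candidates. The sketch below, assuming every objective is to be minimized, keeps the non-dominated candidates, i.e. the first front such an algorithm would rank highest. The configuration names and (cost, response time) values in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]  # values to minimize, e.g. (monetary cost, query time)


def dominates(p: Objectives, q: Objectives) -> bool:
    """p dominates q if it is no worse in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Return the names of the non-dominated candidates (the first Pareto front)."""
    return [
        name
        for name, p in candidates.items()
        if not any(dominates(q, p) for other, q in candidates.items() if other != name)
    ]
```

With `{"row_store": (10, 5), "column_store": (12, 2), "hybrid_A": (9, 4), "hybrid_B": (15, 6)}`, only `column_store` and `hybrid_A` remain: `hybrid_A` is both cheaper and faster than `row_store` and `hybrid_B`, while `column_store` trades a higher cost for the best response time.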

Technical Paper: Proposed Performance Management Framework

collaboration, as necessary and important as they are, have only a marginal impact on resilience if agricultural producers and industry stakeholders do not adopt, on-farm, the adaptive strategies and actions that these processes develop and promote. Lamhauge et al. (2012) and Harley et al. (2008) recommend that the M&E system use both output and outcome indicators to measure progress towards adaptive capacity and resilience. As climate adaptation is still in its early stages, output indicators are likely to be most important in the short term, with outcome indicators becoming more relevant in the long term. Ford et al. (2013) similarly suggest that output indicators may be most important in the short term, given that, in some cases, the full impact of climate change and of adaptation interventions may not materialize for decades, and data for some outcome indicators may not be available for many years. Dinsha et al. (2014) argue further that, on account of the complexity and the methodological challenges associated with climate change adaptation, it is necessary to combine different qualitative and quantitative methods to monitor and evaluate climate

Diversity, a lever for performance... provided it is managed (La diversité, levier de performance ... sous condition de management)

set of pairs that allows us to distinguish and compare the main approaches to, and perspectives on, performance:
• Economic/Societal: this dichotomy is certainly the most widely used today, at a time when organizations strive to combine the accomplishment of their actions and the achievement of the expected results (economic performance) with a certain social harmony among the individuals in their units (societal performance).
• Financial/Commercial: often conflated, these two perspectives on performance are frequently the subject of debate and consultation within organizations, particularly regarding how they are measured and which indicators are chosen. In general, financial performance can be understood as the way a company uses its business assets to generate revenue. Its measures, usually based on financial indicators (return on investment, surplus earnings, …), must therefore be taken in aggregate. Commercial performance, for its part, can broadly be seen as the achievement of commercial objectives relative to the resources committed to reaching them, implying that a given level of achievement cannot be separated from the context and from the resources, including financial ones, mobilized to attain it.

Vessel performance analysis and fuel management

Mak, Lawrence; Kuczora, Andrew; Seo, Dong Cheol; Sullivan, Michael

The managerial and academic performance of Supply Chain Management (La performance managériale et académique du Supply Chain Management)

christine.munier@u-bourgogne.fr
The PURPOSE of this article is to review the recent literature on SCM in order to determine whether or not there is a consensus on its definition, and whether the various studies of its performance lead to convergent results. It is motivated by the growing number of research works on the subject, as well as of articles on the same theme in professional journals. An examination of 62 articles (including several literature reviews) shows that definitions of and approaches to SCM differ from one study to another. There is therefore no consensus on this point. Many studies have addressed SCM performance, either globally or partially, but here again they are difficult to compare given the multiplicity of approaches used. Many rely on questionnaires with very different items, owing, among other things, to the diversity of the angles chosen. The question that then arises is whether SCM is valid as a field of research. Yet it appears that, even though the theories used come from other disciplines, such as economic theories of organizations, strategy or sociology, the study of SCM constitutes an original approach to organizations. In the future, research could rely on greater exchange between disciplines, moving beyond disciplinary “silos”.

SOCIB applications for oceanographic data management

• extract measurements for a given platform or a selected variable
• obtain a time series for a given data product
• get images model, HF radar…

Bootstrapping high frequency data

The main idea of this chapter is to see how, and to what extent, this local Gaussianity assumption can be exploited to generate a bootstrap approximation. In particular, we propose and analyze a new bootstrap method that relies on the conditional local Gaussianity of intraday returns. The new method (which we term the local Gaussian bootstrap) consists of dividing the original data into non-overlapping blocks of M observations and then generating the bootstrap observations at each frequency within a block by drawing from a normal distribution with mean zero and variance given by the realized volatility over the corresponding block. Using Mykland and Zhang's (2009) blocking approach, one can act as if the instantaneous volatility were constant over a given block of consecutive observations. In practice, the volatility of asset returns is highly persistent, especially over a daily horizon, implying that it is at least locally nearly constant.
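The block-wise resampling step can be sketched as follows. The excerpt says only that the variance is given by the block's realized volatility; this sketch assumes the per-observation variance is the block's realized volatility divided by the number of returns in the block, so that a block's bootstrap realized volatility matches the original in expectation. The function name and the handling of a shorter final block are also assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def local_gaussian_bootstrap(returns: Sequence[float], M: int,
                             rng: Optional[random.Random] = None) -> List[float]:
    """Draw one bootstrap sample of intraday returns: inside each
    non-overlapping block of M returns, draw i.i.d. N(0, sigma_j^2), where
    sigma_j^2 is the block's realized volatility divided by its length."""
    if rng is None:
        rng = random.Random()
    out: List[float] = []
    for start in range(0, len(returns), M):
        block = returns[start:start + M]           # last block may be shorter
        realized_vol = sum(r * r for r in block)   # sum of squared returns
        sd = math.sqrt(realized_vol / len(block))  # per-observation std. deviation
        out.extend(rng.gauss(0.0, sd) for _ in block)
    return out
```

Repeating the draw many times with independent generators yields the bootstrap distribution of any statistic computed from the returns, such as the daily realized volatility.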

Comprehensive Performance Expression Model for Industrial Performance Management and Decision Support

The proposed model comprehensively expresses industrial performance at the time of evaluation, as well as the variation of the measures. It helps the decision maker evaluate whether the preferred decision alternative will remain the best-performing solution in the forthcoming performance evaluation periods. In addition, it improves the application of the BCVR methodology in the decision support phase. The current proposal relies mainly on linear functions to generate performance variations. However, estimating the deviation and liability of the overall performance expressions is a fairly complex subject. Further experimental applications are needed to verify robustness and to improve the mathematical models if necessary.