Sparse Linear Algebra System

Top PDF Sparse Linear Algebra System:

Evaluation of the OGF GridRPC Data Management library, and study of its integration into an International Sparse Linear Algebra Expert System

We have extended the implementation of the library, and a further integration has been made available in DIET as a back-end of its data manager Dagda. We present how the library is used in the International Sparse Linear Algebra Expert System GridTLSE, which manages entire expertises for the user, including data transfers, task executions, and graphical charts that help analyse the overall execution. GridTLSE relies on DIET to distribute computations and can thus benefit from the persistency functionalities to provide scientists with faster results when their expertises require the same input matrices. In addition, since two middleware systems can interact in a seamless way as long as they use an implementation of the GridRPC Data Management API, new architectures from different domains can easily be integrated into the expert system, to the benefit of the linear algebra community.
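
The persistency gain described above amounts to reusing an input matrix that has already been staged on the platform across successive expertises. A minimal sketch of that caching idea, assuming nothing about the real OGF GridRPC Data Management API (the class and method names below are invented for illustration):

```python
import hashlib

class PersistentDataManager:
    """Toy stand-in for a data manager with persistence (illustration only;
    the real mechanism is the GridRPC Data Management library behind Dagda)."""

    def __init__(self):
        self._staged = {}  # content hash -> matrix already on the platform

    def stage(self, matrix_bytes: bytes) -> str:
        """Transfer a matrix only if an identical one is not already staged."""
        handle = hashlib.sha256(matrix_bytes).hexdigest()
        if handle not in self._staged:
            self._staged[handle] = matrix_bytes  # costly transfer on miss only
        return handle

dm = PersistentDataManager()
h1 = dm.stage(b"matrix data")  # first expertise pays the transfer
h2 = dm.stage(b"matrix data")  # second expertise with the same matrix does not
assert h1 == h2
```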

Use of A Network Enabled Server System for a Sparse Linear Algebra Grid Application

This middleware is able to find an appropriate server according to the information given in the client's request: the problem to be solved, the size of the data involved, the performance of the t…

Scheduling Trees of Malleable Tasks for Sparse Linear Algebra

[Fig. 1: timings and α values for the qr_mumps frontal matrix factorization kernel; panel (b) shows the values of α.] It becomes interesting to rely on an optimized dynamic runtime system to allocate and schedule tasks on computing resources. These runtime systems (such as StarPU [3], KAAPI [9], or PaRSEC [4]) are able to process a task on a prescribed subset of the computing cores that may evolve over time. This motivates the use of the malleable task model, where the share of processors allocated to a task varies with time. This approach has recently been used and evaluated [13] in the context of the qr_mumps solver using the StarPU runtime system.
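
In this line of work, timings such as those of Fig. 1 are commonly summarized by a power-law speedup model for malleable tasks; assuming that model, T(p) ≈ T(1)/p^α with 0 < α ≤ 1, a few lines suffice to see why α matters:

```python
def malleable_time(seq_time: float, p: int, alpha: float) -> float:
    """Power-law malleable-task model (an assumption here, not the paper's
    exact fit): p cores divide the sequential time by p**alpha."""
    return seq_time / p ** alpha

# With alpha = 0.9, a 100 s kernel keeps ~93% efficiency on 2 cores
# but only ~79% on 10 cores, so overly wide allocations waste resources.
for p in (1, 2, 10):
    t = malleable_time(100.0, p, 0.9)
    print(f"p={p:2d}  time={t:6.2f}s  efficiency={100.0 / (p * t):.2f}")
```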

Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers

∗ CNRS/Inria/University of Grenoble Alpes, France; firstname.lastname@imag.fr. † Inria/University of Bordeaux, France; firstname.lastname@labri.fr. ‡ CNRS/University Paul Sabatier, Toulouse, France; firstname.lastname@irit.fr.
Abstract—The ever-growing complexity and scale of parallel architectures make it necessary to rewrite classical monolithic HPC scientific applications and libraries, as their portability and performance optimization come only at a prohibitive cost. There is thus a recent and general trend toward a modular approach in which numerical algorithms are written, at a high level and independently of the hardware architecture, as Directed Acyclic Graphs (DAGs) of tasks. A task-based runtime system then dynamically schedules the resulting DAG on the different computing resources, automatically taking care of data movement and accounting for possible speed heterogeneity and variability. Evaluating the performance of such complex and dynamic systems is extremely challenging, especially for irregular codes. In this article, we explain how we crafted a faithful simulation, both in terms of performance and memory usage, of the behavior of qr_mumps, a fully-featured sparse linear algebra library, on multi-core architectures. In our approach, the target high-end machines are calibrated only once to derive sound performance models. These models can then be used at will to quickly predict and study, in a reproducible way, the performance of such irregular and resource-demanding applications using solely a commodity laptop.
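
The calibrate-once, simulate-anywhere workflow can be illustrated with a toy replay: measure a duration model for each kernel type once on the target machine, then replay the task schedule on a laptop using those durations. A deliberately simplified sketch (kernel names and timings are invented, and DAG dependencies are ignored for brevity):

```python
import random

# Calibration (done once on the target machine): mean seconds per kernel type.
calibration = {"panel": 0.012, "update": 0.0031, "assemble": 0.0044}

def simulate(schedule, n_workers: int) -> float:
    """Greedy replay of a task list on n_workers using calibrated durations
    instead of real execution; returns the simulated makespan."""
    free_at = [0.0] * n_workers
    for kernel in schedule:
        w = min(range(n_workers), key=free_at.__getitem__)
        free_at[w] += calibration[kernel]
    return max(free_at)

# Prediction (on a commodity laptop): no target machine needed any more.
schedule = random.choices(list(calibration), k=10_000)
print(f"predicted makespan on 8 cores: {simulate(schedule, 8):.3f} s")
```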

Scheduling Trees of Malleable Tasks for Sparse Linear Algebra

[Figure 1: example of the decomposition of a task of the DAG of a Cholesky decomposition into smaller kernels.] As computing platforms evolve quickly and become more complex (in particular because of the increasing use of accelerators such as GPUs or Xeon Phis), it becomes interesting to rely on an optimized dynamic runtime system to allocate and schedule the kernels on the computing resources. These runtime systems (such as StarPU [5], KAAPI [6], or PaRSEC [7]) are able to process the kernels of a given task on a prescribed subset of the computing cores, and this subset may evolve with time. This motivates the use of a malleable task model, where the share of processors allocated to a task varies with time. This approach has recently been used and evaluated [19] in the context of the qr_mumps solver using the StarPU runtime system.

Hierarchical hybrid sparse linear solver for multicore platforms

1 Introduction. Parallel sparse linear algebra solvers are often the innermost numerical kernels in scientific and engineering applications; consequently, they are one of the most time-consuming parts. In order to cope with the hierarchical hardware design of modern large-scale supercomputers, the HPC solver community has proposed new sparse methods. One promising approach towards the high-performance, scalable solution of large sparse linear systems in parallel scientific computing consists of combining direct and iterative methods. To achieve high scalability, algebraic domain decomposition methods are commonly employed to split a large linear system into smaller linear systems that can be efficiently and concurrently handled by a sparse direct solver, while the solution along the interfaces is computed iteratively [27, 25, 13, 11]. Such a hybrid approach exploits the advantages of both direct and iterative methods. The iterative component allows us to use a small amount of memory and provides a natural way to parallelize; the direct part contributes its favorable numerical properties. Furthermore, this combination provides opportunities to exploit several levels of parallelism.
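
A minimal dense sketch of this hybrid scheme on a 2×2 block system: eliminate the interior unknowns with a direct factorization, then solve the interface Schur complement iteratively with CG. The random SPD matrix below stands in for a real domain-decomposed problem:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
ni, ng = 40, 10                                  # interior / interface sizes
M = rng.standard_normal((ni + ng, ni + ng))
A = M @ M.T + (ni + ng) * np.eye(ni + ng)        # SPD stand-in system
b = rng.standard_normal(ni + ng)
Aii, Aig, Agi, Agg = A[:ni, :ni], A[:ni, ni:], A[ni:, :ni], A[ni:, ni:]
bi, bg = b[:ni], b[ni:]

lu = lu_factor(Aii)                              # direct part: interior factorization

def schur_mv(xg):
    # Matrix-free application of S = Agg - Agi Aii^{-1} Aig
    return Agg @ xg - Agi @ lu_solve(lu, Aig @ xg)

S = LinearOperator((ng, ng), matvec=schur_mv)
xg, _ = cg(S, bg - Agi @ lu_solve(lu, bi))       # iterative part: interface solve
xi = lu_solve(lu, bi - Aig @ xg)                 # interior back-substitution
print("residual:", np.linalg.norm(A @ np.concatenate([xi, xg]) - b))
```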

Linear algebra algorithms for cryptography

…and is also a constant factor larger than n, to ensure that the noise-free version of the corresponding linear algebra problem has a unique solution and that the covariance matrix of the rows a of A is well-controlled. Our result applies to a very large class of distributions for A and e, including bounded distributions and discrete Gaussians. It relies on sub-Gaussian concentration inequalities. Interestingly, ILWE can be interpreted as a bounded distance decoding problem in a certain lattice in Z^n (which is very far from random), and the least squares approach coincides with Babai's rounding algorithm for the approximate closest vector problem (CVP) when seen through that lens. As a side contribution, we also show that even with a much stronger CVP algorithm (including an exact CVP oracle), one cannot improve the number of samples necessary to recover s by more than a constant factor. On another side note, we also consider alternatives to least squares when very few samples are available (so that the underlying linear algebra system is not even full-rank) but the secret vector is known to be sparse. In this case, linear programming techniques from [CT07] can solve the problem efficiently.
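
The least-squares-then-round recovery is short enough to sketch end to end; through the lattice lens, the final rounding is exactly Babai's rounding step. Dimensions and distributions below are arbitrary toy choices, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 2000                                # secret size, number of samples
s = rng.integers(-5, 6, size=n)                # integer secret vector
A = rng.integers(-10, 11, size=(m, n))         # bounded sample matrix
e = rng.integers(-8, 9, size=m)                # bounded noise independent of s
b = A @ s + e                                  # ILWE-style samples

# Least squares estimate, then coordinate-wise rounding (Babai rounding).
s_hat = np.rint(np.linalg.lstsq(A, b, rcond=None)[0]).astype(int)
print("exact recovery:", np.array_equal(s_hat, s))
```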

Towards an automatic generation of dense linear algebra solvers on parallel architectures

Some solutions have been proposed in recent years, but they tend to solve the abstraction/efficiency trade-off problem only partially. The method followed by the Formal Linear Algebra Methods Environment (FLAME) with the Libflame library [46] is a good example. It offers a framework to develop dense linear solvers through the use of algorithmic skeletons [15] and an API which is more user-friendly than LAPACK, while giving satisfactory performance results. Another approach is the one followed in recent years by C++ libraries built around expression templates [48] or other generative programming [20] principles for high-performance computing. Examples of such libraries are Armadillo [16] and MTL [27]. Armadillo provides good performance with BLAS and LAPACK bindings and an API close to Matlab [36] for simplicity. However, it does not provide a generic solver like the Matlab routine linsolve that can analyze the matrix type and choose the correct routine to call from the LAPACK library. It also does not support GPU computations, which are becoming mandatory for medium to large dense linear algebra problems. In a similar way, while MTL can rival the performance of vendor-tuned codes, it offers neither a linsolve-like implementation nor GPU support. Other examples of libraries with similar content include Eigen [30], Flens [34], Ublas [49], and Blaze [32].
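
A linsolve-style generic solver is essentially structure detection plus dispatch to the cheapest applicable factorization. A toy version of that dispatch logic (the detection heuristics and chosen routines are illustrative, not Matlab's actual rules):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve, solve_triangular

def linsolve_like(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Analyze the matrix type and pick a solver, in the spirit of Matlab's
    linsolve (which is told the structure via its opts argument)."""
    if np.allclose(A, np.triu(A)):
        return solve_triangular(A, b)              # O(n^2) back-substitution
    if np.allclose(A, np.tril(A)):
        return solve_triangular(A, b, lower=True)  # forward substitution
    if np.allclose(A, A.T):
        try:
            return cho_solve(cho_factor(A), b)     # Cholesky if SPD
        except np.linalg.LinAlgError:
            pass                                   # symmetric but indefinite
    return lu_solve(lu_factor(A), b)               # general dense fallback

A = np.array([[4.0, 1.0], [1.0, 3.0]])            # SPD: takes the Cholesky path
print(linsolve_like(A, np.array([1.0, 2.0])))
```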

Bayesian Functional Linear Regression with Sparse Step Functions

MSC 2010 subject classifications: primary 62F15; secondary 62J05. Keywords: Bayesian regression, functional data, support estimate, parsimony.
1 Introduction. Consider that one wants to explain the final outcome y of a process along time (for instance the amount of some agricultural production) from what happened during the whole history (for instance the rainfall or temperature history). Among statistical learning methods, functional linear models (Ramsay and Silverman, 2005) aim at predicting a scalar y based on covariates x_1(t), x_2(t), …, x_q(t) lying in a functional …
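
Discretized on a time grid, the functional linear model reduces to a quadrature sum: y ≈ Σ_i β(t_i) x(t_i) Δt. A minimal sketch with one covariate and, in the spirit of the title, a sparse step function as coefficient (grid and functions are arbitrary):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)       # time grid for the whole history
dt = t[1] - t[0]
x = np.sin(2 * np.pi * t)            # one functional covariate x(t), e.g. rainfall
beta = ((t > 0.3) & (t < 0.5)).astype(float)  # sparse step coefficient beta(t)

y = np.sum(beta * x) * dt            # y = integral of beta(t) x(t) dt (Riemann sum)
print(f"predicted outcome y = {y:.4f}")
```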

Approximate cross validation for sparse generalized linear models

Our main contribution is to demonstrate one case in which this notion of effective dimension is helpful for approximate CV: that of ℓ1-regularized generalized lin…

2D Static Resource Allocation for Compressed Linear Algebra and Communication Constraints

Abstract. This paper addresses static resource allocation problems for irregular distributed parallel applications. More precisely, we focus on two classical tiled linear algebra kernels: the matrix multiplication and LU decomposition algorithms on large linear systems. In the context of parallel distributed platforms, data exchanges can dramatically degrade the performance of linear algebra kernels, and in this context compression techniques such as Block Low Rank (BLR) are good candidates both for limiting data storage on each node and data exchanges between nodes. On the other hand, the use of the BLR representation makes the static allocation of tiles to nodes more complex. Indeed, the workload associated to each tile depends on its compression factor, which induces a heterogeneous load balancing problem. In turn, solving this load balancing problem optimally might lead to complex allocation schemes, where the tiles allocated to a given node are scattered over the whole matrix. This in turn causes communication problems, since matrix multiplication and LU decomposition heavily rely on broadcast operations along rows and columns of processors, so that the communication volume is minimized when the number of different nodes on each row and column is minimized. In the fully homogeneous case, the 2D block-cyclic allocation solves both the load balancing and communication minimization issues simultaneously, but it might lead to bad load balancing in the heterogeneous case. Our goal in this paper is to propose data allocation schemes dedicated to the BLR format and to prove that it is possible to obtain good makespan performance while simultaneously balancing the load and minimizing the maximal number of different resources in any row or column.
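
The homogeneous baseline mentioned above, the 2D block-cyclic distribution, maps tile (i, j) of the matrix to node (i mod p, j mod q) of a p × q node grid, so any tile row touches at most q distinct nodes and any tile column at most p. A compact sketch:

```python
def block_cyclic_owner(i: int, j: int, p: int, q: int) -> int:
    """2D block-cyclic mapping: tile (i, j) -> node (i mod p, j mod q),
    flattened to a single node id on a p x q grid."""
    return (i % p) * q + (j % q)

# On a 2 x 3 grid, each row of tiles is shared by 3 nodes and each column
# by 2, which is what keeps row/column broadcasts cheap.
p, q = 2, 3
for i in range(4):
    print([block_cyclic_owner(i, j, p, q) for j in range(4)])
```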

A cut-free cyclic proof system for Kleene algebra

We introduce in this paper a calculus HKA for Kleene algebra whose non-wellfounded proofs we prove sound and complete (Sects. 5 and 6). This calculus is cut-free and admits the subformula property. We actually prove that its regular fragment (those proofs with potentially cyclic but finite dependency graphs) is complete. Our approach is related to other work on cyclic systems for logics, e.g. [11, 13], but is more fine-grained proof-theoretically. We give a diagrammatic summary of our contributions in Fig. 1, where we use the symbols ⊢ω and ⊢∞ to distinguish between regular proofs and arbitrary, potentially infinite proofs, respectively. Starting from Palka's system, a natural idea when looking for a regular system consists in replacing her infinitary rules for Kleene star by finitary ones and allowing non-wellfounded proofs. Doing so, we obtain the calculus LKA described in Sect. 3: proofs that are well-founded but of infinite width in Palka's system become finitely branching but infinitely deep in LKA. These non-wellfounded proofs of LKA admit an elegant proof theory, but we show that their regular fragment is not complete: there are valid inequalities which require arbitrarily large sequents to appear in their proofs. We solve this problem by allowing slightly more structure in the succedents of sequents, moving to hypersequents to design the calculus HKA (Sect. 4). After showing completeness, inspection of the regular proofs of HKA yields an alternative proof that the equational theory of rational languages is in PSpace, without relying on automata-theoretic arguments (Sect. 7). We conclude with some further comments and directions for future work (Sect. 8).

Identification of switched linear systems via sparse optimization

…constraining e to be bounded with respect to a certain norm. In [7], Candès and Randall used this approach to correct errors occurring when decoding messages transmitted over communication channels. Their idea is to estimate, under sparsity of (25), the error e together with the vector θ, which we suppose here to represent one submodel of the switched system (1). To do so, one needs however to know a priori an upper bound η on the norm of the noise. More precisely, a somewhat tight bound η satisfying ‖e‖_ℓ ≤ η is required, with ℓ a certain norm index in {2, ∞, …}. If θ is a PV for the switched linear system, then θ may be computed from the convex program …
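
Such sparse-error corrections are classically computed by ℓ1 minimization. A sketch of one standard formulation, min_θ ‖b − Aθ‖₁ rewritten as a linear program (toy data; scipy's linprog stands in for the paper's specific convex program):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 200, 10
A = rng.standard_normal((m, n))
theta = rng.standard_normal(n)
e = np.zeros(m)
idx = rng.choice(m, size=15, replace=False)
e[idx] = 10 * rng.standard_normal(15)          # sparse gross errors
b = A @ theta + e

# Variables x = [theta, t]; minimize sum(t) s.t. -t <= b - A@theta <= t.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * m)
print("recovery error:", np.linalg.norm(res.x[:n] - theta))
```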

Reordering Strategy for Blocking Optimization in Sparse Linear Solvers

Grégoire Pichon, Mathieu Faverge, Pierre Ramet, and Jean Roman.
Abstract. Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For this reason, optimizing their performance on modern architectures is critical. The preprocessing steps of sparse direct solvers, ordering and block-symbolic factorization, are two major steps that lead to a reduced amount of computation and memory and to a better task granularity to reach a good level of performance when using BLAS kernels. With the advent of GPUs, the granularity of the block computation has become more important than ever. In this paper, we present a reordering strategy that increases this block granularity. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, but it does not impact the number of operations required to solve the problem. We integrate this algorithm in the PaStiX solver and show an important reduction of the number of off-diagonal blocks on a large spectrum of matrices. This improvement leads to an increase in efficiency of up to 20% on GPUs.

Minimax rate of testing in sparse linear regression

Bordenave, C. and Chafaï, D. (2012). Around the circular law. Probability Surveys 9, 1–89.
Collier, O., Comminges, L., and Tsybakov, A.B. (2017). Minimax estimation of linear and quadratic functionals under sparsity constraints. Ann. Statist. 45, 923–958.
Donoho, D.L. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32, 962–994.

A model from signal analysis to design linear algebra activities

Finally, item c) was intended to probe whether students' previously constructed structures about matrices and vectors enabled them to recognize the product of a matrix A and a vector s in the model for the transformation of sources and observations they were developing. We intended to observe whether they were able to relate these constructions to BSS contextual elements in order to explain, beyond the mathematics, the need for A to have n columns if the vector s is in R^n. The results obtained showed that, effectively, most students had interiorized the matrix form of a system of equations into a process and could coordinate it with a coefficient-matrix process once the n × m linear system was identified in item b). Students related the size of the matrix A to the BSS context by observing that the number of columns of A must equal the number of sources and that the product results in m observations, so they concluded that A has to have m rows, making reference to the configurations in each case.
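
The dimension argument the students were expected to make is just the shape constraint of the product in the BSS model x = A s: if s ∈ R^n and there are m observations, A must be m × n. A two-line check with arbitrary sizes:

```python
import numpy as np

n_sources, m_observations = 3, 5
s = np.ones(n_sources)                       # source vector in R^n
A = np.zeros((m_observations, n_sources))    # A needs n columns and m rows
x = A @ s                                    # defined, and yields m observations
print(x.shape)                               # (5,)
```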

Stacked Sparse Blind Source Separation for Non-Linear Mixtures

(…, 2007), it is very rare that sparse sources both have non-zero values at the same time. Therefore, when plotting the scatter plot of S_1 as a function of S_2 (cf. Fig. 1(a)), most of the source coefficients lie on the axes (in this work we even assume that all coefficients lie on the axes; this hypothesis is discussed in Sec. 4). Once mixed with the non-linear f, the source coefficients lying on the axes are transformed into n non-linear one-dimensional (1D) manifolds (Ehsandoust et al., 2016; Puigt et al., 2012), each manifold corresponding to one source (see Fig. 1(b)). To separate the sources, the idea is then to back-project each manifold onto one of the axes. We propose to perform this back-projection by approximating the 1D manifolds by a linear-by-part (piecewise-linear) function, which we then invert. As mentioned above, we then get separated sources which are only distorted through non-linear functions that do not remix them, called h in the following.
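
The back-projection of one manifold can be sketched directly: tabulate the nonlinearity on a grid, treat the table as a linear-by-part approximation, and invert it with np.interp. The monotone toy nonlinearity below is an arbitrary stand-in for the unknown f:

```python
import numpy as np

f = lambda u: u + 0.3 * u**3            # toy monotone non-linearity on one axis
grid = np.linspace(-2.0, 2.0, 50)       # breakpoints of the linear-by-part model
f_grid = f(grid)                        # tabulated 1D manifold

def back_project(y):
    """Approximate f^{-1}(y): f_grid is increasing (f is monotone),
    so np.interp inverts the piecewise-linear approximation of f."""
    return np.interp(y, f_grid, grid)

sources = np.array([-1.5, 0.2, 1.0])
print(back_project(f(sources)))         # ~ [-1.5, 0.2, 1.0]
```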

SaaS for Energy Efficient Utilization of HPC Resources of Linear Algebra Calculations

[Fig. 4.1: experiment results for small matrices.] …infrastructure for conducting the experiments. Taking into account the testbed infrastructure's parameters (mostly RAM), three different sizes of matrices are studied: small (4096×4096), medium (8192×8192), and large (12288×12288). The Simple Linux Utility for Resource Management (Slurm) [21], an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for both large and small Linux clusters, is used for job scheduling. To ensure that the results are statistically sound and that the execution time and power consumption values for each matrix are reliable, each experiment is run ten times for each random execution and the arithmetic mean is taken.
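
The ten-runs-and-average protocol is easy to reproduce with a small harness; the sketch below times a plain numpy product at the paper's small size rather than the actual benchmarked kernels:

```python
import time
import numpy as np

def mean_runtime(kernel, repetitions: int = 10) -> float:
    """Run a kernel several times; return the arithmetic mean runtime."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        kernel()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

n = 4096                                 # the "small" matrix size
A, B = np.random.rand(n, n), np.random.rand(n, n)
print(f"mean over 10 runs: {mean_runtime(lambda: A @ B):.3f} s")
```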

A prelie algebra associated to a linear endomorphism and related algebraic structures

3. For all a, b, c ∈ g: (a • b) • c − a • (b • c) = (a • c) • b − a • (c • b).
4. For all a, b ∈ T(V): Δ(a • b) = a_(1) ⊗ (a_(2) • b) + (a_(1) • b_(1)) ⊗ a_(2) b_(2), with Sweedler's notation.
Our aim in this text is to give a generalization of the construction of g and its relatives H and G, and to study some general properties of this construction. Let us take any linear endomorphism f of a vector space V. We inductively define a pre-Lie product • on the shuffle Hopf algebra (T(V), ⧢, Δ), making it a Com-Pre-Lie Hopf algebra denoted by T(V, f) (Definition 1 and Theorem 2). For example, if x_1, x_2, x_3 ∈ V and w ∈ T(V): …

Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination

…that the number of modular reductions is smaller in the case of the tile recursive LU factorization, which is one motivation for the use of the tile recursive variant over a finite field.
The impact of grain size. The granularity is the block dimension (or the dimension of the smallest blocks in recursive splittings). Matrices with dimensions below this threshold are treated by a base-case variant (often referred to as the panel factorization [8] in the case of the PLUQ decomposition). It is an important parameter for optimizations: a finer grain allows more flexibility in the scheduling when running on numerous cores, but it also challenges the efficiency of the scheduler and can increase memory bus traffic. In numerical linear algebra, where cubic-time algorithms are used, the arithmetic cost is independent of the blocking; hence the granularity has very little impact on the efficiency of a block algorithm run sequentially. On the contrary, we saw in Table 1 that over a finite field a finer granularity can lead to a larger number of costly modular reductions. The use of sub-cubic variants for the sequential matrix multiplications is another reason why a coarser granularity leads to higher sequential efficiency. On the other hand, the granularity needs to be fine enough to generate enough independent tasks to be executed in parallel. Therefore, with a fixed number of resources, we rather set the number of tasks to be created (usually to the number of available cores, or slightly more), instead of setting a fixed small grain size as usually done in numerical linear algebra. Hence an increase in the dimensions results in a coarser granularity, making each sequential task perform more efficiently, as sketched below.
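
The task-count-driven choice of granularity fits in a few lines: target a number of tasks near the core count and derive the block dimension from the matrix dimension, instead of fixing a small grain up front. The oversubscription factor below is an arbitrary illustration:

```python
import math

def grain_size(n: int, cores: int, oversubscribe: float = 1.5) -> int:
    """Derive the block dimension from a target task count (~ cores, or
    slightly more), so larger matrices automatically get coarser blocks
    and each sequential task runs more efficiently."""
    n_tasks = max(1, round(cores * oversubscribe))
    return math.ceil(n / n_tasks)

for n in (4_000, 64_000):
    print(n, "->", grain_size(n, cores=32))   # block size grows with n
```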
