• Aucun résultat trouvé

SMURFF: A High-Performance Framework for Matrix Factorization Methods

N/A
N/A
Protected

Academic year: 2022

Partager "SMURFF: A High-Performance Framework for Matrix Factorization Methods"

Copied!
2
0
0

Texte intégral

(1)

SMURFF: a High-Performance Framework for Matrix Factorization Methods

(extended abstract)

Tom Vander Aa1, Imen Chakroun1, Thomas J. Ashby1, Jaak Simm2, Adam Arany2, Yves Moreau2, Thanh Le Van3, Jos´e Felipe Golib Dzib3, J¨org Wegner3, Vladimir Chupakhin3, Hugo Ceulemans3, Roel Wuyts1, and Wilfried

Verachtert1

1 ExaScience Life Lab at imec, Leuven, Belgiumtom.vanderaa@imec.be

2 ESAT-STADIUS, KU Leuven, Leuven, Belgium

3 Janssen Pharmaceutica, Beerse, Belgium

Abstract This is a 2-page condensed abstract of the paper published at [1].

Bayesian Matrix Factorization Matrix factorization is a common machine learn- ing technique for recommender systems, like books for Amazon or movies for Netflix [2]. The Bayesian Matrix Factorization (BMF) variant is especially pow- erful because it produces good results and is relatively robust against overfitting.

As sketched in Figure 1, the idea of these methods is to approximate the user- movie rating matrixRas a product of two low-rank matricesU andV such that R≈U×V. In this wayU andV are constructed from the known ratings inR, which is usually very sparsely filled. The recommendations can be made from the approximation U×V which is dense.

High-Performance Matrix Factorization with SMURFF While BMF is powerful, it is computationally very intensive and thus more challenging to implement for large datasets. In this work we present SMURFF a high-performance feature- rich framework to compose and construct different Bayesian matrix-factorization methods, namely:

R

=

U

V

M M

N

K

x K

𝐾 ≪ 𝑀 𝐾 ≪ 𝑁 Ris very sparse

Fig. 1.Low-rank Matrix Factorization

Copyright c2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0).

(2)

2 Authors Suppressed Due to Excessive Length

– BPMF, the basic version [4];

– Macau, adding support for high-dimensional side information to the factor- ization [7];

– GFA, doing Group Factor Analysis [6].

Applications of SMURFF The framework has been used very successfully in drug discovery. Here a key problem is the identification of candidate molecules that affect proteins associated with diseases [3]. SMURFF with Macau has been used to predict compound-on-protein activity on a matrix with more than one million compounds (rows) and several thousand proteins (columns) [8]. Chemical fingerprints were used for the compounds, in both dense and sparse formats. This has led to important new insights and potential new compounds to be used in drug discovery.

Getting SMURFF SMURFF is available as open-source [5] and can be used both on a supercomputer and on a desktop or laptop machine. Documentation and several examples are provided as Jupyter notebooks using SMURFF’s high-level Python API.

Acknowledgments This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 801051.

References

1. T. Vander Aa, et.al, “SMURFF: a High-Performance Framework for Matrix Factorization,” 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 304–308, 2019. [Online]. Available:

https://arxiv.org/abs/1904.02514

2. C. A. Gomez-Uribe and N. Hunt, “The netflix recommender system: Algorithms, business value, and innovation,”ACM Trans. Manage. Inf. Syst., vol. 6, no. 4, pp.

13:1–13:19, Dec. 2015. [Online]. Available: http://doi.acm.org/10.1145/2843948 3. M. Bredel and E. Jacoby, “Chemogenomics: an emerging strategy for rapid target

and drug discovery,”Nature Reviews Genetics, vol. 5, p. 262, Apr. 2004. [Online].

Available: http://dx.doi.org/10.1038/nrg1317

4. R. Salakhutdinov and A. Mnih, “Bayesian probabilistic matrix factorization using Markov chain Monte Carlo,” in Proceedings of the International Conference on Machine Learning, vol. 25, 2008.

5. “SMURFF matrix factorization source code,” https://github.com/ExaScience/

smurff.

6. S. Virtanen, A. Klami, S. A. Khan, and S. Kaski, “Bayesian group factor analysis.”

inAISTATS, 2012, pp. 1269–1277.

7. J. Simm, A. Arany, P. Zakeri, T. Haber, J. K. Wegner, V. Chupakhin, H. Ceule- mans, and Y. Moreau, “Macau: Scalable bayesian factorization with high- dimensional side information using mcmc,” in2017 IEEE 27th International Work- shop on Machine Learning for Signal Processing (MLSP). IEEE, 2017, pp. 1–6.

8. The ExCAPE Consortium. ExCAPE: Exascale Compound Activity Prediction En- gine. http://excape-h2020.eu/. Retrieved: June 2017.

Références

Documents relatifs

Process innovation refers to technological progress observed at the various levels of the agro-food sector linked to the local food heritages.. Manufacturing a new

Pr. BOUHAFS Mohamed El Amine Chirurgie - Pédiatrique Pr. BOULAHYA Abdellatif* Chirurgie Cardio – Vasculaire Pr. CHENGUETI ANSARI Anas Gynécologie Obstétrique. Pr. DOGHMI Nawal

In this section, we present experiments on natural images and genomic data to demonstrate the efficiency of our method for dictionary learning, non-negative matrix factorization,

On the basis of our prediction results and available experi- mental data, we propose that interplay of phosphorylation and GlcNAc modification at Ser and Thr residues in the

Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors.. Next, we extend the proposed models

However, a statistically significant difference based on the coverage analysis of the clone libraries (Table 2) was only obtained for the library from the water– sediment

With the English and French corpora, the Z score distance makes it possi- ble to correctly classify all the texts, while with the German collection, the authorship attribution

Multiple rating indexes were integrated into a user-item weighted matrix, as- suming the latent matrix obeyed the Gaussian distribution and density probabilistic dis-