Hybridization expansion Monte Carlo simulation of multi-orbital quantum impurity problems: matrix product formalism and improved sampling

(1)

()

Hybridization expansion Monte Carlo

simulation of multi-orbital quantum

impurity problems: matrix product

formalism and improved sampling

Hiroshi Shinaoka

1

_{, Michele Dolfi}

1

_{, Matthias Troyer}

1

_and

Philipp Werner

2

1_{Theoretische Physik, ETH Z}_{ürich, 8093 Zürich, Switzerland}

2_{Department of Physics, University of Fribourg, 1700 Fribourg, Switzerland}

E-mail: shinaoka@itp.phys.ethz.ch

Abstract. We explore two complementary modifications of the hybridization-expansion continuous-time Monte Carlo method, aiming at large multi-orbital quantum impurity problems. One idea is to compute the imaginary-time propagation using a matrix product state representation. We show that bond dimensions considerably smaller than the dimension of the Hilbert space are sufficient to obtain accurate results and that this approach scales polynomially, rather than exponentially with the number of orbitals. Based on scaling analyses, we conclude that a matrix product state implementation will outperform the exact-diagonalization based method for quantum impurity problems with more than 12 orbitals. The second idea is an improved Monte Carlo sampling scheme which is applicable to all variants of the hybridization expansion method. We show that this so-called sliding window sampling scheme speeds up the simulation by at least an order of magnitude for a broad range of model parameters, with the largest improvements at low temperature.

Keywords: quantum Monte Carlo simulations

ArXiv ePrint: 1404.1259 Published in -RXUQDORI6WDWLVWLFDO0HFKDQLFV

7KHRU\DQG([SHULPHQW3 which should be cited to refer to this work.

(2)

()

Contents

1. Introduction 3

2. Hybridization expansion algorithm 4

3. Imaginary time evolution with the Krylov subspace method 6

4. Quantum impurity model 7

5. Trace calculation with matrix product states 7

5.1. Matrix product state formalism . . . 7

5.1.1. Matrix product states (MPS) . . . 8

5.1.2. Compressing MPS . . . 8

5.1.3. Matrix product operators (MPO) . . . 8

5.1.4. Linear algebra with MPS and MPO . . . 9

5.2. Numerical details . . . 9

5.3. Accuracy of the MPS method . . . 10

5.4. Performance of the MPS method . . . 10

5.5. Discussion and future perspectives . . . 12

6. Improved Monte Carlo sampling 13 6.1. Conventional method . . . 14

6.2. Sliding-window approach . . . 14

6.3. Benchmark setup . . . 15

6.4. Benchmark results . . . 17

6.4.1. Insulating region: U = 6 and β = 50 . . . 17

6.4.2. Temperature and U dependence . . . 17

6.5. Discussion and future perspectives . . . 19

7. Summary 19

Acknowledgments 19

Appendix A. MPO for a model with uniform all-to-all interactions 20

Appendix B. MPO for general interactions 21

Appendix C. Ergodicity of the sliding-window approach 22

Appendix D. Detailed Monte Carlo update procedure 23

References 24

(3)

()

1. Introduction

Quantum impurity models appear in various contexts in condensed matter physics. An important example is the dynamical mean-field theory (DMFT) [1] for strongly cor-related electron systems. In a DMFT calculation, a corcor-related lattice model is mapped to an impurity problem whose bath degrees of freedom are self-consistently determined. Although the DMFT formalism was originally proposed for the single-band Hubbard model, it can be extended to multi-orbital systems and cluster-type impurities [2]. Furthermore, DMFT can be combined with density functional theory based ab initio calculations, to describe strongly correlated materials such as transition metal oxides [3]. For these applications, it is important to develop efficient algorithms to solve quan-tum impurity problems with multiple orbitals or sites.

In recent years, two complementary types of continuous-time quantum Monte Carlo (MC) impurity solvers have been developed, which are based on a stochastic sampling of perturbation expansions: the weak-coupling method [4] and the hybridization expan-sion method [5, 6]. The former approach is based on a perturbation expansion in pow-ers of the Coulomb interaction terms, while the latter one treats the local Coulomb interactions exactly and instead expands the partition function in the coupling between the impurity and the bath. For describing strongly correlated materials, the latter approach is typically favored because of its ability to treat general interactions such as spin flips and because the average perturbation order of the hybridization expansion is relatively low in the strongly correlated regime. The algorithm was further extended to treat retarded interactions [7], which has recently been used in a extended DMFT study of the effects of long-range interactions [8].

A drawback of the hybridization expansion approach is that the computational effort scales exponentially with the number of sites or orbitals, because the dimension of the Hilbert space grows exponentially. Without additional approximations, this lim-its the application to small impurity models with up to five orbitals, even if one uses an implementation based on sparse-matrix exact-diagonalization techniques [9].

On the other hand, various wavefuction based theories have been developed for interacting fermionic lattice models. In particular, the ground states of one-dimen-sional (1D) systems can be described essentially exactly by the formalism of matrix product states (MPS) [10] with reasonable computational effort. The MPS formalism is known to be equivalent to the density matrix renormalization group (DMRG) [11, 12]. It has also been used to solve impurity problems [13–17]. In such MPS based calculations, the bath is represented by a 1D chain (or 1D chains) attached to the impurity, which results in an exponential growth of the computational cost with the number of sites or orbitals in the impurity. Furthermore, it is not trivial to extend the formalism to a non-diagonal coupling between the impurity and the bath, or to retarded interactions.

A possible direction for the development of flexible impurity solvers for large multi-orbital systems may be to combine these two approaches, i.e., the hybridization expan-sion and the MPS formalism. In this paper, we propose and test such a combined approach, in which the local interaction is treated using an MPS representation. More specifically, we perform the imaginary time evolution, which is given by the local impu-rity Hamiltonian, using the MPS formalism. We test the accuracy of the imaginary

(4)

()

time evolution and compare its performance with that of the exact approach using a sparse-matrix exact-diagonalization technique.

Another direction of research is to develop a more efficient MC sampling algorithm. In the continuous-time MC method based on the hybridization expansion, one stochas-tically samples configurations represented by creation and annihilation operators of the local degree of freedoms on the imaginary time interval. In estimating the weight of a configuration, the most costly part in multiorbital cases is evaluating the trace of a matrix product over the local degrees of freedom of the quantum impurity. This matrix product consists of imaginary-time evolution operators as well as creation and anni-hilation operators. The cost of evaluating the trace grows as temperatures is lowered, because the expansion order increases.

The trace can be evaluated either by the matrix formalism [6, 18], by sparse-matrix exact-diagonalization techniques (Krylov method) [9] or by an MPS version of the Krylov method. In the former formalism, all operators are represented by matrices in the eigen-basis of the local Hamiltonian and the matrix product is computed by multiplying the matrices one by one. In the latter formalism, the trace is computed by performing the imaginary-time evolution starting from eigenstates using the basis in which operators are represented as sparse matrices. In this paper, we call this the Krylov method or Krylov-sparse-matrix method. It was shown that the Krylov method is superior in performance for impurity problems involving more than 4 orbitals as local degrees of freedom [9].

For the matrix formalism, an efficient MC sampling scheme based on a tree struc-ture has been proposed to suppress the growth of the computational cost at low tem-peratures [19]. Instead of recomputing the matrix product from scratch at each MC step, one reuses partial products of matrices that have been previously computed and stored. By using a tree data structure, the cost can then be reduced from O(β) to

O (log β), where β is the inverse temperature. However, these ideas based on storing

matrix products cannot be applied to the Krylov method. Thus, an alternative efficient MC sampling algorithm needs to be developed for the Krylov method.

The rest of the paper is organized as follows. In section 2, we describe the hybridiza-tion expansion algorithm. The Krylov method is described in sechybridiza-tion 3. The quantum impurity models used for the present study are defined in section 4. In section 5, we propose a combined approach of the Krylov method and the matrix-product formalism. We propose an improved MC sampling algorithm for the Krylov method in section 6. A summary is given in section 7

2. Hybridization expansion algorithm

A fermionic quantum impurity model is defined by the following Hamiltonian: H H= loc+Hmix+Hbath,

(1) where

∑

= + α β α β α β α β γ δ α β γ δ α β γ δ Hloc t c c U c c c c , , , , , , , , , † † † (2)

http://doc.rero.ch

(5)

()

†

∑

= α α α α H ε a a , k k k k bath , , , , (3) †

∑

= + α β α β α β H V a c h.c. k k k mix , , , , (4) The term Hloc describes an impurity with chemical potentials, intra-orbital hoppings

and two-body interactions, where α and β are combined orbital and spin indices. (We call the combined index of spin and orbital a flavor.) Hbath describes a

non-interacting bath with quantum numbers k and flavor α. The hybridization term Hmix

describes the exchange of electrons between the impurity and the bath.

In the hybridization expansion impurity solver, one expands the partition function

H

= −β

Z Tr[e ] with respect to the hybridization term Hmix as

∫

∑

∫

τ τ = = ⎡ ⎣ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥ = − ⎡_⎣⎢ ⎤_⎦⎥ β β τ τ β τ β β τ τ τ τ − − − = ∞ − − − − − β − − H H H H H H H H Z Tr[e ] Tr e Te d d ( 1) Tr e e e , n n n d ( ) 0 0 1 ( ) 2 ( ) 2 n n n n 1 0 2 1 1 1 1 1 1 (5)

where H H1= loc+Hbath and H2=Hmix and we employed the interaction picture.

In equation (5), the partition function Z is represented as the sum of all configura-tions c = {τ1, …, τn} with weight

H H

H H H

τ τ

= − − −β τ − −τ τ− −τ

w (6)c ( d ) Tr[en ( n) 1 2e (n n 1) 1 2e 1 1] d .n The weight can be simplified further by exploiting the fact that the time evolution of the impurity and the bath are not coupled by H2. By tracing out the bath degrees of

freedom, one obtains

τ † τ τ † τ τ α τ α τ α τ α τ = ⎡ ⎣⎢ ′ ′ ⎤⎦⎥ × { } { } { } { } β α _α α _α − − ′ ′ ′ ′ ′ ′ H w Tc c c c M d Z Tr e ( ) ( ) ( ) ( ) det ( , , , , ; , , , , )( ) . c n n n n n bath loc 1 1 1 1 1 1 1 1 1 2 n n loc 1 1 (7) Here, c represents a configuration with annihilation operators at τ1<…<τn with flavor α1, …, αn and creation operators at τ′1<…<τ′n with flavor α′1, …, α′n. The matrix element of M−1 at (i, j) is given by the hybridization function Δα αi′, j(τ τi′− j) defined in terms of εk, α and Vkα

b

,_{. The trace in equation (}₇_{) reduces to the form}

∑

⎡ ⎣⎢ ⎤⎦⎥ = Ψ | |Ψ β τ τ τ τ β τ τ τ τ − − − − − − − − − − − − − − H H H H H H O O O O O O Tr e e e e e e , n n m m n n m loc ( ) 2 ( ) 2 1 1 ( ) 2 ( ) 2 1 1 n n n n n n

2 loc 2 2 1 loc 1 loc

2 loc 2 2 1 loc 1 loc ₍₈₎

where O1, , O2n are time-ordered creation and annihilation operators appearing in equation (7). |Ψm〉 denotes an eigenstate of Hloc and the sum is over all eigenstates.

(6)

()

The contributions of the configurations c are stochastically sampled in the Monte Carlo simulation with the weight wc. When Hloc contains only chemical potentials and

density-density interactions, the occupation number basis is an eigensystem of Hloc. In

this case, equation (8) can be evaluated efficiently. Otherwise, the evaluation of equa-tion (8) is exponentially costly with respect to the number of orbitals in the impurity. In [9], it was shown that the sum over eigenstates can be restricted to ground states at low enough temperature. It was also proposed to evaluate the trace using the so-called Krylov subspace method described in the next section.

3. Imaginary time evolution with the Krylov subspace method

In evaluating the trace in equation (8), we perform an imaginary time evolution

H

τ

− _v

e (9) in each time-interval between creation/annihilation operators. We employ the Krylov subspace method in the same manner as in [9].

For a given Hamiltonian H and vector v, the Krylov subspace is defined as K =_span{_v_, H_v_, _, H − _v}_,

p p 1

(10) where p is the dimension of the subspace. Then, the full matrix exponential e−τHv is approximated by the matrix exponential of the Hamiltonian projected onto the Krylov space.

We construct an orthonormal basis for the Krylov subspace that tridiagonalizes H as † α β β α β β α = = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟ U HU T 0 0 1 1 1 2 2 2 3 (11)

by using the Lanczos method. Here, αi and βi are real numbers. The column vectors of

U are orthonormal basis vectors {ui} with u1 = v/v.

The basis vectors ui and the matrix elements αi, βi are obtained step by step for i = 1, 2, 3, … as follows: † α = u Hu ,i i i (12) α β α =⎧_⎨⎪⎪ ⎩⎪⎪ − = − − > + − − v Hu u i Hu u u i ( 1) ( 1), i i i i i i i i i 1 1 1 (13) β = ||i vi 1+ ||, (14) β = + + u (15)i 1 vi 1/ i.

http://doc.rero.ch

(7)

()

Convergence of the result is checked at each Lanczos step between equations (12) and (13) by evaluating the matrix exponential as

∑

β β = = τ τ τ − − = − v u u e H e H (e ) , i p T i i 0 1 0 1 1 (16) where β0 = v. The matrix exponential e−τT can be evaluated by a direct

diagonaliza-tion because of the small dimension of the Krylov subspace. In the following calcula-tions, we use the criterion |(e−τT)m1/(e−τT)11|<ε with the tolerance ε = 10−5.

4. Quantum impurity model

Throughout this paper, we consider an N-orbital impurity model with a “Slater-Kanamori” interaction. The Hamiltonian is

† † † †

∑

μ = − + _{⎡⎣ ′} + ′− _⎤⎦ − + σ σ σ σ σ ↑ ↓ > − ≠ ↓ ↑ ↓ ↑ ↑ ↓ ↑ ↓ H

(

)

Un n n U n n U J n n J c c c c c c c c ( ) , i i i i i i j i j i j i j i j j i j j i i loc , (17) where c_i† and ci are creation/annihilation operators of an electron at site i and ni≡ c ci† i. We take U′ = U − 2J and J = U/6. The chemical potential is chosen such that the system is at half filling: _{μ =}

₍

_n₋1

₎

_U_{− −}₍_n ₁₎ _J

2

5

2 . We consider an orbital-diagonal

hybridization function corresponding to a noninteracting model with semicircular den-sity of states of bandwidth 4.

While the interaction terms in equation (17) may not correspond to a rotationally invariant interaction for N > 3, we use this Hamiltonian for the purpose of benchmark calculations. We do not take into account the special conserved quantities [20] which enable a particularly efficient sampling of the Slater–Kanamori Hamiltonian. None of the procedures discussed in the following sections depend on a specific form of the Hamiltonian.

5. Trace calculation with matrix product states

In this section, we investigate the accuracy and efficiency of a combined Krylov and MPS approach. A brief introduction of the MPS formalism is given in section 5.1. In section 5.2, we describe the details of benchmark calculations. In section 5.3 we discuss the accuracy of the method, while the performance of the method is investigated in section 5.4. Future perspectives are given in section 5.5.

5.1. Matrix product state formalism

Here we provide a very brief overview of the MPS formalism. For details see the review by Schollwöck [21].

(8)

()

5.1.1. Matrix product states (MPS). Let us consider a one-dimensional lattice of length L, with a local Hilbert space of dimension d at each site. Hereafter, the dimension d is referred to as the local dimension. For instance, Hubbard models with S = 1/2 elec-trons have a local dimension d = 4: The local Hilbert space at site i can be spanned by |0〉, †

↓

c 0_i , c 0_i†_↑ , c c 0 _i† †_{↑ ↓}_i .

Any pure state can be represented in the form

⟩ ⟩ ⟩ ⟩

∑

∑ ∑

∑

σ σ σ σ σ σ |Ψ = | = ⎛ ⎝ ⎜⎜⎜ ⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟ ×| = | σ σ σ σ σ σ σ σ σ σ σ σ σ σ − M M M c , , M M M , L b b r r b b b b b L L , , , , 1 , , , , , 1, , , 1 , , 1 L L L L L L L L L L 1 1 1 1 1 1 1 1 2 2 1 1 1 2 (18)

where Mσl_(l= 1, …, L) are rank-3 tensors of dimension d × r

l−1× rl. At the left (l = 1) and right (l = L) edges, we take r0 = rL+1 = 1. The maximum value of bl is referred to as the bond dimension of the MPS.

The MPS formalism is the underlying variational approximation made by the DMRG algorithm [21]. For a non-critical 1D system with short-range interactions, the ground state can be described very accurately by an MPS with a small bond dimension of O (1). Note that the exponentially large tensor cσ1,,σL is reduced to a product of small

tensors of size O (1) because the entanglement entropy of the ground state is O (1) with respect to the system length.

5.1.2. Compressing MPS. An important remark is that MPS with a fixed bond dimen-sion do not form a vector space. For example, the sum of two MPS results in a larger bond dimension as discussed later in section 5.1.4. In general, an MPS with a larger bond dimension can contain more information. Thus, to keep the bond dimension bounded, one may have to reduce the bond dimension after an operation, while keeping the loss of accuracy as small as possible. This can be done by an algorithm based on the so-called singular value decomposition (SVD). A truncation of the bond dimension from D′ to D costs O(dD′3L) for D′ D.

5.1.3. Matrix product operators (MPO). Matrix product operators are a natural gen-eralization of the MPS concept to operators. Let us consider an arbitrary operator O :

₌

∑

_{| 〉〈 ′|}_{σ σ} σ σ σ σ ′ ′ O O . , , (19) The idea of MPS is directly applicable to operators by regarding (σlσ′l) as one big index at each site. That is, the coefficients are represented as a product of local tensors as follows:

∑

= σ σ σ σ σ σ = σ σ σ σ σ σ σ σ′ ′ ′ − ′ W W′ ′ W ′ O W W W , b b r r b b b b b , , , , , 1, , , L L L L L L L L 1 1 1 1 1 1 2 2 2 1 1 1 2 2 (20) where the W ’s are now rank-4 tensors. The maximum value of bl is referred to as the bond dimension of the MPO. We discuss how to construct an MPO for a given Hamiltonian in section 5.2.

(9)

()

5.1.4. Linear algebra with MPS and MPO. We can perform fundamental operations in quantum mechanics in the framework of MPS and MPO. One of the simplest examples is the summation of two wavefunctions |ϕ1〉 and |ϕ2〉, as is required in equations (13).

The sum of two MPSs with bond dimensions D1 and D2, respectively, has a bond

dimension of D′ D1 + D2. This can be understood by considering the sum of two MPS

with bond dimension one:

∑

φ | 〉 = σ σ|σ〉 σ A A , 1 1 L (21)

∑

φ | 〉 = σ σ|σ〉 σ B B . 2 1 L (22) One can easily see that the sum is given by

∑

σ σ ⎛_⎝⎜⎜ σ ⎞_⎠_{⎟⎟⎟× ×}⎛_⎝⎜⎜ ⎞_⎠_⎟⎟⎟ |σ〉 σ σ σ σ σ σ − −

( )

A B A B A B A B ( ) 0 0 0 0 , L L L L 1 1 2 2 1 1 (23) with a bond dimension of two. This can be extended to larger bond dimensions in a straightforward way. A sum of two MPS of bond dimension D requires only O(dD2_L)

operations. However, it may be necessary to compress the resulting MPS to keep the bond dimension bounded at D. This cost dominates over the summation for D 1 because the compression is O(dD3_L).

Another important operation is applying an operator O to a wavefunction |ϕ〉, such as applying the Hamiltonian to a wavefunction in equation (13). Let us consider an MPO of bond dimension DW and an MPS of bond dimension D. In this paper, we adopt an iterative approach which minimizes the residual | 〉− | 〉φ O φ 2 with respect to φ 〉 for a fixed bond dimension D. This algorithm scales as O (LD3DWd) for 1 DW D [21].

5.2. Numerical details

The simulations in this section are carried out for the impurity model given in section 4. We take U = 6 and J = U/6 and β = 50. The Hamiltonian (17) can be represented by an MPO with a bond dimension of DW∝Norb2 because the MPO for each term in

equa-tion (17) has a bond dimension of 1. This means that the computational effort scales polynomially with Norb as O D N( 3 _orb3 ). A further speed-up can be achieved by

compress-ing the MPO. As will be explained in appendix A, the Hamiltonian can be represented by a more compact MPO with bond dimension eight irrespective of Norb because the

two-body interactions are homogeneous. Using this compact MPO, the computational effort now scales as O D N( 3 )

orb .

Although we consider the Slater–Kanamori interaction in this paper, the approach can be applied to any impurity model including general one- and two-body interactions like intra-orbital hopping and correlated hopping. As explained in appendix B, any one- and two-body interaction term can be represented by an MPO with bond dimen-sion one. The compresdimen-sion of the MPO for the Hamiltonian is also possible for general one- and two-body interactions.

(10)

()

The following calculations were performed on a 2GHz Intel Core i7 CPU (Ivy Bridge), without parallelization. We used the Intel C++ Compiler v13.0 and the Math Kernel Library. The imaginary-time evolution was implemented with the MAQUIS/ DMRG code [22]. The following results were obtained without exploiting good quan-tum numbers such as the total electron number. We found that exploiting conserved quantum numbers does not reduce the computational cost for the small bond dimen-sions D 50 used in this study.

We measure the timings and the accuracy using the Krylov-sparse-matrix and Krylov-MPS methods as follows. First, we perform Monte Carlo simulations with an exact solver (Krylov-sparse-matrix solver) in the same way as in [9]. After thermaliza-tion, we randomly select several configurations and measure timings. Calculations with the MPS method are then repeated for the same configurations using the MAQUIS/ DMRG code. In the following, we measure the timings and the accuracy of the imagi-nary-time evolution for the ground state of the largest subspace with (N_↑, N_↓) = (3, 2), (4,3), (4,4), (5,5) for Norb = 5, 6, 7, 8 and 10, respectively.

5.3. Accuracy of the MPS method

First, we discuss the accuracy of the MPS formalism. In figure 1, we show the convergence of the calculated value of the trace with respect to the bond dimension D. The impurity sizes are Norb = 5, 7, 8 and 10. The relative error is defined as |( ( )t D −texact) /texact|,

where t(D) and texact are the values of the trace calculated by the MPS formalism and

the exact solver, respectively. The expansion order per flavor Nexp is 4.5–5 for Norb=5

and 3–3.5 for Norb= 7, 8 and 10, respectively.

As seen in figure 1, the relative error of the MPS method decreases rapidly as D increases. For Norb=5, the results are already converged at D = 8 for all sets of τ{O ( )i }. For the largest system, i.e., Norb=10, the relative error is well converged (and below 10−7) at D = 16, even though the dimension of the Hilbert space is

( )

10 =

5 63 504.

These results show that the MPS formalism yields accurate results even with a bond dimension considerably smaller than the dimension of the Hilbert space.

5.4. Performance of the MPS method

Next, we compare the performance of the two methods. Figure 2 shows the timing for an imaginary time evolution in the interval [0, β]. It is clearly seen that the timing for the exact solver increases exponentially with Norb. The red broken line in figure 2 is a fit by

CN 4N ,

orb2 orb

(24) where C is a positive constant. Equation (24) is derived as follows. The most costly oper-ation in the imaginary time evolution is applying a sparse matrix H to a dense vector in equations (12) and (13). Each such operation costs O N( orb2DHilbert), where the dimen-sion of the largest subspace DHilbert is given by

⎛ ⎝ ⎜⎜⎜NN / 2⎞_⎠⎟⎟⎟⎟∝4 /N N orb orb orb

orb _{. Assuming that the} expansion order per orbital is O(1), we immediately arrive at equation (24). As shown in figure 2, the data are well fitted by equation (24) for Norb⩾7 with C = 2.5 × 10−8.

(11)

()

On the other hand, the timing for the MPS formalism is expected to scale as

O D N( 3 )

orb2 for a fixed D. This comes from the fact that applying H to an MPS costs

O D N( 3 )

orb . As seen in figure 2, the data are indeed well fitted by the expected scaling

−

a N (25)( orb2 b)

with a and b positive constants. We note that the estimated value of a increases only slightly from 0.322 to 0.896 as D increases from D = 16 to D = 30, though one expects a (30/16)3 (≃ 6.59) time increase. This may be due to overhead in treating many small matrices for small D. This can be seen more explicitly when we plot the timings as

Figure 1. Convergence of the value of the trace with respect to the bond dimension

D for Norb=5, 7, 8, 10, respectively.

0.00 0.05 0.10 0.15 0.20 0.25 1/D 10−11 1010−10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 1010−1 0 101 Relativ e error Norb= 5 0.00 0.05 0.10 0.15 0.20 0.25 1/D 10−11 1010−10−9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 1010−1 0 101 Relativ e error Norb= 7 0.00 0.05 0.10 0.15 0.20 0.25 1/D 10−11 1010−10 −9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 1010−1 0 101 Relativ e error Norb= 8 0.00 0.05 0.10 0.15 0.20 0.25 1/D 10−11 1010−10 −9 10−8 10−7 10−6 10−5 10−4 10−3 10−2 1010−1 0 101 Relativ e error Norb= 10

http://doc.rero.ch

(12)

()

a function of D for each Norb in figure 3. It is obvious that the timing increases more

slowly than the expected asymptotic scaling O (D3) for D 50.

Even for the largest Norb considered (Norb=10), the MPS formalism with D = 16 runs about

10 times slower than the exact solver. However, the MPS formalism is expected to become more efficient than the exact solver for larger Norb. Extrapolating the timings of the two

methods using equations (24) and (25), the crossover point is estimated to be Norb=12–13, with only a slight dependence on the value of D (see the lower panel of figure 3).

5.5. Discussion and future perspectives

Our results show that the Krylov-MPS formalism can be potentially superior to the exact Krylov-sparse-matrix solver for a large number of orbitals Norb . Impurity 12 problems with Norb⩾12 are relevant for example for cluster-type DMFT calculations of multi-orbital Hubbard models. However, the MC simulation of such large impurity problems is not feasible at the moment with our present code. (To date, simulations with hybridization expansion solvers have been restricted to at most 7 orbitals.) Thus, in this section, we discuss how the performance might be improved.

In a MC simulation, we update {O ( )τi } by an elementary update such as insert-ing or removinsert-ing a pair of annihilation and creation operators. Each operator must be updated before the MC sampling loses its memory of the original configuration. Thus, the autocorrelation time τauto is expected to be roughly N N2 orb exp/pacc in units

of elementary updates N( exp is the expansion order per flavor). The acceptance rate

pacc depends on the system and on parameters such as β. It is typically on the order

Figure 2. Norb dependence of the timings for the imaginary time evolution in the

interval [0, β]. The data are averaged over 10 different operator configurations τ

{O ( )i } for each Norb. The red broken line and the dotted black line are the fit by

equations (24) and (25), respectively.

0 50 100 150 200 250 Norb2 0 20 40 60 80 100 120 140 Timings (sec) MPS (D = 16) MPS (D = 30) Exact solver 4 6 8 10 12 14 16 Norb 10−4 10−3 10−2 10−1 100 101 102 103 104 Timings (sec) MPS (D = 16) MPS (D = 30) Exact solver

http://doc.rero.ch

(13)

()

of 0.01–0.1. Assuming pacc=0.1 and Nexp=3 (a typical value in the strongly correlated

regime and for temperatures of about 1% of the bandwidth) for Norb=12, we obtain

τauto 700 elementary updates. Recalling that the timing for evaluating the trace is

O (102_{) seconds (see figure}₂_{), the autocorrelation time is estimated to be O (10}5_{) s}

or 30 h. In a weakly correlated metal, where the perturbation order is higher, the autocorrelation time is on the order of a week. This is too long for practical DMFT calculations.

There are possible ways to reduce the autocorrelation time. First, we can increase the acceptance rate pacc by proposing several candidates at each MC update. Evaluating

their weights can be assigned to different nodes. By using the heat bath algorithm or a better algorithm [23], the acceptance rate pacc can be increased to almost 1. Another

factor of 10 can be gained by using the improved MC sampling introduced in section 6, which avoids recomputing the full imaginary-time evolution from scratch. By using these two tricks, the autocorrelation time τauto can be redued to O (103) s, which is still

not short enough for practical applications.

Another possible way is to speed up the imaginary time evolution by parallel com-puting. However, this is not trivial because the bond dimensions of MPS and MPO tensors are quite small in the case of quantum impurity problems.

6. Improved Monte Carlo sampling

In this section, we propose an improved Monte Carlo sampling procedure which signifi-cantly reduces autocorrelation times for multi-orbital impurity problems. This sampling strategy can be used both in the Krylov-sparse-matrix method and the Krylov-MPS method. After reviewing previously proposed improved sampling strategies in section 6.1, we describe our new MC sampling scheme in section 6.2. We explain the details of benchmark calculations in section 6.3. The benchmark results are shown in section 6.4 and future perspectives are discussed in section 6.5.

Figure 3. Bond-dimension D dependence of the timings for an imaginary time

evolution in the interval [0, β]. Data for Norb=7, 8 and 10 are shown. The different points represent data for different operator configurations {O ( )τi }.

100 ₁₀1 ₁₀2 D 100 101 102 103 Timings (sec) ∝ D3 ∝ D Norb= 7 Norb= 8 Norb= 10

http://doc.rero.ch

(14)

()

6.1. Conventional method

In a hybridization-expansion continuous-time Monte Carlo simulation, one updates a current configuration by proposing a new configuration which is slightly different from the current one: For example, one tries to insert or remove a pair of creation and anni-hilation operators. Then, the new configuration is accepted stochastically according to the ratio of the weights of the current and new configurations. Naively, one might evaluate the weight of the new configuration by performing an imaginary time evolu-tion in the full interval [0, β]. However, the computational cost of such a calculation grows linearly with the expansion order Nexp. This is costly at low temperature or in a

metallic phase, where the expansion order is large.

For the matrix formalism, in which the trace is evaluated using matrix products, Haule proposed a trick to improve the efficiency of the Monte Carlo sampling [18]. Here, we introduce a related idea in the context of the Krylov method. The improved sampling strategy proposed in [18] is based on the observation that the insertion or removal of pairs of operators is predominantly a local (in imaginary time) process. In other words, the acceptance rate for an insertion or removal of a pair of operators with a large time difference is very low. It was furthermore proposed to do the time evolu-tion from both sides, storing the resultant matrix products at several intermediate τ points. Then, when trying to insert or remove a pair of operators with a short time difference, the evaluation of the trace only requires the time evolution in a short time interval. Although this makes trial steps cheaper, one has to recompute the interme-diate results once an update is accepted. Since this requires the time evolution from both sides, which typically costs O N( exp), the method does not change the scaling with respect to Nexp. Thus, the method gives a significant improvement in performance only

when the acceptance rate is quite low and Nexp is not so large.

6.2. Sliding-window approach

We now propose an improved update scheme in which the computational cost of an elementary update stays constant with respect to the expansion order Nexp. Although the

exponential scaling with the number of orbitals is not affected, this method substantially reduces the prefactor of the scaling at low temperatures. Although we explain the idea in the context of the Krylov algorithm, it can be applied to the matrix formalism as well.

First, to make the maximum use of the locality in the imaginary time, we introduce an upper bound tmax on the time difference between the two operators which we try to

insert or remove. As mentioned in the previous study, tmax can be almost independent

of β and Nexp. In addition, we introduce an imaginary-time window in which updates

are allowed (see figure 4(a)). The window width τwin=β/Nwin is taken to be larger than

(but on the order of) tmax. Now, similarly to [18], one performs the time evolution from

both sides and stores the results at the end-points of the window. This allows us to evaluate the trace for a new configuration at constant cost.

After several updates, the window is moved to the next position by τ / 2win (see

figure 4(b)). Concurrently, one updates the wave vectors at the end-points, which again costs only O(τwin)=O(1). This procedure is repeated so that the window moves back and forth in the whole interval [0, β]. This procedure is ergodic because we can produce operator pairs with arbitrary separation by inserting one with a short separation and then gradually

(15)

()

increasing the separation through MC updates. We refer to appendix C for a proof of ergo-dicity. The advantage of our algorithm is that the cost of each MC step is independent of β and reperforming the time evolution over the full time interval is not required.

By definition, τwin needs to be larger than tmax. In practical simulations, however, τwin

has a stricter lower bound:

τ > twin 2 max.

(26) Let us consider a pair of operators with a time difference of tmax as shown in figure 4(c).

When τ < twin 2 max, this pair does not fit into any of the local windows shown there and

thus cannot be removed by a single elementary update. Although this does not break the ergodicity, the autocorrelation time may increase. This problem can be avoided by taking τ > twin 2 max.

If the window moves from one side to the other fast enough compared to the auto-correlation time, the sequential sweep should not badly affect the autoauto-correlation time. As discussed in section 5.5, the autocorrelation time is O

(

Nexp orbN /pacc

)

. On the other

hand, moving the window from one side to the other takes 2Nwin=O N N( orb exp) MC steps. Because these two time scales are always of the same order, the autocorrelation time should not be severely affected.

6.3. Benchmark setup

In this section, we show benchmark results for a 5-orbital impurity problem. At each position of the window, we try to insert or remove a pair Nf times, where Nf is the

Figure 4. (a) Inserting a pair of operators in the window. Updates are allowed only

in the window. (b) The window is moved by τ / 2win from (a). Now, the pair { c2†,c2} can be removed. (c) The pair does not fit into any of the three windows. This can happen if τ < twin 2 max.

(a) (b) removal insertion (c) windows

http://doc.rero.ch

(16)

()

number of flavors. When trying to insert a pair, the time difference between the opera-tors is chosen randomly in the interval [0, tmax]. Correspondingly, when we try to remove a pair in the window, we first list all pairs of creation and annihilation opera-tors with a time difference equal to or less than tmax. Then, we try to remove one of

them. The detailed procedure is described in appendix D.

The following simulations were done by the Krylov algorithm based on sparse-matrix techniques on one CPU core of AMD Opteron 6174 (2.2 GHz). We divide the Hilbert space into sectors by using the total particle number N and magnetiza-tion Sz as conserved quantum numbers. All data are averaged over four independent MC runs of fixed 1.28 × 107 steps. The diagonal Green’s function G(τ) is measured on 1001 points on the imaginary-time interval. We symmetrize G(τ) by using the particle-hole symmetry. The autocorrelation time of G(τ) is estimated by a binning analysis for each time point using bins of 16 384 MC steps. Simulations were per-formed using the ALPS libraries [24].

Figure 5. Benchmark results for a 5-orbital model with U = 6 and J = 1. (a) Distribution of

the time difference of a successfully removed or inserted pair of operators in the MC sampling performed with tmax/β =1 and Nwin=1. (b) Timing per MC step. The broken line represents

the relation timing∝τwin. (c), (d) Autocorrelation time of G(τ) in units of MC steps (c) and

seconds (d). 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 τpair/β 10−5 10−4 10−3 10−2 10−1 100 Distr ib ution (a.u.) (a) 10−2 ₁₀−1 ₁₀0 Nwin 10−3 10−2 10−1 Timing per MC step (sec) (b) 0.0 0.1 0.2 0.3 0.4 0.5 τ /β 101 102 103 104 A utocorrelation time (MC step) (c) Nwin= 1, dmax/β = 1 Nwin= 1, dmax/β = 0.02 Nwin= 10, dmax/β = 0.02 Nwin= 20, dmax/β = 0.02 0.0 0.1 0.2 0.3 0.4 0.5 τ /β 10−1 100 101 102 103 A utocorrelation time (sec) (d) Nwin= 1, dmax/β = 1 Nwin= 1, dmax/β = 0.02 Nwin= 10, dmax/β = 0.02 Nwin= 20, dmax/β = 0.02

http://doc.rero.ch

(17)

()

6.4. Benchmark results

6.4.1. Insulating region: U = 6 and β = 50. In figure 5(a), we show the distribution

of the time difference between pairs of operators successfully removed or inserted in the Monte Carlo sampling performed with tmax/β =1 and Nwin=1. As expected, we see

that the accepted updates are local in imaginary time. The distribution decreases expo-nentially for large time differences. We found that the range τpair/β⩽0.02 accounts for almost 94% of the successful updates. Considering the condition in equation (26), we take tmax/β =0.02 and Nwin⩽20. Since Nexp5.6, the window contains on average 5.6

operators for Nwin=20.

In figure 6, we show the Green’s function G(τ) computed using tmax/β =0.02 for different values of τwin. We also present data obtained for tmax/β =1 for comparison. All

the data shown are consistent within error bars, indicating our algorithm works cor-rectly. However, we found that G(τ) for Nwin=1 and tmax=1 is systematically smaller

than the others. This may be because the autocorrelation time is too long for the MC simulation to be thermalized.

In figure 5(b), we show the Nwin dependence of the timing per MC step for tmax/β =0.02.

As expected, the timing decreases linearly with the window size τwin. The estimated

autocorrelation time is shown in figures 5(c) in units of MC steps. For Nwin=1, the autocorrelation time is shorter by one order of magnitude for tmax/β =0.02 compared to that for tmax/β =1. This is consistent with the increase in the acceptance rate from 0.022 to 0.34 by introducing the cutoff. Now, we discuss the Nwin dependence. Around

τ ≃ β/2, the autocorrelation time is not affected badly by introducing the window, consistent with the above argument. Although the autocorrelation is affected around τ = 0.01, the increase is considerably smaller than the reduction in the CPU time.

Figure 5(d) shows the autocorrelation time in units of seconds. It is clearly seen that the autocorrelation becomes shorter in the entire τ region as Nwin increases up to

=

Nwin 20. The improvement is as much as two orders of magnitude from the most naive approach (Nwin=1 and tmax/β =1) to the best case (Nwin=20 and tmax/β =0.02).

6.4.2. Temperature and U dependence. Figure 7(a) shows the distribution function of

the length of successfully inserted and removed pairs of operators for different values of

Figure 6. G(τ) computed at U = 6 and β = 50. The inset shows a log-scale plot.

0.00 0.01 0.02 0.03 0.04 0.05 τ /β 0.0 0.1 0.2 0.3 0.4 0.5 − G (τ ) Nwin= 1, tmax/β = 1 Nwin= 1, tmax/β = 0.02 Nwin= 10, tmax/β = 0.02 Nwin= 20, tmax/β = 0.02 0.0 0.1 0.2 0.3 0.4 0.5 10−4 10−3 10−2 10−1 100

http://doc.rero.ch

(18)

()

β and U. The weakly correlated metallic region corresponds to U 2. First, we discuss the temperature dependence for U = 6. Comparing the data for β = 25 and β = 50, one can see that the distribution becomes more localized at low temperatures. This may be because the hybridization function Δ(τ) decays more rapidly with τ similarly to G(τ) at low temperatures. This result indicates that our improved MC sampling works even better at low temperatures. On the other hand, although the distribution becomes broader at smaller U, the distribution still decays exponentially at long distances. In figures 7(b) and (c), we plot the τwin dependence of the autocorrelation time averaged

over the interval 0<τ<β. We took tmax=0.05, 0.02 and 0.025 for (U = 6, β = 25), (U = 6, β = 50) and (U = 2, β = 50). It is clearly seen that the autocorrelation time scales linearly with τwin down to the lowest τwin for all parameter sets. This indicates the

robustness of the sliding window approach.

Figure 7. (a) Distribution of the time difference of a pair of operators successfully

removed or inserted in the MC sampling. (b), (c) τwin dependence of the

autocorrelation time in CPU time. The autocorrelation time is averaged over the interval of 0 < τ < β. In (c), the data are normalized by the timings for τwin/β =1.

0.00 0.02 0.04 0.06 0.08 0.10 τpair/β 10−2 10−1 100 Distribution (a.u.) (a) U = 6, β = 25 U = 6, β = 50 U = 2, β = 50 10−2 ₁₀−1 ₁₀0 τwin/β 10−1 100 101 102 A utocorrelation time (sec) (b) U = 6, β = 25 (tmax/β = 0.05) U = 6, β = 50 (tmax/β = 0.02) U = 2, β = 50 (tmax/β = 0.025) 10−2 ₁₀−1 ₁₀0 τwin/β 10−2 10−1 100 A utocorrelation time (a.u.) (c) U = 6, β = 25 (tmax/β = 0.05) U = 6, β = 50 (tmax/β = 0.02) U = 2, β = 50 (tmax/β = 0.025)

http://doc.rero.ch

(19)

()

6.5. Discussion and future perspectives

A simple way to choose the window size is to measure the distribution function of the distance between successfully inserted or removed pairs of operators during the thermalization process. Then, one can choose a reasonable cutoff tmax such that

most of the distribution, say 95%, is contained within the cutoff. The window size τwin=β/Nwin is then given by the minimum size that satisfies the lower bound given

in equation (26).

Further improvements of the efficiency may be possible by using the heat-bath algorithm or a better algorithm [23] where we propose several candidates at each update. This allows to increase the acceptance rate and reduce the autocorrelation time.

There are other kinds of local updates with acceptance rates higher than inserting/ removing pairs of operators. Examples include shifting an operator on the imaginary time axis or swapping two nearest neighboring operators. Introducing such efficient updates helps in practical calculations.

7. Summary

In this paper, we discussed two complementary approaches based on the hybrid-ization-expansion continuous-time Monte Carlo method for multi-orbital systems. First, we proposed to combine the Krylov approach with the MPS/MPO representa-tion of states and operators. We found that highly accurate results can be obtained by using bond dimensions considerably smaller than the dimension of the whole Hilbert space. Based on a scaling analysis, we showed that the performance becomes superior to the conventional method for quantum impurity problems involving more than 12 orbitals.

Second, we proposed an improved Monte Carlo sampling algorightm for the hybridization expansion Monte Carlo method. Detailed benchmark tests were car-ried out for a 5-orbital impurity model. We showed that the new algorithm works robustly for a broad range of on-site repulsions and temperatures. In particular, we confirmed that the “sliding window” approach works particularly efficiently at low temperatures and we expect that it will be useful in the study of phenom-ena emerging at low temperatures. The sampling scheme is easy to implement in existing Monte Carlo codes and applies to any variant of the hybridization expansion method.

Acknowledgments

We thank Iztok Pizorn for discussions on matrix product states. We also thank Jakub Imriska, Hidemaro Suwa, Hugo Strand, Lei Wang and Li Huang for useful comments on the improved Monte Carlo sampling. HS and PW acknowledge support from the DFG via FOR 1346 and SNF Grant 200021E-149122. This project was supported by ERC grant SIMCOFE. Simulations were performed using the ALPS libraries [24].

(20)

()

Appendix A. MPO for a model with uniform all-to-all interactions

Let us consider a Hamiltonian with uniform all-to-all interactions:

⩾ ⩾ H=

_{∑ ∑}

+

_∑

= < = A B O , n N i j i j L i n j n i L i 1 1, 2, ( ) ( ) 1 op (A.1) where Ai n ( ) and Bj n ( )

are operators acting on the local Hilbert spaces on sites i and j, respectively. Oi is an operator acting on site i. A compressed MPO can be explicitly constructed for this kind of model with all-to-all uniform interactions.

The Hamiltonian in equation (A.1) may be written in the form H= WW1 2W ,L

(A.2) where Wi is a matrix whose elements are operators acting on the local Hilbert space at site i. The Wi are given as follows:

=

(

)

W _I _A _A N _{O ,} 1 (1) ( op) (A.3) = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟ W I A A O I B I B I 0 0 0 0 0 0 0 0 0 0 0 0 0 , i N N (1) ( ) (1) ( ) op op (A.4) = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜⎜ ⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ W O B B I , L N (1) ( op) (A.5)

where 1 < i < L and I denotes the identity operator. One can see that equation (A.2) is in the MPO form with bond dimension Nop+2 when each element in Wi is regarded as a 4 × 4 matrix.

For the multi-orbital Hubbard model given in equation (17), one obtains an MPO of bond dimension eight by taking

= ↑ A (A.6)(1) n , = ′− ↑+ ′ ↓ B (A.7)(1) (U J n) U n , = ↓ A (A.8)(2) n , = ′− ↓+ ′ ↑ B (A.9)(2) (U J n) U n , † = + ≡ ↑ ↓ A_(A.10)(3) S ( c c ),

http://doc.rero.ch

(21)

()

= − − ≡− ↓ ↑ B (A.11)(3) JS ( Jc c† ), = − A (A.12)(4) S , = − + B (A.13)(4) JS , † † = + ≡ ↑ ↓ A (A.14)(5) D ( c c ), = − − ≡− ↑ ↓ B (A.15)(5) JD ( Jc c ), = − A (A.16)(6) D , = − + B (A.17)(6) JD , = ↑ ↓ O (A.18)Un n .

For the local Hilbert space spanned by |0〉, |c_i†_↓ 0⟩, c | 〉_i†_↑ 0 , | 〉c c_i† †_{↑ ↓}_i 0 , = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟ ↑ c 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 , † (A.19) = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜ − ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟ ↓ c 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 , † (A.20) † = ↑ ↑ ↑ n (A.21)c c , = ↓ ↓ ↓ n (A.22)c c .†

Appendix B. MPO for general interactions

In this appendix, we show how the MPS formalism is extended to general interac-tions. Let us begin by showing the MPO representation of annihilation and creation operators. In the operator representation, they look like

_{⊗ ⊗ ⊗}₋ _{⊗ ⊗} ₊ _⊗

f1 f2 f_i 1 Oi Ii 1 I ,L

(B.1) with the site index explicitly shown. We omit the spin index for simplicity. Here, O is the matrix representation of the annihilation or creation operators given in appendix A. The operator f_i counts the number of particles and is given by

= ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜ − − ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟ f 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 (B.2)

http://doc.rero.ch

(22)

()

in the local basis introduced in appendix A. Therefore, annihilation and creation opera-tors are obviously represented by an MPO with bond dimension one. Since the product of two MPO with bond dimension one has bond dimension one, any product of annihi-lation and creation operators can be represented by an MPO with bond dimension one. For example, a correlated hopping term n c c _{1 2 4}† reads

†

⊗ ⊗ ⊗ ⊗ ⊗

n (B.3)1 c f2 2 f3 f c4 4 I5 .

The summation over the site index can be explicitly taken in a way similar to that in appendix A. Let us consider the sum of correlated hopping terms

†

∑

≠ ≠ n c c i j k i j k (B.4) as an example. For simplicity, we restrict ourselves to the case i < j < k. In this case, the sum is represented by the MPO with the following local tensors:

=

(

)

W (B.5)1 I n 0 0 , † = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟⎟ < < W I n I c f f f c I i L 0 0 0 0 0 0 0 0 0 (1 ), i (B.6) = ⎛ ⎝ ⎜⎜⎜ ⎜⎜⎜ ⎜⎜ ⎞ ⎠ ⎟⎟⎟ ⎟⎟⎟ ⎟⎟⎟ W I 0 0 0 . L (B.7)

Appendix C. Ergodicity of the sliding-window approach

In this appendix, we show that the MC sampling based on the sliding window approach is ergodic. In particular, we show that a pair of operators with an arbi-trary time difference can be inserted by repeated insertions and removals of pairs with a short time difference. The procedure is illustrated in figure B1. First, we insert a pair in the window as shown in figures B1(a) and (b). Then, the window is moved to the next position (figure B1(c)). As shown in figure B1(d), the distance between the operators can be increased by inserting a new pair and removing two operators in the middle because the two windows are overlapping each other. By repeating this procedure, one can create a pair with an arbitrary time difference. One can also remove any pair of operators, independent of the time difference, by reversing the above procedure. Therefore, it is obvious that one can transform any configuration into any other configuration by inserting and removing pairs within the sliding window.

(23)

()

Appendix D. Detailed Monte Carlo update procedure

The local Monte Carlo update procedure has been described in section II B of [18]. In this appendix, we explain how this procedure is modified when the cutoff tmax and the

sliding window are introduced.

Let us consider an attempt to insert a pair of creation and annihilation operators of flavor f at τc and τa. More specifically, we first choose τc randomly and uniformly in

the window. Then, τa is choosen randomly and uniformly in the window under the

con-straint | − | tτ τc a ⩽ max. The reverse process of this update is removing one operator pair of

flavor f whose length is equal to or less than tmax.

We first discuss the case without a cutoff tmax. The window is located on the interval

τ τ

[ winmin, winmax] with τwin=τwinmax−τwinmin. The weights of the original and new configurations are

denoted by worg and wnew, respectively. The probability to accept this insertion is

τ = ⎡ ⎣ ⎢ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥ ⎥ P w w N min 1, new _t , org win2 pairmax (D.1) where τwin=τwinmax−τwinmin is the size of the window and Npair is the number of operator pairs

of flavor f in the window after the insertion.

By introducing a cutoff tmax, the probability is changed to

τ τ = ⎡ ⎣ ⎢ ⎢ ⎢ Δ ⎤ ⎦ ⎥ ⎥ ⎥ P w w N min 1, new _t , org win a pairmax (D.2) where τ τ τ τ τ

Δ = (D.3)a min( c+tmax, winmax)−max( c−tmax, winmin)

Figure B1. How to insert a pair of operators with an arbitrary time difference.

(b) (c) (a) insertion window move insertion (d) removal (e)

http://doc.rero.ch

(24)

()

and Nt

pairmax is the number of operator pairs of flavor f whose length is equal to or less than

tmax in the window after the insertion.

The probability to accept an attempt to remove a pair at τc and τa is

correspond-ingly given by τ τ = ⎡ ⎣ ⎢ ⎢ _Δ ⎤ ⎦ ⎥ ⎥ P w w N min 1, , t new old pair win a max (D.4) where Nt

pairmax is the number of operator pairs of flavor f for the original configuration.

References

[1] Georges A, Kotliar G, Krauth W and Rozenberg M J 1996 Rev. Mod. Phys. 6813–125 [2] Maier T, Jarrell M, Pruschke T and Hettler M H 2005 Rev. Mod. Phys. 771027–80

[3] Kotliar G, Savrasov S Y, Haule K, Oudovenko V S, Parcollet O and Marianetti C A 2006 Rev. Mod. Phys.

78865–951

[4] Rubtsov A N, Savkin V V and Lichtenstein A I 2005 Phys. Rev. B 72035122

[5] Werner P, Comanac A, de’ Medici L, Troyer M and Millis A J 2006 Phys. Rev. Lett. 97076405 [6] Werner P and Millis A J 2006 Phys. Rev. B 74155107

[7] Werner P and Millis A J 2010 Phys. Rev. Lett. 104146401 [8] Ayral T, Biermann S and Werner P 2013 Phys. Rev. B 87125149 [9] Läuchli A M and Werner P 2009 Phys. Rev. B 80235117 [10] Östlund S and Rommer S 1995 Phys. Rev. Lett. 753537–40 [11] White S R 1992 Phys. Rev. Lett. 692863–6

[12] White S R and Noack R M 1992 Phys. Rev. Lett. 683487–90

[13] Nishimoto S and Jeckelmann E 2004 J. Phys.: Condens. Matter 16613 [14] Raas C, Uhrig G S and Anders F B 2004 Phys. Rev. B 69041102

[15] Raas C and Uhrig G S 2005 Eur. Phys. J. B: Condens. Matter Complex Syst. 45293–303 [16] Nishimoto S, Pruschke T and Noack R M 2006 J. Phys.: Condens. Matter 18981

[17] Peters R 2011 Phys. Rev. B 84075139 [18] Haule K 2007 Phys. Rev. B 75155113

[19] Gull E, Millis A J, Lichtenstein A I, Rubtsov A N, Troyer M and Werner P 2011 Rev. Mod. Phys.

83349–404

[20] Parragh N, Toschi A, Held K and Sangiovanni G 2012 Phys. Rev. B 86155158 [21] Schollwöck U 2011 Ann. Phys. 32696–192

[22] Dolfi M et al 2014 in preparation

[23] Suwa H and Todo S 2010 Phys. Rev. Lett. 105120603 [24] Bauer B et al 2011 J. Stat. Mech. P05001