
1.1 Notations, definitions and mathematical background

1.1.2 Probabilities and random variables

Quantum mechanics, and more specifically quantum information theory, makes heavy use of the concept of probability distributions. The study of probabilities itself goes beyond what is required to understand this work; we provide here a basic introduction. For illustration purposes, the case of a die throw will serve as an example.

Probability space: A probability space is given by a triplet (Ω, E, P).

• Ω is the set of all outcomes. In the case of a standard 6-sided die, this set is given by {1, 2, 3, 4, 5, 6}.

• E is a σ-algebra2 on Ω. It represents all events that one is interested in. This varies with the purpose of the die throw. It could be that one is only interested in the parity of the result, which gives E = {∅, {1,3,5}, {2,4,6}, {1,2,3,4,5,6}}. In another case one may want to know whether the result was a 6 or not, resulting in E = {∅, {1,2,3,4,5}, {6}, {1,2,3,4,5,6}}. In general one can simply take E to be the powerset of Ω, i.e. the set of all subsets of Ω, which corresponds to all possible events.

• P is a probability measure, meaning a function P : E → [0,1] giving the probability of each event happening. It has to satisfy two requirements. First, P(Ω) = 1, which translates to saying that some outcome within the set of possible outcomes was produced with certainty. Second, for e1, e2 ∈ E with e1 ∩ e2 = ∅, P(e1 ∪ e2) = P(e1) + P(e2), meaning that if two events are mutually exclusive, then the probability of either one or the other occurring is given by the sum of their individual probabilities. In the case of a fair die being thrown, one has P({1}) = ... = P({6}) = 1/6, and the probability of other events can be deduced from these using the second property. For example, the probability of getting an odd outcome is given by P({1,3,5}) = P({1}) + P({3}) + P({5}) = 1/2, since the events 'the die shows 1', 'the die shows 3' and 'the die shows 5' are mutually exclusive.
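The two requirements on P can be checked mechanically. As an illustrative sketch (the helper names are ours, not from the text), the fair-die measure can be represented by assigning 1/6 to each outcome and summing over the outcomes contained in an event:

```python
from fractions import Fraction

# Fair die: Omega = {1,...,6}, each elementary outcome has probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
p_outcome = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability measure: sum the probabilities of the outcomes in the event."""
    return sum(p_outcome[w] for w in event)

# First requirement: P(Omega) = 1.
assert P(omega) == 1
# Second requirement (additivity) on the disjoint events 'odd' and 'even'.
odd, even = {1, 3, 5}, {2, 4, 6}
assert P(odd | even) == P(odd) + P(even)
print(P(odd))  # 1/2
```

Using exact fractions rather than floats keeps the two defining properties verifiable as equalities rather than approximations.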

Random variable: A random variable V is a function V : Ω → AV from the set of all outcomes Ω of a probability space (Ω, E, P) to a measurable set AV, which we call the alphabet of this random variable. The random variable inherits a probability distribution PV from the probability space, given by PV(v) = P({ω ∈ Ω : V(ω) = v}). Let us look at the example of rolling a fair die and playing a game where a player gets points equal to twice the number that they rolled. We can then define the random variable V as the function V(1) = 2, V(2) = 4, ..., V(6) = 12. Beyond asking what the probability of gaining a certain amount of points is, we can also ask what the probability of getting points within a certain range is. For example, the probability that a player gets strictly more than 3 but strictly less than 7 points on their die roll is given by PV(3 < V < 7) = P({ω ∈ Ω : 3 < V(ω) < 7}) = P({2,3}) = 1/3. Furthermore, random variables are useful in order to select the outcomes of one object in a probability space spanning several. An easy example is the case of throwing two dice, where the outcome set is given by {(1,1), (1,2), ..., (6,6)}. If we want to conveniently consider only one die or the other, we can define two random variables D1 and D2, where D1 maps any outcome of the two dice onto the outcome of the first die, i.e. D1((d1, d2)) = d1, and similarly for D2.
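The 'twice the roll' game can be sketched as follows (an illustration of ours; the function P_V stands for the inherited distribution evaluated on an event over the alphabet):

```python
from fractions import Fraction

omega = range(1, 7)                              # outcomes of a fair die
P = lambda event: Fraction(len(set(event)), 6)   # uniform measure on Omega

V = lambda w: 2 * w                              # points: twice the number rolled

def P_V(predicate):
    """Inherited distribution: probability of {w in Omega : predicate(V(w))}."""
    return P({w for w in omega if predicate(V(w))})

assert P_V(lambda v: v == 4) == Fraction(1, 6)     # rolling a 2 gives 4 points
assert P_V(lambda v: 3 < v < 7) == Fraction(1, 3)  # only rolls 2 and 3 qualify
```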

2A σ-algebra on Ω is a set of subsets of Ω that contains Ω itself and is closed under complement and countable union (and hence countable intersection).

Joint distribution: Two random variables V1 : Ω → A1 and V2 : Ω → A2 on the same probability space (Ω, E, P) have a joint probability distribution PV1V2 given by PV1V2(v1, v2) = P({ω ∈ Ω : V1(ω) = v1, V2(ω) = v2}). This corresponds to the probability that V1 takes the value v1 and V2 takes the value v2 at the same time.

The definition trivially generalizes to the probability for both of the variables to be within a certain range and of course to an arbitrary number of random variables.

Within the previous example of rolling two dice, we can therefore determine the probability that the first die shows 3 pips while the second die shows 5 or more: PD1D2(D1 = 3, D2 ≥ 5) = P({(3,5), (3,6)}) = 1/18. Note that it is also technically possible to ask for a relationship between the two random variables. For example, PD1D2(D1 = D2) = P({(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}) = 1/6 is the probability that the two dice show the same number of pips.
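Both joint probabilities can be verified by direct enumeration of the 36 outcome pairs; a minimal sketch (variable names are ours):

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcome pairs, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = lambda event: Fraction(len(event), 36)

first_is_3_second_ge_5 = [(d1, d2) for d1, d2 in omega if d1 == 3 and d2 >= 5]
doubles = [(d1, d2) for d1, d2 in omega if d1 == d2]

assert P(first_is_3_second_ge_5) == Fraction(1, 18)  # {(3,5), (3,6)}
assert P(doubles) == Fraction(1, 6)                  # six matching pairs
```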

Marginal distribution: Given the joint distribution PV1V2 of two random variables V1 and V2, we can compute the distributions PV1 and PV2 of the individual random variables directly, without having to resort back to the distribution P of the underlying probability space. This is done by simple integration or summation over the random variable we wish to eliminate, i.e. PV1(v1) = ∫A2 dv2 PV1V2(v1, v2), and vice-versa.3 Within the context of this joint distribution, the distributions PV1 and PV2 are called the marginal distributions.
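In the discrete case the marginalization is a plain sum over the eliminated variable. A sketch of ours, using the two-dice joint distribution:

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice: PD1D2(d1, d2) = 1/36 for every pair.
joint = {(d1, d2): Fraction(1, 36) for d1, d2 in product(range(1, 7), repeat=2)}

def marginal_first(joint):
    """Sum out the second variable: PD1(d1) = sum over d2 of PD1D2(d1, d2)."""
    m = {}
    for (d1, _d2), p in joint.items():
        m[d1] = m.get(d1, Fraction(0)) + p
    return m

P_D1 = marginal_first(joint)
assert all(P_D1[d] == Fraction(1, 6) for d in range(1, 7))  # a fair die again
```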

Conditional distribution: Given the joint distribution PV1V2 of two random variables V1 and V2, we define the distribution of V1 conditioned on V2 by PV1|V2(v1|v2) = PV1V2(v1, v2)/PV2(v2) for PV2(v2) ≠ 0. This corresponds to the probability that V1 takes value v1 if V2 takes value v2. Let us reconsider the example of throwing two dice, but now we define the random variable S to be the sum of the number of pips of the two dice. The joint distribution PD1S can easily be computed. The conditional distribution allows us to determine the probability that the first die shows 3 pips given that we know that the sum is at least 7, i.e. PD1|S(D1 = 3|S ≥ 7) = PD1S(D1 = 3, S ≥ 7)/PS(S ≥ 7) = (1/12)/(7/12) = 1/7.

Independent distributions: The reason we did not use D1 and D2 as the example for conditional distributions is that they are in fact independent. Two random variables are said to be independent if knowledge about one of them tells us nothing about the other. The pips of the two dice in a fair dice throw are a good example of this; knowing how many pips one of them shows gives us no information about the other. This manifests directly in the conditional distribution. Two random variables V1 and V2 are independent if and only if PV1|V2(v1|v2) = PV1(v1) for all v1 ∈ A1, v2 ∈ A2. This was clearly not the case when we looked at the distribution of D1 and S.
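Both points, the conditional probability PD1|S(D1 = 3|S ≥ 7) = 1/7 and the independence of D1 and D2, can be checked by enumeration. An illustrative sketch (the helper P_cond is ours):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))     # two fair dice
P = lambda event: Fraction(len(event), 36)

def P_cond(pred_a, pred_b):
    """Conditional probability P(A|B) = P(A and B) / P(B), for P(B) != 0."""
    both = [w for w in omega if pred_a(w) and pred_b(w)]
    cond = [w for w in omega if pred_b(w)]
    return P(both) / P(cond)

# First die shows 3 given that the sum is at least 7: (1/12)/(7/12) = 1/7.
assert P_cond(lambda w: w[0] == 3, lambda w: sum(w) >= 7) == Fraction(1, 7)

# Independence of D1 and D2: conditioning on D2 never changes P(D1 = d1).
assert all(
    P_cond(lambda w: w[0] == d1, lambda w: w[1] == d2) == Fraction(1, 6)
    for d1 in range(1, 7) for d2 in range(1, 7)
)
```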

Extended distribution: Throughout this work, a question that will arise several times is whether a given distribution can be written as the marginal of another one which fulfills some conditions. However, this second distribution does not need to be defined on the same event-algebra. Let (Ω, E, P) be a probability space with a random variable V : Ω → A. Let Ω′ and E′ be an outcome-set and an event-algebra. We define the extended outcome-set Ω̃ = {(ω, ω′) : ω ∈ Ω, ω′ ∈ Ω′} and the extended event-algebra Ẽ = {{(ωi, ω′j)}i,j : {ωi}i ∈ E, {ω′j}j ∈ E′}. Let P̃ be a probability distribution on (Ω̃, Ẽ) such that (Ω̃, Ẽ, P̃) is a probability space. We define the random variable Ṽ : Ω̃ → A with Ṽ((ω, ω′)) = V(ω), which we call the extension of V, and an arbitrary random variable V′ : Ω̃ → A′. We say that PṼV′ is an extended distribution of PV if PV(v) = PṼ(v) for all v ∈ A. As an example, let PD be the distribution of a single die roll, with a well-established probability space. Then the distribution PD1D2 of rolling two dice, which we considered above, is an extended distribution of PD with D̃ = D1, as is the distribution PD1S.

3In the case where A2 is a discrete set, the integral is replaced by a sum over all the elements of A2.

The notation so far has been rather cumbersome. To alleviate this, we only specify the random variables if they are not clear from context. In general, we use uppercase letters to denote random variables and the corresponding lowercase letters for a generic value within their alphabet. We thereby shorten PV(v) to P(v). We also abuse notation by denoting the alphabet of a random variable by the same letter as the random variable; for example, we write ∫V dv instead of ∫AV dv when we want to integrate over the full alphabet of V. Whether the random variable or the alphabet is meant is clear from context. Finally, the notation for extended distributions is far less confusing if the random variable and its extension are denoted by the same letter.

Nonlocality

As is so often the case, nonlocality is best explained as a game. Two players, called Alice and Bob, participate in a cooperative game show. The game they have to play is well-known in advance. They will separately enter two sealed rooms with no means of communication to the outside or between each other. They are allowed to take anything they want with them as long as it does not allow them to communicate.

Inside the rooms, they will find two balls, one black, one white. After they enter, the game master will flick a switch for each room, turning on either a green or a red light inside the corresponding room. Alice and Bob then each pick one of the balls and leave the other, see Fig. 2.1. They win the game under the following conditions: if the game master turned on at least one green light (Alice and Bob both saw a green light, or Alice saw green and Bob saw red, or vice-versa), then the players win if they chose balls of the same color. However, if both lights were red, then they win if their chosen balls have different colors.

Figure 2.1: Two players, named Alice and Bob, enter a room each in which a black and a white ball have been previously placed. After entering the room, either a green or a red light is turned on inside each room. The players choose one of the two balls to take out of the room and leave the other. Their choice of ball is labeled by A and B and the color of the light by X and Y for Alice and Bob respectively.

Before entering the room, the players can meet and agree on a strategy, labeled by Λ.

It is of interest to determine the players' chances of winning this game. Consider one particular strategy which they could use: independently of which light is on, both players always choose the black ball. In this case, they win in the three cases where at least one light is green, but lose when both lights are red, i.e. they win in 3 out of 4 cases. Of course one may now argue that ignoring the lights has to be suboptimal, so let us consider a second strategy: Alice still chooses the black ball independently of her light; Bob, on the other hand, now chooses the black ball when he sees a green light and the white ball when he sees a red light. With this strategy they win when both lights are red (since Alice chooses black and Bob white), but lose when Alice's light is green and Bob's is red. So again they win in 3 out of 4 cases. Continuing this analysis quickly leads to the conclusion that it is in fact not possible to win more than 3 out of 4 games, which still holds true if Alice and Bob were to introduce some randomness into their choices, e.g. by flipping potentially biased coins. The fact that the players have to agree on a strategy before the lights are turned on and that there is no communication between the two rooms caps their winning ratio at 3 out of 4. This can of course be proven rigorously [21]. We give a proof in Section 2.1.
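The exhaustive case analysis sketched above can be automated: since shared randomness only mixes deterministic strategies, it suffices to enumerate the 4 × 4 deterministic strategy pairs (an illustrative sketch of ours, not from the text):

```python
from itertools import product

# A deterministic strategy maps each light (0 = green, 1 = red)
# to a ball color (0 = black, 1 = white): four strategies per player.
strategies = list(product([0, 1], repeat=2))

def wins(sa, sb):
    """Number of the four input pairs this strategy pair wins."""
    n = 0
    for x, y in product([0, 1], repeat=2):
        a, b = sa[x], sb[y]
        if x == 1 and y == 1:
            n += a != b          # both red: different colors win
        else:
            n += a == b          # at least one green: equal colors win
    return n

best = max(wins(sa, sb) for sa, sb in product(strategies, repeat=2))
print(best)  # 3 -- no local strategy wins more than 3 of the 4 cases
```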

Strategies such as the ones considered above are called local. Alice and Bob agree on a predetermined strategy and choose their ball based on this strategy and the input they get from the game master. We denote Alice and Bob’s respective inputs by X ∈ {red,green} and Y ∈ {red,green}, their choice of ball, or output, by A ∈ {black,white} and B ∈ {black,white} and their pre-agreed strategy by Λ.

Their joint conditional probability distribution can be written as

P(ab|xy) = ∫Λ dλ PΛ(λ) PA|XΛ(a|xλ) PB|YΛ(b|yλ). (2.1)

P(ab|xy) is the probability of Alice giving output a and Bob giving output b conditioned on Alice's input being x and Bob's input being y. They can choose different strategies λ with probability P(λ). Since the game master is only interested in which outputs were given for each input and not in which strategy was used to achieve this, the strategies λ are averaged out. The rest of the expression is based on three assumptions. First, the only information Bob has about Alice's choice of ball is given by the hidden strategy and the inputs, i.e. PB|AXYΛ(b|black, xyλ) = PB|AXYΛ(b|white, xyλ) for all hidden strategies λ, all inputs x and y and all outputs b. Second, Bob has no information about Alice's input, so his choice of ball cannot depend on it. This implies that PB|XYΛ(b|green, yλ) = PB|XYΛ(b|red, yλ) for all strategies λ, inputs y and outputs b. The same two conditions hold vice-versa for Alice's knowledge about Bob's output and input. Finally, Alice and Bob have no information about the inputs when deciding on their strategy and the inputs are not influenced by their choice of strategy, i.e. P(λ|xy) = P(λ|x′y′) for all λ, x, x′, y, y′.

As discussed above, these local strategies can only win 3 out of 4 games on average. This condition can be expressed as

WCHSH(P) ≤ 3 with

WCHSH(P) = P(A = B|gg) + P(A = B|gr) + P(A = B|rg) + P(A ≠ B|rr), (2.2)

where we shortened green to g and red to r. This game is known as the CHSH game [21]. Given how natural and intuitive local strategies are, it may come as a surprise that if Alice and Bob are familiar with quantum mechanics, they can perform better. To achieve this, they prepare, in advance, two quantum systems in an entangled quantum state. They each take one of the systems, as well as a measurement apparatus, with them into the room. After they enter their respective rooms and the lights have been turned on, they choose, based on whether they see a red or a green light, a measurement to perform on their respective system. They decide which ball to pick depending on the outcome of this measurement. By choosing the quantum state and the measurements smartly, as shown below, the players can now win on average 3.41 out of 4 games. This is the 'spooky' part of quantum mechanics; despite there being no communication between the systems, they can still exhibit correlations that cannot be explained by local distributions.

Quantum mechanics is inherently nonlocal. This was first shown by Bell in his famous theorem in 1964 [11].

To verify that quantum mechanics is nonlocal, consider the quantum state

|Φ+⟩ = (|00⟩ + |11⟩)/√2, (2.3)

where the first system will be with Alice and the second with Bob. This state is called the maximally entangled 2-qubit state. If Alice's light is green, she measures in the {|0⟩, |1⟩} basis; if it is red, she measures in the {(|0⟩+|1⟩)/√2, (|0⟩−|1⟩)/√2} basis. Similarly, Bob measures in the {cos(π/8)|0⟩ + sin(π/8)|1⟩, cos(5π/8)|0⟩ + sin(5π/8)|1⟩} basis in case of a green light and in the {cos(π/8)|0⟩ − sin(π/8)|1⟩, cos(3π/8)|0⟩ + sin(3π/8)|1⟩} basis in case of a red light.

Computing the resulting correlations, we find

PQ(A = B|gg) = PQ(A = B|gr) = PQ(A = B|rg) = PQ(A ≠ B|rr) = (2 + √2)/4. (2.4)

Evaluating WCHSH(P) in (2.2), we find a value of 2 + √2 ≈ 3.41 > 3. It can be shown that this is the maximal value achievable by quantum mechanics [20].
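These correlations can be checked numerically with the state and bases given above. The following sketch is our own (the angle parametrization of the bases and all names are ours); it evaluates the outcome probabilities via Born's rule and sums them as in (2.2):

```python
import numpy as np

def basis(theta):
    """Orthonormal basis of real qubit vectors at angles theta and theta + pi/2."""
    return [np.array([np.cos(theta), np.sin(theta)]),
            np.array([-np.sin(theta), np.cos(theta)])]

# Maximally entangled state (|00> + |11>)/sqrt(2) as a vector in C^4.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

# Angle of the first basis vector for each input (g = green, r = red).
alice = {'g': 0.0, 'r': np.pi / 4}
bob = {'g': np.pi / 8, 'r': -np.pi / 8}

def p(x, y):
    """Joint outcome probabilities P(ab|xy) from Born's rule."""
    out = np.zeros((2, 2))
    for a, va in enumerate(basis(alice[x])):
        for b, vb in enumerate(basis(bob[y])):
            out[a, b] = (np.kron(va, vb) @ phi) ** 2
    return out

# W_CHSH: equal outcomes for gg, gr, rg; different outcomes for rr.
w = sum(p(x, y)[0, 0] + p(x, y)[1, 1] for x, y in [('g', 'g'), ('g', 'r'), ('r', 'g')])
w += p('r', 'r')[0, 1] + p('r', 'r')[1, 0]
print(w, 2 + np.sqrt(2))  # both approximately 3.414
```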

The discovery of nonlocal correlations has major consequences. On the one hand, it challenges the intuition of physicists used to the classical world. In fact, before John Bell's work demonstrating quantum nonlocality in 1964 [11], many physicists argued that there was a local mechanism explaining quantum correlations that had simply not yet been discovered [27]. Bell's theorem proves that this is not the case, at least if we assume that there is a maximal speed with which influences propagate. The concept is therefore of major interest in the realm of fundamental physics. On the other hand, nonlocality gave rise to the concept of device-independent quantum information processing (DIQIP) [10,18,22] with many potential applications. To understand DIQIP, it is useful to first introduce the adversarial point of view.

Let us turn the game example around. The game master, whom we will from now on call Eve, is trying to convince Alice and Bob that she has nonlocal quantum resources at her disposal. To demonstrate this, she builds two devices. The devices each have two buttons, one green and one red, and a screen which can turn either black or white. Alice and Bob each take one of the devices and separate them such that the devices cannot communicate, either by putting a large distance between them or by putting them in sealed rooms. They then play the game we introduced earlier, only now they are the ones asking the questions by pressing either the green or the red button, and the devices have to show either a black or a white screen. If the devices manage to win, on average, more than 3 out of 4 games, then Eve was telling the truth. This type of setup is called an adversarial scenario (see Fig. 2.2).

Figure 2.2: An adversary Eve built two devices with two buttons each and a screen which can be either black or white. She gives one device each to Alice and Bob, who can now press the buttons on the device and record the color of the screen. It is Eve's goal to convince Alice and Bob that the devices are operating in a nonlocal manner, meaning that she wants to perform well in the CHSH game (2.2).

As a consequence, Alice and Bob now know that Eve could not have built these devices with a preplanned strategy; any preplanned strategy Λ can only win an average of 3 out of 4 games. Alice and Bob conclude that their outcomes could not have been predetermined, but were generated only after the devices had been separated. Even further, if Alice and Bob now communicate to each other which button it was that they pressed, they can infer with high probability which color the other person saw, based on the color they saw on their own screen. If the left-hand side of (2.2) is large, then with high probability the screens show the same color if at least one of them pressed the green button, and opposite colors if they both pressed the red button. So not only are their outcomes random, but they are also strongly correlated. If one knows both inputs and the output of one of the devices, one can infer with high probability the output of the other. Eve, however, who only knows both inputs, cannot infer either output1. If Alice and Bob wish to create random numbers or to generate a secret key that only they know, for example in order to communicate later using a one-time-pad encryption, they can use devices that were built by a third party Eve and, if they perform well in the game, even Eve, the manufacturer of the devices, cannot guess the random numbers or decipher the encryption.

A valid objection at this point is of course that Eve could have built a third device with its own buttons and screen that could be correlated in a similar manner with Alice and Bob's devices. This is however not possible, due to a fact referred to as the monogamy of nonlocal correlations [43]. In rough terms, it states that if two devices have strong nonlocal correlations between them, then a third device cannot be correlated with either. We will not elaborate on the details here; suffice it to say that this is a very powerful property of nonlocal correlations: it implies that if the correlations between Alice's and Bob's devices are nonlocal enough, then there is no way for a third party to know anything about their outcomes.

1Technically, if WCHSH(P) does not evaluate to its maximal value, knowing the inputs can allow Eve to partially guess the outputs, but this guessing probability gets lower the closer the value gets to the maximum [43].

Since nonlocality is a property only of the observed correlations, it is not necessary for Alice and Bob to know anything about the inner workings of the device;

