For further information, see www.cacr.math.uwaterloo.ca/hac

CRC Press has granted the following specific permissions for the electronic version of this book:

Permission is granted to retrieve, print and store a single copy of this chapter for personal use. This permission does not extend to binding multiple chapters of the book, photocopying or producing copies for other than personal use of the person creating the copy, or making electronic copies available for retrieval by others without prior permission in writing from CRC Press.

Except where over-ridden by the specific permission above, the standard copyright notice from CRC Press applies to this electronic version:

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying.

© 1997 by CRC Press, Inc.


Mathematical Background

Contents in Brief

2.1 Probability theory
2.2 Information theory
2.3 Complexity theory
2.4 Number theory
2.5 Abstract algebra
2.6 Finite fields
2.7 Notes and further references

This chapter is a collection of basic material on probability theory, information theory, complexity theory, number theory, abstract algebra, and finite fields that will be used throughout this book. Further background and proofs of the facts presented here can be found in the references given in §2.7. The following standard notation will be used throughout:

1. Z denotes the set of integers; that is, the set {. . . , −2, −1, 0, 1, 2, . . .}.

2. Q denotes the set of rational numbers; that is, the set {a/b | a, b ∈ Z, b ≠ 0}.

3. R denotes the set of real numbers.

4. π is the mathematical constant; π ≈ 3.14159.

5. e is the base of the natural logarithm; e ≈ 2.71828.

6. [a, b] denotes the integers x satisfying a ≤ x ≤ b.

7. ⌊x⌋ is the largest integer less than or equal to x. For example, ⌊5.2⌋ = 5 and ⌊−5.2⌋ = −6.

8. ⌈x⌉ is the smallest integer greater than or equal to x. For example, ⌈5.2⌉ = 6 and ⌈−5.2⌉ = −5.

9. If A is a finite set, then |A| denotes the number of elements in A, called the cardinality of A.

10. a ∈ A means that element a is a member of the set A.

11. A ⊆ B means that A is a subset of B.

12. A ⊂ B means that A is a proper subset of B; that is, A ⊆ B and A ≠ B.

13. The intersection of sets A and B is the set A ∩ B = {x | x ∈ A and x ∈ B}.

14. The union of sets A and B is the set A ∪ B = {x | x ∈ A or x ∈ B}.

15. The difference of sets A and B is the set A − B = {x | x ∈ A and x ∉ B}.

16. The Cartesian product of sets A and B is the set A × B = {(a, b) | a ∈ A and b ∈ B}. For example, {a1, a2} × {b1, b2, b3} = {(a1, b1), (a1, b2), (a1, b3), (a2, b1), (a2, b2), (a2, b3)}.


17. A function or mapping f : A → B is a rule which assigns to each element a in A precisely one element b in B. If a ∈ A is mapped to b ∈ B then b is called the image of a, a is called a preimage of b, and this is written f(a) = b. The set A is called the domain of f, and the set B is called the codomain of f.

18. A function f : A → B is 1−1 (one-to-one) or injective if each element in B is the image of at most one element in A. Hence f(a1) = f(a2) implies a1 = a2.

19. A function f : A → B is onto or surjective if each b ∈ B is the image of at least one a ∈ A.

20. A function f : A → B is a bijection if it is both one-to-one and onto. If f is a bijection between finite sets A and B, then |A| = |B|. If f is a bijection between a set A and itself, then f is called a permutation on A.

21. ln x is the natural logarithm of x; that is, the logarithm of x to the base e.

22. lg x is the logarithm of x to the base 2.

23. exp(x) is the exponential function e^x.

24. ∑_{i=1}^{n} ai denotes the sum a1 + a2 + · · · + an.

25. ∏_{i=1}^{n} ai denotes the product a1 · a2 · · · · · an.

26. For a positive integer n, the factorial function is n! = n(n−1)(n−2) · · · 1. By convention, 0! = 1.

2.1 Probability theory

2.1.1 Basic definitions

2.1 Definition An experiment is a procedure that yields one of a given set of outcomes. The individual possible outcomes are called simple events. The set of all possible outcomes is called the sample space.

This chapter only considers discrete sample spaces; that is, sample spaces with only finitely many possible outcomes. Let the simple events of a sample space S be labeled s1, s2, . . . , sn.

2.2 Definition A probability distribution P on S is a sequence of numbers p1, p2, . . . , pn that are all non-negative and sum to 1. The number pi is interpreted as the probability of si being the outcome of the experiment.

2.3 Definition An event E is a subset of the sample space S. The probability that event E occurs, denoted P(E), is the sum of the probabilities pi of all simple events si which belong to E. If si ∈ S, P({si}) is simply denoted by P(si).

2.4 Definition If E is an event, the complementary event is the set of simple events not belonging to E, denoted Ē.

2.5 Fact Let E ⊆ S be an event.

(i) 0 ≤ P(E) ≤ 1. Furthermore, P(S) = 1 and P(∅) = 0. (∅ is the empty set.)

(ii) P(Ē) = 1 − P(E).


(iii) If the outcomes in S are equally likely, then P(E) = |E|/|S|.

2.6 Definition Two events E1 and E2 are called mutually exclusive if P(E1 ∩ E2) = 0. That is, the occurrence of one of the two events excludes the possibility that the other occurs.

2.7 Fact Let E1 and E2 be two events.

(i) If E1 ⊆ E2, then P(E1) ≤ P(E2).

(ii) P(E1 ∪ E2) + P(E1 ∩ E2) = P(E1) + P(E2). Hence, if E1 and E2 are mutually exclusive, then P(E1 ∪ E2) = P(E1) + P(E2).

2.1.2 Conditional probability

2.8 Definition Let E1 and E2 be two events with P(E2) > 0. The conditional probability of E1 given E2, denoted P(E1|E2), is

P(E1|E2) = P(E1 ∩ E2) / P(E2).

P(E1|E2) measures the probability of event E1 occurring, given that E2 has occurred.

2.9 Definition Events E1 and E2 are said to be independent if P(E1 ∩ E2) = P(E1)P(E2).

Observe that if E1 and E2 are independent, then P(E1|E2) = P(E1) and P(E2|E1) = P(E2). That is, the occurrence of one event does not influence the likelihood of occurrence of the other.

2.10 Fact (Bayes' theorem) If E1 and E2 are events with P(E2) > 0, then

P(E1|E2) = P(E1)P(E2|E1) / P(E2).
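As a concrete check of Definition 2.8 and Fact 2.10, the following short Python sketch computes a conditional probability both directly and via Bayes' theorem, using the sample space of two fair dice (the particular events are chosen only for illustration):

```python
from fractions import Fraction

# Two fair dice: 36 equally likely simple events, so Fact 2.5(iii) applies.
S = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def P(E):
    return Fraction(len(E), len(S))

E1 = [s for s in S if s[0] + s[1] == 7]   # the sum is 7
E2 = [s for s in S if s[0] == 3]          # the first die shows 3
E12 = [s for s in E1 if s in E2]          # E1 ∩ E2 = {(3, 4)}

cond = P(E12) / P(E2)                     # Definition 2.8
bayes = P(E1) * (P(E12) / P(E1)) / P(E2)  # Fact 2.10: P(E1)·P(E2|E1)/P(E2)

print(cond, bayes)  # 1/6 1/6
```

Since P(E1 ∩ E2) = 1/36 = P(E1)P(E2), these two events also happen to be independent in the sense of Definition 2.9.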

2.1.3 Random variables

Let S be a sample space with probability distribution P.

2.11 Definition A random variable X is a function from the sample space S to the set of real numbers; to each simple event si ∈ S, X assigns a real number X(si).

Since S is assumed to be finite, X can only take on a finite number of values.

2.12 Definition Let X be a random variable on S. The expected value or mean of X is E(X) = ∑_{si∈S} X(si)P(si).

2.13 Fact Let X be a random variable on S. Then E(X) = ∑_{x∈R} x · P(X = x).

2.14 Fact If X1, X2, . . . , Xm are random variables on S, and a1, a2, . . . , am are real numbers, then E(∑_{i=1}^{m} ai Xi) = ∑_{i=1}^{m} ai E(Xi).

2.15 Definition The variance of a random variable X of mean μ is a non-negative number defined by

Var(X) = E((X − μ)²).

The standard deviation of X is the non-negative square root of Var(X).


If a random variable has small variance then large deviations from the mean are unlikely to be observed. This statement is made more precise below.

2.16 Fact (Chebyshev's inequality) Let X be a random variable with mean μ = E(X) and variance σ² = Var(X). Then for any t > 0,

P(|X − μ| ≥ t) ≤ σ²/t².
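Chebyshev's inequality can be verified exactly for a small discrete random variable. The sketch below (illustrative only; the die and the threshold t = 2 are arbitrary choices) computes both sides for a single roll of a fair die:

```python
from fractions import Fraction

# X = outcome of one roll of a fair die (a random variable, Definition 2.11).
outcomes = range(1, 7)
pr = Fraction(1, 6)                                   # each outcome equally likely

mu = sum(x * pr for x in outcomes)                    # E(X) = 7/2 (Definition 2.12)
var = sum((x - mu)**2 * pr for x in outcomes)         # Var(X) = 35/12 (Definition 2.15)

t = 2
lhs = sum(pr for x in outcomes if abs(x - mu) >= t)   # P(|X − μ| ≥ 2) = 1/3

print(lhs, var / t**2)  # 1/3 35/48 — the bound of Fact 2.16 holds
```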

2.1.4 Binomial distribution

2.17 Definition Let n and k be non-negative integers. The binomial coefficient, denoted C(n, k), is the number of different ways of choosing k distinct objects from a set of n distinct objects, where the order of choice is not important.

2.18 Fact (properties of binomial coefficients) Let n and k be non-negative integers.

(i) C(n, k) = n! / (k!(n−k)!).

(ii) C(n, k) = C(n, n−k).

(iii) C(n+1, k+1) = C(n, k) + C(n, k+1).

2.19 Fact (binomial theorem) For any real numbers a, b, and non-negative integer n, (a + b)^n = ∑_{k=0}^{n} C(n, k) a^k b^{n−k}.

2.20 Definition A Bernoulli trial is an experiment with exactly two possible outcomes, called success and failure.

2.21 Fact Suppose that the probability of success on a particular Bernoulli trial is p. Then the probability of exactly k successes in a sequence of n such independent trials is

C(n, k) p^k (1 − p)^{n−k}, for each 0 ≤ k ≤ n. (2.1)

2.22 Definition The probability distribution (2.1) is called the binomial distribution.

2.23 Fact The expected number of successes in a sequence of n independent Bernoulli trials, with probability p of success in each trial, is np. The variance of the number of successes is np(1 − p).
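The following Python sketch tabulates distribution (2.1) with exact rational arithmetic and confirms, for one illustrative choice of n and p, that it sums to 1 and has mean np and variance np(1 − p):

```python
from math import comb
from fractions import Fraction

# Binomial distribution (2.1) with n = 10 trials and success probability p = 2/5.
n, p = 10, Fraction(2, 5)
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)                                        # a probability distribution
mean = sum(k * pmf[k] for k in range(n + 1))            # should equal np = 4
var = sum(k**2 * pmf[k] for k in range(n + 1)) - mean**2  # should equal np(1-p) = 12/5

print(total, mean, var)  # 1 4 12/5
```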

2.24 Fact (law of large numbers) Let X be the random variable denoting the fraction of successes in n independent Bernoulli trials, with probability p of success in each trial. Then for any ε > 0,

P(|X − p| > ε) → 0, as n → ∞.

In other words, as n gets larger, the proportion of successes should be close to p, the probability of success in each trial.


2.1.5 Birthday problems

2.25 Definition

(i) For positive integers m, n with m ≥ n, the number m^(n) is defined as follows: m^(n) = m(m−1)(m−2) · · · (m−n+1).

(ii) Let m, n be non-negative integers with m ≥ n. The Stirling number of the second kind, denoted S(m, n), is

S(m, n) = (1/n!) ∑_{k=0}^{n} (−1)^{n−k} C(n, k) k^m,

with the exception that S(0, 0) = 1.

The symbol S(m, n) counts the number of ways of partitioning a set of m objects into n non-empty subsets.
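The defining sum is easy to evaluate directly. The helper below (stirling2 is a name introduced here, not standard notation) implements Definition 2.25(ii) and checks it against a hand count of the partitions of a 3-set:

```python
from math import comb, factorial

def stirling2(m, n):
    """Stirling number of the second kind via the sum in Definition 2.25(ii)."""
    if m == 0 and n == 0:
        return 1  # the stated exceptional case
    # The alternating sum is always divisible by n!, so // is exact here.
    return sum((-1)**(n - k) * comb(n, k) * k**m for k in range(n + 1)) // factorial(n)

# {1,2,3} splits into 2 non-empty blocks in 3 ways: {1}{2,3}, {2}{1,3}, {3}{1,2}.
print(stirling2(3, 2), stirling2(4, 2))  # 3 7
```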

2.26 Fact (classical occupancy problem) An urn has m balls numbered 1 to m. Suppose that n balls are drawn from the urn one at a time, with replacement, and their numbers are listed. The probability that exactly t different balls have been drawn is

P1(m, n, t) = S(n, t) m^(t) / m^n, 1 ≤ t ≤ n.

The birthday problem is a special case of the classical occupancy problem.

2.27 Fact (birthday problem) An urn has m balls numbered 1 to m. Suppose that n balls are drawn from the urn one at a time, with replacement, and their numbers are listed.

(i) The probability of at least one coincidence (i.e., a ball drawn at least twice) is

P2(m, n) = 1 − P1(m, n, n) = 1 − m^(n)/m^n, 1 ≤ n ≤ m. (2.2)

If n = O(√m) (see Definition 2.55) and m → ∞, then

P2(m, n) → 1 − exp(−n(n−1)/(2m) + O(1/√m)) ≈ 1 − exp(−n²/(2m)).

(ii) As m → ∞, the expected number of draws before a coincidence is √(πm/2).

The following explains why probability distribution (2.2) is referred to as the birthday surprise or birthday paradox. The probability that at least 2 people in a room of 23 people have the same birthday is P2(365, 23) ≈ 0.507, which is surprisingly large. The quantity P2(365, n) also increases rapidly as n increases; for example, P2(365, 30) ≈ 0.706.
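Equation (2.2) can be evaluated numerically by accumulating the factors of m^(n)/m^n one at a time. The sketch below reproduces the figures quoted above and compares them with the exponential approximation:

```python
from math import exp

def p2(m, n):
    """P2(m, n) = 1 - m^(n)/m^n, the probability of at least one coincidence (2.2)."""
    prob_distinct = 1.0
    for i in range(n):
        prob_distinct *= (m - i) / m   # the falling factorial, factor by factor
    return 1 - prob_distinct

print(round(p2(365, 23), 3))                     # 0.507
print(round(p2(365, 30), 3))                     # 0.706
print(round(1 - exp(-23 * 22 / (2 * 365)), 3))   # 0.5, close to the exact 0.507
```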

A different kind of problem is considered in Facts 2.28, 2.29, and 2.30 below. Suppose that there are two urns, one containing m white balls numbered 1 to m, and the other containing m red balls numbered 1 to m. First, n1 balls are selected from the first urn and their numbers listed. Then n2 balls are selected from the second urn and their numbers listed. Finally, the number of coincidences between the two lists is counted.

2.28 Fact (model A) If the balls from both urns are drawn one at a time, with replacement, then the probability of at least one coincidence is

P3(m, n1, n2) = 1 − (1/m^{n1+n2}) ∑_{t1,t2} m^(t1+t2) S(n1, t1) S(n2, t2),

where the summation is over all 0 ≤ t1 ≤ n1, 0 ≤ t2 ≤ n2. If n = n1 = n2, n = O(√m), and m → ∞, then

P3(m, n1, n2) → 1 − exp(−(n²/m)[1 + O(1/√m)]) ≈ 1 − exp(−n²/m).

2.29 Fact (model B) If the balls from both urns are drawn without replacement, then the probability of at least one coincidence is

P4(m, n1, n2) = 1 − m^(n1+n2) / (m^(n1) m^(n2)).

If n1 = O(√m), n2 = O(√m), and m → ∞, then

P4(m, n1, n2) → 1 − exp(−(n1n2/m)[1 + (n1 + n2 − 1)/(2m) + O(1/m)]).

2.30 Fact (model C) If the n1 white balls are drawn one at a time, with replacement, and the n2 red balls are drawn without replacement, then the probability of at least one coincidence is

P5(m, n1, n2) = 1 − (1 − n2/m)^{n1}.

If n1 = O(√m), n2 = O(√m), and m → ∞, then

P5(m, n1, n2) → 1 − exp(−(n1n2/m)[1 + O(1/√m)]) ≈ 1 − exp(−n1n2/m).

2.1.6 Random mappings

2.31 Definition Let Fn denote the collection of all functions (mappings) from a finite domain of size n to a finite codomain of size n.

Models where random elements of Fn are considered are called random mappings models. In this section the only random mappings model considered is where every function from Fn is equally likely to be chosen; such models arise frequently in cryptography and algorithmic number theory. Note that |Fn| = n^n, whence the probability that a particular function from Fn is chosen is 1/n^n.

2.32 Definition Let f be a function in Fn with domain and codomain equal to {1, 2, . . . , n}. The functional graph of f is a directed graph whose points (or vertices) are the elements {1, 2, . . . , n} and whose edges are the ordered pairs (x, f(x)) for all x ∈ {1, 2, . . . , n}.

2.33 Example (functional graph) Consider the function f : {1, 2, . . . , 13} → {1, 2, . . . , 13} defined by f(1) = 4, f(2) = 11, f(3) = 1, f(4) = 6, f(5) = 3, f(6) = 9, f(7) = 3, f(8) = 11, f(9) = 1, f(10) = 2, f(11) = 10, f(12) = 4, f(13) = 7. The functional graph of f is shown in Figure 2.1.

As Figure 2.1 illustrates, a functional graph may have several components (maximal connected subgraphs), each component consisting of a directed cycle and some directed trees attached to the cycle.

2.34 Fact As n tends to infinity, the following statements regarding the functional digraph of a random function f from Fn are true:

(i) The expected number of components is (1/2) ln n.


Figure 2.1: A functional graph (see Example 2.33).

(ii) The expected number of points which are on the cycles is √(πn/2).

(iii) The expected number of terminal points (points which have no preimages) is n/e.

(iv) The expected number of k-th iterate image points (x is a k-th iterate image point if x = f(f(· · · f(y) · · ·)), with f applied k times, for some y) is (1 − τk)n, where the τk satisfy the recurrence τ0 = 0, τ_{k+1} = e^{−1+τk} for k ≥ 0.

2.35 Definition Let f be a random function from {1, 2, . . . , n} to {1, 2, . . . , n} and let u ∈ {1, 2, . . . , n}. Consider the sequence of points u0, u1, u2, . . . defined by u0 = u, ui = f(u_{i−1}) for i ≥ 1. In terms of the functional graph of f, this sequence describes a path that connects to a cycle.

(i) The number of edges in the path is called the tail length of u, denoted λ(u).

(ii) The number of edges in the cycle is called the cycle length of u, denoted μ(u).

(iii) The rho-length of u is the quantity ρ(u) = λ(u) + μ(u).

(iv) The tree size of u is the number of edges in the maximal tree rooted on a cycle in the component that contains u.

(v) The component size of u is the number of edges in the component that contains u.

(vi) The predecessors size of u is the number of iterated preimages of u.

2.36 Example The functional graph in Figure 2.1 has 2 components and 4 terminal points. The point u = 3 has parameters λ(u) = 1, μ(u) = 4, ρ(u) = 5. The tree, component, and predecessors sizes of u = 3 are 4, 9, and 3, respectively.
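The parameters λ(u) and μ(u) of Definition 2.35 can be found by simply walking the sequence u, f(u), f(f(u)), . . . until a point repeats. The sketch below does this for the function of Example 2.33; note that it stores every visited point, whereas constant-memory cycle-finding methods (such as Floyd's) exploit the same rho-shaped structure without a lookup table:

```python
# The function of Example 2.33, written as a dictionary.
f = {1: 4, 2: 11, 3: 1, 4: 6, 5: 3, 6: 9, 7: 3,
     8: 11, 9: 1, 10: 2, 11: 10, 12: 4, 13: 7}

def tail_and_cycle(f, u):
    """Tail length λ(u) and cycle length μ(u) (Definition 2.35), found by
    walking u, f(u), f(f(u)), ... until a point is revisited."""
    seen = {}              # point -> index at which it was first visited
    x, i = u, 0
    while x not in seen:
        seen[x] = i
        x, i = f[x], i + 1
    lam = seen[x]          # index of the first point on the cycle
    mu = i - seen[x]       # number of edges in the cycle
    return lam, mu

lam, mu = tail_and_cycle(f, 3)
print(lam, mu, lam + mu)   # 1 4 5, matching Example 2.36
```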

2.37 Fact As n tends to infinity, the following are the expectations of some parameters associated with a random point in {1, 2, . . . , n} and a random function from Fn: (i) tail length: √(πn/8) (ii) cycle length: √(πn/8) (iii) rho-length: √(πn/2) (iv) tree size: n/3 (v) component size: 2n/3 (vi) predecessors size: √(πn/8).

2.38 Fact As n tends to infinity, the expectations of the maximum tail, cycle, and rho lengths in a random function from Fn are c1√n, c2√n, and c3√n, respectively, where c1 ≈ 0.78248, c2 ≈ 1.73746, and c3 ≈ 2.4149.

Facts 2.37 and 2.38 indicate that in the functional graph of a random function, most points are grouped together in one giant component, and there is a small number of large trees. Also, almost unavoidably, a cycle of length about √n arises after following a path of length √n edges.


2.2 Information theory

2.2.1 Entropy

Let X be a random variable which takes on a finite set of values x1, x2, . . . , xn, with probability P(X = xi) = pi, where 0 ≤ pi ≤ 1 for each i, 1 ≤ i ≤ n, and where ∑_{i=1}^{n} pi = 1. Also, let Y and Z be random variables which take on finite sets of values.

The entropy of X is a mathematical measure of the amount of information provided by an observation of X. Equivalently, it is the uncertainty about the outcome before an observation of X. Entropy is also useful for approximating the average number of bits required to encode the elements of X.

2.39 Definition The entropy or uncertainty of X is defined to be H(X) = −∑_{i=1}^{n} pi lg pi = ∑_{i=1}^{n} pi lg(1/pi), where, by convention, pi · lg pi = pi · lg(1/pi) = 0 if pi = 0.

2.40 Fact (properties of entropy) Let X be a random variable which takes on n values.

(i) 0 ≤ H(X) ≤ lg n.

(ii) H(X) = 0 if and only if pi = 1 for some i, and pj = 0 for all j ≠ i (that is, there is no uncertainty of the outcome).

(iii) H(X) = lg n if and only if pi = 1/n for each i, 1 ≤ i ≤ n (that is, all outcomes are equally likely).
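Definition 2.39 and the extreme cases of Fact 2.40 are easy to verify numerically. In the sketch below, entropy is a helper defined here for illustration:

```python
from math import log2

def entropy(probs):
    """H(X) = -sum p_i lg p_i (Definition 2.39); zero-probability terms
    contribute 0 by convention."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0]))   # 0.0: no uncertainty (Fact 2.40(ii))
print(entropy([0.25] * 4))        # 2.0 = lg 4: all outcomes equally likely (Fact 2.40(iii))
print(entropy([0.5, 0.25, 0.25])) # 1.5: between the two extremes
```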

2.41 Definition The joint entropy of X and Y is defined to be

H(X, Y) = −∑_{x,y} P(X = x, Y = y) lg(P(X = x, Y = y)),

where the summation indices x and y range over all values of X and Y, respectively. The definition can be extended to any number of random variables.

2.42 Fact If X and Y are random variables, then H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

2.43 Definition If X, Y are random variables, the conditional entropy of X given Y = y is

H(X|Y = y) = −∑_{x} P(X = x|Y = y) lg(P(X = x|Y = y)),

where the summation index x ranges over all values of X. The conditional entropy of X given Y, also called the equivocation of Y about X, is

H(X|Y) = ∑_{y} P(Y = y) H(X|Y = y),

where the summation index y ranges over all values of Y.

2.44 Fact (properties of conditional entropy) Let X and Y be random variables.

(i) The quantity H(X|Y) measures the amount of uncertainty remaining about X after Y has been observed.

(ii) H(X|Y) ≥ 0 and H(X|X) = 0.

(iii) H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).

(iv) H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.

2.2.2 Mutual information

2.45 Definition The mutual information or transinformation of random variables X and Y is I(X; Y) = H(X) − H(X|Y). Similarly, the transinformation of X and the pair Y, Z is defined to be I(X; Y, Z) = H(X) − H(X|Y, Z).

2.46 Fact (properties of mutual transinformation)

(i) The quantity I(X; Y) can be thought of as the amount of information that Y reveals about X. Similarly, the quantity I(X; Y, Z) can be thought of as the amount of information that Y and Z together reveal about X.

(ii) I(X; Y) ≥ 0.

(iii) I(X; Y) = 0 if and only if X and Y are independent (that is, Y contributes no information about X).

(iv) I(X; Y) = I(Y; X).

2.47 Definition The conditional transinformation of the pair X, Y given Z is defined to be IZ(X; Y) = H(X|Z) − H(X|Y, Z).

2.48 Fact (properties of conditional transinformation)

(i) The quantity IZ(X; Y) can be interpreted as the amount of information that Y provides about X, given that Z has already been observed.

(ii) I(X; Y, Z) = I(X; Y) + IY(X; Z).

(iii) IZ(X; Y) = IZ(Y; X).
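Since I(X; Y) = H(X) − H(X|Y) and, by Fact 2.44(iii), H(X|Y) = H(X, Y) − H(Y), the mutual information can be computed from a joint distribution alone. The sketch below does this for a small, arbitrarily chosen joint distribution:

```python
from math import log2

def H(dist):
    """Entropy of a distribution given as {value: probability}."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)

# A toy joint distribution of (X, Y); any dict {(x, y): prob} would do.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}

px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p   # marginal distribution of X
    py[y] = py.get(y, 0.0) + p   # marginal distribution of Y

HXY = sum(-p * log2(p) for p in joint.values() if p > 0)  # joint entropy (Def 2.41)

# I(X;Y) = H(X) - H(X|Y) = H(X) - (H(X,Y) - H(Y))
I = H(px) - (HXY - H(py))
print(round(I, 4))  # 0.3113 bits
```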

2.3 Complexity theory

2.3.1 Basic definitions

The main goal of complexity theory is to provide mechanisms for classifying computational problems according to the resources needed to solve them. The classification should not depend on a particular computational model, but rather should measure the intrinsic difficulty of the problem. The resources measured may include time, storage space, random bits, number of processors, etc., but typically the main focus is time, and sometimes space.

2.49 Definition An algorithm is a well-defined computational procedure that takes a variable input and halts with an output.

Of course, the term "well-defined computational procedure" is not mathematically precise. It can be made so by using formal computational models such as Turing machines, random-access machines, or boolean circuits. Rather than get involved with the technical intricacies of these models, it is simpler to think of an algorithm as a computer program written in some specific programming language for a specific computer that takes a variable input and halts with an output.

It is usually of interest to find the most efficient (i.e., fastest) algorithm for solving a given computational problem. The time that an algorithm takes to halt depends on the "size" of the problem instance. Also, the unit of time used should be made precise, especially when comparing the performance of two algorithms.

2.50 Definition The size of the input is the total number of bits needed to represent the input in ordinary binary notation using an appropriate encoding scheme. Occasionally, the size of the input will be the number of items in the input.

2.51 Example (sizes of some objects)

(i) The number of bits in the binary representation of a positive integer n is 1 + ⌊lg n⌋ bits. For simplicity, the size of n will be approximated by lg n.

(ii) If f is a polynomial of degree at most k, each coefficient being a non-negative integer at most n, then the size of f is (k + 1) lg n bits.

(iii) If A is a matrix with r rows, s columns, and with non-negative integer entries each at most n, then the size of A is rs lg n bits.

2.52 Definition The running time of an algorithm on a particular input is the number of primitive operations or "steps" executed.

Often a step is taken to mean a bit operation. For some algorithms it will be more convenient to take step to mean something else such as a comparison, a machine instruction, a machine clock cycle, a modular multiplication, etc.

2.53 Definition The worst-case running time of an algorithm is an upper bound on the running time for any input, expressed as a function of the input size.

2.54 Definition The average-case running time of an algorithm is the average running time over all inputs of a fixed size, expressed as a function of the input size.

2.3.2 Asymptotic notation

It is often difficult to derive the exact running time of an algorithm. In such situations one is forced to settle for approximations of the running time, and usually may only derive the asymptotic running time. That is, one studies how the running time of the algorithm increases as the size of the input increases without bound.

In what follows, the only functions considered are those which are defined on the positive integers and take on real values that are always positive from some point onwards. Let f and g be two such functions.

2.55 Definition (order notation)

(i) (asymptotic upper bound) f(n) = O(g(n)) if there exists a positive constant c and a positive integer n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.

(ii) (asymptotic lower bound) f(n) = Ω(g(n)) if there exists a positive constant c and a positive integer n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0.

(iii) (asymptotic tight bound) f(n) = Θ(g(n)) if there exist positive constants c1 and c2, and a positive integer n0, such that c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0.

(iv) (o-notation) f(n) = o(g(n)) if for any positive constant c > 0 there exists a constant n0 > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ n0.

Intuitively, f(n) = O(g(n)) means that f grows no faster asymptotically than g(n) to within a constant multiple, while f(n) = Ω(g(n)) means that f(n) grows at least as fast asymptotically as g(n) to within a constant multiple. f(n) = o(g(n)) means that g(n) is an upper bound for f(n) that is not asymptotically tight, or in other words, the function f(n) becomes insignificant relative to g(n) as n gets larger. The expression o(1) is often used to signify a function f(n) whose limit as n approaches ∞ is 0.

2.56 Fact (properties of order notation) For any functions f(n), g(n), h(n), and l(n), the following are true.

(i) f(n) = O(g(n)) if and only if g(n) = Ω(f(n)).

(ii) f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).

(iii) If f(n) = O(h(n)) and g(n) = O(h(n)), then (f + g)(n) = O(h(n)).

(iv) If f(n) = O(h(n)) and g(n) = O(l(n)), then (f · g)(n) = O(h(n)l(n)).

(v) (reflexivity) f(n) = O(f(n)).

(vi) (transitivity) If f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n)).

2.57 Fact (approximations of some commonly occurring functions)

(i) (polynomial function) If f(n) is a polynomial of degree k with positive leading term, then f(n) = Θ(n^k).

(ii) For any constant c > 0, log_c n = Θ(lg n).

(iii) (Stirling's formula) For all integers n ≥ 1,

√(2πn) (n/e)^n ≤ n! ≤ √(2πn) (n/e)^{n+(1/(12n))}.

Thus n! = √(2πn) (n/e)^n (1 + Θ(1/n)). Also, n! = o(n^n) and n! = Ω(2^n).

(iv) lg(n!) = Θ(n lg n).

2.58 Example (comparative growth rates of some functions) Let ε and c be arbitrary constants with 0 < ε < 1 < c. The following functions are listed in increasing order of their asymptotic growth rates:

1 < ln ln n < ln n < exp(√(ln n ln ln n)) < n^ε < n^c < n^{ln n} < c^n < n^n < c^{c^n}.

2.3.3 Complexity classes

2.59 Definition A polynomial-time algorithm is an algorithm whose worst-case running time function is of the form O(n^k), where n is the input size and k is a constant. Any algorithm whose running time cannot be so bounded is called an exponential-time algorithm.

Roughly speaking, polynomial-time algorithms can be equated with good or efficient algorithms, while exponential-time algorithms are considered inefficient. There are, however, some practical situations when this distinction is not appropriate. When considering polynomial-time complexity, the degree of the polynomial is significant. For example, even though an algorithm with a running time of O(n^{ln ln n}), n being the input size, is asymptotically slower than an algorithm with a running time of O(n^100), the former algorithm may be faster in practice for smaller values of n, especially if the constants hidden by the big-O notation are smaller. Furthermore, in cryptography, average-case complexity is more important than worst-case complexity — a necessary condition for an encryption scheme to be considered secure is that the corresponding cryptanalysis problem is difficult on average (or more precisely, almost always difficult), and not just for some isolated cases.

2.60 Definition A subexponential-time algorithm is an algorithm whose worst-case running time function is of the form e^{o(n)}, where n is the input size.

A subexponential-time algorithm is asymptotically faster than an algorithm whose running time is fully exponential in the input size, while it is asymptotically slower than a polynomial-time algorithm.

2.61 Example (subexponential running time) Let A be an algorithm whose inputs are either elements of a finite field Fq (see §2.6), or an integer q. If the expected running time of A is of the form

Lq[α, c] = O(exp((c + o(1))(ln q)^α (ln ln q)^{1−α})), (2.3)

where c is a positive constant, and α is a constant satisfying 0 < α < 1, then A is a subexponential-time algorithm. Observe that for α = 0, Lq[0, c] is a polynomial in ln q, while for α = 1, Lq[1, c] is a polynomial in q, and thus fully exponential in ln q.
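The behaviour of expression (2.3) at and between the endpoints α = 0 and α = 1 can be illustrated numerically. In the sketch below, L is a helper defined here that evaluates (2.3) with the o(1) term dropped:

```python
from math import exp, log, isclose

def L(q, alpha, c):
    """exp(c (ln q)^alpha (ln ln q)^(1-alpha)): expression (2.3) without the
    O-constant and with the o(1) term dropped."""
    return exp(c * log(q)**alpha * log(log(q))**(1 - alpha))

q = 2.0**512            # a typical cryptographic size, chosen for illustration
poly = L(q, 0, 2)       # alpha = 0: (ln q)^2, polynomial in ln q
subexp = L(q, 0.5, 1)   # alpha = 1/2: genuinely subexponential
full = L(q, 1, 1)       # alpha = 1: q itself, fully exponential in ln q

print(isclose(poly, log(q)**2))   # True
print(isclose(full, q))           # True
print(poly < subexp < full)       # True: it lies strictly between the extremes
```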

For simplicity, the theory of computational complexity restricts its attention to decision problems, i.e., problems which have either YES or NO as an answer. This is not too restrictive in practice, as all the computational problems that will be encountered here can be phrased as decision problems in such a way that an efficient algorithm for the decision problem yields an efficient algorithm for the computational problem, and vice versa.

2.62 Definition The complexity class P is the set of all decision problems that are solvable in polynomial time.

2.63 Definition The complexity class NP is the set of all decision problems for which a YES answer can be verified in polynomial time given some extra information, called a certificate.

2.64 Definition The complexity class co-NP is the set of all decision problems for which a NO answer can be verified in polynomial time using an appropriate certificate.

It must be emphasized that if a decision problem is in NP, it may not be the case that the certificate of a YES answer can be easily obtained; what is asserted is that such a certificate does exist, and, if known, can be used to efficiently verify the YES answer. The same is true of the NO answers for problems in co-NP.

2.65 Example (problem in NP) Consider the following decision problem:

COMPOSITES
INSTANCE: A positive integer n.
QUESTION: Is n composite? That is, are there integers a, b > 1 such that n = ab?

COMPOSITES belongs to NP because if an integer n is composite, then this fact can be verified in polynomial time if one is given a divisor a of n, where 1 < a < n (the certificate in this case consists of the divisor a). It is in fact also the case that COMPOSITES belongs to co-NP. It is still unknown whether or not COMPOSITES belongs to P.


2.66 Fact P ⊆ NP and P ⊆ co-NP.

The following are among the outstanding unresolved questions in the subject of complexity theory:

1. Is P = NP?

2. Is NP = co-NP?

3. Is P = NP ∩ co-NP?

Most experts are of the opinion that the answer to each of the three questions is NO, although nothing along these lines has been proven.

The notion of reducibility is useful when comparing the relative difficulties of problems.

2.67 Definition Let L1 and L2 be two decision problems. L1 is said to polytime reduce to L2, written L1 ≤_P L2, if there is an algorithm that solves L1 which uses, as a subroutine, an algorithm for solving L2, and which runs in polynomial time if the algorithm for L2 does.

Informally, if L1 ≤_P L2, then L2 is at least as difficult as L1, or, equivalently, L1 is no harder than L2.

2.68 Definition Let L1 and L2 be two decision problems. If L1 ≤_P L2 and L2 ≤_P L1, then L1 and L2 are said to be computationally equivalent.

2.69 Fact Let L1, L2, and L3 be three decision problems.

(i) (transitivity) If L1 ≤_P L2 and L2 ≤_P L3, then L1 ≤_P L3.

(ii) If L1 ≤_P L2 and L2 ∈ P, then L1 ∈ P.

2.70 Definition A decision problem L is said to be NP-complete if

(i) L ∈ NP, and

(ii) L1 ≤_P L for every L1 ∈ NP.

The class of all NP-complete problems is denoted by NPC.

NP-complete problems are the hardest problems in NP in the sense that they are at least as difficult as every other problem in NP. There are thousands of problems drawn from diverse fields such as combinatorics, number theory, and logic, that are known to be NP-complete.

2.71 Example (subset sum problem) The subset sum problem is the following: given a set of positive integers {a1, a2, . . . , an} and a positive integer s, determine whether or not there is a subset of the ai that sums to s. The subset sum problem is NP-complete.
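For small instances the subset sum problem can of course be decided by brute force; what NP-completeness rules out (assuming P ≠ NP) is a method that scales polynomially. A minimal Python sketch with an illustrative instance:

```python
from itertools import combinations

def subset_sum(a, s):
    """Decide the subset sum problem by trying all 2^n - 1 non-empty subsets.
    This exhaustive search takes time exponential in n."""
    return any(sum(c) == s
               for r in range(1, len(a) + 1)
               for c in combinations(a, r))

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5 = 9
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False: no subset sums to 30
```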

2.72 Fact Let L1 and L2 be two decision problems.

(i) If L1 is NP-complete and L1 ∈ P, then P = NP.

(ii) If L1 ∈ NP, L2 is NP-complete, and L2 ≤_P L1, then L1 is also NP-complete.

(iii) If L1 is NP-complete and L1 ∈ co-NP, then NP = co-NP.

By Fact 2.72(i), if a polynomial-time algorithm is found for any single NP-complete problem, then it is the case that P = NP, a result that would be extremely surprising. Hence, a proof that a problem is NP-complete provides strong evidence for its intractability. Figure 2.2 illustrates what is widely believed to be the relationship between the complexity classes P, NP, co-NP, and NPC.

Fact 2.72(ii) suggests the following procedure for proving that a decision problem L1 is NP-complete:


Figure 2.2: Conjectured relationship between the complexity classes P, NP, co-NP, and NPC.

1. Prove that L1 ∈ NP.

2. Select a problem L2 that is known to be NP-complete.

3. Prove that L2 ≤_P L1.

2.73 Definition A problem is NP-hard if there exists some NP-complete problem that polytime reduces to it.

Note that the NP-hard classification is not restricted to only decision problems. Observe also that an NP-complete problem is also NP-hard.

2.74 Example (NP-hard problem) Given positive integers a1, a2, . . . , an and a positive integer s, the computational version of the subset sum problem would ask to actually find a subset of the ai which sums to s, provided that such a subset exists. This problem is NP-hard.

2.3.4 Randomized algorithms

The algorithms studied so far in this section have beendeterministic; such algorithms fol- low the same execution path (sequence of operations) each time they execute with the same input. By contrast, arandomizedalgorithm makes random decisions at certain points in the execution; hence their execution paths may differ each time they are invoked with the same input. The random decisions are based upon the outcome of a random number gen- erator. Remarkably, there are many problems for which randomized algorithms are known that are more efficient, both in terms of time and space, than the best known deterministic algorithms.

Randomized algorithms for decision problems can be classified according to the prob- ability that they return the correct answer.

2.75 Definition Let A be a randomized algorithm for a decision problem L, and let I denote an arbitrary instance of L.

(i) A has 0-sided error if P(A outputs YES | I's answer is YES) = 1, and P(A outputs YES | I's answer is NO) = 0.

(ii) A has 1-sided error if P(A outputs YES | I's answer is YES) ≥ 1/2, and P(A outputs YES | I's answer is NO) = 0.


(iii) A has 2-sided error if P(A outputs YES | I's answer is YES) ≥ 2/3, and P(A outputs YES | I's answer is NO) ≤ 1/3.

The number 1/2 in the definition of 1-sided error is somewhat arbitrary and can be replaced by any positive constant. Similarly, the numbers 2/3 and 1/3 in the definition of 2-sided error can be replaced by 1/2 + ε and 1/2 − ε, respectively, for any constant ε, 0 < ε < 1/2.

2.76 Definition The expected running time of a randomized algorithm is an upper bound on the expected running time for each input (the expectation being over all outputs of the random number generator used by the algorithm), expressed as a function of the input size.

The important randomized complexity classes are defined next.

2.77 Definition (randomized complexity classes)

(i) The complexity class ZPP ("zero-sided probabilistic polynomial time") is the set of all decision problems for which there is a randomized algorithm with 0-sided error which runs in expected polynomial time.

(ii) The complexity class RP ("randomized polynomial time") is the set of all decision problems for which there is a randomized algorithm with 1-sided error which runs in (worst-case) polynomial time.

(iii) The complexity class BPP ("bounded error probabilistic polynomial time") is the set of all decision problems for which there is a randomized algorithm with 2-sided error which runs in (worst-case) polynomial time.

2.78 Fact P ⊆ ZPP ⊆ RP ⊆ BPP and RP ⊆ NP.
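As an illustrative sketch (not from the text), the Fermat compositeness test is a randomized algorithm with 1-sided error for the decision problem "is n composite?": a YES answer is always correct, since a violation of Fermat's theorem (Fact 2.127) proves compositeness, while a single trial may fail to detect a composite n.

```python
import random

def fermat_composite_test(n, trials=20):
    """Randomized test for 'n is composite' with 1-sided error.

    YES (True) is always correct; NO (False) may be wrong on a
    single trial, but independent repetitions drive the error
    probability down (though Carmichael numbers remain a pitfall).
    """
    if n in (2, 3):
        return False
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:  # a witnesses that n is composite
            return True            # certainly composite
    return False                   # probably prime

print(fermat_composite_test(100))  # True: 100 is composite
print(fermat_composite_test(97))   # False: 97 is prime
```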

2.4 Number theory

2.4.1 The integers

The set of integers {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} is denoted by the symbol Z.

2.79 Definition Let a, b be integers. Then a divides b (equivalently: a is a divisor of b, or a is a factor of b) if there exists an integer c such that b = ac. If a divides b, then this is denoted by a|b.

2.80 Example (i) −3|18, since 18 = (−3)(−6). (ii) 173|0, since 0 = (173)(0).

The following are some elementary properties of divisibility.

2.81 Fact (properties of divisibility) For all a, b, c ∈ Z, the following are true:

(i) a|a.

(ii) If a|b and b|c, then a|c.

(iii) If a|b and a|c, then a|(bx + cy) for all x, y ∈ Z.

(iv) If a|b and b|a, then a = ±b.


2.82 Definition (division algorithm for integers) If a and b are integers with b ≥ 1, then ordinary long division of a by b yields integers q (the quotient) and r (the remainder) such that

a = qb + r, where 0 ≤ r < b.

Moreover, q and r are unique. The remainder of the division is denoted a mod b, and the quotient is denoted a div b.

2.83 Fact Let a, b ∈ Z with b ≠ 0. Then a div b = ⌊a/b⌋ and a mod b = a − b⌊a/b⌋.

2.84 Example If a = 73, b = 17, then q = 4 and r = 5. Hence 73 mod 17 = 5 and 73 div 17 = 4.
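As a quick sketch, Python's built-in divmod computes exactly a div b and a mod b of Fact 2.83 (Python uses floor division, so the remainder stays in [0, b) even for negative a):

```python
a, b = 73, 17
q, r = divmod(a, b)    # q = ⌊a/b⌋, r = a − b⌊a/b⌋
print(q, r)            # 4 5

# Fact 2.83 also covers negative a: the remainder remains in [0, b).
print(divmod(-11, 7))  # (-2, 3), since -11 = (-2)·7 + 3
```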

2.85 Definition An integer c is a common divisor of a and b if c|a and c|b.

2.86 Definition A non-negative integer d is the greatest common divisor of integers a and b, denoted d = gcd(a, b), if

(i) d is a common divisor of a and b; and
(ii) whenever c|a and c|b, then c|d.

Equivalently, gcd(a, b) is the largest positive integer that divides both a and b, with the exception that gcd(0, 0) = 0.

2.87 Example The common divisors of 12 and 18 are {±1, ±2, ±3, ±6}, and gcd(12, 18) = 6.

2.88 Definition A non-negative integer d is the least common multiple of integers a and b, denoted d = lcm(a, b), if

(i) a|d and b|d; and
(ii) whenever a|c and b|c, then d|c.

Equivalently, lcm(a, b) is the smallest non-negative integer divisible by both a and b.

2.89 Fact If a and b are positive integers, then lcm(a, b) = a · b / gcd(a, b).

2.90 Example Since gcd(12, 18) = 6, it follows that lcm(12, 18) = 12 · 18/6 = 36.

2.91 Definition Two integers a and b are said to be relatively prime or coprime if gcd(a, b) = 1.

2.92 Definition An integer p ≥ 2 is said to be prime if its only positive divisors are 1 and p. Otherwise, p is called composite.

The following are some well known facts about prime numbers.

2.93 Fact If p is prime and p|ab, then either p|a or p|b (or both).

2.94 Fact There are an infinite number of prime numbers.

2.95 Fact (prime number theorem) Let π(x) denote the number of prime numbers ≤ x. Then

lim_{x→∞} π(x) / (x/ln x) = 1.


This means that for large values of x, π(x) is closely approximated by the expression x/ln x. For instance, when x = 10^10, π(x) = 455,052,511, whereas ⌊x/ln x⌋ = 434,294,481. A more explicit estimate for π(x) is given below.

2.96 Fact Let π(x) denote the number of primes ≤ x. Then for x ≥ 17

π(x) > x / ln x,

and for x > 1

π(x) < 1.25506 x / ln x.
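The bounds of Fact 2.96 are easy to spot-check for a small x with a sieve of Eratosthenes; this sketch verifies them at x = 10^5, where π(x) = 9592:

```python
import math

def primes_upto(x):
    """Sieve of Eratosthenes: return the list of primes <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

x = 10**5
pi_x = len(primes_upto(x))
print(pi_x)  # 9592
# Fact 2.96: x/ln x < π(x) < 1.25506 x/ln x for this x
assert x / math.log(x) < pi_x < 1.25506 * x / math.log(x)
```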

2.97 Fact (fundamental theorem of arithmetic) Every integer n ≥ 2 has a factorization as a product of prime powers:

n = p1^e1 p2^e2 · · · pk^ek,

where the pi are distinct primes, and the ei are positive integers. Furthermore, the factorization is unique up to rearrangement of factors.

2.98 Fact If a = p1^e1 p2^e2 · · · pk^ek and b = p1^f1 p2^f2 · · · pk^fk, where each ei ≥ 0 and fi ≥ 0, then

gcd(a, b) = p1^min(e1,f1) p2^min(e2,f2) · · · pk^min(ek,fk)

and

lcm(a, b) = p1^max(e1,f1) p2^max(e2,f2) · · · pk^max(ek,fk).

2.99 Example Let a = 4864 = 2^8 · 19, b = 3458 = 2 · 7 · 13 · 19. Then gcd(4864, 3458) = 2 · 19 = 38 and lcm(4864, 3458) = 2^8 · 7 · 13 · 19 = 442624.
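Fact 2.98 translates directly into code; this sketch uses trial-division factoring purely for illustration (as noted below, factoring is not an efficient route to gcds):

```python
from collections import Counter
import math

def factorize(n):
    """Prime factorization by trial division (fine for small n)."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_lcm_via_factoring(a, b):
    """Fact 2.98: gcd/lcm from min/max of prime exponents."""
    fa, fb = factorize(a), factorize(b)
    primes = set(fa) | set(fb)
    g = math.prod(p ** min(fa[p], fb[p]) for p in primes)
    l = math.prod(p ** max(fa[p], fb[p]) for p in primes)
    return g, l

print(gcd_lcm_via_factoring(4864, 3458))  # (38, 442624)
```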

2.100 Definition For n ≥ 1, let φ(n) denote the number of integers in the interval [1, n] which are relatively prime to n. The function φ is called the Euler phi function (or the Euler totient function).

2.101 Fact (properties of Euler phi function)

(i) If p is a prime, then φ(p) = p − 1.

(ii) The Euler phi function is multiplicative. That is, if gcd(m, n) = 1, then φ(mn) = φ(m) · φ(n).

(iii) If n = p1^e1 p2^e2 · · · pk^ek is the prime factorization of n, then

φ(n) = n (1 − 1/p1)(1 − 1/p2) · · · (1 − 1/pk).

Fact 2.102 gives an explicit lower bound for φ(n).

2.102 Fact For all integers n ≥ 5,

φ(n) > n / (6 ln ln n).


2.4.2 Algorithms in Z

Let a and b be non-negative integers, each less than or equal to n. Recall (Example 2.51) that the number of bits in the binary representation of n is ⌊lg n⌋ + 1, and this number is approximated by lg n. The number of bit operations for the four basic integer operations of addition, subtraction, multiplication, and division using the classical algorithms is summarized in Table 2.1. These algorithms are studied in more detail in §14.2. More sophisticated techniques for multiplication and division have smaller complexities.

Operation                     Bit complexity
Addition        a + b         O(lg a + lg b) = O(lg n)
Subtraction     a − b         O(lg a + lg b) = O(lg n)
Multiplication  a · b         O((lg a)(lg b)) = O((lg n)^2)
Division        a = qb + r    O((lg q)(lg b)) = O((lg n)^2)

Table 2.1: Bit complexity of basic operations in Z.

The greatest common divisor of two integers a and b can be computed via Fact 2.98. However, computing a gcd by first obtaining prime-power factorizations does not result in an efficient algorithm, as the problem of factoring integers appears to be relatively difficult. The Euclidean algorithm (Algorithm 2.104) is an efficient algorithm for computing the greatest common divisor of two integers that does not require the factorization of the integers. It is based on the following simple fact.

2.103 Fact If a and b are positive integers with a > b, then gcd(a, b) = gcd(b, a mod b).

2.104 Algorithm Euclidean algorithm for computing the greatest common divisor of two integers

INPUT: two non-negative integers a and b with a ≥ b.
OUTPUT: the greatest common divisor of a and b.

1. While b ≠ 0 do the following:
   1.1 Set r ← a mod b, a ← b, b ← r.
2. Return(a).

2.105 Fact Algorithm 2.104 has a running time of O((lg n)^2) bit operations.

2.106 Example (Euclidean algorithm) The following are the division steps of Algorithm 2.104 for computing gcd(4864, 3458) = 38:

4864 = 1 · 3458 + 1406
3458 = 2 · 1406 + 646
1406 = 2 · 646 + 114
 646 = 5 · 114 + 76
 114 = 1 · 76 + 38
  76 = 2 · 38 + 0.
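Algorithm 2.104 is a few lines of code; this sketch reproduces the example above:

```python
def gcd(a, b):
    """Algorithm 2.104: Euclidean algorithm.

    Input: non-negative integers a and b with a >= b.
    Output: gcd(a, b).
    """
    while b != 0:
        a, b = b, a % b  # step 1.1: r <- a mod b, a <- b, b <- r
    return a

print(gcd(4864, 3458))  # 38
```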


The Euclidean algorithm can be extended so that it not only yields the greatest common divisor d of two integers a and b, but also integers x and y satisfying ax + by = d.

2.107 Algorithm Extended Euclidean algorithm

INPUT: two non-negative integers a and b with a ≥ b.
OUTPUT: d = gcd(a, b) and integers x, y satisfying ax + by = d.

1. If b = 0 then set d ← a, x ← 1, y ← 0, and return(d, x, y).
2. Set x2 ← 1, x1 ← 0, y2 ← 0, y1 ← 1.
3. While b > 0 do the following:
   3.1 q ← ⌊a/b⌋, r ← a − qb, x ← x2 − qx1, y ← y2 − qy1.
   3.2 a ← b, b ← r, x2 ← x1, x1 ← x, y2 ← y1, and y1 ← y.
4. Set d ← a, x ← x2, y ← y2, and return(d, x, y).

2.108 Fact Algorithm 2.107 has a running time of O((lg n)^2) bit operations.

2.109 Example (extended Euclidean algorithm) Table 2.2 shows the steps of Algorithm 2.107 with inputs a = 4864 and b = 3458. Hence gcd(4864, 3458) = 38 and (4864)(32) + (3458)(−45) = 38.

q    r     x    y      a     b     x2   x1   y2   y1
−    −     −    −      4864  3458  1    0    0    1
1    1406  1    −1     3458  1406  0    1    1    −1
2    646   −2   3      1406  646   1    −2   −1   3
2    114   5    −7     646   114   −2   5    3    −7
5    76    −27  38     114   76    5    −27  −7   38
1    38    32   −45    76    38    −27  32   38   −45
2    0     −91  128    38    0     32   −91  −45  128

Table 2.2: Extended Euclidean algorithm (Algorithm 2.107) with inputs a = 4864, b = 3458.

Efficient algorithms for gcd and extended gcd computations are further studied in §14.4.
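A direct transcription of Algorithm 2.107, checked against the worked example above:

```python
def extended_gcd(a, b):
    """Algorithm 2.107: extended Euclidean algorithm.

    Returns (d, x, y) with d = gcd(a, b) and a*x + b*y = d,
    for non-negative integers a >= b.
    """
    if b == 0:                        # step 1
        return a, 1, 0
    x2, x1, y2, y1 = 1, 0, 0, 1       # step 2
    while b > 0:                      # step 3
        q = a // b
        r = a - q * b
        x = x2 - q * x1
        y = y2 - q * y1
        a, b = b, r
        x2, x1 = x1, x
        y2, y1 = y1, y
    return a, x2, y2                  # step 4

d, x, y = extended_gcd(4864, 3458)
print(d, x, y)                        # 38 32 -45
assert 4864 * x + 3458 * y == d
```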

2.4.3 The integers modulo n

Let n be a positive integer.

2.110 Definition If a and b are integers, then a is said to be congruent to b modulo n, written a ≡ b (mod n), if n divides (a − b). The integer n is called the modulus of the congruence.

2.111 Example (i) 24 ≡ 9 (mod 5) since 24 − 9 = 3 · 5.
(ii) −11 ≡ 17 (mod 7) since −11 − 17 = −4 · 7.

2.112 Fact (properties of congruences) For all a, a1, b, b1, c ∈ Z, the following are true.

(i) a ≡ b (mod n) if and only if a and b leave the same remainder when divided by n.

(ii) (reflexivity) a ≡ a (mod n).

(iii) (symmetry) If a ≡ b (mod n) then b ≡ a (mod n).


(iv) (transitivity) If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n).

(v) If a ≡ a1 (mod n) and b ≡ b1 (mod n), then a + b ≡ a1 + b1 (mod n) and ab ≡ a1b1 (mod n).

The equivalence class of an integer a is the set of all integers congruent to a modulo n. From properties (ii), (iii), and (iv) above, it can be seen that for a fixed n the relation of congruence modulo n partitions Z into equivalence classes. Now, if a = qn + r, where 0 ≤ r < n, then a ≡ r (mod n). Hence each integer a is congruent modulo n to a unique integer between 0 and n − 1, called the least residue of a modulo n. Thus a and r are in the same equivalence class, and so r may simply be used to represent this equivalence class.

2.113 Definition The integers modulo n, denoted Zn, is the set of (equivalence classes of) integers {0, 1, 2, . . . , n − 1}. Addition, subtraction, and multiplication in Zn are performed modulo n.

2.114 Example Z25 = {0, 1, 2, . . . , 24}. In Z25, 13 + 16 = 4, since 13 + 16 = 29 ≡ 4 (mod 25). Similarly, 13 · 16 = 8 in Z25.

2.115 Definition Let a ∈ Zn. The multiplicative inverse of a modulo n is an integer x ∈ Zn such that ax ≡ 1 (mod n). If such an x exists, then it is unique, and a is said to be invertible, or a unit; the inverse of a is denoted by a^-1.

2.116 Definition Let a, b ∈ Zn. Division of a by b modulo n is the product of a and b^-1 modulo n, and is only defined if b is invertible modulo n.

2.117 Fact Let a ∈ Zn. Then a is invertible if and only if gcd(a, n) = 1.

2.118 Example The invertible elements in Z9 are 1, 2, 4, 5, 7, and 8. For example, 4^-1 = 7 because 4 · 7 ≡ 1 (mod 9).
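A modular inverse can be obtained from the extended Euclidean algorithm (Algorithm 2.107): if gcd(a, n) = 1 then ax + ny = 1, so x mod n is a^-1. Python 3.8+ also exposes this directly as pow(a, -1, n). A sketch:

```python
def mod_inverse(a, n):
    """Return a^-1 mod n, or raise ValueError if a is not invertible."""
    # Iterative extended Euclidean algorithm, tracking only the x-coefficients.
    r0, r1 = a % n, n
    x0, x1 = 1, 0
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
    if r0 != 1:
        raise ValueError("a is not invertible modulo n (gcd(a, n) != 1)")
    return x0 % n

print(mod_inverse(4, 9))  # 7
print(pow(4, -1, 9))      # 7 (built-in, Python 3.8+)
```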

The following is a generalization of Fact 2.117.

2.119 Fact Let d = gcd(a, n). The congruence equation ax ≡ b (mod n) has a solution x if and only if d divides b, in which case there are exactly d solutions between 0 and n − 1; these solutions are all congruent modulo n/d.

2.120 Fact (Chinese remainder theorem, CRT) If the integers n1, n2, . . . , nk are pairwise relatively prime, then the system of simultaneous congruences

x ≡ a1 (mod n1)
x ≡ a2 (mod n2)
...
x ≡ ak (mod nk)

has a unique solution modulo n = n1 n2 · · · nk.

2.121 Algorithm (Gauss's algorithm) The solution x to the simultaneous congruences in the Chinese remainder theorem (Fact 2.120) may be computed as x = (a1N1M1 + a2N2M2 + · · · + akNkMk) mod n, where Ni = n/ni and Mi = Ni^-1 mod ni. These computations can be performed in O((lg n)^2) bit operations.


Another efficient practical algorithm for solving simultaneous congruences in the Chinese remainder theorem is presented in§14.5.

2.122 Example The pair of congruences x ≡ 3 (mod 7), x ≡ 7 (mod 13) has a unique solution x ≡ 59 (mod 91).
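Gauss's algorithm (2.121) in code, reproducing the example above:

```python
from math import prod

def crt(residues, moduli):
    """Gauss's algorithm for the Chinese remainder theorem.

    Given pairwise relatively prime moduli n1..nk and residues a1..ak,
    returns the unique x mod n = n1·...·nk with x ≡ ai (mod ni).
    """
    n = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = n // n_i
        M_i = pow(N_i, -1, n_i)  # N_i^-1 mod n_i (Python 3.8+)
        x = (x + a_i * N_i * M_i) % n
    return x

print(crt([3, 7], [7, 13]))  # 59
```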

2.123 Fact If gcd(n1, n2) = 1, then the pair of congruences x ≡ a (mod n1), x ≡ a (mod n2) has a unique solution x ≡ a (mod n1n2).

2.124 Definition The multiplicative group of Zn is Zn* = {a ∈ Zn | gcd(a, n) = 1}. In particular, if n is a prime, then Zn* = {a | 1 ≤ a ≤ n − 1}.

2.125 Definition The order of Zn* is defined to be the number of elements in Zn*, namely |Zn*|.

It follows from the definition of the Euler phi function (Definition 2.100) that |Zn*| = φ(n). Note also that if a ∈ Zn* and b ∈ Zn*, then a · b ∈ Zn*, and so Zn* is closed under multiplication.

2.126 Fact Let n ≥ 2 be an integer.

(i) (Euler's theorem) If a ∈ Zn*, then a^φ(n) ≡ 1 (mod n).

(ii) If n is a product of distinct primes, and if r ≡ s (mod φ(n)), then a^r ≡ a^s (mod n) for all integers a. In other words, when working modulo such an n, exponents can be reduced modulo φ(n).

A special case of Euler’s theorem is Fermat’s (little) theorem.

2.127 Fact Let p be a prime.

(i) (Fermat's theorem) If gcd(a, p) = 1, then a^(p−1) ≡ 1 (mod p).

(ii) If r ≡ s (mod p − 1), then a^r ≡ a^s (mod p) for all integers a. In other words, when working modulo a prime p, exponents can be reduced modulo p − 1.

(iii) In particular, a^p ≡ a (mod p) for all integers a.
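Both Euler's and Fermat's theorems are easy to spot-check with Python's three-argument pow (modular exponentiation):

```python
p = 13
for a in range(1, p):
    assert pow(a, p - 1, p) == 1   # Fermat: a^(p-1) ≡ 1 (mod p)
for a in range(p):
    assert pow(a, p, p) == a % p   # a^p ≡ a (mod p) for all a

n, phi_n = 21, 12                  # φ(21) = φ(3)·φ(7) = 12
for a in [1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20]:  # Z21*
    assert pow(a, phi_n, n) == 1   # Euler: a^φ(n) ≡ 1 (mod n)
print("all congruences verified")
```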

2.128 Definition Let a ∈ Zn*. The order of a, denoted ord(a), is the least positive integer t such that a^t ≡ 1 (mod n).

2.129 Fact If the order of a ∈ Zn* is t, and a^s ≡ 1 (mod n), then t divides s. In particular, t | φ(n).

2.130 Example Let n = 21. Then Z21* = {1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20}. Note that φ(21) = φ(7)φ(3) = 12 = |Z21*|. The orders of elements in Z21* are listed in Table 2.3.

a ∈ Z21*     1  2  4  5  8  10  11  13  16  17  19  20
order of a   1  6  3  6  2   6   6   2   3   6   6   2

Table 2.3: Orders of elements in Z21*.
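Table 2.3 can be reproduced by brute force (adequate only for tiny moduli), repeatedly multiplying by a until 1 is reached:

```python
from math import gcd

def order(a, n):
    """Least t >= 1 with a^t ≡ 1 (mod n); requires gcd(a, n) = 1."""
    assert gcd(a, n) == 1
    t, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        t += 1
    return t

n = 21
units = [a for a in range(1, n) if gcd(a, n) == 1]
print({a: order(a, n) for a in units})
# {1: 1, 2: 6, 4: 3, 5: 6, 8: 2, 10: 6, 11: 6, 13: 2, 16: 3, 17: 6, 19: 6, 20: 2}
```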

2.131 Definition Let α ∈ Zn*. If the order of α is φ(n), then α is said to be a generator or a primitive element of Zn*. If Zn* has a generator, then Zn* is said to be cyclic.


2.132 Fact (properties of generators of Zn*)

(i) Zn* has a generator if and only if n = 2, 4, p^k, or 2p^k, where p is an odd prime and k ≥ 1. In particular, if p is a prime, then Zp* has a generator.

(ii) If α is a generator of Zn*, then Zn* = {α^i mod n | 0 ≤ i ≤ φ(n) − 1}.

(iii) Suppose that α is a generator of Zn*. Then b = α^i mod n is also a generator of Zn* if and only if gcd(i, φ(n)) = 1. It follows that if Zn* is cyclic, then the number of generators is φ(φ(n)).

(iv) α ∈ Zn* is a generator of Zn* if and only if α^(φ(n)/p) ≢ 1 (mod n) for each prime divisor p of φ(n).

2.133 Example Z21* is not cyclic since it does not contain an element of order φ(21) = 12 (see Table 2.3); note that 21 does not satisfy the condition of Fact 2.132(i). On the other hand, Z25* is cyclic, and has a generator α = 2.
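Fact 2.132(iv) gives an efficient generator test once φ(n) is factored; for n = 25 here, φ(25) = 20 with prime factors 2 and 5:

```python
def is_generator(alpha, n, phi_n, prime_factors_of_phi):
    """Fact 2.132(iv): alpha generates Zn* iff alpha^(φ(n)/p) ≢ 1 (mod n)
    for every prime p dividing φ(n)."""
    return all(pow(alpha, phi_n // p, n) != 1 for p in prime_factors_of_phi)

# φ(25) = 20 = 2^2 · 5
print(is_generator(2, 25, 20, [2, 5]))  # True: 2 generates Z25*
print(is_generator(7, 25, 20, [2, 5]))  # False: 7^4 ≡ 1 (mod 25)
```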

2.134 Definition Let a ∈ Zn*. a is said to be a quadratic residue modulo n, or a square modulo n, if there exists an x ∈ Zn* such that x^2 ≡ a (mod n). If no such x exists, then a is called a quadratic non-residue modulo n. The set of all quadratic residues modulo n is denoted by Qn and the set of all quadratic non-residues is denoted by Q̄n.

Note that by definition 0 ∉ Zn*, whence 0 ∉ Qn and 0 ∉ Q̄n.

2.135 Fact Let p be an odd prime and let α be a generator of Zp*. Then a ∈ Zp* is a quadratic residue modulo p if and only if a = α^i mod p, where i is an even integer. It follows that |Qp| = (p − 1)/2 and |Q̄p| = (p − 1)/2; that is, half of the elements in Zp* are quadratic residues and the other half are quadratic non-residues.

2.136 Example α = 6 is a generator of Z13*. The powers of α are listed in the following table.

i            0  1   2  3  4  5   6  7  8  9  10  11
α^i mod 13   1  6  10  8  9  2  12  7  3  5   4  11

Hence Q13 = {1, 3, 4, 9, 10, 12} and Q̄13 = {2, 5, 6, 7, 8, 11}.

2.137 Fact Let n be a product of two distinct odd primes p and q, n = pq. Then a ∈ Zn* is a quadratic residue modulo n if and only if a ∈ Qp and a ∈ Qq. It follows that |Qn| = |Qp| · |Qq| = (p − 1)(q − 1)/4 and |Q̄n| = 3(p − 1)(q − 1)/4.

2.138 Example Let n = 21. Then Q21 = {1, 4, 16} and Q̄21 = {2, 5, 8, 10, 11, 13, 17, 19, 20}.

2.139 Definition Let a ∈ Qn. If x ∈ Zn* satisfies x^2 ≡ a (mod n), then x is called a square root of a modulo n.

2.140 Fact (number of square roots)

(i) If p is an odd prime and a ∈ Qp, then a has exactly two square roots modulo p.

(ii) More generally, let n = p1^e1 p2^e2 · · · pk^ek where the pi are distinct odd primes and ei ≥ 1. If a ∈ Qn, then a has precisely 2^k distinct square roots modulo n.

2.141 Example The square roots of 12 modulo 37 are 7 and 30. The square roots of 121 modulo 315 are 11, 74, 101, 151, 164, 214, 241, and 304.
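For small moduli the square roots can be found by exhaustive search; this sketch considers only units x with gcd(x, n) = 1, matching Definition 2.139, and for 315 = 3^2 · 5 · 7 it finds the 2^3 = 8 roots predicted by Fact 2.140(ii):

```python
from math import gcd

def square_roots(a, n):
    """All x in Zn* with x^2 ≡ a (mod n), by exhaustive search."""
    return [x for x in range(1, n)
            if gcd(x, n) == 1 and (x * x - a) % n == 0]

print(square_roots(12, 37))    # [7, 30]
print(square_roots(121, 315))  # [11, 74, 101, 151, 164, 214, 241, 304]
```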
