P ROCESSING THE C LAUSES IN A S ELECT B LOCK

Table Expressions, and Subqueries

6.3 P ROCESSING THE C LAUSES IN A S ELECT B LOCK

Each select block consists of clauses, such as the ^SELECT,^FROM, and ^{ORDER BY}clauses.

This section explains through examples how the different clauses from a select block are processed. In other words, we show the steps MySQL performs to come to the desired result. Other examples clearly show what the job of each clause is.

In all these examples, the select block forms the entire table expression and the entire^SELECTstatement.

Later chapters discuss each clause in detail.

Example 6.1: Find the player number for each player who has incurred at least two penalties of more than $25; order the result by player number (smallest number first).

SELECT PLAYERNO FROM PENALTIES WHERE AMOUNT > 25 GROUP BY PLAYERNO HAVING COUNT(*) > 1 ORDER BY PLAYERNO

Figure 6.3 shows the order in which MySQL processes the different clauses.

You likely noticed immediately that this order differs from the order in which the clauses were entered in the select block (and, therefore, the ^SELECTstatement). Be careful never to confuse these two.

Explanation: Processing each clause results in one(intermediate result) tablethat consists of zero or more rows and one or more columns.This automatically means that every clause, barring the first, has one table of zero or more rows and one or more columns as its input. The first clause, the ^FROMclause, retrieves data from the database and has as its input one or more tablesfrom the database. Those tables that still have to be processed by a subsequent clause are called intermediate results.

SQL does not show the user any of the intermediate results; it presents the state-ment as a single large process. The only table the end user sees is the final result table.

MySQL developers may determine themselves how their product will process the

SELECTstatements internally. They can switch the order of the clauses or combine the processing of clauses. In fact, they can do whatever they want, as long as the final result of the query is equal to the result they would get if the statement were processed according to the method just described.

FIGURE 6.3 The clauses of the ^SELECTstatement

start

FROM:

specifies the tables that are queried

WHERE:

selects rows that satisfy the condition

GROUP BY:

groups rows on basis of equal values in columns

HAVING:

selects groups that satisfy the condition

ORDER BY:

sorts rows on basis of columns

SELECT:

selects columns

LIMIT:

removes rows on basis of their order number

end result

Chapter 25, “Using Indexes,” examines how MySQL actually processes state-ments. The method of processing described here is extremely useful if you want to determine the end result of a ^SELECTstatement “by hand.”

Let’s examine the clauses for the given example one by one.

Only the PENALTIES table is named in the ^FROM clause. For MySQL, this means that it works with this table. The intermediate result of this clause is an exact copy of the PENALTIES table:

PAYMENTNO PLAYERNO PAYMENT_DATE AMOUNT --- --

The^WHEREclause specifies AMOUNT > 25as a condition. All rows in which the value in the AMOUNT column is greater than ²⁵ satisfy the condition. Therefore, the rows with payment numbers ⁵ and⁶ are discarded, while the remaining rows form the intermediate result table from the ^WHEREclause:

PAYMENTNO PLAYERNO PAYMENT_DATE AMOUNT --- --

The^{GROUP BY}clause groups the rows in the intermediate result table. The data is divided into groups on the basis of the values in the PLAYERNO column (^GROUP

BY PLAYERNO). Rows are grouped if they contain equal values in the relevant col-umn. For example, the rows with payment numbers ²and⁷form one group because the PLAYERNO column has the value of ⁴⁴in both rows.

This is the intermediate result (the column name PLAYERNO has been short-ened to PNO to conserve some space):

PAYMENTNO PNO PAYMENT_DATE AMOUNT

Explanation: Thus, for all but the PLAYERNO column, more than one value can exist in one row. For example, the PAYMENTNO column contains two values in the second and third rows. This is not as strange as it might seem because the data is grouped and each row actually forms a group of rows. A single value for each row of the intermediate table is found only in the PLAYERNO column because this is the column by which the result is grouped. For the sake of clarity, the groups with val-ues have been enclosed by brackets.

In some ways, you can compare this fourth clause, this ^HAVING clause, with the

WHEREclause. The difference is that the ^WHEREclause acts on the intermediate table from the ^FROMclause and the ^HAVINGclause acts on the grouped intermediate result table from the ^{GROUP BY}clause. The effect is the same; in the ^HAVINGclause, rows are also selected with the help of a condition. In this case, the condition is as follows:

COUNT(*) > 1

This means that all (grouped) rows made up of more than one row must satisfy the condition. Chapter 10, “SELECT Statement: The GROUP BY Clause,” looks at this condition in detail.

The intermediate result is:

PAYMENTNO PNO PAYMENT_DATE AMOUNT

--- --- --- ---{2, 7} 44 {1981-05-05, 1982-12-30} {75.00, 30.00}

{3, 8} 27 {1983-09-10, 1984-11-12} {100.00, 75.00}

The^{ORDER BY} clause has no impact on the contents of the intermediate result, but it sorts the final remaining rows. In this example, the result is sorted on PLAYERNO.

The intermediate result is:

PAYMENTNO PNO PAYMENT_DATE AMOUNT

--- --- --- ---{3, 8} 27 {1983-09-10, 1984-11-12} {100.00, 75.00}

{2, 7} 44 {1981-05-05, 1982-12-30} {75.00, 30.00}

The ^SELECT clause, which is the last clause in this example, specifies which columns must be present in the final result. In other words, the ^SELECT clause selects columns.

The end user sees this end result:

PLAYERNO ---27 44

Example 6.2: Get the player number and the league number of each player resi-dent in Stratford; order the result by league number.

SELECT PLAYERNO, LEAGUENO FROM PLAYERS

WHERE TOWN = 'Stratford' ORDER BY LEAGUENO

The intermediate result after the ^FROMclause is:

PLAYERNO NAME ... LEAGUENO

The intermediate result after the ^WHEREclause is:

PLAYERNO NAME ... LEAGUENO

No^{GROUP BY}clause exists; therefore, the intermediate result remains unchanged.

In addition, no ^HAVING clause exists, so again, the intermediate result remains unchanged.

The intermediate result after the ^{ORDER BY}clause is:

PLAYERNO NAME ... LEAGUENO - ---

---7 Wise ... ? 39 Bishop ... ? 83 Hope ... 1608

2 Everett ... 2411 57 Brown ... 6409 100 Parmenter ... 6524 6 Parmenter ... 8467

Note that the null values are presented first if the result is sorted. Chapter 12,

“SELECT Statement: The ORDER BY Clause,” describes this in greater detail.

The^SELECT clause asks for the PLAYERNO and LEAGUENO columns. This gives the following final result:

PLAYERNO LEAGUENO

---7 ? 39 ? 83 1608

2 2411 57 6409 100 6524 6 8467

The smallest ^SELECTstatement that can be specified consists of one select block with just a ^SELECTclause.

Example 6.3: How much is 89 ×73?

SELECT 89 * 73

The result is:

89 * 73 ---6497

The processing of this statement is simple. If no ^FROMclause is specified, the statement returns a result consisting of one row. This row contains just as many val-ues as there are expressions. In this example, that is also just one.

Exercise 6.7: For the following ^SELECT statement, determine the intermediate result table after each clause has been processed; give the final result as well.

SELECT PLAYERNO FROM PENALTIES

WHERE PAYMENT_DATE > '1980-12-08' GROUP BY PLAYERNO

HAVING COUNT(*) > 1 ORDER BY PLAYERNO

Dans le document SQL for MySQL Developers (Page 177-183)