Priority Queues - MATHEMATICAL ALGORITHMS


11. Priority Queues

In many applications, records with keys must be processed in order, but not necessarily in full sorted order and not necessarily all at once.

Often a set of records must be collected, then the largest processed, then perhaps more records collected, then the next largest processed, and so forth.

An appropriate data structure in such an environment is one which supports the operations of inserting a new element and deleting the largest element.

This can be contrasted with queues (delete the oldest) and stacks (delete the newest). Such a data structure is called a priority queue. In fact, the priority queue might be thought of as a generalization of the stack and the queue (and other simple data structures), since these data structures can be implemented with priority queues, using appropriate priority assignments.

Applications of priority queues include simulation systems (where the keys might correspond to “event times” which must be processed in order), job scheduling in computer systems (where the keys might correspond to “priorities” which indicate which users should be processed first), and numerical computations (where the keys might be computational errors, so the largest can be worked on first).

Later on in this book, we’ll see how to use priority queues as basic building blocks for more advanced algorithms. In Chapter 22, we’ll develop a file compression algorithm using routines from this chapter, and in Chapters 31 and 33, we’ll see how priority queues can serve as the basis for several fundamental graph searching algorithms. These are but a few examples of the important role served by the priority queue as a basic tool in algorithm design.

It is useful to be somewhat more precise about how a priority queue will be manipulated, since there are several operations we may need to perform on priority queues in order to maintain them and use them effectively for applications such as those mentioned above. Indeed, the main reason that priority queues are so useful is their flexibility in allowing a variety of different operations to be efficiently performed on a set of records with keys. We want to build and maintain a data structure containing records with numerical keys (priorities), supporting some of the following operations:

Construct a priority queue from N given items.

Insert a new item.

Remove the largest item.

Replace the largest item with a new item (unless the new item is larger).

Change the priority of an item.

Delete an arbitrary specified item.

Join two priority queues into one large one.

(If records can have duplicate keys, we take “largest” to mean “any record with the largest key value.”)

The replace operation is almost equivalent to an insert followed by a remove (the difference being that the insert/remove requires the priority queue to grow temporarily by one element). Note that this is quite different from doing a remove followed by an insert, which always gives up the old largest item and always keeps the new one, even when the new item is larger. Replace is included as a separate capability because, as we will see, some implementations of priority queues can do the replace operation quite efficiently. Similarly, the change operation could be implemented as a delete followed by an insert, and the construct operation could be implemented with repeated uses of the insert operation, but these operations can be directly implemented more efficiently for some choices of data structure. The join operation requires quite advanced data structures for efficient implementation; we’ll concentrate instead on a “classical” data structure, called a heap, which allows efficient implementations of the first five operations.
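To make the contrast between replace and remove-then-insert concrete, here is a small C sketch. It assumes the queue is just an unordered int array a[1..n] (slot 0 unused); the helper names index_of_max, replace and remove_then_insert are hypothetical, and the point is the semantics of the two operations, not an efficient implementation.

```c
#include <assert.h>

/* Illustrative helpers (hypothetical names): the queue is a[1..n],
   in no particular order, with n >= 1. */
static int index_of_max(const int a[], int n) {
    assert(n >= 1);
    int m = 1;
    for (int j = 2; j <= n; j++)
        if (a[j] > a[m]) m = j;
    return m;
}

/* replace: the new key takes the largest key's place unless it is larger
   itself, in which case it is handed straight back and the queue is left
   unchanged. Same effect as insert-then-remove, without growing the queue. */
static int replace(int a[], int n, int v) {
    int m = index_of_max(a, n);
    if (v >= a[m]) return v;
    int old = a[m];
    a[m] = v;
    return old;
}

/* remove-then-insert: the old largest key is always returned and v is
   always kept, even when v is larger than everything in the queue. */
static int remove_then_insert(int a[], int n, int v) {
    int m = index_of_max(a, n);
    int old = a[m];
    a[m] = v;               /* reuse the vacated slot for v */
    return old;
}
```

With the queue {5, 3}, replace(a, 2, 9) returns 9 and leaves {5, 3} untouched, whereas remove_then_insert on the same queue returns 5 and leaves {9, 3}.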

The priority queue as described above is an excellent example of an abstract data structure: it is very well defined in terms of the operations performed on it, independent of the way the data is organized and processed in any particular implementation. The basic premise of an abstract data structure is that nothing outside of the definitions of the data structure and the algorithms operating on it should refer to anything inside, except through function and procedure calls for the fundamental operations. The main motivation for the development of abstract data structures has been as a mechanism for organizing large programs. They provide a way to limit the size and complexity of the interface between (potentially complicated) algorithms and associated data structures and (a potentially large number of) programs which use the algorithms and data structures. This makes it easier to understand the large program, and makes it more convenient to change or improve the fundamental algorithms. For example, in the present context, there are several methods for implementing the various operations listed above that can have quite different performance characteristics. Defining priority queues in terms of operations on an abstract data structure provides the flexibility necessary to allow experimentation with various alternatives.

Different implementations of priority queues involve different performance characteristics for the various operations to be performed, leading to cost tradeoffs. Indeed, performance differences are really the only differences allowed by the abstract data structure concept. First, we’ll illustrate this point by examining a few elementary data structures for implementing priority queues. Next, we’ll examine a more advanced data structure, and then show how the various operations can be implemented efficiently using this data structure. Also, we’ll examine an important sorting algorithm that follows naturally from these implementations.

Elementary Implementations

One way to organize a priority queue is as an unordered list, simply keeping the items in an array a[1..N] without paying attention to the keys. Thus construct is a “no-op” for this organization. To insert, simply increment N and put the new item into a[N], a constant-time operation. But remove and replace both require scanning through the array to find the element with the largest key, which takes linear time (all the elements in the array must be examined). Once that element is found, remove can be completed by exchanging a[N] with it and decrementing N.
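The unordered-array organization can be sketched in C as follows. This is a minimal sketch, assuming int keys held in a global array a[1..N] with a fixed capacity; MAXN, construct and remove_largest are hypothetical names (remove_largest avoids a clash with the C library's remove).

```c
#include <assert.h>

#define MAXN 100            /* hypothetical fixed capacity */

static int a[MAXN + 1];     /* items live in a[1..N], in no particular order */
static int N = 0;

/* construct: apart from copying the items in, nothing needs to be done */
static void construct(const int items[], int count) {
    assert(count >= 0 && count <= MAXN);
    for (int i = 0; i < count; i++)
        a[i + 1] = items[i];
    N = count;
}

/* insert: put the new item at the end, constant time */
static void insert(int v) {
    assert(N < MAXN);
    a[++N] = v;
}

/* remove: scan for the largest key (linear time), exchange it with a[N],
   then return it and shrink the queue by one */
static int remove_largest(void) {
    assert(N > 0);
    int m = 1;
    for (int j = 2; j <= N; j++)
        if (a[j] > a[m]) m = j;
    int t = a[m]; a[m] = a[N]; a[N] = t;
    return a[N--];
}
```

For example, after construct({3, 8, 1}) and insert(5), two calls to remove_largest return 8 and then 5.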

Another organization is to use a sorted list, again using an array a[1..N] but keeping the items in increasing order of their keys. Now remove simply involves returning a[N] and decrementing N (constant time), but insert involves moving larger elements in the array right one position, which could take linear time.
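A corresponding C sketch of the sorted-array organization, under the same hypothetical assumptions (int keys, fixed capacity MAXN, global a[1..N]), shows the cost moving from remove to insert:

```c
#include <assert.h>

#define MAXN 100            /* hypothetical fixed capacity */

static int a[MAXN + 1];     /* a[1..N] kept in increasing order of keys */
static int N = 0;

/* insert: shift every larger key one place right, then drop v into the
   gap; linear time in the worst case */
static void insert(int v) {
    assert(N < MAXN);
    int j = N;
    while (j >= 1 && a[j] > v) {
        a[j + 1] = a[j];
        j--;
    }
    a[j + 1] = v;
    N++;
}

/* remove: the largest key is always at a[N], constant time */
static int remove_largest(void) {
    assert(N > 0);
    return a[N--];
}
```

Inserting 4, 9, 1, 6 leaves a[1..4] = 1 4 6 9, and remove_largest then returns 9 and 6 in turn.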

Linked lists could also be used for the unordered list or the sorted list. This wouldn’t change the fundamental performance characteristics for insert, remove, or replace, but it would make it possible to do delete and join in constant time.
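As an illustration of the unordered linked-list case, the following C sketch assumes a doubly linked list with head and tail pointers, so that a node can be unlinked in constant time given a pointer to it, and two lists can be spliced together in constant time. The types and function names are hypothetical; remove_largest still needs a linear scan, as the text says.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types: an unordered doubly linked list with head and tail. */
typedef struct Node { int key; struct Node *prev, *next; } Node;
typedef struct { Node *head, *tail; } List;

/* insert at the front in constant time; the node pointer is returned so
   the caller can later delete this specific item */
static Node *insert(List *q, int key) {
    Node *x = malloc(sizeof *x);
    assert(x != NULL);
    x->key = key; x->prev = NULL; x->next = q->head;
    if (q->head) q->head->prev = x; else q->tail = x;
    q->head = x;
    return x;
}

/* delete a specified item: constant time, given a pointer to its node */
static void delete_node(List *q, Node *x) {
    if (x->prev) x->prev->next = x->next; else q->head = x->next;
    if (x->next) x->next->prev = x->prev; else q->tail = x->prev;
    free(x);
}

/* remove the largest: a linear scan, then a constant-time unlink */
static int remove_largest(List *q) {
    assert(q->head != NULL);
    Node *m = q->head;
    for (Node *x = m->next; x != NULL; x = x->next)
        if (x->key > m->key) m = x;
    int key = m->key;
    delete_node(q, m);
    return key;
}

/* join: splice q2 onto the end of q1 in constant time; q2 becomes empty */
static void join(List *q1, List *q2) {
    if (q2->head == NULL) return;
    if (q1->head == NULL) {
        *q1 = *q2;
    } else {
        q1->tail->next = q2->head;
        q2->head->prev = q1->tail;
        q1->tail = q2->tail;
    }
    q2->head = q2->tail = NULL;
}
```

Keeping a handle from insert is what makes a constant-time delete of an arbitrary item possible; without it, the item would first have to be found by a scan.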

Any priority queue algorithm can be turned into a sorting algorithm by successively using insert to build a priority queue containing all the items to be sorted, then successively using remove to empty the priority queue, receiving the items in reverse order. Using a priority queue represented as an unordered list in this way corresponds to selection sort; using the sorted list corresponds to insertion sort.
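The correspondence with selection sort can be seen in a short C sketch (the function name is hypothetical). The items already sit in a[1..n], so building the unordered-array priority queue costs nothing; each remove then scans a[1..N] for the largest key and exchanges it with a[N], and the removed item is left exactly where it belongs in the sorted output. That loop is selection sort, working from the right end.

```c
#include <assert.h>

/* Sort a[1..n] by repeatedly "removing the largest" from an
   unordered-array priority queue occupying a[1..N]. */
static void pq_sort_unordered(int a[], int n) {
    assert(n >= 0);
    for (int N = n; N > 1; N--) {
        int m = 1;                       /* scan for the largest key */
        for (int j = 2; j <= N; j++)
            if (a[j] > a[m]) m = j;
        int t = a[m]; a[m] = a[N]; a[N] = t;   /* park it at a[N] */
    }
}
```

Applied to a[1..5] = 5 2 9 1 7, this leaves 1 2 5 7 9.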

As usual, it is wise to keep these simple implementations in mind because they can outperform more complicated methods in many practical situations. For example, the first method might be appropriate in an application where only a few “remove largest” operations are performed as opposed to a large number of insertions, while the second method would be appropriate if the items inserted always tended to be close to the largest element in the priority queue. Implementations of methods similar to these for the searching problem (find a record with a given key) are given in Chapter 14.

Heap Data Structure

The data structure that we’ll use to support the priority queue operations involves storing the records in an array in such a way that each key is guaranteed to be larger than the keys at two other specific positions. In turn, each of those keys must be larger than two more keys, and so forth. This ordering is very easy to see if we draw the array in a two-dimensional “tree” structure with lines down from each key to the two keys known to be smaller.

This structure is called a “complete binary tree”: place one node (called the root), then, proceeding down the page and from left to right, connect two nodes beneath each node on the previous level until N nodes have been placed. The nodes below each node are called its sons; the node above each node is called its father. (We’ll see other kinds of “binary trees” and “trees” in Chapter 14 and later chapters of this book.) Now, we want the keys in the tree to satisfy the heap condition: the key in each node should be larger than (or equal to) the keys in its sons (if it has any). Note that this implies in particular that the largest key is in the root.

We can represent complete binary trees sequentially within an array by simply putting the root at position 1, its sons at positions 2 and 3, the nodes at the next level in positions 4, 5, 6, and 7, etc., as numbered in the diagram above.

For example, the array representation for the tree above is the following:

 1  2  3  4  5  6  7  8  9 10 11 12
 X  T  O  G  S  M  N  A  E  R  A  I


This natural representation is useful because it is very easy to get from a node to its father and sons. The father of the node in position j is in position j div 2, and, conversely, the two sons of the node in position j are in positions 2j and 2j + 1. This makes traversal of such a tree even easier than if the tree were implemented with a standard linked representation (with each element containing a pointer to its father and sons). The rigid structure of complete binary trees represented as arrays does limit their utility as data structures, but there is just enough flexibility to allow the implementation of efficient priority queue algorithms. A heap is a complete binary tree, represented as an array, in which every node satisfies the heap condition. In particular, the largest key is always in the first position in the array.
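The index arithmetic, and a check of the heap condition built on it, can be sketched in C as below. The function names are hypothetical; integer division on positive ints plays the role of Pascal's div, and keys are chars so the example array from the text can be used directly (position 0 is padding).

```c
#include <assert.h>

/* Navigation in a complete binary tree stored in a[1..N]. */
static int father(int j)    { return j / 2; }
static int left_son(int j)  { return 2 * j; }
static int right_son(int j) { return 2 * j + 1; }

/* Heap condition: every node's key is >= the keys of its sons.
   Checking each node (other than the root) against its father
   covers every father/son pair exactly once. */
static int is_heap(const char a[], int N) {
    assert(N >= 0);
    for (int j = 2; j <= N; j++)
        if (a[father(j)] < a[j]) return 0;
    return 1;
}
```

With the example array X T O G S M N A E R A I, is_heap returns 1; changing the I at position 12 to Z breaks the condition, since its father at position 6 holds M.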

All of the algorithms operate along some path from the root to the bottom of the heap (just moving from father to son or from son to father). It is easy to see that, in a heap of N nodes, all paths have about lg N nodes on them. (There are about N/2 nodes on the bottom, N/4 nodes with sons on the bottom, N/8 nodes with grandsons on the bottom, etc. Each “generation” has about half as many nodes as the next, which implies that there can be only about lg N generations; the exact count is ⌊lg N⌋ + 1.) Thus all of the priority queue operations (except join) can be done in logarithmic time using heaps.
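The path-length claim can be checked numerically. The C sketch below (hypothetical function names) follows father links j, j div 2, ... from position N up to the root, counting nodes, and compares the count with ⌊lg N⌋ + 1 computed by repeated halving; for N = 12 the path 12, 6, 3, 1 has four nodes.

```c
#include <assert.h>

/* Nodes on the path from position j up to the root, following father
   links j -> j/2. Position N is on the bottom level, so path_length(N)
   is the number of generations in a heap of N nodes. */
static int path_length(int j) {
    assert(j >= 1);
    int count = 0;
    while (j >= 1) { count++; j /= 2; }
    return count;
}

/* floor(lg n) by repeated halving */
static int floor_lg(int n) {
    assert(n >= 1);
    int k = 0;
    while (n > 1) { n /= 2; k++; }
    return k;
}
```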

Algorithms on Heaps

The priority queue algorithms on heaps all work by first making a simple structural modification which could violate the heap condition, then traveling through the heap modifying it to ensure that the heap condition is satisfied everywhere. Some of the algorithms travel through the heap from bottom to top, others from top to bottom. In all of the algorithms, we’ll assume that the records are one-word integer keys stored in an array a of some maximum size, with the current size of the heap kept in an integer N. Note that N is as much a part of the definition of the heap as the keys and records themselves.

To be able to build a heap, it is necessary first to implement the insert operation. Since this operation will increase the size of the heap by one, N must be incremented. Then the record to be inserted is put into a[N], but this may violate the heap property. If the heap property is violated (the new node is greater than its father), then the violation can be fixed by exchanging the new node with its father. This may, in turn, cause a violation one level up, which can be fixed in the same way. For example, if P is to be inserted in the heap above, it is first stored in a[N] as the right son of M. Then, since it is greater than M, it is exchanged with M, and since it is greater than O, it is exchanged with O, and the process terminates since it is less than X. The following heap results:

 1  2  3  4  5  6  7  8  9 10 11 12 13
 X  T  P  G  S  O  N  A  E  R  A  I  M


The code for this method is straightforward. In the following implementation, insert adds a new item to a[N], then calls upheap to fix the heap condition violation at N.

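    { assumes globals a: array[0..maxN] of integer and N: integer }
    procedure upheap(k: integer);
      var v: integer;
      begin
        v := a[k]; a[0] := maxint;   { sentinel stops the loop at the root }
        while a[k div 2] <= v do
          begin a[k] := a[k div 2]; k := k div 2 end;
        a[k] := v
      end;

    procedure insert(v: integer);
      begin
        N := N + 1; a[N] := v;
        upheap(N)
      end;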

As with insertion sort, it is not necessary to do a full exchange within the loop, because v is always involved in the exchanges. A sentinel key must be put in a[0] to stop the loop for the case that v is greater than all the keys in the heap.
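For readers who want to run the procedures above, here is a hypothetical C transliteration of upheap and insert under the same assumptions: int keys in a global array a[1..N], with a[0] reserved for the sentinel. MAXN and the use of INT_MAX as the sentinel (standing in for Pascal's maxint) are assumptions of this sketch, and inserted keys must be smaller than INT_MAX, otherwise the sentinel could not stop the loop.

```c
#include <assert.h>
#include <limits.h>

#define MAXN 100            /* hypothetical fixed capacity */

static int a[MAXN + 1];     /* heap in a[1..N]; a[0] holds the sentinel */
static int N = 0;

/* Move the key at position k up until its father is larger. Fathers are
   shifted down one level instead of doing full exchanges with v. */
static void upheap(int k) {
    int v = a[k];
    a[0] = INT_MAX;                  /* sentinel: stops the loop at the root */
    while (a[k / 2] <= v) {
        a[k] = a[k / 2];
        k = k / 2;
    }
    a[k] = v;
}

static void insert(int v) {
    assert(N < MAXN);
    assert(v < INT_MAX);             /* the sentinel must exceed every key */
    a[++N] = v;
    upheap(N);
}
```

Loading the example heap X T O G S M N A E R A I and inserting P should reproduce the trace in the text: P moves past M and then O, stops below X, and the array becomes X T P G S O N A E R A I M.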

The replace operation involves replacing the key at the root with a new key, then moving down the heap from top to bottom to restore the heap condition. For example, if the X in the heap above is to be replaced with C, the first step is to store C at the root. This violates the heap condition, but the violation can be fixed by exchanging C with T, the larger of the two sons of the root. This creates a violation at the next level, which can be fixed

From the document ROBERT SEDGEWICK ALGORITHMS (pages 135-141)