Analysis Techniques

When analyzing a sorting algorithm, one must explain its best-case, worst-case, and average-case performance (as discussed in Chapter 2). The average case is typically hardest to accurately quantify and relies on advanced mathematical techniques and estimation. Also, it assumes a reasonable understanding of the likelihood that the input may be partially sorted. Even when an algorithm has been shown to have a desirable average-case cost, its implementation may simply be impractical. Each sorting algorithm in this chapter is analyzed both by its theoretical behavior and by its actual behavior in practice.

A fundamental result in computer science is that no algorithm that sorts by comparing elements can do better than O(n log n) performance in the average or worst case. We now sketch a proof. Given n items, there are n! permutations of these elements. Every algorithm that sorts by pairwise comparisons corresponds to a binary decision tree. The leaves of the tree correspond to an underlying permutation, and every permutation must have at least one leaf in the tree. The nodes on a path from the root to a leaf correspond to a sequence of comparisons. The height of such a tree is the number of comparison nodes in the longest path from the root to a leaf node; for example, the height of the tree in Figure 4-10 is 5 because only five comparisons are needed in all cases (although in four cases only four comparisons are needed).

Construct a binary decision tree where each internal node of the tree represents a comparison a_i ≤ a_j and the leaves of the tree represent one of the n! permutations. To sort a set of n elements, start at the root and evaluate the statements shown in each node. Traverse to the left child when the statement is true; otherwise, traverse to the right child.
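To make the traversal concrete, here is a minimal Python sketch (our illustration, not taken from the book) of such a decision tree for n = 3 elements; each if statement is an internal comparison node, and each return statement is a leaf holding one of the 3! = 6 permutations:

    def decision_tree_sort3(a, b, c):
        """Sort three values by walking an explicit binary decision tree."""
        if a <= b:                       # internal node: a <= b?
            if b <= c:
                return (a, b, c)         # leaf: a <= b <= c
            elif a <= c:
                return (a, c, b)         # leaf: a <= c < b
            else:
                return (c, a, b)         # leaf: c < a <= b
        else:
            if a <= c:
                return (b, a, c)         # leaf: b < a <= c
            elif b <= c:
                return (b, c, a)         # leaf: b <= c < a
            else:
                return (c, b, a)         # leaf: c < b <= a

    print(decision_tree_sort3(2, 3, 1))  # (1, 2, 3)

The longest root-to-leaf path here contains three comparisons, which matches ⌈log(3!)⌉ = ⌈log(6)⌉ = 3, foreshadowing the lower bound derived next.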

Figure 4-10 shows a sample decision tree for four elements.

Figure 4-10. Binary decision tree for ordering four elements

One could construct many different binary decision trees. Nonetheless, we assert that given any such binary decision tree for comparing n elements, we can compute its minimum height h; that is, there must be some leaf node that requires h comparison nodes in the tree from the root to that leaf. Consider a complete binary tree of height h in which all non-leaf nodes have both a left and right child. This tree contains a total of n = 2^h − 1 nodes and has height h = log(n + 1); if the tree is not complete, it could be unbalanced in strange ways, but we know that h ≥ ⌈log(n + 1)⌉. Since a binary decision tree with n! leaf nodes has at least n! nodes in total, we need only compute h = ⌈log(n!)⌉ to bound the height of any such binary decision tree from below.
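As a quick check of the complete-tree identity (our own illustration, not from the book), the following Python lines confirm that a complete binary tree of height h contains n = 2^h − 1 nodes, so its height is exactly log(n + 1):

    import math

    # A complete binary tree of height h has n = 2**h - 1 nodes,
    # so its height is exactly log2(n + 1).
    for h in range(1, 6):
        n = 2**h - 1
        assert math.log2(n + 1) == h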

We take advantage of the following properties of logarithms: log(a*b) = log(a) + log(b) and log(x^y) = y*log(x). All logarithms here are base 2, so log(n/2) = log(n) − 1. Since the partial product n*(n−1)*…*(n/2) contains at least n/2 factors, each of size at least n/2:

h ≥ log(n!) = log(n * (n−1) * (n−2) * … * 2 * 1)
h > log(n * (n−1) * … * (n/2))
h > log((n/2)^(n/2)) = (n/2)*log(n/2) = (n/2)*(log(n) − 1)

Thus h > (n/2)*(log(n) − 1). What does this mean? Well, given n elements to be sorted, there will be at least one path from the root to a leaf of size h, which means that an algorithm that sorts by comparison requires at least this many comparisons to sort the n elements. Note that h is computed by a function f(n); here in particular, f(n) = (1/2)*n*log(n) − n/2, which means that any sorting algorithm using comparisons requires on the order of n*log(n) comparisons to sort.
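As a sanity check (again our own illustration, not from the book), this snippet compares the exact lower bound ⌈log(n!)⌉ with the looser closed-form estimate f(n) = (n/2)*(log(n) − 1) for a few values of n:

    import math

    # Exact comparison lower bound versus the closed-form estimate.
    for n in (4, 16, 64, 256, 1024):
        exact = math.ceil(math.log2(math.factorial(n)))
        estimate = (n / 2) * (math.log2(n) - 1)
        print(f"n={n:5d}  ceil(log2(n!))={exact:6d}  estimate={estimate:9.1f}")

For every n, the exact bound dominates the estimate, as the derivation requires.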


CHAPTER 5

Searching

Given a collection C of elements, there are two fundamental queries:

Existence

Does C contain a target element? Given a collection C, one often simply wants to know whether the collection already contains a given element t. The response to such a query is true if an element exists in the collection that matches the desired target t, or false if this is not the case.

Associative lookup

Return the information associated in collection C with a target key value k. A key is usually associated with a complex structure called a “value.” The lookup retrieves or replaces this value (both query types are sketched below).
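In Python terms (a sketch of the two query types only; the data here is hypothetical), existence is a yes/no membership test, while associative lookup retrieves or replaces the value stored under a key:

    # Existence: does C contain the target element t?
    C = [31, 41, 59, 26, 53]
    print(59 in C)                    # True
    print(99 in C)                    # False

    # Associative lookup: return the value associated with key k.
    ages = {"alice": 34, "bob": 27}   # hypothetical key/value collection
    print(ages.get("alice"))          # 34; .get returns None when absent
    ages["alice"] = 35                # a lookup may also replace the value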

The algorithms in this chapter describe specific ways to structure data to more efficiently process search queries. For example, you might order the collection C using the sorting algorithms previously covered in Chapter 4. As we will see, sorting improves the performance of queries, but there are other costs involved in maintaining a sorted collection, especially when elements are frequently inserted or deleted.

Ultimately the performance is based on how many elements an algorithm inspects as it processes a query. Use the following guide to select the best algorithm for you:

Small collections

Sequential Search offers the simplest implementation and is provided as a basic construct in many programming languages. Use this algorithm when the collection is available only sequentially, as with an iterator (see the sketch after this list, which contrasts it with Binary Search).

Restricted memory

When the collection is an array that doesn’t change and you want to conserve memory, use Binary Search.

Dynamic membership

If the elements in the collection change frequently, consider Hash-based Search and Binary Search Tree for their ability to spread out the costs associated with maintaining their data structures.

Sorted access

Use Binary Search Tree when you need dynamic membership and the ability to process elements in the collection in sorted order.
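The first two cases can be sketched in a few lines of Python using only the standard library (our illustration; the collection is hypothetical):

    import bisect

    C = [53, 26, 31, 41, 59]

    # Small or iterator-only collections: Sequential Search, available
    # directly as the 'in' operator (O(n) comparisons).
    print(41 in C)                       # True

    # Static array, restricted memory: sort once, then use Binary Search
    # via the bisect module (O(log n) comparisons per query).
    A = sorted(C)
    i = bisect.bisect_left(A, 41)
    print(i < len(A) and A[i] == 41)     # True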

Don’t forget to account for any upfront pre-processing required by the algorithm to structure data in advance of handling search queries.

Choose an appropriate structure that not only speeds up the performance of individual queries, but also minimizes the overall cost of maintaining the collection structure in the face of both dynamic access and multiple queries.

We assume the existence of a set U (the universe) of possible values. The collection C contains elements drawn from U, and the target element being sought, t, is a member of U. If t is instead a key value, we consider U to be the set of potential key values, k ∈ U, and the collection C may contain more complex elements. Note that duplicate values may exist within C, so it cannot be treated as a set (which only supports unique membership).

When the collection C allows the indexing of arbitrary elements, we refer to the collection as an array A with the notation A[i] representing the ith element of A. By convention, we use the value null to represent an element not in U; such a value is useful when a search is asked to return a specific element in a collection but that element is not present.

In general, we assume it is impossible to search for null in a collection.
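A minimal sketch of these conventions in Python, where None plays the role of null (our illustration):

    def sequential_search(A, t):
        """Return the element of A matching target t, or None (null)
        if no such element is present."""
        assert t is not None, "searching for null is not supported"
        for e in A:
            if e == t:
                return e
        return None

    print(sequential_search([2, 8, 5], 8))   # 8
    print(sequential_search([2, 8, 5], 9))   # None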
