

2.5 Related Searching Problems

Searching in an Unbounded Set

In most cases we search in a bounded set, but we can also search in an unbounded set. Consider the following game: one person thinks of a positive number and another person has to guess it with questions of the type: is the number x less than, equal to, or larger than the number that you are thinking of? This problem was considered in [4].

A first obvious solution is to use the sequence 1, 2, . . . , n (that is, sequential search), using n questions.

We can do better by using the "gambler" strategy. That is, we use the sequence 1, 2, 4, . . . , 2^m, until we have 2^m ≥ n. In the worst case we have m = ⌊log n⌋ + 1. Next, we can use binary search in the interval 2^{m−1} + 1 to 2^m to search for n, using in the worst case m − 1 questions. Hence, the total number of questions is 2m − 1 = 2⌊log n⌋ + 1. This algorithm is depicted in Fig. 2.4. That is, only twice a binary search in a finite set of n elements. Can we do better? We can think of what we did as searching for the exponent m using sequential search. So, we can use this algorithm, A_1, to search for m using 2⌊log m⌋ + 1 questions, and then use binary search, with a total of log n + 2 log log n + O(1) questions. We could call this algorithm A_2.
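To make the doubling idea concrete, here is a minimal Python sketch of A_1. It is only illustrative: the questions are modeled by a hypothetical oracle cmp(x) that returns −1, 0, or +1 depending on whether x is smaller than, equal to, or larger than the hidden number; the oracle name and interface are ours, not part of the original formulation.

```python
def unbounded_search_a1(cmp):
    """Find a hidden positive integer using only three-way comparison questions.

    cmp(x) is a hypothetical oracle returning -1, 0, or +1 when x is smaller
    than, equal to, or larger than the hidden number.
    """
    # Doubling ("gambler") phase: find m with 2^(m-1) < hidden <= 2^m.
    m = 0
    while cmp(2 ** m) < 0:               # 2^m is still below the hidden number
        m += 1
    if cmp(2 ** m) == 0:
        return 2 ** m
    # Binary search phase inside the interval (2^(m-1), 2^m).
    lo, hi = 2 ** (m - 1) + 1, 2 ** m - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = cmp(mid)
        if c == 0:
            return mid
        if c < 0:
            lo = mid + 1
        else:
            hi = mid - 1


# Example usage: the hidden number is 37.
hidden = 37
print(unbounded_search_a1(lambda x: (x > hidden) - (x < hidden)))  # -> 37
```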

In general, we can define algorithm A_k, which uses A_{k−1} to find m and then uses binary search. The complexity of such an algorithm is

S^k_n = log n + log log n + · · · + log^{(k−1)} n + 2 log^{(k)} n + O(1)

questions, where log^{(i)} n denotes the log function applied i times. Of course, if we could know the value of n in advance, there would be an optimal value for k of O(log* n),³ because if k is too large, we go too far. However, we do not know n a priori!

³ log* n is the number of times that we have to apply the log function before we reach a value less than or equal to 0.

FIGURE 2.4 A_1 unbounded search (left) and the person problem (right).
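The whole A_k family fits in one short recursion. The following Python sketch is again only illustrative and makes an extra assumption: the game is rephrased as a monotone yes/no predicate pred(x) ("is x greater than or equal to the hidden number?"), which is equivalent to the three-way question up to constant factors. Level 0 is plain sequential search, so level 1 corresponds to A_1 and level 2 to A_2.

```python
def a_k_first_true(pred, k):
    """Smallest positive integer t with pred(t) True, for a monotone predicate
    (False, ..., False, True, True, ...), found with the A_k strategy.

    k = 0 is plain sequential search. For k >= 1 we first locate the exponent
    m with 2^(m-1) < t <= 2^m by running A_(k-1) on the exponent, and then
    binary search inside that range (so k = 1 is A_1 and k = 2 is A_2).
    """
    if pred(1):
        return 1
    if k == 0:
        t = 2
        while not pred(t):          # sequential search: 2, 3, 4, ...
            t += 1
        return t
    # Search for the exponent m with the (k-1)-level algorithm.
    m = a_k_first_true(lambda e: pred(2 ** e), k - 1)
    lo, hi = 2 ** (m - 1) + 1, 2 ** m
    while lo < hi:                  # binary search in (2^(m-1), 2^m]
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Example usage: the hidden number is 1000; pred asks "is x >= hidden?".
hidden = 1000
print(a_k_first_true(lambda x: x >= hidden, 2))  # -> 1000 (algorithm A_2)
```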

Searching with Bounded Resources

Most of the time we assume that we can perform an unbounded number of questions when searching.

However, in many real situations, we search with bounded resources; for example, the gasoline available when searching by car. As an example, we use a variable-cost searching problem, initially proposed in [5, Section 3.2], with some changes, but maintaining the same philosophy. Given a building of n floors and k persons, we want to answer the following problem: which is the highest floor from which a person can jump without breaking a leg? We assume that a person with a broken leg cannot jump again.⁴

Suppose that the answer is floor j. So, we have j floors that will give us a positive answer and n − j floors that will give us a negative answer. However, we can only afford up to k negative answers to solve the problem (in general k < n − j). So, we have a bounded resource: persons.

If we have just one person, the solution is easy, since we are forced to use sequential search to find j. Any other strategy does not work, because if the person fails, we do not solve the problem. If we have many persons (more precisely k > log n), we can use binary search. In both cases, the solution is optimal in the worst case.

If we have two persons, a first solution would be to start using binary search with the first person, and then use the second one sequentially in the remaining segment. In the worst case, the first person fails in the first jump, giving an n/2-jump algorithm. The problem is that the two persons do not perform the same amount of work. We can balance the work by using the following algorithm: the first person tries sequentially every n/p floors for a chosen p, that is, floors n/p, 2n/p, etc. When his/her leg breaks, the second person has a segment of approximately n/p floors to check (see Fig. 2.4). In the worst case the number of floors is p (first person) plus n/p (second person). So we have

U^2_n = p + n/p + O(1)

Balancing the work, we have p = n/p, which implies p = √n, giving U^2_n = 2√n + O(1). Note that to succeed, any algorithm has to do sequential search in some segment with the last person.
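As an illustration, the following Python sketch implements the balanced two-person strategy, assuming floors are numbered 1 to n and a hypothetical oracle breaks(f) reports whether a jump from floor f breaks a leg.

```python
import math

def highest_safe_floor_two_persons(n, breaks):
    """Highest floor j (0 if none) from which a jump is safe, using two persons.

    breaks(f) is a hypothetical oracle: True if jumping from floor f breaks a leg.
    The first person jumps every ~sqrt(n) floors; the second finishes sequentially.
    """
    step = max(1, math.isqrt(n))
    # First person: floors step, 2*step, ... until a leg breaks (or we pass n).
    last_safe = 0
    f = step
    while f <= n and not breaks(f):
        last_safe = f
        f += step
    # Second person: sequentially check the floors between last_safe and f.
    for g in range(last_safe + 1, min(f, n + 1)):
        if breaks(g):
            return g - 1
        last_safe = g
    return last_safe


# Example usage: in a 100-floor building the answer is floor 37.
print(highest_safe_floor_two_persons(100, lambda f: f > 37))  # -> 37
```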

We can generalize the above algorithm to k persons using the partitioning idea recursively. Every person except the last one partitions the remaining segment into p parts and the last person uses sequential search.

In the worst case, every person (except the last one) has to perform p jumps. The last one does sequential search on a segment of size n/p^{k−1}. So, the total cost is approximately

U^k_n = (k − 1)p + n/p^{k−1}

⁴ This is a theoretical example; do not try to solve this problem in practice!

Balancing the work for every person, we must have p = n/p^{k−1}, obtaining p = n^{1/k} (the same as using calculus!). Then, the final cost is

U^k_n = k n^{1/k}

If we consider k = log₂ n, we have

U^k_n = k · 2^{log₂(n^{1/k})} = log₂ n · 2^{(log₂ n)/k} = 2 log₂ n

which is almost like binary search. In fact, taking care of the partition boundaries, and using an optimal partition (related to binomial trees), we can save k jumps, which gives the same as binary search. So we have a continuum from sequential to binary search as k grows.
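The k-person generalization can be sketched in the same style, again with the hypothetical breaks(f) oracle: each person except the last probes roughly every n^{1/k}-th floor of the segment that is still unresolved, and the last one searches it sequentially.

```python
def highest_safe_floor(n, k, breaks):
    """Highest floor in 1..n from which a jump is safe (0 if none), using at
    most k persons. breaks(f) is a hypothetical oracle: True if a jump from
    floor f breaks a leg. Each person except the last probes the remaining
    segment every ~n**(1/k) floors; the last one searches sequentially.
    """
    p = max(2, round(n ** (1.0 / k)))   # partition factor, p = n^(1/k)
    lo, hi = 1, n                       # segment still to be resolved
    last_safe = 0
    persons_left = k
    while lo <= hi:
        if persons_left == 1:
            # Last person: sequential search, the only safe strategy left.
            for f in range(lo, hi + 1):
                if breaks(f):
                    return last_safe
                last_safe = f
            return last_safe
        step = max(1, (hi - lo + 1) // p)
        f = lo + step - 1
        broke = False
        while f <= hi:                  # probe every step-th floor
            if breaks(f):
                hi = f - 1              # the answer is below f; this person is done
                broke = True
                break
            last_safe, lo = f, f + 1
            f += step
        if not broke:
            return last_safe            # reached the top of the segment safely
        persons_left -= 1
    return last_safe


# Example usage: a 1000-floor building, 3 persons, and the answer is floor 123.
print(highest_safe_floor(1000, 3, lambda f: f > 123))  # -> 123
```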

We can mix the previous two cases to have unbounded search with limited resources. The solution mixes the two approaches already given and can be a nice exercise for interested readers.

Searching with Nonuniform Access Cost

In the traditional RAM model we assume that any memory access has the same cost. However, this is not true if we consider the memory hierarchy of a computer: registers, cache and main memory, secondary storage, etc. As an example of this case, we use the hierarchical memory model introduced in [1]. That is, the access cost to position x is given by a function f(x). The traditional RAM model is the case when f(x) is a constant function. Based on the access times of current devices, possible values are f(x) = log x or f(x) = x^α with 0 < α ≤ 1.

Given a set of n integers in a hierarchical memory, two problems are discussed. First, given a fixed order (sorted data), what is the optimal worst-case search algorithm? Second, what is the optimal ordering (implicit structure) of the data to minimize the worst-case search time? This ordering must be described using constant space.

In both cases, we want to have the n elements in n contiguous memory locations starting at some position, using only a constant amount of memory to describe the searching procedure. In our search problem, we consider only successful searches, with the probability of searching for each one of the n elements being the same.

Suppose that the elements are sorted. Let S(i, j) be the optimal worst-case cost to search for an element that is between positions i and j of the memory. We can express the optimal worst-case cost as

S(i, j) = min_{k=i,...,j} { f(k) + max(S(i, k−1), S(k+1, j)) }

for i ≤ j, and 0 otherwise. We are interested in S(1, n). This recurrence can be solved using dynamic programming in O(n²) time. This problem was considered in [20], where it is shown that for logarithmic or polynomial f(x) the optimal algorithm needs O(f(n) log n) comparisons. In particular, if f(x) = x^α, a lower and upper bound of

(n^α log n)/(1 + α)

for the worst-case cost of searching is given in [20].
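The recurrence for S(i, j) translates directly into a dynamic program. The sketch below evaluates it in the straightforward O(n³) way (the O(n²) bound mentioned above requires additional observations not reproduced here); the cost function f is a parameter, so both f(x) = log x and f(x) = x^α can be plugged in.

```python
def optimal_search_cost(f, n):
    """Worst-case optimal cost S(1, n) of searching n sorted elements stored in
    memory positions 1..n, where probing position x costs f(x).

    Direct O(n^3) dynamic programming over
        S(i, j) = min over k in i..j of { f(k) + max(S(i, k-1), S(k+1, j)) },
    with S(i, j) = 0 for empty intervals (i > j).
    """
    S = [[0] * (n + 2) for _ in range(n + 2)]      # 1-based, with sentinels
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            S[i][j] = min(f(k) + max(S[i][k - 1], S[k + 1][j])
                          for k in range(i, j + 1))
    return S[1][n]


# Example usage: a constant cost recovers the binary-search comparison count,
# while f(x) = x illustrates the polynomial-cost (alpha = 1) case.
print(optimal_search_cost(lambda x: 1, 15))    # -> 4
print(optimal_search_cost(lambda x: x, 100))
```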

In our second problem, we can order the elements to minimize the searching cost. A first approach is to store the data as the implicit complete binary search tree induced by a binary search in the sorted data, such that the last level is compacted to the left (a left-complete binary tree). That is, we store the root of the tree in position 1 and, in general, the children of the element in position i in positions 2i and 2i + 1, as in a heap. Nevertheless, there are better addressing schemes that balance as much as possible every path of the search tree.
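The heap-like addressing scheme just described can be sketched as follows: the sorted keys are assigned to positions 1..n by an in-order traversal of the implicit left-complete tree, and a search then follows the child links 2i and 2i + 1, so the cheap low-numbered positions are probed first. This illustrates only the first approach, not the better addressing schemes mentioned at the end of the paragraph.

```python
def heap_layout(sorted_keys):
    """Place sorted keys into the implicit left-complete BST layout: the root
    goes to position 1 and the children of position i to positions 2i and 2i+1.
    An in-order traversal of that tree shape visits positions in sorted order."""
    n = len(sorted_keys)
    layout = [None] * (n + 1)            # 1-based; position 0 is unused
    it = iter(sorted_keys)

    def inorder(i):
        if i > n:
            return
        inorder(2 * i)                   # smaller keys go to the left subtree
        layout[i] = next(it)
        inorder(2 * i + 1)               # larger keys go to the right subtree

    inorder(1)
    return layout


def heap_layout_search(layout, key):
    """Search the implicit tree: each step moves from position i to 2i or 2i+1,
    so the cheap low-numbered positions are probed first."""
    i = 1
    while i < len(layout):
        if layout[i] == key:
            return i
        i = 2 * i if key < layout[i] else 2 * i + 1
    return None                          # unsuccessful search


# Example usage with 10 sorted keys.
lay = heap_layout([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(lay[1:])                        # [70, 40, 90, 20, 60, 80, 100, 10, 30, 50]
print(heap_layout_search(lay, 30))    # -> position 9
```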

Searching with Partial Information

In this section we use a nonuniform cost model plus an unbounded domain. In addition, the algorithm does not know all the information of the domain and learns about it while searching. In this case we are searching for an object in some space under the restriction that for each new "probe" we must pay a cost proportional to the distance of the probe position relative to our current position, and we wish to minimize this cost. This is meant to model the cost in real terms of a robot (or human) searching for an object when the mobile searcher must move about to find it. It is also the case for many searching problems on secondary memory devices such as disks and tapes. This is another example of an on-line algorithm.

An on-line algorithm is called c-competitive if its solution, compared with the optimal solution that could be obtained with all the information available from the beginning (the off-line case), is bounded by

Solution (on-line) / Optimal (off-line) ≤ c

Suppose that a person wants to find a bridge over a river. We can abstract this problem as finding some distinguished point on a line. Assume that the point is n (unknown) steps away along the line and that the person does not know how far away the point is. What is the minimum number of steps that the person must make to find the point, as a function of n?

The optimal way to find the point (up to lower order terms) is given by linear spiral search [3]: execute cycles of steps where the number of steps to walk before the i-th turn, starting from the origin, is 2^i for all i ≥ 1. That is, we first walk one step to the left, return to the origin, then two steps to the right, returning again to the origin, then four steps to the left, etc. The total distance walked is 2 ∑_{i=1}^{⌈log n⌉+1} 2^i + n, which is no more than 9 times the original distance. That is, this is a 9-competitive algorithm, and this constant cannot be improved.
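A small simulation of the spiral strategy, assuming a hypothetical oracle found_at(pos) that recognizes the target position: excursions alternate sides and double in length, and the function returns the total distance walked, which can be compared against the distance to the target.

```python
def spiral_search(found_at):
    """Walk along an (unbounded) line looking for a marked point.

    found_at(pos) is a hypothetical oracle returning True at the target position.
    Excursions alternate sides and double in length; returns the total distance
    walked, which stays within a constant factor of the target's distance.
    """
    if found_at(0):
        return 0
    walked = 0
    reach = 1                 # length of the current excursion
    side = -1                 # -1 = left, +1 = right
    while True:
        for step in range(1, reach + 1):        # walk out from the origin
            walked += 1
            if found_at(side * step):
                return walked
        walked += reach                         # walk back to the origin
        side, reach = -side, 2 * reach          # turn around and double


# Example usage: the target is 20 steps to the right of the origin.
target = 20
total = spiral_search(lambda p: p == target)
print(total, "steps walked;", total / target, "times the distance")
```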