Searching

Searching#

Searching is the algorithmic process of finding a particular item in a collection of items. A search typically answers either True or False as to whether the item is present. On occasion it may be modified to return where the item is found.

The Sequential Search#

Data items are stored in a collection such as a list. Each data item is stored in a position relative to the others. In Python lists, these relative positions are the index values of the individual items. Since these index values are ordered, it is possible for us to visit them in sequence. This process gives rise to our first searching technique, the sequential search.

def sequential_search(arr, num):
    for indx, val in enumerate(arr): 
        if val == num: 
            return indx
    return -1

num = 15
arr = [ 2, 3, 10, 5, 7, 15, 22, 30, 11, 40 ]; 

index = sequential_search(arr, num) 

if(index == -1): 
    print("Element is not present in array") 
else: 
    print(f"Element is present at index {index}")


def ordered_sequential_search(arr, num):
    for indx, val in enumerate(arr): 
        if val == num: 
            return indx
        if val > num:
            return -1
    return -1

num = 15
arr = [ 2, 3, 5, 7, 15, 22, 30]; 

index = ordered_sequential_search(arr, num) 

if(index == -1): 
    print("Element is not present in array") 
else: 
    print(f"Element is present at index {index}")

Element is present at index 5
Element is present at index 4

Analysis of Sequential Search#

To analyze searching algorithms, we need to decide on a basic unit of computation. Recall that this is typically the common step that must be repeated in order to solve the problem. For searching, it makes sense to count the number of comparisons performed. Each comparison may or may not discover the item we are looking for.

There are actually three different scenarios that can occur. In the best case we will find the item in the first place we look, at the beginning of the list. We will need only one comparison. In the worst case, we will not discover the item until the very last comparison, the nth comparison. On average, we will find the item about halfway into the list; that is, we will compare against n/2 items.

Time Complexicities For Unorderede Sequential Serach

	Best Case	Worst Case	Average Case
item is present	`1`	`𝑛`	`𝑛/2`
item is not present	`𝑛`	`𝑛`	`𝑛`

If the list of items was constructed so that the items were in ascending order, from low to high. If the item we are looking for is present in the list, the chance of it being in any one of the n positions is still the same as before. We will still have the same number of comparisons to find the item. However, if the item is not present there is a slight advantage. In this case, the algorithm does not have to continue looking through all of the items to report that the item was not found. It can stop immediately as soon as an item greater than the one we are looking for is found.

Time Complexicities For Orderede Sequential Serach

	Best Case	Worst Case	Average Case
item is present	`1`	`𝑛`	`𝑛/2`
item is not present	`𝑛`	`𝑛`	`𝑛/2`

The Binary Search#

Binary search will start by examining the middle item. If that item is the one we are searching for, we are done. If it is not the correct item, we can use the ordered nature of the list to eliminate half of the remaining items. If the item we are searching for is greater than the middle item, we know that the entire lower half of the list as well as the middle item can be eliminated from further consideration. The item, if it is in the list, must be in the upper half. Similarly, if the item we are searching for is less than the middle item, we know that the entire upper half of the list as well as the middle item can be eliminated from further consideration. The item, if it is in the list, must be in the lower half.

We can then repeat the process with the upper half/lower half. Start at the middle item and compare it against what we are looking for. Again, we either find it or split the list in half, therefore eliminating another large part of our possible search space.

def binary_search(arr, num):
    low = 0
    high = len(arr) - 1

    while low <= high:
        mid = (low + high) // 2
        
        if arr[mid] == num:
            return mid
        
        if num > arr[mid]:
            low = mid + 1

        if num < arr[mid]:
            high = mid - 1
    
    return -1

num = 15
arr = [2, 3, 5, 7, 15, 22, 30]; 

index = binary_search(arr, num) 

if(index == -1): 
    print("Element is not present in array") 
else: 
    print(f"Element is present at index {index}")

Element is present at index 4

Analysis of Binary Search#

To analyze the binary search algorithm, we need to recall that each comparison eliminates about half of the remaining items from consideration. What is the maximum number of comparisons this algorithm will require to check the entire list? If we start with n items, about 𝑛/2 items will be left after the first comparison. After the second comparison, there will be about 𝑛/4. Then 𝑛/8, 𝑛/16, and so on.

The number of comparisons necessary to get to this point is i where 𝑛/2^𝑖=1. Solving for i gives us 𝑖=log𝑛. The maximum number of comparisons is logarithmic with respect to the number of items in the list. Therefore, the binary search is 𝑂(log𝑛).

Sorting#

Sorting is the process of placing elements from a collection in some kind of order.

Example

A list of words could be sorted alphabetically or by length.
A list of cities could be sorted by population, by area, or by zip code.

There are many, many sorting algorithms that have been developed and analyzed, sorting is an important area of study in computer science. Sorting a large number of items can take a substantial amount of computing resources. Like searching, the efficiency of a sorting algorithm is related to the number of items being processed.

For small collections, a complex sorting method may be more trouble than it is worth. The overhead may be too high. On the other hand, for larger collections, we want to take advantage of as many improvements as possible, we will discuss several sorting techniques and compare them with respect to their running time.

Bubble Sort#

The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item “bubbles” up to the location where it belongs.

To analyze the bubble sort, we should note that regardless of how the items are arranged in the initial list, n−1 passes will be made to sort a list of size n. The total number of comparisons is the sum of the first n−1 integers. The sum of the first n−1 integers is n*(n-1)/2. This is still O(n^2) comparisons. In the best case, if the list is already ordered, no exchanges will be made. However, in the worst case, every comparison will cause an exchange. On average, we exchange half of the time.

A bubble sort is often considered the most inefficient sorting method since it must exchange items before the final location is known. These “wasted” exchange operations are very costly.

def bubble_sort(arr):
    n = len(arr)

    for i in range(n):
        swaps = False
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swaps = True 
        if not swaps:
            return arr
    
    return arr

arr = [5, 4, 1, 2, 3, 9, 8, 7, 6]
print(bubble_sort(arr))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Selection Sort#

The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the smallest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the smallest item is in the correct place. After the second pass, the next smallest is in place. This process continues and requires n−1 passes to sort n items, since the final item must be in place after the (n−1) st pass.

def selection_sort(arr):
    n = len(arr)

    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):
            if arr[min_index] > arr[j]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    
    return arr

arr = [5, 4, 1, 2, 3, 9, 8, 7, 6]
print(selection_sort(arr))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Insertion Sort#

The insertion sort, although still O(n^2), works in a slightly different way. It always maintains a sorted sublist in the lower positions of the list. Each new item is then “inserted” back into the previous sublist such that the sorted sublist is one item larger.

def insertion_sort(arr):
    n = len(arr)

    for i in range(1, n):
        for j in range(i, 0, -1):
            if arr[j] > arr[j - 1]:
                break
            if arr[j - 1] > arr[j]:
                arr[j - 1], arr[j] = arr[j], arr[j - 1]
    return arr

arr = [5, 4, 1, 2, 3, 9, 8, 7, 6]
print(insertion_sort(arr))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Merge Sort#

Merge sort is a recursive algorithm that continually splits a list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and combining them together into a single, sorted, new list.

def merge_sort(arr):
    n = len(arr)

    if n <= 1:
        return arr

    left_sub_arr = merge_sort(arr[:n // 2])
    right_sub_arr = merge_sort(arr[n // 2:])

    arr = []
    i, j = 0, 0
    
    while i < len(left_sub_arr) and j < len(right_sub_arr):
        if left_sub_arr[i] <= right_sub_arr[j]:
            arr.append(left_sub_arr[i])
            i += 1
        elif left_sub_arr[i] > right_sub_arr[j]:
            arr.append(right_sub_arr[j])
            j += 1
        
    return arr + left_sub_arr[i:] + right_sub_arr[j:]

arr = [5, 4, 1, 2, 3, 9, 8, 7, 6]
print(merge_sort(arr))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Quick Sort#

The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using additional storage. As a trade-off, however, it is possible that the list may not be divided in half. When this happens, we will see that performance is diminished.

A quick sort first selects a value, which is called the pivot value. Although there are many different ways to choose the pivot value, we will simply use the first item in the list. The role of the pivot value is to assist with splitting the list. The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will be used to divide the list for subsequent calls to the quick sort.

from random import randrange

def quick_sort(arr):
    n = len(arr)

    if n <= 1:
        return arr

    pivot_index = randrange(n)
    pivot_value = arr[pivot_index]
    
    arr[-1], arr[pivot_index] = arr[pivot_index], arr[-1]

    pivot_index = 0

    for i in range(n - 1):

        if arr[i] <= pivot_value:
            arr[i], arr[pivot_index] = arr[pivot_index], arr[i]
            pivot_index += 1
    
    arr[-1], arr[pivot_index] = arr[pivot_index], arr[-1]

    return quick_sort(arr[:pivot_index]) + [arr[pivot_index]] + quick_sort(arr[pivot_index + 1:])

arr = [5, 4, 1, 2, 3, 9, 8, 7, 6]
print(quick_sort(arr))

[1, 2, 3, 4, 5, 6, 7, 8, 9]