Efficient Search Algorithms

Explore, implement, and analyze two search algorithms in C++, linear search and binary search, and understand their time complexity.

What is Linear Search?


Linear search is a simple searching algorithm based on a sequential model. Unlike binary search, the linear search algorithm checks the items of a list one at a time, in order, until it finds the required element or reaches the end of the list. A linear search, sometimes referred to as a sequential search, is suitable for searching a small array or an unsorted array. When the element you are looking for is in the first position of the data structure, a single comparison is sufficient. When it is in the last position, or not present at all, N comparisons are required.
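
To make the best and worst cases concrete, here is a minimal sketch that instruments a linear search with a comparison counter. It assumes a std::vector<int>; the function name linearSearchCounted and its comparisons out-parameter are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Linear search that also reports how many element comparisons it made.
// The `comparisons` out-parameter is an illustrative addition; it is not part
// of the linearSearch function in the program used for testing.
int linearSearchCounted(const std::vector<int>& arr, int key,
                        std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        ++comparisons;                   // one comparison per element examined
        if (arr[i] == key) {
            return static_cast<int>(i);  // stop at the first match
        }
    }
    return -1;                           // all N elements examined, no match
}
```

For data = {42, 7, 19, 3, 88}, searching for 42 (first position) records 1 comparison, while searching for 88 (last position) or for a value that is not present records 5, that is, N comparisons.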

What is Binary Search?


Binary search is an algorithm for finding an item in a sorted list of items. It is called binary search because it splits the array into two halves as part of the algorithm. It is also known as half-interval search, logarithmic search, or binary chop. When you compare linear search and binary search, binary search is usually much faster on large lists. But a significant difference between the two is that the items in a list must be sorted before the binary search algorithm can be used. Binary search compares the target with the middle item of the current range; if they differ, it discards the half that cannot contain the target and repeats on the remaining half. Because each comparison halves the range, binary search needs only about log2(n) comparisons for a list of n items, which is why it completes in a much shorter span than linear search on large lists.
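
The halving can be observed by counting how many middle elements a binary search probes. This is a minimal sketch assuming a std::vector<int> sorted in ascending order; binarySearchCounted and its probes parameter are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Binary search on an ascending std::vector<int> that also counts how many
// middle elements it probes. The probe counter is an illustrative addition.
int binarySearchCounted(const std::vector<int>& arr, int key,
                        std::size_t& probes) {
    probes = 0;
    int left = 0;
    int right = static_cast<int>(arr.size()) - 1;  // -1 for an empty vector
    while (left <= right) {
        int mid = left + (right - left) / 2;  // middle of the current range
        ++probes;
        if (arr[mid] == key) {
            return mid;                       // found
        } else if (arr[mid] < key) {
            left = mid + 1;                   // discard the left half
        } else {
            right = mid - 1;                  // discard the right half
        }
    }
    return -1;                                // range became empty: not found
}
```

On a sorted vector holding the values 0 to 1023, searching for 511 succeeds on the first probe, because 511 is the first middle element, and searching for a value larger than every element takes 11 probes (log2(1024) + 1), whereas a linear search would compare all 1024 elements.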

C++ Program Used for Testing


This C++ code demonstrates the implementation and comparison of the linear search and binary search algorithms. Its components are:

1. Header Includes: The code includes the necessary header files, such as <iostream>, <vector>, <string>, <chrono>, and <fstream>.
2. Function Definitions:
   - linearSearch: Implements the linear search algorithm. It iterates through the given vector (arr) to find the key. If found, it returns the index of the key; otherwise, it returns -1.
   - binarySearch: Implements the binary search algorithm. It requires the input vector to be sorted. It repeatedly halves the search interval until the key is found or the interval is empty. If found, it returns the index of the key; otherwise, it returns -1.
   - generateRandomIntegers: Generates a vector of random integers between 0 and 999 of the specified size.
   - plotExecutionTimes: Writes the execution times of linear and binary search for each input size to a data file, then generates a gnuplot script and runs gnuplot to visualize the data.
3. Main Function:
   - It defines the input sizes to test.
   - It initializes vectors to store the execution times of linear and binary search.
   - It iterates over each input size:
     - Generates a dataset of random integers.
     - Randomly selects a key to search for.
     - Measures the execution time of linear search.
     - Sorts the dataset (binary search requires sorted input; the sort itself is not timed), then measures the execution time of binary search.
     - Stores the execution times in the respective vectors.
   - Finally, it calls plotExecutionTimes to visualize the execution times with gnuplot.

Code Used:
#include <iostream>
#include <vector>
#include <string>
#include <chrono>
#include <fstream>
#include <algorithm>  // std::sort
#include <cstdlib>    // rand, system

// Linear search function: returns the index of the first element equal to key, or -1
template<typename T>
int linearSearch(const std::vector<T>& arr, const T& key) {
    for (size_t i = 0; i < arr.size(); ++i) {
        if (arr[i] == key) {
            return static_cast<int>(i);  // Found
        }
    }
    return -1;  // Not found
}

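// Binary search function: arr must be sorted in ascending order
template<typename T>
int binarySearch(const std::vector<T>& arr, const T& key) {
    int left = 0;
    int right = static_cast<int>(arr.size()) - 1;  // -1 for an empty vector

    while (left <= right) {
        int mid = left + (right - left) / 2;  // avoids overflow of (left + right)

        if (arr[mid] == key) {
            return mid;  // Found
        } else if (arr[mid] < key) {
            left = mid + 1;   // key can only be in the right half
        } else {
            right = mid - 1;  // key can only be in the left half
        }
    }

    return -1;  // Not found
}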

// Function to generate random integers
// rand() is not seeded, so every run produces the same sequence
std::vector<int> generateRandomIntegers(size_t size) {
    std::vector<int> result;
    result.reserve(size);
    for (size_t i = 0; i < size; ++i) {
        result.push_back(rand() % 1000);  // generating numbers between 0 and 999
    }
    return result;
}

// Function to plot execution times: writes the data and a gnuplot script, then runs gnuplot
void plotExecutionTimes(const std::vector<int>& sizes, const std::vector<double>& linearTimes,
                        const std::vector<double>& binaryTimes) {
    std::ofstream dataFile("execution_times.dat");
    for (size_t i = 0; i < sizes.size(); ++i) {
        dataFile << sizes[i] << " " << linearTimes[i] << " " << binaryTimes[i] << std::endl;
    }
    dataFile.close();

    std::ofstream plotFile("plot_script.gnu");
    plotFile << "set terminal png\n";
    plotFile << "set output 'execution_times.png'\n";
    plotFile << "set title 'Linear vs Binary Search Execution Times'\n";
    plotFile << "set xlabel 'Input Size'\n";
    plotFile << "set ylabel 'Execution Time (ms)'\n";
    // Adjacent string literals are concatenated, keeping the plot command on one script line
    plotFile << "plot 'execution_times.dat' using 1:2 with lines title 'Linear Search', "
                "'execution_times.dat' using 1:3 with lines title 'Binary Search'\n";
    plotFile.close();

    system("gnuplot plot_script.gnu");  // requires gnuplot to be installed and on the PATH
}

int main() {
    std::vector<int> sizes = {1000, 2000, 3000, 4000, 5000};  // Input sizes to test
    std::vector<double> linearTimes;
    std::vector<double> binaryTimes;
    volatile int sink = 0;  // keeps the search results in use so the timed calls are not optimized away

    for (size_t i = 0; i < sizes.size(); ++i) {
        std::vector<int> dataset = generateRandomIntegers(sizes[i]);
        int key = rand() % 1000;  // randomly select a key to search for

        auto start = std::chrono::high_resolution_clock::now();
        sink = linearSearch(dataset, key);
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> linearDuration = end - start;
        linearTimes.push_back(linearDuration.count());

        std::sort(dataset.begin(), dataset.end());  // Binary search requires a sorted dataset (not timed)

        start = std::chrono::high_resolution_clock::now();
        sink = binarySearch(dataset, key);
        end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> binaryDuration = end - start;
        binaryTimes.push_back(binaryDuration.count());
    }

    plotExecutionTimes(sizes, linearTimes, binaryTimes);

    return 0;
}
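
In the program above, each search runs only once per input size, and a single search over a few thousand integers finishes so quickly that one measurement can be dominated by timer resolution and noise. One common remedy, sketched here as an assumption rather than as part of the original test, is to time many searches with different keys and report the average. The helper name averageSearchMicros is hypothetical; it uses std::chrono::steady_clock, which is monotonic, and reports microseconds instead of milliseconds.

```cpp
#include <chrono>
#include <vector>

// Average time of one call to `search`, in microseconds, over all `keys`.
// Using a different key on each call prevents the compiler from hoisting a
// single call out of the loop, and summing the returned indices into `sink`
// makes the results observable so the calls are not simply discarded.
template <typename SearchFn>
double averageSearchMicros(SearchFn search, const std::vector<int>& data,
                           const std::vector<int>& keys, long long& sink) {
    if (keys.empty()) {
        return 0.0;
    }
    auto start = std::chrono::steady_clock::now();
    for (int key : keys) {
        sink += search(data, key);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::micro> total = end - start;
    return total.count() / static_cast<double>(keys.size());
}
```

For example, averageSearchMicros(linearSearch<int>, dataset, keys, sink) and, after sorting, averageSearchMicros(binarySearch<int>, dataset, keys, sink) could replace the single timed calls in main, with keys filled from rand() % 1000 and sink declared as a long long.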

Results and Conclusions


Plot comparing the execution times of the binary and linear search algorithms. Binary search times barely increase as the list size grows.

The choice of search algorithm can significantly affect performance for different types of input data. Here is how the performance may vary:

1. Sorted Data:
   - For sorted data, binary search performs much better than linear search.
   - Binary search has a time complexity of O(log n), where n is the size of the dataset. This means that its running time grows only logarithmically as the dataset grows: doubling n adds about one more comparison.
   - Linear search, on the other hand, has a time complexity of O(n). Its running time grows linearly with the size of the dataset. Therefore, for large datasets, binary search is much faster than linear search.
2. Unsorted Data:
   - Binary search cannot be applied directly to unsorted data, because it requires the dataset to be sorted.
   - Without sorting the data first, linear search is the only option of the two. Its worst-case time complexity remains O(n) regardless of how the data is arranged.
   - However, the actual running time of linear search depends on where the target element lies in the dataset. If the target is close to the beginning, linear search finds it quickly; if it is near the end or absent, the search takes longer.
3. Nature of Data:
   - For uniformly distributed data, there might not be a significant difference in performance between linear and binary search, especially for small datasets. With a small n, even a full scan is short, and a target that is present may be found after scanning only part of the data.
   - For skewed or irregularly distributed data, the performance of linear search may vary greatly. If the target element is more likely to lie near the end that is scanned last, linear search takes longer to find it; if it usually lies near the front, linear search finds it sooner.
4. Data Size:
   - For very small datasets, the overhead of sorting the data (required for binary search) may outweigh the benefit of the faster search. Sorting takes O(n log n) time, which is more than a single O(n) linear scan, so sorting pays off only when the same data is searched many times.
   - As the dataset grows larger, binary search becomes increasingly advantageous over linear search due to its logarithmic time complexity.
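
A simplified comparison-count model makes the sorting trade-off in point 4 concrete. It ignores constant factors and assumes a comparison sort needing about n log2 n comparisons; k denotes the number of searches performed on the same n elements.

```latex
\begin{aligned}
C_{\text{linear}}(k) &\approx k\,n \\
C_{\text{sort+binary}}(k) &\approx n\log_2 n + k\log_2 n \\
C_{\text{linear}}(k) > C_{\text{sort+binary}}(k)
  &\iff k\,(n - \log_2 n) > n\log_2 n \\
  &\iff k > \frac{n\log_2 n}{n - \log_2 n} \approx \log_2 n \quad (n \gg \log_2 n)
\end{aligned}
```

With k = 1, sorting alone already costs more than one linear scan whenever n > 2. For n = 5000, the largest input in the test program, log2 n is about 12.3, so under this model sorting starts to pay off after roughly 13 searches.
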
Differences Between Linear Search and Binary Search

| Comparison Factor | Linear Search | Binary Search |
| --- | --- | --- |
| Algorithm | Sequential searching | Divide-and-conquer approach |
| Sorted list | Not required | Required |
| Working | Checks every element sequentially until a match is found or the list is exhausted | Divides the list into two halves at the middle on every iteration until the target value is found or the remaining range is empty |
| Implementation | Works on data structures that can be traversed in one direction, such as arrays and linked lists | Needs efficient random access to the middle element, as arrays provide; on a linked list, reaching the middle takes linear time |
| Simplicity and complexity | Simple and less complex | More complicated than linear search |
| Time required | O(n) | O(log n) |
| Efficiency | Not useful for large datasets | More efficient for finding elements in large datasets |
| Best case | The element is in the first position of the array: O(1) | The element is found at the first split, i.e., it is the middle element of the array: O(1) |
| Also called | Sequential search | Half-interval search or logarithmic search |
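
The "Time required" row can be made concrete by computing worst-case comparison counts. In this sketch the function names linearWorstCase and binaryWorstCase are hypothetical; binaryWorstCase models a binary search like the one in the test program, where each probe of a range of m elements leaves at most floor(m/2) of them, giving floor(log2 n) + 1 probes for n >= 1.

```cpp
#include <cstdint>

// Worst case for linear search: every one of the n elements is compared.
std::uint64_t linearWorstCase(std::uint64_t n) {
    return n;
}

// Worst case for binary search: count probes while the remaining range
// shrinks to its larger half, floor(m / 2), until it is empty.
std::uint64_t binaryWorstCase(std::uint64_t n) {
    std::uint64_t probes = 0;
    while (n > 0) {
        ++probes;
        n /= 2;
    }
    return probes;
}
```

For n = 1,000,000, linearWorstCase returns 1,000,000 while binaryWorstCase returns 20.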

Advantages and Disadvantages of Linear Search


While linear search is straightforward and suitable for small datasets, it becomes impractical for large ones. This inefficiency motivates the alternative, binary search, which is highly efficient for sorted data collections.

| Advantages | Disadvantages |
| --- | --- |
| Simple and easy to understand. | Inefficient for large datasets: it may have to examine every element in the worst case. |
| Works on both sorted and unsorted data. | Linear time complexity, O(n), where n is the number of elements in the collection. |
| Does not require any special data structure. | |
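
The C++ standard library's std::find performs this kind of sequential search and illustrates the advantages listed in the table: it needs no sorting and works on any range it can step through one element at a time, including a std::list (a linked list). The wrapper name containsLinear is hypothetical.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Linear search through any container via std::find, which compares elements
// one by one from begin() and returns end() if the key is absent.
// Works on unsorted data and on non-random-access containers such as std::list.
template <typename Container>
bool containsLinear(const Container& values,
                    const typename Container::value_type& key) {
    return std::find(values.begin(), values.end(), key) != values.end();
}
```

The same call works for an unsorted std::vector<int> and a std::list<int> holding the same values, because std::find only moves forward through the range.
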
Advantages and Disadvantages of Binary Search
Binary search's efficiency in locating items in large sorted collections makes it an
attractive choice when performance matters. It significantly shortens the search time
and is a valuable tool in many real-world applications.

| Advantages | Disadvantages |
| --- | --- |
| Highly efficient for sorted data collections. | Requires the data collection to be sorted. |
| Time complexity of O(log n), making it much faster for large datasets. | More complex to implement than linear search. |
| Reduces the search time drastically compared to linear search. | |
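
Much of the implementation complexity can be avoided by using the C++ standard library, which provides binary search over sorted ranges: std::binary_search reports only whether a value is present, while std::lower_bound returns a position. This sketch wraps std::lower_bound to return an index in the same style as the test program's binarySearch; the name binarySearchStd is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Binary search using std::lower_bound, which returns the first position whose
// element is not less than key (O(log n) comparisons on a sorted range).
// Returns the index of the first element equal to key, or -1 if absent.
int binarySearchStd(const std::vector<int>& sorted, int key) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key);
    if (it != sorted.end() && *it == key) {
        return static_cast<int>(it - sorted.begin());
    }
    return -1;
}
```

One design difference: when the sorted data contains duplicates, as the randomly generated datasets in the test program do, binarySearchStd returns the index of the first matching element, whereas the hand-written binarySearch returns whichever matching element it probes first.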

General Summary of the Differences Between the Two


Scenarios where one search algorithm may be preferred
over another:
1. Sorted Data:
   - Prefer binary search for sorted data due to its faster search time (O(log n) compared to O(n) for linear search).
2. Unsorted Data:
   - Of the two algorithms, only linear search can be applied to unsorted data directly, without sorting it first.
3. Small Datasets:
   - For very small datasets, linear search may be preferred due to its simplicity and because it avoids the cost of the sorting that binary search requires.
4. Large Datasets:
   - Binary search is advantageous for large datasets due to its logarithmic time complexity, especially when the data is already sorted or is searched many times.
5. Memory Constraints:
   - Implemented iteratively, both algorithms need only O(1) extra memory. However, if the original order of the data must be preserved, binary search requires a sorted copy, which costs O(n) extra memory; linear search avoids this, making it suitable for scenarios with memory constraints.
6. Real-time Applications:
   - Binary search may be preferred in real-time applications where quick response times are critical, provided the data is kept sorted.
7. Ease of Implementation:
   - Linear search is easier to implement and understand than binary search, making it suitable for simpler applications or educational purposes.
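
Several of these scenarios combine into one rule of thumb: scan linearly when there are only a few queries, and sort once and binary-search when there are many. The function countPresent and its sortThreshold parameter are hypothetical; a sensible threshold would have to be found by measurement on the target data. The sketch sorts a copy so the caller's data keeps its original order, which costs O(n) extra memory (scenario 5).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts how many values in `queries` occur in `data`, choosing the strategy
// by the number of queries.
std::size_t countPresent(const std::vector<int>& data,
                         const std::vector<int>& queries,
                         std::size_t sortThreshold) {
    std::size_t hits = 0;
    if (queries.size() < sortThreshold) {
        // Few queries: one O(n) linear search per query, no extra memory.
        for (int q : queries) {
            if (std::find(data.begin(), data.end(), q) != data.end()) {
                ++hits;
            }
        }
        return hits;
    }
    // Many queries: sort a copy once (O(n log n) time, O(n) extra memory),
    // then answer each query with an O(log n) binary search.
    std::vector<int> sorted(data);
    std::sort(sorted.begin(), sorted.end());
    for (int q : queries) {
        if (std::binary_search(sorted.begin(), sorted.end(), q)) {
            ++hits;
        }
    }
    return hits;
}
```

Both branches return the same count; only the cost differs, which is why the threshold is purely a performance tuning knob.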
