Amortized analysis is a technique for determining the average time complexity of a sequence of operations on a data structure or algorithm. It lets us bound the overall performance of a series of operations even when some individual operations are expensive.
In many cases, the time complexity of individual operations can vary significantly. Some operations may be very fast, while others may be slower. Amortized analysis helps us understand the average cost of operations over a sequence of operations, taking into account both the fast and slow operations.
The basic idea behind amortized analysis is to spread the cost of expensive operations over a series of cheaper operations. A single operation may have a high time complexity, but when its cost is averaged over many operations, the resulting per-operation cost is much lower.
Amortized analysis is particularly useful when analyzing data structures that have expensive operations in some cases but are generally efficient overall. Examples include dynamic arrays, binary heaps, and hash tables.
By using amortized analysis, we can gain insight into the overall performance characteristics of algorithms and data structures, helping us make informed decisions about their suitability for specific applications.
There are generally three methods for performing amortized analysis: aggregate analysis, the accounting method, and the potential method. All three give correct bounds; the choice of which to use depends on which is most convenient for a particular situation.
Aggregate Analysis: Determines the upper bound $T(n)$ on the total cost of a sequence of n operations, then calculates the amortized cost to be $T(n)/n$. This method is simple and intuitive, but it may not reveal the individual cost of each operation.
Accounting Method: Assigns to each operation an amortized cost that may differ from its actual cost. Early operations are charged an amortized cost higher than their actual cost, accumulating a saved "credit" that pays for later operations whose amortized cost is lower than their actual cost. This method allows us to assign different costs to different operations, but it may require some creativity to come up with the right credit scheme.
Potential Method: Associates with each state of the data structure a "potential", a function of the current state. The amortized cost of an operation is then defined as its actual cost plus the change in potential caused by the operation. This method is more general and flexible than the accounting method, but it may also require some ingenuity to define the right potential function.
Let's consider a dynamic array implementation in C++, where the array automatically resizes itself when needed. This type of dynamic array is called a vector in C++ (std::vector). We'll use amortized analysis to analyze the time complexity of the push_back operation.
#include <iostream>
#include <vector>

int main() {
    // Create an empty vector of integers
    std::vector<int> v;

    // Insert some elements at the end using push_back
    v.push_back(10); // v = [10]
    v.push_back(20); // v = [10, 20]
    v.push_back(30); // v = [10, 20, 30]

    // Print the size and capacity of the vector
    // (capacity values are implementation-defined; these are typical of libstdc++)
    std::cout << "Size: " << v.size() << "\n";         // Size: 3
    std::cout << "Capacity: " << v.capacity() << "\n"; // Capacity: 4

    // Insert another element at the end using push_back
    v.push_back(40); // v = [10, 20, 30, 40]
    std::cout << "Size: " << v.size() << "\n";         // Size: 4
    std::cout << "Capacity: " << v.capacity() << "\n"; // Capacity: 4

    // Insert one more element at the end using push_back
    v.push_back(50); // v = [10, 20, 30, 40, 50]
    std::cout << "Size: " << v.size() << "\n";         // Size: 5
    std::cout << "Capacity: " << v.capacity() << "\n"; // Capacity: 8

    // Notice that the capacity doubled (4 -> 8) when the vector ran out of space
    return 0;
}
The std::vector has a method called push_back that inserts an element at the end of the vector. The push_back method has an amortized time complexity of $O(1)$, which means that the average time per insertion is constant, even though some insertions may take longer than others.
The reason why some insertions may take longer than others is that the vector may run out of space and need to resize its underlying array. The resizing operation involves allocating a new array with a larger capacity, copying all the elements from the old array to the new array, and deleting the old array. The resizing operation has a time complexity of $O(n)$, where n is the number of elements in the vector.
However, the resizing operation does not happen very often, and its cost can be amortized over many insertions. The std::vector grows its capacity by a constant factor whenever it needs to resize; the exact factor is implementation-defined (for example, libstdc++ doubles the capacity, while Microsoft's implementation uses a factor of 1.5, increasing the capacity by 50% on each resize).
The choice of growth factor affects the performance of the vector. If the factor is too large, the vector wastes memory on unused capacity; if it is too small, the vector resizes too frequently and incurs copying overhead. The optimal factor depends on considerations such as memory allocation cost, cache behavior, and the expected number of insertions.