\title{A comparison of the bubble, shell and STL sort algorithm}

\author{Jostein Bratlie and Rune Dalmo}

\title{Graph comparison}

\author{Markus Tobiassen}

\date{\today}

...

...


\maketitle

\begin{abstract}

This is an article template which can be used to generate a report.

Several \LaTeX features are demonstrated, including how to display maths and algorithms, generate plots and use a bibliography to manage a list of references.

We compare the standard template library (STL) sort algorithm to the basic bubble and shell sort algorithms, to demonstrate how an article is built up.

\end{abstract}

...

...


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Introduction}

The C++ programming language is supported by a standard library, the \emph{Standard Template Library} (STL)~\cite{plauger:2000}, which complements the language with standard containers, algorithms, language modules and other utilities.

The default sort algorithm in the STL is based on the quick-sort algorithm presented in~\cite{musser:1997}, which is a two-part introspective algorithm combining heap sort~\cite{williams:1997} and median-of-3 quick sort~\cite{hoare:1962} elements.

We will compare the STL sort algorithm to two rudimentary custom-implemented sorting algorithms: the bubble sort~\cite{aho:1974} and Shell sort~\cite{shell:1959} algorithms.

\subsection{STL sort}

STL implementations of the sort algorithm are based on the introspective quick sort algorithm from~\cite{musser:1997}.

This is a two-part algorithm which combines median-of-3 quick sort with heap sort, switching to heap sort in situations where partitioning tends towards quadratic behaviour.

The three-way-partition quick sort algorithm has a best case time complexity of $O(n)$, an average of $O(n \log n)$ and a worst case of $O(n^2)$.
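In code, the STL sort is invoked through \texttt{std::sort}; the following sketch shows a typical call (the wrapper \texttt{sorted\_copy} is our own illustrative helper, not part of the STL):

```cpp
#include <algorithm>
#include <vector>

// Sort a copy of the input with std::sort. In common STL
// implementations this is the introspective sort described above:
// median-of-3 quick sort that falls back to heap sort when the
// recursion depth suggests quadratic behaviour.
std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```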

\subsection{Bubble sort and shell sort}

The bubble sort algorithm~\cite{aho:1974} is a naive sorting algorithm which repeatedly iterates over an unsorted collection and swaps adjacent elements until the collection is sorted.

The bubble sort has a best case time complexity of $O(n)$ comparisons and $O(1)$ swaps, while the average and worst case time complexities are both $O(n^2)$.

As an example of including an algorithm see the pseudo-code bubble sort algorithm in \cref{alg:bubblesort}.

\noindent The breadth-first search (BFS) algorithm is a traversal algorithm which starts from a specific node and progressively visits all neighbouring nodes.

The algorithm continues to visit each node's neighbours until all nodes have been visited~\cite{kurant:2010}.
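A minimal sketch of BFS, assuming an adjacency-list representation (the \texttt{vector}-of-\texttt{vector}s layout and the name \texttt{bfs\_order} are our own choices for illustration):

```cpp
#include <queue>
#include <vector>

// Breadth-first search over an adjacency list: visit the start node,
// then all of its neighbours, then their neighbours, and so on.
// Returns the order in which nodes are visited.
std::vector<int> bfs_order(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        order.push_back(u);
        for (int v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return order;
}
```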

\indent

The depth-first search (DFS) algorithm traverses one neighbour at a time until it reaches an end point. The algorithm then backtracks

and traverses the next neighbours, repeating until all nodes have been traversed~\cite{mehlhorn:2008}.
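A recursive DFS sketch, under the same illustrative adjacency-list assumption as above (function names are ours):

```cpp
#include <vector>

// Recursive depth-first search: follow one neighbour as deep as
// possible, backtrack, then try the next unvisited neighbour.
void dfs_visit(const std::vector<std::vector<int>>& adj, int u,
               std::vector<bool>& visited, std::vector<int>& order) {
    visited[u] = true;
    order.push_back(u);
    for (int v : adj[u])
        if (!visited[v]) dfs_visit(adj, v, visited, order);
}

// Returns the order in which nodes reachable from `start` are visited.
std::vector<int> dfs_order(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    dfs_visit(adj, start, visited, order);
    return order;
}
```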

In \cite{shell:1959} Shell introduced a high-speed sorting procedure.

It has later become known as \emph{Shell sorting}.

The Shell sort algorithm has a best and worst case time complexity of $O(n \log n)$ and $O(n^2)$, respectively, depending on the gap sequence.
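A sketch of Shell sort using the simple gap sequence $n/2, n/4, \ldots, 1$ (one of several possible gap sequences; the function name is our own):

```cpp
#include <cstddef>
#include <vector>

// Shell sort: each pass performs an insertion sort over elements
// that are `gap` positions apart; the gap is halved until it is 1,
// at which point the pass is an ordinary insertion sort over an
// almost-sorted array.
void shell_sort(std::vector<int>& a) {
    for (std::size_t gap = a.size() / 2; gap > 0; gap /= 2) {
        for (std::size_t i = gap; i < a.size(); ++i) {
            int tmp = a[i];
            std::size_t j = i;
            for (; j >= gap && a[j - gap] > tmp; j -= gap)
                a[j] = a[j - gap];
            a[j] = tmp;
        }
    }
}
```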

\indent

Dijkstra's algorithm finds the shortest path from a starting point in the graph to a destination~\cite{javaid:2013}.
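A sketch of Dijkstra's algorithm with a binary heap (\texttt{std::priority\_queue}); the adjacency-list layout of (neighbour, weight) pairs is an assumption for this illustration, not the benchmarked implementation:

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Dijkstra's algorithm: repeatedly settle the unvisited node with the
// smallest tentative distance and relax its outgoing edges.
// Returns the distance from `source` to every node (INT_MAX if unreachable).
std::vector<int> dijkstra(
    const std::vector<std::vector<std::pair<int, int>>>& adj, int source) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(adj.size(), INF);
    using State = std::pair<int, int>;  // (distance, node), min-heap order
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry, already settled
        for (auto [v, w] : adj[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```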

\begin{algorithm}

\caption{Bubble sort}\label{alg:bubblesort}

\begin{algorithmic}[1]

\Procedure{sort}{$A$}\Comment{list of sortable items}

\State$n \gets |A|$

\Do

\State$swapped \gets false$

\For{$i \gets 1$ to $n-1$}

\If{$A[i-1] > A[i]$}

\State$swap(A[i-1], A[i])$

\State$swapped \gets true$

\EndIf

\EndFor

\doWhile{$swapped = true$}

\EndProcedure

\end{algorithmic}

\end{algorithm}
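The pseudo-code of \cref{alg:bubblesort} translates directly into C++; a minimal sketch:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort: repeatedly sweep the list, swapping adjacent elements
// that are out of order, until a full sweep makes no swaps.
void bubble_sort(std::vector<int>& a) {
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = 1; i < a.size(); ++i) {
            if (a[i - 1] > a[i]) {
                std::swap(a[i - 1], a[i]);
                swapped = true;
            }
        }
    }
}
```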

\section{Algorithm complexity}

\subsection{BFS}

The BFS algorithm visits every reachable vertex and edge once. The time complexity of the algorithm is thus linear in the number of vertices ($V$) and the number of edges ($E$) in the graph: $O(V + E)$.

\subsection{DFS}

The DFS algorithm also visits every vertex and edge once; therefore its time complexity is also $O(V + E)$, linear time.

\section{Benchmark set-up}

In this section we will objectively describe how the experiments have been set up.

In this example article we provide a rather weak comparison of the three sorting algorithms over a rather speculative data set, but it serves its purpose quite well as a build-up example.

\subsection{Dijkstra's algorithm and the shortest path problem}

The time complexity of Dijkstra's algorithm is $O(V + E \log V)$.

The shortest path problem is about finding a path between vertices of a given graph such that the sum of the weights of its edges is minimized.

The algorithms are benchmarked over data sets of increasing size, $n$, to estimate average time complexity.

For each $n$ the data set is filled with $n$ samples uniformly sampled from the test-function based on \eqref{eq:integral}.

\begin{equation}\label{eq:integral}

f(x) = \int_{-\infty}^{\infty}\frac{\sin(x)}{x}\,dx.

\end{equation}

This, by the way, demonstrates how to incorporate math into your article; and do not try to be funny.

The integrand is also plotted in \cref{fig:plot_function}, which demonstrates use of the graphical tikz package.

\begin{figure}[tbp]

\begin{tikzpicture}

\begin{axis}[

xmajorgrids=false,ymajorgrids=false,

legend pos=north east,

width=0.9\textwidth, height=0.25\textheight,

]

% pgfplots trigonometry is in degrees, hence deg(x)

\addplot[blue, domain=-25.1:25.1, samples=200] {sin(deg(x))/x};

\end{axis}

\end{tikzpicture}

\caption{A plot of the function $\frac{\sin(x)}{x}$, for $x \in [-25.1,25.1]$.}

\label{fig:plot_function}

\end{figure}

\subsection{A* algorithm and heuristics}

The A* algorithm is based on the branch and bound algorithm, and uses an extended list and admissible heuristics to find the shortest path faster.

The algorithm is very similar to Dijkstra's, but it includes an estimated value (heuristic) when deciding the shortest path.

The heuristic is an estimated cost towards the goal from a given node. It is combined with the actual distance travelled so far to choose the next node to visit.
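A sketch of A* under the same illustrative adjacency-list assumption as for Dijkstra's algorithm; the heuristic is passed in as a function, and none of the names below are the author's actual implementation:

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// A* search: like Dijkstra's algorithm, but a node's priority is its
// distance so far PLUS an admissible heuristic estimate of the
// remaining distance to the goal. Returns the cost of the shortest
// path from `start` to `goal`, or -1 if the goal is unreachable.
int astar(const std::vector<std::vector<std::pair<int, int>>>& adj,
          int start, int goal, const std::function<int(int)>& h) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> g(adj.size(), INF);  // best known distance so far
    using State = std::pair<int, int>;    // (g + h, node), min-heap order
    std::priority_queue<State, std::vector<State>, std::greater<State>> open;
    g[start] = 0;
    open.push({h(start), start});
    while (!open.empty()) {
        auto [f, u] = open.top();
        open.pop();
        if (u == goal) return g[u];
        if (f > g[u] + h(u)) continue;  // stale entry
        for (auto [v, w] : adj[u]) {
            if (g[u] + w < g[v]) {
                g[v] = g[u] + w;
                open.push({g[v] + h(v), v});
            }
        }
    }
    return -1;
}
```

With the zero heuristic, A* degenerates to Dijkstra's algorithm; a non-trivial admissible heuristic only prunes the search earlier.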

Each of the sampling experiments is then run a number of times and the timed data is averaged for the combination of a given $n$ and algorithm.

The number of runs is consistent across the algorithms and $n$.
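The averaged timing described above can be sketched with \texttt{std::chrono}; the helper \texttt{average\_ms} is illustrative, not the actual benchmark harness:

```cpp
#include <chrono>
#include <functional>

// Time `work` once per repetition with a monotonic clock and return
// the mean wall-clock time in milliseconds over `runs` repetitions.
double average_ms(const std::function<void()>& work, int runs) {
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        auto start = clock::now();
        work();
        total += std::chrono::duration<double, std::milli>(clock::now() - start).count();
    }
    return total / runs;
}
```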

\subsection{Dynamic programming search algorithm}

The reference implementation of dp-search uses only recursion for finding the shortest path. In the improved version I use an unordered map to store

the results of the subproblems. Before calling the recursive method, the program checks the map; if the value is found we skip the recursive call, otherwise

we perform the recursion and store the result in the map. This way we never compute the same subproblem twice.
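The memoisation described above can be sketched as follows, here for shortest paths in a directed acyclic graph; the graph layout and function names are illustrative, not the author's actual code:

```cpp
#include <algorithm>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Memoized recursive shortest path in a DAG from `u` to `target`.
// Before recursing we check the unordered_map cache; afterwards we
// store the result, so no subproblem is ever solved twice.
// Returns INT_MAX if the target is unreachable from `u`.
int dp_shortest(const std::vector<std::vector<std::pair<int, int>>>& adj,
                int u, int target, std::unordered_map<int, int>& memo) {
    if (u == target) return 0;
    auto it = memo.find(u);
    if (it != memo.end()) return it->second;  // cache hit: skip the recursion
    int best = std::numeric_limits<int>::max();
    for (auto [v, w] : adj[u]) {
        int sub = dp_shortest(adj, v, target, memo);
        if (sub != std::numeric_limits<int>::max())
            best = std::min(best, w + sub);
    }
    memo[u] = best;  // store the subproblem result
    return best;
}
```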

\section{Results}

\Cref{fig:bench_sort} demonstrates the use of pgf-plots and tikz to show performance graphs for the benchmarked sorting algorithms.

From the illustration we can quite clearly see the averaging $O(n)$, $O(n \log n)$ and $O(n^2)$ tendencies of the three sorting algorithms, Quick, Shell and Bubble, respectively.

\begin{figure}[tbp]

\begin{tikzpicture}

\begin{axis}[

xmajorgrids=false,ymajorgrids=false,

legend pos=north west,

width=0.9\textwidth, height=0.25\textheight,

% reverse legend

]

\addplot[red] table[x=size ,y=time, skip first n=0] {dat/benchmark_mylib_bubble.dat};

\addplot[green] table[x=size ,y=time, skip first n=0] {dat/benchmark_mylib_shell.dat};

\addplot[blue] table[x=size ,y=time, skip first n=0] {dat/benchmark_stl_sort.dat};

\legend{

Bubble,

Shell,

Quicksort

}

\end{axis}

\end{tikzpicture}

\caption{Run-times: \emph{milliseconds} to sort $n$-number of objects.}

\label{fig:bench_sort}

\end{figure}

\section{Discussion and concluding remarks}

In the discussion and concluding remarks section one should:

\begin{itemize}

\item reflect on the methods with respect to the results, and

\item suggest topics for future work.

\end{itemize}

To end off this white-paper article template, we demonstrate how to include figures by including the famous ``lena'' image, utilized in a number of articles on the topics of image processing and data compression (see \cref{fig:lena}).

\begin{table}[tbp]

% table body missing in the source; only the caption and label survive

\caption{Diagram showing the performance of the algorithms; the sizes of the graphs are (vertices $\times$ edges).}

\label{fig:Diagram}

\end{table}

My improved version of the DP shortest paths algorithm performs worse than the reference implementation, and it has the worst performance of all the algorithms.

My Dijkstra implementation seems to perform really well compared to the reference implementation and the other algorithms.

I have yet to finish my implementation of the A* algorithm, and therefore cannot compare it yet.

\section{Concluding Remarks}

The purpose of the current study was to compare the performance between the different algorithms presented in this paper.

The results indicate that my own implementations perform worse than the reference implementations.

The sizes of the graphs used to produce the benchmarking results are very small. To obtain better accuracy, much larger graphs should have been used.
