\section{Complexity analysis} \label{sec:analysis} Robson provides us with a very detailed proof that the $\ms$ algorithm runs in $\pazocal O(\mathrm{polynomial}(n)\cdot2^{cn})$, where $c<0.296$ and $n$ is the number of vertices of the input graph \cite{robson}. However, the full argument is quite involved, so I will give my own, simpler, less precise analysis here. We write $T_{\ms}(n)$ for the running time of $\ms$ on a graph $G$ with $|G|=n$. For $\ms^n$ we write $T_{\ms^n}(p,q)$ for the application of $\ms^n$ to a graph $G$ with $|G|=p$ and a set $S$ with $|S|=q$ from which $n$ vertices are to be picked. In the case of $n=1$, we write $T_{\ms^1}(p)$ instead of $T_{\ms^1}(p,2)$, because we only ever use the case $q=2$. For the sake of brevity, I will use, without further discussion, the fact that basic graph operations such as dominance checking, edge checking and degree computation run in $\pazocal O(\mathrm{polynomial}(n))$ time. Informally, this can be verified by inspecting the methods in the Java code accompanying this report. \subsection{$\ms^1$: \autoref{alg:ms1}} \label{sec:analysis:ms1} This function is a simple case distinction. Let's therefore consider the complexities of all the cases one by one. The results, along with their line numbers in \autoref{alg:ms1}, may be found in \autoref{tab:analysis-ms1}. \begin{table}[h] \centering \renewcommand{\arraystretch}{1.3} \begin{tabular}{r | p{29mm} | p{19mm} | p{50mm}} Ln.
& Case & Complexity & Argumentation \\\hline\hline \ref{alg:ms1:case-d1} & $d(s_1)\leq 1$ & $T_{\ms}(n)$;\newline $T_{\ms}(n-1)+\pazocal O(n)$ & Simple recurrence; see below\\\hline \ref{alg:ms1:case-edge-d4} & $e(s_1,s_2), d(s_1)>3$ & $2T_{\ms}(n-5)$ & Both vertices have degree four, so we run $\ms$ twice on a graph with five fewer vertices\\\hline \ref{alg:ms1:case-edge-d3} & $e(s_1,s_2), d(s_1)\leq3$ & $T_{\ms}(n)$;\newline $3T_{\ms}(n-2)$ & Simple recurrence; see below\\\hline \ref{alg:ms1:intersection} & $N(s_1)\cap N(s_2)\neq\emptyset$ & $T_{\ms^1}(n-1)$ & At least one vertex removed \\\hline \ref{alg:ms1:case-2-2-edge} & $d(s_1)=d(s_2)=2$, $e(w_1,w_2)$ & $T_{\ms}(n-3)$ & Simple recurrence \\\hline \ref{alg:ms1:case-2-2-dominance} & $d(s_1)=d(s_2)=2$, $N^2(s_1)\subset N(s_2)$ & $T_{\ms}(n-6)$ & Exactly six vertices removed \\\hline \ref{alg:ms1:case-2-2-else} & $d(s_1)=d(s_2)=2$, otherwise & $T_{\ms}(n-3) + T_{\ms}(n-7)$ & In the first case three vertices are removed, in the second case seven \\\hline \ref{alg:ms1:case-general} & Otherwise & $T_{\ms}(n-3) + T_{\ms^2}(n-4,4)$ & In the first case at least three vertices are removed, in the second case four, leaving a set of at most four vertices to choose two from \end{tabular} \caption{Worst case time complexity analysis of the different cases in \autoref{alg:ms1}} \label{tab:analysis-ms1} \end{table} The case in line \ref{alg:ms1:case-d1} is problematic in the sense that the size of the graph is not reduced. However, we know that $d(s_1)\le1$, which means that there is either a strongly connected component of size $1$, namely $\{s_1\}$, or a vertex $v$, namely $s_1$, with $d(v)=1$. In the first case, the time complexity reduces to $T_{\ms}(1)+T_{\ms}(n-1)=T_{\ms}(n-1) + \pazocal O(n)$, as we will see below in \autoref{tab:analysis-ms}. In the second case, the time complexity reduces to $T_{\ms}(n-2) \le T_{\ms}(n-1) + \pazocal O(n)$. We need to do something similar for the case in line \ref{alg:ms1:case-edge-d3}.
In this case, we know there is a vertex $v$ with $d(v)\le3$. This is broken down into a number of cases in \autoref{alg:ms}, the time complexities of which may be found in \autoref{tab:analysis-ms} below. It is easy to see that all of the relevant cases have a running time less than $T_{\ms}(n-2)+T_{\ms^2}(n-3,6)+T_{\ms^2}(n-1,3)$, which is at most $3T_{\ms}(n-2)$. The only application of $T_{\ms^2}$ in which the set to pick from has more than $4$ vertices is the one in line \ref{alg:ms:case-d2-noedge}; in \autoref{tab:analysis-ms} below we see that in this particular case $3$ vertices were already removed from the graph. We will use this below in the analysis of line \ref{alg:ms:case-d2-noedge} of \autoref{alg:ms} in \autoref{tab:analysis-ms}. \begin{thmproof}[ms2-le-ms]{lemma}{There exists a $c$ such that for all graphs $G$ and sets $S$ with $|S|\le4$, we have $T_{\ms^2}(|G|,|S|)\le T_{\ms}(|G|-c)$.} This holds for every case in \autoref{tab:analysis-ms2} with $c=3$. Cases with complexity $T_{\ms^1}(n-x)$ suffice by \autoref{lem:ms1-le-ms}. \end{thmproof} \subsection{$\ms$: \autoref{alg:ms}} \label{sec:analysis:ms} This, too, is a simple case distinction. Let's again put the different cases in a table, along with their line numbers; see \autoref{tab:analysis-ms}. \begin{table}[h] \centering \renewcommand{\arraystretch}{1.3} \begin{tabular}{r | p{28mm} | p{20mm} | p{50mm}} Ln.
& Case & Complexity & Argumentation \\\hline\hline \ref{alg:ms:case-component} & Multiple strongly connected components & $T_{\ms}(|C|) + T_{\ms}(n-|C|)$ & $|C|$ is the size of some strongly connected component\\\hline \ref{alg:ms:case-g1} & $|G|\le1$ & $\pazocal O(n)$ & Counting the number of vertices\\\hline \ref{alg:ms:case-d1} & $d(v)=1$ & $T_{\ms}(n-2)$ & $v$ and its neighbour removed\\\hline \ref{alg:ms:case-d2-edge} & $d(v)=2$, $e(w,w')$ & $T_{\ms}(n-3)$ & $v$ and its two neighbours removed\\\hline \ref{alg:ms:case-d2-noedge} & $d(v)=2$, otherwise & $T_{\ms}(n-6) + T_{\ms^2}(n-3,6)$; \newline $T_{\ms}(n-6) + T_{\ms}(n-3)$ & $d(w)\ge2$ and $d(w')\ge2$; $|N^2(v)|$ is at most $6$ for any degree-$2$ vertex in a graph with maximum degree $4$; as we saw in \autoref{sec:analysis:ms2}, this reduces to the second sum\\\hline \ref{alg:ms:case-d3} & $d(v)=3$ & $T_{\ms}(n-4) + T_{\ms^2}(n-1,3)$; \newline $T_{\ms}(n-4) + T_{\ms}(n-3)$ & Simple recurrences; second form by \autoref{lem:ms2-le-ms}\\\hline \ref{alg:ms:case-dominance} & $v>w$ & $T_{\ms}(n-1)$; \newline $T_{\ms}(n-3) + T_{\ms}(n-5)$ & Simple recurrence; see below\\\hline \ref{alg:ms:case-otherwise} & Otherwise & $T_{\ms}(n-1) + T_{\ms}(n-5)$ & $d(w)\le4$, so $|\iN(w)|\le5$ \end{tabular} \caption{Worst case time complexity analysis of the different cases in \autoref{alg:ms}} \label{tab:analysis-ms} \end{table} We saw already in \autoref{sec:algorithm:good-v:dominance} that after removing $w$ in the case of dominance of $v$ over $w$ on line \ref{alg:ms:case-dominance}, we are left with a vertex of degree $3$. Therefore, the recurrence $T_{\ms}(n-1)$ in this case reduces to one of the above cases with $n:=n-1$, which is at most $T_{\ms}(n-3)+T_{\ms}(n-5)$. We can do something similar in the `Otherwise' case of line \ref{alg:ms:case-otherwise}. We can only be in this case if there is no vertex with degree less than $4$. Since all vertices have a maximum degree of $4$, this means that every vertex has degree $4$.
In both subcases that are considered in line \ref{alg:ms:case-otherwise} of \autoref{alg:ms}, vertices are removed from the graph, leaving a graph with a vertex of maximum degree $3$. If we use that knowledge, we see from the other cases in \autoref{tab:analysis-ms} that the time complexity of the `Otherwise' case can be rewritten as $T_{\ms}(n-3)+T_{\ms}(n-5)+T_{\ms}(n-7)+T_{\ms}(n-9)$. \begin{thmproof}[otherwise-half-of-cases]{lemma}{Of any $k$ successive applications of $\ms$, at most $\lceil\frac k2\rceil$ will use the inefficient `Otherwise' case.} After applying the `Otherwise' case, there is at least one vertex with degree lower than $4$. Hence, in the application directly following an application of the `Otherwise' case, we do not need to apply it. \end{thmproof} \begin{thmproof}[ms-complexity]{theorem}{\autoref{alg:ms} runs in $\pazocal O\left(\mathrm{polynomial}(n)\cdot2^{n/2}\right)$.} We see that the time complexity $T_{\ms}(n)$ depends on $T_{\ms}(n-c)$ with $c\ge2$. Furthermore, if $c=2$, we have $T_{\ms}(n)=T_{\ms}(n-2)$, so if $T_{\ms}(n)$ depends on $T_{\ms}(n-2)$, it depends on that \emph{alone}, giving a branching factor of $1$ in the search tree. Branching only occurs when $c\ge3$. Only in the `Otherwise' case is the branching factor $4$; in the other branching cases it is $2$. We then consider the search tree $T$ in which we compress directly related nodes with branching factor $1$ into one node. By \autoref{lem:otherwise-half-of-cases}, only on one of every two consecutive levels can we have a branching factor of $4$ (and in the worst case this will happen as often as possible). If we then combine the levels with branching factor $4$ with the levels directly below them (having branching factor $2$), we find a search tree $T'$ with, in the worst case, a constant branching factor of $8$. The original search tree $T$ has at most $\frac n3$ levels, since branching only occurs when $c\ge3$.
Therefore, the compressed tree $T'$ has, in the worst case, $\frac n6$ levels (note that this case is worse than that of a tree with constant branching factor $2$ and $\frac n3$ levels, and also worse than any case in between). In every node of the search tree $T'$ we spend polynomial time, since we assumed that basic graph operations run in polynomial time. Therefore, \autoref{alg:ms} runs in $$\pazocal O\left(\mathrm{polynomial}(n)\cdot8^{n/6}\right) = \pazocal O\left(\mathrm{polynomial}(n)\cdot2^{n/2}\right).$$ \end{thmproof} This result is, naturally, consistent with Robson's result that the algorithm runs in $\pazocal O\left(\mathrm{polynomial}(n)\cdot2^{cn}\right)$ with $c<0.296$ \cite{robson}. We have proven that the bound holds with $c\le\frac12$. \subsection{Space complexity} \label{sec:analysis:space} A graph is stored as a list of nodes, and for every edge both nodes hold one reference. Therefore, the space complexity of a single graph $G=(V,E)$ is $\pazocal O(|V|+|E|)$. As discussed in \autoref{sec:implementation}, we're using a remove-and-restore strategy. For every recurrence, no matter the branching factor, we do not need to create new objects: the whole algorithm runs on one object. This comes at the cost of a stack of restore information: lists of vertices that have been removed from the graph. Since at any moment a node is either in the graph \emph{or} on that stack (and then only once), the space complexity of the graph object is still $\pazocal O(|V|+|E|)$. There is one small exception to this: the \texttt{findStronglyConnectedComponent} method does create a new object. Space-wise, the worst case is then a graph consisting of $|V|$ strongly connected components. In this case, we need $|V|$ graph objects, giving a quadratic worst case space complexity of $\pazocal O\left(|V|\cdot(|V|+|E|)\right)$.
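To make the remove-and-restore strategy concrete, here is a minimal sketch of the idea. The class and method names (\texttt{RestorableGraph}, \texttt{remove}, \texttt{restore}) are hypothetical, chosen for illustration; they do not match the accompanying Java code. A removal pushes the removed vertices together with their neighbour sets onto a stack, and backtracking pops them, so the algorithm can branch without ever copying the graph.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the remove-and-restore strategy; not the accompanying code.
class RestorableGraph {
    // Adjacency sets: every edge is stored as a reference on both endpoints.
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();
    // Restore stack: one frame per removal, mapping removed vertices to their neighbours.
    private final Deque<Map<Integer, Set<Integer>>> restoreStack = new ArrayDeque<>();

    void addEdge(int u, int v) {
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    int size() { return adj.size(); }

    boolean hasEdge(int u, int v) {
        Set<Integer> nbs = adj.get(u);
        return nbs != null && nbs.contains(v);
    }

    // Remove vertices, pushing enough information onto the stack to undo the
    // removal later; no new graph object is created.
    void remove(Collection<Integer> vertices) {
        Map<Integer, Set<Integer>> frame = new HashMap<>();
        for (int v : vertices) {
            Set<Integer> nbs = adj.remove(v);
            if (nbs == null) continue; // not (or no longer) in the graph
            frame.put(v, nbs);
            for (int w : nbs) {
                Set<Integer> s = adj.get(w);
                if (s != null) s.remove(v);
            }
        }
        restoreStack.push(frame);
    }

    // Undo the most recent removal: first put the vertices back, then repair
    // the symmetric edge references (two passes, in case removed vertices
    // in the same frame were adjacent to each other).
    void restore() {
        Map<Integer, Set<Integer>> frame = restoreStack.pop();
        for (Map.Entry<Integer, Set<Integer>> e : frame.entrySet())
            adj.put(e.getKey(), e.getValue());
        for (Map.Entry<Integer, Set<Integer>> e : frame.entrySet())
            for (int w : e.getValue())
                adj.computeIfAbsent(w, k -> new HashSet<>()).add(e.getKey());
    }

    public static void main(String[] args) {
        RestorableGraph g = new RestorableGraph();
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(1, 3);
        g.remove(Arrays.asList(1));   // branch: discard vertex 1
        System.out.println(g.size()); // 2
        g.restore();                  // backtrack: same object, original graph again
        System.out.println(g.size()); // 3
    }
}
```

Because every removed vertex lives in exactly one stack frame, the graph object and its stack together never exceed the original $\pazocal O(|V|+|E|)$ space; only the component-splitting step departs from this pattern, which is what causes the quadratic worst case above.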
This quadratic bound does not take the garbage collector into account. A closer look at the implementation of the \texttt{maximumIndependentSetSize} method shows that in fact at most two graph objects are needed at any point in the program. In any case, thanks to the efficiency gained by applying \autoref{lem:components}, the algorithm runs very fast on graphs with many strongly connected components, so in practice we do not need to worry about the theoretically quadratic space complexity.
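As a closing sanity check on the time bound, the following sketch (mine, not part of the accompanying Java code) tabulates the most expensive branching recurrence, $T(n)=T(n-3)+T(n-5)+T(n-7)+T(n-9)$ from the `Otherwise' case, applies it pessimistically at every step, and compares the result against the $2^{n/2}$ growth rate proven above:

```java
// Sanity check (hypothetical sketch): tabulate the worst-case branching
// recurrence of the `Otherwise' case,
//   T(n) = T(n-3) + T(n-5) + T(n-7) + T(n-9),
// applied pessimistically at every step, and compare it to 2^{n/2}.
class RecurrenceCheck {
    // Values of the recurrence for 0 <= n <= max, with T(n) = 1 for n <= 2
    // (and for negative arguments), modelling a constant amount of base work.
    static long[] tabulate(int max) {
        long[] t = new long[max + 1];
        for (int n = 0; n <= max; n++) {
            t[n] = (n <= 2) ? 1
                 : term(t, n - 3) + term(t, n - 5) + term(t, n - 7) + term(t, n - 9);
        }
        return t;
    }

    private static long term(long[] t, int i) { return i < 0 ? 1 : t[i]; }

    // True iff T(n) <= 2 * 2^{n/2} for every n up to max.
    static boolean withinBound(int max) {
        long[] t = tabulate(max);
        for (int n = 0; n <= max; n++) {
            if (t[n] > 2.0 * Math.pow(2.0, n / 2.0)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The branching alone grows like roughly 1.30^n, below 2^{n/2} = 1.42^n.
        System.out.println(withinBound(60)); // prints true
    }
}
```

The tabulated values stay well below $2\cdot2^{n/2}$, which is consistent with the bound: even before invoking \autoref{lem:otherwise-half-of-cases}, the raw `Otherwise' recurrence alone does not outgrow $2^{n/2}$.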