\section{Complexity analysis}
\label{sec:analysis}

Robson provides us with a very detailed proof that the $\ms$ algorithm runs in $\pazocal O(\mathrm{polynomial}(n)\cdot2^{cn})$, where $c<0.296$ and $n$ is the number of vertices of the input graph \cite{robson}. However, that argument goes over my head, so I will provide my own, simpler, less exact analysis here.

We write $T_{\ms}(n)$ for the running time of $\ms$ on a graph $G$ with $|G|=n$. For $\ms^n$ we write $T_{\ms^n}(p,q)$ for the running time of $\ms^n$ applied to a graph $G$ with $|G|=p$ and a set $S$ with $|S|=q$ from which $n$ vertices are to be picked. In the case of $n=1$, we write $T_{\ms^1}(p)$ instead of $T_{\ms^1}(p,2)$, because we only use the case $q=2$.

For the sake of brevity, I will use, but not discuss here, that basic graph operations such as dominance checking, edge checking, degree computation and others run in $\pazocal O(\mathrm{polynomial}(n))$ time. Informally, this can be verified by inspecting the methods in the Java code accompanying this report.
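
As an illustration of why this is plausible, the following sketch shows such operations on an adjacency-set representation. It is \emph{not} the accompanying implementation: all class and method names are made up, and the subset check merely stands in for whichever dominance condition the algorithm actually uses. Each operation is clearly at worst polynomial in the number of vertices.

\begin{verbatim}
import java.util.*;

/** Minimal adjacency-set graph, for illustration only. */
class SimpleGraph {
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();

    void addVertex(int v) { adj.putIfAbsent(v, new HashSet<>()); }

    void addEdge(int v, int w) {                  // O(1) expected
        addVertex(v); addVertex(w);
        adj.get(v).add(w); adj.get(w).add(v);
    }

    boolean hasEdge(int v, int w) {               // edge check: O(1) expected
        return adj.containsKey(v) && adj.get(v).contains(w);
    }

    int degree(int v) {                           // degree computation: O(1)
        return adj.get(v).size();
    }

    /** Is N(w) \ {v} contained in N(v)?  O(d(w)) expected. */
    boolean neighbourhoodContained(int v, int w) {
        for (int x : adj.get(w))
            if (x != v && !adj.get(v).contains(x)) return false;
        return true;
    }
}
\end{verbatim}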

\subsection{$\ms^1$: \autoref{alg:ms1}}
\label{sec:analysis:ms1}

This function is a simple case distinction. Let's therefore consider the complexities of all the cases one by one. The results, along with their line numbers in \autoref{alg:ms1}, may be found in \autoref{tab:analysis-ms1}.

\begin{table}[h]
    \centering
    \renewcommand{\arraystretch}{1.3}
    \begin{tabular}{r | p{29mm} | p{19mm} | p{50mm}}
        Ln. & Case & Complexity & Argumentation \\\hline\hline
        \ref{alg:ms1:case-d1}       & $d(s_1)\leq 1$
            & $T_{\ms}(n)$;\newline
              $T_{\ms}(n-1)+\pazocal O(n)$
            & Simple recurrence; see below\\\hline
        \ref{alg:ms1:case-edge-d4} & $e(s_1,s_2), d(s_1)>3$
            & $2T_{\ms}(n-5)$
            & Both vertices have degree four, so we run $\ms$ twice on a graph with five fewer vertices\\\hline
        \ref{alg:ms1:case-edge-d3} & $e(s_1,s_2), d(s_1)\leq3$
            & $T_{\ms}(n)$;\newline
              $3T_{\ms}(n-2)$
            & Simple recurrence; see below\\\hline
        \ref{alg:ms1:intersection} & $N(s_1)\cap N(s_2)\neq\emptyset$
            & $T_{\ms^1}(n-1)$ 
            & At least one vertex removed \\\hline
        \ref{alg:ms1:case-2-2-edge} & $d(s_1)=d(s_2)=2$, $e(w_1,w_2)$
            & $T_{\ms}(n-3)$
            & Simple recurrence \\\hline
        \ref{alg:ms1:case-2-2-dominance} & $d(s_1)=d(s_2)=2$, $N^2(s_1)\subset N(s_2)$
            & $T_{\ms}(n-6)$
            & Exactly six vertices removed \\\hline
        \ref{alg:ms1:case-2-2-else} & $d(s_1)=d(s_2)=2$, otherwise
            & $T_{\ms}(n-3) + T_{\ms}(n-7)$ 
            & In the first case three vertices removed, in the second case seven \\\hline
        \ref{alg:ms1:case-general} & Otherwise
            & $T_{\ms}(n-3) + T_{\ms^2}(n-4,4)$
            & In the first case at least three vertices removed, in the second case four, and at most four to choose two from
    \end{tabular}
    \caption{Worst case time complexity analysis of the different cases in \autoref{alg:ms1}}
    \label{tab:analysis-ms1}
\end{table}

The case in line \ref{alg:ms1:case-d1} is problematic in the sense that the size of the graph is not reduced. However, we know that $d(s_1)\le1$, which means that either there is a strongly connected component of size $1$, namely $\{s_1\}$, or there is a vertex $v$, namely $s_1$, with $d(v)=1$. In the first case, the time complexity reduces to $T_{\ms}(1)+T_{\ms}(n-1)=T_{\ms}(n-1) + \pazocal O(n)$, as we will see below in \autoref{tab:analysis-ms}. In the second case, the time complexity reduces to $T_{\ms}(n-2) < T_{\ms}(n-1) + \pazocal O(n)$.

We need to do something similar for the case in line \ref{alg:ms1:case-edge-d3}. In this case, we know there is a vertex $v$ with $d(v)\le3$. This is broken down into a number of cases in \autoref{alg:ms}, the time complexities of which may all be found in \autoref{tab:analysis-ms} below. It is easy to see that all of the relevant cases have a running time of less than $T_{\ms}(n-2)+T_{\ms^2}(n-3,6)+T_{\ms^2}(n-1,3)$. Since $T_{\ms^2}(n-3,\dots)<T_{\ms}(n-2)$ and $T_{\ms^2}(n-1,3)<T_{\ms}(n-2)$ by \autoref{tab:analysis-ms2} below, we know that this is surely less than $3T_{\ms}(n-2)$.
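
Written out, the last step is simply
$$T_{\ms}(n-2)+T_{\ms^2}(n-3,6)+T_{\ms^2}(n-1,3) < T_{\ms}(n-2)+T_{\ms}(n-2)+T_{\ms}(n-2) = 3T_{\ms}(n-2).$$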

\begin{thmproof}[ms1-le-ms]{lemma}{There exists a $c$ for which for all $n$ it holds that $T_{\ms^1}(n)\le c\cdot T_{\ms}(n-1)$.}
    This holds for every case in \autoref{tab:analysis-ms1} with $c=3$.
\end{thmproof}

\subsection{$\ms^2$: \autoref{alg:ms2}}
\label{sec:analysis:ms2}
This, too, is a simple case distinction. Let's again put the different cases in a table, along with their line numbers; see \autoref{tab:analysis-ms2}.

\begin{table}[h]
    \centering
    \renewcommand{\arraystretch}{1.3}
    \begin{tabular}{r | c | p{29mm} | p{19mm} | p{40mm}}
        Ln. & $|S|$ & Case & Complexity & Argumentation \\\hline\hline
        \ref{alg:ms2:case-s1} & $0,1$ & All
            & $\pazocal O(1)$
            & Constant return value\\\hline
        \ref{alg:ms2:case-s2-edge} & $2$ & $e(s_1,s_2)$
            & $\pazocal O(1)$
            & Constant return value\\\hline
        \ref{alg:ms2:case-s2-noedge} & $2$ & Otherwise
            & $T_{\ms}(n-2)$
            & At least two vertices removed\\\hline
        \ref{alg:ms2:case-s3-d0} & $3$ & $d(s_i)=0$
            & $T_{\ms^1}(n-1)$
            & Simple recurrence\\\hline
        \ref{alg:ms2:case-s3-cycle} & $3$ & $s_1,s_2,s_3$ form a three-cycle
            & $\pazocal O(1)$
            & Constant return value\\\hline
        \ref{alg:ms2:case-s3-path} & $3$ & $i\neq k \land e(s_i,s_j)\land e(s_i,s_k)$
            & $T_{\ms}(n-3)$
            & At least $s_1,s_2,s_3$ removed\\\hline
        \ref{alg:ms2:case-s3-edge} & $3$ & $e(s_i,s_j)$
            & $T_{\ms^1}(n-1)$
            & At least $s_k$ removed\\\hline
        \ref{alg:ms2:case-s3-intersection} & $3$ & $N(s_i)\cap N(s_j)\neq\emptyset$
            & $T_{\ms^2}(n-1,2)$
            & At least one element in the intersection which is removed\\\hline
        \ref{alg:ms2:case-s3-d1} & $3$ & $d(s_i)=1$
            & $T_{\ms^1}(n-2)$
            & Both $s_i$ and its neighbour removed\\\hline
        \ref{alg:ms2:case-s3-otherwise} & $3$ & Otherwise
            & $T_{\ms^1}(n-3) + T_{\ms^2}(n-7,4)$
            & Every $s_i$ has $d(s_i)\ge2$\\\hline
        \ref{alg:ms2:case-s4-dle3} & $4$ & $d(s_i)\le3$
            & $T_{\ms}(n)$; \newline
              $3T_{\ms}(n-2)$
            & Simple recurrence; see below\\\hline
        \ref{alg:ms2:case-s4-no-dle3} & $4$ & Otherwise
            & $T_{\ms}(n-5) + T_{\ms^2}(n-1,3)$
            & $d(s_1)\ge4$ so at least $5$ removed in the first case; second case is a simple recurrence\\\hline
        \ref{alg:ms2:case-otherwise} & $5+$ & All
            & $T_{\ms}(n)$
            & Simple recurrence
    \end{tabular}
    \caption{Worst case time complexity analysis of the different cases in \autoref{alg:ms2}}
    \label{tab:analysis-ms2}
\end{table}

The case in line \ref{alg:ms2:case-s4-dle3} is again problematic. However, as above for line \ref{alg:ms1:case-edge-d3} of \autoref{alg:ms1}, this reduces to $3T_{\ms}(n-2)$. 

Line \ref{alg:ms2:case-otherwise} is also problematic. However, in \autoref{alg:ms} we see that the only time $\ms^2$ is called with $|S|>4$ is in line \ref{alg:ms:case-d2-noedge}. In \autoref{tab:analysis-ms} below we see that in this particular case $3$ vertices have already been removed from the graph. We will use this below in the analysis of line \ref{alg:ms:case-d2-noedge} of \autoref{alg:ms} in \autoref{tab:analysis-ms}.

\begin{thmproof}[ms2-le-ms]{lemma}{There exists a $c$ for which for all graphs $G$ and sets $S$ with $|S|\le4$ it holds that $T_{\ms^2}(|G|,|S|)\le c\cdot T_{\ms}(|G|-2)$.}
    This holds for every case in \autoref{tab:analysis-ms2} with $c=3$. Cases with complexity $T_{\ms^1}(n-x)$ suffice by \autoref{lem:ms1-le-ms}.
\end{thmproof}

\subsection{$\ms$: \autoref{alg:ms}}
\label{sec:analysis:ms}
This, too, is a simple case distinction. Let's again put the different cases in a table, along with their line numbers; see \autoref{tab:analysis-ms}.

\begin{table}[h]
    \centering
    \renewcommand{\arraystretch}{1.3}
    \begin{tabular}{r | p{28mm} | p{20mm} | p{50mm}}
        Ln. & Case & Complexity & Argumentation \\\hline\hline
        \ref{alg:ms:case-component} & Multiple strongly connected components
            & $T_{\ms}(|C|) + T_{\ms}(n-|C|)$
            & $|C|$ is the size of some strongly connected component\\\hline
        \ref{alg:ms:case-g1} & $|G|\le1$
            & $\pazocal O(n)$
            & Counting the number of vertices\\\hline
        \ref{alg:ms:case-d1} & $d(v)=1$
            & $T_{\ms}(n-2)$
            & $v$ and its neighbour removed\\\hline
        \ref{alg:ms:case-d2-edge} & $d(v)=2$, $e(w,w')$
            & $T_{\ms}(n-3)$
            & $v$ and its two neighbours removed\\\hline
        \ref{alg:ms:case-d2-noedge} & $d(v)=2$, otherwise
            & $T_{\ms}(n-6) + T_{\ms^2}(n-3,6)$; \newline
              $T_{\ms}(n-6) + T_{\ms}(n-3)$
              & $d(w),d(w')\ge2$; $|N^2(v)|\le6$ for any degree-$2$ vertex in a graph with maximum degree $4$; as we saw in \autoref{sec:analysis:ms2}, this reduces to the second sum\\\hline
        \ref{alg:ms:case-d3} & $d(v)=3$
            & $T_{\ms}(n-4) + T_{\ms^2}(n-1,3)$; \newline
              $T_{\ms}(n-4) + T_{\ms}(n-3)$
            & Simple recurrences; second form by \autoref{lem:ms2-le-ms}\\\hline
        \ref{alg:ms:case-dominance} & $v>w$
            & $T_{\ms}(n-1)$; \newline
              $T_{\ms}(n-3) + T_{\ms}(n-5)$
            & Simple recurrence; see below\\\hline
        \ref{alg:ms:case-otherwise} & Otherwise
            & $T_{\ms}(n-1) + T_{\ms}(n-5)$
            & In this case $d(w)\ge d(v)=4$ (see below), so $|\iN(w)|=5$
    \end{tabular}
    \caption{Worst case time complexity analysis of the different cases in \autoref{alg:ms}}
    \label{tab:analysis-ms}
\end{table}

We saw already in \autoref{sec:algorithm:good-v:dominance} that after removing $w$ in the case of dominance of $v$ over $w$ on line \ref{alg:ms:case-dominance}, we will have a vertex with degree $3$. Therefore, the recurrence $T_{\ms}(n-1)$ in this case reduces to one of the above cases with $n:=n-1$. This is at most $T_{\ms}(n-3)+T_{\ms}(n-5)$.
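
For instance, bounding $T_{\ms}(n-1)$ via the $d(v)=3$ row of \autoref{tab:analysis-ms} with $n:=n-1$, and assuming (as elsewhere in this analysis) that $T_{\ms}$ is non-decreasing, gives
$$T_{\ms}(n-1) \le T_{\ms}\left((n-1)-4\right)+T_{\ms}\left((n-1)-3\right) = T_{\ms}(n-5)+T_{\ms}(n-4) \le T_{\ms}(n-3)+T_{\ms}(n-5).$$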

We can do something similar in the `Otherwise' case of line \ref{alg:ms:case-otherwise}. We can only be in this case if there is no vertex with degree less than $4$. Since every vertex has degree at most $4$, this means that every vertex has degree exactly $4$. In both subcases considered in line \ref{alg:ms:case-otherwise} of \autoref{alg:ms}, vertices are removed from the graph, leaving a graph with a vertex of degree at most $3$. Using that knowledge, we see from the other cases in \autoref{tab:analysis-ms} that the time complexity of the `Otherwise' case can be rewritten as $T_{\ms}(n-3)+T_{\ms}(n-5)+T_{\ms}(n-7)+T_{\ms}(n-9)$.

\begin{thmproof}[otherwise-half-of-cases]{lemma}{Of any $k$ successive applications of $\ms$, at most $\lceil\frac k2\rceil$ will use the inefficient `Otherwise' case.}
    After the `Otherwise' case has been applied, the resulting graph contains at least one vertex with degree lower than $4$, so in the application directly following it the `Otherwise' case does not apply.
\end{thmproof}

\begin{thmproof}[ms-complexity]{theorem}{\autoref{alg:ms} runs in $\pazocal O\left(\mathrm{polynomial}(n)\cdot2^{n/2}\right)$.}
    We see that the time complexity $T_{\ms}(n)$ depends on $T_{\ms}(n-c)$ with $c\ge2$. Furthermore, if $c=2$, we have $T_{\ms}(n)=T_{\ms}(n-2)$, so if $T_{\ms}(n)$ depends on $T_{\ms}(n-2)$, it depends on that \emph{alone}, giving a branching factor of $1$ in the search tree. Branching only occurs when $c\ge3$. Only in the `Otherwise' case is the branching factor $4$; in the other cases it is at most $2$.
    
    We then consider the search tree $T$ in which we compress chains of nodes with branching factor $1$ into a single node. By \autoref{lem:otherwise-half-of-cases}, at most one of every two consecutive levels can have a branching factor of $4$ (and in the worst case this will happen as often as possible). If we then combine the levels with branching factor $4$ with the levels directly below them (having branching factor $2$), we find a search tree $T'$ with, in the worst case, a constant branching factor of $8$.
    
    The original search tree $T$ has at most $\frac n3$ levels, since branching only occurs when $c\ge3$. Therefore, the compressed tree $T'$ has, in the worst case, at most $\frac n6$ levels (note that this case is worse than a tree with constant branching factor $2$ and $\frac n3$ levels, and also worse than any case in between).
    
    Since $T'$ has branching factor at most $8$ and at most $\frac n6$ levels, it has $\pazocal O\left(8^{n/6}\right)$ nodes. In every node of $T'$ we need only polynomial time, since a node represents at most $n$ applications of $\ms$ and we assumed that basic graph operations run in polynomial time. Therefore, \autoref{alg:ms} runs in $$\pazocal O\left(\mathrm{polynomial}(n)\cdot8^{n/6}\right) = \pazocal O\left(\mathrm{polynomial}(n)\cdot2^{n/2}\right),$$ since $8^{n/6}=2^{3n/6}=2^{n/2}$.
\end{thmproof}

This result is, naturally, consistent with Robson's result that the algorithm runs in $\pazocal O\left(\mathrm{polynomial}(n)\cdot2^{cn}\right)$ with $c<0.296$ \cite{robson}. We have proven that the bound holds with $c\le\frac12$.

\subsection{Space complexity}
\label{sec:analysis:space}
A graph is stored as a list of nodes, and for every edge each of its two endpoints holds one reference. Therefore, the space complexity of a single graph $G=(V,E)$ is $\pazocal O(|V|+|E|)$.

As discussed in \autoref{sec:implementation}, we use a remove-and-restore strategy. For every recurrence, no matter the branching factor, we do not need to create new objects: the whole algorithm runs on a single graph object. This comes at the cost of keeping a stack of restore information, consisting of lists of vertices that have been removed from the graph. Since at any moment a node is either in the graph \emph{or} on that stack (and then only once), the space complexity of the graph object is still $\pazocal O(|V|+|E|)$.
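
The sketch below illustrates the idea. It is \emph{not} the accompanying implementation (it uses a map-based representation rather than the node list described above, and all names are made up), but it shows how removing a vertex records exactly the references needed to undo the removal, so that every branch of the recursion can be explored and rolled back on the same object. Restores must happen in reverse order of removals, which is exactly what the recursion guarantees.

\begin{verbatim}
import java.util.*;

/** Illustrative remove-and-restore graph; not the report's implementation. */
class RestorableGraph {
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();
    // One stack entry per removed vertex: the vertex and its neighbours
    // at removal time. Restores must be done in LIFO order.
    private final Deque<Map.Entry<Integer, Set<Integer>>> removed = new ArrayDeque<>();

    void addVertex(int v) { adj.putIfAbsent(v, new HashSet<>()); }

    void addEdge(int v, int w) {
        addVertex(v); addVertex(w);
        adj.get(v).add(w); adj.get(w).add(v);
    }

    /** Remove v, remembering enough information to restore it later. */
    void removeVertex(int v) {
        Set<Integer> neighbours = adj.remove(v);
        for (int w : neighbours) adj.get(w).remove(v);
        removed.push(new AbstractMap.SimpleEntry<>(v, neighbours));
    }

    /** Undo the most recent removal. */
    void restoreLast() {
        Map.Entry<Integer, Set<Integer>> last = removed.pop();
        int v = last.getKey();
        adj.put(v, last.getValue());
        for (int w : last.getValue()) adj.get(w).add(v);
    }
}
\end{verbatim}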

There is one small exception to this: the \texttt{findStronglyConnectedComponent} method does create a new object. Space-wise, the worst case is then a graph consisting of $|V|$ strongly connected components. In that case we need $|V|$ graph objects, giving a quadratic worst-case space complexity of $\pazocal O\left(|V|\cdot(|V|+|E|)\right)$. This does not take the garbage collector into account: a closer look at the implementation of the \texttt{maximumIndependentSetSize} method shows that in fact at most two graph objects are needed at any point in the program. In any case, thanks to the efficiency gained by applying \autoref{lem:components}, the algorithm runs very fast on graphs with many strongly connected components, so we do not need to worry about the theoretically quadratic space complexity.