\documentclass[british]{scrartcl}

\usepackage[british]{babel}
\usepackage{csquotes}
\usepackage{enumerate}
\usepackage[hidelinks]{hyperref}
\usepackage{graphicx}
\usepackage{caption}
\usepackage{cleveref}
\usepackage{subfig}

\usepackage{adjustbox}
\usepackage{minted}

\newenvironment{longlisting}{\captionsetup{type=listing}}{}

\let\oldurl\url
\def\url#1{{\small\oldurl{#1}}}

\title{Testing Techniques}
\subtitle{Assignment 2 Automata Learning}
\author{Ren\'e den Hertog \and Camil Staps \and Erin van der Veen}
% Remove the date from '\maketitle'.
\makeatletter
	\patchcmd
		{\@maketitle}
		{{\usekomafont{date}{\@date\par}}\vskip\z@\@plus 1em}
		{}
		{}
		{}
\makeatother

\begin{document}

\maketitle

\section{Chocolate Bar Machine}
\begin{enumerate}
	\item
		L* builds a large hypothesis model from the start.
		This means that it tends to work better with inefficient (slow) equivalence oracles,
			since it needs fewer equivalence queries.
		Rivest-Schapire and TTT, on the other hand, build small intermediate hypotheses,
			which can perform better with fast equivalence oracles,
			because these algorithms spend less time on constructing a well-founded hypothesis.
		(A sketch of how the learning and testing methods are selected follows this list.)

	\item
		We used L* with the following counterexamples:

		\begin{itemize}
			\item 10ct 10ct 5ct snickers
			\item 10ct 10ct 10ct 10ct 10ct snickers snickers
			\item 10ct 10ct 10ct snickers 10ct twix
			\item 10ct 10ct 10ct 5ct snickers twix
			\item 10ct 10ct 5ct 10ct snickers mars
			\item 10ct 10ct 5ct 10ct 10ct mars mars snickers
		\end{itemize}

		The hypotheses are given in \cref{fig:lstar-run} (p.~\pageref{fig:lstar-run}).

	\item
		States are in principle uniquely identified by the amount of money inserted into the machine.
		The machine does not accept more than 45ct, and the granularity is 5ct.
		Hence, there are $\frac{45}{5}+1=10$ such amounts.
		However, there are two states in which 35ct has been inserted, namely one where we can still insert money
			(reached through \enquote{10ct 10ct 5ct 10ct})
			and one where we cannot (e.g.\ \enquote{10ct 10ct 10ct 5ct}).
		Hence, the complete model has 11 states.

		The learned model also contains 11 states; since the learned hypothesis is minimal and consistent with all observed behaviour, the two models must be equivalent.

	\item
		A counter-example $\pi$ consists of a prefix, an input where an inconsistency appears, and a suffix showing the inconsistency:
			$\pi = \sigma u \rho$.
		However, when we provide some sequence $\pi$, it may be possible to find distinct partitions
			$\pi = \sigma_1 u_1 \rho_1 \neq \sigma_2 u_2 \rho_2 = \pi$
			which both help to improve the hypothesis.
		If on the first run we add $\rho_1$ to the suffixes of our table,
			$\sigma_2 u_2 \rho_2$ may not be covered yet,
			and we may be able to add $\pi$ a second time.

	\item
		We use Rivest-Schapire.

	\item
		The models found are identical up to state names.
		This is because, as argued in the answer to question 3, the SUT has eleven states
			and all testing methods come up with a model of eleven states.

		With a random walk we needed only 56s;
			the W-method took 15154s and the Wp-method 5956s.

	\item
		The method with user queries worked very efficiently in combination with L*.
		It allows the user to insert cleverly chosen counter-examples,
			but does not query the user as often as other learning methods would.
		However, we realise that for more complex systems it may be difficult for the user to provide counter-examples.

		As for the testing method,
			on our small example a random walk appears to be by far the fastest.
		However, as noted in the source code (\texttt{BasicLearner.java}),
			it may perform worse than the others on large models.

\end{enumerate}
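
For reference, the sketch below shows how the learning and testing methods discussed above are selected.
The enum and method names follow the course-provided \texttt{BasicLearner.java};
	the exact signature of \texttt{runControlledExperiment} and the \texttt{ChocolateMachineSUL} wrapper are assumptions that depend on the concrete set-up.

\begin{minted}{java}
// Sketch of a learning run with the course-provided BasicLearner.
// ASSUMPTIONS: the exact runControlledExperiment signature and the
// ChocolateMachineSUL wrapper (hypothetical) depend on the local set-up.
import de.learnlib.api.SUL;

public class ChocolateMachineLearner {
    public static void main(String[] args) throws Exception {
        String[] alphabet = {"5ct", "10ct", "mars", "snickers", "twix"};
        SUL<String, String> sul = new ChocolateMachineSUL(); // hypothetical SUT wrapper

        BasicLearner.runControlledExperiment(
                sul,
                BasicLearner.LearningMethod.LStar,      // alternatives: RivestSchapire, TTT
                BasicLearner.TestingMethod.UserQueries, // alternatives: RandomWalk, WMethod, WpMethod
                alphabet);
    }
}
\end{minted}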

\begin{figure}[p]
	\includegraphics[width=.5\textwidth]{LStar_hypothesis0}
	\includegraphics[width=.5\textwidth]{LStar_hypothesis1}
	\includegraphics[width=.5\textwidth]{LStar_hypothesis2}
	\includegraphics[width=.5\textwidth]{LStar_hypothesis3}
	\includegraphics[width=.5\textwidth]{LStar_hypothesis4}
	\includegraphics[width=.5\textwidth]{LStar_hypothesis5}
	\caption{Learning the chocolate bar machine with L* (top to bottom, left to right). Continued on the next page.\label{fig:lstar-run}}
\end{figure}
\begin{figure}[p]
	\ContinuedFloat
	\includegraphics[width=\textwidth]{LStar_hypothesis6}
	\caption[]{Continued from the previous page: learning the chocolate bar machine with L*; the final learned model.}
\end{figure}
\thispagestyle{empty}

\section{Bounded Retransmission Protocol}

\subsection*{Q8}

When running the learning system (an ordinary laptop running a common Linux distribution) on the reference implementation with the default settings of the random-walk testing method used to find counter-examples, that is,
\begin{minted}{java}
    BasicLearner.randomWalk_chanceOfResetting = 0.1;
    BasicLearner.randomWalk_numberOfSymbols = 300;
\end{minted}
it runs for about 5 minutes, resulting in a final model \emph{without} the outputs \texttt{OCONF\_0} and \texttt{OCONF\_2}. \\
When inspecting the visual form of the final model, we see that most states transition to another state on only \emph{one} input, while ``looping'' on themselves on every \emph{other} input.
See \texttt{Bounded Retransmission Protocol Tester/Out Put/Only "Reference.jar" With Default (300 : 0.1) Random Walk/Final Model Of Reference.pdf}.
Such states behave almost like ``dead ends''; we call them `stubborn'.
Note that the resulting final model does differ from the real model, as some of the required outputs are missing.
However, it is reasonable to assume that the models of the systems under test also consist mostly of stubborn states. \\
In order to ``escape'' a stubborn state, the random walk must keep trying different inputs until the ``escaping'' input is chosen.
Hence, we need to \emph{lower} the chance that the random walk resets to the initial state. \\
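This can be made precise: assuming the walk decides independently before each symbol whether to reset (with probability $p$), the number of symbols between two resets is geometrically distributed, so roughly
\[
	\textit{expected number of symbols between resets} = \frac{1}{p} .
\]
Lowering $p$ from $0.1$ to $0.01$ thus raises the expected uninterrupted walk length from 10 to 100 symbols, giving a walk a much better chance of passing through several stubborn states in a row.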

When running the learning system on the reference implementation with the following settings,
\begin{minted}{java}
    BasicLearner.randomWalk_chanceOfResetting = 0.01;
    BasicLearner.randomWalk_numberOfSymbols = 300;
\end{minted}
it \emph{also} runs for around 5 minutes and \emph{again} results in a final model \emph{without} the outputs \texttt{OCONF\_0} and \texttt{OCONF\_2}. \\
It is likely that the learning system requires ``lengthier'' test runs in order to arrive at the real model.
Thus, we need to \emph{increase} the number of symbols the random walk tries. \\

When running the learning system on the reference implementation with the following settings,
\begin{minted}{java}
    BasicLearner.randomWalk_chanceOfResetting = 0.01;
    BasicLearner.randomWalk_numberOfSymbols = 1001;
\end{minted}
it runs for about \emph{30 minutes}; however, it results in a final model containing \emph{all} the required outputs (\texttt{OCONF\_0}, \texttt{OCONF\_1} and \texttt{OCONF\_2}).
Hence, we use these last settings for all further learning runs. \\

The backbone of the learning system is the Java program shown in Listing~\ref{L:BRPTM}.

\subsection*{Q9}

Only for the 5\textsuperscript{th} implementation did the learning system fail to produce a final model containing all the required outputs: \texttt{OCONF\_2} is missing. \\

One probable reason why this model lacks an output is simple: the implementation contains a fault which prevents it from ever outputting \texttt{OCONF\_2}. \\

Another likely reason is that a random walk does not guarantee to show that a given hypothesis model differs from the real model:
a counter-example may exist which this testing method never finds,
either because the randomness led the walk elsewhere,
or because the counter-example contains more inputs than the current settings allow the walk to ``visit''. \\
The solution to this problem is to make the random walk test the given hypothesis more strictly,
by lowering the chance of resetting and/or increasing the number of input symbols tried.
Even though the 5\textsuperscript{th} final model may differ from the real model, we did not continue testing this implementation:
the execution already took around 30 minutes, and it is still possible that a bug in the implementation prevents \texttt{OCONF\_2} from ever being output. \\
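The risk that the random walk misses a counter-example can be quantified roughly: assuming the walk resets with probability $p$ before each symbol and otherwise picks one of $k$ inputs uniformly at random, the probability that, from a given state, it executes one \emph{specific} input sequence of length $n$ is
\[
	\left( \frac{1-p}{k} \right)^{n} ,
\]
so counter-examples requiring long, specific input sequences are exponentially unlikely to be hit; lowering the reset chance and lengthening the walks only mitigates this gradually.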

All the other implementations (Reference, 1, 2, 3, 4 and 6) resulted in final models which contained the required outputs \texttt{OCONF\_0}, \texttt{OCONF\_1} and \texttt{OCONF\_2}.

\subsection*{Q10}

\begin{adjustbox}{center, width=\textwidth}
    \begin{tabular}{c|l}
        \textbf{Implementation} & \textbf{Equivalent To Reference?}                              \\
                                & (\textbf{Counter Example})                                     \\
        \hline
        \hline
        1                       & No                                                             \\
                                & \footnotesize{\verb|IREQ_0_0_0 ITIMEOUT|}                      \\
        \hline
        2                       & No                                                             \\
                                & \footnotesize{
            \verb|IREQ_0_0_0 ITIMEOUT ITIMEOUT ITIMEOUT ITIMEOUT ITIMEOUT|
                                  } \\
        \hline
        3                       & Yes                                                            \\
                                & -                                                              \\
        \hline
        4                       & No                                                             \\
                                & \footnotesize{\verb|IREQ_0_0_0 IACK IACK|}                     \\
        \hline
        5                       & No                                                             \\
                                & \footnotesize{
            \verb|IREQ_0_0_0 IACK IACK ITIMEOUT ITIMEOUT ITIMEOUT ITIMEOUT ITIMEOUT ITIMEOUT|
                                  } \\
        \hline
        6                       & No                                                             \\
                                & \footnotesize{\verb|IREQ_0_0_1 IACK IACK|}
    \end{tabular}
\end{adjustbox}
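
One way to check such equivalences automatically is AutomataLib's \texttt{Automata.findSeparatingWord}
(AutomataLib underlies LearnLib; package paths and the exact signature may differ between versions, so the following is a sketch rather than our exact tooling):

\begin{minted}{java}
// Sketch: decide equivalence of two learned Mealy machines with AutomataLib.
// NOTE: package paths follow recent AutomataLib releases and may differ per
// version; a null result means the two machines are equivalent.
import java.util.Collection;

import net.automatalib.automata.transducers.MealyMachine;
import net.automatalib.util.automata.Automata;
import net.automatalib.words.Word;

public final class ModelComparison {

    /**
     * Returns null if reference and candidate produce the same outputs on
     * every input word over the given alphabet; otherwise returns a
     * separating word, i.e. a counter-example like those in the table above.
     */
    public static <I, O> Word<I> compare(MealyMachine<?, I, ?, O> reference,
                                         MealyMachine<?, I, ?, O> candidate,
                                         Collection<I> alphabet) {
        return Automata.findSeparatingWord(reference, candidate, alphabet);
    }
}
\end{minted}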

\pagebreak

\appendix

\section*{Source: Bounded Retransmission Protocol}

\begin{longlisting}
    \inputminted[fontsize=\footnotesize]
                {java}
                {Bounded Retransmission Protocol Tester/Source/Main.java}
    \caption{
            \texttt{Bounded Retransmission Protocol Tester} Learning System's \texttt{Main.java}
    } \label{L:BRPTM}
\end{longlisting}

\end{document}