assignments/assignment1/assignment1.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410

\documentclass[british]{scrartcl}

\usepackage[british]{babel}
\usepackage{csquotes}
\usepackage{enumerate}
\usepackage[hidelinks]{hyperref}
\usepackage{minted}
\setminted{fontsize=\small,breaklines,breakanywhere,tabsize=4}
\usepackage{caption}
\newenvironment{longlisting}{\captionsetup{type=listing}}{}
\usepackage{skak}
\usepackage{tikz}
\usetikzlibrary{arrows, matrix, positioning}
\usepackage{cleveref}

\let\oldurl\url
\def\url#1{{\small\oldurl{#1}}}

\title{Testing the \texttt{chesshs} Haskell Package}
\subtitle{Test Plan, Manual Testing and Automated Testing}
\author{Ren\'e den Hertog \and Camil Staps \and Erin van der Veen}
% Remove the date from '\maketitle'.
\makeatletter
	\patchcmd
		{\@maketitle}
		{{\usekomafont{date}{\@date\par}}\vskip\z@\@plus 1em}
		{}
		{}
		{}
\makeatother

\begin{document}

\maketitle

\section{Test Plan}

\subsection*{System Under Test (SUT) Description}

The chosen SUT is the \texttt{chesshs} Haskell package.
\cite{SUT} It should be able to run in any software environment and hardware platform which supports Haskell.
The latest version of the SUT, since the 13\textsuperscript{th} of March 2015, is \texttt{0.2.1}.

\subsubsection*{Executing/Running the SUT}\label{S:ERSUT}

It is required to be able to compile Haskell,
	typically through the Glasgow Haskell Compiler (GHC),
	and having the \texttt{chesshs} package installed,
		normally through Cabal via the \mintinline{bash}{cabal install chesshs} command~\cite{GHC, C}.
On our systems,
	compiling and starting the SUT is done with the \mintinline{bash}{make} and \mintinline{bash}{./runchess} commands, respectively.

The SUT is used by providing it with standard input containing a game of Chess in Portable Game Notation~\cite{PGN}.
After computation,
	some general information about the Chess match is given on standard output,
	including the end result,
	followed by the state of the board after each move.
In case of a parse error or an invalid move, an error message is displayed.

As is common with all software, the SUT is likely to contain faults.
However, few risks are associated with the program, due to the very low impact of the program faulting.
As for our expectation, the most \enquote{risky} scenario for the SUT is at a notable Chess competition.
In the case the program does not behave as intended,
	it is almost certain that a human referee would take over its place.
Furthermore, we expect such a system to be used as a supporting tool for match officials
	and not as a replacement for regulation.

\subsection*{Test Goals}

\subsubsection*{Specification}

The specification of the SUT is for the most part similar to the input-output behaviour described above.
Because the SUT is strongly related to Chess,
	many finer details of its specification can be derived from documentation from the world of Chess.

\paragraph{F\'ed\'eration Internationale Des Echecs (FIDE)}
Due to the popularity of Chess, most people are familiar with how it is played.
Hence, a summary of the rules is omitted here.
However, due to the competitive nature of the game,
	the World Chess Federation describes the rules painfully precise~\cite{FIDE}.

\paragraph{Portable Game Notation (PGN)}
Many Chess matches are recorded in PGN~\cite{PGN}, an example of which is shown in \cref{L:PGNE}.

PGN consists of two parts: the tag pairs and the movetext.

The tag pairs encode properties of the game of Chess in question,
	such as the name of the tournament or match event,
	the player of the white pieces and the player of the black pieces,
	and the result of the game.
It is possible to add self defined tag pairs.
During our testing, we have added the name of the test with a \texttt{TestName} tag pair.
Tag pairs always follow the structure

\begin{minted}{text}
	[<Tag Name> "<Tag Value>"]
\end{minted}

where \texttt{'Tag Name'} holds the name of the tag
	(\texttt{Event}, \texttt{White}, \texttt{Black}, \texttt{Result}, ...)
	and \texttt{'Tag Value'} the value of the tag
	(\texttt{Third Rosenwald Trophy}, \texttt{Donald Byrne}, \texttt{Robert James Fischer}, \texttt{0-1}, ...).

The movetext encodes the actual game of Chess in question.
It mainly consists of the moves made during the match, which are encoded with

\begin{minted}{text}
	<Round Number>. <White's Move> <Black's Move>
\end{minted}

where each player's move is recorded in the Standard Algebraic Notation (SAN).

\paragraph{Standard Algebraic Notation (SAN)}

Most moves are recorded with the SAN, which adheres to the following structure~\cite{SAN}:

\begin{minted}{text}
	<Piece Identifier>[x]<End Position>
\end{minted}

The \texttt{Piece Identifier} denotes the piece being moved by a single capital letter:
	\texttt{B} (Bishop), \texttt{K} (King), \texttt{N} (Knight), \texttt{Q} (Queen) and \texttt{R} (Rook).
Pawns either do not have an abbreviation or are denoted with \texttt{P}.
In the case the abbreviation alone is not sufficient to identify the piece being moved,
	(partial) initial position information is added to resolve ambiguity,
	according to the notation explained in \cref{F:SN}.
The \texttt{Piece Identifier} is followed by \texttt{x} only if the move resulted in a capture.
The \texttt{End Position} records the final position of the moved piece using the notation explained in \cref{F:SN}.
Additional notation exists to describe more specialized aspects of Chess moves.
For example, \texttt{+} is appended to checking moves and \texttt{O-O} alone denotes king side castling moves.

\begin{figure}
	\centering
	\newgame
	\showboard
	\caption{Square Notation: Each square is noted by its file (column letter) and rank (row number).\label{F:SN}}
\end{figure}

\paragraph{Testing Scope}

The SUT is the \texttt{chesshs} library.
However, it is useful to have an actual program that can be run.
To this end, we have written a small wrapper program that acts as a command-line interface to the library.%
	\footnote{The source code of the wrapper will be elaborated on later.}
This program is kept as minimal as possible to contain (almost) no logic that might influence our tests.

The \texttt{chesshs} package contains several subsystems.
Specifically, we will test

\begin{enumerate}
	\item PGN parsing; \label{EI:PGNP}
	\item move legality verification and \label{EI:MLV}
	\item checkmate position determination. \label{EI:CPD}
\end{enumerate}

The PGN specification has effect on \ref{EI:PGNP};
	the FIDE Chess rules have an effect on \ref{EI:MLV} and \ref{EI:CPD}.
Some Chess rules not implemented in the \texttt{chesshs} library,
	like those concerning timing,
	can and will not be tested.

Our main focus will lie on the quality characteristic of functionality.
We would like to have a reasonable level of certainty that the SUT behaves as specified
	and correctly implements the rules of Chess.
Reliability is also relevant to the SUT,
	especially because it is a library that other systems depend on:
	we do not expect any crashes; exceptions should be handled using \texttt{Maybe}s and the like.
Usability does not apply in the same way to libraries as it does to end user programs.
The actual users are the programmers that use the library in their code.
Hence, usability mainly concerns whether the API has sensible, well-scoped functions.
We will be able to comment on the usability as we have written the small wrapper,
	but will not attempt to achieve an objective judgement in this aspect
	--- that would require finding a reasonably large set of programmers using \texttt{chesshs},
		and we suspect that may be hard.

\subsection*{Test Method}

Our tests are mainly at the unit level, and somewhat at the integration level.
The SUT only consists of two components: the wrapper and the \texttt{chesshs} package.
Both are \enquote{atomic} enough to be tested via unit testing.
The interaction between these two, mostly the usage of the library by the wrapper, will be tested via integration testing.
Testing at the module, system and acceptance level is not reasonable in the case of our SUT, as its scale and number of subsystems limits us to lower testing levels.

\subsubsection*{Test Generation Techniques}

The majority of our test generation techniques are based upon or similar to black box methods.

\paragraph{Black Box Techniques}

The focus of the black box testing is on the SUT in its entirety, that is, the \texttt{chesshs} package surrounded by the wrapper.
The black box tester will try to find faults in the input-output behaviour of the program.
In this type, test generation is based on error guessing and an examples database.

Error guessing is based mostly on our collective experience of developing and testing software in the past.
But also somewhat on our experience with the game of Chess and using other similar SUTs.

The examples database consists of a collection of text files describing a game of Chess in (possibly illegal) PGN (the \enquote{\texttt{.in}} files) and the expected output of the SUT (the \enquote{\texttt{.out}} files).
To create these examples, we look at the FIDE rules in~\cite{FIDE} and the PGN specification~\cite{PGN}
	for border cases, where we expect the library may deviate from the standard.

Sadly, equivalence partitioning and boundary value analysis are hard to apply to our SUT,
	given that the collection of all possible Chess games is nearly impossible to categorise.
Use case testing is also tricky to apply.
The wrapper is intentionally very limited in its interaction and is designed purely to make the \texttt{chesshs} package \enquote{usable} for testing.
Since the core of our SUT is a library, no end user will likely interact with it directly.
In summary, it does not feel reasonable to apply use case testing to the SUT.

\paragraph{White Box Techniques}

White box and grey box testing would look somewhat similar when testing the SUT, due to the fact that the system is a package.
Grey box testing would only look at the components available to other programs, that is, the exported elements.
For instance, we might look at the types of functions, but not their implementation.
White box testing would also look at every component of the library including their implementation.
Grey box testing would consider every component as a black box:
	it would only focus on testing if each component adheres to its related specification.
White box testing would consider every component by its literal code.
In conclusion, white box testing would focus on the actual implementation of the library's functionality
	and grey box testing would only focus on the library's exported component's functionality without considering any actual code.

Sadly, path, branch and condition coverage are not included in the tools we will be using during the testing.
Hence, it is substantially harder to implement these test generation techniques and will not be applied during our testing.

\subsection*{Test Environment}

Since the SUT is in actuality the \texttt{chesshs} package, a user interface is clearly lacking. Thats why we have written a wrapper which makes the library interactive. See \cref{L:WI} for the full implementation.

Black box tests will be automatically performed using shell scripting.
In our case, we will be using bash and Linux compatible scripts.
Test automation for white box testing is achieved by functionalities available in the to be used testing tool QuickCheck~\cite{QC}.

The software environments we will be using for testing our SUT are the operating system Linux, specifically Debian and Ubuntu, and the Haskell Platform \texttt{2014.2.0.0}, which includes the Glasgow Haskell Compiler (GHC) \texttt{7.10.3}, Cabal \texttt{1.22.6.0} (\texttt{1.22.5.0} for the library) and QuickCheck \texttt{2.8.1}~\cite{HP, GHC, C, QC}.
The hardware platforms are three differently powerful laptops, notably Acer and Toshiba.

The testing architecture is best described by the graphic shown in \cref{F:TAGO}.

\begin{figure}
	\centering
	\begin{tikzpicture}[
			every node/.style={draw,rectangle},
			title/.style={draw=none},
			->,>=open triangle 60]
		\node (bbtest) at (4,0)  {Black Box Tester (\texttt{test.sh})};
		\node[matrix,row sep=.5em] (prog) at (0,0) {
			\node[title] {Wrapper};\\
			\node (chesshs) {\texttt{chesshs}};\\
		};
		\node[left=2em of chesshs] (wbtest) {White Box Tester (\texttt{test.hs})};
		\node[below of=wbtest] (quickcheck) {QuickCheck};
		\draw (bbtest) -- (prog);
		\draw (wbtest) -- (chesshs);
		\draw (quickcheck) -- (wbtest);
	\end{tikzpicture}
	\caption{Testing Architecture Graphical Overview} \label{F:TAGO}
\end{figure}

No stubs nor drivers are necessary for executing the testing architecture.

Most, if not all, documentation surrounding the tests, test results and test environment will be included in this document. It is possible that some tests and test results will contain information not stated here, such as the name of a test case and the possible flaw that it (tries to) detect(s). However, any external information will easily be found in the appropriate files.

The tests will either be stored in the examples database (directory) or will be part of the QuickCheck unit test module(s) (source file). The test results will be stored, if necessary, in the same files as their related test or in another easy to locate file. The test environment will be stored, as every other testing component, in a Git repository found at \url{https://gitlab.science.ru.nl/eveen/Testing-Techniques.git}.

Since Haskell is a pure functional language and we have fixed version numbers,
	the tests are by construction reproducible.

\section{Manual Testing}

Our tests are stored in the form of input files that are fed to the wrapper.%
	\footnote{These are just PGN files, an example of which was already shown in \cref{L:PGNE}.}
In turn, the wrapper uses the \texttt{chesshs} API to construct the position sequence.
This position sequence is then compared to a user defined expected output, an example of which is shown in \cref{L:OUTE}.
The first two lines contain metadata, after which a list of board configurations is printed,
	either until the match is finished or an invalid move is detected.

Because we do not test an interactive program,
	our manual tests are in a sense already automated.
If the wrapper were extended to support interactivity,
	manual testing would have a more prominent place.

\subsection*{Database}
We have created 11 test cases for both valid and invalid games.
As described above, the invalid games are derived from the FIDE rules
	in that they attempt to find borderline cases where the library may deviate from those rules~\cite{FIDERULES}.
We also looked at the PGN specification~\cite{PGN} for these borderline cases.
The concrete tests we have used are shown in \cref{T:T}.
Although it is impossible to calculate the coverage of these tests,
	we believe a fair part of the rules is covered.
By also including some valid games,
	we also make sure that the library does not disallow valid moves.

\begin{table}
	\centering
	\begin{tabular}{r l l}
		Test & Description & Expected result\\\hline
		 0 & A normal legal game & Accepted\\
		 1 & Double castling on kingside & \texttt{InvalidMove}\\
		 2 & White moving a black piece & \texttt{InvalidMove}\\
		 3 & Castling on both queen and kingside & \texttt{InvalidMove}\\
		 4 & Taking \emph{en passent} on second move & \texttt{InvalidMove}\\
		 5 & Illegal mate annotation & \texttt{InvalidMove}\\
		 6 & Castling through check & \texttt{InvalidMove}\\
		 7 & Castling in check & \texttt{InvalidMove}\\
		 8 & Mixed PGN comment notations & Accepted\\
		 9 & Continuing after insufficient material draw & \texttt{InvalidMove}\\
		10 & Two games in a single file & The outputs for both games\\
		11 & Illegal move by Bishop & \texttt{InvalidMove}\\
		12 & Illegal move by Knight & \texttt{InvalidMove}\\
		13 & Illegal move by Queen & \texttt{InvalidMove}\\
		14 & Double jump forward & \texttt{InvalidMove}\\
	\end{tabular}
	\caption{Descriptions of the manual tests.\label{T:T}}
\end{table}

\section{Automated Testing}

To automate our manual tests, we use a small bash script (\cref{L:ATS}).
This script runs all tests and compares the output that is produced by the wrapper with a user defined expected output.
If there is no difference between the two, the test is considered \enquote{passed}.
If there is a difference, the test is considered \enquote{failed} and the script shows the difference.

\section{Results}

The output of the test run on the tests in \cref{T:T} is given in \cref{L:TR}.
We were able to find several failures:

\begin{description}
	\item[White moving black piece]
		This test had the white player moving a black piece.
		We would expect that this causes an \texttt{InvalidMove}.
		Instead, \texttt{chesshs} gives a \texttt{NoParse} error.
		That is odd, because the PGN file is valid according to the specification~\cite{PGN}.

	\item[Taking En Passent on second move]
		This test has the white player taking a black piece \emph{en passent}.%
			\footnote{In this move, a pawn that moved two ranks forward is captured by an enemy pawn on the field one rank before the destination field.}
		This special pawn capture is only allowed in the move directly after the target pawn moved.
		The test attempts to capture the pawn one round later.
		Like the test above, we would expect an \texttt{InvalidMove} error here,
			but \texttt{chesshs} gives a \texttt{NoParse} error.%
			\footnote{%
				This is not a big issue.
				Most important is that the move was rejected.
				However, the \texttt{NoParse} error is not a very clear name for what it is used for.}

	\item[Illegal moves]
		Likewise, the library produces \texttt{NoParse} when a move is attempted to an unreachable field,
			whereas an \texttt{InvalidMove} error would be more appropriate.
		Also the \textbf{\textsf{double jump forward}} test,
			which has a pawn moving two ranks forward twice,
			suffers from this issue.

	\item[Mate when impossible]
		This test includes the \texttt{\#} annotation in the PGN file to annotate a checkmate.
		However, it includes it in the wrong position (after a move that does not produce checkmate).
		The library ignores the \texttt{\#} annotation,
			while it is not a valid move.

	\item[Castling in check]
		In this test, the white player attempts to get out of a check configuration by castling,
			which is not allowed~\cite[3.8.2.2.2]{FIDERULES}.
		The library does not take this into account, and allows the castling.

	\item[Insufficient material draw]
		This test contains a game that ends with so few pieces that a checkmate position is impossible
			(one bishop, besides the kings).
		In such a situation, the game is drawn~\cite[1.5]{FIDERULES}.
		However, \texttt{chesshs} allows further moves.
\end{description}

\clearpage
\begin{thebibliography}{8}
	\bibitem[C]    {C}    \url{http://haskell.org/cabal}
	\bibitem[FIDE] {FIDE} \url{http://fide.com/fide/handbook.html}
	\bibitem[FIDERULES] {FIDERULES} \url{http://www.fide.com/fide/handbook.html?id=207&view=article}
	\bibitem[GHC]  {GHC}  \url{http://haskell.org/ghc}
	\bibitem[HP]   {HP}   \url{http://haskell.org/platform}
	\bibitem[PGN]  {PGN}  \url{http://saremba.de/chessgml/standards/pgn/pgn-complete.htm}
	\bibitem[QC]   {QC}   \url{http://hackage.haskell.org/package/QuickCheck}
	\bibitem[SAN]  {SAN}  \url{http://cfajohnson.com/chess/SAN}
	\bibitem[SUT]  {SUT}  \url{http://hackage.haskell.org/package/chesshs}
\end{thebibliography}

\appendix
\section{Listings}

\begin{longlisting}
	\inputminted{text}{../test/database/m0.in}
	\caption{An example of Portable Game Notation (PGN).\label{L:PGNE}}
\end{longlisting}

\begin{longlisting}
	\inputminted{haskell}{../src/runchess.hs}
	\caption{Wrapper Implementation} \label{L:WI}
\end{longlisting}

\begin{longlisting}
	\inputminted{text}{../test/database/m2.out}
	\caption{An example output file.\label{L:OUTE}}
\end{longlisting}

\begin{longlisting}
	\inputminted{bash}{../test/test.sh}
	\caption{Automated Test script\label{L:ATS}}
\end{longlisting}

\begin{longlisting}
	\inputminted{text}{assignment1-test-run.txt}
	\caption{Results of the test run.\label{L:TR}}
\end{longlisting}

\end{document}