Diffstat (limited to 'Assignment 6/report')
-rw-r--r-- Assignment 6/report/assignment6.tex 132
-rw-r--r-- Assignment 6/report/ex611_1.png bin 0 -> 32622 bytes
-rw-r--r-- Assignment 6/report/ex611_2.png bin 0 -> 39543 bytes
-rw-r--r-- Assignment 6/report/ex611_3.png bin 0 -> 38378 bytes
-rw-r--r-- Assignment 6/report/ex611_4.png bin 0 -> 49015 bytes
-rw-r--r-- Assignment 6/report/ex612.png bin 0 -> 18607 bytes
-rw-r--r-- Assignment 6/report/ex613.png bin 0 -> 15942 bytes
-rw-r--r-- Assignment 6/report/ex621.png bin 0 -> 41650 bytes
-rw-r--r-- Assignment 6/report/ex622.png bin 0 -> 136692 bytes
-rw-r--r-- Assignment 6/report/ex623.png bin 0 -> 152707 bytes
-rw-r--r-- Assignment 6/report/ex624.png bin 0 -> 157007 bytes
11 files changed, 132 insertions, 0 deletions
diff --git a/Assignment 6/report/assignment6.tex b/Assignment 6/report/assignment6.tex
new file mode 100644
index 0000000..45f4be6
--- /dev/null
+++ b/Assignment 6/report/assignment6.tex
@@ -0,0 +1,132 @@
+\documentclass[10pt,a4paper]{article}
+
+\usepackage[margin=2cm]{geometry}
+\usepackage{graphicx}
+\usepackage{rotating}
+
+\newcommand{\assignment}{6}
+
+\usepackage{enumitem}
+\setenumerate[1]{label=\assignment.\arabic*.}
+\setenumerate[2]{label=\arabic*.}
+\setenumerate[3]{label=\roman*.}
+
+\usepackage{fancyhdr}
+\renewcommand{\headrulewidth}{0pt}
+\renewcommand{\footrulewidth}{0pt}
+\fancyhead{}
+%\fancyfoot[C]{Copyright {\textcopyright} 2015 Camil Staps}
+\pagestyle{fancy}
+
+\usepackage{caption}
+\usepackage{subcaption}
+\usepackage[hidelinks]{hyperref}
+
+\usepackage{listings}
+\lstset{basicstyle=\small\ttfamily,columns=flexible,breaklines=true}
+
+\usepackage{nicefrac}
+
+\parindent0pt
+
+\title{Data Mining - assignment \assignment}
+\author{Camil Staps\\\small{s4498062}}
+
+\begin{document}
+
+\maketitle
+\thispagestyle{fancy}
+
+\begin{enumerate}
+ \item \begin{enumerate}
+ \item See \autoref{fig:ex611}.
+
+ For \texttt{synth1} we may take any metric except correlation distance. This is because \texttt{correlation($(x,y),(x+k,y+k)$)} is very low (in fact zero whenever $x \neq y$) for any $k$, so the blue and red points cannot be distinguished. The other metrics work well because this is a relatively easy dataset. We need only one neighbour.
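+
+ To make the argument about correlation distance concrete, here is a short derivation. It assumes the usual definition of correlation distance, in which each point is centred on the mean of its own coordinates; the symbols $u$, $v$, $\bar{u}$, $\bar{v}$ and $d_{\mathrm{corr}}$ are notation introduced here for illustration only.
+ \[
+ d_{\mathrm{corr}}(u,v) = 1 - \frac{(u-\bar{u})\cdot(v-\bar{v})}{\|u-\bar{u}\|\,\|v-\bar{v}\|},
+ \qquad u=(x,y),\quad v=(x+k,y+k).
+ \]
+ Here $\bar{u}$ is the mean of the coordinates of $u$, subtracted from each coordinate. Centring gives $u-\bar{u} = v-\bar{v} = \frac{x-y}{2}\,(1,-1)$, so the fraction equals $1$ and $d_{\mathrm{corr}}(u,v)=0$ whenever $x\neq y$. In two dimensions every centred point is a multiple of $(1,-1)$, so under this definition the distance can only be $0$ or $2$, depending on whether $x-y$ has the same sign for both points.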
+
+ For \texttt{synth2} the Euclidean and Manhattan distances are comparable. The cosine metric performs very well, while correlation gives poor results, probably for the same reason as above. Under the cosine measure, points are close if they lie in the same direction from the origin, which is useful in this dataset. The Manhattan and Euclidean metrics don't take this into account. We need only one neighbour.
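+
+ The behaviour of the cosine measure can be made explicit. This is a sketch assuming the usual cosine distance; $u$, $v$, $\lambda$ and $d_{\cos}$ are notation chosen here.
+ \[
+ d_{\cos}(u,v) = 1 - \frac{u\cdot v}{\|u\|\,\|v\|},
+ \qquad
+ d_{\cos}(u,\lambda u) = 1 - \frac{\lambda\|u\|^2}{|\lambda|\,\|u\|^2} = 1 - \frac{\lambda}{|\lambda|}.
+ \]
+ For $\lambda>0$ the distance is $0$ regardless of how far apart the two points are, and for $\lambda<0$ it is $2$.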
+
+ In \texttt{synth3} we should actually first normalise the data. Then we could probably use Euclidean or Manhattan distance without problems. The correlation metric doesn't perform well, for reasons similar to those above. We should use only one neighbour, or the classification error will increase.
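+
+ One common way to normalise is to standardise each feature; the following is a sketch of that choice, not necessarily the normalisation intended by the assignment. Here $x_{ij}$ is feature $j$ of point $i$, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$ over the $n$ training points (all notation introduced here).
+ \[
+ z_{ij} = \frac{x_{ij}-\mu_j}{\sigma_j},
+ \qquad
+ \mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
+ \qquad
+ \sigma_j^2 = \frac{1}{n}\sum_{i=1}^{n} (x_{ij}-\mu_j)^2 .
+ \]
+ After this step every feature has mean $0$ and variance $1$ on the training data, so no single feature dominates the Euclidean or Manhattan distance merely because of its scale.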
+
+ In \texttt{synth4} all measures perform more or less equally well, although correlation does a little worse than the others for low $k$. We should use one of the others with just one neighbour.
+
+ As seen in \texttt{synth1}, with well-separated data one neighbour is enough. In \texttt{synth4} we see that in less well-separated datasets we may as well take more neighbours, without this hurting as much as it does in \texttt{synth2} and \texttt{synth3}. One can even imagine still less well-separated datasets where it is better to use more neighbours.
+
+ \setcounter{figure}{1}
+ \begin{figure}[b]
+ \centering
+ \begin{minipage}{.32\linewidth}
+ \centering
+ \includegraphics[width=\linewidth]{ex612}
+ \caption{}
+ \label{fig:ex612}
+ \end{minipage}
+ \begin{minipage}{.32\linewidth}
+ \centering
+ \includegraphics[width=\linewidth]{ex613}
+ \caption{}
+ \label{fig:ex613}
+ \end{minipage}
+ \begin{minipage}{.32\linewidth}
+ \centering
+ \includegraphics[width=\linewidth]{ex621}
+ \caption{}
+ \label{fig:ex621}
+ \end{minipage}
+ \end{figure}
+
+ \setcounter{figure}{0}
+ \begin{sidewaysfigure}[ht]
+ \centering
+ \includegraphics[width=\linewidth]{ex611_1}
+ \includegraphics[width=\linewidth]{ex611_2}
+ \includegraphics[width=\linewidth]{ex611_3}
+ \includegraphics[width=\linewidth]{ex611_4}
+ \caption{K-nearest neighbours classification on four synthetic datasets. Accuracy is blue, error rate is red.}
+ \label{fig:ex611}
+ \end{sidewaysfigure}
+
+ \setcounter{figure}{4}
+
+ \item See \autoref{fig:ex612}.
+
+ \item See \autoref{fig:ex613}. The error rate is lowest at $k=11$, so that would be the best choice.
+
+ \end{enumerate}
+
+ \item \begin{enumerate}
+ \item This is given: $X$ and $y$ are related by the XOR operation. See \autoref{fig:ex621} for the plot.
+
+ \item See \autoref{fig:ex622}. The network performs poorly because it is impossible to model XOR with just one hidden unit. As can be seen, a single hidden unit gives only one decision boundary, and XOR cannot be separated by a single boundary.
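+
+ Why one boundary is not enough can be shown directly. The sketch below assumes idealised binary inputs $x_1,x_2\in\{0,1\}$ and a single threshold unit that outputs $1$ exactly when $w_1x_1+w_2x_2+b>0$; the symbols $w_1$, $w_2$ and $b$ are notation introduced here, not values from the fitted network. Reproducing XOR would require
+ \[
+ \begin{array}{ll}
+ (0,0)\mapsto 0: & b \le 0, \\
+ (1,0)\mapsto 1: & w_1 + b > 0, \\
+ (0,1)\mapsto 1: & w_2 + b > 0, \\
+ (1,1)\mapsto 0: & w_1 + w_2 + b \le 0.
+ \end{array}
+ \]
+ Adding the second and third conditions gives $w_1+w_2+2b>0$, hence $w_1+w_2+b>-b\ge 0$, which contradicts the fourth. No choice of weights works, so a single linear boundary cannot represent XOR.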
+
+ \begin{figure}[p]
+ \centering
+ \includegraphics[width=.9\linewidth]{ex622}
+ \caption{Plots of all models fitted during our cross-validation run.}
+ \label{fig:ex622}
+ \end{figure}
+
+ \item See \autoref{fig:ex623}. The sixth, ninth and tenth models perform well, but the others don't. I'm not sure why; perhaps the stopping condition plays a role. On average, however, this network performs better than the one with just one hidden unit, because, as can be seen, it can now draw two boundaries.
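+
+ That two boundaries suffice can be illustrated with hand-picked weights. This is a sketch assuming binary inputs $x_1,x_2\in\{0,1\}$ and hard-threshold units, where $[\,\cdot\,]$ is $1$ if the condition holds and $0$ otherwise; the fitted network presumably uses smooth activations and different weights, and $h_1$, $h_2$ are names chosen here for the two hidden units.
+ \[
+ h_1 = [\,x_1+x_2 > \frac{1}{2}\,],\qquad h_2 = [\,x_1+x_2 > \frac{3}{2}\,],\qquad y = [\,h_1-h_2 > \frac{1}{2}\,].
+ \]
+ \[
+ \begin{array}{cc|cc|c}
+ x_1 & x_2 & h_1 & h_2 & y \\ \hline
+ 0 & 0 & 0 & 0 & 0 \\
+ 1 & 0 & 1 & 0 & 1 \\
+ 0 & 1 & 1 & 0 & 1 \\
+ 1 & 1 & 1 & 1 & 0
+ \end{array}
+ \]
+ The two hidden units correspond to the boundaries $x_1+x_2=\frac{1}{2}$ and $x_1+x_2=\frac{3}{2}$, and the output is $1$ only for points that lie between them.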
+
+ \begin{figure}[p]
+ \centering
+ \includegraphics[width=.9\linewidth]{ex623}
+ \caption{Plots of all models fitted during our cross-validation run.}
+ \label{fig:ex623}
+ \end{figure}
+
+ \item See \autoref{fig:ex624}. We see here that more models are well-fitted, which is a benefit. However, they are also getting more complex (the decision boundaries get jagged), which is a drawback. Furthermore, especially in this case, the data is not as complex as the model makes it out to be: the model is overfitting.
+
+ It's a pity that the models don't get much better; I'd like to know why.
+
+ \begin{figure}[p]
+ \centering
+ \includegraphics[width=.9\linewidth]{ex624}
+ \caption{Plots of all models fitted during our cross-validation run.}
+ \label{fig:ex624}
+ \end{figure}
+
+ \end{enumerate}
+\end{enumerate}
+
+\end{document}
+
diff --git a/Assignment 6/report/ex611_1.png b/Assignment 6/report/ex611_1.png
new file mode 100644
index 0000000..83b18ec
--- /dev/null
+++ b/Assignment 6/report/ex611_1.png
Binary files differ
diff --git a/Assignment 6/report/ex611_2.png b/Assignment 6/report/ex611_2.png
new file mode 100644
index 0000000..0bc275f
--- /dev/null
+++ b/Assignment 6/report/ex611_2.png
Binary files differ
diff --git a/Assignment 6/report/ex611_3.png b/Assignment 6/report/ex611_3.png
new file mode 100644
index 0000000..451b419
--- /dev/null
+++ b/Assignment 6/report/ex611_3.png
Binary files differ
diff --git a/Assignment 6/report/ex611_4.png b/Assignment 6/report/ex611_4.png
new file mode 100644
index 0000000..56508ca
--- /dev/null
+++ b/Assignment 6/report/ex611_4.png
Binary files differ
diff --git a/Assignment 6/report/ex612.png b/Assignment 6/report/ex612.png
new file mode 100644
index 0000000..ede35af
--- /dev/null
+++ b/Assignment 6/report/ex612.png
Binary files differ
diff --git a/Assignment 6/report/ex613.png b/Assignment 6/report/ex613.png
new file mode 100644
index 0000000..66899e1
--- /dev/null
+++ b/Assignment 6/report/ex613.png
Binary files differ
diff --git a/Assignment 6/report/ex621.png b/Assignment 6/report/ex621.png
new file mode 100644
index 0000000..d26dc4a
--- /dev/null
+++ b/Assignment 6/report/ex621.png
Binary files differ
diff --git a/Assignment 6/report/ex622.png b/Assignment 6/report/ex622.png
new file mode 100644
index 0000000..0b4d5ad
--- /dev/null
+++ b/Assignment 6/report/ex622.png
Binary files differ
diff --git a/Assignment 6/report/ex623.png b/Assignment 6/report/ex623.png
new file mode 100644
index 0000000..7b51dbf
--- /dev/null
+++ b/Assignment 6/report/ex623.png
Binary files differ
diff --git a/Assignment 6/report/ex624.png b/Assignment 6/report/ex624.png
new file mode 100644
index 0000000..d1c3880
--- /dev/null
+++ b/Assignment 6/report/ex624.png
Binary files differ