author    | Camil Staps | 2015-09-27 15:40:21 +0200
committer | Camil Staps | 2015-09-27 15:40:21 +0200
commit    | a600b6f4ef0fd0e92c51f2f3aa6bfe7a6ba2d436 (patch)
tree      | 414778fdd5699a2869aa667fda502ff3c9fa51b3
parent    | Finish assignment 1 (diff)
Added solutions LaTeX file
-rw-r--r-- | .gitignore                 | 28
-rw-r--r-- | Assignment 1/solutions.tex | 45
2 files changed, 73 insertions, 0 deletions
@@ -2,3 +2,31 @@
 *.spyderworkspace
 *.pyc
+*.aux
+*.glo
+*.idx
+*.fdb_latexmk
+*.fls
+*.log
+*.toc
+*.ist
+*.acn
+*.acr
+*.alg
+*.bbl
+*.blg
+*.dvi
+*.glg
+*.gls
+*.ilg
+*.ind
+*.lof
+*.lot
+*.maf
+*.mtc
+*.mtc1
+*.out
+*.pdf
+*.swp
+*.synctex.gz
+
diff --git a/Assignment 1/solutions.tex b/Assignment 1/solutions.tex
new file mode 100644
index 0000000..dd8f6f2
--- /dev/null
+++ b/Assignment 1/solutions.tex
@@ -0,0 +1,45 @@
+\documentclass[10pt,a4paper]{article}
+
+\usepackage[utf8]{inputenc}
+\usepackage[margin=2cm]{geometry}
+
+\usepackage{enumitem}
+\setenumerate[1]{label=1.\arabic*.}
+\setenumerate[2]{label=\arabic*.}
+\setenumerate[3]{label=\alph*.}
+
+\usepackage{amsmath}
+\usepackage{amsfonts}
+
+\parindent0pt
+
+\title{Data Mining - assignment 1}
+\author{Camil Staps\\\small{s4498062}}
+
+\begin{document}
+
+\maketitle
+
+\begin{enumerate}
+	\item See the code. For (e) we cannot compute $N$, because non-square matrices do not have eigenvalues.
+	\item \begin{enumerate}
+		\item See the code.
+		\item \begin{enumerate}
+			\item PCA is a method that can be used to reduce the dimensionality of a dataset. It can be used when some variables are correlated; we then essentially rewrite one of them as a function of the other(s). In general, of course, this implies some loss of information.
+			\item EVD is a way to rewrite a diagonalisable matrix in a canonical form (a sum of products of eigenvalues and their corresponding eigenvectors). SVD is a generalisation that can be applied to any matrix.
+
+			In SVD, we write $A = U\Sigma V^T$. The $u_i$ are eigenvectors of $AA^T$, the $v_i$ of $A^TA$. These can be computed using EVD.
+			\setcounter{enumiii}{4}
+			\item As the execution of the code shows, the second component mainly takes into account the last two attributes (G and H). Adherence to attribute A, B or C would give a large negative projection, while adherence to attribute G or H would give a large positive projection.
+			\end{enumerate}
+		\end{enumerate}
+
+	\item \begin{enumerate}
+		\item For any two similarity measures, the five least similar images are quite different. Based on the five most similar images, SMC and Jaccard produce similar results; Correlation and Cosine also produce somewhat similar results. Image 2 makes clear that SMC and ExtendedJaccard are sensitive to lighting conditions, and are thus perhaps not a very good choice for comparing faces. Correlation also seems a little sensitive to this. Lastly, Cosine seems to recognise faces somewhat better than Jaccard (take e.g. nr.\ 635).
+
+		\item See the execution of the code.
+	\end{enumerate}
+\end{enumerate}
+
+\end{document}
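The solutions above note that the $u_i$ and $v_i$ of the SVD are eigenvectors of $AA^T$ and $A^TA$, so they can be obtained via EVD. A minimal NumPy sketch (not part of this commit; the matrix here is made up) checking that claim numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # any matrix, square or not

# SVD: A = U @ diag(s) @ Vt, singular values in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# EVD of the symmetric matrix A^T A (eigh returns eigenvalues ascending)
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

# singular values are the square roots of the eigenvalues of A^T A
assert np.allclose(s**2, eigvals[::-1])

# each row of Vt matches the corresponding eigenvector up to sign
assert np.allclose(np.abs(np.sum(Vt * eigvecs[:, ::-1].T, axis=1)), 1.0)
```

The same check with $AA^T$ recovers the $u_i$; this is exactly why EVD cannot replace SVD for non-square $A$ directly, but can compute its factors.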
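Question 3 compares SMC, Jaccard, Cosine, and Correlation similarity. A sketch of these measures (hypothetical helper names; not the assignment's actual code) for two vectors, binary for the first two and real-valued for the last two:

```python
import numpy as np

def smc(x, y):
    # simple matching coefficient: fraction of positions that agree (binary data)
    return np.mean(x == y)

def jaccard(x, y):
    # like SMC, but 0-0 matches are ignored: agreements among positions
    # where at least one vector is 1
    both = np.sum((x == 1) & (y == 1))
    either = np.sum((x == 1) | (y == 1))
    return both / either

def cosine(x, y):
    # cosine of the angle between the vectors
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def correlation(x, y):
    # Pearson correlation: cosine similarity of the mean-centred vectors
    return cosine(x - x.mean(), y - y.mean())
```

That SMC counts 0-0 matches while Jaccard ignores them, and that correlation centres the vectors while cosine does not, is one reason the four measures rank the same images differently under changing lighting conditions.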