author: Camil Staps, 2015-12-12 16:47:44 +0000
committer: Camil Staps, 2015-12-12 16:47:44 +0000
commit: c6f86bdb722aac53bb39b0d78d2b538b6f07a692
tree: a13212987d15f369b0a448df87b49bcca4cc7f51 /Assignment 5/report
parent: Finish assignment 4
Assignment 5
Diffstat (limited to 'Assignment 5/report')
-rw-r--r-- Assignment 5/report/assignment5.tex | 156
1 files changed, 156 insertions, 0 deletions
diff --git a/Assignment 5/report/assignment5.tex b/Assignment 5/report/assignment5.tex
new file mode 100644
index 0000000..47ff27e
--- /dev/null
+++ b/Assignment 5/report/assignment5.tex
@@ -0,0 +1,156 @@
+\documentclass[10pt,a4paper]{article}
+
+\usepackage[margin=2cm]{geometry}
+\usepackage{graphicx}
+
+\newcommand{\assignment}{5}
+
+\usepackage{enumitem}
+\setenumerate[1]{label=\assignment.\arabic*.}
+\setenumerate[2]{label=\arabic*.}
+\setenumerate[3]{label=\roman*.}
+
+\usepackage{fancyhdr}
+\renewcommand{\headrulewidth}{0pt}
+\renewcommand{\footrulewidth}{0pt}
+\fancyhead{}
+%\fancyfoot[C]{Copyright {\textcopyright} 2015 Camil Staps}
+\pagestyle{fancy}
+
+\usepackage{caption}
+\usepackage{subcaption}
+\usepackage[hidelinks]{hyperref}
+
+\usepackage{listings}
+\lstset{basicstyle=\small\ttfamily,columns=flexible,breaklines=true}
+
+\usepackage{nicefrac}
+
+\parindent0pt
+
+\title{Data Mining - assignment \assignment}
+\author{Camil Staps\\\small{s4498062}}
+
+\begin{document}
+
+\maketitle
+\thispagestyle{fancy}
+
+\begin{enumerate}
+ \item This is the output:
+
+ \begin{lstlisting}[gobble=12]
+ Mining for frequent itemsets by the Apriori algorithm
+ Mining for associations by the Apriori algorithm
+ Apriori analysis done, extracting results
+
+
+ RESULTS:
+
+ Frequent itemsets:
+ Item: 6[Sup. 100]
+ Item: 2[Sup. 83.3]
+ Item: 2 6[Sup. 83.3]
+ Item: 7[Sup. 83.3]
+ Item: 7 6[Sup. 83.3]
+
+
+ Association rules:
+ Rule: 6 <- [Conf. 100,Sup. 100]
+ Rule: 6 <- 2[Conf. 100,Sup. 83.3]
+ Rule: 6 <- 7[Conf. 100,Sup. 83.3]
+ \end{lstlisting}
+
+ Rule 1 simply says that everyone studies physics. Either physics is very popular, or the data was selected on this property (e.g.\ we may only be looking at physics students).
+
+ Rules 2 and 3 say that taking math or chemistry is a good indicator for taking physics. However, since everyone in this dataset takes physics, any rule with physics as its consequent has a confidence of 100\%, so these rules are not particularly interesting.
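+
+ As a sanity check, the confidence of rule 2 can be recomputed from the supports in the listing above. This is only a sketch: it assumes the usual definition of confidence, $c(X \rightarrow Y) = \sigma(X \cup Y) / \sigma(X)$, and that \texttt{6 <- 2} denotes the rule $\{2\} \rightarrow \{6\}$, neither of which is stated explicitly in the output:
+ $$c(\{2\} \rightarrow \{6\}) = \frac{\sigma(\{2,6\})}{\sigma(\{2\})} = \frac{83.3\%}{83.3\%} = 100\%.$$
+ The same computation with item 7 in place of item 2 gives the confidence of rule 3.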
+
+ \item \begin{enumerate}[start=2]
+ \item Here are some rules with high confidence:
+
+ \begin{lstlisting}[gobble=20]
+ [Conf. 99.7,Sup. 31.2]:
+ Star Wars (1977) <- Empire Strikes Back, The (1980), Raiders of the Lost Ark (1981), Return of the Jedi (1983)
+ [Conf. 91.8,Sup. 32.2]:
+ Star Wars (1977) <- Indiana Jones and the Last Crusade (1989)
+ [Conf. 86.6,Sup. 33.5]:
+ Star Wars (1977) <- Star Trek: First Contact (1996)
+ [Conf. 85.1,Sup. 30.3]:
+ Raiders of the Lost Ark (1981) <- Fugitive, The (1993)
+ [Conf. 81.6,Sup. 33.9]:
+ Fargo (1996) <- Twelve Monkeys (1995)
+ \end{lstlisting}
+
+ I haven't seen any of these movies, so I can't judge the rules from experience. Some research tells me that several of them (such as most of the movies in the first rule) belong to the Star Wars series, so it isn't surprising that they are associated.
+
+ Some more research tells me there are connections between Star Wars and Indiana Jones (both were created by George Lucas and star Harrison Ford), so the second rule makes sense as well.
+
+ Wikipedia\footnote{\url{https://en.wikipedia.org/wiki/Comparison_of_Star_Trek_and_Star_Wars}} says there are similarities between Star Wars and Star Trek, and that the two have influenced each other. That would explain the third rule.
+
+ Both Raiders of the Lost Ark and The Fugitive are in the categories Action and Adventure according to the IMDb. Harrison Ford was the lead in both movies.
+
+ Fargo and Twelve Monkeys are both marked as thrillers in the IMDb. Apart from that and their release dates, I can't find anything the two have in common at first glance.
+
+ \item The movies bought by most users are:
+
+ \begin{lstlisting}[gobble=20]
+ Star Wars (1977) (support: 61.8)
+ Contact (1997) (support: 54.0)
+ Fargo (1996) (support: 53.9)
+ Return of the Jedi (1983) (support: 53.8)
+ \end{lstlisting}
+
+ There are few rules with more than three items, because only one itemset with more than three items (\texttt{[172, 174, 181, 50]}) has a support of at least 30\%. Apparently, no other set of four or more movies has been bought together by at least 30\% of the users. Lowering the minimum support would give more rules with more than three items.
+
+ \item Yes, it is very well possible to have rules with low support but high confidence. Consider a dataset of, say, 1000 records in which every record contains the item \texttt{1}, and only two records contain two further items, \texttt{2} and \texttt{3}. Then the support of the rules \texttt{2 <- 3} and \texttt{3 <- 2} is only $0.2\%$, while their confidence is $100\%$.
+ \end{enumerate}
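+
+ The example in the last item can be checked directly. As a sketch, assume the usual definitions $s(X \rightarrow Y) = \sigma(X \cup Y)/N$ and $c(X \rightarrow Y) = \sigma(X \cup Y)/\sigma(X)$, where $\sigma$ counts the records that contain an itemset and $N = 1000$ is the number of records:
+ $$s(\{3\} \rightarrow \{2\}) = \frac{\sigma(\{2,3\})}{N} = \frac{2}{1000} = 0.2\%, \qquad c(\{3\} \rightarrow \{2\}) = \frac{\sigma(\{2,3\})}{\sigma(\{3\})} = \frac{2}{2} = 100\%.$$
+ By symmetry the rule in the other direction has the same support and confidence.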
+
+ \item \begin{enumerate}
+ \item \begin{enumerate}
+ \item $\frac{\sigma(a,b)}{\sigma(a)} = \frac{0.3}{0.45} \approx 0.667 \geq 0.6$, so by its confidence alone the rule $a \rightarrow b$ is interesting.
+ \item $\frac{\sigma(a,b)}{\sigma(a)\sigma(b)} = \frac{0.3}{0.45\cdot0.8} \approx 0.833$. Since this is below one, the items are negatively correlated.
+ \item The rule is misleading: its confidence meets the threshold, but the interest shows that $a$ and $b$ occur together less often than they would if they were independent.
+ \end{enumerate}
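+
+ To see why an interest below one means negative correlation, compare the observed joint support with the joint support that would be expected if $a$ and $b$ were independent (a sketch, reading $\sigma$ as a fraction of the transactions, as in the interest computation above):
+ $$\sigma(a)\,\sigma(b) = 0.45 \cdot 0.8 = 0.36 > 0.30 = \sigma(a,b).$$
+ The items occur together less often than independence would predict, which is exactly what the ratio $\nicefrac{0.30}{0.36} \approx 0.833$ expresses.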
+
+ \item \begin{enumerate}
+ \item Table 2:
+ $$\frac{\nicefrac{105}{294}\cdot\nicefrac{62}{294}}{\nicefrac{87}{294}\cdot\nicefrac{40}{294}} \approx 1.871.$$
+
+ Table 3, students:
+ $$\frac{\nicefrac{2}{36}\cdot\nicefrac{20}{36}}{\nicefrac{5}{36}\cdot\nicefrac{9}{36}} \approx 0.889.$$
+
+ Table 3, adults:
+ $$\frac{\nicefrac{103}{258}\cdot\nicefrac{42}{258}}{\nicefrac{35}{258}\cdot\nicefrac{78}{258}} \approx 1.585.$$
+
+ The positive association is stronger when the data is pooled, since $1.871 > 0.889$ and $1.871 > 1.585$.
+
+ \item Table 2:
+
+ $$\frac{\nicefrac{105}{294}-\nicefrac{192}{294}\cdot\nicefrac{145}{294}}{\sqrt{\nicefrac{192}{294}\cdot\nicefrac{145}{294}\cdot\left(1-\nicefrac{192}{294}\right)\left(1-\nicefrac{145}{294}\right)}} \approx 0.147.$$
+
+ Table 3, students:
+ $$\frac{\nicefrac{2}{36}-\nicefrac{11}{36}\cdot\nicefrac{7}{36}}{\sqrt{\nicefrac{11}{36}\cdot\nicefrac{7}{36}\cdot\left(1-\nicefrac{11}{36}\right)\left(1-\nicefrac{7}{36}\right)}} \approx -0.021.$$
+
+ Table 3, adults:
+ $$\frac{\nicefrac{103}{258}-\nicefrac{181}{258}\cdot\nicefrac{138}{258}}{\sqrt{\nicefrac{181}{258}\cdot\nicefrac{138}{258}\cdot\left(1-\nicefrac{181}{258}\right)\left(1-\nicefrac{138}{258}\right)}} \approx 0.105.$$
+
+ This association is also stronger when the data is pooled, since $0.147 > -0.021$ and $0.147 > 0.105$.
+
+ \item Table 2:
+
+ $$\frac{\nicefrac{105}{294}}{\nicefrac{145}{294}\cdot\nicefrac{192}{294}} \approx 1.109.$$
+
+ Table 3, students:
+ $$\frac{\nicefrac{2}{36}}{\nicefrac{7}{36}\cdot\nicefrac{11}{36}} \approx 0.935.$$
+
+ Table 3, adults:
+ $$\frac{\nicefrac{103}{258}}{\nicefrac{138}{258}\cdot\nicefrac{181}{258}} \approx 1.064.$$
+
+ This association is also stronger when the data is pooled, since $1.109 > 0.935$ and $1.109 > 1.064$.
+ \end{enumerate}
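+
+ All three measures can also be written in terms of the cell counts of the $2 \times 2$ contingency table, which shows why they always agree in sign. The notation is an assumption introduced only for this sketch: $f_{11}$, $f_{10}$, $f_{01}$ and $f_{00}$ are the four cells, $f_{1+}$ and $f_{0+}$ the row totals, $f_{+1}$ and $f_{+0}$ the column totals, and $N$ the grand total. With $\alpha$ the odds ratio, $\phi$ the $\phi$-coefficient and $I$ the interest factor:
+ $$\alpha = \frac{f_{11}f_{00}}{f_{10}f_{01}}, \qquad \phi = \frac{f_{11}f_{00} - f_{10}f_{01}}{\sqrt{f_{1+}f_{+1}f_{0+}f_{+0}}}, \qquad I = \frac{N f_{11}}{f_{1+}f_{+1}}.$$
+ Since $N f_{11} - f_{1+}f_{+1} = f_{11}f_{00} - f_{10}f_{01}$, both $\alpha$ and $I$ exceed one exactly when $\phi$ is positive. For the students in Table 3, for example:
+ $$f_{11}f_{00} - f_{10}f_{01} = 2 \cdot 20 - 9 \cdot 5 = -5 < 0, \qquad \phi = \frac{-5}{\sqrt{11 \cdot 7 \cdot 25 \cdot 29}} \approx -0.021.$$
+ This matches the values above: the students are the only group for which the odds ratio and the interest factor are below one and the $\phi$-coefficient is negative.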
+ \end{enumerate}
+
+\end{enumerate}
+
+\end{document}
+