Introduction
Data mining is the process of discovering new knowledge in the form of patterns
and relationships in large data sets. It helps uncover knowledge in a data set that
was previously impossible to obtain with traditional methods. Modern data mining is
well equipped to discover useful knowledge or patterns from unstructured data such
as web traffic, emails and social media content. Data mining uses a range of
machine learning algorithms and modern statistical techniques to discover
knowledge from data sets.
This unit will introduce the theoretical foundation of data mining and a range of
data mining processes and techniques. The unit will also provide hands-on
experience in developing data mining applications using an appropriate
programming language or data mining tool.
Topics included in this unit are: data mining terminology; the scope of data mining,
such as classification, regression and clustering methods and techniques; association
pattern mining; mining time series data; and mining text data.
On successful completion of this unit, students will appreciate the theoretical and
technical concepts of data mining and its techniques and processes, and will have
gained hands-on experience in implementing data mining techniques using a programming
language such as Python or R, or a tool such as Weka, KNIME or Excel.
As a result, students will develop skills such as communication literacy, critical
thinking, analysis, reasoning and interpretation, which are crucial for gaining
employment and developing academic competence.
It is assumed that students will have some knowledge of data analytics and
machine learning, or will have completed Unit 12: Data Analytics and Unit 26:
Machine Learning.
Learning Outcomes
By the end of this unit students will be able to:
LO1. Discuss the historical and theoretical foundation of data mining, its scope,
techniques, and processes.
LO2. Investigate a range of data mining techniques to discover patterns and
relationships in large data sets.
LO3. Illustrate how a data mining algorithm performs text mining to identify
relationships within text.
LO4. Evaluate a range of graph data mining techniques that recognise patterns
and relationships in graph-based technologies.
LO1 Discuss the historical and theoretical foundation of data mining, its
scope, techniques, and processes
Clustering methods.
Topic Modelling.
Presentation methods of text (final outcome of the mining): charts, graphs,
word cloud and so forth.
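As an illustration of the presentation methods listed above, the following minimal
Python sketch counts term frequencies across a small set of documents. Such counts
are the kind of data that a bar chart or word cloud would typically display. The
function name term_frequencies, the stop-word list and the sample sentences are
assumptions made for this illustration only; they are not part of the unit content.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; real projects use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def term_frequencies(documents):
    """Count how often each non-stop-word term occurs across all documents."""
    counts = Counter()
    for text in documents:
        # Lower-case the text and keep runs of letters (and apostrophes) as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Hypothetical sample documents, invented for this sketch.
docs = [
    "Data mining discovers patterns in large data sets.",
    "Text mining is the mining of patterns in text data.",
]

freqs = term_frequencies(docs)

# A plain-text bar chart of the most frequent terms.
for term, count in freqs.most_common(3):
    print(f"{term:10s} {'#' * count}")
```

The same frequency table could be passed to a charting or word cloud tool, which would
scale each term by its count.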
Textbooks
Aggarwal, C. (2015) Data Mining: The Textbook. Springer.
Hofmann, M. and Chisholm, A. (2015) Text Mining and Visualization: Case Studies
Using Open-Source Tools. Chapman and Hall/CRC.
Russell, M. (2013) Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn,
Google+, GitHub, and More. 2nd Ed. O'Reilly Media.
Witten, I., Frank, E. and Hall, M. (2011) Data Mining: Practical Machine Learning
Tools and Techniques. 3rd Ed. Morgan Kaufmann.
Websites
archive.ics.uci.edu/ml    University of California, Irvine: “Machine Learning Repository” (Data sets)
www.lfd.uci.edu           University of California, Irvine – Laboratory for Fluorescence Dynamics: “Binaries for Python Extension Packages” (Development Tool)
cran.r-project.org        The R Project for Statistical Computing: “R Archive Network” (Development Tool)
www.cs.waikato.ac.nz      University of Waikato – Machine Learning Group: “Data Mining Software in Java” (Development Tool)
www.knime.org             Konstanz Information Miner: “KNIME” (Development Tool)
Links
This unit links to the following related units: