Data Mining with Weka

University of Waikato

WEEK 1
What’s data mining? What’s Weka?

Hi! Welcome to the course Data Mining with Weka. I’m Ian Witten from the
University of Waikato in New Zealand, and I’m presenting the videos for this course,
which is being prepared by the Department of Computer Science at the University
of Waikato.

Data mining is a mature technology that a lot of people are beginning to take very
seriously, and a lot of other people find it very mysterious. The real aim of this
course is to take the mystery out of data mining. This is a practical course on how
to use the Weka workbench, which you will download as part of the course, for
data mining. We explain the basic principles of several popular data mining
algorithms and how to use them in practical applications.

In the world today, we’re overwhelmed with data. Every time you swipe your credit
card, every item you check out at the supermarket, every time you send a text,
make a phone call, or send an email, or type a key on a computer, even every time
you walk past a security camera – it all generates a little bit of data in a database.
Data mining is about going from the raw data to information, information that can
be used to make predictions, predictions that are useful in the real world.

Let me give you an example. You’re at the supermarket checkout. The till records
every item you bought. At the end, you hand over your loyalty card, and they give
you a couple of percent off, and you give them your name and address, and,
indirectly, access to all sorts of demographic information about you and people like
you.

Everybody likes a good bargain. It’s been a good day today, because, thanks to
those coupons they sent you in the mail last week, you’ve been able to stock up on
some things you wouldn’t normally have bought, but you bought today because
they’re such a good deal. Next week they’ll send you some more coupons, and you’ll
go shopping again and buy some more stuff. They do little experiments on you, you
know; they try to figure out how much more you would buy if the price were just
that little bit less.

FutureLearn 1

These coupons are a mechanism for personalized pricing. They’ve got access to all
sorts of data from you, and people like you, in order to do these experiments and
figure these things out. Everybody wins: you get your bargains; they sell more stuff.
It sounds like a good deal to me.

Here’s another application. Suppose you and your partner want a child, but you can’t
have one. It’s fun trying, but it can get a little bit frustrating, and, ultimately, very
frustrating, perhaps even tragic. With in vitro fertilization, they take some eggs from
the woman’s ovaries, and they fertilize them with partner or donor sperm, and then
they select from amongst the embryos that are produced some to implant back into
the womb. You want to select the ones with the best chance of
producing a live birth, but you don’t want too many live births. The embryologist
has access to all sorts of data on these embryos. I think there are 50–100 pieces of
information that they record about individual embryos, and they have historical
data on which ones produced a live birth – a success.

So here’s an ideal situation for data mining. We have lots of historical data; we have
data on the present situation; and we want to select those embryos that have the
best chance of success. Now, that’s a good application for data mining, bringing a
live child to a couple who wants one.
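
To make the idea concrete, here is a minimal sketch in Python of going from historical data to a selection. It is only an illustration under stated assumptions: the single "grade" attribute, the records, and the candidate names are all invented, and real embryo data would involve the 50–100 attributes mentioned above and a proper learning algorithm rather than a simple success rate per grade.

```python
from collections import defaultdict

# Hypothetical historical records: (grade, produced a live birth?).
# The grades and outcomes are invented purely for illustration.
history = [
    ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False),
    ("C", False), ("C", False),
]


def success_rates(records):
    """Return the observed fraction of successes for each grade."""
    counts = defaultdict(lambda: [0, 0])  # grade -> [successes, total]
    for grade, success in records:
        counts[grade][0] += int(success)
        counts[grade][1] += 1
    return {grade: s / n for grade, (s, n) in counts.items()}


def select_best(candidates, rates, k):
    """Pick the k candidates whose grade had the highest historical rate."""
    ranked = sorted(candidates, key=lambda c: rates.get(c[1], 0.0), reverse=True)
    return [name for name, _ in ranked[:k]]


rates = success_rates(history)

# Data on the present situation: (hypothetical candidate id, grade).
present = [("e1", "B"), ("e2", "A"), ("e3", "C"), ("e4", "A")]
chosen = select_best(present, rates, k=2)
print(chosen)
```

The historical data is turned into information (a success rate per grade), and that information is used to make a prediction about the present cases.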

I talk about “data mining” and “machine learning”. Data mining is the application,
and machine learning is the algorithms we use. We’re talking about using machine
learning algorithms for the purposes of data mining.

The next question – this is Data Mining with Weka – “What’s Weka?” This is a weka
here, this little bird. It’s a flightless bird, like its better-known cousin the kiwi, found
only in the islands of New Zealand. This is what it sounds like, coming to you from
New Zealand.

However, in our context, Weka is a data mining workbench. It’s an acronym for the
Waikato Environment for Knowledge Analysis. We just call it Weka. It contains a
large number of algorithms for classification, and a lot of algorithms for data
preprocessing, feature selection, clustering, finding association rules, things like
that. It’s a very comprehensive workbench, and it’s free, open-source software that
you will download as part of this course in the next lesson. It runs on any computer.
It’s written in Java, and runs on Linux, Windows, and Mac. You’ll be able to download it
and run it on your workstation and use it during the course.

You’re going to learn how to load data into Weka and look at it. You’re going to
learn about preprocessing, cleaning up data using filters, exploring it using
visualizations, applying classification algorithms, interpreting the output,
understanding evaluation methods – evaluation is very important in this area –
understanding various representations for models, and seeing how popular machine
learning algorithms work, and you’ll be aware of common pitfalls with data mining. The ultimate
goal really is to empower you to use Weka on your own data, and, most
importantly, to understand what it is you are doing.
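
As a rough picture of that workflow (load the data, apply a classifier, evaluate it), here is a minimal sketch in plain Python. It does not use Weka at all: the tiny data set is invented, the "classifier" is just a baseline that always predicts the most common class in the training data, and the function names are hypothetical. The point is only the shape of the process, and in particular that the evaluation is done on data that was held back from training.

```python
import csv
import io
from collections import Counter

# Invented toy data set; the last column ("play") is the class to predict.
RAW = """outlook,windy,play
sunny,false,no
sunny,true,no
overcast,false,yes
rainy,false,yes
rainy,true,no
overcast,true,yes
sunny,false,yes
rainy,false,yes
overcast,false,yes
rainy,true,no
"""


def load(text):
    """Load the data: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def train_majority(rows, target):
    """'Train' a baseline model: remember the most common class value."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]


def accuracy(rows, target, prediction):
    """Evaluate: fraction of rows whose class equals the constant prediction."""
    correct = sum(1 for row in rows if row[target] == prediction)
    return correct / len(rows)


rows = load(RAW)
train, test = rows[:7], rows[7:]       # hold back the last 3 rows for testing
model = train_majority(train, "play")  # learned from the training rows only
test_accuracy = accuracy(test, "play", model)
print(model, test_accuracy)
```

A baseline like this is useful mainly as a point of comparison: a real learning algorithm should do better than it on the held-out rows.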

That’s it. I just thought I’d show you where I am. I’m in New Zealand, that’s where
Weka is from. That’s where I’m sitting right now. This is the world as we see it in
New Zealand. We’re at the top, you’re probably down at the bottom somewhere.
We’re at the top, in the center, and that arrow to the North Island of New Zealand
is where the University of Waikato is.

I’ll see you again in the next lesson. I’m looking forward to that. Goodbye for now.
