Information Systems

Lecture 6: Introduction to Data Mining

Dr. Sobhan Sarkar


PDF (University of Edinburgh), Ph.D. (IIT Kharagpur)
Assistant Professor
IIM Ranchi
Email: sobhan.sarkar@iimranchi.ac.in

Contents
1. Introduction to Data Mining (DM)
2. Why DM?
3. What is DM and Knowledge Discovery in Databases (KDD)
4. KDD process
5. DM is multidisciplinary
6. DM applications
7. Real-life applications
8. Case study
9. Tools and techniques
10. References

What is 'Data Mining'?

• Data mining is the process of extracting usable information from a larger set of raw data. It involves
analysing patterns in large batches of data using one or more software tools.
• Data mining has applications in multiple fields, such as science and research.
• By applying data mining, businesses can learn more about their customers, develop more effective
strategies for their various business functions, and in turn use their resources more efficiently.
• Data mining depends on effective data collection and warehousing as well as computer processing. It uses
sophisticated mathematical algorithms to segment the data and evaluate the probability of future events.
• Data mining is also known as Knowledge Discovery in Databases (KDD).
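
To make the idea of segmenting data concrete, here is a minimal sketch of k-means clustering in plain Python. The lecture does not prescribe this algorithm here; it is used only as one common example of a segmentation method. The spend figures, the function name `kmeans_1d`, and the choice of two starting centroids are all illustrative assumptions.

```python
# Hypothetical monthly spend per customer (illustrative values, not from the lecture).
spend = [12.0, 15.0, 14.0, 80.0, 85.0, 90.0, 13.0, 88.0]


def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the segment with the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its current members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, labels


# Assumption: start from the lowest and highest spend as the two initial centroids.
centroids, labels = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(centroids)  # average spend of each segment
print(labels)     # segment index assigned to each customer
```

With these made-up values, the customers fall into a low-spend and a high-spend segment, which a business could then treat differently.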

Limitations of Data Mining
1. Cost
Data mining relies on a lot of technology for the data collection process. All the data generated needs
storage space and maintenance, which can greatly increase the implementation cost. In addition, a
specialist must be hired for tool selection and other operations, which adds to the overall
expense.

2. Security
Identity theft is a major concern when using data mining. Many kinds of customer information are collected, and
without adequate security such a large store of data is vulnerable: hackers who gain access to it could steal
critical information.

3. Privacy
Data mining raises many privacy concerns. The information collected for data mining can be used for purposes
other than those for which it was collected. Some of it could be leaked unintentionally, or sold to others
intentionally, violating user privacy. Anyone who acquires this data could potentially track individuals.

4. Accuracy
Although data mining methods have made data collection easier, they still have limitations when
it comes to accuracy. The information gathered can be inaccurate, causing problems in decision making.
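
As a small illustration of how inaccurate data can mislead a decision, the sketch below shows one mistyped record distorting an average, and a simple range check removing it. The ages, the helper name `plausible`, and the 0 to 120 bounds are hypothetical choices for illustration, not part of the lecture.

```python
from statistics import mean

# Hypothetical customer ages; 290 stands in for a data-entry error.
ages = [25, 31, 29, 34, 290, 28]


def plausible(age, low=0, high=120):
    """Range check on a single age; the bounds are illustrative assumptions."""
    return low <= age <= high


# Keep only the records that pass the range check.
clean = [a for a in ages if plausible(a)]

raw_mean = mean(ages)     # distorted by the erroneous record
clean_mean = mean(clean)  # computed after the check

print(raw_mean, clean_mean)
```

A range check like this only catches values that are obviously impossible; plausible but wrong values would still pass.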

5. Technical Skills
Many different mining tools are available, each with its own algorithms and design. Without proper
technical knowledge, tool selection is a difficult task. Therefore, a skilled technician is needed
for the tool selection process.

6. Information Misuse
Apart from identity theft, weak security in data mining can lead to misuse of information. People may
use mined information for their own personal gain, or use it to target and harm a group of people.
Hence, it is the responsibility of companies to ensure that the data is used only for
the intended purpose.

Data Mining Applications: A Few Case Studies

• Sarkar, S., Ejaz, N., Maiti, J., & Pramanik, A. (2022). An integrated approach using growing
self-organizing map-based genetic K-means clustering and tolerance rough set in occupational risk
analysis. Neural Computing & Applications, 34, 9661-9687.

• Sarkar, S., Pramanik, A., Maiti, J., & Reniers, G. (2020). Predicting and analyzing injury severity: A machine
learning-based approach using class-imbalanced proactive and reactive data. Safety Science, 125, 104616.

• Sarkar, S., Vinay, S., Raj, R., Maiti, J., & Mitra, P. (2019). Application of optimized machine learning techniques for
prediction of occupational accidents. Computers & Operations Research, 106, 210-224.

*Note: All the aforementioned papers are attached in the drive.

References
[1] Laudon, K. C., & Laudon, J. P. (2004). Management information systems: Managing the digital firm. Pearson Educación.
