Manoj Pandia, Silicon Institute of Technology
Introduction - Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:
operational or transactional data, such as sales, cost, inventory, payroll, and accounting
nonoperational data, such as industry sales, forecast data, and macroeconomic data
metadata: data about the data itself, such as logical database design or data dictionary definitions
Introduction - Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.
Introduction - Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
Data, Information & Knowledge
Presence of Data
Data is found everywhere: Education, Hospital, Manufacturing Industry, Finance, Banking, Marketing, Retailing, Insurance, Transport, and so on.
Database v/s Data Warehouse
A database is a collection of related data, and a database system is a database and database software together. A data warehouse is also a collection of information as well as a supporting system.
Databases are transactional, such as relational, object-oriented, network, or hierarchical. But a data warehouse is typically optimized for access from a decision maker's needs.
Traditional databases support on-line transaction processing (OLTP), which includes insertions, updates, and deletions, while also supporting information query requirements. Thus databases must strike a balance between efficiency in transaction processing and supporting query requirements (ad hoc user requests).
Traditional databases are optimized to process queries that may touch a small part of the database and transactions that deal with insertions or updates of a few tuples per relation. That is, they cannot be further optimized for applications such as OLAP, DSS, and data mining.
Data warehouses are designed specifically to support efficient extraction, processing, and presentation for analytic and decision-making purposes. In contrast to databases, data warehouses generally contain very large amounts of data from multiple sources that may include databases from different data models and sometimes files acquired from independent systems and platforms.
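The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name sales, its columns, and its rows are hypothetical, and a real data warehouse would be a separate and far larger store than one in-memory table.

```python
import sqlite3

# Hypothetical schema and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales (region, product, units) VALUES (?, ?, ?)",
    [("East", "pen", 10), ("East", "ink", 4), ("West", "pen", 7), ("West", "ink", 9)],
)

# OLTP-style work: a short transaction that touches a single tuple.
with conn:
    conn.execute("UPDATE sales SET units = units + 1 WHERE id = ?", (1,))

# Analytic (OLAP-style) work: a query that scans and summarizes every row.
units_by_region = dict(
    conn.execute("SELECT region, SUM(units) FROM sales GROUP BY region").fetchall()
)
print(units_by_region)
```

The update reads and writes one row, while the summary has to visit the whole table, which is the kind of access a data warehouse is organized around.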
Data Mart
A data mart is an easy-to-access repository of a subset of highly focused data for a single function or department (e.g., sales, finance, marketing) and is considerably smaller than a data warehouse. The data comes from operational information that is needed by a particular group of employees for analysis, content, and presentations, all in terms that are familiar to them. Data for a data mart is derived from a data warehouse or from more specialized access.
The Evolution
Evolutionary Step: Data Collection (1960s)
Business Question: "What was my total revenue in the last five years?"
Enabling Technologies: Computers, tapes, disks
Product Providers: IBM, CDC
Characteristics: Retrospective, static data delivery

Evolutionary Step: Data Access (1980s)
Business Question: "What were unit sales in New England last March?"
Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
Product Providers: Oracle, Sybase, Informix, IBM, Microsoft
Characteristics: Retrospective, dynamic data delivery at record level

Evolutionary Step: Data Warehousing & Decision Support (1990s)
Business Question: "What were unit sales in New England last March? Drill down to Boston."
Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
Product Providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
Characteristics: Retrospective, dynamic data delivery at multiple levels

Evolutionary Step: Data Mining (Emerging Today)
Business Question: "What's likely to happen to Boston unit sales next month? Why?"
Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
Product Providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
Characteristics: Prospective, proactive information delivery
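The "drill down to Boston" question from the data warehousing step can be sketched as the same aggregation run at two levels of detail. The records and the helper unit_sales below are hypothetical, made up for this illustration, and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical unit-sales records: (region, city, month, units).
records = [
    ("New England", "Boston", "March", 120),
    ("New England", "Boston", "April", 90),
    ("New England", "Hartford", "March", 80),
    ("Midwest", "Chicago", "March", 200),
]

def unit_sales(rows, month, level):
    """Sum units for one month, grouped by 'region' or by 'city'."""
    totals = defaultdict(int)
    for region, city, row_month, units in rows:
        if row_month == month:
            key = region if level == "region" else (region, city)
            totals[key] += units
    return dict(totals)

# Data-warehousing question: unit sales in New England last March.
by_region = unit_sales(records, "March", "region")

# Drill down from the region level to the city level to see Boston.
by_city = unit_sales(records, "March", "city")

print(by_region["New England"], by_city[("New England", "Boston")])
```

Both answers are retrospective: they summarize what already happened, at coarser or finer levels. The data mining question in the last step asks instead what is likely to happen next.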
What is Data Mining
Data mining refers to extracting or “mining” knowledge from large amounts of data. Mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, data mining should have been more appropriately named “knowledge mining from data,” which is unfortunately somewhat long. “Knowledge mining,” a shorter term, may not reflect the emphasis on mining from large amounts of data.
Why Data Mining?
Data explosion problem
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
Data warehousing and on-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Why Data Mining?
How can I analyze my data?
We are data rich, but information poor
Data Mining & KDD
Many people treat data mining as a synonym for Knowledge Discovery from Data, or KDD
Others view data mining as simply an essential step in the process of knowledge discovery
Data Mining & KDD
[Figure: the KDD process. Databases -> Cleaning & Integration -> Data Warehouse -> Selection & Transformation -> Task Relevant Data -> Data Mining -> Patterns -> Evaluation & Presentation -> Knowledge]
Data Mining & KDD
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
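The seven steps can be traced on a toy example. The sketch below is only illustrative: the two data sources, the selection rule, and the 0.5 support threshold are assumptions made up here, and simple pair counting stands in for the "intelligent methods" of step 5.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources; None marks a noisy record.
store_a = [["bread", "milk"], ["bread", "butter"], None]
store_b = [["bread", "milk", "butter"], ["milk"], ["bread", "milk"]]

# 1. Data cleaning: drop missing records.
cleaned_a = [t for t in store_a if t is not None]
cleaned_b = [t for t in store_b if t is not None]

# 2. Data integration: combine the two sources.
integrated = cleaned_a + cleaned_b

# 3. Data selection: keep transactions relevant to the task (two or more items).
selected = [t for t in integrated if len(t) >= 2]

# 4. Data transformation: consolidate each transaction into a sorted tuple.
transformed = [tuple(sorted(set(t))) for t in selected]

# 5. Data mining: count how often each pair of items occurs together.
pair_counts = Counter(p for t in transformed for p in combinations(t, 2))

# 6. Pattern evaluation: keep pairs whose support meets the (assumed) threshold.
min_support = 0.5
patterns = {
    pair: count / len(transformed)
    for pair, count in pair_counts.items()
    if count / len(transformed) >= min_support
}

# 7. Knowledge presentation: report the surviving patterns.
for pair, support in sorted(patterns.items()):
    print(f"{pair[0]} and {pair[1]} appear together in {support:.0%} of transactions")
```

Support, the fraction of transactions containing the pair, plays the role of the interestingness measure in step 6.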
Data Mining: on What Kind of Data
In principle, data mining should be applicable to any kind of data repository
Thus the scope of our examination of data repositories will include
relational databases
data warehouses
transactional databases
advanced database systems
  object-relational databases
  specific application-oriented databases
    spatial databases
    time-series databases
    text databases
    multimedia databases
flat files
data streams
World Wide Web
Data Mining: What Kinds of Patterns Can Be Mined?
Data mining tasks can be classified into two categories:
Descriptive: characterize the general properties of the data in the database
Predictive: perform inference on the current data in order to make predictions
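The two categories can be contrasted on one small data set. The monthly figures below are hypothetical, and a hand-written least-squares line is used only as one simple stand-in for a predictive method.

```python
from statistics import mean

# Hypothetical monthly unit sales, invented for this illustration.
months = [1, 2, 3, 4, 5]
units = [10, 12, 14, 16, 18]

# Descriptive task: characterize general properties of the data.
summary = {"mean": mean(units), "min": min(units), "max": max(units)}

# Predictive task: infer a trend from the current data and predict month 6,
# using an ordinary least-squares line fit written out by hand.
mx, my = mean(months), mean(units)
slope = sum((x - mx) * (y - my) for x, y in zip(months, units)) / sum(
    (x - mx) ** 2 for x in months
)
intercept = my - slope * mx
predicted_month_6 = intercept + slope * 6

print(summary, predicted_month_6)
```

The summary only describes the data already in hand, while the fitted line is used to say something about a month that has not been observed.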
Data Mining: What Kinds of Patterns Can Be Mined?
Concept/Class Description: Characterization and Discrimination
Mining Frequent Patterns, Associations, and Correlations
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
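As one small illustration of the list above, outlier analysis can be sketched with a z-score test. The values and the cutoff of 2 standard deviations are assumptions chosen for this example; the slides do not prescribe a particular method.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts, invented for this illustration.
amounts = [10, 11, 9, 10, 12, 50]

# Assumed cutoff: flag values more than 2 standard deviations from the mean.
threshold = 2.0
center = mean(amounts)
spread = pstdev(amounts)

# A value is flagged as an outlier when its z-score exceeds the cutoff.
outliers = [a for a in amounts if abs(a - center) / spread > threshold]

print(outliers)
```

Here only the value 50 lies far enough from the rest to be flagged.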
Major Issues
Mining methodology and user interaction issues
  Mining different kinds of knowledge in databases
  Interactive mining of knowledge at multiple levels of abstraction
  Incorporation of background knowledge
  Data mining query languages and ad hoc data mining
  Presentation and visualization of data mining results
  Handling noisy or incomplete data
  Pattern evaluation: the interestingness problem
Performance issues
  Efficiency and scalability of data mining algorithms
  Parallel, distributed, and incremental mining algorithms
Issues relating to the diversity of database types
  Handling of relational and complex types of data
  Mining information from heterogeneous databases and global information systems
Classification of Data Mining Systems