
Knowledge discovery in databases

An exciting recent movement in the database area is knowledge discovery in databases (KDD). KDD is an umbrella term used to describe all activities involved in making sense of data stored in large and complex databases. KDD encompasses a number of terms that are currently receiving attention, namely data warehousing, the data mart and data mining.

Data warehousing - A database consists of data stored on a computer in a way that facilitates retrieval. Data warehousing is a refinement of the database concept that makes an improved data resource available to users. It enables users to manipulate and use data in intuitive ways. The key concept is that it encompasses a very wide range of computer-based data.

The data resource is called a data warehouse, and it is typically very large, of very high quality and highly retrievable. The large size of the data does not come at the cost of poor quality. This is because of extensive data cleaning, i.e. removing incorrect and inconsistent data and converting what remains into higher-quality data.
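The cleaning step described above can be illustrated with a minimal sketch. The record layout (`id`, `name`, `age`) and the specific rules (empty names, impossible ages, duplicate ids) are assumptions chosen for illustration; a real warehouse would apply its own rules.

```python
def clean_records(records):
    """Drop incorrect or inconsistent records and normalise the rest."""
    cleaned = []
    seen_ids = set()
    for rec in records:
        # Incorrect data (assumed rule): missing name or impossible age.
        if not rec.get("name") or not (0 <= rec.get("age", -1) <= 120):
            continue
        # Inconsistent data (assumed rule): the same id appearing twice.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        # Convert to a consistent form: trimmed, title-cased names.
        cleaned.append({**rec, "name": rec["name"].strip().title()})
    return cleaned


raw = [
    {"id": 1, "name": " alice smith ", "age": 34},
    {"id": 2, "name": "", "age": 51},            # missing name
    {"id": 3, "name": "Bob Jones", "age": 430},  # impossible age
    {"id": 1, "name": "Alice Smith", "age": 34}, # duplicate id
]
print(clean_records(raw))
```

Only the first record survives, with its name normalised.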

One statistical technique is clustering, which arranges the data in the ways users want to view it. This is similar to like goods being arranged together in a supermarket. Data warehousing is typically performed on mainframe computers because of the extremely large amount of stored data.
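To make the clustering idea concrete, here is a minimal sketch of one common clustering method, k-means, applied to a single numeric field. The notes do not say which clustering method a warehouse uses, so the choice of k-means, the function name `kmeans_1d`, the evenly spaced starting centres and the sample purchase amounts are all assumptions.

```python
def kmeans_1d(values, k, iterations=10):
    """Group numbers into k clusters of similar values (simple k-means)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: spread the k centres evenly from min to max.
    if k > 1:
        centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centres = [sum(values) / len(values)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            # Put each value with its nearest centre.
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters


purchase_amounts = [12, 15, 14, 90, 95, 88]
print(kmeans_1d(purchase_amounts, k=2))
```

Small purchases end up in one group and large purchases in the other, much like similar goods sharing a supermarket aisle.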

The data is stored in a relational database. DBMS vendors such as Oracle, Sybase and Informix are promoting the use of their products as data warehouse platforms. IBM has actively positioned itself as the builder of computer hardware that supports data warehousing activity.

The data mart - Achieving a data warehouse sounds like a big challenge, so experts recommend taking a more modest approach. A data mart is a database that contains data describing only a segment of the firm's operations. A firm may have a marketing data mart, a human resources data mart and so on.

Data mining - A term often used in conjunction with data warehousing and the data mart is data mining. It is the process of finding relationships in data that are unknown to the user. Data mining helps the user by discovering the relationships and presenting them in an understandable way.

The relationships may provide the basis for decision making. Data mining enables the user to discover knowledge in databases that the user may not know exists. It does not present the same data in a different format; it shows relationships that were not previously recognised.

Take the example of a bank that has decided to offer mutual funds to its customers. Bank management wants to aim promotional materials at the customers. They want to target the customer segment that offers the greatest potential for business. Data mining is required to relate the customer database to these prospects.

Verification driven data mining - One approach is for the managers to identify the characteristics they believe the members of the target group will have. Assume the managers want to target young, married, two-income, high-net-worth customers. The query could be entered into the DBMS and the appropriate records would be retrieved.

Such an approach, which begins with the user's hypothesis of how the data is related, is called verification driven data mining. The shortcoming of this approach is that the retrieval process is guided entirely by the user. The selected information can be no better than the user's view of the data.
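The managers' query in the bank example might look like the following sketch. The field names, the sample customers and the cut-offs for "young" and "high net worth" are hypothetical; the notes give no actual thresholds or schema.

```python
# Hypothetical customer records; field names are assumptions.
customers = [
    {"name": "Rao",   "age": 29, "married": True,  "incomes": 2, "net_worth": 400_000},
    {"name": "Mehta", "age": 31, "married": True,  "incomes": 1, "net_worth": 500_000},
    {"name": "Iyer",  "age": 67, "married": True,  "incomes": 2, "net_worth": 900_000},
    {"name": "Khan",  "age": 27, "married": False, "incomes": 1, "net_worth": 50_000},
]

# Assumed cut-offs standing in for the managers' hypothesis.
MAX_YOUNG_AGE = 35
MIN_HIGH_NET_WORTH = 250_000


def matches_hypothesis(c):
    """True if the customer is young, married, two-income and high net worth."""
    return (c["age"] <= MAX_YOUNG_AGE
            and c["married"]
            and c["incomes"] == 2
            and c["net_worth"] >= MIN_HIGH_NET_WORTH)


targets = [c["name"] for c in customers if matches_hypothesis(c)]
print(targets)
```

The query returns only what the hypothesis asks for: the retired two-income couple is never considered, which is exactly the shortcoming noted above.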

Discovery driven data mining - Another approach lets the data mining system itself identify the best customers for the promotion. The system analyses the database and looks for groups with common characteristics. In the bank example, the mining system might identify not only the young married group but also retired married couples with incomes, and recommend a promotional campaign for both groups.
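A very rough sketch of the discovery driven idea: rather than starting from a hypothesis, the program scores every group it finds in the data and reports the promising ones. The segment definition (age band and marital status), the `bought_fund` flag, the sample data and the 50% cut-off are assumptions for illustration; real mining systems use far more sophisticated techniques.

```python
from collections import defaultdict

# Hypothetical history: (age_band, married, bought_fund).
history = [
    ("young", True, True), ("young", True, True), ("young", True, False),
    ("young", False, False), ("young", False, False),
    ("middle", True, False), ("middle", True, False),
    ("retired", True, True), ("retired", True, True),
    ("retired", False, False),
]


def promising_segments(rows, min_rate=0.5):
    """Return segments whose purchase rate is at least min_rate, best first."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for age_band, married, bought in rows:
        segment = (age_band, married)
        totals[segment] += 1
        if bought:
            buyers[segment] += 1
    rates = {seg: buyers[seg] / totals[seg] for seg in totals}
    good = [seg for seg, rate in rates.items() if rate >= min_rate]
    # Highest purchase rate first.
    return sorted(good, key=lambda seg: rates[seg], reverse=True)


print(promising_segments(history))
```

With this sample data the program surfaces both the young married and the retired married segments without being told to look for either.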

Combined discovery and verification data mining - This concept enables the user and the computer to work together to solve a problem. The user applies expertise in the problem domain and the computer performs the data analysis. This combination selects the appropriate data and puts it in the right form for decision making.

Components of a telecommunication system


Data transmission between computers over the public telephone system is slower than transmission over a network built for computers. Computers need extremely reliable connections, but the humans who use the telephone can understand a conversation even when there is static on the line. Protocols for the public telephone system were established to meet the minimum criteria of voice transmission. The quality of the telephone system is significantly below the needs of computer data transmission.

Communication networks
Networks are differentiated by the size of the audience that is served. Technology plays a role because there are physical limits to the distance between computers based on the communications medium used. The distinction between different types of networks has blurred as communication technologies and the quality of data transmission have improved.

To be included on a network, each device (each computer, printer, or similar device) must be attached to the communications medium. This is done using a network interface card. The network interface card (NIC) acts as an intermediary for the data moving between the communications medium and the computer or other device.

The NIC is more than just a buffer that allows data storage. It deciphers information from the packets to determine whether the data is meant to be captured. It also decides whether the data should be allowed to pass down the communications medium.
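The capture-or-pass decision can be sketched as a comparison of a packet's destination address with the card's own address. This is a simplified model, not how any particular card is implemented: the packet layout (a dictionary with `dest` and `payload`) and the function name `handle_packet` are assumptions. The all-ones address is the standard Ethernet broadcast address.

```python
BROADCAST = "FF:FF:FF:FF:FF:FF"  # Ethernet broadcast address


def handle_packet(own_address, packet):
    """Decide whether the NIC captures the packet or lets it pass along."""
    dest = packet["dest"].upper()
    if dest == own_address.upper() or dest == BROADCAST:
        return "capture"   # data is meant for this device
    return "pass"          # leave it on the medium for another device


nic_address = "00:1A:2B:3C:4D:5E"  # hypothetical address of this card
packets = [
    {"dest": "00:1a:2b:3c:4d:5e", "payload": "print job"},
    {"dest": "00:1A:2B:3C:4D:99", "payload": "for another PC"},
    {"dest": "ff:ff:ff:ff:ff:ff", "payload": "announcement"},
]
decisions = [handle_packet(nic_address, p) for p in packets]
print(decisions)
```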

Local area networks - A LAN is a group of computers and other devices (such as printers) that are connected together by a common medium. LANs typically join computers that are physically close together, such as in the same room or building. Only a limited number of computers and other devices can be connected on a single LAN.

The limitations vary based on the medium connecting the computers and devices as well as the LAN software being used. As a general rule, a LAN will cover a total distance of only half a mile. The distance between computers linked by the communication medium is typically at least 2 feet and not more than 60 feet.

These distances are only guidelines, since the specifications imposed by the type of communication medium, the network interface card used and the LAN software dictate the actual distances. The current transmission speed of data along a LAN generally runs from 10 million bits per second (10 Mbps) to 100 Mbps.
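To give a feel for what these speeds mean, the sketch below works out how long a file would take to cross the LAN at each rate. The 10-megabyte file size is an arbitrary example, 1 megabyte is taken as 1,000,000 bytes, and protocol overhead and contention are ignored, so real transfers would be slower.

```python
def transfer_seconds(size_bytes, rate_bits_per_second):
    """Ideal transfer time: total bits divided by the line rate."""
    return size_bytes * 8 / rate_bits_per_second


file_size = 10_000_000  # assumed example: a 10-megabyte file

slow = transfer_seconds(file_size, 10_000_000)    # 10 Mbps
fast = transfer_seconds(file_size, 100_000_000)   # 100 Mbps
print(f"10 Mbps:  {slow} s")
print(f"100 Mbps: {fast} s")
```

The file holds 80 million bits, so it needs 8 seconds at 10 Mbps and 0.8 seconds at 100 Mbps.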

LANs use only private network media; they do not transfer data over the public telephone system. Only a single network protocol, such as Ethernet or token ring, can be used on a single LAN.

LAN topology and implementation - LANs use three separate configurations for connecting the computers and other devices. The network configuration is called a topology. The three major topologies are ring, bus and hub (star), each named after the arrangement of its devices in the network.

The importance of stars and hubs to most professionals has less to do with the technology and more to do with communication. Managers and professional staff became more dependent on computer resources. They realised that passing information from one person to another was difficult and time consuming.

Advantages of LAN - LANs allowed work groups to share computer-based data and to use computer resources (such as a laser printer) located not on the worker's desk but on the network. It became possible to send electronic messages to coworkers. The ability to share costly hardware such as a laser printer proved to be a cost-saving strategy.

Sharing electronic messages allowed individual users to act as a group. The benefits of group decisions became apparent to firms. They started to take advantage of other network technologies to link local groups to other local groups and then to the entire company.
