Too often you hear someone say, "Oh yeah, I know how to use a computer. I can surf the Web with the best of them and I can play Solitaire for hours. I'm really good at computers." Okay. So that person can pound a keyboard, use a mouse at lightning speed, and has a list of favorite Web sites a mile long. But the real question is "Is that person information literate?" Just because you can pound the keyboard doesn't necessarily mean you can leverage the technology to your advantage or the advantage of your organization.

An organization can gather and keep all the data on its customers that a hard drive can hold. You can get all the output reports that one desk can physically hold. You can have the fastest Internet connection created to date. But if the organization doesn't take advantage of customer data to create new opportunities, then all it has is useless data. If the output report doesn't tell management that it has a serious problem on the factory floor, then all that's been accomplished is to kill a few more trees. If you don't know how to analyze the information from a Web site to take advantage of new sales leads, then what have you really done for yourself today?

Most of us think only of hardware and software when we think of an Information System. There is another component of the triangle that should be considered, and that's the people side, or "persware." Think of it this way:
In this section of the text, Laudon & Laudon discuss the components of an information system. They talk about the input, processing, output and feedback processes. Most important is the feedback process; unfortunately it's the one most often overlooked. Just as in the triangle above, the hardware (input and output) and the software (processing) receive the most attention. With those two alone, you have computer literacy. But if you don't use the "persware" side of the triangle to complete the feedback loop, you don't accomplish much. Add the "persware" angle with good feedback and you have the beginnings of information literacy.
A Business Perspective on Information Systems

Using feedback completes the information processing loop. To be a good Information Systems manager, however, you must bring into that loop far more than just the computer data. For instance, your information system reports that you produced 100,000 widgets last week with a "throwback" rate of 10%. The feedback loop tells you that the throwback rate has fallen 2% in the last month. Wow, you say, that's a pretty good improvement. So far, so good. But if you put that information into a broader context, you're still costing the organization a huge sum of money, because each percentage point on the throwback rate averages $10,000. And when you bring in available external environmental information, your company is 5% above the industry norm. Now that's information you can use, to your advantage or disadvantage! If you, as a manager, can then take other information from the internal and external environments to come up with a solution to this problem, you can consider yourself "information literate."

Organizations

Organizations are funny things. Each one tends to have its own individual personality and yet share many things in common with other organizations. Look at some of the organizations you may be associated with: a softball team, fraternity/sorority, health club, or a child's soccer team. Organizations exist everywhere, and each of them has its own structure, just as workplace organizations have their own structure and personality to fit their needs or, in some cases, habits.

A baseball team needs talented, well-trained players at different positions. Sometimes the success of the team depends on a good, well-informed coach or manager. So too with the workplace organization. Business organizations need many kinds of players with various talents, who are well-trained and well-informed, in order to succeed. Every organization requires tools to help it succeed.
If the baseball team uses bats that are 25 years old against a team whose bats are 2 years old, they will have to work harder on their own to make up for that disadvantage. If your child's soccer team uses balls with torn seams, they're going to have a harder time kicking the ball into the goal. So if your organization is using older equipment or uses it the wrong way, it just stands to reason it is going to have a harder time beating the odds.

Management

Every good organization needs a good manager. Pretty simple, pretty reasonable. Take professional baseball managers. They don't actually play the game; they don't hit the home run, catch the fly ball for the last out, or hang every decoration for the celebration party. They stay on the sidelines during the game. Their real role is to develop the game plan by analyzing their team's strengths and weaknesses. But that's not all; they also determine the competition's strengths and weaknesses. Every good manager has a game plan before the team even comes out of the locker room. That plan may change as the game progresses, but managers pretty much know what they're going to do if they are losing or if they are winning. The same is true in workplace organizations.

Technology
Do you own a Digital Video Disk? Probably not, since it's only been on the market for a short time. How old is your car or truck? Manufacturers are constantly offering us new vehicles, yet we tend to upgrade only every few years. Your personal computer may be a year old or three years old. Do you have the latest gadgets? Chances are you don't. Face it, you just can't keep up with all the new stuff. No one can. Think about how hard, not to mention expensive, it is for an individual to acquire everything introduced to the marketplace. Think how difficult it is sometimes to learn how to use every feature of all those new products. Now put those thoughts into a much larger context of an organization. Yes, it would be nice if your company could purchase new computers every three months so you could have the fastest and best technology on the market. But it can't. Not only is it expensive to buy the hardware and the software, but the costs of installing, maintaining, updating, integrating, and training must all be taken into account. We'll look at the hardware and software sides of the Information Systems triangle in upcoming chapters, but it's important that you understand now how difficult it is for an organization, large or small, to take advantage of all the newest technology.
Role of Information Systems
As a consumer, you have instant access to millions of pieces of data. With a few clicks of the mouse button, you can find anything from current stock prices to video clips of current movies. You can get product descriptions, pictures, and prices from thousands of companies across India and around the world. Trying to sell services and products? You can purchase demographic, economic, consumer buying pattern, and market-analysis data. Your firm will have internal financial, marketing, production, and employee data for past years. This tremendous amount of data provides opportunities to managers and consumers who know how to obtain it and analyze it to make better decisions.

The speed with which Information Technology (IT) and Information Systems (IS) are changing our lives is amazing. Only 50 years ago communication was almost limited to the telephone, the first word processors came out in the mid-sixties, and the fax machine entered our offices in the 1970s. Today information systems are everywhere: from supermarkets and airline reservations to libraries and banking operations, they have become part of our daily lives.

The first step in learning how to apply information technology to solve problems is to get a broader picture of what is meant by the term information system. You probably have some experience with using computers and various software packages. Yet computers are only one component of an information system. A computer information system (CIS) consists of related components: hardware, software, people, procedures, and collections of data. The term information technology (IT) represents the various types of hardware and software used in an information system, including computers and networking equipment. The goal of an information system is to enable managers to make better decisions by providing quality information. The physical equipment used in computing is called hardware. The set of instructions that controls the hardware is known as software.
In the early days of computers, the people directly involved tended to be programmers, design analysts, and a few external users. Today, almost everyone in the firm is involved with the information system. Procedures are instructions that help people use the systems. They include items such as user manuals, documentation, and procedures to ensure that backups are made regularly. Databases are collections of related data that can be retrieved easily and processed by computers. To create an effective information system, you need to do more than simply purchase the various components. Quality is an important issue in business today, particularly as it relates to information systems. The quality of an information system is measured by its ability to provide exactly the information needed by managers in a timely manner. The information must be accurate and up-to-date. Users should be able to receive the information in a variety of formats: tables of data, graphs, summary statistics, or even pictures or sound. Users have different perspectives and different requirements, and a good information system must have the flexibility to present information in diverse forms for each user.
Lecture - 3

The Relationship Between Organizations and Information Systems

This lecture looks at how organizations and information systems work together, or sometimes against each other. The idea, of course, is to keep them in sync, but that's not always possible. We'll examine the nature of organizations and how they relate to Information Systems.

The Two-Way Relationship
This figure shows the complexity of the relationship between organizations and information technology. Installing a new system or changing the old one involves much more than simply plunking down new terminals on everyone's desk. The greatest influence, as the text points out, could simply be sheer luck!

What Is an Organization?

An organization is very similar to the Information System described previously. The two figures have many things in common. Both require inputs and some sort of processing, both have outputs, and both depend on feedback for successful completion of the loop.
Information Systems use data as their main ingredient; organizations rely on people. However, the similarities are remarkable. Both are structured methods of turning raw products (data/people) into useful entities (information/producers). Think of some of the organizations you've been involved in. Didn't each of them have a structure, even if it wasn't readily apparent? Perhaps the organization seemed chaotic or didn't seem to have any real purpose. Maybe that was due to poor input, broken-down processing, or unclear output. It could very well be that feedback was ignored or missing altogether. Oftentimes an organization's technical definition, the way it's supposed to work, is quite different from its behavioral definition, the way it really works. For instance, even though Sally is technically assigned to the Production Department with Sam as her supervisor on paper, she really works for Tom in Engineering. When a company is developing a new information system, it's important to keep both the technical and behavioral definitions in perspective and build the system accordingly.

Salient Features of Organizations

This section gives you a perspective on how organizations are constructed and compares their common and uncommon features.

Why Organizations Are So Much Alike: Common Features

The class you're enrolled in is an organization of sorts, isn't it? Think about it. Look at the table describing the characteristics of an organization:
When you hear the term bureaucracy, you immediately think of government agencies. Not so; bureaucracies exist in many private and public companies. Bureaucracies are simply very formal organizations with strict divisions of labor and very structured ways of accomplishing tasks. They are usually thought of in a negative way, but they can be positive.
Standard Operating Procedures

How many of these characteristics fit your college class? How many fit any organization you're in? Some of the Standard Operating Procedures (SOPs), politics, and culture are so ingrained in organizations that they actually hinder the success of the group. Think about your experiences in groups. You had a leader (hierarchy), a set of rules by which you operated (explicit rules and procedures), and people appointed to perform certain tasks (clear division of labor). You probably voted on different issues (impartial judgments), and you decided on the best person to fill various positions within the group (technical qualifications for positions). Hopefully, the organization was able to fulfill its goals (maximum organizational efficiency), whether winning a softball game or putting on an award-winning play. If your organization wasn't successful, perhaps it was because of the SOPs, the politics, or the culture. The point is, every group of people is an organization. An interesting question to ask yourself is "How would the world look and function without some kind of organization?"

Organizational Politics

Everyone has their own opinion about how things should get done. People have competing points of view. What might be good for Accounting may not be to the advantage of Human Resources. The Production Department may have a different agenda for certain tasks than the Shipping Department. Especially when it comes to the allocation of important resources in an organization, competition heats up between people and departments. This internal competition can have a positive or negative influence on the organization, depending on how it's handled by management. The fact remains that politics exist in every organization and should be taken into account in the structure of the information system.
Organizational Culture

Just as countries or groups of people have their own habits, methods, norms, and values, so too do businesses. It's not unusual for companies to experience clashes between their existing culture and the changes brought about by new technologies. Many companies are facing such challenges as they move toward a totally different way of working, thanks to the Internet.
Introduction to Decision Making

Everybody makes decisions. It's a natural part of life, and most of the time we don't even think about the process. In an organization, decisions are made at every level. The level at which the decision is made can also determine the complexity of the decision in relation to the input of data and output of information.

Levels of Decision Making

In Chapter 2 we discussed the various types of Information Systems and how they relate to the levels of an organization. We can also relate those Information Systems to the types of decisions managers make.
• Strategic decision making: These decisions are usually concerned with the major objectives of the organization, such as "Do we need to change the core business we are in?" They also concern policies of the organization, such as "Do we want to support affirmative action?"
• Management control: These decisions affect the use of resources, such as "Do we need to find a different supplier of packaging materials?" Management-level decisions also determine the performance of the operational units, such as "How much is the bottleneck in Production affecting the overall profit and loss of the organization, and what can we do about it?"
• Knowledge-level decision making: These decisions determine new ideas or improvements to current products or services. A decision made at this level could be "Do we need to find a new chocolate recipe that results in a radically different taste for our candy bar?"
• Operational control: These decisions determine specific tasks that support decisions made at the strategic or managerial levels. An example is "How many candy bars do we produce today?"
Types of Decisions: Structured versus Unstructured

Some decisions are very structured while others are very unstructured. You may wake up in the morning and make the structured, routine decision to get out of bed. Then you have to make the unstructured decision of what clothes to wear that day (for some of us this may be a very routine decision!). Structured decisions involve definite procedures and are not necessarily very complex. The more unstructured a decision becomes, the more complex it becomes.

Types of Decisions and Types of Systems
One size does not fit all when it comes to pairing the types of systems to the types of decisions. Every level of the organization makes different types of decisions, so the system used should fit the organizational level, as shown in Figure 4.4. It's easy to develop an information system to support structured decision making. Do you increase production on the day shift or hold it to the swing shift; do you purchase another piece of equipment or repair the old one? What hasn't been so easy to develop is a system that supports the unstructured decision making that takes place in the upper echelons of a company. Do we expand into foreign markets or stay within the confines of our own country; do we build a new plant in Arizona or Alabama; do we stop production of a long-time product due to falling demand or boost our marketing? The ability to create information systems to support the latter decisions is long overdue.
Stages of Decision Making

Some people seem to make sudden or impulsive decisions. Other people seem to make very slow, deliberate decisions. But regardless of appearances, the decision-making process follows the same stages of development and implementation. Let's use the example of purchasing a new television, using Figure
• Intelligence: You identify the facts. You don't have a television, or the one that you do have isn't any good. You intuitively understand what the problem is and the effect it's having on you: you missed your favorite show last night.
• Design: You design possible solutions. You could watch the television in your neighbor's apartment or you could purchase a new one for yourself. Your neighbor will get annoyed if you keep coming over; on the other hand, you won't be able to go on vacation if you use your money to buy a new television.
• Choice: You gather data that helps you make a better decision. Your neighbor doesn't like the same shows you like, or she's getting rather tired of you being there. You also determine that televisions cost a lot of money, so you figure out how you can afford one. You choose to purchase a new television instead of watching your neighbor's.
• Implementation: You implement the decision. You stop at the appliance store on your way home from work and carry out your decision to purchase a new television.
• Feedback: You gather feedback. You're broke, but you can watch anything you want!
Of course this is a simplified example of the decision-making process. But the same process is used for almost every decision made by almost every person. Information Systems help improve the decision-making process by
• providing more information about the problem
• presenting a greater variety of possible alternatives
• showing consequences and effects of choices
• measuring the outcome of different possible solutions
• providing feedback on the decision that is made
Rule 1: The Information Rule
"All information in a relational data base is represented explicitly at the logical level and in exactly one way - by values in tables." Everything within the database exists in tables and is accessed via table access routines.
Rule 2: The Guaranteed Access Rule
"Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name." To access any data item, you specify the table, the primary key value, and the column in which it exists; there is no reading of characters 10 to 20 of a 255-byte string.
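Rule 2 can be illustrated with SQLite from Python. The `employee` table and its rows here are hypothetical, but the access pattern is exactly what the rule demands: table name plus primary key value plus column name, never a byte offset.

```python
# A sketch of Rule 2 using Python's built-in sqlite3 module.
# The table and its contents are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (101, 'Sally')")

# Table name + primary key value + column name pinpoint one datum;
# no character positions or storage details are ever involved.
row = conn.execute(
    "SELECT name FROM employee WHERE emp_id = ?", (101,)
).fetchone()
print(row[0])  # Sally
```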
Rule 3: Systematic Treatment of Null Values
"Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." If data does not exist or does not apply, a value of NULL is stored; the RDBMS understands NULL as meaning missing or inapplicable data.
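A quick way to see that NULL is distinct from zero and from the empty string is a small SQLite session; the `orders` table and its rows are hypothetical.

```python
# A sketch of Rule 3: NULL is distinct from 0 and from '' and is
# located with IS NULL, not with =. The data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, discount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 0.0)")   # a genuine discount of zero
conn.execute("INSERT INTO orders VALUES (2, NULL)")  # discount inapplicable

# "discount = 0" matches order 1 only; IS NULL matches order 2 only
missing = conn.execute(
    "SELECT id FROM orders WHERE discount IS NULL"
).fetchall()
print(missing)  # [(2,)]
```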
Rule 4: Dynamic On-Line Catalog Based on the Relational Model
"The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data." The data dictionary is held within the RDBMS, so there is no need for off-line volumes to tell you the structure of the database.
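SQLite demonstrates this rule neatly: its data dictionary is exposed as the `sqlite_master` table, queried with the same SQL as any other table. (Other RDBMSs expose a similar catalog under different names.)

```python
# A sketch of Rule 4: the catalog is interrogated with ordinary SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")

# The database description is itself relational data
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # [('product',)]
```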
Rule 5: The Comprehensive Data Sub-Language Rule
"A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items:"
• Data definition
• View definition
• Data manipulation (interactive and by program)
• Integrity constraints
Every RDBMS should provide a language to allow the user to query the contents of the RDBMS and also manipulate the contents of the RDBMS.
Rule 6: The View Updating Rule
"All views that are theoretically updatable are also updatable by the system." If a view is theoretically updatable, the user should be able to insert, update, and delete through it just as with a base table, and the RDBMS translates those changes back to the underlying base tables.
Rule 7: High-Level Insert, Update, and Delete
"The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data." Insert, update, and delete must operate on whole sets of rows at a time, not just single records, and must apply to views as well as to base tables.
Rule 8: Physical Data Independence
"Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods." The user should not be aware of where, or upon which media, data files are stored.
Rule 9: Logical Data Independence
"Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables." User programs and the user should not be aware of any changes to the structure of the tables (such as the addition of extra columns).
Rule 10: Integrity Independence
"Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs." If a column only accepts certain values, it is the RDBMS that enforces these constraints, not the user program. This means an invalid value can never be entered into the column, whereas if the constraints were enforced by application programs there would always be a chance that a buggy program might let incorrect values into the system.
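One way to see integrity independence in action is a CHECK constraint: the rule lives in the table definition, so every program that touches the table is bound by it. The `grade` table below is hypothetical.

```python
# A sketch of Rule 10: the RDBMS, not the application, enforces the
# constraint, so a buggy program cannot sneak an invalid value in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grade ("
    "  student TEXT,"
    "  score INTEGER CHECK (score BETWEEN 0 AND 100)"
    ")"
)
conn.execute("INSERT INTO grade VALUES ('Tom', 85)")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO grade VALUES ('Sam', 150)")  # out of range
except sqlite3.IntegrityError:
    rejected = True  # the database itself refused the value
print(rejected)  # True
```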
Rule 11: Distribution Independence
"A relational DBMS has distribution independence." The RDBMS may be spread across more than one system and across several networks; to the end user, however, remote tables should appear no different from local ones.
Rule 12: The Non-Subversion Rule
"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time)."
Lecture-6

Enhancing Management Decision Making

The more information you have, based on internal experiences or from external sources, the better your decisions. Business executives are faced with the same dilemmas when they make decisions. They need the best tools available to help them.

Decision-Support Systems

When we discussed Transaction Processing Systems and Management Information Systems, the decisions were clear-cut: "Should we order more sugar to support the increased production of candy bars?" Most decisions facing executives are unstructured or semistructured: "What will happen to our sales if we increase our candy bar prices by 5%?" Decision Support Systems (DSS) help executives make better decisions by using historical and current data from internal Information Systems and external sources. By combining massive amounts of data with sophisticated analytical models and tools, and by making the system easy to use, they provide a much better source of information to use in the decision-making process.

DSS and MIS

In order to better understand a decision support system, let's compare the characteristics of an MIS with those of a DSS:

MIS                                          DSS
Structured decisions                         Semistructured and unstructured decisions
Reports based on routine flows of data       Focused on specific decisions or classes of decisions
General control of organization              End-user control of data, tools, and sessions
Structured information flows                 Emphasizes change, flexibility, and quick responses
Presentation in form of reports              Presentation in form of graphics
Traditional systems development              Greater emphasis on models, assumptions, and ad hoc queries
You can also understand the differences between these two types of systems by understanding the differences in the types of decisions made at the two levels of management.
Lecture-7

Types of Decision-Support Systems

Because of the limitations of hardware and software, early DSS provided executives only limited help. With the increased power of computer hardware, and the sophisticated software available today, DSS can crunch lots more data, in less time, in greater detail, with easy-to-use interfaces. The more detailed data and information executives have to work with, the better their decisions can be.

Model-Driven DSS were isolated from the main Information Systems of the organization and were primarily used for the typical "what-if" analysis: "What if we increase production of our candy bars and decrease the shipment time?" These systems rely heavily on models to help executives understand the impact of their decisions on the organization, its suppliers, and its customers.

Data-Driven DSS take the massive amounts of data available through the company's TPS and MIS and cull from them useful information that executives can use to make more informed decisions. Executives don't have to have a theory or model but can "free-flow" the data. By using data mining, executives can get more information than ever before from their data. One danger in data mining is the problem of getting information that, on the surface, may seem meaningful but that, when put into the context of the organization's needs, simply doesn't provide anything useful. For instance, data mining can tell you that on a hot summer day in the middle of Texas, more bottled water is sold in convenience stores than in grocery stores. That's useful information executives can use to make sure more stock is targeted to convenience stores. Data mining could also reveal that when customers purchase white socks, they also purchase bottled water 62% of the time. We seriously doubt there is any meaningful connection between the two purchases.
The point is that you need to beware of using data mining as a sole source of decision making and make sure your requests are as focused as possible. Laudon and Laudon describe five types of information you can get from data mining customer information:
• Associations: Immediate links between one purchase and another purchase
• Sequences: Phased links; because of one purchase, another purchase will be made at a later time
• Classification: Predicting purchases based on group characteristics and then targeting marketing campaigns
• Clustering: Predicting consumer behavior based on demographic information about the groups to which individuals belong
• Forecasting: Using existing values to determine what other values will be
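The "associations" idea above can be sketched in a few lines. The shopping baskets below are made-up illustration data, not real figures, but the measure is the one the white-socks example describes: of the baskets containing one item, what share also contain the other?

```python
# A toy association measure over invented shopping baskets.
def association_rate(baskets, a, b):
    """Share of baskets containing `a` that also contain `b`."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return 0.0
    with_both = [basket for basket in with_a if b in basket]
    return len(with_both) / len(with_a)

baskets = [
    {"white socks", "bottled water"},
    {"white socks", "bottled water", "chips"},
    {"white socks"},
    {"bottled water"},
]
rate = association_rate(baskets, "white socks", "bottled water")
print(f"{rate:.0%}")  # 67%
```

As the text warns, a high rate proves only co-occurrence, not that one purchase causes the other.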
Components of DSS
A DSS has three main components, as shown in Figure 15.1: the database, software and models. The database is, of course, data collected from the organization's other Information Systems. Another important source of information the organization may use is external data from governmental agencies or research data from universities. The data can be accessed from the warehouse or from a data mart (extraction of data from the warehouse). Many databases are now being maintained on desktop computers instead of mainframes. The DSS software system must be easy to use and adaptable to the needs of each executive. A well-built DSS uses the models that the text describes. You've probably used statistical models in other classes to determine the mean, median, or deviations of data. These statistical models are the basis of data mining. The What-If decisions most commonly made by executives use sensitivity analysis to help them predict what effect their decisions will have on the organization. Executives don't make decisions based solely on intuition. The more information they have, the more they experiment with different outcomes in a safe mode, the better their decisions. That's the benefit of the models used in the software tools.
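The sensitivity-analysis idea can be sketched as a simple what-if model for the candy bar pricing question. The price elasticity and the base sales figures below are invented purely for illustration; a real DSS would draw them from the data warehouse and its statistical models.

```python
# A minimal what-if model: how does revenue respond to a price change
# if demand shifts with a given elasticity? All numbers are invented.
def projected_revenue(price, units, price_change, elasticity):
    """Revenue after a price change, with demand scaled by elasticity."""
    new_price = price * (1 + price_change)
    new_units = units * (1 + elasticity * price_change)
    return new_price * new_units

base = projected_revenue(1.00, 100_000, 0.00, -1.5)   # today's revenue
plus5 = projected_revenue(1.00, 100_000, 0.05, -1.5)  # "what if +5%?"
print(base, round(plus5))
```

Rerunning the model with different elasticities or price changes is exactly the kind of safe experimentation with outcomes the text describes.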
Examples of DSS Applications
• The Advanced Planning System, a manufacturing DSS: Uses the sensitivity-analysis model of "what-if" analysis.
• Southern California Gas Company: Uses classification and clustering data-mining techniques to focus new marketing efforts.
• Shop-Ko Stores: Uses the data-mining technique of associations to recognize customers' purchasing patterns.
• Geographic Information Systems (GIS): Very popular with, of all people, farmers and ranchers. Using GIS tools, they can determine exactly how much fertilizer to spread on their fields without over- or under-spraying. They save money, time, and the land! Because of their pinpoint accuracy, GIS are also used by emergency response teams to help rescue stranded skiers, hikers, and bicyclists.
Web-Based DSS

Of course, no discussion would be complete without information about how companies are using the Internet and the Web in the customer DSS decision-making process. Figure 15.3 shows an Internet CDSS (Customer Decision-Support System).
Here's an example: You decide to purchase a new home and use the Web to search real estate sites. You find the perfect house in a good neighborhood but it seems a little
pricey. You don't know the down payment you'll need. You also need to find out how much your monthly payments will be based on the interest rate you can get. Luckily the real estate Web site has several helpful calculators (customer decision support systems) you can use to determine the down payment, current interest rates available, and the monthly payment. Some customer decision support systems will even provide an amortization schedule. You can make your decision about the purchase of the home or know instantly that you need to find another house.
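Those on-site calculators rest on the standard amortization formula. Here is a minimal sketch; the house price, down payment, and interest rate are made up for illustration.

```python
# Fixed monthly payment for a fully amortizing loan:
#   payment = P * r / (1 - (1 + r)**-n)
# where P is the principal, r the monthly rate, n the number of payments.
def monthly_payment(principal, annual_rate, years):
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # total number of payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

price = 200_000            # hypothetical house price
loan = price - 40_000      # after a 20% down payment
payment = monthly_payment(loan, 0.06, 30)  # assuming 6% for 30 years
print(f"${payment:,.2f} per month")
```

An amortization schedule is just this payment applied month by month, splitting each payment between interest (balance times `r`) and principal.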
Lecture-9

Group Decision-Support Systems

More and more, companies are turning to groups and teams to get work done. Hours upon hours are spent in meetings, in group collaboration, in communicating with many people. To help groups make decisions, a new category of systems was developed: the group decision-support system (GDSS).

What Is a GDSS?

You've been there: a meeting where nothing seemed to get done, where some people dominated the agenda and others never said a word, and which dragged on for hours with no
clear agenda. When it was all over no one was sure what was accomplished, if anything. But the donuts and coffee were good! Organizations have been struggling with this problem for years. They are now using GDSS as a way to increase the efficiency and effectiveness of meetings. The text includes a list of elements that GDSS use to help organizations. We'll highlight a few of them:
• Preplanning: A clear-cut agenda of the topics for the meeting.
• Open, collaborative meeting atmosphere: Free flow of ideas and communications without any of the attendees feeling shy about contributing.
• Evaluation objectivity: Reduces "office politics" and the chance that ideas will be dismissed because of who presented them rather than what was presented.
• Documentation: Clear communication about what took place and what decisions were made by the group.
• Preservation of "organizational memory": Even those unable to attend the meeting will know what took place; great for geographically separated team members.
GDSS Characteristics and Software Tools
In GDSS the hardware includes more than just computers and peripheral equipment. It also includes the conference facilities, audiovisual equipment, and networking equipment that connects everyone. The persware extends to the meeting facilitators and the staff that keeps the hardware operating correctly. As the hardware becomes more sophisticated and widely available, many companies are bypassing specially equipped rooms in favor of having the group participants "attend" the meeting through their individual desktop computers. Many of the software tools and programs discussed in Chapter 14, Groupware, can also be used to support GDSS. Some of these software tools are being reworked to allow people to attend meetings through Intranets or Extranets. Some highlights:
• Electronic questionnaires: Set an agenda and plan ahead for the meeting.
• Electronic brainstorming: Allows all users to participate without fear of reprisal or criticism.
• Questionnaire tools: Gather information even before the meeting begins, so facts and information are readily available.
• Stakeholder identification: Determines the impact of the group's decision.
• Group dictionaries: Reduce the problem of different interpretations.
Now instead of wasting time in meetings, people will know ahead of time what is on the agenda. All of the information generated during the meeting is maintained for future use and reference. Because input is anonymous, ideas are evaluated on their own merit. And for geographically separated attendees, travel time and dollars are saved. Electronic meeting systems make these efficiencies possible. Figure 15.6 shows the sequence of activities at a typical EMS meeting.
All is not perfect with EMS, however. Face-to-face communication is critical for managers and others to gain insight into how people feel about ideas and topics; body language can often speak louder than words. Some people still may not contribute freely because they know that all input is stored on the file server, even though it is anonymous. And the system itself imposes disciplines on the group that members may not like.
How GDSS Can Enhance Group Decision Making
Go back to the previous list of problems associated with meetings and you can see how GDSS solve some of them:
1. Improved preplanning: Forces an agenda to keep the meeting on track.
2. Increased participation: Increases the number of people who can effectively contribute to the meeting.
3. Open, collaborative meeting atmosphere: Nonjudgmental input by all attendees.
4. Criticism-free idea generation: Anonymity can generate more input and better ideas.
5. Evaluation objectivity: The idea itself is evaluated, not the person contributing it.
6. Idea organization and evaluation: Organized input makes it easier to comprehend the results of the meeting.
7. Setting priorities and making decisions: All management levels are on equal footing.
8. Documentation of meetings: Results of the meeting are available soon after for further use and discussion.
9. Access to external information: Reduces the amount of disagreement by having the facts at hand.
Lecture-10
Groupware Technologies
Groupware technology has long been heralded as a way to improve business processes and individual work practices. However, many instantiations of groupware technologies have not met expectations. Some groupware has failed to be adopted by enough individuals in an organization to make its use beneficial. Failure has been attributed in part to deployment problems where the technology was not available to those who could most benefit from it, or required those who would not benefit from it to adopt it.
Electronic calendars/on-line meeting schedulers make good groupware examples for study because they have obvious mappings to real-world artifacts, putting them among the seemingly simplest groupware technologies. Additionally, a 1991 Internet-administered survey found that calendaring systems were the most available groupware technology, although they were the least used. Further, information about their early use is available as a result of investigations by Ehrlich [2,3] and Grudin, who studied organizations, including software development companies, that failed to adopt calendaring systems. Today there are examples where groupware--and calendaring systems in particular--is taking a strong hold. What has changed? Studies of the use of calendaring systems at two sites--Microsoft and Sun Microsystems--reported in Grudin & Palen have revealed several organizational, behavioral and technical factors that enable widespread use. This study has also raised additional research issues about the adaptation of groupware technologies to their organizational environments and individual work practices. Interim findings indicate that social norms and communication behaviors around meeting arranging might be influenced by the amount of information calendars reveal; that tangible artifacts can be born out of technologically supported collaborations which in turn are useful for other purposes; and that there are potentially critical trade-offs between efficiency, information resource creation, and privacy. My dissertation research will refine and elaborate the conditions that facilitate groupware adoption, and investigate the subsequent integration of groupware technology into work practices and the organizational environment.
An expert system, also known as a knowledge-based system, is a computer program that contains the knowledge and analytical skills of one or more human experts in a specific subject. This class of program was first developed by researchers in artificial intelligence during the 1960s and 1970s and applied commercially throughout the 1980s. The primary goal of expert systems research is to make expertise available to decision makers and technicians who need answers quickly. There is never enough expertise to go around -- certainly it is not always available at the right place and the right time. Portable computers loaded with in-depth knowledge of specific subjects can bring decades' worth of knowledge to a problem. The same systems can assist supervisors and managers with situation assessment and long-range planning. Many small systems now exist that bring a narrow slice of in-depth knowledge to a specific problem, and these provide evidence that the broader goal is achievable. These knowledge-based applications of artificial intelligence have enhanced productivity in business, science, engineering, and the military. With advances in the last decade, today's expert system clients can choose from dozens of commercial software packages with easy-to-use interfaces. Each new deployment of an expert system yields valuable data about what works in what context, thus fueling the AI research that provides even better applications.
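At its core, a rule-based expert system is a set of if-then rules plus an inference loop that fires them against known facts. A minimal forward-chaining sketch; the rules and facts are invented for illustration, not drawn from any real system:

```python
# Each rule: (set of conditions, conclusion added when all conditions hold).
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "high_risk_patient"}, "refer_to_doctor"),
]

def infer(facts, rules):
    """Forward chaining: repeatedly fire any rule whose conditions are all met."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)     # rule fires; new fact may enable more rules
                changed = True
    return facts

result = infer({"fever", "cough", "high_risk_patient"}, rules)
print("refer_to_doctor" in result)   # True: two rules chained to reach the conclusion
```

Real expert system shells add explanation facilities, uncertainty handling, and large curated knowledge bases, but the chaining mechanism is the same idea.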
Lecture-11
What Is a Data Warehouse?
A decision support database that is maintained separately from the organization's operational database.
Support information processing by providing a solid platform of consolidated, historical data for analysis.

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." —W. H. Inmon

Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources (e.g., hotel price: currency, tax, whether breakfast is covered, etc.).
When data is moved to the warehouse, it is converted.

Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a "time element".

Data Warehouse—Non-Volatile
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.

Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional online transaction processing applications.
We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse
Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration: query-driven approach. When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set. This involves complex information filtering, and queries compete for resources.
Data warehouse: update-driven, high performance. Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis.

Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
- Major task of traditional relational DBMS
- Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
- Major task of data warehouse system
- Data analysis and decision making
Distinct features (OLTP vs. OLAP):
- User and system orientation: customer vs. market
- Data contents: current, detailed vs. historical, consolidated
- Database design: ER + application vs. star + subject
- View: current, local vs. evolutionary, integrated
- Access patterns: update vs. read-only but complex queries
The Case Against Data Warehousing
Data warehousing systems, for the most part, store historical data that have been generated in internal transaction processing systems. This is a small part of the universe of data available to manage a business, and sometimes this part has limited value. Data warehousing systems can complicate business processes significantly. If most of your business needs are to report on data in one transaction processing system, and/or all the historical data you need are in that system, and/or the data in the system are clean, and/or your hardware can support reporting against the live system data, and/or the structure of the system data is relatively simple, and/or your firm does not have much interest in end-user ad hoc query/report tools, then data warehousing may not be for your business. Data warehousing can have a learning curve that may be too long for impatient firms. Many "strategic applications" of data warehousing have a short life span and require the developers to put together a technically inelegant system quickly; some developers are reluctant to work this way. There is a limited number of people available who have worked with the full data warehousing project "life cycle". Data warehousing systems can require a great deal of "maintenance" which many organizations cannot or will not support.

From Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model which views data in the form of a data cube. A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions:
- Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
- Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
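The cuboid terminology can be made concrete: the lattice of cuboids is simply the set of all subsets of the dimensions, one cuboid per possible group-by. A sketch (the dimension names are illustrative):

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Enumerate every cuboid (group-by subset) of an n-D data cube."""
    cuboids = []
    for k in range(len(dimensions) + 1):
        for combo in combinations(dimensions, k):
            cuboids.append(combo)
    return cuboids

dims = ["item", "time", "location"]   # a 3-D base cube
lattice = cuboid_lattice(dims)
print(len(lattice))    # 2^3 = 8 cuboids
print(lattice[0])      # () -> the apex cuboid (grand total)
print(lattice[-1])     # ('item', 'time', 'location') -> the base cuboid
```

An n-D cube therefore has 2^n cuboids (before dimension hierarchies are considered), which is why full pre-computation of a cube grows so quickly with the number of dimensions.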
Lecture-12
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected to a set of dimension tables
Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations: Multiple fact tables share dimension tables; viewed as a collection of stars, therefore called galaxy schema or fact constellation

Online Transaction Processing (OLTP) Schema
In Online Transaction Processing (OLTP), the database is designed to achieve efficient transactions such as INSERT and UPDATE. This is very different from the OLAP design. Unlike OLAP, normalization is very important to reduce duplicates and also cut down on the size of the data. Your OLTP schema may look like this:

Locations Table
Field Name    Type
Loc_Id        INTEGER (4)
Loc_Code      VARCHAR (5)
Loc_Name      VARCHAR (30)
State_Id      INTEGER (4)
Country_Id    INTEGER (4)

States Table
Field Name    Type
State_Id      INTEGER (4)
State_Name    VARCHAR (50)

Countries Table
Field Name    Type
Country_Id    INTEGER (4)
Country_Name  VARCHAR (50)
In order to query for all locations that are in country 'USA' we will have to join these three tables. The SQL will look like:

SELECT * FROM Locations, States, Countries
WHERE Locations.State_Id = States.State_Id
AND Locations.Country_Id = Countries.Country_Id
AND Country_Name = 'USA'

Dimension Tables - Key Elements of a Dimension Table
Dimensional modeling allows only one table per dimension. But your OLTP data spans multiple tables as described. So we need to de-normalize the OLTP schema and export it into your dimension tables. For example, for the location dimension, you achieve this by joining the three OLTP tables and inserting the data into the single Location Dimension Table. The Location Dimension Table Schema will look like this:

Field Name    Type
Dim_Id        INTEGER (4)
Loc_Code      VARCHAR (4)
Name          VARCHAR (50)
State_Name    VARCHAR (20)
Country_Name  VARCHAR (20)
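The join-then-denormalize step described above can be tried end to end with an in-memory SQLite database. The table layout follows the example; the surrogate-key rule (1000 + Loc_Id) is an invented illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normalized OLTP tables
cur.execute("CREATE TABLE Locations (Loc_Id INTEGER, Loc_Code TEXT, Loc_Name TEXT, State_Id INTEGER, Country_Id INTEGER)")
cur.execute("CREATE TABLE States (State_Id INTEGER, State_Name TEXT)")
cur.execute("CREATE TABLE Countries (Country_Id INTEGER, Country_Name TEXT)")
cur.execute("INSERT INTO Locations VALUES (1, 'IL01', 'Chicago Loop', 1, 1)")
cur.execute("INSERT INTO States VALUES (1, 'Illinois')")
cur.execute("INSERT INTO Countries VALUES (1, 'USA')")

# De-normalize: one join populates the single Location dimension table
cur.execute("CREATE TABLE Location_Dim (Dim_Id INTEGER, Loc_Code TEXT, Name TEXT, State_Name TEXT, Country_Name TEXT)")
cur.execute("""
    INSERT INTO Location_Dim
    SELECT 1000 + Loc_Id, Loc_Code, Loc_Name, State_Name, Country_Name
    FROM Locations
    JOIN States    ON Locations.State_Id   = States.State_Id
    JOIN Countries ON Locations.Country_Id = Countries.Country_Id
""")
print(cur.execute("SELECT * FROM Location_Dim").fetchall())
# -> [(1001, 'IL01', 'Chicago Loop', 'Illinois', 'USA')]
```

After the load, a warehouse query against Location_Dim needs no joins at all, which is the whole point of the dimension table.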
All dimension tables contain a key column called the dimension key. In this example Dim_Id is our dimension Id. This is the unique key into our Location dimension table. The actual data in your Location Table may look like this:

Location Dimension Table Data
Dim_Id  Loc_Code  Name           State_Name        Country_Name
1001    IL01      Chicago Loop   Illinois          USA
1002    IL02      Arlington Hts  Illinois          USA
1003    NY01      Brooklyn       New York          USA
1004    TO01      Toronto        Ontario           Canada
1005    MX01      Mexico City    Distrito Federal  Mexico
We may notice that some of the information is repeated in the above dimension table: the State Name and Country Name are repeated throughout. You may feel that this is a waste of data space and against normalization principles. But in dimensional modeling this type of design makes querying very optimized and reduces query times. We will also learn later that in a typical data warehouse the dimension tables make up only 10 to 15% of the storage; the fact table is by far the largest table and takes up the rest of the storage allocation.

Time Dimension Table
After de-normalization, your Time Dimension Table Schema will look like this:

Field Name       Type
TM_Dim_Id        INTEGER (4)
TM_Month         SMALL INTEGER (2)
TM_Month_Name    VARCHAR (3)
TM_Quarter       SMALL INTEGER (4)
TM_Quarter_Name  VARCHAR (2)
TM_Year          SMALL INTEGER (2)

The actual data in your Time table may look like this:

Time Dimension Table Data
TM_Dim_Id  TM_Month  TM_Month_Name  TM_Quarter  TM_Quarter_Name  TM_Year
1001       1         Jan            1           Q1               2003
1002       2         Feb            1           Q1               2003
1003       3         Mar            1           Q1               2003
1004       4         Apr            2           Q2               2003
1005       5         May            2           Q2               2003
Product Dimension Table
After de-normalization, your Product Dimension Table Schema will look like this:

Field Name  Type
Dim_Id      INTEGER (4)
SKU         VARCHAR (10)
Name        VARCHAR (30)
Category    VARCHAR (30)

In this table Dim_Id is our dimension Id. This is the unique key into our Product dimension table. The actual data in your Product Table may look like this:

Product Dimension Table Data
Dim_Id  SKU       Name               Category
1001    DOVE6K    Dove Soap 6Pk      Sanitary
1002    MLK66F#   Skim Milk 1 Gal    Dairy
1003    SMKSAL55  Smoked Salmon 6oz  Meat
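With Location, Time, and Product dimensions defined, a star schema adds a fact table whose rows carry a measure plus one key into each dimension. The Sales fact rows and dollar amounts below are invented to show the join-and-aggregate pattern:

```python
# Dimension rows (dim_id -> attributes), following the tables above
location = {1001: ("Chicago Loop", "Illinois", "USA"),
            1004: ("Toronto", "Ontario", "Canada")}

# Hypothetical Sales fact table: (loc_dim_id, prod_dim_id, tm_dim_id, dollars_sold)
sales_fact = [
    (1001, 1001, 1001, 120.0),   # soap sold in Chicago in Jan
    (1001, 1002, 1001, 340.0),   # milk sold in Chicago in Jan
    (1004, 1001, 1002, 75.0),    # soap sold in Toronto in Feb
]

# "Dollars sold per country": join each fact row to its location, then aggregate
totals = {}
for loc_id, prod_id, tm_id, dollars in sales_fact:
    country = location[loc_id][2]            # third attribute is Country_Name
    totals[country] = totals.get(country, 0.0) + dollars
print(totals)   # {'USA': 460.0, 'Canada': 75.0}
```

Because the dimension row already carries Country_Name, the query never has to re-join States and Countries the way the OLTP schema would.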
Lecture-13
Data Warehouse Design Process
Top-down, bottom-up approaches or a combination of both:
- Top-down: starts with overall design and planning (mature)
- Bottom-up: starts with experiments and prototypes (rapid)
Typical data warehouse design process:
1. Choose a business process to model, e.g., orders, invoices, etc.
2. Choose the grain (atomic level of data) of the business process
3. Choose the dimensions that will apply to each fact table record
4. Choose the measures that will populate each fact table record

Three Data Warehouse Models
- Enterprise warehouse: collects all of the information about subjects spanning the entire organization
- Data mart: a subset of corporate-wide data that is of value to a specific group of users; its scope is confined to specific, selected groups, such as a marketing data mart; independent vs. dependent (sourced directly from the warehouse) data marts
- Virtual warehouse: a set of views over operational databases; only some of the possible summary views may be materialized
OLAP Server Architectures
Relational OLAP (ROLAP)
- Uses a relational or extended-relational DBMS to store and manage warehouse data, plus OLAP middleware to support the missing pieces
- Includes optimization of the DBMS back end, implementation of aggregation navigation logic, and additional tools and services
- Greater scalability
Multidimensional OLAP (MOLAP)
- Array-based multidimensional storage engine (sparse matrix techniques)
- Fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP)
- Offers user flexibility, e.g., low level: relational; high level: array
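The MOLAP idea of storing only occupied cells (a sparse-matrix technique) and pre-computing summaries can be sketched with a dict keyed by cell coordinates; the dimension values and dollar figures are invented:

```python
# Toy MOLAP-style store: only non-empty cells of the (quarter, country, category)
# cube are kept, keyed by their coordinates -- a sparse representation.
cube = {
    ("Q1", "USA", "Dairy"):    340.0,
    ("Q1", "USA", "Sanitary"): 120.0,
    ("Q2", "Canada", "Dairy"):  75.0,
}

def rollup(cube, axis):
    """Pre-compute a summary cuboid by aggregating away one axis."""
    summary = {}
    for coords, value in cube.items():
        key = tuple(c for i, c in enumerate(coords) if i != axis)
        summary[key] = summary.get(key, 0.0) + value
    return summary

by_country_category = rollup(cube, axis=0)    # summarize away the quarter
print(by_country_category[("USA", "Dairy")])  # 340.0
```

A MOLAP server materializes such summary cuboids ahead of time so that analytical queries become direct lookups rather than scans.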
The Case for Data Warehousing
- To perform server/disk-bound tasks associated with querying and reporting on servers/disks not used by transaction processing systems
- To use data models and/or server technologies that speed up querying and reporting and that are not appropriate for transaction processing
- To provide an environment where relatively little knowledge of the technical aspects of database technology is required to write and maintain queries and reports, and/or to provide a means to speed up the writing and maintaining of queries and reports by technical personnel
- To provide a repository of "cleaned up" transaction processing system data that can be reported against and that does not necessarily require fixing the transaction processing systems
- To make it easier, on a regular basis, to query and report data from multiple transaction processing systems and/or from external data sources and/or from data that must be stored for query/report purposes only
- To provide a repository of transaction processing system data that contains data from a longer span of time than can efficiently be held in a transaction processing system, and/or to be able to generate reports "as was" as of a previous point in time
Lecture-14
Data Warehouse Usage
Three kinds of data warehouse applications:
- Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
- Analytical processing: multidimensional analysis of data warehouse data; supports basic OLAP operations, slice-dice, drilling, pivoting
- Data mining: knowledge discovery from hidden patterns; supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
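The slice and dice operations named above can be illustrated over a tiny cube held as a dict of cells; the dimension values and amounts are invented:

```python
# Cube cells as (quarter, country, category) -> dollars sold
cells = {
    ("Q1", "USA", "Dairy"):    340.0,
    ("Q1", "USA", "Sanitary"): 120.0,
    ("Q1", "Canada", "Dairy"):  60.0,
    ("Q2", "Canada", "Dairy"):  75.0,
}

def slice_cube(cells, axis, value):
    """Slice: fix one dimension at a single value, yielding a sub-cube."""
    return {k: v for k, v in cells.items() if k[axis] == value}

def dice(cells, axis_values):
    """Dice: restrict several dimensions to sets of values at once."""
    return {k: v for k, v in cells.items()
            if all(k[a] in vals for a, vals in axis_values.items())}

q1 = slice_cube(cells, 0, "Q1")                                   # Q1 only
north_dairy = dice(cells, {1: {"USA", "Canada"}, 2: {"Dairy"}})   # Dairy in two countries
print(len(q1), len(north_dairy))   # 3 3
```

Drilling and pivoting are built from the same primitives: drill-down re-expands a summarized axis, and pivot just re-orients which axes form the rows and columns of the report.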
Data explosion problem
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories. We are drowning in data, but starving for knowledge!
Solution: data warehousing and data mining
- Data warehousing and on-line analytical processing
- Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Data Marts
In some data warehouse implementations, a data mart is a miniature data warehouse; in others, it is just one segment of the data warehouse. Data marts are often used to provide information to functional segments of the organization. Typical examples are data marts for the sales department, the inventory and shipping department, the finance department, upper level management, and so on. Data marts can also be used to segment data warehouse data to reflect a geographically compartmentalized business in which each region is relatively autonomous. For example, a large service organization may treat regional operating centers as individual business units, each with its own data mart that contributes to the master data warehouse.

Data marts are sometimes designed as complete individual data warehouses and contribute to the overall organization as a member of a distributed data warehouse. In other designs, data marts receive data from a master data warehouse through periodic updates, in which case the data mart functionality is often limited to presentation services for clients.

Regardless of the functionality provided by data marts, they must be designed as components of the master data warehouse so that data organization, format, and schemas are consistent throughout the data warehouse. Inconsistent table designs, update mechanisms, or dimension hierarchies can prevent data from being reused throughout the data warehouse, and they can result in inconsistent reports from the same data. For example, it is unlikely that summary reports produced from a finance department data mart that organizes the sales force by management reporting structure will agree with summary reports produced from a sales department data mart that organizes the same sales force by geographical region.
It is not necessary to impose one view of data on all data marts to achieve consistency; it is usually possible to design consistent schemas and data formats that permit rich varieties of data views without sacrificing interoperability. For example, the use of a standard format and organization for time, customer, and product data does not preclude data marts from presenting information in the diverse perspectives of inventory, sales, or financial analysis. Data marts should be designed from the perspective that they are components of the data warehouse regardless of their individual functionality or construction. This provides consistency and usability of information throughout the organization. Data Warehouse Architecture
Architecture Design & Project Planning for Business Intelligence, Data Warehouse, and Corporate Performance Management Projects
Examine the data warehouse architecture along the following dimensions:
• Data
• Information
• Technology
• Product

Data
• Define what data is needed to meet business user needs.
• Examine the completeness and correctness of the source systems that are needed to obtain the data.
• Identify the data facts and dimensions.
• Define the logical data models.
• Establish a preliminary aggregation plan.

Information
• Define the framework for the transformation of data into information, from the source systems to the information used by the business users.
• Recommend the data stages necessary for data transformation and information access.
• Develop source-to-target data mapping for each data stage.
• Review data quality procedures and reconciliation techniques.
• Define the physical data models.

Technology
• Define the technical functionality used to build a data warehousing and business intelligence environment.
• Identify available technologies and review the tradeoffs between any overlapping or competing technologies.
• Review the current technical environment and the company's strategic technical directions.
• Recommend the technologies to be used to meet the business requirements and an implementation plan.

Product
• List the product categories needed to implement the technology architecture.
• Review tradeoffs between overlapping or competing product categories.
• Outline implementation of the product architecture in stages.
• Identify a short list of products in each of these categories.
Lecture-15
What Is Data Mining?
Data mining (knowledge discovery in databases): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
Data mining: a misnomer? Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

From On-Line Analytical Processing to On-Line Analytical Mining (OLAM)
Why online analytical mining?
- High quality of data in data warehouses: the DW contains integrated, consistent, cleaned data
- Available information processing infrastructure surrounding data warehouses: ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools
- OLAP-based exploratory data analysis: mining with drilling, dicing, pivoting, etc.
- On-line selection of data mining functions: integration and swapping of multiple mining functions, algorithms, and tasks

Data Mining: On What Kind of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced DB and information repositories: object-oriented and object-relational databases, spatial databases, time-series and temporal data, text and multimedia databases, heterogeneous and legacy databases, WWW
Data Mining Functionalities
Concept description: characterization and discrimination
- Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
- Multi-dimensional vs. single-dimensional association
- age(X, "20..29") ^ income(X, "20..29K") ⇒ buys(X, "PC") [support = 2%, confidence = 60%]
- contains(T, "computer") ⇒ contains(T, "software") [1%, 75%]
Classification and prediction
- Finding models (functions) that describe and distinguish classes or concepts for future prediction, e.g., classify countries based on climate, or classify cars based on gas mileage
- Presentation: decision tree, classification rules, neural network
- Prediction: predict some unknown or missing numerical values
Cluster analysis
- Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
- Clustering principle: maximize the intra-class similarity and minimize the inter-class similarity
Outlier analysis
- Outlier: a data object that does not comply with the general behavior of the data
- Can be considered noise or exception, but is quite useful in fraud detection and rare-events analysis
Trend and evolution analysis
- Trend and deviation: regression analysis
- Sequential pattern mining, periodicity analysis
- Similarity-based analysis
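The support and confidence percentages attached to association rules like those above are simple ratios over transactions: support is the fraction of transactions containing both sides, confidence is the fraction of left-hand-side transactions that also contain the right-hand side. A sketch with invented market baskets:

```python
def support_confidence(transactions, lhs, rhs):
    """support = P(lhs and rhs); confidence = P(rhs | lhs)."""
    n = len(transactions)
    lhs_count  = sum(1 for t in transactions if lhs <= t)          # lhs is a subset of t
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)  # both sides present
    support = both_count / n
    confidence = both_count / lhs_count if lhs_count else 0.0
    return support, confidence

# Invented market-basket transactions
baskets = [
    {"computer", "software", "mouse"},
    {"computer", "software"},
    {"computer", "printer"},
    {"milk", "bread"},
]
s, c = support_confidence(baskets, {"computer"}, {"software"})
print(s, c)   # 0.5 0.666...
```

Association mining algorithms such as Apriori are essentially efficient searches for all rules whose support and confidence exceed user-set thresholds.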
Lecture-16
OLAP Mining: An Integration of Data Mining and Data Warehousing
Coupling of data mining systems with DBMS and data warehouse systems:
- No coupling
- Loose coupling
- Semi-tight coupling
- Tight coupling
On-line analytical mining of data: integration of data mining and OLAP technologies.
Interactive mining of multi-level knowledge: the necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.

Major Issues in Data Mining
Mining methodology and user interaction
- Mining different kinds of knowledge in databases
- Interactive mining of knowledge at multiple levels of abstraction
- Incorporation of background knowledge
- Data mining query languages and ad-hoc data mining
- Expression and visualization of data mining results
- Handling noise and incomplete data
- Pattern evaluation: the interestingness problem
Performance and scalability
- Efficiency and scalability of data mining algorithms
- Parallel, distributed and incremental mining methods
Issues relating to the diversity of data types
- Handling relational and complex types of data
- Mining information from heterogeneous databases and global information systems (WWW)
Issues related to applications and social impacts
- Application of discovered knowledge: domain-specific data mining tools, intelligent query answering, process control and decision making
- Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
- Protection of data security, integrity, and privacy
Data Mining System Architectures
Coupling a data mining system with a DB/DW system:
- No coupling: flat file processing; not recommended
- Loose coupling: fetching data from the DB/DW
- Semi-tight coupling: enhanced DM performance; efficient implementations of a few data mining primitives are provided in the DB/DW system, e.g., sorting, indexing, aggregation, histogram analysis, multiway join, precomputation of some statistical functions
- Tight coupling: a uniform information processing environment; DM is smoothly integrated into the DB/DW system, and mining queries are optimized based on mining query analysis, indexing, query processing methods, etc.
Lecture-17
Types of Text Data Mining
- Keyword-based association analysis
- Automatic document classification
- Similarity detection: cluster documents by a common author; cluster documents containing information from a common source
- Link analysis: unusual correlation between entities
- Sequence analysis: predicting a recurring event
- Anomaly detection: find information that violates usual patterns
- Hypertext analysis: patterns in anchors/links; anchor text correlations with linked objects

Keyword-based association analysis
- Collect sets of keywords or terms that occur frequently together and then find the association or correlation relationships among them
- First preprocess the text data by parsing, stemming, removing stop words, etc.
- Then invoke association mining algorithms: consider each document as a transaction, and view the set of keywords in the document as the set of items in the transaction
- Term-level association mining: no need for human effort in tagging documents; the number of meaningless results and the execution time are greatly reduced

Mining the World-Wide Web
- The WWW is a huge, widely distributed, global information service center for information services (news, advertisements, consumer information, financial management, education, government, e-commerce, etc.), hyperlink information, and access and usage information
- The WWW provides rich sources for data mining
Challenges
- Too huge for effective data warehousing and data mining
- Too complex and heterogeneous: no standards and structure

Web search engines
- Index-based: search the Web, index Web pages, and build and store huge keyword-based indices
- Help locate sets of Web pages containing certain keywords
Deficiencies
- A topic of any breadth may easily contain hundreds of thousands of documents
- Many documents that are highly relevant to a topic may not contain the keywords defining them (polysemy)

Web Mining: a more challenging task
Searches for:
- Web access patterns
- Web structures
- Regularity and dynamics of Web contents
Problems:
- The "abundance" problem
- Limited coverage of the Web: hidden Web sources; the majority of data is in DBMSs
- Limited query interface based on keyword-oriented search
- Limited customization to individual users

Mining the Web's Link Structures
Finding authoritative Web pages: retrieving pages that are not only relevant, but also of high quality, or authoritative on the topic.
Hyperlinks can infer the notion of authority:
- The Web consists not only of pages, but also of hyperlinks pointing from one page to another
- These hyperlinks contain an enormous amount of latent human annotation
- A hyperlink pointing to another Web page can be considered the author's endorsement of that page
Problems with the Web linkage structure:
- Not every hyperlink represents an endorsement; some exist for navigation or for paid advertisements
--If the majority of hyperlinks are for endorsement, the collective opinion will still dominate
-One authority will seldom have its Web page point to its rival authorities in the same field
-Authoritative pages are seldom particularly descriptive
Hub
-A set of Web pages that provides collections of links to authorities
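The hub/authority idea above can be sketched with a HITS-style iteration: a page's authority score sums the hub scores of the pages linking to it, and a page's hub score sums the authority scores of the pages it links to. The tiny link graph and the page names below are made up for illustration; this is a sketch of the mutual-reinforcement idea, not a production link-analysis implementation.

```python
# page -> pages it links to (an invented toy Web graph)
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2", "auth3"],
    "auth1": [],
    "auth2": ["auth1"],
    "auth3": [],
}

pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):  # iterate until the scores stabilize
    # Authority: sum of hub scores of pages that link to this page.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # Hub: sum of authority scores of the pages this page links to.
    hub = {p: sum(auth[q] for q in links[p]) for p in pages}
    # Normalize so the scores do not grow without bound.
    na, nh = max(auth.values()) or 1.0, max(hub.values()) or 1.0
    auth = {p: v / na for p, v in auth.items()}
    hub = {p: v / nh for p, v in hub.items()}

best_authority = max(auth, key=auth.get)
print(best_authority)  # → auth1
```

Here "auth1" comes out on top because every in-link of "auth2" also points to "auth1", plus one more, so the endorsement signal is strictly stronger.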
Lecture-18 Data Mining Applications
Data mining is a young discipline with wide and diverse applications
-There is still a nontrivial gap between general principles of data mining and domain-specific, effective data mining tools for particular applications
Some application domains
-Biomedical and DNA data analysis
-Financial data analysis
-Retail industry
-Telecommunication industry
Biomedical Data Mining and DNA Analysis
-DNA sequences: 4 basic building blocks (nucleotides): adenine (A), cytosine (C), guanine (G), and thymine (T)
-Gene: a sequence of hundreds of individual nucleotides arranged in a particular order
-Humans have around 100,000 genes
-Tremendous number of ways that the nucleotides can be ordered and sequenced to form distinct genes
-Semantic integration of heterogeneous, distributed genome databases
--Current: highly distributed, uncontrolled generation and use of a wide variety of DNA data
--Data cleaning and data integration methods developed in data mining will help
Data Mining for Financial Data Analysis
Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality
Design and construction of data warehouses for multidimensional data analysis and data mining
-View the debt and revenue changes by month, by region, by sector, and by other factors
-Access statistical information such as max, min, total, average, trend, etc.
Loan payment prediction/consumer credit policy analysis
-Feature selection and attribute relevance ranking
-Loan payment performance
-Consumer credit rating
Data Mining for Retail Industry
Retail industry: huge amounts of data on sales, customer shopping history, etc.
Applications of retail data mining
-Identify customer buying behaviors
-Discover customer shopping patterns and trends
-Improve the quality of customer service
-Achieve better customer retention and satisfaction
-Enhance goods consumption ratios
-Design more effective goods transportation and distribution policies
Data Mining for Telecomm. Industry
A rapidly expanding and highly competitive industry with a great demand for data mining
-Understand the business involved
-Identify telecommunication patterns
-Catch fraudulent activities
-Make better use of resources
-Improve the quality of service
Multidimensional analysis of telecommunication data
-Intrinsically multidimensional: calling time, duration, location of caller, location of callee, type of call, etc.
Fraudulent pattern analysis and the identification of unusual patterns
-Identify potentially fraudulent users and their atypical usage patterns
-Detect attempts to gain fraudulent entry to customer accounts
-Discover unusual patterns which may need special attention
Multidimensional association and sequential pattern analysis
-Find usage patterns for a set of communication services by customer group, by month, etc.
-Promote the sales of specific services
-Improve the availability of particular services in a region
Use of visualization tools in telecommunication data analysis
How to choose a data mining system
Commercial data mining systems have little in common
-Different data mining functionality or methodology
-May even work with completely different kinds of data sets
Need a multidimensional view in selection
Data types: relational, transactional, text, time sequence, spatial?
System issues
-Running on only one or on several operating systems?
-A client/server architecture?
-Provide Web-based interfaces and allow XML data as input and/or output?
Data sources
-ASCII text files, multiple relational data sources
-Support ODBC connections (OLE DB, JDBC)?
Data mining functions and methodologies
-One vs. multiple data mining functions
-One vs. a variety of methods per function
--More data mining functions and methods per function provide the user with greater flexibility and analysis power
Coupling with DB and/or data warehouse systems
-Four forms of coupling: no coupling, loose coupling, semitight coupling, and tight coupling
--Ideally, a data mining system should be tightly coupled with a database system
Scalability
-Row (or database size) scalability
-Column (or dimension) scalability
-Curse of dimensionality: it is much more challenging to make a system column scalable than row scalable
Visualization tools
-"A picture is worth a thousand words"
-Visualization categories: data visualization, mining result visualization, mining process visualization, and visual data mining
Data mining query language and graphical user interface
-Easy-to-use, high-quality graphical user interface
-Essential for user-guided, highly interactive data mining
Lecture-19 Examples of Data Mining Systems
IBM Intelligent Miner
-A wide range of data mining algorithms
-Scalable mining algorithms
-Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools
-Tight integration with IBM's DB2 relational database system
SAS Enterprise Miner
-A variety of statistical analysis tools
-Data warehouse tools and multiple data mining algorithms
Microsoft SQL Server 2000
-Integrates DB and OLAP with mining
-Supports the OLE DB for DM standard
SGI MineSet
-Multiple data mining algorithms and advanced statistics
-Advanced visualization tools
Clementine (SPSS)
-An integrated data mining development environment for end-users and developers
-Multiple data mining algorithms and visualization tools
DBMiner (DBMiner Technology Inc.)
-Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
-Efficient association and sequential-pattern mining functions, and a visual classification tool
-Mining both relational databases and data warehouses
Visual Data Mining
Visualization: the use of computer graphics to create visual images which aid in the understanding of complex, often massive representations of data
Visual Data Mining: the process of discovering implicit but useful knowledge from large data sets using visualization techniques
Purpose of Visualization
-Gain insight into an information space by mapping data onto graphical primitives
-Provide a qualitative overview of large data sets
-Search for patterns, trends, structure, irregularities, and relationships among data
-Help find interesting regions and suitable parameters for further quantitative analysis
-Provide a visual proof of computer representations derived
Visual Data Mining & Data Visualization
Integration of visualization and data mining
-Data visualization
-Data mining result visualization
-Data mining process visualization
-Interactive visual data mining
Data visualization
-Data in a database or data warehouse can be viewed
--At different levels of granularity or abstraction
--As different combinations of attributes or dimensions
-Data can be presented in various visual forms
Audio Data Mining
-Uses audio signals to indicate the patterns of data or the features of data mining results
-An interesting alternative to visual mining
-Not to be confused with mining audio (such as music) databases, which is the inverse task of finding patterns in audio data
-Visual data mining may disclose interesting patterns using graphical displays, but it requires users to concentrate on watching patterns
-Instead, transform patterns into sound and music, and listen to pitches, rhythms, tune, and melody in order to identify anything interesting or unusual
Scientific and Statistical Data Mining
There are many well-established statistical techniques for data analysis, particularly for numeric data
-Applied extensively to data from scientific experiments and data from economics and the social sciences
Regression
-Predict the value of a response (dependent) variable from one or more predictor (independent)
variables, where the variables are numeric
-Forms of regression: linear, multiple, weighted, polynomial, nonparametric, and robust
Generalized linear models
-Allow a categorical response variable (or some transformation of it) to be related to a set of predictor variables
-Similar to the modeling of a numeric response variable using linear regression
-Include logistic regression and Poisson regression
Regression trees
-Binary trees used for classification and prediction
-Similar to decision trees: tests are performed at the internal nodes
-The difference is at the leaf level
--In a decision tree, majority voting is performed to assign a class label to the leaf
--In a regression tree, the mean of the objective attribute is computed and used as the predicted value
Analysis of variance
-Analyze experimental data for two or more populations described by a numeric response variable and one or more categorical variables (factors)
Mixed-effect models
-For analyzing grouped data, i.e., data that can be classified according to one or more grouping variables
-Typically describe relationships between a response variable and some covariates in data grouped according to one or more factors
Factor analysis
-Determine which variables are combined to generate a given factor
-E.g., for many psychiatric data, one cannot measure the factor of interest directly, but one can measure other quantities (such as test scores) that reflect it
Discriminant analysis
-Predict a categorical response variable; commonly used in social science
-Attempts to determine several discriminant functions (linear combinations of the independent variables) that discriminate among the groups defined by the response variable
Time series: many methods, such as autoregression, ARIMA (autoregressive integrated moving-average) modeling, and long-memory time-series modeling
Survival analysis
-Predict the probability that a patient undergoing a medical treatment will survive at least to time t (life span prediction)
Quality control
-Display group summary charts
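The leaf-level difference between decision trees and regression trees can be shown directly. The samples below are invented and represent training records that all reached the same leaf after the internal-node tests; the attribute names are illustrative, not from any real data set.

```python
from collections import Counter
from statistics import mean

# Hypothetical training samples that all landed in one leaf.
leaf_samples = [
    {"class_label": "good", "objective": 12.0},
    {"class_label": "good", "objective": 10.0},
    {"class_label": "bad",  "objective": 14.0},
]

# Decision tree: the leaf predicts the majority class label.
labels = [s["class_label"] for s in leaf_samples]
class_prediction = Counter(labels).most_common(1)[0][0]

# Regression tree: the leaf predicts the mean of the objective attribute.
value_prediction = mean(s["objective"] for s in leaf_samples)

print(class_prediction, value_prediction)  # → good 12.0
```

The tree-growing and node-splitting machinery is identical in both cases; only this per-leaf aggregation step changes.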
Lecture-20 Understanding Knowledge
Knowledge can be defined as the "understanding obtained through the process of experience or appropriate study."
Knowledge can also be an accumulation of facts, procedural rules, or heuristics.
o A fact is generally a statement representing a truth about a subject matter or domain.
o A procedural rule is a rule that describes a sequence of actions.
o A heuristic is a rule of thumb based on years of experience.
Intelligence implies the capability to acquire and apply appropriate knowledge.
o Memory indicates the ability to store and retrieve relevant experience at will.
o Learning represents the skill of acquiring knowledge using the method of instruction/study.
Experience relates to the understanding that we develop through our past actions.
Knowledge can develop over time through successful experience, and experience can lead to expertise.
Common sense refers to the natural and mostly unreflective opinions of humans.
Cognitive psychology tries to identify the cognitive structures and processes that closely relate to skilled performance within an area of operation.
It provides a strong background for understanding knowledge and expertise. In general, it is the interdisciplinary study of human intelligence.
The two major components of cognitive psychology are:
o Experimental Psychology: This studies the cognitive processes that constitute human intelligence.
o Artificial Intelligence (AI): This studies the cognition of computer-based intelligent systems.
The process of eliciting and representing experts' knowledge usually involves a knowledge developer and some human experts (domain experts).
In order to gather knowledge from human experts, the developer usually interviews the experts and asks for information regarding a specific area of expertise. It is almost impossible for humans to provide completely accurate reports of their mental processes. Research in the area of cognitive psychology leads to a better understanding of what constitutes knowledge, how knowledge is elicited, and how it should be represented in a corporate knowledge base. Hence, cognitive psychology contributes a great deal to the area of knowledge management.
Data, Information and Knowledge
Data represents unorganized and unprocessed facts.
o Usually data is static in nature.
o It can represent a set of discrete facts about events.
o Data is a prerequisite to information.
o An organization sometimes has to decide on the nature and volume of data that is required for creating the necessary information.
Information
o Information can be considered as an aggregation of data (processed data) which makes decision making easier.
o Information usually has some meaning and purpose.
Knowledge
o By knowledge we mean human understanding of a subject matter that has been acquired through proper study and experience.
o Knowledge is usually based on learning, thinking, and proper understanding of the problem area.
o Knowledge is not information, and information is not data.
o Knowledge is derived from information in the same way information is derived from data.
o We can view it as an understanding of information based on its perceived importance or relevance to a problem area.
o It can be considered as the integration of human perceptive processes that helps them to draw meaningful conclusions.
Figure 1.1: Data, Information, Knowledge and Wisdom Kinds of Knowledge
Deep Knowledge: Knowledge acquired through years of proper experience.
Shallow Knowledge: Minimal understanding of the problem area.
Knowledge as Know-How: Accumulated lessons of practical experience.
Reasoning and Heuristics: Some of the ways in which humans reason are as follows:
o Reasoning by analogy: This indicates relating one concept to another.
o Formal Reasoning: This indicates reasoning by using deductive (exact) or inductive reasoning. Deduction uses major and minor premises. In the case of deductive reasoning, new knowledge is generated by using previously specified knowledge. Inductive reasoning implies reasoning from a set of facts to a general conclusion. Inductive reasoning is the basis of scientific discovery.
A case is knowledge associated with an operational level.
Common Sense: This implies a type of knowledge that almost every human being possesses in varying forms/amounts.
We can also classify knowledge on the basis of whether it is procedural, declarative, semantic, or episodic.
o Procedural knowledge represents the understanding of how to carry out a specific procedure.
o Declarative knowledge is routine knowledge about which the expert is conscious. It is shallow knowledge that can be readily recalled since it consists of simple and uncomplicated information. This type of knowledge often resides in short-term memory.
o Semantic knowledge is highly organized, "chunked" knowledge that resides mainly in long-term memory. Semantic knowledge can include major concepts, vocabulary, facts, and relationships.
o Episodic knowledge represents knowledge based on episodes (experiential information). Each episode is usually "chunked" in long-term memory.
Another way of classifying knowledge is by whether it is tacit or explicit.
o Tacit knowledge usually gets embedded in the human mind through experience.
o Explicit knowledge is that which is codified and digitized in documents, books, reports, spreadsheets, memos, etc.
Expert Knowledge It is the information woven inside the mind of an expert for accurately and quickly solving complex problems.
Knowledge Chunking
o Knowledge is usually stored in an expert's long-term memory as chunks.
o Knowledge chunking helps experts to optimize their memory capacity and enables them to process the information quickly.
o Chunks are groups of ideas that are stored and recalled together as a unit.
Knowledge as an Attribute of Expertise
o In most areas of specialization, insight and knowledge accumulate quickly, and the criteria for expert performance usually undergo continuous change.
o In order to become an expert in a particular area, one is expected to master the necessary knowledge and make significant contributions to the concerned field.
o The unique performance of a true expert can be easily noticed in the quality of decision making.
o True experts are usually found to be more selective about the information they acquire, and they are also better at acquiring information in a less structured situation.
o They can quantify soft information, and can categorize problems on the basis of solution procedures that are embedded in the expert's long-term memory and readily available on recall.
o Hence, they tend to use knowledge-based decision strategies, starting with known quantities to deduce unknowns.
o If a first-cut solution path fails, the expert can trace back a few steps and then proceed again.
o Nonexperts use means-end decision strategies to approach the problem scenario.
o Nonexperts usually focus on goals rather than on the essential features of the task, which makes the task more time consuming and sometimes unreliable.
o Specific individuals are found to consistently perform at higher levels than others, and they are labeled experts.
Thinking and Learning in Humans
Research in the area of artificial intelligence has introduced more structure into human thinking about thinking.
Humans do not necessarily receive and process information in exactly the same way as machines do.
Humans can receive information via seeing, smelling, touching, hearing (sensing), etc., which promotes a way of thinking and learning that is unique to humans.
At the macro level, humans and computers can receive inputs from a multitude of sources. Computers can receive inputs from keyboards, touch screens, etc.
At the micro level, both the human brain and the CPU of a computer receive information as electrical impulses.
The point to note here is that computers must be programmed to do specific tasks. Performing one task does not necessarily transfer to other tasks, as it may do with humans.
Human learning: Humans learn new facts, integrate them in some way they think is relevant, and organize the result to produce the necessary solution, advice, and decision. Human learning can occur in the following ways:
o Learning through Experience.
o Learning by Example.
o Learning by Discovery.
Challenges in KM Systems Development
Changing Organizational Culture:
o Involves changing people's attitudes and behaviours.
Knowledge Evaluation:
o Involves assessing the worth of information.
Knowledge Processing:
o Involves the identification of techniques to acquire, store, process, and distribute information.
o Sometimes it is necessary to document how certain decisions were reached.
Knowledge Implementation:
o An organization should commit to change, learn, and innovate.
o It is important to extract meaning from information that may have an impact on specific missions.
o Lessons learned from feedback can be stored for the future to help others facing similar problem(s).
Lecture-22 Capturing Knowledge
Capturing knowledge involves extracting, analyzing, and interpreting the knowledge that a human expert uses to solve a specific problem.
Explicit knowledge is usually captured in repositories from appropriate documentation, files, etc.
Tacit knowledge is usually captured from experts, and from the organization's stored database(s).
Interviewing is one of the most popular methods used to capture knowledge.
Data mining is also useful, in terms of using intelligent agents that may analyze the data warehouse and come up with new findings.
In KM systems development, the knowledge developer acquires the necessary heuristic knowledge from the experts for building the appropriate knowledge base.
Knowledge capture and knowledge transfer are often carried out through teams (refer to Figure 2.4).
Knowledge capture includes determining feasibility, choosing the appropriate expert, tapping the expert's knowledge, retapping knowledge to plug gaps in the system, and verifying/validating the knowledge base (refer to Table 3.4 on page 76 of your textbook).
Figure 2.4: Matching business strategies with KM strategies
The Role of Rapid Prototyping
In most cases, knowledge developers use an iterative approach for capturing knowledge.
For example, the knowledge developer may start with a prototype (based on the somewhat limited knowledge captured from the expert during the first few sessions). The following can turn the approach into rapid prototyping:
o The knowledge developer explains the preliminary/fundamental procedure based on the rudimentary knowledge extracted from the expert during the past few sessions.
o The expert reacts by making certain remarks.
o While the expert watches, the knowledge developer enters the additional knowledge into the computer-based system (that represents the prototype).
o The knowledge developer again runs the modified prototype and continues adding additional knowledge as suggested by the expert, until the expert is satisfied.
This spontaneous, iterative process of building a knowledge base is referred to as rapid prototyping.
The Role of the Knowledge Developer
The knowledge developer can be considered the architect of the system.
He/she identifies the problem domain, captures knowledge, writes/tests the heuristics that represent knowledge, and co-ordinates the entire project.
Some necessary attributes of a knowledge developer:
o Communication skills.
o Knowledge of knowledge capture tools/technology.
o Ability to work in a team with professionals/experts.
o Tolerance for ambiguity.
o Ability to think conceptually.
o Ability to frequently interact with the champion, knowledge workers, and knowers in the organization.
Figure 2.5: Knowledge Developer's Role
Designing the KM Blueprint
This phase indicates the beginning of designing the IT infrastructure/knowledge management infrastructure. The KM blueprint (KM system design) addresses a number of issues:
Aiming for system interoperability/scalability with the existing IT infrastructure of the organization.
Finalizing the scope of the proposed KM system.
Deciding about the necessary system components.
Developing the key layers of the KM architecture to meet the organization's requirements. These layers are:
o User interface
o Authentication/security layer
o Collaborative agents and filtering
o Application layer
o Transport internet layer
o Physical layer
o Repositories
Knowledge update can mean creating new knowledge based on ongoing experience in a specific domain, and then using the new knowledge in combination with the existing knowledge to come up with updated knowledge for knowledge sharing.
Knowledge can be created through teamwork (refer to Figure 3.1).
A team can commit to perform a job over a specific period of time.
A job can be regarded as a series of specific tasks carried out in a specific order.
When the job is completed, the team compares the experience it had initially (while starting the job) to the outcome (successful/disappointing).
This comparison translates experience into knowledge.
While performing the same job in the future, the team can take corrective steps and/or modify its actions based on the new knowledge it has acquired.
Over time, experience usually leads to expertise, where one team (or individual) can be known for handling a complex problem very well.
This knowledge can be transferred to others in a reusable format.
Figure 3.1: Knowledge Creation/Knowledge Sharing via Teams
There exist factors that encourage (or retard) knowledge transfer.
Personality is one factor in knowledge sharing. For example, extroverted people usually possess self-confidence, feel secure, and tend to share experiences more readily than introverted, self-centered, and security-conscious people.
People with positive attitudes, who usually trust others and who work in environments conducive to knowledge sharing, tend to be better at sharing knowledge.
Vocational reinforcers are a key to knowledge sharing. People whose vocational needs are sufficiently met by job reinforcers are usually found to be more likely to favour knowledge sharing than people who are deprived of one or more reinforcers.
Figure 3.2: Impediments to Knowledge Sharing
Capturing the Tacit Knowledge
Knowledge capture can be defined as the process by which the expert's thoughts and experiences are captured.
In this case, the knowledge developer collaborates with the expert in order to convert the expertise into the necessary program code(s).
Important steps:
o Using appropriate tools for eliciting information.
o Interpreting the elicited information and consequently inferring the expert's underlying knowledge/reasoning process.
o Finally, using the interpretation to construct the necessary rules which can represent the expert's reasoning process.
Fuzzy Reasoning & Quality of Knowledge Capture
Sometimes the information gathered from experts via interviewing is not precise; it involves fuzziness and uncertainty.
The fuzziness may increase the difficulty of translating the expert's notions into applicable rules.
Analogies/Uncertainties:
o In the course of explaining events, experts can use analogies (comparing a problem with a similar problem which was encountered in possibly different settings, months or years ago).
o An expert's knowledge or expertise represents the ability to take uncertain information as input and to use a plausible line of reasoning to clarify the fuzzy details.
o Belief, an aspect of uncertainty, tends to describe the level of credibility.
o People may use different kinds of words in order to express belief.
o These words are often paired with qualifiers such as "highly" or "extremely."
Understanding experience:
o Knowledge developers can benefit from an understanding of cognitive psychology.
o When a question is asked, an expert operates on certain stored information through deductive, inductive, or other kinds of problem-solving methods.
o The resulting answer is often the culmination of the processing of stored information.
o The right question usually evokes the memory of experiences that produced good and appropriate solutions in the past.
o Sometimes, how quickly an expert responds to a question depends on the clarity of the content, whether the content has been recently used, and how well the expert has understood the question.
Problem with language: How well an expert can represent internal processes varies with his/her command of the language being used and with the knowledge developer's interviewing skills.
The language may be unclear in a number of ways:
o Comparative words (e.g., better, faster) are sometimes left hanging.
o Specific words or components may be left out of an explanation.
o Absolute words and phrases may be used loosely.
o Some words always seem to have a built-in ambiguity.
Interviewing as a Tacit Knowledge Capture Tool
Advantages of using interviewing as a tacit knowledge capture tool:
o It is a flexible tool.
o It is excellent for evaluating the validity of information.
o It is very effective for eliciting information regarding complex matters.
o People often enjoy being interviewed.
Interviews can range from the highly unstructured type to the highly structured type.
o The unstructured types are difficult to conduct, and they are used when the knowledge developer really needs to explore an issue.
o The structured types are goal-oriented, and they are used when the knowledge developer needs specific information. Structured questions can be of the following types:
--Multiple-choice questions.
--Dichotomous questions.
--Ranking scale questions.
o In the semistructured types, the knowledge developer asks predefined questions, but he/she allows the expert some freedom in expressing his/her answers.
Guidelines for successful interviewing:
o Setting the stage and establishing rapport.
o Phrasing questions.
o Listening closely/avoiding arguments.
o Evaluating the session outcomes.
Reliability of the information gathered from experts: some uncontrolled sources of error can reduce the information's reliability:
o The expert's perceptual slant.
o The failure on the expert's part to exactly remember what has happened.
o Fear of the unknown on the part of the expert.
o Problems with communication.
o Role bias.
Errors on the part of the knowledge developer: validity problems are often caused by the interviewer effect (something about the knowledge developer colours the response of the expert). Some of these effects can be as follows:
o Gender effect
o Age effect
o Race effect
Problems encountered during interviewing:
o Response bias.
o Inconsistency.
o Problems with communication.
o Hostile attitude.
o Standardizing the questions.
o Setting the length of the interview.
Process of ending the interview:
o The end of the session should be carefully planned.
o One procedure calls for the knowledge developer to halt the questioning a few minutes before the scheduled ending time, and to summarize the key points of the session.
o This allows the expert to comment and to schedule a future session.
o Many verbal/nonverbal cues can be used for ending the interview.
Issues: Many issues may arise during the interview; to be prepared for the most important ones, the knowledge developer can consider the following questions:
o How would it be possible to elicit knowledge from experts who cannot say what they mean or cannot mean what they say?
o How to set up the problem domain?
o How to deal with uncertain reasoning processes?
o How to deal with difficult relationships with the expert(s)?
o How to deal with the situation when the expert does not like the knowledge developer for some reason?
Rapid prototyping in interviews:
o Rapid prototyping is an approach to building KM systems in which knowledge is added with each knowledge capture session.
o This is an iterative approach which allows the expert to verify the rules as they are built during the session.
o This approach can open up communication through its demonstration of the KM system.
o Due to the process of instant feedback and modification, it reduces the risk of failure.
o It allows the knowledge developer to learn each time a change is incorporated in the prototype.
o This approach is highly interactive.
o However, the prototype can create user expectations which in turn can become obstacles to further development effort.
Some Knowledge Capturing Techniques
On-Site Observation (Action Protocol)
It is a process which involves observing, recording, and interpreting the expert's problem-solving process while it takes place.
The knowledge developer does more listening than talking; avoids giving advice and usually does not pass his/her own judgment on what is being observed, even if it seems incorrect; and most of all, does not argue with the expert while the expert is performing the task.
Compared to the process of interviewing, on-site observation brings the knowledge developer closer to the actual steps, techniques, and procedures used by the expert.
One disadvantage is that some experts do not like the idea of being observed. The reaction of other people (in the observation setting) can also be a problem, causing distraction. Another disadvantage is the accuracy/completeness of the captured knowledge.
Brainstorming
It is an unstructured approach towards generating ideas about the creative solution of a problem which involves multiple experts in a session.
In this case, questions can be raised for clarification, but no evaluations are done on the spot.
Similarities (that emerge through opinions) are usually grouped together logically and evaluated by asking questions like:
o What benefits are to be gained if a particular idea is followed?
o What specific problems can that idea possibly solve?
o What new problems can arise through this?
The general procedure for conducting a brainstorming session:
o Introducing the session.
o Presenting the problem to the experts.
o Prompting the experts to generate ideas.
o Looking for signs of possible convergence.
If the experts are unable to agree on a specific solution, the knowledge developer may call for a vote/consensus.
o o o o
Electronic Brainstorming

• It is a computer-aided approach for dealing with multiple experts.
• It usually begins with a pre-session plan which identifies objectives and structures the agenda, which is then presented to the experts for approval.
• During the session, each expert sits at a PC, engages in a predefined approach towards resolving an issue, and then generates ideas. This allows experts to present their opinions through their PCs without having to wait for their turn.
• Usually the comments/suggestions are displayed electronically on a large screen without identifying the source. This approach protects introverted experts and prevents tagging comments to individuals.
• The benefits include improved communication, effective discussion of sensitive issues, and closing the meeting with concise recommendations for necessary action (refer to Figure 5.1 for the sequence of steps).
• This eventually leads to convergence of ideas and helps to set final specifications. The result is usually joint ownership of the solution.
Protocol Analysis (Think-Aloud Method)
• Protocols (scenarios) are collected by asking experts to solve a specific problem and verbalize their decision process, stating directly what they think. Knowledge developers do not interrupt in the interim.
• The elicited information is structured later, when the knowledge developer analyzes the protocol.
• Here the term scenario refers to a detailed and somewhat complex sequence of events, or more precisely, an episode. A scenario can involve individuals and objects.
• A scenario provides a concrete vision of how some specific human activity can be supported by information technology.
Consensus Decision Making
• Consensus decision making usually follows brainstorming.
• It is effective only if each expert has been provided with an equal and adequate opportunity to present their views.
• In order to arrive at a consensus, the knowledge developer conducting the exercise tries to rally the experts towards one or two alternatives.
• The knowledge developer follows a procedure designed to ensure fairness and standardization.
• This method is democratic in nature.
• It can sometimes be tedious and can take hours.
Lecture-26 Repertory Grid
• The repertory grid is a tool used for knowledge capture.
• The domain expert classifies and categorizes a problem domain using his/her own model.
• The grid is used for capturing and evaluating the expert's model.
• Two experts in the same problem domain may produce distinct results, since the grids are personal and subjective.
• The grid is a scale (or a bipolar construct) on which elements can be placed within gradations.
• The knowledge developer usually elicits the constructs and then asks the domain expert to provide a set of examples called elements.
• Each element is rated according to the constructs which have been provided.
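As a small illustration of the idea above, a grid can be sketched as a matrix of elements rated against bipolar constructs; elements the expert rates similarly end up close together. The constructs, elements, and ratings here are invented for illustration, not taken from the text.

```python
# Hypothetical repertory grid: each element gets one rating (1-5)
# per bipolar construct. All names and numbers are invented examples.
constructs = ["risky <-> safe", "cheap <-> expensive", "slow <-> fast"]
grid = {
    "bonds":  [5, 2, 2],   # one rating per construct
    "stocks": [2, 3, 4],
    "funds":  [4, 3, 3],
}

def distance(a, b):
    """City-block distance between two elements' rating vectors."""
    return sum(abs(x - y) for x, y in zip(grid[a], grid[b]))

# Elements the expert treats alike have a small distance.
print(distance("bonds", "funds"))   # 3
print(distance("bonds", "stocks"))  # 6
```

Comparing distances like this is one simple way the knowledge developer can surface which elements the expert's subjective model groups together.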
Nominal Group Technique (NGT)
• NGT provides an interface between consensus and brainstorming.
• Here the panel of experts becomes a "nominal group" whose meetings are structured in order to pool individual judgment effectively.
• Ideawriting is a structured group approach used for developing ideas as well as exploring their meaning; the net result is usually a written report.
• NGT is an ideawriting technique.
Delphi Method

• It is a survey of experts in which a series of questionnaires is used to pool the experts' responses for solving a specific problem.
• Each expert's contributions are shared with the rest of the experts by using the results from each questionnaire to construct the next questionnaire.
Concept Mapping

• It is a network of concepts consisting of nodes and links. A node represents a concept, and a link represents the relationship between concepts (refer to Figure 6.5 on page 172 of your textbook).
• Concept mapping is designed to transform new concepts/propositions into the existing cognitive structures related to knowledge capture.
• It is a structured conceptualization.
• It is an effective way for a group to function without losing individuality.
• Concept mapping can be done for several reasons:
o To design complex structures.
o To generate ideas.
o To communicate ideas.
o To diagnose misunderstanding.
• Six-step procedure for using a concept map as a tool:
o Preparation.
o Idea generation.
o Statement structuring.
o Representation.
o Interpretation.
o Utilization.

Semantic Nets

• Similar to a concept map, a semantic net is a collection of nodes linked together to form a net.
o A knowledge developer can graphically represent descriptive/declarative knowledge through a net.
o Each idea of interest is usually represented by a node; the lines linking nodes (called arcs) show the relationships between them.
o Fundamentally it is a network of concepts and relationships.
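A semantic net of nodes and labelled arcs can be sketched in a few lines of code; the concepts and link labels below are illustrative examples, not from the text.

```python
# Minimal semantic net: nodes are concepts, labelled arcs are the
# relationships between them. Triples are (source, link, destination).
net = [
    ("canary", "is-a", "bird"),
    ("bird", "is-a", "animal"),
    ("bird", "can", "fly"),
]

def related(node):
    """Return every (link, concept) pair leaving the given node."""
    return [(link, dst) for src, link, dst in net if src == node]

print(related("bird"))  # [('is-a', 'animal'), ('can', 'fly')]
```

Storing the net as plain triples like this keeps the declarative knowledge easy to query and extend, which is the practical appeal of the representation.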
Blackboarding

• The experts work together to solve a specific problem using the blackboard as their workspace.
• Each expert gets an equal opportunity to contribute to the solution via the blackboard.
• It is assumed that all participants are experts, but they may have acquired their individual expertise in situations different from those of the other experts in the group.
• The process of blackboarding continues till the solution has been reached.
• Characteristics of a blackboard system:
o Diverse approaches to problem-solving.
o Common language for interaction.
o Efficient storage of information.
o Flexible representation of information.
o Iterative approach to problem-solving.
o Organized participation.
• Components of a blackboard system:
o The knowledge sources (KSs): Each KS is an independent expert observing the status of the blackboard and trying to contribute a higher-level partial solution, based on the knowledge it has and how well that knowledge applies to the current blackboard state.
o The blackboard: A global memory structure, a database, or a repository that can store all partial solutions and other necessary data that are presently in various stages of completion.
o A control mechanism: It coordinates the pattern and flow of the problem solution. The inference engine and the knowledge base are part of the blackboard system.
• This approach is useful in situations involving multiple expertise, diverse knowledge representations, or uncertain knowledge representation.
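The three components above (knowledge sources, blackboard, control mechanism) can be sketched as a toy program. The temperature-reading scenario and both knowledge sources are invented for illustration; the point is only the architecture: independent sources each add a partial solution when their knowledge applies, and a control loop repeats until nothing new can be contributed.

```python
# Toy blackboard system (scenario and rules are invented examples).
blackboard = {"reading": 101.3}   # raw input, degrees Fahrenheit

def ks_convert(bb):
    """Knowledge source 1: contributes a unit conversion."""
    if "reading" in bb and "celsius" not in bb:
        bb["celsius"] = (bb["reading"] - 32) / 1.8

def ks_classify(bb):
    """Knowledge source 2: contributes an interpretation."""
    if "celsius" in bb and "status" not in bb:
        bb["status"] = "fever" if bb["celsius"] > 37.5 else "normal"

# Control mechanism: keep offering the blackboard to every KS until
# no source can add anything more (the solution is complete).
sources = [ks_classify, ks_convert]   # deliberately "wrong" order
changed = True
while changed:
    before = dict(blackboard)
    for ks in sources:
        ks(blackboard)
    changed = blackboard != before

print(blackboard["status"])  # fever
```

Note that the listing order of the sources does not matter: ks_classify simply does nothing until ks_convert has posted the partial solution it needs, which mirrors how experts at a real blackboard wait for usable intermediate results.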
Knowledge Codification

• Knowledge codification means converting tacit knowledge to explicit knowledge in a form usable by the organization's members.
• Tacit knowledge (e.g., human expertise) is identified and leveraged through a form that is able to produce the highest return for the business.
• Explicit knowledge is organized, categorized, indexed, and accessed. The organizing often uses decision trees, decision tables, etc.
• Codification must be done in a form/structure which will eventually build the knowledge base.
• The resulting knowledge base supports training and decision making in areas such as:
o Diagnosis.
o Training/instruction.
o Interpretation.
o Prediction.
o Planning/scheduling.
• The knowledge developer should note the following points before initiating knowledge codification:
o Recorded knowledge is often difficult to access (because it is either fragmented or poorly organized).
o Diffusion of new knowledge is too slow.
o Knowledge is not shared, but hoarded (which can have political implications).
o Often knowledge is not found in the proper form.
o Often knowledge is not available at the time when it is needed.
o Often knowledge is not present in the location where it should be.
o Often the knowledge is found to be incomplete.
Modes of Knowledge Conversion
• Conversion from tacit to tacit knowledge produces socialization, where the knowledge developer looks for experience in the case of knowledge capture.
• Conversion from tacit to explicit knowledge involves externalizing, explaining, or clarifying tacit knowledge via analogies, models, or metaphors.
• Conversion from explicit to tacit knowledge involves internalizing (fitting explicit knowledge to tacit knowledge).
• Conversion from explicit to explicit knowledge involves combining, categorizing, reorganizing, or sorting different bodies of explicit knowledge to lead to new knowledge.
An organization must focus on the following before codification:
o What organizational goals will the codified knowledge serve?
o What knowledge exists in the organization that can address these goals?
o How useful is the existing knowledge for codification?
o How would someone codify knowledge?
Codifying tacit knowledge in its entirety in a knowledge base or repository is often difficult, because such knowledge is usually developed and internalized in the minds of human experts over a long period of time.
Decision Tables

A decision table is another technique used for knowledge codification. It consists of conditions, rules, and actions.
A phonecard company sends out monthly invoices to permanent customers and gives them a discount if payment is made within two weeks. Their discounting policy is as follows: "If the amount of the order of phone cards is greater than $35, subtract 5% of the order; if the amount is greater than or equal to $20 and less than or equal to $35, subtract a 4% discount; if the amount is less than $20, do not apply any discount." We shall develop a decision table for their discounting decisions, where the condition alternatives are 'Yes' and 'No'.
Figure 6.2: Example: Decision Table
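The same discounting policy can be sketched in code as a table-driven routine: each row pairs a condition with an action, just as in the decision table. The function name and structure are my own, not from the text.

```python
# The phonecard discounting policy as a decision table:
# each row is (condition, action), checked in order.
decision_table = [
    (lambda amt: amt > 35,        0.05),  # order > $35      -> 5% discount
    (lambda amt: 20 <= amt <= 35, 0.04),  # $20 <= order <= $35 -> 4% discount
    (lambda amt: amt < 20,        0.00),  # order < $20      -> no discount
]

def discount_rate(amount):
    """Return the discount rate for a given order amount."""
    for condition, rate in decision_table:
        if condition(amount):
            return rate

print(discount_rate(50))  # 0.05
print(discount_rate(25))  # 0.04
print(discount_rate(10))  # 0.0
```

Keeping the rows in a data structure rather than hard-coded if-statements is what makes the table a codification: the policy can be read, audited, and changed without touching the control logic.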
Decision Trees

A decision tree is also a knowledge codification technique. It is usually a hierarchically arranged semantic network.
A decision tree for the phonecard company discounting policy (as discussed above) is shown next.
Figure 6.3: Example: Decision Tree
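To show the hierarchical arrangement in code, the same policy can be sketched as a nested tree whose internal nodes ask questions and whose leaves hold the discount actions; the node layout is my own illustration.

```python
# The phonecard discounting policy as a decision tree: internal nodes
# are questions, leaves are the discount actions.
tree = {
    "question": lambda amt: amt > 35,
    "yes": 0.05,                          # leaf: 5% discount
    "no": {
        "question": lambda amt: amt >= 20,
        "yes": 0.04,                      # leaf: 4% discount
        "no": 0.00,                       # leaf: no discount
    },
}

def walk(node, amount):
    """Descend the tree, answering each question, until a leaf is hit."""
    if not isinstance(node, dict):        # reached a leaf
        return node
    branch = "yes" if node["question"](amount) else "no"
    return walk(node[branch], amount)

print(walk(tree, 25))  # 0.04
```

Compared with the flat decision table, the tree makes the order of the questions explicit, which is exactly the hierarchical structure the figure depicts.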
Frames

• A frame is a codification scheme used for organizing knowledge through previous experience.
• It deals with a combination of declarative and operational knowledge.
• Key elements of frames:
o Slot: A specific object being described, or an attribute of an entity.
o Facet: The value of an object/slot.
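A frame with slots and facets can be sketched as nested dictionaries. The "car" frame, its slots, and the procedural facet below are invented examples; the point is how declarative values and operational knowledge sit together in one structure.

```python
# A frame as nested dictionaries: each slot describes one attribute,
# and each slot's facets hold its value plus operational knowledge.
car_frame = {
    "make":   {"value": "unknown"},
    "wheels": {"value": 4, "range": (3, 6)},          # declarative facets
    "speed":  {"value": 0,
               "if-changed": lambda v: f"speed is now {v}"},  # procedural facet
}

def set_slot(frame, slot, value):
    """Update a slot's value and fire its if-changed facet, if any."""
    frame[slot]["value"] = value
    facet = frame[slot].get("if-changed")
    return facet(value) if facet else None

print(set_slot(car_frame, "speed", 60))  # speed is now 60
```

The `if-changed` facet is the operational half of the frame: changing a value can trigger behaviour, while facets like `range` remain purely declarative.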
Lecture-29 & 30 Production Rules
• Production rules are conditional statements specifying an action to be taken if a certain condition is true.
• They codify knowledge in the form of premise-action pairs.
• Syntax: IF (premise) THEN (action)
• Example: IF income is 'standard' AND payment history is 'good', THEN 'approve home loan'.
• In knowledge-based systems, rules are based on heuristics or experiential reasoning.
• Rules can incorporate certain levels of uncertainty.
• A certainty factor is synonymous with a confidence level, which is a subjective quantification of an expert's judgment.
• The premise is a Boolean expression that must evaluate to true for the rule to be applied.
• The action part of the rule is separated from the premise by the keyword THEN.
• The action clause consists of a statement or a series of statements separated by ANDs or commas, and is executed if the premise is true.
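The home-loan rule above, together with a certainty factor, can be sketched as a tiny rule interpreter. The second rule and both certainty-factor values are invented to round out the example.

```python
# Premise-action rules following the IF/THEN syntax above, each with
# a certainty factor (cf values here are illustrative).
rules = [
    {"premise": lambda f: f["income"] == "standard" and f["history"] == "good",
     "action": ("decision", "approve home loan"),
     "cf": 0.85},
    {"premise": lambda f: f["history"] == "poor",       # invented second rule
     "action": ("decision", "reject home loan"),
     "cf": 0.90},
]

def apply_rules(facts):
    """Fire the first rule whose premise evaluates to true."""
    for rule in rules:
        if rule["premise"](facts):
            key, value = rule["action"]
            facts[key] = value            # execute the action clause
            return value, rule["cf"]
    return None, 0.0                      # no premise was true

facts = {"income": "standard", "history": "good"}
print(apply_rules(facts))  # ('approve home loan', 0.85)
```

Each premise is a Boolean expression over the facts, and the returned certainty factor carries the expert's subjective confidence along with the conclusion.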
In knowledge-based systems, planning involves:
• Breaking the entire system into manageable modules.
• Considering partial solutions and linking them through rules and procedures to arrive at a final solution.
• Deciding on the programming language(s).
• Deciding on the software package(s).
• Testing and validating the system.
• Developing the user interface.
• Promoting clarity and flexibility; making rules clear.
• Reducing unnecessary risk.
Role of inferencing:
• Inferencing is the process of deriving a conclusion from statements that only imply that conclusion.
• An inference engine is a program that manages the inferencing strategies.
• Reasoning is the process of applying knowledge to arrive at conclusions.
o Reasoning depends on premises as well as on general knowledge.
o People usually draw informative conclusions.
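Deriving conclusions from statements that only imply them can be sketched as a simple forward-chaining loop; the toy rule set about birds is my own example, not from the text.

```python
# Forward chaining: repeatedly apply rules of the form
# "if all premises hold, conclude X" until no new facts appear.
rules = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_fly"}, "migrates"),
]

def infer(facts):
    """Return the closure of the given facts under the rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)     # a derived conclusion
                changed = True
    return facts

print(sorted(infer({"has_feathers", "can_fly"})))
```

Note that "migrates" is never stated directly; it is only implied through the intermediate conclusion "is_bird", which is exactly the kind of chained derivation an inference engine manages.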
Case-Based Reasoning

• Case-based reasoning is reasoning from relevant past cases, similar to the way humans use past experiences to arrive at conclusions.
• It is a technique that records and documents cases and then searches the appropriate cases to determine their usefulness in solving new cases presented to the expert.
• The aim is to bring up the most similar historical case that matches the present case.
• Adding new cases and reclassifying the case library usually expands knowledge.
• A case library may require considerable database storage as well as an efficient retrieval system.
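The retrieval step (bringing up the most similar historical case) can be sketched with a toy case library; the medical cases and the choice of Jaccard similarity are illustrative assumptions.

```python
# Case-based retrieval sketch: rank stored cases by feature overlap
# with the new case (case library and features are invented).
case_library = [
    {"symptoms": {"fever", "cough"},         "diagnosis": "flu"},
    {"symptoms": {"sneezing", "runny_nose"}, "diagnosis": "cold"},
    {"symptoms": {"fever", "rash"},          "diagnosis": "measles"},
]

def most_similar(new_symptoms):
    """Return the stored case with the highest Jaccard similarity."""
    def score(case):
        s = case["symptoms"]
        return len(s & new_symptoms) / len(s | new_symptoms)
    return max(case_library, key=score)

print(most_similar({"fever", "cough", "headache"})["diagnosis"])  # flu
```

A real case library would also adapt the retrieved solution and store the new case afterwards, which is how adding and reclassifying cases expands the knowledge over time.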
Intelligent Agents

• An intelligent agent is program code which is capable of performing autonomous action in a timely fashion.
• Agents can exhibit goal-directed behaviour by taking the initiative.
• They can be programmed to interact with other agents or with humans by using an agent communication language.
• In terms of knowledge-based systems, an agent can be programmed to learn from the user's behaviour and deduce future behaviour in order to assist the user.
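A toy version of such an agent can be sketched as follows: it observes which user action tends to follow which, then predicts the next step. The action names and the simple counting scheme are invented for illustration.

```python
# Toy assisting agent: learns from observed user behaviour by counting
# which action follows each action, then predicts the next step.
from collections import defaultdict

class Agent:
    def __init__(self):
        self.follows = defaultdict(lambda: defaultdict(int))
        self.last = None

    def observe(self, action):
        """Record one user action, updating the transition counts."""
        if self.last is not None:
            self.follows[self.last][action] += 1
        self.last = action

    def predict(self, action):
        """Deduce the most likely next action after the given one."""
        options = self.follows[action]
        return max(options, key=options.get) if options else None

agent = Agent()
for a in ["open", "edit", "save", "open", "edit", "save"]:
    agent.observe(a)
print(agent.predict("edit"))  # save
```

Even this minimal counting agent captures the pattern: it learns autonomously from behaviour it observes and uses that to anticipate the user, which is the assisting role described above.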