Contents:
1) What is a Data Model
2) The History of Data Modeling
3) Data Modeling Explained
4) Data Warehouse Glossary
5) Entity Relationship Model
6) Physical Data Models
7) Tips for Mastering Data Modeling
8) Role of Data Modeling within Enterprise Management
9) Data Modeling Overview
10) Data Modeling Introduction
11) Connection between Data Model and Data Warehouse
The idea of object models was introduced without acknowledgment of the fact that systems analysts had already discovered similar models. In the 1980s, a significantly new approach to data modeling was engineered by G.M. Nijssen. Named NIAM, short for Nijssen's Information Analysis Methodology, it has since been renamed ORM, or object role modeling. Its purpose is to represent relationships directly instead of showing entity types as analogs of relational tables. With its focus on using language to make data modeling accessible to a wider audience, ORM has a much higher potential for describing business rules and constraints.

In recent times, agile methodologies have risen to the forefront of data modeling. Central to these methodologies is the concept of evolutionary design. From this standpoint, when confronted with a system's requirements, you acknowledge that you cannot fix them all up front. It is not practical to have a detailed design phase at the very beginning of a project; instead, the system's design must evolve throughout the software's numerous iterations. People tend to learn by experimenting, by trying new things out, and evolutionary design recognizes this key component of human nature. Under this concept, developers are expected to experiment with ways of implementing a particular feature, and it may take them several tries before they arrive at a settled method. It is the same for database design. Some experts claim that it is virtually impossible to manage multiple databases, but others claim that a database manager can oversee a hundred database instances with relative ease, given the right tools.

While experimentation is important, it is also important to bring the different approaches one has tried back together into an integrated whole every once in a while. For this, you need a shared master database out of which all work flows. When beginning a task, copy the master into your own workspace; then you can manipulate it and enter the changes back into the master copy. Be sure to integrate at least once a day. Integrating constantly is key: it is a lot easier to perform small integrations frequently than to perform large integrations occasionally, because the difficulty of an integration tends to increase exponentially with its size. In other words, making many small changes is a lot easier in practice, even if it might not seem so at first. The Software Configuration Management community has noted the similarity of this practice to the way source code is handled.
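The copy-then-integrate workflow just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the file names and the example schema change are hypothetical, and SQLite's backup API stands in for whatever copying mechanism a real team would use.

```python
import sqlite3

MASTER = "master.db"    # shared master database (hypothetical file name)
SANDBOX = "sandbox.db"  # a developer's private working copy

def copy_master_to_sandbox():
    """Begin a task by cloning the shared master into a private sandbox."""
    src = sqlite3.connect(MASTER)
    dst = sqlite3.connect(SANDBOX)
    src.backup(dst)  # consistent full copy via SQLite's backup API
    src.close()
    dst.close()

def experiment_in_sandbox():
    """Try a schema change privately; the master remains untouched."""
    with sqlite3.connect(SANDBOX) as db:
        # Hypothetical change: a 'customer' table is assumed to exist.
        db.execute("ALTER TABLE customer ADD COLUMN loyalty_tier TEXT")

def integrate_into_master():
    """Replay the same small change on the master once it has proven out.
    Integrating small changes daily keeps each merge cheap."""
    with sqlite3.connect(MASTER) as db:
        db.execute("ALTER TABLE customer ADD COLUMN loyalty_tier TEXT")
```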
Careful naming conventions are common in the data modeling field. This is important, because giving proper names to relationships allows strong assertions to be made about the subject area. Another concept worth studying is generic data modeling. Different modelers will produce different models for the same domain, which can make it hard to bring together models from different people or organizations. It should be noted, though, that the differences usually stem from the varying levels of abstraction in the models. If the modelers can agree on the specific elements that are to be rendered more concretely, the differences between the models become less significant, and the models can be rendered at a higher level of detail. Data models have played important roles in the functioning of many database management systems, and they have become more important as we move further into the information age. Companies that understand how to properly use data models will benefit greatly, and there are a large number of fields where data modeling techniques are very useful.
Architecture - The architecture is the underlying structure of the data warehouse. It represents the planning of the warehouse, as well as the implementation of the data and the resources used to manage it. The architecture of a data warehouse can be broken down into technologies, data, and processes. The architecture is a blueprint, providing a description of the data warehouse and its environment.

Atomic data - As the name implies, atomic data is data that has been broken down into its simplest form, much as matter is broken down into atoms.

Attributes - A term closely related to both data modeling and data warehouses. It refers to the characteristics that a piece of data has. Each attribute has its own values, and when a logical model is transformed into a physical model, entities are transformed into tables and the attributes themselves are transformed into columns.

Back-end - The back-end describes the process of filling the data warehouse with data that comes from an operational system.

Best of breed - This term refers to the most powerful products within their various categories. When an organization chooses its tools, it will find that some are better than others; by choosing the best products from the best vendors, the efficiency of the data warehouse is greatly increased.

Best practices - The processes that maximize a company's use of the data warehouse.

Business analyst - A person responsible for studying the data and the operations used to maintain it.

Business intelligence - An important concept dealing with the evaluation of business data, covering both databases and applications. Business intelligence is a broad term spanning a large number of topics, including data mining and alternative forms of storage.

CRM (Customer Relationship Management) - Customer relationship management deals with the infrastructure that allows companies to better serve their customers. It plays an important role in the interactions between customers and companies.

Customer segmentation - The process by which customers are split into groups based on factors such as age, education, or gender.
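The Attributes entry above, which says that entities become tables and attributes become columns when a logical model is made physical, can be made concrete with a small sketch. The Python snippet below uses the standard sqlite3 module; the Customer entity and its attributes are hypothetical examples, not a canonical mapping.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Logical model: a "Customer" entity with a few attributes.
# Physical model: the entity becomes a table, each attribute a column.
db.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,  -- identifying attribute
        name          TEXT NOT NULL,        -- descriptive attribute
        date_of_birth TEXT,                 -- SQLite stores dates as text
        segment       TEXT                  -- e.g. a customer-segmentation label
    )
""")

# Customer segmentation as a query: count customers per segment group.
for segment, count in db.execute(
        "SELECT segment, COUNT(*) FROM customer GROUP BY segment"):
    print(segment, count)
```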
In such relationships, an arrow, known as a key constraint, is drawn connecting the entity set to the relationship set. A thick arrow indicates that every entity in the entity set is involved in exactly one relationship.
A couple of other useful definitions: Associative entities are often used to resolve a many-to-many relationship between two entities. Unary relationships are relationships between the rows of a single table. Weak entities cannot be identified by their own attributes alone; it is necessary to use their own partial keys together with the primary key of a related entity. In an entity relationship model, attributes can be composite, derived, or multi-valued. Multi-valued attributes may have more than one value in one or more instances of their entity; these attributes are denoted with a double-lined ellipse. So, if a piece of software happens to run on more than one operating system, its platform attribute is a multi-valued attribute. Composite attributes, on the other hand, are attributes that contain two or more contributing attributes of their own. An example of a composite attribute is an address, as it is composed of attributes including street address, city, state/region, country, etc. Finally, there are derived attributes, whose values depend wholly on other attributes. Derived attributes are denoted with dashed ellipses. So, in a database of employees that has an age attribute, the age would be derived from a birth-date attribute.
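One common way to realize these constructs in a relational schema is sketched below in Python with the standard sqlite3 module. The employee/software/dependent schema is a hypothetical, textbook-style example: an associative table resolves a many-to-many relationship, the multi-valued platform attribute gets its own table, a weak dependent entity borrows its owner's primary key, and the derived age is computed at query time rather than stored.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE software (
        software_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Multi-valued attribute: one row per (software, platform) value.
    CREATE TABLE software_platform (
        software_id INTEGER REFERENCES software(software_id),
        platform    TEXT,
        PRIMARY KEY (software_id, platform)
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        birth_date  TEXT    -- age is *derived* from this, never stored
    );
    -- Associative entity resolving a many-to-many relationship:
    -- employees work on many software products, and vice versa.
    CREATE TABLE assignment (
        employee_id INTEGER REFERENCES employee(employee_id),
        software_id INTEGER REFERENCES software(software_id),
        PRIMARY KEY (employee_id, software_id)
    );
    -- Weak entity: identified by its own partial key (the name)
    -- plus the primary key of the owning employee.
    CREATE TABLE dependent (
        employee_id    INTEGER REFERENCES employee(employee_id),
        dependent_name TEXT,
        PRIMARY KEY (employee_id, dependent_name)
    );
""")

# Derived attribute: compute age at query time from birth_date.
ages = db.execute("""
    SELECT name,
           CAST((julianday('now') - julianday(birth_date)) / 365.25 AS INTEGER)
               AS age
    FROM employee
""").fetchall()
```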
If you're working on a project from an agile standpoint, the assumption from the outset is that you will not be able to fix the system's requirements up front. So it is not practical to have a detailed design process at the very beginning of the project; instead, the system's design is expected to evolve through the software's iterations. These agile techniques, which have come to the forefront in methodologies such as XP (extreme programming), have proven to be incredibly practical when it comes to database design.

Agile methods are distinguished by their flexibility and eagerness to adapt. In contrast to the waterfall approach (in which requirements and designs are learned and signed off on early in the process), agile methods are not driven by advance plans. But what is wrong with the waterfall approach? After all, isn't this the way most of us learn? Sure, but we want a system that is flexible and ready to adapt to change. The waterfall approach tries to eliminate change at the beginning by doing most of the work up front; if any changes need to be made once that work is done, major problems can occur. Agile methods are more reliable in that they have a more open approach to change. In fact, they are based on the premise that change is inevitable and will occur throughout the development process. Of course changes must be monitored and controlled, but agile methods differ from the waterfall approach in that they actually encourage change. The reasoning behind this is twofold. First, it provides a more dynamic model for businesses that constantly have to shift their requirements to keep up with the competition in the market. Second, many projects these days have inherently unstable systems requirements, so it is best to design a system that incorporates the possibility of change into its components.

What is needed for successful database design, then, is to completely shift one's attitude, one's way of thinking about the design process. Instead of viewing design as merely a phase to be completed before construction begins, it should be viewed as a continuous process interwoven with testing, construction, and delivery. This is the main thing that distinguishes evolutionary design from planned design. Agile methods have come up with ways to allow evolutionary design to function in a manner that is controlled rather than chaotic. Iterative development is key to this approach: the complete life cycle of the software is run through several times during the project's life span, and each iteration is completed with working code for a subset of the final product's requirements. These techniques have garnered widespread interest and are widely used. Yet the question remains: can evolutionary design work for databases? The answer is yes. One such project, Atlas, involved nearly one hundred individuals spread over multiple sites all
over the world, more than two hundred tables, and nearly half a million lines of code. It took about a year and a half of initial development and continues to evolve to this day. At the beginning, month-long iterations were used, but two-week iterations later proved to be more effective. The techniques used in the Atlas project included close collaboration between DBAs and developers, frequent integration by developers into a shared master, automated refactorings, a database instance for each developer that is automatically kept up to date, treating the database as a combination of schema and test data, and the clear separation of database access code. On smaller projects, one of the Atlas analysts concluded, a full-time DBA is not even needed; a developer with an interest in DBA issues can do the job part-time.
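The automated refactorings mentioned above are often implemented as small, versioned migration scripts applied in order. The Python sketch below assumes a simple convention (a schema_version table plus an ordered list of SQL steps); it illustrates the idea and is not the tooling the Atlas project actually used.

```python
import sqlite3

# Ordered list of schema refactorings; each entry is (version, SQL).
# The SQL statements here are hypothetical examples.
MIGRATIONS = [
    (1, "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON customer(email)"),
]

def migrate(db_path: str) -> None:
    """Apply any migrations this database has not seen yet, in order."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = db.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, sql in MIGRATIONS:
        if version > current:
            db.execute(sql)  # the refactoring itself
            db.execute("INSERT INTO schema_version VALUES (?)", (version,))
    db.commit()
    db.close()
```

Because the runner records which steps have been applied, the same script can bring a developer sandbox, the shared master, or a production instance up to date.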
-When you're naming entities, aim for clarity and cohesion. Make sure the name is a clear representation of the thing. By using normal English (or Spanish, or whatever language you're working in), you eliminate a lot of confusion in advance. It is also important to keep in mind who you're designing the model for, i.e. your audience; entity names should be universally recognizable by all. Steer clear of acronyms, table names, and abbreviations, and accompany each entity name with a concise, clear definition in the dictionary.

-If you have not had time to analyze an identified entity, be sure to state that in the dictionary, and try to come back to it as soon as possible. You should strive to identify all possible relationships.

-Where sub-types are used, each occurrence of the entity should appear in exactly one sub-type. Never use a sub-type where an example would function better. Using a sub-type implies that it has relationships that distinguish it from the others and that the division of entities into sub-types is a stable way of organizing the data. (One conventional way of implementing sub-types is sketched after these tips.)
-Weak relationship names should be avoided at all costs. You don't need to use statements like "related to"; the fact that it is a relationship makes this point clear. A gerund can be employed if need be, but only as a last resort. In general, strive for concreteness, so that the client can easily correct a name that is not 100% accurate.

-The drawing should be organized so that it is visually attractive. Give equal weight to both sides of the diagram, leave white space with discretion, and make sure different elements are not crowded together unless they absolutely have to be.

-The overall model should be broken into different topics, so that each diagram concerns at most two subjects. The drawings should be organized so that each one represents a whole logical horizon.

-If you are building a model for publication, laying it out in portrait is a lot better; if it is just for the screen, choose landscape format, as it is a lot more efficient. Whichever you choose, use it consistently throughout all the diagrams.
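As promised in the sub-type tip above, here is one conventional way to implement entity sub-types in a relational schema: a supertype table holds the common attributes, and each sub-type table shares the supertype's primary key. The party/person/organization split below is a hypothetical example, one of several possible physical renderings.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Supertype: attributes common to every occurrence.
    CREATE TABLE party (
        party_id   INTEGER PRIMARY KEY,
        party_type TEXT CHECK (party_type IN ('person', 'organization'))
    );
    -- Each occurrence appears in exactly one sub-type table,
    -- keyed by the supertype's primary key.
    CREATE TABLE person (
        party_id   INTEGER PRIMARY KEY REFERENCES party(party_id),
        birth_date TEXT
    );
    CREATE TABLE organization (
        party_id   INTEGER PRIMARY KEY REFERENCES party(party_id),
        legal_name TEXT
    );
""")
```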
In an enterprise project, it is thus necessary to depend on data modeling for the successful execution of projects. Data modeling can be used to deliver dependable, cost-effective systems. Project managers are responsible for numerous tasks, including planning, estimating, evaluating risks, managing resources, monitoring, managing deliveries, and more, and almost every one of those activities depends on the evolution of the data model.

In recent years, data modeling has been the subject of fierce criticism. It is said that data modeling has been slow to adapt to the changes forced by rapid globalization and decentralized structures. A lot of this criticism is rooted in negative experiences with data modeling. But data modeling should not be eliminated altogether; rather, the goal should be to use it for its advantages, and to get it right. For those not keyed in to the pros and cons, let's start by looking at some of the arguments against data modeling. Then we'll use these negatives to show why data modeling can, in fact, be an effective enterprise strategy.

First, it is argued that data modeling slows down the development of systems in an era when speed is the key to technological development. It is also claimed that in the process of developing software, enterprise data models add unnecessary complexity. Prefabricated data models are said to be of no practical use, and thus not good investments. Data modeling is said to be unable to keep up with the changes in the information processing systems of financial institutions, and to be unable to provide banks with information systems that can adapt at the necessary speed of innovation. Above all, data modeling is seen as a brake rather than an accelerator. In the realm of systems development, change is a constant; so, the reasoning goes, by using data modeling you are promoting the idea that structure should not change.

Yet despite the rampant criticism that enterprise data modeling often becomes the target of, the fact remains that some basic requirements must be met when developing software. No financial institution can survive without integrated systems, which manage complexity and handle interdependencies. Integrating new and old systems is, in fact, the norm in systems development, and the use of consistent terms across an enterprise's individual systems results in the consistent processing of data across several different systems. The fact is, the vast majority of the problems identified above can be repaired by a quality data modeling practice. By addressing these arguments, we will demonstrate how data modeling can actually be used for the benefit of an enterprise. In regard to the argument that data modeling is unable to keep up with rapid change, the obvious answer is that rapidly changing variants should not be processed in a data model. It is the core, steadfast aspects of a business that are the subject of data modeling, not
short-term organizational schemes. External partners or bank account information, for example, can be readily modeled, as they are not prone to rapid change; these areas should remain the focus of data modeling. This is the objective of reference models. In fact, when used correctly, data modeling works as an accelerator rather than a brake. It should work as a service function for various projects: by visiting the company's data model group, you can learn about other efforts occurring within the enterprise, about other entities, and about how others solved similar problems using a similar reference model. If your data administration department only does reviews, then yes, it will slow down the progress of your projects. Instead, the data administration department should aid in the development of projects and thus help them pick up speed.

What is needed in beneficial data modeling is just the right amount of detail. A top-level data model is a viable framework, although it is not essential to work from the top down. Never assume that it is necessary to expand your data model to the fourth level of detail; that kind of detail should never be part of the enterprise data model, but should rather be kept in the individual project data model. The question that critics of data modeling fail to answer is what should be done as an alternative. We don't want to go back to the old days of data processing, do we? In fact, it would take even more time and resources to revert to those stone-age techniques. With a little time and smart planning, data modeling can be an effective tool in an enterprise.
The model should always strive to resemble the finished structure, as this makes it easier to perform any necessary adjustments along the way. The designer/builder should get the message from the model: this is what you want to build.

Now let's take a look at the structure of data. This is what the data model describes within the confines of a particular domain: the underlying structure of that specific domain. What this means is that data models actually specify a special grammar for the domain's own artificial language. Data models are representations of the entity classes a company wants to hold information about, the specifics of that information, and the relationships among the entities and their attributes. The data may be represented in a different fashion on the actual computer system than the way it is described in the data model.

The entities, or types of things, represented in the data model might be tangible entities, but models with entity classes that are so concrete tend to change over time; robust data models often identify abstractions instead. A data model might have an entity class called Person, meant to represent all the people who interact with a company. Such an abstract entity is more appropriate than entities called Salesman or Boss, which would specify a particular role played by certain people.

In a conceptual data model, what is described are the semantics of a particular subject area. The conceptual data model is basically a collection of assertions about the kind of information used by a company. Entity classes are named using natural language, as opposed to technical jargon, and concrete assertions about the subject area benefit from proper naming. Another way of organizing data reflects its use in a database management system, involving relational tables, columns, classes, and attributes. Such models are sometimes called physical data models, but in the ANSI three-schema architecture they are referred to as logical; in that architecture, it is the physical model that describes the storage media (cylinders, tracks, tablespaces, etc.). The logical model should be derived from the more conceptual model, though there may be slight differences, for example to account for usage patterns and processing capacity.
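The preference for an abstract Person entity over concrete Salesman or Boss entities can be illustrated with a short sketch. In this hypothetical Python/sqlite3 schema, roles are recorded as data rather than baked into table names, so the model survives when people change roles.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Abstract entity: every person who interacts with the company.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- Roles are rows, not tables: no separate Salesman or Boss entity.
    CREATE TABLE person_role (
        person_id INTEGER REFERENCES person(person_id),
        role      TEXT,    -- e.g. 'salesman', 'boss', 'customer'
        PRIMARY KEY (person_id, role)
    );
""")

# A person can take on a new role without any change to the schema.
db.execute("INSERT INTO person VALUES (1, 'Alice')")
db.execute("INSERT INTO person_role VALUES (1, 'salesman')")
db.execute("INSERT INTO person_role VALUES (1, 'boss')")
```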
Data analysis is a term that has become synonymous with data modeling, although in truth the activity has more in common with synthesis than with analysis. Synthesis, after all, refers to the process whereby general concepts are inferred from particular instances; in analysis, the opposite happens: particular concepts are identified from more general ones. I guess the professionals call themselves systems analysts because no one can pronounce "systems synthesists"! All joking aside, data modeling is an important method whereby various data structures of interest are brought together into one cohesive whole, relating different structures to each other and thereby eliminating redundancies, making everyone's lives a lot easier!
Notable data modeling techniques include IDEF, entity-relationship diagrams, Bachman diagrams, Barker's notation, Object Role Modeling (Nijssen's Information Analysis Method), the business rules approach, object-relationship modeling, and RM/T. Common data modeling tools include GNU Ferret, Datanamic DeZign, ERwin, ARIS, Oracle Designer, Microsoft Visio, SILVERRUN, Mogwai ER-Designer, MySQL Workbench, PowerDesigner, and ER/Studio.
The attributes of data found in the data warehouse should carry information on subjects that can be interpreted in a broad, general way. The data has to be far-reaching, representing many streams and classes of data. So if a given subject area is named Customer and is modeled properly to the data warehouse standard, it should include attributes for all sorts of customers: past, present, and future. Attributes should be arranged in the data model that note when a person became a customer, when a person was last a customer, and whether that person was ever a customer; all of this has to be noted in the Customer subject area. By placing all the relevant attributes that may be needed to classify a customer in the subject area, preparations have been made for future contingencies for this piece of data. As a result, the DSS analyst will be able to use these attributes to look at past, potential, or future customers, as well as present-day customers. The data model should operate as a tool that paves the way for this ultimate flexibility by placing the right attributes in the data warehouse's atomic data.

To use a further example of placing numerous attributes within atomic data, a part record could include all kinds of information about a part, even if the information is not immediately needed by current requirements. It can include such attributes as part number, technical description, drawing number, engineering specification, raw goods, precious goods, replenishment categories, weight, length, accounting cost basis, bill-of-material-to, bill-of-material-from, store number, assembly identification number, packaging information, etc. Some of these attributes may appear extraneous for the vast majority of information typically processed in production control. However, by including all of these attributes in the data model's part record, the road has already been paved for future forms of processing that are unknown at present but might arise some day.

To put it in other terms, the data warehouse's data model should include as many reasonable classifications of data as possible; it should never exclude any reasonable classification. By taking care of this at the outset, the responsible data modeler sets the stage for the multitude of requirements that the data warehouse should serve to satisfy. Thus, from the standpoint of data modeling, the most atomic data should be modeled with the largest interpretive latitude. It is not difficult to create a data model like this, and it can serve to represent a company's simplest data. Once all this has been defined in the data model, the data warehouse will be prepared to take on many different requirements. Modeling atomic data and inserting attributes that will allow that data to be stretched in any direction is, therefore, not such a difficult task.
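A minimal sketch of the past, present, and future customer idea follows, assuming hypothetical column names. The two date attributes placed in the atomic Customer record are exactly what lets a DSS analyst ask who was a customer, who is one now, and who is only a prospect.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE customer (
        customer_id     INTEGER PRIMARY KEY,
        name            TEXT,
        became_customer TEXT,  -- date the person first became a customer (NULL for prospects)
        last_customer   TEXT   -- date the person last was a customer (NULL while active)
    )
""")

# Present customers: became one and have not lapsed.
present = db.execute("""
    SELECT name FROM customer
    WHERE became_customer IS NOT NULL AND last_customer IS NULL
""").fetchall()

# Past customers: lapsed at some point.
past = db.execute(
    "SELECT name FROM customer WHERE last_customer IS NOT NULL").fetchall()

# Potential (future) customers: tracked, but never yet customers.
prospects = db.execute(
    "SELECT name FROM customer WHERE became_customer IS NULL").fetchall()
```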