
Data Modeling

Contents

1) What is a Data Model?
2) The History of Data Modeling
3) Data Modeling Explained
4) Data Warehouse Glossary
5) Entity Relationship Model
6) Physical Data Models
7) Tips for Mastering Data Modeling
8) Role of Data Modeling within Enterprise Management
9) Data Modeling Overview
10) Data Modeling Introduction
11) Connection between Data Model and Data Warehouse

What is a Data Model?


Quite simply, data models are abstract models whose purpose is to describe how data can be represented and used effectively. The term "data model" is, however, used in two different ways. The first is in talking about data model theory, that is, formal descriptions of how data can be structured and used. The second is in talking about a data model instance: how a particular data model theory is applied in order to produce a proper data model for a specific application.

Data modeling refers to the process whereby data is structured and organized. It is a key component of the field of computer science. Once data is structured, it is usually implemented in what is called a database management system. The main idea behind these systems is to manage vast amounts of both structured and unstructured data. Unstructured data includes documents, e-mail messages, pictures, and digital video and audio files. Structured data, the kind a data model (via a data model theory) describes, is found in management systems such as relational databases. A data model theory is the formal description of a data model.

In software development, a project may first focus on the design of a conceptual data model or a logical data model. Once the project is well under way, the model is usually refined into a physical data model. Logical and physical are thus two ways of describing data models: the logical description focuses on the basic features of the model, independent of any particular implementation, while the physical description focuses on the implementation in the particular database hosting the model's features.

Now let's take a look at the structure of data. This is what the data model describes within the confines of a particular domain: the underlying structure of that specific domain. What this means is that data models actually specify a grammar for the domain's own artificial language. Data models are representations of the entity classes a company wants to hold information about, the attributes of that information, and the relationships among the entities and attributes. The data may be represented on the actual computer system in a different fashion than the way it is described in the data model. The entities, or types of things, represented in a data model might be tangible, but models whose entity classes are that concrete tend to change over time, so robust data models often identify abstractions instead. A data model might have an entity class called Persons, meant to represent all the people who interact with a company. Such an abstract entity is more appropriate than entities called Salesman or Boss, which would specify particular roles played by certain people.

In a conceptual data model, what is described is the semantics of a particular subject area. The conceptual data model is basically a collection of assertions about the kind of information used by a company. Entity classes are named in natural language rather than technical jargon, and concrete assertions about the subject area benefit from proper naming.

Another way of organizing data involves a database management system, using relational tables, columns, classes, and attributes. Models at this level are sometimes called physical data models, but in the ANSI three-schema architecture they are referred to as logical. In that architecture, the physical model describes the storage media: cylinders, tracks, tablespaces, and so on. The physical model should be derived from the more conceptual model, though there may be slight differences, for example to account for usage patterns and processing capacity.

Data analysis is a term that has become synonymous with data modeling, although in truth the activity has more in common with synthesis than with analysis. Synthesis, after all, refers to the process whereby general concepts are inferred from particular instances; in analysis the opposite happens, and particular concepts are identified from more general ones. I guess the professionals call themselves systems analysts because no one can pronounce "systems synthesists"! All joking aside, data modeling is an important method whereby various data structures of interest are brought together into one cohesive whole, relating different structures to one another and thereby eliminating redundancies, making everyone's lives a lot easier!
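To make the abstraction point concrete, here is a minimal sketch in Python (the Person class and its fields are invented for illustration, not taken from any particular system): the abstract entity carries a role attribute instead of being split into concrete Salesman or Boss entities.

    from dataclasses import dataclass

    @dataclass
    class Person:
        # Abstract entity class: any person who interacts with the company.
        person_id: int
        name: str
        role: str  # "salesman", "boss", etc.: a role, not a separate entity class

    alice = Person(person_id=1, name="Alice", role="salesman")
    alice.role = "boss"  # a promotion changes an attribute value, not the model

The design choice is exactly the one described above: when roles change, only data changes; the entity class itself stays stable.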

The History of Data Modeling


The programming of computers is an abstract realm of thought. In the 1970s, it was thought that people would benefit from an increased use of graphic representations. On the process side, flow charts led to data flow diagrams. Then, in the mid-1970s, entity relationship modeling was created as a means of graphically representing data structures. Entity relationship models are used during the first stage of information system design, the requirements analysis phase, to elucidate the types of information that need to be stored in the database. The data modeling technique can describe any ontology for a specific area of interest. If the information system being designed is based on a database, the conceptual data model will later be mapped onto a logical data model, which in turn will be mapped onto a physical model during the physical design process. (Sometimes both of these phases are referred to as physical design.)

Object-oriented programming has been used since the 1960s. In the very beginning, programs were organized according to what they did; data was attached only if necessary. Programmers working in this style later came to organize their work around the objects their data described. For real-time systems, this was a major breakthrough. In the 1980s, it broke into the mainstream data processing scene, when graphical user interfaces introduced object-oriented programming to commercial applications. The problem of defining requirements, it was realized, would benefit enormously from insight into the realm of objects. The idea of object models was introduced without acknowledgment of the fact that systems analysts had already discovered similar models.

In the 1980s, a significantly new approach to data modeling was engineered by G.M. Nijssen. Named NIAM, short for Nijssen's Information Analysis Methodology, it has since been renamed ORM, or object role modeling. Its purpose is to represent relationships directly rather than showing entity types as analogs of relational tables. With its focus on the use of language to make data modeling accessible to a wider audience, ORM has much greater potential for describing business rules and constraints.

In recent times, agile methodologies have risen to the forefront of data modeling. Central to these methodologies is the concept of evolutionary design. From this standpoint, you acknowledge that you cannot fix all of a system's requirements up front, so it is not practical to have a detailed design phase at the very beginning of a project. Instead, the system's design must evolve throughout the software's numerous iterations. People tend to learn by experimenting, trying new things out, and evolutionary design recognizes this key component of human nature. Under this approach, developers are expected to experiment with ways of implementing a particular feature, and it may take them several tries before they arrive at a settled method. It is the same for database design. Some experts claim that it is virtually impossible to manage multiple databases, but others claim that a database manager can look after a hundred database instances with relative ease, given the right tools.

While experimentation is important, it is also important to bring the different approaches one has tried back together into an integrated whole every once in a while. For this, you need a shared master database that all work flows out of. When beginning a task, copy the master into your own workspace, manipulate it there, and then enter the changes back into the master copy, integrating at least once a day, as the sketch below illustrates. Integrating constantly is key: it is a lot easier to perform small integrations frequently than to perform large integrations occasionally, because the difficulty of an integration tends to grow exponentially with its size. In other words, making many small changes is a lot easier in practice, even if at first it might not seem to make sense. The Software Configuration Management community has noted the similarity of this practice to the way source code is handled.
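Here is a minimal sketch of that copy-experiment-integrate cycle using Python's built-in sqlite3 module (the customer table and segment column are hypothetical; real projects would use migration scripts rather than ad hoc statements):

    import sqlite3

    # 0. A stand-in for the shared master database (in memory for the sketch).
    master = sqlite3.connect(":memory:")
    master.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

    # 1. Copy the master into a private workspace before starting a task.
    workspace = sqlite3.connect(":memory:")
    master.backup(workspace)

    # 2. Experiment with a schema change in the workspace.
    change = "ALTER TABLE customer ADD COLUMN segment TEXT"
    workspace.execute(change)

    # 3. Once it works, apply the same change to the master, ideally at least
    #    once a day, keeping every integration small.
    master.execute(change)
    master.commit()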

Data Modeling Explained


Data modeling is a computer science term that describes the process of generating a data model. A data model is generated by applying a special theory known as a data model theory; the result is known as a data model instance. When you go through the process of data modeling, you are essentially organizing data as well as creating a structure for it. Once the data has been organized, it is placed in a DBMS, or database management system. As the data is organized, the modeling process generates constraints that are placed on the structure of the data.

One of the primary functions of information systems is to manage large amounts of data, both structured and unstructured. Data models typically deal with the structured data used within relational databases. They are rarely used for unstructured data, examples of which are pictures, video, and documents created in word processing programs.

When a software product is in the early stages of development, great importance is placed on the structure of a conceptual data model. This design can be transformed into a logical data model and, at later stages of the development process, into a physical data model. A data model is thus commonly described in two ways, physical or logical. The physical picture of a data model deals with the implementation of the specific database that will host the model. The logical picture deals with the generic aspects of the model and is not concerned with any specific implementation (a sketch at the end of this section makes the distinction concrete).

The structure of the data also plays an important role in data modeling. The data model is responsible for providing a description of the structure of the data, and it also deals with the primary structure of the domain. As you can see, data models play an important role in the structures of both domains and data.

A data model can also be described as an entity that symbolizes classes of various objects. This is closely related to information: it may be the information a company stores, as well as the characteristics inherent in that information, and the relationships among those characteristics are often taken into consideration as well. How the data is presented in the computer system is largely irrelevant; the data model places its emphasis on describing how the data is organized. While the objects that data models represent may be tangible, models that deal with such concrete classes often change over time. If the data model is highly robust, it may be possible for it to find abstractions for these objects.

If a data model is conceptual, it can be used to convey the semantics of various topics. It can be presented as a collection of assertions about the function of the information used by various companies or organizations. Many of these classes are named with common words rather than the technical terms common in the data modeling field. This is important, because giving proper names to relationships allows strong assertions to be made about the subject area.

Another concept worth studying is generic data modeling. Different modelers will produce different models for the same domain, which can make it hard to bring together models from distinct people or entities. Most of these differences, however, come down to the varying levels of abstraction in the models. If the modelers can agree on the concrete elements, the differences between their models become less pronounced, and the models can be rendered at a higher level of detail.

Data models have played important roles in the functioning of many database management systems, and they become more important as we move further into the information age. Companies that understand how to use data models properly will benefit greatly, and there are a large number of fields where data modeling techniques are very useful.
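As a rough sketch of the logical/physical split (entity and attribute names are invented), the same Customer entity appears first as an implementation-free description and then as a concrete SQLite table with types, a primary key, and an index:

    import sqlite3

    # Logical view: the entity and its attributes, free of any DBMS detail.
    logical_entity = {"Customer": ["customer_id", "name", "email"]}

    # Physical view: the same entity realized as a table in a specific DBMS
    # (SQLite here), with concrete types, a primary key, and an index.
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            email       TEXT
        )
    """)
    db.execute("CREATE INDEX idx_customer_email ON customer (email)")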

Data Warehouse Glossary


Because of the complexity surrounding data warehouses, there are a number of terms you will want to become familiar with. While there are too many to present in this article, I will go over the fundamental terms you should know. Understanding the terminology surrounding a data warehouse will make it easier for you to learn to use it effectively, and it will make communication with your peers easier.

Access - The process of obtaining data from the databases that exist within the data warehouse. It is a fundamental term, and one that everyone who works with a data warehouse should know.

Ad hoc query - A request for data that cannot be prepared for in advance. An ad hoc query generally consists of an SQL statement built on demand by a skilled user, often with the help of a data access tool.

Aggregation - The procedure by which data values are grouped so that they can be managed as a single unit; for example, multiple fields describing one customer, drawn from numerous places, combined into a single record (see the sketch at the end of this glossary).

Analysis - Analysis occurs when a user takes data from the warehouse and studies it. This is a fundamental concept, since studying the data is what allows the user to make important business decisions.

Anomaly - A situation in which a user gets a result that is unexpected or strange; also known as a data anomaly. One of the most common scenarios in which an anomaly occurs is when a data unit is defined for one specific purpose but used for another. An example of an anomaly is a number that has a negative value, or a value that is too high for the entity it represents.

Architecture - The underlying structure of the data warehouse. It represents the planning of the warehouse, as well as the implementation of the data and the resources used to manage it. The architecture of a data warehouse can be broken down into technologies, data, and processes. It is a blueprint, providing a description of the data warehouse and its environment.

Atomic data - As the name implies, data that has been broken down into its simplest form, much as matter can be broken down into atoms and subatomic particles.

Attributes - A term closely related to both data modeling and data warehouses. Attributes are the characteristics a piece of data has, and each has its own values. When a logical model is transformed into a physical model, entities are transformed into tables and their attributes into columns.

Back-end - The process of filling the data warehouse with data that comes from an operational system.

Best of breed - A term for the most powerful products in each of various categories. When an organization chooses its tools, it will find that some are better than others; by choosing the best products from the best vendors, the efficiency of the data warehouse is greatly increased.

Best practices - The processes that maximize a company's use of the data warehouse.

Business analyst - A person responsible for studying the data and the operations used to maintain it.

Business intelligence - An important concept dealing with the evaluation of business data, covering both databases and various applications. Business intelligence is a broad term that spans a large number of topics, including data mining and alternative forms of storage.

CRM (Customer Relationship Management) - The infrastructure that allows businesses to serve their customers better. Customer relationship management plays an important role in the interactions between customers and companies.

Customer segmentation - The process by which customers are split into segments based on factors such as age, education, or gender.
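To illustrate the "ad hoc query" and "aggregation" entries above, here is a minimal sketch using Python's sqlite3 module (the table, columns, and data are invented):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [("acme", 120.0), ("acme", 80.0), ("globex", 45.0)])

    # An ad hoc query, composed on demand rather than prepared in advance.
    # SUM ... GROUP BY performs the aggregation, treating each customer's
    # orders as a single data unit.
    for row in db.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer"):
        print(row)  # e.g. ('acme', 200.0) and ('globex', 45.0)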

Entity Relationship Model


Structured data is stored in databases. Along with various other constraints, this data's structure can be designed using entity relationship modeling, the end result being an entity relationship diagram. Data modeling entails the use of a notation for representing data models, and an entity relationship diagram can be thought of as a type of semantic, or conceptual, data model. Entity relationship models are used during the first stage of information system design, the requirements analysis phase, to elucidate the types of information that need to be stored in the database. The technique can describe any ontology for a specific area of interest. If the information system being designed is based on a database, the conceptual data model will later be mapped onto a logical data model, which in turn will be mapped onto a physical model during the physical design process. (Sometimes both of these phases are referred to as physical design.)

Numerous notational conventions exist for entity relationship diagrams. Conceptual modeling is based largely on the classical notation; other notations, mostly used in physical and logical database design, include the ICAM Definition Language, dimensional modeling, and information engineering.

An entity can be thought of as the representation of a discrete object; entities are nouns. So, for example, a song is an entity, as is an employee, a computer, or a mathematical theorem. Relationships capture how entities relate to one another; relationships are verbs. "Supervises" might designate the relationship between an overseer and a department, "performs" the relationship between an actor and a role, and so on. Whereas relationships are drawn as diamonds, entities are represented as rectangles. Attributes are qualities that can be possessed by both relationships and entities; they are represented by ovals, connected by a line to the entity sets that possess them. With the exception of weak entities, every entity must have a minimal set of attributes, which makes up the primary key of the entity.

Rather than displaying single instances of relationships or lone entities, entity relationship diagrams display sets of entities and relationships. So one song in a database is an entity, whereas the collection of songs in the database is an entity set. Lines connect entity sets to the relationship sets they are involved in. A thick line, known as a participation constraint, is drawn when every entity in a particular set is involved in the relationship set. If every entity in an entity set can participate in at most one relationship in the relationship set, an arrow, known as a key constraint, is drawn connecting the entity set to the relationship set. A thick arrow indicates that every entity in the set is involved in exactly one relationship.

A couple of other useful definitions: associative entities are often used to resolve many-to-many relationships between two entities. Unary relationships are relationships between the rows of a single table. Weak entities cannot be identified by their own attributes alone; it is necessary to use their partial key together with the primary key of a related entity.

In an entity relationship model, attributes can be composite, derived, or multi-valued. Multi-valued attributes may have more than one value in one or more instances of their entity; these attributes are denoted with a double-lined ellipse. So, if a piece of software happens to run on more than one operating system, its "platform" attribute is multi-valued. Composite attributes, on the other hand, are attributes that contain two or more contributing attributes of their own. An example of a composite attribute is an address, which is composed of attributes such as street address, city, state/region, and country. Finally, there are derived attributes, whose values depend wholly on another attribute; they are denoted with dashed ellipses. So, in an employee database with an age attribute, age would be derived from a birth date attribute. The sketch below shows how these constructs commonly land in relational tables.
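Here is a minimal sketch (all names invented) of one common way an ER model's constructs map onto relational tables: entity sets become tables, the many-to-many "performs" relationship becomes an associative table keyed by both entities, and a derived attribute such as age is computed from a stored birth date rather than stored itself.

    import sqlite3
    from datetime import date

    db = sqlite3.connect(":memory:")
    # Entity sets become tables; each primary key comes from the entity's key attributes.
    db.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)")
    db.execute("CREATE TABLE role (role_id INTEGER PRIMARY KEY, title TEXT)")

    # The many-to-many 'performs' relationship becomes an associative table
    # whose primary key combines the keys of both participating entity sets.
    db.execute("""
        CREATE TABLE performs (
            actor_id INTEGER REFERENCES actor,
            role_id  INTEGER REFERENCES role,
            PRIMARY KEY (actor_id, role_id)
        )
    """)

    # A derived attribute: age is computed from birth_date, never stored.
    def age(birth_date: date, today: date) -> int:
        years = today.year - birth_date.year
        if (today.month, today.day) < (birth_date.month, birth_date.day):
            years -= 1
        return years

    print(age(date(1990, 6, 15), date(2024, 1, 1)))  # 33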

Physical Data Models


Physical data models represent the design of data while taking into account both the constraints and the facilities of a particular database management system. Generally, a physical model is derived from a logical data model, although it can also be reverse-engineered from a particular database implementation. All the database artifacts needed to create relationships are included in a physical data model, among them linking tables, indexes, constraint definitions, and partitioned clusters. The model can also be used as a calculation device for storage estimates (a rough sketch follows this section), and storage allocation details for a particular system may be included. Physical data modeling is also referred to as database design.

Recent years have seen the spread of agile methodologies, a whole new approach to database design. The concept of evolutionary design is key to these methods. When you are working on a project from an agile standpoint, the assumption from the outset is that you will not be able to fix the system's requirements up front, so it is not practical to have a detailed design process at the very beginning of the project. Instead, the system's design is expected to evolve through the software's iterations. These agile techniques, which have come to the forefront in methodologies such as XP (extreme programming), have proven to be remarkably practical when it comes to database design.

Agile methods are distinguished by their flexibility and eagerness to adapt. In contrast to the waterfall approach, in which requirements are learned and signed off early in the design process, agile methods are not driven by advance plans. But what is wrong with the waterfall approach? After all, isn't this the way most of us learn? Sure, but we want a system that is flexible and ready to adapt to change. The waterfall approach tries to eliminate change at the beginning by doing most of the work up front; if any changes need to be made once that work is done, major problems can occur. Agile methods are more resilient in that they take a more open approach to change. In fact, they are based on the premise that change is inevitable and will occur throughout the development process. Of course it is essential that changes be monitored and controlled, but agile methods differ from the waterfall approach in that they actually encourage change. The reasoning behind this is twofold. First, it provides a more dynamic model for businesses that constantly have to shift their requirements to keep up with market competition. Second, many projects these days have inherently unstable systems requirements, so it is best to design a system that incorporates the possibility of change into its components.

What is needed for successful database design, then, is a complete shift in one's way of thinking about the design process. Instead of viewing it as merely a phase to be completed before construction begins, it should be viewed as a continuous process interwoven with testing, construction, and delivery. This is the main thing that distinguishes evolutionary design from planned design. Agile methods have come up with ways to let evolutionary design function in a manner that is controlled rather than chaotic. Iterative development is key to this approach: the complete life cycle of the software is run several times throughout the project's life span, and each iteration ends with working code for a subset of the final product's requirements.

These techniques have garnered widespread interest and are widely used. Yet the question remains: can evolutionary design work for databases? The answer is yes. One such project, Atlas, involved nearly one hundred individuals spread over multiple sites around the world, more than two hundred tables, and nearly half a million lines of code. It took about a year and a half of initial development and continues to evolve to this day. At the beginning, month-long iterations were used, but two-week iterations later proved more effective. The techniques used in the Atlas project included close collaboration between DBAs and developers, frequent integration by developers into a shared master, automated refactorings, automatically updated databases for developers, the successful use of schema and test data, and the clear separation of database access code. On smaller projects, one of the Atlas analysts concluded, a full-time DBA is not even needed; a developer with an interest in DBA issues can do the job part-time.
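As a back-of-the-envelope sketch of the storage-estimate use mentioned above (the row sizes and counts are made-up figures, and the 30% overhead allowance is a rough assumption, not a rule):

    # Estimated bytes per row and expected row count, per table (made-up figures).
    tables = {
        "customer": (200, 1_000_000),
        "orders":   (120, 10_000_000),
    }
    data_bytes = sum(row_size * rows for row_size, rows in tables.values())
    total_bytes = data_bytes * 1.3  # rough 30% allowance for indexes and page overhead
    print(f"{total_bytes / 1024**3:.1f} GiB")  # about 1.7 GiB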

Tips for Mastering Data Modeling


Data modeling refers to the process whereby data is structured and organized. It is a key component of the field of computer science. Once data is structured, it is usually implemented in what is called a database management system. The main idea behind these systems is to manage vast amounts of structured and unstructured data. Unstructured data includes documents, e-mail messages, pictures, and digital video and audio files. Structured data, the kind a data model (via a data model theory) describes, is found in management systems such as relational databases. A data model theory is the formal description of a data model.

Data models are highly flexible structures that can be employed toward a variety of ends. In some ways, the process of data modeling can be compared to class modeling, the difference being that class modeling identifies classes while data modeling identifies types.

The data model serves two main functions. First, it needs to serve as an accurate representation of the analyst's understanding of the overall enterprise, so that the customer can judge whether the analyst understood the project. It is the ultimate test of whether the analyst really understands the nature of the business; if a data model is executed properly, the answer will be quite clear. It should, in effect, ask the user: does this fulfill what you want? Second, the data model must provide an accurate reflection of the organization's data. As such, it provides the best starting point for the design of a database. While the final database design may well end up looking quite different from the data model, the data model should always strive to resemble the finished structure; this makes it easier to perform any necessary adjustments along the way. The designer/builder should get this message from the model: this is what you want to build.

Below, you will find some useful tips for building effective data models.

- When you're naming entities, aim for clarity and cohesion. Make sure the name is a clear representation of the thing. By using normal English (or Spanish, or whatever language you're working in), you eliminate a lot of confusion in advance. It is also important to keep in mind who you're designing the model for, i.e., your audience; the entity names should be recognizable by all of them. Steer clear of acronyms, table names, and abbreviations. Accompany each entity name with a concise, clear definition in the dictionary (a minimal sketch of such a dictionary follows this list).

- If you have not had time to analyze an identified entity, say so in the dictionary, and try to come back to it as soon as possible. You should strive to identify all possible relationships.

- Mark each distinct kind of entity occurrence with a sub-type, and make sure each occurrence appears in only one sub-type. Never use a sub-type where an example would serve better: using a sub-type implies that it has relationships distinguishing it from the others, and that the division of entities into sub-types is a stable way of organizing the data.

- Weak relationship names should be avoided at all costs. You don't need statements like "related to"; the fact that it is a relationship makes that point clear. A gerund can be employed if need be, but only as a last resort. In general, strive for concreteness, so that the client can easily correct the name if it is not 100% accurate.

- Organize the drawing so that it is visually attractive. Give equal weight to both sides of the diagram, leave white space with discretion, and make sure different elements are not crowded together unless they absolutely have to be.

- Break the overall model into different topics, so that each diagram concerns at most two subjects. Organize the drawings so that each one represents a whole logical horizon.

- If you are building a model for publication, portrait layout works better; if it is just for the screen, choose landscape, which is far more efficient. Whichever you choose, use it consistently throughout all the diagrams.
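One lightweight way to honor the naming and dictionary tips above is to keep each entity's plain-language definition next to its name and flag unanalyzed entries explicitly. A minimal sketch in Python (the entries are invented):

    # A minimal entity dictionary: every name carries a plain-language
    # definition, and entities not yet analyzed are flagged explicitly.
    dictionary = {
        "Customer": {
            "definition": "A person or organization that buys, or may buy, our products.",
            "analyzed": True,
        },
        "Shipment Route": {
            "definition": "Identified but not yet analyzed; revisit as soon as possible.",
            "analyzed": False,
        },
    }
    for name, entry in dictionary.items():
        flag = "" if entry["analyzed"] else "  [pending analysis]"
        print(f"{name}: {entry['definition']}{flag}")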

Role of Data Modeling within Enterprise Management


When it comes to developing, maintaining, augmenting, and integrating enterprise systems, data modeling is key. Over 90% of an enterprise system's functionality is based on the creation, manipulation, and querying of data. When managing a major enterprise project, it is thus necessary to depend on data modeling for the successful execution of projects; data modeling can be used to deliver dependable, cost-effective systems. Project managers are responsible for numerous tasks, including planning, estimating, evaluating risks, managing resources, monitoring, managing deliveries, and more, and almost every one of those activities depends on the evolution of the data model.

In recent years, data modeling has been the subject of fierce criticism. It is said that data modeling has been slow to adapt to the changes forced by rapid globalization and decentralized structures. A lot of this criticism is rooted in negative experiences with data modeling. But data modeling should not be eliminated altogether; rather, the goal should be to use it for its advantages, and to get it right. For those not keyed in to the pros and cons, let's start by looking at some of the arguments against data modeling. Then we'll use these negatives to show why data modeling can, in fact, be an effective enterprise strategy.

First, it is argued that data modeling slows down the development of systems in an era when speed is the key to technological development. It is also claimed that, in the process of developing software, enterprise data models add unnecessary complexity. Prefabricated data models are said to be of no practical use, and thus not good investments. Data modeling is said to be unable to keep up with the changes in the information processing systems of financial institutions, and to be incapable of providing banks with information systems that can adapt at the necessary speed of innovation. Above all, data modeling is seen as a brake rather than an accelerator. In the realm of systems development, change is a constant; so, the reasoning goes, by using data modeling you are promoting the idea that structure shouldn't change.

Yet despite the rampant criticism that enterprise data modeling often becomes the target of, the fact remains that some basic requirements must be met when developing software. No financial institution can survive without integrated systems, which manage complexity and handle interdependencies. Integrating new and old systems is, in fact, the norm in systems development, and the use of consistent terms across an enterprise's individual systems results in the consistent processing of data across several different systems. The fact is, the vast majority of the problems identified above can be repaired by a quality data modeling practice. Using the arguments against it, we will demonstrate how data modeling can actually be used for the benefit of an enterprise.

In regard to the argument that data modeling is unable to keep up with rapid change, the obvious answer is that rapidly changing variants should not be processed in a data model. It is the core, steadfast aspects of a business that are the subject of data modeling, not short-term organizational schemes. External partners or bank account information, for example, can be readily modeled, as they are not prone to rapid change, and these areas should remain the focus of data modeling. This is the objective of reference models. In fact, when used correctly, data modeling works as an accelerator rather than a brake. It should work as a service function for various projects: by visiting the company's data model group, you can learn about other efforts occurring within the enterprise, about other entities, and about how others solved similar problems using a similar reference model. If your data administration department only does reviews, then yes, it will slow down the progress of your projects. Instead, the data administration department should aid in the development of projects and thus help them pick up speed.

What beneficial data modeling needs is just the right amount of detail. A top-level data model is a viable framework, although it is not essential to work from the top downwards. Never assume that it is necessary to expand your data model to the fourth level of detail; that kind of detailing should never be part of the enterprise data model, but rather kept within the individual project's data model.

The question that critics of data modeling fail to answer is what should be done as an alternative. We don't want to go back to the old days of data processing, do we? In fact, it would take even more time and resources to revert to those stone-age techniques. With a little time and smart planning, data modeling can be an effective tool in an enterprise.

Data Modeling Overview


Data modeling refers to the process whereby data is structured and organized. It is a key component of the field of computer science. Once data is structured, it is usually implemented in what is called a database management system. The main idea behind these systems is to manage vast amounts of both structured and unstructured data. Unstructured data includes documents, e-mail messages, pictures, and digital video and audio files. Structured data, the kind a data model (via a data model theory) describes, is found in management systems such as relational databases. A data model theory is the formal description of a data model.

The data model serves two main functions. First, it needs to serve as an accurate representation of the analyst's understanding of the overall enterprise, so that the customer can judge whether the analyst understood the project. It is the ultimate test of whether the analyst really understands the nature of the business; if a data model is executed properly, the answer will be quite clear. It should, in effect, ask the user: does this fulfill what you want? Second, the data model must provide an accurate reflection of the organization's data. As such, it provides the best starting point for the design of a database. While the final database design may well end up looking quite different from the data model, the data model should always strive to resemble the finished structure; this makes it easier to perform any necessary adjustments along the way. The designer/builder should get this message from the model: this is what you want to build.

Now let's take a look at the structure of data. This is what the data model describes within the confines of a particular domain: the underlying structure of that specific domain. What this means is that data models actually specify a grammar for the domain's own artificial language. Data models are representations of the entity classes a company wants to hold information about, the attributes of that information, and the relationships among the entities and attributes. The data may be represented on the actual computer system in a different fashion than the way it is described in the data model. The entities, or types of things, represented in a data model might be tangible, but models whose entity classes are that concrete tend to change over time, so robust data models often identify abstractions instead. A data model might have an entity class called Persons, meant to represent all the people who interact with a company. Such an abstract entity is more appropriate than entities called Salesman or Boss, which would specify particular roles played by certain people.

In a conceptual data model, what is described is the semantics of a particular subject area. The conceptual data model is basically a collection of assertions about the kind of information used by a company. Entity classes are named in natural language rather than technical jargon, and concrete assertions about the subject area benefit from proper naming.

Another way of organizing data involves a database management system, using relational tables, columns, classes, and attributes. Models at this level are sometimes called physical data models, but in the ANSI three-schema architecture they are referred to as logical. In that architecture, the physical model describes the storage media: cylinders, tracks, tablespaces, and so on. The physical model should be derived from the more conceptual model, though there may be slight differences, for example to account for usage patterns and processing capacity.

Data analysis is a term that has become synonymous with data modeling, although in truth the activity has more in common with synthesis than with analysis. Synthesis, after all, refers to the process whereby general concepts are inferred from particular instances; in analysis the opposite happens, and particular concepts are identified from more general ones. I guess the professionals call themselves systems analysts because no one can pronounce "systems synthesists"! All joking aside, data modeling is an important method whereby various data structures of interest are brought together into one cohesive whole, relating different structures to one another and thereby eliminating redundancies, making everyone's lives a lot easier!

Data Modeling Introduction


Data modeling refers to the process whereby data is structured and organized. It is a key component of the field of computer science. Once data is structured, it is usually implemented in what is called a database management system. The main idea behind these systems is to manage vast amounts of both structured and unstructured data. Unstructured data includes documents, e-mail messages, pictures, and digital video and audio files. Structured data, the kind a data model (via a data model theory) describes, is found in management systems such as relational databases. A data model theory is the formal description of a data model.

In software development, a project may first focus on the design of a conceptual data model or a logical data model. Once the project is well under way, the model is usually refined into a physical data model. Logical and physical are thus two ways of describing data models: the logical description focuses on the basic features of the model, independent of any particular implementation, while the physical description focuses on the implementation of the particular database hosting the model's features.

Now let's take a look at the structure of data. This is what the data model describes within the confines of a particular domain: the underlying structure of that specific domain. What this means is that data models actually specify a grammar for the domain's own artificial language. Data models are representations of the entity classes a company wants to hold information about, the attributes of that information, and the relationships among the entities and attributes. The data may be represented on the actual computer system in a different fashion than the way it is described in the data model. The entities, or types of things, represented in a data model might be tangible, but models whose entity classes are that concrete tend to change over time, so robust data models often identify abstractions instead. A data model might have an entity class called Persons, meant to represent all the people who interact with a company. Such an abstract entity is more appropriate than entities called Salesman or Boss, which would specify particular roles played by certain people.

In a conceptual data model, what is described is the semantics of a particular subject area. The conceptual data model is basically a collection of assertions about the kind of information used by a company. Entity classes are named in natural language rather than technical jargon, and concrete assertions about the subject area benefit from proper naming.

Another way of organizing data involves a database management system, using relational tables, columns, classes, and attributes. Models at this level are sometimes called physical data models, but in the ANSI three-schema architecture they are referred to as logical. In that architecture, the physical model describes the storage media: cylinders, tracks, tablespaces, and so on. The physical model should be derived from the more conceptual model, though there may be slight differences, for example to account for usage patterns and processing capacity.

Data analysis is a term that has become synonymous with data modeling, although in truth the activity has more in common with synthesis than with analysis. Synthesis, after all, refers to the process whereby general concepts are inferred from particular instances; in analysis the opposite happens, and particular concepts are identified from more general ones. I guess the professionals call themselves systems analysts because no one can pronounce "systems synthesists"! All joking aside, data modeling is an important method whereby various data structures of interest are brought together into one cohesive whole, relating different structures to one another and thereby eliminating redundancies, making everyone's lives a lot easier!

Notable data modeling techniques include IDEF, entity-relationship diagrams, Bachman diagrams, Barker's notation, Object Role Modeling (or Nijssen's Information Analysis Method), the business rules approach, object-relationship modeling, and RM/T. Common data modeling tools include GNU Ferret, Datanamic DeZign, ERwin, ARIS, Oracle Designer, Microsoft Visio, SILVERRUN, Mogwai ER-Designer, MySQL Workbench, PowerDesigner, and ER/Studio.

Connection between Data Model and Data Warehouse


Data models are required in order to build data warehouses. The problem is that those who develop data warehouses need to show results quickly, while data models take a very long time to build. Is it possible, then, to speed up the process somehow?

The data in the data warehouse is the most atomic data there is. The various summaries and aggregations are external to the warehouse, living in places such as DSS applications, data marts, and the ODS. These constantly changing forms of data are not situated at the atomic level of the warehouse. In fact, the data model should only concern itself with basic elemental data; it does not have to cover information such as regional weekly revenues, quarterly units sold, or regional monthly revenues. Those types of data should be situated outside the data warehouse, so the data model does not have to specify every permutation of the atomic data. What's more, data in the data warehouse is incredibly stable; it hardly ever changes. It is the data outside that changes. The data model for the data warehouse is therefore very small, but also very stable.

The attributes of data found in the data warehouse should carry information on subjects that can be interpreted in a broad, general way. It has to be far-reaching, representing many streams and classes of data. So if a given subject area is labeled Customer and is modeled properly to the data warehouse standard, it should include attributes of all sorts of customers: past, present, and future. Attributes should be arranged in the data model that note when a person became a customer, when a person was last a customer, and whether that person was ever a customer, all within the Customer subject area. By placing in the subject area all the relevant attributes that may be needed to classify a customer, preparations have been made for future contingencies for this piece of data. As a result, the DSS analyst will be able to use these attributes to look at past, potential, or future customers, as well as present-day customers. The data model should operate as a tool that paves the way for this ultimate flexibility by placing the right attributes in the data warehouse's atomic data.

As a further example of placing numerous attributes within atomic data, a part record could include all kinds of information about a part, even information not immediately needed by current requirements. A part record might include attributes such as part number, technical description, drawing number, engineering specification, raw goods, precious goods, replenishment categories, weight, length, accounting cost basis, bill of material to, bill of material from, store number, assembly identification number, and packaging information. Some of these attributes may appear extraneous to the vast majority of the information typically handled in production control processing. However, by including all of these attributes with the data model's part number, the road has been paved for future forms of processing that are unknown at present but may well arise some day.

To put it in other terms, the data warehouse's data model should try its best to include as many reasonable classifications of data as possible, and should never exclude any of them. By taking care of this at the outset, the responsible data modeler sets the stage for the multitude of requirements the data warehouse should serve to satisfy. Thus, from the standpoint of data modeling, the most atomic data should be modeled with the largest interpretive latitude. It is not difficult to create a data model like this, and it can serve to represent a company's simplest data. Once all this has been defined in the data model, the data warehouse will be prepared to take on many different requirements. Modeling atomic data and giving it attributes that allow it to be stretched in any direction is therefore not such a difficult task; the sketch below illustrates the idea.
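As a sketch of this broad-latitude modeling (the column names are invented), an atomic Customer table might carry attributes describing past, present, and prospective customers even before any report needs them:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE customer (
            customer_id     INTEGER PRIMARY KEY,
            name            TEXT,
            became_customer TEXT,    -- when the person first became a customer
            last_customer   TEXT,    -- when the person was last a customer
            ever_customer   INTEGER  -- 1 if the person has ever been a customer
        )
    """)
    # Summaries such as regional monthly revenue are derived outside the
    # warehouse (in data marts and DSS applications), never stored here.

The point of the wide, atomic table is exactly the flexibility described above: DSS analysts can slice it in ways nobody anticipated when the model was built.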
