RF

Corsello Research Foundation

Data Modeling Interviews
Lines of Questioning

Basics
 Data modeling is about defining standard structures for data
 Many data sets may share a common structure
 Each “thing” in the real world should have only one data structure
 Each data structure may appear in multiple data models

 Data models come in two primary flavors
   Domain model
     Models all entities specific to a domain
     Aligns to task automation and workflows
   Entity model
     Models entities regardless of domain
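
The two flavors can be sketched in code. This is a minimal illustration, not part of the original material: the names `Person`, `ClinicalVisit` and `PayrollRecord` are hypothetical. It shows one entity structure for a real-world "thing" appearing in two domain models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Entity model: one structure for a real-world 'thing', regardless of domain."""
    person_id: str
    name: str


@dataclass(frozen=True)
class ClinicalVisit:
    """Piece of a hypothetical clinical domain model; reuses the Person entity."""
    patient: Person
    reason: str


@dataclass(frozen=True)
class PayrollRecord:
    """Piece of a hypothetical payroll domain model; reuses the same Person entity."""
    employee: Person
    monthly_salary: int


# The same Person structure appears in both domain models.
alice = Person(person_id="p-1", name="Alice")
visit = ClinicalVisit(patient=alice, reason="checkup")
pay = PayrollRecord(employee=alice, monthly_salary=4000)
```

Each domain model adds only what its own workflows need; the shared entity keeps a single definition.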


Software
 Software works with or on data
   Software is actually a form of data as well
   Software should be keyed to a data model
 Software may be built for dynamic data models
   Allows for mapping to a specific implementation of a data model
   Results in general-purpose software
   Generally, lower performance
   Lower specialization, higher generality
 Software may be keyed to a specific data model
   Allows for high-performance, specialized tooling
   Allows for integration with workflows specific to the domain
   Lower generality
 Neither model is better, just different
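
The trade-off between the two styles can be sketched as follows. The `Sample` entity, its fields and the `SAMPLE_MODEL` description are assumptions made up for illustration. The first function works against any model supplied at run time; the class is keyed to one model fixed when the code is written.

```python
from dataclasses import dataclass

# Hypothetical run-time description of a data model: field name -> expected type.
SAMPLE_MODEL = {"sample_id": str, "mass_grams": float}


def validate_dynamic(record: dict, model: dict) -> list:
    """General-purpose check: accepts any model, but inspects every field at run time."""
    problems = []
    for name, expected_type in model.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems


@dataclass
class Sample:
    """Specialized structure: keyed to one data model, fixed when the code is written."""
    sample_id: str
    mass_grams: float

    def mass_kilograms(self) -> float:
        # Domain-specific behavior is easy to add here, at the cost of generality.
        return self.mass_grams / 1000.0


good = {"sample_id": "s-1", "mass_grams": 250.0}
bad = {"sample_id": "s-2"}
```

`validate_dynamic(bad, SAMPLE_MODEL)` reports the missing field for any model it is handed, while `Sample(**good).mass_kilograms()` is only available because the code knows this one model.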


Data Stores
 A collection of data based upon a single data model in a coherent repository is a data store
   A relational database is a form of repository for data stores
   A single RDBMS instance may contain multiple data stores
 Data stores may be abstracted by software in numerous ways to enable access
   Web-based services (SOAP/REST/JSON/RSS)
   Database API (e.g. ODBC/JDBC)
   Remote service (non-web, like CORBA/DCOM/IIOP)
   Native API (e.g. code library/dll/jar)
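
As a rough sketch of the "native API" style of abstraction, the code below hides a data store behind a small interface. The `SampleStore` interface, both implementations and the `sample` table are hypothetical. Only Python's standard `sqlite3` and `typing` modules are used.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class SampleStore(Protocol):
    """Hypothetical native API: callers see operations on the model, not the storage."""

    def add(self, sample_id: str, mass_grams: float) -> None: ...

    def get(self, sample_id: str) -> Optional[float]: ...


class SqliteSampleStore:
    """One possible repository: a table in a relational database."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sample "
            "(sample_id TEXT PRIMARY KEY, mass_grams REAL)"
        )

    def add(self, sample_id: str, mass_grams: float) -> None:
        self._conn.execute("INSERT INTO sample VALUES (?, ?)", (sample_id, mass_grams))

    def get(self, sample_id: str) -> Optional[float]:
        row = self._conn.execute(
            "SELECT mass_grams FROM sample WHERE sample_id = ?", (sample_id,)
        ).fetchone()
        return None if row is None else row[0]


class InMemorySampleStore:
    """A second repository for the same data model, with no database at all."""

    def __init__(self) -> None:
        self._rows: Dict[str, float] = {}

    def add(self, sample_id: str, mass_grams: float) -> None:
        self._rows[sample_id] = mass_grams

    def get(self, sample_id: str) -> Optional[float]:
        return self._rows.get(sample_id)


def record_and_read(store: SampleStore) -> Optional[float]:
    """Client code depends only on the abstraction, so either store can be passed in."""
    store.add("s-1", 250.0)
    return store.get("s-1")
```

Calling `record_and_read(SqliteSampleStore(sqlite3.connect(":memory:")))` and `record_and_read(InMemorySampleStore())` exercises the same client code against two different repositories.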

Data Models
 Data models serve several purposes
   Standard data models enable standard data formats, which enable sharing
   Standard data models enable standard software implementations, which enable application integration
   Data models provide standard vocabularies for communicating
   Data models provide references for standardizing workflows
     A workflow will require and produce data from the model
     Better enables defining standard entry and exit criteria
 Data format standards are not data models
   A standard XML schema is a standard encoding of data that implies structure, but is not itself a data model
   A data model is more abstract; it does not constrain implementation, encoding or use
 Data models are only part of the bigger picture of standardization of practice
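
The distinction between a data model and a data format can be illustrated with one structure and two encodings. The `Sample` entity, the element name `massGrams` and the attribute `id` are invented for this sketch. The dataclass stands in for the abstract model, and the JSON and XML functions are two interchangeable encodings of it.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Sample:
    """Stand-in for the abstract model: says what a sample is, not how it is encoded."""
    sample_id: str
    mass_grams: float


def to_json(sample: Sample) -> str:
    return json.dumps(asdict(sample), sort_keys=True)


def from_json(text: str) -> Sample:
    return Sample(**json.loads(text))


def to_xml(sample: Sample) -> str:
    # Hypothetical XML layout: id as an attribute, mass as a child element.
    root = ET.Element("sample", attrib={"id": sample.sample_id})
    ET.SubElement(root, "massGrams").text = repr(sample.mass_grams)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Sample:
    root = ET.fromstring(text)
    return Sample(sample_id=root.get("id"), mass_grams=float(root.findtext("massGrams")))


original = Sample(sample_id="s-1", mass_grams=250.0)
```

Both encodings round-trip to an equal `Sample`, yet the encoded strings differ, which is the sense in which a format implies structure without being the model.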


Parts and Pieces
 The goal is consistency, repeatability, measurability and reuse (sharing)
 This goal requires multiple facets:
   Standard data models
   Standard methodologies
     Technical models, algorithms and approaches
   Standard business processes
     Delineation of responsibility
     Processes and procedures
     Workflow models
 In short, standards
   Standards do not require agreement, only acceptance
   Standards do not need to fit everyone’s needs, only the cross-section of needs
   Standards should be composable to get more detail; that’s how to support everyone (a web of standards)


People
 All activities are performed for and/or by people
 A task is automated to remove a person from needing to perform the task; however, the result of the task will flow to a person
 People will appreciate the results of standardization, if done well, but:
   There is a fear that automation is meant to put them out of work
   There is a dislike for being required to do things in a different manner than they are used to (xenophobia)
   People want results; standardization is not quick


Coping with change
 To enable standardization to work well, expect long timelines
   Expect people to not support the timelines
   Deliver results in the interim, without the promise of the standardization
 The “grand vision” of the resulting utopia from standardization should be avoided
   There is no silver bullet, only hard work and good intent
   Don’t “hide” the goals, but emphasize the short-term goals
   Don’t let the short-term goals undermine the grand vision
 The long-term goals are the most important to maintain relevance
 The short-term goals are the most important to maintain support


Interviewing (finally)
 When holding a data modeling / business process session, remember it is a collaborative interview
 Get relevant people involved:
   Average “user” in the domain
   “Hotshot” or “Hero” in the domain
   “Trouble child” or “Technophobe” in the domain
   Minimal managers in general meetings
   Meet with management in a separate meeting both before and after for differing views
 Get a cross-section of what the domain is


The Session(s)
 Ask questions to spur discussion
   The people are a cross-section of the domain to ensure active discussion
 The facilitator / modeler does not actually “create” the model; the audience does
   Maintain enough control and direction to stay on topic
   Some discussions need to go off-topic to get to a point
 The modeler guides the model development based upon their knowledge of modeling practices, not the domain
   The modeler should understand the domain well enough to know what is on or off track
 The outcome of the meetings is a high-level abstract data model and process model
   One is of little use without the other in a specific domain
 Entity data model sessions
   Should result in a domain map indicating what domains this entity model is relevant to
   Map should directly intersect the audience


Questions
 There are no fixed questions to ask
   It is imperative to teach data model basics in most cases
 The line of questioning should be exploratory
 Try to answer
   What does your domain do (and not do)
     Establishes boundary of the domain
   Who does your domain contain (and not contain)
     Establishes a list of organizations of responsibility and regulatory environment
     Establishes a relative size for the domain
   Who do you serve and interact with
     Establishes a list of “consumers” of what the domain produces
     Establishes a list of “suppliers” the domain consumes from
   How does your domain accomplish this
     Establishes a list of processes / practices


Modeling
 Continue elaborating the previous questions
 Extract from the answers
   What do you use (tools, data, techniques)
   Where do you use X (for each data entity X)
   What is the same/different about each data entity
 Establish a baseline of entities
   Forms the core data model
   Extract fields/attributes
   Extract metadata (descriptions)
   Extract relations/multiplicities
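
One way to record the baseline during a session is sketched below. The entity names, descriptions and multiplicity strings are made-up examples, and the structure is only one of many reasonable choices. Each entity carries its attributes, its metadata and its relations with multiplicities, and a small helper lists relation targets that nobody has described yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    description: str  # metadata captured from the discussion


@dataclass
class Relation:
    target: str        # name of the related entity
    multiplicity: str  # e.g. "1", "0..1", "0..*", "1..*"


@dataclass
class Entity:
    name: str
    description: str
    attributes: List[Attribute] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


def undescribed_targets(entities: List[Entity]) -> List[str]:
    """Relation targets mentioned in the session but not yet captured as entities."""
    known = {entity.name for entity in entities}
    missing = {
        relation.target
        for entity in entities
        for relation in entity.relations
        if relation.target not in known
    }
    return sorted(missing)


baseline = [
    Entity(
        name="Project",
        description="A funded unit of work",
        attributes=[Attribute("title", "Human-readable project name")],
        relations=[Relation(target="Sample", multiplicity="0..*")],
    ),
    Entity(
        name="Sample",
        description="A physical specimen",
        attributes=[Attribute("mass_grams", "Mass at collection time")],
        relations=[Relation(target="Instrument", multiplicity="1")],
    ),
]
```

Here `undescribed_targets(baseline)` flags `Instrument`, which gives the facilitator a concrete follow-up question for the room.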

Build a Model
 Still during the meeting
 Depict graphically:
   Data entities
   Entity relations
   Process uses / domain mappings
 Probe users for “issues” with the model
   What’s missing
   What is not “always true” with the model
   What is domain specific about the model
   What cannot be lived without
   What is too costly to require or is inherently optional


Build the Real Model
 After the meeting is over
 Decompose the model into a logical data model representation (e.g. in UML)
 Partition the model
   Find natural “break points” in the entities
   Isolate each entity
 Resolve dependencies into a “parent” and “child”
   Extends the relational concept in that the parent data model owns the link to the child; the child is not required to know about the link
 Address partition consistency issues
   Define any mandatory constraints in the model
   Expect implementations will not be 100% able to enforce constraints
   Expect implementations to be fully distributed, loosely coupled and inter-organizational
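
The parent/child rule and the caution about constraints can be sketched together. `Project`, `Sample` and `dangling_links` are hypothetical names. The parent holds the child's identifier, the child holds nothing about the parent, and consistency is checked explicitly rather than assumed, since a distributed implementation may not enforce it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """Child entity: carries no reference back to any parent."""
    sample_id: str


@dataclass
class Project:
    """Parent entity: owns the link, stored as child identifiers."""
    project_id: str
    sample_ids: List[str] = field(default_factory=list)


def dangling_links(project: Project, reachable_samples: Dict[str, Sample]) -> List[str]:
    """Report links that cannot be resolved, instead of assuming they were enforced."""
    return [sid for sid in project.sample_ids if sid not in reachable_samples]


# The samples may live in another partition or organization; only one is reachable here.
reachable = {"s-1": Sample(sample_id="s-1")}
project = Project(project_id="proj-1", sample_ids=["s-1", "s-2"])
```

Because `Sample` knows nothing about `Project`, the sample partition can be stored and shared on its own; `dangling_links(project, reachable)` then surfaces `s-2` as a link the current implementation cannot resolve.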


Review and ‘Splanations
 Provide the real model to the community
   Expect “concerns” and “issues”
   No word generally means nobody understands, or nobody cares
   Expect most issues will be addressed not by changing the model, but by explaining the concepts of the model
 Educate, explain and provide examples
   Most users will want to directly relate a model to an implementation of the model
   It is extremely hard to convey the difference
   It is critical to maintain a complete separation of the model from its implementation
   If (when) example implementations are shown, they should test the boundary of what is “compliant” with the model


Questions