1. Differentiate between structured and unstructured data. Give ONE example for each type of
data. (6 marks)
Structured data
Data that is normally organized in smaller chunks (entities)
Has the same predefined format and length
Example: data in a database, point-of-sale data, web server logs, sensor data from GPS and
medical devices, etc.
Unstructured data
Information that is not organized in a predefined manner
Not predictable
Example: email, images, Word documents, videos, sound, messaging
Semi-structured data
A form of structured data that does not conform to the formal structure of data
models associated with relational databases or other forms of data tables, but
nonetheless contains tags or other markers to separate semantic elements and enforce
hierarchies of records and fields within the data. It is therefore also known as a
self-describing structure. Example: XML data, EDI data
6. Building and managing knowledge is one of the greatest challenges facing organizations in
the 21st century. List TWO (2) types of knowledge. Provide examples for each answer. (6
marks)
Formal, explicit or generally available knowledge. This is knowledge that has been
captured and used to develop policies and operating procedures for example.
Tacit knowledge. Within the organization there are certain people who hold specific
knowledge or have the ‘know how’.
8. Explain the differences between data, information and knowledge. Illustrate your answers
with examples. (6 marks)
9. Explain THREE (3) consequences of poorly managed data. (6 marks)
Lower customer satisfaction
Increased running costs, since time and other resources are spent detecting
and correcting errors
Inefficient decision-making processes
Lower performance
Lower employee job satisfaction
10. List THREE (3) data management functions. Describe how ONE (1) of the functions that you
have listed is implemented in your university. (8 marks)
Data management functions:
The discipline of development, execution and supervision of plans, policies, programs,
projects, processes, practices and procedures that control, protect, deliver and enhance
the value of data and information assets.
Data Governance
Data Architecture
Data Development
Data Operations
Data Security Management
Data Quality Management
Reference and Master Data Management
Data Warehousing
Document and Content Management
Meta-data Management
11. Explain the FOUR (4) essential characteristics of high quality data. (8 marks)
o Complete – data that has all those items required to measure intended activity
or event
o Legible – data that the intended users will find easy to read and understand
o Relevant – meets the need of the information users
o Reliable – data is collected consistently over time and reflects the true facts
12. List any FOUR (4) objectives of data governance. (4 marks)
To define, approve, and communicate data strategies, policies, standards,
architecture, procedures, and metrics.
To track and enforce conformance to data policies, standards, architecture, and
procedures.
To sponsor, track, and oversee the delivery of data management projects and
services.
To manage and resolve data related issues.
To understand and promote the value of data assets
13. Discuss TWO (2) primary functions of data governance under DAMA model. (4 marks)
17. Explain THREE (3) principles of good data and information management. (9 marks)
• Public information is published: Public information includes the objective, factual
information on which public services run and are assessed, on which policy decisions are
based, or which is collected or generated in the course of public service delivery. Public
information should be published wherever practicable, unless there are overriding reasons
not to.
• Information is managed: Information should be managed – stored, protected and
exploited – according to its value. Data and information managers need to consider the
whole lifecycle of the information, from identification of need, creation, quality assurance,
maintenance, reuse and ultimately to archiving or destruction once the information has
ceased to be useful.
• Information is fit for purpose: Information must be good quality and fit for both its
primary purpose and potential secondary uses. It will not always be possible for the
originator to foresee secondary uses, so it is important that the quality of the information is
communicated consistently so future users can decide if it is suitable.
19. Data and information created from data are widely recognized as enterprise assets. Discuss
THREE (3) reasons why you would agree with the above statement. (9 marks)
• Take active measures to improve the value of their assets. For example, they invest in
hiring the right people, training employees, and succession planning.
• Use (exploit might be a better word) their assets inside the organization and in their
marketplaces. For example, they use people’s brainpower to innovate and develop
strategies to market and sell their products. And they invest their dollars in plant and
equipment to manufacture them economically.
• Adjust their management systems in recognition of the asset’s special properties.
They put someone in charge (e.g., chief financial officers, heads of human resources), and
they recognize, for example, that managing a person and managing a dollar are not the
same.
21. Describe the types of data created and illustrate your answers with examples. (6 marks)
Refer question 1
22. Describe FOUR (4) reasons why data management is important in any business organization. Use
examples to illustrate your answers. (10 marks)
Productivity
Good data management will make your organization more productive. On the flip side, poor data
management will lead to your organization being very inefficient.
Good data management makes it easier for employees to find and understand information that they
need to do their job. It allows them to easily validate results or conclusions they may have. It
also provides the structure for information to be easily shared with others and to be stored for
future reference and easy retrieval.
Cost Efficiency
23. To be useful data must also satisfy a number of conditions, identify and explain any TWO (2)
conditions for data to be useful. (5 marks)
Read: Using Various Measures
The following two texts, in conjunction, provide several reasons to consider multiple sources of
data to analyze, and the types of data schools should consider.
Try it: Conduct a Data Audit
Read Conducting a Data Audit. Use the suggested protocol as a grade level or leadership team to
identify existing data points and how (and if) they are collected, organized, analyzed and acted
upon.
24. List TWO (2) types of data that a banking institution would be interested in collecting.
Explain with reasons the need to collect this data. (6 marks)
Loan Interest Rates
On the face of it, figuring out how a bank makes money is a pretty straightforward affair. A bank
earns a spread on the money it lends out from the money it takes in as a deposit. The net interest
margin (NIM), which most banks report quarterly, represents this spread, which is simply the
difference between what it earns on loans versus what it pays out as interest on deposits. This, of
course, gets much more complicated given the dizzying array of credit products and interest rates
used to determine the rate eventually charged for loans.
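As a worked illustration of the spread described above, the following sketch computes the difference between interest earned on loans and interest paid on deposits. All balances and rates are hypothetical; a reported NIM is normally expressed relative to the bank's interest-earning assets, which this simplified example does not model.

```python
from decimal import Decimal

# Hypothetical figures, assumed only for illustration.
loans = Decimal("1000000")        # money lent out
deposits = Decimal("1000000")     # money taken in as deposits
loan_rate = Decimal("0.06")       # annual rate charged on loans
deposit_rate = Decimal("0.02")    # annual rate paid on deposits

interest_earned = loans * loan_rate
interest_paid = deposits * deposit_rate

# The spread: what the bank earns on loans minus what it pays on deposits.
net_interest_income = interest_earned - interest_paid
```

This is why loan interest rate data matters to a bank: a small change in either rate changes the spread directly.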
25. List TWO (2) types of data that your university would be interested in collecting. Explain
with reasons the need to collect this data. (4 marks)
o Quantitative - Data expressed as numbers. For example, the university wants to know
how many students it has.
o Qualitative - Data based on observed qualities and generally not measured
with a numerical result. For example, the university wants to know how students
rate a service on a scale with assigned values.
26. Identify and discuss FOUR (4) causes of data quality issues at the data sources stage. (8
marks)
Entry quality
Entry quality is probably the easiest problem to identify but is often the most difficult to
correct. Entry issues are usually caused by a person entering data into a system. The problem may
be a typo or a willful decision, such as providing a dummy phone number or address. Identifying
these outliers or missing data is easily accomplished with profiling tools or simple queries.
Process quality
Was the integrity of the information maintained during processing through the system?
27. Discuss FOUR (4) dimensions of data quality. Provide examples for each answer. (8 marks)
- Validity/accuracy
o refers to how closely the data correctly captures what it is designed to capture
o A European school is receiving applications for its annual September intake and
requires students to be aged 5 before the 31st August of the intake year
- Reliability/consistency
o data is collected consistently over time and reflects the true facts
o School admin: a student’s date of birth has the same value and format in the
school register as that stored within the Student database.
- Completeness
o the data that has all those items required to measure intended activity or event
o Parents of new students at school are requested to complete a Data Collection
Sheet which includes medical conditions and emergency contact details as well
as confirming the name, address and date of birth of the student.
- Timeliness
o data is collected within a reasonable agreed time period
o Tina Jones provides details of an updated emergency contact number on 1st
June 2013 which is then entered into the Student database by the admin team
on 4th June 2013. This indicates a delay of 3 days. This delay breaches the
timeliness constraint as the service level agreement for changes is 2 days
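Two of the checks above can be sketched in Python. The dates and the 2-day service level agreement come from the timeliness example; the function name and the reading of "aged 5 before the 31st August" as a strict comparison are assumptions made for illustration.

```python
from datetime import date

# Timeliness: dates and the 2-day limit are taken from the example above.
reported = date(2013, 6, 1)
entered = date(2013, 6, 4)
sla_days = 2
delay_days = (entered - reported).days
breaches_sla = delay_days > sla_days


# Validity: a child is aged 5 before 31 August of the intake year exactly when
# the date of birth falls before 31 August five years earlier (assumed strict).
def is_valid_applicant(date_of_birth: date, intake_year: int) -> bool:
    cutoff = date(intake_year - 5, 8, 31)
    return date_of_birth < cutoff
```

For example, `is_valid_applicant(date(2008, 8, 30), 2013)` accepts the applicant, while a date of birth of 31 August 2008 is rejected under this reading of the rule.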
29. Describe “dirty data”. Provide TWO (2) examples of dirty data. (4 marks)
In a data warehouse, dirty data is a database record that contains errors. Dirty data can be
caused by a number of factors, including duplicate records, noise, incomplete or outdated data
(missing values), outliers, and invalid data resulting from the improper parsing of record fields
from disparate systems.
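Two of these kinds of dirty data, duplicate records and missing values, can be detected with a simple scan. This is a minimal sketch; the records and field names are hypothetical.

```python
# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Tina Jones", "phone": "555-0101"},
    {"id": 2, "name": "Tina Jones", "phone": "555-0101"},  # duplicate of record 1
    {"id": 3, "name": "Sam Lee", "phone": None},           # missing value
]

seen = set()
duplicates = []   # ids of records repeating an earlier name/phone pair
incomplete = []   # ids of records with an empty or missing field

for rec in records:
    key = (rec["name"], rec["phone"])
    if key in seen:
        duplicates.append(rec["id"])
    seen.add(key)
    if any(value is None or value == "" for value in rec.values()):
        incomplete.append(rec["id"])
```

Here record 2 is flagged as a duplicate and record 3 as incomplete.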
30. Define the following data quality issues and give ONE example for each issue: i) Noise, ii)
Outliers. (6 marks)
In general, noise is any undesirable or unwanted signal, or part of a signal. Noise
may or may not be random.
An "outlier" is a data point or value that differs considerably from all or most
other data in a dataset.
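One common way to flag outliers is the interquartile range (IQR) rule, sketched below. The rule and the multiplier of 1.5 are a widely used convention rather than something prescribed by these notes, and the sample ages are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages of new school applicants; 55 is probably an entry error.
ages = [5, 5, 6, 5, 6, 5, 6, 55]
flagged = iqr_outliers(ages)
```

A flagged value is only a candidate for review: an outlier may be a genuine but unusual observation rather than an error.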
31. Discuss THREE (3) causes of poor data quality. Suggest solutions to improve data quality. (9
marks)
- Manual data entry
a. People mistype. They choose the wrong entry from a list. They enter the right
data value into the wrong box.
b. Given complete freedom on a data field, those who enter data have to go from
memory. Is the vendor named Grainger, WW Granger, or W. W. Grainger?
Solutions:
- Monitoring: Make public the results of poorly entered data and praise those who enter
data correctly.
- Real-time Validation: In addition to forms validation, data quality tools can be
implemented to validate addresses, e-mail addresses and other important information
as it is entered.
- Communication: Regular communication and a well-documented metadata model will
make the process of change much easier.
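Real-time validation can be sketched as follows. The e-mail pattern is a deliberately loose format check, and the vendor lookup table, function names and similarity cutoff are hypothetical; a production system would more likely rely on a dedicated data quality tool.

```python
import difflib
import re

# Loose format check: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical master list: normalized key -> approved vendor name.
CANONICAL_VENDORS = {"wwgrainger": "W. W. Grainger"}


def is_plausible_email(value: str) -> bool:
    return EMAIL_PATTERN.match(value.strip()) is not None


def suggest_vendor(entered: str, cutoff: float = 0.8):
    """Map a free-text vendor entry to the approved name, or None if no close match."""
    key = re.sub(r"[^a-z]", "", entered.lower())
    matches = difflib.get_close_matches(key, list(CANONICAL_VENDORS), n=1, cutoff=cutoff)
    return CANONICAL_VENDORS[matches[0]] if matches else None
```

With this lookup, the entries "Grainger", "WW Granger" and "W. W. Grainger" all resolve to the same approved vendor name at the moment of entry, instead of being stored as three different vendors.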
32. Describe data quality management process. (6 marks)
Data definition: In this step the data describing the business of the undertaking must be
appropriate and complete. The definition of the data involves the identification of data
requirements that fulfill this criterion. Data requirements should contain a proper
description of the single items and their relationship.
Data quality assessment: Data quality assessment involves validating the data according
to the three criteria: appropriateness, completeness, and accuracy. The assessment
should consider the channel through which data is collected and elaborated, whether
through internal systems, external third parties, or publicly available electronic sources.
Problem resolution: The problems that are identified during the assessment of the data
quality are addressed in this phase. It is important to document data limitations and
justify the remedies applied to deficient data.
Data quality monitoring: Data quality monitoring involves monitoring the performance
of the associated IT systems, based on data quality performance indicators. Data quality
monitoring involves two dimensions: quantitative and qualitative.
33. Describe FOUR (4) benefits of implementing data quality program. (8 marks)
o Deliver high-quality data for a range of enterprise initiatives including business
intelligence, applications consolidation and retirement, and master data management
o Reduce time and cost to implement CRM, data warehouse/BI, data governance, and
other strategic IT initiatives and maximize the return on investments
o Construct consolidated customer and household views, enabling more effective cross-
selling, up-selling, and customer retention
o Help improve customer service and identify a company's most profitable customers
34. Discuss THREE (3) measures to protect the quality of data. (6 marks)
Protect data in transit with IP security. Encapsulating Security Payload (ESP) is the
protocol IPsec uses to encrypt data for confidentiality. It can operate in tunnel mode, for
gateway-to-gateway protection, or in transport mode, for end-to-end protection.
Use disk encryption. Disk encryption products can be used to encrypt removable USB
drives, flash drives, etc. Some allow creation of a master password along with secondary
passwords with lower rights you can give to other users. Examples include PGP Whole
Disk Encryption and DriveCrypt, among many others.
Secure wireless transmissions. You should send or store data only on wireless networks
that use encryption, preferably Wi-Fi Protected Access (WPA), which is stronger than
Wired Equivalent Privacy (WEP).
35. Discuss THREE (3) examples of the importance of data quality in organizational processes. (9
marks)
Data Quality Defined. For example, one street address may occur in more than one state
and more than one country. Therefore, correctly document an address by referring to
the street name and number as well as the state and country.
Performance. Darryl Enos writes in "Performance Improvement: Making It Happen" that
performance is the achievement of tangible, specific, measurable and worthwhile goals.
To sustain evidence-based performance, use quality data to measure and report on
business processes. Only then can you determine if you have achieved the objectives of
your performance improvement initiatives and whether they are appropriate.
Data and Performance. Customers, product performance, internal operations and
performance, and cost and financial data are each used to formulate performance
improvement strategies and gauge their success. Use accurate, consistent and complete
data to select appropriate performance improvement initiatives and fairly evaluate the
success of each.
36. Describe any THREE (3) reasons for an organization’s data quality problems. (6 marks)
Manual Data Entry: Data is fed manually into the system many times, and is hence
prone to human error. Since user data is often entered through various user-friendly
interfaces, it may not be directly compatible with the internal data representation. In
addition, end-users tend to fill ‘shortcut’ information in fields that they perceive to be
unimportant, but which may be crucial to internal data management. The data operator
may not have the expertise to understand this data and might incorrectly fill values in
the wrong fields or may mistype the information.
Real-Time Interfaces: This is in complete opposition to batch feeds. With real-time
interfaces and applications becoming the flavor of interactive and enhanced user
experience, data enters the database in real-time and often propagates quickly to the
chain of interconnected databases. This triggers actions and responses that might be
visible to the user almost immediately, leaving little room for validation and verification.
This causes a huge hole in the data quality assurance where a wrong entry may cause
havoc at the back end.
Data Cleansing: Every company needs to rectify its incorrect data periodically.
Manual cleansing has been taken over by time and effort saving automations. Although
this is very helpful, it has the potential risk of wrongly affecting thousands of records.
The software used to automate may have bugs, or the data specifications which form
the basis of cleansing algorithms may be incorrect. This can make perfectly valid data
invalid and virtually reverse the very advantage of the cleansing exercise.
37. Discuss THREE (3) benefits to the organization of implementing a data quality assurance
program in maintaining data quality. (6 marks)
- Help improve customer service and identify a company’s most profitable customers.
- Reduce time and cost to implement CRM, data warehouse and other strategic IT
initiatives and maximize the return on investments.
- Provide business intelligence on individuals and organizations for research, fraud
detection and planning.
40. Explain the FOUR (4) main steps in data management plan. (8 marks)
http://instr.iastate.libguides.com/dmp/writingDMP
Data identification/collection
Data organization
Data documentation
Data storage & security
Data preservation
Data sharing
43. One of the most fundamental challenges in the process of data integration is
heterogeneous data. Discuss strategies to overcome the challenges. (6 marks)
- A detailed analysis of the characteristics and uses of data is necessary to mitigate issues
with heterogeneous data. First, a model is chosen, either a federated or a data warehouse
environment, that serves the requirements of the business applications and other uses
of the data.
- Then the database developer will need to ensure that various applications can use this
format or, alternatively, that standard operating procedures are adopted to convert the
data to another format.
- Bringing disparate data together in a database system or migrating and fusing highly
incompatible databases is painstaking work that can sometimes feel like an
overwhelming challenge. Thankfully, software technology has advanced to minimize
obstacles through a series of data access routines that allow structured query languages
to access nearly all DBMS and data file systems, relational or non-relational.
44. List TWO (2) most common data integration approaches. (2 marks)
- ETL (Extract, Transform and Load)
- ELT (Extract, Load and Transform)
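The difference between the two approaches is where the transformation happens. The sketch below uses Python's built-in sqlite3 module as a stand-in for the target system; the table names, sample rows and the clean-up rule (trim and upper-case the name) are hypothetical.

```python
import sqlite3

# Hypothetical rows extracted from a source system.
source_rows = [(" Tina Jones ", "2013-06-01"), ("sam lee", "2013-06-04")]

# ETL: transform in application code first, then load the cleaned rows.
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE contacts (name TEXT, updated TEXT)")
etl_db.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [(name.strip().upper(), updated) for name, updated in source_rows],
)

# ELT: load the raw rows first, then transform inside the target with SQL.
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE raw_contacts (name TEXT, updated TEXT)")
elt_db.executemany("INSERT INTO raw_contacts VALUES (?, ?)", source_rows)
elt_db.execute(
    "CREATE TABLE contacts AS "
    "SELECT UPPER(TRIM(name)) AS name, updated FROM raw_contacts"
)

query = "SELECT name FROM contacts ORDER BY updated"
etl_names = [row[0] for row in etl_db.execute(query)]
elt_names = [row[0] for row in elt_db.execute(query)]
```

Both routes end with the same cleaned table. With ELT the raw data also remains available in the target system, at the cost of storing untransformed records there.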
47. Suggest THREE (3) causes for the increase in data integration software. (6 marks)
- To reduce data complexity. Data integration is about managing complexity, streamlining
these connections, and making it easy to deliver data to any system.
- To increase the value of data through unified systems. Bringing disparate datasets
together increases the value of the information.
- To make data more available. Centralizing your data makes it easy for anyone at your
company to retrieve, inspect and analyze it.
48. Briefly explain TWO (2) difficulties that are encountered when data is integrated. (4 marks)
- Implementing shared data. Designing a system to manage business data on an
integrated basis means configuring your data architecture to handle company-wide
access to information, which can be difficult.
- Not understanding how it works. Another common problem is that users do not
know what to do with the data.
49. Explain the differences between a data warehouse and virtual data integration system.
Illustrate your answers with diagrams. (8 marks)
Data warehouse: Integrate by bringing the data into a single physical warehouse.
Virtual data integration system: Leave the data at the sources and access it at query time.
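The contrast can also be shown with a small sketch in which plain Python dictionaries stand in for the source systems. The source names, keys and values are hypothetical.

```python
# Two hypothetical source systems.
admissions = {"S1": "Tina Jones"}   # student names
contacts = {"S1": "555-0101"}       # emergency phone numbers

# Data warehouse: copy data from the sources into one physical store ahead of time.
warehouse = {
    sid: {"name": admissions[sid], "phone": contacts.get(sid)}
    for sid in admissions
}


def query_warehouse(sid):
    return warehouse[sid]


# Virtual integration: keep no copy; read from each source when the query runs.
def query_virtual(sid):
    return {"name": admissions[sid], "phone": contacts.get(sid)}


# A source changes after the warehouse was loaded.
contacts["S1"] = "555-0199"

warehouse_phone = query_warehouse("S1")["phone"]   # value as of the last load
virtual_phone = query_virtual("S1")["phone"]       # value as of query time
```

The warehouse answers from its own copy and so returns the old number until the next load, while the virtual system returns the current number but depends on every source being reachable at query time.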