
ICT550

Class Activity #1 (Topic 1-4)


SCHEME

1. Differentiate between structured and unstructured data. Give ONE example for each type of
data. (6 marks)
Structured data
• Data that is normally organized in smaller chunks (entities)
• Has the same predefined format and length
• Examples: data in a database, point-of-sale data, web server logs, sensor data from GPS or medical devices, etc.
Unstructured data
• Information that is not organized in a predefined manner
• Not predictable
• Examples: e-mail, images, Word documents, videos, sound, messaging
Semi-structured data
• A form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. It is therefore also known as a self-describing structure. Examples: XML data, EDI data.
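A minimal Python sketch can make the structured/semi-structured contrast concrete. The sale record and the XML fragment are invented for illustration: the structured record must always have the same predefined fields, while each XML record carries its own tags, so one student can have an e-mail element and another can simply omit it.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Structured: every record has the same predefined fields (hypothetical point-of-sale row).
class Sale(NamedTuple):
    sale_id: int
    item: str
    quantity: int
    unit_price: float

row = Sale(sale_id=1, item="Notebook", quantity=3, unit_price=2.50)

# Semi-structured: tags describe the data, and records need not share the same fields.
xml_text = """
<students>
  <student id="S1"><name>Aina</name><email>aina@example.edu</email></student>
  <student id="S2"><name>Badrul</name></student>
</students>
"""

root = ET.fromstring(xml_text.strip())
parsed = [
    # findtext returns the default when a tag is absent, as it is for S2's email.
    (s.get("id"), s.findtext("name"), s.findtext("email", default="(none)"))
    for s in root.findall("student")
]
print(row)
print(parsed)
```

A relational table would need an email column holding NULL for the second student; the XML simply leaves the tag out, which is what "self-describing" means in practice.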

2. Define any THREE (3) phases in the data lifecycle. (6 marks)

3. Define data management. (5 marks)


Data management is the development and execution of architectures, policies, practices
and procedures in order to manage the information lifecycle needs of an enterprise in an
effective manner.

4. Distinguish data management and information management. (4 marks)


Data management
• The development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise in an effective manner
• Covers everything from file naming conventions to policies and practices on creating metadata and documentation for the long term
Information management
• A program that manages the people, processes and technology in an enterprise
• Covers both electronic and physical information
5. Differentiate between data management and data governance. (6 marks)
• Data management is the development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise in an effective manner.
• Data governance consists of the processes, policies, organization and technologies required to manage and ensure the availability, usability, integrity and security of data used in an enterprise.

6. Building and managing knowledge is one of the greatest challenges that face organizations in the 21st century. List TWO (2) types of knowledge. Provide examples for each answer. (6 marks)
• Formal, explicit or generally available knowledge: knowledge that has been captured and used, for example, to develop policies and operating procedures.
• Tacit knowledge: within the organization there are certain people who hold specific knowledge or have the 'know-how'.

7. Explain the importance of good data and information management. (6 marks)


• Ensures that data used for analysis are of high quality so that conclusions are correct
• Good data management allows further use of the data in the future and enables efficient integration of results with other studies
• Improved processing efficiency
• Improved data quality
• Improved meaningfulness of the data

8. Explain the differences between data, information and knowledge. Illustrate your answers
with examples. (6 marks)
9. Explain THREE (3) consequences of poorly managed data. (6 marks)
• Lower customer satisfaction
• Increased running costs
• Inefficient decision-making processes
• Lower performance
• Lower employee job satisfaction
• Increased operational costs, since time and other resources are spent detecting and correcting errors

10. List THREE (3) data management functions. Describe how ONE (1) of the functions that you
have listed is implemented in your university. (8 marks)
• Data management: the discipline of development, execution and supervision of plans, policies, programs, projects, processes, practices and procedures that control, protect, deliver and enhance the value of data and information assets.
• Data management functions:
o Data Governance
o Data Architecture
o Data Development
o Data Operations
o Data Security Management
o Data Quality Management
o Reference and Master Data Management
o Data Warehousing
o Document and Content Management
o Metadata Management

• Example of implementing a data management function in a university environment


Data Storage Management: This particular function is used for the storage of data and any
related data entry forms or screen definitions, report definitions, data validation rules,
procedural code, and structures that can handle video and picture formats. Users do not
need to know how data is stored or manipulated.

11. Explain the FOUR (4) essential characteristics of high quality data. (8 marks)
o Complete – data that has all those items required to measure intended activity
or event
o Legible – data that the intended users will find easy to read and understand
o Relevant – meets the need of the information users
o Reliable – data is collected consistently over time and reflects the true facts
12. List any FOUR (4) objectives of data governance. (4 marks)
• To define, approve, and communicate data strategies, policies, standards, architecture, procedures, and metrics.
• To track and enforce conformance to data policies, standards, architecture, and procedures.
• To sponsor, track, and oversee the delivery of data management projects and services.
• To manage and resolve data-related issues.
• To understand and promote the value of data assets.

13. Discuss TWO (2) primary functions of data governance under DAMA model. (4 marks)

14. Explain TWO (2) benefits of data governance. (4 marks)


• Better decision making
• Ensures transparency
• Operational efficiency
• Protects the needs of stakeholders
• Reduces cost and increases effectiveness
• Enables decision making
15. Identify TWO (2) data quality problems in your university environment. Indicate how these
problems are solved. (6 marks)
• Missing values: impute the data
• Duplicate data: remove the duplicate data
• Noise: replace with correct data
• Invalid data
• Outliers
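The first two fixes listed above, imputing missing values and removing duplicates, can be sketched in plain Python. The student records, field names and the choice of mean imputation are assumptions for illustration; median, mode or a fixed default are equally common imputation rules, and the right one depends on the data.

```python
from statistics import mean

# Hypothetical student records; None marks a missing CGPA.
records = [
    {"student_id": "2021001", "cgpa": 3.10},
    {"student_id": "2021002", "cgpa": None},
    {"student_id": "2021003", "cgpa": 3.50},
    {"student_id": "2021001", "cgpa": 3.10},  # duplicate of the first record
]

def remove_duplicates(rows, key):
    """Keep the first record for each key value and drop later duplicates."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

def impute_mean(rows, field):
    """Return new records with missing values in `field` replaced by the mean of the present values."""
    present = [row[field] for row in rows if row[field] is not None]
    fill = round(mean(present), 2)
    return [{**row, field: fill if row[field] is None else row[field]} for row in rows]

cleaned = impute_mean(remove_duplicates(records, "student_id"), "cgpa")
print(cleaned)
```

Duplicates are removed before imputing so that a repeated record does not pull the mean towards its own value.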

16. Explain the importance of data management in an organization. (6 marks)


Refer to question 7.

17. Explain THREE (3) principles of good data and information management. (9 marks)
• Public information is published: Public information includes the objective, factual
information on which public services run and are assessed, on which policy decisions are
based, or which is collected or generated in the course of public service delivery. Public
information should be published wherever practicable, unless there are overriding reasons
not to.
• Information is managed: Information should be managed – stored, protected and
exploited – according to its value. Data and information managers need to consider the
whole lifecycle of the information, from identification of need, creation, quality assurance,
maintenance, reuse and ultimately to archiving or destruction once the information has
ceased to be useful.
• Information is fit for purpose: Information must be good quality and fit for both its
primary purpose and potential secondary uses. It will not always be possible for the
originator to foresee secondary uses, so it is important that the quality of the information is
communicated consistently so future users can decide if it is suitable.

18. Discuss the role of information technology in data management. (5 marks)


Data is a representation of the organization. The organization uses this representation, the data, to operate, record, manage, report and plan. Organizations were creating and using data long before computers were ever thought of. Data is clearly a business asset, not an IT asset as hardware or software are. Prior to computerization, the business owned, managed, understood and governed its data assets.

19. Data and information created from data are widely recognized as enterprise assets. Discuss
THREE (3) reasons why you would agree with the above statement. (9 marks)
• Take active measures to improve the value of their assets. For example, they invest in
hiring the right people, training employees, and succession planning.
• Use (exploit might be a better word) their assets inside the organization and in their
marketplaces. For example, they use people’s brainpower to innovate and develop
strategies to market and sell their products. And they invest their dollars in plant and
equipment to manufacture them economically.
• Adjust their management systems in recognition of the asset's special properties. They put someone in charge (e.g., chief financial officers, heads of human resources), and they recognize, for example, that managing a person and managing a dollar are not the same.

20. Describe the sources of data in an organization. (5 marks)


Data organization is about working more efficiently with data. Creating and using data
requires some level of data organization. Often this organization becomes time consuming
and error prone, in which case automated data organization methods should be considered.
The standard methods of data organization:
• File Transfers and Remote Access
• File Synchronization
• Collaboration
• Revision Control
Some automated and more efficient alternatives are suggested, but keep in mind that they
often require some configuration and familiarization with the software. If the standard
methods are adequate for your needs, then it is best to continue using them. If you think
you are spending too much time organizing your data, then you should consider looking into
the advanced methods.

21. Describe the types of data created and illustrate your answers with examples. (6 marks)
Refer to question 1.

22. Describe FOUR (4) importance of data management in any business organizations. Use
examples to illustrate your answers. (10 marks)
Productivity
• Good data management will make your organization more productive. On the flip side, poor data management will lead to your organization being very inefficient.
• Good data management makes it easier for employees to find and understand the information that they need to do their job. It allows them to easily validate results or conclusions they may have. It also provides the structure for information to be easily shared with others and to be stored for future reference and easy retrieval.

Cost Efficiency
• Another benefit of proper data management is that it should allow your organization to avoid unnecessary duplication. By storing all data and making it easily referable, you ensure that employees never conduct the same research, analysis or work that has already been completed by another employee.
• Maximizing cost efficiency in a marketing campaign is highly desirable for a business, since the greatest product exposure is achieved for the least amount of financial investment.

Reduced Instances of Data Loss
• With a data management system and plan in place that all your employees know and follow, you can greatly reduce the risk of losing vital information.
• With a data management plan, measures will be put in place to ensure that important information is backed up and retrievable from a secondary source if the primary source ever becomes inaccessible.

Accurate Decision
• The corrective costs of inadequate data management can be significant and can run into millions of dollars from a single occurrence.
• The primary reason for bad data and data loss is that there is no data management system or plan in place, or that the plan or system is of poor quality. Unfortunately, organizations often realize that they have an issue only after it arises. Instead of being proactive, most organizations are reactive, which in the long run costs them significantly more.

23. To be useful, data must also satisfy a number of conditions. Identify and explain any TWO (2) conditions for data to be useful. (5 marks)
Read: Using Various Measures
• The following two texts, in conjunction, provide several reasons to consider multiple sources of data to analyze, and the types of data schools should consider.
Try it: Conduct a Data Audit
• Read Conducting a Data Audit. Use the suggested protocol as a grade level or leadership team to identify existing data points and how (and if) they are collected, organized, analyzed and acted upon.
24. List TWO (2) types of data that a banking institution would be interested in collecting.
Explain with reasons the need to collect this data. (6 marks)

Bank Deposit
• Bank deposits consist of money placed into banking institutions for safekeeping. These deposits are made to deposit accounts such as savings accounts, checking accounts and money market accounts. The account holder has the right to withdraw deposited funds, as set forth in the terms and conditions governing the account agreement.

Loan Interest Rates
• On the face of it, figuring out how a bank makes money is a pretty straightforward affair. A bank earns a spread on the money it lends out from the money it takes in as a deposit. The net interest margin (NIM), which most banks report quarterly, represents this spread, which is simply the difference between what it earns on loans and what it pays out as interest on deposits. This, of course, gets much more complicated given the dizzying array of credit products and interest rates used to determine the rate eventually charged for loans.
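The spread described above is why a bank collects both loan and deposit interest data. As a hedged illustration: the net interest margin that banks report is usually expressed as a ratio, (interest income - interest expense) / average earning assets, rather than as the raw difference. The figures below are made up.

```python
def net_interest_margin(interest_income, interest_expense, avg_earning_assets):
    """NIM as a percentage: (interest earned - interest paid) / average earning assets."""
    if avg_earning_assets <= 0:
        raise ValueError("average earning assets must be positive")
    return (interest_income - interest_expense) / avg_earning_assets * 100

# Hypothetical quarter: 50 million earned on loans, 20 million paid on deposits,
# 1,000 million of average earning assets.
nim = net_interest_margin(50_000_000, 20_000_000, 1_000_000_000)
print(f"NIM = {nim:.2f}%")  # NIM = 3.00%
```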

25. List TWO (2) types of data that your university would be interested in collecting. Explain
with reasons the need to collect this data. (4 marks)
o Quantitative - numerical data that can be counted or measured. For example, the university wants to know how many students it has, expressed as a number.
o Qualitative - data based on observed qualities that is generally not measured with a numerical result. For example, the university may want to record responses on a rating scale in which each category is assigned a value.

26. Identify and discuss FOUR (4) causes of data quality issues at the data sources stage. (8
marks)
Entry quality
• Entry quality is probably the easiest problem to identify but is often the most difficult to correct. Entry issues are usually caused by a person entering data into a system. The problem may be a typo or a willful decision, such as providing a dummy phone number or address. Identifying these outliers or missing data is easily accomplished with profiling tools or simple queries.
Process quality
• Was the integrity of the information maintained during processing through the system?
Identification quality
• Are two similar objects identified correctly as the same or different?
Integration quality
• Is all the known information about an object integrated to the point of providing an accurate representation of the object?
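To illustrate the point that entry problems can be found "with profiling tools or simple queries", here is a sketch using Python's built-in sqlite3 module. The customer table, its columns and the list of placeholder phone numbers are hypothetical; a real check would use the organization's own tables and known dummy values.

```python
import sqlite3

# Hypothetical customer table with typical entry-quality problems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO customer (name, phone, address) VALUES (?, ?, ?)",
    [
        ("Siti", "0123456789", "12 Jalan Ampang"),
        ("Ravi", "0000000000", "5 Jalan Tun Razak"),  # dummy phone number
        ("Mei Ling", None, "8 Jalan Bukit Bintang"),  # missing phone number
        ("Ahmad", "0198765432", "   "),               # blank address
    ],
)

# Missing values: NULL fields or fields containing only spaces.
missing = conn.execute(
    "SELECT name FROM customer "
    "WHERE phone IS NULL OR address IS NULL OR TRIM(address) = '' ORDER BY id"
).fetchall()

# Placeholder values typed in just to get past a mandatory field (assumed list).
dummies = conn.execute(
    "SELECT name, phone FROM customer WHERE phone IN ('0000000000', '1111111111') ORDER BY id"
).fetchall()

print(missing)  # [('Mei Ling',), ('Ahmad',)]
print(dummies)  # [('Ravi', '0000000000')]
```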

27. Discuss FOUR (4) dimensions of data quality. Provide examples for each answer. (8 marks)
- Validity/accuracy
o refers to how closely the data correctly captures what it is designed to capture
o A European school is receiving applications for its annual September intake and
requires students to be aged 5 before the 31st August of the intake year
- Reliability/consistency
o data is collected consistently over time and reflects the true facts
o School admin: a student’s date of birth has the same value and format in the
school register as that stored within the Student database.

- Completeness
o the data that has all those items required to measure intended activity or event
o Parents of new students at school are requested to complete a Data Collection
Sheet which includes medical conditions and emergency contact details as well
as confirming the name, address and date of birth of the student.
- Timeliness
o data is collected within a reasonable agreed time period
o Tina Jones provides details of an updated emergency contact number on 1st
June 2013 which is then entered into the Student database by the admin team
on 4th June 2013. This indicates a delay of 3 days. This delay breaches the
timeliness constraint as the service level agreement for changes is 2 days
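The validity and timeliness examples above reduce to simple date checks. This sketch assumes "aged 5 before the 31st August" means the child has turned 5 on or before 31 August of the intake year; the function names and the 2013 intake are illustrative assumptions, while the Tina Jones dates and the 2-day service level come from the text.

```python
from datetime import date

def is_valid_applicant(date_of_birth, intake_year):
    """Validity: the child must have turned 5 on or before 31 August of the intake year,
    i.e. been born on or before 31 August five years earlier."""
    return date_of_birth <= date(intake_year - 5, 8, 31)

def update_delay_days(reported, entered):
    """Timeliness: days between a change being reported and being entered."""
    return (entered - reported).days

def breaches_sla(reported, entered, sla_days=2):
    """True when the update took longer than the agreed service level."""
    return update_delay_days(reported, entered) > sla_days

# Tina Jones example from the text: reported 1 June 2013, entered 4 June 2013.
delay = update_delay_days(date(2013, 6, 1), date(2013, 6, 4))
print(delay, breaches_sla(date(2013, 6, 1), date(2013, 6, 4)))  # 3 True

# Validity check for a hypothetical 2013 intake.
print(is_valid_applicant(date(2008, 8, 31), 2013))  # True
print(is_valid_applicant(date(2008, 9, 1), 2013))   # False
```

Comparing birth dates against a single cutoff avoids computing ages, which also sidesteps the 29 February birthday edge case.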

28. Discuss THREE (3) benefits of data quality management. (9 marks)


o Increased revenues: When a business is able to make decisions on a foundation of
high quality, validated data, positive top-line outcomes are a likely result. Unreliable
data results in less confident decisions that can often lead to missteps and rework that
don’t deliver increased revenues.
o Reduced costs: If data quality enables an organization to complete a project
correctly the first time around, it also enables the organization to operate more
efficiently and complete more projects. Project delays due to course corrections burn
through budgets and slow business growth.
o Less time spent reconciling data: Manually reconciling data is a time sink that
consumes costly resources, as manual reconciliation does not scale. As data sources and
associated error rates increase, the law of diminishing returns impedes progress, which
implies that rules-based automation can help contain reconciliation time and effort.

29. Describe “dirty data”. Provide TWO (2) examples of dirty data. (4 marks)
In a data warehouse, dirty data is a database record that contains errors. Dirty data can be caused by a number of factors, including duplicate records, noise, incomplete or outdated data (missing values), outliers, and invalid data such as the improper parsing of record fields from disparate systems.

30. Define the following data quality issues and give ONE example for each issue: i) Noise, ii)
Outliers. (6 marks)
• In general, noise is any undesirable or unwanted signal, or part of a signal. Noise may or may not be random.
• An "outlier" is a data point or value that differs considerably from all or most other data in a dataset.
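One common convention for flagging outliers is the interquartile-range (IQR) rule: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are treated as suspect. It is one rule among several, and the exam marks below are invented. A flagged value still needs checking, because an outlier can be a genuine extreme rather than an error.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical exam marks out of 100.
marks = [62, 65, 68, 70, 71, 73, 75, 78, 80, 7, 250]
print(iqr_outliers(marks))  # [7, 250]
```

Here 7 is a possible but unusual mark, while 250 is impossible on a 0-100 scale, so the first would be reviewed and the second corrected.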

31. Discuss THREE (3) causes of poor data quality. Suggest solutions to improve data quality. (9
marks)
- Manual data entry
a. People mistype. They choose the wrong entry from a list. They enter the right
data value into the wrong box.
b. Given complete freedom on a data field, those who enter data have to go from
memory. Is the vendor named Grainger, WW Granger, or W. W. Grainger?

- Information obfuscation (unclear information)


a. If a field is not available, an alternate field is often used. This can lead to such data
quality issues as having Tax ID numbers in the name field or contact information in the
comments field.

- After the Merger


a. Mergers usually happen fast and are unforeseen by IT departments.
b. Mergers can result in a loss of expertise when key people leave midway through
the project to seek new ventures.

Solutions:
- Monitoring: Make public the results of poorly entered data and praise those who enter
data correctly.
- Real-time Validation: In addition to form validation, data quality tools can be implemented to validate addresses, e-mail addresses and other important information as it is entered.
- Communication: Regular communication and a well-documented metadata model will
make the process of change much easier.
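Real-time validation checks each value at the moment it is entered, before it reaches the database. The sketch below uses deliberately simple regular expressions for an e-mail address and a five-digit postcode; both patterns and the form field names are assumptions, and production systems usually rely on stricter validation libraries or address-verification services.

```python
import re

# Deliberately simple patterns (assumptions): they catch obvious typos, not every invalid value.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
POSTCODE_RE = re.compile(r"\d{5}")

def validate_entry(form):
    """Return a list of error messages; an empty list means the entry can be saved."""
    errors = []
    if not EMAIL_RE.fullmatch(form.get("email", "")):
        errors.append("email: not a valid e-mail address")
    if not POSTCODE_RE.fullmatch(form.get("postcode", "")):
        errors.append("postcode: must be exactly 5 digits")
    return errors

print(validate_entry({"email": "aina@example.edu", "postcode": "40450"}))  # []
print(validate_entry({"email": "aina@example", "postcode": "4045"}))
```

Using fullmatch means the whole value must fit the pattern, so trailing characters such as a stray newline are rejected too.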
32. Describe data quality management process. (6 marks)

Data definition: In this step, the data describing the business of the undertaking must be appropriate and complete. The definition of the data involves the identification of data requirements that fulfill this criterion. Data requirements should contain a proper description of the individual items and their relationships.

Data quality assessment: Data quality assessment involves validating the data according
to the three criteria: appropriateness, completeness, and accuracy. The assessment
should consider the channel through which data is collected and elaborated, whether
through internal systems, external third parties, or publicly available electronic sources.

Problem resolution: The problems that are identified during the assessment of the data
quality are addressed in this phase. It is important to document data limitations and
justify the remedies applied to deficient data.

Data quality monitoring: Data quality monitoring involves monitoring the performance
of the associated IT systems, based on data quality performance indicators. Data quality
monitoring involves two dimensions: quantitative and qualitative.
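Monitoring "based on data quality performance indicators" can be as simple as computing an indicator for every data load and comparing it with a target. This sketch covers only the quantitative side, using field completeness (the share of non-empty values) as the indicator; the 95% target and the sample batch are illustrative assumptions.

```python
def completeness(rows, field):
    """Percentage of rows in which `field` is present and non-empty."""
    if not rows:
        return 100.0  # an empty load has nothing missing
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return 100.0 * filled / len(rows)

def monitor(rows, fields, target=95.0):
    """Return {field: (completeness %, meets target?)} for one data load."""
    report = {}
    for field in fields:
        score = completeness(rows, field)
        report[field] = (round(score, 1), score >= target)
    return report

# Hypothetical batch of student contact records.
batch = [
    {"student_id": "2021001", "email": "a@uni.edu", "phone": "0123456789"},
    {"student_id": "2021002", "email": "", "phone": "0198765432"},
    {"student_id": "2021003", "email": "c@uni.edu", "phone": None},
    {"student_id": "2021004", "email": "d@uni.edu", "phone": "0171112222"},
]
result = monitor(batch, ["student_id", "email", "phone"])
print(result)
```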

33. Describe FOUR (4) benefits of implementing data quality program. (8 marks)
o Deliver high-quality data for a range of enterprise initiatives including business
intelligence, applications consolidation and retirement, and master data management
o Reduce time and cost to implement CRM, data warehouse/BI, data governance, and
other strategic IT initiatives and maximize the return on investments
o Construct consolidated customer and household views, enabling more effective cross-
selling, up-selling, and customer retention
o Help improve customer service and identify a company's most profitable customers
34. Discuss THREE (3) measures to protect the quality of data. (6 marks)
• Protect data in transit with IP security (IPsec). Encapsulating Security Payload (ESP) is the protocol IPsec uses to encrypt data for confidentiality. It can operate in tunnel mode, for gateway-to-gateway protection, or in transport mode, for end-to-end protection.
• Use disk encryption. Disk encryption products can be used to encrypt removable USB drives, flash drives, etc. Some allow creation of a master password along with secondary passwords with lower rights that you can give to other users. Examples include PGP Whole Disk Encryption and DriveCrypt, among many others.
• Secure wireless transmissions. You should send or store data only on wireless networks that use encryption, preferably Wi-Fi Protected Access (WPA), which is stronger than Wired Equivalent Privacy (WEP).

35. Discuss THREE (3) examples of the importance of data quality in organizational processes. (9
marks)
• Data Quality Defined. For example, one street address may occur in more than one state and more than one country. Therefore, document an address correctly by referring to the street name and number as well as the state and country.
• Performance. Darryl Enos writes in "Performance Improvement: Making It Happen" that performance is the achievement of tangible, specific, measurable and worthwhile goals. To sustain evidence-based performance, use quality data to measure and report on business processes. Only then can you determine whether you have achieved the objectives of your performance improvement initiatives and whether they are appropriate.
• Data and Performance. Customers, product performance, internal operations and performance, and cost and financial data are each used to formulate performance improvement strategies and gauge their success. Use accurate, consistent and complete data to select appropriate performance improvement initiatives and fairly evaluate the success of each.

36. Describe any THREE (3) reasons for an organization’s data quality problems. (6 marks)
• Manual Data Entry: Data is often fed manually into the system and is hence prone to human error. Since user data is often entered through various user-friendly interfaces, it may not be directly compatible with the internal data representation. In addition, end-users tend to fill in 'shortcut' information in fields that they perceive to be unimportant, but which may be crucial to internal data management. The data operator may not have the expertise to understand this data and might enter values in the wrong fields or mistype the information.
• Real-Time Interfaces: This is in complete opposition to batch feeds. With real-time interfaces and applications becoming the norm for interactive and enhanced user experiences, data enters the database in real time and often propagates quickly along the chain of interconnected databases. This triggers actions and responses that might be visible to the user almost immediately, leaving little room for validation and verification. This creates a huge hole in data quality assurance, where a wrong entry may cause havoc at the back end.
• Data Cleansing: Every company needs to rectify its incorrect data periodically. Manual cleansing has largely been replaced by time- and effort-saving automation. Although this is very helpful, it carries the risk of wrongly affecting thousands of records. The software used for automation may have bugs, or the data specifications that form the basis of cleansing algorithms may be incorrect. This can turn perfectly valid data into invalid data and virtually reverse the very advantage of the cleansing exercise.

37. Discuss THREE (3) benefits to the organization of implementing a data quality assurance program in maintaining data quality. (6 marks)
- Help improve customer service and identify a company’s most profitable customers.
- Reduce time and cost to implement CRM, data warehouse and other strategic IT
initiatives and maximize the return on investments.
- Provide business intelligence on individuals and organizations for research, fraud detection and planning.

38. Explain the purpose of root cause analysis. (2 marks)


Root Cause Analysis helps to identify what, how, and why something happened, thus
preventing recurrence.

39. Explain the main steps in root cause analysis. (6 marks)


• Problem Selection: A business always has problems, so all that is required is to order them on the basis of risk to the organization and deal with the most urgent ones first.
• Problem Statement: Be precise in the selection, keep to a tight definition of the problem and make sure that the problem has a potential solution.
• Implement the Corrective Actions: Implement the corrective and preventive actions, making sure to communicate them to all involved. Clearly communicate the reasons, benefits and required timelines. Don't leave anybody out.

40. Explain the FOUR (4) main steps in data management plan. (8 marks)
http://instr.iastate.libguides.com/dmp/writingDMP
• Data identification/collection
• Data organization
• Data documentation
• Data storage & security
• Data preservation
• Data sharing

41. Differentiate between data profiling and parsing. (4 marks)


• Data profiling: a technology for discovering and investigating data quality issues, such as duplication, lack of consistency, and lack of accuracy and completeness.
• Parsing: breaking a data block into smaller chunks by following a set of rules, so that it can be more easily interpreted and managed.
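The difference is easiest to see side by side: profiling summarizes a whole column to reveal problems, while parsing splits a single value into parts according to rules. Both functions are hypothetical sketches, and the "number, street, postcode, city" address layout is an assumed format.

```python
from collections import Counter

def profile(values):
    """Profile one column: row count, missing values, distinct values and duplicated values."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicated": sorted(v for v, c in counts.items() if c > 1),
    }

def parse_address(raw):
    """Parse 'number, street, postcode, city' into named parts (assumed format)."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 4:
        raise ValueError(f"unexpected address format: {raw!r}")
    number, street, postcode, city = parts
    return {"number": number, "street": street, "postcode": postcode, "city": city}

print(profile(["A", "B", "", "A", None, "C"]))
# {'rows': 6, 'missing': 2, 'distinct': 3, 'duplicated': ['A']}
print(parse_address("12, Jalan Ampang, 50450, Kuala Lumpur"))
```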
42. Identify the FOUR (4) methods in data cleaning. (4 marks)
• Correct
• Filter
• Detect and Report
• Prevent

43. One of the most fundamental challenges in the process of data integration is
heterogeneous data. Discuss strategies to overcome the challenges. (6 marks)
- A detailed analysis of the characteristics and uses of data is necessary to mitigate issues with heterogeneous data. First, a model is chosen (either a federated or a data warehouse environment) that serves the requirements of the business applications and other uses of the data.
- Then the database developer will need to ensure that various applications can use this format or, alternatively, that standard operating procedures are adopted to convert the data to another format.
- Bringing disparate data together in a database system, or migrating and fusing highly incompatible databases, is painstaking work that can sometimes feel like an overwhelming challenge. Thankfully, software technology has advanced to minimize obstacles through a series of data access routines that allow structured query languages to access nearly all DBMS and data file systems, relational or non-relational.
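The second strategy, agreeing on one format and converting every source into it, can be sketched as one small converter per source. The two sources, their field names and their date formats are hypothetical: one exports CSV with day/month/year dates, the other JSON with ISO dates and different field names.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical source A: CSV export with DD/MM/YYYY dates.
csv_source = "StudentNo,FullName,DOB\n2021001,Aina Zulkifli,05/03/2003\n"
# Hypothetical source B: JSON export with ISO dates and different field names.
json_source = '[{"id": "2021002", "name": "Badrul Hisham", "birth_date": "2002-11-20"}]'

def from_csv(text):
    """Convert source A rows to the agreed schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "student_id": row["StudentNo"],
            "name": row["FullName"],
            # Convert DD/MM/YYYY to the agreed ISO format (YYYY-MM-DD).
            "date_of_birth": datetime.strptime(row["DOB"], "%d/%m/%Y").date().isoformat(),
        }

def from_json(text):
    """Convert source B rows to the agreed schema (only field names differ)."""
    for row in json.loads(text):
        yield {"student_id": row["id"], "name": row["name"], "date_of_birth": row["birth_date"]}

unified = list(from_csv(csv_source)) + list(from_json(json_source))
for record in unified:
    print(record)
```

Adding a new source then means writing one more converter rather than changing every application that reads the data.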

44. List TWO (2) most common data integration approaches. (2 marks)
- ETL (Extract, Transform and Load)
- ELT (Extract, Load and Transform)
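The practical difference between ETL and ELT is where the transformation runs. In this sketch an in-memory SQLite database stands in for the target warehouse, and the table names and the cleaning rule (trim and upper-case names) are assumptions: ETL cleans rows in Python before loading them, while ELT loads the raw rows first and transforms them with SQL inside the target.

```python
import sqlite3

# Hypothetical source rows with untidy names.
raw_rows = [("2021001", "  aina binti ali "), ("2021002", "Badrul  ")]

db = sqlite3.connect(":memory:")  # stands in for the target warehouse

# ETL: extract -> transform in the integration code -> load only the cleaned rows.
db.execute("CREATE TABLE etl_students (student_id TEXT, name TEXT)")
cleaned = [(sid, name.strip().upper()) for sid, name in raw_rows]
db.executemany("INSERT INTO etl_students VALUES (?, ?)", cleaned)

# ELT: extract -> load the raw rows as-is -> transform inside the target with SQL.
db.execute("CREATE TABLE raw_students (student_id TEXT, name TEXT)")
db.executemany("INSERT INTO raw_students VALUES (?, ?)", raw_rows)
db.execute(
    "CREATE TABLE elt_students AS "
    "SELECT student_id, UPPER(TRIM(name)) AS name FROM raw_students"
)

etl = db.execute("SELECT * FROM etl_students ORDER BY student_id").fetchall()
elt = db.execute("SELECT * FROM elt_students ORDER BY student_id").fetchall()
print(etl == elt, etl)
```

Both routes produce the same table here; a commonly cited advantage of ELT is that the raw data (raw_students) stays in the target and can be re-transformed later.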

45. Explain THREE (3) challenges in data integration. (5 marks)


- Testing: along with the implementation, proper testing is a must to ensure that the unified data are correct, complete and up to date
- Heterogeneity problems
- Cultural and organizational readiness
- Technological issues
- Missing or incomplete documentation

46. State the main goal of data integration. (2 marks)


- To provide uniform query access to a set of data sources
- To handle heterogeneity, autonomy and semi-structured data

47. Suggest THREE (3) causes for the increase in data integration software. (6 marks)
- Reduce data complexity. Data integration is about managing complexity, streamlining these connections, and making it easy to deliver data to any system.
- To increase the value of data through unified systems. Bringing disparate datasets
together increases the value of the information.
- To make data more available. Centralizing your data makes it easy for anyone at your
company to retrieve, inspect and analyze it.
48. Briefly explain TWO (2) difficulties that are encountered when data is integrated. (4 marks)
- Implementing shared data. Designing a system to manage business data on an integrated basis means configuring your data architecture to handle company-wide access to information, which can be difficult.
- Not understanding how it works. Another common problem is when users don't know what to do with the data.

49. Explain the differences between a data warehouse and virtual data integration system.
Illustrate your answers with diagrams. (8 marks)

• Data warehouse: integrate by bringing the data into a single physical warehouse.
• Virtual data integration: leave the data at the sources and access it at query time.
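Alongside the diagrams the question asks for, the contrast can be shown in a few lines of Python. The two "source systems" are plain dictionaries and all names are hypothetical: the warehouse copies data into one store ahead of time, whereas the virtual approach leaves data in the sources and combines it only when a query is asked.

```python
# Hypothetical source systems, each with its own data.
registry = {"2021001": {"name": "Aina", "faculty": "Computer Science"}}
finance = {"2021001": {"fees_due": 1200}}

# Data warehouse: extract and copy data into one physical store in advance.
warehouse = {}

def refresh_warehouse():
    """Copy and merge source records into the warehouse (a periodic load)."""
    for sid, rec in registry.items():
        warehouse[sid] = {**rec, **finance.get(sid, {})}

# Virtual integration: nothing is copied; a mediator queries the sources at query time.
def virtual_lookup(sid):
    return {**registry.get(sid, {}), **finance.get(sid, {})}

refresh_warehouse()
finance["2021001"]["fees_due"] = 0  # a payment is recorded in the source after the load

print(warehouse["2021001"]["fees_due"])       # 1200 (stale until the next refresh)
print(virtual_lookup("2021001")["fees_due"])  # 0 (reads current source data)
```

The trade-off shows up in the output: the warehouse answers from its own copy (fast, but stale until the next refresh), while the virtual lookup always sees current source data but must reach every source at query time.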

50. Discuss the concepts of federated database. (4 marks)


• A federated database system is a type of meta-database management system (DBMS) that transparently maps multiple autonomous database systems into a single federated database
• Simplest architecture
• Every pair of sources can build its own mapping and transformation
• If source X needs to communicate with source Y, build a mapping between X and Y
o This does not have to be done between all sources (on demand)
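The pairwise, on-demand mapping idea above can be sketched as a registry that gains a mapping only when two sources actually need to exchange data. The source names, field names and mapping format are hypothetical.

```python
# Mappings keyed by (from_source, to_source); built only on demand, not for every pair.
mappings = {}

def register_mapping(src, dst, field_map):
    """Record how field names in `src` translate to field names in `dst`."""
    mappings[(src, dst)] = field_map

def translate(record, src, dst):
    """Translate a record from src's schema to dst's schema using the pairwise mapping."""
    if (src, dst) not in mappings:
        raise KeyError(f"no mapping from {src} to {dst}; build one on demand")
    field_map = mappings[(src, dst)]
    return {field_map[k]: v for k, v in record.items() if k in field_map}

# Source X (registry) needs to talk to source Y (library), so only that mapping is built.
register_mapping("registry", "library", {"StudentNo": "member_id", "FullName": "member_name"})
print(translate({"StudentNo": "2021001", "FullName": "Aina", "CGPA": 3.5}, "registry", "library"))
# {'member_id': '2021001', 'member_name': 'Aina'}
```

Building every possible mapping up front would need n(n - 1) directed mappings for n sources, which is why building them only on demand matters.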