Database Development
Name
Tutor
Course
Date
Running head: DATABASE DEVELOPMENT 2
1. Three tasks to be performed to improve the quality of datasets using the Software Development Life Cycle (SDLC) methodology, with a description of the activity at each stage.
It is cheaper to correct database-related issues discovered at an earlier stage than those that surface in the final phase of development. Data quality checks should therefore be performed in each phase of the SDLC in order to produce an error-free product, as discussed below:
Development Phase
This stage covers the actual coding and engineering of the system, with the aim of meeting the specified system requirements. Bassil (2012) explains that, for the sake of quality, data assessment should be an iterative process so that the end product is as free of errors as possible. Software and hardware specifications are reviewed together with the system architecture. Both newly created and existing data should be monitored closely. In addition, there should be established error-detection processes supported by tools such as CA Veracode Greenlight and CA Service Virtualization.
Testing Phase
This is where the code produced during the development phase is tested. To refine dataset quality, both process control and process improvement need to be considered. Static, dynamic, and manual analysis should all be carried out in this phase, together with a comprehensive array of functional, integration, performance, and unit testing.
Maintenance Phase
Here, system, application, and administrative changes are made to the application being developed. Appropriate continuous-monitoring metrics are needed to check data quality, so that speedy action can be taken when necessary. In this way, errors are corrected easily and data quality is maintained.
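One possible continuous-monitoring metric is a rolling error rate over recent batches, with an alert when it crosses a threshold. The Python sketch below is illustrative only: the class name, the window of recent batches, and the 2% threshold are hypothetical choices, not values taken from the text.

```python
from collections import deque


class QualityMonitor:
    """Tracks the error rate over the last `window` batches (hypothetical metric and threshold)."""

    def __init__(self, window: int = 5, max_error_rate: float = 0.02):
        self.batches = deque(maxlen=window)  # (total_records, failed_records) per batch
        self.max_error_rate = max_error_rate

    def record_batch(self, total: int, failed: int) -> tuple[float, bool]:
        """Add one batch; return the rolling error rate and whether it breaches the threshold."""
        self.batches.append((total, failed))
        total_sum = sum(t for t, _ in self.batches)
        failed_sum = sum(f for _, f in self.batches)
        rate = failed_sum / total_sum if total_sum else 0.0
        return rate, rate > self.max_error_rate


if __name__ == "__main__":
    monitor = QualityMonitor(window=3, max_error_rate=0.02)
    for total, failed in [(1000, 5), (1000, 10), (1000, 60)]:
        rate, alert = monitor.record_batch(total, failed)
        print(f"rolling error rate {rate:.4f}, alert={alert}")
```

A rolling window smooths out single noisy batches while still reacting quickly to a sustained rise in errors, which is what allows the "speedy action" the text calls for.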
2. Actions to be performed with the aim of optimizing the selection of records and entire
Automated controls are well suited to the design stage of the SDLC, where processing, input, and output controls are employed to ensure the security, reliability, and integrity of the system and its datasets (Chikkerur et al., 2011). For instance, input controls such as duplication checks and completeness checks prevent duplicate information and blank fields. Automated process controls, on the other hand, monitor the correctness of the system's processing and of information recording. Error detection, process design, and process control are some of the quality management techniques that can be applied.
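The two input controls named above, completeness checks and duplication checks, can be sketched as a single validation pass. This is a minimal Python sketch under stated assumptions: the field names in `REQUIRED_FIELDS` and the choice of `customer_id` as the duplicate key are hypothetical.

```python
REQUIRED_FIELDS = ("customer_id", "name", "email")  # hypothetical schema


def is_blank(value) -> bool:
    """Treat None and whitespace-only strings as blank fields."""
    return value is None or (isinstance(value, str) and not value.strip())


def apply_input_controls(records, key="customer_id", required=REQUIRED_FIELDS):
    """Split records into accepted and rejected lists, with a reason for each rejection."""
    accepted, rejected = [], []
    seen_keys = set()
    for record in records:
        missing = [field for field in required if is_blank(record.get(field))]
        if missing:                                   # completeness check
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif record[key] in seen_keys:                # duplication check
            rejected.append((record, f"duplicate {key}: {record[key]}"))
        else:
            seen_keys.add(record[key])
            accepted.append(record)
    return accepted, rejected
```

Rejecting records at input, with a stated reason, is cheaper than finding duplicates or blanks later, in line with the cost argument at the start of this section.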
3. Three maintenance plans together with three activities to be performed with the aim of improving data quality.
Three maintenance plans. A corrective plan, a preventive plan, and a maintenance plan are vital for improving data quality. The corrective plan is carried out after a defect has been observed, whereas the preventive plan is a precaution put in place to avoid errors that may emerge. The maintenance plan, on the other hand, involves routine daily servicing of the entire system.
Three activities that can be performed while improving data quality. Error Detection and Correction. Here, missing values are checked, the available data is compared with a correct baseline, and the timestamp associated with the current data is examined. The complexity of the data, such as its processing stages, inputs, and outputs, is considered while
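The three checks listed above (missing values, comparison against a baseline, and timestamp examination) can be combined in one pass. The Python sketch below assumes a hypothetical record layout with `value` and `updated_at` fields, a numeric baseline, and a maximum acceptable age; none of these names come from the text.

```python
from datetime import datetime, timedelta, timezone


def detect_errors(current, baseline, max_age, now, tolerance=0.0):
    """Return (key, problem) pairs for missing values, baseline mismatches and stale timestamps."""
    issues = []
    for key, record in current.items():
        value = record.get("value")
        if value is None:
            issues.append((key, "missing value"))
        elif key in baseline and abs(value - baseline[key]) > tolerance:
            issues.append((key, f"differs from baseline ({value} vs {baseline[key]})"))
        if now - record["updated_at"] > max_age:
            issues.append((key, "stale timestamp"))
    return issues


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    current = {
        "A": {"value": 5, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
        "B": {"value": None, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
        "C": {"value": 7, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    }
    for issue in detect_errors(current, {"A": 5, "C": 9}, timedelta(days=2), now):
        print(issue)
```

The detected issues are the input to the correction step; the `tolerance` parameter is a design choice for numeric data, where exact equality with the baseline is often too strict.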
Process control and improvement. Total Data Quality Management (TDQM) is a methodology that defines the quality requirements of data and results in analyzed and improved data. Methodologies that support TDQM include quality dimension visualization,
Process design. Here, new data processes are built and existing ones are redesigned to eliminate or reduce data errors. Therefore, the quality of
4. (i) The most efficient method for planning proactive concurrency control together with
environment.
results in inconsistencies. Granular locking schemes lock rows, pages, cells, or even whole tables. High and low granularity are two approaches that allow distributed databases to maintain consistency. High (fine) granularity attains maximum concurrency but requires additional overhead, whereas low (coarse) granularity reduces concurrency but requires minimal overhead. Proactive concurrency control can therefore be attained by locking at different levels of the object-oriented hierarchy, at the cost of some extra overhead.
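Locking at several levels of a hierarchy is commonly implemented with intention locks (IS, IX, SIX) alongside shared (S) and exclusive (X) locks. The Python sketch below encodes the standard multiple-granularity compatibility matrix and the rule for which intention lock each ancestor needs; the database/table/page/row path is a hypothetical example, and a real lock manager would also queue waiting requests and detect deadlocks.

```python
# Compatibility matrix for multiple-granularity locking.
# COMPATIBLE[requested] is the set of modes other transactions may already hold on the same node.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}


def can_grant(requested: str, held_by_others: list[str]) -> bool:
    """True if `requested` is compatible with every mode other transactions hold on the node."""
    return all(mode in COMPATIBLE[requested] for mode in held_by_others)


def intention_mode(mode: str) -> str:
    """Reading (S/IS) needs IS on every ancestor; writing (X/IX/SIX) needs IX."""
    return "IS" if mode in ("S", "IS") else "IX"


def lock_plan(path: list[str], mode: str) -> list[tuple[str, str]]:
    """Locks to request, root first, in order to lock the last node of `path` in `mode`."""
    return [(node, intention_mode(mode)) for node in path[:-1]] + [(path[-1], mode)]


if __name__ == "__main__":
    # Two writers updating different rows both take IX on the table, which is compatible,
    # so they proceed concurrently; a reader wanting S on the whole table must wait.
    print(lock_plan(["db", "orders", "page7", "row42"], "X"))
    print(can_grant("IX", ["IX"]), can_grant("S", ["IX"]))
```

This shows the trade-off described above: fine-grained row locks keep concurrency high, while the extra intention locks on each ancestor are the added overhead.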
(ii) How to avoid record-level locking of a database that is in use by current transactions while employing the verify method to plan the system more effectively.
Serializability, a transaction isolation model, makes concurrently executing transactions appear to have run one after another in some serial order. Multiple users are each provided with a separate view of the data, so that record-level locking does not interfere with access to the database, with the help of
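Giving each user a separate view of the data is usually implemented with multi-version concurrency control (MVCC): writers add new versions and readers see the versions committed as of their snapshot, so reads need no record-level locks. The Python sketch below is a single-threaded illustration with hypothetical names, not the method of any particular database. Note that snapshot reads on their own give snapshot isolation, which is weaker than full serializability (it permits anomalies such as write skew), so serializable systems add further checks on top.

```python
import itertools


class MVCCStore:
    """Single-threaded sketch of multi-version storage with snapshot reads (illustrative only)."""

    def __init__(self):
        self._versions = {}              # key -> list of (commit_ts, value), oldest first
        self._clock = itertools.count(1)
        self._last_committed = 0

    def begin_snapshot(self) -> int:
        """A reader's snapshot is simply the latest commit timestamp at the time it starts."""
        return self._last_committed

    def commit(self, writes: dict) -> int:
        """Install all writes as new versions under one commit timestamp."""
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        self._last_committed = ts
        return ts

    def read(self, key, snapshot_ts: int):
        """Return the newest version committed at or before the snapshot, without locking."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


if __name__ == "__main__":
    store = MVCCStore()
    store.commit({"acct:1": 100})
    snapshot = store.begin_snapshot()    # a long-running report starts here
    store.commit({"acct:1": 80})         # a concurrent update commits afterwards
    print(store.read("acct:1", snapshot), store.read("acct:1", store.begin_snapshot()))
```

The reader keeps seeing the balance as of its snapshot while the writer proceeds, which is the "separate view" the text describes.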
Discussion 1
Challenges that come with big data. Big data generally refers to amounts of structured or unstructured data so large that processing them with traditional software techniques and databases is difficult. Currently available processing capacity struggles to keep up with its volume and speed.
Big data brings challenges that were not an issue for traditionally designed databases such as relational ones (Özsu and Valduriez, 2011). The first is that big data runs on clusters of servers, each of which stores a slice of the data. Applications communicate with many nodes across these clusters, which makes big data hard to protect: one has to secure the whole data center rather than a single server.
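To make "each server stores a slice of the data" concrete, the Python sketch below places keys on nodes by hashing. It uses plain modulo placement for brevity, which is an assumption; many real systems use consistent hashing or range partitioning instead, and the node and key names are hypothetical.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node that stores `key` by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would hold them, i.e. each node's slice of the data."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    slices = distribute([f"user:{i}" for i in range(1000)], nodes)
    print({node: len(keys) for node, keys in slices.items()})
```

Because every node ends up holding part of the data, protecting any single server leaves the remaining slices exposed, which is the security problem the paragraph above describes.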
The other challenge with big data is that it lacks a standard cluster. Tuple stores, wide-column stores, and graph data stores are just a few of the more than one hundred and fifty big data variants available, each with its own specialization. Many of these variants allow components to be swapped: the resource manager, data model, data access layer, and orchestration tools, among others, are interchangeable. When these platforms were built, security was not a consideration; performance and scalability were the building blocks. This leads to limited capabilities
Regarding compatibility between big data and existing traditional tools, many traditional tools do not fit or work well with big data technologies. The velocity of data, multi-node design, sheer scale, and variety that come with big data outpace the capabilities of traditional products. Some forms of security, such as masking, row-level encryption, and packet analysis, are also hard to scale. However, some forms of security like query monitoring
How NoSQL addresses these challenges. The term NoSQL, usually taken to mean “Not Only SQL”, distinguishes these platforms from relational databases. The best-known way in which NoSQL approaches big data security issues is a model known as the “walled garden security model”. In this approach, the entire cluster is placed on a separate network, and logical access to it is controlled through access controls and firewalls. In other words, there is no security inside the NoSQL platform itself, only an outer protective shell of applications and network around the database (Hecht and Jablonski, 2011). It is a cost-effective and simple approach, but only for organizations that are not so much
Another way NoSQL addresses big data challenges is through third-party products or by leveraging security tools built into the NoSQL cluster. These tools include Kerberos for node authentication, SSL or TLS for securing communication, and transparent encryption for data-at-rest security, among others. Their main setback is that they do not control rogue administrators, despite being most
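Of the tools listed above, TLS for client-to-cluster communication is the easiest to illustrate. The Python sketch below builds a client-side context with the standard `ssl` module; `ca_file` is a hypothetical path to the cluster's CA bundle, and how a particular NoSQL driver accepts such a context varies by driver and is deliberately not shown. Kerberos authentication and transparent encryption are outside its scope.

```python
import ssl
from typing import Optional


def cluster_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings for connections to cluster nodes (illustrative sketch)."""
    # create_default_context() enables certificate verification and hostname checking;
    # ca_file is a hypothetical CA bundle path (None means the system's default CAs).
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Verifying certificates and hostnames is what protects node-to-client traffic from interception; encryption without verification would still allow a man-in-the-middle.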
Data-centric security is another NoSQL security model, known for protecting data even before it moves into a larger data repository. This is done with basic tools such as masking, tokenization, and data element encryption. The data-centric security model is employed when the system tasked with processing the data cannot be fully trusted. This reflects the fact that many enterprises do not trust big data clusters to hold their information. The controls are defined on data before any effort of
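Masking and tokenization, two of the data-centric tools named above, can be sketched in a few lines. In this Python sketch the in-memory `TokenVault` is a hypothetical stand-in: in practice the vault would sit outside the untrusted cluster, so that only masked values and tokens ever reach it.

```python
import secrets


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters; values that short are hidden entirely."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


class TokenVault:
    """In-memory stand-in for a token vault (hypothetical; a real vault lives outside the cluster)."""

    def __init__(self):
        self._value_by_token = {}
        self._token_by_value = {}

    def tokenize(self, value: str) -> str:
        if value in self._token_by_value:       # same input -> same token, so joins still work
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)    # random, so it reveals nothing about the value
        while token in self._value_by_token:     # guard against an (unlikely) collision
            token = "tok_" + secrets.token_hex(8)
        self._value_by_token[token] = value
        self._token_by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


if __name__ == "__main__":
    vault = TokenVault()
    record = {"card": mask("4111111111111111"), "ssn": vault.tokenize("123-45-6789")}
    print(record)  # what would be sent to the untrusted cluster
```

Masking is irreversible and suits display or analytics, while tokenization is reversible only through the vault, which is why the vault must be kept on the trusted side.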
NoSQL data models. Denormalization is one of the NoSQL data models; it entails copying the same data into multiple tables or documents in order to simplify querying or to fit a user's records into a particular data model. Its advantage is that the data needed to process a query is grouped in one place, which simplifies query processing. Unlike traditional databases, where modeling-time normalization leads to query-time joins that add complexity on the side of the query processor, denormalization stores data in structures that
database to fit a certain application. Online Analytical Processing (OLAP) applications such as financial reporting, business and sales reporting, and budgeting benefit most from denormalization, because they extract data that has been kept over long periods. Here, denormalization helps by avoiding joins in the database, reducing the number of tables, reducing
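The difference between joining at query time and copying data at write time can be shown with a small reporting query. The Python sketch below uses hypothetical customer and order data: the normalized version looks up each order's customer during the query, while the denormalized documents already carry the customer's attributes.

```python
# Hypothetical normalized data: customers and orders are kept in separate "tables".
CUSTOMERS = {1: {"name": "Acme Ltd", "region": "EU"}, 2: {"name": "Birch Inc", "region": "US"}}
ORDERS = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 2, "amount": 75.5},
    {"order_id": 12, "customer_id": 1, "amount": 100.0},
]


def sales_by_region_normalized(orders, customers):
    totals = {}
    for order in orders:
        region = customers[order["customer_id"]]["region"]   # join performed at query time
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


def denormalize(orders, customers):
    """Copy customer attributes into every order document at write time."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


def sales_by_region_denormalized(order_docs):
    totals = {}
    for doc in order_docs:
        totals[doc["region"]] = totals.get(doc["region"], 0.0) + doc["amount"]  # no join needed
    return totals


if __name__ == "__main__":
    docs = denormalize(ORDERS, CUSTOMERS)
    print(sales_by_region_normalized(ORDERS, CUSTOMERS))
    print(sales_by_region_denormalized(docs))
```

The trade-off is on the write side: if a customer's region changes, every copied order document has to be updated, which is acceptable for OLAP workloads that mostly read historical data.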
Discussion 2
These are software applications tasked with retrieving, transforming, reporting, and analyzing data from internal or external systems (Turban et al., 2010). Several tools designed for this purpose can also be used to report business performance, such as
Actuate Business Intelligence and Reporting Tools (BIRT). Its advantages are that it is open source, written purely in Java, and able to publish reports from multiple data sources such as XML, relational databases, and in-memory Java objects. It also includes a Java runtime component. It has features like a single view of all data, user-friendliness, and analytical techniques that are best
It is popularly known as an enterprise-level application, usually for server systems and open clients. It is currently ranked among the best by organizations because of its portability and the quality of the services it provides. Its features include a simple data warehouse architecture, flexibility, applications compatible with any system, ease of use owing to its modular concept, support for both cloud and on-premise deployment, and, best of all, easy integration with both SAP and non-SAP applications. It also has special add-ins that play a vital role in business performance reporting, such as Excel add-ins, and works with other BI platforms such as arcplan, Cognos, and QlikView, among others.
company has to consider before purchasing any business intelligence product. For instance, a company that intends to buy either of the business intelligence tools discussed above has to consider aspects such as functionality, integration capabilities, and the benefits the product will bring to the company. BIRT, for example, is estimated to be one of the most expensive business intelligence tools currently available, costing a company around $20,000 a year to obtain the full services that come with it. SAP, on the other hand, is much cheaper than BIRT, with an estimated cost of $3,213 a
In terms of functionality, BIRT is more complex given its price, and the cost of training users is incorporated into the software's pricing. It has to be configured to fit the buyer's business before it is released to the buyer. Integrating BIRT is also complex, because it runs only on the Java platform, which makes it expensive for the buyer. SAP, on the other hand, can be integrated easily into multiple environments and works in any browser. This feature lowers its cost. SAP's functionality is commendable owing to its ease of use and portability, which is why the vendors did not include the cost of training and integration in the product. Therefore, most enterprises nowadays are choosing SAP over BIRT because of the benefits it offers. An organization does not require special servers and technicians to maintain and integrate SAP, compared to the requirements that
References
Bassil, Y. (2012). A simulation model for the waterfall software development life cycle. arXiv
preprint arXiv:1205.6904.
Chikkerur, S., Sundaram, V., Reisslein, M., & Karam, L. J. (2011). Objective video quality
Cowling, J. A., & Liskov, B. (2012, June). Granola: Low-Overhead Distributed Transaction
Hecht, R., & Jablonski, S. (2011, December). NoSQL evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on (pp. 336-341). IEEE.
Turban, E., Sharda, R., & Delen, D. (2010). Decision Support and Business Intelligence Systems.