Unit – 4

Q. 1 What is the Role of CS in Data Science?


1. CS set the stage for data science by providing the programming languages
needed to process big data.
2. Computer science provides data structures, which organize big data so that
its elements are easily retrievable.
3. Computer science provides tools, such as databases, to store and retrieve
data efficiently.
4. Computer science helps in interpreting algorithm and tool behavior for
different business use cases.
5. CS gives the ability to design, scale, and optimize technical solutions.
6. Data extraction in data science relies heavily on SQL.
7. CS helps to document and report the analysis of large datasets.
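
Points 2, 3 and 6 above can be illustrated with a minimal sketch, assuming Python and its standard-library sqlite3 module. The table name `sales` and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample data: (region, amount) pairs.
rows = [("north", 120.0), ("south", 80.0), ("north", 50.0)]

# A database stores the data, and SQL extracts only what is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# A data structure (here a dict) makes each element retrievable by key.
totals_by_region = dict(totals)
print(totals_by_region["north"])  # 170.0
```

The database handles storage and extraction, while the dictionary gives direct lookup of a result by its key.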

Q. 2 What are the four major components of Data Science?


The four components of Data Science are:
1. Data Strategy
Developing a data strategy means determining what data you are going to
gather and why.

2. Data Engineering
Data Engineering is about the technology and systems that are leveraged to
access, organize and use the data. It primarily involves the creation of
software solutions for data problems.

3. Data Analysis and Models
The data analysis and mathematical modeling aspect of data science is
anything that involves the combination of:
• Computing
• Math and/or Statistics
• A domain
• The application of the scientific method, or aspects of it

4. Data Visualization and Operationalization
Visualization is not just about taking the data analysis and presenting it
“correctly”. Sometimes it involves going back to the raw data and
understanding what needs to be visualized, based on the needs and goals of
both the user and the operations.
Operationalizing is about doing something with the data: someone
(or occasionally a machine) has to make a decision and/or take an action
based on the math and computing that has happened.

Q. 3 What is NoSQL?

1. A NoSQL database is a non-relational data management system that does not
require a fixed schema.
2. It avoids joins and is easy to scale.
3. The major purpose of a NoSQL database is to serve as a distributed data
store for very large data storage needs.
4. NoSQL is used for big data and real-time web apps.
5. NoSQL stands for “Not Only SQL” or “Not SQL.”
6. It can store structured, semi-structured, unstructured and polymorphic data.
7. Features of NoSQL
• Non-relational : NoSQL databases do not follow the relational model.
• Schema-free : NoSQL databases are either schema-free or have relaxed
schemas.
• Simple API : They offer easy-to-use interfaces for storing and querying
data.
• Distributed : Multiple NoSQL databases can be run in a distributed
fashion.

8. Types of NoSQL Databases
• Key-value pair based
• Column-oriented
• Graph-based
• Document-oriented
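
The four types can be pictured with plain Python structures. This is only a rough sketch of the data shapes, not how any real NoSQL product stores data; all names and values are hypothetical, and the column-oriented example in particular is a simplification.

```python
# Key-value pair: an opaque value looked up by a unique key.
key_value = {"session:42": "logged_in"}

# Document-oriented: each record is a self-describing, possibly nested
# document; documents in one collection may have different fields.
documents = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "tags": ["admin"], "address": {"city": "Pune"}},
]

# Column-oriented (simplified): values are grouped by column, not by row.
columns = {"name": ["Asha", "Ravi"], "age": [31, 28]}

# Graph-based: nodes plus the edges (relationships) between them,
# written here as an adjacency list.
graph = {"Asha": ["Ravi"], "Ravi": ["Asha", "Meera"], "Meera": []}


def neighbours(node):
    """Return the nodes directly connected to `node`."""
    return graph.get(node, [])


average_age = sum(columns["age"]) / len(columns["age"])

print(key_value["session:42"])          # logged_in
print(documents[1]["address"]["city"])  # Pune
print(average_age)                      # 29.5
print(neighbours("Ravi"))               # ['Asha', 'Meera']
```

Each shape favors a different access pattern: lookup by key, retrieval of a whole record, aggregation over one column, or traversal of relationships.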

Q. 4 What is the difference between SQL and NoSQL?

SQL                                            | NoSQL
-----------------------------------------------+-----------------------------------------------
Relational Database Management System (RDBMS)  | Non-relational or distributed database system
Fixed, static, or predefined schema            | Dynamic schema
Not suited for hierarchical data storage       | Best suited for hierarchical data storage
Best suited for complex queries                | Not so good for complex queries
Vertically scalable                            | Horizontally scalable
Follows ACID properties                        | Follows CAP (consistency, availability,
                                               | partition tolerance)
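
The schema row of the comparison can be sketched as follows, assuming Python's standard-library sqlite3 module stands in for an SQL database and a plain list of dictionaries stands in for a NoSQL document collection. The `users` table, the `nickname` field and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")

# Fixed schema: inserting into a column that was never declared fails.
try:
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (2, 'Ravi', 'Rav')"
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

sql_row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()

# Dynamic schema: a document may carry a field that earlier ones lack.
collection = []
collection.append({"id": 1, "name": "Asha"})
collection.append({"id": 2, "name": "Ravi", "nickname": "Rav"})

print(schema_rejected)   # True
print(sql_row_count)     # 1
print(len(collection))   # 2
```

In the SQL case the table definition would have to be altered before the new field could be stored, while the document collection accepts it as-is.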

Q. 5 What are Data Warehousing techniques, and what are their advantages?


Data warehousing can be defined as the process of data collection and storage
from various sources and managing it to provide valuable business insights. It
can also be referred to as electronic storage, where businesses store a large
amount of data and information. It is a critical component of a business
intelligence system that involves techniques for data analysis.

Steps in Data Warehousing


The following steps are involved in the process of data warehousing :
• Extraction of data – A large amount of data is gathered from various
sources.

• Cleaning of data – Once the data is compiled, it goes through a cleaning
process. The data is scanned for errors, and any error found is either
corrected or excluded.

• Conversion of data – After being cleaned, the data is converted from the
source database format to a warehouse format.

• Storing in a warehouse – Once converted to the warehouse format, the
data is stored in the warehouse, where it goes through processes such as
consolidation and summarization that make it easier and more consistent
to use. As sources get updated over time, more data is added to the
warehouse.
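
The four steps above can be sketched as a small pipeline. This is a minimal illustration, assuming Python with the standard-library sqlite3 module as a stand-in for the warehouse; the function names, the `sales_fact` table and the sample records are all hypothetical.

```python
import sqlite3

# Extraction: hypothetical raw records gathered from a source system.
raw_records = [
    {"customer": " Asha ", "amount": "120.50", "date": "2024-01-05"},
    {"customer": "Ravi", "amount": "n/a", "date": "2024-01-06"},
    {"customer": "asha", "amount": "30", "date": "2024-02-01"},
]


def clean(records):
    """Cleaning: correct fixable errors, exclude records that cannot be fixed."""
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # exclude records with an unparseable amount
        name = rec["customer"].strip().title()  # correct stray spaces and case
        cleaned.append({"customer": name, "amount": amount, "date": rec["date"]})
    return cleaned


def convert(records):
    """Conversion: reshape records into (customer, month, amount) rows."""
    return [(r["customer"], r["date"][:7], r["amount"]) for r in records]


def store(conn, rows):
    """Storing: load the rows, then summarize them per customer."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer TEXT, month TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales_fact "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
summary = store(conn, convert(clean(raw_records)))
conn.close()
print(summary)  # [('Asha', 150.5)]
```

The record with the amount "n/a" is excluded during cleaning, and the two spellings of the same customer are consolidated into one summarized row.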

Key benefits of data warehousing :


1. Saves Time : You won’t have to rely on the 24/7 availability of a
technical expert to troubleshoot problems associated with retrieving
information.

2. Improves Data Quality : You can ensure the reliability and quality of
your corporate data.

3. Improves Business Intelligence : You can use a data warehouse to gather,
assimilate, and derive data from any source and set up a process to
leverage business analytics.

4. Leads to Data Consistency : Another important benefit of using central
data stores is the uniformity of big data, which improves the quality
and consistency of the data.

5. Stores Historical Data : As a data warehouse allows you to store large
volumes of historical data from databases, you can easily investigate
different time periods and trends that can be ground-breaking for
your company.

6. Increases Data Security : Using a warehousing solution, you can keep
all your data sources consolidated and protected, which can
significantly reduce the threat of a data breach.
