
Very important questions: these can be asked as compulsory questions.
Learn them thoroughly, along with all the answers given.
Q) Compare storage types. (3 points)

Q) Explain containerization. (3 mk) Very imp


Containerization is a method of virtualizing an operating
system so that multiple isolated applications can run on a
single host operating system.
Containerization is a method of virtualization that packages
applications and their dependencies into isolated, self-
contained containers, allowing them to run securely and
independently from each other on the same host. It provides
an efficient way to
deploy, manage, and scale applications across different
platforms.
For example, Docker is a popular containerization platform
that allows software developers to package their applications
into standardized, isolated containers. Docker makes it easier
for applications to run on any system, regardless of its
underlying infrastructure. The most significant benefit of
containerization is increased efficiency. Containers allow
applications to run in isolated, secure environments,
improving resource utilization and allowing for more
flexibility in deployment. Additionally, containers make
deploying, scaling, and managing applications easier,
resulting in improved operational agility. The steps are:
1) Package the application and dependencies into a standard
file format, such as a Docker image.
2) Deploy the packaged application and its dependencies into a
container.
3) Execute the containerized application in the container
runtime environment.
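The three steps above can be sketched as Docker CLI commands. This is only a minimal illustration: the image tag, build context, and container name are hypothetical, and the function only builds the command lines (it does not run Docker), so it works without Docker installed.

```python
import shlex
from typing import List


def containerization_commands(
    image_tag: str, build_context: str, container_name: str
) -> List[List[str]]:
    """Return one Docker CLI command per step: package, deploy, execute."""
    return [
        # Step 1: package the application and its dependencies into an image.
        ["docker", "build", "-t", image_tag, build_context],
        # Step 2: deploy the image into a (not yet running) container.
        ["docker", "create", "--name", container_name, image_tag],
        # Step 3: execute the container in the container runtime.
        ["docker", "start", "--attach", container_name],
    ]


# Hypothetical names, used for illustration only.
commands = containerization_commands("myapp:1.0", ".", "myapp-container")
for step, command in enumerate(commands, start=1):
    print(f"Step {step}: {shlex.join(command)}")
```

To actually run a step, each command list could be passed to `subprocess.run(command, check=True)` on a machine where Docker is installed.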
Q) List container registry types?
Ans: Public and private registries.
Public registries:
1) Public container registries are generally the faster and easier
route when starting out with a container registry.
2) Public registries are also seen as easier to use.
3) They may also be less secure than private registries.
4) They suit smaller teams and work well for standard and
open-source images pulled from public registries.

Private registries:
1) A private container registry is set up by the organization
using it.
2) Private registries are either hosted or on premises and are
popular with larger organizations or enterprises that are more
committed to using a container registry.
3) Having complete control over the registry in development
allows an organization more freedom in how it chooses to manage it.
4) Private registries are seen as the more secure option.
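As a rough illustration of how an image is tied to a public or private registry, the sketch below guesses the registry from an image name. It assumes Docker's usual naming convention: a name without a registry host is pulled from the public Docker Hub (`docker.io`), while a first component that looks like a host points to another registry, often a private one. The host `registry.example.com` is hypothetical.

```python
DEFAULT_REGISTRY = "docker.io"  # Docker Hub, the default public registry


def registry_of(image_ref: str) -> str:
    """Return the registry host that an image reference points to."""
    first, separator, _rest = image_ref.partition("/")
    # The first component counts as a host only if it contains a dot,
    # contains a port separator, or is exactly "localhost".
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if separator and looks_like_host:
        return first
    return DEFAULT_REGISTRY


examples = [
    "nginx",                              # public image on Docker Hub
    "myteam/app:1.0",                     # Docker Hub, under a user namespace
    "registry.example.com/team/app:1.0",  # hypothetical private registry
    "localhost:5000/app",                 # registry running on this machine
]
for image in examples:
    print(image, "->", registry_of(image))
```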

Q) Explain AWS SageMaker. (4 mk)

What is SageMaker?
SageMaker provides every developer and data scientist with
the ability to build, train, and deploy machine learning
models quickly. Amazon SageMaker is a fully-managed
service that covers the entire machine learning workflow to
label and prepare your data, choose an algorithm, train the
model, tune and optimize it for deployment, make
predictions, and take action. Your models get to production
faster with much less effort and lower cost.
Amazon SageMaker is a fully-managed service that enables
data scientists and developers to quickly and easily build,
train, and deploy machine learning models at any scale.
Amazon SageMaker includes modules that can be used
together or independently to build, train, and deploy your
machine-learning models.

Q) Describe Docker. Very imp

1) Docker is a containerization platform that is used to
package your application and all its dependencies together
in the form of containers, so that your application works
seamlessly in any environment, whether development, test,
or production.
2) Docker is a tool designed to make it easier to
create, deploy, and run applications by using containers.
3) Docker is the world's leading software container
platform. It was launched in 2013 by a company called
dotCloud, Inc., which was later renamed Docker, Inc.
It is written in the Go language.

Q) What is DevOps? (2 mk, draw diagram)


And write the content in red
The term 'DevOps', as its name implies, is a
combination of two different words,
'Development' and 'Operations'.
The reason behind this name is that it combines
these two areas of an organization.
In conventional organizations these two areas
were considered separate; DevOps believes in
bridging this gap by integrating their
operations with one another at each step.
By combining them, DevOps helps achieve
agility, quality, and consistency at the same
time.

Q) Difference between hybrid and multi-cloud? (2 mk)

• Hybrid clouds always include a private cloud
and are typically managed as one entity.
• Multi-clouds always include more than one
public cloud service, which often perform
different functions. Multi-clouds do not have to
include a private cloud component, but they can,
in which case they can be both multi-cloud and
hybrid cloud.
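The two definitions can be checked with a small sketch. It assumes that a hybrid cloud means a private cloud combined with at least one public cloud (the notes only state that a private cloud is always included), and the provider names are hypothetical.

```python
from typing import List, Tuple


def classify_deployment(clouds: List[Tuple[str, str]]) -> List[str]:
    """Label a deployment given (provider_name, kind) pairs.

    kind is either "public" or "private".
    """
    public_count = sum(1 for _name, kind in clouds if kind == "public")
    has_private = any(kind == "private" for _name, kind in clouds)

    labels = []
    # Assumption: hybrid = a private cloud plus at least one public cloud.
    if has_private and public_count >= 1:
        labels.append("hybrid")
    # Multi-cloud = more than one public cloud service.
    if public_count > 1:
        labels.append("multi-cloud")
    return labels


# Hypothetical deployments, for illustration only.
print(classify_deployment([("on-prem", "private"), ("cloud-a", "public")]))
print(classify_deployment([("cloud-a", "public"), ("cloud-b", "public")]))
print(classify_deployment(
    [("on-prem", "private"), ("cloud-a", "public"), ("cloud-b", "public")]
))
```

The last deployment gets both labels, matching the case in which a multi-cloud setup also includes a private cloud component.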
Q) Differences between batch and stream processing
in storage types? (3 mk, 3 points)
Batch and Stream processing techniques are
different in the following ways −

| Key Factor | Batch Processing | Stream Processing |
|---|---|---|
| Infrastructure complexity | Less complex, as it does not need constant data entry or unique hardware support. | More complex than batch processing. |
| Data size | Works best for large data chunks. | Handles very small data chunks. |
| Occurrence of processing | Data processing takes place on data that has been stored over some time. | Data processing takes place immediately. |
| Knowledge of data size before processing | The data size is known or can be anticipated in advance. | The data size is neither known in advance nor can be anticipated. |
| Time required for data processing | Long, typically minutes or hours, or even days, depending upon the batch size. | Short, typically seconds or milliseconds. |
| Provision of response | On completing the batch processing operation. | Almost immediately. |
| Storage space requirement | Large storage space is required for this processing. | Less storage is required, only enough for processing small data. |
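The difference in when processing happens can be shown with a small sketch that computes an average two ways. The readings are made-up sample values. The batch function needs the whole stored data set before it gives one answer, while the stream function gives an updated answer as each value arrives and keeps only a running total.

```python
from typing import Iterable, Iterator, List


def batch_average(stored_readings: List[float]) -> float:
    """Batch: process the full stored data set in one operation."""
    return sum(stored_readings) / len(stored_readings)


def stream_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream: update the result immediately as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        # Only the running total and count are stored, not the data itself.
        yield total / count


readings = [4.0, 8.0, 6.0, 2.0]  # made-up sample data

batch_result = batch_average(readings)
stream_results = list(stream_average(iter(readings)))

print("Batch result (after all data is stored):", batch_result)
print("Stream results (one per arriving reading):", stream_results)
```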

Q) List cloud-based tools for data science. (3 mk, list 3 tools)

• Statistical Analysis System (SAS)
• Apache Hadoop
• Tableau
• TensorFlow
• BigML
• KNIME
• RapidMiner
• Excel
• Apache Flink
• Power BI
• Google Analytics
• Python
• R (RStudio)
• DataRobot
• D3.js
• Microsoft HDInsight
• Jupyter
• Matplotlib
• MATLAB
• QlikView
• PyTorch
• Pandas
• scikit-learn
• WEKA
• Minitab

Q) Describe cloud data warehouse functions. (3 mk)


A data warehouse is a centralized repository for
storing and managing large amounts of data from
various sources for analysis and reporting. It is
optimized for fast querying and analysis, enabling
organizations to make informed decisions by
providing a single source of truth for data.
Q) In the paper, explain only the functions (any three). (3 mk)
Functions of a data warehouse: It works as a collection
of data that is organized by various communities so that
the data can be retrieved. It holds facts about tables that
have high transaction levels, which are observed in order
to define the data warehousing techniques. The major
functions involved are listed below:
1. Data Consolidation: The process of combining
multiple data sources into a single data
repository in a data warehouse. This ensures a
consistent and accurate view of the data.
2. Data Cleaning: The process of identifying and
removing errors, inconsistencies, and irrelevant
data from the data sources before they are
integrated into the data warehouse. This helps
ensure the data is accurate and trustworthy.
3. Data Integration: The process of combining data
from multiple sources into a single, unified data
repository in a data warehouse. This involves
transforming the data into a consistent format
and resolving any conflicts or discrepancies
between the data sources. Data integration is an
essential step in the data warehousing process to
ensure that the data is accurate and usable for
analysis. Data from multiple sources can be
integrated into a single data repository for
analysis.
4. Data Storage: A data warehouse can store large
amounts of historical data and make it easily
accessible for analysis.
5. Data Transformation: Data can be transformed
and cleaned to remove inconsistencies, duplicate
data, or irrelevant information.
6. Data Analysis: Data can be analyzed and
visualized in various ways to gain insights and
make informed decisions.
7. Data Reporting: A data warehouse can provide
various reports and dashboards for different
departments and stakeholders.
8. Data Mining: Data can be mined for patterns and
trends to support decision-making and strategic
planning.
9. Performance Optimization: Data warehouse
systems are optimized for fast querying and
analysis, providing quick access to data.
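A few of these functions (consolidation, cleaning, and integration) can be sketched in plain Python. The two source systems, their field names, and the sample rows below are hypothetical; a real warehouse would use a database and an ETL tool instead of in-memory lists.

```python
from typing import List, Optional

# Hypothetical source A: a CRM export with amounts stored as text.
crm_rows = [
    {"customer": "Asha ", "amount": "120.50"},
    {"customer": "Ben", "amount": "n/a"},  # invalid amount
]
# Hypothetical source B: a web shop export with amounts stored in cents.
shop_rows = [
    {"name": "asha", "amount_cents": 12050},  # same record as the first CRM row
    {"name": "Chen", "amount_cents": 9900},
]


def from_crm(row: dict) -> Optional[dict]:
    """Integration: convert a CRM row to the common format (None if invalid)."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {"customer": row["customer"].strip().lower(), "amount": amount}


def from_shop(row: dict) -> dict:
    """Integration: convert a shop row to the common format."""
    return {"customer": row["name"].strip().lower(),
            "amount": row["amount_cents"] / 100}


def build_warehouse(crm: List[dict], shop: List[dict]) -> List[dict]:
    # Consolidation: bring both sources into one repository.
    unified = [from_crm(r) for r in crm] + [from_shop(r) for r in shop]
    # Cleaning: drop invalid rows and duplicates.
    seen = set()
    clean = []
    for record in unified:
        if record is None:
            continue
        key = (record["customer"], record["amount"])
        if key not in seen:
            seen.add(key)
            clean.append(record)
    return clean


warehouse = build_warehouse(crm_rows, shop_rows)
print(warehouse)
```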

Q) Explain Azure ML Studio. (4 mk)

Azure is Microsoft's cloud computing platform, which
helps to build solutions to meet business goals. It
supports infrastructure (IaaS), platform (PaaS), and
software as a service (SaaS) computing services. It also
supports advanced computing services like artificial
intelligence, machine learning, and IoT. Azure allows
you to build, manage, and deploy applications on a
global network.
Microsoft Azure Machine Learning Studio is a
collaborative, drag-and-drop tool you can use to
build, test, and deploy predictive analytics
solutions on your data. Machine Learning Studio
publishes models as web services that can easily
be consumed by custom apps or BI tools.
Q) Explain Jupyter Notebook and its workflow. (4 mk)
What is Jupyter?
• The name comes from Ju(lia) + Py(thon) + R.
• The Jupyter Notebook is an open-source web application
that allows you to create and share documents.
• These documents contain live code, equations, visualizations
and narrative text.
Advantages of Jupyter:
• Useful for data cleaning and transformation, numerical
simulation, statistical modelling, data visualization, machine
learning, and much more.
• Language of choice: supports 40+ languages.
• Notebooks can be shared with others using email,
Dropbox, GitHub and the Jupyter Notebook Viewer.
• Your code can produce rich, interactive output: HTML,
images, videos, and custom MIME types.
• Big data integration: leverage big data tools, such as Apache
Spark, from Python, R and Scala. Explore that same
data with pandas, scikit-learn, ggplot2, TensorFlow.
Limitations of Jupyter:
• It interferes with version control.
• The Jupyter Notebook format is just one big JSON file,
which contains your code and the outputs of the code.
• Code can only be run in chunks (cells).
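The point about the JSON format can be seen in a small sketch. The notebook below is a hand-written, minimal example in the version 4 notebook layout with a single code cell. It shows that the output is stored next to the code, which is why every re-run changes the file. The `strip_outputs` helper is a hypothetical illustration of one common workaround, not part of Jupyter itself.

```python
import json

# A minimal, hand-written notebook with one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(6 * 7)"],
            # The result of running the cell is saved in the same file.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def strip_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with code cell outputs removed."""
    cleaned = json.loads(json.dumps(nb))  # simple deep copy via JSON
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


notebook_text = json.dumps(notebook, indent=1)
stripped = strip_outputs(notebook)

print(notebook_text)                  # code and output together in one JSON
print(stripped["cells"][0]["outputs"])
```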
