Benefits of using Git or any version control system.

A version control system lets you revert files, or even whole projects,
to a previous state; find out who last modified a file, to pinpoint
errors and who caused them; and compare two versions of a file or
system. In general it gives you a safety net: when you screw things up
or lose some files, you can go back and get those files.
Git has its own database that records the changes, files, and commits,
so you can query this database if you want to find some files.

Why do people need a version control system?

Historically, RCS was invented first to save patches of files, that is,
the changes between versions of a file. Then people needed one central
repository to hold those versioned files, as in Subversion or Perforce,
which have a single server that contains all the versioned files.
The drawback of these legacy systems is the single point of failure: if
the server goes down, nobody can save or share versioned changes, and
if its disk is lost without a backup you lose all your work.
Difference between Git and other version control systems:
Other version control systems store information as a list of file-based
changes, so this is called delta-based version control (storing data as
changes to a base version of each file), whereas Git saves snapshots of
the files. Each time you commit, Git stores a picture, or snapshot, of
all your files at that moment and stores a reference to that snapshot.
Every user has their own local database, so if you need to see an old
version of your project, Git just pulls that version from your local
database instead of going to a central repository on a remote server.
Every file in Git is checksummed before it is stored, and is then
referred to by that checksum.
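The checksum and snapshot ideas above can be sketched in a few lines of
Python. This is only an illustration: the blob id follows the way Git
hashes file contents with SHA-1 (the hash algorithm is not named in
these notes), while TinyObjectStore and its methods are hypothetical
names and do not reflect Git's real on-disk format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git gives a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class TinyObjectStore:
    """Hypothetical stand-in for Git's local database (not its real format)."""

    def __init__(self):
        self.blobs = {}      # checksum -> file content
        self.snapshots = []  # each snapshot maps path -> checksum

    def commit(self, files: dict) -> int:
        """Record a snapshot of all files and return its number."""
        snapshot = {}
        for path, content in files.items():
            blob_id = git_blob_id(content)
            # Identical content is stored once and only referenced again.
            self.blobs.setdefault(blob_id, content)
            snapshot[path] = blob_id
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, snapshot_no: int) -> dict:
        """Rebuild an old version purely from the local store."""
        snapshot = self.snapshots[snapshot_no]
        return {path: self.blobs[blob_id] for path, blob_id in snapshot.items()}


store = TinyObjectStore()
first = store.commit({"a.txt": b"hello world\n", "b.txt": b"v1\n"})
second = store.commit({"a.txt": b"hello world\n", "b.txt": b"v2\n"})
print(git_blob_id(b"hello world\n"))
print(len(store.blobs))  # 3: a.txt is stored once, b.txt twice
```

Because a.txt did not change between the two commits, the second
snapshot only stores a reference to the blob that already exists.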
Git contents:
The working tree is a single checkout of one version of your project.
These files are pulled out of the compressed database in the Git
directory and placed on your disk to modify or use.
A file is committed when it is safely stored in the Git repository. If
the file has been modified and added to the staging area (using the add
command), it is staged. If the file has been changed since it was
checked out but has not been staged, it is modified.
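One way to picture the three states is as a comparison of three
checksums: the one in the last commit, the one in the staging area, and
the one of the file on disk. The function below is a minimal sketch of
that idea; file_states and its parameter names are made up for
illustration and are not part of Git.

```python
def file_states(head_id, index_id, worktree_id):
    """Classify a tracked file by comparing three checksums.

    head_id:     checksum recorded in the last commit
    index_id:    checksum recorded in the staging area
    worktree_id: checksum of the file on disk
    """
    states = []
    if index_id != head_id:
        states.append("staged")      # added, but not yet committed
    if worktree_id != index_id:
        states.append("modified")    # changed on disk, but not yet staged
    return states or ["committed"]   # all three checksums agree


print(file_states("aaa", "aaa", "aaa"))  # ['committed']
print(file_states("aaa", "aaa", "bbb"))  # ['modified']
print(file_states("aaa", "bbb", "bbb"))  # ['staged']
print(file_states("aaa", "bbb", "ccc"))  # ['staged', 'modified']
```

The last call shows that a file can be staged and modified at the same
time, when it was edited again after the add command.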
Monolithic systems consist of components that are tightly coupled
together and have to be developed, deployed, and managed as a single
entity, because they all run together as a single operating system
process. A change in one part means redeploying the whole project or
application. As time goes by, dealing with the increasing load on the
system or application requires either expanding vertically, by adding
more hardware resources like CPUs, memory, and other server components
(also known as scaling up), or expanding horizontally, by adding more
servers and running more replicas of the application (also known as
scaling out).
Microservices run independently and communicate with other
microservices through simple, well-defined interfaces (called APIs).
Microservices communicate either through a synchronous protocol such as
HTTP, over which they expose RESTful (REpresentational State Transfer)
APIs, or through an asynchronous protocol such as AMQP (Advanced Message
Queuing Protocol). Each microservice is a standalone process with a
relatively static API, so a change to one of them does not require
redeploying the whole system. In the book, the next 4 titles show that
microservices are a great solution compared to legacy monolithic
systems but also have some drawbacks, so the 4 titles address these
problems and give an introduction to what Kubernetes can offer as a new
solution (the titles include scaling up, deploying microservices, and
understanding the divergence of environments).
Microservice dependencies are libraries, packages, shared libraries,
and system processes.
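The synchronous HTTP style of communication described above can be
tried with nothing but the Python standard library. The sketch below
starts one tiny service and calls it the way another microservice
would. The service name, the /price/<item> path, and the price data are
all invented for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PriceHandler(BaseHTTPRequestHandler):
    """Hypothetical 'price' microservice exposing GET /price/<item>."""

    PRICES = {"apple": 3, "pear": 4}  # made-up data

    def do_GET(self):
        item = self.path.rsplit("/", 1)[-1]
        found = item in self.PRICES
        body = json.dumps({"item": item, "price": self.PRICES.get(item)}).encode()
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_service():
    """Run the service in a background thread on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), PriceHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# An opener without proxies, so the loopback call is not sent to a proxy.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_price(base_url, item):
    """What a second microservice would do: call the first one's API."""
    with _opener.open(f"{base_url}/price/{item}", timeout=5) as response:
        return json.loads(response.read())


server = start_service()
base_url = f"http://127.0.0.1:{server.server_port}"
print(fetch_price(base_url, "apple"))
server.shutdown()
```

The caller only depends on the URL and the JSON shape, so the service
behind it can be changed and redeployed on its own as long as that API
stays the same.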
The data engineer's mission is to build highly scalable stream
processing applications for moving, enriching, and transforming huge
amounts of data in real time. These skills are often needed to support
business intelligence, analytics pipelines, threat detection, event
processing, and more. Streaming data is data that is not at rest and
changes continuously, like data from Netflix, IoT sensors, medical
sensors, user and customer analytics software, and application and
server logs. Kafka provides a way to access data in its flowing state
and to work with this continuous, unbounded stream quickly and
efficiently. Apache Kafka is a platform for ingesting, storing, and
processing streams of data.
Producers publish events and consumers subscribe to them. Events are
organized into topics (streams), which are stored as logs: append-only
data structures that capture an ordered sequence of events. An event is
a piece of data with a certain structure. The Kafka cluster works as
the medium between producers and consumers.
Logs are immutable: you can't update records. Instead you only append
new records, and the original records remain untouched.
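The append-only log is easy to model in memory. The sketch below is
not the Kafka client API; TopicLog and Consumer are hypothetical
classes that only show how appending, ordering, and per-consumer
offsets fit together.

```python
class TopicLog:
    """In-memory sketch of one append-only log (not the Kafka client API)."""

    def __init__(self):
        self._records = []

    def append(self, event):
        """Producer side: add an event at the end and return its offset."""
        self._records.append(event)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Consumer side: read events in order, starting at an offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        events = self.log.read(self.offset)
        self.offset += len(events)
        return events


log = TopicLog()
log.append({"sensor": "s1", "temp": 20})
log.append({"sensor": "s1", "temp": 21})  # a correction is a new record

dashboard = Consumer(log)
print(dashboard.poll())  # both events, in the order they were appended
log.append({"sensor": "s2", "temp": 18})
print(dashboard.poll())  # only the new event
```

There is deliberately no update or delete method: a changed reading is
appended as a new record, and the earlier record stays in the log.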
Comparison between virtual machines and containers: a virtual machine
needs to run its own set of system processes in addition to the app's
processes. A container, on the other hand, is a single isolated process
running in the host OS, consuming only the resources that the app
consumes, without the overhead of any additional processes.
Technically speaking, the difference between virtual machines and
containers is this: virtual machines each run their own OS while
sharing the same bare-metal hardware. Underneath these virtual machines
are the host OS and the hypervisor, which divides the physical
resources into smaller sets of virtual resources that can be used by
the operating systems of the separate virtual machines.
Applications running inside those virtual machines perform system calls
to the guest OS kernel, which then executes x86 instructions on the
host's physical CPU through the hypervisor. Containers, on the other
hand, all perform system calls to the same host OS kernel. This means a
single kernel executes x86 instructions on the host CPU. The main
benefit of virtual machines is the full isolation they provide, because
each virtual machine runs its own OS kernel. Containers get their
isolation from two kernel features: Linux namespaces make sure each
process sees its own personal view of the system (files, processes,
network interfaces, hostname), and Linux control groups (cgroups) limit
the amount of resources the process can consume (CPU, memory, network
bandwidth).
By default, each Linux system initially has a single namespace, and all
system resources, such as process IDs, user IDs, filesystems, and
network interfaces, belong to that one namespace. But you can create
additional namespaces and organize resources across them. The following
kinds of namespaces exist, among others:
Mount (mnt), process ID (pid), network (net), inter-process
communication (ipc), user ID (user).
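On a Linux machine you can see which namespaces a process belongs to
by reading the symbolic links under /proc/<pid>/ns, whose targets look
like pid:[4026531836]. The helper names below are made up for this
sketch, and on systems without /proc the function simply returns an
empty dictionary.

```python
import os


def parse_ns_link(link):
    """Split a link target such as 'pid:[4026531836]' into (kind, id)."""
    kind, _, rest = link.partition(":")
    return kind, int(rest.strip("[]"))


def namespaces_of(pid="self"):
    """Return {entry name: namespace id} for a process.

    Returns {} where /proc/<pid>/ns cannot be read (for example on a
    non-Linux system).
    """
    ns_dir = f"/proc/{pid}/ns"
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    result = {}
    for name in names:
        try:
            _, ns_id = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or parse
        result[name] = ns_id
    return result


for name, ns_id in namespaces_of().items():
    print(f"{name:20} {ns_id}")
```

Two processes that print the same id for an entry share that namespace;
a process inside a container would normally show different ids from a
process running directly on the host.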
Docker components:
Docker images: packages that contain the application and its
environment. An image contains the filesystem and any other
dependencies, plus metadata such as the path of the executable that
should be run when the image is run.
Docker registries:
Repositories that store your Docker images and facilitate distributing
them across different platforms. When you build an image, you can
either run it on your own machine or push (upload) it to a registry and
pull (download) it on another computer. Some registries are public and
others are private.
Docker container:
A Docker-based container is a regular Linux container created from a
Docker-based container image. A running container is a process running
on the Docker host, but it is isolated from the host and from all other
processes running on it. The process is also resource-restricted,
meaning it can only consume the amount of resources that is allocated
to it.
Kubernetes is a container orchestration solution built for Docker and rkt.
The origin of Kubernetes:
Google created the Borg system (and later another system called Omega)
to help both application developers and system administrators manage
thousands of applications and services and achieve much higher
utilization of their resources. Kubernetes is an open-source system
based on those earlier projects. So Kubernetes is a system that helps
you deploy and manage containerized applications on top of it.
The architecture of the Kubernetes system:
Kubernetes consists of a master node and any number of worker nodes.
When a developer submits a list of apps to the master node, Kubernetes
deploys these applications to the worker nodes. The developer can
specify that certain apps must run together, and Kubernetes will deploy
those apps to the same worker node. This helps developers focus on the
core functions of their applications, because Kubernetes takes care of
infrastructure services like service discovery, scaling, load
balancing, self-healing, and leader election. It also helps system
administrators achieve better resource utilization.
The master node hosts the Kubernetes control plane, which controls the
whole Kubernetes system, and the worker nodes run the actual
applications you deploy.
The components of the Kubernetes cluster:
The master node contains the API server, the scheduler, etcd, and the
controller manager; each worker node contains the kubelet, a container
runtime, and kube-proxy. The components of the master node can run on
one node, or they can be replicated and split across different nodes to
ensure high availability. The API server is what you and the other
control plane components communicate with. The scheduler schedules your
applications: it assigns a worker node to each deployable component of
your application. The controller manager performs cluster-level
functions like replicating components, handling node failures, and
keeping track of worker nodes. etcd is a reliable distributed data
store that persistently stores the cluster configuration.
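The replication and node-failure handling attributed to the controller
manager can be pictured as a loop that keeps comparing the desired
state with the actual state. The function below is a rough sketch of
one pass of such a loop. It is not Kubernetes code, and reconcile, the
replica names, and the node names are all hypothetical.

```python
def reconcile(desired_replicas, running, healthy_nodes):
    """One pass of a control loop: make the actual state match the desired one.

    running maps a replica name to the node it runs on.
    Returns the new mapping after replacing lost replicas.
    """
    # 1. Forget replicas that were on nodes that have failed.
    alive = {name: node for name, node in running.items() if node in healthy_nodes}

    # 2. Start replacements until the desired count is reached again.
    nodes = sorted(healthy_nodes)
    counter = 0
    while len(alive) < desired_replicas and nodes:
        name = f"replica-{counter}"
        counter += 1
        if name in alive:
            continue  # this name is already in use
        # Place the new replica on the least loaded healthy node.
        load = {node: 0 for node in nodes}
        for node in alive.values():
            load[node] += 1
        alive[name] = min(nodes, key=lambda n: load[n])

    # 3. Remove surplus replicas if there are too many.
    for name in sorted(alive)[desired_replicas:]:
        del alive[name]
    return alive


running = {"replica-0": "node-a", "replica-1": "node-b", "replica-2": "node-b"}
# node-b has failed; node-c is a spare worker.
print(reconcile(3, running, {"node-a", "node-c"}))
```

After node-b fails, the two replicas that ran there are recreated on
the remaining healthy nodes, so three replicas are running again.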
The worker nodes are the computers or machines that run your
applications. The task of running, monitoring, and providing services
to your applications is done by:
Docker, rkt, or another container runtime, which runs your containers.
The kubelet, which communicates with the API server and manages the
containers on its node. The Kubernetes service proxy (kube-proxy), which
load-balances network traffic between application components.
To run an application on Kubernetes, you first need to package it into
one or more container images, push these images to a registry, and
post a description of your application to the Kubernetes API server.
The description includes the container images, information about the
images' components, how these components will communicate with each
other, and which of them need to run together on one node. It also says
whether a component provides a service to external or internal clients,
in which case it needs a single IP address.
Based on the description submitted to the API server, the scheduler
schedules the containers onto the available worker nodes, taking into
account the computational resources the containers require and the
unallocated resources on each worker node. The kubelet on each of those
nodes then instructs the container runtime (Docker or rkt) to pull the
required images and run the containers.
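The scheduling step described above, matching what containers require
against what each worker node still has free, can be sketched as
follows. This is a simplified illustration, not the algorithm the
Kubernetes scheduler really uses: it only looks at CPU, and Node,
schedule, and the container and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_capacity: float                           # in cores
    assigned: list = field(default_factory=list)  # (container, cpu) pairs

    def unallocated(self):
        return self.cpu_capacity - sum(cpu for _, cpu in self.assigned)


def schedule(containers, nodes):
    """Assign each (name, cpu_request) to a node with enough free CPU.

    Picks the node with the most unallocated CPU. Returns a mapping of
    container name to node name, or to None if nothing fits.
    """
    placement = {}
    for name, cpu_request in containers:
        candidates = [n for n in nodes if n.unallocated() >= cpu_request]
        if not candidates:
            placement[name] = None  # stays pending: no node has room
            continue
        best = max(candidates, key=lambda n: n.unallocated())
        best.assigned.append((name, cpu_request))
        placement[name] = best.name
    return placement


nodes = [Node("worker-1", 2.0), Node("worker-2", 1.0)]
containers = [("web", 1.0), ("api", 1.0), ("db", 1.0), ("batch", 1.0)]
print(schedule(containers, nodes))
# {'web': 'worker-1', 'api': 'worker-1', 'db': 'worker-2', 'batch': None}
```

The fourth container is left unplaced because the three cores in the
cluster are already allocated, which mirrors how a workload waits until
some node has enough unallocated resources.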
The app descriptor lists four container images grouped into three sets called

