Data Science is about finding patterns in data through analysis and making future predictions.
Data Science can be applied in nearly every part of a business where data is available. Examples
are:
• Consumer goods
• Stock markets
• Industry
• Politics
• Logistics companies
• E-commerce
A Data Scientist needs expertise in several areas:
• Machine Learning
• Statistics
• Programming (Python or R)
• Mathematics
• Databases
A Data Scientist must find patterns within the data. Before he/she can find the patterns, he/she must
organize the data in a standard format.
What is Data?
Data is a collection of information.
One purpose of Data Science is to structure data, making it interpretable and easy to work with.
Data can be categorized into two groups:
• Structured data
• Unstructured data
Unstructured Data
Unstructured data is not organized. We must organize the data for analysis purposes.
Structured Data
Structured data is organized and easier to work with.
Example of an array:
[80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
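As a minimal sketch, the array above can be stored as a Python list and queried directly. The variable names here are chosen only for illustration.

```python
# The array from the example above, stored as a Python list.
values = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# Because the data is structured, simple questions are easy to answer.
count = len(values)                      # how many values there are
smallest, largest = min(values), max(values)
mean = sum(values) / count               # arithmetic mean

print(count, smallest, largest, mean)
```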
Database Table
A database table is a table with structured data.
The following table shows a database table with health data extracted from a sports watch:
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
This dataset contains information about a typical training session, such as duration, average
pulse, calorie burnage, etc.
Database Table Structure
A database table consists of column(s) and row(s):
       Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
Row 1  30        80             120        240              10          7
Row 2  30        85             120        250              10          7
Row 3  45        90             130        260              8           7
Row 4  45        95             130        270              8           7
Variables
A variable is defined as something that can be measured or counted.
In the example below, we can observe that each column represents a variable.
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
There are 6 columns, meaning that there are 6 variables (Duration, Average_Pulse, Max_Pulse,
Calorie_Burnage, Hours_Work, Hours_Sleep).
But if a table like this has 11 rows, how come there are only 10 observations?
It is because the first row is the label row: it holds the names of the variables rather than
measurements.
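The distinction between the label row and the observations can be sketched in Python with the standard `csv` module. The four rows are the ones shown above; storing them as comma-separated text is an assumption made for the example.

```python
import csv
import io

# The four training sessions shown above, with the label (header) row first.
raw = """Duration,Average_Pulse,Max_Pulse,Calorie_Burnage,Hours_Work,Hours_Sleep
30,80,120,240,10,7
30,85,120,250,10,7
45,90,130,260,8,7
45,95,130,270,8,7
"""

rows = list(csv.reader(io.StringIO(raw)))
variables = rows[0]        # first row: the names of the variables
observations = rows[1:]    # remaining rows: one observation each

print(len(rows), "rows in total")
print(len(variables), "variables:", variables)
print(len(observations), "observations")
```

The number of observations is always one less than the number of rows, because the label row is not a measurement.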
Data science began in statistics. Part of the evolution of data science was the
inclusion of concepts such as machine learning, artificial intelligence, and the
internet of things. With the flood of new information coming in and businesses
seeking new ways to increase profit and make better decisions, data science started
to expand to other fields, including medicine, engineering, and more.
Origins, Predictions, Beginnings
We could say that data science was born from the idea of merging applied statistics with
computer science. The resulting field of study would use the extraordinary power of modern
computing. Scientists realized they could not only collect data and solve statistical problems
but also use that data to solve real-world problems and make reliable fact-driven predictions.
1962: American mathematician John W. Tukey first articulated the data science dream. In his
now-famous article "The Future of Data Analysis," he foresaw the inevitable emergence of a
new field nearly two decades before the first personal computers. While Tukey was ahead of
his time, he was not alone in his early appreciation of what would come to be known as "data
science." Another early figure was Peter Naur, a Danish computer engineer whose book
Concise Survey of Computer Methods offers one of the very first definitions of data science:
"The science of dealing with data, once they have been established, while the relation of the
data to what they represent is delegated to other fields and sciences."
1977: The theories and predictions of "pre" data scientists like Tukey and Naur became more
concrete with the establishment of The International Association for Statistical Computing
(IASC), whose mission was "to link traditional statistical methodology, modern computer
technology, and the knowledge of domain experts in order to convert data into information and
knowledge."
1980s and 1990s: Data science began taking more significant strides with the emergence of
the first Knowledge Discovery in Databases (KDD) workshop and the founding of the
International Federation of Classification Societies (IFCS). These two societies were among
the first to focus on educating and training professionals in the theory and methodology of data
science (though that term had not yet been formally adopted).
It was at this point that data science started to garner more attention from leading professionals
hoping to monetize big data and applied statistics.
1990s and early 2000s: Data science emerged as a recognized and specialized field. Several
data science academic journals began to circulate, and data science proponents like Jeff Wu
and William S. Cleveland continued to help develop and expound upon the necessity and
potential of data science.
2000s: Technology made enormous leaps by providing nearly universal access to internet
connectivity, communication, and (of course) data collection.
2005: Big data enters the scene. With tech giants such as Google and Facebook uncovering
large amounts of data, new technologies capable of processing them became necessary.
Hadoop rose to the challenge, and later on Spark and Cassandra made their debuts.
2014: Due to the increasing importance of data, and organizations’ interest in finding patterns
and making better business decisions, demand for data scientists began to see dramatic growth
in different parts of the world.
2015: Machine learning, deep learning, and Artificial Intelligence (AI) officially enter the
realm of data science. These technologies have driven innovations over the past decade — from
personalized shopping and entertainment to self-driven vehicles along with all the insights to
efficiently bring forth these real-life applications of AI into our daily lives.
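2018: New data protection regulations, such as the European Union's General Data Protection
Regulation (GDPR), which took effect that year, are perhaps one of the biggest developments in
the evolution of data science.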
2020s: We are seeing additional breakthroughs in AI and machine learning, and an
ever-increasing demand for qualified professionals in Big Data.
In other words, data scientists are working tirelessly toward developments in deep learning to
make computers smarter. These developments can bring about advanced robotics paired with
a powerful AI. Experts predict the AI will be capable of understanding and interacting
seamlessly with humans, self-driving vehicles, and automated public transportation in a world
interconnected like never before. This new world will be made possible by data science.
Perhaps, on the more exciting side, we may see an age of extensive automation of labor in the
near future. This is expected to revolutionize the healthcare, finance, transportation, and
defense industries.
With data becoming increasingly important to companies’ bottom lines, data scientists are
some of the most valuable individuals in the professional world today. But with more open
roles in the field than ever before, the initial excitement from seeing all those job postings can
quickly turn to anxiety when you realize just how many options are out there. Not to mention
the positions that you don’t even know about yet.
What is a Data Scientist?
The first thing to keep in mind is that the phrase “data scientist” tends to be used as an umbrella
term for just about every job in the field. In fact, if you ask ten different people what they think
a data scientist does, you’ll likely get ten different answers.
The reality is, data science is a vast field that employs individuals in a variety of roles and
responsibilities. They may hold the title of Data Analyst, Business Analyst, Software
Developer or Marketing Data Scientist—just to name a few. Because data has become so
prevalent in our everyday lives, it might surprise you to learn which industries are actively
searching for and hiring data science professionals.
Unfortunately, many companies’ job listings don’t always get the distinction right, so it’s
crucial to understand your personal and professional goals before starting your search for a data
boot camp and eventually, a job in the field.
Data science teams are constantly faced with complex problems they need to solve using—you
guessed it—data. Whether it’s analyzing the sentiment of incoming communications (like
Tweets or survey responses), tracking sales leads, or devising a new marketing campaign, there
are a variety of data science jobs assigned to perform the myriad processes required of the field.
While many of these positions share some of the same tools and responsibilities, the day-to-
day experience for each can vary drastically.
Whether you’re preparing to enroll in a boot camp or you’re starting the job hunt for a position
in the field, you should have a basic idea of how you want to apply your skills. Take a look at
some of the most in-demand data science jobs to get a better understanding of how they fit into
their respective teams.
Data Analyst
As a typical entry-level position, a Data Analyst’s primary job is to develop systems that collect
and sift through company data, then use it to extract insights that answer business questions
with actionable solutions. Individuals in this role should have a keen eye for detail and the
ability to brainstorm new approaches to analyzing data. Oftentimes, Data Analysts are tapped
to work with a variety of departments and individuals, so collaboration and communication
skills are a must, especially when explaining technical ideas to non-technical teams.
Responsibilities: Accessing and cleaning data, performing statistical analysis, visualizing and
communicating the results
Top industries: Finance, insurance, gambling, retail banking, consumer products, healthcare,
energy
Data Scientist
Think of a Data Scientist as taking the Data Analyst role another step further down the data
science funnel. Data Scientists take on many of the same responsibilities as analysts, but they’re
also responsible for building machine learning models and working with algorithms to make
accurate predictions based on collected data—ultimately making Data Analysts’ jobs a little
easier. Of course, it’s always good to know how analysis fits into the larger picture, and
successful Data Scientists have a solid understanding of handling raw data, analyzing it and
sharing insights in a compelling way. Since the role tends to be more independent, motivation
and curiosity go a long way for these professionals.
Responsibilities: Analyzing data, building and training machine learning models to make
reliable future predictions
Tools/skills required: Everything required from a data analyst, plus strong foundations in math,
analytics and computer science, knowledge of machine learning methods, statistical models,
advanced data science programming and familiarity with Apache Spark
Growth potential: Data Scientists may move on to become a senior data scientist, while some
decide to take the path to become a machine learning engineer or a chief data officer
Business Analyst
In order for Data Analysts’ insights to be communicated throughout a company, it’s up to the
Business Analyst to use storytelling techniques to turn them into actionable business insights.
The main goal for individuals in this role is to facilitate potential solutions to organizational
problems, but they should also be prepared to take on additional responsibilities like quality
assurance and management. Needless to say, time management and prioritization are common
traits shared among successful Business Analysts—and you’re not likely to get hired as one
without them. While it’s not a heavily tech-focused role, understanding how to apply a variety
of business processes using high-level strategic thinking is a crucial skill for these data science
specialists.
Growth potential: With experience, many Business Analysts take on a leadership title or move
on to more senior roles in product management
Software Engineer
Nowadays, most software companies want to leverage their users’ data to optimize their
offerings, while data-driven businesses have turned to creating custom software built around
their specific needs or goals. That’s where Software Engineers come in. Depending on the type
of company, a Software Engineer might be tasked with optimizing certain product features
based on user data, or they might be responsible for building a new program that will ultimately
increase a company’s bottom line. Needless to say, individuals holding these roles should be
well-versed in programming and data analytics to truly be successful.
Responsibilities: Collaborate with data scientists and business analysts to ensure alignment
between the business objectives and the analytics back-end of the software they are working to
produce or modify, as well as ensure the scalability and security of the final product
Tools/skills required: Experience with machine learning and deep learning frameworks,
understanding of mathematics including linear algebra and statistics, strong programming and
debugging skills, data processing, writing and communication and attention to detail
Growth potential: Given the fact that this is a relatively new role within the industry, the
opportunities for individuals holding this role are virtually endless
Top industries: Retail, healthcare, research and development, government and defense, IT
services
Marketing Data Scientist
When a company builds a new campaign, it’s up to the Marketing Data Scientist to analyze
company data and user research to inform the marketing strategy around the launch and
measure its outcomes. On a granular level, this could involve anything from email marketing
and search engine optimization (SEO) to web analytics and growth hacking—and everything
in between! To be a successful Marketing Data Scientist, candidates need to have the ability to
leverage data to enhance key marketing components and achieve desired company outcomes.
Because market data tends to change rapidly, Marketing Data Scientists should be able to adapt
to the pace at which campaigns progress.
Responsibilities: Gather and analyze data to objectively strategize the launch and evolution of
a business’s promotions and marketing campaigns while communicating between stakeholders
Growth potential: With so many specialties to choose from, the sky’s the limit for individuals
holding a Marketing Data Scientist role, some of whom go on to hold senior-level positions or
even start their own companies
Machine Learning Engineer
While Data Scientists build a company’s machine learning models and Data Analysts
determine which data is worthy of exploring, it’s the Machine Learning Engineer who wrangles
and applies the algorithms to the datasets. Usually, the ultimate goal for individuals in this role
is to eventually create artificial intelligence. There’s plenty of trial-and-error involved in the
job, so persistence and resilience are key contributors to success. In addition, having a solid
understanding of how long it takes to apply various approaches will also prove advantageous
in this field.
Growth potential: Many Machine Learning Engineers progress to become more specialized in
deep learning methods, while others transition to machine learning researchers or leads on data
engineering teams
Data Science Life Cycle
1. Business Understanding: The complete cycle revolves around the business goal. What will
you solve if you do not have a specific problem? It is extremely important to understand the
business objective clearly, because it will be the ultimate aim of the analysis. Only after a
proper understanding can we set a specific goal for the analysis that is in sync with the
business objective. You need to know whether the customer wants to minimize losses, to
predict the price of a commodity, and so on.
3. Preparation of Data: Next comes the data preparation stage. This consists of steps like
selecting the relevant data, integrating it by merging data sets, cleaning it, treating missing
values by either removing or imputing them, treating erroneous data by removing it, and
checking for outliers using box plots and handling them. It also includes constructing new
data by deriving new features from existing ones, formatting the data into the desired
structure, and removing unwanted columns and features. Data preparation is the most
time-consuming but arguably the most important step in the complete life cycle. Your model
will only be as accurate as your data.
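Two of the preparation steps above, imputing missing values and handling outliers, can be sketched with Python's standard `statistics` module. The readings are made up for illustration, and imputing the median and dropping values outside 1.5 times the interquartile range (the usual box-plot whisker rule) are just one reasonable choice among several.

```python
import statistics

# Hypothetical average-pulse readings: None marks a missing value and
# 300 is a deliberately implausible outlier.
pulse = [80, 85, None, 95, 100, 300, 90, 105]

# 1. Treat missing values by imputing the median of the known values.
known = [p for p in pulse if p is not None]
median = statistics.median(known)
imputed = [median if p is None else p for p in pulse]

# 2. Flag outliers with the box-plot rule: anything further than
#    1.5 * IQR below the first quartile or above the third quartile.
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [p for p in imputed if low <= p <= high]

print("imputed:", imputed)
print("cleaned:", cleaned)
```

Whether an outlier should be removed, capped, or kept depends on the business question, so the removal here is only an example.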
4. Exploratory Data Analysis: This step involves getting some idea about the solution and the
factors affecting it before building the actual model. The distribution of data within the
individual variables is explored graphically using bar graphs. Relations between different
features are captured through graphical representations like scatter plots and heat maps.
Many data visualization techniques are used extensively to explore every feature individually
and in combination with other features.
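A heat map of feature relations is typically a color-coded grid of pairwise correlations. The sketch below computes those numbers for three columns of the sports-watch table shown earlier. The `pearson` helper is written out by hand for illustration; no plotting library is assumed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Columns taken from the sports-watch table.
data = {
    "Duration": [30, 30, 45, 45],
    "Average_Pulse": [80, 85, 90, 95],
    "Calorie_Burnage": [240, 250, 260, 270],
}

# Pairwise correlations: the values a heat map would color-code.
names = list(data)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: {pearson(data[a], data[b]):.2f}")
```

With only four rows these correlations are illustrative, not evidence of a real relationship.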
5. Data Modeling: Data modeling is the heart of data analysis. A model takes the prepared
data as input and produces the desired output. This step consists of choosing the appropriate
type of model, depending on whether the problem is a classification, regression, or clustering
problem. After choosing the model family, we need to carefully pick, from the algorithms in
that family, the ones to implement. We need to tune the hyperparameters of each model to
obtain the desired performance. We also need to make sure there is the right balance between
performance and generalizability. We do not want the model to memorize the training data
and perform poorly on new data.
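As a minimal sketch of a regression model, the code below fits a straight line by ordinary least squares to predict Calorie_Burnage from Average_Pulse, using the four rows of the sports-watch table. Choosing a linear model and this single input variable is an assumption made for illustration, not a recommendation from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Prepared data: input variable and the output we want to predict.
average_pulse = [80, 85, 90, 95]
calorie_burnage = [240, 250, 260, 270]

slope, intercept = fit_line(average_pulse, calorie_burnage)

# Use the fitted model on a new, unseen input value.
predicted = slope * 100 + intercept
print(f"Calorie_Burnage = {slope:.1f} * Average_Pulse + {intercept:.1f}")
print("Prediction for Average_Pulse = 100:", predicted)
```

A line has no hyperparameters to tune. More flexible model families do, and that is where the balance between performance and generalizability becomes a real concern.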
6. Model Evaluation: Here the model is evaluated to check whether it is ready to be deployed.
The model is tested on unseen data and evaluated on a carefully thought-out set of evaluation
metrics. We also need to make sure that the model conforms to reality. If we do not obtain a
satisfactory result in the evaluation, we have to repeat the complete modeling process until
the desired level of the metrics is achieved. Any data science solution, such as a machine
learning model, must evolve just like a human: it must be able to improve with new data and
adapt to a new evaluation metric. We can build more than one model for a certain phenomenon,
but many of them may be imperfect. Model evaluation helps us select and build the best
model.
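The idea of comparing several candidate models on unseen data can be sketched as follows. Three rows of the sports-watch table are used for training and the fourth is held out. The two candidates (a constant baseline and a least-squares line) and the choice of mean absolute error as the metric are assumptions for illustration, and a single held-out row is far too little for a real evaluation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Training data (seen by the models) and test data (unseen).
train_x, train_y = [80, 85, 90], [240, 250, 260]
test_x, test_y = [95], [270]

# Candidate 1: a baseline that always predicts the training mean.
baseline = sum(train_y) / len(train_y)
baseline_mae = mean_absolute_error(test_y, [baseline for _ in test_x])

# Candidate 2: a least-squares line fitted on the training data only.
slope, intercept = fit_line(train_x, train_y)
line_mae = mean_absolute_error(test_y, [slope * x + intercept for x in test_x])

# Select the candidate with the lower error on the unseen data.
best = "line" if line_mae < baseline_mae else "baseline"
print("baseline MAE:", baseline_mae, "line MAE:", line_mae, "best:", best)
```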
7. Model Deployment: After a rigorous evaluation, the model is finally deployed in the desired
format and channel. This is the last step in the data science life cycle. Each step in the life
cycle described above must be worked on carefully. If any step is performed improperly, it
affects the subsequent step and the complete effort goes to waste. For example, if data is not
collected properly, you will lose information and will not be building a good model. If data is
not cleaned properly, the model will not work. If the model is not evaluated properly, it will
fail in the real world. From business understanding to model deployment, every step has to be
given appropriate attention, time, and effort.
Applications of Data Science
1. In Search Engines
One of the most useful applications of Data Science is search engines. When we want to search
for something on the internet, we mostly use search engines like Google, Yahoo, etc. Data
Science is used to return relevant results faster.
For example, suppose we search for “Data Structure and algorithm courses”. The first link we
get may be GeeksforGeeks Courses. This happens because the GeeksforGeeks website is visited
most often for information about Data Structure courses and computer-related subjects. This
analysis is done using Data Science, which surfaces the most-visited web links.
2. In Transport
Data Science has also entered the transport field, for example with driverless cars. Driverless
cars may help reduce the number of accidents.
For example, in driverless cars the training data is fed into the algorithm and, with the help of
Data Science techniques, analyzed to learn things such as the speed limit on highways, busy
streets, and narrow roads, and how to handle different situations while driving.
3. In Finance
Data Science plays a key role in the financial industry, which always faces the issue of fraud
and the risk of losses. Financial firms therefore need to automate risk-of-loss analysis in
order to make strategic decisions for the company. They also use Data Science analytics tools
to predict the future, for example customer lifetime value and stock market moves.
For example, Data Science is central to the stock market, where it is used to examine past
behavior with past data in order to estimate future outcomes. Data is analyzed in such a way
that it becomes possible to forecast stock prices over a set time frame.
4. In E-Commerce
E-commerce websites like Amazon, Flipkart, etc. use Data Science to create a better user
experience with personalized recommendations.
For example, when we search for something on an e-commerce website, we get suggestions
similar to our past choices, as well as recommendations based on the most purchased, most
rated, and most searched products. This is all done with the help of Data Science.
5. In Health Care
In the healthcare industry, Data Science acts as a boon. Data Science is used for:
• Detecting tumors
• Drug discovery
6. Image Recognition
Currently, Data Science is also used in image recognition. For example, when we upload a
picture with a friend on Facebook, Facebook suggests tagging who is in the picture. This is
done with the help of machine learning and Data Science. When an image is recognized, the
faces in it are compared against one’s Facebook friends, and if a face in the picture matches
someone’s profile, Facebook suggests auto-tagging.
7. Targeting Recommendation
Targeting recommendation is one of the most important applications of Data Science. Whatever
a user searches for on the internet, they will then see related posts everywhere. This can be
explained with an example: suppose I want a mobile phone, so I search for it on Google, and
afterwards I change my mind and decide to buy it offline. Data Science helps the companies
that are paying for advertisements for that phone. Everywhere on the internet, on social
media, on websites, and in apps, I will see recommendations for the mobile phone I searched
for, which nudges me to buy it online.
8. In Airline Route Planning
With the help of Data Science, the airline sector is also growing. For example, it becomes
easier to predict flight delays. Data Science also helps decide whether to fly directly to the
destination or to make a stop in between: a flight from Delhi to the U.S.A. can take a direct
route, or it can halt along the way before reaching the destination.
9. In Gaming
In most games where a user plays against a computer opponent, data science concepts are
used together with machine learning, so that the computer improves its performance with the
help of past data. Many games, such as chess and EA Sports titles, use Data Science concepts.
10. In Medicine and Drug Development
The process of creating medicine is very difficult and time-consuming, and it has to be done
with great discipline because someone’s life is at stake. Without Data Science, developing a
new medicine or drug takes a lot of time, resources, and money. With the help of Data Science
it becomes easier, because the likely success rate can be estimated from biological data or
factors. Algorithms based on data science can help forecast how the human body will react to
a drug, reducing the need for lab experiments.
11. In Delivery Logistics
Various logistics companies like DHL, FedEx, etc. make use of Data Science. Data Science
helps these companies find the best route for shipping their products, the best time for
delivery, the best mode of transport to reach the destination, etc.
12. Autocomplete
Autocomplete is an important application of Data Science: the user types just a few letters
or words and is offered a completion for the rest of the line. For example, when we write a
formal email in Google Mail, the autocomplete feature suggests an efficient way to complete
the whole line. Autocomplete is also widely used in search engines, social media, and various
apps.
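A very simple form of autocomplete can be sketched as prefix matching ranked by how often each phrase was used before. The search history below is hypothetical, and real products use far richer models. This only illustrates the idea of learning suggestions from past data.

```python
from collections import Counter

# Hypothetical search history; in practice this would come from usage logs.
history = [
    "data science", "data structures", "data science",
    "database table", "data security", "data science", "data security",
]

def autocomplete(prefix, queries, limit=3):
    """Return the most frequent past queries that start with `prefix`."""
    prefix = prefix.lower()
    counts = Counter(q for q in queries if q.startswith(prefix))
    return [query for query, _ in counts.most_common(limit)]

suggestions = autocomplete("data s", history)
print(suggestions)
```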
What is Data Security?
Data security is defined as the protection of data from unknown, unwanted, or external access.
It covers protection from data breaches, corruption, modification, and theft. Strategies for
setting up data security include hashing, data encryption, and tokenization. In other words,
it means protecting information from unauthorized access throughout its lifecycle. The
components that require protection include software, user and storage devices, hardware, the
organization’s policies and procedures, and access and administrative controls.
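Two of the strategies named above, hashing and tokenization, can be sketched with Python's standard library. The `hash_value` function and `TokenVault` class are hypothetical names introduced for this example, and the email address and card number are dummy values. An in-memory dictionary stands in for what would be a separately secured token store.

```python
import hashlib
import secrets

def hash_value(value, salt):
    """One-way hash: the same value and salt always give the same digest."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

class TokenVault:
    """Tokenization: replace a sensitive value with a random token."""

    def __init__(self):
        self._store = {}  # token -> original value (kept private)

    def tokenize(self, value):
        token = secrets.token_hex(16)  # random, unrelated to the value
        self._store[token] = value
        return token

    def detokenize(self, token):
        return self._store[token]

# Hashing: the digest can be compared later, but not reversed.
salt = secrets.token_bytes(16)
digest = hash_value("alice@example.com", salt)

# Tokenization: other systems see only the token, never the card number.
vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(digest, token)
```

For stored passwords specifically, a deliberately slow function such as `hashlib.pbkdf2_hmac` is usually preferred over a single SHA-256 pass.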
Data security is achieved via different tools which enable encryption, data masking and
redaction of confidential information. Data security is achieved by following strict regulations,
and setting up a practical and efficient management process, reducing data security breaches
and human error.
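Data masking and redaction can be illustrated with regular expressions. The patterns below are simplified assumptions (16-digit card numbers in groups of four, and a loose email pattern), so they are a sketch rather than a complete detector of confidential data.

```python
import re

# Simplified patterns, assumed for illustration only.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}(\d{4})\b")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask_card_numbers(text):
    """Masking: hide all but the last four digits of a card number."""
    return CARD_PATTERN.sub(lambda m: "**** **** **** " + m.group(1), text)

def redact_emails(text):
    """Redaction: remove email addresses entirely."""
    return EMAIL_PATTERN.sub("[REDACTED EMAIL]", text)

record = "Card: 4111 1111 1111 1234 on file, contact alice@example.com today"
print(redact_emails(mask_card_numbers(record)))
```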
Data security is of utmost importance in today’s digital age. It refers to data protection from
unauthorized access, use, disclosure, alteration, or destruction. Here are several reasons why
data security is crucial:
Compliance with Regulations: Many industries are subject to strict regulations regarding the
protection and privacy of data. Compliance with these regulations, such as the General Data
Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act
(HIPAA), is mandatory. Failure to comply can lead to legal consequences, fines, and
reputational damage.
Prevention of Data Breaches: Data breaches can have severe consequences for businesses.
They can lead to financial loss, reputational damage, and loss of customer trust. Implementing
robust data security measures reduces the risk of data breaches and helps protect valuable
assets.
Business Continuity and Disaster Recovery: Data security is crucial in business continuity and disaster
recovery planning. Regular data backups, secure storage, and disaster recovery plans ensure
that critical business data can be recovered in the event of a data loss or system failure.
Data Security
Data security protects data from unauthorized access, use, disclosure, alteration, or destruction.
It involves implementing technical, physical, and procedural measures to protect data integrity,
confidentiality, and availability.
Its measures include encryption, firewalls, access controls, intrusion detection systems, and data
backup.
The main goal of data security is to prevent data breaches and unauthorized access and protect
data from external threats.
Data Privacy
Data privacy, on the other hand, focuses on controlling the collection, use, disclosure, and
sharing of personal data.
It ensures that individuals control their personal information and how organizations use it.
Data privacy involves implementing policies, procedures, and measures to comply with privacy
laws and regulations.
It includes obtaining consent for data collection and processing, providing transparency about
data usage, and respecting individuals’ rights.
Data privacy also addresses issues such as data anonymization, data retention, and data subject
rights.
Malware
Malware, also known as malicious software, is a broad category of software designed to harm
computer systems. It includes variants such as spyware, viruses, and ransomware, which can
contribute to a data breach. Malware is code created by cyber attackers to damage or gain
unauthorized access to a system or data. It is often activated when a user clicks on a
malicious attachment or link. Once activated, malware can cause a variety of harmful actions:
Installation of additional harmful software
The mobile data breach is a well-known example of a data leak through malware, affecting
around 37 million customers. Eventually, the company agreed to pay around $350 million to
customers who filed class action lawsuits.
Phishing Attacks
Phishing attacks are fraudulent communications sent with malicious intent. Users often receive
them as emails that appear to come from a trusted source. The messages contain a set of
instructions for the recipient to follow. The requested actions may include revealing
confidential information such as credit card numbers, login information, CVV codes, and
similar details. The messages may also contain links that compromise data when clicked.
Social Engineering
Insider Threats
These are threats that originate inside the company or organization. They can be intentional
or unintentional, and are as follows:
Malicious insiders aim to steal data or harm the organization for personal benefit.
Non-malicious insiders are individuals who cause harm accidentally, without realizing it.
Compromised insiders are unaware that their system or account has been compromised. The
harmful activities happen from the person’s account without their knowledge.
Physical Theft or Loss of Devices
Portable devices such as laptops, pen drives, and hard drives are easy to steal and can cause
serious harm to the company and the user. Limiting access to such devices is one of the
standard methods to protect data.
Here are some of the best practices for improving data security:
Online services generally come with built-in data security features. One is accepting only
strong passwords that mix several character types, which increases the number of combinations
an attacker must try when guessing. Additionally, multi-factor authentication requires a
second factor, such as a separate device in the user’s possession, before allowing a login to
the account. Getting past multiple layers of security checks is considerably harder for an
attacker.
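The effect of mixing character types can be made concrete with a small sketch. The minimum length of 12 and the exact rules in `is_strong_password` are assumptions chosen for illustration, not a standard, and `search_space` is a hypothetical helper that counts how many passwords a guesser would have to consider.

```python
import string

def is_strong_password(password, min_length=12):
    """Check length and that lowercase, uppercase, digits, and symbols appear."""
    checks = [
        len(password) >= min_length,
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return all(checks)

def search_space(alphabet_size, length):
    """Number of possible passwords of a given length over an alphabet."""
    return alphabet_size ** length

# 26 lowercase letters versus the 94 printable ASCII symbols (no space).
print("8 lowercase letters:", search_space(26, 8))
print("12 mixed characters:", search_space(94, 12))
print(is_strong_password("password"), is_strong_password("Tr1cky-Passw0rd!"))
```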
Software and systems often contain bugs. Software updates aim to resolve such shortcomings
and provide enhanced security. Applying them closes the window for internal or external data
security breaches.
Access control is essential in providing data security by limiting access to a restricted number
of users. It promotes accountability and responsibility among a selected group of individuals.
Every organization and department must take this crucial step to ensure data security. Access
control only allows permission or visual access to specific sections corresponding to a user’s
job role. For instance, the finance team does not need access to the software workflow, and
vice versa. By implementing access control measures, an organization can ensure that only
authorized individuals access sensitive data, reducing the risk of unauthorized access and data
breaches.
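Role-based access control can be sketched as a lookup from job role to permitted resources. The roles and resource names below are hypothetical and simply mirror the finance versus software example in the text.

```python
# Hypothetical mapping from job role to the resources it may access.
PERMISSIONS = {
    "finance": {"invoices", "payroll"},
    "engineering": {"source_code", "build_pipeline"},
}

def can_access(role, resource):
    """Allow access only if the resource is listed for the user's role."""
    return resource in PERMISSIONS.get(role, set())

print(can_access("finance", "payroll"))        # allowed for this role
print(can_access("finance", "source_code"))    # not part of the finance role
print(can_access("contractor", "invoices"))    # unknown roles get no access
```

Denying by default, as the unknown role case shows, keeps mistakes in the mapping from silently granting access.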
Regardless of the data’s current usage status, make sure it is encrypted. Encryption converts
data into a format that is unreadable without the correct key. This is done with an algorithm
and a key, and it protects the integrity and confidentiality of the data. Both data in transit
and data at rest are prone to attack and must be encrypted.
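The roles of the algorithm and the key can be illustrated with a one-time pad, one of the simplest encryption schemes: each byte of the message is combined with a byte of a random key of the same length. This is a teaching sketch only. The key must be as long as the message and never reused, which is impractical, so real systems should rely on a vetted cryptography library instead.

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """One-time pad: XOR every message byte with the matching key byte."""
    if len(key) != len(plaintext):
        raise ValueError("key must be exactly as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, key))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return encrypt(ciphertext, key)

message = b"Average_Pulse=80"
key = secrets.token_bytes(len(message))   # random key, used only once

ciphertext = encrypt(message, key)        # unreadable without the key
recovered = decrypt(ciphertext, key)      # readable again with the key
print(ciphertext.hex(), recovered)
```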
The data security threats described above include system compromise, which can make it
impossible to carry on activities because the data is no longer available. Regular data
backups make it possible to restore the data and keep working. They limit the harm, since
information lost in a data breach may otherwise take a long time to recover.
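A backup is only useful if it can be restored, so a common pattern is to copy the file and verify the copy with a checksum. The sketch below does this inside a temporary directory. The file name, directory layout, and `back_up` helper are assumptions for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Checksum of a file's contents, used to verify the copy."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def back_up(source, backup_dir):
    """Copy `source` into `backup_dir` and verify the copy by checksum."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError("backup verification failed")
    return target

with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "sessions.csv"
    original.write_text("Duration,Average_Pulse\n30,80\n")

    copy = back_up(original, Path(workdir) / "backups")

    original.unlink()                 # simulate losing the original data
    shutil.copy2(copy, original)      # restore it from the backup
    restored_text = original.read_text()

print(restored_text)
```

In practice the backup would live on separate storage, since a copy on the same machine is lost together with the original.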
Up-to-date information on possible attacks and prevention methods can protect company data
from numerous losses. It enables employees to take mindful actions and precautions when
dealing with unknown or strange data, and it helps them identify social engineering attacks.
Educate them about what data security is and about other crucial aspects, such as data
security regulations like PCI DSS, HIPAA, and others.