
KONERU LAKSHMAIAH EDUCATION FOUNDATION

(Deemed to be University estd. u/s 3 of the UGC Act, 1956)
(NAAC Accredited "A++" Grade University)

Green Fields, Guntur District, A.P., India – 522502

B.Tech. II Year Program
A.Y. 2023-24 Even Semester
22AD2227 DATA ANALYTICS AND VISUALIZATION

CO 1

Session 1: INTRODUCTION TO DATA SCIENCE

1. Course Description
Data analytics and visualization are integral components of the data-driven
decision-making process. Data analytics involves the exploration, analysis, and interpretation
of data to extract meaningful patterns and insights, utilizing techniques such as descriptive,
diagnostic, predictive, and prescriptive analytics. On the other hand, data visualization
transforms data into graphical representations, enhancing comprehension and
communication of complex information through charts, graphs, and interactive dashboards.
Together, they empower individuals and organizations to not only understand their data but
also effectively communicate their findings, facilitating informed decision-making in a wide
range of industries and applications.

2. Aim

Understand the modelling of various types of data analytics and the Visualization fundamentals.

3. Instructional Objectives (Course Objectives)

 Understand the modelling of various types of data analysis and the Visualization
fundamentals.
 Apply methods and tools in descriptive statistics to summarize and explore datasets, using
measures like mean, median, variance, and graphical representations like histograms and
box plots.
 Apply methods for Scientific/Spatial Data Visualization and Web Data Visualization.
 Use dashboards and their categories.

4. Learning Outcomes (Course Outcome)

Students will be able to understand the modelling of various types of data and the
Visualization fundamentals.
5. Module Description (CO-1 Description)

Data Modeling: Conceptual models, Spreadsheet models, Relational data models,
object-oriented models, semi-structured data models, unstructured data models.
Visualization Fundamentals, Design principles, The Process of Visualization, Data
Abstraction, Visual Encodings, Use of Color, Perceptual Issues, Designing Views,
Interacting with Visualizations, Filtering and Aggregation

1. Course Description
We need data visualization because a visual summary of information makes it easier
to identify patterns and trends than looking through thousands of rows on a
spreadsheet; that is simply how the human brain works. Since the purpose of data
analysis is to gain insights, data is much more valuable when it is visualized. Even if
a data analyst can pull insights from data without visualization, it is more difficult to
communicate their meaning without it. Charts and graphs make communicating data
findings easier even when the patterns could be identified without them. Data
visualization has many uses, and each type can be used in different ways; for
example, it can highlight areas that need attention or improvement.

2. Aim

Apply methods and tools for Non-Spatial Data Visualization.

3. Instructional Objectives (Course Objectives)

a. Understand the modelling of various types of data and the Visualization
fundamentals.
b. Apply methods and tools for Non-Spatial Data Visualization.
c. Apply methods for Scientific/Spatial Data Visualization and Web Data Visualization.
d. Use dashboards and their categories.

4. Learning Outcomes (Course Outcome)

Students will be able to apply methods and tools for Non-Spatial Data Visualization.

5. Module Description (CO-1 Description)

Introduction to Data Science: Evolution of Data Science, Data Science Roles, Stages in a
Data Science Project, Applications of Data Science in various fields, Data Security Issues,
Data Collection Strategies, Data Pre-Processing Overview.

6. Session Introduction
Data science is a multidisciplinary field that combines expertise from computer science,
statistics, and domain-specific knowledge to extract valuable insights and knowledge from
large and complex datasets. It involves a systematic approach to data collection, analysis,
interpretation, and communication, with the goal of informing data-driven decision-
making and solving real-world problems.

7. Session Description

Introduction to Data Science


Data Science is a multidisciplinary field that combines techniques from statistics,
computer science, and domain expertise to extract insights and knowledge from data. It
encompasses a wide range of activities, from data collection and cleaning to analysis and
interpretation, with the goal of making data-driven decisions and predictions.
Some key components and concepts in data science are outlined below.
Data Collection: Data scientists gather data from various sources, which can include
databases, sensors, social media, and more. This data can be structured (e.g., databases)
or unstructured (e.g., text or images).
Data Cleaning and Preprocessing: Raw data often needs to be cleaned and pre-processed
to remove noise, handle missing values, and ensure data quality. This step is crucial to
ensure the accuracy and reliability of subsequent analyses.
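As a minimal sketch of this step (using pandas, with purely illustrative column names and values), missing values and implausible outliers might be handled like this:

import pandas as pd
import numpy as np

# Hypothetical raw data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 250],        # 250 is an implausible outlier
    "income": [48000, 52000, 61000, np.nan, 58000],
})

# Fill missing numeric values with each column's median.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Clip values outside a plausible range rather than dropping whole rows.
df["age"] = df["age"].clip(lower=0, upper=100)

print(df)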
Exploratory Data Analysis (EDA): EDA involves visualizing and summarizing data to
identify patterns, trends, and potential relationships. It helps data scientists gain an initial
understanding of the data.
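A first pass at EDA often starts with simple numeric summaries. The following sketch (again on invented data) shows two common starting points:

import pandas as pd

# Hypothetical dataset for exploration.
df = pd.DataFrame({
    "hours_studied": [2, 5, 1, 8, 4, 6],
    "score": [55, 70, 40, 92, 65, 78],
})

# Summary statistics: count, mean, std, min/max, and quartiles per column.
print(df.describe())

# Pairwise correlations hint at potential relationships between variables.
print(df.corr())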
Machine Learning: Machine learning is a subset of data science that involves building
models and algorithms that can learn from data to make predictions or decisions.
Supervised learning, unsupervised learning, and reinforcement learning are common
approaches in this field.
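To make the supervised-learning idea concrete, here is a minimal scikit-learn sketch that trains a classifier on a built-in dataset and predicts labels for unseen examples; the model choice and parameters are illustrative, not prescriptive:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Supervised learning: learn a mapping from features to known labels.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)            # learn from labelled examples
predictions = model.predict(X_test)    # predict labels for unseen data

print("Accuracy:", accuracy_score(y_test, predictions))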
Data Visualization: Visualizations such as charts, graphs, and dashboards are used to
communicate findings and insights effectively. Visualization is essential for presenting
complex data in an understandable and actionable format.
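For example, a basic chart can be produced with matplotlib in a few lines; the data here is invented purely for illustration:

import matplotlib.pyplot as plt

# Hypothetical monthly sales figures.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 150, 145, 170, 190]

fig, ax = plt.subplots()
ax.bar(months, sales, color="steelblue")
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
ax.set_title("Monthly Sales")
plt.show()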
Statistical Analysis: Statistics plays a crucial role in data science, helping to quantify
uncertainty, assess the significance of findings, and make data-driven decisions.
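As one example of quantifying uncertainty, a two-sample t-test checks whether the difference between two group means is statistically significant. A sketch using scipy, with invented sample values:

from scipy import stats

# Hypothetical measurements from two independent groups (e.g., an A/B test).
group_a = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16]
group_b = [0.18, 0.17, 0.19, 0.16, 0.20, 0.18]

# Two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value (conventionally < 0.05) suggests the group means differ.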
Big Data: Data science often deals with large volumes of data, known as big data.
Technologies like Hadoop and Spark are used to process and analyze these massive
datasets.
Domain Expertise: Understanding the specific domain or industry in which data science
is being applied is vital. Domain knowledge helps in formulating meaningful questions
and interpreting results in a relevant context.
Data Ethics and Privacy: Data scientists must be aware of ethical considerations and
privacy concerns when working with data. This includes ensuring the responsible use of
data and complying with regulations like GDPR.
Programming: Proficiency in programming languages like Python and R is essential for
data scientists to write code for data analysis and modeling.
Data science is widely used in various fields, including finance, healthcare, marketing, e-
commerce, and more. It empowers organizations to make informed decisions, optimize
processes, and develop data-driven strategies. As the volume of data continues to grow,
the demand for data scientists and their skills is expected to increase, making data
science a dynamic and evolving field.

Applications of Data Science


Data science has a wide range of real-time applications across various industries. These
applications leverage real-time data processing and analysis to make immediate decisions
and provide valuable insights. Here are some notable real-time applications of data
science:
Predictive Maintenance: In industries like manufacturing, utilities, and transportation,
data science is used to monitor equipment and predict when maintenance is required.
Sensors collect real-time data, which is then analyzed to detect anomalies and potential
failures, helping to prevent costly downtime.
Fraud Detection: In the financial sector, data science is used to identify fraudulent
transactions in real time. Machine learning models analyze transaction data and raise
alerts when they detect unusual patterns or suspicious activities.
Recommendation Systems: Online platforms like Netflix, Amazon, and Spotify use
data science to provide real-time recommendations to users. These systems analyze user
behaviour and preferences to suggest relevant content or products instantly.

Internet of Things (IoT): IoT devices generate vast amounts of real-time data. Data
science is used to process this data, extract valuable insights, and trigger actions based on
the data, such as adjusting thermostat settings in a smart home or optimizing logistics in
supply chain management.
Stock Market Analysis: Financial institutions and traders use real-time data analysis to
make split-second decisions in the stock market. Algorithms analyze market data, news,
and social media sentiment to inform trading strategies.
Healthcare Monitoring: Wearable devices and sensors collect real-time health data,
which can be analyzed to monitor patients' health conditions. In cases of critical health
events, immediate alerts can be sent to healthcare providers or emergency services.
Online Advertising: Advertisers use real-time bidding and data science to target users
with relevant ads. Bids are adjusted in real time based on user behaviour, demographics,
and other data to maximize ad placement effectiveness.
Traffic Management: Cities use data science to analyze real-time traffic data from
sensors and GPS devices to optimize traffic signal timings, reroute traffic, and manage
congestion.
Energy Grid Optimization: Utility companies use real-time data analysis to optimize
the distribution of energy across the grid. This includes load forecasting, demand-
response programs, and the integration of renewable energy sources.
Weather Forecasting: Meteorologists use real-time data from weather stations,
satellites, and other sources to generate accurate and up-to-the-minute weather forecasts.
This is crucial for disaster preparedness and resource allocation.
E-commerce Inventory Management: Retailers use real-time data to manage inventory
efficiently. Data science helps in predicting demand, optimizing restocking, and reducing
overstock and understock situations.
Social Media Sentiment Analysis: Companies monitor social media in real time to
gauge public sentiment about their products or services. This can inform marketing
strategies and help address customer concerns promptly.
These are just a few examples of how data science is used in real-time applications to
extract insights and make instant decisions. The ability to process and analyze data in
real time has become increasingly important in today's fast-paced, data-driven world,
enabling businesses and organizations to respond swiftly to changing conditions and
make informed choices.

Evolution of Data Science

The field of data science has evolved significantly over the years, with its development
closely tied to advances in technology, data availability, and the changing needs of
organizations and industries. Here's a brief overview of the evolution of data science:
Early Foundations (1960s-1980s):
The roots of data science can be traced back to statistics and computer science.
Early data analysis focused on small datasets and relied on traditional statistical methods.
Growth of Data Warehousing (1990s):
The emergence of data warehousing allowed organizations to collect and store large
volumes of data.
Business Intelligence (BI) tools became popular for data reporting and analysis.
Big Data Era (2000s):
The explosion of digital data, including web data, social media data, and sensor data, led
to the term "big data."
Technologies like Hadoop and NoSQL databases were developed to process and manage
massive datasets.
Emergence of Data Science (2000s-2010s):
The term "data science" gained popularity, representing a multidisciplinary approach to
data analysis.
Data scientists began to use machine learning and data mining techniques to uncover
insights from data.
Mainstream Adoption (2010s):
Data science became an integral part of businesses and organizations, driving decision-
making, product development, and customer insights.
The demand for data scientists and data engineers increased significantly.
Machine Learning and Deep Learning (2010s):
Advances in machine learning and deep learning led to breakthroughs in image
recognition, natural language processing, and recommendation systems.
Ethics and Privacy Concerns (2010s-Present):
As data collection and analysis expanded, concerns about data privacy and ethics became
more prominent, leading to regulations like GDPR.
The responsible use of data and ethical considerations in AI and machine learning have
gained attention.
AI Integration (2010s-Present):
Data science has become closely intertwined with artificial intelligence (AI), with
machine learning playing a central role in AI applications.
AI-driven technologies have been adopted in various industries, from healthcare to
finance.
Data Science in the Cloud (2010s-Present):
Cloud computing platforms like AWS, Azure, and Google Cloud have made it easier to
store, process, and analyze data at scale.
Interdisciplinary Nature (Present):
Data science has become increasingly interdisciplinary, involving skills in mathematics,
computer science, domain knowledge, and communication.
Automated Machine Learning (AutoML) (Present):
The development of AutoML tools has made it easier for non-experts to build and deploy
machine learning models.
Continual Evolution (Ongoing):
Data science continues to evolve with emerging technologies, such as quantum
computing and edge computing, and new applications in fields like healthcare,
autonomous vehicles, and more.
Data science remains a dynamic and evolving field that is likely to continue changing as
technology, data, and the needs of society and industry evolve. Its impact on decision-
making, innovation, and our understanding of the world is expected to grow in the
coming years.

8. Activities/Case studies related to the session

Case Study: Customer Churn Prediction for a Telecom Company


Background: A telecom company is experiencing a high rate of customer churn, where
subscribers are canceling their contracts and switching to competitors. The company
wants to use data science to identify factors that contribute to churn and predict
which customers are most likely to leave. By proactively addressing these issues,
they aim to reduce churn and improve customer retention.
Objectives:
1. Analyze the data to identify key factors contributing to customer churn.
2. Build a predictive model to identify customers at risk of churn.
3. Provide actionable recommendations for reducing churn.
Data: The dataset includes historical customer information, such as contract length,
monthly charges, usage patterns, customer demographics, and whether the customer
churned (yes/no).
Data Science Process:
1. Data Collection and Exploration:
 Gather the dataset and examine its structure.
 Explore the data to understand its characteristics, including summary statistics and
visualizations.
2. Data Preprocessing:
 Handle missing data and outliers.
 Encode categorical variables.
 Split the data into training and testing sets.
3. Exploratory Data Analysis (EDA):
 Conduct EDA to identify patterns and relationships between variables.
 Use visualizations to gain insights into factors associated with customer churn.
4. Feature Selection:
 Identify important features that significantly affect churn.
 Feature selection techniques may include correlation analysis and feature importance
rankings.
5. Model Building:
 Select appropriate machine learning algorithms (e.g., logistic regression, decision
trees, random forest); a minimal sketch follows this list.
 Train the model on the training data.
6. Model Evaluation:
 Evaluate the model's performance on the testing dataset using metrics like accuracy,
precision, recall, and F1-score.
 Use a confusion matrix to understand true positives, true negatives, false positives,
and false negatives.
7. Predictive Analytics:
 Use the trained model to predict which customers are at high risk of churn.
 The model will output probabilities or predictions for each customer.
8. Actionable Insights:
 Provide actionable recommendations based on the model's insights. For example,
offer targeted incentives or retention strategies to at-risk customers.
9. Monitoring and Continuous Improvement:
 Implement the recommendations and monitor the impact on churn rates.
 Continuously update the model with new data to improve its accuracy and
effectiveness.
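A minimal end-to-end sketch of steps 2, 5, 6, and 7 is shown below. The column names and values are hypothetical; a real project would load the company's customer records and use many more features:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Hypothetical churn data; illustrative columns and values only.
df = pd.DataFrame({
    "contract_months": [1, 24, 12, 2, 36, 6, 1, 48, 3, 24],
    "monthly_charges": [80, 45, 60, 90, 40, 75, 95, 35, 85, 50],
    "churned":         [1,  0,  0,  1,  0,  1,  1,  0,  1,  0],
})

X = df[["contract_months", "monthly_charges"]]
y = df["churned"]

# Step 2: split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Step 5: train a logistic regression classifier.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set.
print(confusion_matrix(y_test, model.predict(X_test)))

# Step 7: score customers by their predicted probability of churn.
churn_risk = model.predict_proba(X_test)[:, 1]
print("Churn probabilities:", churn_risk.round(2))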
This case study demonstrates the application of data science basics in solving a real-
world problem. By analyzing historical data and building a predictive model, the
telecom company can take proactive steps to reduce customer churn and improve
their business outcomes.

9. Examples & contemporary extracts of articles/practices to convey the idea of the session

Case Study: Predicting Student Exam Scores


Background: A high school wants to improve its students' academic performance by
identifying factors that influence exam scores. The school collects data on student
demographics, study hours, attendance, and previous exam scores.
Objectives:
1. Analyze the data to understand the key factors that impact students' exam scores.
2. Build a predictive model to forecast a student's exam score based on the available
data.
3. Provide actionable recommendations to help students improve their performance.
Data: The dataset contains information on a sample of students, including their age,
gender, study hours, attendance, and past exam scores. The target variable is the final
exam score.
Data Science Process:
This case study demonstrates the application of data science in an educational context. By
analyzing student data and building a predictive model, the school can identify areas for
improvement and develop strategies to enhance student performance.
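A minimal regression sketch for this scenario might look as follows; the column names (study_hours, attendance_pct, past_score) and values are hypothetical stand-ins for the school's actual records:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical student records; columns and values are illustrative only.
df = pd.DataFrame({
    "study_hours": [2, 4, 6, 8, 10, 12],
    "attendance_pct": [60, 70, 75, 85, 90, 95],
    "past_score": [50, 55, 62, 70, 78, 85],
    "final_score": [52, 58, 66, 74, 82, 90],
})

X = df[["study_hours", "attendance_pct", "past_score"]]
y = df["final_score"]

model = LinearRegression().fit(X, y)

# Forecast the final exam score for a new student profile.
new_student = pd.DataFrame(
    {"study_hours": [7], "attendance_pct": [80], "past_score": [65]}
)
print("Predicted final score:", round(float(model.predict(new_student)[0]), 1))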
10. SAQs - Self-Assessment Questions

1. Data science is also known as
a) Data-driven science
b) Data mining
c) Big data
d) Information retrieval

2. Data preprocessing, cleaning, and feature engineering are typically the
responsibility of which role?
a) Data Scientist
b) Machine Learning Engineer
c) Data Analyst
d) None of the above
11. Summary
Data science is a multidisciplinary field that combines principles from computer science,
statistics, and domain-specific knowledge to extract valuable insights from data. It
involves a systematic process that includes data collection, cleaning, analysis, and
interpretation, with the primary aim of supporting data-driven decision-making and
solving real-world problems. The field encompasses essential components such as data
preprocessing, exploratory data analysis, statistical analysis, machine learning, data
visualization, and communication of findings. Data science has become increasingly
important in the age of big data, enabling organizations to harness the power of data to
gain a competitive edge and drive innovation in various domains. It offers a wide range of
career opportunities for individuals with skills in data analysis, modeling, and
visualization and continues to evolve with technological advancements and changing data
landscapes. Ethical considerations, including data privacy and responsible data use, are
integral to the practice of data science, ensuring that it benefits society while respecting
individual rights and values.

12. Terminal Questions

1. Can you explain the significance of data collection, data cleaning, and data
preprocessing in the data science workflow?
2. How does data visualization play a role in data science, and why is it
important?
3. What is predictive modeling, and how does it relate to data science?
4. Describe the ethical considerations associated with working with data in the
field of data science.
5. What are some key milestones in the development of data science techniques
and methodologies?
6. What are some of the challenges and ethical considerations that have emerged
over the years as data science has grown in importance and scale?
13. Case Studies (CO-wise)
14. Answer Key
1. d    2. b

15. Glossary

Data Science: A multidisciplinary field that combines computer science, statistics, and
domain knowledge to extract insights from data.

Data Analysis: The process of examining, cleaning, transforming, and interpreting data
to discover patterns, trends, and insights.

Data Visualization: The representation of data using charts, graphs, and visual elements
to aid in understanding and communication.

Data Preprocessing: The initial step in data analysis that involves cleaning and
preparing data for analysis by addressing missing values, outliers, and inconsistencies.

Exploratory Data Analysis (EDA): The practice of visually and statistically exploring
data to understand its characteristics and relationships.

Predictive Modeling: Building models that make predictions based on historical data,
often using machine learning algorithms.

Descriptive Statistics: Numerical and graphical methods used to summarize and
describe the main features of a dataset.

Feature Engineering: Creating new variables from existing data to improve model
performance.

Machine Learning: A subset of artificial intelligence that focuses on developing
algorithms that allow computers to learn and make predictions from data.

Data Mining: The process of discovering patterns and relationships within large
datasets.

Big Data: Extremely large and complex datasets that traditional data processing tools
cannot adequately handle.

Data Scientist: A professional who specializes in analyzing and interpreting data,
developing models, and providing insights to drive data-driven decision-making.

Hypothesis Testing: A statistical technique used to test hypotheses and make inferences
about data.

Feature Selection: Identifying and choosing the most relevant variables or features for
modeling.

Cross-Validation: A technique used to assess the performance of a predictive model by
repeatedly partitioning the data into training and validation subsets (folds).

Overfitting: When a model is too complex and fits the training data too closely,
potentially leading to poor generalization to new data.

Bias and Variance: Terms used to describe the sources of error in a model, with bias
indicating underfitting and variance indicating overfitting.

Algorithm: A step-by-step procedure or set of rules for solving a specific problem in
data analysis and machine learning.

Ethical Considerations: The moral and legal aspects of working with data, including
data privacy and responsible data use.

Data Ethics: A branch of ethics that deals with the moral principles governing data
collection, handling, and sharing in the context of data science.

16. References of books, sites, links

Textbooks:

1. Jake VanderPlas, Python Data Science Handbook, O'Reilly Media, November 2016.
ISBN: 9781491912058

Sites and Web links:


Text and Annotation | Python Data Science Handbook (jakevdp.github.io)

17. Keywords

Data Science, Big Data, Data Scientist, Data Mining
