Professional Documents
Culture Documents
TDA MOOC Activity II Format
Department of CS & IT
V Semester
Activity #2
Certification Courses
Specialization: GENERAL A
Date of Submission:
Submitted by: 20BCAR0280
Student Name: PRINCE LOKKO
USN No: 20BCAR0280
Name of Faculty In-Charge:
EVALUATION CRITERIA
Introduction - /10
Applications - /5
Certification - /10
Total - /25
This is to certify that ------------- has satisfactorily completed the course of assignment
prescribed by Jain (Deemed-to-be University) for the Fifth Semester
____________________ Degree Course in the year 2023-2024.
MARKS: Max 15, Obtained 10
REPORT EXPLANATION
Data Fluency: Exploring And Describing Data
INTRODUCTION
"Data Fluency: Exploring and Describing Data" refers to a set of skills and knowledge related
to working with data effectively. It involves the ability to understand, analyse, and
communicate information from datasets. Data fluency is crucial in various fields, including
data science, business, research, and many others. Here are some key aspects of data fluency:
Data Collection: Data fluency starts with collecting relevant data. This may involve using
surveys, sensors, databases, or other means to gather information. It's essential to ensure that
the data collected is accurate, reliable, and representative of the problem or question at hand.
Data Cleaning: Raw data often contains errors, missing values, and inconsistencies. Data
fluency includes the ability to preprocess and clean data to make it suitable for analysis. This
step involves data validation, imputation, and handling outliers.
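As a minimal sketch of these cleaning steps in Python (pandas is assumed to be available, and the column names and values below are invented for illustration), deduplication, imputation, and an IQR-based outlier filter might look like:

```python
import numpy as np
import pandas as pd

# Hypothetical raw sales records containing the usual problems:
# an exact duplicate row, a missing quantity, and an outlier price.
raw = pd.DataFrame({
    "product_id": ["A1", "A1", "B2", "C3", "D4", "E5", "F6", "G7"],
    "quantity":   [5, 5, np.nan, 3, 7, 2, 4, 1],
    "price":      [9.99, 9.99, 4.50, 5.20, 6.10, 7.00, 8.25, 9999.0],
})

clean = raw.drop_duplicates()                      # remove exact duplicates
clean["quantity"] = clean["quantity"].fillna(      # impute missing values
    clean["quantity"].median())

# Handle outliers with the 1.5 * IQR rule on price
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```

Here the duplicate row is dropped, the missing quantity is filled with the median, and the extreme price is filtered out before any analysis begins.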
Data Exploration: Exploring data means getting a sense of its characteristics. This includes
understanding the data's distribution, central tendencies, and spread. Visualization tools and
techniques like histograms, scatter plots, and summary statistics are commonly used for data
exploration.
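For instance, a quick exploration of a single numeric variable (using illustrative, randomly generated data) can be done with pandas and NumPy before turning to charts:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily sales figures for one year (illustrative data only)
sales = pd.Series(rng.normal(loc=200, scale=25, size=365))

summary = sales.describe()            # count, mean, std, min, quartiles, max
mean, median = sales.mean(), sales.median()
spread = sales.std()
# Binning the values gives the histogram's shape even without plotting
counts, bin_edges = np.histogram(sales, bins=20)
```

The same summary feeds directly into histogram or box-plot visualizations with matplotlib or seaborn.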
Data Analysis: Data fluency also involves performing various analyses on the data to extract
insights. This may include statistical analyses, machine learning algorithms, or other methods
depending on the nature of the data and the objectives of the analysis.
Data Interpretation: Interpreting data results is a vital skill. Data fluency involves the ability
to draw meaningful conclusions from the analysis and relate these findings to the problem or
question being addressed.
Continuous Learning: Data fluency is not static; it requires continuous learning and
adaptation. As new tools, techniques, and data sources become available, individuals and
organizations need to stay updated to remain data fluent.
Domain Expertise: In many cases, data fluency is most effective when combined with domain
expertise. Understanding the specific context in which the data is generated and used can lead
to more meaningful insights.
Overall, data fluency is a multifaceted skill set that combines technical, analytical, and
communication abilities to make data-driven decisions and solve complex problems
effectively. It is highly valuable in today's data-driven world, where information is a critical
asset for decision-making and innovation.
Scenario: Imagine you work for a retail company that wants to optimize its inventory
management. The company has collected sales data for the past year and wants to explore and
describe this data to make informed decisions about stocking levels and product assortment
for the upcoming year.
Data Collection: Start by collecting the sales data for the past year, including product IDs,
sales quantities, dates, and customer information. Ensure data integrity and accuracy.
Data Cleaning: Clean the data by removing duplicates, handling missing values, and checking
for outliers. This step ensures that your analysis is based on reliable information.
Data Exploration: Use data visualization tools to create histograms, line charts, and scatter
plots to explore the data. Identify trends, seasonality, and any outliers that might affect
inventory decisions.
Data Analysis: Apply statistical methods to analyse the data. Calculate summary statistics
like mean, median, and standard deviation for key product categories. Use time-series
analysis to identify sales patterns over different time periods.
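A sketch of this step with pandas (the table layout and numbers are assumptions for illustration) computes per-category summary statistics and monthly sales totals:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical one-year sales log; columns are invented for illustration
dates = pd.date_range("2022-01-01", "2022-12-31", freq="D")
pick = rng.integers(0, len(dates), size=1000)
df = pd.DataFrame({
    "date": dates[pick],
    "category": rng.choice(["toys", "apparel"], size=1000),
    "units": rng.integers(1, 20, size=1000),
})

# Summary statistics for each product category
cat_stats = df.groupby("category")["units"].agg(["mean", "median", "std"])

# Monthly totals expose sales patterns over different time periods
monthly = (df.set_index("date")["units"]
             .sort_index()
             .resample("MS").sum())
```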
Data Visualization: Create visualizations such as bar charts to show the best-selling products,
line charts to display sales trends over the year, and heatmaps to identify peak sales days or
seasons.
Data Interpretation: Interpret your findings. For example, you might discover that certain
products have high sales during specific months, while others are consistent year-round. This
information can help in making seasonal stocking decisions.
Data Communication: Prepare a presentation or report summarizing your analysis. Use clear
visualizations and explanations to convey your insights to the inventory management team.
Emphasize the importance of stocking products based on their historical sales patterns.
Ethical Considerations: Ensure that customer and sales data is handled with privacy and
security in mind. Anonymize customer information and comply with data protection
regulations.
Continuous Learning: Stay updated with the latest inventory management techniques and
data analysis tools to continually improve your data fluency skills.
Domain Expertise: Collaborate with the inventory management team to combine your data
fluency skills with their domain expertise. They can provide insights into market trends,
product lifecycles, and customer preferences.
By applying data fluency in this scenario, you can help the retail company make data-driven
decisions regarding inventory management, leading to improved stock levels, reduced
carrying costs, and better customer satisfaction. This application demonstrates how data
fluency can be a valuable asset in real-world business situations.
https://www.linkedin.com/learning/data-fluency-exploring-and-describing-data-15815339/make-better-decisions-with-your-data?contextUrn=urn%3Ali%3AlyndaLearningPath%3A5ea9f8c8498e31684804bbca&u=92695330
15 Hours 18 minutes
Introduction to Statistics:
Different types of data: categorical (nominal and ordinal) and numerical (discrete and
continuous).
Understanding the nature of data and its implications for statistical analysis.
Descriptive Statistics:
Calculating confidence intervals for population parameters (e.g., mean and proportion).
Interpreting confidence intervals in the context of statistical inference.
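As a brief example (scipy is assumed to be available, and the sample is randomly generated for illustration), a 95% t-based confidence interval for a mean can be computed as:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=8, size=40)   # hypothetical measurements

mean = sample.mean()
sem = stats.sem(sample)                         # standard error of the mean
# 95% confidence interval for the population mean, using the t distribution
low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
```

The interval is interpreted as a procedure: in repeated sampling, about 95% of intervals built this way would cover the true mean.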
Hypothesis Testing:
Ethical considerations in data collection and analysis, including informed consent and
privacy.
The responsible use of statistics in research and decision-making.
"Statistics Foundations 1: The Basics" is often the starting point for individuals looking to
build a strong foundation in statistics. It provides the essential knowledge and skills needed to
work with data, draw meaningful conclusions, and make informed decisions based on
statistical analysis.
Scenario: You are a marketing analyst for a retail company, and your goal
is to assess the effectiveness of a recent marketing campaign to increase
sales. You have collected data on daily sales figures before and after the
campaign. Your manager wants to know if there is a statistically significant
increase in sales due to the campaign.
Data Collection:
Gather data on daily sales for a period before and after the marketing campaign. Ensure that
the data is accurate and well-documented.
Data Types:
Categorize the data as numerical (daily sales) and categorical (before vs. after the campaign).
Understanding the data types helps in choosing appropriate statistical methods.
Descriptive Statistics:
Calculate descriptive statistics such as the mean and standard deviation of daily sales before
and after the campaign. This provides an initial sense of the data.
Data Visualization:
Create visualizations like line charts to show the trend in daily sales over time, highlighting
the campaign period.
Hypothesis Testing:
Formulate a hypothesis: Null Hypothesis (H0) - There is no significant increase in sales due
to the campaign. Alternative Hypothesis (Ha) - There is a significant increase in sales due to
the campaign.
Choose an appropriate statistical test (e.g., paired t-test for comparing means before and after
the campaign).
Set a significance level (alpha, often 0.05) to determine the threshold for statistical
significance.
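These steps can be sketched with scipy (assumed available; the before/after figures are simulated for illustration) as a one-sided paired t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical daily sales for 30 matched days before and after the campaign
before = rng.normal(loc=1000, scale=100, size=30)
after = before + rng.normal(loc=60, scale=50, size=30)   # simulated lift

# H0: no increase in sales; Ha: sales increased after the campaign
t_stat, p_value = stats.ttest_rel(after, before, alternative="greater")

alpha = 0.05                      # significance threshold
significant = p_value < alpha     # reject H0 when p < alpha
```

A paired test is appropriate here only if each "after" day can be matched to a comparable "before" day; otherwise an independent two-sample t-test would be the usual choice.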
Confidence Intervals:
Calculate confidence intervals for the mean sales before and after the campaign to estimate
the range in which the population mean lies.
Regression Analysis:
If applicable, explore the relationship between sales and other factors such as marketing
spending. Use regression analysis to assess how these variables influence sales.
Data Communication:
Present your findings in a report or presentation to your manager. Use clear visualizations
and language to convey the statistical significance of the campaign's impact on sales.
Ethical Considerations:
Ensure that customer data and any sensitive information are handled with care and in
compliance with privacy regulations.
By applying the principles learned in "Statistics Foundations 1: The Basics," you can provide
your manager with data-driven insights regarding the effectiveness of the marketing
campaign. This application demonstrates how a solid understanding of basic statistics can be
used to make informed decisions in a business context.
https://www.linkedin.com/learning/paths/develop-your-data-analysis-skills?u=92695330
33 minutes
Understanding the behavior of sample means as sample size increases (Law of Large
Numbers).
The Central Limit Theorem and its significance in statistical inference.
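Both results are easy to see in a small simulation (illustrative only), using a skewed exponential population with true mean 1:

```python
import numpy as np

rng = np.random.default_rng(3)
# Population: exponential distribution (heavily skewed, true mean = 1)
population = rng.exponential(scale=1.0, size=100_000)

# Law of Large Numbers: the sample mean settles near the true mean
small = population[:100].mean()
large = population[:50_000].mean()

# Central Limit Theorem: means of many samples of size 30 are roughly
# normally distributed even though the population itself is skewed
sample_means = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)
```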
Probability Inequalities:
How probability concepts are applied in statistical analysis, hypothesis testing, and statistical
modeling.
"Statistics Foundations 2: Probability" equips students with a deeper understanding of
probability theory, which forms the basis for more advanced statistical techniques. This
knowledge is valuable for anyone involved in data analysis, research, or decision-making, as
it provides the tools to model and make sense of uncertainty and randomness in data.
https://www.linkedin.com/learning/statistics-foundations-2-probability/the-importance-of-probabilities?contextUrn=urn%3Ali%3AlyndaLearningPath%3A5ea9f8c8498e31684804bbca&u=92695330
1 Hour 10 minutes
Introduction to Tableau:
Connecting Tableau to various data sources, including databases, spreadsheets, and web data
connectors.
Cleaning and transforming data within Tableau using the Data Source tab.
Data Visualization Fundamentals:
Building more complex visualizations, including heatmaps, tree maps, and geographic maps.
Customizing visualizations using formatting and design options.
Data Blending and Joins:
Creating calculated fields and expressions to derive new insights from data.
Using functions and logical operators for data manipulation.
Dashboard Design and Layout:
Publishing and sharing Tableau workbooks and dashboards on Tableau Server or Tableau
Online.
Collaborating with team members and stakeholders.
Advanced Analytics and Forecasting:
Leveraging Tableau's advanced analytics features for predictive modeling and forecasting.
Integrating R or Python scripts for advanced analytics.
Security and Data Governance:
Exploring real business scenarios and case studies where Tableau has been used to drive
insights and decision-making.
Best Practices and Tips:
Learning best practices for Tableau usage, including naming conventions, dashboard
organization, and performance optimization.
Certification and Further Learning:
Preparing for Tableau Desktop or Tableau Server certification exams.
Exploring advanced topics and resources for continued learning.
"Using Tableau to Discover Powerful Business Insights" is designed to empower participants
with the skills and knowledge needed to extract actionable insights from data and create
compelling visualizations and dashboards for effective data-driven decision-making within a
business context.
Introduction to Tableau:
Creating fundamental visualizations, including bar charts, line charts, scatter plots, and pie
charts.
Customizing visual elements like colors, labels, and tooltips.
Interactive Dashboards:
Exploring more advanced visualization types, such as heatmaps, tree maps, geographic maps,
and dual-axis charts.
Using calculated fields and sets for complex data representations.
Data Analytics and Calculations:
Utilizing calculated fields, table calculations, and LOD (Level of Detail) expressions to
perform in-depth data analysis.
Applying statistical functions and aggregations to gain insights.
Advanced Analytics Integration:
Integrating predictive analytics and forecasting into Tableau using built-in features or
external tools like R and Python.
Data Storytelling:
Crafting compelling data narratives and stories within Tableau.
Guiding viewers through data insights using storyboards and annotations.
Collaboration and Sharing:
Publishing and sharing Tableau workbooks and dashboards on Tableau Server, Tableau
Online, or Tableau Public.
Managing user access and permissions.
Data Security and Compliance:
Exploring real business use cases and case studies where Tableau has been instrumental in
decision-making and problem-solving.
Best Practices and Tips:
Learning best practices for dashboard design, data organization, and maintaining Tableau
projects efficiently.
Certification and Continuing Education:
Application sample: Using Tableau to Discover Powerful Business Insights
Scenario: You work for a retail company that wants to improve its inventory management
and optimize product assortment. The company has collected sales data over the past few
years and wants to use Tableau to gain insights that will lead to more efficient inventory
planning.
Clean the data within Tableau to address any missing values, duplicates, or inconsistencies.
Explore the data visually using Tableau's data exploration features to identify trends, seasonal
patterns, and potential outliers.
Basic Visualization:
Create basic visualizations like line charts and bar charts to show sales trends over time,
identify top-selling products, and visualize product categories.
Advanced Visualization:
Develop more advanced visualizations, such as heatmaps or tree maps, to explore correlations
between product sales, customer demographics, and other relevant factors.
Data Analytics:
Use Tableau's calculated fields and aggregation functions to calculate key performance
metrics like year-over-year growth, customer retention rates, and product profitability.
Interactive Dashboards:
Design interactive dashboards that allow inventory managers and decision-makers to filter
data by various dimensions like product category, region, or time period.
Implement dashboard actions and parameters to enable users to explore data dynamically.
Forecasting:
Use Tableau's forecasting features to predict future sales trends for different product
categories.
Incorporate confidence intervals to account for uncertainty in forecasts.
Data Storytelling:
Create a data story within Tableau that presents the insights and findings in a compelling
narrative format.
Highlight key takeaways and recommendations for inventory optimization.
Monitoring and Updating:
Continuously update the Tableau dashboard as new sales data becomes available.
Monitor the impact of inventory management changes based on the insights provided by
Tableau.
Performance Optimization:
Optimize the dashboard's performance for fast loading, especially when dealing with large
datasets.
By applying Tableau to this scenario, you enable the retail company to make data-driven
decisions about inventory management, ensuring that products are stocked efficiently,
reducing carrying costs, and improving overall profitability. This application showcases how
Tableau can be a powerful tool for discovering actionable business insights and driving
informed decision-making.
https://www.linkedin.com/learning/using-tableau-to-discover-powerful-business-insights-15616295/using-tableau-to-create-business-insights?contextUrn=urn%3Ali%3AlyndaLearningPath%3A5ea9f8c8498e31684804bbca&u=92695330
1 Hour 58 minutes
"Python Data Analysis" refers to the process of using the Python programming language and
its associated libraries and tools to analyze and manipulate data. Python has become one of
the most popular programming languages for data analysis due to its simplicity, versatility,
and a wide range of libraries designed for data manipulation and visualization. Here are the
key aspects and details of Python data analysis:
Data Acquisition:
Reading data from various sources, such as CSV files, Excel spreadsheets, databases, web
APIs, and more.
Python libraries commonly used for data acquisition include pandas, NumPy, and
SQLAlchemy for database interaction.
Data Cleaning and Preprocessing:
Visualizing data using libraries like matplotlib, seaborn, and Plotly to understand data
distributions, relationships, and patterns.
Descriptive statistics to summarize data.
Data Manipulation:
Using pandas for data manipulation tasks such as filtering, grouping, merging, and reshaping
data.
Applying custom functions to data.
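A short pandas sketch of these operations (the tables and column names are invented for illustration):

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ana", "ben", "ana", "cy"],
    "amount": [120.0, 80.0, 200.0, 50.0],
})
customers = pd.DataFrame({
    "customer": ["ana", "ben", "cy"],
    "region": ["east", "west", "east"],
})

# Filtering: keep orders above a threshold
big = orders[orders["amount"] > 100]

# Merging and grouping: attach region info, then aggregate by region
merged = orders.merge(customers, on="customer")
by_region = merged.groupby("region")["amount"].sum()

# Applying a custom function to derive a new column
orders["amount_band"] = orders["amount"].apply(
    lambda a: "high" if a > 100 else "low")
```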
Statistical Analysis:
Applying machine learning algorithms for predictive modeling, classification, clustering, and
regression.
Libraries such as scikit-learn for machine learning tasks.
Time Series Analysis:
Creating informative and aesthetically pleasing data visualizations using libraries like
matplotlib, seaborn, and interactive visualizations with Plotly.
Building dashboards with tools like Dash or integrating with Tableau for advanced
visualization.
Big Data and Distributed Computing:
Analyzing large datasets using distributed computing frameworks like Dask or integrating
with big data platforms like Hadoop and Spark.
Geospatial Data Analysis:
Analyzing and visualizing geospatial data using libraries like geopandas, folium, and
Cartopy.
Text Data Analysis:
Analyzing and processing text data, including text mining, sentiment analysis, and natural
language processing (NLP) using libraries like nltk and spaCy.
Data Reporting and Communication:
Creating reports and data presentations using Jupyter notebooks, matplotlib, and pandas for
rendering tables and charts.
Sharing insights and results with stakeholders.
Version Control:
Using version control systems like Git to manage code and collaborate with other data
analysts and scientists.
Integration with Data Science Ecosystem:
Integrating Python data analysis with other data science tools such as R, SQL, and cloud
platforms like AWS, Google Cloud, and Azure.
Automation and Reproducibility:
Data Collection:
Gather customer data, including purchase history, demographic information, and website
usage data from various sources such as databases and web APIs.
Data Cleaning and Preprocessing:
Visualize customer demographics using matplotlib and seaborn to understand the customer
base.
Analyze purchase patterns, such as order frequency and order size, to identify trends.
Customer Segmentation:
Apply clustering algorithms from scikit-learn to segment customers into groups based on
their behavior, preferences, or purchase history.
Create profiles for each segment to understand their characteristics.
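A minimal segmentation sketch with scikit-learn (assumed available; the two customer groups below are simulated for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
# Hypothetical features per customer: [orders per year, avg order value]
frequent = rng.normal([50.0, 20.0], [5.0, 4.0], size=(100, 2))
occasional = rng.normal([5.0, 90.0], [2.0, 10.0], size=(100, 2))
X = np.vstack([frequent, occasional])

# Scale the features, then cluster customers into two behavioural segments
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```

Each segment's profile (average order frequency and order value) then comes from grouping the original features by cluster label.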
Churn Prediction:
Use machine learning techniques to build a churn prediction model using historical data.
Evaluate the model's performance and identify factors contributing to customer churn.
A/B Testing:
Design and conduct A/B tests to assess the impact of different marketing strategies or website
changes on customer behavior.
Analyze the test results to determine which strategies are more effective.
Time Series Analysis:
Analyze time series data to identify seasonal patterns in customer behavior, such as holiday
shopping trends.
Use libraries like statsmodels for time series analysis.
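Even before reaching for statsmodels' seasonal_decompose, a pandas group-by on calendar month can surface such a holiday pattern (the data below is simulated, with an artificial December bump):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
# Two years of hypothetical daily sales with a December holiday bump
idx = pd.date_range("2021-01-01", "2022-12-31", freq="D")
base = rng.normal(loc=100, scale=10, size=len(idx))
bump = np.where(idx.month == 12, 60.0, 0.0)
sales = pd.Series(base + bump, index=idx)

# Average sales per calendar month exposes the seasonal profile
monthly_profile = sales.groupby(sales.index.month).mean()
peak_month = monthly_profile.idxmax()   # expected to be December (12)
```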
Customer Retention Strategies:
Develop data-driven customer retention strategies based on insights gained from the analysis.
Create targeted marketing campaigns for specific customer segments.
Data Visualization and Reporting:
Create interactive dashboards and reports using tools like Jupyter notebooks and Plotly to
communicate insights to the marketing team and stakeholders.
Automation and Deployment:
2 Hours 30 minutes