Data Science Using Python - Syllabus V 11



(Data Analytics with Python)

Introduction to Data Science


Data science is an interdisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and insights from
structured and unstructured data.

Data science combines the scientific method, math and statistics, specialized
programming, advanced analytics, AI, and even storytelling to uncover and
explain the business insights buried in data.

Data science is the field of study that combines domain expertise,
programming skills, and knowledge of mathematics and statistics to extract
meaningful insights from data.

https://thedatalytics.institute info@thedatalytics.institute +91 866 96 195 46


PostgreSQL
Data Science Components
Languages
Python & R



CAREER GROWTH WITH DATA SCIENCE



DATA SCIENCE
Data Science Jargons
Data Science Components
Artificial intelligence (AI) is intelligence demonstrated by machines, unlike
the natural intelligence displayed by humans and animals, which involves
consciousness and emotionality.

Machine learning (ML) is the study of computer algorithms that improve
automatically through experience.

Deep learning (DL) is part of a broader family of machine learning methods
based on artificial neural networks with representation learning. Learning can
be supervised, semi-supervised or unsupervised.

Natural Language Processing (NLP) is a subfield of linguistics, computer
science, and artificial intelligence concerned with the interactions between
computers and human language.



Data Science Components
Text mining, also referred to as text data mining, similar to text analytics, is
the process of deriving high-quality information from text. It involves "the
discovery by computer of new, previously unknown information, by
automatically extracting information from different written resources." Written
resources may include websites, books, emails, reviews, and articles. High-
quality information is typically obtained by devising patterns and trends by
means such as statistical pattern learning.
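As a rough illustration of deriving a simple pattern from text, the sketch below counts term frequencies across a few short reviews. The sample reviews, the stop-word set and the helper name term_frequencies are assumptions made up for this example, and real text-mining work would normally use a library such as NLTK or TextBlob rather than this hand-rolled version.

```python
import re
from collections import Counter

# Hypothetical sample reviews standing in for "written resources".
reviews = [
    "Great phone, great battery life.",
    "Battery drains fast but the screen is great.",
    "Screen cracked after a week; poor build quality.",
]

# A tiny, hand-picked stop-word set (an assumption for this sketch).
STOP_WORDS = {"a", "after", "but", "is", "the"}


def term_frequencies(texts):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep only alphabetic tokens.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


freq = term_frequencies(reviews)
top_terms = freq.most_common(3)
print(top_terms)
```

The most frequent terms hint at what the reviewers talk about most, which is the simplest kind of trend a text-mining pipeline can surface.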

A Neural Network is a network or circuit of neurons, or in a modern sense, an
artificial neural network, composed of artificial neurons or nodes. Thus a
neural network is either a biological neural network, made up of real
biological neurons, or an artificial neural network, for solving artificial
intelligence (AI) problems.
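To make the idea of an artificial neuron concrete, here is a minimal sketch of a single node: it computes a weighted sum of its inputs plus a bias and passes the result through a sigmoid activation. The weights, bias and input values are arbitrary assumptions chosen for illustration. A real network stacks many such nodes and learns the weights from data, typically with a library such as Keras or TensorFlow.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


# Hypothetical inputs and weights, chosen only for illustration.
output = neuron([1.0, 0.0], [0.5, -0.5], 0.0)
print(output)
```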



Data Science Components



Python



Python
Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built-in data structures, combined with
dynamic typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or glue language to
connect existing components together. Python's simple, easy-to-learn syntax
emphasizes readability and therefore reduces the cost of program maintenance.

Python supports modules and packages, which encourages program
modularity and code reuse. The Python interpreter and the extensive standard
library are available in source or binary form without charge for all major
platforms, and can be freely distributed.



Topmost Features of Python
1. Platform Independent: It runs on Windows, Linux, macOS and many other
operating systems.

2. Open Source: Python is developed under an OSI-approved open source
license, making it freely usable and distributable, even for commercial use.

3. Open Source Libraries: Most of Python's popular libraries are also released
under open source licenses, making them freely usable and distributable, even
for commercial use.

4. Dynamically Typed Language: Variables are not declared with a fixed type.
A name can be bound to a value of any type, and types are checked at run time,
unlike in statically typed languages (C, C++, Java).
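The short sketch below illustrates what dynamic typing means in practice. The variable and function names are made up for this example. The same name is rebound to values of different types, and a type mismatch surfaces only when the offending line actually runs.

```python
# The same name can refer to values of different types over time.
value = 42
first_type = type(value).__name__   # "int"
value = "forty-two"
second_type = type(value).__name__  # "str"


def add(a, b):
    """No parameter types are declared; any pair supporting + works."""
    return a + b


# Type errors are detected at run time, when the operation is attempted.
try:
    add(1, "2")
except TypeError as exc:
    error_message = str(exc)

print(first_type, second_type)
print(error_message)
```

In a statically typed language such as Java, the equivalent of add(1, "2") with int parameters would be rejected by the compiler before the program runs.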

https://colab.research.google.com/

Web I.D.E. Google Colab Login Google



Nitty-gritty of Web I.D.E. Google Colab
Google Colab is a web IDE for Python that supports machine learning work with
storage in the cloud. It is widely used for machine learning, artificial
intelligence and data science work.

Google says: “It’s a Jupyter notebook environment that requires no setup to
use and runs entirely in the cloud.”

Features
1. Free virtual machines
2. Free GPU access
3. Supports Python 3 (Python 2 runtimes are no longer offered)
4. Integration with Google Drive
5. Can import existing Jupyter/IPython notebooks.



Power of Python Libraries
Python Libraries
1. Pandas – Data analysis and manipulation tool.
2. NumPy – Large, multi-dimensional arrays and matrices.
3. Matplotlib – Creating static, animated, and interactive visualizations.
4. Seaborn – Python data visualization library based on matplotlib.
5. Scikit-Learn – Simple and efficient tools for predictive data analysis.
6. NLTK – Work with human language data.
7. TextBlob – Python library for processing textual data.
8. TensorFlow – Symbolic math library based on dataflow and differentiable
programming.
9. Keras – Interface for artificial neural networks.
10. Requests, Beautiful Soup 4 (bs4), lxml, Selenium, Scrapy – For Web
Scraping

R – Programming Language
R is a programming language and free software environment for statistical
computing and graphics supported by the R Foundation for Statistical
Computing. The R language is widely used among statisticians and data
miners for developing statistical software and data analysis. Polls, data mining
surveys, and studies of scholarly literature databases show substantial
increases in popularity; as of January 2021, R ranks 9th in the TIOBE index, a
measure of popularity of programming languages.

R-Studio Features
RStudio is an integrated development environment (IDE) for R. It includes a
console, a syntax-highlighting editor that supports direct code execution, and
tools for plotting, history, debugging and workspace management.

RStudio is available in open source and commercial editions and runs on the
desktop (Windows, Mac, and Linux) or in a browser connected to RStudio
Server or RStudio Server Pro (Debian/Ubuntu, Red Hat/CentOS, and SUSE
Linux).

1. Access RStudio locally.
2. Syntax highlighting, code completion, and smart indentation.
3. Execute R code directly from the source editor.
4. Quickly jump to function definitions.



R-Studio Features
5. View content changes in real-time with the Visual Markdown Editor.
6. Easily manage multiple working directories using projects.
7. Integrated R help and documentation.
8. Interactive debugger to diagnose and fix errors.
9. Extensive package development tools.

What is PostgreSQL?
PostgreSQL is a powerful, open source object-relational database system that
uses and extends the SQL language combined with many features that safely
store and scale the most complicated data workloads. The origins of
PostgreSQL date back to 1986 as part of the POSTGRES project at the
University of California at Berkeley, and the core platform has seen more than
30 years of active development.

PostgreSQL has earned a strong reputation for its proven architecture,
reliability, data integrity, robust feature set, extensibility, and the dedication of
the open source community behind the software to consistently deliver
performant and innovative solutions. PostgreSQL runs on all major operating
systems, has been ACID-compliant since 2001, and has powerful add-ons such
as the popular PostGIS geospatial database extender. It is no surprise that
PostgreSQL has become the open source relational database of choice for many
people and organisations.



Why use PostgreSQL?
PostgreSQL comes with many features that help developers build
applications, help administrators protect data integrity and build fault-tolerant
environments, and help you manage your data no matter how big or small the
dataset. In addition to being free and open source, PostgreSQL is highly
extensible. For example, you can define your own data types, build out custom
functions, and even write code in different programming languages without
recompiling your database!

PostgreSQL tries to conform with the SQL standard where such conformance
does not contradict traditional features or could lead to poor architectural
decisions. Many of the features required by the SQL standard are supported,
though sometimes with slightly differing syntax or function. Further moves
towards conformance can be expected over time. As of the version 13 release in
September 2020, PostgreSQL conforms to at least 170 of the 179 mandatory
features for SQL:2016 Core conformance. As of this writing, no relational
database meets full conformance with this standard.



Features of PostgreSQL
1. Data Types
2. Data Integrity
3. Concurrency, Performance
4. Reliability, Disaster Recovery
5. Security
6. Extensibility
7. Internationalisation, Text Search

https://www.postgresql.org



Big Data



Recognize Big Data & Hadoop



Web Scraping



Web Scraping Using Python
Web scraping, web harvesting, or web data extraction is data scraping used for
extracting data from websites. The web scraping software may directly access the
World Wide Web using the Hypertext Transfer Protocol or a web browser. While
web scraping can be done manually by a software user, the term typically refers to
automated processes implemented using a bot or web crawler. It is a form of
copying in which specific data is gathered and copied from the web, typically into a
central local database or spreadsheet, for later retrieval or analysis.

Web scraping a web page involves fetching it and extracting from it. Fetching is the
downloading of a page (which a browser does when a user views a page).
Therefore, web crawling is a main component of web scraping, to fetch pages for
later processing. Once fetched, then extraction can take place. The content of a page
may be parsed, searched, reformatted, its data copied into a spreadsheet, and so
on. Web scrapers typically take something out of a page, to make use of it for
another purpose somewhere else. An example would be to find and copy names
and telephone numbers, or companies and their URLs, to a list (contact scraping).
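The extraction step described above can be sketched with Python's standard-library html.parser module. This is only an illustration under stated assumptions: the HTML snippet, the class names (contact, name, phone) and the ContactParser class are invented for the example, and the fetching step is left out so the code runs without network access. In practice the page would first be downloaded (for example with Requests) and could be parsed with Beautiful Soup instead.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would fetch this over HTTP first.
PAGE = """
<ul>
  <li class="contact"><span class="name">Asha Rao</span> <span class="phone">555-0101</span></li>
  <li class="contact"><span class="name">Ben Lee</span> <span class="phone">555-0102</span></li>
</ul>
"""


class ContactParser(HTMLParser):
    """Collect name/phone pairs from <li class="contact"> items."""

    def __init__(self):
        super().__init__()
        self.contacts = []   # one dict per contact found
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "contact":
            self.contacts.append({})
        elif tag == "span" and cls in ("name", "phone") and self.contacts:
            self._field = cls

    def handle_data(self, data):
        # Store text only while inside a name or phone span.
        if self._field and self.contacts:
            self.contacts[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ContactParser()
parser.feed(PAGE)
contacts = parser.contacts
print(contacts)
```

The resulting list of dictionaries could then be written to a spreadsheet or database for later analysis. Before scraping a real site, check its terms of use and robots.txt.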



Web Scraping Using Python

Amazon e-business
Indeed Job Portal
Yahoo News
Salary Portal
Yahoo Stocks
Twitter



DATA VISUALIZATION



Top Features of Analytics and
Business Intelligence (BI) Platforms
Integrated support for enterprise reporting capabilities. Organizations are
interested in how these platforms, known for their agile data visualization
capabilities, can now help them modernize their enterprise reporting needs.

Augmented analytics: Machine learning (ML) and
artificial intelligence (AI)-assisted data preparation, insight generation and
insight explanation.

Security: Capabilities that enable platform security, administering of users,
auditing of platform access and authentication.

Manageability: Capabilities to track usage, manage how information is
shared and by whom, perform impact analysis and work with third-party
applications.



Top Features of Analytics and
Business Intelligence (BI) Platforms
Cloud: The ability to support building, deploying and managing analytics and
analytic applications in the cloud, based on data both in the cloud and
on-premises, and across multicloud deployments.

Data source connectivity: Capabilities that enable users to connect to,
and ingest, structured and unstructured data contained in various types of
storage platforms, both on-premises and in the cloud.

Data preparation: Support for drag-and-drop, user-driven combination of
data from different sources, and the creation of analytic models (such as
user-defined measures, sets, groups and hierarchies).

Model complexity: Support for complex data models, including the ability
to handle multiple fact tables, interoperate with other analytic platforms and
support knowledge graph deployments.



Top Features of Analytics and
Business Intelligence (BI) Platforms
Catalog: The ability to automatically generate and curate a searchable catalog
of the artifacts created and used by the platform and their dependencies.

Automated insights: A core attribute of augmented analytics, this is the
ability to apply ML techniques to automatically generate insights for end
users (for example, by identifying the most important attributes in a dataset).

Advanced analytics: Advanced analytical capabilities that are easily
accessed by users, being either contained within the ABI platform itself or
usable through the import and integration of externally developed models.

Data visualization: Support for highly interactive dashboards and the
exploration of data through the manipulation of chart images. Included are
an array of visualization options that go beyond those of pie, bar and line
charts, such as heat and tree maps, geographic maps, scatter plots and
other special-purpose visuals.



Top Features of Analytics and
Business Intelligence (BI) Platforms
Natural language query: This enables users to query data using
business terms that are either typed into a search box or spoken.

Data storytelling: The ability to combine interactive data visualization
with narrative techniques in order to package and deliver insights
in a compelling, easily understood form for presentation to decision makers.

Embedded analytics: Capabilities include an SDK with APIs and
support for open standards in order to embed analytic content
into a business process, an application or a portal.

Natural language generation (NLG): The automatic creation of
linguistically rich descriptions of insights found in data.

Reporting: The ability to create and distribute (or “burst”) to consumers
grid-layout, multipage, pixel-perfect reports on a scheduled basis.



Data Visualization
Salesforce Tableau: Tableau Software is an interactive data visualization
software company focused on business intelligence.

Features of Tableau
1. Tableau Dashboard
2. Collaboration & Sharing
3. Live & In Memory Data
4. Different Data Sources
5. Robust Security
6. Mobile View
7. Trend Lines and Predictive Analysis



Data Visualization
Microsoft Power BI: Power BI is a business analytics service by
Microsoft. It aims to provide interactive visualizations and business
intelligence capabilities with an interface simple enough for end users to create
their own reports and dashboards.

Features of Power BI
1. Range of Attractive Visualizations
2. Get Data (Infinite Data Source)
3. Datasets Filtration
4. Customizable Dashboards
5. Natural Language Q & A Question Box



Sky is the Limit
Years           Other     Data Science
Fresher         180 K     650 K
1 – 5 Years     240 K     1200 K
6 – 10 Years    600 K     2400 K
11+ Years       1500 K    4800 K



About The DataLytics



The DataLytics
The Datalytics is an institution dedicated to education in Data Science with a
vision to groom the Data Leaders of tomorrow. It is promoted by veteran
corporate minds from the industry with over 20 years of experience and
expertise in Big Data, Advanced Analytics and Data Science.

The Datalytics aims to deliver world-class education that turns young minds
into the data leaders of tomorrow. The programs at The Datalytics are designed
for students like you who aspire to make careers in Data Science. They will
equip you with the tools, the training and, most important, the mindset
required to land jobs and become a data leader of tomorrow.

https://thedatalytics.institute/about-us/



Our Team



About Trainer
Total IT experience of 15 years as an Oracle E-Business Suite (OeBS)
consultant in various positions at companies such as Zensar, Pune; CapGemini,
Mumbai; Blue Star Infotech, Mumbai; and GTL Limited, Navi Mumbai.

More than 5,000 students, apprentices, undergraduates, professors,
professionals, instructors, and research scholars have benefited from over
3,500 hours of training, seminars, workshops, presentations, conferences, and
techno meet-ups.

Key Skills:
1. Data Science Using Python
2. Data Visualization Using Power BI and Tableau
3. Big Data Hadoop
4. Oracle EBS



Key Trainings
1. Data Science
   1. Python & Adv. Python
   2. Concepts of Data Science
   3. Concepts of Big Data Hadoop & its Ecosystem
   4. Machine Learning using Python
   5. Data Visualization
      1. Salesforce - Tableau
      2. Microsoft - Power BI

2. Salesforce: SFDC

3. Oracle ERP: Technical Training

4. Oracle APEX: Oracle Application Express



Contact Us
Address:
Plot No. 158, Vivekanand Nagar, Wardha Road, Nagpur 440015

Mobile/WhatsApp Number:
+91 866 96 195 46
+91 952 77 991 52

Email Us:
anurag.e@thedatalytics.institute

