Data Analysis
-Is the process of inspecting, cleansing, transforming and modelling data with the
goal of discovering useful information.
METHODS
1. Data Mining - Discovers patterns in large data sets using methods from statistics,
AI (Artificial Intelligence), machine learning, and databases.
2. Text Analytics - The process of deriving useful information from text.
3. Business Intelligence (BI) - Transforms data into actionable intelligence for business
purposes.
4. Data Visualization - The graphical representation of data using charts, graphs, maps,
etc.
AI - An "intelligent" computer uses AI to think like a human and perform tasks on its own.
Machine Learning - How a computer system develops its intelligence: it learns patterns from data
instead of being explicitly programmed for every task.
Data Science
-is the study and extraction of useful information from raw data.
Data science uses:
*Scientific algorithms
*Processes
*Systems
*Modern tools
*Techniques
Data Scientist
-They collect data, analyze it, and share their insights with technology leaders and
businesses to help organizations solve issues.
ADT (Abstract Data Types)
NOTABLE ADTs:
-List
-Stack
-Queue
ADT features
-Abstraction
-Better conceptualization
-Robust
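To make the abstraction point concrete, here is a minimal sketch of two of the ADTs listed above. The class names and the method names push/pop/enqueue/dequeue are conventional choices for illustration, not taken from the notes; callers use only these operations and never touch the underlying storage.

```python
from collections import deque


class Stack:
    """LIFO stack: the last element pushed is the first one popped."""

    def __init__(self):
        self._items = []  # a Python list; the end of the list is the top of the stack

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


class Queue:
    """FIFO queue: the first element enqueued is the first one dequeued."""

    def __init__(self):
        self._items = deque()  # deque allows fast appends and pops at both ends

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()

    def is_empty(self):
        return not self._items


s = Stack()
for x in (1, 2, 3):
    s.push(x)
print(s.pop())  # 3 (last in, first out)

q = Queue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue())  # 1 (first in, first out)
```

Swapping the list inside Stack for another container would not change any caller, which is the "abstraction" feature in practice.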
MIDTERM
Sets
- German mathematician Georg Cantor introduced the concept of sets.
- A set is an unordered collection of distinct elements.
Cardinality
-the number of elements in a set
Example:
|{1,4,3,5}|=4
The cardinality is “4”
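In Python, the built-in len() gives the cardinality of a set. The sketch below reuses the example set; the second set is an added illustration showing that a repeated value is stored, and counted, only once.

```python
A = {1, 4, 3, 5}
print(len(A))  # 4

# A set stores each element once, so the repeated 4 does not add to the count
B = {1, 4, 4, 3, 5}
print(len(B))  # 4
```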
Types of Sets
1. A set which contains a definite number of elements is called a finite set.
2. A set which contains an infinite number of elements is called an infinite set.
3. Subset - A is a subset of B if every element of A is also an element of B (B may be another set
or A itself). A = {1, 2, 3} is a subset of B = {1, 2, 3, 4, 10}.
4. Proper Subset - is any subset of the set except itself. For example, if A = {1, 2, 3}, then its
proper subsets are {}, {1}, {2}, {3}, {1, 2}, {2, 3}, and {3, 1}, but the set itself {1, 2, 3} is NOT a
proper subset of A.
5. Universal Set - The set of all elements under consideration. For two sets A = {x, y, z} and
B = {1, 2, 3, x, y}, the universal set associated with these two sets is U = {1, 2, 3, x, y, z}.
6. If two sets contain the same elements they are said to be equal.
7. If two sets have the same cardinality, they are called equivalent sets.
8. Two sets that have at least one common element are called overlapping sets.
9. Two sets A and B are called disjoint sets if they do not have even one element in common.
10. A Venn diagram, invented in 1880 by John Venn, is a schematic diagram that shows all possible
logical relations between different mathematical sets.
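Python's built-in set type can check most of these definitions directly. A minimal sketch using the example sets above; treating the universal set as the union of the two example sets follows the notes' example, and the extra sets used for equal, equivalent, overlapping, and disjoint are illustrative.

```python
from itertools import combinations

A = {1, 2, 3}
B = {1, 2, 3, 4, 10}

print(A <= B)  # True: A is a subset of B
print(A < B)   # True: A is a proper subset of B
print(A < A)   # False: a set is never a proper subset of itself

# All proper subsets of A: every combination with fewer than len(A) elements
proper = [set(c) for r in range(len(A)) for c in combinations(A, r)]
print(len(proper))  # 7, matching the list {}, {1}, {2}, {3}, {1, 2}, {2, 3}, {3, 1}

# Universal set for the two example sets, taken as their union
X = {"x", "y", "z"}
Y = {1, 2, 3, "x", "y"}
U = X | Y
print(U == {1, 2, 3, "x", "y", "z"})  # True

print({1, 2, 3} == {3, 1, 2})                   # equal: same elements, order ignored
print(len({1, 2, 3}) == len({"a", "b", "c"}))   # equivalent: same cardinality
print(bool({1, 2} & {2, 3}))                    # overlapping: a common element exists
print({1, 2}.isdisjoint({3, 4}))                # disjoint: no common element
```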
Set Operations
1. Set Union - If A = {10, 11, 12, 13} and B = {13, 14, 15}, then A ∪ B = {10, 11, 12, 13, 14, 15}.
4. Complement of a Set - If the universal set U is all prime numbers up to 25 and A = {2, 3, 5}, then
the complement of A, written A', contains every element of U that is not in A.
Step 1: Identify the universal set and the set whose complement you need. U = {2, 3, 5, 7, 11, 13,
17, 19, 23}, A = {2, 3, 5}.
Step 2: Remove the elements of A from U: A' = U - A = {7, 11, 13, 17, 19, 23}.
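5. Cartesian Product/Cross Product - For two non-empty sets C = {x, y, z} and D = {1, 2, 3}, C × D is
the set of all ordered pairs (c, d) with c in C and d in D: C × D = {(x, 1), (x, 2), (x, 3), (y, 1),
(y, 2), (y, 3), (z, 1), (z, 2), (z, 3)}.
7. Partitioning of a Set - A partition splits a set into non-empty, non-overlapping subsets whose union
is the whole set. One possible partition of {1, 2, 3, 4, 5, 6} is {1, 3}, {2}, {4, 5, 6}.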
8. Relations - Suppose there are two sets, X = {4, 36, 49, 50} and Y = {1, -2, -6, -7, 7, 6, 2}.
The relation "(x, y) is in R if x is the square of y" can be represented using ordered pairs as
R = {(4, -2), (4, 2), (36, -6), (36, 6), (49, -7), (49, 7)}.
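The operations above map onto Python set operators and itertools.product. A minimal sketch reproducing the examples; the helper is_partition is a hypothetical name written for illustration and simply checks the three partition conditions (non-empty blocks, no overlap, union equals the whole set).

```python
from itertools import product

# Union
A = {10, 11, 12, 13}
B = {13, 14, 15}
print(sorted(A | B))  # [10, 11, 12, 13, 14, 15]

# Complement relative to the universal set: U - A
U = {2, 3, 5, 7, 11, 13, 17, 19, 23}
A = {2, 3, 5}
A_complement = U - A
print(sorted(A_complement))  # [7, 11, 13, 17, 19, 23]

# Cartesian product: every ordered pair (c, d)
C = ["x", "y", "z"]
D = [1, 2, 3]
CxD = list(product(C, D))
print(len(CxD))  # 9, that is |C| * |D|


def is_partition(whole, blocks):
    """Hypothetical helper: True if blocks are non-empty, pairwise disjoint, and cover whole."""
    seen = set()
    for block in blocks:
        if not block or seen & block:  # empty block, or overlap with an earlier block
            return False
        seen |= block
    return seen == whole


print(is_partition({1, 2, 3, 4, 5, 6}, [{1, 3}, {2}, {4, 5, 6}]))  # True

# Relation "x is the square of y" built from X and Y
X = {4, 36, 49, 50}
Y = {1, -2, -6, -7, 7, 6, 2}
R = {(x, y) for x in X for y in Y if x == y * y}
print(sorted(R))  # [(4, -2), (4, 2), (36, -6), (36, 6), (49, -7), (49, 7)]
```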
Types of Relations
-Empty
-Universal
-Transitive
-Equivalence
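The notes only name these relation types. The sketch below uses the standard definitions: the empty relation has no pairs, the universal relation on S is all of S × S, a relation is transitive if (a, b) and (b, c) in R imply (a, c) in R, and an equivalence relation is reflexive, symmetric, and transitive. The function names and the sample set S are illustrative assumptions.

```python
from itertools import product


def is_reflexive(R, S):
    return all((a, a) in R for a in S)


def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)


def is_transitive(R):
    # whenever (a, b) and (b, d) are both in R, (a, d) must be in R too
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


def is_equivalence(R, S):
    return is_reflexive(R, S) and is_symmetric(R) and is_transitive(R)


S = {1, 2, 3}
empty = set()                   # empty relation: no pairs at all
universal = set(product(S, S))  # universal relation: every pair in S x S
less_than = {(a, b) for a in S for b in S if a < b}

print(is_transitive(empty))          # True (nothing can violate the rule)
print(is_equivalence(universal, S))  # True
print(is_transitive(less_than))      # True
print(is_equivalence(less_than, S))  # False: not reflexive, not symmetric
```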
PRE FINALS
Algorithm
-Derived from the name of the Persian mathematician Muhammad ibn Mūsā al-Khwārizmī.
-We first demonstrate the algorithm using pseudocode, which explains the algorithm in an English-like
syntax.
-The same algorithm is shown in a programming language.
- Ada Lovelace is credited as being the first computer programmer and the first person to develop an
algorithm for a machine (Charles Babbage's Analytical Engine).
TYPES OF ANALYSIS
Best case: The input for which the algorithm takes the least (minimum) time.
Worst case: The input for which the algorithm takes the most (maximum) time.
Average case: Take all possible (random) inputs, calculate the computation time for each input, and
average the results.
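Linear search is a common illustration of the three cases (it is an example chosen here, not one from the notes). The sketch counts comparisons rather than measuring clock time, and the average case assumes the target is equally likely to be at any position.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


data = [7, 3, 9, 1, 5]

best = linear_search(data, 7)[1]    # target is first: 1 comparison
worst = linear_search(data, 42)[1]  # target absent: checks all n = 5 items
# Average over every successful search, each position equally likely
average = sum(linear_search(data, x)[1] for x in data) / len(data)

print(best, worst, average)  # 1 5 3.0
```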
Cost Models
1. Uniform cost model - Assigns a constant cost to every machine operation, regardless of the size of
the numbers involved.
2. Logarithmic cost model - Assigns a cost to every machine operation proportional to the number of
bits involved.
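A minimal sketch contrasting the two models on additions. The function names are hypothetical, and charging each operation the total bit length of its operands is one simple reading of "proportional to the number of bits", assumed here for illustration.

```python
def uniform_cost(pairs):
    # Uniform model: every addition costs 1, however large the numbers are
    return len(pairs)


def logarithmic_cost(pairs):
    # Logarithmic model (assumed form): each addition costs the bit length of its
    # two operands; a zero operand is still charged 1 bit
    return sum(max(a.bit_length(), 1) + max(b.bit_length(), 1) for a, b in pairs)


small = [(3, 5), (2, 7)]
large = [(2**40, 2**40 + 1), (10**12, 3)]

print(uniform_cost(small), uniform_cost(large))          # 2 2
print(logarithmic_cost(small), logarithmic_cost(large))  # 10 124
```

Both lists contain two additions, so the uniform model cannot tell them apart, while the logarithmic model charges far more for the large operands.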
Run-Time Analysis
-is a theoretical classification that estimates and anticipates the increase in running time (or run-time or
execution time) of an algorithm as its input size increases.
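Run-time analysis asks how the amount of work grows with input size. A minimal sketch counting basic steps for a single pass (linear) and for comparing every pair (quadratic); both functions are illustrative examples, not algorithms from the notes.

```python
def count_linear(n):
    # one pass over the input: n steps
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def count_quadratic(n):
    # every pair (i, j) with i < j: n * (n - 1) / 2 steps
    steps = 0
    for i in range(n):
        for j in range(i + 1, n):
            steps += 1
    return steps


for n in (10, 100, 1000):
    print(n, count_linear(n), count_quadratic(n))
# 10 10 45
# 100 100 4950
# 1000 1000 499500
```

Multiplying n by 10 multiplies the linear count by 10 but the quadratic count by roughly 100.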
Data Science
- is the study of data to extract meaningful insights for business.
DS Process
OSEMN
-OBTAIN DATA
-SCRUB DATA
-EXPLORE DATA
-MODEL DATA
-INTERPRET DATA
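A toy walk through the five OSEMN steps. The raw readings are hypothetical and stand in for a real data source, and the "model" is only a naive baseline (predict the mean); the point is the order of the steps, not the model.

```python
from statistics import mean

# Obtain: hypothetical raw readings, standing in for a file, API, or database
raw = ["12", "15", "", "n/a", "14", "13"]


# Scrub: keep only entries that parse as numbers
def scrub(values):
    clean = []
    for v in values:
        try:
            clean.append(float(v))
        except ValueError:
            continue  # skip blanks and non-numeric entries
    return clean


data = scrub(raw)

# Explore: basic summaries
print(len(data), min(data), max(data))  # 4 12.0 15.0

# Model: naive baseline that predicts the next reading as the mean
prediction = mean(data)

# Interpret: report the result in plain terms
print(f"Expected next reading: about {prediction}")  # about 13.5
```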
DS Techniques
1. Classification is the sorting of data into specific groups or categories.
2. Regression is the method of finding a relationship between a dependent variable and one or more
independent variables, so that one can be predicted from the others.
3. Clustering is the method of grouping closely related data together to look for patterns and
anomalies.
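One small instance of each technique, with plainly named stand-ins: a threshold rule for classification (the 30-degree cutoff is hypothetical), least-squares linear regression, and one-dimensional k-means with k = 2 for clustering. Real projects would use a library; this sketch only shows the idea behind each.

```python
from statistics import mean


# Classification: assign a category with a simple rule (hypothetical threshold)
def classify(temp_c):
    return "hot" if temp_c >= 30 else "not hot"


# Regression: least-squares line y = a + b * x
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


# Clustering: 1-D k-means; assign points to the nearest center, then move centers
def kmeans_1d(points, centers, iterations=10):
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # keep the old center if a group ends up empty
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


print([classify(t) for t in (25, 31)])      # ['not hot', 'hot']
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0), i.e. y = 1 + 2x
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5])
print(centers, groups)                       # [2, 11] [[1, 2, 3], [10, 11, 12]]
```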
DS Technologies
Artificial intelligence: Machine learning models and related software are used for
predictive and prescriptive analysis.
Cloud computing: Cloud technologies have given data scientists the flexibility and
processing power required for advanced data analytics.
IoT (Internet of Things): refers to various devices that can automatically connect to
the internet. These devices collect data for data science initiatives. They generate
massive data which can be used for data mining and data extraction.
Quantum computing: Quantum computers can perform complex calculations at high
speed. Skilled data scientists use them for building complex quantitative algorithms.
Statistical Methods
- mainly useful to ensure that your data are interpreted correctly.
Steps in the DA Process
1. Pose a Question
2. What to Measure and How to Measure
3. Data Collection
4. Data Cleaning
5. Summarizing and Visualizing Data
6. Data Modeling
7. Optimize and Repeat
2. Median - 5, 5, 5, 5, 6, 7, 8, 9
The median is 5.5.
We order the dataset first.
Then we take the middle value in the order.
But since this dataset has an even number of values (8), there are two numbers in the middle.
We add them and divide by 2: (5 + 6) / 2 = 5.5.
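The same steps written out in Python: sort, then take the middle value, or average the two middle values when the count is even. The hand-written median function is illustrative; the standard library's statistics.median gives the same answer.

```python
import statistics


def median(values):
    ordered = sorted(values)  # step 1: order the dataset
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [5, 5, 5, 5, 6, 7, 8, 9]
print(median(data))             # 5.5
print(statistics.median(data))  # 5.5
```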