Industrialreport

LOVELY PROFESSIONAL UNIVERSITY

SIX WEEKS SUMMER TRAINING REPORT
DATA SCIENCE WITH PYTHON

Submitted by: ASTUTE
Registration No.: 11716658
Programme Name: Data Science
Under the Guidance of: Internshala

School of Computer Science & Engineering
Lovely Professional University, Phagwara
(June-July, 2019)

DECLARATION

I hereby declare that I have completed my six weeks summer training at Internshala from 1 June 2019 to 13 July 2019 under the guidance of Kunal Jain. I declare that I have worked with full dedication during these six weeks of training and that my learning outcomes fulfil the requirements of training for the award of degree of Data Science with Python, Lovely Professional University, Phagwara.

Name of Student:
Registration No.:
Date:

ACKNOWLEDGEMENT

It is with a sense of gratitude that I acknowledge the efforts of the entire host of well-wishers who have, in some way or other, contributed in their own special ways to the success and completion of the summer training. Successful completion of any type of technology requires help from a number of people. I have also taken help from different people for the preparation of this report. Here is a small effort to show my deep gratitude to those helpful people.

First, I express my sense of gratitude and indebtedness to our training mentor, Kunal Jain, from the bottom of my heart, for his immense support and guidance throughout the training. Without his kind direction and proper guidance this study would have been a little success. In every phase of the project his supervision and guidance shaped this training to be completed perfectly.

[Training certificate: Internshala Trainings]
TABLE OF CONTENTS

Introduction
Applications of Data Science
Python Introduction
Statistics
Predictive Modelling
Stages of Predictive Modelling
Model Building
Reason for choosing data science
Gantt Chart
Bibliography

Introduction to Data Science

Data Science
The field of bringing insights from data using scientific techniques is called data science.

Applications
• Amazon Go: no checkout lines.
• Computer Vision: the advancement in recognising an image by a computer involves processing large chunks of image data from multiple objects of the same category. For example, face recognition.

Spectrum of Business Analysis
• What happened? Reporting
• Why did it happen? Detective Analysis
• What is happening? Dashboards
• What is likely to happen? Predictive Analysis
• What can happen, given the data collected and used? Big Data

Reporting / Management Information System
To track what is happening in the organization.

Detective Analysis
Asking questions based on the data we are seeing, like "Why did something happen?"

Dashboard / Business Intelligence
Utopia of reporting. Every action about the business is reflected in front of the screen.

Predictive Modelling
Using past data to predict what is happening at a granular level.

Big Data
Stage where the complexity of handling data gets beyond the traditional system. Can be caused by the volume, variety or velocity of data. Specific tools are used to analyse data at such scale.

Application of Data Science
• Recommendations. Example: on Amazon, recommendations are different for different users according to their past searches.
• Social media
  1. Recommendation engine
  2. Ad placement
  3. Sentiment analysis
• Deciding the right credit limit for credit card customers
• Suggesting the right products from e-commerce companies
  1. Recommendation system
  2. Past data searched
  3. Discount price optimization
• How do Google and other search engines know which results are most relevant to our search query?
  1. Apply ML and data science
  2. Fraud detection
  3. Ad placement
  4. Personalized search results

Python Introduction

Python is an interpreted, high-level, general-purpose programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python's elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

Python for Data Science: Why Python?
1. Python is an open source language.
2. Syntax as simple as English.
3. Very large and collaborative developer community.
4. Extensive packages.

• UNDERSTANDING OPERATORS:
  Theory of operators: operators are symbolic representations of mathematical tasks.
• VARIABLES AND DATATYPES:
  Variables are names bound to objects. Data types in Python include int (integer), float, Boolean and strings.
• CONDITIONAL STATEMENTS:
  if-else statements (single condition)
  if-elif-else statements (multiple conditions)
• LOOPING CONSTRUCTS:
  for loop
• FUNCTIONS:
  Functions are re-usable pieces of code, created for solving a specific problem.
  Two types: built-in functions and user-defined functions.
  Once defined, a function can be called any number of times.
• DATA STRUCTURES:
  Two types of data structures:
  LISTS: A list is an ordered data structure with elements separated by commas and enclosed within square brackets [].
  DICTIONARY: A dictionary is a data structure whose elements are separated by commas and stored as key: value pairs, enclosed within curly braces {}. Elements are accessed by key rather than by position (since Python 3.7 a dictionary also preserves insertion order).

Statistics

Descriptive Statistics

Mode
It is the number which occurs most frequently in the data series. It is robust and is not generally affected much by the addition of a couple of new values.
Code:
    import pandas as pd
    data = pd.read_csv("Mode.csv")            # reads data from csv file
    print(data.head())                        # print first five rows
    mode_data = data["Subject"].mode()        # take the mode of the Subject column
    print(mode_data)

Mean
The arithmetic average of the values in the data series.
Code:
    import pandas as pd
    data = pd.read_csv("mean.csv")            # reads data from csv file
    print(data.head())                        # print first five rows
    mean_data = data["Overallmarks"].mean()   # take the mean of the Overallmarks column
    print(mean_data)

Median
The absolute central value of the data set.
Code:
    import pandas as pd
    data = pd.read_csv("data.csv")               # reads data from csv file
    print(data.head())                           # print first five rows
    median_data = data["Overallmarks"].median()  # take the median of the Overallmarks column
    print(median_data)

Types of variables
• Continuous: variables which take continuous numeric values. E.g. marks.
• Categorical: variables which have discrete values. E.g. gender.
• Ordinal: ordered categorical variables. E.g. teacher feedback.
• Nominal: unordered categorical variables. E.g. gender.

Outliers
Any value which falls outside the range of the data is termed an outlier. E.g. 9700 instead of 97.

Reasons for Outliers
• Typos: made during collection. E.g. adding an extra zero by mistake.
• Measurement error: outliers in the data due to the measurement operator being faulty.
• Intentional error: errors which are induced intentionally. E.g. claiming a smaller amount of alcohol consumed than actual.
• Legit outlier: values which are not actually errors but are in the data for legitimate reasons. E.g. a CEO's salary might actually be high compared to other employees.

Interquartile Range (IQR)
The difference between the third and the first quartile. It is robust to outliers.

Histograms
Histograms depict the underlying frequency of a set of discrete or continuous data that are measured on an interval scale.
Code:
    import pandas as pd
    import matplotlib.pyplot as plt
    # %matplotlib inline   (Jupyter notebooks only)
    histogram = pd.read_csv("histogram.csv")
    plt.hist(x="Overall Marks", data=histogram)
    plt.show()

Inferential Statistics
Inferential statistics allows us to make inferences about the population from the sample data.

Hypothesis Testing
Hypothesis testing is a kind of statistical inference that involves asking a question, collecting data, and then examining what the data tells us about how to proceed. The hypothesis to be tested is called the null hypothesis and is given the symbol H0. We test the null hypothesis against an alternative hypothesis, which is given the symbol Ha.

Decision Made                 | Null Hypothesis is True | Null Hypothesis is False
Reject Null Hypothesis        | Type I Error            | Correct Decision
Do not Reject Null Hypothesis | Correct Decision        | Type II Error

T-Test
Used when we have just a sample, not population statistics. The sample standard deviation is used to estimate the population standard deviation. The test is more prone to errors, because we just have samples.

Z-Score
The distance, in terms of number of standard deviations, of the observed value from the mean is the standard score: $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.

Identifying Outliers
A value is treated as an outlier if it is greater than $Q_3 + 1.5 \times IQR$ or less than $Q_1 - 1.5 \times IQR$, where
$IQR = Q_3 - Q_1$
$Q_3$ = value of the 3rd quartile
$Q_1$ = value of the 1st quartile

Treating Outliers
1. Deleting observations
2. Transforming and binning values
3. Imputing outliers like missing values
4. Treating them separately

Variable Transformation
Is the process by which:
1. We replace a variable with some function of that variable. E.g. replacing a variable x with its log.
2. We change the distribution or relationship of a variable with others.
Used to:
1. Change the scale of a variable
2. Transform non-linear relationships into linear relationships
3. Create a symmetric distribution from a skewed distribution
Common methods of variable transformation: logarithm, square root, cube root, binning, etc.

Model Building
It is a process to create a mathematical model for estimating / predicting the future based on past data.
E.g. A retailer wants to know the default behaviour of its credit card customers.
They want to predict the probability of default for each customer in the next three months.
• The probability of default would lie between 0 and 1.
• Assume every customer has a 10% default rate: probability of default for each customer in the next 3 months = 0.1.
The model moves the probability towards one of the extremes based on attributes of past information:
• A customer with volatile income is more likely to default (closer to 1).
• A customer with a healthy credit history for the last years has low chances of default (closer to 0).

Steps in Model Building
1. Algorithm Selection
2. Training Model
3. Prediction / Scoring

Algorithm Selection
Example:
Have dependent variable?
  Yes: Supervised Learning
    Is the dependent variable continuous?
      Yes: Regression
      No: Classification
  No: Unsupervised Learning
E.g. Predict whether the customer will buy the product or not.

Algorithms
• Logistic Regression
• Decision Tree
• Random Forest

Training Model
It is a process to learn the relationship / correlation between independent and dependent variables. We use the dependent variable of the train data set to predict / estimate.

Dataset
• Train: past data (known dependent variable). Used to train the model.
• Test: future data (unknown dependent variable). Used to score.

Prediction / Scoring
It is the process to estimate / predict the dependent variable of the test dataset by applying the model rules. We apply what was learned in training to the test dataset for prediction / estimation.

Algorithm of Machine Learning

Linear Regression
Linear regression is a statistical approach for modelling the relationship between a dependent variable and a given set of independent variables. It is assumed that the two variables are linearly related. Hence, we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x).

The equation of the regression line is:
$h(x_i) = \beta_0 + \beta_1 x_i$

The squared error or cost function is:
$J(\beta_0, \beta_1) = \frac{1}{2n} \sum_{i=1}^{n} \varepsilon_i^2$, where $\varepsilon_i = y_i - h(x_i)$ is the residual of the i-th observation.

Logistic Regression
Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable.
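As a rough illustration of the regression line, the squared error cost and the logistic function described above, the following plain-Python sketch may help. The function names, the coefficient values and the toy data are assumptions made for this example only; they are not taken from the training material, and no fitting procedure is shown.

```python
import math


def hypothesis(x, b0, b1):
    """Linear hypothesis h(x) = b0 + b1 * x."""
    return b0 + b1 * x


def squared_error_cost(xs, ys, b0, b1):
    """Cost J(b0, b1) = (1 / 2n) * sum of squared residuals (y - h(x))."""
    n = len(xs)
    total = sum((y - hypothesis(x, b0, b1)) ** 2 for x, y in zip(xs, ys))
    return total / (2 * n)


def logistic(z):
    """Logistic function: maps any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """Logistic regression in its basic form: logistic function of a linear score."""
    return logistic(hypothesis(x, b0, b1))


# Toy data invented for illustration; it lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

perfect_cost = squared_error_cost(xs, ys, b0=1.0, b1=2.0)  # line passes through every point
worse_cost = squared_error_cost(xs, ys, b0=0.0, b1=2.0)    # every residual equals 1
p_mid = predict_probability(0.0, b0=0.0, b1=1.0)           # score 0 gives probability 0.5

print(perfect_cost, worse_cost, p_mid)
```

A lower cost means the line fits the data better, which is why training searches for the coefficients that minimise J. The logistic function is what keeps the predicted value between 0 and 1, so it can be read as a probability such as the probability of default in the earlier example.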