DHILANI E (16PGM09)
What does the course provide?
• What Python is and why it is useful.
• Python-based tools for retrieving and handling data.
• Basics of data science with Python.
• Methods and applications of Python.
• Dealing with data.
Data Scientist: The Sexiest Job of the 21st Century (Harvard Business Review)
Why Python?
• It is a widely used, general-purpose, high-level programming language.
• It suits beginners because it is easy to learn and its code is easy to maintain.
• It supports GUI programming, and its standard library is portable.
• It can be used to create applications that run on Mac, Windows and Unix systems.
Computer Science + Mathematics/Statistics + Visualization = Data Science
• Example: web companies such as Facebook, Amazon, Google and LinkedIn use it.
Data Outline
• Harvesting
• Cleaning
• Analyzing
• Visualizing
• Publishing
Data Harvesting
• Also called web scraping.
• A small script automatically extracts large amounts of data from websites.
• It is a cheap and easy way to collect online data.
• Data sources: your own system, other services, locally available data, data dumps from the web, web documents.
Data Cleansing (Preprocessing)
• Harvested data may contain a lot of noise that has to be detected, for example with a scatter plot.
• Preprocessing gives the data a structured representation for analysis, for example a network or graph.
Data Analyzing
• Analyzing the data
• NumPy (efficient multidimensional arrays) and SciPy (built on top of NumPy) are used.
Data Visualization
• A Python interface to the Graphviz layout engine.
• Graphviz is a collection of graph layout programs.
• Matplotlib
Data Publishing
• Open data should be freely available for use, for the benefit of as many people as possible.
• Examples of open dataset types:
1. Government data
2. Life sciences
3. Commerce
4. Social media
5. Culture
Thank you
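As an illustration of the data outline (harvesting, cleaning, analyzing, publishing), the steps can be sketched end to end in Python. This is a minimal sketch under stated assumptions: the HTML table, the helper names (`TableCellParser`, `harvest`, `clean`, `analyze`, `publish`) and the rule that a row counts as noise when its age is not a whole number are all hypothetical. It uses only the standard library: `html.parser` for scraping, the `statistics` module standing in for NumPy, and `csv` for publishing. It parses an embedded HTML string instead of downloading a page so that it runs offline.

```python
import csv
import io
import statistics
from html.parser import HTMLParser

# Hypothetical sample page (an assumption for illustration). A real scraper
# would fetch it, e.g. urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<table>
  <tr><td>Alice</td><td>34</td></tr>
  <tr><td>Bob</td><td>n/a</td></tr>
  <tr><td>Carol</td><td>28</td></tr>
  <tr><td>Dave</td><td>40</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Harvesting: collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Ignore whitespace between tags; keep only text inside a cell.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def harvest(html):
    """Return the table rows found in an HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean(rows):
    """Cleaning: drop noisy rows (hypothetical rule: age must be an integer)."""
    return [
        (row[0], int(row[1]))
        for row in rows
        if len(row) == 2 and row[1].isdigit()
    ]


def analyze(records):
    """Analyzing: summary statistics of the ages (statistics stands in for NumPy)."""
    ages = [age for _, age in records]
    return {
        "count": len(ages),
        "mean": statistics.mean(ages),
        "stdev": statistics.stdev(ages),  # sample standard deviation
    }


def publish(records):
    """Publishing: serialize the cleaned records as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "age"])
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    records = clean(harvest(SAMPLE_HTML))
    print(analyze(records))
    print(publish(records), end="")
```

With NumPy installed, `numpy.mean(ages)` and `numpy.std(ages, ddof=1)` would give the same mean and sample standard deviation as the `statistics` calls used here; Matplotlib could then plot the cleaned values for the visualizing step.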