You are on page 1of 12
Python for Informatics Exploring Information Version 2.7.0 Charles Severance Copyright © 2009- Charles Severance Printing history May 2014: Editorial pass thanks to Sue Blumenberg. October 2013: Major revision to Chapters 13 and 14 to switch to ISON and use OAuth, Added new chapter on Visualization. September 2013: Published book on Amazon CreateSpace January 2010: Published book using the University of Michigan Espresso Book ma- chine. December 2009: Major revision to chapters 2-10 from Think Python: How to Think Like ‘a Computer Scientist and writing chapters | and 11-15 to produce Python for In formatics: Exploring Information June 2008: Major revision, changed title to Think Python: How to Think Like a Com- ‘puter Scientist August 2007: Major revision, changed tile to How to Think Like a (Python) Progran- April 2002: First edition of How to Think Like a Computer Scientist, “This work is licensed under a Creative Common Autibution-NonCommercial- 3.0 Unported License. This license is available at creat ivec: y-ne-sa/3.0/, You can see what the author considers commercial and non-commercial uses ofthis material as well as license exemptions in the Appendix titled Copyright Detail hare Alike “The IHX source for the Think Python: How to Think Like a Computer Scientist version of this book is available from http: //www.chinkpython..com Preface Python for Informatics: Remixing an Open Book It is quite natural for academics who are continuously told to “publish or perish” to want to always create something from scratch that is their own fresh creation, This book is an experiment in not starting from scratch, but instead “remixing” the book titled Think Python: How to Think Like a Computer Scientist written by Allen B, Downey, Jeff Elkner, and others In December of 2009, I was preparing to teach $1502 - Networked Programming al the University of Michigan for the fifth semester in a row and decided it was time to write a Python textbook that focused on exploring data instead of understanding algorithms and abstractions. My goal in 1502 is to teach people lifelong data handling skills using Python. Few of my students were planning to be professional computer programmers. Instead, they planned to be librarians, managers, lawyers, biologists, economists, etc., who happened to want to skillfully use technology in their chosen field, Inever seemed to find the perfect data-oriented Python book for my course, so T set out to write just such a book. Luckily at a faculty meeting three weeks before was about to start my new book from scratch over the holiday break, Dr. Atul Prakash showed me the Think Python book which he had used to teach his Python course that semester. It is a well-written Computer Science text with a focus on short, direct explanations and ease of learning ‘The overall book structure has been changed to get to doing data analysis problems as quickly as possible and have a series of running examples and exercises about data analysis from the very beginning Chapters 2-10 are similar to the Think Python book, but there have been major changes. Number-oriented examples and exercises have been replaced with dat oriented exercises. Topics are presented in the order needed to build increasingly sophisticated data analysis solutions. Some topics like try and except are pulled forward and presented as part of the chapter on conditionals. Functions are given very light treatment until they are needed to handle program complexity rather than introduced as an early lesson in abstraction, Nearly all user-defined functions iv Chapter 0. Preface have been removed from the example code and exercises outside of Chapter 4 The word “recursion”? does not appear in the book at all. In chapters 1 and 11-16, all of the material is brand new, focusing on real-world uses and simple examples of Python for data analysis including regular expres- sions for searching and parsing, automating tasks on your computer, retrieving data across the network, scraping web pages for data, using web services, parsing XML and JSON data, and creating and using databases using Structured Query Language. The ultimate goal of all of these changes is a shift from a Computer Science to an Informatics focus is to only include topics into a first technology class that can be useful even if one chooses not to become a professional programmer. Students who find this book interesting and want to further explore should look at Allen B. Downey's Think Python book. Because there is a lot of overlap be- tween the two books, students will quickly pick up skills in the additional areas of technical programming and algorithmic thinking that are covered in Think Python. ‘And given that the books have a similar writing style, they should be able to move quickly through Think Python with a minimum of effort. As the copyright holder of Think Python, Allen has given me permission to change the book’s license on the material from his book that remains in this book from the GNU Free Documentation License to the more recent Creative Commons Attri- bution — Share Alike license. This follows a general shift in open documentation licenses moving from the GFDL to the CC-BY-SA (e.g., Wikipedia). Using the CC-BY-SA license maintains the book’s strong copyleft tradition while making it even more straightforward for new authors to reuse this material as they see fit. 1 feel that this book serves an example of why open materials are so important to the future of education, and want to thank Allen B, Downey and Cambridge University Press for their forward-looking decision to make the book available under an open copyright. Ihope they are pleased with the results of my efforts and Thope that you the reader are pleased with our collective efforts T would like to thank Allen B. Downey and Lauren Cowles for their help, patience, and guidance in dealing with and resolving the copyright issues around this book. Charles Severance www.dr-chuck.com ‘Ann Arbor, MI, USA September 9, 2013 Charles Severance is @ Clinical Associate Professor at the University of Michigan School of Information. xcept, of course, fortis line Contents Preface 1 Why should you learn to write programs? rat 12 13 14 15 16 17 18 19 Creativity and motivation Computer hardware architecture Understanding programming Words and sentences Conversing with Python Terminology: interpreter and compiler Writing a program What is a program? ‘The building blocks of programs What could possibly go wrong? ‘The learning journey Glossary Exercises 2 Variables, expressions, and statements 24 2.2 23 2d ‘Values and types Variables ‘Variable names and keywords Statements. a1 au 25 26 27 28 29 2.10 2a. 212 213 244 245 Operators and operands Expressions Order of operations ‘Modulus operator . String operations Asking the user for input ‘Comments Choosing mnemonic variable names Debugging Glossary Exercises Conditional execution 3.1 Boolean expressions 3.2. Logical operators 3.3 Conditional execution 3.4 Alternative execution 3.5 Chained conditionals 3.6 Nested conditionals 3.7 Catching exceptions using try and except. 3.8 Short-circuit evaluation of logical expressions 3.9 Debugging 3.10. Glossary 3.11 Exercises Functions 4.1 Function calls 4.2 Built-in functions 4.3. ‘Type conversion functions 4.4 Random numbers Contents 2 23 23 4 os 4 25 26 28 2B 30 31 31 32. 32 33 34 35 36 37 38 39 40 43 43 43 44 4s Contents 4.5 Math funetions 4.6 Adding new functions 47 Definitions and uses 4.8 — Flow of execution 4.9 Parameters and arguments 4.10 Fruitful functions and void functions 4.11 Why functions? 4.12 Debugging 4.13 Glossary 4.14 Exercises 5. Iteration 5.1 Updating variables 5.2 The while statement 5.3. Infinite loops 5.4 “Infinite loops” and break 5.5 Finishing iterations with cont inue 5.6 Definite loops using for 5.7 Loop patterns 5.8 Debugging 5.9 Glossary 5.10 Exercises 6 Strings 6.1 Astting is a sequence 6.2 Getting the length of a string using Len 6.3. Traversal through a string with a loop 6.4 String slices 6.5 Strings are immutable 6.6 Looping and counting 46 47 48 49 49 50 52 52. 33 37 37 57 58 67 or 68 68 69 6 70 67 68 69 6.10 oat 6.12 6.13 6.14 Files 16 1 78 19 7.10 TAL Lists 81 82 83 84 85 86 87 ‘The in operator String comparison ng methods Parsing strings Format operator Debugging Glossary Exercises Persistence Opening files Text files and lines Reading files Searching through a file Letting the user choose the file name 1 and oper Using try, ex Writing files Debugging Glossary Exercises A list is a sequence Lists are mutable ‘Traversing a list List operations List slices List methods Deleting elements Contents 70 70 n 2B 14 15 16 n ” 9 80 31 82 83 85 85 87 87 88 88 on a a 9 98 83 94, 4, Contents ix 8.8 Lists and functions 95 8.9 Lists and strings 96 8.10 Parsing lines ” 8.11 Objects and values... . coe -. 8 8.12 Aliasing . cone cone 99 8.13 List arguments 100 8.14 Debugging 101 8.15 Glossary 104 8.16 Exercises 10s 9 Dictionaries 107 9.1 Dictionary as a set of counters 109 9.2 Dictionaries and files : : 110 9.3. Looping and dictionaries ML 9.4 Advanced text parsing 12 95 Debugging ua 9.6 Glossary us 9.1 Exercises. . . . 1s 10 Tuples 7 10.1 Tuples are immutable... cee -. 7 10.2. Comparing tuples 18 10.3 Tuple assignment 19 10.4 Dictionaries and tuples 121 10.5 Multiple assignment with dictionaries 12 10.6 The most common words vee 12 10.7 Using tuples as keys in dictionaries 124 10.8 Sequences: strings, lists, and tuples—Oh My! 124 10.9 Debugging 125 10.10. Glossary 126 10.11 Exercises 127 11 Regular expressions ta 16 ua 1s 19 (Character matching in regular expressions Extracting data using regular expressions ‘Combining searching and extracting Escape character Summary Bonus section for Unix users Debugging Glossary Exercises 12 Networked programs 124 122 123 124 125 126 127 128 129 HyperText Transport Protocol - HTTP ‘The World’s Simplest Web Browser Retrieving an image over HTTP Retrieving web pages with ur11 ib Parsing HTML and scraping the web Parsing HTML using regular expressions Parsing HTML using BeautifulSoup Reading binary files using urllib Glossary 12.10 Exercises 13 Using Web Services 13.1 B2 133 eXtensible Markup Language - XML Parsing XML Looping through nodes JavaScript Object Notation - JSON Parsing JSON Application Programming Interfaces Contents 129 130 131 133, 137 137 138 139 140 140 143 143 144 us 148. 148. 149 150 153 153 155 155 156 156 137 158 159 Contents 13.7 Google geocoding web service 13.8 Security and APT usage 13.9 Glossary 13.10 Exercises 14 Using databases and Structured Query Language (SQL) 14.1 What is a database? 14,2 Database concepts 14,3 SQLite manager Firefox add-on 14.4 Creating a database table 14.5 Structured Query Language summary 14.6 Spidering Twitter using a database 14.7 Basic data modeling 14.8 Programming with multiple tables 14.9 Three kinds of keys 14.10. Using JOIN to retrieve data 14.11 Summary 14.12. Debugging 14.13 Glossary 15.1 Building a Google map from geocoded data 15.2. Visualizing networks and interconnections 15.3 Visualizing mail data 16 Automating common tasks on your computer 16.1 File names and paths 16.2 Example: Cleaning up a photo directory 163 Command-line arguments 16.4 Pipes 165. Glossary 16.6 Exercises 160 162 166 167 203 203 ~ 206 209 210 ait 212 A. Python Programming on Windows B_ Python Programming on Macintosh C Contributions D_ Copyright Detail Contents 25 217 219 223

You might also like