Modern Data Science With R-775437 Chapters

This document outlines the contents of a book on data science. It lists 21 chapters in three parts, covering topics such as data visualization, wrangling, modeling, and ethics, followed by a fourth part with six appendices on related topics such as R, algorithmic thinking, reproducible workflows, regression modeling, and database server setup. The section titles of each chapter are listed in full.

Uploaded by

Tweety Aristotle
  • Part I: Introduction to Data Science: Provides an introduction to the fundamental concepts of data science, the evolution of sabermetrics, and resources for further learning.
  • A grammar for graphics: Introduces a structured grammar for composing data graphics, including canonical data graphics in R.
  • Tidy data: Describes the concept of tidy data and its role in data analysis and visualization.
  • Data science ethics: Addresses the ethical considerations and societal impacts of data science, including privacy and professional responsibilities.
  • Iteration: Discusses methods for iterating over datasets with emphasis on vectorized operations for efficiency.
  • Part II: Statistics and Modeling: Delves into statistical foundations and modeling techniques, including predictive and supervised learning, and statistical inference.
  • Unsupervised learning: Explores methods such as clustering for discovering patterns in unlabeled data.
  • Simulation: Covers the use of simulation to model and understand complex systems and processes.
  • Part III: Topics in Data Science: Discusses advanced topics in data science including web content, geospatial data, and text as data.
  • Database administration: Covers administering SQL databases, with emphasis on efficient design and scalability.
  • Text as data: Explores techniques for using text data in analytical processes.
  • Network science: Examines network structures and interactions within various data-driven contexts.
  • Part IV: Appendices: Contains supplementary material, including packages, algorithmic thinking, and workflow reproducibility.

Contents

About the Authors

Preface

I Part I: Introduction to Data Science

1 Prologue: Why data science?


1.1 What is data science?
1.2 Case study: The evolution of sabermetrics
1.3 Datasets
1.4 Further resources

2 Data visualization
2.1 The 2012 federal election cycle
2.2 Composing data graphics
2.3 Importance of data graphics: Challenger
2.4 Creating effective presentations
2.5 The wider world of data visualization
2.6 Further resources
2.7 Exercises
2.8 Supplementary exercises

3 A grammar for graphics


3.1 A grammar for data graphics
3.2 Canonical data graphics in R
3.3 Extended example: Historical baby names
3.4 Further resources
3.5 Exercises
3.6 Supplementary exercises

4 Data wrangling on one table


4.1 A grammar for data wrangling
4.2 Extended example: Ben's time with the Mets
4.3 Further resources
4.4 Exercises
4.5 Supplementary exercises

5 Data wrangling on multiple tables


5.1 inner_join()
5.2 left_join()
5.3 Extended example: Manny Ramirez
5.4 Further resources
5.5 Exercises
5.6 Supplementary exercises

6 Tidy data
6.1 Tidy data
6.2 Reshaping data
6.3 Naming conventions
6.4 Data intake
6.5 Further resources
6.6 Exercises
6.7 Supplementary exercises

7 Iteration
7.1 Vectorized operations
7.2 Using across() with dplyr functions
7.3 The map() family of functions
7.4 Iterating over a one-dimensional vector
7.5 Iteration over subgroups
7.6 Simulation
7.7 Extended example: Factors associated with BMI
7.8 Further resources
7.9 Exercises
7.10 Supplementary exercises

8 Data science ethics


8.1 Introduction
8.2 Truthful falsehoods
8.3 Role of data science in society
8.4 Some settings for professional ethics
8.5 Some principles to guide ethical action
8.6 Algorithmic bias
8.7 Data and disclosure
8.8 Reproducibility
8.9 Ethics, collectively
8.10 Professional guidelines for ethical conduct
8.11 Further resources
8.12 Exercises
8.13 Supplementary exercises

II Part II: Statistics and Modeling

9 Statistical foundations
9.1 Samples and populations
9.2 Sample statistics
9.3 The bootstrap
9.4 Outliers
9.5 Statistical models: Explaining variation
9.6 Confounding and accounting for other factors
9.7 The perils of p-values
9.8 Further resources
9.9 Exercises
9.10 Supplementary exercises

10 Predictive modeling
10.1 Predictive modeling
10.2 Simple classification models
10.3 Evaluating models
10.4 Extended example: Who has diabetes?
10.5 Further resources
10.6 Exercises
10.7 Supplementary exercises

11 Supervised learning
11.1 Non-regression classifiers
11.2 Parameter tuning
11.3 Example: Evaluation of income models redux
11.4 Extended example: Who has diabetes this time?
11.5 Regularization
11.6 Further resources
11.7 Exercises
11.8 Supplementary exercises

12 Unsupervised learning
12.1 Clustering
12.2 Dimension reduction
12.3 Further resources
12.4 Exercises
12.5 Supplementary exercises

13 Simulation
13.1 Reasoning in reverse
13.2 Extended example: Grouping cancers
13.3 Randomizing functions
13.4 Simulating variability
13.5 Random networks
13.6 Key principles of simulation
13.7 Further resources
13.8 Exercises
13.9 Supplementary exercises

III Part III: Topics in Data Science

14 Dynamic and customized data graphics


14.1 Rich Web content using D3.js and htmlwidgets
14.2 Animation
14.3 Flexdashboard
14.4 Interactive web apps with Shiny
14.5 Customization of ggplot2 graphics
14.6 Extended example: Hot dog eating
14.7 Further resources
14.8 Exercises
14.9 Supplementary exercises

15 Database querying using SQL


15.1 From dplyr to SQL
15.2 Flat-file databases
15.3 The SQL universe
15.4 The SQL data manipulation language
15.5 Extended example: FiveThirtyEight flights
15.6 SQL vs. R
15.7 Further resources
15.8 Exercises
15.9 Supplementary exercises

16 Database administration
16.1 Constructing efficient SQL databases
16.2 Changing SQL data
16.3 Extended example: Building a database
16.4 Scalability
16.5 Further resources
16.6 Exercises
16.7 Supplementary exercises

17 Working with geospatial data


17.1 Motivation: What's so great about geospatial data?
17.2 Spatial data structures
17.3 Making maps
17.4 Extended example: Congressional districts
17.5 Effective maps: How (not) to lie
17.6 Projecting polygons
17.7 Playing well with others
17.8 Further resources
17.9 Exercises
17.10 Supplementary exercises

18 Geospatial computations
18.1 Geospatial operations
18.2 Geospatial aggregation
18.3 Geospatial joins
18.4 Extended example: Trail elevations at MacLeish
18.5 Further resources
18.6 Exercises
18.7 Supplementary exercises

19 Text as data
19.1 Regular expressions using Macbeth
19.2 Extended example: Analyzing textual data from arXiv.org
19.3 Ingesting text
19.4 Further resources
19.5 Exercises
19.6 Supplementary exercises

20 Network science
20.1 Introduction to network science
20.2 Extended example: Six degrees of Kristen Stewart
20.3 PageRank
20.4 Extended example: 1996 men's college basketball
20.5 Further resources
20.6 Exercises
20.7 Supplementary exercises

21 Epilogue: Towards “big data”


21.1 Notions of big data
21.2 Tools for bigger data
21.3 Alternatives to R
21.4 Closing thoughts
21.5 Further resources

IV Part IV: Appendices

A Packages used in this book


A.1 The mdsr package
A.2 Other packages
A.3 Further resources

B Introduction to R and RStudio


B.1 Installation
B.2 Learning R
B.3 Fundamental structures and objects
B.4 Add-ons: Packages
B.5 Further resources
B.6 Exercises
B.7 Supplementary exercises

C Algorithmic thinking
C.1 Introduction
C.2 Simple example
C.3 Extended example: Law of large numbers
C.4 Non-standard evaluation
C.5 Debugging and defensive coding
C.6 Further resources
C.7 Exercises
C.8 Supplementary exercises

D Reproducible analysis and workflow


D.1 Scriptable statistical computing
D.2 Reproducible analysis with R Markdown
D.3 Projects and version control
D.4 Further resources
D.5 Exercises
D.6 Supplementary exercises

E Regression modeling
E.1 Simple linear regression
E.2 Multiple regression
E.3 Inference for regression
E.4 Assumptions underlying regression
E.5 Logistic regression
E.6 Further resources
E.7 Exercises
E.8 Supplementary exercises

F Setting up a database server


F.1 SQLite
F.2 MySQL
F.3 PostgreSQL
F.4 Connecting to SQL

Bibliography

Indices
Subject index
R index
