The document outlines the basic tasks of data analysis: getting data, exploring it, cleaning it, summarizing it, transforming it, modeling it, and deploying models as applications. It then discusses commonly used Python tools for data analysis, such as Jupyter notebooks, which allow results to be explained in plain language using Markdown text and LaTeX formulas. Finally, it lists popular Python libraries for data analysis: Pandas for data handling, Matplotlib and other libraries for visualization, scikit-learn, NumPy and Statsmodels for modeling, and Dash for web app development.
• Get data
• Explore
• Clean
• Summarize
• Transform
• Model
• Deploy the model as an application

Python tools for data analysis
• JupyterLab, Jupyter Notebooks or Google Colab are commonly used because data analysis is communicative by nature
• JupyterLab can be installed as a standalone application or through Anaconda
• The reason for using a Jupyter-like environment is that results must be explained in plain language
• Text can be written in Markdown syntax, and formulas can be embedded using LaTeX syntax (online editor)

Python libraries
• Pandas: creating data frames, summarizing, cleaning and transforming data
• Matplotlib, Seaborn, Plotly, Bokeh, Altair…: visualization
  • Matplotlib is non-interactive and low level
  • Matplotlib can plot everything, but complex plots need a lot of code
  • The other libraries are higher level, and most of them (Plotly, Bokeh, Altair) are also interactive
  • Less coding, but more constraints on what you can plot
  • Install them from within JupyterLab with the %pip magic, for example %pip install plotly
• scikit-learn, NumPy, Statsmodels, ...: modeling
• Dash: web application development
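The workflow steps listed above can be sketched end to end with Pandas and NumPy. This is a minimal illustration under stated assumptions, not a prescribed method: the CSV data are made up, the column names (`city`, `year`, `temp_c`, `sales`) and the helper `run_pipeline` are hypothetical, and the straight-line model fitted with `np.polyfit` is chosen only for brevity. A real project would more likely use scikit-learn or Statsmodels for the modeling step and Dash for deployment.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy dataset (made up for illustration). It stands in for a CSV
# file or database query in the "get data" step; one temperature is missing.
RAW_CSV = """city,year,temp_c,sales
Helsinki,2021,12.5,120
Helsinki,2022,,135
Tampere,2021,10.1,98
Tampere,2022,11.4,104
Turku,2021,13.0,110
Turku,2022,14.2,125
"""


def run_pipeline(csv_text: str) -> dict:
    """Hypothetical helper walking through get/explore/clean/summarize/transform/model."""
    # Get data: parse the CSV text into a DataFrame (an empty field becomes NaN).
    df = pd.read_csv(io.StringIO(csv_text))

    # Explore: count missing values across all columns.
    missing_before = int(df.isna().sum().sum())

    # Clean: drop rows whose temperature is missing.
    df = df.dropna(subset=["temp_c"]).reset_index(drop=True)

    # Summarize: mean sales per city.
    sales_by_city = df.groupby("city")["sales"].mean()

    # Transform: add a derived column (purely illustrative).
    df["sales_per_degree"] = df["sales"] / df["temp_c"]

    # Model: least-squares fit of sales = slope * temp_c + intercept.
    slope, intercept = np.polyfit(df["temp_c"], df["sales"], deg=1)

    return {
        "rows": len(df),
        "missing_before": missing_before,
        "sales_by_city": sales_by_city,
        "slope": float(slope),
        "intercept": float(intercept),
        "data": df,
    }


if __name__ == "__main__":
    result = run_pipeline(RAW_CSV)
    print(result["data"].describe())
    print(result["sales_by_city"])
    print(f"sales = {result['slope']:.2f} * temp_c + {result['intercept']:.2f}")
```

In a Jupyter notebook, each commented step would typically live in its own cell, with a Markdown cell in between explaining the result in plain language.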