Alfeo, Introduction To PyCharm, PyLint, PyTest, and CVS

1
USING PYCHARM, PYLINT,

PYTEST, AND VCS TO DESIGN
A PROJECT PIPELINE
Antonio Luca Alfeo
2
OVERVIEW
1. Installation 8. Python basics to design a structured project pipeline
I. Tabular data with pandas.dataframes
2. Work with Pycharm projects
II. Dataframe manipulation
I. Configure the interpreter
II. Requirements management III. Data segregation with Sklearn
IV. Model hyperarametrization
III. Project navigation and run
V. From JSON to Object
VI. Performance evaluation
3. Code quality checking VII. Model deployment
I. Pylint VIII. Project profiling
II. Plugin installation and usage
9. Recap and more resources

4. Code correctness checking
I. Enabling and configuring Pytest in Pycharm
II. Testing methods and usage 10. Exercise
5. Version Control on Pycharm

I. Local history
II. Connect to GitHub
III. Github: share a project
IV. Github: commit and push
V. Github: clone a project
6. Recap and more resources

7. Exercise
3
INSTALLATION
4
INSTALLATION 1/3
1. Install Git using only the default settings in the installation process
• https://git-scm.com/downloads
2. Install PyCharm
• Linux -> https://download.jetbrains.com/python/pycharm-community-
anaconda-2020.2.3.tar.gz
• Windows -> https://download.jetbrains.com/python/pycharm-community-
anaconda-2020.2.3.exe
• MacOs -> https://download.jetbrains.com/python/pycharm-community-
anaconda-2020.2.3.dmg
5
INSTALLATION 2/3
During PyCharm installation enable “open folder as project”
6
INSTALLATION 3/3
• Accept the JetBrains
Privacy Policy
• Choose the UI theme you
prefer
• Do not install any
featured plugin
• Install Miniconda: includes
the conda environment
manager, Python, the
packages they depend
on, and a small number
of other useful packages
(e.g. pip).
• Start using PyCharm
7
WORK WITH PYCHARM

PROJECTS
8
OPEN A PROJECT
1. Unzip “simpleClassifier.rar”
2. Open the resulting folder “as a

PyCharm project”
9
GUI
Menu Toolbar Run/Debug buttons
Project view Editor
Console and Tool windows

10
INTERPRETER CONFIGURATION
• File > Settings > Project > Python interpreter > Show all
1
4
2
• > Conda Environment > New environment
• Location and conda executable may have slightly different paths
(the first part) according to your miniconda3 location.
6
5
• Apply > Ok 7
8
11
WARNING
It can take some minutes to complete the project

setting. A restart may be required before moving to
the next step!
12
PROJECT REQUIREMENTS 1/2

• The requirements are all the packages that our software needs to
run properly. Those can be installed with a PyCharm plugin.
• Double click on “requirements.txt” in the Project View > Install Plugin
• Once the plugin is installed click on “Install requirements”, select all

and click install.
13
PROJECT REQUIREMENTS 2/2

• if everything goes smoothly you might see the package installation
status progress at the bottom of the GUI.
• Once the package installation is done a notification appears, but it

is still NOT possible to go to next step, at least until PyCharm has
finished "Updating skeletons".
• Each package can also be installed via GUI or with the Terminal by
using
• “pip install <name_lib>” to install a single package
• “pip install –r requirements.txt” to install the packages on the
requirements.txt
14
CHECK THE PACKAGES

AVAILABLE
You can see (add, eliminate and upgrade*) the packages and
libraries available in your virtual environment by checking :
File > Settings > Project > Python Interpreter
*
15
CONFIGURE THE FIRST RUN

1. Click “Add configuration” 3. Select “main.py” in your project folder.
2. Select “Python”
4. Click OK
16
Click RUN RUN IT!
Txt with evaluation score! The confusion matrix!

17
CODE QUALITY CHECKING

18
CODE QUALITY CHECKING

Pylint is a source-code, bug and quality checker for the Python
programming language. It checks:
• if the code is compliant with Python's PEP8 standard
• if each module is properly imported and used
• if declared interfaces are truly implemented
• if there is duplicated code

19
PYLINT PLUGIN INSTALLATION

2
1. File > Settings
> Plugin 1
3
2. Search for
“pylint”
3. Install Pylint
plugin 4
4. Restart
Pycharm
20
PYLINT PLUGIN SETUP

2
3
1. File > Settings > Pylint

2. Insert the path to the pylint
1
executable
3. Click “Test”
4. Check the result
5. Click “Apply” and “Ok”. You
can now run “pylint” by
clicking the green arrow in
the Pylint Tool Windows
5 4
21
PYLINT TOOL WINDOW USAGE
1
1. Double click on a Python
file in the Project View
2. Open the Pylint Tool

Windows
3. Click the green arrow
2
22
USE PYLINT VIA COMMAND LINE

23
CODE CORRECTNESS CHECKING

24
CODE CORRECTNESS CHECKING

PyCharm supports pytest, providing many testing features such as:
• Dedicated test runner.
• Code completion for test subject.
• Code navigation.
• Detailed failing assert reports.
• Multiprocessing test execution.

25
ENABLE PYTEST
1. File > Settings > Python Integrated Tools and
2. Set “pytest” as the “Default test runner”
1
26
CREATE A NEW TEST

1. Put the cursor on the bracket of a method declaration, right-click and
select Go To > Test
2. Create a new test script
3. Give it a directory, a name, and a method to test
1
2
3
27
BUILD AND RUN A TEST 1/3

1. Fill the test function and the assert statement. If this statement is true
the test is PASSED!
2. Click “Edit pytest for …”

28

3. Reduce the verbosity of the pytest
outcome to improve readability. Put the
following Additional Arguments:
“ -q --disable-warnings --tb=no “
-q, quiet output
--disable-warnings, don't show most of the warnings
3
--tb=no, don't show any traceback
4. Click OK and Run the test
4
29

30
TEST PARAMETRIZATION
The Pytest framework makes it easy to write tests to support multiple

and complex functional testing for applications and libraries. Pytest
enables test parametrization at several levels, such as:
• @pytest.fixture provide a fixed baseline (e.g. an instance of a class)

upon which tests can reliably and repeatedly execute. Moreover,
via scope controls it is possible to check how often a fixture gets
called.
• @pytest.mark.parametrize allows the definition of multiple sets of

arguments and fixtures for testing purpose.
31
BUILD AND RUN MANY TESTS

1. Prepare a set of values to 1
test
2. Test those by passing

them to the test function
3. Now you have one

evaluation for each test 2
value!
3
32
USE PYTEST VIA COMMAND LINE

33
VERSION CONTROL ON
PYCHARM
34
VERSION CONTROL
In Pycharm the Version Control
can be accessed locally:
1. via VCS > Local History
2. Stores only local and recent
change in the project
highlighting them 1
2
35
ENABLE VCI TO USE GITHUB

Github is way more convenient
especially when a project is 1
developed in group.
After VC integration is enabled

(1 and 2), PyCharm will ask you
whether you want to share
project settings files via VCS.
In the dialog you can

choose ”Always Add” to
synchronize project settings with
other repository users who work 2
with PyCharm.
36
ENABLE VCI TO USE GITHUB

Login into GitHub, requesting a new access token with your login and
password (1). Then, authorize JetBrain IDE integration (2 and 3).
1 2 3
37
SHARE A PROJECT 1/2

1. When connection to GitHub has been established, the Share Project on
GitHub dialog appears (VCS > Import Version Control > Share …).
2. Name the repository and click share
3
1
38
SHARE A PROJECT 2/2

1. Being the initial commit, of the
project every file is highlighted
in red (they are not in the
VCS).
2. Click Add
3. PyCharm will notify once the 1

process is finalized
2
39
… ON GITHUB
Let’s say you have a central repository for your project, the
contribution of each coder will be highlighted by the commits she/he
made. In the next slide you will see how to “sign” your commit º
40
COMMIT AND PUSH

Update (pull) Push
Commit
Color code:
- Blue: modified
- Green: commit done
- Red: not in the VCS
- White: push done
Explain clearly the
change provided
After the first commit here
there will be the “before
and after” comparison
41
CLONE FROM GIT 1/2

1. Go to GitHub and annotate the URL of the repository you are
interested in.
2. From the main menu, choose VCS > Get from Version Control, and
choose GitHub on the left.
2
42
CLONE FROM GIT 2/2

1. Specify the URL of the repository that you want to clone (the one you
annotated from GitHub).
2. In the Directory field, enter the path to the folder where your local Git
repository will be created.
3. Click Clone. If you want to create a project based on these sources,
click Yes in the confirmation dialog. PyCharm will automatically set Git
root mapping to the project root directory.
1
2
3
43
44
RECAP
PyCharm is an integrated development environment (IDE) widely
used in the Python community. PyCharm provides:
• Project and code navigation: create a Python project by opening a
folder as a project or even from scratch, by creating or importing
the python files. Navigate the project with specialized project views,
jump between files, classes, methods and usages.
• Python refactoring: rename, extract methods, introduce variables
and constants
• Configurable python interpreter and virtual environment support.
• Coding assistance and analysis: code completion, syntax and error
highlighting, quick fixes, guided import, integrated Python
debugger, linting and unit testing with code coverage.
• PyCharm provides version control integration, a unified user
interface for many VCS such as GithHub, supporting the
management of commit, push, pull, change lists, cloning… and
even branch, merge, and conflict via Git.
45
MORE RESOURCES…
46
ON GETTING STARTED WITH

PYCHARM
PYCHARM DOCS
• Miniconda can be installed at any time from Tools > Install Miniconda3
• https://www.jetbrains.com/help/pycharm/quick-start-guide.html
• https://www.jetbrains.com/help/pycharm/opening-your-project-for-the-first-
time.html
• https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html
• https://www.youtube.com/watch?v=hyoS5W_1kBQ&list=PLQ176FUIyIUbDwdvWZN
uB7FrCc6hHregy
CREATE AND MANAGE REQUIREMENTS
• https://www.jetbrains.com/help/pycharm/managing-
dependencies.html#create-requirements
A PYCHARM BOOK
• https://www.academia.edu/42195195/Hands_On_Application_Development_wit
h_PyCharm_Accelerate_your_Python_applications_using_practical_coding_tech
niques_in_PyCharm?auto=download
47
ON LINTING
• https://www.python.org/dev/peps/pep-0008/
• https://plugins.jetbrains.com/plugin/11084-pylint
• Pylint is fully customizable: modify your pylintrc to customize

which errors or conventions are important to you.
48
ON TESTING
• https://www.jetbrains.com/help/pycharm/pytest.html
• https://www.jetbrains.com/pycharm/guide/technologies/pyt
est/
• pytest.mark allows you setting the metadata on your test
functions. Here you have some builtin markers different from
pytest.mark.parametrize:
• @pytest.mark.skip: always skip a test function
• @pytest.mark.skipif: skip a test function if a certain condition is
met
• @pytest.mark.xfail: produce an expected failure outcome if a
certain condition is met
49
ON VERSION CONTROL
PYCHARM DOCS
• https://www.jetbrains.com/help/pycharm/enabling-version-control.html
• https://www.jetbrains.com/help/pycharm/manage-projects-hosted-on-
github.html
• https://www.jetbrains.com/help/pycharm/contribute-to-projects.html
VIDEO TUTORIAL
• https://www.youtube.com/watch?v=jFnYQbUZQlA
• https://www.youtube.com/watch?v=_w9XWHDSSa4
• https://www.youtube.com/watch?v=AHkiCKG-JhM
50
51
EXERCISE
1. Use the IrisClassifier.py by changing the number of training epochs,
i.e. "max_iter" in the MLPClassifier() documentation page. How
does the performance change by using 50 epochs?
2. Perform code linting.
3. Test if the performance is greater than 0.75 with number of training

epochs equal to 5, 10, 25, 50, 100, 200. How many of them fail?
4. Make commit and push on your personal GitHub repository.

52
A POSSIBLE SOLUTION
from typing import Set

from irisclassifier import IrisClassifier
import pytest
epochs: Set[int] = {5, 10, 25, 50, 100, 200}
@pytest.mark.parametrize('epoch', epochs)
def test_evaluations(epoch):
i = IrisClassifier(epoch)
i.ingestion()
i.segregation()
i.train()
res = i.evaluation()
assert res > 0.75
53
54
PYTHON BASICS TO DESIGN

A PROJECT PIPELINE
55
SO MANY PACKAGES AVAILABLE

• Pytest and Pylint: to write better code!
• Pandas: read data from a broad range of sources like CSV, SQL
databases, and Excel files.
• Json and Joblib: to save/read models and configurations to/from
files.
• Sci-kit_learn: machine learning library with a broad range of
clustering, regression and classification algorithms. It provides also
data preprocessing, pipelining and performance analysis modules.
• Matplotlib: graphic visualization libraries.
• NumPy: to apply mathematical functions and scientific
computation to multi-dimensional data
…and much more.

56
PROJECT ORGANIZATION
• Consider that each class in your project should provide
one or more functionalities that work with inputs and
configurations to generate outputs. Inputs,
configurations and outputs have to be stored in
dedicated folders. Each class should be able to
save/load data and configurations from those folders.
• Let’s start with a very simple example. We aim at
classify a yearly set of time series. We need at least 3
folders:
• data -> tabular data, from raw to segregated form
• configs -> configurations and models
• results -> outcomes, figures and scores obtained
57
1. UPLOAD/SAVE TABULAR DATA

WITH PANDAS.DATAFRAME
from pandas import read_csv
Path of the csv to read
# Read csv
data1 = read_csv('data/hotspotUrbanMobility-1.csv')
# Print dataframe shape Dataframe shape
print(data1.shape) consist of its numer of
# Read csv rows and column
data2 = read_csv('data/hotspotUrbanMobility-2.csv')
# Print dataframe shape
print(data2.shape)
# Append dataframes Append 2 dataframes
data1 = data1.append(data2, ignore_index=True) and ignore the indexes
print(data1.shape) that pandas provides
# Save the new datafame as csv
data1.to_csv('data/completeDataset.csv', index=False)
Save the dataframe
58
2. DATAFRAME MANIPULATION
# Read csv
data = read_csv('data/completeDataset.csv') Provides statistics for
# Check the content of the dataframe each column in the
dataframe
print(data.describe())
# Remove column with constant values Drop the column (axis=1)
data = data.drop('h24', axis=1) whose label is ‘h24’
# Remove anomalous instances
data = data.loc[data['Anomalous'] < 1] Loc select instances by
print(data.describe()) using a boolean array.
# Save as csv
data.to_csv('data/preprocessedDataset.csv', index=False)
59
3. DATA SEGREGATION WITH

PANDAS AND SCIKIT-LEARN
from sklearn.model_selection import train_test_split Split randomly 75% and
25% of the dataset
# Read csv (default), once it knows
data = read_csv('data/preprocessedDataset.csv') which are the target
print(data.describe()) and the data.
# Random split 75% as training and 25% as testing
training_data, testing_data, training_labels, testing_labels =
train_test_split(data.iloc[:, 4:len(data.columns)], data.iloc[:, 1])
# Save as csv
training_data.to_csv('data/trainingData.csv', index=False)
testing_data.to_csv('data/testingData.csv', index=False) iloc provides
training_labels.to_csv('data/trainingLabels.csv', index=False) integer-location
testing_labels.to_csv('data/testingLabels.csv', index=False) based indexing
60
4. ML MODEL PARAMETRIZATION
VIA GRID SEARCH
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from numpy import ravel
import json
The ML model
# Read the data
training_data = read_csv('data/trainingData.csv')
(random_state is used for
training_labels = read_csv('data/trainingLabels.csv') reproducibility)
# setup Grid Search for a randomForestClassifier
rf = RandomForestClassifier(random_state=0) N_estimator is the name
# values to test of the hyperparameter of
parameters = {'n_estimators': (1, 2, 3)}
RandomForestClassifier
# apply grid search
gs = GridSearchCV(rf, parameters)
gs.fit(training_data, ravel(training_labels)) Use ravel to transform
# save best configuration dataframe to narray
config_path = 'config/modelConfiguration.json'
with open(config_path, 'w') as f: Save the best
json.dump(gs.best_params_, f) hyperparameter as JSON
61
5. FROM JSON TO OBJECT
import json
Use a simple class to get
the model’s parameters
class Params():
def __init__(self, n_estimators): Load the JSON .
self.n_estimators = n_estimators Remember, it should be
compliant to the
# load best configuration
expected json schema
config_path = 'config/modelConfiguration.json' i.e. pass the validation
with open(config_path, 'r') as f:
params = json.load(f) Create a new Object,
and pass the JSON
params_object = Params(**params) dictionary as a map to
convert JSON data into a
custom Python Object
62
6. ML MODEL PERFORMANCE
ASSESSMENT
from sklearn.metrics import accuracy_score
import json
# Read the data

training_labels = read_csv('data/trainingLabels.csv')
testing_data = read_csv('data/testingData.csv')
testing_labels = read_csv('data/testingLabels.csv')
# train the model
model = RandomForestClassifier(n_estimators=3, random_state=0).fit(training_data, ravel(training_labels))
# compute labels on testing data
labels = model.predict(testing_data)
# evaluate accuracy score You can use many different
score = accuracy_score(ravel(testing_labels), labels) performances measures
# report the result on a json
results = {}
results['accuracy'] = float(score) Define a set for your results
json_str = json.dumps(results, indent=4) and save them as JSON
with open('results/testReport.json', 'w') as file:
file.write(json_str)
63
7. MODEL DEPLOYMENT WITH

JOBLIB
import joblib
# Read the training data

training_labels = read_csv('data/trainingLabels.csv')
# train the model
initial_model = RandomForestClassifier(n_estimators=3, random_state=0).fit(training_data, ravel(training_labels))
# save the model
joblib.dump(initial_model, 'config/fitted_model.sav') Save a fitted model to file
#...
# Read new data
testing_data = read_csv('data/testingData.csv')
testing_labels = read_csv('data/testingLabels.csv')
# load the model
model = joblib.load('config/fitted_model.sav') Load a fitted model from file
# Evaluate the initial model on new data
print('Evaluation score:')
print(model.score(testing_data, ravel(testing_labels))) Use the model with new data
64
8. PROFILING YOUR PROJECT

WITH CPROFILE AND SNAKEVIZ
1. Install the packages via Terminal by using:
• Pip install cProfile
• Pip install snakeviz The main of your
project
2. Than execute:
• python -m cProfile -o test.profile main.py
The output of your
• snakeviz test.profile project profiling
3. Open the link in the Terminal windows:
65
8. PROFILING YOUR PROJECT

WITH CPROFILE AND SNAKEVIZ
Use the interactive interface to navigate the profiling output or search for
a specific method to understand its usage in your project. More info in
the official documentation
66
RECAP AND MORE RESOURCES…

67
MANIPULATE TABULAR DATA

• Load csv as a dataframe, save a dataframe as csv
• Shape, head and quick description of a dataframe
• Dataframe concatenation and append
• Data sort by index or by value
• Data resampling
• Dataframe selection by value or index
• Dataframe columns/row addition and elimination
• Replace by value, drop duplicate rows, group by value
• Drop NAN values , fill NAN value
• Data interpolation and signal smoothing via moving average
…and much more
68
GRAPHS AND VISUALIZATION

WITH MATPLOTLIB.PYPLOT
• Plots and save a figure
• Histogram e barplot
• Boxplots
• Scatterplot
• Single e multiple graphs
…and much more

69
DATA PRE-PROCESSING
WITH SCIKIT-LEARN
• Data discretization
• Data normalization with minmax or standard scaler
• Features computation and extraction with NumPy and Scipy
• Split data in train and testing set
…and much more

70
ML MODEL IMPLEMENTATION
WITH SCIKIT-LEARN
• MLPclassifier (classification)
• RandomForestClassifier (classification)
• DecisionTreeRegressor (regression)
• MLPregressor (regression)
• k-means (partition based clustering)
• DBSCAN (density based clustering)
• IsolationForest (anomaly detection)
• OneClassSVM (anomaly detection)
…and much more

71
ARCHITECTURE EVALUATION
• Choose the right performance metric
• Performance visualization
• K-Folds and Stratified-K-Fold cross-validation
• Hyperparameters search via grid search
• Import/export objects parametrization with json
• Validate the json schema
• Dump/load fitted model with joblib
• How much time does an instruction take to execute?
• Official Pycharm profiling tool (just for Pro version)
…and much more

72
73
EXERCIZE
• Given the csvs used in this lab, build a few classes that:
• read the csvs “1” and “2” and append them
• remove the columns with constant values
• select only data from Cluster 0 (‘Cluster’ = 0)
• generate training and testing sets (respectively 75% and 25% of the
whole dataset) using only columns ‘h1’-‘h23’ as data and the column
‘Anomalous’ as target
• build a ML model using isolationForest
• train the model with the training data
• predict the anomalous label with the testing data
• compute the accuracy of your model. Consider that with your labels
you have 1 for anomalous instances whereas isolationForest return -1 for
anomalous instances.
• save the ML trained model
• load the trained model and use it with the “New” data
• print the resulting accuracy on screen.

Alfeo, Introduction To PyCharm, PyLint, PyTest, and CVS

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Alfeo, Introduction To PyCharm, PyLint, PyTest, and CVS

Uploaded by

Copyright:

Available Formats

1

USING PYCHARM, PYLINT,

9. Recap and more resources

5. Version Control on Pycharm

6. Recap and more resources

WORK WITH PYCHARM

2. Open the resulting folder “as a

Project view Editor

Console and Tool windows

It can take some minutes to complete the project

PROJECT REQUIREMENTS 1/2

• Once the plugin is installed click on “Install requirements”, select all

PROJECT REQUIREMENTS 2/2

• Once the package installation is done a notification appears, but it

CHECK THE PACKAGES

CONFIGURE THE FIRST RUN

Click RUN RUN IT!

Txt with evaluation score! The confusion matrix!

CODE QUALITY CHECKING

CODE QUALITY CHECKING

• if the code is compliant with Python's PEP8 standard

• if each module is properly imported and used

• if declared interfaces are truly implemented

• if there is duplicated code

PYLINT PLUGIN INSTALLATION

PYLINT PLUGIN SETUP

1. File > Settings > Pylint

PYLINT TOOL WINDOW USAGE

2. Open the Pylint Tool

3. Click the green arrow

USE PYLINT VIA COMMAND LINE

CODE CORRECTNESS CHECKING

CODE CORRECTNESS CHECKING

• Dedicated test runner.

• Code completion for test subject.

• Detailed failing assert reports.

• Multiprocessing test execution.

CREATE A NEW TEST

BUILD AND RUN A TEST 1/3

2. Click “Edit pytest for …”

BUILD AND RUN A TEST 2/3

4. Click OK and Run the test

BUILD AND RUN A TEST 3/3

The Pytest framework makes it easy to write tests to support multiple

• @pytest.fixture provide a fixed baseline (e.g. an instance of a class)

• @pytest.mark.parametrize allows the definition of multiple sets of

BUILD AND RUN MANY TESTS

2. Test those by passing

3. Now you have one

USE PYTEST VIA COMMAND LINE

ENABLE VCI TO USE GITHUB

After VC integration is enabled

In the dialog you can

ENABLE VCI TO USE GITHUB

SHARE A PROJECT 1/2

SHARE A PROJECT 2/2

3. PyCharm will notify once the 1

COMMIT AND PUSH

CLONE FROM GIT 1/2

CLONE FROM GIT 2/2

ON GETTING STARTED WITH

• Pylint is fully customizable: modify your pylintrc to customize

2. Perform code linting.