3 Python Tools Data Scientists Can Use For Production
Genevieve Hayes
Sep 7 · 7 min read
Readable;
Free from errors;
Robust to exceptions;
Efficient;
Well documented; and
Reproducible.
The problem is that most data scientists don’t even realize that
writing production-quality code is something they can and should
learn.
For many of these steps, there are no real shortcuts.
The only way to build a minimum viable product, for example, is to
roll up your sleeves and start coding. However, in a few cases, tools
exist to automate tedious manual processes and make your life
much easier.
In Python, this is the situation for steps 4, 8 and 10, thanks to the
unittest, flake8 and sphinx packages.
Within this class, I have created two unit tests, one to test
circle_area() works for radius 2 (test_circle_area1) and one for
radius 0 (test_circle_area2). The names of these functions, again,
don’t matter, except that they must start with test (test_ by
convention) and take self as their parameter.
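As a minimal sketch of what such a test class might look like: the body of circle_area() and the class name TestCircleArea below are assumptions made for illustration, while the two test method names come from the description above.

```python
import math
import unittest


def circle_area(radius):
    """Return the area of a circle (assumed implementation)."""
    return math.pi * radius ** 2


class TestCircleArea(unittest.TestCase):  # the class name is arbitrary
    def test_circle_area1(self):
        # Radius 2: the area should be pi * 2**2
        self.assertAlmostEqual(circle_area(2), 4 * math.pi)

    def test_circle_area2(self):
        # Radius 0: the area should be 0
        self.assertEqual(circle_area(0), 0)
```

In a script, these tests are usually run by calling unittest.main() under an `if __name__ == "__main__":` guard. In a notebook, a commonly used variant is unittest.main(argv=[''], exit=False), which stops unittest from reading the notebook's own command-line arguments and from shutting down the kernel.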
Assuming all tests are passed, the output will look something like
this, with a dot on the top line for each test that has passed.
Alternatively, if one of your tests fails, then the top line of the
output will include an “F” for each failed test and further output
will be provided, giving details of the failures.
If you are writing your code using Python scripts (i.e. .py files),
ideally you should house your unit tests in a separate testing file, to
keep them apart from your main code. However, if you are using a
Jupyter notebook, you can just place the unit tests in the final cell
of the notebook.
Once you have created your unit tests and got them working, it is
worthwhile re-running them whenever you make any (significant)
changes to your code.
If you are writing your code as a Python script, the flake8 package
will check for PEP 8 compliance.
The output will tell you exactly where your code is non-compliant.
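As a rough illustration, flake8 is typically run from the command line as `flake8 my_script.py` (the file name here is hypothetical). The sketch below contrasts a function with style problems against a cleaned-up version; the error codes in the comments are the ones such lines would typically trigger, though the exact report depends on the flake8 version and configuration.

```python
import math


# Before: runs fine, but contains PEP 8 style violations.
# "r,scale" lacks a space after the comma (typically E231) and
# "area=math.pi" lacks spaces around "=" (typically E225).
def circle_area_messy(r,scale=1):
    area=math.pi * r ** 2
    return area * scale


# After: the same logic, formatted to follow PEP 8.
def circle_area(r, scale=1):
    area = math.pi * r ** 2
    return area * scale
```

Both functions return the same values; only the layout differs, which is exactly the kind of issue a style checker is meant to catch.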
Create Professional-Looking Documentation with sphinx
Ever wondered how the creators of Python packages, such as
NumPy and scikit-learn, get their documentation to look so good?
I used sphinx when I wrote the Python package mlrose and here is
an extract from one of the functions contained within this package.
Note the very specific way in which the docstring at the top of this
function is formatted.
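The original extract is not reproduced here. As a stand-in, the sketch below shows a hypothetical function documented in the NumPy docstring style, one common structured convention that sphinx can render (for example through its sphinx.ext.napoleon extension). The function and its wording are assumptions for illustration, not taken from mlrose.

```python
import math


def circle_area(radius):
    """Calculate the area of a circle.

    Parameters
    ----------
    radius : float
        Radius of the circle. Must be non-negative.

    Returns
    -------
    float
        Area of the circle.

    Raises
    ------
    ValueError
        If ``radius`` is negative.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return math.pi * radius ** 2
```

The section headings and their underlines are what let a documentation tool turn the docstring into separate, formatted blocks for parameters, return values and exceptions.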
Running sphinx over this code produces the following nicely
formatted documentation:
You may never get your code to the point where no one is ever
going to complain about it, but at the very least, you won’t
embarrass yourself by trying to fit a regression model, one line at a
time, in the S-Plus console.