How Great Data Scientists Can Stop Writing Bad Code

Christopher Kelly
Dec 12, 2017 · 5 min read
https://heartbeat.fritz.ai/how-great-data-scientists-can-stop-writing-bad-code-9b054eb62b75
https://xkcd.com/1513/
I’m a software engineer by trade. But I seem to keep dipping my toes into
the machine learning and data science communities. There are some
awesome libraries out there, but I admit I’ve had a difficult time wrapping
my head around many of them. Surprisingly, it’s not the math. It’s actually
the code itself that I find myself getting stuck on. It’s understandable.
Tensor algebra is hard enough without needing to worry about style guides
and code reviews.
Your code is the first impression others have of your hard work. A good first
impression will pique interest and encourage engagement with the tools you
build.
The other day I was reading some code from Apple that infers shapes from
Core ML neural network files. It’s great to see Apple getting their hands
dirty in the open source community. But I got stuck trying to understand
this one function, so I decided to document my process of figuring it out.
def _crop(layer, shape_dict):
    params = layer.crop
    Seq, Batch, Cin, Hin, Win = shape_dict[layer.input[0]]

    l = r = t = b = 0
    if len(layer.input) == 1:
        if len(params.cropAmounts.borderAmounts) != 0:
            t = params.cropAmounts.borderAmounts[0].startEdgeSize
            b = params.cropAmounts.borderAmounts[0].endEdgeSize
            l = params.cropAmounts.borderAmounts[1].startEdgeSize
            r = params.cropAmounts.borderAmounts[1].endEdgeSize
        Hout = Hin - t - b
        Wout = Win - l - r
    else:
        Hout = shape_dict[layer.input[1]][3]
        Wout = shape_dict[layer.input[1]][4]

    shape_dict[layer.output[0]] = (Seq, Batch, Cin, int(Hout), int(Wout))
. . .
Can you read this and understand what’s happening? If so, you are smarter
than me. This code doesn’t really provide much context for anything. Here
were my thoughts reading through this function the first few times:
def _crop(layer, shape_dict):
    # Okay. Cool. We have some function Crop. I guess it takes a layer and a shape_dict.
    # Wonder what _crop will do. Let's. find. out.

    params = layer.crop
    # cool. there are some crop params. I wonder what they're used for? I guess we'll
    # find out eventually.

    Seq, Batch, Cin, Hin, Win = shape_dict[layer.input[0]]
    # What are Cin, Hin, Win? Are these all classes because they're capitalized?

    l = r = t = b = 0
    # l, r, t, b. What is this? ah, left right top bottom would make sense.
    # What do they refer to? The size of the input layer image? Why are they
    # all zero initially?

    if len(layer.input) == 1:
        # Okay, so if there's only one input blob we do this. Is this
        # important in some way? Why would there only be one input blob?

        if len(params.cropAmounts.borderAmounts) != 0:
            # ah, here's the params settings. I guess the layer params tell us about
            # how the layer is cropped.
            t = params.cropAmounts.borderAmounts[0].startEdgeSize
            b = params.cropAmounts.borderAmounts[0].endEdgeSize
            l = params.cropAmounts.borderAmounts[1].startEdgeSize
            r = params.cropAmounts.borderAmounts[1].endEdgeSize
        Hout = Hin - t - b
        Wout = Win - l - r
        # Ahh so here's what we're trying to compute. The Hout and Wout (which I will
        # assume is output height and width).
    else:
        # Okay "else", wait, what's my initial condition again? Ah if there is one input layer.
        # I guess there can be more than one.
        Hout = shape_dict[layer.input[1]][3]
        Wout = shape_dict[layer.input[1]][4]
        # Since we're only accessing the first index of the input layer, I guess there are only
        # two possible? I wonder if there are more than two if that's a problem. /shrug

    shape_dict[layer.output[0]] = (Seq, Batch, Cin, int(Hout), int(Wout))
    # ahh so now I see that we're modifying what looks like the first output layer with the new
    # values we computed. Interesting that we're casting them as an int. I wonder if they're
    # ever not integers? Do we lose something because of that?
By the very end I was able to figure out what this function is trying to do:
compute and update the output shape of a crop layer. To solve the mystery, I
dusted off my deductive reasoning skills and put them to work.
I read code nearby to see how and where this function is used. I had to
search for similarly named variables in other code to get more context
for what they mean. Ultimately, I had to comprehend way more than
just this function to figure out these 12 lines of code. This experience is very
common. To understand even a small part of the pie, you need to know how
the whole pie was baked.
So, what does this code really do? This wonderful snippet in the Core ML
documentation gives us a clue:
The cropping layer have two functional modes:

- When it has 1 input blob, it crops the input blob based
on the 4 parameters [left, right, top, bottom].
- When it has 2 input blobs, it crops the first input blob based
on the dimension of the second blob with an offset.
Aha! There are two functional modes with different behaviors for each
mode. The _crop function is beginning to come into clearer view. Let’s
rewrite the function in a way that is much more declarative about the world
it lives in:
def _compute_crop_layer_output_shape(layer, shape_dict):
    """Compute the output shape of a crop layer from its configuration.

    There are two functional modes of the crop layer. When it has 1 input
    blob, it crops the input blob based on the 4 parameters
    [left, right, top, bottom]. When it has 2 input blobs, it crops the
    first input blob based on the dimension of the second blob with an offset.

    Args:
        layer: Crop layer
        shape_dict: dictionary of model layer shapes.

    Returns:
        Tuple containing output dimensions for Crop Layer.
    """
    seq, batch, input_channel, input_height, input_width = (
        shape_dict[layer.input[0]]
    )

    if len(layer.input) > 2:
        raise Exception('Crop does not accept more than two inputs.')

    if len(layer.input) == 2:
        # When it has 2 input blobs, it crops the first input blob based
        # on the dimension of the second blob.
        second_input_shape = shape_dict[layer.input[1]]
        output_height = second_input_shape[3]
        output_width = second_input_shape[4]
        return (seq, batch, input_channel, output_height, output_width)

    crop_amounts = layer.crop.cropAmounts.borderAmounts
    if not crop_amounts:
        # If there are no border adjustments, return the original layer shape.
        return shape_dict[layer.input[0]]

    top, bottom = crop_amounts[0].startEdgeSize, crop_amounts[0].endEdgeSize
    left, right = crop_amounts[1].startEdgeSize, crop_amounts[1].endEdgeSize

    output_height = int(input_height - (top + bottom))
    output_width = int(input_width - (left + right))

    # Cropping only changes height and width, so the channel count carries over.
    return (seq, batch, input_channel, output_height, output_width)
Reading this, hopefully the intent of the function is obvious. It’s definitely
not perfect, and we could spend hours going back and forth on its
construction; however, I would be willing to bet that you feel more
comfortable explaining what this function does and maybe more
empowered to make a change to this code.
. . .
No single letter variable names

While single letter variables are acceptable in some cases (indexes in an
array for instance), their proliferation makes code extremely difficult to
read. A single letter variable forces the reader to trace it back to its
definition and check that it wasn’t changed along the way.
Variable names should state what they represent, not how or why. Single
letter names cannot possibly convey what they represent.
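To make the difference concrete, here is a minimal sketch of the same arithmetic written twice. Both function names and the example numbers are made up for illustration; neither comes from the Core ML code.

```python
def crop_terse(h, w, t, b, l, r):
    # The reader has to guess that t, b, l and r are border sizes.
    return h - t - b, w - l - r


def compute_cropped_size(input_height, input_width, top, bottom, left, right):
    """Return (height, width) after trimming the given border sizes."""
    output_height = input_height - (top + bottom)
    output_width = input_width - (left + right)
    return output_height, output_width


# Both calls compute the same thing; only the second says what it computes.
terse_result = crop_terse(224, 224, 10, 10, 5, 5)
named_result = compute_cropped_size(
    input_height=224, input_width=224, top=10, bottom=10, left=5, right=5
)
print(terse_result, named_result)
```

The descriptive version is longer, but a reader can check the call site without opening the function body.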
Be explicit rather than implicit

The first version only implies that the crop layer has two modes. I had to
put my deductive reasoning skills to work to figure that out.
Making the details of the crop layer explicit in the code makes it clear what
the function does and does not do. If you find yourself deep inside nested
if-statements, ask yourself, “what conditions do I know about that the reader
might not know?” By explicitly saying what is handled rather than what is
not handled, you will find code easier to understand.
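As a small sketch of what “saying what is handled” can look like, the hypothetical helper below names each supported case and rejects everything else. The function name and the mode descriptions are my own; only the one-input and two-input modes come from the Core ML documentation quoted earlier.

```python
def describe_crop_mode(input_count):
    """Name the crop mode for a layer with `input_count` input blobs."""
    if input_count == 1:
        return "crop by border amounts"
    if input_count == 2:
        return "crop to the size of the second input"
    # Anything else is unsupported, and the code says so out loud
    # instead of silently falling into an else branch.
    raise ValueError(f"Crop expects 1 or 2 inputs, got {input_count}.")


print(describe_crop_mode(1))
print(describe_crop_mode(2))
```

Each branch states its own condition, so a reader never has to scroll back up to remember what “else” means here.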
“Functions should do one thing, and do one thing well” -Craig Lancaster -Me
My mentor, Craig, said this to me many, many times and I think it cannot be
repeated enough. If your function truly does one thing well, it should be
possible to communicate what that one thing is in the function name.
Name your function in a way that conveys what it does without having to
read its contents.
The function name _compute_crop_layer_output_shape communicates the
intent without hiding any surprises inside. If you have a difficult time
succinctly describing the intent of a function, it’s probably time to split it up
into two (or more) functions that each do one thing well.
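The original _crop did two things at once: it worked out a shape and it wrote that shape into shape_dict. A minimal sketch of splitting those two jobs might look like the following. The function names, the blob names, and the numbers are all hypothetical and chosen only for illustration.

```python
def compute_output_shape(input_shape, top, bottom, left, right):
    """Return the shape after cropping the height and width borders."""
    seq, batch, channels, height, width = input_shape
    return (
        seq,
        batch,
        channels,
        height - (top + bottom),
        width - (left + right),
    )


def store_output_shape(shape_dict, output_name, output_shape):
    """Record the computed shape under the output blob's name."""
    shape_dict[output_name] = output_shape


shape_dict = {"image": (1, 1, 3, 224, 224)}
cropped = compute_output_shape(
    shape_dict["image"], top=10, bottom=10, left=5, right=5
)
store_output_shape(shape_dict, "cropped_image", cropped)
print(shape_dict["cropped_image"])
```

The first function has no side effects, so it can be tested on its own, and the second makes the one mutation easy to find.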
AI Index: 2017 Annual Report, http://www.aiindex.org/2017-report.pdf
The data science community is exploding right now. The popularity of AI
and ML software is growing exponentially, and machine learning models are
making their way into everyday products. As machine learning becomes more
mainstream, strong communication is key. Communicating code effectively
to other engineers is a great place to start, and it will help the community
grow even more quickly.
Readable code makes bugs easier to spot. Readable code makes it easier for
others to get involved. Readable code lets us spend our precious time
solving interesting problems.
The code we are writing is ultimately not for ourselves. It is used and improved
by many other people to solve problems in new ways.
As the data science community grows and machine learning is used more to
create fresh experiences, comprehensible code is necessary to increase the
velocity of the community.