[MUSIC PLAYING]
There's another term of art we can toss here, like [SNIFFS] something
smells kind of funky about this code.
This is an actual term of art.
There's code smell here.
Something smells a little off.
Why?
What do you think?
AUDIENCE: [INAUDIBLE]
Any instinct?
Yeah?
Yeah, so outside the for loop.
So indeed, I can just go below line 8 and above line 9, creating a new one.
And now it's totally fine to just print a new line like that.
You don't have to print anything else with it.
It's indeed a character unto itself.
So let's do make mario one last time, ./mario.
OK, so now we're back in business there.
Well, what if we wanted to do some other scene from "Mario," such as this one
here where there's a lot of vertical obstacles like these bricks here?
If I wanted to print out now a column of three bricks--
and I'll use hashtags for these instead of anything graphical-- well,
I think we're almost there, right?
I think I can now--
it's almost maybe a little easier.
I can go back here, change the question mark
to something that looks more like a brick, like this hash symbol.
And I think now I do want the new line character because when I now do make
mario, ./mario, OK, there's my wall of four.
Oh, but wait.
I didn't want four.
I wanted to be consistent just with this particular scene here,
so I just want three.
So I can still change it in one place.
And here, again, is that paradigm.
Even whether you're using 4 or 3, if you get
into the habit of starting counting from 0,
you go on up to but not through the value you want to count up to.
So that's why I'm using less than instead of less than or equal to there.
So this would be the common paradigm, though you could count it
like we saw earlier in different ways.
But what if things escalate one level further?
And when you're in the underground version of "Super Mario Brothers,"
there's a lot of these underground obstructions,
including grids of bricks like this.
And let me conjecture that if you slice this up, it's roughly a 3
by 3 grid of bricks that all interlock prettily to give us just one
big, large brick like this.
So if I want to print out a 3 by 3 grid, now things are getting a little more
interesting because up until now, I've printed either one row horizontally
or one column vertically.
But we haven't really seen any code where
I'm printing or living in two different dimensions like the game would imply.
But let me propose that we could do this.
Let me go ahead and say, all right, suppose I want to print a 3
by 3 grid of bricks.
It's really that I want to print, what, three rows of bricks.
A grid is three rows.
So if I take the high-level idea and reduce it
to something a little simpler, how do I do that?
Well, let me get rid of the printf for a moment as I did.
And let me just stipulate that this for loop,
even though it doesn't do anything useful yet,
will do something how many times just by design?
All right, three times.
This for loop is good to go.
It will do something three times by just using i to do the counting.
All right, well, if I want to print out now a row of three bricks all
on the same line, that's pretty similar to what
we did earlier when I just wanted to print out
four question marks in the sky.
So we've seen a solution there.
And I daresay we can compose one into the other.
So if I want to print out a row of bricks,
I could just do this: for int i gets 0, i less than 3, i++,
and then inside of this inner loop, if you will,
let me print out a single brick like this.
And then, I don't like where this is going, but I think I've taken two ideas
and I've combined them.
But what might be problematic about lines 5 and 7 at the moment?
What might be bad here?
Yeah, in back?
AUDIENCE: You used the same integer i.
DAVID J. MALAN: Yeah, I'm using the same integer i, which
I feel like could get me into trouble.
If I'm trying to count three things here,
but then I'm hijacking this variable and using it inside of the loop,
I feel like I should avoid this collision of names.
And so what's a good alternative to i?
Well, a programmer, if nesting loops in this way,
would pretty commonly go with j.
You could certainly change this to be rows and columns
if you want more descriptive variables.
But i and j is pretty canonical.
So I'm going to go ahead and do this, j instead of i everywhere in the inner loop, including j++ instead of i++.
And let me try compiling this.
So make mario, Enter, ./mario.
OK, so a couple of things are wrong here.
This is not a 3 by 3 grid.
But if you count these things, how many did I indeed print at least?
You can probably just guess logically.
AUDIENCE: Nine.
DAVID J. MALAN: Yeah, there's nine hashes there.
Unfortunately, they're all on the same line
instead of on three different lines.
So where logically can I fix this?
I'm definitely printing all the bricks.
They're just not on the right levels.
Yeah?
AUDIENCE: If you put a new line at the first loop,
then you'll get three separate lines.
DAVID J. MALAN: Yeah.
So put a new line after the first loop, this inner loop, if you will,
the nested loop, if you will.
So let me go ahead and print out just a backslash n here.
And what's this doing?
Well, I think that's going to solve it by just
moving the cursor to the next line after you've done one row.
So let me go ahead and do make mario, Enter, ./mario,
and now we're in business.
So it's a very simplistic version of this same graphic,
but I'm leveraging two different ideas now--
or the same idea twice rather now.
I'm using one loop to control my cursor going row, by row, by row.
But then within that loop, I'm doing left to right,
dot, dot, dot, dot, dot, with printing out
each of these individual bricks like this.
Now, there's a little sloppiness here still.
If I want this to always be a square just because that's
what it looks like in the game, well, I could change it to be a 4 by 4
square by doing this or a 5 by 5 grid--
whoops-- by doing this.
Why is this perhaps not the best design to just keep changing the numbers when
I want to change the size?
Where could this go awry?
Yeah?
AUDIENCE: If it's a square, [INAUDIBLE]
DAVID J. MALAN: Yeah.
If it's always going to be a square and height
is going to be the same as width, I'm just inviting trouble here, right?
Eventually, I'm going to screw up.
I'm going to change one but not the other.
Then it's going to come out to be a rectangle instead of a proper square.
So I should probably solve this a little differently.
So let me do that.
At the top of my main function here, let me go ahead and give myself
a variable called maybe n for the number of bricks I want horizontally
and vertically.
And I'll just initialize that to 3 initially.
And instead of putting 3 here, I'll literally just use n.
But I'll do it in both places so that now, henceforth,
if I ever want to change this and change it to 4, or 5, or anything else,
I'm all done.
It's better designed because there's a lower probability of mistakes.
But I could technically still screw up somehow.
I could technically accidentally write a line of code like n++,
or I could just change the value of that variable even though I don't want it
to ever change.
And maybe it's because I'm a bad programmer, I copy/pasted wrong,
or I'm working with someone who doesn't know what n represents.
I can defend myself and my code against human error
like that by going up here to line 5.
And instead of just declaring a simple variable like we did in Scratch,
I can further harden my code, so to speak,
by declaring it to be a constant using the keyword const.
Now, this is just a feature of C and some other languages
to protect you against yourself by proactively saying,
n is a constant, specifically the number 5 or, previously, the number 3.
You cannot accidentally write code elsewhere that changes it.
The computer will throw an error and catch that error.
So it's just a way of programming a little more defensively.
Some languages have this.
Some languages don't.
But in general, it's a good practice.
It makes your code better designed because it's just
less vulnerable to mistakes by you, colleagues, or anyone else
using the code.
So let me change this back to 3 just to be our default.
But now I'm using n in both places.
And if I do make mario, ./mario, we're back to where we originally started.
But the code is a little better designed.
And let me note this too.
All this time, I've been mentioning that correctness is important.
Design is important.
There is also this matter of style.
I've been very deliberately writing pretty code,
if you will-- not just the syntax highlighting, which is automatic.
But notice that I keep indenting everything nicely.
Any time I have curly braces, like on lines 4 and 14,
everything is indented one level.
When I have additional curly braces on lines 7 and 13,
everything is nicely indented as well.
Technically speaking, the computer does not care about that kind of whitespace,
so to speak.
And you could really make a mess of things
like this because you have a strange sense of style
or just because you're being a little sloppy.
But this code is actually still correct.
If I recompile it-- let me open up my terminal window--
make mario, no errors, ./mario, it works perfectly fine.
But you can imagine just how annoying this now
is to read, certainly for a TA, but certainly for you the next day,
certainly for a colleague who has to read your code.
This is just bad style.
It still works, and it's well designed in that you're writing code
defensively, you're using a constant.
But, my god, the style is atrocious.
Now, you'll often find that there's tools
that can help you format your code for you
in a manner consistent with a course's or a company's style.
But this is the kind of muscle memory you'll want to develop over time too.
Take these VS Code suggestions as it's outputting lines of code for you
because it's trying to format your code in a readable way.
And, oh, my god, if and when you do have bugs in your code
and things aren't even indented properly,
there's no way you the human are going to be able to wrap your mind around
what's happening and where.
You're just making the problem harder for yourself.
So do get into this habit too of manifesting good style as well.
All right, well, let me propose that we don't only want a 3 by 3 grid.
We want this to be a little more dynamic.
So suppose we moved away from a constant to just using an integer called n.
And let's ask the user for the size of this grid
by prompting them with get_int, as we've done before.
And I'll store it in n here.
And then I can go ahead and, more dynamically,
run make mario to compile it-- whoops.
Oh, I screwed up accidentally.
What is it suggesting I do, albeit cryptically?
AUDIENCE: You have to include the cs50.h.
DAVID J. MALAN: Yeah, I forgot to include the CS50 header file up top.
And that's why it doesn't know that get_int is, in fact, valid.
So that's an easy fix.
I'm just going to go up here and include cs50.h.
Now I'm going to clear my terminal and rerun make mario.
Now we're good-- ./mario.
And now notice I'm prompted for size.
So if I type in 3, it's the same as before.
If I type in 10, it's even bigger, but it happens all now automatically.
But there are some things that we're not detecting.
For instance, suppose I type in cat.
Well, that's handled by the get_int function, as I claimed earlier.
That's one of the features of using a library.
You don't have to deal with erroneous input.
But we only designed a function called get_int to get you an integer.
We don't know if you want it to be positive, negative, zero,
or some combination thereof.
And it's kind of weird to allow the user to type in negative 1
for the size of the grid or negative 3 for the size of the grid.
And indeed, your code does nothing, so at least it's not crashing.
But that's kind of stupid, right?
It'd be nice to force the user if they want
a grid to give us a positive value.
So how could we do this?
Well, I could go up here and I could say something like if n is less than 1--
so if it's 0 or negative, which I don't want, what could I do?
Well, I could say, well, prompt the user again for the size.
And now notice, I'm not declaring n again because once it exists,
you don't have to mention the data type again.
We said that earlier.
But this is kind of stupid.
Why?
Because now when you've given the user a second chance, OK, now maybe
I'll do-- all right, if this version of n is less than 1, well,
let's just go and prompt the user a third time.
I mean, you can see where this is stupidly going.
This can't be the right solution, to keep typing
the same thing again and again.
Where would it stop?
You'd have to give them a finite number of chances
or just make a mess of your code.
So what would be intuitively a better solution here?
AUDIENCE: A while loop.
DAVID J. MALAN: Yeah, so some kind of loop.
We've seen a while loop.
We've seen a for loop, so maybe one of those.
So let me try this.
Let me delete this messiness and just go back to the first question.
And let me do this.
So while n is less than 1--
so while the number is not what we want--
let's just prompt the user in a loop this time for the size again.
Now, here too, this is better because it's only two requests for information.
But clearly, lines 6 and 9 are pretty much identical other than the int.
And if I went in and changed the size, if I
add this, if I change the wording here, change it to a different language,
I have to change it in two places.
That's bad.
Copy/paste, bad.
So what might be better?
Well, it turns out, there's another paradigm in C
that you can use that gets around this problem, this duplication of code.
It would be much nicer if I just write the code once.
And I can do that using a third type of loop called a do while loop.
So it turns out, in C, you can do this.
If you want to get the value of a variable like n,
you first just create the variable without an initial value.
So int n semicolon means we don't know what value it has yet.
But that's OK.
We're going to assign a value to it eventually.
Then I'm going to say this, do, literally.
I'm going to open my curly braces.
And what do I want to do?
I want to assign to n the return value of get_int,
prompting the user for size.
Well, when do you want to do that?
I want to do that while n is less than 1.
And this code now achieves the exact same goal,
but by never repeating myself.
Why?
Well, notice on these lines of code now, I'm literally saying on line 6,
give me a variable called n of type integer.
It doesn't have a value initially, but that's fine.
You can do that.
Line 7 says, do the following.
What do you want to do? get_int, prompting the user with the word size,
and just store that value in n.
But because C code runs top to bottom, left to right,
now it's reasonable on line 11 to ask that question, OK, is
the current value of n, which it definitely got on line 8, less than 1?
And if the user didn't cooperate-- they typed in 0, or negative 1,
or negative 3-- what's going to happen?
It's going to go back up here and repeat, repeat, repeat everything
in the do while loop.
So a do while loop in C--
which is not something some other languages have.
Python, if you know it, does not have a do while loop.
This is perhaps the cleanest way to achieve this,
even though it's a little weird that you have to declare your variable,
create your variable up top, and then check it down below.
But otherwise, it's similar to a while loop.
It just flips the order in which you're asking the question.
Any questions on this construct?
And do while, in general, is super useful when
you want to get input from the user and make
sure it meets certain requirements.
So all right, now that we have this building block after that interlude,
how can I go about cleaning up this code?
And then let's conclude by taking a look at things that our code can't do
or can't do very well or correctly.
Let me propose that in a final version of Mario,
let me just add what are called now some comments.
So it turns out, in code in C, you can define
what are called comments, which are just notes to self.
Some of you discovered these in Scratch.
There's little yellow sticky notes you can
use to add citations or explanations.
In C, there's a couple of ways to write comments.
And in general, comments are notes for yourself, for your TA,
for your colleague as to what your code is doing and why or how.
It's a little explanatory note in English
or whatever your human language might be.
So for instance, what I might do here in my implementation
of this version of mario, I might first ask myself a question like--
I might first make a note to self like this on a new line,
above this first block of code, Get size of grid.
It's just an explanatory remark in any terse English
that generally explains the next six or so lines, the next chunk
or block of code, if you will.
It would be a little excessive to comment every single line.
At some point, the programmer should know what individual lines of code do.
But it's nice to be able to glance at this comment on line 6
that starts with two slashes, and it gets grayed out
because of syntax highlighting.
It's not logic.
It's just a note to self.
It generally gives me a little cheat sheet
as to what the following lines of code should be doing and/or why.
And then down here, well, there's a second block of code
that's a bunch of lines.
But together, this just, what, prints grid of bricks.
And so it's another comment to myself that
just makes it a little more understandable what
these 20-some-odd lines of code are doing by adding
some English explanations thereof.
But now that I have these, wouldn't it be nice
if I could abstract these pieces of functionality away, this
getting of the size and this printing of the grid?
In other words, suppose that you didn't know where to begin with this problem.
And the problem at hand were literally implement
a program that prints a grid of bricks of some variable size--
3, or 4, or 5, or whatever the human types in.
If you have really no idea where to start,
comments are actually a good way of getting
started because comments can be an approximation of what we called last week
pseudocode.
Pseudocode is terse English that gets your point across, like for the phone
book searching like last time.
So if you didn't really know where to begin,
you could do something like this.
I could, for instance, just say, Get size of grid as my first step
and then Print grid of bricks as my second step.
And that's it for my program thus far.
This is now implemented in pseudocode.
I have some massive placeholders there.
I still have work to be done.
But at least I have a high-level solution to the problem in comments.
And now I can even go this far.
I could say, well, let's suppose that there's just a function already
that exists called get_size.
I could do something like this.
I could do int n equals get_size.
And now I just have to assume for the moment
that some abstraction called get_size exists.
It doesn't.
This does not come with the CS50 library.
But I could invent it, I bet.
How else might I proceed?
Well, let's just assume for the moment that there's also
a function called print_grid that just prints a grid of that size n.
So here too is an abstraction.
These puzzle pieces don't exist.
These functions don't yet exist.
But in C, just like in Scratch, I can create my own functions.
How do I do that?
Well, let me go down later in the file.
And by convention, you generally want to leave main at the top of your code.
Why?
Because it's the main function, and it's just where
the human eye is going to look to see what some file of code does.
And let me do this.
I want to create a function of my own called get_size whose purpose in life
is to get the size that the user wants.
I want this function to return an integer.
And the syntax for doing that is this, right,
similar to a variable, the data type that this function returns.
I don't need this function to take any inputs.
And so I'm going to use a new keyword that we've actually been using thus
far-- more on it another time-- just called
void, which just means this get_size function does not take any inputs.
It does have an output.
It outputs an int.
And this is just the weird order in which you write it.
You write the output format, the name of the function, and then the inputs,
if any, inside of parentheses.
And now I can implement get_size.
But I've already implemented get_size.
Or at least now at this point in the story,
I at least know concretely what to do.
And I could figure out eventually, with some trial and error
perhaps, all right, if I declare a variable
and I do the following n equals get_int, prompting the user for size,
and I keep doing that while n is less than 1, once that block of code
is done, here is a new keyword in C where you can return that value n.
So I keep referring to these values that some functions return as return values.
In C, there's literally a keyword called return
that will hand back to any function that uses
that function the value in question.
So in a nutshell, between lines 15 and 21 now,
here is some code identical to our solution earlier that gets a value
n from the user that is positive.
It's 1, or 2, or higher.
It's not 0, or it's not less than 1.
And as soon as we've got that value, we hand it back as a return value.
Notice how I'm using this function on line 7.
Just like with get_int, just like with get_string,
I'm calling the function-- nothing in the parentheses in this case.
But then I'm using the assignment operator
to copy whatever its return value is into my variable n.
And so now I have a function that didn't use
to exist called get_size that gets me a positive integer no matter what.
And now for the grid, how do I do this?
How do I invent a function called print_grid
that takes a single argument, a number, and prints a grid of that size?
Well, let's go down here.
I'm going to write the name of this function print_grid.
This function just needs to print.
It has a side effect, as we keep saying.
So I'm just going to say it has no return value.
It's just void.
It doesn't have an output, per se.
It's just an aesthetic side effect.
But it does take in an argument.
An argument is an input, and the syntax for this in C
is to name the type of the input it takes and the name of the variable.
And I could call this anything I want.
I'll call it size.
I could call it n.
And it's OK to use the same variable in different functions,
but I'll call it size just to be distinct.
And then in this function, I'm just going
to copy from memory the same code as before: for int i gets 0,
i less than size--
instead of 3-- i++, inside of this, for int j gets 0, j less than size, j++,
and inside of that, print out with printf a single hash,
print out after that loop a single new line, and that's it.
Now, I did this fast, admittedly.
But it's the same code that I wrote earlier.
But now, just like I did with Scratch, let
me just arbitrarily hit Enter a bunch of times
to move the code out of sight, out of mind.
Now I have abstractions.
I have puzzle pieces that now exist called get_size and print_grid,
syntax for which takes some getting used to, but they now just exist.
Except I do need to do one thing.
Because C is a little naive, if I try to do make mario now and hit Enter,
implicit declaration of function get_size is invalid.
And we've seen that before when I hadn't included a file.
When I hadn't included the CS50 library, get_int didn't work.
But that's not the issue here because this is not from a library.
I just invented this.
C takes you literally.
And if you define these functions at the bottom of your file,
they don't exist on line 7 or 10.
So I could do this.
I could, all right, fine, well, let me just highlight all of this,
cut to my clipboard, and paste it up here.
This would solve the problem.
I could just move all of those functions to the top of my file.
That's annoying because now main is at the bottom of the file.
It's going to take longer to find it.
That's not a clean solution.
So let me put it back where it was at the bottom.
And let me do this.
This is the only time in CS50 and, really, in C programming
where copy/paste is reasonable.
If you copy and paste the first line of code from each function
and then end it with a semicolon, you can tease the compiler
by giving it just enough of a hint at the top of the file
that, OK, these functions don't exist till down later.
But here's a hint that they will exist.
This is how you can convince the compiler to trust you.
So those other functions can still be lower in the file, below main.
But now when I do make mario--
oh, damn it.
Oh, I said print instead of printf.
That's my bad-- printf.
So if I do make mario, ./mario, now I can type in 3,
and we're back in business.
Now, this was a very heavy-handed and long way
to get to a much more complicated solution.
But this solution, in some sense, is better designed.
Why?
Because now, especially without the comments,
I mean, look how short my code is.
My main function is literally two lines of code.
Why?
Well, I factored out the juicy stuff into its own functions.
And now, especially if I'm working with colleagues or others,
you could imagine splitting up large programs into smaller parts,
having different people implement different parts,
so long as you all agree in advance on what those inputs and those outputs
actually are.
All right, so let's now consider what computers can do well and not so well.
C indeed supports a whole bunch of operators, mathematically,
via which we can do addition, and subtraction, multiplication, division,
and even calculate the remainder when you divide one number by another.
In fact, why don't we go ahead and use these in a very simple program
and make our very own calculator?
So let me go over here to VS Code.
Let me go ahead and create a new file called calculator.c.
And in this file, let's go ahead and first include
a couple of now familiar header files-- cs50.h as well as stdio.h.
Let's go ahead then and declare main with int main(void).
And then inside of main, let's do something relatively simple.
Let's declare an int and call it x, and set
it equal to whatever the return value is of get_int,
prompting the user for a value for x.
Let's then give ourselves a second variable.
We'll call it, say, y.
Set that equal to the return value of another call to get_int,
prompting the user this time for that value y.
And then let's very simply go ahead at the very end
and just print out, say, the sum of x plus y, a super simple calculator.
So I'll use printf, quote/unquote, %i for integer,
backslash n to give me the new line.
Then I'm going to go ahead and do x plus y to indeed print out the sum.
Let me go down to my terminal window now.
Let me do make calculator in order to compile the code.
No error messages, so that's good.
Let me do ./calculator.
And let's do something like 2 plus 2, which, of course, should equal 4.
And it does.
But it turns out that sometimes there are going to be limitations
that we bump up against.
And let me get a little more ambitious here.
Let me clear my terminal window.
And let me go ahead and rerun calculator again.
And this time, let's type in, oh, 2 billion for x, and let's type in the same for y.
And, of course, now the answer of 2 billion plus 2 billion
should, of course, be 4 billion.
And yet, it's not.
So curiously, we see, of all things, a negative number
here, which suggests that somehow the plus operator doesn't quite
work as well as we might like.
Now, why might this actually be?
Well, it turns out that inside of your computer is, of course, memory, or RAM,
Random Access Memory.
And depending on the size of your computer and the type of computer,
it might very well look a little something like this--
a little circuit board with these black little modules on it
that actually contain all of the bytes of your computer's memory.
Unfortunately, you and I only have a finite amount
of this memory inside of our computers, which
means no matter how high we want to count,
there's ultimately going to be a limitation on how high we
can count because we only have a finite amount of memory.
We don't have an infinite number of zeros and ones to play with.
We have to actually be bounded ultimately.
So what's the implication of this?
Well, it turns out that computers typically use
as many as 32 bits, zeros or ones, to represent something
like an integer, or in C, an int.
So for instance, the smallest number we could
represent using 32 bits, of course,
would be zero--
32 zeros like this here.
And the biggest number we could represent
is by changing all of those zeros to ones, which, in this case,
will ideally give us a number that equals roughly 4 billion in total.
It's actually 4,294,967,295 maximally if you set all 32 of those bits to ones
and then do out the actual math.
The catch, though, is that we humans and computers in general
also sometimes want to and need to be able to represent negative numbers.
So if you want to represent negative numbers as well as positive numbers
and 0, you can't really just start counting at 0
and go all the way up to roughly 4 billion.
You've got to split the difference and maybe
allocate half of those patterns of zeros and ones to negative numbers
and the other half roughly to positive numbers.
So in fact, in practice, when you're using even as many as 32 bits,
the highest most computers could count, certainly in a program like this in C
using an int, would be roughly 2 billion.
That is 2,147,483,647.
But the flip side of that is that we could also now,
using different patterns of bits, represent negative numbers as low
as negative 2 billion, give or take.
But the implication then, of course, is that if we only
have a finite number of bits and can only count so high, at some point,
we're going to run out of bits, so to speak.
In other words, we encounter what's generally known as integer overflow
where you want to use more bits than you have available.
And as a result, you overflow the available space.
What does this mean, in fact, in real terms?
Well, let's suppose that you only have three bits,
but I'm going to gray out a fourth bit just
to convey where we'd like to put an additional bit ultimately.
If this, of course, is 0, per week 0's discussion,
this is 1, 2, 3, 4, 5, 6, 7.
Now, ideally, in binary, if you want to add one more to this value 7,
you're going to have to carry the 1 mathematically,
and that would ideally give 1000.
But if you don't have four bits and your computer is only sophisticated enough
to have three bits, not even 32, but three,
the implication is that you're effectively representing not 1000,
but rather, 000.
There's just no room to store that fourth bit
that I've grayed out here, which is to say that your integer might overflow.
And as soon as you get to 7, the next number once you add 1
is actually going to be 0, or worse, as we've seen here
in my code, a negative value instead.
So what could we do to perhaps address this kind of concern?
Well, C does not have just integers or ints.
It also has longs, which, as the name suggests,
are just longer integers, which means they have more bits available to them.
So let me go back into my code here.
I'll clear the terminal window.
And let me go ahead and change my integers to
literally long here, long here.
I'm going to have to change my function in CS50's library
to be not get_int, but get_long.
And that's indeed another function we provide in the library.
Let me change this get_int to get_long as well.
I'll keep my variable names the same, but I do need to make one other change.
It turns out that printf also supports other format codes--
so not just %i for integers or %s for strings, but also, for instance,
%li for a long integer, as well as %f for floating-point values with
decimals.
So with that said, let's go ahead and change my printf line to be not %i,
but %li.
Now let me go ahead and do make calculator again, Enter--
no apparent errors now-- ./calculator.
And 2 plus 2 still equals 4 as before.
But now if I do calculator again, and let's do 2 billion
again as well as 2 billion for y, previously, we
overflowed the size of an integer and got some weird negative number
because the pattern was misinterpreted, if you will, as a negative number
instead.
But a long, instead of using 32 bits, conventionally
uses 64 bits, which means we have more than enough spare bits
to go when we add 2 billion plus 2 billion.
And now, in fact, we get the correct answer of 4 billion,
which does fit inside of the size of a long.
Now, a long can count up quite high.
And, in fact, it can count as high as this, 9 quintillion.
And so that will give us quite a bit more runway.
But, of course, it too is ultimately going to be finite.
So if you have numbers that need to go bigger than that,
you might still very well have a problem.
Now, there's another problem that we might run into as well.
And we can see it in the context of even this simple calculator.
Computers also suffer from potentially what's called truncation,
where especially when you're doing math involving floating-point values-- that
is, numbers with decimals-- you might accidentally, unknowingly truncate
the value-- that is, lose everything after the decimal point.
So in fact, let me go back to VS Code here.
I'll clear my terminal window.
And let's still use longs, but let's go ahead and use
division instead of addition here.
So let me change this plus to a divide operator.
Let me go ahead and recompile the code down here with make calculator.
Let me go ahead and run ./calculator, and let me go ahead and do something
like 1 for x and 3 for y.
And we'll see that--
well, wait a minute.
1 divided by 3, I learned, should be 1/3.
But in a floating-point value, that should be 0.33333, maybe
with a little line over it in grade school,
but, really, an infinite number of threes.
And yet, we seem to have lost even one of those threes after the decimal point
because the answer is coming back here as just 0.
So why might that be?
Well, if I know that two integers, when divided one by the other,
are supposed to give me a fraction, a floating-point value
with a decimal point, I can't continue to use integers or even,
in this case, longs, which do not have support for decimal points.
So let me go ahead and change this format code here from %li to %f,
which is, again, going to represent a floating-point value instead of a long
integer or even an integer.
And let me go ahead further and define maybe a third variable, z, as a float
itself.
So I'll give myself a variable z equals x divided by y.
And now rather than print x divided by y, let's just go ahead and print z.
So now I'm operating in a world of floating-point values
because I proactively recognized that a long or an int divided
by another such value, if it's meant to have a fraction,
needs to be stored in a floating-point value, something with a decimal point.
Well, let me go down to my terminal window here and rerun make
of calculator-- seems to work OK-- ./calculator,
and let's do 1 divided by 3 again.
And still here, we see all zeros.
So we do at least see a decimal point, so we've made some progress thanks
to the %f and the float.
But it seems that we've already truncated the value 1 divided by 3.
So how do we actually get around this issue?
Well, if you the programmer know that you're
dealing in a world that's going to give you floating point
values with decimal points, you might very well
need to use a feature known
as typecasting-- that is, convert one data type to another by explicitly
telling the compiler that you want to do so.
Now, how do I do this?
Well, let's go back to my code here.
And if the issue fundamentally is that C is still
treating x and y as integers-- or technically,
longs with no decimal point-- and dividing one by the other,
therefore has no room, so to speak, for any numbers after a decimal point,
why don't I proactively do this?
Let me, using a slightly new syntax with parentheses,
specify that I want to convert x proactively from a long to a float.
Let me specify proactively that I want to convert y from a long to a float
as well.
And now let me go ahead and trust that z
should be the result of dividing not a long by a long or an int by an int,
but rather, a float by a float.
Let me clear my terminal window, run make calculator again--
seems to work OK-- ./calculator.
And now 1, 3, and hopefully now we actually see
that my code has outputted 0.333333.
And I think if we kept showing more numbers after the decimal point,
we'd theoretically see as many of those threes as we want.
But there is still one more catch.
And especially when we're manipulating numbers
in this way in a computer using a finite amount of memory,
another challenge we might run up against-- besides integer
overflow, besides truncation-- is what's known as floating-point imprecision.
Just as we can't represent as big of an integer as we want using int
or long alone because there is going to be an upper bound,
there's similarly going to be a bound on just how precise our numbers can be.
And indeed, let's go back to VS Code here.
I'll clear my terminal window yet again.
And this time, let me use some slightly unusual syntax to specify that I
don't want to see the default number of numbers after the decimal point,
which %f gives us automatically.
Let's go ahead and show 20 digits after the decimal point.
And the weird syntax for this is to do not %f,
but %.20f to indicate to C that I want to see 20 digits,
not the default, after the decimal point.
Let me rerun make calculator.
Let me do ./calculator again.
And let's do 1, let's do 3.
And now this is even weirder, right?
From grade school, you presumably learned that 1 divided by 3
is, of course, 1/3.
But that should be 0.33333, infinitely many times, or, on paper,
with a little line over it.
But the computer is doing some weird approximation here.
It's a whole bunch of 3's and then 4326744079590.
Well, what's really happening under the hood,
well, again, is this issue of floating-point imprecision.
If you only have a finite number of bits and, in turn,
a finite amount of memory, the computer can really only
be so precise intuitively.
Equivalently, the computer has decided on some way
of representing floating-point values.
But the catch is, per grade school math, there's
an infinite number of numbers out there and an infinite number
of floating-point values because you can keep adding more and more digits if you
want.
So the computer, given the way it's implementing these floating point
values, is essentially giving us the closest approximation that it can.
Now, how can we go about improving the situation?
Well, there is one alternative.
Instead of using float, I can use something
called a double, which, as the name suggests,
uses twice as many bits as a float.
So instead of 32 typically, it will use 64.
And that's just like the difference between a long and an int,
which gave us more bits.
But in this case, this will be used for more precision.
Let's go ahead and cast x to a double.
Let's cast y to a double.
And now let's go ahead and, using the same format code--
%.20f is still OK for doubles.
Let me do make calculator.
Let me do ./calculator.
And now let me do 1 divided by 3.
And we still have some of that imprecision.
And we'd see even more of it if we looked at more than just 20 digits.
But now we have more threes after the decimal point.
So it's at least more, and more, and more precise, but it's not perfect.
But it's at least more precise.
So these kinds of issues, then, are going
to be necessary to keep in mind any time you
do something numerically, scientifically, at least
with a language like C where you're going to bump up
against these real-world limitations of hardware and, in turn, language.
Now, later in the semester, we'll transition to a language called Python.
And that's actually going to solve at least one of these problems
for us by just automatically giving us more bits,
so to speak, as we need them, at least for integers.
But even the issue of floating-point imprecision is going to remain.
Now, just how real-world are these issues?
Well, back in the year 1999, we got a taste
of this when the world realized in the years leading up to that date
that it might not have been the best idea to implement computers
and software therein by storing years using just two digits.
Like, instead of storing 1999 to represent the year 1999,
a lot of computers, for reasons of space and cost,
were in the habit of cutting a corner and just using
two digits to keep track of the year.
The problem with that is that if systems were not updated by the year 1999
to support the year 2000, 2001, and so forth, then, just like before
with integer overflow, some computers might
add 1 to the year in their memory, '99.
It should be the year 2000, but if they're only
using two digits to represent years, they
might mistake the year-- as some systems may very well have--
for the year 1900 instead, taking literally
a big step backwards, if you will.
Now, you'd like to think that kind of issue
is behind us, especially as we understand
all the more about the limitations of code and computing.
But we're actually going to run up against this very same type of issue
again in just a few years.
On January 19 in the year 2038, we will have run out of bits in most computers
right now to keep track of time.
It turns out, years ago, humans decided to use a 32-bit integer
to keep track of how many seconds had elapsed over time.
They chose a somewhat arbitrary date in the past--
January 1, 1970--
and they just started counting seconds from there on out.
And so if a computer stores some number of seconds,
that tells the computer how many seconds have
passed since that particular date, January 1, 1970.
Unfortunately, using a 32-bit integer, as we've
seen, you can only count so high, at which point,
you overflow the size of that variable.
And so potentially, if we don't get ahead of this as humans, as a society,
as computer scientists, on the date January 19, 2038,
that bit might flip over, thereby overflowing the size of those integers,
bringing us back computationally to December 13, 1901.
So this is to say now, with all of this computational ability and code
comes a responsibility to actually write correct code.
Next week, we'll peel back some of these layers.
But for now, this was week 1, and best of luck on problem set 1.
[APPLAUSE]
[MUSIC PLAYING]