No One Just Designs A Programming Language

Summary
Marianne ponders the consequences of different design decisions and how to direct her
research through an enormous amount of information and choices. Thorsten Ball, author of
Writing a Compiler in Go, talks about his experiences designing Monkey and some of his
regrets in retrospect. More from Thorsten Ball's can be found at thorstenball.com. Writing
an Interpreter in Go can be purchased here. Writing a Compiler in Go can be found here.
Bonus Context
Become a patron to support this project.
Transcript
MB: So let's get one question out of the way, does the world really need another
programming language? Probably not, no.
MB: You're listening to Marianne Writes a Programming Language, and I'm Marianne.
MB: I have always wanted to write a programming language.
MB: I figured I would learn so much from the challenge. Even 15 years into my career in
technology, I feel like there are all these weird holes in my knowledge. I know a bit about
abstracts syntax trees, a bit about tokens, a bit about compilers and bytecode, even a bit
about logic gates. But I don't have a clear sense of how all those things work together. One
thing my career so far has convinced me of is that no knowledge is useless, things I learned
in the past become valuable in the most unexpected ways.
MB: The other reason why I want to write a programming language is because I can't find
a language that does exactly what I want to do. If you're familiar with my work, you
probably already know that I'm very interested in complex systems and models. I spent a
year or two exploring formal specification, a set of tools that I still use, to be honest, before
I realized that languages like TLA+ and Alloy are looking for mathematical correctness. And
that wasn't what I wanted. That wasn't what I was interested in. I didn't want to know if
certain problems are impossible. I wanted to know how likely certain types of problems are
in a given system.
MB: I wanted to model resilience, but one thing at a time. We'll get to all of that in more
detail later. So how exactly does one write a programming language? When I started this
project, I thought finding a jumping off point would be easy. I typed How do you design a
programming language into Google and expected to get back at least a few medium posts
laying out the basic decisions. Even knowing very little upfront I had a sense that in order
for programming language to work, there had to be some sense of cohesion in its design.
You couldn't just pick and choose things you liked in other languages and just think it
would be OK. Some things would not work with other things, but what were those critical
choices? I was surprised to find not much information. I expected someone to have
developed a menu of some kind--even a primitive one--to serve as training wheels for
beginners, even a complete bullshit one drafted to start fights on Hacker News. I expected
to be able to skim a few opinions and have a good idea of where I wanted to start
researching, except I couldn't find anything like that.
MB: A theme that's going to come up a lot in this series is the incredible divide that exists
between the academic world of programming and the commercial world of programming
as we go down this rabbit hole together. We'll talk about completely different styles of
programming, strange data types. Jesus, there's a whole world of obscure experimental
languages that appear in research papers, rack up a host of citations, and never touch an
actual computer other than their inventor's. At one point I asked a researcher I interviewed
for this series, why do professors keep publishing descriptions of implementations without
actual code? Why do I see demos of things but no source code repositories?
MB: And he told me: because they're ashamed. If they share their code, other people will
find all their programming mistakes. And to me, this is a really fascinating problem. One
side of this divide is developing really interesting stuff, but doesn't know how to produce
mature production ready code. The other side has those skills, but finds the language of
academia to be inaccessible and difficult to parse.
MB: In my search for an overview of the theory behind designing programming languages,
I kept getting referred back to these thick computer science textbooks, the Wizard book,
the Dragon book. You've seen them referenced as well, I'm sure. I felt like this was probably
a bad place to start. Perhaps I'm reckless or just intellectually lazy, but adding a few
hundred page college textbooks to the top of my reading list seemed like a good way to
destroy my momentum really fast, even though without question, that was the proper way
to learn. It would do me no good at all if it killed my interests. I can't remember how I
found Writing an Interpreter in Go. Just one of those things, I guess. But I once I found it, I
knew immediately this is how I wanted to start. I recently had started learning Go for work
and I can't tell you what a relief it is to have books about general programming concepts
not written in Java. Thank God. Even better, Writing an Interpreter in Go implements a
programming language called Monkey, which was developed specifically for the book,
which meant the author was someone who had actually gone through the process of
designing a full programming language.
TB: Umm... I think you're overestimating how novel monkey is
MB: And that is Thorsten Ball, creator of Monkey, author of Writing an Interpreter in Go,
and also its follow up, Writing a Compiler in Go.
TB: So I didn't set out to create a completely new programming language ... I couldn't
like... no, I don't-- I wouldn't even know what it should do, but what I realized and I'm sure
you've seen it now that you've dug into this, there's a ton of material on how to create
program languages. And in my experience, it falls into two categories. The one category is
overly academic textbook kind of stuff where it says, like, you know, it doesn't even show
you what the programming looks like. It shows you only snippets and or it's glosses over
the syntax and says, you know, we don't have to worry about syntax. Let's just use this lisp
as a stand in whatever.
TB: And then on the other hand, you have these tutorials, posts that show how to build a
programming language and they mostly show really small toy examples. And they stripped
down so much. And it's often just a calculator like, you know, like a repl where you can
type in one plus two when you get out three. And what I wanted was something in the
middle fully documented, including tests with the real syntax, like no parentheses lisp stuff,
which I love-- like, for the record, I'm a fan.
TB: But what I wanted was like the real thing, like all of the programming languages, the
real ones, they have curly braces and they have keywords and whatever. And I wanted that,
like, how does that work? You know? So what I basically ended up doing is writing a lisp
and then putting the monkey outfit on top with the curly braces and the fn keywords and
let whatever, but it's it's really similar to JavaScript in a way, in the best possible way, I
guess. And JavaScript famously is also, you know, it started as a Scheme implementation,
like a really small Scheme implementation .. you have-- it's immutable by default. You have
first class functions. That means you can pass the lambdas around, you have higher level
functions. You can return functions from other functions. Not a lot of crazy data types, no
static type system.
TB: So I just took... a minimal Scheme added the curly braces and arrays and hashes and
the syntax for literals, so you can like literal types, so you can type array literal or like a hash
map or whatever, and that's basically it.
MB: So what were the kind of design choices that you made when you were thinking
about what Monkey should look like? I mean, like not in terms of curly braces or no curly
braces, but in terms of like, should it be a strict typing or like dynamic typing or should be
this or should it be that? Like, how did you think about was it like you had a design phase
with Monkey or was it just like I'm just going to build things and we'll see where we end
up? Right.
TB: The second one, yeah, definitely. And honestly, I feel like my my brain is not built for
the first type of step. It's if you start out and want to design a program language from the
ground up without, you know, building it at least in a prototype-y way, I cannot think that
far ahead. It's that the layer we're talking about here-- the programming language itself
and the built in functions and the keywords and you know how they work together. They
are so deep down.... That every little change you make has a lot of ramifications further up.
MB: Yeah,
TB: And that is the most impressive thing about programming language design. Like if you,
for example, look at the Go programming language, you know, there's this whole debate
about should Go have generics or not, and they Go authors famously pushed that decision
in front of them and didn't want to make a decision for years or said, no, Go doesn't need
them, and there's too many ramifications and effects that that would have, and people
would just say, oh, can you just add this and kind of just add that? And they experimented
around and had these prototypes and design concepts. And I was so impressed by....
Whenever they present it, like, oh, we make this attempt and turns out that this would
result in code that looks like this or we made this and that would change Go code to be
written like that. I was so impressed by their understanding of basically the power they
wield in a way, because, you know, look at Go or any other language the idioms you have
in a language, they just come from all of the pieces lying around, basically, like people go,
oh, Go's error handling works like this. So we going to use it like that. We're going to this.
This is what an error looks like. This is how we're going to use it. Ba ba ba ba ba. And if you
add more pieces into that mix, suddenly the whole game changes and suddenly... you
know...
TB: If you add the wrong piece, you can change how code is written in that language
completely, you know? I guess Java is famous, like if you look at Java in the pre generic
days when they didn't have generic looks completely different than modern Java. And the
same goes to C++. Right. They added a lot of stuff. And people-- people say, you know,
C++ 11 is not the same as the one from the nineties or early two thousands, whatever. All
of this has ramifications.
MB: That's the thing that I think I find the most intimidating about.... like, even undertaking
this, is understanding that tiny little design decisions I don't even realize I'm making could
have dramatic impacts. Like I think Java six that is like the point where they changed
something fundamental in how things works. And like everybody's struggling to migrate
off of. Like the the Python 2 to Python 3 story like, oh, we did this wrong. We're going to
do it differently so that we can grow and continue to thrive as a language. But it's going to
be really, really hard to make the transition. Like, I worry probably unnecessarily, because
to reach that point of success is in-and-of itself unlikely.
TB: Right.
MB: But still as a creative undertaking, you always want your work to represent the best
that you can do and you never want to go "If I had only thought a little bit harder about it,
I could have seen that problem coming." Is there anything in Monkey that in retrospect, if
you were to go back, you would think I would have done that differently in the way it's
designed.
TB: ...Yeah..... um.... I added return statements to Monkey and, you know. There's two ways
to return a value from a function in Monkey. Either you add a return statement and that
returns the value or it uses the last value of the last expression. Right. So if you have a
small function and it doesn't take parameters, but you just put in one plus two and it
returns three, this function.
TB: Or you can add, you know, an early exit and say if A > 3 return a five, closing brace and
then out six or whatever, so you have an early exit and.... Those-- like those return
statements, I added these like on a whim, I was like, I like the implicit return because I was
used to it from Ruby, I really like it because it allows you to write these really dense small
functions and pass them around... Scheme again. Return statements are not Scheme. So
suddenly, once you add return statements, you break... you kind of jump into this whole
paradigm and go like "Here I am!" return statements and we're just going to exit from this
function early. But it doesn't really fit a lot of the other stuff and it breaks kind of control
flow. For example, if-conditionals are also expressions.
TB: So if you say if true, then one else five, that whole thing evaluates to one. Right. But
you can now add a return statement in the consequence order of the alternative arm of the
if-conditional. And that suddenly changes to control flow because now this is not an
expression anymore, but it's an expression with like a hidden control flow statement in it.
And that needs to go up. Right. And then as I said, like I have first class functions, I have
higher level functions. That means you can return functions from other functions. That
means you can have two functions in another function and return from those two
functions. Right. And this is..... I had a few bugs in there. And it's just like... it's horrible. Like
it's you can make it work and it's working now. I guess there's still some bugs that nobody
has found, but it's... That there was one thing where I was short sighted and I was like, "OK,
I'm kind of trying to shoehorn something into this, but but I don't know what I'm doing.
And now it's really hard to implement."
MB: I'm seeing more and more parallels between the task of writing a programming
language and working with complex systems, I can't figure out if that's because I focus on
complex systems and therefore I'm predisposed to see them everywhere. The same way a
cardiologist will think of problems with your heart and a neurologist will think of problems
with your brain. But it does seem to be the case that programmers create languages
without being able to fully anticipate exactly how they will be used or how technology will
change around them. Is the solution to this dilemma the foresight earned through
experience? Should a student's first programming language be treated like the first
pancake off the griddle and just thrown away? I think about the stories I've heard about
the programming language C and its development, like the conventional origin story about
C is that Dennis Ritchie just walked into work one day and decided to create it. In reality,
the development of C started with the development of BCPL and B. Computer scientists
involved in those languages would build things, then change the implementation of those
languages to clear blockers they had discovered and in fact could only have discovered
through using the language. C then started from all of those lessons learned and then itself
had many different conflicting and undocumented implementations before it was
standardized in 1982, thirteen years after its development had started.
MB: I guess, like, from the perspective of somebody who has now done this for, like... what
sounds like two or three years, right? Like the first book took a year. The second book took
a year...
TB: Yeah, the first book was released in 2016.
MB: Oh so like... way more than that then.
TB: I started digging into it in '15, I guess. Yeah.
MB: So like a five year journey, like if you could highlight almost like the.... the decision
tree, if I can call it that, like I think there is a certain baseline level of decisions that once
you have gone that direction, other things are taken off the table, like what you were
saying with return statements when you decided that you wanted it to be kind of a lisp
Scheme flavored language, return statements just kind of didn't fit and didn't kind of work.
Like, how would you think about that in retrospect? Like, were there core decisions that
stand out in your mind as "I went down this road and that took all of these things off the
table" logically.
TB: I don't think I can name all of them, but a few stand out and that it won't be in
chronological order.
MB: That's all right.
TB: But so as I said, like the big one was Monkey is a Scheme in disguise, right? It is a
pretty simple language. It has only a few data types, first class functions. And I added the
syntax on top of it with the braces and, you know, the short keywords and let statements
and stuff like that.
TB: So this set. The basics, and then I realized....
TB: Once I implement that and actually, you know, wrote it down, I realized, oh, that's not
really useful, you know, so I added more data types like hashes and arrays. So to make it
more practical and make it more relatable or interesting, I guess, because this was always
missing and once you implement an interpreter that can do binary expressions with
numbers and booleans. It's cool, Yes. But it's-- it's not five hours of fun, you know.
TB: So I added these and then.... Somewhere around that time, I guess I also made a
decision to not add assignments or mutability to the language. So everything's basically
immutable, you cannot say that A equals 8 and then A equals 5 or whatever that is right
now, that's undefined behavior ...like it's not implemented. And the reason why I did it was
because that, you know. A equals eight or five is simple, sounds simple, right, but once you
add mutability and you can assign stuff or reassign stuff. You also need to incorporate that
in your closures which which captures state, right? Like you have a closure is a function that
wraps or closes around state. So what if it closes around state like a binding of a name like
A equals five? Right. When the closure is created, A equals five. And after the closure is
created, you say equals eight now. You know, what does the closure return? That is a
decision you have to make in different languages, have different answers to this, right?
MB: Right.
TB: That has effects like your decision. There is.. it's a simple decision to make, but it has a
lot of consequences. And I kind of, you know, I avoided that decision or said it's
immutable, but it also means.... It's not easy to implement loops right? Because a loop is
based on a condition that changes, I guess,
MB: Right, got to increment the i.
TB: Exactly, yeah, and if you don't have a way to change something, that's not
straightforward to implement. Right, like a for loop, like i = 0; i < 10; i++... That includes
assignment. That includes mutability.
MB: Right.
TB: So if you don't have assignment then that's off the table.
MB: That's interesting because I would not have made that connection until you pointed it
out. And like "Oh yeah. You really can't, can you?"
TB: Yeah.
TB: So I made a livestream in winter last year where I, you know, let's try out this live
streaming thing. And I said, let's add loops to Monkey. And I'm not sure whether you can
see it in the stream, but towards the end of it, I was like, Goddamn it, I need assignment for
this. The problem is you can. You can.... you can of course, you can write like while true and
then have it executed for all of eternity, right? Or you can say while My blah, blah, blah
function and where blah, blah, blah function as a built in function that returns whatever,
but as you know in the books everything is test driven. So we write the test first and you
want to make sure that it works. And there's no good way to write a test for loops without
basically-- Do like a dependency injection or mocking of the whole system, which breaks
this whole idea of really short and easy to read tests
MB: Right. Right.
TB: So, yeah.
TB: And then. Yes, so there was assignments and mutability, and I also I didn't want to add
classes, just aesthetic reasons, I guess, like the Scheme/Lisp py-whores were like, you know,
objects are just poor man's closure and closures are a poor man's objects. So if we have
closures we don't need objects, of course, readers are interested in.... how would you
implement classes? But such that, again, is such a seemingly small decision that has
incredibly huge consequences, like look at look at JavaScript. Like a couple of years ago to
added classes and people did classes, classes in air quotes by, you know, returning objects
with methods from other functions. But now that you have the class keyword and like this
whole syntactic structure, for them, it changed how people write code. So it's, again, like
huge consequences. And, you know, I didn't want to have classes. I'm I don't think I'm a
huge OOP fan anymore, I just didn't want to have them. And also everything that I just
said, I had to fit in in a reasonable amount of pages in a book.
MB: It's not a five part series then?
TB: Yeah and as you know: everything is test driven. So it's not just implementation, but it's
also a test for the implementation that you need to show in the book and every bit of
code. I think only two helper functions. I'm not sure.
MB: This is the place for something clicked for me. Immutable data, classes, loops, implicit
returns. What we're discussing are the conflicts that arise when mixing functional
programming styles with procedural* programming styles. So maybe a key part of that
menu I've been looking for is about the assumptions that ground those styles. How do we
want to handle state? Are functions first class? Is inheritance important?
MB: ....I think I need to get a few more books.
This should actually be imperative, not procedural but I didn't feel like rerecording that
voice over just to correct that error ;)

No One Just Designs A Programming Language

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

No One Just Designs A Programming Language

Uploaded by

Copyright:

Available Formats

Summary

You might also like