An Erlang primer

Johan Montelius

Introduction
This is not a crash course in Erlang, since there are plenty of tutorials available on the web. I will, however, describe the tools you need to get a programming environment up and running. I will also list a set of tasks that we will work on together. I take for granted that you know some programming languages, have heard of functional programming, and that recursion is not a mystery to you.

1

Getting started

The first thing you need to do is download the Erlang SDK. It is found at www.erlang.org and is available both as a Windows binary and as source that you can build with the usual tools available on a Unix or even Mac OS platform.

1.1

Emacs

You also need a text editor, and what is better than Emacs? Download it from www.gnu.org/software/emacs/; it is available for Unix, Mac and Windows. You need to add the code below to your .emacs file (provided that you have Erlang installed under C:/Program Files/). This makes sure that the Erlang mode is loaded as soon as you open a .erl file and that you can start an Erlang shell under Emacs. Change <Ver> and <ToolsVer> to match your system. On a Linux box it will look similar, but the install directory is something like /usr/local/lib/erlang/.

    (setq load-path (cons "C:/Program Files/erl<Ver>/lib/tools-<ToolsVer>/emacs" load-path))
    (setq erlang-root-dir "C:/Program Files/erl<Ver>")
    (setq exec-path (cons "C:/Program Files/erl<Ver>/bin" exec-path))
    (require 'erlang-start)


If everything works you should be able to start an Erlang shell inside Emacs with M-x run-erlang (M-x is <escape> followed by x).

2

Hello World

Open a file hello.erl and write the following:

    -module(hello).
    -export([world/0]).

    world() -> "Hello world!".

Now open an Erlang shell, compile and load the file, and call the function:

    c(hello).
    hello:world().

Remember to end commands in the shell with a dot. If things work out you have successfully written, compiled and executed your first Erlang program. Find the Erlang documentation and read the “Getting started” section.

3

An HTTP parser

To have something to work on, something that you will use in the next session, you should implement an HTTP parser. We will not build a complete parser, nor will we implement a server that can reply to queries (yet), but we will do enough to understand how Erlang works and how HTTP is defined.

3.1

One file, one module

Open a file http.erl and declare a module http on the first line. Also export the functions that we will use from outside the module. In the end we will not export everything, but while you’re debugging you want to test the functions as we implement them.

    -module(http).
    -export([parse_request/1]).

We’re only going to implement the parsing of an HTTP GET request, and we will skip some details to make life easier. Download RFC 2616 from www.ietf.org and follow the description of a request. From the RFC we have:

    Request = Request-Line              ; Section 5.1
              *(( general-header        ; Section 4.5
               | request-header         ; Section 5.3
               | entity-header ) CRLF)  ; Section 7.1
              CRLF
              [ message-body ]          ; Section 4.3

So a request consists of a request line, optionally followed by a sequence of headers, a carriage return line feed and an optional body. We also note that each header is terminated by a carriage return line feed. OK, let’s go; we implement each parsing function so that it parses its element and returns a tuple consisting of the parsed element and the rest of the string.

    parse_request(R0) ->
        {Request, R1} = request_line(R0),
        {Headers, R2} = headers(R1),
        {Body, _} = message_body(R2),
        {Request, Headers, Body}.

The request line is, according to the RFC, a line consisting of a method, a URI and an HTTP version, separated by space characters and terminated by a carriage return line feed.

    Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The method is one of OPTIONS, GET, HEAD etc. Since we are only interested in GET requests for now, this makes life easy.

    request_line([$G, $E, $T, 32 | R0]) ->
        {URI, R1} = request_uri(R0),
        {Ver, R2} = http_version(R1),
        [13, 10 | R3] = R2,
        {{get, URI, Ver}, R3}.

Note how we treat a string as a list of integers and match the argument against a list starting with the integers for G, E and T followed by 32, which is the ASCII value for space. After having matched the input string with “GET ” we continue with the rest of the string, R0. We find the URI and the version, and finally the carriage return and line feed that end the request line. We then return the tuple {{get, URI, Ver}, R3}; the first element is the parsed representation of the request line and R3 is the rest of the string.

Next we implement the parsing of the URI. This requires a recursive definition, and if you have not been exposed to logic or functional programming it might twist your head. If you have done some functional programming you might say “this is not tail recursive, I can do it better”. That is fine; I only wanted to make things easy and not introduce things that we don’t need.

    request_uri([32 | R0]) ->
        {[], R0};
    request_uri([C | R0]) ->
        {Rest, R1} = request_uri(R0),
        {[C | Rest], R1}.

The URI is returned as a string. There is of course a whole lot of structure to that string: the resource, a query and an index. For now we leave it as a string, but feel free to parse it later.

Parsing the version is simple; it’s either version 1.0 or 1.1. We represent these by the atoms v10 and v11. This means that we can later switch on the atom rather than parse the string “1.1” again. It does of course mean that our program will stop working when 1.2 is released, but that will probably not be this week.

    http_version([$H, $T, $T, $P, $/, $1, $., $1 | R0]) ->
        {v11, R0};
    http_version([$H, $T, $T, $P, $/, $1, $., $0 | R0]) ->
        {v10, R0}.

Headers also have internal structure, but we are only interested in dividing them up into individual strings and, most importantly, finding the end of the header section. We implement it as two recursive functions: one that consumes a sequence of headers and one that consumes individual headers.

    headers([13, 10 | R0]) ->
        {[], R0};
    headers(R0) ->
        {Header, R1} = header(R0),
        {Rest, R2} = headers(R1),
        {[Header | Rest], R2}.

    header([13, 10 | R0]) ->
        {[], R0};
    header([C | R0]) ->
        {Rest, R1} = header(R0),
        {[C | Rest], R1}.

The last thing we need is parsing of the body, and we will make things very easy (even cheating). We assume that the body is everything that is left, but the truth is not that simple. If we call our function with a string as input argument there is little discussion of how large the body is, but this is not easy if we want to parse an incoming stream of bytes. When do we reach the end, when should we stop waiting for more? The length of the body is therefore encoded in the headers of the request. Or rather, in the specification of HTTP 1.0 and 1.1 there are several alternative ways of determining the length of the body. If you dig deeper into the specs you will find that it is quite messy. In our little world we will however treat the rest of the string as the body.

    message_body(R) ->
        {R, []}.

You now have all the pieces, and if you compile and load the module in an Erlang shell you can parse a request. Call the function with the http module prefix and give it a string to parse.

    7> c(http).
    {ok,http}
    8> http:parse_request("GET /index.html HTTP/1.1\r\nfoo 34\r\n\r\nThis is the body").
    {{get,"/index.html",v11},["foo 34"],"This is the body"}
    9>
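The text notes that request_uri/1 is not tail recursive. As a minimal sketch of what the tail-recursive alternative could look like, the variant below carries the characters seen so far in an accumulator and reverses it once at the end. The module name uri_acc and the helper request_uri/2 are hypothetical and only serve the illustration; they are not part of the http module described above.

```erlang
-module(uri_acc).
-export([request_uri/1]).

%% Tail-recursive variant: collect the characters in reverse order in an
%% accumulator and reverse once when the terminating space (32) is found.
request_uri(Str) ->
    request_uri(Str, []).

%% Hypothetical helper with an explicit accumulator.
request_uri([32 | Rest], Acc) ->
    {lists:reverse(Acc), Rest};
request_uri([C | Rest], Acc) ->
    request_uri(Rest, [C | Acc]).
```

Like the version in the text, this sketch returns a tuple of the URI and the rest of the string, and it fails with a function clause error if the input contains no space.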

4

Finding a prime

This is a test that we will use in a coming exercise. The task is to test if a number is prime. A complete algorithm is quite expensive to execute for large numbers, so we use an algorithm that detects, with very high accuracy, if a number is not a prime. The algorithm is from Fermat, and to compute it we need a fast implementation of modular exponentiation. Open a new file fermat.erl and declare the module fermat.

    mpow(N, 1, _) ->
        N;
    mpow(N, K, M) ->
        mpow(K rem 2, N, K, M).

    mpow(0, N, K, M) ->
        X = mpow(N, K div 2, M),
        (X * X) rem M;
    mpow(_, N, K, M) ->
        X = mpow(N, K - 1, M),
        (X * N) rem M.

This algorithm calculates $N^K \bmod M$ either as $N^{K/2} \cdot N^{K/2} \bmod M$ if K is even, or as $N^{K-1} \cdot N \bmod M$ if K is odd. Modular multiplication is nice since you can apply the modulo operation to both factors and the result will be the same. Try it and see if it works.

Next we implement the test by Fermat. If a random number R, less than P, raised to P − 1 modulo P is equal to 1, i.e. if $R^{P-1} \equiv 1 \pmod{P}$, then it is very likely that P is prime.

    fermat(1) ->
        ok;
    fermat(P) ->
        R = random:uniform(P-1),
        T = mpow(R, P-1, P),
        if
            T == 1 -> ok;
            true -> no
        end.

Note that it is only likely, so we want to perform this test with different random numbers.

    test(_, 0) ->
        ok;
    test(P, N) ->
        case fermat(P) of
            ok -> test(P, N-1);
            no -> no
        end.

How many times do we have to perform the test? Well, it depends on how many false primes you want to find. Note that most numbers are not prime and fail the test in one try. If we stumble on a prime we might as well run the test a couple of times, since it will not slow down the overall performance that much.

That’s it. Compile, load and do some experiments. Why not build a prime generator: start with an integer and test if it’s a prime. If it’s not, then move on to the next integer. Notice that Erlang is implemented using a bignum package and can handle integers of (almost) arbitrary size. This is (probably) a prime that I found after some searching:

    75654596987987976987 68756756757657656987 98789798798789796546 54654564217541236547 65421378512736521765 73658765123765123786 512378657852319179
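The code in the text calls random:uniform/1, and the random module has been deprecated in later Erlang/OTP releases in favour of rand. The following is a minimal sketch of the same test, assuming an installation where the rand module is available. The module name fermat_rand is hypothetical; mpow/3, fermat/1 and test/2 follow the definitions in the text.

```erlang
-module(fermat_rand).
-export([mpow/3, fermat/1, test/2]).

%% Modular exponentiation, N^K rem M, as in the text.
mpow(N, 1, _) ->
    N;
mpow(N, K, M) ->
    mpow(K rem 2, N, K, M).

mpow(0, N, K, M) ->
    X = mpow(N, K div 2, M),
    (X * X) rem M;
mpow(_, N, K, M) ->
    X = mpow(N, K - 1, M),
    (X * N) rem M.

%% One round of the Fermat test with a random base R in 1..P-1.
%% Assumption: rand:uniform/1 replaces random:uniform/1 here.
fermat(1) ->
    ok;
fermat(P) ->
    R = rand:uniform(P - 1),
    case mpow(R, P - 1, P) of
        1 -> ok;
        _ -> no
    end.

%% Repeat the test N times and stop at the first failure.
test(_, 0) ->
    ok;
test(P, N) ->
    case fermat(P) of
        ok -> test(P, N - 1);
        no -> no
    end.
```

For a prime P every round returns ok. For a composite P a single round can still return ok by chance, which is why test/2 repeats the round with fresh random numbers.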


5

Dijkstra

This is an algorithm that we will use when we build a network of routers. The algorithm computes a routing table. The table is represented by a list with one entry per node, where the entry describes which gateway should be used to reach the node. The input to the algorithm is a map, represented as a set of vertices, and a set of gateways to which we have direct access. In our example our own node will not be an explicit part of the map. Before we implement the algorithm we need to implement operations on a map and operations on the sorted list. Before implementing these operations I advise you to study the lists library and learn how keysearch/3, keydelete/3, map/2 and foldl/3 work. It will make your life easier.
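As a quick orientation before reading the library documentation, the sketch below applies the four lists functions mentioned above to a small list of tuples. The module name lists_demo and the sample data are made up for the illustration.

```erlang
-module(lists_demo).
-export([demo/0]).

demo() ->
    %% A list of {Key, Value} tuples; the names are made up for the demo.
    Cities = [{berlin, 3}, {paris, 2}, {rome, 5}],

    %% keysearch/3: find the first tuple whose 1st element is paris.
    {value, {paris, 2}} = lists:keysearch(paris, 1, Cities),
    false = lists:keysearch(oslo, 1, Cities),

    %% keydelete/3: remove the first tuple whose 1st element is paris.
    [{berlin, 3}, {rome, 5}] = lists:keydelete(paris, 1, Cities),

    %% map/2: apply a function to every element.
    Names = lists:map(fun({Name, _}) -> Name end, Cities),

    %% foldl/3: combine all elements into one value, here a sum.
    Sum = lists:foldl(fun({_, N}, Acc) -> N + Acc end, 0, Cities),

    {Names, Sum}.
```

Calling lists_demo:demo() returns {[berlin,paris,rome],10}; the pattern matches inside the function act as simple checks of what each call returns.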

5.1

the map

Think of a good representation of a directional map where you can easily update the map and find the nodes directly connected to a given node. In a module map, implement and export the following functions:

• new(): returns an empty map.
• update(Node, Links, Map): updates the Map to reflect that Node has directional links to all nodes in the list Links.
• reachable(Node, Map): lists the nodes directly reachable from Node.
• all_nodes(Map): lists all nodes in the map, also the ones without outgoing links.
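To make the expected behaviour of the four functions concrete, here is one possible sketch among several. It assumes that the map is a plain list of {Node, Links} tuples; this representation is an assumption made for the illustration, and other representations may well be better.

```erlang
-module(map).
-export([new/0, update/3, reachable/2, all_nodes/1]).

%% Assumed representation: a list of {Node, Links} tuples.
new() ->
    [].

%% Replace any existing entry for Node with the new list of links.
update(Node, Links, Map) ->
    [{Node, Links} | lists:keydelete(Node, 1, Map)].

%% Nodes directly reachable from Node; [] if Node has no entry.
reachable(Node, Map) ->
    case lists:keysearch(Node, 1, Map) of
        {value, {Node, Links}} -> Links;
        false -> []
    end.

%% All nodes, including those that only appear as link targets.
%% usort/1 sorts the list and removes duplicates.
all_nodes(Map) ->
    Collect = fun({Node, Links}, Acc) -> [Node | Links] ++ Acc end,
    lists:usort(lists:foldl(Collect, [], Map)).
```

With this sketch, map:all_nodes(map:update(berlin, [london, paris], map:new())) returns [berlin,london,paris], so nodes without outgoing links are included.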

5.2

the sorted list

In the Dijkstra algorithm we will use a sorted list when we calculate a new routing table. It need not be sorted, but we should quickly find the entry with the shortest path (could a heap do?). We should also be able to find the length of the shortest path to a node, and update the list to give a node a new length and a new gateway. In a module dijkstra implement the three functions:

• entry(Node, Sorted): returns the length of the shortest path to Node, or 0 if the node is not found.
• replace(Node, N, Gateway, Sorted): replaces the entry for Node in Sorted with a new entry having a new length N and Gateway. The resulting list should of course be sorted.
• update(Node, N, Gateway, Sorted): updates the list Sorted with the information that Node can be reached in N hops using Gateway. If the current shortest path to Node is longer than N, then the entry in Sorted must be replaced. Use the above functions. Why does entry/2 return 0 if the node is not found?

5.3

the algorithm

Now, in the same module, implement the function table/2 that takes a list of gateways and a map and produces a routing table with one entry per node in the map. Each entry states which gateway to use to find the shortest path to a node. Follow the outline below and you will have your program running in no time.

iterate(Sorted, Map, Table)

If there are no more entries in the sorted list then we are done and the routing table is complete. Otherwise, take the first entry in the sorted list, find the nodes in the map reachable from this entry, and for each of these nodes update the Sorted list. The entry that you took from the sorted list is added to the routing table. You’re very close to a solution.

table(Gateways, Map)

All we have to do is list the nodes of the map and construct an initial sorted list. This list should have dummy entries for all nodes, with the length set to infinity, inf, and the gateway set to unknown. The entries of the gateways should have length 1 and the gateway set to the gateway itself. When you have constructed this list you can call iterate with an empty table [].
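One detail that makes the atom inf convenient: in Erlang's term ordering every number compares less than every atom, so an entry with length inf sorts after all entries with integer lengths. The sketch below shows this with lists:keysort/2. The entry layout {Node, Length, Gateway}, the module name sorted_demo and the sample nodes are assumptions made for the illustration.

```erlang
-module(sorted_demo).
-export([demo/0]).

demo() ->
    %% Assumed entry layout: {Node, Length, Gateway}.
    Entries = [{berlin, inf, unknown},
               {paris, 1, paris},
               {rome, 3, paris}],
    %% Sort on the 2nd element. Integers compare less than atoms,
    %% so the inf entry ends up last.
    lists:keysort(2, Entries).
```

Calling sorted_demo:demo() returns [{paris,1,paris},{rome,3,paris},{berlin,inf,unknown}], so the entry with the shortest known path is always at the head of the list.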
