SE 765 - Distributed Software Development

CS 610 - Introduction to Parallel and Distributed Computing

Erlang 3 Distributed Programming I

 This lecture uses materials adapted from:

http://www.erlang.org/doc/pdf/otp-system-documentation.pdf

http://learnyousomeerlang.com/distribunomicon

Setting up an Erlang Cluster

         Erlang gives names to each of the nodes to be able to locate and contact them.

         The names are of the form Name@Host, where the host is based on available DNS entries, either over the network or in your computer's host files (/etc/hosts on OSX, Linux and other Unix-likes, C:\Windows\system32\drivers\etc\hosts for most Windows installs).

         All names need to be unique to avoid conflicts: if you try to start a node with the same name as another one on the same exact host, you'll get a crash message.

There are two types of names: short names and long names.

         Long names are based on fully qualified domain names (aaa.bbb.ccc), and many DNS resolvers consider a domain name to be fully qualified if it has a period (.) inside of it.

         Short names will be based on host names without a period, and are resolved going through your host file or through any possible DNS entry.

Because of this, it is generally easier to set up a bunch of Erlang nodes together on a single computer using short names than long names.

One last thing: because short and long names are resolved differently, nodes with short names cannot communicate with nodes that have long names, and the opposite is also true.

To pick between long and short names, you can start the Erlang VM with two different options:

erl -sname short_name@domain

or

erl -name long_name@some.domain

Note that you can also start nodes with only the names: erl -sname short_name or erl -name long_name.

         Erlang will automatically assign a host name based on your operating system's configuration.

         Lastly, you also have the option of starting a node with a name such as erl -name name@127.0.0.1 to give a direct IP.

Note: Windows users should still use werl instead of erl.

However, in order to start distributed nodes and give them a name, the node should be started from the command line rather than by clicking a shortcut or executable.

Let's start two nodes:

erl -sname ketchup

...

(ketchup@ferdmbp)1>

 

erl -sname fries

...

(fries@ferdmbp)1>

To connect fries with ketchup (and make a delicious cluster) go to the first shell and enter the following function:

(ketchup@ferdmbp)1> net_kernel:connect(fries@ferdmbp).

true

 

The net_kernel:connect(NodeName) function sets up a connection with another Erlang node (some tutorials use net_adm:ping(Node) instead).

         If you see true as the result from the function call, congratulations, you're in distributed Erlang mode now.

         If you see false, then you're in for a world of hurt trying to get your network to play nice. For a very quick fix, edit your host files to accept whatever host you want. Try again and see if it works.
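For example, if you started a node whose host part is myhost and resolution fails, an entry like the following in your hosts file (host name hypothetical; the path depends on your OS, as listed earlier) maps the name to the loopback address:

```
127.0.0.1    myhost
```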

You can see your own node name by calling the BIF node() and see who you're connecting to by calling the BIF nodes():

(ketchup@ferdmbp)2> node().

ketchup@ferdmbp

(ketchup@ferdmbp)3> nodes().

[fries@ferdmbp]

 

To get the nodes communicating together, register each shell's process locally under the name shell:

(ketchup@ferdmbp)4> register(shell, self()).

true

 

(fries@ferdmbp)1> register(shell, self()).

true

 

You can now reach the process by name from the other node.

The way to do it is to send a message to the tuple {Name, Node}.

Let's try this on both shells:

(ketchup@ferdmbp)5> {shell, fries@ferdmbp} ! {hello, from, self()}.

{hello,from,<0.52.0>}

 

(fries@ferdmbp)2> receive {hello, from, OtherShell} -> OtherShell ! <<"hey there!">> end.

<<"hey there!">>

 

So the message is apparently received, and we send something to the other shell, which receives it:

(ketchup@ferdmbp)6> flush().

Shell got <<"hey there!">>

ok

 

         We transparently send tuples, atoms, pids, and binaries without a problem.

         Any other Erlang data structure is fine too. And that's it.

Cookies

If you recall the beginning of the chapter, I had mentioned the idea that all Erlang nodes are set up as meshes: if someone connects to a node, it gets connected to all the other nodes. There are times when you want to run different Erlang node clusters on the same piece of hardware. In these cases, you do not want to accidentally connect two Erlang node clusters together.

Because of this, the designers of Erlang added a little token value called a cookie. While documents like the official Erlang documentation put cookies under the topic of security, they're really not a security feature at all: the cookie is just a small value that must be shared between nodes to allow them to connect together. It is closer to the idea of a user name than a password, and nobody would consider having a user name (and nothing else) a security feature. Cookies make far more sense as a mechanism used to divide clusters of nodes than as an authentication mechanism.

To give a cookie to a node, just start it by adding a -setcookie Cookie argument to the command line. Let's try again with two new nodes:

$ erl -sname salad -setcookie 'myvoiceismypassword'

...

(salad@ferdmbp)1>

 

$ erl -sname mustard -setcookie 'opensesame'

...

(mustard@ferdmbp)1>

 

Now both nodes have different cookies and they shouldn't be able to communicate together:

(salad@ferdmbp)1> net_kernel:connect(mustard@ferdmbp).

false

 

The connection is denied, without much explanation. However, if we look at the mustard node, we see:

=ERROR REPORT==== 10-Dec-2011::13:39:27 ===

** Connection attempt from disallowed node salad@ferdmbp **

 

Good. Now what if we did really want salad and mustard to be together? There's a BIF called erlang:set_cookie/2 to do what we need. If you call erlang:set_cookie(OtherNode, Cookie), you will use that cookie only when connecting to that other node. If you instead use erlang:set_cookie(node(), Cookie), you'll be changing the node's current cookie for all future connections. To see the changes, use erlang:get_cookie():

(salad@ferdmbp)2> erlang:get_cookie().

myvoiceismypassword

(salad@ferdmbp)3> erlang:set_cookie(mustard@ferdmbp, opensesame).

true

(salad@ferdmbp)4> erlang:get_cookie().

myvoiceismypassword

(salad@ferdmbp)5> net_kernel:connect(mustard@ferdmbp).

true

(salad@ferdmbp)6> erlang:set_cookie(node(), now_it_changes).

true

(salad@ferdmbp)7> erlang:get_cookie().

now_it_changes

 

 

Remote Shells

One of the first things we learned in Erlang was how to interrupt running code using ^G (Ctrl+G). There, we saw a menu for distributed shells:

(salad@ferdmbp)1>

User switch command

--> h

c [nn]            - connect to job

i [nn]            - interrupt job

k [nn]            - kill job

j                 - list all jobs

s [shell]         - start local shell

r [node [shell]]  - start remote shell

q                 - quit erlang

? | h             - this message

 

The r [node [shell]] option is the one we're looking for. We can start a job on the mustard node by doing as follows:

--> r mustard@ferdmbp

--> j

1  {shell,start,[init]}

2* {mustard@ferdmbp,shell,start,[]}

--> c

Eshell V5.8.4  (abort with ^G)

(mustard@ferdmbp)1> node().

mustard@ferdmbp

 

The remote shell can be used the same way as a local one.

By using ^G again, you can go back to your original node.

Be careful when you stop your session though. If you call q() or init:stop(), you'll be terminating the remote node!
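A safer way to leave only the remote job is to return to the job-control menu with ^G, list the jobs, kill just the remote one, and reconnect to the local shell. A sketch, matching the menu shown above (job numbers depend on your session):

```
--> j
   1  {shell,start,[init]}
   2* {mustard@ferdmbp,shell,start,[]}
--> k 2
--> c 1
```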

 

Ping pong example modified to run on two separate nodes:

-module(tut17).

-export([start_ping/1, start_pong/0, ping/2, pong/0]).

ping(0, Pong_Node) ->
    {pong, Pong_Node} ! finished,
    io:format("ping finished~n", []);
ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        finished ->
            io:format("Pong finished~n", []);
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start_pong() ->
    register(pong, spawn(tut17, pong, [])).

start_ping(Pong_Node) ->
    spawn(tut17, ping, [3, Pong_Node]).

Assume two computers called gollum and kosken.

We will start a node on kosken called ping and then a node on gollum called pong.

On kosken (on a Linux/Unix system):

kosken> erl -sname ping

Erlang (BEAM) emulator version 5.2.3.7 [hipe] [threads:0]

 

Eshell V5.2.3.7 (abort with ^G)

(ping@kosken)1>

On gollum:

gollum> erl -sname pong

Erlang (BEAM) emulator version 5.2.3.7 [hipe] [threads:0]

 

Eshell V5.2.3.7 (abort with ^G)

(pong@gollum)1>

Now start the "pong" process on gollum:

(pong@gollum)1> tut17:start_pong().

true

and start the "ping" process on kosken (from the code above you will see that a parameter of the start_ping function is the node name of the Erlang system where "pong" is running):

(ping@kosken)1> tut17:start_ping(pong@gollum).

<0.37.0>

Ping received pong

Ping received pong

Ping received pong

ping finished

Here we see that the ping pong program has run, on the "pong" side we see:

(pong@gollum)2>

Pong received ping

Pong received ping

Pong received ping

Pong finished

(pong@gollum)2>

Looking at the tut17 code, we see that the pong function itself is unchanged; the lines:

{ping, Ping_PID} ->
    io:format("Pong received ping~n", []),
    Ping_PID ! pong,

work in the same way irrespective of which node the "ping" process is executing on.

Erlang pids contain information about where the process executes, so if you know the pid of a process, the "!" operator can be used to send it a message whether the process is on the same node or on a different one.

A difference is how we send messages to a registered process on another node:

{pong, Pong_Node} ! {ping, self()},

We use a tuple {registered_name,node_name} instead of just the registered_name.

       In the previous example, we started "ping" and "pong" from the shells of two separate Erlang nodes.

         spawn can also be used to start processes in other nodes.

         The next example is the ping pong program, yet again, but this time we will start "ping" on another node:

-module(tut18).

-export([start/1, ping/2, pong/0]).

ping(0, Pong_Node) ->
    {pong, Pong_Node} ! finished,
    io:format("ping finished~n", []);
ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        finished ->
            io:format("Pong finished~n", []);
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start(Ping_Node) ->
    register(pong, spawn(tut18, pong, [])),
    spawn(Ping_Node, tut18, ping, [3, node()]).

Assuming an Erlang system called ping (but not the "ping" process) has already been started on kosken, then on gollum we do:

(pong@gollum)1> tut18:start(ping@kosken).

<3934.39.0>

Pong received ping

Ping received pong

Pong received ping

Ping received pong

Pong received ping

Ping received pong

Pong finished

ping finished

Notice we get all the output on gollum. This is because the io system finds out where the process is spawned from and sends all output there.
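As a minimal sketch of the same behavior (node names hypothetical, nodes assumed already connected): spawning a fun on a remote node still prints in the local shell, because the spawned process inherits the spawning shell's group leader for io.

```erlang
%% Evaluated in the ping@kosken shell (hypothetical names).
%% node() runs remotely and returns pong@gollum, but io:format
%% sends its output to this shell's group leader, so the text
%% appears here, not on gollum.
spawn(pong@gollum, fun() -> io:format("Running on ~p~n", [node()]) end).
```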