Introduction to Networking
Friday, 04 April 2008

The very first time I tried to set up a server at home I discovered something unexpected: what I thought was going to be simple and easy turned out to be complicated because, in spite of many years of working with networks, there was still so much more that I needed to know.

The next shock came soon after: virtually every book or document that I found on the subject attacked it in a way that made it seem terribly complicated. I wanted something simple and easy to read. I wanted something that was organized in such a way as to allow me to jump in and quickly start getting things done.

So here it is.

You can find a rather large number of Introductions to Networking on the Internet today. This one is intended to be different. In fact, let's go ahead and jump right in; let's start looking seriously at what we need to know to get simple things done.

TCP / IP

Virtually all the communications that we see these days take place using something called the Internet Protocol Suite. We call it TCP/IP - Transmission Control Protocol / Internet Protocol - after two of the protocols in the suite.

The overall suite of protocols handles all the issues related to communications between computers. However, there are three main protocols that stand out: ICMP, UDP and TCP. These three protocols stand front and center in our lives when we are building, maintaining and using servers. So, let's get a basic picture of how they work into our minds:

Postcards vs. Operator Assisted Calls

The two main paradigms in communications are the post card (delivered by a mail man) and the operator-assisted long-distance call. We will make regular use of these paradigms as we discuss networking, so let's start with a short review:

Post Cards

A Post Card is a small card - often a photo. You write, on the back of the card, the address that you want the card to be sent to and a short note for the person who receives it. The post man will carry the card to the address you specified in exchange for the value of the postage stamp - but that's all! The postage does not include any kind of reply service - so you don't know if your friend got the card or not.

Operator Assisted Calls

Many years ago, before direct long-distance dialing became common, there was a person who worked as a long distance telephone operator. In those days you would dial zero on your rotary phone and the operator would join the call. You would then speak to the operator about the long distance call that you wanted to make. You would tell the operator what country you wanted to call and who you wanted to speak with. The operator would then stay on the line with you until you were connected to the person you wanted to speak with. If there was any problem and the call did not go through - you would not be charged.

The difference here is a huge one: the post card is sort of a hit-and-miss proposition - it might work or it might not and you don't really know either way. The operator assisted call, though, is guaranteed service - you don't pay a cent unless the call goes through. Of course, operator assisted calls used to cost a fortune.

UDP & TCP

Not surprisingly, the two main protocols in the TCP/IP protocol suite are UDP and TCP:

UDP, the User Datagram Protocol, works like a post card carried by a post man. You send it out and the system will try to deliver it. It's surprising to find out about all the things that can go wrong:

  • A computer in the network can be busy. UDP messages sent in a sequence can arrive at their destination in a completely different sequence because some of the messages can be held up by busy computers along the way.

  • A computer can be really busy - so it might not pass along the UDP message at all. This is called a dropped packet in network parlance.
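
To make the post card idea a little more concrete, here is a minimal Python sketch that sends one UDP message between two sockets on the same machine. The function name send_postcard is made up for this example, and asking for port 0 (which lets the operating system pick any free port) is just a convenience chosen here. On a single machine the message will almost always arrive; across a real network it might not, and the sender would never be told.

```python
import socket

def send_postcard(message: bytes) -> bytes:
    """Send one UDP datagram to ourselves and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the operating system for any free port (an assumption
        # made for this sketch so it does not collide with other programs).
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # give up rather than wait forever
        address = receiver.getsockname()

        # No connection is set up: the datagram is simply addressed and sent.
        sender.sendto(message, address)

        # The sender gets no confirmation; only the receiver knows it arrived.
        data, _source = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    print(send_postcard(b"Wish you were here"))
```

Notice that the sender never learns whether the message was delivered. Anything like a confirmation would have to be built by the programs themselves.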

TCP, the Transmission Control Protocol, works like an operator-assisted long-distance call. It establishes a connection and then works hard to get around the various problems that come up during the conversation:

  • When a TCP connection receives a message it sends back an acknowledgement.

  • When a TCP connection does not get an acknowledgement within a certain delay period, it resends the message.

  • When a TCP connection receives parts of a long message in the wrong sequence, it sorts the parts back into the proper sequence.

  • When a TCP connection is sending lots of data, the receiver will send rate control information back to the sender. This ensures that the data is sent as fast as possible without flooding the receiver.
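
The sorting and resending described above can be sketched with a small simulation. This is not real TCP, only a toy model under simple assumptions: each part of the message is tagged with the position of its first byte, and the function name reassemble is invented for this example.

```python
def reassemble(segments):
    """Rebuild a message from (sequence_number, data) pairs.

    Segments may arrive out of order, or more than once after a resend.
    Returns the in-order message and the next byte position still needed.
    """
    received = {}
    for seq, data in segments:
        if data:
            # A duplicate caused by a resend is simply ignored.
            received.setdefault(seq, data)

    message = b""
    expected = 0
    # Only accept data that continues the stream without a gap.
    while expected in received:
        chunk = received[expected]
        message += chunk
        expected += len(chunk)
    # 'expected' plays the role of the acknowledgement number.
    return message, expected

if __name__ == "__main__":
    arrived = [(6, b"world"), (0, b"hello "), (6, b"world")]
    print(reassemble(arrived))
```

If a part is missing, the function stops at the gap. In real TCP the sender would notice the missing acknowledgement and resend that part.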

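TCP does not run on top of UDP; both protocols send their data in IP packets. As you might imagine, there are lots of packets that go back and forth during a TCP connection: data, acknowledgements, resends and rate control information. TCP, therefore, is more expensive than UDP in terms of overhead.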

Control Messages

There is one more protocol that you will really need to know about: ICMP. The Internet Control Message Protocol is used to send important information. For example:

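  • If a UDP message is sent to a port where no software is listening, a Port Unreachable reply (one kind of Destination Unreachable message) is sent via ICMP. A refused TCP connection is reported differently: TCP itself replies with a reset, which the caller sees as Connection Refused.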

  • If a connection is attempted to a computer that is not available for some reason, a Destination Unreachable ICMP message is returned.

  • Probably the most popular message is called an Echo Request. It is more popularly known as a ping. When a ping request goes out, the device that receives it simply sends back an Echo Reply. Pinging is a really easy way of checking out a network.

The three ICMP messages above are probably the most common ones but, of course, there are many more.
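
For the curious, here is a rough Python sketch of what an Echo Request looks like as bytes. It only builds the message and does not send it, because sending raw ICMP usually requires administrator rights. The function names are invented for this example; the layout (type 8, code 0, a checksum, an identifier and a sequence number) follows the ICMP specification.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build the bytes of an ICMP Echo Request (type 8, code 0)."""
    # The checksum field is zero while the checksum is being computed.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload

if __name__ == "__main__":
    packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
    print(packet.hex())
```

A handy property of this checksum is that running it over the finished message gives zero, which is how the receiver can check that nothing was damaged on the way.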

Addressing & Routing

Now that we know how to send post cards (UDP packets) and make telephone calls (TCP connections), how do we specify who we want to talk to? And, once we specify who we want to talk to, how does the message get from where we are to where it's supposed to go?

Routing

The address information that a human writes on a post card tends to follow a certain form: Street number, Street, Borough, City, State and Country. Notice how the address is written in human form: From the nearest details to the farthest details. When sent to a different country, the post card address is processed in the reverse order: At each step in the delivery process, the size of the next geographical region is reduced.

As the post card is routed at the various steps before the final destination, the people who are routing it don't really know exactly where it's going - but each one can figure out how to get it to the next step in the route. Finally, at the last step, the mail man who is handed the post card actually knows where to find the house that the post card is going to.

TCP/IP networks also have an addressing scheme and a process for routing a message. Addressing of devices on a TCP/IP network is not much different from telephone numbers. The Routing procedure on the network is more complicated than the one used by the Post Office - but the basic ideas are not much different. Messages need to be routed from one network to another until they arrive at their destination. At any given step in the chain there may not be enough information available to know exactly where the message will arrive - but each router in the system does know the next step in the route that the message must take.
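
The idea that each router only knows the next step can be sketched in a few lines of Python. Everything in the table below is made up for illustration, including the next hop names. The /24 and /16 suffixes are a notation not covered in this article: they say how many leading bits of the address identify the network. The sketch picks the most specific entry that contains the destination, which is the usual rule.

```python
import ipaddress

# A made-up routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "deliver directly"),
    (ipaddress.ip_network("192.168.0.0/16"), "office router"),
    (ipaddress.ip_network("0.0.0.0/0"), "ISP gateway"),  # default route
]

def next_hop(destination: str) -> str:
    """Pick the most specific route that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    # The longest prefix is the most specific match.
    _net, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop

if __name__ == "__main__":
    for target in ("192.168.1.20", "192.168.2.20", "12.34.56.78"):
        print(target, "->", next_hop(target))
```

Just like the post office, the table never says where 12.34.56.78 actually is. It only says who to hand the message to next.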

Fortunately, a typical server in a small company or home office is not connected to many networks. As a result, routing is often very easy to do for the majority of network administrators. For now it's enough to know that messages do need to be routed - we don't need to study routing in more detail just yet.

Addressing

On the Internet today there are two major addressing systems available: Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6).

The one that most of us are using at the time of this writing is IPv4. It allows a message to be delivered to any one of about four billion devices. This may sound like a large number, but, because of the way that IP numbers have been assigned to various companies throughout the world, we don't have enough to go around. There is currently a shortage of IP numbers.

IPv6 solves this problem. IPv6 supports so many possible devices that there's no real chance that we will ever run out of IP numbers in the future. When will we start using IPv6? Well, at the moment there is more and more software available that is able to operate in IPv6 environments. We should be able to go ahead and start using it right away. Eventually more people will do so and, eventually, the day will come when you will connect to your Internet Service Provider's network and find that you have been assigned an IPv6 address. On that day you will know that the IPv6 future has arrived, at least for you.

For learning purposes we can continue to use the shorter IPv4 numbers. Again, we don't really need to know about routing just yet - but we need to look at addressing.

Each computer (specifically each network device) needs to have an address so that it can be found. The device address is called an Internet Protocol number, or IP Number. To make it easier to remember it's broken up into sections - sort of like a telephone number - but basically it's a big number.
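
To see that an IPv4 address really is just one big number, here is a small Python sketch using the standard ipaddress module. The helper names to_number and to_dotted are invented for this example.

```python
import ipaddress

def to_number(dotted: str) -> int:
    """Convert dotted IPv4 text such as '192.168.1.20' to its integer value."""
    return int(ipaddress.IPv4Address(dotted))

def to_dotted(number: int) -> str:
    """Convert an integer back to the dotted form."""
    return str(ipaddress.IPv4Address(number))

if __name__ == "__main__":
    n = to_number("192.168.1.20")
    print(n, to_dotted(n))
    # Each section is one byte, so the same number can be built by hand.
    print(192 * 256**3 + 168 * 256**2 + 1 * 256 + 20)
    # Four bytes give 2**32 possible addresses: about four billion.
    print(2**32)
```

Each of the four sections holds a value from 0 to 255, which is where the figure of about four billion addresses comes from.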

IP Addresses are also logically organized into ranges. Some of the IP ranges are reserved for internal use within private networks. Other ranges are assigned to various companies, on a first-come first-served basis, by an international organization that was charged with this responsibility. Most of us get our public IP numbers from our local Internet Service Provider.
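
As a quick way to check which kind of range an address falls into, Python's ipaddress module knows about the reserved private ranges. The function name describe is made up here, and the two labels are a simplification: a few special-purpose ranges exist that are neither ordinary public addresses nor private ones.

```python
import ipaddress

def describe(address: str) -> str:
    """Label an address as 'private' (internal use) or 'public'."""
    ip = ipaddress.ip_address(address)
    return "private" if ip.is_private else "public"

if __name__ == "__main__":
    for address in ("192.168.1.20", "10.1.2.3", "172.16.0.1", "12.34.56.78"):
        print(address, describe(address))
```

Addresses such as 192.168.1.20, used in the examples in this article, come from one of the private ranges, so they can be reused inside any number of home and office networks.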

Port Numbers

Computers tend to run lots of software at the same time. When we address a message to another computer, we need to specify which program we want the message to go to. This part of the address is called the Port Number. It is used to route the message to the software that will handle it.

Destination Port Numbers for common protocols are assigned by convention. For example: if we want to get a web page from a web server we connect to Destination Port 80. Port 80 is the well known port number for the web protocol. The http:// part of the address that you type into the browser eventually translates into port 80 - you don't often need to type the actual port number. Another well known port number is 25 for sending email. Again, you don't normally need to know the number - by convention the network software defaults to port 25 when email is being sent.

There are 65,535 port numbers available (1 through 65,535; port 0 is reserved). The standard port numbers are listed in a file called services. On Unix systems this file is found in the /etc/ directory; in Microsoft Windows it's usually found under the Windows installation directory, in System32\drivers\etc\ (some older versions keep it directly in C:\Windows\).
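
The services file is plain text, so it is easy to read with a few lines of code. The sketch below parses a small hand-written sample in the same style as the real file (name, then port/protocol, then optional aliases). The sample text and the function name parse_services are made up for this example; Python's standard library also offers socket.getservbyname, which looks names up in the system's own table.

```python
SAMPLE = """\
# name   port/protocol   aliases
smtp     25/tcp          mail
http     80/tcp          www
http     80/udp
"""

def parse_services(text: str) -> dict:
    """Map (service name, protocol) to a port number."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # skip lines that do not fit the expected layout
        port, protocol = fields[1].split("/", 1)
        # The official name and every alias point at the same port.
        for name in [fields[0]] + fields[2:]:
            table[(name, protocol)] = int(port)
    return table

if __name__ == "__main__":
    services = parse_services(SAMPLE)
    print(services[("http", "tcp")], services[("smtp", "tcp")])
```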

Source Port Numbers are used for responses sent back from the server. When a web browser connects to a web server it must specify destination port 80. Remember that, by convention, port 80 is the well known port number that the web server software is listening to. However, the web browser doesn't care what port number it receives its replies on. For this reason, Source Port Numbers are automatically assigned from available port numbers when a connection is made.

Sending a Message

At this point we can put together what we've learned and send a message from one computer to another. Let's say we have two computers in our network: PC A and PC B. PC A wants to send a message to PC B. What happens?

First, PC B must be running some software that will listen to a specified port number. The software must register itself with the operating system. During the registration process, the software will specify the port that it wants to listen to. Let's say, for example, that PC B has an IP address of 192.168.1.20 and is listening on port 34567.

PC B: 192.168.1.20:34567

To send a message to PC B, PC A will try to open a connection on the specified port. Let's say, for example, that PC A has an IP address of 192.168.1.10. When the program running on PC A asks to open the connection to PC B, the operating system will assign an unused port number on PC A. Let's say that PC A is assigned 45678. The resulting connection would be numbered like this:

192.168.1.10:45678 ... 192.168.1.20:34567
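
Here is a minimal Python sketch of the same idea, run on a single machine so that it needs no second PC. Both ends therefore use the address 127.0.0.1 rather than the 192.168.1.x addresses above. The function name is invented, and port 0 is used so that the operating system picks a free listening port; the 34567 from the text would work too if nothing else is using it.

```python
import socket

def connect_and_describe():
    """Open a TCP connection to ourselves and return both endpoints."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_side = None
    try:
        # The listening program registers the port it wants to listen to.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        destination = listener.getsockname()

        # The connecting program only names the destination; its own
        # source port is assigned automatically by the operating system.
        client.settimeout(2.0)
        client.connect(destination)
        source = client.getsockname()

        server_side, _peer = listener.accept()
        server_side.settimeout(2.0)
        client.sendall(b"hello\n")
        received = server_side.recv(1024)
        return source, destination, received
    finally:
        if server_side is not None:
            server_side.close()
        client.close()
        listener.close()

if __name__ == "__main__":
    source, destination, received = connect_and_describe()
    print(f"{source[0]}:{source[1]} ... {destination[0]}:{destination[1]}")
    print(received)
```

The printed line has the same shape as the connection shown above: a source address and port on one side, a destination address and port on the other.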

Since we're using TCP here we are establishing a connection through which both sides can send and receive messages. So, let's go ahead and try it. At first we can try it from one shell to another. After that we can try it from one PC to another.

Telnet & Netcat

If you have one PC running Linux you can try this using the telnet and nc (netcat) utilities. Netcat is sort of a networking Swiss Army knife that allows you to send and receive using TCP and UDP. Telnet is a utility that has been around for a long time. As with netcat it can establish a connection with a TCP port and allow you to send and receive text through that connection.

If you are using Windows you will find that telnet is included, although on newer versions (Windows Vista and later) the Telnet Client may first have to be turned on as an optional Windows feature. Netcat, unfortunately, is not available by default. You will have to first download a version of netcat from your favorite source. If you are an Apple Mac user you will most likely find telnet installed on your system by default, and Mac OS X usually includes nc as well; check the Apple documentation if it is missing from your environment.

Let's start by sending a message from one shell to another on the same computer. Open two shells (also known as command shells or DOS boxes in Windows). In the first shell, run netcat and tell it to listen on port 34567:

$ nc -l 34567

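(The command should be the same in Windows; check the documentation for your version of netcat. Some versions, including the traditional netcat, want the port given with the -p option: nc -l -p 34567.)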

On the second shell, use the telnet command to talk to netcat:

$ telnet localhost 34567

(The above command will work in Windows command shells as is.)

When the telnet program starts running, it establishes a TCP connection to the netcat program. By convention, localhost is the name that refers to the local computer itself. It resolves to the address of a special network interface, the loopback device. There should be a loopback device on every computer you work with and it should always have the same IP address: 127.0.0.1.

Getting back to the telnet program: it is waiting for you to type something. Anything that you type will be sent through the TCP connection to the listening program; in this case netcat. Netcat will simply take whatever it gets and display it on the screen.

Note that nothing happens until you press the Enter key - both programs will buffer a line of input text before they send anything.

tcpdump

Another good thing to try is to open a third shell and run a utility called tcpdump on the loopback device. Tcpdump is a utility that listens to the traffic passing through an interface. It can display everything it finds on your screen or you can tell it to display only specific information.

(Windows users will find that there are many programs that can be downloaded to perform similar functions. Look for Wireshark, among others.)

On Linux systems you will only be allowed to do this if you have access to the root password. This is because the tcpdump utility must have low-level access permission to be able to listen to device traffic.

In this case, we want to display the activity on the loopback device that involves port number 34567. The command (for Red Hat-related distributions of Linux) is:

$ su -c "tcpdump -i lo port 34567"

Now, when you go to one of the other shells and type a line of text, the text will appear in the other shell and a trace of the traffic on the loopback device will appear in the tcpdump shell:

listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
14:27:20.236572 IP localhost.localdomain.34567 > localhost.localdomain.36636: P 3217691855:3217691873(18) ack 3213568188 win 256 <nop,nop,timestamp 343270274 342791263>
14:27:20.236593 IP localhost.localdomain.36636 > localhost.localdomain.34567: . ack 18 win 257 <nop,nop,timestamp 343270274 343270274>

The exact meaning of the above is perhaps not terribly important - but you can see that a message of 18 bytes goes from the listening program (netcat, where I sent the line of text from) to the connecting program (telnet) in the other shell. This message is followed by an acknowledgement going back the other way (the line containing only an ack). Note also that the first message comes from the destination port 34567 that was specified when the listening program, nc, was started. The connecting program, telnet, was assigned a source port number of, in this example, 36636.
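
If you want to pick the addresses and port numbers out of a trace like this automatically, a short regular expression will do. The sketch below is written against the two lines shown above; the exact output format of tcpdump varies between versions, so treat the pattern as an assumption rather than a general parser. The names LINE, PATTERN and endpoints are made up for this example.

```python
import re

LINE = ("14:27:20.236572 IP localhost.localdomain.34567 > "
        "localhost.localdomain.36636: P 3217691855:3217691873(18) "
        "ack 3213568188 win 256")

# host.port > host.port: the port is whatever follows the last dot.
PATTERN = re.compile(r"IP (\S+)\.(\d+) > (\S+)\.(\d+):")

def endpoints(line: str):
    """Return ((source host, port), (destination host, port)) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    src_host, src_port, dst_host, dst_port = match.groups()
    return (src_host, int(src_port)), (dst_host, int(dst_port))

if __name__ == "__main__":
    print(endpoints(LINE))
```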

From either of the first two shells you can press Control-D to send an End of File message. This will cause both programs to stop running.

ping

Another utility that you might want to try: ping lets you verify that you can get a reply from a specific device. In one of the shells, type ping localhost to see how that works:

$ ping localhost
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=3 ttl=64 time=0.024 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=4 ttl=64 time=0.024 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=5 ttl=64 time=0.027 ms
--- localhost.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.018/0.024/0.027/0.003 ms

Of course, the localhost address and the loopback device should always be working on your computer. The information that they return is more interesting when you try it on devices that are not working properly. Try to ping a number you don't expect to hear back from, such as 12.34.56.78 (when you are not connected to the internet):

$ ping 12.34.56.78
PING 12.34.56.78 (12.34.56.78) 56(84) bytes of data.
--- 12.34.56.78 ping statistics ---
12 packets transmitted, 0 received, 100% packet loss, time 11000ms

There are many reasons why you might not get a response from a device that you try to ping. With time you will learn about all the various bits of software and hardware that carry the ICMP messages back and forth. Once you have a solid understanding of the whole process you will be able to investigate each step in the route, find any problems and fix them.

traceroute

One last exercise that might be useful is to try traceroute. You will need to be connected to the internet for this to work. Try traceroute www.microsoft.com, for example. On Windows machines the command name is a little different: tracert instead of traceroute. Traceroute will list all the routers between your computer and the one you want to talk to. It will display the round-trip time, in thousandths of a second (ms), three times for each router in the route. If a router drops a packet or does not answer, traceroute will display a star in place of the round-trip time for that attempt.

Some traceroute results are kind of amazing! When I first discovered this utility I enjoyed connecting to the internet from a dial-up line and running traceroutes back to the local high-speed connection. It's amazing how two computers sitting next to each other in an office might have to route their packets through several other cities when they communicate with each other. This is because each service provider has agreements with different backbone operators in different cities. Even these days it's amazing to see just how far a packet has to travel to get from one ISP to another. Routes between two local computers in Montreal often have to pass through Toronto, Chicago and New York - and will often do so in less than 60ms.

Finally, if you do have two computers handy, you might want to try the above telnet/netcat exercise between them. Of course, you will want the two computers to already be connected through a working network. You will also need to know the address of one of them. Let's continue to use the example addresses that we recently used for PC A and PC B above:

PC A: 192.168.1.10
PC B: 192.168.1.20

On PC A run netcat and tell it to listen to port 34567 (nc -l 34567 or netcat -l 34567). On PC B, telnet to PC A (telnet 192.168.1.10 34567) and send some lines of text back and forth.

You should see the same results that you saw in the single computer example.