Socket Create and Connect With C

From Free Knowledge Base- The DUCK Project: information for everyone
The program listed here creates a socket and connects it to a remote computer. 
It assumes that there is a web server at the other end. If the socket connects, 
the program reads whatever the remote machine sends over the socket and prints 
it to the screen. Then the program exits.

Program listing:

  1 #include <stdio.h>
  2 #include <string.h>
  3 #include <stdlib.h>
  4 #include <unistd.h>
  5 #include <fcntl.h>

  6 #include <netinet/tcp.h>
  7 #include <sys/socket.h>
  8 #include <sys/types.h>
  9 #include <netinet/in.h>
 10 #include <netdb.h>

 11 int socket_connect(char *host, in_port_t port){
 12         struct hostent *hp;
 13         struct sockaddr_in addr;
 14         int on = 1, sock;     

 15         if((hp = gethostbyname(host)) == NULL){
 16                 herror("gethostbyname");
 17                 exit(1);
 18         }
 19         bcopy(hp->h_addr, &addr.sin_addr, hp->h_length);
 20         addr.sin_port = htons(port);
 21         addr.sin_family = AF_INET;
 22         sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
 23         if(sock == -1){ perror("socket"); exit(1); }
 24         if(setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (const char *)&on, sizeof(int)) == -1){
 25                 perror("setsockopt");
 26                 exit(1);
 27         }
 28         if(connect(sock, (struct sockaddr *)&addr, sizeof(struct sockaddr_in)) == -1){
 29                 perror("connect");
 30                 exit(1);
 31         }
 32         return sock;
 33 }

 34 #define BUFFER_SIZE 1024

 35 int main(int argc, char *argv[]){
 36         int fd;
 37         char buffer[BUFFER_SIZE];

 38         if(argc < 3){
 39                 fprintf(stderr, "Usage: %s <hostname> <port>\n", argv[0]);
 40                 exit(1); 
 41         }

 42         fd = socket_connect(argv[1], atoi(argv[2])); 

 43         write(fd, "GET /\r\n", strlen("GET /\r\n"));  

 44         bzero(buffer, BUFFER_SIZE);

 45         while(read(fd, buffer, BUFFER_SIZE - 1) > 0){
 46                 fprintf(stderr, "%s", buffer);
 47                 bzero(buffer, BUFFER_SIZE);
 48         }

 49         shutdown(fd, SHUT_RDWR); 
 50         close(fd); 

 51         return 0;
 52 }

First, note that the line numbers at the beginning of each line are only there 
to make it easier to go through the program. If you intend to compile this 
program, remove all the line numbers first.

To compile the program you need a C compiler such as GCC. Provided that you 
saved the code in a file called connect_socket.c, type the following in your 
shell:

 $ gcc -o connect_socket connect_socket.c 

This produces a program called connect_socket, which you can try with:

 $ ./connect_socket linuxdocs.org 80

Web servers typically listen on port 80, so if there is a web server at 
linuxdocs.org you should get its index page.

Now to the program: what does it do? We start at line 35, with the function 
main. This is where a C program begins executing, at least in the usual hosted 
environment such as a program started from a shell. main takes two arguments, 
int argc and char *argv[]. argc is the number of arguments to the program, 
counting the program name itself. Arguments are the strings you provide on the 
command line, for example "linuxdocs.org" and "80" in the case above. They are 
stored in argv, an array of pointers to char where each pointer refers to one 
null-terminated string. The program name is normally stored in the first 
element, argv[0], so if you want to print the program's name you can try:

 printf("Hello I'm a program called: %s\n", argv[0]);
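To see argc and argv in action, here is a small sketch. print_arguments is a hypothetical helper, not part of the program above; it prints argc and then every element of argv with its index, using nothing beyond the standard library.

```c
#include <stdio.h>

/* Hypothetical helper: print argc and every element of argv.
   Returns argc so the caller can check how many arguments there were. */
int print_arguments(int argc, char *argv[])
{
    int i;

    printf("argc = %d\n", argc);
    for (i = 0; i < argc; i++) {
        /* argv[0] is normally the program name; the rest are the
           strings typed after it on the command line. */
        printf("argv[%d] = %s\n", i, argv[i]);
    }
    return argc;
}
```

If main in a program started as ./connect_socket linuxdocs.org 80 calls print_arguments(argc, argv), it prints argc = 3 followed by argv[0] to argv[2].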

Next, at line 36, we declare a file descriptor called fd. We will use it later 
to read from the socket we are about to create. We also declare a char array 
named buffer to store the incoming data in; more about that later. The array 
has size BUFFER_SIZE, which is defined as 1024 on line 34. The #define 
statement is a preprocessor directive. For now it is enough to know that 
everywhere we write BUFFER_SIZE, the preprocessor replaces it with 1024 before 
the code is compiled.

At line 38 we check that the user has supplied the program with the correct 
number of arguments.

 38         if(argc < 3){
 39                 fprintf(stderr, "Usage: %s <hostname> <port>\n", argv[0]);
 40                 exit(1); 
 41         }

That is, we check argc < 3. Recall that when the program is run correctly, argv 
holds three strings: first the name of the program, then the two the user 
supplied. So we are really interested in argv[1] and argv[2]. If the user fails 
to provide enough arguments, we print an error message at line 39 and exit the 
program at line 40. What is the stderr you can see on line 39, the first 
argument to fprintf? Every C program starts with three standard streams: 
stdin, stdout and stderr. On typical systems stderr is unbuffered, so everything 
written to it goes out to the terminal immediately. stdout, however, is 
buffered: output may wait in a buffer for a while (until a newline when writing 
to a terminal, or until the buffer is full otherwise). If the program terminates 
badly, for example with a segmentation fault, text still waiting in stdout's 
buffer is lost. Therefore error messages are usually written to stderr, so that 
we can be sure to see them. If you're interested in fprintf, check its manual 
page: man fprintf. Please note that it's generally considered good conduct to 
check the arguments provided to a program; it can't really hurt.

On line 42 we call the function socket_connect. It looks like this:

 42         fd = socket_connect(argv[1], atoi(argv[2])); 

The first argument provided to the program is the host we wish to connect to, 
and we simply pass it on to the function. The second argument is the port. 
Here, however, we get a string, e.g. "80", but we need an integer. The 
conversion is done by atoi(), which takes a string and tries to convert it to 
an integer. atoi() has no way to report errors: if you give it "hello", which 
contains no number, it simply returns 0, and if the number is too large to fit 
in an int, the behaviour is undefined. So be careful not to assume anything 
about what atoi() returns for input you have not checked.
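A safer way to turn the port string into a number is strtol(), which reports where parsing stopped and sets errno on overflow. The sketch below is a hypothetical helper, parse_port, that is not part of the original program; it assumes that a valid port is a whole decimal number from 1 to 65535.

```c
#include <errno.h>
#include <stdlib.h>
#include <netinet/in.h>

/* Hypothetical helper: convert a decimal string such as "80" to a port.
   Returns 0 and stores the value in *port on success; returns -1 if the
   string is not a whole number in the range 1..65535. */
int parse_port(const char *text, in_port_t *port)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;      /* overflow, no digits at all, or trailing junk */
    if (value < 1 || value > 65535)
        return -1;      /* outside the usable port range */
    *port = (in_port_t)value;
    return 0;
}
```

In main, one could call parse_port(argv[2], &port) and print the usage message when it returns -1, instead of passing atoi(argv[2]) straight on.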

Now back to line 11, where the function socket_connect is defined. The first 
part looks like this:

 11 int socket_connect(char *host, in_port_t port){
 12         struct hostent *hp;
 13         struct sockaddr_in addr;
 14         int on = 1, sock;     

Here we declare hp, a pointer to a struct hostent. It is used when we look up 
the host address associated with the host name given to the program; we'll 
look at that a bit later. addr is a variable of type struct sockaddr_in, which 
we use later when we open the connection to the remote host. Further, we 
declare a variable called on, which holds the value we pass to setsockopt 
later. Last comes a variable called sock, the file descriptor that we will 
associate with the opened socket.

Now we come to the part where we actually try to resolve the host address 
associated with the host name. This is done on line 15.

 15         if((hp = gethostbyname(host)) == NULL){
 16                 herror("gethostbyname");
 17                 exit(1);
 18         }

The function gethostbyname takes a char * argument, which might be something 
like "www.google.com" or "192.168.0.1", and returns a pointer to a struct 
hostent. We check that we got something sensible out of it: if the returned 
pointer is NULL, something went wrong. gethostbyname stores the reason in the 
variable h_errno (not errno), and the function herror prints a message 
describing it, prefixed with the string we give it, here "gethostbyname". For 
example, we might try to look up a host name that does not exist:

 $ ./connect_socket www.hshsasjdhas.dfhsaj 80
 gethostbyname: Unknown host

The manual page for gethostbyname has more useful information.
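A host can have several addresses, and struct hostent lists all of them in h_addr_list, terminated by a NULL pointer. The sketch below, a hypothetical print_host_addresses function, walks that list and prints each address with inet_ntop(). It is only meant for exploring what gethostbyname returns, and it assumes an IPv4 result.

```c
#include <stdio.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: look up host and print every IPv4 address found.
   Returns the number of addresses printed, or -1 on failure. */
int print_host_addresses(const char *host)
{
    struct hostent *hp;
    char text[INET_ADDRSTRLEN];
    int i, count = 0;

    if ((hp = gethostbyname(host)) == NULL) {
        herror("gethostbyname");    /* reason comes from h_errno */
        return -1;
    }
    if (hp->h_addrtype != AF_INET)
        return -1;                  /* this sketch only handles IPv4 */
    for (i = 0; hp->h_addr_list[i] != NULL; i++) {
        /* Each entry is hp->h_length bytes long, in network byte order. */
        if (inet_ntop(AF_INET, hp->h_addr_list[i], text, sizeof(text)) != NULL) {
            printf("%s has address %s\n", hp->h_name, text);
            count++;
        }
    }
    return count;
}
```

A numeric string such as "127.0.0.1" is converted directly without asking a name server, so it yields exactly one address.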

On line 19 we take the result from gethostbyname, hp, and use the member 
h_addr, which holds the first IP address of the host (h_addr is shorthand for 
h_addr_list[0]). For IPv4 the address is 4 bytes long, but rather than assuming 
anything about the length, we use hp->h_length, which gives exactly the length 
of the address. Again, check the manual page for gethostbyname if you're 
interested in what the struct contains. We then use bcopy to copy the address 
into the member of addr called sin_addr. Note the argument order of bcopy: 
source first, then destination, then the number of bytes. bcopy needs to know 
where to put the bytes, so we pass it the address of addr.sin_addr, obtained 
with the & operator, rather than the struct itself. This might be confusing at 
first, but rather than covering all the details here, we just state that this 
is how it is done. Now the address is copied into addr.
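bcopy() is a legacy BSD function; standard C code normally uses memcpy(), which takes the destination first. The sketch below, a hypothetical helper named fill_sockaddr, does the work of lines 19 to 21 with memcpy(). It also clears the whole struct with memset() first, so that unused members such as sin_zero hold no leftover stack data. It assumes the hostent came from an IPv4 lookup.

```c
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *addr from a hostent and a port number.
   Returns 0 on success, -1 if the entry is not a 4-byte IPv4 address. */
int fill_sockaddr(struct sockaddr_in *addr, const struct hostent *hp, in_port_t port)
{
    if (hp->h_addrtype != AF_INET || hp->h_length != (int)sizeof(addr->sin_addr))
        return -1;
    memset(addr, 0, sizeof(*addr));     /* clear everything, sin_zero included */
    /* memcpy(destination, source, length): the reverse of bcopy's order */
    memcpy(&addr->sin_addr, hp->h_addr_list[0], sizeof(addr->sin_addr));
    addr->sin_port = htons(port);       /* port in network byte order */
    addr->sin_family = AF_INET;
    return 0;
}
```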

 19         bcopy(hp->h_addr, &addr.sin_addr, hp->h_length);

What about the port? Port numbers are 16-bit values, so a machine connected to 
the Internet has 65536 of them (0 to 65535), but most machines listen on only a 
few. Web servers typically listen on port 80, so that is what we try. Now comes 
an interesting function called htons.

 20         addr.sin_port = htons(port);

Some architectures, such as SPARC, use what is called BIG ENDIAN byte order, 
while others, such as Intel x86 and compatible processors, use LITTLE ENDIAN 
byte order. What is the difference? It is all about the order in which the 
bytes of a multi-byte integer are stored in memory. Take a 32-bit integer as an 
example: it is built out of 4 bytes (one byte = 8 bits, thus 8*4 = 32). Call 
them A,B,C,D, where A is the most significant byte and D the least significant. 
A BIG ENDIAN machine stores them in memory as A,B,C,D, while a LITTLE ENDIAN 
machine stores them as D,C,B,A. So the two store the same number differently. 
Note that it is the order of the bytes that is reversed, not the order of the 
bits. If we wish to send binary data from one machine to another, it is 
important to know how each side interprets and encodes integers. And now we are 
going to send a packet over the Internet to a host of unknown architecture, so 
we had better take some precautions. To deal with this, it has been agreed that 
the Internet uses BIG ENDIAN byte order, called network byte order. Simple as 
that. The htons function, short for host-to-network short, converts a 16-bit 
value from the host's byte order to network byte order, swapping the bytes if 
necessary. A way to check what byte order your machine has is to run the 
following test:

 printf("%d\n", htons(666));

If it prints 39426 you're on a LITTLE ENDIAN machine, and if it prints 666 
you're on a BIG ENDIAN machine. (666 is 0x029A; with its two bytes swapped it 
becomes 0x9A02, which is 39426.) Continuing with line 21, we simply tell the 
addr struct that we are interested in the Address Family InterNET, AF_INET.
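The sketch below makes the byte-order discussion concrete. The hypothetical helpers is_little_endian and dump_u16 (not from the original program) look at how a 16-bit value is laid out in memory. Whatever the host's byte order, htons(666) is stored as the bytes 02 9a, the big-endian form.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: return 1 if the least significant byte of an
   integer is stored first in memory (little endian), 0 otherwise. */
int is_little_endian(void)
{
    uint16_t value = 1;
    unsigned char first;

    memcpy(&first, &value, 1);      /* the byte at the lowest address */
    return first == 1;
}

/* Hypothetical helper: print the two bytes of value in memory order. */
void dump_u16(const char *label, uint16_t value)
{
    unsigned char bytes[2];

    memcpy(bytes, &value, sizeof(bytes));
    printf("%s: %02x %02x\n", label, bytes[0], bytes[1]);
}
```

For example, dump_u16("host", 666) prints host: 9a 02 on a little-endian machine, while dump_u16("network", htons(666)) prints network: 02 9a on any machine.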

 21         addr.sin_family = AF_INET;

When this is done we create a socket. As the socket manual page puts it (try 
man -S 2 socket if you get nothing, or unrelated information), socket() creates 
an endpoint for communication. This socket is not connected to anything yet, 
but we specify some important attributes for it. First, PF_INET selects the 
protocol family, here IPv4. You could instead use PF_INET6 for IPv6, PF_IPX for 
Novell's IPX protocol, and so on. Then we tell socket() that we want 
SOCK_STREAM, which is the type of communication: SOCK_STREAM provides 
sequenced, reliable, two-way, connection-based byte streams. You could for 
example use SOCK_DGRAM here if you want to send datagrams instead. Last, we 
specify that we want a TCP connection by passing IPPROTO_TCP as the protocol. 
socket() returns -1 if it fails, so the return value should be checked before 
the descriptor is used. Again, check the socket manual page for more details.
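Here is a sketch of the same socket() call with the error check its return value deserves. The helper name make_socket is hypothetical; it creates an IPv4 socket of any type, so it can also make the SOCK_DGRAM kind mentioned above.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: create an IPv4 socket of the given type and
   protocol. Returns the descriptor, or -1 after printing the reason. */
int make_socket(int type, int protocol)
{
    int sock = socket(PF_INET, type, protocol);

    if (sock == -1)
        perror("socket");    /* the reason is in errno */
    return sock;
}
```

make_socket(SOCK_STREAM, IPPROTO_TCP) corresponds to line 22, and make_socket(SOCK_DGRAM, IPPROTO_UDP) gives a UDP socket. Passing 0 as the protocol lets the system choose the default protocol for the type, which for SOCK_STREAM in the IPv4 family is TCP.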

 22         sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);

More options; this is getting more and more complex. :) We use setsockopt to 
set options on the socket. The second argument, IPPROTO_TCP again, says that 
the option belongs to the TCP protocol level. The third argument names the 
option, TCP_NODELAY, which disables Nagle's algorithm so that small writes are 
sent immediately instead of being held back and combined into larger segments. 
Recall the variable on, which we set to 1: passing it to setsockopt means that 
we want TCP_NODELAY enabled, while 0 would disable it. Note that we pass a 
pointer to on (recall that & gets the address of a variable) and tell 
setsockopt how large the value is with the last argument, sizeof(int). 
setsockopt is quite a useful function that can manipulate many properties of 
sockets; check its manual page for more details.

 23         if(sock == -1){ perror("socket"); exit(1); }
 24         if(setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (const char *)&on, sizeof(int)) == -1){
 25                 perror("setsockopt");
 26                 exit(1);
 27         }

If setting the option fails for some reason, maybe because the option is not 
available for the kind of socket we are using, setsockopt returns -1. This is 
checked on lines 24 to 27. The general idea is the same as with gethostbyname 
before, but we use a different error function here: perror, which prints a 
message describing the current value of errno. Check the perror manual page if 
you're interested.
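To confirm that an option actually took effect, it can be read back with getsockopt(), the counterpart of setsockopt(). The hypothetical helper enable_nodelay below sets TCP_NODELAY and then reads it back; it works on a socket that is not connected yet.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable TCP_NODELAY on sock and read it back.
   Returns 1 if the option is now on, 0 if it is off, -1 on error. */
int enable_nodelay(int sock)
{
    int on = 1, current = 0;
    socklen_t len = sizeof(current);

    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) {
        perror("setsockopt");
        return -1;
    }
    /* getsockopt writes the current value and its size back to us */
    if (getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &current, &len) == -1) {
        perror("getsockopt");
        return -1;
    }
    return current != 0;
}
```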

Now at last we are ready to connect our socket to a remote machine. The 
function connect does this. We pass the sock variable that we have done so 
many things with, and the addr variable, which holds the information about 
where we wish to connect. Observe that we cast the address of addr to a pointer 
of type struct sockaddr *: connect accepts any kind of socket address, and 
struct sockaddr is the generic type it expects. Again we use the & operator to 
get the address of the struct. We also tell connect how large the struct is 
with sizeof(struct sockaddr_in). Then we check the return value; -1 means 
something went wrong. For instance, connecting to a port on which nothing is 
listening fails. For example:

 $ ./connect_socket localhost 6677
 connect: Connection refused

 28         if(connect(sock, (struct sockaddr *)&addr, sizeof(struct sockaddr_in)) == -1){
 29                 perror("connect");
 30                 exit(1);
 31         }
 32         return sock;

Error handling here is very similar to the previous examples. If the connection 
succeeds, we return the newly created socket at line 32. Back in the main 
function, we now want to read from this socket.
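The Linux manual pages describe gethostbyname() and herror() as obsolete and recommend getaddrinfo() instead. getaddrinfo() handles IPv4 and IPv6 alike and fills in the socket address for us, taking over the work of lines 15 to 21. The sketch below, a hypothetical connect_tcp function that swaps gethostbyname() for getaddrinfo(), tries each address returned until one connects, and returns -1 instead of exiting so that the caller can decide what to do. Note that it takes the port as a string, such as argv[2].

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: resolve host and port with getaddrinfo() and
   connect to the first address that accepts a TCP connection.
   Returns the connected descriptor, or -1 on failure (it does not exit). */
int connect_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int sock = -1, err;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 addresses */
    hints.ai_socktype = SOCK_STREAM;  /* a stream (TCP) connection */

    err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        /* getaddrinfo reports errors through its return value */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sock == -1)
            continue;                 /* try the next address */
        if (connect(sock, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(sock);
        sock = -1;
    }
    freeaddrinfo(res);                /* the list is no longer needed */
    if (sock == -1)
        fprintf(stderr, "connect_tcp: could not connect to %s port %s\n", host, port);
    return sock;
}
```

For example, fd = connect_tcp(argv[1], argv[2]) followed by a check for -1 could take the place of the call on line 42.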

But how do we make the web server at the other end send us anything? Luckily 
the procedure is very simple. Sending "GET /\r\n" to a web server asks for the 
root of the server, which usually means the index page. The "\r\n" (a carriage 
return followed by a line feed) is the standard way of ending a line in HTTP; 
it tells the server that the line is complete and can be interpreted as it is. 
The function write does this for us. It takes 3 parameters: the file 
descriptor fd, that is the socket; the data we want to send, that is 
"GET /\r\n"; and last the number of bytes to send, the length of that string.
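write() is allowed to write fewer bytes than requested (on a socket this can happen, for example, when the send buffer fills up), so robust code checks the return value and loops. The sketch below uses two hypothetical helpers: write_all, which keeps writing until everything is sent, and send_simple_get, which sends the same request line as line 43.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of data to fd, continuing
   after partial writes. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;             /* real error, errno says which */
        }
        data += n;                 /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send the same request line as line 43. */
int send_simple_get(int fd)
{
    const char *request = "GET /\r\n";
    return write_all(fd, request, strlen(request));
}
```

"GET /\r\n" is the very old HTTP/0.9 form of request. Some servers may only answer a full request such as "GET / HTTP/1.0\r\n\r\n", which could be sent with write_all in exactly the same way.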

 43         write(fd, "GET /\r\n", strlen("GET /\r\n"));  

After that we take the buffer we declared before and set all bytes in this 
buffer to 0. This is to avoid junk data that the buffer might contain.

 44         bzero(buffer, BUFFER_SIZE);

Then, as long as read indicates that there is still something to read, we keep 
reading from the socket. read returns the number of bytes that were read, 0 
when the other end has closed the connection (end of file), and -1 if an error 
occurs. So 0 tells us that we have received all the data, and the loop should 
only continue while read returns a positive count: a test for != 0 alone would 
loop forever on an error, because -1 is not 0. This works well with blocking 
I/O: read waits until it can read something, which is good since it might take 
some time for the data to travel over the Internet. The arguments to read are 
the file descriptor fd that we read from, the char array buffer in which we 
store the data, and lastly the maximum number of bytes to read each time. But 
why not read BUFFER_SIZE bytes? We read at most BUFFER_SIZE - 1 bytes so that 
the last byte of the array stays 0, thanks to the earlier call to bzero. When 
we print the contents of the buffer using fprintf on line 46, fprintf must know 
when to stop, and the %s conversion stops printing when it sees a 0 byte ('\0' 
if you want the char value for 0). Otherwise we would print whatever else is in 
memory after the buffer, which might end in a segmentation fault when we try to 
read memory we have no access to. After we have printed the data we bzero the 
buffer again and continue until there is no data left.
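An alternative to the bzero() and BUFFER_SIZE - 1 approach is to use the count that read() returns and print exactly that many bytes with fwrite(). This also copes with data that contains 0 bytes, where %s would stop early. The sketch below, a hypothetical copy_to_stream helper, reads until end of file, treats -1 as an error (retrying only when a signal interrupted the call), and returns the number of bytes copied.

```c
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

/* Hypothetical helper: copy everything read from fd to out until end of
   file. The byte count returned by read() tells fwrite() how much to
   print, so the buffer needs no '\0' terminator and no bzero().
   Returns the total number of bytes copied, or -1 on a read error. */
long copy_to_stream(int fd, FILE *out)
{
    char buffer[BUFFER_SIZE];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buffer, sizeof(buffer))) != 0) {
        if (n == -1) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            perror("read");
            return -1;
        }
        fwrite(buffer, 1, (size_t)n, out); /* exactly n bytes, 0 bytes included */
        total += n;
    }
    return total;
}
```

In main, copy_to_stream(fd, stdout) could replace lines 44 to 48.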

 45         while(read(fd, buffer, BUFFER_SIZE - 1) > 0){
 46                 fprintf(stderr, "%s", buffer);
 47                 bzero(buffer, BUFFER_SIZE);
 48         }

When we're done we call shutdown with SHUT_RDWR, which shuts down both reading 
(RD) and writing (WR) on the connection. After that we release the file 
descriptor using close, and return 0 to the shell to signal success, just for 
the sake of good conduct.

 49         shutdown(fd, SHUT_RDWR); 
 50         close(fd); 

 51         return 0;

That's about it. Quite frankly, this example might be a bit hard to begin with, 
since it's lengthy and contains a lot of socket yadda yadda. But I assume that 
most people would want something more 'useful' than yet another hello world 
described in great detail.