CSCI 341: Lab 4

Webserver
Due: 11:59 PM on Monday, April 19th

This homework has the following learning objectives:

  • Learn how to follow a network protocol (in this case, HTTP)
  • Learn network/socket programming
  • Learn some simple multithreading

In this homework, we’ll look at the HTTP protocol from the point of view of a Web server. Please create a GitHub repository using this link: https://classroom.github.com/g/uplicORa. After creating your personal repository, you will find that the lab4 directory contains:

  • lab4.c: Skeleton code for the server side of a TCP application. This will be the primary file for this assignment, but feel free to modularize and create other files if you prefer to do so.
  • WWW/: A directory containing example files for your Web server to distribute.
  • A Makefile: If you modularize your code into different files, make sure those changes are reflected in the Makefile (and don’t forget to git add those files to your personal git repo!).

Your server program will receive two arguments: 1) the port number it should listen on for incoming connections, and 2) the directory out of which it will serve files (often called the document root in production Web servers). For example:

$ ./lab4 8080 WWW

This command will tell your Web server to listen for connections on port 8080 and serve files out of the WWW directory. That is, the WWW directory is considered ‘/’ when responding to requests. For example, if you’re asked for /index.html, you should respond with the file that resides in WWW/index.html. If you’re asked for /dir1/dir2/file.ext, you should respond with the file WWW/dir1/dir2/file.ext.
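The mapping described above can be sketched as a small helper. This is only an illustration, not part of the provided skeleton: the name build_path and its signature are hypothetical, and it assumes the request path has already been extracted from the request line.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: joins the document root and the request path.
 * Returns 0 on success, or -1 if the request path does not start
 * with '/' or the result does not fit in out. */
int build_path(const char *docroot, const char *request_path,
               char *out, size_t out_size)
{
    if (request_path[0] != '/')
        return -1;

    /* snprintf reports how many bytes the full string needs, so a
     * result >= out_size means the output was truncated. */
    int n = snprintf(out, out_size, "%s%s", docroot, request_path);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

With a document root of WWW, a request for /dir1/dir2/file.ext yields WWW/dir1/dir2/file.ext. If you chdir() into the document root at startup instead, you can pass "." as the root.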

Requirements

In addition to serving requested files, your server should handle at least the following cases:

  1. HTML, text, and image files should all display properly. You’ll need to return the proper HTTP Content-Type header in your response, based on the file ending. Also, other binary file types like PDFs should be handled correctly. You don’t need to handle all possible file types, but you should at least be able to handle files with html, txt, jpeg, gif, png, and pdf extensions.
  2. If asked for a file that does not exist, you should respond with a 404 error code with a readable error page, just like a Web server would. It doesn’t need to be fancy, but it should contain some basic HTML so that the browser renders something and makes the error clear.
  3. Some clients may be slow to complete a connection or send a request. Your server should be able to serve multiple clients concurrently, not just back-to-back. For this lab, use multithreading with pthreads to handle concurrent connections.
  4. If the path requested by the client is a directory, you should handle the request as if it were for the file index.html inside that directory. Hint: use the stat() system call to determine if a path is a directory or a file. The st_mode field in the stat struct has what you need.
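Requirements 1 and 4 above could be approached roughly as follows. This is a sketch under assumptions, not a required design: the function names content_type_for and resolve_path are hypothetical, the extra "jpg" entry and the application/octet-stream fallback are my own additions, and resolve_path treats anything it cannot serve as a 404 case.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */
#include <sys/stat.h>

/* Hypothetical helper: picks a Content-Type from the file extension. */
const char *content_type_for(const char *path)
{
    static const struct { const char *ext; const char *type; } table[] = {
        { "html", "text/html" },
        { "txt",  "text/plain" },
        { "jpeg", "image/jpeg" },
        { "jpg",  "image/jpeg" },      /* assumption: not required by the lab */
        { "gif",  "image/gif" },
        { "png",  "image/png" },
        { "pdf",  "application/pdf" },
    };
    const char *dot = strrchr(path, '.');
    /* Only accept a dot that belongs to the last path component. */
    if (dot != NULL && strchr(dot, '/') == NULL) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcasecmp(dot + 1, table[i].ext) == 0)
                return table[i].type;
    }
    return "application/octet-stream"; /* assumed fallback for unknown types */
}

/* Hypothetical helper: writes the file to serve into out.
 * Returns 0 on success, -1 if nothing servable exists (the 404 case). */
int resolve_path(const char *path, char *out, size_t out_size)
{
    struct stat st;
    int n;

    if (stat(path, &st) != 0)
        return -1;                     /* path does not exist */

    if (S_ISDIR(st.st_mode)) {
        /* Directory: try index.html inside it. */
        size_t len = strlen(path);
        const char *sep = (len > 0 && path[len - 1] == '/') ? "" : "/";
        n = snprintf(out, out_size, "%s%sindex.html", path, sep);
        if (n < 0 || (size_t)n >= out_size)
            return -1;
        if (stat(out, &st) != 0 || !S_ISREG(st.st_mode))
            return -1;
        return 0;
    }

    if (!S_ISREG(st.st_mode))
        return -1;                     /* neither a regular file nor a directory */
    n = snprintf(out, out_size, "%s", path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Calling content_type_for on the output of resolve_path, rather than on the raw request path, means a request for a directory is labeled by the index.html it resolves to.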

When testing, you should be able to retrieve byte-for-byte copies of files from your server. Use wget or curl to fetch files and md5sum or diff to compare the fetched file with the original. I will do this when grading. For full credit, the files need to be exact replicas of the original.

Tips

  • Take compiler warnings seriously. Unless it’s an unused variable, you should address the warning as soon as you see it. Dealing with a pile of warnings just makes things more difficult later.
  • Test your code in small increments. It’s much easier to localize a bug when you’ve only changed a few lines.
  • If you need to copy a specific number of bytes from one buffer to another, and you’re not 100% sure that the data will be entirely text, use memcpy().
  • If you’re trying to do some sort of specific string or memory manipulation, feel free to ask if there’s a better/recommended way to do it rather than brute force. Often there may be a standard library function that will make things easier.
  • Read chapter 11 (especially 11.5 and 11.6) from the textbook to learn more about network programming in C.
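To illustrate the memcpy() tip above: string functions such as strcpy() and strlen() stop at the first zero byte, which binary files like images and PDFs routinely contain. The helper below is a hypothetical sketch (the name append_bytes and its parameters are not from the skeleton) of appending a known number of raw bytes to a buffer with a bounds check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends n raw bytes from src to dst.
 * cap is the total size of dst and *len is how much is already used
 * (the caller must keep *len <= cap).
 * Returns 0 on success, -1 if the bytes would not fit. */
int append_bytes(char *dst, size_t cap, size_t *len,
                 const void *src, size_t n)
{
    if (n > cap - *len)
        return -1;
    /* memcpy copies exactly n bytes, including any zero bytes. */
    memcpy(dst + *len, src, n);
    *len += n;
    return 0;
}
```

The length is tracked explicitly in *len because strlen() cannot be trusted on data that may contain zero bytes.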

Roughly, your server should follow this sequence:

  1. Read the arguments, bind to the specified port, and find your document root (you might find the chdir() system call helpful).
  2. Accept a connection, and hand it off to a new thread for concurrent processing.
  3. Receive and parse a request from the client.
  4. Look for the path that was requested, starting from your document root (the second argument to your program). One of three things should happen:
    1. If the path exists and it’s a file, formulate a response (with the Content-Type header set) and send it back to the client.
    2. If the path exists and it’s a directory that contains an index.html file, respond with that file.
    3. If the path does not exist, respond with a 404 code with a basic error page.
  5. Close the connection, and continue serving other clients.
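Steps 2, 3, and 5 of the sequence above might look roughly like the sketch below. Everything here is an assumption made for illustration: the names handle_client, hand_off, and serve_forever are hypothetical, the worker sends a fixed HTTP/1.0 404 reply as a placeholder where your real request parsing and file lookup would go, and the provided skeleton may already structure this differently.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical worker: owns one client connection from start to finish. */
void *handle_client(void *arg)
{
    int fd = *(int *)arg;
    free(arg);                         /* the heap copy was made just for us */

    char request[4096];
    ssize_t n = recv(fd, request, sizeof request - 1, 0);
    if (n > 0) {
        /* Placeholder reply; a real handler would parse the request here. */
        static const char reply[] =
            "HTTP/1.0 404 Not Found\r\n"
            "Content-Type: text/html\r\n"
            "\r\n"
            "<html><body><h1>404 Not Found</h1></body></html>\n";
        size_t sent = 0;
        while (sent < sizeof reply - 1) {
            ssize_t w = send(fd, reply + sent, sizeof reply - 1 - sent, 0);
            if (w <= 0)
                break;                 /* give up on this client */
            sent += (size_t)w;
        }
    }
    close(fd);
    return NULL;
}

/* Hands one accepted connection to a new detached thread.
 * Returns 0 on success, -1 on failure (the caller still owns client_fd). */
int hand_off(int client_fd)
{
    /* Each thread gets its own heap copy of the descriptor, so the
     * accept loop can reuse its local variable right away. */
    int *arg = malloc(sizeof *arg);
    if (arg == NULL)
        return -1;
    *arg = client_fd;

    pthread_t tid;
    if (pthread_create(&tid, NULL, handle_client, arg) != 0) {
        free(arg);
        return -1;
    }
    pthread_detach(tid);               /* nobody will join this thread */
    return 0;
}

/* Accept loop: the main thread only accepts and hands off. */
void serve_forever(int listen_fd)
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) {
            perror("accept");
            continue;
        }
        if (hand_off(client_fd) != 0)
            close(client_fd);
    }
}
```

Passing the address of a local client_fd variable to pthread_create() instead of a heap copy is a common bug: the accept loop may overwrite it before the new thread reads it.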

Reminders

Always, always, always check the return value of any system calls you make! This is especially important for send, recv, read, and write calls that tell you how many bytes were read or written.
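As one example of why the return value matters: send() may accept fewer bytes than you asked it to, so a single call is not guaranteed to transmit a whole file. A minimal sketch of a loop that keeps going until everything is sent follows; the name send_all is hypothetical, not a standard library function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends exactly len bytes from buf.
 * Returns 0 once everything has been sent, -1 on error. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        /* Partial send: advance past the bytes that went out. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The length is passed explicitly rather than computed with strlen(), so the same helper works for both headers and binary file contents.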

You will need to use the port number assigned to you for testing your program, since multiple processes cannot use the same port at once. You can find your personal port number here.
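If another process is already listening on the port you pass in, bind() fails with errno set to EADDRINUSE. The sketch below shows one way to validate the port argument and open a listening socket. The names parse_port and open_listener and the backlog of 16 are assumptions for illustration, and the provided skeleton may already contain equivalent setup code.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: parses a decimal port in the range 1..65535.
 * Returns the port, or -1 if the string is not a valid port. */
int parse_port(const char *s)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}

/* Hypothetical helper: returns a listening TCP socket bound to port
 * on all interfaces, or -1 on failure. */
int open_listener(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Lets you restart the server without waiting for old connections
     * on this port to finish timing out. */
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((uint16_t)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        perror("bind/listen");         /* e.g. "Address already in use" */
        close(fd);
        return -1;
    }
    return fd;
}
```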

If you have any questions about the homework requirements or specification, please post on Piazza.

Partners

You can work on this lab with a partner if you choose. If you decide to work with a partner, you and your partner should check out a single lab4 repository. The first partner will create a team name, and the second partner should choose that team name. Please be careful choosing a team, as this cannot be undone. Please name your team something that makes it clear who you are.

If you choose to work with a partner, you and your partner must complete the entire lab together. Dividing the lab up into pieces and having each partner complete a part of it on their own will be considered a violation of the honor code. Both you and your partner are expected to fully understand all of the code you submit.


C. Taylor