ANU College of Engineering and Computer Science
School of Computer Science
Computer Networks

A Rate-limiting HTTP Proxy and Filter

In this assignment, you will be writing an HTTP proxy in the C programming language. This assignment is an example of an application-layer proxy.

Application-Layer Proxies

An application-layer proxy is a program that sits between a client and its server and, generally, performs some function. To the client it appears as a server, and to the server it appears as a client. A well-known example of an application-layer proxy is the web cache, which appears to a web browser as a web server and, when accessed, fetches the content from the real server, appearing to it as a web browser. In this case, the web cache also caches recently accessed pages, so that they do not need to be fetched from the real server multiple times.

Application-layer proxies can also perform protocol translation between the real client and the real server. An example is a print server that "speaks" Internet Printing Protocol (ipp) to its clients, but then "speaks" the BSD Line Printer (lpr) protocol to the printers, or even uses a completely different network-layer protocol, speaking AppleTalk Printer Access Protocol (pap) to LaserWriters connected using the AppleTalk network layer.

HTTP Proxy

The Hypertext Transfer Protocol explicitly supports the use of intermediate proxies between a web client and a web server. For this assignment, the main function of the proxy is to perform rate-limiting, a way of mitigating buffer bloat. The proxy is also required to support both IPv4 and IPv6, to allow an IPv6-only web client to connect to IPv4-only web servers.

You will need to perform some research into the format of HTTP requests. One way to do this is to use Wireshark and/or tcpdump to analyse the packets between a client and an HTTP server during a web request.

To make the assignment more interesting, we will use a configuration file to allow different web sites to be rate-limited to different rates.
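As a starting point for the research into HTTP request formats mentioned above: a browser configured to use a proxy sends the full URL in the request line (the "absolute-form" of RFC 7230), for example `GET http://www.anu.edu.au/index.html HTTP/1.1`. The sketch below shows one possible way to pull the host and port out of such a line. The function name `parse_request_line` and its buffer-based interface are assumptions made for illustration, not part of the supplied skeleton, and the port is not checked for being numeric.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract host and port from a proxy-style request
 * line such as "GET http://www.anu.edu.au:8000/index.html HTTP/1.1".
 * Returns 0 on success, -1 if the line is not in absolute form or a
 * field does not fit in the supplied buffer. */
int parse_request_line(const char *line, char *host, size_t hostlen,
                       char *port, size_t portlen)
{
    assert(line && host && port && hostlen > 0 && portlen > 0);

    const char *sp = strchr(line, ' ');          /* end of the method */
    if (sp == NULL || strncmp(sp + 1, "http://", 7) != 0)
        return -1;
    const char *auth = sp + 1 + 7;               /* start of host[:port] */
    size_t authlen = strcspn(auth, "/ \r\n");    /* ends at path or space */

    const char *host_start = auth;
    size_t host_len;
    const char *rest;                            /* what follows the host */
    if (authlen > 0 && auth[0] == '[') {         /* IPv6 literal in brackets */
        const char *end = memchr(auth, ']', authlen);
        if (end == NULL)
            return -1;
        host_start = auth + 1;
        host_len = (size_t)(end - host_start);
        rest = end + 1;
    } else {
        const char *colon = memchr(auth, ':', authlen);
        host_len = colon ? (size_t)(colon - auth) : authlen;
        rest = auth + host_len;
    }
    if (host_len == 0 || host_len >= hostlen)
        return -1;
    memcpy(host, host_start, host_len);
    host[host_len] = '\0';

    size_t rest_len = authlen - (size_t)(rest - auth);
    if (rest_len == 0) {
        snprintf(port, portlen, "80");           /* default HTTP port */
    } else {
        if (rest[0] != ':' || rest_len == 1 || rest_len - 1 >= portlen)
            return -1;
        memcpy(port, rest + 1, rest_len - 1);
        port[rest_len - 1] = '\0';
    }
    return 0;
}
```

The host and port come back as strings so that they can be handed directly to getaddrinfo(3), which resolves both IPv4 and IPv6 addresses.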
Configuration file

The assignment code is required to read and parse a configuration file which will provide details such as which port(s) to listen on, how long to cache entries, and the details of the backend protocols to support. The name of the configuration file will be passed as a command-line parameter with the -f flag, e.g.:

    ./my_proxy -f proxy.conf

The format of the configuration file is:

    # anything following a hash on a line is a comment
    # blank lines are ignored.

    debug = 0           # how much debugging info, 0 is none, 1 is more, 2 is more still
                        # setting debug to other than 0 should imply no daemon mode
    proxy_port = 8080   # the TCP port to listen to for HTTP requests (default is 8080)

    [rates]             # the start of rates section
    www.google.com 10   # limit google to 10kbytes/sec
    www.anu.edu.au 20   # limit ANU to 20kbytes/sec
    edu.au 5            # limit all other .edu.au domains to 5kbytes/sec

An example parser will be available soon.

Marks and Implementation

This assignment should be attempted in stages, with marks awarded for each stage successfully achieved.

Stage 1

For the first stage, write an HTTP proxy that listens for web requests, works out the destination URL from the headers, connects to the web server named in the URL and passes on the URL to that web server. As part of the debugging process, you may choose to have the proxy print all the header fields on separate lines to standard output, or print the number of header fields in the request, etc., to get some feedback on how your proxy is working.

It is recommended that your HTTP proxy has a listening process which forks (see the fork(2) man page) a new child process for each incoming connection. The child process should then read the HTTP headers in the request, determine the URL and attempt to connect to the server in the URL. It should then simply echo bytes back and forth, using a loop with a select(2) call, until the connections are closed, at which time the child can terminate.
Successful implementation of Stage 1 will gain up to 10 out of the 20 marks for the assignment.

Stage 2

For stage 2, implement a generic rate-limiting process for each child process. You will need to research how an application can impose rate limits on TCP, but, in general, you simply need to count bytes and keep time, and only forward outbound data (writes) when the amount of data divided by the elapsed time is within the rate limit. You are only required to rate-limit data being returned from the web server to the web client.

Successful implementation of Stage 2 will gain up to an additional 5 out of the 20 marks for the assignment.

Stage 3

For stage 3, your proxy will need to read and parse the configuration file. It will then need to perform some string matching on the URLs to find an entry in the configuration file that matches some or all of the DNS name of the server in the request, and then apply the rate limit for the matched entry. If no entry matches, then use the default.

Successful implementation of Stage 3 will gain up to the remaining 5 out of 20 marks for the assignment.

Additional tasks

Only after the above stages are complete and tested can one or more of the additional tasks be attempted for bonus marks.

First additional task - DNS cache

With many connections being opened to a potentially vast array of remote servers, and potentially many connections being opened to the same server, it makes sense not to perform a DNS lookup for each new outgoing connection. The first extension task (compulsory for COMP6331 students) is to implement a simple DNS cache so that recurring lookups of the same DNS name can be answered directly. Depending upon how your code is written, this could be a separate thread or process, or a shared memory scheme with careful locking, etc. You can choose to implement the usual DNS Time To Live, or simply add a naive fixed TTL, configurable in the configuration file. An example might be a fixed TTL of 10 minutes.
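To make the fixed-TTL idea concrete, here is a small sketch of a cache that uses a linear search over a fixed-size table. Everything in it is an assumption made for illustration: the names (`dns_cache`, `dns_cache_get`, `dns_cache_put`), the table size, and the choice to pass the current time in as a parameter. It does no locking, so as written it only suits a single process; sharing it between forked children would need one of the schemes mentioned above.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declarations */
#include <assert.h>
#include <string.h>
#include <strings.h>              /* strcasecmp */
#include <sys/socket.h>
#include <time.h>

#define DNS_SLOTS    64           /* hypothetical table size */
#define DNS_NAME_MAX 256

struct dns_entry {
    char name[DNS_NAME_MAX];      /* empty string marks an unused slot */
    struct sockaddr_storage addr; /* holds an IPv4 or IPv6 address */
    socklen_t addrlen;
    time_t expires;               /* entry is valid while now < expires */
};

struct dns_cache {
    struct dns_entry slot[DNS_SLOTS];
    time_t ttl;                   /* fixed TTL in seconds, e.g. 600 */
};

void dns_cache_init(struct dns_cache *c, time_t ttl)
{
    memset(c, 0, sizeof *c);
    c->ttl = ttl;
}

/* Returns 1 and fills *out on a fresh hit, 0 on a miss or expired entry. */
int dns_cache_get(const struct dns_cache *c, const char *name, time_t now,
                  struct sockaddr_storage *out, socklen_t *outlen)
{
    for (int i = 0; i < DNS_SLOTS; i++) {
        const struct dns_entry *e = &c->slot[i];
        if (e->name[0] != '\0' && e->expires > now &&
            strcasecmp(e->name, name) == 0) {   /* DNS names ignore case */
            *out = e->addr;
            *outlen = e->addrlen;
            return 1;
        }
    }
    return 0;
}

/* Stores an address, reusing the slot for the same name if present,
 * otherwise overwriting the slot that expires soonest (unused slots
 * have expires == 0, so they are taken first). */
void dns_cache_put(struct dns_cache *c, const char *name,
                   const struct sockaddr *addr, socklen_t len, time_t now)
{
    assert(len <= sizeof(struct sockaddr_storage));
    if (name[0] == '\0' || strlen(name) >= DNS_NAME_MAX)
        return;                                 /* not cacheable */

    struct dns_entry *victim = &c->slot[0];
    for (int i = 0; i < DNS_SLOTS; i++) {
        struct dns_entry *e = &c->slot[i];
        if (strcasecmp(e->name, name) == 0) {
            victim = e;
            break;
        }
        if (e->expires < victim->expires)
            victim = e;
    }
    strcpy(victim->name, name);                 /* length checked above */
    memset(&victim->addr, 0, sizeof victim->addr);
    memcpy(&victim->addr, addr, len);
    victim->addrlen = len;
    victim->expires = now + c->ttl;
}
```

On a miss, the caller would resolve the name (for example with getaddrinfo(3)) and store the first result with `dns_cache_put`, passing `time(NULL)` as `now`.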
Think about whether to use a simple linear search, or some more advanced hashing or sorting based index. You can use available open-source libraries for implementing your DNS "database". Document your implementation and discuss its shortcomings as well as any strengths. You are not expected to provide a production-ready cache, but you should be able to discuss how your implementation would need to be developed to make it production-ready.

This additional task will not gain any additional marks for COMP6331 students, and up to 5 for COMP3310 students.

Second additional task - SSL encryption

The second task is to implement an SSL connection back to the client. This would normally be done by a "reverse-proxy" (located nearer to the server) so that the server does not have to implement SSL (which is CPU intensive). Document how your proxy will manage SSL keys. Include appropriate additional tokens in your configuration file to specify where the SSL keys are to be loaded from, etc.

This additional task is worth up to 3 additional marks (out of 20) for both COMP3310 and COMP6331 students.

Administrivia

This assignment is to be done in groups of no more than 2. The assignment is worth 20% of final assessment. Both members of each group will receive the same mark unless there is clear evidence of non-performance of one member of the group, in which case both members may need to explain in person their contribution to the assignment.

For COMP3310 students, the main part of the assignment is worth the full 20%. The bonus parts are worth additional marks, but the combined mark for both assignments and quizzes shall not exceed 40%.

For COMP6331 students, the main part of the assignment is worth 15% and the first additional task is worth 5% (of final assessment). The other additional tasks are worth the additional marks indicated, but cannot be attempted until the first additional part is implemented and tested.
Submission

This assignment is due by 5:00pm on Friday 25th May, 2010 (end of week 12).

Submission Details:

We will be using the Subversion (SVN) Source Code Management (SCM) system to develop this assignment. There will be one SVN repository set up for each group. Both members of each group are required to make "commits" to their respective SVN repository as evidence of their contribution to the assignment. I expect to see many versions (minimum 10) of each group's work by the submission date. Clearly list in a file "versions.txt" at the top of your SVN repository the version number of the final version for the main assignment and for each of the additional tasks undertaken.

E-mail me (bob@cs.anu.edu.au) with the student IDs of your group members when you know them, and I will create a repository for your group and send you the name, URL, etc.

In your repository there will be some skeleton files: webproxy.c and Makefile, as well as the parser code. You are to edit these files and add other files to your repository as required. You must include a README file (in plain text) with information about how your code works and how to use it.

By the assignment deadline, you should be able to check out your repository into a clean area, type "make" and expect a working version of the main assignment, called "webproxy", to be compiled.

Subversion is well-documented at: http://svnbook.red-bean.com/en/1.5/index.html.

Please direct all enquiries to: bob@cs.anu.edu.au
Page authorised by: Head of School, SoCS
The Australian National University, CRICOS Provider Number 00120C