CGI scripts

By Samadi van Koten on the 8th of March 2018

When I started building this site, I was thinking about how to handle the backend. Having started learning web development with languages and frameworks like Ruby and Rails, my thoughts turned first to that style of development. However, I have had bad experiences in the past with the monolithic, complex nature of such systems and have no desire to repeat that. So, I started thinking about other options.

CGI scripts were one of the first things that came to my mind. They're simple, can be written in any language, and are well supported by most web servers (though, as I found out later, not by NGINX). However, I had heard bad things about CGI scripts. "They're outdated and obsolete; stay away" seemed to be the general consensus about them. I don't like following advice without knowing the reason behind it though, so I decided to do some investigation.
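To make "simple" concrete: a CGI script is just a program that reads request details from environment variables and writes headers, a blank line, and a body to standard output. Here is a minimal sketch in Python. The function name cgi_response and the extra PATH_INFO line in the body are my own illustrative choices, not part of any particular setup.

```python
#!/usr/bin/env python3
import os
import sys


def cgi_response(environ):
    """Build a complete CGI response (headers, blank line, body) as one string."""
    # The web server passes request details such as PATH_INFO in the environment.
    path_info = environ.get("PATH_INFO", "")
    headers = "Content-Type: text/plain; charset=utf-8\n"
    body = "Hello, world!\n"
    if path_info:
        body += "PATH_INFO was: " + path_info + "\n"
    # A blank line separates the headers from the body.
    return headers + "\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ))
```

Keeping the response in a function that takes the environment as an argument means it can be exercised without a web server, for example by calling cgi_response({"PATH_INFO": "/x"}).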

There doesn't seem to be much information online about why CGI scripts are bad. I've searched quite a lot, but the only concrete objections I've been able to find are these two claims: that CGI scripts are slow, and that CGI scripts are insecure.

This blog post hopes to debunk both of those myths.

"CGI scripts are slow"

This is probably the most common complaint about CGI scripts. People claim that, because the web server forks a new process for every request (to run the script), they are slower than keeping a single process running permanently (which is what FastCGI and SCGI do). While this is true, it doesn't have as much of an effect as you might think. To show this, I've done some simple benchmarks.

All testing in this section is performed with wrk.

Here's the result of wrk http://localhost:8080/hello.txt with Lighttpd serving hello.txt with the contents Hello, world!\n:

Running 10s test @ http://localhost:8080/hello.txt
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   406.67us  180.32us   7.78ms   83.31%
    Req/Sec    12.26k   618.54    14.37k    71.29%
  246396 requests in 10.10s, 57.13MB read
Requests/sec:  24396.15
Transfer/sec:      5.66MB

So, static files are fast. Very fast. What about CGI scripts?

#!/bin/sh
# hello.cgi
# Served with Lighttpd
echo 'Content-Type: text/plain; charset=utf-8'
echo
echo 'Hello, world!'

Output of wrk http://localhost:8080/cgi-bin/hello.cgi:

Running 10s test @ http://localhost:8080/cgi-bin/hello.cgi
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.90ms    1.35ms  22.16ms   86.15%
    Req/Sec     1.30k    34.72     1.39k    70.50%
  25913 requests in 10.00s, 3.88MB read
Requests/sec:   2590.33
Transfer/sec:    397.44KB

Much slower, but how does it compare to other methods? Here's a Python script using Flask:

# hello.py
# Served with Gunicorn (gunicorn3 hello:app)
from flask import Flask
app = Flask(__name__)
@app.route('/')
def hello():
  return "Hello, world!"
if __name__=='__main__':
  app.run()

Result of wrk http://localhost:8000/:

Running 10s test @ http://localhost:8000/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.61ms  733.64us  15.89ms   93.46%
    Req/Sec   752.03     37.42   808.00     77.50%
  14981 requests in 10.01s, 2.47MB read
Requests/sec:   1497.30
Transfer/sec:    252.98KB

Whoa! The CGI script is faster than Flask? Wasn't CGI supposed to be slow? Well, maybe we're not giving it a fair shot. Let's try running Gunicorn with 2 threads and see if it improves:

Running 10s test @ http://localhost:8000/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.03ms    4.09ms  29.21ms   87.97%
    Req/Sec   806.79    580.16     1.46k    30.73%
  15599 requests in 10.04s, 2.57MB read
Requests/sec:   1553.14
Transfer/sec:    262.40KB

That doesn't seem to have helped much... Does this mean CGI scripts aren't actually slow? Well, maybe. It's important to remember that this is a very simple example. On a different system or with more complex scripts, the results could change completely. But I would say that it's incorrect to dismiss CGI scripts outright because of their supposed "slowness".
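The cost of starting a process can also be estimated on its own, separately from the HTTP benchmarks. The following is a rough sketch, assuming a POSIX system with /bin/sh. The helper name seconds_per_spawn and the default of 50 runs are arbitrary choices of mine. Python's own subprocess overhead is included in the figure, so treat the result as an upper bound rather than a precise measurement.

```python
import subprocess
import time


def seconds_per_spawn(command, runs=50):
    """Average wall-clock seconds to start `command` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, stdout=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # /bin/sh running the no-op builtin ":" stands in for a trivial script.
    cost = seconds_per_spawn(["/bin/sh", "-c", ":"])
    print(f"{cost * 1000:.2f} ms per process")
```

Comparing that number with the per-request latency wrk reports on the same machine gives a feel for how much of a CGI request is spent on process startup rather than on the script itself.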

"CGI scripts are insecure"

This is less about CGI itself and more about the scripts that it runs. Historically, many CGI scripts were written in Perl. Perl has some serious security pitfalls. One of them is the unusual syntax of the two-argument form of open.

Here's a very simple CGI script written in Perl:

#!/usr/bin/perl
# readfile.cgi
my $filename = "../$ENV{'PATH_INFO'}";

print "Content-Type: text/plain; charset=utf-8\n\n";

my $file;
open(file, $filename);
print "$_\n" while <file>;

Looks fine, right? It just reads the file specified in the PATH_INFO environment variable (which comes from the URL) and outputs it. Here's an example usage:

$ curl http://localhost:8080/cgi-bin/readfile.cgi/hello.txt
Hello, world!

Perfect. But what happens when I put weird stuff in the filename?

$ curl 'http://localhost:8080/cgi-bin/readfile.cgi/;date|'
Thu Mar  8 12:26:02 GMT 2018

Uh oh. We just managed to run a command via a CGI script. That's very, very bad. So what's going on here? Well, Perl's open function isn't just for opening files, like you'd expect. It can also read from or write to commands, which it does when you put a | at the end or beginning of the "filename" respectively. In this example, the script prepends ../ to the PATH_INFO value /;date|, so the filename becomes ..//;date| and Perl runs ..//;date in a shell. Because ..// isn't a command, that part fails (you'd see the error in your web server's error log), but the shell still runs date and its output is sent back as part of the response.

This is not by any means the only way to get caught out. For Perl specifically, there is a whole list here. Security issues can also be introduced into Bash scripts by not properly sanitizing user input or by not quoting variable expansions properly.
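The quoting point can be seen with a small experiment. This is a sketch, assuming a POSIX /bin/sh. The helper shell_args and the sample value are made up for illustration. It passes one user-supplied value to a tiny shell script as $1 and prints each argument the script ends up with, once with the expansion unquoted and once quoted.

```python
import subprocess


def shell_args(script, value):
    """Run a tiny sh script with `value` as $1 and return the lines it prints."""
    result = subprocess.run(
        ["/bin/sh", "-c", script, "sh", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


user_input = "a.txt; date"

# Unquoted: the shell splits the value on whitespace into two arguments.
unquoted = shell_args("printf '%s\\n' $1", user_input)

# Quoted: the value stays a single argument, exactly as supplied.
quoted = shell_args("printf '%s\\n' \"$1\"", user_input)

print(unquoted)
print(quoted)
```

The unquoted expansion turns one value into several arguments, which is how a crafted filename can smuggle extra arguments or options into a command. Quoting the expansion keeps it as a single argument.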

The problem with this argument against CGI scripts (and the reason it's much less commonly mentioned than the performance one) is that it's not an issue with CGI specifically. It is an issue with the scripts being run using CGI, and these issues will remain no matter what system you use for running the programs.
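To illustrate that the fix lives in the script rather than in CGI, here is a sketch of a more careful file reader in Python. The function name read_under and the choice to return None for rejected requests are my own assumptions, and it is not a drop-in replacement for readfile.cgi. In Perl, the usual remedy is the three-argument form of open, which treats the name as a literal path.

```python
import os


def read_under(base_dir, path_info):
    """Return the text of a file inside base_dir, or None if the request is not allowed."""
    base = os.path.realpath(base_dir)
    # PATH_INFO starts with "/", so strip it before joining onto the base directory.
    target = os.path.realpath(os.path.join(base, path_info.lstrip("/")))
    # Reject anything that resolves outside base_dir (for example via "..").
    if os.path.commonpath([base, target]) != base:
        return None
    if not os.path.isfile(target):
        return None
    # Python's open() only opens files; a trailing "|" is just part of the name.
    with open(target, encoding="utf-8") as handle:
        return handle.read()
```

With this version, a request for /;date| simply looks for a file with that odd name and finds nothing, and a request containing .. cannot escape the base directory.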

Conclusion

So, what's the result? Are CGI scripts really such a bad idea? I'd say it depends. In many cases CGI scripts are fine, but there are also a lot of situations where they wouldn't be appropriate.

If you're creating a very simple website with only a few dynamic pages, CGI scripts are a perfect fit. However, if every page is dynamic or you're expecting to get a very large amount of traffic, it might be better to consider another method such as FastCGI, SCGI or WSGI.

If you're using a programming language that supports an alternative method by default (like PHP), I would suggest using that over CGI for the speed increase, but if not (like me with my Bash scripts) CGI is a good option.

Thanks for reading! Any opinions or comments can be expressed to me either on Freenode IRC where I idle as vktec or by emailing samadi at this domain.