If you are going to put information on the internet, you need to consider security issues. Anyone in the world can read documents on the web, and some of those people are "bad guys": people who want to hurt you for no particular reason. One very obvious thing to do is not put too much information about yourself or about the biolinx computer on your web site. This is very basic: even Dear Abby says this in her advice column.
More relevant to bioinformatics is the problem of CGI scripts and the forms that run them. They are necessary, as there is simply too much information in a typical bioinformatics application to display on a static web page. However, CGI scripts are inevitably security holes: a pathway into the server that can be exploited by the bad guys. You need to write your CGI scripts in safe ways. Here are a few issues that need to be considered.
One source of problems is putting CGI scripts in the document path: that is, putting an executable script in any directory under /home/httpd/html instead of under /home/httpd/cgi-bin. Any file under /html can be retrieved as-is and viewed by anyone on the web. Try it out and see: put one of your programs into your HTML directory, then open a web browser and attempt to access it through http://biolinx.bios.niu.edu. Now go and remove all those executables from your HTML directory!
Allowing the bad guys to view your source code is a bad idea, because it gives them information about the machine: things like directory structure, Perl version, etc., and potentially leads to a program that can be exploited directly.
Do not assume that the values your script receives are the ones your form offered. When a form is submitted with the GET method, the parameters appear in the URL, for example:
http://biolinx.bios.niu.edu/cgi-bin/z012345/your_program.cgi?color=black
A modestly clever user could change this to:
http://biolinx.bios.niu.edu/cgi-bin/z012345/your_program.cgi?color=purple
and your carefully chosen options would be subverted. So, be aware of this issue and check the user's inputs as discussed below rather than blindly taking whatever comes in from the form.
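One way to guard against an edited URL is to compare the incoming value with the list of values the form actually offers. The following is only a sketch: the color list, the default, and the names %allowed_color and checked_color are made up for illustration, and in a real script the value would come from $q->param("color").

```perl
use strict;
use warnings;

# Hypothetical list of the colors the form offers.
my %allowed_color = map { $_ => 1 } qw(black white red blue);

# Return the color if it is on the list, otherwise fall back to a default.
sub checked_color {
    my ($color) = @_;
    return 'black' unless defined $color;
    return $allowed_color{$color} ? $color : 'black';
}

print checked_color('blue'),   "\n";   # an offered value is passed through
print checked_color('purple'), "\n";   # anything else becomes the default
```

Falling back to a default is one choice; you could just as well stop with an error message when the value is not on the list.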
A script that writes a file can be a problem. In the simplest case, do you care if the contents of that file are completely trashed by some random idiot? A more complex and dangerous case is that a bad guy might write a file that contains executable code that would cause you problems if you inadvertently executed it. As a good example, if "rm -rf *" gets executed by the shell, all programs in that directory and below will be deleted. Be sure that permissions for files to be written are set at 666, not 777: you don't want any file written by a CGI program to be executable!
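If the script creates the file itself, it can ask for non-executable permissions at creation time. The sketch below uses sysopen with mode 0666. The temporary directory, the file name results.txt, and the helper open_data_file are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC);
use File::Temp qw(tempdir);

# Create (or truncate) a data file, requesting mode 0666 so that no
# execute bit is set. The process umask may remove further bits.
sub open_data_file {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_TRUNC, 0666)
        or die "Couldn't open $path: $!\n";
    return $fh;
}

my $dir  = tempdir(CLEANUP => 1);   # throwaway directory for this demo
my $path = "$dir/results.txt";      # hypothetical output file

my $fh = open_data_file($path);
print {$fh} "rm -rf *\n";           # hostile text is stored as plain data
close $fh or die "Couldn't close $path: $!\n";

my $mode = (stat $path)[2] & 07777;
printf "permissions: %04o\n", $mode;
```

The permissions printed depend on the umask of the process, but the execute bits should never be set.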
Files that anyone can read (files whose last permission number is 4, e.g. 744) might give away information to the bad guys. Don't keep important information here. One particular source of problems here can be "encrypted" passwords. Encryption is a great thing, but we non-experts probably can't do it well enough to prevent an expert from cracking the encryption. For instance, if a bad guy starts guessing at passwords and encrypting each guess, he or she might find a match to the one you had in your readable file.
Running system commands from a CGI script is another risk. This means using the "system" or "exec" commands, or using "open" with a pipe ( | ) in it, or using backtick quotes ( ` ). Any of these can be a problem if the command contains a variable that the user can enter. As a simple example:
use CGI;
my $q = new CGI;
my $user_in = $q->param("user_input");
# DANGEROUS: runs whatever the user typed as a command
exec $user_in;
This would cause serious problems if the user entered "rm -rf *" in the user input text box of the form that invoked this program.
This doesn't mean you can't execute system commands, but it is necessary to do it carefully. The primary solution to this is to only allow the user to indirectly invoke the system. For instance, if you give the user a choice of files to open, check to see if the input is in fact one of those file names, then open it. Here is an example.
# Unsafe: opens whatever name the user supplied
use CGI;
my $q = new CGI;
my $in_file = $q->param("in_file");
open FILE, $in_file;
# Safer: the user's input only selects one of the file names we chose
use CGI;
my $q = new CGI;
my $in_file = $q->param("in_file");
my $file;
if ($in_file eq "at1g12345") {
    $file = "at1g12345.txt";
} elsif ($in_file eq "at1g67890") {
    $file = "at1g67890.txt";
} elsif ($in_file eq "at1g63475") {
    $file = "at1g63475.txt";
} else {
    die "Please choose one of the listed files";
}
open FILE, $file or die "Couldn't open file: $!\n";
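When the list of choices grows, the same idea can be written as a hash lookup instead of a chain of tests. This is a sketch, not the only way to do it: the hash %file_for and the helper file_for_choice are names invented here, and the user's value would normally come from $q->param("in_file").

```perl
use strict;
use warnings;

# Map each name offered on the form to the file it stands for.
my %file_for = (
    at1g12345 => 'at1g12345.txt',
    at1g67890 => 'at1g67890.txt',
    at1g63475 => 'at1g63475.txt',
);

# Hypothetical helper: return the file name for a form value,
# or undef if the value is not one of the listed choices.
sub file_for_choice {
    my ($choice) = @_;
    return undef unless defined $choice;
    return $file_for{$choice};
}

for my $input ('at1g67890', '../../etc/passwd') {
    my $file = file_for_choice($input);
    if (defined $file) {
        print "would open $file\n";
    } else {
        print "Please choose one of the listed files\n";
    }
}
```

The user's text never reaches "open" at all; it is only used as a key, so a name that is not in the hash simply finds nothing.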
User input that contains shell metacharacters is potentially dangerous. The shell metacharacters have special meanings for the shell, which makes them potentially useful for doing unpleasant things. These characters are:
&;`'\"|*?~<>^()[]{}$\n\r
A pretty extensive list! It is better to simply limit users to appropriate characters: letters, digits, spaces, underscores, commas, periods ( . ), and hyphens should cover nearly everything you will need. The standard way to do this in Perl is to use a substitution:
$user_input =~ s/[^-.\w, ]//g;
This will remove any character other than the ones listed in the square brackets. Recall that the initial "^" inside square brackets in a regular expression negates the character class, so the pattern matches any character NOT listed. Also recall that the "g" at the end means to do it globally, to remove all offending characters and not just the first one. If you like, you can convert any unacceptable character to an underscore:
$user_input =~ s/[^-.\w, ]/_/g;
There are, of course, some exceptions to this list of acceptable characters. It is up to you to check user input and to allow only acceptable characters to enter your program.
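To see what the two substitutions do, here they are applied to a made-up hostile string. The helper names strip_unsafe and mask_unsafe are invented for this sketch; the patterns are the ones shown above.

```perl
use strict;
use warnings;

# Remove every character that is not a hyphen, period, word character,
# comma, or space.
sub strip_unsafe {
    my ($text) = @_;
    $text =~ s/[^-.\w, ]//g;
    return $text;
}

# Same test, but replace each unacceptable character with an underscore.
sub mask_unsafe {
    my ($text) = @_;
    $text =~ s/[^-.\w, ]/_/g;
    return $text;
}

my $input = 'gene1; rm -rf *';            # made-up hostile input
print '[', strip_unsafe($input), "]\n";   # prints [gene1 rm -rf ]
print '[', mask_unsafe($input),  "]\n";   # prints [gene1_ rm -rf _]
```

Notice that the cleaned string still contains the words "rm -rf". Removing the metacharacters stops the shell from treating the input as a second command, but it does not make the text meaningful, so you still need to decide whether the result is something your program should accept.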
Perl has a system, called taint mode or taint checking, for checking CGI scripts for unsafe practices. It will not catch every problem, but it certainly is a big help, and you should use it on all your CGI scripts.
The basic syntax is quite simple: add "-T" to the line that invokes Perl:
#! /usr/bin/perl -wT
The effect of taint checking is to cause you a certain number of errors along the lines of "Insecure $ENV{PATH} at line XX" or "Insecure dependency", or some other message implying that the program is insecure or tainted. These typically appear because your CGI program is trying to affect something outside itself (opening a file for writing, or running "exec", "system" or "eval") and is passing raw user input to those commands. To make the Perl taint mode happy again, you need to change a few things.
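The "Insecure $ENV{PATH}" message is usually dealt with by setting the search path yourself near the top of the script, before running any external command. A minimal sketch follows; the two directories are an assumption, so use whichever directories your commands actually live in.

```perl
use strict;
use warnings;

# Replace the inherited search path with a short, fixed list of
# directories (assumed here; adjust for your own system).
$ENV{PATH} = '/bin:/usr/bin';

# Remove other environment variables that can change how a shell behaves.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

print "PATH is now $ENV{PATH}\n";
```

Because the new value comes from a constant in your program rather than from the outside world, Perl no longer considers it tainted.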
The major taint problem comes from accepting parameters (raw user input) from a form and putting them directly into a variable. Perl considers any such variable "tainted" and will not allow you to use it, or any other variable derived from it, with any command that affects the system outside your program.
To untaint a variable, you need to pass all user-entered parameters through a regular expression that matches a pattern and then extracts the desired substring. For example:
#! /usr/bin/perl -wT
# Fails under taint mode: the file name is raw user input
use CGI;
my $q = new CGI;
my $in_var = $q->param("file_name_from_user");
open FILE, ">$in_var" or die "Couldn't open file: $!\n";
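#! /usr/bin/perl -wT
# Accepted by taint mode: the file name is extracted with a pattern match
use CGI;
my $q = new CGI;
my $in_var = $q->param("file_name_from_user");
$in_var =~ /([-.,\w]+)/; # match only acceptable characters
my $perl_var = $1;
# reject the input unless the match covered the whole string
die "Unacceptable input!\n" unless defined $perl_var and $perl_var eq $in_var;
open FILE, ">$perl_var" or die "Couldn't open file: $!\n";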
An example of this is found at http://biolinx.bios.niu.edu/bios546/test5_form.htm, which runs a program in the BIOS546 cgi-bin called "test5.pl".
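The match-then-compare step can also be wrapped in a small function that anchors the pattern to the whole string, so no separate comparison is needed. This is a sketch under assumptions: untaint_filename is a name invented here, and the test strings are made up. It runs with or without the -T switch; under -T, the text captured in $1 is what Perl treats as untainted.

```perl
use strict;
use warnings;

# Hypothetical helper: return a clean copy of a plain file name,
# or undef if the name contains anything unexpected.
sub untaint_filename {
    my ($name) = @_;
    return undef unless defined $name;
    # \A and \z tie the pattern to the start and end of the string,
    # so the whole input must consist of acceptable characters.
    if ($name =~ /\A([-.,\w]+)\z/) {
        return $1;
    }
    return undef;
}

for my $input ('results_01.txt', 'results.txt; rm -rf *') {
    my $clean = untaint_filename($input);
    print defined $clean ? "accepted: $clean\n" : "rejected: $input\n";
}
```

Note that this pattern still accepts names such as "." and "..", since periods are on the acceptable list. Whether that matters depends on what your script does with the name, which is exactly the kind of decision taint mode leaves to you.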
It is important to notice that using pattern matching does not all by itself make your script safe. What is important is that you, the programmer, think about what characters and patterns of characters are acceptable or not. All taint mode actually does is force you to think about the issue. If you do something stupid and dangerous, it will be due to stupidity or willful behavior, and not due to ignorance.
And that is really the bottom line concerning CGI security. We need to use forms to retrieve data, but forms are fundamentally insecure. So, we try to decrease the risk by using "safer" procedures and avoiding risky behavior.