General Data Syntax
	Scalars vs. Slices
	Octal vs. Decimal

Confusing Perl, C, and sh
	Command-line arguments
	Program Name
	Control Structures

Storing Command Output

Idiomatic Perl
	Operating on Arrays
	Handling Nested Quotes
	Comparison Operators


Common Perl Pitfalls

General Data Syntax

Scalars vs. Slices

Let's say you want to access a value in an array or a hash. After skimming (a poor choice) Chapter 2 of Programming Perl (O'Reilly, 1997; a good choice), you see that arrays start with a @ and hashes start with a %. You have a snippet of code that looks something like this:

        @names = ("Jeff", "Jon", "Andrea", "Chuck");
        $dad = @names[3];

Stop right there! We have a problem that is caught when using Perl's -w switch: "Scalar value @names[3] better written as $names[3]..." Although the problem may not be as apparent in this case, the following code should clear it up considerably. If you are storing output from a command in a scalar variable, like so:

        $command_output[0] = `who`;

the output is stored as a string with newlines. If, however, you store it like this:

        @command_output[0] = `who`;

you only get the first line of the output. The reason is this: @array[...] denotes an array slice, whereas $array[$index] denotes a scalar value. The incorrect code can be shown more clearly like so:

        @command_output[0] = `who`;
        ($command_output[0]) = `who`;

The two examples are one and the same; @array[$index] is a list of one scalar, which is rarely what is intended. Likewise, to refer to many hash elements at once, use @hash{$key1,$key2}, and not %hash{$key1,$key2}. To refer to a single hash element, use a $ to indicate a scalar: $hash{$key1}. Again, this is all explained, in much more detail, in Chapter 2 of Programming Perl, and it should be on your system in perldoc perldata.
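
The distinction can be seen without running an external command. The following is a minimal sketch; the %color hash and the two-line @lines list are made up for illustration and stand in for real data and real command output.

```perl
use strict;
use warnings;

my @names = ("Jeff", "Jon", "Andrea", "Chuck");
my %color = (apple => "red", banana => "yellow", plum => "purple");  # made-up data

my $dad    = $names[3];                  # one element is a scalar: use $
my @middle = @names[1, 2];               # an array slice is a list: use @
my @some   = @color{"apple", "plum"};    # a hash slice is also a list: use @

# Assigning a list to a one-element slice keeps only the first item.
my @lines = ("first line\n", "second line\n");
my @output;
($output[0]) = @lines;                   # what @output[0] = @lines really means

print "$dad\n";        # Chuck
print "@middle\n";     # Jon Andrea
print "@some\n";       # red purple
print $output[0];      # first line
```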


A common mistake among inexperienced programmers is feeling compelled to stringify variables in all cases.

        print "$var";
        $foo = "$bar";
        function("$value");

Each of these forces the variable to be a string, even if it is a number or a reference. This causes big problems if the variable is a reference, because a string that holds the memory address of data cannot be used to retrieve the data; that is, you cannot de-reference a variable $foo containing the text SCALAR(0xb5a3c) by using $$foo. Therefore, if function() expects a reference passed to it, and $value is a reference, that last line of code above will break the subroutine.
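
A short sketch of the failure, assuming the code runs under "use strict" (the variable names are illustrative only). It shows that the quoted copy of a reference is ordinary text and can no longer be de-referenced:

```perl
use strict;
use warnings;

my $count  = 42;
my $ref    = \$count;     # a real reference to $count
my $string = "$ref";      # stringified: text that looks like SCALAR(0x...)

print "still a reference\n" if ref $ref;         # ref() returns "SCALAR"
print "just a string\n"     unless ref $string;  # ref() returns ""

print $$ref, "\n";        # 42: de-referencing the real reference works

# Under "strict refs", de-referencing the string copy dies at run time,
# so the attempt is wrapped in eval to let the demonstration continue.
my $ok = eval { my $value = $$string; 1 };
print "cannot de-reference the string copy\n" unless $ok;
```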

The magical auto-increment operator has a different effect on strings and numbers. The string magic applies only to strings made of letters followed by digits (such as "a9", which becomes "b0"); any other string is simply converted to a number first. Consider the following example; it shows what happens when a string that merely looks like a hexadecimal number gets incremented, and what happens when a number (even an octal or hexadecimal number) gets incremented.

        $val = "0x123456";      # oops, meant it to be a hex number
        print ++$val;           # prints 1: everything from the 'x' on is ignored

        $val = 0x123456;        # now THAT'S hexadecimal
        print ++$val;           # prints 1193047, the decimal representation
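
For comparison, here is a small sketch of the cases where the magic does apply, next to the two cases above. The sample strings are the ones used in perlop; the variable names are illustrative.

```perl
use strict;
use warnings;

# Strings of letters followed by digits are incremented "magically".
my @before = ("a0", "Az", "zz", "a9");
my @after  = map { my $s = $_; ++$s } @before;
print "@after\n";            # a1 Ba aaa b0

# "0x123456" has a digit before the letter, so the magic does not apply.
# It is treated as a number, and numeric parsing stops at the 'x'.
my $text = "0x123456";
{
    no warnings 'numeric';   # silence the "isn't numeric" warning
    ++$text;
}
print "$text\n";             # 1

my $number = 0x123456;       # a real hexadecimal literal
++$number;
print "$number\n";           # 1193047
```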

Octal vs. Decimal

Many of Perl's functions that make system calls (chmod, mkdir, umask) expect a file permission argument, and they all expect it in octal, which means the literal needs a leading "0". The problem is that many people do not enter the leading "0" when running chmod on their own system, so they do not expect to need one in Perl; even worse, some people are not aware of what the bits in "755" (or more properly, "0755") signify. I suggest you consult your system's man page on chmod(1) if you are one of these unlucky people.

Again, this is one of those situations where the -w switch will save you much frustration. The following code will produce a warning with -w on:

        chmod(664, $file);
        chmod: mode argument is missing initial 0

Look at the permissions of the file here before and after that code is run on it:

        -rw-rw-r--   1     190 Jan 30 15:44 1.txt
        --w--wx---   1     190 Jan 30 15:45 1.txt*

Obviously, we meant 0664, not 664. You can turn 664 into 0664 very easily: simply use the oct() function. These two lines do the same thing:

        chmod(0664, $file);
        chmod(oct(664), $file);

On a side note, the function oct() assumes its argument is octal (or hexadecimal, if it starts with a 0x) and returns the corresponding value. The function hex() assumes its argument is hexadecimal, and returns the corresponding value. Caveat: oct(0664) is not equal to 0664. The literal 0664 is already the number 436, so oct() reads the digits "436" as octal and returns 286. For more information, refer to perlfunc.
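
The conversions are easy to check by printing the decimal values. A minimal sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $literal  = 0664;           # an octal literal
my $from_str = oct("664");     # the string "664" read as octal
my $from_hex = oct("0x1F");    # a leading 0x makes oct() read hexadecimal
my $hex      = hex("1F");
my $twice    = oct(0664);      # 0664 is already 436, so this is oct("436")

print "$literal\n";            # 436
print "$from_str\n";           # 436
print "$from_hex\n";           # 31
print "$hex\n";                # 31
print "$twice\n";              # 286
printf "%04o\n", $literal;     # 0664: format a number back into octal
```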

There is also a module, available on the Comprehensive Perl Archive Network (CPAN), called File::chmod. It allows for symbolic file permissions as well as ls style permissions. Instead of extracting the permissions of a file and modifying them bitwise, or making a system call, you can append to its permissions or remove from them. The regular chmod function in Perl requires an absolute file permission; you cannot simply tell it to add the executable bit to a file. The File::chmod module overrides the regular chmod with its own, which can handle octal, symbolic, or ls style permissions.
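
For reference, this is roughly what "extracting the permissions of a file and modifying them bitwise" looks like with only the built-in functions. It is a sketch that assumes a Unix-like system; the scratch file is created with File::Temp purely for the demonstration, and it does not use File::chmod.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically when the program exits.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

chmod 0644, $file or die "can't chmod $file: $!";

# Extract the current permission bits, then add execute for everyone.
my $mode = (stat $file)[2] & 07777;
chmod $mode | 0111, $file or die "can't chmod $file: $!";

my $new_mode = (stat $file)[2] & 07777;
printf "%04o\n", $new_mode;    # 0755 on a Unix-like system
```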

Confusing Perl, C, and sh

A common mistake made by programmers of C and sh is that they don't partition their brains correctly, and some of their C or sh knowledge slips into the Perl part of their gray matter, and they confuse the languages. That, or they just assume Perl acts in a similar fashion. Perl has taken from many languages, yet it does have its differences from them.

Command-line arguments

In my experience, I have seen more people confuse C and Perl syntax for arguments than I have seen people confuse sh and Perl, but I shall address them both. Perl stores its command-line arguments (minus those parsed as command-line options using either Perl's -s switch or a module such as Getopt::Std) in the array @ARGV. This array is accessible by all packages; that is, in any package, @ARGV is the same as @main::ARGV. The first argument is index 0, the last is index $#ARGV: @ARGV[0..$#ARGV] is an array slice containing all the elements in @ARGV. In C, the arguments are stored in argv[], but the first element in the list is the name of the program. C uses argc to hold the number of elements in argv[], program name included, so argv[1] through argv[argc-1] hold the arguments to the program. In sh, arguments are stored in $1, $2, $3, ... which can cause problems when you get up to more than 9 arguments, but I'm not here to bash shell :). Perl uses those variables for storing matching strings in a regular expression.
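
A minimal sketch of the Perl side. @ARGV is filled in by hand here so the example runs without any command-line arguments; the argument values are made up.

```perl
use strict;
use warnings;

# Pretend the program was run as: program.pl alpha beta gamma
@ARGV = ("alpha", "beta", "gamma");

my $count = @ARGV;            # number of arguments (C's argc minus one)
my $first = $ARGV[0];         # first argument (C's argv[1], sh's $1)
my $last  = $ARGV[$#ARGV];    # last argument; $#ARGV is the last index

print "$count arguments: @ARGV\n";    # 3 arguments: alpha beta gamma
print "first=$first last=$last\n";    # first=alpha last=gamma
```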

Program Name

In perlvar (which should be installed on your system along with Perl, unless your negligent system administrator has been remiss in his duties), the variable $0 is listed as holding the name of the currently running Perl program; for those of you accustomed to using the English module, it's called $PROGRAM_NAME. The mnemonic for the variable is, oddly enough, "same as in sh or ksh." C, however, as mentioned above, uses the first element in the argv array, argv[0], to store the program name. Many times, in Perl programs, I've seen people using $ARGV[0] or $ARGV when they should have been using $0. $ARGV is a totally different variable dealing with the name of the current file when reading from <>; see perldoc perlvar for more information.
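
A small sketch of the difference between $0 and $ARGV[0]. The argument "input.txt" is a made-up value assigned by hand, and File::Basename is used only to trim the directory part of $0.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $program = basename($0);   # $0 is the name the program was invoked with
@ARGV = ("input.txt");        # hypothetical argument, set here for illustration

print "program name:   $program\n";
print "first argument: $ARGV[0]\n";   # input.txt, not the program name
```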

Control Structures

You can always tell if a person is stuck on C, because they'll ask a Perl programmer how to do a switch statement. There is information in perldoc perlsyn, and Tom Christiansen has written a response to the question. There are multiple ways of creating a switch-like control structure: a for loop, if-elsif-else statements, and more, but I often end up using a for loop.
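
One way the for-loop approach can look is sketched below. The classify() subroutine and its patterns are hypothetical; the point is that a one-item for loop aliases $_ to the value being tested, so each "case" is a short pattern match.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper. Each "case" returns as soon as
# its pattern matches; falling out of the loop is the default case.
sub classify {
    my ($input) = @_;
    for ($input) {
        /^\d+$/     and return "number";
        /^[a-z]+$/i and return "word";
        /^\s*$/     and return "blank";
    }
    return "other";
}

print classify($_), "\n" for "42", "perl", "", "a-b";
# prints: number, word, blank, other (one per line)
```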

Speaking of if-elsif-else statements, there are different syntaxes among the three languages here. Not to mention, in C one can leave braces off a one-line if statement, which the author finds ghastly wretched. In Perl, the statement is "elsif", in C, the statement is "else if" (two separate words), and in sh, the statement is "elif" (which is "file" spelled backwards).

Storing Command Output

Perl has a couple of ways of calling system commands, and these are often sources of confusion for inexperienced programmers. There are several different ways to capture command output, and each of them acts differently or returns data differently.

The system() function takes a list or a string and executes it, printing to STDOUT whatever is sent to the specified command's STDOUT. It does not return what it prints; it only prints it. It returns the exit status of the command: zero for success, non-zero for failure. This example code shows you how not to get the date from your system.

        $date = system("/usr/bin/date")
                or die "can't run /usr/bin/date: $!";

What that just did was assign 0 (hopefully) to $date – either 0 or whatever the return value of /usr/bin/date was – and then die because system returned 0. The more correct (or less wrong) way of getting the date from the system (if you really want to make a system call) is:

        chomp($date = `/usr/bin/date`) or die "can't run /usr/bin/date: $!";

The backticks cause the program to return the standard output (with newlines included) to a variable. The qx() operator is identical to backticks. Using backticks in scalar form is slightly different from using them in list form. In scalar form, multiline output is stored as a single string of text, with newlines at the end of each line. In list form, it returns a list of lines, sensitive to the $INPUT_RECORD_SEPARATOR, or $/, variable. List form is similar to:

        open DATE, "/usr/bin/date |" or die "can't run /usr/bin/date: $!";
        @data = <DATE>;
        close DATE;

Of course, you might just want to use localtime() in scalar context, except that it doesn't report the time zone you're in.
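
The sketch below pulls these points together. To stay independent of where date lives on a given system, it uses the running perl itself ($^X) as a stand-in for an external command that prints three lines; it assumes that path contains no spaces.

```perl
use strict;
use warnings;

# Stand-in command: the running perl, asked to print 1, 2 and 3.
my $command = qq{$^X -le "print for 1..3"};

my $text  = `$command`;    # scalar context: one string, newlines included
my @lines = `$command`;    # list context: one element per line
chomp @lines;

print scalar(@lines), " lines: @lines\n";    # 3 lines: 1 2 3

# system() returns the exit status, not the output. Zero means success,
# so test with "== 0" rather than "or die".
system($^X, "-e", "exit 0") == 0
    or die "command failed with status $?";

# localtime() in scalar context gives a readable date without a system call.
my $date = localtime();
print "$date\n";
```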

Finally, there's exec(). I see this used much too often, causing problems that inexperienced programmers don't expect. Server programs most frequently use this; it replaces the current program with what is passed to it. It will end your program, so the following code is rather silly. The only way the print statement would be called is if the exec failed, which is a bad thing.

        $date = exec "/usr/bin/date";
        print "Today's date is: $date";

Idiomatic Perl

Operating on Arrays

Another way to tell if someone's been programming in C and hasn't read up much on Perl is to look at how they do things to array elements. In C, you'll often see code that looks something like:

        for (int i = 0; i < sizeof(array); i++){
                char c = array[i];
                // et cetera
        }

And then they bring that over to Perl, and you get something like:

        $size = @array;
        for ($i = 0; $i < $size; $i++){
                $element = $array[$i]; # or even worse, @array[$i]
                # et cetera
        }

Perl has a very nice way of iterating over lists. You can use for or foreach, which happen to be the same thing. It allows you to shrink your code (and the number of variables you use) amazingly:

        for (@array){ ... } # or
        foreach (@array){ ... } # or
        for $element (@array){ ... } # or
        foreach $element (@array){ ... }

You see? It's that easy.
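
One detail worth knowing: the loop variable is an alias for each element, not a copy. A short sketch with made-up data:

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);    # hypothetical data

# Assigning to the loop variable changes the array itself.
for my $price (@prices) {
    $price *= 2;
}
print "@prices\n";            # 20 40 60

# When the index is genuinely needed, iterate over the index range.
for my $i (0 .. $#prices) {
    print "$i: $prices[$i]\n";
}
```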

Handling Nested Quotes

"He screamed, 'It's him! He told me, "I'll kill Sara."'"

What a troublesome thing to have to store in a variable, eh? Here's a shoddy attempt at storing that phrase in a variable:

        $line = "\"He screamed, 'It's him!  He told me, \"I'll kill Sara.\"'\"";

Now, if that isn't hideous, I'm not sure what is. Let's use Perl's qq() operator to make things nice.

        $line = qq("He screamed, 'It's him!  He told me, "I'll kill $name."'");

The qq() operator works like double quotes, only you can use any non-alphanumeric delimiter you want. Like double quotes, it interpolates variables and escape sequences. The q() operator acts like single quotes. The qx() operator, described previously, is the same as using backticks. The qw() operator allows for speedy creation of lists. These two lines of code are equivalent:

        @list = qw(jonathan jeffrey jennifer andrea);
        @list = split ' ', q(jonathan jeffrey jennifer andrea);

Notice that it splits on ' ', which is a magical string in split() that splits on as much whitespace as possible, and removes leading and trailing whitespace. Also, qw() does not interpolate variables and escape sequences.
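
The operators side by side, as a minimal sketch (the strings and variable names are illustrative):

```perl
use strict;
use warnings;

my $name = "Sara";

my $single = q(He said, "I'll call $name.");     # like single quotes
my $double = qq(He said, "I'll call $name.");    # like double quotes
my $braces = qq{Delimiters (like these) can be chosen to avoid clashes.};
my @words  = qw(jonathan jeffrey jennifer andrea);

print "$single\n";             # He said, "I'll call $name."
print "$double\n";             # He said, "I'll call Sara."
print "$braces\n";
print scalar(@words), "\n";    # 4
```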

Please note a "quirk" about the qw() operator, shown to me not too long ago on #perl. It does not imply parentheses around itself, causing an unexpected error message in the following:

        $word = qw( this that the other thing)[$i];
        Can't use subscript on split at - line 1, near "2]"

        $word = (qw( this that the other thing))[$i];  # properly done

Comparison Operators

It's a shame when novice programmers ruin flat files or databases by doing the following erroneous "comparison":

        open FILE, "file" or die "can't open file: $!";
        open OUT, ">file.out" or die "can't create file.out: $!";
        while (<FILE>){
                print OUT unless $_ == $dont_print_this_line;
                # or even unless $_ = $dont_print_this_line;
        }
        close FILE;
        close OUT;
        rename "file.out" => "file" or die "can't mv file.out to file: $!";

Oh dear. Both of those tests will most likely screw up that file of yours. The problem is this: the == operator is for numeric values only, whereas the eq operator is for variables to be treated as strings.

Many errors in conditional statements arise from programmers using = when they mean ==. The difference: if ($a = getword()){ ... } means "if the return value of getword(), stored in $a, is true (non-zero), then..."; if ($a == getword()){ ... } means "if $a is the same value as getword(), then...".

But chances are, that's not what you really meant. The function getword() probably returns a word, not a numeric value. A string in numeric context usually evaluates to 0. Thus, that == comparison will be true whenever $a is also 0 in numeric context, which includes almost any other word. If, instead, $a is a word, and you want to test if $a is the same as the return value of getword(), use the eq operator: if ($a eq getword()){ ... }. The equivalent string operators for numeric operators are:

	==	eq		!=	ne		>	gt
	<	lt		>=	ge		<=	le
	<=>	cmp

Comparisons are done ASCIIbetically, meaning "This" comes before "this", and "hello" comes after "goodbye".
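
A short sketch of the numeric and string operators in action, using made-up values. The "numeric" warning is switched off around the deliberate mistake so the example runs quietly.

```perl
use strict;
use warnings;

my @words = sort { $a cmp $b } ("this", "hello", "This", "goodbye");
print "@words\n";              # This goodbye hello this

my @numbers    = (10, 9, 100);
my @as_strings = sort { $a cmp $b } @numbers;
my @as_numbers = sort { $a <=> $b } @numbers;
print "@as_strings\n";         # 10 100 9
print "@as_numbers\n";         # 9 10 100

# Two different words are numerically "equal", because both are 0 in
# numeric context.
my ($x, $y) = ("apple", "orange");
my $numeric_equal;
{
    no warnings 'numeric';
    $numeric_equal = ($x == $y);
}
print "== says equal\n"     if $numeric_equal;
print "eq says different\n" unless $x eq $y;
```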