Chapter 4. Working with Data

WHAT YOU WILL LEARN IN THIS CHAPTER

  • Working with string, numeric, bitwise and boolean data

  • Understanding Perl’s precedence and associativity

  • Understanding keywords that affect scoping

Perl is a dynamic language. While there’s a lot of debate about the merit of dynamic languages, looking at the success of Perl, PHP, Python and Ruby clearly shows that they’re popular. The popularity is due in large part to how easy it is to focus on the problem at hand and just Get Stuff Done. This chapter will show you much of the basic data manipulation available in Perl to help you Get Stuff Done.

This chapter is, quite frankly, boring. It will serve more as a reference chapter that you can conveniently flip back to when you want to understand how to manipulate data in a particular way. If you like, you can think of it as an appendix we slipped into the front of the book. Note that the builtins described here are not an exhaustive list. They’re the ones you’re most likely to encounter in your daily work.

Builtins

For many languages there is a strong distinction between operators and functions. This distinction is less clear in Perl. In fact, some things that look like functions are sometimes referred to as named unary operators (see perldoc perlop). To sidestep the inherent ambiguity, many Perl developers refer to operators and functions as “built-ins” (sometimes spelled “builtins”, as we do here). We will often use these terms interchangeably.

Note

Subroutines and functions are considered distinct in some languages. If you refer to a function as a subroutine, invariably some AD&D rules lawyer turned programmer will come along and imperiously state “No, no. That’s a function”, even if it has no bearing on the discussion at hand. Because Perl is designed to be a working language, we don’t get bogged down in terminology. That’s why sometimes you might see my described as a function (as it is in perldoc perlfunc), even though it’s not the same thing. The print() function is sometimes described as a named unary operator when it’s used with parentheses. Don’t be a rules lawyer and get bogged down in terminology.

Because Perl’s type system focuses more on how you organize your data than on what kind of data you have, you’ll find that many string, numeric, bitwise and boolean operators will work on just about any kind of data you have. Most of the time this “just works”, but you still have a responsibility as a programmer to understand what type of data you have.

Also note that the parentheses are optional on most builtins. Your author tends to omit parentheses because he views them as visual clutter, but others prefer to be explicit. Just choose the style you prefer and stick with it for consistency. We’ll skip back and forth to get you used to each. However, when we mention a function name in the body of the text, we’ll usually include the parentheses to avoid confusion.

Also, note that many of these functions and operators are prefix, infix or postfix.

  • prefix: placed before their operand (!$var)

  • infix: placed between two operands ($var + $var)

  • postfix: placed after their operand ($var++)

Sometimes an operator’s meaning may change slightly if you use it as a prefix operator instead of an infix operator. We’ll describe these conditions as they arise. They’re actually very natural to use this way.

Scalar

In Chapter 2, you learned that a scalar is a variable that contains a single value. Perl really doesn’t care what kind of data you have in that value, so stuff anything in there that you need:

my $answer     = 'forty two';
my $num_answer = 42;

Clearly 'forty two' is a string and 42 is an integer, but Perl expects you (mostly) to handle them with care and not mix them up. If you try to use 42 as a string, Perl will treat it as a string comprised of the characters '4' and '2'. If you try to treat 'forty two' as a number, Perl will treat it as the number 0 and, if you have warnings enabled, Perl will usually complain loudly when you try to use it as a number.
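
To make that concrete, here is a minimal sketch of the same value being used as a string and as a number. The variable names $as_string and $as_number are ours, chosen only for illustration.

```perl
use strict;
use warnings;

my $num_answer = 42;

# With the concatenation operator, 42 is treated as the string '42'.
my $as_string = $num_answer . ' is the answer';

# With an arithmetic operator, the string '42' is treated as the number 42.
my $as_number = '42' + 1;

print "$as_string\n";   # 42 is the answer
print "$as_number\n";   # 43
```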

We’ll start with many of the string builtins first, listed mostly in alphabetical order with “operators” coming after. You’ll notice that many of these functions automatically operate on the $_ variable if no variable is specified. In Chapter 5, when we cover control flow, you’ll see many operations that set the $_ variable if no variable is declared. This may sound strange, but it will be more clear when you see examples. You’ll also see this in the map() and grep() functions which we introduce in this chapter.

Builtins will be introduced with a snippet of “grammar” that shows more or less how to use it. These will deliberately not always match what you see in perlfunc. This is to avoid obscure cases (as with the my() builtin) or to just make them a bit easier to read and see common usage.

Note

Remember that for all of the builtins that are “words” (print(), chomp(), and so on), you can read more about them with perldoc -f builtin.

perldoc -f chomp
perldoc -f ucfirst

For the “operator-like” builtins such as +, ==, << and so on, you’ll just have to read the gory details in perldoc perlop.

String

In Perl, just about anything can be coerced into a string merely by treating it as a string. We’ll list various functions and their usage in alphabetical order.

chop() and chomp()
chop (defaults to $_)
chop VARIABLE
chop( LIST )
chomp (defaults to $_)
chomp VARIABLE
chomp( LIST )

The chop() builtin removes the last character from a string and returns it.

my $name = 'Ovid';
my $last = chop $name;

$last is now set to 'd' and $name is 'Ovi'. The chop() function was primarily used to remove the newline from strings, but for that we now use the chomp() function.

chomp() removes newlines from the end of strings. It’s particularly useful when you’re reading lines from a file and want to remove the newline from each record.

Note

Actually, chomp() removes whatever is stored in the $/ variable, also known as the “input record separator”. Most of the time, $/ is equal to a newline, but sometimes people will set it to a different value when they want to change how to read records from a file. We’ll cover this more in Chapter 9, Files and Directories. Read perldoc perlvar and look for $INPUT_RECORD_SEPARATOR if you can’t wait.

You can also use both chop() and chomp() with lists and hashes, but this usage is uncommon in production code. For lists (and arrays), both chop() and chomp() work their magic on each individual element, but for hashes they only affect the values of the hash and not its keys.

Both chop() and chomp() modify the variable directly. However, chop() returns whatever character was removed from the string and chomp() returns the number of characters removed, which is conveniently true or false depending on whether or not anything was removed. As a general rule, we recommend that you not use chop().
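
Here is a short sketch of what chomp() hands back, assuming $/ still holds its default value of a newline. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = "first record\n";

my $removed       = chomp($line);   # 1: one newline was removed
my $removed_again = chomp($line);   # 0: nothing left to remove

# With an array, chomp() works on every element.
my @lines = ( "one\n", "two\n", "three" );
my $total = chomp(@lines);          # 2: only two elements ended in a newline

print "$line|$removed|$removed_again|$total\n";   # first record|1|0|2
```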

chr() and ord()
chr (defaults to $_)
chr NUMBER
ord (defaults to $_)
ord  STRING

chr() accepts a number and returns the character associated with that number. For example, the following will assign the string "Ovid" to the variable $name. Note that the dot operator (.) is used in Perl for string concatenation.

my $name = chr(79).chr(118).chr(105).chr(100);

If the number is greater than 255, chr() will return the corresponding Unicode character.

The ord() function does the reverse: it returns the numeric value of the first character in the string passed to it.

my @values = ( ord('O'), ord('v'), ord('i'), ord('d') );

@values will now contain (79, 118, 105, 100).

Even though the characters represented by the values 128 through 255 are not ASCII, Perl’s chr() function will not return Unicode values for them to maintain backwards compatibility.
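
Because chr() and ord() are inverses, you can take a string apart and put it back together again. This sketch borrows split(), join() and map(), none of which we have covered yet, so treat it as a preview rather than something you need to understand fully right now.

```perl
use strict;
use warnings;

my $name = 'Ovid';

# Split the string into characters and look up the number for each.
my @codes = map { ord($_) } split //, $name;

# Convert the numbers back into characters and rejoin them.
my $round_trip = join '', map { chr($_) } @codes;

print "@codes\n";        # 79 118 105 100
print "$round_trip\n";   # Ovid
```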

index() and rindex()
index STR,SUBSTR,POSITION
index STR,SUBSTR
rindex STR,SUBSTR,POSITION
rindex STR,SUBSTR

Given a string, index() lets you find the first occurrence of a substring within it, with indexing starting at 0. If the substring is not found, it returns -1. You can also supply a starting position from which to search. The rindex() function is identical to the index() function, but it finds the last occurrence of the string.

So when the word “miminypiminy” springs to your lips as the perfect description of something (it means “delicate, mincing, or dainty”, but you knew that), you naturally wonder where the substring “iminy” may be found within said word.

#               012345678901
my $word     = 'miminypiminy';
my $first    = index  $word, 'iminy';
my $second   = index  $word, 'iminy', $first + 1;
my $last     = rindex $word, 'iminy';
my $not_last = rindex $word, 'iminy', $last - 1;
print "First:    $first\n";
print "Second:   $second\n";
print "Last:     $last\n";
print "Not last: $not_last\n";

And that prints out:

First:    1
Second:   7
Last:     7
Not last: 1

Now you can tell your friends you’re an expert in miminypiminy, but don’t be surprised when they laugh.
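
If you want every occurrence rather than just the first or last, one common approach is to keep calling index() with a new starting position until it returns -1. This is only a sketch: it uses a while loop, which we don’t cover until Chapter 5, and the names @positions and $position are ours.

```perl
use strict;
use warnings;

my $word      = 'miminypiminy';
my $substring = 'iminy';

my @positions;
my $position = index $word, $substring;
while ( $position != -1 ) {
    push @positions, $position;

    # Resume the search one character past the previous match.
    $position = index $word, $substring, $position + 1;
}

print "Found at: @positions\n";   # Found at: 1 7
```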

lc(), lcfirst(), uc() and ucfirst()
lc (defaults to $_)
lc      EXPR
lcfirst (defaults to $_)
lcfirst EXPR
uc (defaults to $_)
uc      EXPR
ucfirst (defaults to $_)
ucfirst EXPR

These handy little functions are part of the very useful suite of tools Perl provides for manipulating data. The lc() function will force an entire string to lower-case. The uc() function will force the string to upper-case. The lcfirst() and ucfirst() functions will do the same thing, but only on the first character. Naturally you can combine them. Here’s one way to print “Perl”, for example:

print ucfirst lc 'PERL';

All of these functions respect locale settings when use locale is in effect.
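
A typical everyday use of these functions is normalizing input before comparing or displaying it. The following sketch assumes two hypothetical variables, $stored and $entered, that hold the same name typed with different capitalization.

```perl
use strict;
use warnings;

my $stored  = 'Ovid';
my $entered = 'oVID';

# Lower-casing both sides makes the comparison case-insensitive.
my $same_name = lc($stored) eq lc($entered);

# ucfirst lc tidies up inconsistently capitalized input.
my $tidied = ucfirst lc $entered;

print "Match\n" if $same_name;
print "$tidied\n";   # Ovid
```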

length()
length (defaults to $_)
length EXPR

The length() function returns the number of characters in a string. Note that due to Unicode, this is not necessarily the same as the number of bytes. So this prints 6, as you would expect:

print length('danger');

But the following prints 9 when trying to figure out the length of "Japan" when written in Japanese:

print length('日本国');

That’s because each of those characters is comprised of 3 octets (bytes, but see the Unicode section in Chapter 9, Files and Directories) and Perl doesn’t know that you have Unicode in your source code. To handle it correctly, use the utf8 pragma. The following will correctly print 3:

use utf8;
print length('日本国');

Many people mistakenly use the length() function to try to determine the length of an array or hash. Use scalar(@array) or scalar(keys(%hash)) for this, not the length() function. That’s not what it’s for.
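
Here is a small sketch showing the right tool for each job. The %age_of hash and its values are made up for the example.

```perl
use strict;
use warnings;

my @musketeers = qw( Aramis Athos Portos );
my %age_of     = ( Aramis => 30, Athos => 35 );

my $array_count = scalar @musketeers;      # number of elements
my $hash_count  = scalar keys %age_of;     # number of keys
my $name_length = length $musketeers[0];   # characters in 'Aramis'

print "$array_count $hash_count $name_length\n";   # 3 2 6
```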

pack() and unpack()
pack   TEMPLATE, LIST
unpack TEMPLATE, VARIABLE
unpack TEMPLATE

The pack() and unpack() builtins are two functions that nobody remembers or understands, even though conceptually they’re simple.

The pack() function accepts a template and a list of values, “packing” that list of values into a single value according to the template. The unpack() function does the reverse. It takes the same template and “unpacks” a scalar value into a list of values. Unlike pack(), unpack() defaults to the $_ variable.

You’ll want to read perldoc -f pack and perldoc -f unpack to understand the templates. We won’t cover them much in this book as they’re not terribly common in production code, but here’s a quick example of reading fixed-length data very quickly. We use dots in the comment to show you where each field in the record ends.

#                    .       .    .  .
my $record = '20080417john    39552027';
my ( $hired, $user, $emp_number, $dept ) = unpack 'A8A8A5A3', $record;
print "Hired: $hired\nUser:  $user\nEmp#:  $emp_number\nDept:  $dept\n";

And that will print out:

Hired: 20080417
User:  john
Emp#:  39552
Dept:  027

And that’s pretty much the last you’ll see of these two functions in this book. Just be aware they exist.
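
For completeness, here is a sketch of the opposite direction: building the same fixed-length record with pack() and then reading it back. It reuses the 'A8A8A5A3' template from the example above; the @fields array is our own name.

```perl
use strict;
use warnings;

my $template = 'A8A8A5A3';

# The A format pads each field with spaces to the given width.
my $record = pack $template, '20080417', 'john', '39552', '027';
print "<$record>\n";   # <20080417john    39552027>

# Unpacking with the same template recovers the fields,
# and the A format strips the trailing padding again.
my @fields = unpack $template, $record;
print join( '|', @fields ), "\n";   # 20080417|john|39552|027
```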

Note

If you want to know more about pack() and unpack(), see perldoc perlpacktut.

print()
print (defaults to $_)
print FILEHANDLE LIST
print LIST

We’ve used print() quite a bit and you’ve seen examples in Chapter 3, Variables, but it’s worth covering a few things here. First, print() takes a list. A scalar variable can be thought of as a list with one element, which is why print($name) works.

my $customer = 'Alex';
print "Customer: $customer\n";

This raises the obvious question of where this is printing to and that’s where filehandles come in.

The optional FILEHANDLE argument is something we will cover more in Chapter 9, Files and Directories when we cover files, but for now be aware that a filehandle is usually (not always) one of three things:

  • A “handle” to an actual file

  • STDOUT: the default place where a program writes normal output

  • STDERR: the default place where a program writes error output

If you don’t specify a filehandle, print() defaults to printing to STDOUT. The following two print() statements are identical:

print $name;
print STDOUT $name;

Warning

Notice that there is no comma after the filehandle argument. If there were, Perl would assume that the filehandle is one of the list arguments you’re trying to print:

print STDOUT, $name; # probably not what you wanted

That will print something like:

No comma allowed after filehandle at myprogram.pl line 1.

However, a filehandle can be stored in a scalar and then Perl can’t determine what you mean:

use strict;
use warnings;
my $name = 'foo';
open my $fh, '>', 'somefile.txt'
  or die "Can't open somefile.txt for writing: $!";
print $fh, $name;

In the above example, Perl will try to print the filehandle and $name to STDOUT instead of what you probably wanted:

GLOB(0x100802eb8)foo

Again, filehandles will be covered in more depth in Chapter 9, Files and Directories.

STDOUT, short for “standard output”, generally goes to your terminal, but there are ways of redirecting it to files, sockets, or other places. We won’t be covering this in the book. Just remember that generally STDOUT is the “normal” printed stuff you see.

STDERR, short for “standard error”, also tends to be seen on your terminal, but can also be redirected to other locations. Error handling functions like die() and warn() will direct their output to STDERR. We’ll touch on error handling more in Chapter 7, Subroutines when we deal with subroutines. For now, just be aware that when you’re running a Perl program from the terminal, you’ll usually see both STDOUT and STDERR output written there.
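
To tie these pieces together, here is a sketch that prints to a filehandle stored in a scalar, to STDOUT and to STDERR. So that it doesn’t need to create a real file, it opens the filehandle on a reference to a scalar ($buffer, our own name), which gives you an “in-memory” file. Wrapping the filehandle in braces is optional, but some developers prefer it because it makes the missing comma look deliberate.

```perl
use strict;
use warnings;

my $name = 'foo';

# Opening a reference to a scalar gives an in-memory filehandle.
my $buffer = '';
open my $fh, '>', \$buffer
  or die "Can't open in-memory file for writing: $!";

print $fh $name;     # no comma: $fh is the filehandle
print {$fh} "\n";    # braces make the filehandle explicit
close $fh or die "Can't close in-memory file: $!";

print STDOUT "Captured: $buffer";
print STDERR "This line goes to standard error\n";
```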

sprintf() and printf()
sprintf FORMAT, LIST
printf  FILEHANDLE FORMAT, LIST
printf  FORMAT, LIST

The sprintf() and printf() functions format data according to the printf() function of the underlying C libraries. They are extremely useful when reporting. The sprintf() function will format and return the string while printf() will format and print the string. Any “extra” values in the list are ignored.

my @musketeers = qw( Aramis Athos Portos );
printf "%s,%s\n", @musketeers; # prints "Aramis,Athos"
my $two_musketeers = sprintf "%s,%s", @musketeers;
# $two_musketeers is now "Aramis,Athos"

Table 4.1. Common printf() formats

Format  Meaning
%%      A percent sign
%c      A character
%s      A string
%d      Signed integer, in decimal
%u      Unsigned integer, in decimal
%o      Unsigned integer, in octal
%x      Unsigned integer, in hexadecimal
%e      Floating-point number, in scientific notation
%f      Floating-point number, in fixed decimal notation
%g      Floating-point number, in %e or %f notation


In addition to the common formats, Perl also supports several commonly accepted formats that are not part of the standard list of printf() formats (see Table 4-2).

Table 4.2. Perl-specific printf() formats

Format  Meaning
%X      Like %x, but using upper-case letters
%E      Like %e, but using an upper-case “E”
%G      Like %g, but with an upper-case “E” (if applicable)
%b      An unsigned integer, in binary
%p      A pointer (outputs the Perl value’s address in hexadecimal)
%n      Special: stores the number of characters output so far into the next variable in the parameter list


When using sprintf() formats, you have a percent sign and a format letter. However, you can control the output by inserting attributes, also known as flags, between them. For example, inserting an integer controls the default minimum width:

my $formatted = sprintf "%20s", 'some name';
print "<$formatted>\n";

This code will print <           some name> because the %20s format right-justifies the string in a field at least 20 characters wide, padding it on the left with spaces. That’s equivalent to:

printf "<%20s>\n", 'some name';

To left-justify the string, insert a - (hyphen) after the leading % symbol:

my $formatted = sprintf "%-20s", 'some name';
print "<$formatted>\n";
# <some name           >

Conversely, if you wish to enforce a maximum width, use a dot followed by a number:

printf "%.7s", 'some name';

That prints some na. You can also combine them, if you wish:

printf "%5.10s", $some_string;

That ensures that you will print a minimum of 5 characters (padding with spaces, if needed), and a maximum of 10. To force every string to be the same length - useful for reporting - set the minimum and maximum to the same value:

printf "%10.10s", $some_string;
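
Here is a sketch of how that looks in a simple report. The list of names is invented for the example, and we add a - flag so the column is left-justified, which is usually what you want for text.

```perl
use strict;
use warnings;

my @names = ( 'Aramis', 'Athos', 'A very long musketeer name' );

my @columns;
for my $name (@names) {

    # Left-justify, pad to at least 10 characters, truncate to at most 10.
    push @columns, sprintf '%-10.10s', $name;
}

print "<$_>\n" for @columns;
# <Aramis    >
# <Athos     >
# <A very lon>
```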

You can also use the printf() formats to control numeric output, but we’ll cover that a bit later in the chapter when we cover numeric builtins.

Table 4-3 lists some of the common flags used with printf() formats.

Table 4.3. Common printf() flags

Flag    Meaning
Space   Prefix non-negative number with a space
+       Prefix non-negative number with a plus sign
-       Left-justify within the field
0       Use zeros, not spaces, to right-justify
#       Ensure the leading zero for any octal, prefix non-zero hexadecimal with 0x or 0X, prefix non-zero binary with 0b or 0B


Note

See perldoc -f sprintf for a full description of the format options.
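
As a quick illustration of the flags, here is a sketch that applies each one to a number. The angle brackets are only there to make the padding visible, and the @examples array is our own name.

```perl
use strict;
use warnings;

my @examples = (
    sprintf( '<%+d>',  42 ),     # <+42>
    sprintf( '<% d>',  42 ),     # < 42>
    sprintf( '<%-6d>', 42 ),     # <42    >
    sprintf( '<%06d>', 42 ),     # <000042>
    sprintf( '<%#o>',  63 ),     # <077>
    sprintf( '<%#x>',  2363 ),   # <0x93b>
    sprintf( '<%#b>',  5 ),      # <0b101>
);

print "$_\n" for @examples;
```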

substr()
substr EXPR,OFFSET,LENGTH,REPLACEMENT
substr EXPR,OFFSET,LENGTH
substr EXPR,OFFSET

The substr() function takes an expression (usually a string) and an offset and returns the substring of the string, starting at the offset. Like the index() and rindex() functions, the offset starts at 0, not 1. The following prints hearted:

my $string = 'halfhearted';
my $substr = substr $string, 4;
print $substr;

You can also specify an optional length argument after the offset. This will limit the returned substring to no more than the specified length. The following prints heart.

my $string = 'halfhearted';
my $substr = substr $string, 4, 5;
print $substr;

An underappreciated use of substr() is its lvalue property. In Perl, an lvalue is something you can assign to. The “l” stands for “left” because an lvalue is found on the left side of an assignment. The substr() function can be used this way, and it also accepts an optional fourth argument: a replacement string for the substring you’re returning.

my $string = 'halfhearted';
my $substr = substr $string, 0, 4, 'hard';
print "$substr\n$string\n";

That will print:

half
hardhearted

The substr() function is useful, but it’s often overlooked in favor of regular expressions, something we’ll cover in Chapter 8, Regular Expressions.
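
The replacement-string form is the one you’ll see most often, but you can also assign to substr() directly. The following sketch shows that form. Note that the replacement doesn’t have to be the same length as the text it replaces.

```perl
use strict;
use warnings;

my $string = 'halfhearted';

# Assigning to substr() replaces that part of the original string.
substr( $string, 0, 4 ) = 'hard';
my $after_first = $string;    # hardhearted

# The string grows or shrinks to fit the replacement.
substr( $string, 0, 4 ) = 'light';
my $after_second = $string;   # lighthearted

print "$after_first\n$after_second\n";
```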

tr/// and y///
VARIABLE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds
VARIABLE =~ y/SEARCHLIST/REPLACEMENTLIST/cds

The tr/// and y/// operators are identical. The y/// variant is provided for those who use Perl as a replacement for sed, a stream editor utility provided in Unix-like environments.

The tr/// builtin takes a list of characters on the left side and replaces it with the corresponding list of characters on the right side. It returns the number of characters replaced. The string being altered must be followed by the binding operator (=~). The binding operator is generally seen when using regular expressions and we’ll cover that in more detail in Chapter 8, Regular Expressions.

This might sound strange, so some examples are in order.

To replace all commas in a string with tabs:

my $string = "Aramis,Athos,Portos";
$string =~ tr/,/\t/;
print $string;

If, for some reason, you wanted to lower-case all vowels:

$string =~ tr/AEIOU/aeiou/;

You can also specify a range by adding a hyphen. To lower-case all letters (though obviously the lc() function would be clearer here):

$string =~ tr/A-Z/a-z/;

The tr/// builtin also accepts several switches, c, d, and s, but you probably won’t see them much in day-to-day usage unless you’re doing a heavy amount of text munging. Read perldoc perlop and see the Quote and Quote-like Operators section.
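
If you’re curious, here is a brief sketch of three things you can do beyond simple replacement: counting characters, squeezing runs of a character with the s switch, and deleting characters with the d switch. The sample string with its doubled comma is made up for the example.

```perl
use strict;
use warnings;

my $string = 'Aramis,,Athos,Portos';

# With an empty replacement list and no switches, tr/// only counts.
my $commas = ( $string =~ tr/,// );

# The s switch "squeezes" runs of the same replaced character into one.
( my $squeezed = $string ) =~ tr/,/,/s;

# The d switch deletes found characters that have no replacement.
( my $deleted = $string ) =~ tr/,//d;

print "$commas\n";     # 3
print "$squeezed\n";   # Aramis,Athos,Portos
print "$deleted\n";    # AramisAthosPortos
```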

String operators

As mentioned, the difference between Perl’s functions and operators is a bit vague at times, but for convenience, we’ll refer to the punctuation bits as operators.

Repetition operator: x
STRING   x INTEGER
(STRING) x INTEGER

The x operator is for repetition. It’s often used to repeat a string several times:

my $santa_says = 'ho' x 3.7;
print $santa_says;

That assigns hohoho to $santa_says.

Sometimes you want to assign a single value multiple times to a list. Just put the string in parentheses to force list context:

my $ho = 'ho';
my @santa_says = ($ho) x 3;

@santa_says now contains the three strings ho, ho and ho.

Note

In many places where Perl expects an integer, a floating point number is fine. Perl will act as if you have called the int() function on the number. This includes using floating point numbers with the x operator, or even accessing array elements.
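
Here are a few more uses of the x operator you’re likely to run into, sketched with variable names of our own choosing.

```perl
use strict;
use warnings;

# Repeat a string to draw a separator line.
my $rule = '-' x 20;

# Repeat a list to initialize an array with five zeros.
my @scores = (0) x 5;

# A fractional count is truncated, so this is the same as 'ho' x 2.
my $chuckle = 'ho' x 2.9;

print "$rule\n";
print "@scores\n";    # 0 0 0 0 0
print "$chuckle\n";   # hoho
```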

Concatenation operator: .
STRING . STRING

Unlike many other languages, the dot operator (.) is used for string concatenation instead of the + operator. Not only is this visually distinctive, it tells Perl to treat the data as strings instead of numbers.

my $first   = 1;
my $second  = 2;
my $string  = $first . $second;
my $answer  = $first + $second;
print "$string - $answer";

That will print 12 - 3. This is because the concatenation operator considers the 1 and 2 to be strings and concatenates (joins) them. The addition operator, +, expects numbers and adds the 1 and 2 together, giving the answer of 3.

You can also “chain” together multiple concatenation operators. Here’s one way to join two strings with a space.

my $full_name = $first_name . ' ' . $last_name;

Autoincrement and Autodecrement operators: ++ --
++VARIABLE
--VARIABLE
VARIABLE++
VARIABLE--

The ++ and -- operators are for autoincrement and autodecrement. They return the value of the variable and increase or decrease the variable’s value by one. Using ++ on a string seems rather strange, but it returns the next letter (the -- operator has no corresponding string behavior and always treats its operand as a number). If they’re used as a prefix operator (++$var), they change the value before returning it. If used as a postfix operator ($var++), they change the value after returning it. So if you want to find the next character after ‘f’, you can do this:

my $letter = 'f';
$letter++;
print $letter;

When you get past the ‘z’, the letters double. If $letter is ‘z’ and then you call $letter++, the $letter will now be ‘aa’. You won’t see this often in code, but your author has seen it used to create the prefix letters in code that automatically generated outlines.

In the faint hope of making this more clear, here’s exactly what perldoc perlop has to say on this subject:

If, however, the variable has been used in only string contexts
since it was set, and has a value that is not the empty string
and matches the pattern "/^[a-zA-Z]*[0-9]*\z/", the increment is
done as a string, preserving each character within its range,
with carry:

          print ++($foo = '99');      # prints '100'
          print ++($foo = 'a0');      # prints 'a1'
          print ++($foo = 'Az');      # prints 'Ba'
          print ++($foo = 'zz');      # prints 'aaa'

The “pattern” mentioned is what we refer to as a regular expression. We’ll cover those in Chapter 8, Regular Expressions. For now, understand that /^[a-zA-Z]*[0-9]*\z/ means that the string must match zero or more letters, followed by zero or more digits.
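
Here is a short sketch with a few more cases, including one where autodecrement is applied to a string. As perldoc perlop notes, autodecrement is not “magical”, so the string is simply treated as the number 0. The variable names are ours.

```perl
use strict;
use warnings;

my $letter = 'z';
$letter++;            # aa

my $label = 'a9';
$label++;             # b0

my $mixed = 'Zz';
$mixed++;             # AAa

# Autodecrement has no string behavior: 'b' is treated as 0.
my $previous = 'b';
{
    no warnings 'numeric';   # silence the "isn't numeric" warning
    $previous--;             # -1
}

print "$letter $label $mixed $previous\n";   # aa b0 AAa -1
```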

Note

For the pedants in the audience, yes, the regular expression described for autoincrement/autodecrement matching can match a string consisting of zero letters and zero numbers, but the correct way to write it would have been a bit more cumbersome and probably obscured this even more:

/^(?:[a-zA-Z]*[0-9]+|[a-zA-Z]+[0-9]*)\z/

The main reason we mention autoincrement and autodecrement operators for strings is to introduce the range operators. Understanding that some operators can be used with both numbers and strings is essential to understanding some of the unusual aspects of Perl.

Note

Be careful when using the ++ and -- operators. perldoc perlop has this to say on the subject:

Note that just as in C, Perl doesn't define when the variable
is incremented or decremented. You just know it will be done
sometime before or after the value is returned. This also
means that modifying a variable twice in the same statement
will lead to undefined behaviour.  Avoid statements like:
    $i = $i ++;
    print ++ $i + $i ++;
Perl will not guarantee what the result of the above statements is.

To use these operators safely, don’t use them more than once with the same variable in the same expression. In fact, it’s often safer to place them on a line by themselves because they modify the variable in place and you don’t need to use the return value:

my $i = 7;
$i++;
# more code here
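
If you do need the return value, using each operator only once per statement keeps the result well defined. This sketch shows the difference between the postfix and prefix forms. The names $postfix and $prefix are ours.

```perl
use strict;
use warnings;

my $i = 7;

my $postfix = $i++;   # $postfix gets 7, then $i becomes 8
my $prefix  = ++$i;   # $i becomes 9, then $prefix gets 9

print "$postfix $prefix $i\n";   # 7 9 9
```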

Range operator: ..
STRING .. STRING

The double dots, .., are the range operator. Though the range operator is usually used for numbers, it can also be used for letters. Here’s how to assign the lower-case letters ‘a’ through ‘z’ to an array.

my @alphabet = ( 'a' .. 'z' );

Of course, you can do this with upper-case letters, too:

my @alphabet = ( 'A' .. 'Z' );

If the left string is “greater” than the right string, nothing is returned.

Internally, when used with strings, the range operator uses the special autoincrement behavior discussed with ++ and --.
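
Because of that, ranges aren’t limited to single letters. Here is a sketch comparing a letter range, a multi-letter range and an ordinary numeric range.

```perl
use strict;
use warnings;

my @letters = ( 'a' .. 'e' );
my @labels  = ( 'aa' .. 'ad' );
my @pages   = ( 8 .. 11 );

print "@letters\n";   # a b c d e
print "@labels\n";    # aa ab ac ad
print "@pages\n";     # 8 9 10 11
```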

Note

The range operators actually have a tremendous amount of power and are useful in many more ways than shown here. Read the “Range Operators” section of perldoc perlop to learn more about them.

Scalar::Util

In Perl 5.7.3, the Scalar::Util module was included in the Perl core. This module implements a number of useful functions. The two most common are blessed() and looks_like_number(). The blessed() function is useful to determine if a scalar is actually an object (see Chapter 12, Object Oriented Perl) and the looks_like_number() function returns a boolean (true or false) value indicating whether or not a string, well, looks like a number. To use these functions, you must explicitly import them as follows:

use Scalar::Util 'blessed';
# or
use Scalar::Util 'looks_like_number';
# or both
use Scalar::Util qw(blessed looks_like_number);
my $is_number = looks_like_number('3fred'); # false
$is_number    = looks_like_number('3e7');   # true!

We’ll cover more about boolean values in Chapter 5, Control flow when we discuss conditionals.
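
A common use of looks_like_number() is checking input before doing arithmetic with it. This sketch runs a handful of made-up candidate strings through it. The ? : construct is the ternary operator, which picks one of two values depending on a condition.

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

my @candidates = ( '42', '-3.14', '3e7', '3fred', 'forty two' );

my @numbers;
for my $candidate (@candidates) {
    my $verdict = looks_like_number($candidate) ? 'number' : 'not a number';
    push @numbers, $candidate if looks_like_number($candidate);
    printf "%-10s %s\n", $candidate, $verdict;
}
```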

Note

As usual, type perldoc Scalar::Util for more information. If you are using a version of Perl before 5.7.3, you may have to install this module from the CPAN.

Numeric

Naturally, Perl has plenty of numeric functions. It wouldn’t make much of a programming language if it didn’t! Many of them are the basic arithmetic operators you’re familiar with.

Arithmetic operators: +, -, *, / and **
NUMBER + NUMBER
NUMBER - NUMBER
NUMBER * NUMBER
NUMBER / NUMBER
NUMBER ** NUMBER

The +, -, *, and / operators are for addition, subtraction, multiplication and division, respectively. In terms of precedence, multiplication and division are calculated first, left to right, and addition and subtraction are calculated last, left to right. The following will print 11:

my $answer = 8 + 6 / 4 * 2;
print $answer;

Though your author generally avoids parentheses to avoid visual clutter, they are strongly recommended when doing math to avoid confusion. The above is equivalent to:

my $answer = 8 + ( ( 6 / 4 ) * 2 );
print $answer;

If you wanted the addition first, followed by the multiplication and then division, just use parentheses to group things logically:

my $answer = ( 8 + 6 ) / ( 4 * 2 );
print $answer;

Now you’ll have 1.75 as the answer instead.

Exponentiation is handled with the ** operator. To calculate the cube of 25:

print 25 ** 3;

That prints 15625.
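
Two details of ** catch people out: it is right-associative, and it binds more tightly than a leading minus sign. Here is a sketch of both, using variable names of our own.

```perl
use strict;
use warnings;

# Right-associative: this is 2 ** ( 3 ** 2 ), not ( 2 ** 3 ) ** 2.
my $right_assoc = 2 ** 3 ** 2;       # 512
my $grouped     = ( 2 ** 3 ) ** 2;   # 64

# ** binds more tightly than unary minus: this is -( 2 ** 2 ).
my $negated = -2 ** 2;               # -4
my $squared = ( -2 ) ** 2;           # 4

print "$right_assoc $grouped $negated $squared\n";   # 512 64 -4 4
```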

Note

The arithmetic operators are infix operators. This means that they are placed in between a left and right operand. They have no meaning as prefix operators, but the + and - operators are special.

For the - operator, it can be used to reverse the sign of a number:

my $num1 = -17;
print -$num1;
my $num2 = 42;
print -$num2;

Those two print statements will print 17 and -42, respectively.

A prefix plus (referred to as a unary plus) has no distinct meaning but it is sometimes placed after a function name and before parentheses to indicate grouping. For example, the following doesn’t do what you want, it prints 3 and throws away the 4:

print (1 + 2) * 4;

Instead, use a unary plus to make it clear to Perl that the parentheses are for grouping and not for the function call.

print +( 1 + 2 ) * 4;

The modulus operator: %
INTEGER % INTEGER

The % is the modulus operator. This returns the remainder of the division between the left and right operands. Like many operators and functions which take integers, if floating point numbers are used, their integer value (see the int() function below) will be used. Thus, since 25 divided by 9 is 2 with a remainder of 7, 25 modulus 9 is 7.

print 25 % 9; # prints 7
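
Here is a sketch of a few more cases. Note that when one operand is negative, the result takes the sign of the right operand, which differs from some other languages. The variable names are ours.

```perl
use strict;
use warnings;

# A remainder of zero when dividing by 2 means the number is even.
my $is_even = ( 10 % 2 == 0 );

# Floating point operands are truncated first: this is 25 % 9.
my $from_floats = 25.9 % 9.9;      # 7

# With a negative operand, the result has the sign of the right operand.
my $negative_left  = -7 % 3;       # 2
my $negative_right = 7 % -3;       # -2

print "$from_floats $negative_left $negative_right\n";   # 7 2 -2
```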

abs()
abs (defaults to $_)
abs NUMBER

The abs() function returns the absolute value for a number. Thus, if the number is greater than or equal to zero, you get the number back. If it’s less than zero, you get the number multiplied by -1.

exp()
exp (defaults to $_)
exp NUMBER

The exp() function returns e (approximately 2.718281828) to the power of the number passed to it. See also: log().

hex() and oct()
hex (defaults to $_)
hex STRING
oct (defaults to $_)
oct STRING

Given a string, hex() will attempt to interpret the string as a hexadecimal value and return the base 10 value. For example, the following are equivalent and will each print the decimal number 2363.

print hex("0x93B");
print hex "93B"; # same thing

Note that this works on strings, not numbers. The following prints 9059:

print hex 0x93B;

Why does it print that? Because 0x93B is a hexadecimal number and it’s evaluated as 2363. The hex() function then sees it as the string 2363, which if interpreted as a hexadecimal number, is 9059.

The oct() function is almost identical, but it expects strings which it can consider to be octal numbers instead of hexadecimal numbers. This means that each of the following lines will print the decimal number 63.

print oct("77");
print oct("077");
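
The oct() function has one more trick: if the string starts with 0x or 0b, it is interpreted as hexadecimal or binary instead. That makes oct() handy when you don’t know in advance which notation a string uses. A brief sketch, with variable names of our own:

```perl
use strict;
use warnings;

my $from_hex    = oct('0x93B');   # 2363
my $from_binary = oct('0b101');   # 5
my $from_octal  = oct('755');     # 493

print "$from_hex $from_binary $from_octal\n";   # 2363 5 493
```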

Note

If you need to go from decimal to either hexadecimal or octal, use the %x or %o formats for sprintf() and printf(), respectively:

printf "%x", 2363;
printf "%o", 63;

To format the hexadecimal number with a leading 0x, just add it to the string before the % character:

printf "0x%x", 2363;
# 0x93b

To format the octal number with a leading 0, use the # flag after the % character:

printf "%#o", 63;
# 077

int()
int (defaults to $_)
int NUMBER

The int() function returns the integer value of the number. In other words, it truncates everything after a decimal point.

print int(73.2); # prints 73

Note that for some programming languages, if all numbers in a mathematical operation are integers, an integer result is returned. For example, in Ruby, the following will print 3 instead of 3.5:

print 7/2;

In Perl we assume that you don’t want to discard this extra information, so it will print 3.5, as expected. To force an integer response, you can use the int() function:

print int(7/2); # prints 3
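
Keep in mind that int() truncates toward zero rather than rounding down, which matters for negative numbers. A minimal sketch:

```perl
use strict;
use warnings;

my $positive = int(7.9);    # 7
my $negative = int(-7.9);   # -7, not -8

print "$positive $negative\n";   # 7 -7
```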

Note

To force integer math, you can also use the integer pragma. See perldoc integer for more information.

log()
log (defaults to $_)
log NUMBER

The log() function, as with most programming languages, returns the natural logarithm of NUMBER (the power to which e must be raised to produce the number). See also: exp().
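
Here is a sketch showing that log() and exp() undo each other, along with the usual way to get a logarithm in another base: divide by the natural logarithm of that base. The variable names are ours, and the results are formatted with printf() because floating point answers are rarely exact.

```perl
use strict;
use warnings;

my $e             = exp(1);
my $two           = log( exp(2) );           # approximately 2
my $log10_of_1000 = log(1000) / log(10);     # approximately 3

printf "%.6f\n", $e;                         # 2.718282
printf "%.1f %.1f\n", $two, $log10_of_1000;  # 2.0 3.0
```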

rand() and srand()
rand NUMBER
srand NUMBER

The rand() function returns a random fractional number greater than or equal to 0 and less than the number passed to it. If no number is passed, it assumes 1. If you prefer integer numbers, use the int() function with it. Thus, to simulate the roll of a six-sided die, you could do this:

print 1 + int(rand(6));

Adding 1 to it is necessary because otherwise you’ll get numbers between 0 and 5.

The srand() function is used to set the seed for the random number generator. As of Perl version 5.004 (released back in 1997), Perl will call srand() for you the first time that rand() is called. You only want to set the seed yourself if you wish to generate predictable “random” results for testing or debugging. As of Perl 5.14, srand() will also return the seed used.
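To illustrate predictable “random” results, here is a minimal sketch that seeds the generator twice with the same value and rolls five dice each time. The seed 42 is arbitrary, and the example uses map(), which is described later in this chapter:

```perl
use strict;
use warnings;

# Seeding with the same value replays the same sequence
srand(42);
my @first_rolls = map { 1 + int( rand(6) ) } 1 .. 5;

srand(42);
my @second_rolls = map { 1 + int( rand(6) ) } 1 .. 5;

print "First:  @first_rolls\n";
print "Second: @second_rolls\n";    # same numbers as the first line
```

The actual numbers depend on your Perl build, which is why none are shown here.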

Note

The rand() function is for convenience but it’s not strong enough for cryptography. The CPAN lists several useful modules, including Math::Random::Secure, Math::Random::MT::Perl, and Math::TrulyRandom that are intended for this purpose. Your author has no background in cryptography, so he can’t comment on their effectiveness.

sprintf() and printf()
sprintf FORMAT, LIST
 printf  FILEHANDLE FORMAT, LIST
 printf  FORMAT, LIST

We’ve already mentioned the sprintf() function in relation to strings and mentioned that it can be used to format numbers, but it should be pointed out that it can be used for rounding numbers when used with the %f template. In the template, the optional number before the period is the minimum width of the whole field and the number after the period is how many digits you want after the decimal point. Some examples:

printf "%1.0f", 5.2;   # prints 5
printf "%1.0f", 5.7;   # prints 6
printf "%.2f",  6.248; # prints 6.25

Often you see people recommending adding .5 to a number and calling the int() function to round off, but this will fail with negative numbers. Just use printf() or sprintf().
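The failure with negative numbers is easy to demonstrate. In this sketch (the variable names are ours), int() truncates toward zero, so adding .5 pushes a negative number the wrong way:

```perl
use strict;
use warnings;

my $number = -2.7;

my $naive   = int( $number + 0.5 );          # -2: int(-2.2) truncates toward zero
my $rounded = sprintf( "%.0f", $number );    # -3: rounds to the nearest integer

print "naive: $naive, sprintf: $rounded\n";
```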

sqrt()
sqrt (defaults to $_)
sqrt NUMBER

Returns the positive square root of the number. Does not work with negative numbers unless the Math::Complex module is loaded.

use Math::Complex;
print sqrt(-25);

That prints 5i. If you are not familiar with imaginary numbers, you will probably never need (or want) the Math::Complex module.

Trigonometric functions: atan2(), cos(), and sin()
atan2 Y, X
cos (defaults to $_)
cos NUMBER
sin (defaults to $_)
sin NUMBER

The atan2() function returns the arctangent of Y/X in the range -π to π, while cos() and sin() return the cosine and sine of a number given in radians. If you need other trigonometric functions, see the Math::Trig or POSIX modules.
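As a small sketch of these functions together, the following derives π from atan2() and builds a tangent from sin() and cos(), since Perl has no tan() builtin. The variable names are illustrative:

```perl
use strict;
use warnings;

# atan2(1, 1) is pi/4, so multiplying by 4 gives pi
my $pi = 4 * atan2( 1, 1 );

# Tangent is sine divided by cosine
my $angle   = $pi / 4;                       # 45 degrees, in radians
my $tangent = sin($angle) / cos($angle);     # approximately 1

printf "pi is about %.5f, tan(pi/4) is about %.5f\n", $pi, $tangent;
```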

Bitwise

As one might expect, Perl also provides a variety of bitwise operators. Table 4-4 explains these operators.

Table 4.4. Bitwise operators

Operator  Type    Grammar           Description
&         Infix   NUMBER & NUMBER   Bitwise “and”
|         Infix   NUMBER | NUMBER   Bitwise “or”
^         Infix   NUMBER ^ NUMBER   Bitwise “xor”
~         Prefix  ~NUMBER           Bitwise negation
<<        Infix   NUMBER << NUMBER  Left shift operator
>>        Infix   NUMBER >> NUMBER  Right shift operator

If you’re familiar with bitwise operators, these behave as you would expect. For example, a quick check if a number is even is as follows:

print "Even\n" if 0 == ($number & 1);

This is identical to the following modulus check:

print "Even\n" if 0 == ($number % 2);

Note

You may also use bitwise operators on strings. See Bitwise String Operators in perldoc perlop if these will be useful for you.
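A common use of bitwise operators is packing several yes/no flags into one number. The sketch below assumes three hypothetical permission flags (CAN_READ, CAN_WRITE and CAN_EXEC are names invented for this example) and exercises each operator from Table 4-4:

```perl
use strict;
use warnings;

# Hypothetical permission flags; each one occupies its own bit
use constant {
    CAN_READ  => 1,    # binary 001
    CAN_WRITE => 2,    # binary 010
    CAN_EXEC  => 4,    # binary 100
};

my $permissions = CAN_READ | CAN_WRITE;       # 3: set two bits with "or"
my $may_write   = $permissions & CAN_WRITE;   # 2 (true): the bit is set
my $may_exec    = $permissions & CAN_EXEC;    # 0 (false): the bit is not set

$permissions = $permissions & ~CAN_WRITE;     # 1: clear a bit with "and" plus negation
$permissions = $permissions ^ CAN_EXEC;       # 5: toggle a bit with "xor"
my $doubled  = 3 << 1;                        # 6: shifting left by 1 multiplies by 2

print "$permissions $may_write $may_exec $doubled\n";
```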

Boolean

Boolean operators are used to determine true and false. We’ll cover their use in more detail in Chapter 5, Control flow, but they’re included here for completeness. Because Perl lets you assign strings and numbers to variables, the boolean operators are separated into string and numeric versions. You’ll learn the string versions first.

Even though we’ll cover their use in Chapter 5, Control flow, we’ll show the if/else statement now just so you can understand how they work.

The if statement takes an expression in parentheses and, if it evaluates as true, executes the code in the block following it. If an else block follows the if block, the else block will be executed only if the if expression evaluates as false. For example:

my ( $num1, $num2 ) = ( 7, 5 );
if ( $num1 < $num2 ) {
    print "$num1 is less than $num2\n";
}
else {
    print "$num1 is not less than $num2\n";
}

That code will print 7 is not less than 5. The < operator is the numeric “less than” operator and returns true if the left operand is less than the right operand.

Now that we have this small example out of the way, here are the boolean operators.

eq, ne, lt, le, gt, ge, cmp

All of these are “infix” operators. They are “spelled out” in Perl to make it clear that they are for strings. Table 4-5 explains them.

Table 4.5. Boolean string operators

Operator  Meaning
eq        Equal
ne        Not equal
lt        Less than
le        Less than or equal to
gt        Greater than
ge        Greater than or equal to
cmp       String compare


A string is considered “less than” another string if, depending on your current locale settings, an alphabetical sorting of that string would cause it to come before another string. This means that a comes before b, punctuation tends to come before numbers, and numbers come before letters. Also, zzz comes before zzza because the first three letters of each match, but zzz is shorter than zzza. That also means that 100 comes before 99 when doing a string compare because 1 comes before 9. It’s a frequent trap that inexperienced Perl programmers fall into.

For example, the following will print yes because a comes before bb.

if ( 'a' le 'bb' ) {
   print 'yes';
}
else {
   print 'no';
}
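The 100 versus 99 trap is easier to remember once you have seen it run. This sketch compares the same two values as strings and as numbers; it uses the ?: operator listed in Table 4-8 to turn each result into a word, and the variable names are ours:

```perl
use strict;
use warnings;

# As strings, '100' comes before '99' because '1' comes before '9'
my $string_result = ( '100' lt '99' ) ? 'yes' : 'no';    # yes

# As numbers, 100 is not less than 99
my $numeric_result = ( 100 < 99 ) ? 'yes' : 'no';        # no

print "string: $string_result, numeric: $numeric_result\n";
```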

The special cmp infix operator returns -1 if the left operand is less than the right operand. It returns 0 if the two operands are equal and it returns 1 if the left operand is greater than the right operand. The following, for example, prints -1:

print 'a' cmp 'b';

This seems strange, but it comes in very handy in sorting lists. We’ll cover it in more detail in Chapter 10, Sort, map and grep when we discuss sorting issues, but now be aware that you can sort a list alphabetically with the following:

my @sorted = sort { $a cmp $b } @words;

Actually, the sort() function defaults to sorting alphabetically, so that’s equivalent to this:

my @sorted = sort @words;

Naturally, all of these have numeric equivalents, as detailed in Table 4-6.

Table 4.6. Boolean numeric operators

Operator  Meaning
==        Equal
!=        Not equal
<         Less than
<=        Less than or equal to
>         Greater than
>=        Greater than or equal to
<=>       Numeric compare


Those all behave as you expect. The numeric compare operator, <=> (sometimes affectionately referred to as the spaceship operator), has the same rules as the cmp operator, but does numeric sorting rather than alphabetical sorting. So to sort a list of numbers in ascending order:

my @sorted = sort { $a <=> $b } @numbers;

As a side note, you can sort numbers in reverse order by reversing the $a and $b:

my @descending = sort { $b <=> $a } @numbers;
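To see why the choice of operator matters, the following sketch sorts the same list three ways. The sample numbers are arbitrary:

```perl
use strict;
use warnings;

my @numbers = ( 10, 9, 100, 1 );

my @as_strings = sort @numbers;                  # 1 10 100 9
my @ascending  = sort { $a <=> $b } @numbers;    # 1 9 10 100
my @descending = sort { $b <=> $a } @numbers;    # 100 10 9 1

print "@as_strings\n@ascending\n@descending\n";
```

The default sort compares the values as strings, which is why 100 lands before 9 in the first result.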

Finally, we have the boolean operators which do not compare strings or numbers, but simply return true or false. Table 4-7 explains them:

Table 4.7. Boolean operators

Operator  Type    Meaning
!         Prefix  Not
&&        Infix   And
||        Infix   Or
//        Infix   Defined or
not       Prefix  Not
and       Infix   And
or        Infix   Or
xor       Infix   Exclusive or


Note

What is “truth”?

Sometimes people get confused about true/false values in Perl. It’s actually pretty simple. The following scalar values are all false in Perl:

  • undef

  • “” (the empty string)

  • 0

  • 0.0

  • “0” (the “string” zero)

Any other scalar value is true.
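If you want to check these rules for yourself, a sketch like the following tests a handful of sample values. The list of values is our own selection, and the example uses map(), which is described later in this chapter:

```perl
use strict;
use warnings;

my @values = ( '', 0, 0.0, '0', '0.0', '00', ' ', 'false' );

# Evaluate each value in boolean context
my @truth = map { $_ ? 'true' : 'false' } @values;

# The first four are false; the remaining four are true
print "[$values[$_]] is $truth[$_]\n" for 0 .. $#values;
```

Note that the strings '0.0' and '00' are true, even though both are numerically equal to zero.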

These operators return true or false depending on the true and false values of their operands. Here are some examples that should make their meaning clear:

if ( ! $value ) {
    print "$value is false";
}
if ( $value1 && $value2 ) {
    print "both values are true";
}
if ( $value1 || $value2 ) {
    print "One or both of the values are true";
}
if ( defined( $value1 // $value2 ) ) {
    print "One or both of the values are defined";
}
if ( $value1 xor $value2 ) {
    print "Either $value1 or $value2 is true, but not both";
}

The not, and, and or operators are the equivalent of the corresponding !, && and || operators, but they have a lower precedence. See the section on “Precedence and Associativity” later in this chapter.

Note

The // operator is a bit special. Introduced in Perl version 5.10.0, it’s the “defined or” operator. The || operator evaluates the left operand to see if it’s true. The // operator evaluates the left operand to see if it’s defined (that is, if it has a value assigned to it) and if the left operand has any value, including one that is ordinarily considered to be false, then it is returned. Otherwise, the right operand is returned.

It avoids many bugs where you would ordinarily use the || operator but might accidentally ignore a valid value that happens to evaluate as false.

This feature is not available prior to version 5.10.0.

One useful feature to note is that the &&, || and // operators (and their and/or equivalents) don’t simply return true or false. They return the last operand evaluated, which is the value that allowed Perl to determine the result. For example, the && operator returns the left operand if it’s false. Otherwise, it returns the right operand.

my $zero  = 0;
my $two   = 2;
my $three = 3;
my $x = $zero  && $two;   # $x is 0
my $y = $three && $zero;  # $y is 0
my $z = $two   && $three; # $z is 3

However, this is more commonly used with the || and // operators (remember, // is only available on Perl version 5.10.0 and up) by assigning the first value that is true (or defined, in the case of the // operator):

use 5.10.0; # tell Perl we want the // operator
my $zero  = 0;
my $two   = 2;
my $three = 3;
my $undef;
my $w = $zero  || $two;   # $w is 2
my $x = $undef || $zero;  # $x is 0
my $y = $zero  // $two;   # $y is 0!
my $z = $undef // $three; # $z is 3

Assignment operators

Perl offers a wide variety of assignment operators, including many shortcut operators to handle common tasks.

=, +=, -=, *=, /=, ||=, //=, &&=, .=, |=, &=, **=, x=, <<=, >>=, ^=

You’ve already seen the = assignment operator. It just tells Perl to evaluate the expression on the right and assign the resulting value to the variable or variables on the left. However, there are many shortcut assignment operators available. These operators save you a bit of typing. Each one combines an infix operator with the equals sign (=). Perl applies the operator with the variable on the left as the left operand and the value on the right as the right operand, and then assigns the result back to the variable on the left.

The following examples all have the equivalent expression in the comment after the assignment.

$x += 4;      # $x = $x + 4;
$y .= "foo";  # $y = $y . "foo";
$z x= 4;      # $z = $z x 4;
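The ||= and //= shortcuts are often used to supply default values, and the difference between them is the same as the difference between || and //. The following is a sketch under the assumption that 0 is a legitimate setting; the variable names are hypothetical:

```perl
use strict;
use warnings;
use 5.010;    # the //= operator needs Perl 5.10 or later

my ( $retries, $timeout, $name );
$retries = 0;             # a deliberate, valid setting

$retries ||= 3;           # 3: the valid 0 was overwritten because 0 is false
$timeout //= 30;          # 30: $timeout was undefined, so it gets the default
$name    //= 'guest';     # guest

my $still_zero = 0;
$still_zero //= 3;        # 0: already defined, so it is left alone

print "$retries $timeout $name $still_zero\n";
```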

Precedence and Associativity

What does the following do?

print -4**.5;

If you remember your math, raising a number to .5 is equivalent to taking the square root of the number. If Perl evaluates the infix exponentiation operator (**) first, it means this:

print -sqrt(4);

If Perl evaluates the prefix negation operator (-) first, it means this:

print sqrt(-4);

The first version will print -2, but the second version, depending on how you wrote it and which version of Perl you use, will print something like Can't take the sqrt of -4, or perhaps nan (which means “not a number”).

In this case, the exponentiation operator has a higher precedence than the prefix negation operator and thus will be evaluated first.

The main precedence rules that you need to remember are that math operations generally have the same precedence you learned in math class. Thus, multiplication and division (* and /) have a higher precedence than addition and subtraction (+ and -). So the following assigns 13 to $x, not 25.

my $x = 3 + 2 * 5;

But what happens when you have several of the same operator in the same expression? That’s when associativity kicks in. Associativity is the side from which the operations are first evaluated. For example, subtraction has left associativity, meaning that the leftmost operations are evaluated first. So 20 - 5 - 2 means 15 - 2, not 20 - 3.

On the other hand, exponentiation is right associative. The following prints 512 (2 raised to the 9th power), and not 64 (8 squared).

my $x = 2 ** 3 ** 2;
print $x;

If you really wanted to print 64, use parentheses to force the precedence. Parenthesized items always have the highest precedence.

my $x = ( 2 ** 3 ) ** 2;
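The precedence and associativity examples so far can be collected into one runnable sketch. The variable names are ours, and each comment shows how Perl groups the expression:

```perl
use strict;
use warnings;

my $negated_root = -4**.5;            # -2:  parsed as -( 4**.5 )
my $left_assoc   = 20 - 5 - 2;        # 13:  parsed as ( 20 - 5 ) - 2
my $forced_right = 20 - ( 5 - 2 );    # 17
my $right_assoc  = 2**3**2;           # 512: parsed as 2**( 3**2 )
my $forced_left  = ( 2**3 )**2;       # 64

print "$negated_root $left_assoc $forced_right $right_assoc $forced_left\n";
```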

Table 4-8 lists the associativity of various operators, in descending order of precedence. Operators are separated by spaces rather than commas to avoid confusion with the comma operator.

Table 4.8. Operator Associativity

Operator                     Associativity
Terms and list operators     Left
->                           Left
++ --                        Nonassoc
**                           Right
! ~ \ and unary + and -      Right
=~ !~                        Left
* / % x                      Left
+ - .                        Left
<< >>                        Left
Named unary operators        Nonassoc
< > <= >= lt gt le ge        Nonassoc
== != <=> eq ne cmp ~~       Nonassoc
&                            Left
| ^                          Left
&&                           Left
|| //                        Left
.. ...                       Nonassoc
?:                           Right
= += -= *= and so on         Right
, =>                         Left
List operators (rightward)   Nonassoc
not                          Right
and                          Left
or xor                       Left


The first item, terms and list operators, might sound strange. A term is a variable, a quote or quotelike operator, anything in parentheses, or a function whose arguments are enclosed in parentheses.

Note

If you’re familiar with C, operators found in C retain the same precedence in Perl, making them a bit easier to learn.

Table 4-8 is a daunting list and memorizing it might seem like a scary proposition. In fact, many programmers recommend memorizing it and it’s not a bad idea, but there are a couple of issues. First, you may simply forget the precedence levels. Second, when the maintenance programmer behind you sees you abusing precedence and associativity, she’s not going to be very happy to stumble across this:

print 8**2 / 7 ^ 2 + 3 | 4;

Using parentheses can clarify that. The following means exactly the same thing:

print( ( ( ( 8**2 ) / 7 ) ^ ( 2 + 3 ) ) | 4 );

(Both of those lines print 12, by the way).

No, we are not advocating making such a complicated bit of code, but even for simple expressions it can come in handy to make it clear what you intended.

Array and list functions

Arrays and lists have a variety of useful functions that make them easy to manipulate.

pop() and push()

pop (defaults to @_)
pop  ARRAY
push ARRAY, LIST

The pop() function removes and returns the last element of an array. The array is shortened by one element.

my $last_element = pop @array;

The push() function pushes one or more values onto the end of an array, making it longer.

my @array = ( 1 .. 5 );
push @array, ( 6 .. 10 );

In the above example, @array now contains ten elements, the numbers 1 through 10, in the correct order.

Note

The @_ special variable is one we’ve not covered yet. It contains the arguments to subroutines and we’ll explain this more in Chapter 7, Subroutines.

shift() and unshift()

shift (defaults to @_)
shift   ARRAY
unshift ARRAY, LIST

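The shift() and unshift() functions behave like the pop() and push() functions, but they operate on the beginning of the array: shift() removes and returns the first element, and unshift() adds one or more values to the front.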
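Here is a minimal sketch showing all four functions working on the same array. The array name and contents are arbitrary:

```perl
use strict;
use warnings;

my @queue = ( 'b', 'c' );

unshift @queue, 'a';         # add to the front: a b c
push    @queue, 'd';         # add to the end:   a b c d

my $first = shift @queue;    # 'a'; @queue is now b c d
my $last  = pop   @queue;    # 'd'; @queue is now b c

print "first: $first, last: $last, remaining: @queue\n";
```

Pairing push() with shift() gives you a first-in, first-out queue, while push() with pop() gives you a stack.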

splice()

splice ARRAY,OFFSET,LENGTH,LIST
splice ARRAY,OFFSET,LENGTH
splice ARRAY,OFFSET
splice ARRAY

The splice() function allows you to remove and return items from an array, starting with the OFFSET. If LENGTH is supplied, only LENGTH elements are removed; otherwise, everything from OFFSET to the end of the array is removed. If a LIST is supplied, the removed elements are replaced with the LIST (possibly changing the length of the array). As usual, an OFFSET of 0 refers to the first element of the array.

my @writers = qw( Horace Ovid Virgil Asimov Heinlein Dante );
my @contemporary = splice @writers, 3, 2;

The preceding example assigns Asimov and Heinlein to @contemporary, and leaves Horace, Ovid, Virgil and Dante in @writers.

If you do not specify an offset, the splice() function removes all elements from the array.
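The other forms of splice() can be sketched as follows. The extra names Petrarch and Catullus are our additions for illustration, and the comments show the array contents after each call:

```perl
use strict;
use warnings;

my @writers = qw( Horace Ovid Virgil Asimov Heinlein Dante );

# Replace two elements, starting at offset 3, with a single new one
my @removed = splice @writers, 3, 2, 'Petrarch';
# @removed: Asimov Heinlein
# @writers: Horace Ovid Virgil Petrarch Dante

# A LENGTH of 0 removes nothing, so the LIST is simply inserted
splice @writers, 1, 0, 'Catullus';
# @writers: Horace Catullus Ovid Virgil Petrarch Dante

# With only an OFFSET, everything from that point on is removed
my @tail = splice @writers, 4;
# @tail:    Petrarch Dante
# @writers: Horace Catullus Ovid Virgil

print "@writers\n";
```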

There are also a variety of list functions, some of which we will cover in far more depth in Chapter 10, Sort, map and grep, when we explain sort, grep and map in greater detail. We’ll give you some basics a little later in this chapter, though.

join() and split()

join STRING, LIST
split PATTERN, STRING
split PATTERN, STRING, LIMIT

The join() builtin takes a string and a list and joins every element in the list into a single string, with each element separated by the string value.

my $result = join "-", ( 'this', 'that', 'other' );

That will assign this-that-other to $result. As you might expect, you can use an array for the list. The following is identical behavior:

my @array = qw( this that other );
my $result = join '-', @array;

The opposite of join() is split(). However, the first argument to split is a regular expression pattern and we won’t be covering those until Chapter 8, Regular Expressions, so we’ll just give you a quick (incomplete) example of splitting a string on tabs:

my @fields = split /\t/, $string;

That will take a string, split it on the tabs (discarding the tab characters) and return the individual fields into the @fields array. The split() function is very powerful due to the power of regular expressions, but it has traps for the unwary, so we won’t cover it for now.
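One of the traps is worth a quick sketch now: by default split() drops trailing empty fields, and the optional LIMIT argument changes that. The sample strings below are made up for the example:

```perl
use strict;
use warnings;

my $line = "a,b,,,";

# By default, trailing empty fields are silently dropped
my @default = split /,/, $line;           # ('a', 'b')

# A negative LIMIT keeps them
my @keep_all = split /,/, $line, -1;      # ('a', 'b', '', '', '')

# A positive LIMIT caps the number of fields returned
my @two = split /,/, 'x,y,z', 2;          # ('x', 'y,z')

print scalar(@default), " ", scalar(@keep_all), " ", join( '|', @two ), "\n";
```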

reverse()

reverse LIST

Does what it says on the tin: it reverses a list. However, in scalar context it concatenates the list elements and returns the reverse of the resulting string. The latter behavior can be confusing in some cases.

my @array    = ( 7, 8, 9 );
my @reversed = reverse @array;
my $scalar   = reverse @array;

In the preceding example, while the @reversed array now contains 9, 8 and 7 (in that order), the $scalar variable now contains the string 987. However, this behavior is very useful if you wish to reverse a single word:

my $desserts = reverse 'stressed';

Or if you prefer to be explicit:

my $desserts = scalar reverse 'stressed';
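Being explicit matters most when you pass the result straight to print(), because print() supplies list context. A minimal sketch of the difference, with variable names of our choosing:

```perl
use strict;
use warnings;

# List context: a one-element list is reversed, so the word is unchanged
my @list_context = reverse 'stressed';             # ('stressed')

# scalar() forces the string-reversing behavior
my $scalar_context = scalar reverse 'stressed';    # 'desserts'

print reverse('stressed'), "\n";                   # prints stressed
print scalar( reverse 'stressed' ), "\n";          # prints desserts
```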

sort()

sort LIST

We’ve briefly touched on sort() earlier and we’ll cover it more in-depth in Chapter 10, Sort, map and grep, but here are a few examples to get you started. In these examples, you’ll note that an optional block occurs after the sort() function. As the sort function walks through the list, the special variables $a and $b contain the two elements to be compared while sorting. If you reverse them ($b, then $a), then the sort will occur in the reverse of the normal order.

# sorting alphabetically
my @sorted = sort @array;
# sorting alphabetically in reverse order
my @sorted = sort { $b cmp $a } @array;
# sorting numerically
my @sorted = sort { $a <=> $b } @array;
# sorting numerically in reverse order
my @sorted = sort { $b <=> $a } @array;

Reversing the $a and $b to reverse the sort looks strange and you might be tempted to do this to sort a list in reverse alphabetical order:

my @sorted_descending = reverse sort @array;

That works and it’s very easy to read, but it has to sort the entire list and then iterate over the list again to reverse it. It’s not as efficient, particularly for huge lists. That being said, it may not be a big deal. If your program runs fast enough with the “reverse sort” construct, don’t sweat it. Making your programs easy to read is a good thing.

grep()

grep EXPR, LIST
grep BLOCK LIST

The grep() function filters a list of values according to whatever is in the BLOCK or EXPR (EXPRESSION). The name comes from an old Unix command of the same name, but it operates a bit differently in Perl. We’ll cover it more in Chapter 10, Sort, map and grep, but the basic usage is simple. Each item in the list is aliased to $_ and you can compare $_ to a value to determine if you want the selected value. For example, to get all values greater than 5:

my @list = grep { $_ > 5 } @array;

You can use this to rewrite an array in place. To remove all values less than 100:

@array = grep { $_ >= 100 } @array;

The grep() function is extremely powerful, but we will wait until we know more about Perl to see the full power of this tool. Also note that the above syntax is the most common syntax for grep(), but it’s not the only syntax.
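As a brief sketch of the other syntax, here is the EXPR form of grep(), along with its behavior in scalar context, where it returns the number of matching elements. The sample data is arbitrary:

```perl
use strict;
use warnings;

my @array = ( 3, 8, 120, 5, 42 );

# The EXPR form: no braces, but a comma before the list
my @large = grep $_ > 5, @array;        # (8, 120, 42)

# In scalar context, grep returns how many elements matched
my $count = grep { $_ > 5 } @array;     # 3

print "@large ($count matches)\n";
```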

map()

map EXPR, LIST
map BLOCK LIST

The map() function, like the grep() function, takes a list and creates a new list. However, unlike the grep() function, it doesn’t filter a list, it applies a function to each element of a list, returning the result of the function. It aliases each element in a list to $_. To multiply every value in a list by 2:

my @doubled = map { $_ * 2 } @array;

Or to upper case every element in a list:

my @upper = map { uc($_) } @array;

If you remember the uc() function, you know it defaults to operate on $_, so the above can be written as:

my @upper = map { uc } @array;

The map() and grep() functions can also be chained. If you want to take the square root of all values in a list which are greater than zero, just use map() and grep() together.

my @roots = map { sqrt } grep { $_ > 0 } @numbers;

Many programmers like to put the map() and grep() on separate lines on the theory that it makes the code easier to read. This is true, particularly if your map() and grep() blocks are complicated.

my @roots = map  { sqrt }
            grep { $_ > 0 }
            @numbers;

Like grep(), there’s a huge amount of power here that we’ve barely touched and will cover more later.

The map() and grep() functions are often very confusing to new Perl programmers, but they are core to the power of Perl. It’s very important that you take the time to understand them and know completely how they work.

One caveat about map() and grep(): they operate on every element of a list. If you need to only operate on a few of the elements or if your map() and grep() statements are very complicated, it’s better to use a for loop with the array. We’ll cover those in Chapter 5, Control flow.
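For comparison, the square root example from earlier can be sketched both ways. The for loop version is only a preview of syntax covered in Chapter 5, and the sample numbers are ours:

```perl
use strict;
use warnings;

my @numbers = ( 16, -4, 0, 9 );

# Chained map() and grep()
my @roots_map = map { sqrt } grep { $_ > 0 } @numbers;    # (4, 3)

# The same logic written as a for loop
my @roots_loop;
for my $number (@numbers) {
    next unless $number > 0;          # skip what grep() would filter out
    push @roots_loop, sqrt $number;   # collect what map() would return
}

print "@roots_map\n@roots_loop\n";
```

Both versions produce the same list; which one reads better depends on how complicated the filtering and transformation are.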

List::Util

Starting with Perl 5.7.3 (released in March of 2002), the List::Util module was bundled with Perl. This module includes many list functions that provide even more power when dealing with lists and arrays. For example, to sum all elements in a list together, you can do this:

use List::Util 'sum';
my $total = sum @numbers;

Because sum() accepts lists and not just a single array, you can use multiple arrays:

my $total = sum @weights_supplies, @weights_food;

See perldoc List::Util for a full list of useful functions included. There’s also the List::MoreUtils module, but you’ll need to install that from the CPAN.
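A few of the other List::Util functions can be sketched quickly. The functions sum, max, min and first are all exported by the module on request; the sample numbers and variable names are ours:

```perl
use strict;
use warnings;
use List::Util qw( sum max min first );

my @numbers = ( 4, 8, 15, 16, 23, 42 );

my $total     = sum @numbers;                   # 108
my $largest   = max @numbers;                   # 42
my $smallest  = min @numbers;                   # 4
my $first_big = first { $_ > 10 } @numbers;     # 15: first element passing the test

print "$total $largest $smallest $first_big\n";
```

Unlike grep(), first() stops as soon as it finds a match, which can matter for long lists.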

Hash functions

Hashes, of course, also have useful functions to help you work with them.

delete()

delete KEY

The delete() function removes a key/value pair from a hash.

my %birth_year_for = (
    Virgil                       => '70 BCE',
    Shakespeare                  => '1564 CE',
    'Elizabeth Barrett Browning' => '1806 CE',
    'Carrot Top'                 => '1965 CE',
);
delete $birth_year_for{'Carrot Top'};

That, thankfully, removes Carrot Top from your list of birth years.

exists()

exists KEY

But how do you know that you really deleted a given key/value pair in a hash? You can check it with the exists() function.

if ( exists $birth_year_for{'Carrot Top'} ) {
    print "Carrot Top not expurgated!";
}
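Note that exists() and defined() answer different questions: a key can exist in a hash while its value is undefined. The sketch below assumes a hypothetical Homer entry with an unknown birth year, and also shows that delete() returns the value it removed:

```perl
use strict;
use warnings;

my %birth_year_for = (
    Virgil      => '70 BCE',
    Shakespeare => '1564 CE',
    Homer       => undef,      # the key is present, but the year is unknown
);

my $homer_exists  = exists( $birth_year_for{Homer} )  ? 1 : 0;    # 1
my $homer_defined = defined( $birth_year_for{Homer} ) ? 1 : 0;    # 0
my $dante_exists  = exists( $birth_year_for{Dante} )  ? 1 : 0;    # 0

# delete() returns the value it removed
my $removed       = delete $birth_year_for{Virgil};               # '70 BCE'
my $virgil_exists = exists( $birth_year_for{Virgil} ) ? 1 : 0;    # 0

print "$homer_exists $homer_defined $dante_exists $removed $virgil_exists\n";
```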

keys()

keys HASH

Sometimes you just want to iterate over all the keys to the hash. This is easy with the keys() function.

for my $key (keys %hash) {
    if ( $hash{$key} < 10 ) {
        delete $hash{$key};
    }
}

values()

values HASH

Or if you want to just inspect the values of a hash, use the values() function:

my @large_enough = grep { $_ >= 10 } values %hash;
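Hashes have no guaranteed order, so keys() and values() are often combined with sort(). The following sketch uses a hypothetical %stock hash to show that, along with keys() in scalar context, which returns the number of keys:

```perl
use strict;
use warnings;

my %stock = ( apples => 12, pears => 3, plums => 25 );

my @names = sort keys %stock;       # apples pears plums
my $count = keys %stock;            # 3: keys() in scalar context

# Sort the values numerically so the result is predictable
my @large_enough = sort { $a <=> $b } grep { $_ >= 10 } values %stock;    # 12 25

for my $name (@names) {
    print "$name: $stock{$name}\n";
}
```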

each()

each HASH

If you prefer, you can iterate over the keys and values at the same time using the each() function and a while loop. We’ll explain while loops in Chapter 5, Control flow, but for now, just know that it looks like this:

while ( my ( $key, $value ) = each %hash ) {
    print "$key: $value\n";
}

In the previous example with keys(), you saw how to delete items from the hash. When using each(), it is safe to delete the key/value pair that each() most recently returned, but do not add key/value pairs to the hash or delete other ones. This will break the each() function and you’ll get unpredictable results. Also, don’t call each() if the loop calls other code (typically a subroutine; see Chapter 7, Subroutines) and you can’t guarantee that the other code won’t also try to iterate over the same hash. Every hash has only one iterator, so two pieces of code calling each() on the same hash at the same time will interfere with each other. When in doubt, just use keys().

# this is always safe
for my $key (keys %hash) {
    my $value = $hash{$key};
}

Scoping keywords

A variety of keywords in Perl can affect the scope of variables or are related to scoping issues. You’ve already seen some of these, but we’ll cover them again for completeness.

my()

my VARIABLE
my (LIST OF VARIABLES)

The my() builtin declares a new variable or list of variables. They are lexically scoped: only visible within the file, block or eval in which they are declared.

local()

local VARIABLE
local (LIST OF VARIABLES)

The local() builtin gives a temporary value to a global (package) variable or list of variables. The previous value is restored when the enclosing file, block or eval finishes. It does not work on lexical variables (those declared with the my builtin).

As a general rule, you want to minimize your use of local(), but it’s important to use it when working with Perl’s global variables, filehandles, globs or package variables. It’s very useful when you want to temporarily override a value and ensure that called subroutines see your new value, or to make sure that you don’t accidentally change a global value. We’ll see more of this in subsequent chapters, particularly Chapter 7, Subroutines.
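A minimal sketch of a temporary override follows. The $indent package variable and both subroutines are hypothetical names invented for this example, and subroutines themselves are covered in Chapter 7:

```perl
use strict;
use warnings;

our $indent = 0;    # a package variable, used here only for illustration

sub current_indent { return $indent }

sub nested_indent {
    local $indent = 4;          # temporary value, restored when the sub exits
    return current_indent();    # the called sub sees 4
}

my $inside  = nested_indent();     # 4
my $outside = current_indent();    # 0: the original value is back

print "inside: $inside, outside: $outside\n";
```

Had nested_indent() assigned to $indent without local(), the change would have leaked out and $outside would also be 4.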

our()

our VARIABLE
our (LIST OF VARIABLES)

The our() builtin allows you to declare package variables in the current package without needing to use the full package name. The following declares the package variable $Foo::manchu.

package Foo;
our $manchu = 'Computer Criminal';

You could do the following, but note how we’ve accidentally misspelled the package name:

package Foo;
$Fu::manchu = 'Computer Criminal';

Many developers use the our keyword to declare package variables at the top of a package. This is a bad habit. The use of our should be discouraged unless you absolutely need to share a variable value outside of your package. Even then, it’s better done through a subroutine to preserve encapsulation and help avoid typos. We will cover this more in Chapter 11, Packages and Modules when we describe packages and modules in more detail.

state()

state VARIABLE

Beginning with Perl version 5.10.0, you can declare state variables. These are like variables declared with my(), but they are only initialized once and retain their value between calls. For example, writing a subroutine (covered in Chapter 7, Subroutines) that tracks how many times it’s been called is easy:

sub counter {
   state $counter = 1;
   print "This sub was called $counter times\n";
   $counter++;
}
for (1..10) { counter() }

Prior to version 5.10.0, you would have had to write that subroutine like this:

{
    my $counter = 1;
    sub counter {
        print "This sub was called $counter times\n";
        $counter++;
    }
}
for (1..10) { counter() }

That’s pretty ugly and can obscure the intent of what’s going on. The state() builtin makes this clearer.

For reasons of backwards compatibility, you cannot use the state() builtin unless you ask for it:

use feature 'state';

Or you specify a minimum version of Perl:

use 5.10.0;

The latter syntax asserts that your code can use all features available in that version of Perl.

State variables are generally used in subroutines, so we won’t cover them more for the time being.

Exercises

Q:

1. Which of the following variables will evaluate to true?

my $first  = undef;
my $second = ' ';     # a single space
my $third  = 0.0;
my $fourth = '0.0';
my $fifth  = 0;
my $sixth  = 'false';

Q:

2. Given the following array of Fahrenheit values, create a new array, @celsius, containing the Fahrenheit temperatures converted to Celsius. Remember that to convert Fahrenheit to Celsius, you must first subtract 32 and then multiply the number by 5/9.

my @fahrenheit = ( 0, 32, 65, 80, 212 );
my @celsius    = ...

Q:

3. Given an array called @ids, create a new array called @upper containing only the values in @ids which were all upper case to begin with.

my @ids   = qw(AAA bbb Ccc ddD EEE);
my @upper = ...

When you are finished, @upper should only have the values AAA and EEE.

Q:

4. What values do $answer1, $answer2 and $answer3 contain after all of these statements have been executed?

my $answer1 = 3 + 5 * 5;
my $answer2 = 9 - 2 - 1;
my $answer3 = 10 - $answer2++;

What You Learned in This Chapter

Topic                     Key Concepts
String/Numeric builtins   Core data manipulation.
Bitwise operators         Manipulating binary data.
Boolean operators         How “truth” works in Perl.
Assignment operators      How to assign data to variables.
Precedence                The order in which builtins are evaluated.
Associativity             The direction in which identical operators are evaluated.
Array and list functions  Manipulating arrays and lists.
Hash functions            Manipulating hashes.

Answers

Chapter 4 Exercise Answers

Following are the answers to the exercises in Chapter 4, Working With data.

Answer to Question 1

  1. Which of the following variables will evaluate to true?

    my $first  = undef;
    my $second = ' ';     # a single space
    my $third  = 0.0;
    my $fourth = '0.0';
    my $fifth  = 0;
    my $sixth  = 'false';

    Answer: $second, $fourth and $sixth will all evaluate to true. The $fourth variable is a bit of a trick. Even though it looks like 0.0, it’s a string, and the only strings that evaluate as false are the empty string and '0'. To make it evaluate as false, add zero to it:

    0+$fourth;

    That will force Perl to consider it a number.

Answer to Question 2

  2. Given the following array of Fahrenheit values, create a new array, @celsius, containing the Fahrenheit temperatures converted to Celsius. Remember that to convert Fahrenheit to Celsius, you must first subtract 32 and then multiply the number by 5/9.

    my @fahrenheit = ( 0, 32, 65, 80, 212 );
    my @celsius    = map { ($_-32) * 5/9 } @fahrenheit;

Answer to Question 3

  3. Given an array called @ids, create a new array called @upper containing only the values in @ids which were all upper case to begin with.

    my @ids   = qw(AAA bbb Ccc ddD EEE);
    my @upper = grep { $_ eq uc($_) } @ids;

    When you are finished, @upper should only have the values AAA and EEE.

Answer to Question 4

  4. What values do $answer1, $answer2 and $answer3 contain after all of these statements have been executed?

    my $answer1 = 3 + 5 * 5;
    my $answer2 = 9 - 2 - 1;
    my $answer3 = 10 - $answer2++;

    $answer1 will contain 28 because the multiplication operator has a higher precedence than addition.

    $answer2 will initially contain 6 because subtraction is left associative, but after the autoincrement in the third line, it will contain 7.

    $answer3 will contain 4 because $answer2 will be subtracted from 10 before it is incremented. If the ++ autoincrement operator was before the $answer2 (10 - ++$answer2), it would have contained 3.

    If the autoincrement operator confused you, that’s OK. That’s why we often recommend that those lines be rewritten as follows:

    my $answer2 = 9 - 2 - 1;
    my $answer3 = 10 - $answer2;
    $answer2++;

    By having autoincrement and autodecrement operators in their own statements, the code is often easier to understand.

Site last updated on: July 5, 2012 at 11:41:08 AM PDT
Cover for Beginning Perl (Wrox)


Add a comment

View 1 comment

  1. chrisjack1 – Posted June 27, 2012

    Missing word "to": This will limit the returned substring to no more than the specified length

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "Sometimes".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "ho, ho and ho".

Add a comment

View 2 comments

  1. dawpa2000 – Posted June 15, 2012

    In "a floating point number if fine", replace "if" with "is".

  2. Curtis Poe – Posted June 19, 2012

    @dawpa2000: fixed, thanks.

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Put "perldoc perlop" in code tags.

Add a comment

View 1 comment

  1. perrettdl – Posted June 19, 2012

    After:

          print ++($foo = 'Az');      # prints 'Ba'
    

    Add:

          print ++($foo = 'a9');      # prints 'b0'
    

    ...as it is not obvious what happens when rounding up across a w|d boundary using the current examples (you do not, for example, get 'a10').

Add a comment

View 1 comment

  1. Ben Bullock – Posted June 15, 2012

    "in Chapter 8, Regular Expressions, for now" -> "in Chapter 8, Regular Expressions. For now"

Add a comment

View 1 comment

  1. Yary – Posted June 20, 2012

    Capitalization: "...the variable Is incremented..."

Add a comment

View 1 comment

  1. Yary – Posted June 20, 2012

    "they modify the variable in place and you don’t need to use the return value" strikes me as odd since I often do need to use the return value. Maybe say "In fact, because they modify the variable in place, you can place them on a line by themselves." and emphasize the point by adding a comment:

    $i ==8 now

    before "# more code here"

    Edited on June 20, 2012, 5:33 a.m. PDT

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "(see Chapter 12, Object Oriented Perl on Object Oriented Perl)".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "Arithmetic operators: +, -, , / and *".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "addition, subtraction, multiplication and division".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "Now".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "In Perl".

Add a comment

View 2 comments

  1. dawpa2000 – Posted June 15, 2012

    Misaligned?

  2. Curtis Poe – Posted June 19, 2012

    @dawpa: fixed, thanks!

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "The atan2(), cos() and sin() functions".

    Add serial comma in "arcus tangent, cosine and sine of a number".

Add a comment

View 2 comments

  1. dawpa2000 – Posted June 15, 2012

    Extra period after "you".

  2. Curtis Poe – Posted June 19, 2012

    @dawpa2000: fixed, thanks!

Add a comment

View 2 comments

  1. dawpa2000 – Posted June 15, 2012

    In "Boolean operators are use", change "use" to "used".

  2. Curtis Poe – Posted June 19, 2012

    @dawpa2000: fixed, thanks.

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "It returns 0 if the two operands are equal".

Add a comment

View 1 comment

  1. Yary – Posted June 20, 2012
    1. "! Prefix Equal" in the first row should be "! Prefix Not"
    2. How about adding "lower precedence" to the meanings of the "english" ops to help show difference between "&&" vs "and", "||" vs "or", "!" vs "not" at a glance. I missed the description you have a couple paragraphs down at first reading.

    Edited on June 20, 2012, 5:47 a.m. PDT

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "Sometimes".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "corresponding !, && and || operators".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "(that is, if it has a value assigned to it)".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add period after "They’re in the form of ‘operator’ and the equals sign (=)".

    Change

    "they tell Perl to treat the operator like an infix operator with the value you’re assigning to be the left operand, the value on the right to be the right operand and assign the results to the left operand."

    to

    "They tell Perl to treat the operator like an infix operator with the value you’re assigning to be the left operand and the value on the right to be the right operand, and to assign the results to the left operand."

    Edited on June 15, 2012, 9:19 p.m. PDT

Add a comment

View 2 comments

  1. Remanence – Posted June 15, 2012

    The last sentence should be "So 20 - 5 - 2 means 15 - 2, not 20 - 7." This is because - 5 - 2 = -7 not -3, and 30 is a typo.

    Since subtraction is associative; i.e. (20 -5) -2 = 20 + (-5 - 2). Mathematically, the side from which the operations are first evaluate doesn't matter.

  2. Curtis Poe – Posted June 19, 2012

    @Remanance: the 30 is a typo and I've fixed it. Thanks!

    However, subtraction is not associative. Note that your example mixes the addition of negative numbers (associative) with the subtraction of numbers (not associative). See the wikipedia article for more information.

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "many programmers recommend memorizing it".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "It contains the arguments to subroutines".

Add a comment

View 2 comments

  1. dawpa2000 – Posted June 15, 2012

    In "behave list the", change "list" to "like".

  2. Curtis Poe – Posted June 19, 2012

    @dawpa2000: fixed. Thanks!

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "Horace, Ovid, Virgil and Dante".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "sort, grep and map".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "in scalar context".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "contains 9, 8 and 7".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add comma after "we’re going to take upper and lower case versions of a name".

    Add comma after "but in the first argument".

    Change "we use join() the" to "we use join() on the".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "file, block or eval".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "file, block or eval".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "$answer1, $answer2 and $answer3".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Change "Following" to "The following" or "Here".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "$second, $fourth and $sixth".

Add a comment

View 1 comment

  1. dawpa2000 – Posted June 15, 2012

    Add serial comma in "$answer1, $answer2 and $answer3".

Add a comment