11 posts tagged “perl”
Enough advocacy, let's get to the nitty gritty of 5 things I hate about Perl. (After brian d foy.)
-
The difference between list and scalar context is subtle. One bug that bites people too often is this (for some value of "people", including me, far too often):
sub myfunc { return }
my $scalar = myfunc(); # contains undef
my %hash = (
key => 'value',
key => myfunc(),
);Luckily in this case we'll get a "Odd number of elements in hash declaration" warning. Perl's warnings are mostly very useful and surprisingly helpful. However:
-
Some warnings suck. Yes, some of them almost always point out an error (the void context error is useful: I usually find it means I've written: my $x => 1 instead of my $x = 1) but some are more irritating.
When was the last time an 'uninitialized' warning had any effect on your code apart from making you have to sprinkle $x ||= '' throughout your code? Yet I usually restrain myself from adding no warnings 'uninitialized' because maybe one time in a hundred there's a genuine bug I need to know about.
Similarly 'once' warnings sound useful, but in practise are usually because you've referred to a magic global that's used plenty of times in the library you want to use. For example, from List::Util's documentation,
my $sum = reduce { $a + $b } @list;
-
Argument passing. Yes, it's really flexible to be able to unpack @_ in any way you want but I'd like a proper way of doing it. (And yes, there are all kinds of solutions including Devel::Declare, but none are standard yet). Oh, and we don't have full destructuring bind, so yes, you can do ($x,$y) = ($y,$x) and my ($self, %args)=@_, but not my ({foo=>$foo}, [undef,undef,$bar]) = @_.
-
It's a big language, with lots of syntax, several powerful minilanguages, a vast array of standard idioms, a large set of standard libs (including various incompatible ways of doing similar tasks) and a truly staggering number of 3rd party libs, CPAN, with even more ways of doing it. Actually, learning a big language is fine, but the bigness is one of the things making Perl difficult to parse. The other is the sheer amount of flexibility that Perl gives you to shoot yourself in the foot, change the way the language is parsed etc. Only Perl can parse Perl, and usually only after executing part of your code first. Yay for exploits on your language-aware text editor!
-
Functional programming isn't easy or elegant. Yes, we have first class functions, but things like argument unpacking being ugly make it less convenient. Little kludges like the functions passed to map and grep taking $_ rather than the usual argument list @_ just add to the fun ($_ seems like a special-cased points-free hack, and it's far less consistent or common than Haskell's currying)
I also dislike that regex substitution etc. can't be done as a non-destructive function.
Hmmm, there are probably more things, but those are the main ones. In particular I don't hate references (yes, they take a little learning, and I'm aware that's a stumbling block for most learners, but they are quite sensible once you do) or OO (baroque, but quite capable, and see Moose for a cute modern take on it).
In the last post, I mentioned that we might be able to improve the performance of our sort using a "http://en.wikipedia.org/wiki/Schwartzian_transform".
This basically involves precaching the expensive calculation (in this case, length), sorting by the cached value, then taking just the original values.
Let's test with a list of words:
words "If on a winter's night a traveller"
So instead of the original sortBy (flip $ comparing length), we'd have something like:
map fst
. sortBy (flip $ comparing snd)
. map (id &&& length)
Let's read it from the bottom. First of all we create a list of tuples using the rather wonderful &&& operator from Control.Arrow.
[("If",2),("on",2),("a",1),("winter's",8),("night",5),("a",1),("traveller",9)]
Then we sort comparing the new second field of this tuple.
[("traveller",9),("winter's",8),("night",5),("If",2),("on",2),("a",1),("a",1)]
Finally we map again, getting just the first element of the tuple.
["traveller","winter's","night","If","on","a","a"]
And we can easily abstract this with a new function
sortST cmp f = map fst
. sortBy (cmp snd)
. map (id &&& f)
Now we can write:
sortST comparing length listOfWords
sortST (flip.comparing) length listOfWords
This is very similar to the sortBy syntax, except that we've separated out the "comparing" from the "length" clause, in order to compose the two separately for the new transformation.
I've been away for a while from Haskell so I thought I should do some revision and really get my head around Monads. While I plodded through the wonderful "meet the monads" tutorial, I decided that the best way to learn would be to do. By implementing Monads in Perl. I'd highly recommend trying to implement monads in Your Favourite Language, if it supports lambdas. Perl has already been done by Greg Buchholz and rather nicely too, but there's no Monad library on CPAN so I thought it would be worth a try.
First of all, the question of how to model "types" is easily resolved. We bless each monad into the Monad class or a subclass. These can then have methods for bind and return etc.
Now I do like the haskell >> and by a stroke of good fortune, Perl allows us to overload that symbol too.
use overload '>>' => 'Bind';
I use the string 'Bind' rather than the reference \&Bind, so that the subclasses can easily override it.
Some default bind methods in Monad.pm and Monad::Maybe etc., available here and we have some simple examples like this one (in test.pl):
my $result =
(Writer 2) >>
L { my $x = shift; (Writer $x*2, "Doubled. ") >>
L { my $y = shift; (Writer $y+1, "Plus 1. ") >>
L { my $z = shift; (Writer $z*3, "Tripled $z. ")
}}};
Woot! OK, that's not entirely beautiful, but it's been slightly improved by the overloading of >>.
The L lambda generator is also there for readability. It's basically defined as
sub L (&) { shift }
i.e. it's an identity function, but it's an L (like lambda) and to my mind, lined up on the left, it looks pleasingly like "and then".
Nests
This didn't just fall straight out of the text editor into fully working code, of course. A blow-by-blow account of me getting confused wouldn't be especially interesting, but one big "aha" moment is worth pointing out. I realised that I was thinking of monads as being a chain of lambdas, each one passing control to the next, like OO chaining:
But that doesn't work, as of course then the $x, $y, $z of each scope would be separate, whereas in fact, in "later" sections, you can refer to $x too. This implies that the model is more like a nest of lambdas:
This is made fairly clear in the Perl above, with its delimited braces, if you look at where the closing "}" are, and which opening "{" they match up with.
This is an interesting mind shift, and one that I still haven't really fully grasped, as I'll demonstrate a bit later.
Polymorphic functions on monads
In Haskell, you can call "return" in a monadic block to "lift" a value to the appropriate monad. Similarly, you can call "fail", and the function will fail in the right way (returning Nothing in a Maybe, throwing an error in IO). This is a function call, not a method, so how does it know which monad to behave as?
Of course Haskell does this with its strong inferencing typechecker. The compiler "knows" that we are in Maybe, so "fail" will be fail :: Maybe.
Perl on the other hand doesn't have a strong type-inferencing compiler... Right now I'm doing some shonky magic with caller() that works in this very simple test case (and I believe only in this test case). I think I could just simplify things and set a dynamic variable "$Monad::current_monad" on the first occurrence of Bind. Yeah, global variables, yuck. The final alternative that occurs to me would be to run the whole thing in a Reader monad which just passes the name of the monad... but I'm fairly sure that's slightly insane.
So what can it do right now?
The test script shows the current capabilities. As of r246, I have Writer, Maybe, and List implemented (the Monad superclass is effectively Identity).
I think Maybe is very useful - with some wrapper functions that raise Perl functions to monadic ones using a variety of strategies (fail on undef/0/die etc.) it could be a useful addition to the toolbox, simplifying a nested set of if checks.
The List monad already does list comprehensions, albeit with a rather yucky syntax. Which is of course the big problem, 'cos Perl programmers (and this statement may surprise non Perl programmers :-) are often obsessive about syntax.
Making it look pretty
OK, so we already added a bit of sugar with the >> overloading, and the L function for lambda generators, but it's still rather ugly with the mix of Perlish argument unpacking (my $x = shift), scope delimiters (}}}) etc.
Source filters!
The original Perl monad tutorial used a source filter to give a monadic Do notation. It's a fairly nice one as they go, but I don't really want to treat my program as a string if I can help it, so let's look at some other techniques first!
Devel::Declare
Matt Trout has been working on some crazy parsing magic in Devel::Declare. This isn't a source filter, but (I think) hooks into Perl's parser to change the way that subroutine declarations are parsed. It'd designed to give us parameter unpacking, so that we could substitute:
with:L {my $x = shift; .... }
L ($x) { .... }
In the current version this doesn't work (you can define L like that easily, but the overloaded >> evidences a minor parsing bug (you'd have to put the expression between parentheses to get the precedence right, which loses the syntactic advantage we gain).
Still, hopefully will be fixed in a future release.
Generators
"Valued Lessons" has a beautiful post on Monads in Python (with nice syntax!). The parenthesis is not hyperbole: the post describes a monadic do block which looks about as pretty as Haskell's, but which works in a different way. We spell 'bind' (Haskell's <-) as 'yield'. So a control sub calls the 'do' block, gets out monadic values one by one as they are yielded back, and deals with the nitty gritty of binding them to the rest of the generator.
It took quite a while to understand the Python code: in fact I'm not sure I understand it fully, I really don't buy into the "Python is so easy to read" meme, and certainly the "@whatever" syntax, which seems to be 'decorators' that modify the subroutine that follows them, are rather confusing at first. But it's quite impressive, and it took me a while to replicate in Perl.
First hurdle: Perl doesn't have generators. OK, that shouldn't be an issue, I thought, because we have the CPAN. And yes, I found Brock Wilcox's Coro::Generator.
This doesn't quite do what I want though. The yield only works one way, so
doesn't actually bind $x to 3. I asked Brock on IRC, and apparently this behaviour is desired (I'm not quite sure why) so I forked his code to play with it :-) Also, the coroutine restarts immediately it finishes, which is inconvenient. Brock suggested yielding undef at the end, which is fine, I can do that from the control sub. (The Python version deals with finishing by throwing an exception, so perhaps it has the same semantics?)my $x = yield (Monad 3);
After a lot of ugly pain, I finally got this working, and we can now do:
my $result = Do {
my $x = yield (Just 3);
my $y = yield (Nothing);
my $z = yield (Just 5);
warn "x=$x, y=$y, z=$z";
Just 6;
Why the pain? Failing to understand coroutines while trying to use them to implement monads (which I understand only very slightly) was a bad start. I found myself using the Do function to repeatedly take a value from the generator and bind it with the next value (rather than letting the monadic bind deal with those details). And even when I'd realised that the sub that I needed to bind was a lambda that would abstract the details of invoking the coroutine, I still ended up flailing around more or less at random till I finally got it working.
The current code is ugly (declared inline in test.pl rather than modularized) but the result is pleasantly magical and readable.
Props of course to Python for having powerful techniques like yield and decorators in core!
Hold the champagne
Of course the final test example, in the List monad doesn't work. Why? The List monad's bind strategy is to call the function on every element of the list, so the coroutine will get called repeatedly. And every time it's called, the execution pointer will move on.
I wonder whether the Python version has the same problem? I looked again at the Coro modules on CPAN, and noted that they are advertised as being able to implement "(non-clonable) continuations". I think this is the problem: I want to be able to take the point at which the next Bind will be called, and call exactly that same point multiple times (for the List monad). I asked various people including Brock again, and Scott Walters (the authors of Continuity, a continuation-based web application framework in Perl) and got the answer that Perl really doesn't do proper continuations. (As far as I understood it, they're more or less practically impossible, due to the way Perl models its execution context).
So, unless I've misunderstood (and please let me know if I have!) this technique is limited to monads that only call the bound function once (e.g. most of them except List). That's a shame though, as the List comprehension semantics would be lovely to express in a monadic do block.
Meta continuations
The Valued Lesson post does implement continuations monadically... Could we do that and then implement monadic do using these monadic continuations? I think the answer might be "Yes but my brain would explode trying to implement it".
Plan B
I think that the most sensible method may be to take the contents of the monadic do block and use the B:: modules to convert them from what looks like
tomy $x = bind ...;
. Which is pretty much the approach of Greg Buchholz's source filter. But I think a parse tree transformation may be more elegant. (This said, I don't know the Perl source or understand the opcodes, so it may just be slightly crazy).... >> sub { my $x = shift;
Update: Some discussion on reddit, as Vox still doesn't support OpenID
Thanks to Chris and Thom for organising the recent GeekUp Liverpool talks. The evening started well with Martin Owen's very interesting talk on justifying Erlang, and why FP is going to become a major factor in tomorrow's concurrency-oriented world. I demonstrated the wisdom of always saving a copy of your talk in PDF to a FAT formatted USB drive by being able to seamlessly present from Martin's laptop when my own decided that it really wanted to do a fsck right now, thanks very much.
My talk went well, thanks in no small part to a very receptive and welcoming audience (largely non-Perl programmers). I've now posted the slides to http://greenokapi.net/talks/ReadablePerl.pdf under a Creative Commons Attribution-Non-Commercial 2.0 UK: England & Wales license. Comments and questions welcome!
A varied discussion ensured, spanning legacy code, the value (or not) of training, vitamins and handmade chess-sets, Liverpool city of cultcha, and much much more.
Thom has posted the details of the next Liverpool geekup, Tuesday May 27th at 3345 Parr Street.
Last time, I admitted to programming in Perl and got ribbed about it being unreadable. Fueled by the fervour of the righteous (and maybe a pint or two of Cains bitter) I volunteered to do a talk on Readable Perl.
People like to claim Perl is line noise, with its sigils and regular expressions. But a lot of the features that make it possible to write, yes, truly awful, unreadable Perl, also let you write clean, maintainable code too.
- those $%&* sigils!
- there's More Than One Way To Do It
- strings and data structures
- map, grep, first class functions
- metaprogramming and the CPAN
- modern Object Oriented programming with Moose
Martin Owen will also be talking on Erlang, which I'm very much looking forward to!
Hope to see you there :-)
On #moose, the channel for a popular modern dialect of Object Oriented Perl, Debolaz asked:
<Debolaz> Is there a simple way to produce an array of array
like [[1..4],[1..4]] without having to specify the
[1..4] multiple times?
Matt Trout's reply confused me at first
<@mst> map { [ 1..4 ] } (1, 2)
because I thought there was a more straight-forward obvious answer:
([1..4]) x 2
Of course, as it happens, I was wrong... there is a problem with my version, and it's clear if you print the structure out using Data::Dumper
$VAR1 = [
[ 1, 2, 3, 4 ],
$VAR1->[0]
];
The same reference is cloned. This wouldn't be a problem in Haskell of course — being a pure language, there's no way reusing the same memory address could cause a problem. So you can just write:
> replicate 2 [1..4]
Peregrin suggested
(eval { [1..4] }) x2
but this also reused the array. My best solution was
use Storable qw(dclone);
map { dclone $_ } ([1..4]) x 2
which is also quite yucky. Debolaz ended up using a variant of Matt's solution:
map [ 1..4 ], (1,2)
I never use map EXPR, LIST style (probably I read that it could be confusing at an impressionable age...) but that does seem fairly legible.
This morning, Ranguard asked an interesting question on #london.pm:
11:27 <@ranguard> What's the best way of finding the common root of two paths,
e.g. for /a/b/c/d/e, /a/b/c/1/2/3 I want /a/b/c/ returned,
Path::Class::Dir has 'contains', but I'm not sure that's quite right?
In my copious free lunchtime, I thought I'd write a version of it in Haskell. Though Ranguard was very sensibly using Path::Class in Perl, I decided to just work on a list of values (in this case, characters).
First of all, let's zip the list of paths, together with a boolean indicating whether we matched or not:
> zipWith (\a b -> (a==b, a))
We'll want to only take the ones from the beginning for which the tuple contains True in its first element:
> takeWhile fst
And we only want the second element
> map snd
So we just build a pipeline of these by composing the functions with the dot operator:
> commonPrefix = map snd .
> takeWhile fst .:
> zipWith (\a b -> (a==b, a))
Of course I've cheated here. The map and the takeWhile are expecting just one argument, so can be composed fine in this pipeline, but zipWith takes two arguments. I've seen people rewrite their pipelines to accomodate this, but I think it's horrible and derived a function to compose a 1-arg function with a 2-arg one. Saizan suggested the conventional name (.:) and roconnor gave a cuter implementation than mine:
> infixr 8 .:
> (.:) = (.).(.)
(The infixr declaration is because (.) is infixr 9, but I couldn't get it to compile like that so played around with it at random. Yes, very scientific.)
> x = commonPrefix "abcde" "abc123"
I asked lambdabot for a points-free version of the lambda expression I pass to zipWith, but it came up with utter horrors. EvilTerran suggested the very clever looking:
> flip ((&&& id) . (==))
> -- or alternatively:
> ((,) =<<) . (==)
Vincenz came up with another, perhaps more sensible option.
> commonPrefix = map fst . takeWhile (uncurry (==)) .: zip
This just zips all the elements together, and takes only the equal ones.
Meanwhile, back in the Perl world, Ranguard posted a solution, which Ilmari modified to use the lovely List::MoreUtils:
#!/usr/bin/perl -l
use strict;
use warnings;
use File::Spec::Functions qw(catdir splitdir);
use List::MoreUtils qw(all each_arrayref);
my $paths = each_arrayref map { [ splitdir $_ ] } (
'/v/f/d/index.html',
'/v/f/d/i/a/s/s.gif',
'/v/f/a/s/s.gif',
);
my @root;
while (my ($part, @rest) = $paths->()) {
last unless all { $_ eq $part } @rest;
push @root, $part;
}
print catdir(@root);
Unlike the Haskell version, it takes any number of paths.
Which suggests an additional exercise: modify or create a Haskell function to find the longest common root of an arbitrary number of lists. Do let me know if you try this, and post your code (if you can get it past Vox's comments system ;-).
Greg confessed on #london.pm to having written a average function in Perl in FP style... recursively.
I asked him about this, as a perfectly good average function (using List::Util would be:
sub avg { sum(@_) / @_ }
which is perfectly fine FP style too. As far as I could understand it, he was reducing a tuple of sum and count. Though he said he wrote it just-for-fun, it's not entirely unreasonable if you're using a list-based language, where you'd otherwise have to traverse the list twice (in Perl, arrays know how long they are).
Out of curiosity, I tried to check how Haskell defined avg, but it seems that it doesn't.
LoganCapaldo suggested the following definition:
> import Data.List
> avg xs = sum xs / genericLength xs
We use genericLength because the normal length function insists on returning an integer, while genericLength returns a generic Number, which can be divided appropriately. (We'd otherwise have to write sum xs / (fromIntegral $ length xs))
The recursive function would work by summing a tuple, so that
avgs [1,3,5] => (9,3)
Once we have that we could divide the tuple using uncurry.
> avg' = uncurry (/) . avgs
As we're going to fold the number values into a tuple, we need a function like:
> sum_count s (t,c) = (t+s, c+1)
after which our definition is easily written as a fold
> avgs = foldr sum_count (0,0)
Which works just fine. But seeing the tuples, I thought of the comments from Joao and doserj, and thought of (&&&).
> import Control.Arrow
> sum_count s = (+s).fst &&& succ.snd
And byorgey commented that the .fst and .snd pair can be abstracted away using (***). So I tried
> sum_count s = (+s) *** succ
and was really rather surprised that it worked! Admittedly, the definition is now something that scares me and whose intent is far less clear than the first naive definition above:
> avg2 = uncurry (/) . foldr (\s -> (+s) *** succ) (0,0)
For extra bonus obfuscation points, making the lambda expression points-free is left as an exercise to the reader ;-)
Update: mauke pointed out that my Perl avg sub didn't work... added parens to sum (@_). Looks like naughty me didn't bother testing that code...
Some interesting comments on yesterday's Haskell post. I thought I'd write this in Perl to compare. Of course, we don't have an "inits" function in the standard library, but that's easily written:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
my %hash = (
"key1" => 1,
"key2" => 2,
"other" => 3,
);
sub inits {
my $string = shift or return;
return ($string, inits( substr($string, 0, -1)) );
}
sub getPrefixMap {
my %hash = @_;
return map {
my $k = $_; my $v = $hash{$_};
map { ($_ => $v) } inits $k;
} keys %hash;
}
my %map = getPrefixMap( %hash );
print Dumper( \%map );
This is a bit noisier, but perhaps more straight-forward than the final haskell version...
Update: broquaint pointed out that the code listing above was missing the zero in the substr, breaking it. Thanks!
Haskell's prelude has a function words that splits a string by spaces.
Prelude> words "nice cup of tea"
["nice","cup","of","tea"]
Apparently the question comes up quite regularly on irc or haskell-cafe as to why this function is specialised to split only on whitespace. Perl's split, for example, can split on any character, or indeed string or regular expression.
As quicksilver has suggested, the split function is more complicated than you might think:
- character/string/regular expression
- include the split token in the list?
- collapse empty tokens?
and therefore perhaps the reason that there is no general function, is that it would require a very complex type.
I thought about this some more, and if I've got anything at all about functional programming, it's that you can build up progressively more complicated functions from smaller pieces. I wondered if I could do the same with split.
After playing a little with unfoldr, I decided I was better off using a simple recursive solution for now until I understand it better. But the first thing I need is a function to split a string just once.
> splitOne = splitOne' []
> where splitOne' acc p [] = (acc, Nothing, Nothing)
> splitOne' acc p xs@(x:xs') =
> let m = p xs
> in case m of
> Just (s, rest) -> (acc, Just s, Just rest)
> Nothing -> splitOne' (acc++[x]) p xs'
splitOne takes a function which either returns the separator and the rest of the string, or Nothing. We iterate the list of characters, stopping at the first where this function matches. Examples of these functions:
> onCharP p xs@(x:xs') | p x = Just ([x], xs')
> | otherwise = Nothing
> onChar c = onCharP (==c)
> onSpace = onCharP isSpace
> onComma = onChar ','
At which point we can do:
*Main> splitOne onSpace "nice cup of tea"
("nice",Just " ",Just "cup of tea")
*Main> splitOne onSpace "nice"
("nice",Nothing,Nothing)
So we now need to run for the whole length of the string, which is where the actual split function comes in.
> split t p [] = []
> split t p xs = let (tok,sep,rest) = splitOne p xs
> res = t (tok,sep)
> in case rest of
> Nothing -> res
> Just rest' -> res ++ split t p rest'
split takes a transformation function as well as a predicate. This takes the lists of (separator,token) and transforms them as required.
> onlyToken :: (t, t1) -> [t]
> onlyToken (x,_) = [x]
> -- onlyWord ("",_) = []
> onlyWord ([],_) = []
> onlyWord (x, _) = [x]
> tokenAndSep (t, Nothing) = [t]
> tokenAndSep (t, Just s) = [t,s]
This means that you can write the words function, as well as a function to split on commas, with different behaviours.
> words = split onlyWord onSpace
> commas = split tokenAndSep onComma
As quicksilver suggested, split does indeed have a rather complicated type:
*Main> :t split
split :: (([a1], Maybe a2) -> [a])
-> ([a1] -> Maybe (a2, [a1]))
-> [a1]
-> [a]
but the final function is simple enough. I did promise that we'd be able to split on words as well as characters, and this is why splitOne runs the predicate against xs instead of just the head of the list.
> onPrefix :: Eq a => [a] -> [a] -> Maybe ([a], [a])
> onPrefix = onPrefix' []
> where onPrefix' :: Eq a => [a] -> [a] -> [a] -> Maybe ([a], [a])
> onPrefix' acc [] s2 = Just (acc, s2)
> onPrefix' acc _ [] = Nothing
> onPrefix' acc s1 s2
> | (head s1) == (head s2)
> = onPrefix' (acc++[head s1]) (tail s1) (tail s2)
> | otherwise = Nothing
Which gives us:
*Main> split onlyWord (onPrefix "and") "sex and drugs and rock and roll"
["sex "," drugs "," rock "," roll"]
OK, this is still missing two important things from the Perl function:
- split on a regexp. You could use some parser combinator as the splitOne predicate.
- split a certain number of fields and then stop, like so:
DB> x split /,/, "red,green,yellow,blue", 3
0 'red'
1 'green'
2 'yellow,blue'This one, I haven't really thought enough about how to implement, without complicating things.
As usual when I start on Haskell, I'm missing some obvious way to make the API not suck horribly, so comments and criticism very welcome.
Update Nov 15:
Thanks to everyone who responded here and on reddit (it is a constant source of pleasure and amazement that people find these silly posts worth commenting on :-)
Andrew, vincentk, and geezusfreak commented that splitting a string is really a form of parsing, and therefore they would use a proper parser library like Parsec. Good point, though split in Perl is often “just good enough” as a lightweight solution. I'd like to see a solution in Parsec or similar though.
Mamie Camacho suggested to use Text.Regex:
I had a play with this and it's quite simple to do something like:
> splitRegex (mkRegex "\\s*,\\s*") "eggs,ham, whatever"
Haskell's string quoting is less pleasant than Perl's (having to quote the backtick) and this version doesn't seem to have the semantics of keeping the separator in capture brackets. (e.g. if you use the string “(\\s*,\\s*)“)
Other good suggestions included starting from the source of words and deriving a more general solution from that. Thanks again!