Banner

Haskell Study Plan

Posted by alpheccar - Feb 15 2007 at 18:13 CEST

I have recently convinced one of my friends to learn Haskell. But, he finds it very difficult. He is making a big effort and really wants to learn it but the beginning is hard and he may have spent too much time writing Java code.

So, to help him, I have written a short Haskell study plan. It may interest some other people so I decided to take the risk to post it on my blog. It is very informal and subjective but I think it is a good summary of the problems I had to solve myself to learn Haskell. So, when reading the post don't forget it was initially written for a friend and may look a bit too personal at some places. But, I don't have the energy to write another version.

Forget the imperative world

I think the most difficult for someone coming from the imperative world (Java, C etc...) is to look at program development in a totally different way : forget about state, control flow and time. A software is no more a succession of instructions changing a state.

You have to think in a totally different way : a software is a combination of data transformers. You have some very powerful filters that you apply to data. You have combinators used to create complex filters from more simpler ones. A combinator is itself a kind of filter. There is no more any idea of state or time evolution. Once you have described the transformations to be applied to data to get the solution of your problem, the execution of the software must be understood as an algebraic simplification of your transformations. When no more simplification is possible you have your result.

Control flow is needed for what ? Most of the time it is needed to process some data which have a recursive structure (like a tree) or to extract some data from a structured object. For the former case, you use the powerful data transformers fold, map, iter, scan ... For the latter case you use pattern matching. In the unfrequent cases when you need more, you just use a recursive function.

Of course, nothing prevent you from reintroducing the notions of control flow, state and time in Haskell but you do it in a controlled way and only when needed. Most of the time, you should avoid these notions. If you use them too often it means you are still thinking in the imperative way. You must think in an algebraic way : you are really using an algebra of programming.

The data transformers generally respect some rules. For instance :

Haskell Code by HsColour
map f . map g = map (f.g)

In C, that kind of equivalence cannot be asserted because the order of evaluation matters (because of side effects and non local state). So, it is not possible to define an algebra of programming. Something as simple as f(x) + f(x) = 2 f(x) is generally wrong in C. It depends on f. In Haskell f+f is 2f so you can define an algebra of programming.

So, the first step for learning Haskell is to start learning the algebra of programming. Look at the Prelude. Learn the basic functions like map, fold etc.. Learn how to use them.

Types

Types are your friends. In C, types are used as a documentation and as an help for the compiler to generate better code. But they won't protect you from runtime errors because you can easily cast a value into the wrong type.

Types in Haskell are much much more powerful. They are strong so will prevent most of the runtime errors (a division by zero will raise an exception and is not detected at compile time by the types). Haskell types are a partial specification of your functions but giving a lot of information. Types are constraints on what a function is allowed to do or not. (I'll talk about that later in this post).

The types are very flexible and it is the first strongly and statically typed language that I meet which is as pleasant to use as dynamically typed languages like Lisp. In fact, I only see one remaining reason for using dynamically typed languages : reflexivity when it is really needed (which is not so frequent). But I am less and less certain that advantage is compensating the problems with dynamic types : even if you are careful, are a good programmer, build the right abstractions you'll have too many details to check that could be more easily handled by the compiler. And, it is generally translated into much more unit tests and more exceptions at runtime and also more difficulty to prevent "leaky" abstractions.

So, in Haskell you have to learn a bit about how to decode types and understand them. No need to be an expert in type theory or logic. It is easy. For instance,

Haskell Code by HsColour
map :: (a -> b) -> [a] -> [b]

The type of map is saying that map is taking as argument a function of type a -> b and a list of type [a] and is returning a list of type [b]. From the type we can nearly deduce what map is doing to the list.

Another weird thing is that a function of two variables has a type like:

Haskell Code by HsColour
f :: a -> b -> c

and not

Haskell Code by HsColour
f :: (a,b) -> c

Why ?

Because you have to think about algebra : f 1 is not enough to get a result because f is needing two arguments but with one argument you can already start the simplification process and partially evaluate f.

So, f 1 is a new function which is needing one argument. Don't forget : evaluation of a program is simplifying an algebraic expression. Forget state, forget time, forget control flow.

Is it useful ? Of course !

Haskell Code by HsColour
map (+1) l

We add 1 to all element of a list. So, we have partially evaluated the function + using just one argument. We get a new function and we finish the evaluation of this new function on each element of the list.

The list may even be infinite !

Haskell Code by HsColour
map (+1) [1..]

Indeed, Haskell is lazy : it will first simplify the function definition and then it will simplify the arguments which are still needed. So, values will be computed only when needed.

An infinite list is a data generation process that will generate elements only when needed by the data transformer which is consuming the list elements.

In the previous example, we should limit the consumption since map is consuming every element:

Haskell Code by HsColour
take 4 $ map (+1) [1..]

Modules

When a software is beginning to be complex, you have to modularize it and control what is visible from the clients of your modules. Modules are used for that : they allow to "encapsulate" your implementation and control what is visible from the outside. Since there is nothing new here, I won't detail. Most languages have similar features with more or less power.

Class

When we write:

Haskell Code by HsColour
map :: (a -> b) -> [a] -> [b]

we have a function which can work with list of any type. It is the same function with the same definition for any kind of list.

Sometimes, we want to vary the definition according to the type. For instance, addition of integers, fractions and reals are implemented differently even if from a mathematical point of view we can see integers embedded into fractions and fractions into reals. So, we can say that it is always an addition.

First, we define the property of having an addition operation

Haskell Code by HsColour
class Add a where
  add :: a -> a -> a

Then, we say that some type has the property of having an addition and we define it

Haskell Code by HsColour
instance Add Int where
 add x y = ...

It looks like OO but it is different. The instance are not runtime objects. It is not 3 or 4 which are instances of Add. It is Int !

instance means that the type has some property. It is another way to specify what a program can do and constrain it.

For instance, if I later decide to build an average function, I'll need an addition so I'll have to give this information:

Haskell Code by HsColour
average :: (Add a) => [a] -> a

For any type which can be added, I am able to compute an average of a list of elements of that type ... huh ? No ! I need also a division by integers (length of the list). It is another property:

Haskell Code by HsColour
class IntegerDivide a where
 div :: a -> Int -> a

And then:

Haskell Code by HsColour
 average :: (Add a, IntegerDivide a) => [a] -> a

So, class are used to specify properties of the types. The properties are not always unrelated. For instance,

Haskell Code by HsColour
class (Eq a, Show a) => Num a where

Any type which has the property of being a Number has also the property of being testable for equality and has also the property of being printable. The => must be read backward.

Said differently : you need to have the property Eq a and Show a to be able to have the property Num a.

But, not all types which have the Eq and Show properties are numbers.

Here again, we see that types are specifications and are allowing the developer to control how the functions can be used or not.

There is one very important property that a type can have : being a Monad !

Monad

In previous sections I explained that types are useful as a partial specifications and as constraints.

Let's see a very common and useful example : a controlled environment. We want to compute in a controlled environment and be sure that nothing can leak out of that environment or only through specific gateways.

So, we have a type:

Haskell Code by HsColour
data Controlled a

Let's assume we have a string in that controlled environment:

Haskell Code by HsColour
myString :: Controlled String

We'd like to be able to compute the length of the string in the controlled environment. We cannot use the standard length function because its type is:

Haskell Code by HsColour
length :: String -> Int

(The standard length function has a more general type but it does not matter for the current explanation).

We'd like to have a function returning the length in the controlled environment since we want to prevent anything from leaking out of that controlled area. So, the type should be:

Haskell Code by HsColour
controlledLength :: String -> Controlled Int

But then we have a type mismatch. The function is expecting a String but we just have a Controlled String

But we are lucky, we have a magic operator : the bind operator which can extract the value from its controlled environment and transmit it to a function if and only if the function is returning its result in the same controlled environment. The bind combinator is >>= so we can write

Haskell Code by HsColour
myString >>= controlledLength

Now that we have computed the length in a controlled environment we'd like to be able to use that result in other parts of the software which are not under control. We can do it if the controlled environment is providing a function to escape from the controlled area. But, once we have escaped, the environment is lost. We cannot access it any more .

In general, when these escape functions are available they are named runXXX. So, here we may have a runControlled with type:

Haskell Code by HsColour
runControlled :: Controlled a -> a

The controlled environment may require some additional data so the function may be more general. It depends on the controlled environment where the computation is taking place.

That kind of controlled environment is called a Monad !.

So, you want do to IO, just compute in IO. You need state, Just use State s where s is the type of your state. You need indeterminacy and logic programming, just use LogicT or the simpler list monad. You want configuration information just use Reader. You need a log : just use Writer. You want to mix several controlled environments like a State and IO : StateT s IO.

People are liking OO because OO is providing a controlled environment : State is encapsulated and can only be changed through a controlled interface. Here we are generalizing the idea : we are not just encapsulating State, we are encapsulating computation strategies : state, control flow, side effects, containers etc ...

You cannot escape from IO because IO is controlled by the dark side : the operating system. the runIO is in fact main and the OS is the consumer of your IO actions.

Monad are possible thanks to the type system. Without the type system it will be difficult to prevent information from leaking out of your controlled environment so Monads no more make as much sense without a type system.

So, do you still think Monads are hard ? Is the idea of working in a controlled environment difficult ? Monads are cool : you can build the controlled environments that you want and combine them as you want. You can define computations for a given environment and reuse them later so that they become a bit like new keywords for a specific language. For instance, in the IO environment you can define a computation loop of type:

Haskell Code by HsColour
loop :: Int -> (Int -> IO a) -> IO a

Since the output type is IO a, you can reuse that function in any IO computation:

Haskell Code by HsColour
do
 putStrLn "Start of the loop"
 loop 10 $ \i -> do
   putStrLn ("Loop iteration:" ++ (show i))
 putStrLn "end of loop"

do is just a notation when the use of >>= may make the code less readable.

Phantom types

Type can constrain how to use functions and force some properties to be respected.

Haskell Code by HsColour
newtype List a s = List [a]

The previous definition is just defining a new kind of list. But, look at the s parameter on the left. It is not used on the right ! It is just a tag that will be used to force some properties to be respected.

Let's define two new types:

Haskell Code by HsColour
data Safe
data Unsafe

These types are special because there is nothing on the right side of the data definition. So there is no data constructor.

You won't be able to create a value of type Safe or Unsafe. These types are phantom ones. They won't be used at runtime but they will be used at compile time to tag definitions and force some constraints.

Haskell Code by HsColour
createList :: List a Unsafe

A new list is Unsafe because it is empty so we cannot get its head.

Haskell Code by HsColour
sTail :: List a s -> List a Unsafe

The result of the special tail (not the standard one from Prelude) is Unsafe because the resulting list may be empty

Haskell Code by HsColour
sAdd :: a -> List a s -> List a Safe

If you add an element to the list then the resulting list is Safe

Haskell Code by HsColour
sHead :: List a Safe -> a

sHead can be applied only to safe lists. So, the typechecker will prevent us from applying head to an empty list ! We have defined a property that we want to be respected by our lists and we have explained to the typechecker how to check it for us. Of course, it will work only if people use the API we have defined. So we must hide the List constructor. But module are used for that. We need to define a module where we control the API which is exported.

It is possible to do much much more with phantom types. I won't give an example because I would not like you to leave that study plan now. But remember that with types you are really writing partial specifications constraining how your functions are going to behave.

Existential types

We'd like to mix several values of different types in the same list ! So, we have to hide the types. We could do something like:

Haskell Code by HsColour
data H = forall a. H a

The type variable a is not visible on the left because of the forall. The type variable is not free. It is used by the forall. So, a is hidden from the outside. Now, you can write something like

Haskell Code by HsColour
myList = [H (1::Int), H "Hello", H (2.4::Float)]

But, it is not very useful if you cannot do something with these objects.

You cannot have access to the value hidden in H directly because:

Haskell Code by HsColour
f :: H -> a

will not typecheck. The a in the forall for H is distinct from the a defined by function f.

So, to be able to use the values hidden in H, you need to specify they have a common interface. After all, if you decided to mix them in the same list it is probably because you wanted to apply the same kind of transformation to each value.

So, let's define a class:

Haskell Code by HsColour
class MyOperation a where
 isNumber :: a -> Bool

And some instances:

Haskell Code by HsColour
instance MyOperation Int where
 isNumber _ = True
Haskell Code by HsColour
instance MyOperation Float where
 isNumber _ = True
Haskell Code by HsColour
instance MyOperation String where
 isNumber _ = False

and we need one more instance for H

Haskell Code by HsColour
instance MyOperation H where
 isNumber (H a) = isNumber a

but it won't work ! Indeed, we need to tell Haskell that H is not containing any kind of value but only the one with a isNumber function.

So, let's write:

Haskell Code by HsColour
data H = forall a. MyOperation a => H a

Now, you can mix several different kind of values in the same list and process them with a given function. But, even if all values have type H there is no risk of mixing a String with a Int. You have the flexibility and the security.

GADT : Generalized Algebraic Data Type

Imagine you'd like to build a calculator. You may define a new data type like:

Haskell Code by HsColour
data Term = I Int | V [Int] | Plus Term Term

But there is a problem : You could write Plus (I 1) (V [4,2,3]) where the list is a vector. Unfortunately, with the standard abstract data type you do not have enough flexibility to avoid this problem. With GADT you can use typechecking to prevent some values to be built because they have no meaning. So, you can define some new constraints:

Haskell Code by HsColour
data Term a where
  I :: Int -> Term Int
  V :: [Int] -> Term [Int]
  Plus :: Term a -> Term a -> Term a

Now our data constructor are not forced to create a value of type Term. We can create a value of type Term a and this a can be used to enforce some constraints a bit like with phantom types.

In the previous definition we tell Haskell that Plus can only be applied to Term of the same type and we also say that I and V are generating Term of different type.

So, we prevent the developer from building values like Plus (I 1) (V [4,2,3]).

Conclusion

So, to understand Haskell you need to forget time, control and state and think about data flow, data transformer, algebra of programming.

Evaluation of a program is simplification of an algebraic expression and not evolution of a State in time.

You need to learn types and how they are used to document, enforce properties, partially specify a behavior, control the evaluation environments.

You need to understand lazy evaluation which allows to separate data generation from data consumption and is key to be able to prevent optimization from leaking through the APIs.

For instance, you may develop as if a file was totally loaded into memory when it is in fact loaded block per block on a needed basis. But, as a consumer of the file content you do not need to know how the content is generated and made available to the client of the API.

Tags

Comments

Add a comment...

Posted by Artyom Shalkhakov - Nov 18 2008 at11:59 CEST

Thanks for the great post, now I think that I understand phantom types, existential types and GADTs. :)

Posted by alpheccar - Jan 12 2008 at17:55 CEST

You are right. The explanation about monads is not very good. Thank you for the clarification. I am sure it will help other readers.

But, there are so many monad tutorials on the web that I thought I could pass quickly on this part :-)

About the monad presentation

Posted by F Loyer - Jan 12 2008 at16:55 CEST

The article is very interesting, but I found the explanation about monads a bit confusing. The controlledLength is presented too much like a "method" which seems to permit an operation which would not be permited without it ( "we can't use the standard length function") . In fact, by definition, a monad must provide the "return" function which permits the definition :

Haskell Code by HsColour
controlledLength :: String -> Controlled Int
controlledLength s = return (length s)

This definition illustrate that inside the "Controlled" monad, every pure functions are allowed (no need to define a controlledLength) but since they can't access outside monad (IO, State...), they can't leak any value.

It would have been clearer (less abstract) to propose other functions which involve exchanges with a controlled/trusted system/environnement, than a pure function (length), such as ReadControlledDatabase/WriteControlledDatabase. With these functions, we could propose a Monad which permits any operation inside a database, but prevents the results to leak (like an UPDATE SQL command).

Such functions - whose the result is not "return ..." - are in most cases required to make a monad usefull and then I do think they should be present in a pedagic example.

Posted by alpheccar - Aug 01 2007 at20:55 CEST

You're totally right. The part about monad is not very good. I should have detailed more. I am insisting too much on one aspect of monads (tainting) and I am not explaining what I mean by computation strategies.

The problem with:

Haskell Code by HsColour
controlledLength :: Controlled String -> Controlled Int

is that you cannot extract the String from the controlled environment. Only >>= can and >>= is returning a simple String. So in fact, it is (>>= controlledLength) which has the type just above. >>= is allowing the function to work in the protected environment.

Monad

Posted by david48 - Aug 01 2007 at20:37 CEST

This post is excellent. However, I have a remark

In your monad example,

Haskell Code by HsColour
myString :: Controlled String

We'd like to have a function returning the length in the controlled environment since we want to prevent anything from leaking out of that controlled area. So, the type should be:

Haskell Code by HsColour
controlledLength :: String -> Controlled Int

For a beginner the obvious solution that comes to mind is :

Haskell Code by HsColour
controlledLength :: Controlled String -> Controlled Int

So you should explain why it should be String -> Controlled String is needed.

Posted by alpheccar - Jun 14 2007 at19:04 CEST

Thank you so much.

Best Overview

Posted by Lake Vibart - Jun 14 2007 at18:50 CEST

Thanks for the best overview of advanced haskell topics. For the first time i understood what existential types, phantom types and gadt's are all about. Thanks again!

Good remark

Posted by alpheccar - Feb 24 2007 at09:39 CEST

Good remark. I made the correction. When my posts are too long I tend to be sloppy at the end.

Posted by Pete - Feb 24 2007 at00:08 CEST

Small correction: "GADT" stands for "generalized _algebraic_ data type", not "abstract". Haskell doesn't directly support abstract data types, though you can define one by putting it in a module and not exporting its constructors.

A few additional words about the loop

Posted by alpheccar - Feb 22 2007 at20:57 CEST

You used a recursive function but you should have avoided. Indeed, most recursive functions are just processing of hidden recursive data structures. And, the Haskell library is containing lots of higher order functions whose role is to process recursive data structures : they implement "pattern" of recursion.

So, the loop may be written:

Haskell Code by HsColour
loop n stuff = mapM_ stuff [0..n-1]

and then

Haskell Code by HsColour
loop 10000 print

or

Haskell Code by HsColour
loop 10 $ \i -> do
   putStrLn ("Iteration : " ++ (show i))

Someone coming from a strict background may think it is not efficient. But, Haskell is a lazy language so the list in the definition of loop is never fully built. It is a data generator producing values from 0 to n-1 as they are needed by the data consumer.

You can compile the soft with the profiler enabled and check that loop 1000, loop 10000 or loop 1000000 are using approximately the same amount of memory.

So, here we have another example of data processing and not of control flow. The integers produced by the list are flowing through the data transformer mapM_ stuff.

Posted by alpheccar - Feb 22 2007 at19:04 CEST

I am surprised because it should work with Firefox. I'll test. Can you tell me more about the problem you had when trying to post a comment with Firefox ? (send me an email)

Thanks!

Posted by casey - Feb 22 2007 at04:25 CEST

Thanks for the post! I really like the way you clearly explained the Haskell type system in plain language, I feel like I finally get class and instance declarations, monads, and that >>= operator.

I wrote the loop function you described above, just to figure it out (hope this doesn't get mangled):

Haskell Code by HsColour
loop :: Int -> (Int -> IO a) -> IO ()
loop 0 stuff = do 
    return ()
loop i stuff = do
    stuff i
    loop (i-1) stuff

BTW, does anyone else have trouble using Firefox to post comments here? I had to use IE :(

Everywhere

Posted by alpheccar - Feb 17 2007 at21:40 CEST

So far, for everyday tasks I was using Perl, Ruby or shell scripting (I don't like Python but it is subjective). And otherwise, I was coding in Ocaml, C, ObjectiveC etc... But, since I have discovered Haskell I am using Haskell for nearly all the tasks : scripting, web development (server side), tools etc ... It is only for embedded system development that I keep on using C,C++ and assembly language.

I don't think there is a standard Haskell library containing the usual shell functions like rm, cp etc... So, if you want to do "cp f1 f2" you'll have to implement cp which is in fact probably easier to do in Haskell than in most languages. Here is an interesting page where several unix tools (similar to grep, rm ...) are implemented with one line Haskell programs.

So, for shell scripting there will be some additional work until you have defined your own library. For cgi, there is a cgi and fastcgi library. For HTML, you have several librairies available.

Haskell is not the perfect solution. There is none. But, it is the best I found so far. The simple things will continue to be simple to do in Haskell and the complex things will be easier.

You just have to look at program development in a totally different way which can be difficult. But, once it is done you'll find Haskell development to be very very easy and pleasant. I think that if you can pass the first two items of my post (forgetting the imperative way of thinking and understanding types) then the most difficult is done and after that learning Haskell should get easier and easier.

where do you use haskell?

Posted by Roman G - Feb 17 2007 at21:17 CEST

I also want to learn haskell and i'm reading a tutorial currently, thanks a lot for this explanation. My question is: where do you use haskell? I.e. shell scripts, cgi, etc. And how easy is it, i.e. is it easy to do 'cp f1 f2', etc? Thanks

Posted by alpheccar - Feb 17 2007 at12:15 CEST

funes, here is a clue on how to improve the phantom type example.

Instead of:

Haskell Code by HsColour
data Safe
data Unsafe

use

Haskell Code by HsColour
data Zero
data Succ a

and change the type of the functions.

sTail will require the use of class since you'll have to define it on List a Zero and List a (Succ s)

Posted by alpheccar - Feb 17 2007 at09:49 CEST

It is just an example. My constraint is not powerful enough. I am not counting the number of elements in the list. So, indeed, after a tail you have no way to know if the result is secure or not because the tail has no info about its argument. So, by default it is assumed to be insecure.

Posted by funes - Feb 17 2007 at00:30 CEST

The Phantom type example seems to be broken.

Haskell Code by HsColour
sHead (sTail (sAdd 2 (sAdd 1 createList)))

won't compile, since sTail constructs a List Int Unsafe.

Posted by alpheccar - Feb 16 2007 at21:05 CEST

The existential and GADT sections are less clear. I was getting tired :-)

I am happy to see that this post is useful. Initially, I did not want to post it because I imagined it would not interest anyone.

But my friend convinced me to post it. A good idea !

Excellent!

Posted by Justin - Feb 16 2007 at20:53 CEST

Your exposition of phantom types and Monads is really excellent. The phantom types section is especially clear and shows a very useful application of them.

The GADTs is pretty cool too but the example isn't quite as exciting as the Safe list example. Still, very readable and understandable.

Thanks!

Posted by alpheccar - Feb 15 2007 at20:52 CEST

I am happy if this post can help a bit. Personally it is the first point that caused me most trouble when I started coding with Haskell : forgeting imperative programming and thinking in a different way. After that, I found it was getting easier and easier to code in Haskell (and fun too).

Was looking for something like this

Posted by stefanha - Feb 15 2007 at20:45 CEST

Thanks for this post. I am starting to face problems like managing state as my Haskell programs become slightly more complex. Your post shows the way towards concepts needed to write anything bigger than a toy program.

Posted by alpheccar - Feb 15 2007 at20:11 CEST

But my friend is as crazy as me and GADT are so cool :-)

Posted by Neil Mitchell - Feb 15 2007 at20:00 CEST

I'm not sure I'd have put Phantom Types, GADT's or existentials into a study plan - I've been doing Haskell 4 years and have successfully avoided them.

I'd also plug Hoogle as another benefit for the type system, but thats just me :)

Typos ...

Posted by alpheccar - Feb 15 2007 at18:14 CEST

I apologize for the typos. There are several ones. I will correct them later.