F# and OCaml – differences in detail

F# is similar to the Caml programming language family, and was partially inspired by OCaml. However, the F# compiler is a from-scratch re-implementation of a Caml-style language, and some relatively minor differences exist between the implemented languages, much as relatively minor differences exist between "Caml Light" and "OCaml".

One of the explicit aims of F# is to make it feasible to cross-compile F# code and other ML code such as OCaml code, at least for the core computational components of an application. Cross-compilation between OCaml and F# has been applied to very substantial code bases, including the F# compiler itself, the Abstract IL library and the Static Driver Verifier product shipped by Microsoft. However, F# and OCaml are different languages, so programmers need to be aware of the differences in order to develop code that can be compiled under both systems.

The OCaml language is documented at the OCaml website. This section documents the places where F# differs from the OCaml language, i.e., where OCaml code will either not compile or will execute with a different semantics when compiled with F#. This list was compiled with reference to OCaml 3.06, so later updates to OCaml may not be mentioned. In addition, the list does not mention differences in the libraries provided.

If any further differences are detected please contact the F# team and we will attempt to either fix them or document them.

In summary:

  • Functors, OCaml-style objects, labels and optional arguments are not supported. Some common functors like Set.Make and Hashtbl.Make are simulated by returning records of functions.

  • Some identifiers are now keywords.

  • Top-level initialisation effects occur when a .dll is loaded.

  • Some minor parsing differences.

  • Strings are Unicode and immutable.

  • The "include" declaration is not supported.

  • Two top-level definitions with the same name are not allowed within a module or a module type.

  • Type equations may not be abstracted.

  • Constraining a module by a signature may not make the values in the module less polymorphic.

  • The C stub mechanisms to call C from OCaml are not supported. (F# has good alternatives that involve no coding in C at all.)

  • Parsers generated with fsyacc have local parser state, while ocamlyacc has a single global parser state.

  • Module abbreviations are just local aliases.

  • Some additional restrictions are placed on immediately-recursive data.

  • The in_channel and out_channel abstractions do not apply LF/CRLF translation on Windows platforms.

Functors, OCaml-style objects, labels and optional arguments are not supported. The design of F# aims for a degree of compatibility with the core of the OCaml language and omits these features. The Set, Map and Hashtbl modules in the library all support pseudo-functors that accept comparison/hash/equality functions as input and return records of functions.
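
The "records of functions" idea can be sketched in plain core ML. The following is a minimal illustration, not the F# library API: the type set_ops and the function make_set are hypothetical names, and the set is a naive unsorted list chosen only to keep the example short.

```ocaml
(* Hypothetical sketch of a pseudo-functor: an ordinary function that takes
   a comparison function and returns a record of set operations. *)
type 'a set_ops = {
  empty : 'a list;
  add : 'a -> 'a list -> 'a list;
  mem : 'a -> 'a list -> bool;
}

let make_set (cmp : 'a -> 'a -> int) : 'a set_ops =
  (* Membership uses only the supplied comparison. *)
  let rec mem x = function
    | [] -> false
    | y :: rest -> cmp x y = 0 || mem x rest
  in
  (* Adding an element that is already present leaves the set unchanged. *)
  let add x s = if mem x s then s else x :: s in
  { empty = []; add = add; mem = mem }

(* Plays the role that Set.Make plays in functor-based code. *)
let int_set = make_set (fun (a : int) b -> compare a b)
let s = int_set.add 3 (int_set.add 1 int_set.empty)
```

Code written against such a record uses int_set.add and int_set.mem where functor-based code would use IntSet.add and IntSet.mem.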

Some identifiers are now keywords. For example, the identifiers null and inline are keywords in F#. See the informal language specification for full details. Some identifiers are not keywords but are reserved for future use.

Some operator names are used for quotations. All operators beginning with <@ or ending with @> are reserved for use as quotation-processing operators.

Top-level initialisation effects occur when a .dll is loaded or a top-level module is first referenced. In OCaml the top-level value bindings are executed first by module, then by definition order within a module. In F#, the initialisation sequence within a module is still the definition order; however, module initialisation may occur at any point prior to the first use of any item within a top-level module. This need not be when the application starts. An application itself initialises by eagerly executing the bindings in the last module specified on the command line when the application was compiled.
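
One way to avoid depending on when a module's top-level effects run is to move them into an explicit, idempotent initialisation function. The sketch below is only an illustration of that pattern; the names table, initialised, init and lookup are hypothetical.

```ocaml
(* Hypothetical module state; the names are illustrative only. *)
let table : (string, int) Hashtbl.t = Hashtbl.create 16
let initialised = ref false

(* Idempotent: safe to call from every entry point that needs the table. *)
let init () =
  if not !initialised then begin
    Hashtbl.add table "answer" 42;
    initialised := true
  end

(* Callers never observe an unpopulated table, whatever the load order. *)
let lookup (key : string) : int =
  init ();
  Hashtbl.find table key
```

With this arrangement the observable behaviour does not depend on whether the module is initialised at program start or at first use.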

There are some minor parsing differences. In F# the syntax !x.y.z parses as !(x.y.z) rather than (!x).y.z. OCaml supports its parsing, (!x).y.z, partly by making case distinctions in the language grammar (uppercase identifiers are module names; lowercase identifiers are values, fields and types). F# does not lexically distinguish between module names and value names, e.g., uppercase identifiers can be used as value names, so maintaining compatibility with OCaml would still require the parsing of long identifiers to depend on case. Although we would prefer to follow the original OCaml syntax, we have decided to depart from OCaml at this point.
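
Code that must compile under both systems can sidestep the difference by parenthesising the dereference explicitly. A small sketch, with a hypothetical record type point:

```ocaml
type point = { px : int; py : int }

let p = ref { px = 1; py = 2 }

(* Writing !p.px would be read as (!p).px by OCaml but as !(p.px) by F#.
   The explicit parentheses give the same reading in both. *)
let a = (!p).px
```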

OCaml allows constructor application without parentheses, e.g., given type t = A of int option, it accepts A Some i. F# rejects this syntax; write A (Some i) instead.

OCaml gives type annotations on patterns a precedence lower than that of tuple patterns. This means that in OCaml (x : int, y : int) is not a legal pattern, and (x, y : int) will give a type error since it is trying to assert that x,y has type int. In contrast, F# binds type annotations with higher precedence than tuple patterns, so (x : int, y : int) is legal and annotates each component. The OCaml approach is fine until you need to write many type annotations, which is more common in F# code.
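
When a tuple pattern needs per-component annotations and the code should read the same way under either precedence rule, each component can be parenthesised on its own. A minimal sketch (the function add is a hypothetical example):

```ocaml
(* Each annotation is wrapped in its own parentheses, so the pattern does
   not depend on how annotations and commas are prioritised. *)
let add ((x : int), (y : int)) : int = x + y

let total = add (2, 3)
```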

Other minor parsing differences may also be present in any particular release, since the F# parser is a complete from-scratch re-implementation of an ML-like language.

Strings are Unicode and immutable. This has a number of follow-on effects. For example, some of the library signatures differ, e.g., the IO function input accepts a mutable byte[] buffer rather than a string. Chars are "wide characters", giving Unicode support at the expense of breaking the equivalence between characters and bytes. To convert between byte arrays and strings you must call library functions such as the following, defined in the Bytearray module in F#'s mllib.dll:

let ascii_to_string (b:byte[]) = System.Text.Encoding.ASCII.GetString(b)
let string_to_ascii (s:string) = System.Text.Encoding.ASCII.GetBytes(s)
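
Because strings are immutable, code that used to update a string in place has to build a new string instead. The sketch below shows one way to do that with a Buffer; replace_char is a hypothetical helper, and the availability of an OCaml-compatible Buffer module in the F# library is an assumption here, not something stated above.

```ocaml
(* Returns a copy of s with every occurrence of old_c replaced by new_c.
   The input string is never modified. *)
let replace_char (old_c : char) (new_c : char) (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c -> Buffer.add_char buf (if c = old_c then new_c else c))
    s;
  Buffer.contents buf
```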

The "include" declaration is not supported.

Module abbreviations are just abbreviations. It is common for ML programmers to use abbreviations such as module M = Matrix in their module signatures and module implementations. These are just abbreviations: they do not define a new module, nor does other code referring to the given module see names such as M within the module namespace. This choice stems from the fact that the facility is nearly always used for local abbreviations, and many programmers are surprised to find that the abbreviations become part of the published interface of their module, and sometimes even stop using abbreviations as a result. This decision is reviewed from time to time and at some point we may support an --ml-compatibility option that supports the alternative treatment.

Two top-level definitions with the same name are not allowed within a module or a module type.

let x = 1 
let x = 3 

will give a compilation error. Duplicates are allowed in modules constrained by signatures.

Type equations may not be abstracted. Type equations (as opposed to new type definitions) may not be hidden by a signature.

For example, the type abbreviation

type x = int 

constrained by a signature

type x 

will give an error. But the following constructed type (here X is a data constructor)

type x = X of int 

constrained by the same signature will not, and

type x = int 

constrained by a signature

type x = int 

will compile.
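
In practice, a type that must be hidden can be wrapped in a single-constructor type, as in the second case above. The sketch below uses an inline OCaml signature only for compactness; under F# the signature would be supplied separately, and the module name Counter and the functions make and get are hypothetical.

```ocaml
module Counter : sig
  type x                 (* abstract: clients cannot see the representation *)
  val make : int -> x
  val get : x -> int
end = struct
  (* A new constructed type rather than the abbreviation "type x = int",
     so it may be hidden by the signature. *)
  type x = X of int
  let make n = X n
  let get (X n) = n
end

let five = Counter.get (Counter.make 5)
```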

Constraining a module by a signature may not make the values in the module less polymorphic. That is, values in modules may not be more polymorphic than the types given in the corresponding signature. For example, a compile-time error will occur if a module declares

let f x = x 

and the signature declares

val f : int -> int 

The value in the module must be constrained explicitly to be less polymorphic:

let f (x:int) = x 

This can be annoying because extra type annotations are needed, but greatly simplifies compilation. In addition, the code produced turns out to be more efficient.

Some additional restrictions are placed on immediately-recursive data. OCaml supports "recursion through data types using 'let rec'" to create "infinite" (i.e., self-referential) data structures. F# both extends this feature (see the advanced section of the manual) and places some additional restrictions. In particular, you can't use recursive 'let rec' bindings through immutable fields except in the assembly where the type is declared. This means

       let rec x = 1 :: x

is not permitted. This restriction is required to make sure the head/tail fields of lists may be made immutable in the underlying assembly, which is ultimately more important than supporting all variations on this rarely-used feature. However, note that

       type node = { x: int; y: node}
       let rec myInfiniteNode = {x=1;y=myInfiniteNode} 

is still supported since the "let rec" occurs in the same assembly as the type definition, and

       type node = node ref
       let rec myInfiniteNode = { contents = myInfiniteNode } 

is supported since "contents" is a mutable field of the type "ref".
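
Where a self-referential structure is needed and the restrictions above get in the way, one alternative that avoids 'let rec' on data altogether is to tie the knot through a mutable field after construction. A minimal sketch with a hypothetical type cell:

```ocaml
(* A cell whose link can be assigned after the cell has been created. *)
type cell = { value : int; mutable next : cell option }

let c = { value = 1; next = None }

(* Close the cycle by assignment instead of a recursive binding. *)
let () = c.next <- Some c
```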

(When compiling for .NET Versions 1.0 and 1.1 only.) There are two kinds of array types. The first is the truly polymorphic set of F# array types, i.e., 'a array. These are correctly polymorphic in the sense that you may write new polymorphic code that manipulates these values. However, because of the lack of support for generics in the CLR these array types are always compiled to the .NET type object[]. A rich set of polymorphic operations over these array types is provided in the Array module.

.NET array types are also provided, e.g., int[] or string[]. These are not truly polymorphic in the sense that the F# compiler must be able to infer the exact .NET array types manipulated by any code you write. If you want to write new polymorphic operations over these types then you must duplicate your code for each new array type you wish to manipulate. (This also means you can't use these types as building blocks for new F# data structures such as hash tables – use the polymorphic array types above instead. This is what the built-in Hashtable module does.) A rich set of pseudo-polymorphic operations over these array types is provided in the Microsoft.FSharp.Compatibility.CompatArray module. These are pseudo-polymorphic because the code will be duplicated and type-specialized at each callsite.

The C stub mechanisms to call C from OCaml are not supported. Instead, [<DllImport(...)>] attributes can be used to declare stubs directly in F#. Pinning and allocation can also be done directly from F#. The F# Wiki has extended notes on using C from F#.

Parsers generated with fsyacc have local parser state, while ocamlyacc has a single global parser state. Parsers generated by fsyacc.exe provide location information within parser actions. However, that information is not available globally; instead it is accessed via the functions on the following local variable, which is available in all parser actions:

  parseState : 'a Microsoft.FSharp.Primitives.ParserState.Provider
  

However, this is not compatible with the parser specifications used with OCamlYacc and similar tools, which make a single parser state available globally. If you wish to use a global parser state (e.g., so your code will cross-compile with OCaml) then you can use the functions in this file. You will need to either generate the parser with the '--ml-compatibility' option or add the code

  Parsing.set_parse_state parseState;
  

at the start of each action of your grammar. The functions below simply report the results of corresponding calls to the latest object specified by a call to set_parse_state.

Note that there could be unprotected multi-threaded concurrent access for the parser information, so you should not in general use these functions if there may be more than one parser active, and should instead use the functions directly available from the parseState object.

Missing compatibility modules.

OCaml programmers will notice that compatibility wrappers for OCaml libraries are not always available with the distribution. The F# community and Wiki may provide pointers or samples that provide these wrappers. It is also sometimes simpler to access the .NET Framework Class Library directly, e.g., it may be easier to use System.Text.RegularExpressions than to use the OCaml Regexp package, or System.Net.Sockets for the socket portion of the OCaml Unix library and System.Windows.Forms as a basic windowing package.

The in_channel and out_channel abstractions do not apply LF/CRLF translation on Windows platforms.

On Windows, OCaml's open_in and open_out abstractions open text files in a mode where output_string "\n" (i.e. LF or line feed) outputs the two characters "\r\n", and this mapping is reversed on input. This translation is not applied by F#: output_string "\n" will write a single LF character, and output_string "\r\n" must be used explicitly to get the two characters. The same applies to uses of printf, fprintf, eprintf and related functions.
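
If the same bytes must reach the file under both systems, one option is to bypass OCaml's text-mode translation and write the line terminator explicitly. The sketch below assumes that open_out_bin, which in OCaml opens a channel with no translation, is also available in the F# compatibility library; write_lines is a hypothetical helper.

```ocaml
(* Writes each line followed by an explicit CRLF. Opening the channel in
   binary mode stops OCaml on Windows from expanding LF a second time. *)
let write_lines (path : string) (lines : string list) : unit =
  let oc = open_out_bin path in
  List.iter
    (fun l ->
      output_string oc l;
      output_string oc "\r\n")
    lines;
  close_out oc
```

Files that should use bare LF terminators can be produced the same way by writing "\n" instead of "\r\n".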