Miscellaneous announcements v2

First of all, I have a new project to announce (like you don't have enough projects running… Hush, you, they don't need to know that…): an implementation of file(1) that uses as its data source shared-mime-info, a freedesktop.org specification for storing MIME information, commonly used by Linux desktop environments (DEs) to determine the type of a file.
The goals are to avoid duplicating file type information between file(1) and the DE, to conform to the relevant POSIX specification, and to act as a drop-in replacement for Ian Darwin's file.
That project's repo can be found here.
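To give an idea of what using shared-mime-info as a data source involves, here is a minimal Haskell sketch of matching file contents against magic rules. It is heavily simplified and hypothetical: real rules also have masks, integer match types and nested matches, and the names MagicRule, matches and identify are illustrative only. Priorities follow the spirit of the spec's priority attribute (higher wins); falling back to application/octet-stream is an assumption of the sketch.

```haskell
import Data.List (isPrefixOf, sortBy)
import Data.Ord (Down (..), comparing)

-- A simplified magic rule: only string matches are modelled
-- (no masks, no integer match types, no nested matches).
data MagicRule = MagicRule
  { mimeType :: String -- e.g. "application/pdf"
  , priority :: Int    -- higher-priority rules win
  , offsetLo :: Int    -- first offset at which the value may start
  , offsetHi :: Int    -- last offset at which the value may start
  , value    :: String -- bytes to look for, one Char per byte
  }

-- Does the rule's value occur at some offset in [offsetLo, offsetHi]?
matches :: String -> MagicRule -> Bool
matches contents r =
  any (\o -> value r `isPrefixOf` drop o contents) [offsetLo r .. offsetHi r]

-- Pick the highest-priority matching rule, else a generic fallback type.
identify :: [MagicRule] -> String -> String
identify rules contents =
  case sortBy (comparing (Down . priority)) (filter (matches contents) rules) of
    (r : _) -> mimeType r
    []      -> "application/octet-stream"
```

A real implementation would also consult the glob patterns and print results in the format POSIX requires of file(1).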

Second, a long-in-the-making project of mine has at long last launched: I've moved to Australia! I've been in Melbourne since the 13th of January.

Compiler project: Miscellaneous announcements

Compiler project’s information

First of all, I want to thank Shayan at Clean Typecheck for his GSoC project adding type-checking and name-resolution capabilities to Haskell-Source with Extensions. That will greatly facilitate writing the front-end.

Second, I may very well implement versions of Haskell predating Haskell 98. After all, I got my hands on the standards of the preceding versions (see here), so why not write a compiler compliant with the old standards… if that’s feasible, of course.

Lastly, I have chosen the name of the compiler. It shall be named lhc, which stands for, at your choice: Loïc’s Haskell Compiler, Lambda Haskell Compiler, a reference to the Large Hadron Collider, Light Haskell Compiler, Le Haskell Compiler or Last Haskell Compiler. An alternative spelling is λhc.
P.S.: The last two names are courtesy of a good friend, homer. Go bug him there!
P.P.S.: Amongst other more or less sane ideas, you escaped the THC.
P.P.P.S.: Amongst other insane ideas, I’ll try to make the alternative spelling a valid way to invoke the compiler.

Other information

OK, I forgot to celebrate my first Hackage package (that was hs-json-rpc)… I’ll redeem myself by celebrating my first contributions to wider projects of which I am not one of the founders (no, my contributions to Genetic Invasion don’t count for that milestone). Let’s get that show on the road!

I said “contributions”, so here is the first one: I contributed a patch to Evolving Objects because the library didn’t compile with Visual C++ when OpenMP was used. I had this patch in a private version of the library, used to compile Genetic Invasion for Windows, but since that private branch was used exclusively under Windows, I didn’t know that a terrible error in my modifications nuked compilation under Linux… I cleaned up the patch and submitted it once I became aware of that error.

Second, I made a library proposal for GHC (see here). Nothing big, but it was a good introduction to Haskell’s library proposal process.

Compiler project: Design, first phase

First target choice

I chose .Net’s virtual machine as the first target because it is, at least in theory, designed to support various paradigms; it already has support for:

There is also a package on Hackage for writing and manipulating code in .Net’s intermediate language. Ah, and unlike the JVM, for example, .Net supports unsigned integral types.

Type mappings

Haskell’s values will be mapped to Lazy<T> values, with the exception of function values, which will be mapped to Func<T, TResult> values (or the Func variants taking more parameters), and tuples, which will be mapped to Tuple<T> values (or their larger variants). It should be noted that .Net’s Func delegates take at most 16 parameters, so they can’t represent functions of more than 16 variables; the compiler should therefore automatically curry functions of more than 16 variables when they are used as values. Since .Net’s tuples only go up to 8-tuples, the last element of a .Net 8-tuple (its TRest parameter) will be used to store a tuple containing the rest of the elements, that is a one-tuple for 8-tuples, a pair for 9-tuples, a triple for 10-tuples, et cætera.

N.B.: The 16-variable limit for function values will probably remain a limitation of the implementation, but tuples cannot be capped at 8 elements: to comply with Haskell 98, I must support tuples of up to 15 elements.
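The nesting rule for tuples and the currying rule for function values fit in a few lines. This Haskell sketch works on a hypothetical, simplified model of .Net types (NetType, layoutTuple and layoutFunc are illustrative names, not part of any existing library); for functions it shows only one possible scheme, splitting after the first 16 parameters.

```haskell
-- A hypothetical, simplified model of the .Net types the compiler emits.
data NetType
  = Prim String               -- e.g. "int" or "Lazy<BigInteger>"
  | NetTuple [NetType]        -- System.Tuple, with 1 to 8 type arguments
  | NetFunc [NetType] NetType -- System.Func, with up to 16 parameters
  deriving (Eq, Show)

-- At most 7 direct components; the 8th slot (TRest) holds the rest.
layoutTuple :: [NetType] -> NetType
layoutTuple ts
  | length ts <= 7 = NetTuple ts
  | otherwise      = NetTuple (front ++ [layoutTuple rest])
  where
    (front, rest) = splitAt 7 ts

-- At most 16 parameters per Func; the remaining ones are curried into
-- a Func returned by the first one.
layoutFunc :: [NetType] -> NetType -> NetType
layoutFunc params result
  | length params <= 16 = NetFunc params result
  | otherwise           = NetFunc front (layoutFunc rest result)
  where
    (front, rest) = splitAt 16 params
```

With this scheme a Haskell 15-tuple becomes seven elements plus a nested 8-slot tuple, itself ending in a one-tuple.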

Haskell’s numeric types will be mapped to .Net’s numeric types, e.g. Integer will be mapped to BigInteger. Arrays will be mapped to .Net’s arrays, or to a class backed by a .Net array. Character handling will most certainly be a tall order: I would like to have a string type backed by .Net’s native string type, but it is a UTF-16 encoded type that only allows enumerating, counting and indexing 16-bit code units, whereas a Haskell Char is a full Unicode code point. StringInfo is better in this respect, but it works with the .Net concept of a text element, something akin to a grapheme, which is coarser than a code point.
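To illustrate the mismatch: characters outside the Basic Multilingual Plane take two UTF-16 code units (a surrogate pair) but are a single Haskell Char. A minimal sketch of turning code units into Chars, where decodeUtf16 is a hypothetical helper and [Word16] stands in for the units a .Net string exposes:

```haskell
import Data.Bits (shiftL)
import Data.Char (chr)
import Data.Word (Word16)

-- Decode UTF-16 code units into Chars, rejecting unpaired surrogates.
decodeUtf16 :: [Word16] -> Either String String
decodeUtf16 [] = Right []
decodeUtf16 (u : rest)
  | isHigh u = case rest of
      (lo : rest') | isLow lo ->
        -- Combine a high and a low surrogate into one code point.
        let cp = 0x10000 + ((fromIntegral u - 0xD800) `shiftL` 10)
                         + (fromIntegral lo - 0xDC00)
        in (chr cp :) <$> decodeUtf16 rest'
      _ -> Left "unpaired high surrogate"
  | isLow u   = Left "unpaired low surrogate"
  | otherwise = (chr (fromIntegral u) :) <$> decodeUtf16 rest
  where
    isHigh x = x >= 0xD800 && x <= 0xDBFF
    isLow x  = x >= 0xDC00 && x <= 0xDFFF
```

For instance, U+1D11E (musical G clef) is the two units 0xD834 0xDD1E, so a string's length in code units and its length in Chars differ.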

I will have to find a means of exporting type aliases, but I’ll probably implement newtypes as classes extending the base type, adding nothing to it and modifying nothing I don’t need to.

Module mappings

Each Haskell module will be mapped to three things:

  • A namespace, for the scoping role of the module
  • An assembly or a netmodule, for the “hiding” role of the module
  • A set of classes (one for each datatype and most probably one for the functions) for the code

In its scoping role, a module named X.Y will correspond to a namespace named Haskell.X.Y. In its hiding role, it will probably correspond to an assembly.
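A sketch of this mapping, assuming one assembly per module: the namespace follows the Haskell.X.Y rule, while the assembly name, the "Functions" class and the helper names mapModule and visibility are hypothetical placeholders. The hiding role relies on .Net’s internal visibility, which is scoped to an assembly.

```haskell
-- Hypothetical description of what one Haskell module becomes in .Net.
data ModuleMapping = ModuleMapping
  { namespaceName :: String   -- scoping role
  , assemblyName  :: String   -- "hiding" role
  , classNames    :: [String] -- code: one class per datatype, one for functions
  } deriving (Eq, Show)

-- The assembly name and the "Functions" class are placeholders.
mapModule :: String -> [String] -> ModuleMapping
mapModule modName dataTypes = ModuleMapping
  { namespaceName = ns
  , assemblyName  = modName
  , classNames    = map (\c -> ns ++ "." ++ c) (dataTypes ++ ["Functions"])
  }
  where
    ns = "Haskell." ++ modName

-- Exported names are public; the others are internal, i.e. visible
-- only inside the module's own assembly.
visibility :: [String] -> String -> String
visibility exports name
  | name `elem` exports = "public"
  | otherwise           = "internal"
```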

Notes & musings

N.B.: The following paragraphs contain open questions for the implementation. All suggestions are welcome.

I’ll perhaps implement typeclasses with a map linking types to dictionaries, but I will need to decide one thing: do I register only the type without the Lazy<T> wrapping, only the type with the Lazy<T> wrapping, or both? If both, should the entry for the wrapped type just be a generic function that forces the value and calls the version without the Lazy<T> wrapping? Or should I do that only for user-defined types? And if I do that, shouldn’t I offer an option (with a pragma, perhaps) to do the inverse, i.e. treat the version with the Lazy<T> wrapping as the default, with the dictionary for the strict type just wrapping the value and calling the version for the lazy type?
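The two directions in these questions can be made concrete with explicit dictionaries. In this minimal Haskell sketch, Lazy is a stand-in for .Net’s Lazy<T> (without its caching), ShowDict is a hypothetical dictionary for a Show-like class, and lazyFromStrict / strictFromLazy are illustrative names; the runtime map from types to dictionaries is not modelled.

```haskell
-- Stand-in for .Net's Lazy<T>: a suspended computation (caching omitted).
newtype Lazy a = Lazy (() -> a)

force :: Lazy a -> a
force (Lazy f) = f ()

delay :: a -> Lazy a
delay x = Lazy (\() -> x)

-- An explicit dictionary for a Show-like typeclass.
newtype ShowDict a = ShowDict { showD :: a -> String }

-- Default direction: the dictionary for the wrapped type forces the value
-- and calls the version without the Lazy wrapping.
lazyFromStrict :: ShowDict a -> ShowDict (Lazy a)
lazyFromStrict d = ShowDict (showD d . force)

-- Inverse direction (the option a pragma might select): the dictionary for
-- the strict type wraps the value and calls the version for the lazy type.
strictFromLazy :: ShowDict (Lazy a) -> ShowDict a
strictFromLazy d = ShowDict (showD d . delay)

-- A dictionary written for the unwrapped type.
intDict :: ShowDict Int
intDict = ShowDict show
```

Either way, only one of the two dictionaries has to be written by hand; the other is a thin adapter.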

There is another delegate family, Action<T>, for functions that do not return a value. Should I use it for functions returning ()? And should I use Func<TResult> for functions whose only parameter is ()? Of course, these questions only arise when a function is used as a function value, but they are interesting nonetheless.
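One possible answer to both questions, sketched in Haskell over a hypothetical, minimal type representation (HsType, splitFun, netName and delegateFor are illustrative names, and "Unit" is a placeholder for whatever .Net type () would otherwise map to). It ignores the Lazy<T> wrapping and the 16-parameter limit.

```haskell
import Data.List (intercalate)

-- A hypothetical, minimal representation of Haskell types.
data HsType = TUnit | TCon String | TFun HsType HsType

-- Flatten a -> b -> c into ([a, b], c).
splitFun :: HsType -> ([HsType], HsType)
splitFun (TFun a r) = let (args, res) = splitFun r in (a : args, res)
splitFun t          = ([], t)

commaSep :: [String] -> String
commaSep = intercalate ", "

-- .Net name for a Haskell type; "Unit" is a placeholder for ().
netName :: HsType -> String
netName TUnit    = "Unit"
netName (TCon n) = n
netName t        = delegateFor t

-- A () result gives an Action, a lone () parameter is dropped, and
-- anything else is a Func whose last type argument is the result.
delegateFor :: HsType -> String
delegateFor t = case splitFun t of
  ([TUnit], TUnit) -> "Action"
  ([TUnit], res)   -> "Func<" ++ netName res ++ ">"
  (args, TUnit)    -> "Action<" ++ commaSep (map netName args) ++ ">"
  (args, res)      -> "Func<" ++ commaSep (map netName args ++ [netName res]) ++ ">"
```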

Compiler project: The beginning

This is the first post of a series about one of my craziest projects: writing a Haskell compiler from scratch. This post covers the context of the project and a “mission statement”. Let’s get that show on the road!


Amongst my many projects is a Haskell compiler targeting virtual machines or other such environments. One reason for this project is that I like Haskell, and writing a compiler for it seems a good way to progress in the language. Another reason is that I am curious about compilers, and writing one, even for a language as peculiar as Haskell, seems a good way to satisfy that curiosity.

The project is to write a Haskell compiler targeting virtual machines, beginning with .Net’s, though it may later also target the JVM or Parrot. Further down the line, it might even target more exotic platforms such as the WAM, SQL, OpenCL, OpenGL or PostScript. One goal of that choice is portability; another is that this toy compiler not be totally useless.

Objectives, wistful goals & non-objectives

N.B.: In the following, when I talk about a “host”, I mean the target for which we generate code, be it .Net’s VM, the JVM or SQL.

We’ll begin with what the compiler should do when it’s finished:

  • Compile code conforming to both Haskell 98 and Haskell 2010, with the possibility to choose between the two standards.
  • Compile code using common extensions, be they syntax extensions (e.g. monad comprehensions) or library ones (e.g. Concurrent Haskell).
  • Make calls to and from host functions as easy as possible.
  • Use the host’s standard library as much as feasible to implement Haskell’s.
  • Map Haskell’s concepts to the host’s as far as is reasonable (e.g. Haskell’s packages).

The first two points are par for the course for a compiler, but note that points 3 to 5 stem from the objective of generating code that is as “transparent” as possible from the host’s point of view.

We’ll continue with what the compiler might do if I have the time:

  • Have an option to generate code with a non-strict, non-lazy evaluation strategy (e.g. a classic call-by-name or a more exotic call-by-future).
  • Experiment with code generation (e.g. automatic memoization).
  • Facilitate other FFI calls as much as possible.
  • Use the host’s facilities (debugging, profiling…) as much as possible.
  • Be able to self-compile a working version of its Haskell parts.
  • Be able, from a compiler running on any one target, to generate code for all the other targets.

It may take some effort, but it would be awesome to be able to interact seamlessly between lazy and non-lazy code, with both code bases keeping their non-strict semantics and without needing to recompile the called code. The calling code might then need information about the called code’s evaluation strategy.
P.S.: While the last point is reasonable for hosts such as .Net or Parrot, and at a stretch OpenCL or OpenGL, I don’t think it is for hosts such as SQL or PostScript. So I wouldn’t hold my breath for a PostScript-based compiler targeting SQL, for example.
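The difference between the lazy strategy and a non-strict but non-lazy one such as call-by-name boils down to whether a suspended computation is cached once forced. A minimal Haskell sketch, using IO only so that the number of evaluations is observable; ByName, ByNeed and their functions are hypothetical names.

```haskell
import Data.IORef

-- Call-by-name: the suspended computation is re-run every time it is forced.
newtype ByName a = ByName (IO a)

forceName :: ByName a -> IO a
forceName (ByName act) = act

-- Call-by-need (lazy evaluation, what Lazy<T> provides): run at most once,
-- then cached. Not thread-safe, unlike Lazy<T>'s default mode.
newtype ByNeed a = ByNeed (IORef (Either (IO a) a))

delayNeed :: IO a -> IO (ByNeed a)
delayNeed act = ByNeed <$> newIORef (Left act)

forceNeed :: ByNeed a -> IO a
forceNeed (ByNeed ref) = do
  st <- readIORef ref
  case st of
    Right v  -> pure v
    Left act -> do
      v <- act
      writeIORef ref (Right v)
      pure v
```

Forcing a ByName value twice runs its computation twice; forcing a ByNeed value twice runs it once, which is the property lazy callers could rely on.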

We’ll conclude with what the compiler won’t even pretend to do:

  • Be a highly optimizing, efficient compiler: it is a pet project of mine, after all.
  • Have a stable behaviour from one target to another: I wouldn’t bat an eye if .Net’s version were a lazy implementation and the JVM’s a non-strict, non-lazy one.
  • Generate code that is interoperable from one version of the compiler to the next, at least in the beginning.

Yes, that does mean that performance will not be my primary goal: correctness will be difficult enough a goal, I’m afraid.

Implementation decisions for my JSON-RPC client (part 1)

Welcome to the second blog post of this series dedicated to my project of a JSON-RPC client implementation in Haskell (the first post can be found here).

What are the implementation choices?

I had to choose between version 1 and version 2 of the protocol. I chose to implement both, with version 2 as the default. The reasons for this choice were the following:

  • Most other implementations support version 2, so there is little reason not to implement it. However, you might be forced to work with a legacy server, as I was during my tests. Thus, also supporting version 1 is a Good Idea™.
  • Annoyingly enough, version 1 mandates the existence of an error object but doesn’t specify its required fields…
  • Yet another annoyance concerns ids: version 1 specifies that a method call’s id may be of any type but mandates that a notification’s id be null, so a call with a null id cannot be told apart from a notification. This slight imprecision is usually dealt with by using only non-null ids for calls, but still… To be fair, version 2 is also annoying here: a method call’s id is a number, a string or null, but a null id must also be used in responses to syntactically incorrect requests. This “problem” is dealt with, once again, by using only non-null ids.
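The envelope differences between the two versions can be sketched as follows. This is a hypothetical, dependency-free illustration (its own tiny JSON type instead of aeson, escaping of quotes and backslashes only, made-up names JValue, Version, request and render), not the library’s code; params are passed through unchanged, so it says nothing about how the library encodes arguments.

```haskell
import Data.List (intercalate)

-- Minimal JSON values (no floats or booleans, for brevity).
data JValue
  = JNull
  | JNumber Integer
  | JString String
  | JArray [JValue]
  | JObject [(String, JValue)]

data Version = V1 | V2

-- Build a request; Nothing as the id means "this is a notification".
request :: Version -> String -> JValue -> Maybe JValue -> JValue
request V1 method params mid =
  -- Version 1: no version member, and a notification carries a null id.
  JObject [("method", JString method), ("params", params), ("id", maybe JNull id mid)]
request V2 method params mid =
  -- Version 2: "jsonrpc": "2.0", and a notification has no id at all.
  JObject ([("jsonrpc", JString "2.0"), ("method", JString method), ("params", params)]
           ++ maybe [] (\i -> [("id", i)]) mid)

-- Serialise to compact JSON (only quotes and backslashes are escaped).
render :: JValue -> String
render JNull         = "null"
render (JNumber n)   = show n
render (JString s)   = "\"" ++ concatMap esc s ++ "\""
  where
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc c    = [c]
render (JArray vs)   = "[" ++ intercalate "," (map render vs) ++ "]"
render (JObject kvs) =
  "{" ++ intercalate "," [render (JString k) ++ ":" ++ render v | (k, v) <- kvs] ++ "}"
```

Rendering a version 1 notification yields an explicit "id":null, whereas the version 2 one simply has no id member.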

For the time being, I implement neither batch requests (given that the API maps a Haskell function call to a single JSON-RPC exchange), nor positional arguments, nor transports other than HTTP POST requests. I might in the future try to adapt the API to support batch requests and other transports (HTTP GET, TCP streams…).

What is the API?

The API, as well as the code, was heavily inspired by HaXR (which, if you think about it, is only fitting, given that JSON-RPC was inspired by XML-RPC).
Two typeclasses are exposed to the client: JsonRpcCall (representing remote calls) and JsonRpcNotification (representing notifications). Both serve to retrieve and marshal the remote function’s parameters, call it and, in JsonRpcCall’s case, retrieve and unmarshal the result. Both have an instance for functions taking an instance of ToJSON (that is, a type that can be marshalled to JSON; this typeclass comes from aeson) and returning an instance of the typeclass itself. JsonRpcCall also has an instance for (FromJSON a) => IO a (a type that can be unmarshalled from JSON, lifted into IO; FromJSON also comes from aeson), and JsonRpcNotification has an instance for IO (), which makes sense given that a notification doesn’t care about the server’s answer. The IO part of the return type shouldn’t be a surprise: after all, the result comes from the network…
There are also two data types: JsonRpcVersion, representing the protocol version used in a given call, and JsonRpcException, representing errors in JSON-RPC, be they errors in the usage of remote functions, in the (un)marshalling of messages, or even type mismatches between the remote function and its caller.
Finally, there are four functions. remote and notify generate a remote call (resp. notification) using JSON-RPC version 2, HTTP POST as transport and no custom elements in the generated JSON. The type at which they are used should always be given explicitly, since it determines the remote function’s arguments and result: they take two Strings as arguments (the server’s URL and the method’s name) and return a value whose type is an instance of JsonRpcCall (resp. JsonRpcNotification). The other two are detailledRemote and detailledNotify, which take, before the two Strings taken by the basic versions, a JsonRpcVersion and a [Pair], representing respectively the protocol version used for this call and the key-value pairs to add to the JSON object representing the request, in addition to the standard elements.
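The typeclass trick behind this API (a variadic function built from an instance for functions plus an instance for IO, as in HaXR) can be shown in isolation. The following is a self-contained, simplified re-creation of the pattern, not the library’s code: ToJ, FromJ, RemoteCall, Transport, remoteWith and fakeServer are hypothetical names, an in-memory fake server replaces HTTP, and arguments are simply collected into a list.

```haskell
-- Tiny stand-ins for aeson's Value, ToJSON and FromJSON.
data JValue = JNumber Integer | JString String
  deriving (Eq, Show)

class ToJ a where
  toJ :: a -> JValue

class FromJ a where
  fromJ :: JValue -> Maybe a

instance ToJ Integer where
  toJ = JNumber

instance FromJ Integer where
  fromJ (JNumber n) = Just n
  fromJ _           = Nothing

-- Sends a method name and its arguments, returns the result.
-- It replaces the (URL, HTTP POST) pair used by the real library.
type Transport = String -> [JValue] -> IO JValue

-- Counterpart of JsonRpcCall: accumulate arguments until IO is reached.
class RemoteCall r where
  collect :: Transport -> String -> [JValue] -> r

-- One more argument: marshal it and keep collecting.
instance (ToJ a, RemoteCall r) => RemoteCall (a -> r) where
  collect t method acc = \x -> collect t method (acc ++ [toJ x])

-- No more arguments: perform the call and unmarshal the result.
instance FromJ a => RemoteCall (IO a) where
  collect t method acc = do
    res <- t method acc
    case fromJ res of
      Just v  -> pure v
      Nothing -> ioError (userError ("type mismatch in result of " ++ method))

-- Counterpart of remote, taking a transport instead of a URL.
remoteWith :: RemoteCall r => Transport -> String -> r
remoteWith t method = collect t method []

-- An in-memory fake server, so that the sketch runs without a network.
fakeServer :: Transport
fakeServer "add" [JNumber x, JNumber y] = pure (JNumber (x + y))
fakeServer method _ = ioError (userError ("unknown method " ++ method))

-- The explicit signature selects the instances, hence the arity.
add :: Integer -> Integer -> IO Integer
add = remoteWith fakeServer "add"
```

This is also why the type at which remote is used must be explicit: without the signature on add, the compiler could not know how many arguments to collect.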

And after ?

The next blog post in this series will talk about the details of the implementation of this client, and how to use it.
In the meantime, you may find the source code of the client here.

What is JSON-RPC?

Welcome to the first blog post of this series dedicated to my project of a JSON-RPC client implementation in Haskell.

Definition and description

JSON-RPC is a lightweight RPC protocol defining an encoding, JSON, and a transport, HTTP. It also defines notifications, i.e. requests that do not need a response, and the protocol’s second version defines another possible transport, TCP/IP sockets, and a means of batching calls and notifications. However, due to its simplicity, it defines neither authentication nor a means of querying the server about the functions it implements.
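With batching, a client sends several calls and notifications at once; the server may answer the calls in any order and sends nothing for the notifications, so responses are matched to calls by id. A minimal Haskell sketch of that matching over hypothetical Request and Response records rather than real JSON (pairBatch is an illustrative name, not part of any library):

```haskell
-- Hypothetical, simplified view of the members that matter for batching.
data Request = Request
  { reqMethod :: String
  , reqId     :: Maybe Int -- Nothing: a notification, which gets no response
  }

data Response = Response
  { respId     :: Int
  , respResult :: Either String Integer -- error message or result
  }

-- Pair each call of a batch with its response, looked up by id since
-- the order of responses is arbitrary; notifications are skipped.
pairBatch :: [Request] -> [Response] -> [(String, Maybe (Either String Integer))]
pairBatch reqs resps =
  [ (reqMethod r, respResult <$> findResp i) | r <- reqs, Just i <- [reqId r] ]
  where
    findResp i = case filter ((== i) . respId) resps of
      (x : _) -> Just x
      []      -> Nothing
```

A Nothing in the result flags a call the server never answered.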

Why will I implement it?

First of all, I’ll implement it because it is a simple textual RPC protocol whose transport protocol is already implemented in Haskell. Another reason is that there are many implementations of this protocol, which gives me the possibility to test my client against an existing server. It also serves as a test of my abilities as a Haskell developer and as a spec reader. Finally, previous experience has taught me that trying to tackle everything at once in my quest to be a better Haskell programmer is a bad idea: I’ll start with networking, without worrying about protocol details… and leave writing the network encoder and handling low-level details to my next project…

Why Haskell?

Cf. my post on RFC 707.

And after ?

The next blog post in this series will be dedicated to the API my library will have and to the implementation choices I’ll make. After all, I have to choose between two versions of the protocol.

After this project, I’ll probably get back to my project of an RFC 707 implementation.

What is RFC 707?

Welcome to the first blog post of this series dedicated to my project of an RFC 707 implementation in Haskell.

Definition and description

RFC 707, whose formal title is “A High-Level Framework for Network-Based Resource Sharing” and which was published on the 14th of January 1976, describes a primitive Remote Procedure Call system.
It describes how a networked procedure call is to be made, the format each message (call and return) must abide by and the binary encoding of each message. In essence, it is the ancestor of ONC RPC, CORBA, SOAP… However, it doesn’t describe the transport protocol, the ports used, any form of authentication… It has the drawback of being so underspecified that two implementations have little chance of understanding each other, and the advantage of being very lightweight.

Why will I implement it?

First of all, I’ll implement it because it is a simple binary RPC protocol, a stepping stone towards my project of an ONC RPC client implementation. Another reason is that there is, to my knowledge, no implementation of this protocol, which gives me more freedom in handling its unspecified parts, such as floating-point numbers. It also serves as a test of my abilities as a Haskell developer and as a spec reader. Finally, RPC systems are often the basis of distributed file systems, such as NFS or 9P.

Why Haskell?

Because I appreciate this language and want to learn it, and such a library seems like a good idea to me: after all, monads (and I/O in particular) are often cited as THE stumbling block in learning and mastering this language.

And after ?

The next blog post in this series will be dedicated to the API my library will have and to the implementation choices I’ll make. After all, as I said above, this RPC method is woefully underspecified by its RFC.

After this project, I’ll tackle either the ONC RPC client implementation or the AWT curses implementation.