==============================
 Translation of MiniML to IMP
==============================

The translation of our functional language MiniML to the core imperative
language IMP contains three main tasks:

- decomposing MiniML expressions into a sequence of instructions
- implementing MiniML functions as closures
- decomposing pattern matching as cascades of 'if's

This folder contains a code skeleton for performing the translation in
two steps :

1/ closure conversion: translate MiniML into an intermediate language Clj,
   where MiniML functions are turned into global function definitions and
   closures
2/ generation of IMP code, from the intermediate Clj representation
   (includes compilation of pattern matching)

Note: MiniML and IMP are slightly different compared to the previous parts
where they appeared
- MiniML is simplified to accept only patterns with exactly one constructor;
  if you want to deal with nested patterns, you may revert to the MiniML
  definition and parser from first part (conversely, you may also use this
  simplified version in the first part if you struggle with patterns)
- IMP is extended with pointers, both for data and for functions


Contents
========

This folder contains the following elements.

Language definitions:
  - miniml.ml   Source language
  - clj.ml      Intermediate language
  - imp.ml      Target language
  - ops.ml      Shared definition of operators

A working compiler from IMP to MIPS (including pointers)
  - impparser.mly, implexer.mll     parsing
  - imp2mips.ml                     simple translation
  - impc.ml                         main file of the compiler
  - malloc.imp                      naive allocation library

A skeletal compiler from MiniML to CLJ
  - parser.mly, lexer.mll       parsing
  - mini2clj.ml                 first step of the translation [TO BE COMPLETED]
  - clj2imp.ml                  second step of the translation [TO BE COMPLETED]
  - minimlc.ml                  main file of the compiler
  - imppp.ml                    pretty-printer for IMP (writes the output in a file)


Main Task
=========
You have to complete the two files [mini2clj.ml] and [clj2imp.ml] to produce
a compiler from MiniML to IMP, covering as many features of MiniML as you can.

Hints
-----
Separating concerns allow you to test the basic parts of the compiler before
going further. For instance, at the beginning you may assume that all variables
in the source program have different names, and implement name disambiguation
only in a second step.

Here is a suggested order, allowing language features to be tested independently:
1. Constants and arithmetic operations (tests/arith.ml)
2. Variables and local definitions (tests/let.ml)
3. Pairs and projections (tests/pair.ml)
4. Conditional expressions (tests/if.ml)
5. Functions and applications (tests/fun1.ml and tests/fun2.ml)
6. Recursive functions (tests/fact.fun)
7. Constructors and pattern matching (tests/trees.ml)
You are encouraged to add more test files for each of the features.

Your first priority should be to have a subset of the MiniML language that
actually works. Coverage comes only second.


Additional information
======================

Pairs and projections
---------------------
The constructors Fst, Snd and Pair are included in the common module Ops,
but they are not valid IMP operators. The MiniML compiler has to translate
them to valid IMP operations.

For this you need to choose an actual representation of pairs as IMP data
structures. Suggested representation for a pair: a pointer to a block of
memory with two main fields. You may also include a first field serving as a
header.

Then creation of a pair requires:
- allocating a new block (and storing a pointer to the block)
- initializing the header and the two main fields with the values of the
  two components of the pair
- returning the pointer
The pointer can be stored in a new local variable introduced on the fly.

Structures
----------
Similarly to pairs, MiniML data structures can be implemented in IMP as
pointers to memory blocks, with one header (or more if needed) and as
many data fields as necessary. One of the fields has to identify the 
constructor. For this, it is enough to associate a number to each constructor.

Pattern matching
----------------
A pattern matching operation can be translated in IMP as a cascade of ifs,
that test the constructor numbers. Then, when the appropriate branch is found,
the pattern variables have to be initialized with the values of the appropriate
substructures.


Extensions
==========
These are suggestions of elements you can add to your compiler, or other
things you can explore after this project.

Unified project
---------------
The three parts of the project (MiniML typechecker & interpreter, IMP optimized
compiler, MiniML compiler) can be kept separate, or merged as a single projet:
an optimized MiniML compiler. Some of the extensions below make sense only
in the merged project.

Structural equality
-------------------
For now, your test of equality on data structures is probably implemented as
physical equality: test whether the pointers are equal. In OCaml however,
this test should compare the constructors and the values of each field (which
may trigger recursive comparison of substructures!).

You may consider two strategies for implementing structural equality:
- use the typing information to translate each MiniML equality test to a
  dedicated IMP code, that implements precisely the adequate exploration
  (warning: not compatible with polymorphism)
- write in IMP a generic function for testing structural equality of any
  two structures (for this, the runtime must be able to distinguish immediate
  values from pointers)

Exhaustive pattern matching
---------------------------
For any pattern matching operation, check whether the list of cases is
exhaustive. You may also try to detect redundant cases.

Note: this is easy with simple patterns, but far from trivial if you also
implemented nested patterns.

Improved pattern matching
-------------------------
Add to the MiniML compiler the ability to deal with nested patterns, or
other forms allowed in OCaml (or-patterns, naming with as...)

Prettier printer
----------------
Modify the program printer for IMP (file imppp.ml) to improve the text
produced. For instance, try to avoid unneeded parentheses, and pay more
attention to indentation and linebreaks.

Memory allocation
-----------------
Implement memory allocation and garbage collection, with a strategy of your
choice. This memory allocator can be written in IMP, and included in the
file malloc.imp (which is already included in the code compiled by impc).

Note: for garbage collection, the runtime must be able to distinguish
      immediate values from pointers.

Function inlining
-----------------
Good programming habits often imply writing several small functions for 
common elementary tasks. On the other hand, code with many small functions
may be inefficient for (at least) two reasons:
- function calls themselves are costly,
- optimizations are performed mostly at the scale of a function, and
  thus function boundaries prevent optimizations.
Solution: during compilation, inline the calls to (nonrecursive) functions 
          that are "small". This means replacing a function call by a copy
          of the code that should be executed (and making sure that every
          variable points to the right value).
Note: this can be implemented in the MiniML or in the IMP compiler,
      but has a strong requirement: the called function must be known
      statically.

Tail call optimization
----------------------
Optimize tail calls to avoid building a new call frame.

This extension is mainly about the IMP compiler, since FUN function calls
are translated into IMP calls.
