Update for 2020

Alex Hirsch 2020-02-05 13:35:04 +01:00
parent 7eee3ca736
commit 09f2f06750
5 changed files with 175 additions and 319 deletions

@ -1,14 +1,25 @@
# Compiler Construction # Compiler Construction (Draft)
| Date | Deadline | | Date | Topic / Recommended Schedule / Deadlines |
| ---------- | ------------------------------------------ | | ---------- | ----------------------------------------- |
| 2019-03-15 | [Example Input](example_input.md) | | 2020-03-03 | Introduction |
| 2019-04-05 | [Milestone 1](specification.md#milestones) | | 2020-03-10 | Lexer complete |
| 2019-05-03 | [Milestone 2](specification.md#milestones) | | 2020-03-17 | |
| 2019-05-24 | [Milestone 3](specification.md#milestones) | | 2020-03-24 | |
| 2019-06-14 | [Milestone 4](specification.md#milestones) | | 2020-03-31 | Parser complete |
| 2019-06-21 | [Milestone 5](specification.md#milestones) | | 2020-04-07 | *no proseminar* |
| 2019-07-12 | [Final](evaluation_scheme.md) | | 2020-04-14 | *no proseminar* |
| 2020-04-21 | Semantic checks complete |
| 2020-04-28 | |
| 2020-05-05 | AST → TAC conversion complete |
| 2020-05-12 | |
| 2020-05-19 | TAC → ASM (no function calls) complete |
| 2020-05-26 | |
| 2020-06-02 | TAC → ASM (with function calls) complete |
| 2020-06-09 | CFG generation complete |
| 2020-06-16 | Polish |
| 2020-06-23 | Build test submission deadline |
| 2020-07-14 | Final submission deadline (no extensions) |
- [mC Compiler Specification](specification.md) - [mC Compiler Specification](specification.md)
- [Getting Started Code-base](https://git.uibk.ac.at/c7031162/mcc) - [Getting Started Code-base](https://git.uibk.ac.at/c7031162/mcc)
@ -19,39 +30,46 @@
The ultimate goal of this course is to build a working compiler according to the given specification. The ultimate goal of this course is to build a working compiler according to the given specification.
You are not allowed to use code from other people participating in this course or code that has been submitted previously by somebody else. You are not allowed to use code from other people participating in this course or code that has been submitted previously by somebody else.
However, a *getting started* code-base is provided. A *getting started* code-base is provided, but you can also start from scratch.
You will be able to work on your compiler during the lab. During the lab, short QA sessions will be held.
You can work on your compiler in the meantime.
I'll be present for questions all the time, yet a big part of this course is to acquire the necessary knowledge yourself. I'll be present for questions all the time, yet a big part of this course is to acquire the necessary knowledge yourself.
Please note that minor modifications may be made to the specification until 1 week before the final deadline. Please note that minor modifications may be made to the specification until 2 weeks before the final deadline.
Therefore, double check for modifications before submitting — Git provides you the diff anyway. Therefore, double check for modifications before submitting — Git provides you the diff anyway.
Apart from this, there will be one *required* submission near the beginning of the semester.
You have to submit an additional example input, which may be added to the set of example inputs — this way the number of integration tests is extended.
Furthermore, there are five *optional* milestones.
They provide a golden thread and enable you to receive feedback.
You may work together in teams of 1–3 people. You may work together in teams of 1–3 people.
Teams may span across pro-seminar groups. Teams may span across pro-seminar groups.
### Programming Language
Any of the following programming languages can be used:
- modern C (used for the getting started code-base)
- modern C++
- Go
- Rust
- Haskell
Go easy on external dependencies and obscure language extensions — yes, I'm looking at you, Haskell.
Code readability is paramount.
Using overly complex and cryptic concepts may negatively impact the evaluation process — again, looking at you, Haskell and your voodoo magic lenses.
### Evaluation System
I'll be using a virtualised, updated Ubuntu 20.04 LTS (64 bit) to examine your submissions.
From this you can infer the software versions I'll be using.
The submitted code has to compile and run on this system.
## Grading ## Grading
The final grade is computed as the weighted average of the final submission (80%) and the QA sessions (20%). The final grade is computed as the weighted average of the final submission (80%) and the QA sessions (20%).
Both of these parts as well as the majority of QA session grades must be positive to pass this course. Both of these parts as well as the majority of QA session grades must be positive to pass this course.
Other submissions are not graded. Be sure to adhere to the specification; deviating from it (without giving proper reason) will negatively impact your grade.
Be sure to adhere to the specification; deviating from it (without stating a proper reason) will negatively impact your grade.
See [Final Submission Evaluation Scheme](evaluation_scheme.md) for more details. See [Final Submission Evaluation Scheme](evaluation_scheme.md) for more details.
### Evaluation System
I'll be using a virtualised, updated Ubuntu 18.04 LTS (64 bit) to examine your submissions.
From this you can infer the software versions I'll be using.
The submitted code has to compile and run on this system.
### Absence ### Absence
You must not be absent more than three times to pass this course. You must not be absent more than three times to pass this course.

@ -1,17 +1,14 @@
# Final Submission Evaluation Scheme # Final Submission Evaluation Scheme
Each checkbox represents 1 point to score.
The following key is used for calculating the resulting grade: The following key is used for calculating the resulting grade:
- **1:** ≥ 92% - **1:** ≥ 90%
- **2:** (92%, 84%] - **2:** [80%, 90%)
- **3:** (84%, 76%] - **3:** [70%, 80%)
- **4:** (76%, 68%] - **4:** [60%, 70%)
- **5:** < 68% - **5:** < 60%
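The 2020 key amounts to a simple threshold lookup. The following is a minimal sketch, assuming the score has already been converted to a percentage; `grade_for_percentage` is a hypothetical helper and not part of any evaluation tooling.

```c
#include <assert.h>

/* Hypothetical helper: maps a percentage (0..100) to a grade using the
 * 2020 key: 1: >= 90, 2: [80, 90), 3: [70, 80), 4: [60, 70), 5: < 60. */
int grade_for_percentage(double percent)
{
	assert(percent >= 0.0 && percent <= 100.0);
	if (percent >= 90.0) return 1;
	if (percent >= 80.0) return 2;
	if (percent >= 70.0) return 3;
	if (percent >= 60.0) return 4;
	return 5;
}
```

The lower bound of each interval is inclusive, so exactly 90% still yields grade 1.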
It is required that for the *mandelbrot* test input, a respective executable can be built and run successfully. Points will be subtracted for shortcomings discovered during evaluation.
Points *may* be subtracted for shortcomings not explicitly listed in this form.
This includes things like: This includes things like:
- Encountered issues not mentioned or justified in the *Known Issues* section - Encountered issues not mentioned or justified in the *Known Issues* section
@ -24,92 +21,36 @@ This includes things like:
- Inconsistently formatted or unreadable source code - Inconsistently formatted or unreadable source code
- … - …
## Boundary Conditions ## Hard Requirements
- [ ] Correct submission - README is present:
- Subject is correct - Contains list of prerequisites
- Attached file has correct name and structure - Contains build instructions
- Contains *Known Issues* section
- Submitted code builds successfully.
- `mcc` executable operates as demanded by the specification.
- A respective executable can be built and run for the *mandelbrot* test input.
- [ ] README is present ## General (10 Points)
- Contains instructions
- Contains dependencies
- Contains *Known Issues*
- [ ] Code builds successfully This is all about compiling *valid* input programs.
- Warnings are enabled
- No unjustified warnings of any kind
- [ ] All unit tests succeed - Provided test inputs (examples) build and run successfully.
- Additional, secret test inputs build and run successfully.
- [ ] All integration tests succeed ## Front-end (8 Points)
- provided test inputs must be included
- [ ] Additional integration tests (provided by the instructor) succeed This is all about rejecting *invalid* input programs.
- [ ] Architecture consists of shared library + executables - Invalid input yields a meaningful error message including source location (filename, start line, and start column).
- Syntactically invalid input is rejected by the parser.
- Semantic checks demanded by the specification are implemented and run on the obtained AST.
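An error message carrying filename, start line, and start column could be formatted as sketched below. The `file:line:col: error: message` layout, the `source_location` type, and the `<stdin>` display name are assumptions made for illustration; the documents only require that the three pieces of location information are present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical location type; the field names are illustrative only. */
struct source_location {
	const char *filename; /* "-" means the input was read from stdin */
	int start_line;
	int start_col;
};

/* Formats "<file>:<line>:<col>: error: <message>" into buf.
 * Returns the value of snprintf, i.e. the length the full message needs. */
int format_error(char *buf, size_t size, const struct source_location *loc,
                 const char *message)
{
	assert(buf != NULL && loc != NULL && message != NULL);
	/* Assumption: inputs read via "-" are reported as "<stdin>". */
	const char *name = strcmp(loc->filename, "-") == 0 ? "<stdin>" : loc->filename;
	return snprintf(buf, size, "%s:%d:%d: error: %s", name, loc->start_line,
	                loc->start_col, message);
}
```

Writing into a caller-provided buffer keeps the library away from `stderr`; the application decides where the text ends up.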
- [ ] All symbols exported by the library are prefixed with `mcc_` ## Core (2 Points)
## Front-end The IR needs to be decoupled in order to exploit its benefits.
Furthermore, the control flow graph is an essential tool used by optimising compilers.
Errors need to come with a meaningful error message and source location information (filename, start line, and start column). - TAC data structure is present and independent from front- and back-end.
- A dedicated CFG data structure is present.
- Syntactic checks: - A CFG of a given IR function can be obtained and visualised.
- [ ] Syntactically invalid mC programs are rejected with an error
- [ ] AST data structure is present and instantiated by the parser
- [ ] AST can be visualised using `mc_ast_to_dot`
- Semantic checks:
- [ ] Shadowing is supported correctly
- [ ] Error on use of undeclared variable
- [ ] Error on conflicting variable declaration
- [ ] Error on use of unknown function
- [ ] Error on missing `main` function
- [ ] Error on conflicting function names
- includes built-in functions
- [ ] Error on missing return-statement for non-void functions
- [ ] Correct type checking on scalars
- [ ] Correct type checking on arrays
- [ ] Error on invalid call-expressions
- Mismatching argument count
- Mismatching argument types
- Return type is taken into account by the type checker
- [ ] Symbol table data structure is present
- [ ] Symbol table can be visualised using `mc_symbol_table`
- [ ] Type checking can be traced using `mc_type_check_trace`
## Core
- [ ] TAC data structure is present
- [ ] TAC can be visualised using `mc_ir`
- [ ] CFG data structure is present
- [ ] CFG can be visualised using `mc_cfg_to_dot`
## Back-end
- [ ] Assembly code can be obtained using `mc_asm`
- [ ] GCC is invoked to generate the final executable
## Driver
- [ ] `mcc` executable supports the requested command-line flags
- [ ] Multiple input files are supported

@ -1,16 +0,0 @@
# Example Input
Some example inputs for the compiler are already provided.
These examples are to be used as integration tests.
Your initial task is to create another example which may be added to the set.
Try to use as many features of the mC language as possible.
The example may read from `stdin` and write to `stdout` using the built-in functions.
Provide `.stdin.txt` and `.stdout.txt` files for verification purposes.
The getting started code-base provides a stub for the mC compiler.
It converts mC to C and compiles the result using GCC.
See [Submission Guideline](submission.md).

@ -1,13 +1,13 @@
# mC Compiler Specification # mC Compiler Specification
This document describes the mC compiler as well as the mC language itself along with some requirements. This document describes the mC compiler as well as the mC language along with some requirements.
Like a regular compiler, the mC compiler is divided into 3 main parts: front-end, back-end, and a core in-between. Like a regular compiler, the mC compiler is divided into 3 main parts: front-end, back-end, and a core in-between.
The front-end's task is to validate a given input using syntactic and semantic checks. The front-end's task is to validate a given input using syntactic and semantic checks.
The syntactic checking is done by the *parser*, which, on success, generates an abstract syntax tree (AST). The syntactic checking is done by the parser, which, on success, generates an abstract syntax tree (AST).
This tree data structure is mainly used for semantic checking, although transformations can also be applied to it. This tree data structure is mainly used for semantic checking, although transformations can also be applied to it.
Moving on, the AST is translated to the compiler's intermediate representation (IR) and passed to the core.
Invalid inputs cause errors to be reported. Invalid inputs cause errors to be reported.
Moving on, the AST is translated to the compiler's intermediate representation (IR) and passed on to the core.
The core provides infrastructure for running analyses and transformations on the IR. The core provides infrastructure for running analyses and transformations on the IR.
These analyses and transformations are commonly used for optimisation. These analyses and transformations are commonly used for optimisation.
@ -18,30 +18,7 @@ The back-end translates the platform *independent* IR code to platform *dependen
An assembler converts this code to *object code*, which is finally crafted into an executable by the linker. An assembler converts this code to *object code*, which is finally crafted into an executable by the linker.
For these last two steps, GCC is used — referred to as *back-end compiler* in this context. For these last two steps, GCC is used — referred to as *back-end compiler* in this context.
The mC compiler is implemented using modern C (or C++) adhering to the C11 (or C++17) standard. Adapt project layout, build system, and coding guidelines according to the used programming language's conventions.
## Milestones
1. **Parser**
- Inputs are accepted / rejected correctly (syntax only).
- Syntactically invalid inputs result in a meaningful error message containing the corresponding source location.
- An AST is constructed for valid inputs.
- The obtained AST can be printed in the DOT format (see `mc_ast_to_dot`).
2. **Semantic checks**
- The compiler rejects semantically wrong inputs.
- Invalid inputs trigger a meaningful error message including source location information.
- Type checking can be traced (see `mc_type_check_trace`).
- Symbol tables can be viewed (see `mc_symbol_table`).
3. **Control flow graph**
- Valid inputs are converted to IR.
- The IR can be printed (see `mc_ir`).
- The CFG is generated and can be printed in the DOT format (see `mc_cfg_to_dot`).
4. **Back-end**
- Valid inputs are converted to IR and then to assembly code.
- The assembly code can be printed (see `mc_asm`).
- GCC is invoked to create the final executable.
5. **Build Infrastructure**
- Your code builds and tests successfully on my evaluation system.
## mC Language ## mC Language
@ -50,7 +27,7 @@ The semantics of mC are identical to C unless specified otherwise.
### Grammar ### Grammar
The next segment defines the grammar of mC using this notation: The grammar of mC is defined using the following notation:
- `#` starts a single line comment - `#` starts a single line comment
- `,` indicates concatenation - `,` indicates concatenation
@ -152,14 +129,14 @@ call_expr = identifier , "(" , [ arguments ] , ")"
arguments = expression , [ { "," expression } ] arguments = expression , [ { "," expression } ]
# Program # Program (Entry Point)
program = [ { function_def } ] program = [ { function_def } ]
``` ```
### Comments ### Comments
mC supports only *C-style* comments, starting with `/*` and ending with `*/`. mC supports only C-style comments, starting with `/*` and ending with `*/`.
Like in C, they can span across multiple lines. Like in C, they can span across multiple lines.
Comments are discarded by the parser; however, line breaks are taken into account for line numbering. Comments are discarded by the parser; however, line breaks are taken into account for line numbering.
@ -179,14 +156,16 @@ Furthermore, it is assumed that arrays and strings are at most `LONG_MAX` elemen
The operators `!`, `&&`, and `||` can only be used with Booleans. The operators `!`, `&&`, and `||` can only be used with Booleans.
Short-circuit evaluation is *not* supported. Short-circuit evaluation is *not* supported.
An expression used as a condition (for `if` or `while`) is expected to be of type `bool`.
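The lack of short-circuit evaluation can be observed through side effects. The sketch below models it in C, where `&&` does short-circuit: passing both operands through a function call forces both to be evaluated first. The names `operand`, `mc_and`, and `evaluations` are made up for this illustration.

```c
#include <stdbool.h>

/* Counts how often an operand was evaluated. */
int evaluations = 0;

/* Operand with an observable side effect. */
bool operand(bool value)
{
	evaluations++;
	return value;
}

/* Models mC's `&&`: both arguments are evaluated before the call,
 * so no operand is ever skipped. */
bool mc_and(bool lhs, bool rhs)
{
	return lhs && rhs;
}
```

Evaluating `mc_and(operand(false), operand(true))` increments `evaluations` twice, whereas the plain C expression `operand(false) && operand(true)` increments it only once.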
#### Strings #### Strings
Strings are immutable and do not support any operation (e.g. concatenation). Strings are immutable and do not support any operation (e.g. concatenation).
Like comments, strings can span across multiple lines. Like comments, strings can span across multiple lines.
Newlines and indentation whitespaces are part of the string, when dealing with multiline strings. Whitespaces (i.e. newlines, tabs, spaces) are part of the string.
Escape sequences are *not* supported. Escape sequences are *not* supported.
Their sole purpose is to be used with the built-in `print` function (see below). The sole purpose of strings in mC is to be used with the built-in `print` function (see below).
#### Arrays #### Arrays
@ -232,7 +211,7 @@ Modifications made to an array inside a function are visible outside the functio
int main() { int main() {
int[5] arr; int[5] arr;
foo(arr); foo(arr);
print_int(arr[2]); // outputs 42 print_int(arr[2]); /* outputs 42 */
return 0; return 0;
} }
@ -246,7 +225,7 @@ While strings can be re-assigned (in contrast to arrays), this is not visible ou
string s; string s;
s = "bar"; s = "bar";
foo(s); foo(s);
print(s); // outputs bar print(s); /* outputs bar */
return 0; return 0;
} }
@ -254,32 +233,26 @@ While strings can be re-assigned (in contrast to arrays), this is not visible ou
There are *no* type conversions, neither implicit nor explicit. There are *no* type conversions, neither implicit nor explicit.
An expression used as a condition (for `if` or `while`) is expected to be of type `bool`.
*Note:* If the need for explicit type conversion arises, additional built-ins will be added for this purpose.
#### Entry Point #### Entry Point
The top-level grammar rule is `program` which simply consists of 0 or more function definitions. The top-level grammar rule is `program` which consists of 0 or more function definitions.
While the parser happily accepts empty source files, a semantic check enforces the presence of a function named `main`. While the parser happily accepts empty source files, a semantic check enforces the presence of a function named `main`.
This function takes no arguments and returns an `int`. This function takes no arguments and returns an `int`.
On success, an mC program returns `0`. On success, an mC program's `main` function returns `0`.
#### Declaration, Definition, and Initialization #### Declaration, Definition, and Initialization
`declaration` is used to declare variables which can then be initialised with `assignment`. `declaration` is used to declare variables which can then be initialised with `assignment`.
Splitting declaration and initialisation simplifies the creation of symbol tables. Splitting declaration and initialisation simplifies the creation of symbol tables.
Functions are always declared by their definition.
Forward declarations are therefore *not* supported.
Contrary to C, it is possible to call a function before it has been declared (or, in the case of mC, defined). Contrary to C, it is possible to call a function before it has been declared (or, in the case of mC, defined).
Forward declarations are therefore *not* supported.
#### Empty Parameter List #### Empty Parameter List
In C, the parameter list of a function taking no arguments is written as `(void)`. In mC, an empty parameter list is always written as `()`.
mC, in this case, just uses an empty parameter list `()`.
#### Dangling Else #### Dangling Else
@ -308,26 +281,23 @@ The following built-in functions are provided by the compiler for I/O operations
## mC Compiler ## mC Compiler
The mC compiler is implemented as a library. The mC compiler is implemented as a library.
It can be used either programmatically or via the provided command-line applications. It can be used either programmatically or via the provided command-line applications (see below).
The focus lies on a clean and modular implementation as well as a straight forward architecture, rather than raw performance. The focus lies on a clean and modular implementation as well as a straightforward architecture, rather than raw performance.
For example, each semantic check may traverse the AST in isolation. For example, each semantic check may traverse the AST in isolation.
The compiler guarantees the following: The compiler guarantees the following:
- Exported symbols are prefixed with `mcc_`. - All functions are thread-safe.
- It is thread-safe.
- No memory is leaked — even in error cases.
- Functions do not interact directly with `stdin`, `stdout`, or `stderr`. - Functions do not interact directly with `stdin`, `stdout`, or `stderr`.
- No function terminates the application on correct usage. - No function terminates the application on correct usage (or replaces the running process using `exec`).
- No memory is leaked — even in error cases.
*Note for C++*: *Note for C*: Prefix symbols with `mcc_` due to the lack of namespaces.
Do not prefix symbols.
Put everything in an `mcc` namespace instead.
### Logging ### Logging
Logging infrastructure may be present; however, all log output is disabled by default. Logging infrastructure *may* be present; however, all log (and debug) output is disabled by default.
The log level can be set with the environment variable `MCC_LOG_LEVEL`. The log level can be set with the environment variable `MCC_LOG_LEVEL`.
0 = no logging 0 = no logging
@ -352,14 +322,11 @@ This allows for better IDE integration.
Displaying the offending source code along with the error message is helpful, but not required. Displaying the offending source code along with the error message is helpful, but not required.
Parsing may stop on the first error. Parsing may stop on the first error.
Pay attention to operator precedence.
Error recovery is optional. Error recovery is optional.
The parser component may be generated by tools like `flex` and `bison`, or similar.
Although, you are encouraged to implement a recursive descent or combinator parser instead.
Nevertheless, pay attention to operator precedence.
Note that partial mC programs, like an expression or statement, are not valid inputs to the main *parse* function. Note that partial mC programs, like an expression or statement, are not valid inputs to the main *parse* function.
The library may provide additional functions for parsing single expressions or statements. The library *may* provide additional functions for parsing single expressions or statements.
### Abstract Syntax Tree ### Abstract Syntax Tree
@ -369,13 +336,11 @@ Consider using the visitor pattern for tree traversals.
Given this example input: Given this example input:
```c int fib(int n)
int fib(int n) {
{
if (n < 2) return n; if (n < 2) return n;
return fib(n - 1) + fib(n - 2); return fib(n - 1) + fib(n - 2);
} }
```
The visualisation of the AST for the `fib` function could look like this: The visualisation of the AST for the `fib` function could look like this:
@ -392,7 +357,7 @@ As the parser only does syntactic checking, additional semantic checks are imple
- Checking for presence of `main` and correct signature - Checking for presence of `main` and correct signature
- Checking that all execution paths of a non-void function return a value - Checking that all execution paths of a non-void function return a value
- Type checking (remember, neither implicit nor explicit type conversions) - Type checking (remember, neither implicit nor explicit type conversions)
- Includes checking operations on arrays - Includes checking operations on arrays (including array size)
- Includes checking arguments and return types for call expressions - Includes checking arguments and return types for call expressions
In addition to the AST, *symbol tables* are created and used for semantic checking. In addition to the AST, *symbol tables* are created and used for semantic checking.
@ -401,18 +366,23 @@ Be sure to correctly model [*shadowing*](https://en.wikipedia.org/wiki/Variable_
### Intermediate Representation ### Intermediate Representation
As IR, a low-level [three-address code (TAC)](https://en.wikipedia.org/wiki/Three-address_code) is used. As IR, a low-level [three-address code (TAC)](https://en.wikipedia.org/wiki/Three-address_code) is used.
The instruction set of this code is *not* specified. The instruction set of this IR is *not* specified.
The compiler's core is independent from the front- and back-end. The compiler's core is independent from the front- and back-end.
### Control Flow Graph *Hint:* Handle arguments and return values for function calls via an *imaginary* stack using dedicated `push` and `pop` instructions.
Have a look at the calling convention used for assembly code generation.
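One possible reading of this hint is sketched below. Since the instruction set is not specified, the opcodes, the `tac_instr` layout, and `tac_emit_call` are all assumptions; the sketch also assumes a non-void callee whose return value is fetched with a `pop`.

```c
#include <stddef.h>

/* Illustrative opcodes; the actual instruction set is up to the implementer. */
enum tac_op { TAC_PUSH, TAC_CALL, TAC_POP };

struct tac_instr {
	enum tac_op op;
	const char *operand; /* pushed value, callee name, or result temporary */
};

/* Emits a push for every argument (last one first, mirroring cdecl's
 * right-to-left order), the call itself, and a pop fetching the return value.
 * Returns the number of instructions written, or 0 if `out` is too small. */
size_t tac_emit_call(struct tac_instr *out, size_t capacity, const char *callee,
                     const char *const *args, size_t argc, const char *result)
{
	size_t needed = argc + 2;
	if (capacity < needed) return 0;
	size_t n = 0;
	for (size_t i = argc; i > 0; i--)
		out[n++] = (struct tac_instr){TAC_PUSH, args[i - 1]};
	out[n++] = (struct tac_instr){TAC_CALL, callee};
	out[n++] = (struct tac_instr){TAC_POP, result};
	return n;
}
```

For a call such as `t0 = add(a, b)` this yields `push b`, `push a`, `call add`, `pop t0`, which maps fairly directly onto the assembly generated later.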
A control flow graph data structure is present and can be constructed for a given IR program. ### Control Flow Graph (CFG)
This graph is commonly used by analyses for extracting structural information crucial for transformation steps.
It is recommended to also provide a visitor mechanism for this graph. A control flow graph data structure consisting of edges and basic blocks (containing IR instructions) is present.
For each function in a given IR program, a corresponding CFG can be obtained.
Like the AST, it can be visualised. The CFG is commonly used by analyses for extracting structural information crucial for transformation steps.
Providing a visitor mechanism for CFGs is optional, yet recommended.
Like the AST, CFGs can be printed using the DOT format.
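A CFG of basic blocks and edges, together with a DOT printer, might be sketched as follows. The `basic_block` layout and `cfg_print_dot` are hypothetical; for brevity the IR instructions of a block are stored as pre-rendered text, and labels are assumed not to contain double quotes.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_SUCCESSORS 2 /* fall-through and branch target */

struct basic_block {
	size_t id;
	const char *label; /* IR instructions of the block, rendered as text */
	size_t successors[MAX_SUCCESSORS]; /* ids of the successor blocks */
	size_t successor_count;
};

/* Prints the CFG in DOT format to `out` and returns the number of edges.
 * The stream is a parameter, so the library never touches stdout itself. */
size_t cfg_print_dot(FILE *out, const struct basic_block *blocks, size_t count)
{
	size_t edges = 0;
	fprintf(out, "digraph cfg {\n\tnode [shape=box];\n");
	for (size_t i = 0; i < count; i++) {
		fprintf(out, "\tB%zu [label=\"%s\"];\n", blocks[i].id, blocks[i].label);
		for (size_t s = 0; s < blocks[i].successor_count; s++) {
			fprintf(out, "\tB%zu -> B%zu;\n", blocks[i].id, blocks[i].successors[s]);
			edges++;
		}
	}
	fprintf(out, "}\n");
	return edges;
}
```

The output can be piped into `dot`, in line with the requirement that the tools compose with other command-line programs.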
The example below is taken from [Marc Moreno Maza](http://www.csd.uwo.ca/~moreno/CS447/Lectures/CodeOptimization.html/node6.html). The example below is taken from [Marc Moreno Maza](http://www.csd.uwo.ca/~moreno/CS447/Lectures/CodeOptimization.html/node6.html).
Given this example IR: Given this example IR:
@ -447,61 +417,61 @@ Pay special attention to floating point and integer handling.
Use [cdecl calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#cdecl). Use [cdecl calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#cdecl).
It is paramount to correctly implement the calling convention, otherwise the stack may get corrupted during function calls and returns. It is paramount to correctly implement the calling convention, otherwise the stack may get corrupted during function calls and returns.
Note that *all* function calls (including built-ins) use the same calling convention — do not needlessly introduce special cases.
*Hint:* There is a `.float` assembler directive.
*Hint:* If you are not familiar with x86 assembly, pass small C snippets to GCC and look at the generated assembly code (using `-S`).
Optimisations, mitigations, and other unnecessary features (e.g. dwarf symbols, unwind tables) should be disabled.
There are also flags like `-fverbose-asm` which add additional annotations to the output.
## Applications ## Applications
Apart from the main compiler executable `mcc`, additional auxiliary executables are provided. Apart from the main compiler executable `mcc`, additional auxiliary executables are provided.
These executables aid the development process and are used for evaluation. These executables aid the development process and are used for evaluation.
Do not omit details in the output (e.g. do not simplify the AST).
The applications are commonly defined by their usage information. The applications are specified by their usage information.
Composing them with other command-line tools, like `dot`, is a core feature. Composing them with other command-line tools, like `dot`, is a core feature.
The exact output format is not specified in all cases.
However, details should *not* be omitted — like simplifying the AST.
All applications exit with code `EXIT_SUCCESS` *iff* they succeeded in their operation. All applications exit with code `EXIT_SUCCESS` *iff* they succeeded in their operation.
Each executable accepts multiple input files.
The inputs are parsed in isolation; the resulting ASTs are merged before semantic checks are run.
Errors are written to `stderr`. Errors are written to `stderr`.
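The exit-code and error-stream rules, together with the `-` convention from the usage texts, can be sketched as follows. `open_input` and `run_tool` are hypothetical names, and the actual work of the tool is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Opens an input; "-" selects stdin as described in the usage texts. */
FILE *open_input(const char *path)
{
	return strcmp(path, "-") == 0 ? stdin : fopen(path, "r");
}

/* Hypothetical tool body: returns EXIT_SUCCESS iff the operation succeeded.
 * Errors go to the stream passed as `err` (stderr in a real application). */
int run_tool(const char *path, FILE *err)
{
	FILE *in = open_input(path);
	if (in == NULL) {
		fprintf(err, "%s: error: cannot open input file\n", path);
		return EXIT_FAILURE;
	}
	/* ... parse, check, and print here ... */
	if (in != stdin) fclose(in);
	return EXIT_SUCCESS;
}
```

After argument handling, `main` would then reduce to something like `return run_tool(argv[1], stderr);`.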
### `mcc` ### `mcc`
This is the main compiler executable, sometimes referred to as *driver*. This is the main compiler executable, sometimes referred to as *driver*.
usage: mcc [OPTIONS] file... usage: mcc [OPTIONS] <file>
The mC compiler. It takes mC input files and produces an executable. The mC compiler. It takes an mC input file and produces an executable.
Errors are reported on invalid inputs.
Use '-' as input file to read from stdin. Use '-' as input file to read from stdin.
OPTIONS: OPTIONS:
-h, --help displays this help message -h, --help display this help message
-v, --version displays the version number
-q, --quiet suppress error output -q, --quiet suppress error output
-o, --output <file> write the output to <file> (defaults to 'a.out') -o, --output <out-file> write the output to <out-file> (defaults to 'a.out')
Environment Variables: Environment Variables:
MCC_BACKEND override the back-end compiler (defaults to 'gcc' in PATH) MCC_BACKEND override the back-end compiler (defaults to 'gcc')
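Looking up the back-end compiler might be as small as the sketch below. `backend_compiler` is a hypothetical helper, and treating an empty `MCC_BACKEND` like an unset one is an assumption.

```c
#include <stdlib.h>

/* Returns the back-end compiler to invoke: the value of MCC_BACKEND if it
 * is set (and, by assumption, non-empty), otherwise "gcc". */
const char *backend_compiler(void)
{
	const char *backend = getenv("MCC_BACKEND");
	return (backend != NULL && backend[0] != '\0') ? backend : "gcc";
}
```

Returning just the name leaves the `PATH` lookup to whatever mechanism spawns the process.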
### `mc_ast_to_dot` ### `mc_ast_to_dot`
usage: mc_ast_to_dot [OPTIONS] file... usage: mc_ast_to_dot [OPTIONS] <file>
Utility for printing an abstract syntax tree in the DOT format. The output Utility for printing an abstract syntax tree in the DOT format. The output
can be visualised using graphviz. Errors are reported on invalid inputs. can be visualised using Graphviz. Errors are reported on invalid inputs.
Use '-' as input file to read from stdin. Use '-' as input file to read from stdin.
OPTIONS: OPTIONS:
-h, --help displays this help message -h, --help display this help message
-o, --output <file> write the output to <file> (defaults to stdout) -o, --output <out-file> write the output to <out-file> (defaults to stdout)
-f, --function <name> limit scope to the given function
### `mc_symbol_table` ### `mc_symbol_table`
usage: mc_symbol_table [OPTIONS] file... usage: mc_symbol_table [OPTIONS] <file>
Utility for displaying the generated symbol tables. Errors are reported on Utility for displaying the generated symbol tables. Errors are reported on
invalid inputs. invalid inputs.
@ -509,27 +479,12 @@ This is the main compiler executable, sometimes referred to as *driver*.
Use '-' as input file to read from stdin. Use '-' as input file to read from stdin.
OPTIONS: OPTIONS:
-h, --help displays this help message -h, --help display this help message
-o, --output <file> write the output to <file> (defaults to stdout) -o, --output <out-file> write the output to <out-file> (defaults to stdout)
-f, --function <name> limit scope to the given function
### `mc_type_check_trace`
usage: mc_type_check_trace [OPTIONS] file...
Utility for tracing the type checking process. Errors are reported on
invalid inputs.
Use '-' as input file to read from stdin.
OPTIONS:
-h, --help displays this help message
-o, --output <file> write the output to <file> (defaults to stdout)
-f, --function <name> limit scope to the given function
### `mc_ir` ### `mc_ir`
usage: mc_ir [OPTIONS] file... usage: mc_ir [OPTIONS] <file>
Utility for viewing the generated intermediate representation. Errors are Utility for viewing the generated intermediate representation. Errors are
reported on invalid inputs. reported on invalid inputs.
@ -537,13 +492,12 @@ This is the main compiler executable, sometimes referred to as *driver*.
Use '-' as input file to read from stdin. Use '-' as input file to read from stdin.
OPTIONS: OPTIONS:
-h, --help displays this help message -h, --help display this help message
-o, --output <file> write the output to <file> (defaults to stdout) -o, --output <out-file> write the output to <out-file> (defaults to stdout)
-f, --function <name> limit scope to the given function
### `mc_cfg_to_dot` ### `mc_cfg_to_dot`
usage: mc_cfg_to_dot [OPTIONS] file... usage: mc_cfg_to_dot [OPTIONS] <file>
Utility for printing a control flow graph in the DOT format. The output Utility for printing a control flow graph in the DOT format. The output
can be visualised using graphviz. Errors are reported on invalid inputs. can be visualised using graphviz. Errors are reported on invalid inputs.
@ -551,13 +505,13 @@ This is the main compiler executable, sometimes referred to as *driver*.
Use '-' as input file to read from stdin. Use '-' as input file to read from stdin.
OPTIONS: OPTIONS:
-h, --help displays this help message -h, --help display this help message
-o, --output <file> write the output to <file> (defaults to stdout) -o, --output <out-file> write the output to <out-file> (defaults to stdout)
-f, --function <name> limit scope to the given function -f, --function <name> print the CFG of the given function (defaults to 'main')
### `mc_asm` ### `mc_asm`
usage: mc_asm [OPTIONS] file... usage: mc_asm [OPTIONS] <file>
Utility for printing the generated assembly code. Errors are reported on Utility for printing the generated assembly code. Errors are reported on
invalid inputs. invalid inputs.
@ -565,15 +519,14 @@ This is the main compiler executable, sometimes referred to as *driver*.
Use '-' as input file to read from stdin. Use '-' as input file to read from stdin.
OPTIONS: OPTIONS:
-h, --help displays this help message -h, --help display this help message
-o, --output <file> write the output to <file> (defaults to stdout) -o, --output <out-file> write the output to <out-file> (defaults to stdout)
-f, --function <name> limit scope to the given function
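The option handling shared by the utilities above could be sketched as follows. This is a minimal illustration, not the reference implementation: the names `struct mc_options`, `mc_parse_options`, and `mc_open_input` are hypothetical, and `getopt_long` is assumed to be available (it is a GNU extension provided by glibc, musl, and the BSDs, not part of POSIX). In the updated specification only `mc_cfg_to_dot` keeps the `-f` option.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option record; the field names are illustrative only. */
struct mc_options {
    bool help;            /* -h, --help was given */
    const char *output;   /* -o, --output; NULL means stdout */
    const char *function; /* -f, --function; NULL means not given */
    const char *input;    /* input file; "-" means stdin */
};

/* Parses the options followed by exactly one input file.
 * Returns 0 on success and -1 on a usage error. */
int mc_parse_options(int argc, char *argv[], struct mc_options *opts)
{
    static const struct option long_opts[] = {
        {"help", no_argument, NULL, 'h'},
        {"output", required_argument, NULL, 'o'},
        {"function", required_argument, NULL, 'f'},
        {NULL, 0, NULL, 0},
    };

    assert(opts != NULL);
    *opts = (struct mc_options){0};
    optind = 1; /* restart scanning so the function can be called repeatedly */

    int c;
    while ((c = getopt_long(argc, argv, "ho:f:", long_opts, NULL)) != -1) {
        switch (c) {
        case 'h': opts->help = true; break;
        case 'o': opts->output = optarg; break;
        case 'f': opts->function = optarg; break;
        default: return -1; /* unknown option or missing argument */
        }
    }

    if (opts->help) {
        return 0; /* the help message does not need an input file */
    }
    if (optind + 1 != argc) {
        return -1; /* exactly one input file is expected */
    }
    opts->input = argv[optind];
    return 0;
}

/* Returns stdin for "-", otherwise opens the file for reading (NULL on failure). */
FILE *mc_open_input(const char *path)
{
    return strcmp(path, "-") == 0 ? stdin : fopen(path, "r");
}
```

An application's `main` would call `mc_parse_options(argc, argv, &opts)`, print the usage text on `-1` or when `opts.help` is set, and otherwise pass the stream returned by `mc_open_input(opts.input)` to the library.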
## Project Structure ## Project Structure
The following directory layout is used. The following directory layout is used.
mcc/ # This node represents the root of your repository. mcc/ # This node represents the root of the repository.
├── app/ # Each C file in this directory corresponds to one executable. ├── app/ # Each C file in this directory corresponds to one executable.
│ ├── mc_ast_to_dot.c │ ├── mc_ast_to_dot.c
│ ├── mcc.c │ ├── mcc.c
@ -616,54 +569,35 @@ The README is kept short and clean with the following sections:
`src` contains the implementation of the library, while `include` defines its API. `src` contains the implementation of the library, while `include` defines its API.
Each application (C file inside `app`) is linked against the shared library and uses the provided interface. Each application (C file inside `app`) uses the library via the provided interface.
They mainly contain argument parsing and combine the functionality provided by the library to achieve their tasks. They mainly contain argument parsing and combine the functionality offered by the API to achieve their tasks.
The repository does not contain or track generated files. The repository does not contain or track generated files.
All generated files are placed inside a build directory (i.e. out-of-source build).
Under normal circumstances, all generated files are placed somewhere inside the build directory (i.e. out-of-source build).
### Known Issues ### Known Issues
At any point in time, the README contains a list of unfixed, known issues. At any point in time, the README contains a list of unfixed, known issues.
Each entry is kept short and concise and should be justified. Each entry is kept short and concise including a justification.
More complex issues may reference a dedicated document inside `docs` providing more details. Complex issues may reference a dedicated document inside `docs` that elaborates on them in greater detail.
## Build Infrastructure
As build system (generator), use either [Meson](http://mesonbuild.com/), [CMake](https://cmake.org/), or plain Makefiles.
Dependencies between source files are modelled correctly to enable a short development cycle.
*Note:* Talk to me if you want to use a different build system.
### Building
The default build configuration is *release* (optimisations enabled).
Unless Meson or CMake is used, the README documents how to switch to a *debug* configuration.
Warnings are always enabled: `-Wall -Wextra` are used at least.
### Testing ### Testing
Crucial or complicated logic is tested adequately. Crucial or complicated logic should be tested adequately.
The project infrastructure provides a *simple* way to run all unit and integration tests. The project infrastructure provides a *simple* way to run all unit and integration tests.
See the getting started code-base for an example (`scripts/run_integration_tests`). See the getting started code-base for an example.
Similarly, a way to run unit tests using `valgrind` is provided. Similarly, a way to run unit tests using `valgrind` is provided.
### Coverage
An HTML coverage report can be obtained following *simple* instructions inside the README.
### Dependencies ### Dependencies
The *prerequisites* section of the README enumerates the dependencies. The *prerequisites* section of the README enumerates all dependencies.
The implementation should not have any dependencies apart from the C (or C++) standard library, system libraries (POSIX), and a testing framework. The implementation should not have any dependencies apart from the standard library, system libraries (POSIX), a testing framework, and a lexer / parser generator.
The unit testing framework is *vendored* and automatically used by the build system. If a dependency is not available via the evaluation system's package manager, it needs to be automatically built and used by the build system.
See the getting started code-base for an example. It is recommended to *vendor* such dependencies rather than downloading them during build time.
## Coding Guidelines ## Coding Guidelines
@ -686,7 +620,8 @@ Architectural design and readability of your code will be judged.
### C/C++ ### C/C++
- While not required, it is highly recommended to use a formatting tool, like [ClangFormat](https://clang.llvm.org/docs/ClangFormat.html). - While not required, it is highly recommended to use a formatting tool, like [ClangFormat](https://clang.llvm.org/docs/ClangFormat.html).
A configuration file is provided in the getting started code-base, however, you are free to rule your own. A configuration file is provided in the getting started code-base; however, you are free to roll your own.
- Consider enabling address and memory sanitizers.
- Lines should not exceed 120 columns. - Lines should not exceed 120 columns.
- The nesting depth of control statements should not exceed 4. - The nesting depth of control statements should not exceed 4.
- Move inner code to dedicated functions or macros. - Move inner code to dedicated functions or macros.
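The nesting guideline above can be illustrated with a small sketch. The task (counting non-negative, even cells at or above a threshold in a row-major matrix) and all function names are hypothetical; the point is only that each loop level and the innermost condition live in their own function, so no function nests deeper than two levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Innermost condition, extracted so that callers do not need nested ifs. */
static bool is_countable(int value, int threshold)
{
    if (value < 0) {
        return false;
    }
    return value % 2 == 0 && value >= threshold;
}

/* Handles a single row; the loop over rows stays in the caller. */
static size_t count_in_row(const int *row, size_t cols, int threshold)
{
    size_t count = 0;
    for (size_t c = 0; c < cols; c++) {
        if (is_countable(row[c], threshold)) {
            count++;
        }
    }
    return count;
}

/* Counts matching cells in a row-major matrix of rows x cols entries. */
size_t count_in_matrix(const int *cells, size_t rows, size_t cols, int threshold)
{
    assert(cells != NULL || rows == 0);
    size_t total = 0;
    for (size_t r = 0; r < rows; r++) {
        total += count_in_row(cells + r * cols, cols, threshold);
    }
    return total;
}
```

Written as one function, the same logic would need two loops and two conditionals nested inside each other; splitting it keeps every function short and easy to test in isolation.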


@ -1,46 +1,24 @@
# Submission Guideline # Submission Guideline
- `XX` is to be replaced with the number of your team with leading zero (e.g. `02`). - Replace `XX` with your team number, including the leading zero (e.g. `02`).
- `Y` is to be replaced with the corresponding milestone number.
- One submission *per team*. - One submission *per team*.
## Example Input Submission ## Build Test Submission
Assuming your example input is named `mandelbrot`, zip the corresponding files like so:
mandelbrot.zip
└── mandelbrot/
├── mandelbrot.mc
├── mandelbrot.stdin.txt
└── mandelbrot.stdout.txt
Submit the zip archive via mail using the following line as subject (or link below).
List your team members in the mail body.
703602 - Example Input
📧 [send email](mailto:alexander.hirsch@uibk.ac.at?subject=703602%20-%20Example%20Input)
## Milestone Submission
1. `cd` into your repository. 1. `cd` into your repository.
2. Commit all pending changes. 2. Commit all pending changes.
3. Check out the revision you want to submit. 3. Check out the revision you want to submit.
4. Ensure everything builds. 4. Ensure the submitted code builds.
- Warnings are okay
- Tests may fail
- Memory may be leaked
- Known issues should be present
5. Run the following command: 5. Run the following command:
$ git archive --prefix=team_XX_milestone_Y/ --format=zip HEAD > team_XX_milestone_Y.zip $ git archive --prefix=team_XX_build_test/ --format=zip HEAD > team_XX_build_test.zip
6. Verify that the resulting archive contains everything you want to submit and nothing more. 6. Verify that the resulting archive contains everything you want to submit and nothing more.
7. Submit the zip archive via mail using the following line as subject (or link below). 7. Submit the archive via mail using the following line as subject (or link below).
703602 - Team XX Milestone Y 703602 - Team XX Build Test Submission
📧 [send email](mailto:alexander.hirsch@uibk.ac.at?subject=703602%20-%20Team%20XX%20Milestone%20Y) 📧 [send email](mailto:alexander.hirsch@uibk.ac.at?subject=703602%20-%20Team%20XX%20Build%20Test%20Submission)
## Final Submission ## Final Submission
@ -53,14 +31,14 @@ List your team members in the mail body.
- All unit tests succeed - All unit tests succeed
- All integration tests succeed - All integration tests succeed
- No memory is leaked - No memory is leaked
- Known issues must be present - The list of known issues is present and up to date
5. Run the following command: 5. Run the following command:
$ git archive --prefix=team_XX_final/ --format=zip HEAD > team_XX_final.zip $ git archive --prefix=team_XX_final/ --format=zip HEAD > team_XX_final.zip
6. Verify that the resulting archive contains everything you want to submit and nothing more. 6. Verify that the resulting archive contains everything you want to submit and nothing more.
7. Submit the zip archive via mail using the following line as subject (or link below). 7. Submit the archive via mail using the following line as subject (or link below).
703602 - Team XX Final 703602 - Team XX Final Submission
📧 [send email](mailto:alexander.hirsch@uibk.ac.at?subject=703602%20-%20Team%20XX%20Final) 📧 [send email](mailto:alexander.hirsch@uibk.ac.at?subject=703602%20-%20Team%20XX%20Final%20Submission)