# mC Compiler Specification
This document describes the mC compiler as well as the mC language along with some requirements.
Like a regular compiler, the mC compiler is divided into 3 main parts: front-end, back-end, and a core in-between.
The front-end's task is to validate a given input using syntactic and semantic checks.
The syntactic checking is done by the parser, which, on success, generates an abstract syntax tree (AST).
This tree data structure is mainly used for semantic checking, although transformations can also be applied to it.
Invalid inputs cause errors to be reported.
Moving on, the AST is translated to the compiler's intermediate representation (IR) and passed on to the core.
The core provides infrastructure for running analyses and transformations on the IR.
These analyses and transformations are commonly used for optimisation.
Additional data structures, like the control flow graph (CFG), are utilised for this phase.
Next, the (optimised) IR is passed to the back-end.
The back-end translates the platform *independent* IR code to platform *dependent* assembly code.
An assembler converts this code to *object code*, which is finally crafted into an executable by the linker.
For these last two steps, GCC is used — referred to as *back-end compiler* in this context.
Adapt project layout, build system, and coding guidelines according to the used programming language's conventions.
## mC Language
This section defines *mC* — a simple, C-like language.
The semantics of mC are identical to C unless specified otherwise.
### Grammar
The grammar of mC is defined using the following notation:
- `#` starts a single line comment
- `,` indicates concatenation
- `|` indicates alternation
- `( )` indicates grouping
- `[ ]` indicates optional parts (0 or 1)
- `{ }` indicates repetition (1 or more)
- `[ ]` and `{ }` can be combined to build *0 or more* repetition
- `" "` indicates a terminal string
- `/ /` indicates a [RegEx](https://www.regular-expressions.info/)
```
# Primitives
alpha = /[a-zA-Z_]/
alpha_num = /[a-zA-Z0-9_]/
digit = /[0-9]/
identifier = alpha , [ { alpha_num } ]
# Operators
unary_op = "-" | "!"
binary_op = "+" | "-" | "*" | "/"
| "<" | ">" | "<=" | ">="
| "&&" | "||"
| "==" | "!="
# Types
type = "bool" | "int" | "float" | "string"
# Literals
literal = bool_literal
| int_literal
| float_literal
| string_literal
bool_literal = "true" | "false"
int_literal = { digit }
float_literal = { digit } , "." , { digit }
string_literal = /"[^"]*"/
# Declarations / Assignments
declaration = type , [ "[" , int_literal , "]" ] , identifier
assignment = identifier , [ "[" , expression , "]" ] , "=" , expression
# Expressions
expression = literal
| identifier , [ "[" , expression , "]" ]
| call_expr
| unary_op , expression
| expression , binary_op , expression
| "(" , expression , ")"
# Statements
statement = if_stmt
| while_stmt
| ret_stmt
| declaration , ";"
| assignment , ";"
| expression , ";"
| compound_stmt
if_stmt = "if" , "(" , expression , ")" , statement , [ "else" , statement ]
while_stmt = "while" , "(" , expression , ")" , statement
ret_stmt = "return" , [ expression ] , ";"
compound_stmt = "{" , [ { statement } ] , "}"
# Function Definitions / Calls
function_def = ( "void" | type ) , identifier , "(" , [ parameters ] , ")" , compound_stmt
parameters = declaration , [ { "," , declaration } ]
call_expr = identifier , "(" , [ arguments ] , ")"
arguments = expression , [ { "," , expression } ]
# Program (Entry Point)
program = [ { function_def } ]
```
### Comments
mC supports only C-style comments, starting with `/*` and ending with `*/`.
Like in C, they can span across multiple lines.
Comments are discarded by the parser; however, line breaks are taken into account for line numbering.
### Size Limitations
`long` and `double` are used in the compiler to store mC's `int` and `float` literals, respectively.
It is assumed that both types are big and precise enough to store the corresponding literal.
String literals are at most 1000 characters long.
Arrays are at most `LONG_MAX` elements long.
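As a minimal sketch of how an `int` literal could be stored in a `long`: the helper name `mcc_parse_int_literal` is hypothetical. The specification allows assuming that the literal fits, so the range check shown here is an optional safeguard rather than a requirement.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: converts the text of an int_literal to a long.
 * Returns false if the text is not a plain digit sequence or the value
 * does not fit into a long. */
bool mcc_parse_int_literal(const char *text, long *out)
{
    assert(text != NULL && out != NULL);

    /* Per the grammar, an int_literal consists of one or more digits. */
    if (*text == '\0')
        return false;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return false;
    }

    errno = 0;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;

    *out = value;
    return true;
}
```

A `float` literal could be handled analogously with `strtod`.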
### Operators
The following table enumerates which types an operator supports.
In the case of a binary operator, both sides must be of the same type.

| Operator | Supported Types |
| ---------------------------------- | ---------------------- |
| `-` `+` `*` `/` | `int`, `float` |
| `<` `<=` `>` `>=` | `int`, `float` |
| `==` `!=` | `bool`, `int`, `float` |
| `!` `&&` <code>&#124;&#124;</code> | `bool` |
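The table can be turned into a small type-checking helper. The following is only a sketch: the enum and function names are hypothetical, and the result types (arithmetic operators yield the operand type, all other operators yield `bool`) are an assumption derived from C semantics, since the table only lists operand types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enums; the specification does not prescribe these names. */
enum mcc_type { MCC_TYPE_INVALID, MCC_TYPE_BOOL, MCC_TYPE_INT, MCC_TYPE_FLOAT, MCC_TYPE_STRING };

enum mcc_binary_op {
    MCC_OP_ADD, MCC_OP_SUB, MCC_OP_MUL, MCC_OP_DIV,
    MCC_OP_LT, MCC_OP_GT, MCC_OP_LE, MCC_OP_GE,
    MCC_OP_AND, MCC_OP_OR,
    MCC_OP_EQ, MCC_OP_NE
};

static bool is_numeric(enum mcc_type t)
{
    return t == MCC_TYPE_INT || t == MCC_TYPE_FLOAT;
}

/* Returns the type of `lhs op rhs`, or MCC_TYPE_INVALID if the operator
 * table rejects the combination. */
enum mcc_type mcc_binary_op_type(enum mcc_binary_op op, enum mcc_type lhs, enum mcc_type rhs)
{
    assert(op <= MCC_OP_NE);

    /* Both sides must be of the same type; there are no conversions. */
    if (lhs != rhs)
        return MCC_TYPE_INVALID;

    switch (op) {
    case MCC_OP_ADD:
    case MCC_OP_SUB:
    case MCC_OP_MUL:
    case MCC_OP_DIV:
        return is_numeric(lhs) ? lhs : MCC_TYPE_INVALID;
    case MCC_OP_LT:
    case MCC_OP_GT:
    case MCC_OP_LE:
    case MCC_OP_GE:
        return is_numeric(lhs) ? MCC_TYPE_BOOL : MCC_TYPE_INVALID;
    case MCC_OP_EQ:
    case MCC_OP_NE:
        return (is_numeric(lhs) || lhs == MCC_TYPE_BOOL) ? MCC_TYPE_BOOL : MCC_TYPE_INVALID;
    case MCC_OP_AND:
    case MCC_OP_OR:
        return lhs == MCC_TYPE_BOOL ? MCC_TYPE_BOOL : MCC_TYPE_INVALID;
    }
    return MCC_TYPE_INVALID;
}
```

Array types are left out of this sketch; a real implementation would also have to compare array sizes.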
### Special Semantics
#### Boolean
`bool` is considered a first-class citizen, distinct from `int`.
Short-circuit evaluation is *not* supported.
An expression used as a condition (for `if` or `while`) is expected to be of type `bool`.
#### Strings
Strings are immutable and do not support any operation (e.g. concatenation).
Like comments, strings can span across multiple lines.
Whitespace (i.e. newlines, tabs, spaces) is part of the string.
Escape sequences are *not* supported.
The sole purpose of strings in mC is to be used with the built-in `print` function (see below).
#### Arrays
Only one-dimensional arrays with static size are supported.
The size must be stated during declaration and is part of the type.
The following statement declares an array of integers with 42 elements.

    int[42] my_array;

We do not support *any* operations on whole arrays.
For example, the following code is *invalid*:

    int[10] a;
    int[10] b;
    int[10] c;
    c = a + b; /* not supported */

This needs to be rewritten as a loop in order to work:

    int i;
    i = 0;
    while (i < 10) {
        c[i] = a[i] + b[i];
        i = i + 1;
    }

Even further, one cannot assign to a variable of array type.

    c = a; /* not supported, even though both are of type int[10] */
#### Call by Value / Call by Reference
`bool`, `int`, and `float` are passed by value.
For arrays and strings a pointer is passed by value (similar to C).
Modifications made to an array inside a function are visible outside the function.

    void foo(int[5] arr) {
        arr[2] = 42;
    }

    int main() {
        int[5] arr;
        foo(arr);
        print_int(arr[2]); /* outputs 42 */
        return 0;
    }

While strings can be re-assigned (in contrast to arrays), this is not visible outside the function call.

    void foo(string s) {
        s = "foo";
    }

    int main() {
        string s;
        s = "bar";
        foo(s);
        print(s); /* outputs bar */
        return 0;
    }
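For comparison, the same behaviour can be reproduced in plain C, where an array parameter decays to a pointer that is itself passed by value. The function names below are illustrative only and not part of mC.

```c
#include <assert.h>
#include <stddef.h>

/* The array parameter decays to a pointer, so the write goes to the
 * caller's array. */
void set_element(int arr[5])
{
    assert(arr != NULL);
    arr[2] = 42;
}

/* Only the local copy of the pointer is changed; the caller's pointer
 * still refers to the original string. */
void reassign(const char *s)
{
    s = "foo";
    (void)s; /* the new value is intentionally unused */
}
```

Calling `set_element` on a local array changes its third element, whereas a string passed to `reassign` is unchanged from the caller's point of view.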
#### Type Conversion
There are *no* type conversions, neither implicit nor explicit.
#### Entry Point
The top-level grammar rule is `program`, which consists of 0 or more function definitions.
While the parser happily accepts empty source files, a semantic check enforces the presence of a function named `main`.
This function takes no arguments and returns an `int`.
On success, an mC program's `main` function returns `0`.
#### Declaration, Definition, and Initialization
`declaration` is used to declare variables which can then be initialised with `assignment`.
Splitting declaration and initialisation simplifies the creation of symbol tables.
Contrary to C, a function can be called before it has been declared (in mC, a function is always declared by its definition).
Forward declarations are therefore *not* supported.
#### Empty Parameter List
In mC, an empty parameter list is always written as `()`.
#### Dangling Else
A [*dangling else*](https://en.wikipedia.org/wiki/Dangling_else) belongs to the innermost `if`.
The following mC code snippets are semantically equivalent:

    if (c1)              |        if (c1) {
        if (c2)          |            if (c2) {
            f2();        |                f2();
        else             |            } else {
            f3();        |                f3();
                         |            }
                         |        }
### Built-in Functions
The following built-in functions are provided by the compiler for I/O operations:
- `void print(string)` outputs the given string to `stdout`
- `void print_nl()` outputs the new-line character (`\n`) to `stdout`
- `void print_int(int)` outputs the given integer to `stdout`
- `void print_float(float)` outputs the given float to `stdout`
- `int read_int()` reads an integer from `stdin`
- `float read_float()` reads a float from `stdin`
## mC Compiler
The mC compiler is implemented as a library.
It can be used either programmatically or via the provided command-line applications (see below).
The focus lies on a clean and modular implementation as well as a straightforward architecture, rather than raw performance.
For example, each semantic check may traverse the AST in isolation.
The compiler guarantees the following:
- All functions are thread-safe.
- Functions do not interact directly with `stdin`, `stdout`, or `stderr`.
- No function terminates the application on correct usage (or replaces the running process using `exec`).
- No memory is leaked, even in error cases.

*Note for C*: Prefix symbols with `mcc_` due to the lack of namespaces.
### Logging
Logging infrastructure *may* be present; however, all log (and debug) output is disabled by default.
The log level can be set with the environment variable `MCC_LOG_LEVEL`.

    0 = no logging
    1 = normal logging (info)
    2 = verbose logging (debug)

The output destination can be set with `MCC_LOG_FILE` and defaults to `stderr`.
Log messages do not overlap on multi-threaded execution.
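A minimal sketch of how the `MCC_LOG_LEVEL` value could be interpreted is shown below. The enum and function names are hypothetical, and treating unset or invalid values as "no logging" is an assumption that matches the disabled-by-default rule.

```c
#include <assert.h>
#include <stdlib.h>

enum mcc_log_level { MCC_LOG_NONE = 0, MCC_LOG_INFO = 1, MCC_LOG_DEBUG = 2 };

/* Maps the raw value of MCC_LOG_LEVEL (e.g. the result of getenv) to a
 * log level. Unset, empty, or invalid values disable logging. */
enum mcc_log_level mcc_log_level_from_string(const char *value)
{
    if (value == NULL || *value == '\0')
        return MCC_LOG_NONE;

    char *end = NULL;
    long level = strtol(value, &end, 10);
    if (*end != '\0' || level < MCC_LOG_NONE || level > MCC_LOG_DEBUG)
        return MCC_LOG_NONE;

    return (enum mcc_log_level)level;
}

/* Returns nonzero if a message of the given level should be emitted. */
int mcc_log_enabled(enum mcc_log_level configured, enum mcc_log_level message)
{
    assert(message != MCC_LOG_NONE); /* messages are either info or debug */
    return message <= configured;
}
```

An application would typically call `mcc_log_level_from_string(getenv("MCC_LOG_LEVEL"))` once at start-up and pass the result on to the library.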
### Parser
The parser reads the given input and, if it conforms syntactically to an mC program, constructs the corresponding AST.
An invalid input is rejected, resulting in a meaningful error message.
For instance:

    foo.mc:3:8: error: unexpected '{', expected (

It is recommended to closely follow the error message format of other compilers.
This allows for better IDE integration.
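The `file:line:column: error: message` layout used by GCC and Clang can be produced with a small formatting helper. This is a sketch only; `mcc_format_error` is a hypothetical name, and writing into a caller-provided buffer is one way to keep library functions away from `stderr`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "<file>:<line>:<col>: error: <msg>" into buf.
 * Returns the value of snprintf, i.e. the length the full message needs
 * (excluding the terminator), or a negative value on error. */
int mcc_format_error(char *buf, size_t size, const char *file, int line, int col, const char *msg)
{
    assert(buf != NULL && size > 0);
    assert(file != NULL && msg != NULL);
    return snprintf(buf, size, "%s:%d:%d: error: %s", file, line, col, msg);
}
```

A return value greater than or equal to `size` indicates that the message was truncated, so the caller can retry with a larger buffer.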
Displaying the offending source code along with the error message is helpful, but not required.
Parsing may stop on the first error.
Error recovery is optional.
Pay attention to operator precedence.
Note that partial mC programs, like an expression or statement, are not valid inputs to the main *parse* function.
The library *may* provide additional functions for parsing single expressions or statements.
### Abstract Syntax Tree
The AST data structure itself is *not* specified.
Consider using the visitor pattern for tree traversals.
Given this example input:

    int fib(int n)
    {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }

The visualisation of the AST for the `fib` function could look like this:
![`fib` AST example](images/fib_ast.png)
### Semantic Checks
As the parser only does syntactic checking, additional semantic checks are implemented:
- Checking for uses of undeclared variables
- Checking for multiple declarations of variables with the same name in the same scope
- Checking for multiple definitions of functions with the same name
- Checking for calls to unknown functions
- Checking for presence of `main` and correct signature
- Checking that all execution paths of a non-void function return a value
    - You may assume that there is no dead code
- Type checking (remember, neither implicit nor explicit type conversions)
    - Includes checking arguments and return types for call expressions
    - Don't forget that an array's type includes its size

Semantic checking may stop on the first error encountered.
In addition to the AST, *symbol tables* are created and used for semantic checking.
Be sure to correctly model [*shadowing*](https://en.wikipedia.org/wiki/Variable_shadowing).
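One common way to model shadowing is a chain of scopes in which lookups start at the innermost scope and walk outwards. The sketch below uses hypothetical names and a fixed-size array per scope purely to stay short; the specification does not prescribe any particular symbol table layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MCC_SCOPE_MAX_SYMBOLS 16 /* arbitrary limit for this sketch */

struct mcc_symbol {
    const char *name;
    int type; /* placeholder for a real type description */
};

struct mcc_scope {
    struct mcc_scope *parent; /* enclosing scope, NULL for the outermost one */
    struct mcc_symbol symbols[MCC_SCOPE_MAX_SYMBOLS];
    size_t count;
};

/* Searches `scope` only, ignoring enclosing scopes. */
static struct mcc_symbol *lookup_local(struct mcc_scope *scope, const char *name)
{
    for (size_t i = 0; i < scope->count; i++) {
        if (strcmp(scope->symbols[i].name, name) == 0)
            return &scope->symbols[i];
    }
    return NULL;
}

/* Declares `name` in `scope`. Returns 0 if the name is already declared in
 * the same scope or the scope is full, 1 otherwise. */
int mcc_scope_declare(struct mcc_scope *scope, const char *name, int type)
{
    assert(scope != NULL && name != NULL);
    if (scope->count == MCC_SCOPE_MAX_SYMBOLS || lookup_local(scope, name) != NULL)
        return 0;
    scope->symbols[scope->count++] = (struct mcc_symbol){ .name = name, .type = type };
    return 1;
}

/* Resolves `name` from the innermost scope outwards, so that an inner
 * declaration shadows an outer one. Returns NULL if undeclared. */
struct mcc_symbol *mcc_scope_lookup(struct mcc_scope *scope, const char *name)
{
    assert(name != NULL);
    for (; scope != NULL; scope = scope->parent) {
        struct mcc_symbol *symbol = lookup_local(scope, name);
        if (symbol != NULL)
            return symbol;
    }
    return NULL;
}
```

With this layout, a redeclaration in the same scope is rejected, while a declaration of the same name in a nested scope is accepted and takes precedence during lookup.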
### Intermediate Representation
As IR, a low-level [three-address code (TAC)](https://en.wikipedia.org/wiki/Three-address_code) is used.
The instruction set of this IR is *not* specified.
The compiler's core is independent from the front- and back-end.
*Hint:* Handle arguments and return values for function calls via an *imaginary* stack using dedicated `push` and `pop` instructions.
Have a look at the calling convention used for assembly code generation.
### Control Flow Graph (CFG)
A control flow graph data structure consisting of edges and basic blocks (containing IR instructions) is present.
For each function in a given IR program, a corresponding CFG can be obtained.
The CFG is commonly used by analyses for extracting structural information crucial for transformation steps.
Providing a visitor mechanism for CFGs is optional, yet recommended.
Like the AST, CFGs can be printed using the DOT format.
The example below is taken from [Marc Moreno Maza](http://www.csd.uwo.ca/~moreno/CS447/Lectures/CodeOptimization.html/node6.html).
Given this example IR:
```
    s = 0
    i = 0
    n = 10
L1: t1 = a - b
    ifz t1 goto L2
    t2 = i * 4
    s = s + t2
    goto L3
L2: s = s + i
L3: i = i + 1
    t3 = n - i
    ifnz t3 goto L1
    t4 = a - b
```
The visualisation of the corresponding CFG could look like this:
![CFG Example](images/cfg.png)
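Basic blocks are usually found by marking *leaders*: the first instruction, every jump target, and every instruction that follows a jump. The sketch below applies these textbook rules to a deliberately minimal, hypothetical instruction model that records only control flow; the actual IR data structures are up to the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal instruction model: only control flow matters here. */
enum ir_kind { IR_OTHER, IR_JUMP, IR_COND_JUMP };

struct ir_instr {
    enum ir_kind kind;
    size_t target; /* index of the jump target; ignored for IR_OTHER */
};

/* Marks the leader of each basic block in `leader` and returns the number
 * of leaders, which equals the number of basic blocks. */
size_t mark_leaders(const struct ir_instr *code, size_t n, bool *leader)
{
    assert(code != NULL && leader != NULL);

    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true; /* rule 1: the first instruction */

    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == IR_OTHER)
            continue;
        assert(code[i].target < n);
        leader[code[i].target] = true; /* rule 2: jump targets */
        if (i + 1 < n)
            leader[i + 1] = true; /* rule 3: the instruction after a jump */
    }

    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += leader[i] ? 1 : 0;
    return count;
}
```

Applied to the 13 instructions of the example IR, with jumps at indices 4, 7, and 11, this marks six leaders: `s = 0`, `L1`, `t2 = i * 4`, `L2`, `L3`, and `t4 = a - b`. Edges between the resulting blocks then follow from the jumps and from fall-through.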
### Assembly Code Generation
The mC compiler targets x86 and uses GCC as back-end compiler.
On an x86_64 system, GCC multi-lib support must be available and the flag `-m32` is passed to the compiler.
The code generated by the back-end is compiled with the [GNU Assembler](https://en.wikipedia.org/wiki/GNU_Assembler) (by GCC).
Single precision floating point arithmetic is sufficient.
The `print_float` built-in only outputs 2 decimal places.
Use the [cdecl calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#cdecl).
It is paramount to correctly implement the calling convention; otherwise, the stack may get corrupted during function calls and returns.
Note that *all* function calls (including built-ins) use the same calling convention — do not needlessly introduce special cases.
*Hint:* There is a `.float` assembler directive.
*Hint:* If you are not familiar with x86 assembly, pass small C snippets to GCC and look at the generated assembly code (using `-S`).
Optimisations, mitigations, and other unnecessary features (e.g. DWARF symbols, unwind tables) should be disabled.
There are also flags like `-fverbose-asm` which add additional annotations to the output.
## Applications
Apart from the main compiler executable `mcc`, additional auxiliary executables are provided.
These executables aid the development process and are used for evaluation.
Do not omit details in the output (e.g. do not simplify the AST).
The applications are specified by their usage information.
Composing them with other command-line tools, like `dot`, is a core feature.
All applications exit with code `EXIT_SUCCESS` *iff* they succeeded in their operation.
Errors are written to `stderr`.
### `mcc`
This is the main compiler executable, sometimes referred to as *driver*.

    usage: mcc [OPTIONS] <file>

    The mC compiler. It takes an mC input file and produces an executable.
    Errors are reported on invalid inputs.

    Use '-' as input file to read from stdin.

    OPTIONS:
      -h, --help                display this help message
      -q, --quiet               suppress error output
      -o, --output <out-file>   write the output to <out-file> (defaults to 'a.out')

    Environment Variables:
      MCC_BACKEND               override the back-end compiler (defaults to 'gcc')
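The options in the usage text map naturally onto `getopt_long`. The following is a sketch under assumptions: the struct and function names are hypothetical, and `getopt_long` (from `<getopt.h>`, available with glibc and the BSD libcs) is just one possible choice for argument parsing.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

struct mcc_options {
    bool help;
    bool quiet;
    const char *output; /* defaults to "a.out" */
    const char *input;  /* "-" means stdin */
};

/* Parses argv according to the usage text. Returns false on invalid usage,
 * e.g. an unknown option or a missing input file. */
bool mcc_parse_options(int argc, char **argv, struct mcc_options *opts)
{
    assert(argv != NULL && opts != NULL);

    static const struct option long_options[] = {
        { "help", no_argument, NULL, 'h' },
        { "quiet", no_argument, NULL, 'q' },
        { "output", required_argument, NULL, 'o' },
        { NULL, 0, NULL, 0 },
    };

    *opts = (struct mcc_options){ .output = "a.out" };
    optind = 1; /* restart scanning in case getopt was used before */

    int c;
    while ((c = getopt_long(argc, argv, "hqo:", long_options, NULL)) != -1) {
        switch (c) {
        case 'h': opts->help = true; break;
        case 'q': opts->quiet = true; break;
        case 'o': opts->output = optarg; break;
        default: return false;
        }
    }

    if (opts->help)
        return true;

    /* Exactly one positional argument (the input file) is expected. */
    if (optind + 1 != argc)
        return false;
    opts->input = argv[optind];
    return true;
}
```

A lone `-` is not treated as an option by `getopt_long`, so it arrives as the positional input file and can be mapped to `stdin` by the caller. Since `getopt_long` keeps global state, this kind of code belongs in the applications rather than in the thread-safe library.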
### `mc_ast_to_dot`

    usage: mc_ast_to_dot [OPTIONS] <file>

    Utility for printing an abstract syntax tree in the DOT format. The output
    can be visualised using Graphviz. Errors are reported on invalid inputs.

    Use '-' as input file to read from stdin.

    OPTIONS:
      -h, --help                display this help message
      -o, --output <out-file>   write the output to <out-file> (defaults to stdout)

### `mc_symbol_table`

    usage: mc_symbol_table [OPTIONS] <file>

    Utility for displaying the generated symbol tables. Errors are reported on
    invalid inputs.

    Use '-' as input file to read from stdin.

    OPTIONS:
      -h, --help                display this help message
      -o, --output <out-file>   write the output to <out-file> (defaults to stdout)

### `mc_ir`

    usage: mc_ir [OPTIONS] <file>

    Utility for viewing the generated intermediate representation. Errors are
    reported on invalid inputs.

    Use '-' as input file to read from stdin.

    OPTIONS:
      -h, --help                display this help message
      -o, --output <out-file>   write the output to <out-file> (defaults to stdout)

### `mc_cfg_to_dot`

    usage: mc_cfg_to_dot [OPTIONS] <file>

    Utility for printing a control flow graph in the DOT format. The output
    contains multiple connected graphs, one per function. The output can be
    visualised using Graphviz. Errors are reported on invalid inputs.

    Use '-' as input file to read from stdin.

    OPTIONS:
      -h, --help                display this help message
      -o, --output <out-file>   write the output to <out-file> (defaults to stdout)
      -f, --function <name>     print the CFG of the given function

### `mc_asm`

    usage: mc_asm [OPTIONS] <file>

    Utility for printing the generated assembly code. Errors are reported on
    invalid inputs.

    Use '-' as input file to read from stdin.

    OPTIONS:
      -h, --help                display this help message
      -o, --output <out-file>   write the output to <out-file> (defaults to stdout)

## Project Structure
The following directory layout is used.

    /                               # This node represents the root of the repository.
    ├── app/                        # Each C file in this directory corresponds to one executable.
    │   ├── mc_ast_to_dot.c
    │   ├── mcc.c
    │   └── …
    ├── docs/                       # Additional documentation goes here.
    │   └── …
    ├── include/                    # All public headers live here, note the `mcc` subdirectory.
    │   └── mcc/
    │       ├── ast.h
    │       ├── ast_print.h
    │       ├── ast_visit.h
    │       ├── parser.h
    │       └── …
    ├── src/                        # The actual implementation; may also contain private headers and so on.
    │   ├── ast.c
    │   ├── ast_print.c
    │   ├── ast_visit.c
    │   ├── lexer.c
    │   ├── parser.c
    │   └── …
    ├── test/
    │   ├── integration/            # Example inputs for integration testing.
    │   │   ├── fib/
    │   │   │   ├── fib.mc
    │   │   │   ├── fib.stdin.txt
    │   │   │   └── fib.stdout.txt
    │   │   └── …
    │   └── unit/                   # Unit tests, typically one file per unit.
    │       ├── parser_test.c
    │       └── …
    ├── vendor/                     # Third-party libraries and tools go here.
    │   └── …
    └── README.md

The README is kept short and clean with the following sections:
- Prerequisites
- Build instructions
- Known issues

`src` contains the implementation of the library, while `include` defines its API.
Each application (C file inside `app`) uses the library via the provided interface.
They mainly contain argument parsing and combine the functionality offered by the API to achieve their tasks.
The repository does not contain or track generated files.
All generated files are placed inside a build directory (i.e. out-of-source build).
### Known Issues
At any point in time, the README contains a list of unfixed, known issues.
Each entry is kept short and concise, and includes a justification.
Complex issues may reference a dedicated document inside `docs` that elaborates on them in greater detail.
### Testing
Crucial or complicated logic should be tested adequately.
The project infrastructure provides a *simple* way to run all unit and integration tests.
See the getting started code-base for an example.
Similarly, a way to run unit tests using `valgrind` is provided.
### Dependencies
The *prerequisites* section of the README enumerates all dependencies.
The implementation should not have any dependencies apart from the standard library, system libraries, a testing framework, and a lexer / parser generator.
If a dependency is not available via the evaluation system's package manager, it needs to be automatically built and used by the build system.
It is recommended to *vendor* such dependencies rather than downloading them during build time.
## Coding Guidelines
Architectural design and readability of your code will be judged.
- Don't be a git — use [Git](https://git-scm.com/)!
- Files are UTF-8 encoded and use Unix line-endings (`\n`).
- Files contain *one* newline at the end.
- Lines do not contain trailing whitespace.
- Your code does not trigger warnings; justify them otherwise.
- Do not waste time or space (this includes memory leaks).
    - Check for leaks using `valgrind`, especially in error cases.
- Keep design and development principles in mind, especially KISS and DRY.
- Always state the source of non-original content.
    - Use persistent links when possible.
    - Ideas and inspirations should be referenced too.

> Credit where credit is due.
### C/C++
- Use [ClangFormat](https://clang.llvm.org/docs/ClangFormat.html).
  A configuration file is provided in the getting started code-base; however, you are free to roll your own.
- Consider enabling address and memory sanitizers.
- Lines should not exceed 120 columns.
- The nesting depth of control statements should not exceed 4.
    - Move inner code to dedicated functions or macros.
    - Avoid using conditional and loop statements inside `case`.
- Use comments *where necessary*.
    - Code should be readable and tell *what* is happening.
    - A comment should tell you *why* something is happening, or what to look out for.
    - An overview at the beginning of a module header is welcome.
- Use the following order for includes (separated by an empty line):
    - Corresponding header (`ast.c` → `ast.h`)
    - System headers
    - Other library headers
    - Public headers of the same project
    - Private headers of the same project
- The structure of a source file should be similar to its corresponding header file.
    - Separators can be helpful, but they should not distract the reader.
- Keep public header files free from implementation details; this also applies to the overview comment.
- Use assertions to verify preconditions.
- Ensure the correct usage of library functions; remember to always check return codes.
- Prefer bound-checking functions, like `snprintf`, over their non-bound-checking variant.

Also, keep the following in mind, taken from [Linux Kernel Coding Style](https://www.kernel.org/doc/html/v4.10/process/coding-style.html):
> Functions should be short and sweet, and do just one thing.
> They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all know), and do one thing and do that well.
>
> The maximum length of a function is inversely proportional to the complexity and indentation level of that function.
> So, if you have a conceptually simple function that is just one long (but simple) case-statement, where you have to do lots of small things for a lot of different cases, it's OK to have a longer function.
>
> However, if you have a complex function, and you suspect that a less-than-gifted first-year high-school student might not even understand what the function is all about, you should adhere to the maximum limits all the more closely.
> Use helper functions with descriptive names (you can ask the compiler to in-line them if you think it's performance-critical, and it will probably do a better job of it than you would have done).
>
> Another measure of the function is the number of local variables.
> They shouldn't exceed 5-10, or you're doing something wrong.
> Re-think the function, and split it into smaller pieces.
> A human brain can generally easily keep track of about 7 different things, anything more and it gets confused.
> You know you're brilliant, but maybe you'd like to understand what you did 2 weeks from now.