/* Type checking rules for the Mitchell language. This document describes * how a conformant mitchell compiler will perform type checking on a * program. The exact rules are provided in the first section, while * explanation and examples are provided in the second. In the event of * any discrepancy between the rules and the explanation, the rules are * always correct. * * $Id: typing,v 1.8 2005/08/19 01:13:42 chris Exp $ */ /* mitchell - experimental compiler * Copyright (C) 2004 Chris Lumens * * This program is free software; you can redistribute it and/or modify * it under the terms of version 2 of the GNU General Public License as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ Typing Rules ============ ⊢ t : boolean [true-expr] ⊢ f : boolean [false-expr] ⊢ integer-constant : integer [integer] ⊢ string-constant : string [string] ⊢ ⊥ : bottom [bottom] A[I : T] ⊢ I : T [id] A ⊢ E : T ----------------- [raise-expr] A ⊢ raise E : T ℰ ∀i . (A ⊢ Ei : T) ---------------------------- [list] A ⊢ [ E1, ..., En ] : T list ∀i . (A ⊢ Ei : Ti) ----------------------------------------------------------------- [record] A ⊢ { I1 ← E1: T1, ..., In ← En: Tn } : { I1 : T1, ..., In : Tn } A ⊢ Et : Tt ∀i . (A ⊢ Ebi : Tt) A ⊢ Tt : { boolean, integer, string } ∀i . (A ⊢ Ei : T) ----------------------------------------------------- [case-expr-1] A ⊢ case Et in Eb1 → E1, ..., Ebn → En end : T A ⊢ Et : Tt ∀i . (A ⊢ Ebi : Tt) A ⊢ Tt : { boolean, integer, string } ∀i . (A ⊢ Ei : T) A ⊢ Ee : T ------------------------------------------------------------ [case-expr-2] A ⊢ case Et in Eb1 → E1, ..., Ebn → En, else → Ee end : T A ⊢ Et : (oneof (I1 T1) ... (In Tn)) ∀i. ∃j. ((Ii = Itj) ∧ (A[Ivj: Ti] ⊢ Ej: T)) --------------------------------------------------------- [case-expr-3] A ⊢ case Et in It1(Iv1) → E1, ..., EItn(Ivn) → En end : T A ⊢ Et : (oneof (I1 T1) ... (In Tn)) ∀i. (∃j. (Ii = Itj)) . A[Ivj: Ti] ⊢ Ej: T A ⊢ Ee : T -------------------------------------------------------------------- [case-expr-4] A ⊢ case Et in It1(Iv1) → E1, ..., EItn(Ivn) → En, else → Ee end : T A' = A[I1 : T1, ..., In : Tn] ∀i . (A' ⊢ Ii : Ti) A' ⊢ E : Te ------------------------------------------ [decl-expr] A ⊢ decl I1: T1, ..., In: Tn in E end : Te A ⊢ E1 : boolean ; A ⊢ E2 : T ; A ⊢ E3 : T ------------------------------------------ [if-expr] A ⊢ if E1 then E2 else E3 : T A ⊢ E : T ∀i . (A[Ii : Ti] ⊢ Ehi : T) ∀i . (A ⊢ Ti : Te ℰ) ---------------------------------------------------- [handle-expr-1] A ⊢ E handle T1 I1 → Eh1, ..., Tn In → Ehn end : T A ⊢ E : T ∀i . (A[Ii : Ti] ⊢ Ehi : T) ∀i . (A ⊢ Ti : Te ℰ) A ⊢ Ehm : T ----------------------------------------------------------------- [handle-expr-2] A ⊢ E handle T1 I1 → Eh1, ..., Tn In → Ehn, else Im → Ehm end : T A ⊢ Ip : ⊥ → Tb --------------- [call-empty] A ⊢ Ip () : Tb A ⊢ Ip : (T1, ..., Tn) → Tb ∀i . (A ⊢ Ei : Ti) --------------------------- [call-args] A ⊢ Ip (E1, ..., En) : Tb A[I1 : T1, ..., In : Tn] ⊢ Eb : Tb ------------------------------------------------------------ [fun-decl] A ⊢ ƒ If: Tb (I1: T1, ..., In: Tn) ← Eb : (T1, ..., Tn) → Tb Notes ===== [case-expr-3] is a specialization of the normal case-expr type checking rules for tag discriminated union types. This rule states that in a case-expr without a default branch, all possible constructors of the union must be specified. All branches must return the same type. [case-expr-4] is similar, except that it allows for a default expression to be evaluated if not all constructors are listed as branch possibilities. This default expression must be of the same type as the return value of all the branches. Type Equivalence Rules ====================== T = T [reflexive-=] T1 = T2 ------- [symmetric-=] T2 = T1 T1 = T2 ; T2 = T3 ----------------- [transitive-=] T1 = T3 T1 = T2 ----------------- [list-=] T1 list = T2 list τ I ← T list = τ Inew ← [Inew/I]T list [list-α] { Nl1, ..., Nln } = { Nr1, ..., Nrn } ∀i, j . (Nlj = Nri ⇒ Tlj = Tri) ---------------------------------------------------------- [record-=] {Nl1 : Tl1, ..., Nln : Tln } = {Nr1 : Tr1, ..., Nrn : Trn} τ I ← { N1, ..., Nn } = τ Inew ← [Inew/I]{ N1, ..., Nn } [record-α] { Nl1, ..., Nln } = { Nr1, ..., Nrn } ∀i, j . (Nlj = Nri ⇒ Tlj = Tri) ---------------------------------------------------------- [union-=] {Nl1 : Tl1, ..., Nln : Tln } = {Nr1 : Tr1, ..., Nrn : Trn} Notes ===== [list-=] states that two list types are equivalent if the underlying types are equivalent. That is, the following list types are equivalent: * integer list = integer list * string list list = string list list However, the following list types are not: * boolean list ≠ integer list * integer list ≠ integer list list [list-α] states that two list type definitions produce equivalent types even if they have different type names, as long as the underlying types are equivalent subject to [list-=]. That is, the following list definitions are equivalent: * τ lst ← string list = τ str_lst ← string list [record-=] states that two record types are equivalent if they have the same named elements with equivalent types. However, the elements can be listed in any order and still preserve equivalence. That is, the following record types are equivalent: * { name: string } = { name: string } * { make: string, model: string, year: integer } = { make: string, model: string, year: integer } * { name: string, age: integer } = { age: integer, name: string } However, the following record types are not: * { name: string } ≠ { age: integer } * { make: string } ≠ { model: string } * { year: string } ≠ { year: integer } [record-α] states that two record type definitions produce equivalent types even if they have different type names, as long as the underlying types are equivalent subject to [record-=]. That is, the following record definitions are equivalent: * τ car ← { make: string, model: string } = τ vehicle ← { make: string, model: string } [union-=] states that two union types are equivalent if they have the same named elements with equivalent types. However, the elements can be listed in any order and still preserve equivalence. That is, the following union types are equivalent: * τ tree1 ← ∪ { Leaf: integer, Branch: {Left: tree1, Right: tree1} } = τ tree2 ← { Branch: {Left: tree2, Right: tree2}, Leaf: integer} * τ enum1 ← ∪ { Red, Orange, Yellow, Green } = τ enum2 ← { Yellow, Red, Green, Orange } Name Scope ========== Mitchell is a lexically-scoped language with nested modules, providing different namespaces for types/exceptions, module names, and functions/values. Modules may further contain other modules. A module's symbol table exists for the entire lifetime of the program, while the symbol table for a function or decl block only exists while that local block is being type checked. Lexically scoped means that symbol names are resolved against the environment they were defined in, rather than the environment they were called from. At the top level, there is an implicit layer of scope that contains all the base types and identifiers, as well as all modules that are not declared inside another module. This layer also contains the base types - integer, boolean, string, and so on. So, the modules of the standard library would exist inside this top-level scope, as would the modules that make up the main modules of a program. With the exception of a potentially large collection of modules, the top-level environment is very small. The rules for resolving names are unfortunately a little more complicated than they should be. The module system is partially to blame for the complications. The rules are as follows: * The following pieces of syntax create new levels of scope: - modules - decl expressions - functions (for binding formal parameters) - exception handlers (for binding the exception value) - case branches for certain union types (for binding the various parts of the union) - parametric type declarations (for binding type values) * If a "naked identifier" is given (that is, without multiple period-delimited sections indicating module references) then it is assumed to exist somewhere within the lexically innermost module. This environment consists of all the identifiers defined at the module's top level, plus all the modifications made to the environment to take into account any of the above syntax that the reference may occur in, plus the global namespace. This environment is searched from the innermost scope outward to the global scope. At this point if the symbol is not found, it is an unresolved reference. * If an identifier is given that consists of period-delimited names, it must be referring to a symbol defined in a module. Since modules may nest and could have the same name as a lexical parent module, and due to complications from the absorb keyword, modules are searched in the same method as regular identifiers. * Within a block of declarations, new type and function declarations may reference other types and functions that are not yet defined. This is to allow recursive declarations for tree structures, recursive functions, and so forth. However, restrictions are placed on how this works. A group of type declarations is defined as several type declarations with no value, function, module, or exception declarations. A group of function declarations is defined similarly. In a group of type declarations, any type may refer to any other type in the group, to allow for recursive definitions. However, all types must be resolved at the end of the group. Similarly, in a group of function declarations, any function may refer to any other function in the group. * Values may not make reference to values that are not yet defined. * A name must be unique within its defining scope. However, a name may override one defined in a more outer level of scope, as long as it does not override one of the types defined in the global namespace. * Module references may not form cycles. That is, module A may refer to module B, but only if module B does not also refer to module A. This could be considered a bug, though this is standard in many languages and may not have any good solution.