programming languages (notes)
Assembly / assembler language: A language in which each line corresponds directly to a machine instruction. Assembly spares the programmer from having to work directly in binary code when they want to manually create machine code.
Each processor instruction set requires its own assembly language, e.g. x86 assembly differs from MIPS assembly. Moreover, assembly syntax is particular to each assembler: x86 assembly code written for GAS, for instance, won’t assemble using MASM.
Assembly language is generally considered neither statically nor dynamically typed because assembly has no conception of types. Everything is just bytes.
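To make "everything is just bytes" concrete, here is a small sketch in Python (not assembly; the four byte values are chosen just for illustration). The same four bytes are read as an integer and as a floating-point number, and nothing in the bytes themselves says which reading is the intended one.

```python
import struct

# Four bytes with no type attached to them.
raw = bytes([0x00, 0x00, 0x80, 0x3F])

# Read them as a little-endian unsigned 32-bit integer...
as_uint32 = struct.unpack("<I", raw)[0]
# ...or as a little-endian 32-bit IEEE 754 float.
as_float32 = struct.unpack("<f", raw)[0]
# ...or simply as four separate small numbers.
as_bytes = list(raw)

print(as_uint32)   # 1065353216
print(as_float32)  # 1.0
print(as_bytes)    # [0, 0, 128, 63]
```

In assembly, which interpretation applies is decided purely by which instruction the programmer chooses to run on those bytes.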
Assembler: A program that translates assembly code into object code. The most popular assemblers for x86 include:
- NASM (Netwide Assembler). Cross-platform; used on Unix and Windows.
- GAS (GNU Assembler). Used on Unix.
- MASM (Microsoft Macro Assembler). Used on Windows.
Object code: Machine instructions, but with symbolic names used as stubs in place of some addresses. To produce an executable, a linker patches in the real addresses.
Linker: A program that patches together object files to produce executables.
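The patching step can be sketched with a toy model in Python. Everything here is made up for illustration (the dictionary layout, the opcode names, the `link` function); real object files and linkers are far more involved. Each "object file" lists the symbols it defines and some code whose operands may still be symbolic names.

```python
# Toy "object files": code is a list of (opcode, operand) pairs.
# A string operand is a symbolic name that has not been resolved yet.
main_obj = {
    "defines": {"main": 0},                       # symbol -> offset within this object
    "code": [("call", "square"), ("halt", None)],
}
lib_obj = {
    "defines": {"square": 0},
    "code": [("mul", None), ("ret", None)],
}

def link(objects):
    """Lay the objects out end to end and patch in real addresses."""
    # Pass 1: decide where each object lands and record every symbol's address.
    addresses = {}
    base = 0
    for obj in objects:
        for name, offset in obj["defines"].items():
            addresses[name] = base + offset
        base += len(obj["code"])

    # Pass 2: replace each symbolic operand with its final address.
    executable = []
    for obj in objects:
        for opcode, operand in obj["code"]:
            if isinstance(operand, str):
                operand = addresses[operand]
            executable.append((opcode, operand))
    return executable

program = link([main_obj, lib_obj])
print(program)
# [('call', 2), ('halt', None), ('mul', None), ('ret', None)]
```

The stub "square" becomes address 2 because the library code is placed right after the two instructions of the main object.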
Dynamic linking: An operating system feature that allows code to be linked into an already running process. This allows many programs to make use of the same library code without wasting storage space and memory. On Linux, object files meant to be dynamically linked against are called shared objects (or shared libraries) and end in .so. On Windows, they are called dynamic-link libraries and end in .dll.
Low-level language: A language that provides precise control of the hardware. Basically a synonym for assembly language. (Some people like to call C and Fortran low-level languages or sometimes mid-level languages, but most often they are lumped in with high-level languages.)
High-level language: A language that abstracts away the details of the hardware to allow greater expressiveness per line of code and greater portability.
Compiler: A program that translates one form of code into another—usually high-level language source code into object code. Unlike with assemblers, this is usually not a simple one-to-one translation, hence the different term.
Interpreter: A program that translates code into action: as the interpreter reads each statement of code, it does what the code says to do rather than producing another form of code. When code is interpreted, usually any kind of linking that needs to be done is done at runtime by the interpreter.
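A minimal sketch of the idea in Python, for a tiny made-up language (the `set`, `add`, and `print` statements are invented for this example). The interpreter reads one statement at a time and immediately does what it says; no other form of code is produced.

```python
def interpret(source):
    """Execute a tiny made-up language, one statement per line."""
    variables = {}
    output = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        if command == "set":        # set NAME NUMBER
            variables[args[0]] = int(args[1])
        elif command == "add":      # add NAME NUMBER
            variables[args[0]] += int(args[1])
        elif command == "print":    # print NAME
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown statement: {line!r}")
    return output

result = interpret("""
set x 2
add x 40
print x
""")
print(result)  # [42]
```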
Virtual machine: Arguably just a glorified term for an interpreter, but usually refers to an interpreter that doesn’t interpret source code but rather interprets some kind of intermediate code that resembles machine instructions. In Java, they call this intermediate code bytecode while, in C#, they call it IL (Intermediate Language) code.
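A rough sketch of such an interpreter in Python. The instruction names (`PUSH`, `ADD`, `MUL`) and the `run` function are invented for this example and do not correspond to real Java bytecode or IL; the point is only that the input resembles machine instructions rather than source text.

```python
def run(bytecode):
    """Execute a list of (opcode, argument) instructions on a stack machine."""
    stack = []
    pc = 0  # "program counter": index of the next instruction
    while pc < len(bytecode):
        op, arg = bytecode[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# Intermediate code for the expression (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
value = run(program)
print(value)  # 20
```

A compiler for the high-level language would be responsible for producing the instruction list; the VM only has to execute it.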
JIT compilation: With Just-In-Time compilation, an interpreter, instead of interpreting a piece of code, compiles it into machine code and then has that machine code execute. The Java VM and CLR do JIT compilation.
Garbage collection (GC): A language feature that automatically deallocates the memory allocated in our program when we’re no longer using it. With automatic garbage collection, we don’t have to worry so much about memory leaks (though it is still possible to create them).
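A small Python sketch of both halves of that statement. It assumes the standard CPython implementation, where `gc.collect()` forces a collection; the `Node` class and the `registry` list are made up for the example. A weak reference is used only as a probe to see whether an object still exists.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

# Two objects that refer to each other, then become unreachable.
a = Node()
b = Node()
a.other = b
b.other = a
probe = weakref.ref(a)    # lets us check on the object without keeping it alive
del a, b                  # no name refers to either object any more
gc.collect()              # ask the collector to run now
collected = probe() is None
print(collected)          # True: the memory was reclaimed for us

# A "leak" under garbage collection: something we forgot about still refers
# to the object, so the collector must assume we are still using it.
registry = []
leaked = Node()
registry.append(leaked)
leak_probe = weakref.ref(leaked)
del leaked
gc.collect()
still_alive = leak_probe() is not None
print(still_alive)        # True: registry keeps the object alive
```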
Type error: Doing something with a piece of data that its type does not allow. Supplying the wrong type of operand to an operator or function is a type error.
Static typing: In a statically-typed language, type errors can be programmatically detected without actually running the program. When the compiler detects a type error, it will abort the compilation and report the problem. This effectively eliminates a whole class of potential bugs.
Dynamic typing: In a dynamically-typed language, type errors cannot be programmatically detected without running the program. Most dynamically-typed languages are strongly typed, meaning that the wrong type of operand supplied to an operation will cause an error. This is desirable: you don’t want your code to blindly continue, oblivious to the type problem.
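For instance, in Python (a dynamically-typed, strongly-typed language), the hypothetical function below contains a type error that goes unnoticed until the faulty line actually executes.

```python
def describe(count):
    if count > 100:
        return "many: " + count   # bug: adding a string and a number
    return "few"

# The buggy line is never reached here, so nothing goes wrong.
first = describe(5)
print(first)  # few

# Only when the buggy line runs is the type error detected.
try:
    describe(500)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```

Because Python is strongly typed, the bad operation raises an error instead of silently producing a nonsense value.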
Polymorphism: When an operation or function accepts invocation with a varied number of arguments and/or with varying types of operands, it is polymorphic. In a dynamically-typed language, we create a polymorphic function by simply testing the number/types of the parameters and branching as we see fit, e.g. ‘if the second argument is a number, do this, otherwise do that’. In a statically-typed language supporting polymorphism, we can declare a function of the same name multiple times but with differing numbers and/or differing types of parameters. Each version of the function need not return the same type as any of the other versions. In effect, the versions are really just separate functions that happen to share the same name.
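The dynamically-typed approach might look like this in Python. The `pad` function is a made-up helper for illustration; it branches on how many arguments it received and on the type of the second one.

```python
def pad(text, fill=None):
    """Hypothetical helper whose behavior depends on its arguments."""
    if fill is None:
        # Called with one argument: pad with a single space.
        return " " + text + " "
    if isinstance(fill, int):
        # Second argument is a number: pad with that many spaces.
        return " " * fill + text + " " * fill
    # Otherwise: use the second argument itself as the padding.
    return str(fill) + text + str(fill)

print(repr(pad("hi")))       # ' hi '
print(repr(pad("hi", 3)))    # '   hi   '
print(repr(pad("hi", "*")))  # '*hi*'
```

In a statically-typed language with overloading, these three branches would instead be three separately declared functions that share the name `pad`.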
Weak typing: A weakly-typed language allows for arbitrary manipulation of any data. In this sense, the data types are open for violation, hence “weak”. Assembly and C are the primary examples of weakly-typed languages.
Strong typing: A strongly-typed language only allows manipulation of a piece of data through operations intended for its type. In dynamically-typed languages, strong typing requires checking the operand types of every operation each time it executes. This overhead degrades performance. In statically-typed languages, the compile-time type-checking makes runtime checking largely unnecessary.
(While C is statically-typed, it is still weakly-typed because C allows treating any data as just bytes, e.g. through pointer casts. Once you can arbitrarily muck with bits, the static typing system can’t stop you from doing whatever you want to data, whether “type valid” or not.)
Paradigm: A fundamental approach or way of thinking about problems and code.
Imperative: The paradigm in which we freely modify state.
Functional: The paradigm in which we avoid modifying state as much as possible.
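The contrast between the two can be sketched with the same small task written both ways in Python (the function names are made up for the example).

```python
from functools import reduce

def total_imperative(prices):
    """Imperative style: a variable is repeatedly modified."""
    total = 0
    for price in prices:
        total += price   # state changes on every pass through the loop
    return total

def total_functional(prices):
    """Functional style: no variable is ever reassigned."""
    return reduce(lambda acc, price: acc + price, prices, 0)

print(total_imperative([1, 2, 3]))  # 6
print(total_functional([1, 2, 3]))  # 6
```

Both produce the same answer; the difference is whether the code gets there by updating state step by step or by combining values without modifying anything.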
Procedural: The paradigm in which code is composed of functions (or “procedures”, or whatever you want to call them). Procedural code can be functional but is most commonly imperative.
Object-oriented programming: The paradigm in which code is organized around the definition of data types. Object-oriented code can be functional but is most commonly imperative.
Syntax and semantics: At heart, a language is a set of syntactical and semantic rules. Syntax refers to the rules governing the source text while semantics refers to the meaning behind the source text: that comments in Pigeon begin with # and run to the end of the line is a syntax rule; that functions in Pigeon must be called with a fixed number of arguments is a semantic rule.
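The same distinction can be seen in Python, used here as a stand-in for Pigeon (the `classify` helper is made up for the example). Text that breaks a syntax rule is rejected before any of it runs, while a call with the wrong number of arguments is well-formed text that fails only because of what it means.

```python
def classify(source):
    """Report whether a piece of Python source breaks a syntax or semantic rule."""
    try:
        code = compile(source, "<example>", "exec")   # checks syntax only
    except SyntaxError:
        return "syntax error"
    try:
        exec(code, {})                                # actually runs the code
    except TypeError:
        return "semantic error"
    return "ok"

good = "def f(a, b):\n    return a + b\nf(1, 2)  # a comment\n"
bad_syntax = "def f(a, b:\n    return a + b\n"            # missing closing parenthesis
bad_meaning = "def f(a, b):\n    return a + b\nf(1)\n"    # too few arguments

print(classify(good))         # ok
print(classify(bad_syntax))   # syntax error
print(classify(bad_meaning))  # semantic error
```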
Library: A body of code that provides commonly useful functionality.
Idiom: A small-scale pattern that frequently occurs in code of a particular language.
Tool: Any program that helps software development.
Debugger: A tool that allows you to pause execution of your code and see what is happening in memory as your code executes.
Profiler: A tool that helps you measure performance of your code, both in terms of execution time and memory use.
Version control: A tool that helps developers keep track of changes in their code as it is developed and helps developers merge their work together.
IDE (Integrated Development Environment): A tool that ties together a text editor and other tools into one graphical interface. In an IDE, functionality usually done at the command line, such as running or debugging your code, can be done at the push of a button.
C: A static but weakly-typed language used when control and efficiency are big concerns.
C++: Like C, but with object-oriented features added.
Objective-C: Like C, but with object-oriented features added. Favored by Apple for Macintosh and iPhone development.
Java: A popular language with a mix of static and dynamic typing and a bias towards object-oriented programming. One of the most-used languages today, Java is compiled into “bytecode”, which is then executed by a VM.
C#: A Java-like language introduced by Microsoft.
Visual Basic: An older language from Microsoft. Today basically just an alternative syntax for C#. While still more widely used than C#, VB is not as well regarded.
Perl: A dynamic language popularized in the 1990s. Now on the decline as similar languages take its place.
Python: Like Perl, but cleaner and more elegant.
Ruby: Also like Perl, but cleaner and more elegant (though not so much as Python, depending on whom you ask).
PHP: A dynamic language almost exclusively used for web applications. PHP is considered ugly by most, but it filled the right niche at the right time to become popular.
Fortran: The first major programming language. Still used today by scientists and engineers.
Lisp: The second major programming language, often considered to be well ahead of its time. Has never been widely used, but has fervent advocates.
Efficiency: Interpretation, dynamic typing, and automatic garbage collection tend to introduce overhead at runtime, so the fastest languages are generally statically typed and compiled to machine code.
Portability: Portability first hinges on the ability to translate your source into something runnable on all of your target platforms, so assembly code is the antithesis of portable. Portability also requires that the libraries you use be available on all target platforms. Lastly, portability requires that the target platforms all support the needed capabilities.
Functional languages: Some languages are designed with the functional paradigm in mind. These include Haskell, Scala, ML, and F#. None of these are terribly popular, but they have their adherents.
Logic language: The only notable language designed for the logic paradigm is called Prolog. Pretty much only academics have ever used Prolog.
Shell language: A language designed primarily for interactive use at a command prompt. The most common shell on Unix is called Bash, which we’ll cover in a later unit.
Scripting language: Shell languages and dynamic languages have often been called scripting languages because they are commonly used to run and orchestrate other programs. Programs written in shell languages, especially, are called scripts.
Data language: A data language is not a programming language, per se, but rather a human-readable way to textually express data. XML, for instance, is a standard syntax for expressing hierarchical data of all kinds.
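As a small illustration, here is a made-up XML document describing two languages, read with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`languages`, `language`, `name`, `typing`) are invented for the example; XML itself only fixes the syntax, not the vocabulary.

```python
import xml.etree.ElementTree as ET

# Hierarchical data expressed as text.
document = """
<languages>
  <language name="C"><typing>static</typing></language>
  <language name="Python"><typing>dynamic</typing></language>
</languages>
"""

root = ET.fromstring(document.strip())

# Walk the hierarchy: each <language> element has a name and a <typing> child.
typing_by_name = {
    lang.get("name"): lang.findtext("typing")
    for lang in root.findall("language")
}
print(typing_by_name)  # {'C': 'static', 'Python': 'dynamic'}
```

Note that the XML text does nothing by itself; a program written in some programming language has to read it and decide what to do with the data.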
Query language: A language for making requests to a database.
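SQL is the best-known example. The sketch below uses Python's built-in `sqlite3` module with a throwaway in-memory database; the table name, columns, and rows are made up for the example. The strings passed to `execute` are written in the query language, while the surrounding code is ordinary Python.

```python
import sqlite3

# A temporary database that lives only in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (name TEXT, typing TEXT)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?)",
    [("C", "static"), ("Java", "static"), ("Python", "dynamic")],
)

# The request itself: "give me the names of all statically-typed languages".
rows = conn.execute(
    "SELECT name FROM languages WHERE typing = ? ORDER BY name",
    ("static",),
).fetchall()
names = [row[0] for row in rows]
conn.close()

print(names)  # ['C', 'Java']
```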
Domain-specific language: A language intended to solve a limited class of problems in a limited domain. (Query languages fall into this category.)
Graphical language: A language in which the code is expressed graphically, either whole or in part, instead of textually. Some domain-specific languages are successfully expressed graphically, but graphically expressing code of a general-purpose language is probably just a bad idea.