the Java language (notes, part 1)

Java was created by Sun Microsystems in the mid-1990s. Within a few years, it became one of the most widely used programming languages.

Java is an imperative, object-oriented language.

Java’s syntax strongly resembles C, but semantically, it is very different. While Java is usually considered a static language, it has some elements of dynamism.

Unlike C, Java has no pointers, and Java code is compiled to bytecode to be executed by an interpreter called the JVM (Java Virtual Machine). Like most interpreted languages, Java uses automatic garbage collection, and Java has an exception mechanism.

Java’s standard library is much more fully-featured than what you expect in C. It is so large, in fact, that Java comes in three different “editions” that differ in the completeness of their included libraries:

  • Standard Edition: the default used on most computers
  • Enterprise Edition: everything in Standard but with extra libraries having to do with large-scale networking
  • Micro Edition: stripped down so as not to waste space with unneeded stuff on small devices, like cell phones

The current version of Java is known as Java 6. Java 7 should be released sometime in 2010. Previous versions of Java have ridiculously confusing names.

The core concept in most object-oriented languages, including Java, is what’s called a class. Like a struct in C, a class is a data-type definition, but unlike a struct, classes contain not just data members (called fields) but also function members (called methods). A class is like a blueprint for creating pieces of data; these pieces of data are called objects or instances of their class.

A class is written with this syntax:

class name {members}

The members can be written in any order, but the usual convention is to put fields before methods. Also by convention, class names in Java should begin with a capital letter.

A field looks basically like a C variable declaration:

type name;

A method looks very much like a C function definition:

return-type name(args) {statements}

The theory behind creating classes rather than just structs and functions is that the data types in our code and the operations we use upon those types should be encapsulated together: only the methods of a class should touch the fields of its instances; methods of other classes should only interact with class X’s instances by using X’s methods, not by accessing X’s fields directly.

§

To get the value of an object’s field, use the dot operator:

object.field

…where object is an expression evaluating into an object and field is an identifier naming a field of that object. Similarly, to invoke a method:

object.method(args)

…where object is an expression evaluating into an object, method is an identifier naming a method of that object, and args is a comma-separated list of arguments.

Because the dot operator expects any object expression before it, the dot operator can be chained:

w.x().y.z()

With explicit parentheses, this is:

((w.x()).y).z()

When invoking object.method(args), the relationship between object and method is this: inside the invocation of the method, the object is assigned to a special reference named this. So when calling x.y(), the reserved word this in the body of method y() refers to the object from x for the duration of the invocation.

A variable or field is declared with the syntax:

type name;

A variable of a class type is a reference variable—it holds an address of an instance:

Cat c;    // declares a variable which holds a reference to a Cat object; no Cat is created (a field starts out null; a local variable must be assigned before use)

To create an actual instance, use the new operator:

new type(args)
Cat c = new Cat();   // create reference c with a new Cat instance as its initial value

The arguments are the arguments to pass to the constructor. A constructor is a method invoked when an object is created. The point of a constructor is to do any setup work appropriate for instances of the class.

A constructor is written like a method but has no return type and shares its name with the class:

class Cat {
    Cat() {…}    // constructor for the class Cat
}

Inside a constructor, the object being constructed is referred to by the special reference this.
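
As a rough sketch of fields, methods, constructors, the dot operator, and this working together (the fields name and lives and the method loseLife are made up for illustration, and the static main method is only there so the sketch can be run):

```java
class Cat {
    String name;                         // field
    int lives;                           // field

    Cat(String name) {                   // constructor: no return type, named after the class
        this.name = name;                // "this" is the object being constructed
        this.lives = 9;
    }

    Cat loseLife() {                     // method; returns the same object so calls can be chained
        this.lives = this.lives - 1;     // "this" is the object the method was invoked on
        return this;
    }
}

class CatDemo {
    static int livesAfterTwoMishaps() {
        Cat c = new Cat("Mittens");      // new creates the instance and runs the constructor
        return c.loseLife().loseLife().lives;   // same as ((c.loseLife()).loseLife()).lives
    }

    public static void main(String[] args) {
        System.out.println(livesAfterTwoMishaps());   // prints 7
    }
}
```

Returning this from loseLife is just one way to make the chain possible; any expression that evaluates to an object can sit before a dot.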

§

When class A inherits from class B, A is the ‘child’ of B, and B is the ‘parent’ of A.

The members of a parent are automatically included in its children, so a class contains all the members defined in itself plus everything its parent has.

Each class in Java must inherit from one—and only one—other class. The only exception is the built-in class Object, which is the ancestor of all other classes. Effectively, the classes of a Java program form a tree with the class Object at the top.

A class denotes its parent with the extends clause:

class Terry extends Ben {…}   // the parent of Terry is Ben

By default, a class inherits from Object:

class Bernard {…}          // the parent of Bernard is Object

When one class inherits from another, the child should be a more specific kind of thing than its parent. For instance, a Fiat is a kind of car, so it makes sense to have a class Fiat which inherits from a class Car.

A common mistake is to use inheritance when you should use composition: a SteeringWheel is not a kind of Car, nor is a Car a kind of SteeringWheel, so one should not inherit from the other. Rather, a Car has a SteeringWheel, so it makes sense if a Car class has a SteeringWheel field. A Car is composed of a SteeringWheel.
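
The difference might look like this in code. The class names come from the text; the fields angle and model and the method steer are assumptions added for the example:

```java
class SteeringWheel {
    int angle = 0;                       // degrees (hypothetical field)
}

class Car {
    SteeringWheel wheel = new SteeringWheel();   // composition: a Car has a SteeringWheel

    void steer(int degrees) {
        wheel.angle = wheel.angle + degrees;
    }
}

class Fiat extends Car {                 // inheritance: a Fiat is a kind of Car
    String model = "500";
}

class CarDemo {
    static int steerFiat() {
        Fiat f = new Fiat();
        f.steer(15);                     // steer and wheel are inherited from Car
        return f.wheel.angle;
    }

    public static void main(String[] args) {
        System.out.println(steerFiat());     // prints 15
    }
}
```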

It’s generally a bad idea to have many levels of inheritance. You generally shouldn’t create classes which have more than a few ancestors.

§

When class A is a descendant of class B, instances of type A are considered suitable substitutes where a B instance is expected:

Mammal m = new Hamster(); // legal if Hamster is a descendant of Mammal

What’s really going on here is an implicit “upcast” from Hamster to Mammal:

Mammal m = (Mammal) new Hamster();

Casting in the other direction, “downcasting”, cannot be left implicit:

Mammal m = new Hamster();
Hamster h = (Hamster) m;

A downcast not only makes the compiler happy, it represents a type check at runtime. The cast to Hamster here requires a check to make sure the Mammal reference actually holds a Hamster; if not a Hamster, the downcast throws an exception.
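
A small sketch of that runtime check, assuming empty classes Mammal, Hamster, and Cat. The helper downcastSucceeds is hypothetical; it catches the ClassCastException only to make the outcome visible:

```java
class Mammal {}
class Hamster extends Mammal {}
class Cat extends Mammal {}

class CastDemo {
    static boolean downcastSucceeds(Mammal m) {
        try {
            Hamster h = (Hamster) m;     // checked at runtime against the actual object
            return true;
        } catch (ClassCastException e) {
            return false;                // m did not hold a Hamster
        }
    }

    public static void main(String[] args) {
        Mammal a = new Hamster();        // implicit upcast
        Mammal b = new Cat();            // implicit upcast
        System.out.println(downcastSucceeds(a));    // prints true
        System.out.println(downcastSucceeds(b));    // prints false
        System.out.println(b instanceof Hamster);   // prints false
    }
}
```

The instanceof operator on the last line asks the same question without risking an exception.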

§

A class can override a method which it inherits, meaning it can create its own method of the same name, the same return type (or, since Java 5, a subtype of it), and the same number, types, and order of parameters.

When invoking a method, the override invoked is determined by the type of the object, not the compile-time type of the expression:

Mammal m = new Cat();
m.eat();

Assume Mammal defines a method eat and Cat overrides it. The above code will invoke eat of Cat because the reference m will hold a Cat object when m.eat() is invoked.
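
Here is a runnable version of the same idea. As an assumption made purely so the result can be observed, eat returns a String here rather than nothing:

```java
class Mammal {
    String eat() { return "Mammal eats"; }
}

class Cat extends Mammal {
    String eat() { return "Cat eats"; }  // overrides Mammal's eat
}

class DispatchDemo {
    static String feed(Mammal m) {
        return m.eat();                  // which eat runs depends on the object m holds
    }

    public static void main(String[] args) {
        System.out.println(feed(new Cat()));      // prints Cat eats
        System.out.println(feed(new Mammal()));   // prints Mammal eats
    }
}
```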

An interface, in Java, is a contract that lists methods. The interface doesn’t specify an actual body for these methods but rather just their names, their return types, and the number, order, and types of their arguments:

interface Philip {
    void foo(Hamster h, Cat c);
    void bar();
}

A class that implements an interface is required to have an actual method for each method listed in the interface:

class Janice implements Philip {
    public void foo(Hamster h, Cat c) {…}   // interface methods are implicitly public, so the implementations must be public
    public void bar() {…}
}

You can use an interface as a type for variables or for a method return type. Any object whose class implements an interface is considered a valid instance of that interface:

Philip p = new Janice();

For an expression which has the compile-time type of an interface, only the methods of that interface can be invoked via that expression.

Philip p = new Janice();
p.bar(); // OK
p.ack(); // if ack is defined in Janice but not Philip, this is illegal

If a class implements an interface, all of its descendants are considered to implement it as well.
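
A compact sketch of these interface rules, using made-up names (Noisy, Duck, Mallard) rather than the ones above:

```java
interface Noisy {
    String noise();                      // no body: just name, return type, and parameters
}

class Duck implements Noisy {
    public String noise() { return "quack"; }   // must be public to implement the interface
    String swim() { return "paddle"; }          // not part of Noisy
}

class Mallard extends Duck {}            // implements Noisy too, by way of Duck

class InterfaceDemo {
    static String sound(Noisy n) {
        return n.noise();                // only Noisy's methods can be invoked through n
    }

    public static void main(String[] args) {
        Noisy n = new Mallard();
        System.out.println(sound(n));                // prints quack
        // n.swim();                                 // would not compile: swim is not in Noisy
        System.out.println(((Duck) n).swim());       // prints paddle, after a downcast
    }
}
```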

§

A method invocation resolves to a particular method by these rules:

Given object.method(args), where object is an expression evaluating into an object, method is an identifier naming the method, and args is a comma-separated list of arguments:

  1. The compiler checks the compile-time type of object to make sure its class actually has a method of that name.
  2. Because the runtime type of the object may not be the same as the compile-time type, it’s left until runtime to decide whether to invoke that class’s method or an override of the method in some subclass thereof.
  3. The same holds when the object expression is the special reference this (written out or implied): the call is still resolved by the runtime type of the object, so an override in a descendant is invoked if there is one.
  4. If the object expression is the reserved word super, Java invokes the inherited method, passing this as the object.
super.foo()  // invoke inherited method foo, passing this as the object

So it’s possible, if class X inherits a method foo but overrides foo, that the overridden foo still gets invoked with an X instance as the object. This is done explicitly using super; a plain call such as foo() or this.foo(), even one made from an inherited method, reaches X’s override.
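
A sketch of how calls through this and super resolve in practice. The methods name, describe, and parentName are hypothetical. A call through this is dispatched on the runtime type of the object like any other call, while super skips the current class’s override:

```java
class Mammal {
    String name() { return "mammal"; }

    String describe() {
        return "I am a " + this.name();  // dispatched at runtime: may reach an override
    }
}

class Cat extends Mammal {
    String name() { return "cat"; }      // overrides Mammal's name

    String parentName() {
        return super.name();             // invokes Mammal's name, with this as the object
    }
}

class ResolveDemo {
    public static void main(String[] args) {
        Cat c = new Cat();
        System.out.println(c.describe());     // prints I am a cat
        System.out.println(c.parentName());   // prints mammal
    }
}
```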

§

Classes and interfaces are known as reference types. Java also has 8 primitive types:

  • byte       1-byte signed integer
  • short      2-byte signed integer
  • int        4-byte signed integer
  • long       8-byte signed integer
  • char       2-byte unsigned integer
  • boolean    true and false
  • float      single-precision floating-point
  • double     double-precision floating-point

If you really need arbitrary-precision, arbitrary-magnitude numbers, use the standard library classes BigInteger and BigDecimal.

Unlike a reference-type variable, a primitive-type variable is a value variable: the variable directly holds the data, not an address pointing to the data.

The == operator used with reference-type operands tests for identity:

Cat fluffy;
Cat mittens;
// (fluffy == mittens) returns true if fluffy and mittens reference the very same object

In contrast, == used with primitive-type operands tests for equality:

int x;
short y;
// (x == y) returns true if x and y hold equivalent numeric values

Finally, casting a reference type never changes the object: it merely satisfies the compiler about the object’s acceptability in that context (backed, for a downcast, by a runtime check). Casting a primitive actually produces a new value:

int x = 356;
float f = (float) x;      // the cast produces a float equivalent of the int value in x

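When casting primitives, casts from certain types to certain other types can be left implicit because the target type can represent the full range of the source type. For instance, a cast from a byte to an int can be left implicit because all possible byte values are proper int values. (The magnitude always survives such a cast, though an int or long converted to float, or a long converted to double, can lose low-order digits of precision.)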
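
To see identity, equality, and primitive casts side by side, here is a short sketch. The helpers sameObject and narrow are hypothetical, as is the variable alias:

```java
class Cat {}

class PrimitiveDemo {
    static boolean sameObject(Cat a, Cat b) {
        return a == b;                   // identity: do both references hold the same object?
    }

    static byte narrow(int n) {
        return (byte) n;                 // explicit narrowing cast keeps only the low 8 bits
    }

    public static void main(String[] args) {
        Cat fluffy = new Cat();
        Cat mittens = new Cat();
        Cat alias = fluffy;              // a second reference to the same object
        System.out.println(sameObject(fluffy, mittens));   // prints false
        System.out.println(sameObject(fluffy, alias));     // prints true

        int x = 356;
        short y = 356;
        System.out.println(x == y);      // prints true: equal numeric values
        long wide = x;                   // widening cast, left implicit
        float f = (float) x;             // produces a new float value
        System.out.println(wide);        // prints 356
        System.out.println(f);           // prints 356.0
        System.out.println(narrow(x));   // prints 100, because 356 does not fit in a byte
    }
}
```

The last line shows why casts to a smaller type must be written out: the value can change.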

§

A static member is a member of a class which is really not a member at all: a static’s only real relationship to the class is that the class serves as its namespace. So a static field is really just a global variable, and a static method is really just a plain old function.

A static field, unlike an “instance field”, exists singly and independently of any instances of the class: there is always just one variable per static field, and it is not attached to any instance; in fact, a static field exists whether you create any instances of the class at all.

A static method, unlike an “instance method”, has no object passed to the special reference this when it is invoked; this has no meaning in a static method.

To refer to a static method or field, use the classname with the dot operator:

Cat.foo       // the static field foo of the class Cat
Cat.bar()     // the static method bar of the class Cat

Confusingly and pointlessly, Java allows us to refer to static methods and fields via any expression with the compile-time type of the appropriate class. So for instance, assuming a static field foo of Cat:

c.foo         // if the reference c is of type Cat, this refers to Cat.foo
x.y().foo     // if the method y() is declared to return a Cat, this refers to Cat.foo

Also confusing and pointless, static members are “inherited” by descendant classes. This just means we can access the inherited statics via the name of the descendant classes and via expressions of those types. So, say, if Mammal has a static field ack, then so does its descendant Hamster:

Hamster.ack          // the static field ack inherited by Hamster from Mammal
Mammal.ack           // the very same field

Be clear, however, that these both refer to the very same field.

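Static methods can be redefined in a descendant class, but the new method is really just a separate function that happens to share the same name as the inherited one (the Java spec calls this hiding rather than overriding): which one runs is decided by the compile-time type, not the runtime type.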

To keep things simple and clear, I strongly advise you to always just refer to statics via the name of the class in which they are defined.
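
As an illustration, here is a static field used as a program-wide counter. The field count and the method howMany are assumptions for this sketch:

```java
class Mammal {
    static int count = 0;                // one variable for the whole program, not one per instance

    Mammal() {
        Mammal.count = Mammal.count + 1; // every construction bumps the single shared counter
    }

    static int howMany() {
        return Mammal.count;             // no "this" is available in a static method
    }
}

class Hamster extends Mammal {}

class StaticDemo {
    public static void main(String[] args) {
        new Mammal();
        new Hamster();                   // runs Mammal's constructor as well
        System.out.println(Mammal.howMany());   // prints 2
        System.out.println(Hamster.count);      // prints 2: the very same field as Mammal.count
    }
}
```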

Reference types (classes and interfaces) are organized into namespaces called packages. At the top of each source file, you declare to which package the classes and interfaces of the file belong:

package hippo;    // the classes and interfaces of this file belong to the package hippo

Package names can have dots in them:

package goat.lemur;    // the classes and interfaces of this file belong to the package goat.lemur

In the terminology of the Java spec, goat.lemur refers to lemur, a “subpackage” in the package goat. Like with directories and subdirectories, though, there’s no special relationship between the stuff in lemur and goat. It’s simplest just to think of goat.lemur as a wholly distinct package that happens to have a dot in its name.

When using the name of a class or interface, its full name is prefixed by its package, e.g.:

package.class

So the full name of the class Cat in the package goat.lemur is:

goat.lemur.Cat

We only have to use the full names of classes and interfaces from other packages, e.g. when in a file of package A, we have to use the full name of classes from packages B and C but not of package A.

Because having to use full names is annoying and ugly, you can “import” individual classes and interfaces from other packages such that you don’t have to write their full names. Import statements go at the top of the file after the package statement:

package hippo;
import goat.lemur.Cat;
import goat.lemur.Dog;

With these imports, we don’t have to write the full names of Cat and Dog when we use them in this file even though they belong to another package.
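
The same thing can be tried with the standard library. This sketch imports java.math.BigInteger but deliberately spells out java.math.BigDecimal in full. The class ImportDemo is hypothetical, and the package statement is left out so the file can sit anywhere:

```java
// A package statement, if there were one, would come first, before the imports.
import java.math.BigInteger;             // imported: usable by its short name below

class ImportDemo {
    static String sum() {
        BigInteger a = new BigInteger("12345678901234567890");
        java.math.BigDecimal b = new java.math.BigDecimal("0.5");   // not imported: full name needed
        return a.add(BigInteger.ONE).toString() + " " + b.toString();
    }

    public static void main(String[] args) {
        System.out.println(sum());       // prints 12345678901234567891 0.5
    }
}
```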

§

The package java.lang contains standard library classes which are integral to the language. The Object class, for instance, is in java.lang.

Another key class is java.lang.String. A String in Java is a heap-allocated object, and it includes a few dozen useful methods. None of the String methods mutate strings, so strings in Java are immutable. Instead of mutating strings, we produce new ones based upon existing strings:

String s = "Thunderdome";
s = s.toUpperCase();  // creates new String "THUNDERDOME" and assigns it to reference s
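
A slightly fuller sketch makes the immutability visible: a second reference t to the original string is unaffected when s is reassigned. The class StringDemo and the helper shout are hypothetical:

```java
class StringDemo {
    static String shout(String s) {
        return s.toUpperCase();          // returns a new String; the argument is left untouched
    }

    public static void main(String[] args) {
        String s = "Thunderdome";
        String t = s;                    // t references the same String object as s
        s = shout(s);                    // s now references a different, new String
        System.out.println(s);           // prints THUNDERDOME
        System.out.println(t);           // prints Thunderdome
    }
}
```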

The visibility modifiers applied to fields, methods, constructors, classes, and interfaces determine where the names of those things can be used in code:

  • public        (visible everywhere)
  • protected*    (visible in the same package and in any subclass)
  • default       (visible in the same package)
  • private*      (visible in the same class)

Default visibility is denoted by the absence of the modifier reserved words public, protected, and private.

* Classes and interfaces can only be public or default; protected and private apply only to members.

By restricting visibility of some members, the compiler helps us enforce encapsulation: when the instances of our class are used, the compiler only allows the members of our choosing to be invoked/referred to by name outside of the class. This helps ensure that the internal business of our class is not interfered with by the rest of the code.
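
For example, a class might keep a field private and expose only the methods that maintain its rules. The class Account and its members are made up for this sketch:

```java
class Account {
    private int balance = 0;             // visible only inside Account

    public void deposit(int amount) {
        if (amount > 0) {                // the class enforces its own rule
            balance = balance + amount;
        }
    }

    public int getBalance() {
        return balance;
    }
}

class VisibilityDemo {
    static int demo() {
        Account a = new Account();
        a.deposit(50);
        a.deposit(-20);                  // ignored by deposit
        // a.balance = -1;               // would not compile: balance is private
        return a.getBalance();
    }

    public static void main(String[] args) {
        System.out.println(demo());      // prints 50
    }
}
```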

§

To overload a method is to create multiple methods of the same name in the same class. In truth, these methods of the same name have no special relationship except they happen to share the same name. In order for the compiler to distinguish a call to one overload from another, methods of the same name can’t have the same number, types, and order of parameters. So for example, say we have in a single class these methods:

void foo() {…}                             // OK
int foo(String x) {…}                      // conflict
Cat foo(Hamster h, char x) {…}             // OK
short foo(String a) {…}                    // conflict

The names of the parameters and the method return types are irrelevant, but we can’t have two overloads of foo which both expect a single parameter of type String. The problem is that this conflict would lead to an ambiguity when we invoke foo with a single String argument:

x.foo("avast")

Does this invoke the foo returning an int or the foo returning a short? The compiler can’t know, so the compiler will object if you create such ambiguous overloads in your classes.

The compiler tries to find the best matching overload based on the compile-time types of the arguments. The precise rules for this are actually quite complicated because of the ability to pass arguments which don’t precisely match the parameters (e.g. passing a Cat argument to a Mammal parameter). When in doubt, you can always just explicitly cast your arguments to make sure the compiler picks the overload you have in mind.

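Be clear on the distinction between overloading and overriding. When overriding, the number, types, and order (but not the names) of the parameters must exactly match, and the return type must match as well (since Java 5, an override may narrow a reference return type to a subtype). So if in class Cat we inherit void foo() but define in Cat void foo(int a), we’re defining an overload, not an override. If we inherit void foo() but define in Cat boolean foo(), the compiler rejects it: the parameters match, so it can’t be a separate overload, and the return type is incompatible, so it can’t be an override.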

In the end, overloading methods doesn’t add any real capability to Java. It simply allows us to give closely related methods the same name for stylistic purposes.
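
The point about compile-time types can be seen in a small sketch. The overloads are static here only to keep the example short; that is an assumption of the sketch, not a requirement of overloading:

```java
class Mammal {}
class Cat extends Mammal {}

class OverloadDemo {
    static String foo()         { return "foo()"; }
    static String foo(Mammal m) { return "foo(Mammal)"; }
    static String foo(Cat c)    { return "foo(Cat)"; }

    public static void main(String[] args) {
        Cat c = new Cat();
        Mammal m = c;                            // same object, but compile-time type Mammal
        System.out.println(foo());               // prints foo()
        System.out.println(foo(c));              // prints foo(Cat)
        System.out.println(foo(m));              // prints foo(Mammal): chosen at compile time
        System.out.println(foo((Cat) m));        // prints foo(Cat): the cast picks the overload
    }
}
```

Contrast this with overriding, where the runtime type of the object decides.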

§

Constructors can be overloaded, e.g. the class Cat can have three constructors:

Cat(int a) {…}
Cat(boolean b) {…}
Cat() {…}

Unlike overloading of methods, this is actually useful because it’s the only way we can have multiple constructors for one class.

In a constructor, you can invoke this in the first line to invoke another overload of the constructor. You can also invoke super in the first line to invoke a constructor of the parent. In fact, the first line of a constructor is always either this or super (never both).

When an object is created, the constructors of its ancestors should run before its own constructor, in order from Object on down the chain of inheritance. To enforce this, Java has rules about constructors:

  1. Constructors are invoked only via new, this, or super
  2. Calls to this and super can only occur in the first line of a constructor, nowhere else.
  3. The first line is always this or super, whether written out or implied.
  4. If the first line as written is neither this nor super, the constructor implicitly starts with a call to super with no arguments.
  5. A this call cannot be recursive. The compiler will detect recursive uses of this and abort compilation.
  6. A return statement in a constructor is always written just return; . If omitted, the last line of a constructor is always implicitly return;

When you don’t write any constructors in a class, Java includes a default constructor which takes no arguments and does nothing except invoke super.
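
To watch the order in which constructors run, this sketch has each constructor append its name to a String field. The field log is hypothetical:

```java
class Mammal {
    String log = "";

    Mammal() {
        // implicit super() call to Object's constructor happens here
        log = log + "Mammal() ";
    }
}

class Cat extends Mammal {
    Cat(int lives) {
        // no this or super written, so super() with no arguments is implied
        log = log + "Cat(int) ";
    }

    Cat() {
        this(9);                         // must be the first line; runs Cat(int) first
        log = log + "Cat() ";
    }
}

class ConstructorDemo {
    public static void main(String[] args) {
        System.out.println(new Cat().log);    // prints Mammal() Cat(int) Cat()
    }
}
```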

§

Like JavaScript, Java has an exception mechanism. Exception objects in Java must be of type java.lang.Throwable or some descendant thereof.

Throwable has two children, Error and Exception, and Exception has a child RuntimeException:

  • Error:  something has gone wrong in the JVM itself
  • Exception: something has gone wrong in your code
  • RuntimeException: your code has violated a rule of Java (a rule which the compiler couldn’t enforce and so leaves up to the runtime to enforce)

When you create your own exception types, you almost always make them descendants of Exception.

Any exception which is not an Error or RuntimeException (or a descendant of these two) is considered “checked”: a method which might throw one of these types without catching it must include a throws clause.

void jeremy() throws Victoria, Hugh {…}

The throws clause above is necessary if jeremy might throw an exception of checked-exception type Victoria or Hugh.

In a throws clause, a listed type covers all exceptions of not just that type but of descendants, so, say, we can write:

void jeremy() throws Exception {…}

…and this covers any descendant of Exception (which presumably includes Victoria and Hugh).

Error and RuntimeException are “unchecked”, so Java doesn’t require you to declare which methods might throw them. These exceptions might occur anywhere in code, and so it would generally be wrong-headed to catch and handle them.

An exception is caught using try-catch:

try {…} catch (exception-reference-declaration) {…}

When an exception occurs in the try block, execution jumps to the catch. The exception object is passed to the exception reference declared in parentheses after catch; this reference is in the scope of the catch block.

If the exception is the wrong type to assign to the catch’s exception reference, the exception is not caught and instead propagates.
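
Putting the pieces together, here is a sketch with a checked exception type named Victoria, as in the throws example. Its constructor and the methods parsePositive and attempt are assumptions for illustration:

```java
class Victoria extends Exception {       // a descendant of Exception, so it is checked
    Victoria(String message) {
        super(message);                  // hand the message to Exception's constructor
    }
}

class ExceptionDemo {
    static int parsePositive(int n) throws Victoria {   // throws clause is required
        if (n < 0) {
            throw new Victoria("negative: " + n);
        }
        return n;
    }

    static String attempt(int n) {       // no throws clause needed: Victoria is caught here
        try {
            return "ok " + parsePositive(n);
        } catch (Victoria v) {           // v is in scope only inside this catch block
            return "caught " + v.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt(3));      // prints ok 3
        System.out.println(attempt(-1));     // prints caught negative: -1
    }
}
```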

§

An array, in Java, is a fixed-size collection. Unlike in C, an array in Java is allocated on the heap, not the stack. Like classes and interfaces, arrays are considered reference types, so when we declare:

Cat[] c;

…this creates a reference for holding a Cat array but doesn’t create any actual Cat array.

To create an array, we use the new operator with the size of the array specified as an integer expression in square brackets where the constructor arguments usually go:

new Cat[6]           // create a 6-element Cat array

The slots of this array are Cat references, so we can assign to them anything which is a valid Cat object. (Upon creation, the slots all reference null.)

An array object has an int field length containing the number of elements in the array:

Cat[] c = new Cat[6];
int x = c.length;          // assign 6 to x

To retrieve the values of slots and to assign to them, we use the [] subscript operator. Unlike in C, this is not syntactical magic for dereferencing and pointer arithmetic: Java has no concept of referencing/dereferencing or pointer arithmetic.

Like in C, the syntax of the [] operator is peculiar: it’s a binary operator which surrounds its second operand and is conventionally written with no spaces:

x[3]                 // return the 4th element of the array referenced by x

For this to be a valid expression, x must be an array.

Assigning to such a target assigns to a slot in the array:

x[3] = y;     // assign value referenced by y to the 4th slot in the array referenced by x

Arrays effectively form a type hierarchy in parallel to the class hierarchy: for each class type, we have a corresponding array type, and these array types inherit from one another the same way. For example, if you have a class Cat which inherits from Mammal, you have a corresponding type Cat[] which inherits from Mammal[]. At the top of this parallel hierarchy is Object[], which itself inherits from regular Object.

So effectively, a Cat[] object is a valid Mammal[], a valid Object[], and a valid Object.

When you use the [] operator, the Java compiler blindly assumes the type of the array based on its compile-time type. This can lead to thrown exceptions:

Mammal[] m = new Hamster[4];
m[2] = new Mammal();            // throws ArrayStoreException

The actual array in m has Hamster-type references, not Mammals, so we can’t assign them Mammal objects.

Similarly, when you get an item from an array, the compile-time type corresponds to the compile-time type of the array:

Mammal[] m = new Hamster[4];
m[2] = new Hamster();
// even though m[2] now returns a Hamster, it has compile-time type Mammal
Hamster h = (Hamster) m[2];
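
A runnable sketch of these array rules follows. The helper storeFails is hypothetical, and it catches ArrayStoreException only to report what happened, not as a recommended practice:

```java
class Mammal {}
class Hamster extends Mammal {}

class ArrayDemo {
    static boolean storeFails(Mammal[] arr, Mammal value) {
        try {
            arr[0] = value;              // compiles either way; checked against the real array at runtime
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Mammal[] m = new Hamster[4];     // a Hamster[] is a valid Mammal[]
        System.out.println(m.length);                       // prints 4
        System.out.println(m[2] == null);                   // prints true: slots start out null
        System.out.println(storeFails(m, new Mammal()));    // prints true
        System.out.println(storeFails(m, new Hamster()));   // prints false
        Hamster h = (Hamster) m[0];      // compile-time type of m[0] is Mammal, so cast
        System.out.println(h != null);                      // prints true
    }
}
```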

§

In addition to regular single-dimension arrays, Java has multi-dimensional arrays:

Cat[]                        // regular Cat array
Cat[][]                      // 2-dimension Cat array
Cat[][][]                    // 3-dimension Cat array
Cat[][][][]                  // 4-dimension Cat array

Etc…

An n-dimension X array is made up of references to (n – 1)-dimension X arrays, e.g. a 5-dimension Cat array is made up of references to 4-dimension Cat arrays; and a 2-dimension Cat array is made up of references to single-dimension Cat arrays.

Each dimension forms its own parallel type hierarchy which ultimately inherits from regular Object. For instance, given our hierarchy of classes, there is a parallel hierarchy of 4-dimension array types, with Object[][][][] at the top, which itself inherits from Object.

When creating a multi-dimension array, the size of the array is specified in the first set of square brackets:

new Cat[5][][]                   // a 5-element 3-dimension Cat array

Like most operators, [] has left-to-right associativity, so the expression:

x[3][5][7]

…can be expressed:

((x[3])[5])[7]

Breaking this down, we get:

x[3]                        // return the 4th element from the array
(     )[5]                  // return the 6th element from that array
(             )[7]          // return the 8th element from that array

So for this to be a valid expression, x must be a 3-or-greater-dimension array.

A single-dimension array of primitives contains not references but actual primitive values. For instance:

new int[4]                 // create a 4-element array of ints

The slots of this array store actual int values themselves, not references to int values. (Upon the array’s creation, the slots all initially hold the value zero.)

A multi-dimension array of primitives, however, is really an array of other arrays, so it consists of references.

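Single-dimension arrays of primitives inherit directly from Object. So whereas a Cat[] is a valid kind of Object[] or Object, a float[] is only a valid kind of Object. (A float[][] is also a valid Object[], because each of its slots references a float[], which is itself an Object.)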
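
Because a 2-dimension array is just an array of references to other arrays, its rows need not have the same length. A sketch, with a hypothetical helper triangle:

```java
class GridDemo {
    static int[][] triangle(int rows) {
        int[][] t = new int[rows][];     // only the first dimension is sized; slots start out null
        for (int i = 0; i < rows; i++) {
            t[i] = new int[i + 1];       // each slot gets a reference to its own int[]
        }
        return t;
    }

    public static void main(String[] args) {
        int[][] t = triangle(3);
        System.out.println(t.length);        // prints 3
        System.out.println(t[0].length);     // prints 1
        System.out.println(t[2].length);     // prints 3
        System.out.println(t[2][1]);         // prints 0: int slots start out as zero
        Object o = t;                        // any array is a valid Object
        System.out.println(o == t);          // prints true
    }
}
```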

To launch a program in Java, you specify a class to run when you launch the JVM. That class must contain a method named main that is public and static, returns void, and has one parameter of type String[]. (By convention the class itself is usually public, though the JVM does not require it.) Execution of the program begins by invoking this main method.
