Understanding Java I/O Facilities
Joseph Bergin
Pace University

Introduction

The Java input and output libraries are quite extensive and quite confusing for the beginner. We hope to clear up some of the difficulties here.

Java permits two kinds of I/O: byte and character. Byte oriented I/O is intended for data that does not need to be read by humans. Avoiding the translation step to and from readability saves time and space. If you want to save a file for later reading by the same or another program you usually want byte oriented I/O.

On the other hand, if the information needs to be read by people, you want character I/O. Java represents characters in Unicode, an international code, and a Java char is a 16-bit Unicode code unit. This is the same length as a Java short. Sixteen bits cover the alphabets of most of the world's languages directly; characters that do not fit in 16 bits, including many of the less common Chinese characters, are represented by a pair of chars. Java also understands UTF-8, a variable-length encoding of Unicode that uses from one to four bytes per character and is commonly used for files and network data. If Java is to be useful on the web and the web is a world-wide phenomenon, then the ability of a Java program to work with any language is very important.

In Java, byte I/O is handled with a set of classes called InputStreams and OutputStreams. There are several kinds of each of these, such as FileInputStream and DataOutputStream. Each of these has a special purpose and is intended to do one job well. The stream classes are used together to provide a wide variety of services to the programmer.

Character I/O is achieved by using the Reader and Writer classes. Again there are a large number of each of these, including FileReader and PrintWriter. Once again, the classes need to be composed in a certain way to do sophisticated things. The Reader and Writer classes were introduced in Java 1.1 and correct some flaws in the earlier libraries. As a result, some of the methods and classes in the InputStream and OutputStream families have been deprecated and are therefore not recommended for use. In a future version of Java, some of these deprecated methods may disappear.

System.in and System.out

Java provides three standard files in the System class that can be used for simple I/O, though they are primarily intended for testing programs and reporting errors. These are System.in, System.out, and System.err. These correspond roughly to the standard UNIX input, output, and error files. When running a Java applet, these may not be available to you for security considerations, though some browsers provide at least System.out.

System.in is an InputStream, that is, a byte stream, though it is intended for character based input. This is an historical problem, and it means that System.in by itself reads only bytes and will not handle Unicode properly. Likewise System.out and System.err are PrintStreams. We will discuss the characteristics of these in some detail later. Older programs often wrapped System.in in a DataInputStream in order to read an entire line into a String with readLine (a method that is now deprecated). A DataInputStream can also read all of the standard simple types like int, long, and float. These latter methods, however, assume that the external encoding is binary data, not characters. So they are not suitable for reading from a user typing integers at a keyboard. Only readLine translates from character encoding. While this seems to limit (greatly) the usefulness of System.in, we shall see that Java provides a solution in the BufferedReader and StringTokenizer classes.

Likewise System.out can write all of the simple types like int and double, as well as Strings. PrintStreams also convert their inputs to human readable form, but do not handle Unicode characters properly, so their general use is discouraged in favor of PrintWriters which we shall examine also.
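One way to read an integer typed as characters is to wrap the byte stream in an InputStreamReader and a BufferedReader and then convert the line that comes back. The following is a minimal sketch of that idea, not a definitive implementation; the class name ConsoleIntDemo and the method readInt are hypothetical, and a byte array stands in for the keyboard so that the example runs without a user.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class ConsoleIntDemo {
    // Reads one line of text from the byte stream and converts it to an int.
    // Integer.parseInt throws NumberFormatException if the line is not a legal integer.
    public static int readInt(InputStream source) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(source));
        String line = in.readLine();
        if (line == null) {
            throw new IOException("End of input reached");
        }
        return Integer.parseInt(line.trim());
    }

    public static void main(String[] args) throws IOException {
        // Simulated keyboard input; pass System.in instead to read from a real user.
        InputStream typed = new ByteArrayInputStream("42\n".getBytes());
        System.out.println(readInt(typed) * 2);   // prints 84
    }
}
```

The conversion from characters to an int happens in Integer.parseInt, not in the stream, which is why the same approach works for any source of text.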

Basic Principles

The Java I/O classes are each meant to do one job well and to be used in combination to do complex tasks. For example, a BufferedReader provides buffering of input, but nothing else. A FileReader provides a way to connect to a file, but nothing else. Using them together, however, lets us do buffered reading from a file. If we want to keep track of line numbers when we read from a character file, so that we can ask which line we are reading from, we can use a LineNumberReader, which is a BufferedReader that also counts lines, in place of the plain BufferedReader.
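As a small illustration of combining classes, the sketch below wraps a source Reader in a LineNumberReader and asks it for the current line number. The class name LineNumberDemo and the method findLine are hypothetical, and a StringReader stands in for a FileReader so that the example is self-contained.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

class LineNumberDemo {
    // Returns the 1-based number of the first line that contains the word,
    // or -1 if no line contains it.
    public static int findLine(Reader source, String word) throws IOException {
        LineNumberReader in = new LineNumberReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.indexOf(word) >= 0) {
                // getLineNumber counts the lines read so far, including this one.
                return in.getLineNumber();
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // A FileReader could be passed here instead of a StringReader.
        Reader text = new StringReader("alpha\nbeta\ngamma\n");
        System.out.println(findLine(text, "gamma"));   // prints 3
    }
}
```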

Each Java I/O class provides certain methods. To use the libraries, look for the methods that you need to execute and then create an object of that type. Consider the other characteristics of the input or output that you wish to achieve (buffered or not, source of the information, etc.) and connect an object of that type to the first. In complex situations you might connect quite a number of objects together, much as a plumber connects pipes and fittings.

StringTokenizer

Before we can discuss input streams properly, we need to discuss a class from the java.util package. This StringTokenizer class is used to break up a String into parts according to some user criteria. The usual thing that you have is a String composed of words separated by whitespace characters like space and tab. Sometimes you have numbers separated by commas or substrings separated by backslash characters. Many things are possible. What you often want to do is extract the individual parts from the string. In Java a StringTokenizer is intended for this purpose.

We construct a new StringTokenizer by giving it a reference to the String that it will process. We can optionally give it another string that contains the character or characters that will separate the string into sections. We can ask the StringTokenizer object how many elements, called tokens, the string contains with countTokens. The primary usage, however, uses two methods, hasMoreTokens and nextToken. Here is an example:


StringTokenizer tokens = new StringTokenizer("These are the times.");
System.out.println( tokens.countTokens()); // Should be 4.
while (tokens.hasMoreTokens())
	System.out.println( tokens.nextToken());

This should print the four words, one on each line with the period still attached to the last word:


	4
	These
	are
	the
	times.

Another example uses a comma separated list of integers. If all we want to do is print them it is easy.


StringTokenizer tokens = new StringTokenizer("1234,5,67,89", ",");
System.out.println( tokens.countTokens()); // Should be 4.
while (tokens.hasMoreTokens())
	System.out.println( tokens.nextToken());

This will produce:


	4
	1234
	5
	67
	89

If we want to add the numbers up, however, it takes more work, since nextToken returns a String, not an int. The various classes in java.lang can help us here. In particular, the Integer class provides a static method parseInt that will extract an int from a String, if the String properly contains only an integer. If it doesn't, however, the method throws a NumberFormatException, which the program should catch. Here we use a semicolon separator for variety.


StringTokenizer tokens = new StringTokenizer("1234;5;67;89", ";");
int sum = 0; 
while (tokens.hasMoreTokens())
{	try
	{	sum += Integer.parseInt( tokens.nextToken() );
	}
	catch (NumberFormatException e )
	{	System.out.println("Illegal encoding of string.");
	}
}
System.out.println(sum);

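One of the problems that beginners have with Java, especially with I/O, is that checked exceptions such as IOException must be either caught or declared in a throws clause. They cannot be ignored. NumberFormatException is unchecked, so the compiler does not insist that it be caught, but an uncaught one will still stop the program.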

We shall see how StringTokenizers help us do I/O in a moment. The basic idea is that it is easy to get a String on input, but harder to get things like ints and floats.

Readers and Writers

Readers

First we will attack the character I/O features of Java 1.1. The abstract Reader class has the following descendants:

	BufferedReader
		LineNumberReader
	CharArrayReader
	FilterReader (abstract)
		PushbackReader
	InputStreamReader
		FileReader
	PipedReader
	StringReader

The BufferedReader provides buffering which increases efficiency since fewer physical reads need to be done. One of the most important methods of BufferedReader is readLine, which reads a line into a String. That String can be used with a StringTokenizer as above. BufferedReader has the LineNumberReader subclass which provides buffering as well as keeping track of how many lines have been read. The CharArrayReader treats a char array as a reader so that we can extract from it using I/O methods rather than dealing with the individual chars. A FilterReader provides some kind of modification to the stream as it is read. An example is its subclass, PushbackReader. A push back reader in effect lets you "unread" one or more characters. Subsequent reads will read them again. You can specify how much space should be set aside for pushing back characters into the reader. The default is one character.
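The "unread" facility of a PushbackReader is easiest to see in a small scanner that reads digits until it meets something else. This is a sketch under the assumption that one character of pushback (the default) is enough; the class name PushbackDemo and the method readDigits are hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackDemo {
    // Reads consecutive digits and returns them as a String.
    // The first non-digit character is pushed back so that the next read sees it again.
    public static String readDigits(PushbackReader in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isDigit((char) c)) {
                digits.append((char) c);
            } else {
                in.unread(c);   // put the character back into the reader
                break;
            }
        }
        return digits.toString();
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("123abc"));
        System.out.println(readDigits(in));      // prints 123
        System.out.println((char) in.read());    // prints a
    }
}
```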

An InputStreamReader takes any InputStream (byte stream) and turns it into a character stream. The FileReader is a subclass of this and also provides connection to an external file. A PipedReader is connected to a PipedWriter so that characters written by one thread can be read by another, much as a UNIX pipe connects two processes, and it provides the needed buffering for that. Finally the StringReader treats a String object as an input device.

The classes here are actually more general than they might seem. For example, in an internet application that is distributed over two or more machines on the internet, a Socket connection between processes on two different machines provides an input stream and an output stream. You can ask a Socket for its InputStream with getInputStream and subsequently treat information coming from the other machine just as if you were reading it from a file. Likewise you can "write" to the other process using the associated output stream.

Let's take an example and see how these streams can work together to get useful work done. Suppose that we have a file named "mydata.txt" that contains some textual information that we need to process. To get access to this we need a FileReader. Suppose that we think that buffering should be used so that we don't need to do a physical read for each character, but can handle the information a line (actually a buffer full) at a time. To keep things simple we will use only the default buffer size.

The basic idea is to force the various readers that we want to use to send their output through the next reader in a chain. Thus we are going to connect the FileReader to the file and the BufferedReader to the FileReader. If the BufferedReader provides the other facilities that we need then we have enough. Otherwise we would attach more readers together. If we just want to read a line at a time into a String we have enough. We can create our reader chain with the following statement. Note that we haven't even named the FileReader here, just constructed it and passed the resulting object to the BufferedReader constructor.


BufferedReader in = new BufferedReader(new FileReader("mydata.txt"));

This is what we have done:

	mydata.txt --> FileReader --> BufferedReader (in)

When we read from the object named in, we get characters from the file mydata.txt, passed through the FileReader and subsequently passed through the BufferedReader to us. We can read a line from the file with


String s = in.readLine();

This may or may not cause a physical read, depending on the size of the buffer and just what was in it at the time the instruction is executed, but the String s gets the Unicode characters for the next line of the file.

We probably want to break the line up into parts to process individual items in the string. We use a StringTokenizer for this as shown before. Thus, in a certain sense, we add to the chain we have begun above. The StringTokenizer will give us the individual parts of the line of the file (or any String).
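Putting the pieces of this section together, the following sketch reads a text file a line at a time, breaks each line into tokens, and adds up the tokens that are integers. The class name SumFileDemo and the method sumInts are hypothetical, and the demonstration writes a small temporary file to stand in for mydata.txt.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.StringTokenizer;

class SumFileDemo {
    // Sums every whitespace-separated integer in the named text file.
    // Tokens that are not legal integers are skipped.
    public static int sumInts(String fileName) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        int sum = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {   // null signals end of file
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    try {
                        sum += Integer.parseInt(tokens.nextToken());
                    } catch (NumberFormatException e) {
                        // Not an integer: ignore this token.
                    }
                }
            }
        } finally {
            in.close();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for mydata.txt so the sketch is self-contained.
        File data = File.createTempFile("mydata", ".txt");
        FileWriter w = new FileWriter(data);
        w.write("10 20\nabc 30\n");
        w.close();
        System.out.println(sumInts(data.getPath()));   // prints 60
        data.delete();
    }
}
```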

InputStreams and Readers have one more source of confusion. There are a number of methods in these classes named read that are intended to read characters, but the stated return type is int, not char. This is because of the chosen way to signal end of file. When you are reading characters from a file, end of file is signaled by reading a value of -1, which is not a valid character. The read method with no parameters in BufferedReader is like this. A beginner may think it reads ints, but that is not the case. This trick works because a Unicode character is 16 bits (unsigned) and an int is 32 bits, so every character value fits in an int as a non-negative number.

This also means that you can test for end of file while reading characters by testing the sign of the value just returned, before casting it to char.
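The usual loop that results from this convention keeps the value in an int, compares it with -1, and only then casts it to char. A minimal sketch follows; the class name ReadLoopDemo and the method countVowels are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadLoopDemo {
    // Counts the vowels in a character stream, reading one character at a time.
    public static int countVowels(Reader in) throws IOException {
        int count = 0;
        int c;                              // int, not char, so that -1 can be detected
        while ((c = in.read()) != -1) {
            char ch = (char) c;             // safe to cast once we know it is not -1
            if ("aeiouAEIOU".indexOf(ch) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countVowels(new StringReader("These are the times.")));   // prints 7
    }
}
```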

Writers

Writers are used similarly. The descendants of the abstract class Writer are

	BufferedWriter
	CharArrayWriter
	FilterWriter (abstract)
	OutputStreamWriter
		FileWriter
	PipedWriter
	PrintWriter
	StringWriter

The BufferedWriter provides buffering on output. The CharArrayWriter lets us write to a character array as if it were a file. The StringWriter is similar. A FilterWriter provides some sort of filtering to the data as it passes through. One could write a FilterWriter to verify data, for example, or to add a line number to each line written, etc. No concrete FilterWriters are provided in the standard libraries, though the Writers returned from some methods might in fact be FilterWriters.
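To make the idea of a filter concrete, here is a sketch of a FilterWriter subclass that puts a line number at the start of each line it passes along. The class LineNumberingWriter is hypothetical and is not part of the standard libraries; it assumes that lines end with the '\n' character.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class LineNumberingWriter extends FilterWriter {
    private int lineNumber = 1;
    private boolean atLineStart = true;

    public LineNumberingWriter(Writer out) {
        super(out);
    }

    // Every character goes through here, so the line prefix is added in one place.
    public void write(int c) throws IOException {
        if (atLineStart) {
            out.write(lineNumber + ": ");
            lineNumber++;
            atLineStart = false;
        }
        out.write(c);
        if (c == '\n') {
            atLineStart = true;
        }
    }

    public void write(char[] buf, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            write(buf[off + i]);
        }
    }

    public void write(String s, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            write(s.charAt(off + i));
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter target = new StringWriter();
        Writer w = new LineNumberingWriter(target);
        w.write("first\nsecond\n");
        w.flush();
        System.out.print(target.toString());   // prints "1: first" and "2: second" on two lines
    }
}
```

Because the filter is itself a Writer, it can be wrapped in a PrintWriter or placed in front of a BufferedWriter like any other member of a chain.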

A PipedWriter is the writing end of a pipe whose reading end is a PipedReader, usually in another thread of the same program, much like the writing end of a UNIX pipe. A PrintWriter is used to show internal data in a form humans can easily understand. This has methods for writing all of the basic types and translates the information to characters on output.

Once again, Writer classes are used together. Suppose that we want to write a character file that will contain integers (encoded as characters for readability) and we would like to buffer the output for efficiency and store the results into a file called "results.txt". This is the chain we must set up:

	PrintWriter (out) --> BufferedWriter --> FileWriter --> results.txt

The Java statement that will construct the needed PrintWriter and its parts is


PrintWriter out = 
	new PrintWriter( new BufferedWriter (new FileWriter("results.txt")));

Having done this we can write integers onto the file with something like:


int x; 
...
out.print(x);

We are responsible for putting spaces or other separators between items, of course.
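A fuller sketch of this chain appears below. It writes several integers separated by spaces and then closes the PrintWriter, which matters because the BufferedWriter holds output in memory until it is flushed or closed. The class name WriteIntsDemo and the method writeInts are hypothetical, and a temporary file stands in for results.txt.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

class WriteIntsDemo {
    // Writes the values on one line of the named file, separated by single spaces.
    public static void writeInts(String fileName, int[] values) throws IOException {
        PrintWriter out =
            new PrintWriter(new BufferedWriter(new FileWriter(fileName)));
        try {
            for (int i = 0; i < values.length; i++) {
                if (i > 0) {
                    out.print(' ');        // we supply the separator ourselves
                }
                out.print(values[i]);      // the int is converted to characters
            }
            out.println();
        } finally {
            out.close();                   // flushes the buffer and closes the file
        }
    }

    public static void main(String[] args) throws IOException {
        File results = File.createTempFile("results", ".txt");
        writeInts(results.getPath(), new int[] {1234, 5, 67, 89});
        BufferedReader in = new BufferedReader(new FileReader(results));
        System.out.println(in.readLine());   // prints 1234 5 67 89
        in.close();
        results.delete();
    }
}
```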

InputStreams and OutputStreams

The same ideas that apply to Readers and Writers also apply to InputStreams and OutputStreams. The only real difference is that these classes are intended for information to be processed by machines rather than people, so that internal formats can be retained within the streams for efficiency. Thus an integer will retain its two's complement encoding rather than being converted to a multi-character Unicode format as would be the case with a Writer or Reader. InputStreams and OutputStreams process bytes and collections of bytes rather than characters.
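The difference between the two encodings can be measured directly. The sketch below writes the same int once through a DataOutputStream and once through a PrintWriter and reports how much output each produces. The class name EncodingSizeDemo and its methods are hypothetical, and in-memory targets are used so that no files are needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

class EncodingSizeDemo {
    // Number of bytes produced when the int is written in binary form.
    public static int binarySize(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(value);
        out.flush();
        return bytes.size();
    }

    // Number of characters produced when the int is written in readable form.
    public static int textSize(int value) {
        StringWriter chars = new StringWriter();
        PrintWriter out = new PrintWriter(chars);
        out.print(value);
        out.flush();
        return chars.toString().length();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(binarySize(1234567890));   // prints 4
        System.out.println(textSize(1234567890));     // prints 10
        System.out.println(binarySize(7));            // prints 4
        System.out.println(textSize(7));              // prints 1
    }
}
```

The binary form always takes four bytes for an int, while the readable form takes one character per digit.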

InputStreams

The descendants of the abstract class InputStream are:

	ByteArrayInputStream
	FileInputStream
	FilterInputStream
		BufferedInputStream
		DataInputStream
		LineNumberInputStream (deprecated -- use LineNumberReader)
		PushbackInputStream
	ObjectInputStream
	PipedInputStream
	SequenceInputStream
	StringBufferInputStream
		(deprecated -- use StringReader or ByteArrayInputStream)

Many of these are similar to their Reader counterparts. A ByteArrayInputStream "reads" from a byte array. A SequenceInputStream can be used to catenate two or more streams together so that they appear as a single stream. Note that a SequenceInputStream could be connected to an InputStreamReader to process such a catenated stream as characters rather than bytes. We will discuss ObjectInputStreams in a subsequent section.
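As an illustration of a SequenceInputStream feeding an InputStreamReader, the sketch below joins two byte streams and reads the result as characters. The class name SequenceDemo and the method readAll are hypothetical, and the choice of UTF-8 as the encoding is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.SequenceInputStream;

class SequenceDemo {
    // Treats the two byte streams as one and returns their contents as text.
    public static String readAll(InputStream first, InputStream second) throws IOException {
        InputStream joined = new SequenceInputStream(first, second);
        BufferedReader in = new BufferedReader(new InputStreamReader(joined, "UTF-8"));
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            text.append((char) c);
        }
        in.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream a = new ByteArrayInputStream("Hello, ".getBytes("UTF-8"));
        InputStream b = new ByteArrayInputStream("world".getBytes("UTF-8"));
        System.out.println(readAll(a, b));   // prints Hello, world
    }
}
```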

A DataInputStream is very useful for reading binary data in the form of integers, floats, booleans, and the like. Generally speaking you should read something with readInt from the DataInputStream class if it was written originally as an int using, for example, writeInt from the DataOutputStream class. In older versions of Java, this class was often used to get access to its (now deprecated) readLine method. Now we use a BufferedReader for the same sort of task, as was shown above.
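The pairing of write and read methods can be sketched as a round trip through a byte array. The values must be read back in the same order and with the same types in which they were written. The class name DataRoundTripDemo and the method encode are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataRoundTripDemo {
    // Encodes three values in binary form: 4 bytes, then 8 bytes, then 1 byte.
    public static byte[] encode(int count, double price, boolean inStock) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(count);
        out.writeDouble(price);
        out.writeBoolean(inStock);
        out.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(3, 9.75, true);
        System.out.println(data.length);        // prints 13

        // Read the values back with the matching methods, in the same order.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readInt());       // prints 3
        System.out.println(in.readDouble());    // prints 9.75
        System.out.println(in.readBoolean());   // prints true
        in.close();
    }
}
```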

As was shown before, these are chained together, even chained together with one or more readers to achieve complex results.

OutputStreams

The descendants of the abstract class OutputStream are:

	ByteArrayOutputStream
	FileOutputStream
	FilterOutputStream
		BufferedOutputStream
		DataOutputStream
		PrintStream
	ObjectOutputStream
	PipedOutputStream

By now most of these should be pretty obvious. A DataOutputStream provides encodings for the basic Java types. A PrintStream is the old way to convert to human readable form, but PrintWriters should generally be used instead now. This class was not deprecated, since System.out is a PrintStream (historical), though in Java 1.1 its constructors were deprecated to discourage further use.

Other Facilities

In addition to the classes already discussed, the java.io package exports a number of interfaces, other classes and exceptions. There are various classes for files and file descriptors as well as interfaces describing file characteristics. For example, RandomAccessFile implements both DataInput and DataOutput. RandomAccessFiles provide their own input and output facilities independent of the stream functionality.
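A RandomAccessFile can move to any byte position before reading or writing, which the stream classes cannot do. The sketch below stores several ints and then reads one of them back by position. The class name RandomAccessDemo and its methods are hypothetical, and a temporary file is used for the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessDemo {
    // Writes the values in binary form, four bytes each, starting at the beginning of the file.
    public static void writeInts(File file, int[] values) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            for (int i = 0; i < values.length; i++) {
                raf.writeInt(values[i]);
            }
        } finally {
            raf.close();
        }
    }

    // Reads the int stored at the given index without reading the ones before it.
    public static int readIntAt(File file, int index) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(4L * index);   // each int occupies four bytes
            return raf.readInt();
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("ints", ".dat");
        writeInts(file, new int[] {10, 20, 30});
        System.out.println(readIntAt(file, 2));   // prints 30
        file.delete();
    }
}
```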

The File class maintains a record of the operating system's external file information such as file name and length and whether the file is readable or not. In the above examples we created FileReaders from strings giving their names, but you could also create a File object using its name and then construct the reader or writer from the File object rather than the file name.
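A short sketch of this use of File follows. It asks a File object for some of its characteristics and then constructs a FileReader from the same object. The class name FileInfoDemo and the method firstLine are hypothetical, and a temporary file is created so that the example has something to examine.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class FileInfoDemo {
    // Opens the file through its File object and returns its first line.
    public static String firstLine(File file) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        try {
            return in.readLine();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("fileinfo", ".txt");
        FileWriter w = new FileWriter(file);
        w.write("hello\n");
        w.close();

        System.out.println(file.getName());     // the name chosen by createTempFile
        System.out.println(file.length());      // prints 6
        System.out.println(file.canRead());     // normally prints true
        System.out.println(firstLine(file));    // prints hello
        file.delete();
    }
}
```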

Object I/O and Serialization -- an Advanced Topic

Java has one additional facility that most other languages lack. This is the ability to write out a complex object, even a web of objects, with a single statement. We have seen the ObjectInputStream and ObjectOutputStream above. These are used for two important purposes. The first is to save away the current state of an object in a file, such as an object database. Such a file contains objects themselves, not just the fields that implement them. The second use is to permit the migration of an object from a process running on one machine to a process on another, by writing the object over a (socket) link between the two processes. Thus an object can move from one machine to another and will be fully functional at its new home.

To permit an object to be written with an ObjectOutputStream or read with an ObjectInputStream, its class must implement the Serializable interface. This interface has no methods and so no requirements. It exists because the proper default for serialization should be "not allowed." This is for security considerations. No information can be transmitted or stored away unless we specifically permit it. We do so by implementing Serializable.

Saving an object and the objects that it references is very simple. You just create an ObjectOutputStream in the usual way, perhaps by connecting it to a FileOutputStream. You then just pass your object to the writeObject method of the stream. The object and all of the objects that it references (recursively) will be stored on the output medium. Thus an entire web of objects can be written with a single call. If objects refer to one another in a cycle, the system will be smart about only saving one copy of each object and correctly connecting them together again when the original object is read again using an ObjectInputStream.

ObjectInputStreams and ObjectOutputStreams can also be used to save arrays and primitive data such as int and boolean.

Note that type information is saved with the objects when they are serialized. Thus it is not possible to read an object back in and then cast it to an improper type. You do need to cast the results returned by an ObjectInputStream's readObject method since its stated return type is Object, but your cast will be checked.
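The following sketch serializes two objects that refer to each other in a cycle and reads them back, using a byte array in place of a file or socket. The class SerializationDemo, its nested Node class, and the method roundTrip are all hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    // A tiny linked structure; implementing Serializable is what permits it to be written.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Writes the object (and everything it references) and reads a copy back.
    public static Object roundTrip(Object original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(original);
        out.close();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;                                    // a cycle of two objects

        Node copy = (Node) roundTrip(a);               // the cast is checked at run time
        System.out.println(copy.name);                 // prints a
        System.out.println(copy.next.name);            // prints b
        System.out.println(copy.next.next == copy);    // prints true: the cycle is restored
        System.out.println(copy == a);                 // prints false: it is a new object
    }
}
```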

Java I/O and Standard Design Patterns

If you look carefully, most of the Java I/O classes conform to the standard Decorator Pattern from the "Gang of Four" book: (Design Patterns, Gamma, Helm, Johnson, and Vlissides, Addison-Wesley, 1995). For example, a BufferedReader "decorates" another reader by adding buffering. An InputStreamReader decorates an InputStream by doing character conversions, etc. If you are teaching an elementary course in Object Technology and Java, even CS1, there is a lot of leverage in this fact, since the Decorator Pattern has many other uses at this level as well. If this pattern has been used in the course and students are comfortable with it, then the design of the I/O libraries seems more natural to them.
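Students can also write a decorator of their own. The sketch below is a hypothetical UpperCaseReader, which is not part of the standard libraries; it decorates any Reader by converting the characters that pass through it to upper case, and it can in turn be decorated by a BufferedReader.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class UpperCaseReader extends FilterReader {
    public UpperCaseReader(Reader in) {
        super(in);
    }

    // Reads one character from the decorated reader and converts it.
    public int read() throws IOException {
        int c = in.read();
        if (c == -1) {
            return -1;                       // pass end of file through unchanged
        }
        return Character.toUpperCase((char) c);
    }

    // Reads a block of characters from the decorated reader and converts them in place.
    public int read(char[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        for (int i = off; i < off + n; i++) {
            buf[i] = Character.toUpperCase(buf[i]);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Three objects in a chain: StringReader, UpperCaseReader, BufferedReader.
        BufferedReader in =
            new BufferedReader(new UpperCaseReader(new StringReader("hello\nworld")));
        System.out.println(in.readLine());   // prints HELLO
        System.out.println(in.readLine());   // prints WORLD
    }
}
```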

An Example -- Safe Input

The I/O facilities of Java are difficult for beginning programmers. Input is harder than output. Instructors will want to consider providing a simple class that lets simple I/O be done from the keyboard/screen and to and from files. I wrote such a class, which is presented here as an example of what might be done. Feel free to extend this to fit your own needs. Let me know of your results though. The code and the Javadoc documentation can be extracted from my home page: http://csis.pace.edu/~bergin

This class can be given to novices for use. Later it can be used as an example of wrapping Readers and Writers around other classes to get the desired effect. Here we want a line buffered input system. It also shows a StringTokenizer.

This has two constructors. The first is for backward compatibility with code that used PrintStreams and DataInputStreams. It also allows System.in and System.out to be passed directly without wrapping them in a Reader and Writer respectively.

The book Core Java 1.1 (Volume 1) by Horstmann and Cornell (Prentice-Hall, 1997) has a more elaborate but still simple Console class that can also be used by novices.


// (c) Copyright 1997, Joseph Bergin. All rights reserved.

package cs1;
import java.io.InputStream;
import java.io.Reader;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.util.StringTokenizer;
import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.util.NoSuchElementException;

/** Provides line buffered input that can be used either interactively
* or from a file.  In interactive mode (using a non-null prompt file), 
* one item is read per line.  In "batch" mode (prompt = null), lines 
* may contain several elements.  Note that getLine
* retrieves the rest of the current line.  In non-interactive mode,
* tokens will be skipped to try to find one of the desired kind. For 
* example if you getInt() when the next input token is not a legal int,
* tokens will be skipped looking for an int.  Failure will come at the
* end of the file, of course.  <p>
* Sources: <a href= SafeInput.java> SafeInput.java </a>
*/

public class SafeInput
{	
/** Create a new inputter. The PrintStream is used to prompt the user.  Data is read
*  from the InputStream.  If the input stream is a file, then the print stream 
* used for prompting should be null.  
* @param prompt A print stream used for prompting interactively. If null, no
* 		prompting will be done.
* @param in The input stream to be read.  
*/
	public SafeInput(PrintStream prompt, InputStream in)	
	// if prompt is null, no prompting will be done.	
	{	input = new BufferedReader(new InputStreamReader(in));
		// A PrintWriter cannot wrap a null stream, so keep output null in batch mode.
		output = (prompt == null) ? null : new PrintWriter(prompt);
	}

/** Create a new inputter. The PrintWriter is used to prompt the user.  Data is read
*  from the Reader.  If the Reader is connected to a file, then the PrintWriter 
* used for prompting should be null.  
* @param prompt A PrintWriter used for prompting interactively. If null, no
* 		prompting will be done.
* @param in The Reader to be read.  
*/
	public SafeInput(PrintWriter prompt, Reader in)	
	// if prompt is null, no prompting will be done.	
	{	input = new BufferedReader(in);
		output = prompt;
	}

/** Guarantees that we have a non-empty line buffer.
* Throws NoSuchElementException if the end of the input is reached
* or the input cannot be read.
*/		
	private void guaranteeBuffer()	
	{	try		
		{	while(fetcher == null || ! fetcher.hasMoreElements())
			{	inBuffer = input.readLine();
				if(inBuffer == null)	// readLine returns null at end of file
				{	throw new NoSuchElementException("End of input reached.");
				}
				fetcher = new StringTokenizer(inBuffer, delimiters);
			}
		}
		catch(IOException e)
		{ 	System.err.println(e);
			throw new NoSuchElementException("Input could not be read.");
 		}
	}

/** Get an int from the next location in the input.  When used interactively
* the user will be continuously prompted until a valid int is entered.  
*/	
	public int getInt()	
	{	int result = 0;
		boolean inputOk = false;
		while(!inputOk)
		{	if(output != null)
			{	inBuffer = null; 
				fetcher = null; 
				output.print("Enter an integer: "); 
				output.flush();
			}
			guaranteeBuffer();
			inputOk = true;
			try
			{	result = Integer.parseInt(fetcher.nextToken(delimiters));			
			}
			catch(NumberFormatException e)
			{	inputOk = false;
				if(output != null){output.println("Not an integer.");}
			}
		}
		return result;
	}
	
/** Get a float from the next location in the input.  When used interactively
* the user will be continuously prompted until a valid float is entered.  
*/	
	public float getFloat()	
	{	float result = 0;
		boolean inputOk = false;
		while(!inputOk)
		{	if(output != null)
			{	inBuffer = null;
				fetcher = null; 
				output.print("Enter a float: "); 
				output.flush();
			}
			guaranteeBuffer();
			inputOk = true;
			try
			{	result = Float.valueOf(fetcher.nextToken(delimiters)).floatValue();			
			}
			catch(NumberFormatException e)
			{	inputOk = false;
				if(output != null){output.println("Not a float.");}
			}
		}
		return result;
	}
		
/** Get a double from the next location in the input.  When used interactively
* the user will be continuously prompted until a valid double is entered.  
*/	
	public double getDouble()	
	{	double result = 0;
		boolean inputOk = false;
		while(!inputOk)
		{	if(output != null)
			{	inBuffer = null; 
				fetcher = null; 
				output.print("Enter a double: "); 
				output.flush();
			}
			guaranteeBuffer();
			inputOk = true;
			try
			{	result = Double.valueOf(fetcher.nextToken(delimiters)).doubleValue();			
			}
			catch(NumberFormatException e)
			{	inputOk = false;
				if(output != null){output.println("Not a double.");}
			}
		}
		return result;
	}
		
/** Get a long from the next location in the input.  When used interactively
* the user will be continuously prompted until a valid long is entered.  
*/	
	public long getLong()	
	{	long result = 0;
		boolean inputOk = false;
		while(!inputOk)
		{	if(output != null)
			{	inBuffer = null; 
				fetcher = null;
				output.print("Enter a long: "); 
				output.flush();
			}
			guaranteeBuffer();
			inputOk = true;
			try
			{	result = Long.parseLong(fetcher.nextToken(delimiters));		
			}
			catch(NumberFormatException e)
			{	inputOk = false;
				if(output != null){output.println("Not a long.");}
			}
		}
		return result;
	}
		
/** Get a word from the next location in the input.  In interactive mode, 
* a new line will be read first.   
*/	
	public String getWord()	
	{	if(output != null)
			{	inBuffer = null; 
				fetcher = null; 
				output.print("Enter a word: "); 
				output.flush();
			}
		guaranteeBuffer();
		return fetcher.nextToken(delimiters);
	}
		
/** Get the remainder of the current line buffer as input.  In interactive
* mode, a fresh line will be read entirely. 
*/	
	public String getLine()	
	{	if(output != null)
		{	inBuffer = null; 
			fetcher = null; 
			output.print("Enter a line of text: "); 
			output.flush();
		}
		guaranteeBuffer();
		String result = fetcher.nextToken("");
		return result;
	}

/** Set the delimiters that are used to separate tokens.  The default set
* is space and tab only.  
* @param newDelimiters the new set of delimiting chars in a String.
* @return the original set of delimiters.  
*/	
	public String setDelimiters(String newDelimiters)
	// returns old delimiters
	// The default delimiters are just space and tab.
	{	String result = delimiters;
		delimiters = newDelimiters;
		return result;
	}
		
	private String inBuffer = null;
	private PrintWriter output;
	private StringTokenizer fetcher = null;
	private BufferedReader input;
	private String delimiters = " \t";
}

Last Updated: May 4, 2002