RandomAccessFile

Sequential Text Files

Files are used to store data on a disk drive.  Without them, any data we typed in would be lost as soon as we shut down the computer.  An easy way to create a data file is to type the data into a text editor (e.g. Notepad), like this bank-account file:

 Arnie
 12.25
 Barbie
 123.50
 Chaplin
 999.99
 Dufus
 0.01
 Ernie
 2500.00
 Great, Alexander the
 123456789.00
   ......

This is a sequential text file.  It is called a text file because it is clearly readable, containing no special formatting or strange characters.  It is sequential because a computer must read all the data in order when searching for something - there is no way to "jump around" in the file.  Although this makes sequential text files inefficient, they are very flexible - you can type anything you want using any format.  For example, there is no restriction on the size of a name or the size of a number.  The example above is alphabetical, so a person looking for "Yoyo" would skip straight to the end.  A computer cannot do that, because it does not know where "Yoyo" starts without reading everything before it.  It is like looking for a specific word in a file that contains all the words of a novel (in order) - you need to search sequentially.  In a text file, computers always read sequentially, because there is no command for jumping around in a text file.
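
Here is a minimal sketch of what "reading sequentially" looks like in Java.  The names SequentialSearch and findMoney are made up for this illustration, and the code assumes the data alternates name lines and money lines, as in the bank-account file above.  Notice that every line before the match must be read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class SequentialSearch {
    // Returns the money line that follows the wanted name, or null if the
    // name is not found. Assumes lines alternate: name, money, name, money...
    static String findMoney(Reader source, String wanted) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String name;
        while ((name = in.readLine()) != null) {
            String money = in.readLine();          // the line after the name
            if (name.trim().equals(wanted)) {
                return money == null ? null : money.trim();
            }
        }
        return null;                               // reached the end of the data
    }

    public static void main(String[] args) throws IOException {
        String data = " Arnie\n 12.25\n Barbie\n 123.50\n Chaplin\n 999.99\n";
        System.out.println(findMoney(new StringReader(data), "Chaplin")); // 999.99
    }
}
```

To search a real file, pass a java.io.FileReader instead of the StringReader used here.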

Random Access

If we are willing to use clear, inflexible structures in our files, we can make it possible for the computer to jump around in the file by counting bytes. 

We break the file up into records - one for each customer.  Then we decide what the maximum size of a record should be.  For example, we can allocate 30 bytes (characters) for each name and 20 bytes for each number, giving each record a fixed size of 50 bytes.

Each record is divided up into fields - in this case, a NAME field of 30 bytes and a MONEY field of 20 bytes.

After making these decisions, we can say exactly where record number 750 starts (counting the first record as number 0):

  ==>   750 * 50 = 37500 bytes

Assuming the computer can count bytes, it can jump to byte #37500 and then read that record, without reading the 750 records before it.  This won't work in a text file, where the records and fields are variable length.  It only works if we use fixed-length fields and records.
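
The byte-counting arithmetic can be sketched in Java like this.  The class and method names are hypothetical, and the sketch assumes record numbers start at 0, with the 30-byte NAME field first and the 20-byte MONEY field after it.

```java
final class RecordOffset {
    static final int NAME_SIZE = 30;                          // bytes for the NAME field
    static final int MONEY_SIZE = 20;                         // bytes for the MONEY field
    static final int RECORD_SIZE = NAME_SIZE + MONEY_SIZE;    // 50 bytes per record

    // Byte position where a record begins (record numbers start at 0).
    static long recordStart(long recordNumber) {
        return recordNumber * RECORD_SIZE;
    }

    // Byte position of the MONEY field inside that record.
    static long moneyStart(long recordNumber) {
        return recordStart(recordNumber) + NAME_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(recordStart(750));   // 37500
        System.out.println(moneyStart(750));    // 37530
    }
}
```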

Jumping around in the file is potentially more efficient than reading sequentially.  In addition to this efficiency in reading data, it also enables us to write data much more efficiently.  For example, to change Ernie's money, the computer can jump directly to that spot in the file and write a new number there.  In a text file, the usual way to change one item is to copy all the data into arrays in the computer's memory, change the item, and then write all the data back onto the disk.

The increased efficiency of Random Access becomes more important with large amounts of data.  For example, there are 80 million people in Germany and probably an equal number of telephones.  So the telephone customer database contains 80 million records.  If one piece of data changes, copying 80 million records could require 50 x 80 million bytes, or approximately 4000 megabytes (4 GB) of memory.  That might not fit into the memory at all, making it impossible to change the file if it is supposed to all be in an array at once.

RandomAccessFile

Java provides the RandomAccessFile class for creating and manipulating random-access data files.  It does not specify the size of a record or the number of fields - this is controlled by the program.

SEEK

The most important command is seek, which jumps around in the file.  Despite the name, this is not a search command.  It simply jumps to a specific position in the file.  This is not possible in a text file.
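
A small sketch of seek in action, assuming a scratch file that holds ten int values (4 bytes each).  The names SeekDemo, overwriteAndRead and readSlot are invented for this example.  It jumps to one slot, overwrites it in place, and leaves the neighbouring values untouched.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class SeekDemo {
    // Writes the ints 0, 10, 20 ... 90, then overwrites one slot and reads it back.
    static int overwriteAndRead(File f, int index, int newValue) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            for (int i = 0; i < 10; i++) {
                file.writeInt(i * 10);        // each int occupies 4 bytes
            }
            file.seek(4L * index);            // jump straight to the slot
            file.writeInt(newValue);          // overwrite it in place
            file.seek(4L * index);            // jump back - the write moved the position
            return file.readInt();
        }
    }

    // Reads a single slot without touching the others.
    static int readSlot(File f, int index) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(f, "r")) {
            file.seek(4L * index);
            return file.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seekdemo", ".dat");
        f.deleteOnExit();
        System.out.println(overwriteAndRead(f, 3, 777));  // 777
        System.out.println(readSlot(f, 4));               // 40 - the next slot is unchanged
    }
}
```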

READ and WRITE UTF

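The input and output commands are .readUTF() and .writeUTF(String).  UTF stands for Unicode Transformation Format.  If you want lots of details, try this link: http://en.wikipedia.org/wiki/UTF-8 .  UTF supports Unicode characters in a standard way, so lots of computer software can read and write UTF successfully.  (Strictly speaking, Java writes a slightly modified form of UTF-8 with a 2-byte length in front of the text, so a file written with .writeUTF should be read back with .readUTF.)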
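
The following sketch writes one String with .writeUTF and then looks at what ended up in the file.  The class and method names are made up for this illustration, and it uses a temporary scratch file.  For plain English text, the file contains a 2-byte length followed by one byte per character.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class UtfRoundTrip {
    // Writes s at the start of an emptied file and returns how many bytes were used.
    static long bytesWritten(File f, String s) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            file.setLength(0);               // start from an empty file
            file.writeUTF(s);
            return file.getFilePointer();    // position after the write = bytes used
        }
    }

    // Reads the 2-byte length that writeUTF stored in front of the text.
    static int lengthPrefix(File f) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(f, "r")) {
            file.seek(0);
            return file.readUnsignedShort();
        }
    }

    // Reads the String back with readUTF.
    static String readBack(File f) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(f, "r")) {
            file.seek(0);
            return file.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("utfdemo", ".dat");
        f.deleteOnExit();
        System.out.println(bytesWritten(f, "Ernie"));  // 7 = 2 + 5
        System.out.println(lengthPrefix(f));           // 5
        System.out.println(readBack(f));               // Ernie
    }
}
```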

FIELD SIZES

Since the file has fixed-size fields, the program must control the size of the data before writing.  If a program writes 50 characters into a 30-character field, the extra characters spill over into the next field or record and overwrite whatever is stored there.  But the RandomAccessFile methods will let you make this mistake - they do not check the size of the data.  So your program must check data size before writing.
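
One possible way to do this check is sketched below.  FieldGuard, utfBytes and writeField are hypothetical helpers, not part of RandomAccessFile.  The sketch counts the bytes that .writeUTF would need (a 2-byte length plus 1, 2 or 3 bytes per character) and refuses to write if they do not fit into the field.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class FieldGuard {
    // Number of bytes writeUTF needs for s: 2-byte length + encoded characters.
    static int utfBytes(String s) {
        int n = 2;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                n += 1;                      // plain ASCII
            } else if (c <= 0x07FF) {
                n += 2;                      // e.g. accented and Greek letters
            } else {
                n += 3;                      // everything else
            }
        }
        return n;
    }

    // Writes s at pos, but only if it fits into a field of fieldBytes bytes.
    static void writeField(RandomAccessFile file, long pos, String s, int fieldBytes)
            throws IOException {
        if (utfBytes(s) > fieldBytes) {
            throw new IllegalArgumentException(
                "needs " + utfBytes(s) + " bytes, field has " + fieldBytes);
        }
        file.seek(pos);
        file.writeUTF(s);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("guard", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            writeField(file, 0, "Ernie", 30);                    // fits: 7 bytes
            System.out.println(utfBytes("Great, Alexander the")); // 22
        }
    }
}
```

Throwing an exception is only one design choice.  The Bank sample program below shortens the name with substring instead.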

SEEK FIRST

Programs should always seek to a specific position before reading or writing.  If the program is writing two fields - name and money - into the file, there should be two seek commands, one before each write command.  This is shown in the following sample program.

Bank Sample Program

Here is a sample program that writes and reads bank data in a RandomAccessFile.

//== Create a RandomAccessFile ==
//  Creates a RandomAccessFile with names and salaries (money)
//  The program allocates 50 bytes for each record -
//    42 bytes for the name field
//     8 bytes for the salary field
//  The commands .writeUTF and .readUTF use the following system:
//    First two bytes tell the length of the following string,
//     and the following bytes contain the string (1 char per byte)
//  So there are 42 bytes for the name = 2 bytes for length, 40 bytes for data.
//  The double value occupies 8 bytes.
//  If shorter strings are recorded, the extra bytes are empty (wasted) and ignored.
//==================================================================================

 import java.awt.*;
 import java.awt.event.*;
 import javax.swing.*;
 import java.io.*;

 public class Bank extends EasyApp
 {
    public static void main(String[] args)
    {  new Bank(); }

    Button bWrite  =  addButton("Write Record",30,30,100,50,this);
    Button bRead   =  addButton("Read Record",130,30,100,50,this);

    public void actionPerformed(ActionEvent evt)
    {
       Object source = evt.getSource();
       if (source == bWrite) { writeRecord();}
       if (source == bRead)  { readRecord(); }
    }

    public void writeRecord()
    {
       try
       {
          RandomAccessFile file = new RandomAccessFile("bank.dat","rw");
          long pos = inputLong("Type the record number for saving the data:");

          String name = input("Type the customer's name:");
          if (name.length() > 40) { name = name.substring(0,40);}

          double money = inputDouble("Type the customer's money:");

          file.seek(50*pos);
          file.writeUTF(name);
          file.seek(50*pos + 42);
          file.writeDouble(money);

          file.close();
       }
       catch (IOException ex)
       {  output(ex.toString()); }
    }

    public void readRecord()
    {
       try
       {
          RandomAccessFile file = new RandomAccessFile("bank.dat","r");
          long pos = inputLong("Type the record number for reading the data:");

          file.seek(50*pos);
          String name = file.readUTF();
          file.seek(50*pos + 42);
          double money = file.readDouble();

          output("Record #" + pos + " = " + name + " : " + money);

          file.close();
       }
       catch (IOException ex) { output(ex.toString());}
    }
 }

Notice the following details:

Counting Bytes

Counting the bytes in a RandomAccessFile is tricky.  The arithmetic is not so difficult (see SEEK above).  The problem is knowing exactly how many bytes are actually used by various data types.  The following chart shows the .write commands and the corresponding number of bytes required.

 write command             bytes occupied
 .writeInt(int)            4
 .writeDouble(double)      8
 .writeChar(char)          2 (this is not UTF)
 .writeLong(long)          8
 .writeByte(byte)          1
 .writeFloat(float)        4
 .writeBoolean(boolean)    1
 .writeUTF(String)         String.length() + 2 bytes ***

In the sample Bank program, the name field is limited to 40 characters.  But the program allows 42 bytes in the file.  UTF Strings are written with a 2 byte prefix that tells how long the String is.  So the UTF String actually occupies 42 bytes instead of 40.
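
If you don't trust the chart, you can let the computer count.  The sketch below (WriteSizes and measure are invented names) calls each write command once on a scratch file and uses .getFilePointer() to see how far the position moved.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

final class WriteSizes {
    // Returns the number of bytes each write added, in the order of the chart.
    static long[] measure(File f) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            file.setLength(0);                           // start from an empty file
            long[] end = new long[8];                    // position after each write
            file.writeInt(42);          end[0] = file.getFilePointer();
            file.writeDouble(2.5);      end[1] = file.getFilePointer();
            file.writeChar('A');        end[2] = file.getFilePointer();
            file.writeLong(7L);         end[3] = file.getFilePointer();
            file.writeByte(1);          end[4] = file.getFilePointer();
            file.writeFloat(1.5f);      end[5] = file.getFilePointer();
            file.writeBoolean(true);    end[6] = file.getFilePointer();
            file.writeUTF("Ernie");     end[7] = file.getFilePointer();

            long[] sizes = new long[end.length];
            long previous = 0;
            for (int i = 0; i < end.length; i++) {
                sizes[i] = end[i] - previous;            // bytes used by write number i
                previous = end[i];
            }
            return sizes;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sizes", ".dat");
        f.deleteOnExit();
        System.out.println(Arrays.toString(measure(f)));  // [4, 8, 2, 8, 1, 4, 1, 7]
    }
}
```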

*** Calculating UTF storage space is actually more complex.  UTF does not always use 1 byte per character - it uses 1, 2, or 3 bytes per character, depending on the character.  "Normal" English characters (those with ASCII codes below 128) require one byte per character.  So the calculation above is fine as long as you have pure English language data.  If the text might contain some Greek letters or special math symbols, then these characters will take more than one byte of storage.  If you are unsure and you don't mind wasting disk storage space, allocate 3 times as many bytes as there are characters (plus the 2-byte prefix), and you won't have any problems.
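
To see the difference, here is a small sketch that measures a few Strings.  It is only an illustration, and UtfMeasure and bytesFor are made-up names.  Instead of a file it uses a DataOutputStream writing into memory, whose .writeUTF produces the same format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class UtfMeasure {
    // Returns the number of bytes writeUTF produces for s (including the 2-byte prefix).
    static int bytesFor(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesFor("abc"));                 // 5 = 2 + 3 * 1
        System.out.println(bytesFor("\u03b1\u03b2\u03b3"));  // 8 = 2 + 3 * 2 (Greek alpha, beta, gamma)
        System.out.println(bytesFor("\u2211"));              // 5 = 2 + 1 * 3 (summation sign)
    }
}
```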

In general, there is nothing wrong with allocating a bit of extra space.  For example, if you are writing a 20-character String, an int and a double, you calculate:

   ==>  (20 + 2) + 4 + 8 = 34

You can allocate 40 bytes per record (or even 50), in case you miscounted.
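
A sketch of such a padded record, assuming plain English text.  The field names (name, count, amount) and the class PaddedRecord are hypothetical.  The String, int and double need 34 bytes, and each record is given 40, so 6 bytes per record are simply never used.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class PaddedRecord {
    static final int NAME_CHARS = 20;
    static final int COUNT_OFFSET = NAME_CHARS + 2;     // int starts after the String: 22
    static final int AMOUNT_OFFSET = COUNT_OFFSET + 4;  // double starts after the int: 26
    static final int USED_BYTES = AMOUNT_OFFSET + 8;    // end of the double: 34
    static final int RECORD_SIZE = 40;                  // rounded up, 6 spare bytes

    static void write(RandomAccessFile file, long rec, String name, int count, double amount)
            throws IOException {
        long start = rec * RECORD_SIZE;
        file.seek(start);
        file.writeUTF(name.length() > NAME_CHARS ? name.substring(0, NAME_CHARS) : name);
        file.seek(start + COUNT_OFFSET);
        file.writeInt(count);
        file.seek(start + AMOUNT_OFFSET);
        file.writeDouble(amount);
    }

    static String read(RandomAccessFile file, long rec) throws IOException {
        long start = rec * RECORD_SIZE;
        file.seek(start);
        String name = file.readUTF();
        file.seek(start + COUNT_OFFSET);
        int count = file.readInt();
        file.seek(start + AMOUNT_OFFSET);
        double amount = file.readDouble();
        return name + " " + count + " " + amount;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("padded", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            write(file, 1, "Barbie", 7, 123.5);      // records can be written in any order
            write(file, 0, "Arnie", 3, 12.25);
            System.out.println(read(file, 1));       // Barbie 7 123.5
        }
    }
}
```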

Practice - Add More Features

  1. countCustomers()
       Read through the entire file and count the records that contain a name that isn't blank.
       Assume there are exactly 1000 records in the file (numbered 0 to 999), and that record 999 is not blank.

    showAllCustomers()
       Read through the entire file and print the name and money for each customer.
       Assume there are exactly 1000 records in the file (numbered 0 to 999), and that record 999 is not blank.
     
  2. PIN number
    Banks normally have security to prevent people from accessing other people's data. Often this uses a 4-digit PIN (Personal ID Number).    To add a PIN to this database, each record must become larger.  This can be written as a UTF String - that's the simplest way. Remember that this requires 6 bytes for 4 characters - 2 extra bytes for the String length.

    Add the PIN code to the program so that it is required before each access.  That means both reading and writing data should input the PIN code and check it against the code stored in the file.
     
  3. Charges
    Banks charge fees for various services.

    monthlyFee
       Reads through the entire file and subtracts 50.00 from each customer as a monthly fee.
       If the customer has less than 50.00 EU, resulting in a negative balance,
         the method should print a warning message.  This would print the account number,
         the customer's name and the current balance.

    interest(double rate)
       Adds money to each customer.  For example, if rate is 0.5 % , it is calculated like this:
            newMoney = money * (1 + rate / 100) ;
         So   500 EU --->  500 * (1.005) = 502.50

    monthlyUpdate()
       Reads through the entire file, subtracts 50.00 for the monthly fee and then adds
       0.5% interest.  Like this:

       Current balance = 500
       Subtract 50.00 =  450.00
       Add 0.5%  =  452.25  =  New Balance
       Write the new balance back into the file
     
  4. Searching
    The program is fine as long as the customers and the bank employees know the record number for each customer.  Otherwise, it is impossible to access the correct record.  For example, what if Madonna wants to put some money in the bank?  Maybe she already has a record, so it would be silly to make a new one.

    The program needs some searching methods.  Write some of the following:

    nameSearch(String customer)
        searches sequentially for the record containing a name matching customer

    moneySearch(double min, double max)

       searches for all records where the money is between min and max
       
    For example, we could find all the rich people
       by running moneySearch(1000000,9e99);
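
As a starting point for exercise 3, the arithmetic for one customer's monthly update (subtract the fee, then add interest) can be sketched like this.  The class FeeMath and the method newBalance are made-up names, and reading and writing the file is left to you.

```java
final class FeeMath {
    static final double FEE = 50.00;     // monthly fee from exercise 3

    // Subtracts the fee, then adds interest. ratePercent is a percentage, e.g. 0.5.
    static double newBalance(double money, double ratePercent) {
        double afterFee = money - FEE;
        return afterFee * (1 + ratePercent / 100);
    }

    public static void main(String[] args) {
        // 500 - 50 = 450, then 450 * 1.005, shown rounded to 2 decimal places
        System.out.printf("%.2f%n", newBalance(500, 0.5));
    }
}
```

A double cannot store every decimal fraction exactly, so it is safer to round to 2 decimal places when printing the result, as the example does.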
