What Are You Waiting For?

Latency is a delay which occurs before a process can start. A very common cause of latency is power-save standby mode, which turns off a peripheral device when it is not in use, thus saving energy. Before the device can do anything, it must warm up. A disk drive in standby mode is no longer spinning; it must first spin up before it can do anything, and this may take several seconds. A monitor in standby mode must warm up - the electron gun won't start shooting electrons until it is warm. Notebook computers make extensive use of power-saving techniques, because they run off batteries which would only last about an hour without power management. Good power-management strategies can double or triple battery life.

Most peripheral devices have other start-up requirements which slow things down. Storage devices commonly have a built-in seek delay, and I/O devices are often busy serving other requests.

Printers - printers sometimes download new fonts from the PC before they start printing, and this can be quite time-consuming. Laser printers are page printers, which means that they must finish loading the contents of an entire page before they start printing - this can take several seconds if the page contains graphics.

Disk drives - if you try to load a program from the hard-disk, there is about a 10 ms delay while the read/write head moves from one track to another to find the file. This is the seek delay, and the 10 ms is called the mean seek time. You are unlikely to notice this delay, but if the computer makes lots of disk accesses, you may notice it.
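A quick calculation, using the 10 ms figure above, shows why a single seek goes unnoticed but many seeks do not:

```python
# Rough estimate of how 10 ms seek delays add up (figure from the text).
MEAN_SEEK_MS = 10  # mean seek time in milliseconds

def total_seek_delay_ms(accesses: int) -> int:
    """Total time spent seeking, for a given number of disk accesses."""
    return accesses * MEAN_SEEK_MS

print(total_seek_delay_ms(1))    # 10
print(total_seek_delay_ms(500))  # 5000
```

One access costs an imperceptible 10 ms, but 500 scattered accesses already cost 5 full seconds of seeking alone.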

Modems - each time you dial in, your modem and the modem at the other end of the phone line must carry out handshaking. This is an exchange of signals which determines the protocols which will be used. This sets the data rate (14.4 kbps or 28.8 kbps or 56 kbps), decides whether parity bits and stop bits will be used, and a few other details. After the protocol has been agreed, the modems can start exchanging data.
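The heart of handshaking can be sketched as a negotiation: each side advertises the rates it supports, and the connection uses the fastest rate both share. This is a simplified model (function name and rate lists are illustrative, not a real modem protocol):

```python
# Simplified sketch of handshake negotiation: pick the fastest rate
# (in kbps) that both modems support.
def negotiate(local_rates, remote_rates):
    common = set(local_rates) & set(remote_rates)
    if not common:
        raise ValueError("no common protocol")
    return max(common)

# A 56 kbps modem calling a 28.8 kbps modem falls back to 28.8 kbps.
rate = negotiate([14.4, 28.8, 56.0], [14.4, 28.8])
print(rate)  # 28.8
```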

Tape drives - commonly called streamers, because the data flows onto the tape in a steady stream (hopefully fast - 1 MB or more per second for a digital tape drive). Tapes are used to back up hard disks, because a fairly cheap tape (under 40 DM) can be used to save the contents of a 4 GB hard disk. And the tape is removable, so many different tapes can be used - a new copy can be made each day on a different tape, and an entire month of tapes can be stored, so that the exact contents of a financial system can be restored for any day of the month. The only really bad part of a streamer tape is the latency, caused by two factors: (1) the user must mount the tape - that is, insert it in the tape drive - which can be very annoying if more than one tape is required; (2) tapes are serial devices - they must be read from the beginning to the end, and there is no simple way to jump to the middle of the tape. Thus, the seek time for a tape drive can be very long (dozens of seconds or several minutes). Some drives allow fast-forward, but old-fashioned drives didn't even allow that, so the tapes had to be played at their normal speed until the desired file was found.
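Because a tape is serial, the seek time is simply the distance to the data divided by the streaming speed. A back-of-the-envelope estimate, using the 1 MB per second figure above:

```python
# Back-of-the-envelope seek time for a serial tape without fast-forward:
# the drive must stream past everything stored before the file it wants.
STREAM_RATE_MB_S = 1.0  # 1 MB/s, the digital-tape figure from the text

def tape_seek_seconds(offset_mb: float) -> float:
    """Seconds to reach data stored offset_mb megabytes into the tape."""
    return offset_mb / STREAM_RATE_MB_S

# A file 2 GB into a 4 GB backup takes over half an hour to reach.
print(tape_seek_seconds(2000))  # 2000.0
```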

Networks - the most common cause of latency in a network is over-used resources (busy printers and servers). If the server is busy with someone else's requests, it may take some time before it responds to yours. If you print over a network, the server is likely to save the entire print job into a spool file and register it in a print queue, behind other jobs which got there first. Then you may have to wait several minutes until all the other jobs are finished before your printing can start. You don't notice it, but networks also suffer from collisions. In a bus network (e.g. Ethernet), only one computer can be transmitting at any one time. If two computers both try to use the cable at the same time, this is called a collision. Then one computer must wait until the other is finished before it can use the cable. This is a common problem which goes unnoticed, because most packets of data are quite small (several kilobytes), and each computer is required to pause after each packet, in case another computer is waiting.
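The shared cable behaves like a single-server queue: a packet that finds the bus busy must wait for the one in front of it. A toy model (the scheduling function and time units are illustrative, not real Ethernet arbitration):

```python
# Toy model of a shared bus: only one sender transmits at a time, so a
# packet's finish time includes the time spent waiting for earlier packets.
def schedule(packets):
    """packets: list of (sender, duration_ms) in arrival order.
    Returns (sender, finish_time_ms) for each packet on one shared cable."""
    t = 0
    finish = []
    for sender, duration in packets:
        t += duration  # the bus is held for the whole packet
        finish.append((sender, t))
    return finish

# Two equal packets arriving together: the second waits for the first.
print(schedule([("A", 4), ("B", 4)]))  # [('A', 4), ('B', 8)]
```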

Memory - every RAM chip has a specific latency which is measured in nanoseconds - a typical speed is 60 ns. When the CPU wants to fetch bytes from RAM, it transmits the address of the required data. The RAM chip must decode this address and retrieve the required byte from memory. This process takes 60 ns, which limits the maximum data transfer rate to 1 / (60 ns), about 16.7 million bytes per second. Good RAM chips support a burst or block mode: if a large block of bytes is transferred, the address-decode delay only occurs once at the beginning of the block, not for every byte. So if a megabyte is being transferred, the 60 ns delay only occurs once, and the rest of the bytes are transferred at a higher speed - typically a third of the initial latency, e.g. 20 ns per byte instead of 60. Thus, block (serial) transfers occur about 3 times faster than random-access transfers.
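Working through the arithmetic for a 1 KB block shows where the factor of 3 comes from:

```python
# Transfer-time arithmetic using the figures from the text:
# 60 ns per random access, but in burst mode only the first byte pays
# the 60 ns decode delay and each following byte takes 20 ns.
RANDOM_NS = 60
BURST_NS = 20

def random_transfer_ns(n_bytes: int) -> int:
    return n_bytes * RANDOM_NS

def burst_transfer_ns(n_bytes: int) -> int:
    return RANDOM_NS + (n_bytes - 1) * BURST_NS

print(random_transfer_ns(1024))  # 61440
print(burst_transfer_ns(1024))   # 20520
```

61440 / 20520 is just under 3, so for large blocks the burst transfer approaches 3 times the speed of byte-by-byte random access.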

Monitors - a graphics card refreshes the picture on the monitor about 70 times per second. It draws the top line of pixels, then the next line, then the next, until it reaches the bottom. Then the electron gun is repositioned to the top of the screen (vertical retrace) and the next refresh starts. If a program "draws" a line on the screen at the exact moment when the monitor is in the middle of a refresh, you may first see the bottom half of the line, and then the top half appears. This sort of poor timing can cause "flickering". Good programs wait until the screen has all been drawn, then write the new pixels into the video RAM, so that the next refresh draws the entire line at once. This approach limits screen changes to 70 per second.
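The refresh rate also sets the time budget for each frame:

```python
# At 70 refreshes per second, each frame lasts 1000/70 milliseconds.
# A program that waits for the vertical retrace before changing video RAM
# can update the picture at most 70 times per second, but every update
# appears whole, with no flicker.
REFRESH_HZ = 70

frame_ms = 1000 / REFRESH_HZ
print(round(frame_ms, 1))  # 14.3
```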

Virtual Memory - Windows uses a swap area on the hard disk for temporary storage. Then Windows can "pretend" that it has lots more memory than is actually present - typically, the swap file is twice as large as RAM, which effectively triples the amount of available memory. This is necessary because running many applications (multi-tasking) or running large applications often requires more memory than is actually available in RAM. When this occurs, the computer swaps some of the contents of RAM out onto the disk into the swap file. The data remains there until it is needed, and then it is swapped back into RAM. Swapping from RAM to disk is pretty slow (about a second per megabyte). If you start too many large applications, Windows may end up spending most of its time swapping - this condition is called thrashing. You will hear the disk drive running constantly, and notice a massive slow-down in performance. Even under normal circumstances, every swap can cost a second or more.
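The one-second-per-megabyte figure makes it easy to estimate why thrashing feels so bad:

```python
# Estimating swap cost with the text's figure of about 1 second per MB.
SWAP_S_PER_MB = 1.0

def swap_cost_seconds(mb_swapped: float) -> float:
    """Seconds spent moving mb_swapped megabytes between RAM and disk."""
    return mb_swapped * SWAP_S_PER_MB

# Pushing a 16 MB working set out to disk and pulling it back in
# costs about 32 seconds of pure disk traffic.
print(swap_cost_seconds(16 * 2))  # 32.0
```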


Hurry Up - Cashing in with Cache

Many devices spend lots of time sitting around doing nothing, while some other device is busy. Then the other devices sit around waiting while that device does something. Printers are a prime example. Most of the time you are typing, the printer is doing nothing. Then you decide to print something. The computer waits until the printer is warmed up, then the printer starts printing. If a particularly stupid piece of software is running, it may sit around doing nothing until the printer is finished.

Time spent doing nothing is called idle time. Times when there is lots of activity and lots of demand for services are called peak demand. The general strategy for speeding up a computer system is to even out the load, reducing the difference between peak demand and average demand. Then all devices have less idle time, which also results in less waiting time. For example, a good printer will let the computer send the entire print job and store it in a buffer (typically several megabytes in a laser printer). This happens rapidly, at speeds of several hundred kilobytes per second. Once the entire print job has been transferred, the computer (or user) can go back to work, while the printer simultaneously prints the document. Thus, both the computer and the printer are busy at the same time - no idle time for the printer, and no waiting time for the computer or user.
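The benefit of the buffer can be put in numbers. Assuming illustrative speeds (a few hundred KB/s for the hand-off, a much slower effective printing speed), the computer's busy time shrinks to just the transfer:

```python
# Sketch of why a printer buffer frees the computer early. The two speed
# figures below are assumptions chosen for illustration.
TRANSFER_KB_S = 300   # PC-to-printer transfer speed
PRINT_KB_S = 10       # effective printing speed

def pc_busy_seconds(job_kb: float, buffered: bool) -> float:
    """Time the computer is tied up by one print job."""
    if buffered:
        return job_kb / TRANSFER_KB_S  # just the hand-off to the buffer
    return job_kb / PRINT_KB_S         # waits for the whole print

print(pc_busy_seconds(3000, buffered=True))   # 10.0
print(pc_busy_seconds(3000, buffered=False))  # 300.0
```

With a buffer, a 3 MB job ties up the computer for 10 seconds instead of 5 minutes; the remaining printing time overlaps with useful work.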

A buffer is a temporary storage area where data can be quickly collected and then processed more slowly by a slow peripheral device. Another type of temporary storage area is a cache. Modern CPUs don't load data directly from the slow, cheap RAM chips. Instead, the data is loaded from the faster, smaller, more expensive cache memory. A typical Pentium PC contains 256 KB of cache memory and 16 MB of RAM. If the needed data is not in the cache, then the cache fetches the data from the RAM. But rather than fetching single bytes, the cache fetches entire pages (blocks) of data (perhaps 256 bytes or 1 KB) in a single burst. This transfer occurs about 3 times faster than fetching single bytes. Then the CPU fetches single bytes from the cache, which is a much faster process than fetching from RAM. Previously, there was a very small cache (16 KB or so) inside the CPU - this was called the level-1 cache. Then separate, fast memory chips comprised a level-2 cache (typically 256 KB). Data travelled from RAM to the level-2 cache, and then into the level-1 cache, before being used by the CPU. Newer Pentium chips have the level-2 cache directly inside the CPU chip. This passing of data from cache to cache continues somewhat automatically at the same time that the CPU is doing calculations - i.e. no idle time for the cache, and less waiting time for the CPU.
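The block-fetching idea can be shown in a few lines. This is a minimal sketch (class name and block size are illustrative): on a miss, a whole 256-byte block is pulled from RAM in one burst, so neighbouring bytes are already cached when the CPU asks for them.

```python
# Minimal sketch of a block-fetching cache: one miss loads 256 bytes.
BLOCK = 256  # bytes fetched per burst

class Cache:
    def __init__(self, ram: bytes):
        self.ram = ram
        self.lines = {}   # block number -> cached block of bytes
        self.misses = 0

    def read(self, addr: int) -> int:
        block = addr // BLOCK
        if block not in self.lines:
            self.misses += 1  # one burst fetch covers 256 addresses
            start = block * BLOCK
            self.lines[block] = self.ram[start:start + BLOCK]
        return self.lines[block][addr % BLOCK]

ram = bytes(range(256)) * 4               # 1 KB of pretend RAM
cache = Cache(ram)
data = [cache.read(a) for a in range(512)]  # 512 sequential reads
print(cache.misses)  # 2
```

512 sequential reads cause only 2 slow fetches from RAM; the other 510 reads are served from the cache.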

A disk cache is a memory area in RAM which is used for temporary storage of disk data. Rather than reading and writing data directly on the physical disk, the system can read and write from the cache (much faster), and do the actual disk updates later. This is dangerous - if the computer is shut down before the data is committed to the disk (physically recorded), the data is lost. However, a disk cache can save a lot of time, so it may be worth the risk. A read-only cache (also called a write-through cache) is one that is used only for reading from the disk, while write operations go directly to the disk. A read-only cache speeds up the read operations, which comprise most of the disk operations, while leaving the write operations to function normally. Some hard-disk drives and/or controllers contain an on-board cache - memory which is actually part of the disk drive. Then no RAM is occupied by the disk cache.
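The write-through variant is easy to sketch. In this toy model (names and the dict-as-disk are illustrative), repeat reads come from the cache, but every write goes straight to the "disk" as well, so a crash never loses written data:

```python
# Sketch of a write-through disk cache: fast repeat reads,
# but writes always reach the disk immediately.
class WriteThroughCache:
    def __init__(self, disk: dict):
        self.disk = disk   # pretend disk: sector number -> data
        self.cache = {}

    def read(self, sector):
        if sector not in self.cache:
            self.cache[sector] = self.disk[sector]  # slow physical read
        return self.cache[sector]                   # fast on repeats

    def write(self, sector, data):
        self.cache[sector] = data
        self.disk[sector] = data  # written through at once - crash-safe

disk = {0: b"old"}
c = WriteThroughCache(disk)
c.write(0, b"new")
print(disk[0])  # b'new'
```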

Double-buffering is a buffering technique that uses two separate buffer (or cache) areas. Data can then be written into one buffer by one device at the same time that data is being read from the other buffer by another device.
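A minimal sketch of the idea (function name and chunk handling are illustrative): the writer fills the back buffer, the two buffers swap roles, and the reader drains what was just written while the back buffer is free to be filled again.

```python
# Minimal double-buffering sketch: fill one buffer while the other
# is being drained, then swap the two.
def double_buffer(chunks):
    front, back = [], []
    consumed = []
    for chunk in chunks:
        back.append(chunk)         # writer fills the back buffer
        front, back = back, front  # swap: the full buffer becomes readable
        consumed.extend(front)     # reader drains the front buffer
        front.clear()              # drained buffer is ready for refilling
    return consumed

print(double_buffer(["a", "b", "c"]))  # ['a', 'b', 'c']
```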