How GOOGLE Works
ZDNET: The numbers alone are enough to make your eyes water.
It is one of the largest computing projects on the planet, arguably employing more computers than any other single, fully managed system (we're not counting distributed computing projects here), some 200 computer science PhDs, and 600 other computer scientists. http://insight.zdnet.co.uk/hardware/servers/
When was the last time a computer or network was built to take advantage of cheap bandwidth, cheap DRAM, and plentiful PCs? Most companies are still building mainframes based on old computer architectures. At Google, for example, we found it costs less money and it is more efficient to use DRAM as storage as opposed to hard disks, which is kind of amazing. It turns out that DRAM is 200,000 times more efficient when it comes to storing seekable data. In a disk architecture, you have to wait for a disk arm to retrieve information off of a hard-disk platter. DRAM is not only cheaper, but queries are lightning fast. Search technology has a lot of room for improvement, be it algorithms or computer architecture.
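The seek-time argument in the quote can be checked with back-of-envelope arithmetic. The Python sketch below assumes roughly 10 ms for one random hard-disk read (seek plus rotation) and roughly 50 ns for one random DRAM read; these are typical textbook figures chosen for illustration, not numbers taken from the sources listed here.

```python
# Back-of-envelope comparison of one random (seek-bound) read from a hard disk
# versus one random read from DRAM. Both latency figures are ASSUMED typical
# values for illustration, not measurements of Google's hardware.

DISK_SEEK_SECONDS = 10e-3    # assumed average seek + rotational delay: ~10 ms
DRAM_ACCESS_SECONDS = 50e-9  # assumed DRAM random-access latency: ~50 ns


def lookups_per_second(latency_seconds: float) -> float:
    """Random lookups one device can serve per second if each waits the full latency."""
    return 1.0 / latency_seconds


def speedup(slow_latency: float, fast_latency: float) -> float:
    """How many times faster the fast device is for latency-bound lookups."""
    return slow_latency / fast_latency


if __name__ == "__main__":
    print(f"disk: {lookups_per_second(DISK_SEEK_SECONDS):,.0f} lookups/s")
    print(f"DRAM: {lookups_per_second(DRAM_ACCESS_SECONDS):,.0f} lookups/s")
    print(f"ratio: {speedup(DISK_SEEK_SECONDS, DRAM_ACCESS_SECONDS):,.0f}x")
```

With these assumed figures the ratio works out to 200,000, the same figure the quote gives; real seek times and memory latencies vary by device, so the method matters more than the exact number.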
Recording of a speech by Jim Reese: http://technetcast.ddj.com/tnc_play_stream.html?stream_id=420
The Magic That Makes Google Tick: http://insight.zdnet.co.uk/hardware/servers/0,39020445,39175560,00.htm
Google Cluster Architecture: http://www.computer.org/micro/mi2003/m2022.pdf
Interview with Jim Reese: http://www.hpworld.com/hpworldnews/hpw009/02nt.html
Is Google Broken?: http://www.w3reports.com/index.php?itemid=549
Pigeon Rank: http://www.google.com/technology/pigeonrank.html
Anatomy of a Search Engine: http://www-db.stanford.edu/~backrub/google.html
The Secret of Google's Power: http://blog.topix.net/archives/000016.html
Google Server Design: http://news.cnet.com/8301-1001_3-10209580-92.html
Underground Server Farm: http://www.wired.com/news/business/0,1367,48104,00.html?tw=wn_story_related
RAM at HowStuffWorks: http://computer.howstuffworks.com/ram.htm/printable
Questions:
Hardware
How many servers are there (approximately)?
Describe the hardware in a Google server.
What is a “server farm”?
Why are heat and power significant issues in a server farm? Why is this an even bigger problem for Google?
Explain the difference between Dynamic RAM (DRAM) and Static RAM (SRAM).
Why would Google use DRAM instead of hard disks?
Do servers break? Approximately how often?
Explain the term “mean time between failures” (MTBF) and give an approximate value for a hard disk.
How does Google avoid a “single point of failure”?
What special arrangements does Google make to enable them to use very cheap hardware?
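The “Do servers break?” and MTBF questions above are connected by simple arithmetic: a part that rarely fails on its own fails often when you own a great many of them. The Python sketch below assumes failures are independent and happen at a constant rate (the usual simplification behind an MTBF figure); the fleet size and the per-disk MTBF are hypothetical inputs, not Google's numbers.

```python
# Expected disk failures per day across a large fleet, assuming failures are
# independent and occur at a constant rate (the simplification behind MTBF).
# The fleet size and MTBF used below are HYPOTHETICAL inputs.

HOURS_PER_DAY = 24


def expected_failures_per_day(num_units: int, mtbf_hours: float) -> float:
    """Each unit fails at rate 1/MTBF per hour, so N units fail at N/MTBF per hour."""
    return num_units * HOURS_PER_DAY / mtbf_hours


def hours_between_fleet_failures(num_units: int, mtbf_hours: float) -> float:
    """Mean time between failures for the fleet as a whole: MTBF / N."""
    return mtbf_hours / num_units


if __name__ == "__main__":
    disks = 100_000   # hypothetical fleet size
    mtbf = 500_000.0  # hypothetical per-disk MTBF, in hours
    print(f"{expected_failures_per_day(disks, mtbf):.1f} disk failures per day")
    print(f"one failure every {hours_between_fleet_failures(disks, mtbf):.1f} hours")
```

With these hypothetical inputs the fleet sees about 4.8 disk failures a day, roughly one every 5 hours, which is why a large cluster has to treat hardware failure as routine rather than exceptional.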
Software
Which browser do the Google servers use?
What standard software is running on the Google servers?
Explain what “page rank” is and how it works.
What does it mean when a “web-spider” is “crawling” the web?
What is an “index”?
Why does Google “cache” pages?
Calculate the total data storage requirements for 4 billion pages averaging 10 KB each.
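For the “page rank” question above, the Anatomy of a Search Engine paper listed earlier describes the idea: a page is important if important pages link to it, with a damping factor d (the paper usually sets d to 0.85). The Python sketch below is a minimal power-iteration version using the common normalized form, in which (1 - d) is divided by the number of pages N, so its scores may be scaled differently from the paper's formula. The four-page link graph is hypothetical, and spreading the rank of pages with no outgoing links evenly over all pages is one common convention, assumed here.

```python
# Minimal PageRank by power iteration on a tiny HYPOTHETICAL link graph.
# Normalized form: PR(p) = (1 - d)/N + d * sum(PR(q) / outdeg(q)) over pages q linking to p.
# Pages with no outgoing links ("dangling") spread their rank evenly over all pages.


def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages is shared equally by every page.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)  # split rank over outlinks
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once scores have settled
            break
    return rank


if __name__ == "__main__":
    web = {
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph C ends up with the highest score because three pages link to it, while D, which nothing links to, keeps only the baseline (1 - d)/N = 0.0375.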
System
What operating system does Google use? Why?
Describe some differences between a server OS and a desktop OS.
What major change to the OS did Google program themselves?
How does a large block size in the file system improve disk drive performance?
What does “parallel” mean?
Explain “load balancing”.
What does “scalability” refer to?
What does “redundant” mean?
What does “fault tolerant” mean?
Describe how Google can stay “up” 24/7 despite frequent changes to the Web and frequent hardware failures.
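The load-balancing, redundancy, and fault-tolerance questions above fit together in one small idea: keep several interchangeable replicas and send each request to the next one that is working. The Python class below is a hypothetical sketch (the server names and the mark_down/mark_up methods are invented for illustration, not Google's software) of round-robin load balancing that simply skips servers marked as failed.

```python
# Round-robin load balancer that skips replicas marked as failed, illustrating
# load balancing, redundancy, and fault tolerance together.
# Server names and the mark_down/mark_up methods are HYPOTHETICAL.


class RoundRobinBalancer:
    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)  # redundant replicas, in rotation order
        self.healthy = set(servers)   # replicas currently believed to be working
        self._next = 0                # index of the next replica to try

    def mark_down(self, server: str) -> None:
        """Take a failed replica out of rotation."""
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        """Return a repaired replica to rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation; raise if none are left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["s1", "s2", "s3"])
    lb.mark_down("s2")  # simulate a hardware failure
    print([lb.pick() for _ in range(4)])  # requests keep flowing to s1 and s3
```

Because every replica can answer any request, losing one only reduces capacity; the service as a whole stays up as long as at least one replica is healthy.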
Vocabulary
Megabyte, Gigabyte, Terabyte, Petabyte, Gbps
HTTP, HTML, client, server, host, spider, crawl, DNS, IP, query, Boolean, bandwidth, cache
rackmount, 1U, 2U
parallel, cluster, load-balancing, multi-threading, multi-processor, scalability, fault-tolerance
IDE, SCSI, drive array, RAID
CPU, RAM, SRAM, DRAM, L2 cache
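The storage units in the vocabulary list and the storage question under Software (4 billion pages averaging 10 KB each) can be tied together with a small conversion helper. The Python sketch below assumes decimal (SI) units, where 1 KB = 1,000 bytes; the helper name human_readable is hypothetical.

```python
# Convert a raw byte count into decimal (SI) storage units and apply it to the
# worksheet's estimate of 4 billion pages at about 10 KB each.
# ASSUMES decimal units: 1 KB = 1,000 bytes, 1 MB = 1,000 KB, and so on.

UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the value under 1,000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"


PAGES = 4_000_000_000
BYTES_PER_PAGE = 10 * 1000  # 10 KB per page, decimal

if __name__ == "__main__":
    total = PAGES * BYTES_PER_PAGE
    print(total, "bytes =", human_readable(total))
```

The raw page text comes to 4 × 10^13 bytes, about 40 TB in decimal units (roughly 36.4 TiB if binary units of 1,024 are used); any extra copies kept for redundancy multiply that figure.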