How Do You Grab an Explosion? Making the World Wide Web Usable
If the Web was incomprehensibly large in December 1997, it more than doubled in size in little more than one year. According to a survey by NEC's Research Institute,1 in February 1999 the Web had more than 800 million searchable pages with more than 6 trillion characters. Only 15 months before, the same authors estimated that the Web had approximately 320 million pages of information. By contrast, the Library of Congress had an estimated 20 trillion characters.
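To put those figures in perspective, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration: it uses the lower-bound numbers quoted above and assumes, purely for the sake of the example, that the Web grew at a steady compound rate over the 15 months. The constant and function names are made up for this sketch.

```python
import math

# Figures quoted in the article (NEC Research Institute estimates).
PAGES_EARLIER = 320_000_000      # estimate made 15 months earlier
PAGES_FEB_1999 = 800_000_000     # "more than 800 million" searchable pages
MONTHS_BETWEEN = 15
WEB_CHARACTERS = 6e12            # "more than 6 trillion characters"
LIBRARY_OF_CONGRESS_CHARACTERS = 20e12


def growth_factor(start, end):
    """How many times larger the later estimate is than the earlier one."""
    return end / start


def monthly_growth_rate(start, end, months):
    """Average monthly growth, ASSUMING steady compound growth."""
    return (end / start) ** (1 / months) - 1


def doubling_time_months(start, end, months):
    """Months needed to double in size under the same assumption."""
    return months * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    print("Growth factor:", growth_factor(PAGES_EARLIER, PAGES_FEB_1999))
    print("Monthly growth rate:",
          monthly_growth_rate(PAGES_EARLIER, PAGES_FEB_1999, MONTHS_BETWEEN))
    print("Doubling time (months):",
          doubling_time_months(PAGES_EARLIER, PAGES_FEB_1999, MONTHS_BETWEEN))
    print("Average characters per page:", WEB_CHARACTERS / PAGES_FEB_1999)
    print("Library of Congress vs. Web:",
          LIBRARY_OF_CONGRESS_CHARACTERS / WEB_CHARACTERS)
```

Under that steady-growth assumption, the Web was growing by a factor of 2.5 over the period and doubling in a little under a year.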
Staying even with this growth is difficult. Getting ahead of it is like trying to surf a tsunami. Just when you think you have a part of it down pat, it doubles in size and becomes even more incomprehensible. So, how can you manage this ever-growing monster to put its huge research library to work for you?
Eating an Elephant
The answer to the age-old question of "How do you eat an elephant?" applies equally to the Web—"one byte at a time."2 You have to start somewhere, get familiar with one part of the Internet and branch out from there. If you have yet to jump on the "information superhighway," now is the time because you will only get further behind as the Internet continues its exponential growth.
Since you already have an interest in bankruptcy and a growing interest in the WWW, the logical place to jump in is http://www.abiworld.org, ABI's own corner of the world (wide web). Starting each day with a look at Today's Headlines, followed by the discussion boards, gives you both a crash course on using the WWW and an update on current events in the bankruptcy world. The current events portion is essential, since a major overhaul may come out of Congress during this session.
Once you become familiar with ABI World, you are ready to branch out to the other areas of the Web. But where do you start? How do you find the needle in one of 10,000 haystacks? This is where the search engines come into play.
The Little Engines That (Almost) Could
Simply put, a search engine is a site on the WWW that helps you find other sites with the information you want. Search engines come in different styles, each with its own strengths and weaknesses (discussed below). We now learn that perhaps their greatest weakness is coverage.
The NEC Research Institute study included a statistic that was both surprising and completely anticipated: search engines do not really cover the Internet. That should not come as a surprise, given the explosive growth the Web has experienced in the past few years. How could anything stay absolutely current? The real surprise came from the report that even the best search engines cover barely more than 15 percent of the Web, leaving more than 80 percent virtually untouched. If the search engines have not found a web page, the only other way to find it is purely by accident.
Searching Web Sites
Northern Light was found to have the best coverage with 16 percent, with Snap and AltaVista close behind at 15.5 percent each.3 Yahoo!, one of the most popular sites, was reported to cover only 7.4 percent of the Web. Understanding how these search engines work explains both their incredible scope and unbelievable lack of reach.
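Those percentages are easier to appreciate as page counts. The short Python sketch below converts each reported coverage figure into an approximate number of pages, using the study's lower-bound estimate of 800 million searchable pages as the total. The names are illustrative only, and the results are rough estimates rather than figures taken from the study.

```python
# Lower-bound estimate of searchable pages quoted in the article.
TOTAL_PAGES = 800_000_000

# Coverage percentages as reported in the article.
COVERAGE_PERCENT = {
    "Northern Light": 16.0,
    "Snap": 15.5,
    "AltaVista": 15.5,
    "Yahoo!": 7.4,
}


def pages_indexed(percent, total=TOTAL_PAGES):
    """Approximate number of pages covered at the given percentage."""
    return round(total * percent / 100)


def pages_missed(percent, total=TOTAL_PAGES):
    """Approximate number of pages the engine never sees."""
    return total - pages_indexed(percent, total)


if __name__ == "__main__":
    for engine, percent in COVERAGE_PERCENT.items():
        print(f"{engine}: about {pages_indexed(percent):,} pages indexed, "
              f"about {pages_missed(percent):,} missed")
```

On these numbers, even the best-performing engine would index roughly 128 million pages and miss roughly 672 million.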
Northern Light, Snap and AltaVista all use programs called "spiders" to continuously search the Web, find sites and index what they find in an enormous database. The spiders run all of the time to keep that information up to date. When you search for something, the search engine does not search all of the web sites; it simply searches its database and returns a list of sites that match your search criteria. An example of the power of the search engine (and the sheer magnitude of the database) is a search on AltaVista for "bankruptcy." In a search in July, AltaVista took only two seconds to generate a list of 497,890 sites on the WWW where it found the word "bankruptcy." Obviously, these engines should be used for very specific searches of narrow topics: the true "needle in thousands of haystacks" searches.
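The idea of searching a prebuilt database instead of the Web itself can be illustrated with a toy "inverted index" in Python. This is a minimal sketch, not a description of how Northern Light, Snap or AltaVista actually work internally; the page addresses and text are invented for the example, and a real spider would fetch pages from the Web rather than read them from a dictionary.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have collected.
PAGES = {
    "http://example.com/chapter11": "Chapter 11 bankruptcy reorganization basics",
    "http://example.com/news": "Congress debates a bankruptcy overhaul",
    "http://example.com/travel": "Cheap flights and travel tips",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build the database once: map each word to the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Answer a query from the index alone, without rereading any page."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep pages containing every word
    return results


if __name__ == "__main__":
    index = build_index(PAGES)
    print(sorted(search(index, "bankruptcy")))
    print(sorted(search(index, "bankruptcy overhaul")))
```

Because the slow work (reading every page) happens ahead of time, a lookup only has to consult the index. Adding a second word to the query narrows the list, which is why specific searches work best on this kind of engine.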
Searching for Types of Web Sites
By contrast, Yahoo! is an outline (created by real humans) of sites on the Web. Yahoo! categorizes the sites by area, then topic, and you finally get down to a list of specific sites that meet your criteria. You can search it using only a mouse (with no "keyword" searching) or by giving it specific criteria to search for. When you use the keyword method, Yahoo! creates a list of categories that you can use to narrow and continue your search. It will usually guide you to the front page of a web site instead of the specific pages deeper in the site that use the terms from your search. In short, Yahoo! will drop you at the front door of a site and then let you do the exploring from there to find what you want. This is the type of search engine you want for general searches of broad topics.4
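A directory of this kind can be pictured as a tree of categories with lists of sites at the bottom. The Python sketch below is only an illustration of that structure: the category names are invented and are not Yahoo!'s actual outline, although the site addresses are ones mentioned in this article.

```python
# Hypothetical category outline; leaves are lists of site front pages.
DIRECTORY = {
    "Business": {
        "Law": {
            "Bankruptcy": ["http://www.abiworld.org"],
        },
    },
    "Government": {
        "Agencies": ["www.sec.gov", "www.irs.treas.gov"],
    },
}


def browse(directory, path):
    """Follow a chain of categories, as a mouse-only user would."""
    node = directory
    for name in path:
        node = node[name]
    # A dict means more subcategories; a list means we reached the sites.
    return sorted(node) if isinstance(node, dict) else list(node)


def find_categories(directory, keyword, prefix=()):
    """Keyword method: return matching categories, not individual pages."""
    matches = []
    for name, child in directory.items():
        path = prefix + (name,)
        if keyword.lower() in name.lower():
            matches.append(" > ".join(path))
        if isinstance(child, dict):
            matches.extend(find_categories(child, keyword, path))
    return matches


if __name__ == "__main__":
    print(browse(DIRECTORY, []))
    print(browse(DIRECTORY, ["Business", "Law", "Bankruptcy"]))
    print(find_categories(DIRECTORY, "bankruptcy"))
```

Note that the keyword lookup returns a category to explore, not a page deep inside a site, which mirrors the "front door" behavior described above.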
Another way to surf the Web is to find what I will call "compilation sites." These sites compile and categorize lists of sites meeting their criteria. Two popular examples are CEO Express and Virtual Reference Desk.5 CEO Express categorizes web sites that provide information a CEO would use. Its areas include news, financial information (including financial calculators that can amortize note payments), medical, travel, legal (including ABI World) and similar sites. It is a quick way to get to these sites. The Virtual Reference Desk includes (almost) everything that a reference librarian could find for you (including many of the sites listed by CEO Express). A personal favorite for finding obscure information is a feature called "Homework Helper." It found an obscure Shakespearean quote because it knew where to find the web site where the entire body of his work was available.
For a more detailed review of the strengths and weaknesses of the search engines, go to the link for "How to best use a search engine" at CEO Express.6 This link describes the features of the most popular search engines and gives specific examples of the searches where each search engine excels or falls short. Having this information in something other than "technogeek speak" is invaluable.7
So What's Really Out There?
The NEC Research Institute report also described the types of sites on the Web. Not surprisingly, the main type of site was described as "commercial"—businesses with web sites that inform the world about what they do (and what they have to sell). A surprisingly large 82.1 percent of the sites were described as "commercial." By contrast, "government" and "religion" sites brought up the rear with 1.2 percent and 0.8 percent, respectively.8 Businesses have learned that the way to draw traffic to their sites is to entice you with their products. We apparently need to be bribed (with discounts and the like) to surf to those pages.
On the government front, some of the most helpful information comes from the Securities and Exchange Commission (SEC) and the Internal Revenue Service (IRS), two agencies that you always hear about. The SEC's EDGAR database includes the recent filings by public companies. The IRS site includes some tax information, along with forms you can download and use (instead of going to the post office or a local IRS office and waiting in line).9
There is no doubt that the WWW will continue to grow. At its current size, you could argue that its growth rate is irrelevant. If it grows more slowly, it will still take years before the search engines catch up and track all of it. If, on the other hand, it grows even faster, we will only be able to use a small part of all that is available. Either way, the only way to make any use of it is by going one step at a time. Get familiar with a corner of the Web and move out from there.
4 A new search engine called "Ask Jeeves" mixes the two types of engines by accepting "natural language" requests and presenting a number of possible sites for the information. It is found at www.askjeeves.com.
8 Those listed as "pornography" accounted for only 1.5 percent of the sites. This is far smaller than some might believe, given the news reports about those sites and the privacy and First Amendment issues they present.
9 These sites are found at www.sec.gov and www.irs.treas.gov. The far more usable site for SEC documents is maintained by the Technology Centre of PricewaterhouseCoopers and is found at www.edgarscan.pwcglobal.com/EdgarScan/index.html.