Finding Information on the
World Wide Web (WWW)

By M. Ettisch-Enchelmaier

 

a) Single Search Engines

What are search engines or similar tools? A definition
How do search engines work and their use
Their limitations
Recommended search engines

In this essay the author looks at important tools for finding information on the WWW: first search engines, then "meta-search" engines. In both cases not only American but also international versions are listed.

What is a search engine?

One of the easiest ways of finding information on the WWW is to use a search engine, which may also act as a collection of databases. It is an Internet-based utility that helps to find information buried in online databases worldwide. There are many search engines available on the web that can be of great help.

Some search engines provide general information, such as Yahoo, Google and AltaVista, but there are numerous others which serve specific needs, such as "FindLaw" [22].

How do search engines work and their use?

To use a web search engine, all a surfer has to do is connect to it. All search engines have a starting page with a form for entering the user's query. Some also have links categorized into subject areas; Yahoo (http://www.yahoo.com) is a typical example of this. The categories are main headings divided into sub-categories. One can select a subject and be taken to a page full of sub-categories relevant to that subject. These sub-categories lead to further headings, mostly by means of URLs, until they bring the researcher to the relevant web pages. This kind of set-up is termed a "directory".

The easiest way of getting information is to go through these links. If the links provided are too vague or ambiguous, there is another feature supported by almost all search engines: searching by keyword. The keyword is the subject matter the searcher is looking for. It may be a combination of words or a subject name. If the user wants only that exact combination of words or phrase, some search engines need it to be enclosed in quotation marks (" "). The search engine will go through its database of known links and return a list of pages that are relevant to the keyword entered. These are usually sorted in order of relevance. Some engines even provide a guide by showing a hit ratio. The higher the hit ratio, the closer the user is to the information he needs!
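
The keyword and phrase search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sample pages, the function name `search` and the scoring rule (a simple count of occurrences) are hypothetical and are not taken from any real search engine.

```python
# Hypothetical mini "database" of known pages: URL -> page text.
PAGES = {
    "http://example.com/a": "standard chartered bank opens a new branch",
    "http://example.com/b": "the bank sets a new standard for service",
    "http://example.com/c": "gardening tips for spring",
}


def search(query, pages=PAGES):
    """Return (url, score) pairs, most relevant first.

    A query wrapped in double quotes is treated as an exact phrase;
    otherwise every word is counted separately (assumed scoring rule).
    """
    query = query.strip().lower()
    is_phrase = len(query) > 2 and query.startswith('"') and query.endswith('"')
    results = []
    for url, text in pages.items():
        text = text.lower()
        if is_phrase:
            # Phrase search: the words must appear together, in order.
            score = text.count(query[1:-1])
        else:
            # Keyword search: add up the occurrences of every word.
            words = text.split()
            score = sum(words.count(word) for word in query.split())
        if score > 0:
            results.append((url, score))
    # Highest score first; ties are broken by URL for a stable order.
    return sorted(results, key=lambda item: (-item[1], item[0]))
```

With these sample pages, the plain query `standard bank` matches two pages, whereas the quoted phrase `"standard chartered bank"` matches only the page that contains the words in that exact sequence.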

 

The limitations of search engines

There are search engines catering for general needs, like Yahoo, Google and AltaVista. Such general search engines may be too general to be of help: they supply too much information without touching the specific field the user is searching for. There are, however, numerous search engines which serve a specific clientele. A lawyer, for instance, will be more interested in a law search engine, e.g. "FindLaw" [22] (a free search engine) or "Versuslaw" (the most inexpensive US legal search engine) [23], than in "Kids Search Engines" [25]. Other search engines may only supply websites written in a particular language. A European attorney will most likely prefer the "Guide to European Legal Databases" [24].

Toxicologists or physicians may be more interested in TOXNET, the Toxicology Data Network [28], which is sponsored by the National Library of Medicine (National Institutes of Health). TOXNET is a collection of databases covering toxicology, hazardous chemicals, and related topics.

Some of the more advanced search engines provide several types of search, such as Ordinary Search and Power Search. The ordinary search is the one described above. The power search is an advanced version of it: it lets the user combine several words with Boolean [34] operators (i.e. AND, OR, NOT etc.).
One can even specify a source document or URL in which to search for the information. OpenText [33] excels in this area.
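
How the Boolean operators combine results may be easier to see with a small sketch. It assumes a hypothetical index that maps each keyword to the set of pages containing it; the page names and the helper `pages_for` are invented for illustration only.

```python
# Hypothetical index: keyword -> set of page identifiers containing it.
INDEX = {
    "law": {"p1", "p2", "p3"},
    "europe": {"p2", "p4"},
    "kids": {"p3"},
}


def pages_for(term, index=INDEX):
    """Look up the set of pages that contain a single term."""
    return index.get(term.lower(), set())


# AND: pages containing both terms (set intersection).
law_and_europe = pages_for("law") & pages_for("europe")

# OR: pages containing either term (set union).
law_or_europe = pages_for("law") | pages_for("europe")

# NOT: pages containing the first term but not the second (set difference).
law_not_kids = pages_for("law") - pages_for("kids")
```

AND narrows a search, OR widens it, and NOT removes unwanted pages from a result.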

The links returned are usually accompanied by some text that describes the link, or by an extract from the source page. This information can serve as a guideline for assessing the pages without going through all the links and unnecessarily wasting time, resources, and money.

Search engines learn of web pages in two ways. One obvious and straightforward method is notification by the authors or owners of the web pages concerned. The search engines have facilities for adding pages to their database. The authors, owners or webmasters can select a topic under which the page or site should be included; the people who maintain the engine will check it and add the site or page under that topic or under any topic they believe fits the site best.

The other method of discovering web sites is to mine through known web sites and follow their links. This is what most automated web search engines do. They send "SPIDERS" into the WWW to track down resources and include them in their database. Special text can be inserted in web page headers to help the automated search engines find information. This is usually called "Meta Data".
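
The way a spider follows links and collects meta data can be sketched as follows. This is a simplified illustration and not the method of any particular search engine: the tiny in-memory "web" (`WEB`), the class `PageParser` and the function `crawl` are all hypothetical, and a real spider would fetch pages over the network.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source.
WEB = {
    "http://example.com/": (
        '<head><meta name="keywords" content="law, courts"></head>'
        '<a href="http://example.com/a">A</a>'
        '<a href="http://example.com/b">B</a>'
    ),
    "http://example.com/a": (
        '<head><meta name="keywords" content="toxicology"></head>'
        '<a href="http://example.com/">home</a>'
    ),
    "http://example.com/b": '<a href="http://example.com/missing">gone</a>',
    "http://example.com/orphan": "<p>No page links here.</p>",
}


class PageParser(HTMLParser):
    """Collect the outgoing links and the keywords meta tag of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]


def crawl(start_url, web=WEB):
    """Follow links breadth-first; return {url: keywords} for pages reached."""
    database = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in database or url not in web:
            continue  # already visited, or a dead link
        parser = PageParser()
        parser.feed(web[url])
        database[url] = parser.keywords
        queue.extend(parser.links)
    return database
```

A page that no other page links to, such as the "orphan" page in this sample, is never found by the spider, which is one reason why owners also notify search engines of their pages.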

 

In ordinary (non-"meta") search engines such as Northern Light, AltaVista or Google, the user submits keywords to that engine's own database of web pages and gets back a different list of documents from each search engine. Results from very similar searches can differ widely (by about 40%), but also contain some of the same sites (about 60%).

Another very popular method of getting information is to use a news server. A news server contains many newsgroups, i.e. collections of articles on a certain subject. The subjects range from education, science & technology and religion to hobbies, politics and many other topics. A user may submit his ideas to the news server, and the other users of the news server can read them and reply.

The Television News Archive collection at Vanderbilt University is believed to be the world's most extensive and complete archive of television news. The collection holds more than 30,000 individual network evening news broadcasts and more than 9,000 hours of special news-related programming. [37]

According to a study by the market researchers Fittkau & Maaß in which 87,528 German Internet users participated, about 50% of users find new websites through advertisements in journals and newspapers; the study was published in the marketing journal W & V [26].

Banners and buttons draw some 28% of the users' attention to new sites,
but some 80% of the sites are found by search engines.

Pop-up windows attract only some 11.4% of surfers: 12.4% of female users in contrast to 11% of male users. [26]

However, the writer of this essay received an email from Internetseer in the USA on May 26, 2001, stating that 90% of all Internet consumers and businesses use a search engine to find products and services. Internetseer also maintains that studies have proven conclusively that search engines are the most cost-effective way to acquire new customers, and that search engines receive over 10 million submissions per month. [27]

The present author would take these statements with a "pinch of salt", since Internetseer wants to sell its "EnterURL.com" service to submit the users' site to "over 350 of the most visited search engines and directories every month for a year while shielding the users from unsolicited email confirmations and offers". [27]

There may well be more than 350 "most visited" search engines, but the present author wonders whether most of them are of any use to Internetseer's customers, since they may not reach the potential customers those clients are interested in.

Recommended general search engines as a starter

Some popular general web search engines are:

Yahoo http://www.yahoo.com
Lycos http://www.lycos.com
AltaVista http://www.altavista.digital.com
Infoseek http://www.infoseek.com
Web Crawler http://www.webcrawler.com
OpenText http://www.opentext.com

b) Meta-Search Engines

What are "meta-search" engines or similar tools? A definition
How do meta-search engines work and their use
Their limitations
Recommended search engines

One of the easiest ways of finding information on the WWW [1] is to use a search engine. There are many search engines available on the web that can be of great help to a user, also called a "surfer". Of greater value, however, are search tools and meta-search engines that combine several single search engines, thus cutting down the time spent online.

What are "meta-search" engines?

"Meta" originates from Greek, meaning "after", "beyond", "over" or "next"; less likely here are the meanings "change", "transformation" or "exchange". In Italian there is the verb "meta", one meaning being "to stack", e.g. hay or straw; in the Italian phrase "meta business" it means that two companies perform a business or banking transaction together, sharing the profit or loss 50-50.

"Meta" could originate from one of these words, but the writer of this
essay is of the opinion that the term meta is the word "mega" misspelt,
since meta-search engines or tools are a number of search engines combined i.e. a very big search engine (mega).

"Meta" is also used in meta tags, meta data, html meta, meta stuff, meta
network, meta index, meta links, meta directory, with search engines or links: Meta-Cog, MetaGopher, Meta Mergers, Meta Manufacturing.

Meta has also been adopted in other languages, e.g. MetaGer (a German search engine), "Enfin - les meta-moteurs francophones", Vindel, a "Meta-Zoekmachine", http://meta-ukraine.com/, a non-English page for the same or similar use, and MESA, a German meta email search engine.

The word "meta" is even used in companies' names to express their scope
or their programme, e.g. Meta Software Corporation and Meta Integration
Technology, Inc., both in the USA. [32]

Meta-search engines do not own a database of web pages. They have different set ups:

Some have a single "search box" in which the user submits the keywords for the document(s) sought. Others make available a collection of search boxes for different search engines. Yet another mode is a drop-down menu that lets the user choose which one among a list of search engines to search.

Whichever way the meta-search engine is set up, it transmits the requested search simultaneously to several individual search engines and their databases of web pages. Within seconds or minutes, depending on the speed of the search engine concerned, that of the user's PC modem, and the number of surfers online clogging up the telephone lines, the user gets back results from all the search engines queried.
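
The simultaneous dispatch of one query to several engines can be sketched as follows. The three "engines" here are hypothetical stand-in functions that return fixed lists; a real meta-search engine would send network requests instead, and the function name `meta_search` and the five-second time limit are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search engines: each takes a query
# and returns a list of URLs.
def engine_one(query):
    return ["http://example.com/a", "http://example.com/b"]


def engine_two(query):
    return ["http://example.com/b", "http://example.com/c"]


def engine_down(query):
    raise TimeoutError("engine did not answer")


def meta_search(query, engines):
    """Send the same query to every engine at once and collect the answers."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # Submit all searches first so that they run side by side.
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=5)
            except Exception:
                # A slow or failing engine yields no hits instead of
                # aborting the whole search.
                results[name] = []
    return results
```

A call such as `meta_search("toxicology", {"one": engine_one, "two": engine_two, "down": engine_down})` returns one list per engine, with an empty list for the engine that failed to answer.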

The results may be expressed as a percentage from 0% to 100%: nothing may be found in one particular engine (or in any engine), while in others the user finds just what he wanted or even more. Often, however, the results overlap: e.g. 60% of the sites may appear in many or all of the search engines the meta-search engine uses, while some 40% may be different.

The results, however, depend on various factors, such as the quality of the search engine (e.g. ordinary or pro version, which may mean a free version or one costing up to several hundred US dollars), the quality of the "keywords", also known as "meta tags", submitted by the website owners, the keywords submitted to the search engine by the user or searcher, and lastly the choice of the webmaster concerned to accept or dismiss the meta tags supplied.

Some meta-search engine sites offer many secondary, portal-like services and specialized collections of websites and/or resources, e.g. for businesses, web designers or movie-goers. These may still be useful to some extent to the particular user because of the leads or links contained in these secondary services.

How do meta-search engines work and their use

Most meta-search engines only spend a short time in each database and often retrieve only 10% of any of the results in any of the databases queried. This makes their searches usually "quick and dirty," but often good enough to find what the user wants.

Most meta-search engines simply pass the user's search terms along, and if the search contains more than one or two words or very complex logic, most of these will be lost.

Quantity in results does not equal satisfaction. Here is where advanced searches accommodate the user by trying to refine the results. AltaVista Advanced Search, Northern Light, or Infoseek may be of use by clicking on their link in the results [2].

However, the present author wants to remind any user that such "advanced" or "pro" search engines in many instances charge a fee, at times a substantial one, and this trend is growing.

Even the leading meta-search engines may not deliver the needed results; the user should then try meta-search engines which employ selective or unusual databases, such as WebCrawler, Thunderstone, Direct Hit, and WhatUSeek.

Barker [2] regards the meta-search engines "Copernic" [3], "Ixquick" [4], "MetaCrawler" [5] and "WebFerret" [6] as the best; second best are "Chubba" [7], "Dogpile" [8] and "ProFusion" [9].

In a table featuring several meta-search engines (updated in September 2000), Barker [2] evaluates only search engines which are free, query several search engines from a single search box, and offer at least one interface of potential usefulness in general (not subject-focused) searching.

Some private investigators have a high opinion of the Ferret software package, one even regarding it as the best [17], since this tool not only accesses multiple search engines quickly but also gives the user the option of various search formats, including all keywords, exact phrase, Boolean etc. He continues to state that "these search programs make tools like Yahoo look like a children's book for beginners only".

The present author has no experience as yet with Chubba, ProFusion or WebFerret, but is of the opinion that the others mentioned are not such easy and satisfactory tools as Copernic. It should be mentioned that the Copernic 2000 version was much more stable and produced fewer handling errors than the 2001 edition.

The meta-search engine "Mamma" [10] carries the sub-title "The Mother of All Search Engines". It handles the categories "Web", "Images", "News", "Audio" and "Video", plus "Power Search". The user may decide whether to use all the search engines listed or leave one or more out by adding or erasing tags. The writer of this article is not that satisfied with "Mamma". It appears to be too cumbersome, but other users may be of a different opinion.

Further such engines are listed e.g. at "What is the CUI, W3 Search Engines" [11] and Yahoo [12].

Other meta-search engines are "37.com" [13], which handles 37 search engines at a time, "Cyber 411" [14], "Inference Find" [15], and "The Internet Sleuth" [16] with more than 2000 links.

The meta-search engine qbSearch has several features that set it apart from its peers. In addition to the usual suspects (Google, Yahoo, Raging, MetaCrawler, etc.), the engine also searches News Index and Deja.com, now in the hands of Google. qbSearch allows users to specify the number of pages displayed per engine and combines them all into one large page (this may take a while to load). The final and best feature of the site is the QuickLinks mode, which (when activated) allows users to select all of the links they want to view and then display the first page of all of them on a single page. Clicking any link on these pages launches a new browser window. [36]

The A-Z Search Engines and Directories is part of the "Search Engines" section which features nine other categories providing links to hundreds of general purpose, country specific, specialized, and Invisible Web search engines. [36]

The present author believes that it is difficult to say firmly which meta-search engine is really the best. It very much depends on what a user frequently needs, in which country he is located, and which language he uses. These factors "automatically" restrict the usage of meta-search engines, or even of simple engines for that matter.

The (Meta)WebCrawler.de, i.e. the German version, which concentrates on websites located in Germany or other German-speaking countries such as Austria and Switzerland, or on websites published in German, will be of less use to a searcher in the USA who (only) speaks English. The same applies, for instance, to the French or Spanish version of Yahoo.

Other German meta-search engines: "C.U.S.I." [18], "MetaGer" [19], "MESA" [20], "SavvySearch" [21].

An international meta-search engine, also supplying information in four
languages at the time is called Colossus [35].

 

In the first part of this essay the author discussed search engines, and in the second part meta-search engines and tools, which search several search engines (nearly) simultaneously, thus saving the users or surfers time and consequently money.

Time and money are saved in two ways: not only does the work force's time cost the employer money, but in many countries there are no (longer) flat rates, and surfers (still) have to pay for every bit and byte online. In some countries, e.g. Germany, the German Telekom, the leading European provider, ceased its main trial service of a flat rate, since it was losing money.

Limitations of meta-search engines

The way a user enters the search keywords is called the search protocol and is far from being standardized [2]. Almost all engines accept "...", meaning that the user wants only that "phrase" in that sequence and none other. By entering "Standard Chartered Bank" in quotes (a leading bank in the Near East and in Africa), the user indicates to the search engine that he does not want the separate words "Standard" and/or "Chartered" and/or "Bank", since he would be overwhelmed by answers irrelevant to him. The user only wants that bank. He may refine his search even further by going first to the country where the bank is located, e.g. South Africa, then to business or banks, then to the city, and only then entering "Standard Chartered Bank".

A few engines accept the Boolean search mode, which works mainly with words such as AND, OR, and NOT. Fewer accept ( ) to group terms. Others work only with mathematical signs such as + or - and are therefore called "math search engines". Some default to OR, some to AND. Some take the * to truncate or "cut" words to stems. Other engines stem automatically.
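
The "math" syntax and the * truncation can be illustrated with a short sketch. The function names and the fallback to OR for unmarked words are assumptions made for this example; as noted above, real engines differ in exactly these defaults.

```python
def parse_query(query):
    """Split a "math" style query into required, excluded and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def term_matches(term, word):
    """A trailing * truncates the term to a stem that matches any ending."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def page_matches(query, text):
    """Decide whether one page of text satisfies the query."""
    words = text.lower().split()
    required, excluded, optional = parse_query(query)

    def has(term):
        return any(term_matches(term, word) for word in words)

    if any(has(term) for term in excluded):
        return False
    if not all(has(term) for term in required):
        return False
    # With no required terms, fall back to OR over the unmarked words
    # (an assumed default).
    return bool(required) or any(has(term) for term in optional)
```

For instance, the query `+bank -river` accepts a page about a bank granting loans but rejects one about a river bank, and `loan*` matches "loans" whereas plain `loan` does not.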

However, the general user will mostly be satisfied with the results after
having inserted the searchable term by either "...." or + .

Three main factors determine the usefulness of any meta-search engine:

1. The search engines they send the user's search terms to (size,
content, number of search engines, the user's ability to choose the
search engines he prefers); all of them search subject
directories as well as search engines and intermix results from all.
2. The way they handle or process the user's search terms and search
syntax (Boolean operators, phrases, and defaults they impose);
3. The manner they display results (ranking; aggregated into one list,
or with each search engine's results reported separately).
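
The third factor, aggregating results into one list, can be sketched as follows. The ranking rule used here (sites returned by more engines come first) is an assumption chosen for illustration and not the rule of any named meta-search engine; the function name `aggregate` is likewise hypothetical.

```python
def aggregate(results_by_engine):
    """Merge per-engine result lists into one de-duplicated, ranked list.

    Assumed ranking rule: URLs returned by more engines come first;
    ties are broken by the best (lowest) position in any engine's list.
    """
    votes = {}
    best_position = {}
    for urls in results_by_engine.values():
        # dict.fromkeys drops repeats within one engine but keeps the order.
        for position, url in enumerate(dict.fromkeys(urls)):
            votes[url] = votes.get(url, 0) + 1
            best_position[url] = min(best_position.get(url, position), position)
    return sorted(votes, key=lambda url: (-votes[url], best_position[url], url))
```

Reporting each engine's results separately, the other display mode mentioned in point 3, would simply mean showing the per-engine lists unchanged.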

A user is strongly advised [2] to tailor his approach to the kind of information needed. Meta-search engines are helpful if he is looking for a unique term or a phrase enclosed in quotes " ", or if he simply wants to test-run a couple of keywords to see whether he gets what he wants. For such straightforward searches, the unique ranking algorithm used by Google (based on how many other sites link to a site) often finds exactly what is wanted, better than any meta-search engine. Barker believes that unless the user settles on one meta-search engine, he may as well limit his search to Google only.

The present author is not quite of the same opinion. Starting the search with Yahoo, which uses Google extensively, while at the same time letting the simple version of Copernic [3] load, may give astounding results, even though Copernic of course also searches Google. Where Yahoo does not find the website sought, other search engines such as Excite or WebCrawler may.

For more difficult searches, it is recommended [2] to employ a search engine in which the user may search within results for a term or phrase he specifies. Learning AltaVista Advanced Search, Northern Light Power Search and possibly Infoseek is a further way to improve the user's ability to focus on specific aspects whenever a search retrieves a huge number of results.

To improve his results further, the user should consult [2] Berkeley's recommended search strategy, which is based on what the user knows and wants to know. The user will learn when to consult subject directories and how to look for expert guides and specialized databases, all of which have a valuable place in the repertoire of searching skills of the experienced searcher.

A further good source of information about search engines of all kinds, the way they work, and tips and tricks is Search Engine Watch [31].

Meta-searchers Barker [2] recommends

He believes the best free downloadable meta-search tool is Copernic [3] (which is a utility by definition), especially because of its many customisable features and its ability to refine a search within results using Boolean [34] operators. It is easy to download and install, and the user does not need to buy Copernic's other products. For serious searching using a single meta-searcher, Copernic is hard to beat. It has extensive help. It requires Internet Explorer or Netscape.

The present author concurs with that opinion, preferring Copernic to Ferret (also a utility) both because it is easier to use and because the basic version is free of charge.

Based on where they search, ease of use, ability to focus, and capacity to handle more advanced searches intelligently (e.g. by translating or carefully routing Boolean operators), some meta-search tools stand out as superior. For general purposes, freely accessible engines such as Ixquick and MetaCrawler are sufficient. Still, professional searchers spending much time online, especially in a specific field, may be better off with a pro(fessional) version.

c) Portals

Either to save money or, more likely, to save space and attract more surfers, a growing number of meta-search engines are adopting a major feature of the single search engines: portals, which offer many other, often distracting, services and variations of searching. [2]

The trend is for many single search sites to offer not only searching and links to resources by subject, but also many other services (stock quotes, airline tickets, shopping malls, news links, games, chat rooms, free e-mail, and much more). The goal seems to be to lure as many users as possible to the site and keep them there as long as possible, probably for the benefit of the site's advertisers.

This set-up is called a "Portal", derived from the Latin, meaning an opening (to other resources). The present author may add that the advertisements and links are placed mostly to the right, to the left and on top of the main part of the site, so that with some imagination the surfer "sees" a "Grecian" portal or entry.

It should be added that there are some signs that the trend is reversible, e.g. at WebCrawler [29].

[1] http://www.ac.lc/learnnews/findinfo.html
[2] The Library, University of California, Berkeley, Joe Barker (creator
and maintainer of the document):
Meta-Search Engines: When to use and not to use them:
Teaching Library Internet Workshops, University of California,
Berkeley, revised April/September 2000
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.html
see also: Finding Information on the Net - A Tutorial
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html
[3] http://www.copernic.com (German version: http://www.copernic.de)
[4] http://www.ixquick.com
[5] http://www.metacrawler.com
[6] http://www.ferretsoft.com/netferret, but the advanced version
[7] http://www.whatuseek.com
[8] http://www.dogpile.com
[9] http://www.profusion.com
[10] http://www.mamma.com/
[11] WL3 Search Engines XML
http://cui.unige.ch/meta-index.html
[12] Yahoo! Home>Computers and Internet>Internet>World Wide Web>Searching
the Web>Search Engines and Directories>All-in One Search
Pages
http://dir.yahoo.com/Computers_and...rectories/All_in_One_Search_Pages/
cf. also http://www.albany.net/allinone/
[13] http://www.37.com
[14] http://www.cyber411.com
[15] http://www.inference.com/infind
[16] http://www.isleuth.com/
[17] Jeffrey Gross, Forensic Computer Examiner, jg@forensic-computer.com
[18] http://www.unix-ag.uni-siegen.de/search/
[19] http://meta.rrzn.uni-hannover.de/
[20] http://mesa.rrzn.uni-hannover.de
[21] http://guaraldi.cs.colostate.edu:2000/form
[22] http://www.findlaw.com
[23] http://www.versuslaw.com, supplied by Barry Zalma, lawyer, PI and
C.F.E., bzalma@earthlink.net
[24] http://www.llrx.com/features/europenew.htm
[25] http://searchenginewatch.internet.com/links/Kids_Search_Engines/
[26] http://www.wuv.de
[27] InternetSeer Report (website monitoring): May 26, 2001
http://www.internetseer.com/
[28] TOXNET, the Toxicology Data Network, http://toxnet.nlm.nih.gov/ ,
this info supplied by Bill Schneid, Criminologist, on May 25, 2001,
http://globalprojectsltd.com
[29] WebCrawler Gets "Deportaled",
http://www.webcrawler.com/info/whats_changed,
discussed in Search Engine Watch, July 3, 2001 - Number 42,
copyright: INT Media Group, Inc.
[30] http://www.yahoo.com
[31] Search Engine Watch
http://searchenginewatch.com/
[32] found when searching "Yahoo" and "Teoma" search engines for the
expression "meta"
[33] OpenText
http://www.opentext.com
[34] Boolean Searching on the Internet
http://www.albany.edu/library/internet/boolean.html
[35] http://www.searchenginecolossus.com/
[36] http://qbsearch.com and http://websearch.about.com/internet/websearch/cs/azsearchengines/index.htm
[37] http://tvnews.vanderbilt.edu/

 

Copyright: M. Ettisch-Enchelmaier, 2001
All Rights Reserved