Subject: Conservation OnLine
**** Moderator's comments: This is a special mailing, devoted to a single announcement. This is very long, but I think it important enough to warrant the length. Please save this, as it is intended to be part of the fundamental DistList documentation. If you are in a hurry, read only sections 1 & 3, and skim 2. For those who would prefer to print this document, I will try to provide a postscript version at an anonymous ftp site soon. Outline 1. Introduction 2. An overview of WAIS 3. The CoOL databases 4. Finding a client Appendix 1. Frequently Asked Questions about WAIS Appendix 2 SWAIS help 1. Introduction The Preservation Department of Stanford University Libraries is pleased to announce the creation of Conservation OnLine (CoOL), a Wide Area Information Server (WAIS) dedicated to providing Internet access to a full text database of conservation information. The databases cover a wide spectrum of topics of interest to those involved with the conservation of library, archives and museum materials. Those of you familiar with WAIS can simply search cool-directory-of-servers.src at aldus.stanford.edu port 210, seed word "source", to find out more about CoOL. The rest of this article will describe the CoOL databases in more detail, will provide a short overview of WAIS, and will offer some help in getting started using it. Please save this article, as it will form the basic reference document for CoOL. The content of the CoOL databases comes from a variety of sources and we hope that all users will consider contributing some material to the project. As you use the server please pay attention to lacunae that you might be able to help fill. As a start, I'd very much like to assemble a collection of disaster plans. 
Please send your institution's disaster plan, in machine readable form (preferably as an ASCII file), either by email to waiscool [at] aldus__stanford__edu or on a floppy to

    Walter Henry
    Conservation Lab
    Stanford University Libraries
    Stanford, CA 94305-6004

[Please include a note telling me what format the thing is in (eg "Dos/Word Perfect" or "Mac/Microsoft Word", etc). We can probably read anything you throw at us if you tell us what it is.]

2. An overview of WAIS

Brewster Kahle, one of the key figures in the development of WAIS, describes it this way:

    The Wide Area Information Servers system is a set of products supplied by different vendors to help end-users find and retrieve information over networks. Thinking Machines, Apple Computer, and Dow Jones initially implemented such a system for use by business executives. These products are becoming more widely available from various companies. Users on different platforms can access personal, company, and published information from one interface. The information can be anything: text, pictures, voice, or formatted documents. Since a single computer-to-computer protocol is used, information can be stored anywhere on different types of machines. Anyone can use this system since it uses natural language questions to find relevant documents. Relevant documents can be fed back to a server to refine the search. This avoids complicated query languages and vendor specific systems. Successful searches can be automatically run to alert the user when new information becomes available.

2.1 Client-Server Model

In the client-server model, an increasingly important concept for network applications, an application entails two completely separate components that may be created independently of each other. The server sits on a remote machine somewhere on the network and processes requests for services (e.g. database queries).
These requests, and the response from the server, may be in a format not at all comfortable to humans, but this doesn't matter because the end-user never communicates directly with the server; she talks only with a program called a client, which provides a comfortable user interface, translates the user's requests to a format that the server can understand, and processes the response from the server to make it palatable for the user.

A common example of such a system is found in some Campus Wide Information Systems (CWIS) which allow users to communicate with online library catalogs at other universities. The foreign university catalogs may well use a completely different query language and would be difficult for the local users to make sense of, but they need never deal directly with the foreign catalog. Instead, they deal only with a client program (probably developed by programmers at the local site) that makes all database requests look like those the users are familiar with. The client translates those requests into a standard form that the server can understand. When the server returns bibliographic records, the client steps in again and formats those records in a display that is comfortable to the user.

Obviously, the client-server model depends on the availability of a standard "language" that both client and server software can agree to understand. In the library catalog example above, a standard known as Z39.50 provides such a language.

Now it is common to speak of WAIS as if it were a program, but it is really a protocol, a language understood by clients and servers. It is, in fact, an extended form of Z39.50. The programs that we deal with (clients) are just pieces of software that happen to speak the WAIS protocol.

2.2 Full Text

CoOL is a full text database. This means that what you retrieve is a final product (eg an article) rather than a pointer to another product.
At first CoOL will contain only text files, but eventually it will also contain non-text material, such as images, in standard formats. Right now, there aren't many clients widely available that can handle non-text files, but when these appear, CoOL will provide material to keep them busy. 2.3 Guide to WAIS Searching Unlike conventional databases, WAIS does not use a specialized query language. That is, your question can be phrased in English, in whatever fashion you like. If the question doesn't produce the desired results, you will learn this immediately and can rephrase the question. By doing so, one quickly learns, without any real effort, what sort of questions get satisfactory results in a given database. The texts that are retrieved are returned with weights indicating the extent to which the documents match the words in your question, a concept central to the WAIS protocol, called relevance ranking. Your client can then use this to present the document list in order of relevance. 2.4 Bye-bye boolean One of the major differences between WAIS searching and conventional searching is that WAIS generally does not support boolean search operations. Whether this is a bug or a feature is largely a matter of preference, and those of us brought up with boolean searching will need to take a while to get used to the WAIS way of doing things. I can promise you though, that after a while, it does begin to feel comfortable. It is, by the way, not quite accurate to say that WAIS "doesn't do boolean", since there is nothing inherent in the protocol that prevents it, and indeed, there are experimental implementations that do support boolean searching. In place of boolean searching, WAIS offers natural language queries ("tell me about glues and adhesives and sticky things"), quick retrieval, casual browsing, and relevance feedback. 2.5 Relevance feedback One of the genuinely spiffy ideas in WAIS is relevance feedback. 
The concept is simple: after you've asked a question, perhaps in a less-than-optimal form, you have a set of retrieved texts that you can browse through. The chances are good, if the database has anything at all in your subject, that at least one of the retrieved texts will be the sort of thing you had in mind. With relevance feedback, you can repeat your search and tell WAIS that you are interested in seeing more texts that are "like" that one, with "like" meaning "having a lot of text in common with".

CoOL will support relevance feedback, if your client supports it, but at this stage in WAIS's development (at least with the noncommercial implementation that CoOL uses), it is not really as effective as it might be. Because of the limitations of the current relevance ranking scheme, the "similar" documents may not seem, to a human reader at least, to have much in common. Nevertheless, it does sometimes work well. Some clients allow you to cut and paste a portion of a retrieved document into a relevance feedback-driven query, rather than using the entire document. In this case, relevance feedback can be particularly effective.

2.6 Searching

WAIS databases consist of the text files themselves and a set of indexes in which every word, except those in a stopword list, is indexed and weighted according to a relevance-ranking algorithm. In principle, an indexer should be able to discern that in an article on mass deacidification, the words "calcium hydroxide" are probably more important to you than the words "Thank you for your attention". At this stage in the game, however, things aren't that sophisticated, so you will need to be a little bit careful about choosing your search terms. A simple example will illustrate the point.
Since (almost) every word in the text is indexed, if your search question for the cool database (the archives of the Cons DistList), is "So, what's the latest buzz on the subject of bookcloth" your search will be accepted, because WAIS will let you search for anything you like and will do its best to match the retrieved text to your query, but in practice, you will probably not be happy with what you retrieve. "So", "what's", "on", and "the" are in the stopword list, so they will not get in the way. The search will actually use only the words "latest", "buzz", "subject", "bookcloth". Now "bookcloth" is obviously a relevant term, but "buzz" and "subject" clearly aren't. "Latest" is indeed part of the concept of your query, but the appearance of the word in a text is not likely to indicate that the text is "the latest buzz", so it is irrelevant as well. In CoOL, because of the nature of the subject matter, "buzz" won't have much effect (in fact, at the moment, it doesn't occur in the database). "Subject" however appears in every single message, often several times, so unless other weighting factors dominate, the chances are good that articles that contain the word "subject" will be ranked as more relevant than the (fewer) articles that mention bookcloth. We have tried by various means to maximize the probability that the document ranking will reflect the subject matter of the text, but there are severe limits on how effectively this can be done with the available indexing tools. For subjects that are relatively more well-covered in the database, document ranking is rather decent. For example, if in the above example, you substitute "mass deacidification" for "bookcloth" you will probably find the ranking to be satisfactory. 2.7 The future WAIS as it exists today is a wonderful tool, but much of the excitement that surrounds it has to do with its potential. 
Like other tools based on the client-server model, the richness of the application depends upon the extent to which the server provides a rich set of services and the client provides an effective interface to those services. WAIS is still very young, and both the clients and servers are undergoing improvements so you should expect the WAIS scene to look a lot more interesting as the work progresses. For now, it is very exciting and quite useful, but one ought to be a little reasonable in one's expectations. Among the areas that are not yet as exciting as they promise to be are relevance ranking and, to a lesser extent, relevance feedback. At present, most of the servers (including CoOL) use an almost trivially simple algorithm to weight the words in the index, but work is underway to incorporate the findings of the information sciences to produce a new generation of servers that should provide really interesting search and retrieval functionality. 2.8 Stopwords When indexing texts, CoOL ignores many common words like "the", "what", etc., as well as single letters so they do not interfere with your search. However, stopwords are a function of the server, not the client, and your client has no way of knowing which of the terms it passed to the server were actually used for the search. Because of this, when the text is retrieved, if your client highlights the 'seed words', it may well highlight "the" and "what", giving the erroneous impression that those terms played a part in the selection of the document. 3. The CoOL databases In WAIS terminology, a "source" is a file that describes, for both the client software and the user, what the database is about and how it is used. It includes information about where the database is (address and port) as well as information about costs, how often the database is changed, etc. 
Your client can retrieve these source files and present them to you whenever you want to ask a question, making the CoOL databases an extension of your own computing environment. Although the terminology is a little slippery, in general I will use "database" to refer to a collection of texts and "source" to refer to a particular type of WAIS file (called <database>.src), as described above.

The universe of CoOL texts is subdivided into several databases (each of which is described by a source) and the divisions are calculated to enhance the probability that you will find a relevant answer (if not "the" answer) to your question. That is, it enhances searching precision. Since most WAIS clients allow you to search several databases at a time (even databases on separate hosts), it is easy to expand the scope of a search to increase the number of items retrieved (recall).

As of this writing there are 8 databases, and more will be added as we gather material. As the databases grow, expect this structure to be expanded and refined to provide a well defined search space.

3.1 cool-directory-of-servers

cool-directory-of-servers is a top level directory for Conservation OnLine (CoOL), a collection of WAIS databases containing information of interest to people involved with the conservation of library, archives and museum materials. It is used to locate the individual CoOL databases, in which you will do your actual searching. To determine which CoOL database will best meet your needs, query cool-directory-of-servers. To see a list of all the CoOL databases, use the word 'source' as your search term. CoOL will return a list of all the other CoOL sources and your client can retrieve these and save them on your machine so that the next time you search, you will be able to select them as your target. New databases will be added to CoOL so it will be a good idea to search cool-directory-of-servers regularly.
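To make the distinction between a "source" and a "database" concrete, here is a minimal Python sketch of the kind of information a source carries. It is an illustration only: the WaisSource class, its field names, and the select_sources helper are hypothetical, and nothing here reproduces the actual .src file format or the code of any WAIS client. The host and port are the ones given for CoOL in this article; the sketch assumes that all eight databases are served from that same host and port, and the descriptions are paraphrased from section 3 rather than taken from real source files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaisSource:
    """Hypothetical stand-in for what a <database>.src file tells a client."""

    database: str     # database name, e.g. "cool-cdr"
    host: str         # Internet address of the server
    port: int         # TCP port the server listens on
    description: str  # note for the user on what the database covers

    @property
    def filename(self) -> str:
        # Source files are named <database>.src
        return f"{self.database}.src"


# Host and port as given for CoOL in this article; assumed here to be
# the same for every CoOL database.
COOL_HOST = "aldus.stanford.edu"
COOL_PORT = 210

# Descriptions are paraphrased from section 3, not copied from real source files.
_DESCRIPTIONS = {
    "cool-directory-of-servers": "top level directory of the CoOL databases",
    "cool": "archives of the Conservation DistList",
    "cool-cfl": "files on a wide variety of conservation topics",
    "cool-cdr": "Conservation Email Directory (ConsDir)",
    "cool-net": "networking, mailing lists, the Internet",
    "cool-lex": "thesauri, glossaries, classification schemes, authority lists",
    "cool-bib": "complete bibliographies on conservation topics",
    "cool-ref": "individual citations",
}

COOL_SOURCES = [
    WaisSource(name, COOL_HOST, COOL_PORT, text)
    for name, text in _DESCRIPTIONS.items()
]


def select_sources(names):
    """Return the source records for the databases to be searched together."""
    wanted = set(names)
    return [src for src in COOL_SOURCES if src.database in wanted]


if __name__ == "__main__":
    # Widen a search (trading precision for recall) by selecting two databases.
    for src in select_sources(["cool", "cool-cfl"]):
        print(src.filename, src.host, src.port)
```

Selecting a single record corresponds to a narrow, high-precision search; selecting several corresponds to widening the search for better recall, as described above.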
3.2 cool

cool contains the complete archives of the Conservation DistList. Every message that has appeared in the DistList since its inception has been reformatted and enhanced (e.g. full names added to From: fields, subjects regularized, spelling corrected) to increase the probability of your search retrieving a relevant item. Searches will return individual messages rather than complete DistList digests.

3.4 cool-cfl

The largest of the CoOL databases, cool-cfl is the information workhorse, containing files on a wide variety of conservation topics. Most of your searches will probably include this database.

3.5 cool-cdr

cool-cdr contains an up-to-date version of the Conservation Email Directory (ConsDir). Searches return single entries for individuals. You can search for any word that appears in a Directory entry, eg "Conservator from California Interested in book structure" (but see "Guide to WAIS Searching" elsewhere in this document for ways to improve this question).

3.6 cool-net

cool-net contains information of a general nature, concerning networking, mailing lists, the Internet, etc. It is the only component of CoOL that is not focused explicitly on conservation issues.

3.7 cool-lex

cool-lex contains lexical and classification material pertaining to conservation and preservation, including thesauri (or microthesauri), glossaries, classification schemes, authority lists (descriptors, subject headings), etc. These items are segregated from other CoOL databases in order to prevent false hits in the other databases: if you search cool-cfl for "Adhesives" you are probably not going to be satisfied by learning that "Adhesives" appears in an authority list.

3.8 cool-bib and cool-ref

cool-bib contains complete bibliographies on conservation topics. cool-ref is similar but returns individual citations. Note that in general CoOL databases return full text rather than literature citations. cool-bib and cool-ref are, however, exceptions to this rule.
Although there is considerable duplication between the two databases, the overlapping coverage is not complete and cool-ref will probably always contain a great many more citations than cool-bib. Normally you will search cool-bib to find out if someone has provided thorough coverage of a topic, and cool-ref to find answers to specific questions.

3.9 Acceptance of material to be mounted in CoOL

We will always be grateful to receive machine readable text to be mounted in CoOL and hope that you will all dig through your files for material to share. There are only a few restrictions, and of course we reserve the prerogative of deciding what will be mounted.

3.10 Copyright

The material must be either in the public domain or material which we have permission to reproduce and present in machine readable form. If you are not the copyright holder of the material you submit, please verify with the copyright holder that s/he is willing to permit us to mount the material and tell us, at the time of your submission, how to get in touch with him/her. If you submit material for which you hold the rights, please send a note with your submission, making explicit the nature and extent of the permission being granted.

If you wish to include a copyright statement in the text, you are encouraged to do so. The following is an example that might be appropriate, but please confer with your own legal counsel before relying on it (we have *not* consulted an attorney about this clause and make no claims for its appropriateness).

    Copyright <YEAR> by <NAME>. Copying in excess of rights otherwise established under copyright law is permitted, without individual permission or payment of a fee, provided that copies are made or distributed for non-profit purposes and credit is given for the source. Abstracting with credit is permitted.
If you discover copyrighted material in CoOL that you believe may be mounted without proper permissions, please let us know so that we can correct the error.

3.11 Advertising

Advertising per se is not welcome, but announcements, technical specifications and other material whose primary purpose is to provide information about products and services, rather than to entice sales, are welcome.

3.12 Limitations

We are not able to provide any support for client software. Nor can we offer help in obtaining and installing clients, beyond what is offered in this document. If you have trouble, please get help from someone at your site. If, on the other hand, you discover anomalies in the data (e.g. missing or incorrect Headlines) or in the behaviour of the server, we will be grateful if you would bring them to our attention.

We do not have adequate resources to go out hunting for text or, in most cases, to scan printed text. If you want to suggest that a given printed text be scanned and mounted, we will be happy to record the suggestion, but the chances are very slim that we will be able to act on it. If you know of an appropriate text in machine readable form, please provide specific information about where it can be found (eg "by anonymous ftp to abc [at] xyz__edu in directory pub/doc/foo" or "Point your gopher to abc [at] xyz__edu and look in directory Foo" are helpful; "It's on listserv" is not).

3.13 Acknowledgements

I would like to thank Jonathan Goldman of Thinking Machines for help and advice on WAIS matters and for sharing with us the source code for WAISmail, and Bill Tierney of the Stanford University Libraries Systems Office for invaluable help in getting this thing off the ground.

4. Finding a client

If you are directly connected to the Internet (ie you can telnet and ftp from your machine to other places on the net) and are able to install software on your machine, see

    4.1 Finding WAIS Clients

If you have access to Gopher, see

    4.2 Using Gopher with WAIS

If you are directly connected to the Internet (ie you can telnet and ftp from your machine to other places on the net) but are NOT able to install software on your machine, see

    4.3 Running SWAIS at quake.think.com

If you are not on the Internet (ie you are able to send mail to Internet hosts, but are not able to telnet or ftp from your machine to other places on the net), see

    4.4 The WAISmail interface

4.1 Finding WAIS Clients

WAIS clients are available, free of charge, for several machines. Some are more sophisticated than others, and all are likely to undergo considerable change in the next year or so.

There are two clients available for the Macintosh, and both are fairly nice. Both require that you have MacTCP installed.

WAIStation is a standalone program that is available by anonymous ftp to think.com in the wais directory. You will also find a WAIStation demo there, which is a great introduction to using WAIS. WAIStation0.63.hqx is binhexed, so you will need binhex 4.0 or another utility that can unbinhex files (eg Compactor Pro, Stuffit Deluxe, etc).

HyperWais (which requires HyperCard to run) is available from mendel.welch.jhu.edu [128.220.59.42] by anonymous ftp. The application is located in incoming/HyperWais.sea.hqx, and the source is in incoming/HyperWais.src.sea.hqx.

sunsite.unc.edu has a nice selection of clients for a variety of computers including Dos and Windows machines, NeXT, Unix boxes, etc. They are found in pub/wais. If you don't find what you need there, see the Frequently Asked Questions posting below.

4.2 Using Gopher with WAIS

If you have access to a Gopher client, you can use it to search WAIS databases, including CoOL.
To find your way to CoOL, navigate through the Gopher directories until you find something like "Other Information," and look there for "WAIS Based Information". Somewhere below that you may find cool-directory-of-servers. If not, just look for directory-of-servers, which is the top level directory at think.com. Search either for the word "conservation" and you will be presented with all the CoOL databases.

Gopher clients are available for a variety of machines and you can obtain them by anonymous FTP to boombox.micro.umn.edu in the directory pub/gopher. There are, of course, other sources, and you can use gopher to find them.

If you do not have your own gopher client, there are publicly available Gopher sites. To use them, telnet to one near you and login using the name indicated in the table below (taken from the Gopher Frequently Asked Questions file).

Non-tn3270 Public Logins:

Hostname                   IP#              Login   Area
-------------------------  ---------------  ------  -------------
consultant.micro.umn.edu   134.84.132.4     gopher  North America
gopher.uiuc.edu            128.174.33.160   gopher  North America
panda.uiowa.edu            128.255.40.201   panda   North America
gopher.sunet.se            192.36.125.2     gopher  Europe
info.anu.edu.au            150.203.84.20    info    Australia
gopher.chalmers.se         129.16.221.40    gopher  Sweden
tolten.puc.cl              146.155.1.16     gopher  South America
ecnet.ec                   157.100.45.2     gopher  Ecuador

tn3270 Public Logins:

Hostname                   IP#              Login   Area
-------------------------  ---------------  ------  -------------
pubinfo.ais.umn.edu        128.101.109.1    -none-  North America

4.3 Running SWAIS at quake.think.com

NB: In the following discussion, when the instruction says type "something", it is understood that you will not type the quote marks. Case *is* significant: "B" is not the same as "b". Remember to read whatever instructions are displayed on the screen as they will usually --but not always-- tell you what to do next.
Running your own client is without question the most desirable way to use CoOL, but if you are unable to install a client, there are some options available to you. However, they are neither as powerful nor as convenient as running your own client.

Thinking Machines provides a publicly available client (SWAIS) which you can use via telnet. To use it, telnet to quake.think.com and login as "wais". You will be asked for your terminal type (if you are not emulating VT100, the chances are good you will find quake's facility unusable).

Detailed information for using SWAIS, whether at quake or on your own system, is found in the SWAIS Manual below. Note that you cannot use the "Save" command on quake, since that would save the retrieved text on quake instead of your own machine, but you can use "m" to have the retrieved text mailed to your account. While you are logged on, you can get help by typing "?".

Once logged into quake you will be presented with a list of 'sources'. Page down (upper case "J") or, if you are in a hurry, type "/cool" to search the source list for the cool- group of databases. You should see a list including cool-directory-of-servers.src, cool.src, cool-cfl.src, etc.

Let's assume you want to search cool-cdr, which contains the Conservation Email Directory, in order to find someone from Australia. Move the cursor until it is on cool-cdr.src and type SPACE to tell SWAIS that this is the database you want to search in. (You can select more than one database so that they will be searched simultaneously, but for this exercise let's keep things simple.) Then type "w". This tells SWAIS you want to enter Words for your search. Type "Australia" <return>.

After the search you will see a list of names. Use the cursor to select one and type a SPACE to tell SWAIS to retrieve. While you are viewing the retrieved text, you can get help by typing "h". SPACE will move you forward to the next screen and "b" will move you back to the previous screen.
Type "q" when you are done reading.

Some useful keys (again, case is significant):

    ? & h       get Help. If you try one and don't get help, try the other (and repeat to yourself "Unix is my friend")
    q           stop doing what you're doing and go back. If you find yourself stuck, "q" should get you out of trouble. If you are at the Sources list, though, "q" will quit the program (but you will be asked first).
    delete/bs   if your Backspace key doesn't seem to do what you want, try the Delete key, and vice versa.
    ^u          erase the line you typed
    J           Down a page

4.4 The WAISmail interface

Until very recently, those whose only access to the network is electronic mail were out of luck, but now, thanks to the efforts of Jonathan Goldman, of Thinking Machines, there is a mail interface known as WAISmail. To get help on using WAISmail, send a message to waismail [at] think__com and make the first line

    Help

You will receive a detailed help file by mail. A version of the help file is included below, but you should get a new one, as there may be changes as WAISmail develops.

To see how it works, let's try the same search that we did with SWAIS. WAISmail searches are quite simple. They consist of two separate transactions: a search and a retrieval. To search, send a message to waismail [at] think__com that looks like

    search cool-cdr australia

In a very short time, you will get back a message that looks something like this

    From daemon [at] quake__think__com Mon Feb 1 16:42:19 1993
    Date: Mon, 1 Feb 93 16:43:51 PST
    From: WAISmail [at] quake__think__com
    To: whenry [at] lindy__Stanford__EDU (homo obsolescensis)
    Subject: Your WAIS Request:

    Searching: cool-cdr
    Keywords: australia

    Result # 1 Score:1000 lines: 0 bytes: 421 Date: 0 Type: TEXT
    Headline: name |Drew, Nancy
    DocID: 41139 41560 /u/wais/Cdr/CONSDIR:/u/wais/src/cool-cdr@aldus.stanford.edu:210%TEXT

    Result # 2 Score:1000 lines: 0 bytes: 390 Date: 0 Type: TEXT
    Headline: name |Spade, Sam
    DocID: 104773 105163 /u/wais/Cdr/CONSDIR:/u/wais/src/cool-cdr@aldus.stanford.edu:210%TEXT

    Result # 3 Score:1000 lines: 0 bytes: 338 Date: 0 Type: TEXT
    Headline: name |Wolfe, Nero
    DocID: 140758 141096 /u/wais/Cdr/CONSDIR:/u/wais/src/cool-cdr@aldus.stanford.edu:210%TEXT

Note that these will be very long lines and may wrap on your terminal screen, but you should not insert newlines except between items.

Now to retrieve one or more of these texts, you will need to compose a new message to waismail [at] think__com and *include* the DocID lines for those items you want. If you are using Unix mail, when you are done reading this result set, you can

    r     (reply)
    ~f    (forward the current message)
    ~v    (edit the outgoing message, cut away any items you do *not* want, save)
    .     (send the message)

Note that you don't have to edit away any of the headers or other extraneous material. Don't "quote" the included text (eg with ">"). Only the lines that begin with "DocID:" are relevant. Be very careful, however, not to delete the blank lines between items and not to insert or delete any extraneous spaces at the ends of lines. If your system automatically inserts a .sig or "--------Original Message________" line around the included text, be sure to insert one or more blank lines around the DocIDs.

Mail the message to waismail [at] think__com and with luck, in a short time you will get the full text of your selections. If you're not so lucky, you will get back an error message that may help you figure out what went wrong. The most likely problems are extraneous characters (especially if your editor or mailer has wrapped long lines) and missing blank lines between the docids.

If the file comes to you uuencoded, it means that WAISmail thinks you have requested a non-text file. You can recognize this situation because the file you receive will say

    begin WAIS.res 666

and be followed by what looks like garbage. Since (at least for the time being) all CoOL files are pure text, this is obviously an error.
Most likely your mailer has joined the last line of the docid to your .sig or other extraneous text. Just insert a newline after the docid (ie after "210%TEXT") and it should work fine.

Appendix 1. Frequently Asked Questions about WAIS

Archive-name: wais-faq/getting-started
Last-modified: 27 Dec 92 00:00:01 EST

comp.infosystems.wais Frequently asked Questions [FAQ] (with answers)

-1- What is the purpose of this newsgroup?
-2- How can I search this FAQ to find the answers?
-3- What is WAIS?
-4- Where can I find more information on WAIS?
-5- How can I get access to WAIS?
-6- Where can I find WAIS software for the XYZ OS?
-7- Where can I pick up the list of sources (e.g. databases) for WAIS?

Please send suggested corrections and additions to: edguer [at] ces__cwru__edu

----------------------------------------------------------------------

Subject: -1- What is the purpose of this newsgroup?
Date: 28 Oct 92 00:00:01 EST

From the Charter: comp.infosystems.wais is for discussion of WAIS, the Wide Area Information Servers, a networked full text retrieval system developed by Thinking Machines, Apple Computer, and Dow Jones.

------------------------------

Subject: -2- How can I search this FAQ to find the answers?
Date: 19 Nov 92 00:00:01 EST

This FAQ follows the RFC1153 recommendations for message digests and thus should easily be viewed by newsreaders that understand message digests. This FAQ also uses the Subject: lines with the answer to each question and thus it should be easy to step through the answers with the "^G" command of rn. This FAQ marks each question with a "dash number dash" so that using a regular expression search pattern you can easily get directly to any question on the document.

------------------------------

Subject: -3- What is WAIS?
Date: 27 Dec 92 00:00:01 EST

WAIS stands for Wide Area Information Servers. WAIS is a networked information retrieval system. WAIS currently uses TCP/IP to connect client applications to information servers.
Client applications are able to retrieve text or multimedia documents stored on the servers. Client applications request documents using keywords. Servers search a full text index for the documents and return a list of documents containing the keyword. The client may then request the server to send a copy of any of the documents found.

Although the name "Wide Area" implies the use of the large networks such as the Internet to connect clients to servers distributed around the network, WAIS can be used between a client and server on the same machine or a client and server on the same LAN.

WAIS uses the Z39.50 query protocol to communicate between clients and servers. WAIS does not, at this time, implement the full Z39.50-1992 specification. In particular, WAIS does not permit boolean searches but instead is restricted to relevance feedback.

There are a large number of servers running currently [over 350 databases]. Topics range from recipes and movies to bibliographies, technical documents, and newsgroup archives.

WAIS is a project of Thinking Machines, Apple Computer and Dow Jones. WAIS is a free product available with full source to the server, indexing software, and many clients.

------------------------------

Subject: -4- Where can I find more information on WAIS?
Date: 3 Dec 92 00:00:01 EST

Depending upon the information you seek there are many options.

Perhaps the best place to start is the WAIS white sheet available via anonymous FTP from think.com in the file wais-corporate-paper.text. This will give you a good idea of why people got interested in WAIS and a very simple overview of the WAIS architecture.

If you want to learn more about how WAIS really works or answer other FAQ's the best place to start is the documentation that comes with WAIS. The WAIS distribution is available via anonymous FTP from think.com in the file /wais/wais-8-b5.1.tar.Z.
After uncompressing and untarring the distribution, you will find a ./doc directory that includes a more complete FAQ, documents for programmers, users guides, protocol specifications, a paper on digital librarian ethics, and a bibliography of WAIS articles.

If you wish to do further reading, the bibliography of articles published on WAIS is also available separately from think.com in the file /wais/bibliography.txt.

Next, of course, there is the newsgroup comp.infosystems.wais. The newsgroup is regularly visited by the authors of WAIS at think.com and other experts on using both WAIS and other resources on the Internet. After listening in on the group for a while, you are welcome to post your questions if you have been unable to find an answer yourself from the documentation.

Finally, there are a number of mailing lists which you can join if you wish to follow WAIS.

wais-interest
    Contact: wais-interest-request [at] think__com
    This is a moderated list used to announce new releases for the Internet environment.

wais-discussion
    Contact: wais-discussion-request [at] think__com
    The WAIS-discussion is a digested, moderated list on Electronic publishing issues in general and Wide Area Information Servers in particular. There are postings every week or two.

wais-talk
    Contact: wais-talk-request [at] think__com
    The WAIS-talk is an open list (interactive, not moderated) for implementors and developers. This is a technical list that is not meant to be used as a support list.

Z3950IW
    Contact: LISTSERV [at] nervm__nerdc__ufl__edu
    Z39.50 Implementors list for low level discussions of protocol details.

------------------------------

Subject: -5- How can I get access to WAIS?
Date: 19 Nov 92 00:00:01 EST

Perhaps the easiest way to get started, if you do not want to get a copy of the full distribution and build your own clients, is to try WAIS out using the client running at Thinking Machines.
To do this you must use TELNET to connect to quake.think.com and enter the username "wais" [lowercase-no quotes] at the "login:" prompt. This will permit you to use swais (Screen WAIS). swais is a curses based interface, so if you have problems, it may be due to your terminal setup. If you are unsure of the commands, try using a question mark [?] at the prompt.

------------------------------

Subject: -6- Where can I find WAIS software for the XYZ OS?
Date: 3 Dec 92 00:00:01 EST

There are a number of sources for WAIS software available via anonymous FTP.

[please try nic.funet.fi:/pub/networking/services/wais first, if in Europe]

think.com:/wais/wais-8-b5.1.tar.Z
    This is the main UNIX distribution. It includes waisindex, the program that builds the indexes, and waisserver, the program that responds to client queries. The clients include:
        waissearch - a "dumb" tty client interface
        swais      - a "simple" curses based client interface
        wais.el    - A GNU Emacs client interface
        xwais      - an X Window System client interface

think.com:/wais/motif-a1.tar.Z
    mxwais - an OSF-Motif client interface that requires the xwais source.

SunSITE.unc.edu:/pub/wais/openlook-wais.tar.Z
    xwais - an OpenLook (NeWS) client interface

SunSITE.unc.edu:/pub/wais/sunview/sunsearch.src.003.tar.Z
    sunsearch - a SunView (SunTools) client interface

SunSITE.unc.edu:/pub/wais/vms/vms-client/*
    client - a VAX VMS based client interface (based on the code from 8-b2?)

SunSITE.unc.edu:/pub/wais/vms/vms-server/*
    waisserver - a VAX VMS based server
    waisindex  - a VAX VMS based indexer

think.com:/wais/WAIStation-NeXT-1.9.1.tar.Z
    WAIStation.app - a NeXTstep based client interface for NeXT workstations

think.com:WAIStation-0-63.sit.hqx
    WAIStation - a Macintosh interface client based on MacTCP. MacTCP must be obtained separately.
    Source to the client in THINK C is available from quake.think.com:/wais/WAIStation-0-62-Sources.sit.hqx

mendel.welch.jhu.edu:/incoming/HyperWais.sit.hqx
    HyperWais - A Macintosh Hypercard client interface. Based on MacTCP and Hypercard (which must be obtained separately). Source is also available from mendel.welch.jhu.edu:/incoming/HyperWais.src.sit.hqx

SunSITE.unc.edu:/pub/wais/DOS/pcdist.zip
    pcwais - An MS-DOS client interface. Based on Borland TurboVision and the Crynwr Packet Drivers.

oac.hsc.uth.tmc.edu:/public/dos/misc/oacwais.exe
    oacwais - An MS-DOS client interface. Based on FTP Software's PC/TCP. FTP Software's PC/TCP must be obtained separately.

SunSITE.unc.edu:/pub/wais/Windows/wwais103.zip
    wwais - a Microsoft Windows 3.0 client interface. Based on Visual Basic and Novell's LAN Workplace for DOS. LAN Workplace for DOS must be obtained separately.

You can also use Gopher to access WAIS. For the availability of Gopher clients, please visit the comp.infosystems.gopher newsgroup.

------------------------------

Subject: -7- Where can I pick up the list of sources (e.g. databases) for WAIS?
Date: 3 Dec 92 00:00:01 EST

The current listing of publicly advertised sources is always available via anonymous FTP from think.com in the /wais directory in the file wais-sources.tar.Z (a compressed UNIX tar file).

[please try nic.funet.fi:/pub/networking/services/wais first, if in Europe]

------------------------------


Appendix 2 SWAIS help

SWAIS(1)                    UWO (1992-12-04)                    SWAIS(1)

NAME
     swais, vtswais - a simple WAIS query front-end

SYNOPSIS
     swais [-s sourcename] [-S sourcedir] [-C common sourcedir] [-h] [keywords]
     vtswais [-s sourcename] [-S sourcedir] [-C common sourcedir] [-h] [keywords]

DESCRIPTION
     swais is a curses-based, simple screen user interface for making WAIS queries. This Simple WAIS interface is a basic access tool designed for those focused on data retrieval and not computer operation.
     It provides most of the functionality of the more complicated interfaces but features a simple and potentially more natural interface for non-bitmapped screens. The functionality supported includes source selection, keyword entry, and automatic document retrieval. There is currently no provision for relevance feedback based questions nor is there a mechanism for storing questions to be asked again. Remember, this is a simple interface!

     This software is fairly new and experimental. You should expect a few bugs.

     vtswais is a special version of swais that forces your terminal type to be a VT100 variant that works around a bug or two in swais. Use it on all machines when your terminal type is set to xterm as it fixes a problem where the last page of a document is cleared before it can be viewed. You might want to try vtswais if you encounter some other strangeness in the swais program.

USING swais
     When swais is first started, the "Source Selection" screen is displayed. The database source files displayed here are a combination of those in the system source area and those that you have copied into your own $HOME/wais-sources directory. You must create this directory before trying to save any database sources.

     Select a database source (or source for short) by pressing the up and down arrow keys to move the reverse-video bar (if available on your terminal) over the source that you want and then pressing the space-bar. The selected source is marked with a star. You can select one or more sources for a search. A control-V, control-D or K will move you down a page of sources; <esc>v, control-U or J will move you up a page. The slash /<string> command can be used to search forward for the next line with a given string in the sources display.

     Type a "w" to move the cursor to the keyword list to be used in the search. Enter the keywords. Use control-U to erase them and start again, <rubout> to delete one character at a time.
     Once you have typed your keyword list, type a <return> to do the search of your selected sources using the keywords typed.

     When your search is complete and there are some matches, a "Search Results" screen is displayed. It lists information headlines about each matched entry. The score column gives you a 1-1000 rating for how "good" the match was.

     To retrieve an entry, move the bar over it using the cursor keys (or the slash search command or by number) and press the space bar. The document is retrieved and displayed using your PAGER.

     To retrieve and save or otherwise process a document, use the vertical bar "|" command to pipe the document to a program. For example, the command "cat >/tmp/article" would save the retrieved document in the file /tmp/article and the command "lpr" would send the item to your default printer. You can also use the "m" command to email a document.

     When retrieving from a directory of source files (like the directory-of-servers source) you can use the "u" command to add it to your list of personal sources instead of the "<space>" to view it. Once the source has been added, it will appear in the "Source Selection" screen. (The source file will be copied into the $HOME/wais-sources directory (which must exist!))

     To return to select another source or to try different search terms, use the "s" (source) or "w" (word) commands.

     To exit from the program, type a "q" at the source selection or results screen.

COMMAND LINE OPTIONS
     The command line options can be used to customize or accelerate your use of swais. Most people just leave these out and interact directly with the program. See the previous section for details.

     -s   is followed by a WAIS database source name. That name will be selected and you will be immediately placed at the keyword prompt (if no key words were entered on the command line). Note that only one database may be selected with the -s command-line option. The last one mentioned on the command line is used.
          (You can always add others once you are inside the program.)

     -S   is used to specify the directory to look for your own personal set of WAIS sources. This directory is also used to save (the Use command) any new database sources that you may discover. If you don't specify this switch, $HOME/wais-sources is assumed.

     -C   is used to specify the directory to use for the system-wide database sources. On our machines this defaults to /uwo/ccs/share/lib/wais-sources.

     -h   is used to print a summary of the command line options to stderr.

     If a list of keywords is included on the command line they will be used for your initial search.

FILES
     $(HOME)/wais-sources
          location for personal sources.
     /uwo/ccs/share/lib/wais-sources
          location for system-wide sources.

BUGS
     While swais works on non-VT100 screens, it isn't very good looking unless the screen has some sort of high-lighting. The program uses highlighting to indicate the current selection and you will require a fast eye to see which selection or document is the current one if your screen doesn't implement reverse-video or some other form of marking.

     If you try to save (the use command) a source without having a $HOME/wais-sources directory, it looks as if the source is added to your list, but in fact nothing is done and no error message is generated. Create this directory first.

     The <esc>v command (for moving up a page) doesn't work on MIPS or Sun4 machines. Use one of the alternative commands: J or control-U.

SEE ALSO
     xwaisq(1), xwais(1), waissearch(1), waisindex(1), waisserver(1)

AUTHOR
     Program by John Curran (jcurran [at] nnsc__nsf__net). Third pass at a manual by Peter Marshall, CCS, The University of Western Ontario: <peter.marshall [at] uwo__ca>.

***
Conservation DistList Instance 6:42
Distributed: Wednesday, February 3, 1993
Message Id: cdl-6-42-001
***
Received on Wednesday, 3 February, 1993