This chapter uses what you learned in Chapter 7 to build a complete application using collections and streams. The program, however, purposely contains several errors. This is so you can get some experience locating and correcting errors using the Smalltalk/V Debugger.
The examples for this chapter are stored in the disk file, chapter.8. Since you will again be making modifications to your Smalltalk/V environment, be sure to save the image when you exit Smalltalk/V.
The first thing to do is implement a new class, WordIndex, which allows you to create a database of documents and locate them based upon the words that they contain. Documents are ASCII text files, viewed as a series of words containing alphanumeric characters separated by a series of non-alphanumeric characters. You query the database by supplying a collection of word strings. The query returns a collection of the file names of all the documents that contain all the words. You could use the word index, for example, to locate resumes in a personnel system, such as all employees whose resumes contain the words "object-oriented programming."
Instances of class WordIndex have instance variables documents and words:
documents is a set of strings of the document file path names whose words have been entered into the word index.
words is a dictionary, with each key containing a string for a word and each value being a set containing the path names of all documents containing the word.
Therefore, the class definition is:
Object subclass: #WordIndex
    instanceVariableNames: 'documents words'
    classVariableNames: ''
    poolDictionaries: ''
Add the WordIndex class definition and methods to your Smalltalk/V image by evaluating the following expression:
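To make the two instance variables concrete, the following workspace sketch builds by hand the kind of structures a small index might hold. The file names a.txt and b.txt are hypothetical, and the temporaries only stand in for the instance variables; nothing here touches class WordIndex itself.

```smalltalk
"Hand-built illustration only; a.txt and b.txt are hypothetical file names."
| documents words |
documents := Set new.
documents add: 'a.txt'; add: 'b.txt'.

"Each key is a word; each value is the set of documents containing it."
words := Dictionary new.
words at: 'turtle' put: (Set new add: 'a.txt'; add: 'b.txt'; yourself).
words at: 'class' put: (Set new add: 'b.txt'; yourself).

"How many documents contain the word 'turtle'?"
(words at: 'turtle') size
```

Under these assumptions, looking up a word in words answers the set of path names that a query on that single word should report.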
(File pathName: 'tutorial\wrdindx8.st') fileIn; close
Now select Update in the Classes menu of the Class Hierarchy Browser so you can browse the methods of the new class WordIndex. There are six methods defined for class WordIndex, as follows:
addDocument: pathName
    "Add all words in document described by
     pathName string to the words dictionary."
    | word wordStream |
    (documents includes: pathName)
        ifTrue: [self removeDocument: pathName].
    wordStream := File pathName: pathName.
    documents add: pathName.
    [(word := wordStream nextWord) == nil]
        whileFalse: [
            self addWord: word asLowerCase to: pathName].
    wordStream close

addWord: wordString for: pathName
    "Add wordString to words dictionary for
     document described by pathName."
    (words at: wordString) add: pathName

initialize
    "Initialize a new empty WordIndex."
    documents := Set new.
    words := Dictionary new

locateDocuments: queryWords
    "Answer an array of the pathNames for all
     documents which contain all words in queryWords."
    | answer bag |
    bag := Bag new.
    answer := Set new.
    queryWords do: [:word |
        bag addAll: (documents at: word ifAbsent: [#()])].
    bag asSet do: [:document |
        queryWords size = (bag occurrencesOf: document)
            ifTrue: [answer add: document]].
    ^answer asSortedCollection asArray

removeDocument: pathName
    "Remove pathName string describing a document
     from the words dictionary."
    words do: [:docs | docs remove: pathName].
    self removeUnusedWords

removeUnusedWords
    "Remove all words which have empty document collection."
    | newWords |
    newWords := Dictionary new.
    words associationsDo: [:anAssoc |
        anAssoc value isEmpty
            ifFalse: [newWords add: anAssoc]].
    words := newWords
Let's look at class WordIndex in terms of the high-level messages which create an index and make queries.
We mentioned earlier that we've included some intentional errors; this is the first place where they occur. For this reason, don't evaluate these messages until the tutorial asks you to do so.
You construct and use the word index in three steps. First, create an empty word index (remember, don't evaluate this expression yet):
Index := WordIndex new initialize
Index is created as a new global variable that stores the new instance of WordIndex. The initialize method initializes the instance variables of the WordIndex; that is, documents now contains an empty Set and words contains an empty Dictionary.
Next, add the words from documents to the WordIndex. The addDocument: method creates a file stream to scan the document, repeatedly sends the nextWord message to the file stream to obtain each word, and then uses the addWord:for: method to enter each word/document pair in the words dictionary. For example, to add the words from the Chapters 5 and 6 sample files, you will use the following expressions (again, don't evaluate these yet):
Index addDocument: 'tutorial\chapter.5'. Index addDocument: 'tutorial\chapter.6'.
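The scanning that addDocument: relies on can be imitated on an in-memory string. The sketch below is not the system's nextWord implementation; it only follows the description given earlier (a word is a run of alphanumeric characters) and assumes that ReadStream, WriteStream and isAlphaNumeric behave as in Chapter 7. The sample sentence is made up and already in lower case, so the case conversion done by addDocument: is left out.

```smalltalk
"Sketch: split a string into runs of alphanumeric characters."
| stream word words |
stream := ReadStream on: 'show the class, then show it again.'.
words := OrderedCollection new.
[stream atEnd] whileFalse: [
    word := WriteStream on: String new.
    "Skip separators such as blanks and punctuation."
    [stream atEnd not and: [stream peek isAlphaNumeric not]]
        whileTrue: [stream next].
    "Collect one word."
    [stream atEnd not and: [stream peek isAlphaNumeric]]
        whileTrue: [word nextPut: stream next].
    word contents isEmpty
        ifFalse: [words add: word contents]].
words
```

The collection that results holds seven words, with 'show' appearing twice; the real index would store each word/document pair only once because the values in the words dictionary are sets.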
To query the word index, you use the locateDocuments: message, as in the following examples (again, do not evaluate them):
Index locateDocuments: #('show' 'class')
Index locateDocuments: #('where' 'the' 'turtle')
Index locateDocuments: #('each' 'talk')
Each query above returns an array of strings, containing the document path names for all documents that contain all words in the query.
The locateDocuments: method is somewhat more complex than the other methods in its class. It uses a bag to accumulate all the path names of all the files that contain each word in the query. (Remember that bags, unlike sets, can contain multiple occurrences of the same object.) It then examines the bag to find any documents which are repeated as many times as there are words in the query; these are the documents which contain all the words.
That's how this class is supposed to work. Let's see if it does. (From this point on, start evaluating the sample expressions again.) First, build a new word index and assign it to the global variable Index:
Index := WordIndex new initialize
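The counting step can be tried on its own, without a WordIndex. In this sketch the per-word document lists are written out as literal arrays, and the path names are simply assumed for illustration; only the Bag and Set messages already used in the method are involved.

```smalltalk
"Sketch of the bag-counting idea with hand-written document lists."
| queryWords bag answer |
queryWords := #('each' 'talk').
bag := Bag new.
"Assumed: documents containing 'each'."
bag addAll: #('tutorial\chapter.5' 'tutorial\chapter.6').
"Assumed: documents containing 'talk'."
bag addAll: #('tutorial\chapter.6').
answer := Set new.
bag asSet do: [:document |
    (bag occurrencesOf: document) = queryWords size
        ifTrue: [answer add: document]].
answer asSortedCollection asArray
```

With these assumed lists, only 'tutorial\chapter.6' occurs twice in the bag, once per query word, so it is the only path name left in the answer.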
Now add the tutorial files for Chapters 5 and 6 by evaluating the following addDocument messages:
Index addDocument: 'tutorial\chapter.5'. Index addDocument: 'tutorial\chapter.6'.
Oops! Instead of adding the tutorial files, we get a Walkback window as seen in Figure 8.1.
As you saw in Chapter 2, a Walkback window describes an error condition. The label shows the error condition, and the text pane lists the messages that were sent, most recent first.
In the current Walkback, the label says that the addWord:to: message is not understood, while the top line in the text pane shows WordIndex as the class of the object which did not understand the message.
Whenever you get a Walkback window, you generally do one of three things:
Figure 8.1. Walkback Window
In this case, you have enough information in the Walkback window to fix the problem. Look at the code for class WordIndex using the Class Hierarchy Browser. We defined a method addWord:for:, but sent the message addWord:to: (in the addDocument: method) which was not understood. We used the wrong message! Close the Walkback.
Correct the addDocument: method to use addWord:for: instead of addWord:to:. Then try again to add the tutorial files to the word index using the following expressions:
Index addDocument: 'tutorial\chapter.5'. Index addDocument: 'tutorial\chapter.6'.
Not fixed yet! This time, you get a new Walkback. The title of the Walkback window says Key is missing. Since the problem is not obvious, you can get more information by opening the Debugger window. Select Debug from the Walkback menu or press the Debug button on the text pane. The Walkback window closes and you'll see the following:
Figure 8.2. Debugger with Buttons
The Debugger window, with its multiple panes, gives an expanded view of the state of the Smalltalk/V environment which was summarized in the Walkback. The top left pane (a list pane) repeats the Walkback information; you can use this pane to select walkback lines. When you select a walkback line, the other panes are updated to contain related information.
Select the entry containing Dictionary>>at:. The bottom text pane displays the source code for the selected method, in this case at: from class Dictionary:
at: aKey
    "Answer the value of the key/value pair
     whose key equals aKey from the receiver.
     If not found, report an error."
    | answer |
    ^(answer := self lookUpKey: aKey) == nil
        ifTrue: [self errorAbsentKey]
        ifFalse: [answer value]
The text that is reversed is the expression currently being evaluated in this method.
The at: method answers the value of the key/value pair whose key equals aKey in the receiver dictionary. If the key is missing, errorAbsentKey is invoked, which opens the Walkback window.
The two panes on the top right are an inspector for the receiver, arguments and temporary variables of the selected method. In this case, you see the receiver self, the argument aKey and the temporary answer. Select self; you see that the value is an empty dictionary. Now select aKey. The value is the string 'tutorial', the first word in the file. We tried to do a Dictionary lookup on an empty dictionary, self, with the first word in the file as key.
Select the line containing addWord:for: in the walkback pane on the top left of the window. Now select the argument wordString. Again, it's the string 'tutorial'. We tried to access the words Dictionary with a key, without first testing whether the key is present! Correct the addWord:for: method in the bottom pane of the Debugger to look as follows:
addWord: wordString for: pathName
    "Add wordString to words dictionary for
     document described by pathName."
    (words includesKey: wordString)
        ifFalse: [words at: wordString put: Set new].
    (words at: wordString) add: pathName
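The guard in the corrected method can also be watched at work on a stand-alone dictionary. This is only a workspace sketch: the temporary words stands in for the instance variable, and the two statements that add path names are written out by hand rather than sent through addWord:for:.

```smalltalk
"Sketch: create the set for a word on first use, then add to it."
| words |
words := Dictionary new.
(words includesKey: 'tutorial')
    ifFalse: [words at: 'tutorial' put: Set new].
(words at: 'tutorial') add: 'tutorial\chapter.5'.

"The key is present now, so the guard does nothing the second time."
(words includesKey: 'tutorial')
    ifFalse: [words at: 'tutorial' put: Set new].
(words at: 'tutorial') add: 'tutorial\chapter.6'.

(words at: 'tutorial') size
```

Because the set is created only when the key is absent, the second addition lands in the same set as the first instead of replacing it.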
Select Save from the File menu or press Command + S. Notice what happens. The entries above addWord:for: in the Walkback list are discarded because a method they would return to has been changed. The addWord:for: method is still selected. Now select Restart from the Debugger menu. Execution resumes by re-sending the selected message.
The Debugger window disappears and the method builds the index. With the Dictionary now built, try to make some queries. Evaluate the expression:
Index locateDocuments: #('show' 'class')
Another Walkback pops up signaling another error. Immediately open a Debugger window on this new error. You'll see the following window:
Figure 8.3. Inspecting Variables
The message at:ifAbsent: was sent to an instance of class Set which did not understand it. Select the walkback line containing Set(Object)>>doesNotUnderstand:. Then select self in the temporary variable list. The value is:
Set('tutorial\chapter.6' 'tutorial\chapter.5')
Now select the third walkback line, representing a block in the locateDocuments: method, and examine the values of the temporary variables. Look at the source code for the method. The at:ifAbsent: message being executed is reversed. It uses instance variable documents as receiver. The value printed out for the set above confirms this, because it does contain the document path names.
Consider this statement. Either we sent the wrong message to documents or documents is the wrong receiver. This statement is trying to add to variable bag all the documents that include the string contained in variable word. The receiver is indeed wrong. This statement should instead use the words dictionary:
bag addAll: (words at: word ifAbsent: [#( ) ] )
Change the locateDocuments: method using the text pane in the Debugger, save the changed method, and restart at locateDocuments:. It works!
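The ifAbsent: block in the corrected statement is what lets a query mention a word that was never indexed. A small sketch, using a hand-built dictionary and the made-up word 'zebra' as the missing key, shows the effect:

```smalltalk
"Sketch: a missing word contributes nothing to the bag."
| words bag |
words := Dictionary new.
words at: 'show' put: (Set new add: 'tutorial\chapter.5'; yourself).
bag := Bag new.
#('show' 'zebra') do: [:word |
    "For 'zebra' the block answers an empty array, so nothing is added."
    bag addAll: (words at: word ifAbsent: [#()])].
bag size
```

Here the bag ends up with a single element, so no document reaches a count of two and a query for both words would answer an empty array rather than raise a Walkback.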
Try the following queries:
Index locateDocuments: #('where' 'the' 'turtle')
Index locateDocuments: #('each' 'talk')
Now that you have class WordIndex debugged, let's see how you can use the Debugger to learn how an application operates by watching it send messages. Open a Debugger window to step through execution of the query you just performed by evaluating the following expression:
self halt. Index locateDocuments: #('each' 'talk')
Figure 8.4. Debugging an Expression
There are 5 buttons in a left to right row just below the window title bar: the Walkback and Breakpoints radio set and the Hop, Skip and Jump push buttons as seen in Figure 8.4. Hop, Skip and Jump each cause limited program execution:
Try pushing the Hop button twice and watch the Debugger window update. Execution state is now at the beginning of execution of the expression, shown in Figure 8.4. Press Hop again. Notice how execution proceeds in small amounts with the next statement to be executed highlighted after the step. You can examine the state of objects after each Hop.
Now try pressing the Skip button a few times. Notice that the highlighting stays within the same method until the method finishes execution. This allows you to concentrate on a single method activation and ignore lower level messages.
Jump takes the most dramatic steps of the three step-wise execution buttons. In this case, Jump will progress to the end of the debugged expression as we have not set any breakpoints. For more information about the Debugger and to learn about use of the Breakpoints button, refer to Chapter 16.
You are now familiar with:
If you want to review, you can either repeat the tutorial, or refer to the detailed descriptions in Part 3 of this manual.
As always, if you exit Smalltalk/V before beginning the next tutorial, be sure to save the image.