COMPUTE! ISSUE 92 / JANUARY 1988 / PAGE 64
In my last column, I promised that this month would mark the beginning of a discussion of computer languages. In particular, I want to take a look at the strengths and weaknesses of various languages. So this month, I’ll open the mini-series by looking at data types.
If BASIC is your only programming language, then you probably have not run across this topic before. Yet, more than likely, you have already used various data types. In BASIC, the two underlying data types are numbers and strings. Typically, you might write program lines such as these, where the first line demonstrates numeric data types and the second shows strings:
Because BASIC has only these two types, the language has a very simple scheme for distinguishing them: String variables have a dollar sign on the end of their names, and string literals have quotation marks around their contents. Other data items are assumed to be numeric—simple and clean. Yet even in BASIC there are actually several implied data types that are not specially declared.
For example, consider the address that you PEEK or POKE to. It must be a number between 0 and 65535. The actual value at that address must be a number between 0 and 255. File numbers must be between 0 and 7. The list could go on. You object? You say these are all simply restricted ranges of the basic numeric data type? In BASIC, that is true. But in other languages….
Consider the following fragment of a Pascal program. In Pascal, the keyword TYPE means that the following declarations are naming various kinds of data, not reserving actual data space. The keyword VAR means that further declarations do indeed reserve space for variables.
TYPE Mem_Address = 0..65535; Mem_Data = 0..255; Channel = 0..7; Open_Mode = ( Rd, Wr, Up ) Cust_Rec = RECORD Name : String; Addr : String; City : String; State : String; Zip : 0..99999; Credit: (OK, Avg, Bad); END; VAR Peeker : Mem_Address; Peeked : Mem_Data; Customer : Cust_Rec; Mail_List : ARRAY [1..100] OF Cust_Rec;
Do you see what we have done? Thanks to Pascal’s very rich data-typing capability, we are able to explicitly say what kinds of things a given variable is expected to handle. Take a close look at the variable Peeked. Its declaration says that it is a memory data type. Most Pascals will not even let you try to do a statement such as this:
Peeked = 3.7 * Total;
You are trying to assign a number that probably has a fractional part to a variable that can only have integer values from 0 to 255. Pascal knows you are being naughty, and the compiler burps real quick! And, although the following statement might get through the compiler, it will probably get you a range error when you run the program (if the original value of Peeked is 2 or more):
Peeked = 243 * Peeked;
Wow! Safety first, right? Well, yes. But it is more than that. Code written with strong data typing is more likely to run correctly (I have had several Pascal programs that worked the first time, once they had successfully compiled). Most importantly, in a commercial environment, such code is maintainable—a programmer can look at the code months or even years later and figure out what it is doing.
So, without even really trying, I have shown you one reason to consider learning languages other than BASIC. And I did not mean to imply that Pascal is the only language that has advantages here. Although C is generally more forgiving (another way of saying you can shoot yourself in the foot more easily) than Pascal, you can build quite readable and properly declared data types and structures with it. And, in fact, the newer versions of C—ones which follow the proposed ANSI standard—offer an option of choosing all the close checking of Pascal.
Go back and look at those Pascal data type declarations again. In particular, look at the Cust_Rec type and the Customer and Mail_List variables. Just as Pascal allows more restrictive variables than BASIC, so does it allow more complex variables. Consider these legal Pascal statements (given the above declarations):
Customer.Name := 'Jones' Customer.Zip := 77344; Mail_List := Customer; IF Customer.Credit = Bad THEN Write('No Credit!');
Those first two lines might find their way into a BASIC program looking something like this:
Which is more readable? If you decided to change from 5-digit to 9-digit zip codes, which program do you think would be easier to modify? No contest, right? And how would you begin to do something as simply as those third and fourth statements in BASIC?
COMPUTE! ISSUE 93 / FEBRUARY 1988 / PAGE 62
Last month we took a look at data types and how they’re used successfully and profitably in computer languages such as Pascal. There are real and discernible advantages to using a language that handles structured data, and I hope I convinced you of that. Of course, most of those languages offer other significant reasons to use them, such as faster execution speed. Still, none of them do one thing as well as good old BASIC does. The interpretive environment of BASIC makes program development exceptionally easy.
When I travel to user-group meetings and show off one or the other of the OSS advanced BASICs, I inevitably write a program that I make up on the spot, in the meeting. And I usually manage to convert a few nonprogrammers into at least thinking about taking up BASIC as a hobby. I’m not sure I could do that with most compiler environments. So, like it or not, I do understand why most people want to learn to program in BASIC first. (And did you notice that I didn’t even mention the usual reason? BASIC comes with the machine, so you don’t have to pay extra to use it.)
So, given that most of you, my readers, will program in BASIC, the least I can do is show you some techniques to make such programming easier. To me, that implies showing you how to use techniques from other languages in BASIC. In turn, that means learning some tricks that will make BASIC more powerful.
Type in and try Program 1. When you run it, give it any numbers you like, including perhaps several occurrences of the same value. When you finally enter a 0 value, the program will print out the list of your numbers in sorted order. Congratulations—you’ve just used a technique known as an insertion sort.
The name makes sense, doesn’t it? As each new number is entered (line 25), we find where it belongs (that is, after which current number; lines 35 and 40), and then “insert” it into the appropriate spot in the list of numbers (lines 45 through 60). In some situations, this is a pretty good sorting method. For example, when you have to wait several seconds between user input, what’s a quarter second or so to insert a number? A lot of the efficiency of an insertion sort depends on the speed with which the actual insertion is made. In this little BASIC program, we used a FOR/ NEXT loop (lines 45 through 55) to do the insertion. (Note that this would be way too slow if we were trying to do a couple of thousand insertions.) Luckily, in most languages, there are faster methods. But, in any case, the sorting method is not the important part of this month’s discussion.
Now suppose, that instead of inserting a single number (as we were doing here), we were working with an entire set of information. Consider a typical mailing list, where we would be shuffling around a name, address, city, state, zip code, phone number, and various other bits and pieces. Can we, using BASIC, manipulate this information as easily as we sorted those numbers? Not quite, but we can come close.
Take a look at this example of a Pascal record as I presented it last month:
TYPE Cust_Rec = RECORD Name: String; Addr: String; City: String; State : String; Zip : 0..99999; Credit: ( OK,Avg,Bad ); END; VAR MailList: ARRAY [1..100] OF Cust_Rec;
Our variable MailList is an array of records, and each record holds several pieces of information about a given customer. Using the information in these records is almost easy. For example, we could find the zip code of customer number 17 by simply coding
The conversion from a number-sorting program to a record-sorting program is almost a trivial exercise in Pascal. While we can’t duplicate the feat as easily in BASIC, we can at least simulate this convenient grouping of related pieces of information into a record. Again, look at Program 2. This is actually the same program as Program 1, but it uses strings to simulate records. If you look at the code from line 300 to line 410, you should be able to find a direct correlation to the statements of lines 30 to 60 of the first listing. True, the lack of string arrays in Atari BASIC has forced us to use some pretty strange looking assignment statements because we are now moving around substrings instead of simple numbers. I’ve tried to make these movements as clear as possible, but don’t feel bad if it takes you some time to understand what is going on. I encourage you to print out the various strings (such as MAILLIST$ and RECORDS) at several points to see what is happening.
You may have noticed that these records are sorted based on the name of the person. Try this puzzle before reading on: Can you suggest ways of insuring that the sort is by zip code, instead?
And, if you have ST BASIC, Atari Microsoft BASIC, or OSS’s BASIC XL or BASIC XE, you might try converting this program to use string arrays. I think you’ll find that the only real savings in coding complexity occurs in the actual insertion loop (lines 300 to 410). For the rest of the program, good old Atari BASIC doesn’t suffer too much in comparison.
What have we accomplished? I hope you can see how, by isolating the record build/retrieve in subroutines such as those at lines 800 and 900 in this example, pseudo records are quite possible in BASIC. But we have also seen that the manipulation of these records can be tedious. And certainly moving all that string data around is not the fastest set of operations in the world. How could we improve things? Time to borrow some more concepts from structured languages such as Pascal: pointers and linked lists. But, for a look at those topics, we’ll have to wait until next month.
Thought I forgot the answer to my little puzzle? Nope. Two ways to sort by zip code: Rearrange the order of the data in the RECORDS string so that the zip code comes first; or, change the master record comparison in line 340 so that only the zip code portions of the strings are compared. For example:
340 IF RECORD$(78,82)>MAILLIST$(RECPTR+78,RECPTR+82) THEN NEXT RECNUM:STOP
Program 1: Simple Numeric Insertion Sort 10 DIM NPOS(20) 15 FOR TOP=0 TO 19 20 PRINT "GIVE ME A NUMBER BIGGER THAN 0 "; 25 INPUT NPOS:IF NPOS<=0 THEN 80 30 IF TOP=0 THEN CHK=1:GOTO 60 35 FOR CHK=1 TO TOP 40 IF NPOS>NPOS(CHK) THEN NEXT CHK:GOTO 60 43 FOR MV=TOP+1 TO CHK STEP -1 50 NPOS(MV)=NPOS(MV-1) 55 NEXT MV 60 NPOS(CHK)=NPOS 65 NEXT TOP 70 REM IF 20 NUMBERS, FALL THROUGH 80 REM TO HERE WHEN 0 OR LESS ENTERED 85 FOR CNT=1 TO TOP 90 PRINT NPOS(CNT) 95 NEXT CNT
Program 2: Insertion Sort of Pseudo-Records 100 REM DATA DECLARATIONS 110 DIM NAME$(30),ADDR$(30),CITY$(15) 120 DIM STATE$(2),ZIP$(5),CREDIT$(1) 130 RECSIZE=30+30+15+2+5+1:MAXREC=100 140 DIM RECORD$(RECSIZE) 150 DIM MAILLIST$(RECSIZE*MAXREC) 160 DIM SPACE$(RECSIZE),YESNO$(1) 170 SPACE$=" ":SPACE$(RECSIZE)=" " 180 SPACE$(2,RECSIZE)=SPACE$ 190 MAILLIST$=SPACE$:TOPREC=0 200 REM DATA ENTRY 210 FOR TOPREC=TOPREC TO MAXREC-1 220 GRAPHICS 0:PRINT TOPREC;" CUSTOMERS IN FILE " 230 PRINT "ENTER ANOTHER CUSTOMER (Y/N) "; 240 INPUT YESNO$:IF YESNO$="N" THEN 500 250 GRAPHICS 0:GOSUB 700:REM ENTER A RECORD 260 GRAPHICS 0:GOSUB 600:REM SHOW THAT SAME RECORD 270 PRINT :PRINT "IS THIS OKAY"; 280 INPUT YESNO$:IF YESNO$<>"Y" THEN 250 290 GOSUB 900:REM CONVERT TO RECORD FORMAT 300 REM FIND INSERT POINT 310 IF TOPREC=0 THEN RPTR=0:GOTO 400 320 FOR CHK=1 TO TOPREC 330 RPTR=(CHK-1)*RECSIZE 340 IF RECORD$>MAILLIST$(RPTR+1,RPTR+RECSIZE) THEN NEXT CHK:RPTR=RPTR+RECSIZE:GOTO 400 350 REM INSERT RECORD 360 FOR R=TOPREC TO CHK STEP -1 370 TEMP2=R*RECSIZE:TEMP1=TEMP2-RECSIZE 300 MAILLIST$(TEMP2+1,TEMP2+RECSIZE)=MAILLIST$(TEMP1+1) 390 NEXT R 400 MAILLIST$(RPTR+1,RPTR+RECSIZE)=RECORD$ 410 NEXT TOPREC 500 REM 510 REM DONE 520 FOR RECNUM=0 TO TOPREC-1 530 RPTR=RECNUM*RECSIZE 540 RECORD$=MAILLIST$(RPTR+1) 550 GRAPHICS 0:GOSUB 800:GOSUB 600 560 PRINT :PRINT "HIT RETURN TO SHOW NEXT RECORD"; 570 INPUT YESNO$ 580 NEXT RECNUM 590 GOTO 200 600 REM SUBROUTINE 610 REM SHOW A RECORD 620 PRINT "NAME ::";NAME$ 625 PRINT "ADDR ::";ADDR$ 630 PRINT "CITY ::";CITY$ 635 PRINT "STATE::";STATE$ 640 PRINT "ZIP ::";ZIP$ 645 PRINT "CREDIT RATING (A TO F) ::";CREDIT$ 690 RETURN 700 REM SUBROUTINE 710 REM INPUT A RECORD 720 PRINT "NAME >";:INPUT #16,NAME$ 725 PRINT "ADDR >";:INPUT #16,ADDR$ 730 PRINT "CITY >";:INPUT #16,CITY$ 735 PRINT "STATE>";:INPUT #16,STATE$ 740 PRINT "ZIP >";:INPUT #16,ZIP$ 745 PRINT "CREDIT RATING (A TO F) >"; 750 INPUT #16,CREDIT$ 790 RETURN 800 REM SUBROUTINE 810 REM TAKE APART A RECORD 825 NAME$=RECORD$ 830 ADDR$=RECORD$(31) 835 CITY$=RECORD$(61) 840 STATE$=RECORD$(76) 845 ZIP$=RECORD$(78) 850 CREDIT$=RECORD$(83) 890 RETURN 900 REM SUBROUTINE 910 REM BUILD A RECORD 920 RECORD$=SPACE$ 925 RECORD$(1,30)=NAME$ 930 RECORD$(31,60)=ADDR$ 935 RECORD$(61,75)=CITY$ 940 RECORD$(76,77)=STATE$ 945 RECORD$(78,82)=ZIP$ 950 RECORD$(83,83)=CREDIT$ 990 RETURN
COMPUTE! ISSUE 94 / MARCH 1988 / PAGE 63
In last month’s column, I discussed structured data types. In the program listing, I then used a large string and its substrings to simulate an array of records. This month, I will continue my efforts to help you write more structured programs, but first I’ll make some general comments on Atari BASIC strings.
I think that last month’s program demonstrates that long strings can be just as powerful as string arrays. I would expect that program to operate at least as quickly as an equivalent Microsoft BASIC program using string arrays, because the typical Microsoft implementation goes through a lot of overhead generating and reclaiming dynamic strings. In that example, we were inserting new records into our string structure. If we had been deleting records, Atari BASIC really would have shone. For example, suppose we have a string filled with 50-character pseudorecords. To delete the third such record, we could simply do this:
RECORD$(101) = RECORD$(151)
Presto! All records are moved up one spot, and the third one is gone.
A small sidetrack: Unfortunately, one failing of Atari BASIC is that it has no built-in way to conveniently save and restore such long strings to and from disk. The most common output method is to PRINT# such a string, but that doesn’t help a lot, since INPUT# is limited to no more than 255 bytes per line. I have seen many users resort to using a FOR/NEXT loop to PRINT# or INPUT# the individual pseudorecords one at a time. That works, but it certainly slows down disk I/O speed. In BASIC XL and BASIC XE, we added a pair of special statements for this purpose (BPUT# and BGET#, where the B stands for Block or Buffer), but you can accomplish the same thing in Atari BASIC with a pair of fairly short USR assembly language subroutines. (These routines have been published several times, and I won’t repeat them this month.)
To continue my discussion of how to achieve structured data features in Atari BASIC, I call your attention to Program 1. Study it, type it in, and run it. It is a fairly clumsy but working record-sort routine. That is, it sorts the kinds of pseudorecord strings that we also used in last month’s program. (Incidentally, I have used the worst of all possible sort algorithms: the bubble sort. Please, if you are serious about sorting your data, learn a couple of other methods, such as the heap sort, Shell sort, and quicksort. Why do I use the bubble sort? Because it is the smallest and easiest to demonstrate. Or maybe because I’m just lazy.)
As you study that listing, pay attention to lines 230–260, where the tests and swaps necessary to any sort routine are made. Because I purposely organized the data in my array of pseudorecords in the worst possible way (for a bubble sort, at least), the IF test will never branch around the swap, and we will make more than 4900 of these string swaps. Surely there must be a better way.
Time to introduce another concept from the structured languages. In any computer language, moving blocks of data around (whether records, pictures, disk blocks, or whatever) is time-consuming. So most programs don’t actually move the data. Instead, they move pointers to the data.
What is a pointer? Quite simply, a pointer is a variable that contains the address of another variable. Atari BASIC allows only one explicit kind of pointer—the ADR function that gives the address of a string. And, indeed, many programmers use ADR as a pointer when they pass the address of a string to an assembly language subroutine. (Imagine having to pass the bytes of a string through a series of POKEs.)
But there is another, hidden kind of pointer in almost any computer language: array and string indices. As an example, if the record data we are working on is always within a particular string, for example, then we need only know the relative position of a given record within the string in order to obtain its information. We most likely do not care about the actual physical memory address of the data.
Merge the lines shown in Program 2 to those already in place in Program 1 (deleting the three lines noted) and study the results of using implicit pointers. There is (rather obviously) little difference between the two programs. Instead of swapping actual substring pseudorecords, we are now swapping only the indices into the master string. When you run this second version, you should notice a speed improvement of almost 2:1. (I got 70.8 seconds versus 135.3 seconds, but that was done using the FAST mode of BASIC XL. Your times will likely be slower.)
As clever as this trick is, it does not answer all of a programmer’s needs. Suppose, for example, that the data to be sorted resides in two or three different arrays. The logistics in BASIC get complicated. In a language with more data structuring capabilities, where a pointer can point to a given record type no matter where the record might be, such a division of data would probably make virtually no difference on program speed.
Enough for this month. Next month, we will go back to the acrostics puzzle of the December issue, since it generated more mail for me than any topic in recent months. If I have managed to convince you that records and pointers are valuable tools, wait until you see my proposed solution to the acrostics problem!
Program 1 100 DIM TEST$(10000),TEMP$(100),BLANK$(100) 110 BLANK$=" ":BLANK$(100)=" " :BLANK$(2)=BLANK$ 120 FOR R1=0 TO 99:RPTR=R1*100+1 130 TEST$(RPTR)="THIS IS RECORD NUMBER " 140 TEMP$=STR$(199-R1):TEMP$(4)=BLANK$ 150 TEST$(RPTR+22)=TEMP$(2) 160 NEXT R1 170 REM OUT OF ORDER... SORT THEM 180 POKE 20,0:POKE 19,0 190 FDR R1=98 TO 0 STEP -1 200 FOR R2=0 TO R1 210 PTR1=R2*100+1 220 PTR2=PTR1+100 230 IF TEST$(PTR1,PTR1+99)<TEST$(PTR2,PTR2+99) THEN 270 240 TEMP$=TEST$(PTR1,PTR1+99) 250 TEST$(PTR1,PTR1+99)=TEST$(PTR2,PTR2+99) 260 TEST$(PTR2,PTR2+99)=TEMP$ 270 NEXT R2 280 PRINT " * "; 290 NEXT Rl 300 PRINT 310 TICK=PEEK(20):TOCK=PEEK(19):IF TICK<>PEEK(20) THEN 310 320 TIME=TICK+256*TOCK 330 PRINT "THAT TOOK ";TIME/60;" SECONDS" 340 PRINT "HIT RETURN TO SEE LIST "; 350 INPUT TEMP$ 360 FOR R1=0 TO 99:RPTR=R1*100+1 370 PRINT TEST$(RPTR,RPTR+99) 380 NEXT R1
Program 2 105 DIM RPT(99) 125 RPT(R1)=RPTR 210 PTR1=RPT(R2) 220 PTR2=RPT(R2+1) 230 IF TEST$(PTR1,PTR1+99)>TEST$(PTR2,PTR2+99) THEN RPT(R2)=PTR2:RPT(R2+1)=PTR1 240 REM DELETE 250 REM THESE 260 REM 3 LINES! 360 FOR R1=0 TO 99:RPTR=RPT(R1)
COMPUTE! ISSUE 95 / APRIL 1988 / PAGE 56
By now, most of you have heard that Atari has announced that it is, indeed, going to sell a CD-ROM. The advantage of a CD-ROM is that a single optical disk can hold hundreds of megabytes of information. The disadvantage is that CD-ROMs are exactly what the second part of their acronyms suggest: Read Only Memory. The computer can not write to such a device.
But the computer industry is working very hard to overcome this restriction. Welcome to the world of the WORM—Write Once Read Many. Special optical disk drives have already been introduced that use lasers to write information. The data thus written cannot be changed, but it can effectively be “erased” and a later, updated copy can be written to another part of the disk. A typical home user could probably use a single such optical disk for a couple of years before needing to copy the most recent versions of all files to a new, clean disk. But don’t hold your breath waiting to buy one—at least not unless you’d rather buy one than, say, a new sports car. However….
I know it may be hard to believe, but the designers of the original eight-bit Atari computers, way back in 1979, included a close relative of the WORM in their design. True, it is slower than a WORM, and it isn’t as easy to use, but it works! And yes, the WORN is built into each Atari eight-bit computer!
There are a couple of ways to use an Atari WORN, but here is one of the simplest. From BASIC, just type in the command:
Then load a BASIC program type:
Presto! Your program will be saved to this marvelous device. (Hit RESET to disable the WORN.)
Of course, you should be careful not to rely on the WORN. Certainly, compared to a WORM, recovering programs saved to this Write Once Read Never device can take a while. If you happen to have a LAND device handy, you can make a quick copy of small programs saved to the WORN, but otherwise you will probably have to ensure a reliable connection between your biological optical devices and your digital extremity input devices.
Another marvelous acronym, pronounced “wizz-ee-wigg,” is an old one that is relatively new to computers: What You See Is What You Get. Usually applied to word processing programs, where it means that the printed copy will look like the screen display (implying a higher-resolution display than that of an eight-bit Atari), this time I use it in its old meaning, the one a flea market vendor might use. Take another look at just the initial letters of the words in my headings up until now. Together they make a single acronym. One very appropriate to this month’s issue.
Actually, my tale of the WORN device owes much to tales of WOM (Write Only Memory) devices that have abounded in computer folklore for ages. (Well, 10 or 15 years is “ages” when it comes to computers, right?) I remember one article that showed a picture of a water tower and claimed it was a WOM big enough for a whole town. So, if you don’t like jokes, I apologize, but I haven’t pulled an April Fool jest in a couple of years. It was time. (Oh, yes, the LAND above is not an acronym: I was referring to a Polaroid Land camera. And biological optical devices are your eyes, and digital extremity input devices are your fingers, of course.)
A couple of my columns lately have turned out to be mildly prophetic of other COMPUTE! articles. One article that related to some of my recent comments was “Tri-Sort for Atari” on page 88 of the February issue, in which Arthur Horan provides you with a fast machine language sort that you can use with the pseudo-fields and pseudorecords I described in my February and March columns. The Shell-Metzner sort used by Mr. Horan is not the fastest for very large arrays of data, but it is probably quite well suited for the number of records you can pack into an Atari BASIC string. In my March column (which, of course, was written long before I saw the February issue) I said that I hoped you wouldn’t use my quick-and-dirty bubble sort. With the help of Mr. Horan, you don’t have to.
Last month, I also promised to return to the subject of my December article: Acrostic and other word puzzles. Well, in the December issue I said that I had yet to see a really good crossword puzzle program. Lo and behold, on page 61 of the February issue is a review of Crossword Power (for IBM PCs) that shows indeed how limited such programs are. I think the program did a creditable job with the number of words it was given, but the result was far from ideal.
For example, a typical newspaper crossword puzzle is perhaps 5–10-percent black space. The one shown in that review was more like 75-percent black space. Too, it is considered less than ideal for words in a newspaper puzzle to have more than one uncrossed letter. In the puzzle of the review, several words are “hooked in” by a single letter! In at least one case, this results in a clue with two answers. (See 21 Across: A musical instrument. Is it a piano or cello?) Granted, the reviewer gave the program very few words to work with (only 35), but I can’t help but wonder how long it would take to generate a good puzzle if one gave it a list of a couple of thousand words.
In this same vein, several readers wrote to give comments and suggestions about the acrostics problem. (To refresh your memory: The problem is to write a program that will produce all valid five-by-five acrostics or word squares from a given list of five-letter words. Assume that there are 5000 words in the list.) One gentleman suggested that I was making the problem too hard: I should limit the number of words and accept the first puzzle produced. Well, yes, that wouldn’t take as long, but that is kind of like building a chess-playing program that can only take over after a human has played the first 40 moves, and even then it can only play until it finds the first check (but not mate). As a practical matter, perhaps the gentleman is right. As a mathematician (which I was, once, I think), I want to see a problem solved, not sidestepped.
I even got two versions for other computers. An Amiga version took about three times as long on the Amiga as on the eight-bit Atari. But that is because of the inefficient way that Microsoft BASIC strings are implemented.
As for myself, I haven’t had time to put together a complete solution, but I have started a couple of paper designs. I am convinced that, as with so very many computer problems, a really good solution depends on finding the right way to represent the data (in this case, the word list).
One possibility is this: How about a “map” wherein every single possible five-letter word is represented by a YES/NO flag? (That is, yes or no that the flagged word exists in our word list.) In compact form, such a map requires 26^5 bits, or about 1.5 megabytes. In a more practical form (use a 32-bit computer word for each set of 26 bits), one still needs just a little under 1.9 megabytes. Hmmm … anybody with a four-megabyte ST listening out there? (Actually, for efficiency, you would want four maps of increasing size—26^2, 26^3, 26^4, and 26^5—to represent the possible sequential letter sets. With some intelligent compression, all this might be possible in a half megabyte or so.)
I also tend to think that building the valid word set via a linked tree or list would work (albeit probably slower than the brute force approach, above). At worst, such a list would need about 75,000 bytes. Given the likely letter patterns in 5000 English words, I wouldn’t be surprised to find that we could make do with 30,000 bytes or fewer. (Now we’re down in eight-bit territory again!)
Are you asking “What is a linked list?” That’s a big topic. For now, let me show you a way of simulating a word tree in Atari BASIC. The accompanying listing looks long, but you will quickly find that the bulk of it is nothing but simple DATA statements. This program has no real practical value, so don’t feel that you need to type it in unless you are curious. But I do hope that at least some of you will look at my word tree and become inspired. If you are, write to me (P.O. Box 710352, San Jose, CA, 95171-0352).
100 DIM COUNT(3),LINE(3),MAX(3) 110 DIM WORD$(3),LETTER$(1) 120 GRAPHICS 0 200 LEVEL=0:LINE(LEVEL)=1000 220 STOP 300 REM RECURSIVE SUBROUTINE 310 RESTORE LINE(LEVEL) 320 READ MAX 330 LEVEL=LEVEL+1 340 COUNT(LEVEL)==1:MAX(LEVEL)=MAX 350 RESTORE LINE(LEVEL-1)+COUNT(LEVEL) 360 READ LETTER$,LINE 370 WORD$(LEVEL,LEVEL)=LETTER$:LINE(LEVEL)=LINE 360 IF LINE=0 THEN PRINT WORD$ 390 IF LINE<>0 THEN GOSUB 300 400 COUNT(LEVEL)=COUNT(LEVEL) 410 IF COUNT(LEVEL)<=MAX(LEVEL) THEN 390 420 LEVEL=LEVEL-1 430 RETURN 1000 DATA S,(FIRST LETTERS) 1001 DATA A,1100 1002 DATA B,1200 1003 DATA C,1300 1004 DATA L,1400 1005 DATA N,1500 1006 DATA O,1600 1007 DATA P,1700 1008 DATA T,1800 1100 DATA 1,(SECOND LETTERS,A*) 1101 DATA R,1110 1110 DATA 1,(THIRD LETTERS,A*) 1111 DATA E,0 1200 DATA 1,(SECOND LETTERS, B*) 1201 DATA E,1210 1210 DATA 1,(THIRD LETTERS, BE*) 1211 DATA T,0 1300 DATA 2,(SECOND LETTERS, C*) 1301 DATA A,1310 1302 DATA O,1320 1310 DATA 3,(THIRD LETTERS, CA*) 1311 DATA N,0 1312 DATA P,0 1313 DATA T,0 1320 DATA 1,(THIRD LETTERS, CO*) 1321 DATA T,0 1400 DATA 2,(SECOND LETTERS, L*) 1401 DATA A,1410 1402 DATA O,1420 1410 DATA 1,(THIRD LETTERS, LA*) 1411 DATA P,0 1420 DATA 1,(THIRD LETTERS, LO*) 1421 DATA P,0 1500 DATA 1,(SECOND LETTERS, N*) 1501 DATA E,1510 1510 DATA 1,(THIRD LETTERS, NE*) 1511 DATA T,0 1600 DATA 1,(SECOND LETTERS, O*) 1601 DATA R,1610 1610 DATA 1,(THIRD LETTERS, OR*) 1611 DATA E,0 1700 DATA 2,(SECOND LETTERS, P*) 1701 DATA E,1710 1702 DATA O,1720 1710 DATA 2,(THIRD LETTERS, PE*) 1711 DATA N,0 1800 DATA 2,(SECOND LETTERS, T*) 1801 DATA A,1810 1802 DATA O,1820 1810 DATA 3,(THIRD LETTERS, TA*) 1811 DATA B,0 1812 DATA N,0 1813 DATA P,0 1820 DATA 2,(THIRD LETTERS, TO*) 1821 DATA N,0 1822 DATA P,0