<content title="(learn 'scheme)">
<cover title="A Complete Environment for Learning Lisp">
<div class="jumbotron">
<h1>A Complete Environment for Learning Lisp</h1>
<p>Do you want to learn <term>Lisp</term>? On your smartphone? While on the bus? Of course you do!</p>
<p><a class="btn btn-primary btn-lg" href="toc.html" role="button">Get Started!</a></p>
</div>
<p>Until now, tinkering with Lisp meant downloading some heavyweight Lisp environment (hope your platform is supported!), configuring language options, and generally spending lots of time not actually learning anything.</p>
<p>Now, you don't have to. <brand/> is a complete Lisp-learning environment that fits comfortably in your browser. Thanks, Web 2.0!</p>
<p>Additionally, through the magic (i.e. liberal licensing terms) of the <a href="http://creativecommons.org/">Creative Commons</a>, <brand/> also includes a classic textbook on programming in Lisp: <term>Structure and Interpretation of Computer Programs</term> by Abelson, Sussman, and Sussman (originally published by <a href="http://mitpress.mit.edu/sicp/">MIT Press</a>).</p>
<p><term>Scheme</term> is a minimalist dialect of Lisp. It boils Lisp down to its clean, simple essence. <term>Structure and Interpretation of Computer Programs</term> similarly boils programming down to its essence... which it then builds up into a full-blown Lisp interpreter and compiler. That's right, you get to learn Lisp by implementing Lisp in Lisp! That's a lot of Lisp!</p>
<p>If you're intrigued, <a href="toc.html">continue on</a> to get started with the wonderful world of Lisp! If not, go read some of <a href="http://www.paulgraham.com/lisp.html">Paul Graham's essays on Lisp</a>. Still not feeling motivated? Here is why you should be inspired to tackle Lisp:</p>
<quote source="Eric S. Raymond, How to Become a Hacker">LISP is worth learning for a different reason -- the profound enlightenment experience you will have when you finally get it. That experience will make you a better programmer for the rest of your days, even if you never actually use LISP itself a lot.</quote>
</cover>
<front>
<section title="How to Use This Site">
<lead><brand/> is a tiny, interactive implementation of the <term>Scheme</term> programming language that runs in your browser.</lead>
<p><term>Scheme</term> is a minimalist variant of Lisp that can be studied in order to understand Lisp in general. Lisp is frequently touted as a "powerful" programming language because of its unique design in which code is just data that can be manipulated or transformed in order to create new programs at run-time.</p>
<subsection title="How do I use (learn 'scheme)?">
<p><brand/> is an interactive environment for learning Scheme. At any time, you can click the <term>Launch Editor</term> button in the navigation bar at the top of the screen to launch the editor. Note that, on small screens, you may need to expand the navigation bar in order to find the <term>Launch Editor</term> button.</p>
<p>Once you're in the editor, you can type Scheme code into the <term>Input</term> text box. When you want to execute your code, simply click the <term>Evaluate</term> button or press <term>Ctrl+Enter</term>. The output of the evaluation will be displayed in the <term>Output</term> text box.</p>
<p>To navigate the content on <brand/>, use the Next, Contents, and Previous links at the top and bottom of each page.</p>
</subsection>
<subsection title="Demo">
<p>Below, you'll find a sample of Scheme code in a text box, followed by the expected output of evaluating the code.</p>
<p>If you'd like to try the code out, click the <term>Try it</term> button on the right side of the example code text box or double-click the text box itself to launch the editor with the example code. Once the editor is shown, you can follow the instructions above to evaluate the code and view the output.</p>
<p>As an example, the following code</p>
<code>
(+ 1 2)
</code>
<p>evaluates to:</p>
<result>3</result>
</subsection>
<subsection title="Feedback">
<p>I hope you enjoy using <brand/>! If you have any suggestions or if you run into any issues, please open a new issue on the <a href="https://github.com/jaredkrinke/learn-scheme/issues">GitHub issue tracker</a>.</p>
</subsection>
</section>
</front>
<body>
<section title="Building Abstractions with Procedures">
<quote source="John Locke, An Essay Concerning Human Understanding (1690)"> The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three: 1. Combining several simple ideas into one compound one, and thus all complex ideas are made. 2. The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations. 3. The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.</quote>
<p>We are about to study the idea of a <term>computational process</term>. Computational processes are abstract beings that inhabit computers. As they evolve, processes manipulate other abstract things called <term>data</term>. The evolution of a process is directed by a pattern of rules called a <term>program</term>. People create programs to direct processes. In effect, we conjure the spirits of the computer with our spells.</p>
<p>A computational process is indeed much like a sorcerer's idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer's spells. They are carefully composed from symbolic expressions in arcane and esoteric <term>programming languages</term> that prescribe the tasks we want our processes to perform.</p>
<p>A computational process, in a correctly working computer, executes programs precisely and accurately. Thus, like the sorcerer's apprentice, novice programmers must learn to understand and to anticipate the consequences of their conjuring. Even small errors (usually called <term>bugs</term> or <term>glitches</term>) in programs can have complex and unanticipated consequences.</p>
<p>Fortunately, learning to program is considerably less dangerous than learning sorcery, because the spirits we deal with are conveniently contained in a secure way. Real-world programming, however, requires care, expertise, and wisdom. A small bug in a computer-aided design program, for example, can lead to the catastrophic collapse of an airplane or a dam or the self-destruction of an industrial robot.</p>
<p>Master software engineers have the ability to organize programs so that they can be reasonably sure that the resulting processes will perform the tasks intended. They can visualize the behavior of their systems in advance. They know how to structure programs so that unanticipated problems do not lead to catastrophic consequences, and when problems do arise, they can <term>debug</term> their programs. Well-designed computational systems, like well-designed automobiles or nuclear reactors, are designed in a modular manner, so that the parts can be constructed, replaced, and debugged separately.</p>
<subsection title="Programming in Lisp"><p>We need an appropriate language for describing processes, and we will use for this purpose the programming language Lisp. Just as our everyday thoughts are usually expressed in our natural language (such as English, French, or Japanese), and descriptions of quantitative phenomena are expressed with mathematical notations, our procedural thoughts will be expressed in Lisp. Lisp was invented in the late 1950s as a formalism for reasoning about the use of certain kinds of logical expressions, called <term>recursion equations</term>, as a model for computation. The language was conceived by John McCarthy and is based on his paper "Recursive Functions of Symbolic Expressions and Their Computation by Machine" (McCarthy 1960).</p>
<p>Despite its inception as a mathematical formalism, Lisp is a practical programming language. A Lisp <term>interpreter</term> is a machine that carries out processes described in the Lisp language. The first Lisp interpreter was implemented by McCarthy with the help of colleagues and students in the Artificial Intelligence Group of the MIT Research Laboratory of Electronics and in the MIT Computation Center.<footnote><p>The <term>Lisp 1 Programmer's Manual</term> appeared in 1960, and the <term>Lisp 1.5 Programmer's Manual</term> (McCarthy 1965) was published in 1962. The early history of Lisp is described in McCarthy 1978. </p>
</footnote> Lisp, whose name is an acronym for LISt Processing, was designed to provide symbol-manipulating capabilities for attacking programming problems such as the symbolic differentiation and integration of algebraic expressions. It included for this purpose new data objects known as atoms and lists, which most strikingly set it apart from all other languages of the period.</p>
<p>Lisp was not the product of a concerted design effort. Instead, it evolved informally in an experimental manner in response to users' needs and to pragmatic implementation considerations. Lisp's informal evolution has continued through the years, and the community of Lisp users has traditionally resisted attempts to promulgate any "official" definition of the language. This evolution, together with the flexibility and elegance of the initial conception, has enabled Lisp, which is the second oldest language in widespread use today (only Fortran is older), to continually adapt to encompass the most modern ideas about program design. Thus, Lisp is by now a family of dialects, which, while sharing most of the original features, may differ from one another in significant ways. The dialect of Lisp used in this book is called Scheme.<footnote><p>The two dialects in which most major Lisp programs of the 1970s were written are MacLisp (Moon 1978; Pitman 1983), developed at the MIT Project MAC, and Interlisp (Teitelman 1974), developed at Bolt Beranek and Newman Inc. and the Xerox Palo Alto Research Center. Portable Standard Lisp (Hearn 1969; Griss 1981) was a Lisp dialect designed to be easily portable between different machines. MacLisp spawned a number of subdialects, such as Franz Lisp, which was developed at the University of California at Berkeley, and Zetalisp (Moon 1981), which was based on a special-purpose processor designed at the MIT Artificial Intelligence Laboratory to run Lisp very efficiently. The Lisp dialect used in this book, called Scheme (Steele 1975), was invented in 1975 by Guy Lewis Steele Jr. and Gerald Jay Sussman of the MIT Artificial Intelligence Laboratory and later reimplemented for instructional use at MIT. Scheme became an IEEE standard in 1990 (IEEE 1990). The Common Lisp dialect (Steele 1982, Steele 1990) was developed by the Lisp community to combine features from the earlier Lisp dialects to make an industrial standard for Lisp. Common Lisp became an ANSI standard in 1994 (ANSI 1994). </p>
</footnote></p>
<p>Because of its experimental character and its emphasis on symbol manipulation, Lisp was at first very inefficient for numerical computations, at least in comparison with Fortran. Over the years, however, Lisp compilers have been developed that translate programs into machine code that can perform numerical computations reasonably efficiently. And for special applications, Lisp has been used with great effectiveness.<footnote><p>One such special application was a breakthrough computation of scientific importance -- an integration of the motion of the Solar System that extended previous results by nearly two orders of magnitude, and demonstrated that the dynamics of the Solar System is chaotic. This computation was made possible by new integration algorithms, a special-purpose compiler, and a special-purpose computer all implemented with the aid of software tools written in Lisp (Abelson et al. 1992; Sussman and Wisdom 1992). </p>
</footnote> Although Lisp has not yet overcome its old reputation as hopelessly inefficient, Lisp is now used in many applications where efficiency is not the central concern. For example, Lisp has become a language of choice for operating-system shell languages and for extension languages for editors and computer-aided design systems.</p>
<p>If Lisp is not a mainstream language, why are we using it as the framework for our discussion of programming? Because the language possesses unique features that make it an excellent medium for studying important programming constructs and data structures and for relating them to the linguistic features that support them. The most significant of these features is the fact that Lisp descriptions of processes, called <term>procedures</term>, can themselves be represented and manipulated as Lisp data. The importance of this is that there are powerful program-design techniques that rely on the ability to blur the traditional distinction between "passive" data and "active" processes. As we shall discover, Lisp's flexibility in handling procedures as data makes it one of the most convenient languages in existence for exploring these techniques. The ability to represent procedures as data also makes Lisp an excellent language for writing programs that must manipulate other programs as data, such as the interpreters and compilers that support computer languages. Above and beyond these considerations, programming in Lisp is great fun.</p>
</subsection>
<section title="The Elements of Programming">
<p>A powerful programming language is more than just a means for instructing a computer to perform tasks. The language also serves as a framework within which we organize our ideas about processes. Thus, when we describe a language, we should pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas. Every powerful language has three mechanisms for accomplishing this:</p>
<ul>
<li><term>primitive expressions</term>, which represent the simplest entities the language is concerned with, </li>
<li><term>means of combination</term>, by which compound elements are built from simpler ones, and </li>
<li><term>means of abstraction</term>, by which compound elements can be named and manipulated as units. </li>
</ul>
<p>In programming, we deal with two kinds of elements: procedures and data. (Later we will discover that they are really not so distinct.) Informally, data is "stuff" that we want to manipulate, and procedures are descriptions of the rules for manipulating the data. Thus, any powerful programming language should be able to describe primitive data and primitive procedures and should have methods for combining and abstracting procedures and data.</p>
<p>In this chapter we will deal only with simple numerical data so that we can focus on the rules for building procedures.<footnote><p>The characterization of numbers as "simple data" is a barefaced bluff. In fact, the treatment of numbers is one of the trickiest and most confusing aspects of any programming language. Some typical issues involved are these: Some computer systems distinguish <term>integers</term>, such as 2, from <term>real numbers</term>, such as 2.71. Is the real number 2.00 different from the integer 2? Are the arithmetic operations used for integers the same as the operations used for real numbers? Does 6 divided by 2 produce 3, or 3.0? How large a number can we represent? How many decimal places of accuracy can we represent? Is the range of integers the same as the range of real numbers? Above and beyond these questions, of course, lies a collection of issues concerning roundoff and truncation errors -- the entire science of numerical analysis. Since our focus in this book is on large-scale program design rather than on numerical techniques, we are going to ignore these problems. The numerical examples in this chapter will exhibit the usual roundoff behavior that one observes when using arithmetic operations that preserve a limited number of decimal places of accuracy in noninteger operations. </p>
</footnote> In later chapters we will see that these same rules allow us to build procedures to manipulate compound data as well.</p>
<section title="Expressions">
<p>One easy way to get started at programming is to examine some typical interactions with an interpreter for the Scheme dialect of Lisp. Imagine that you are sitting at a computer terminal. You type an <term>expression</term>, and the interpreter responds by displaying the result of its <term>evaluating</term> that expression.</p>
<p>One kind of primitive expression you might type is a number. (More precisely, the expression that you type consists of the numerals that represent the number in base 10.) If you present Lisp with a number</p>
<code>486</code>
<p>the interpreter will respond by printing<footnote><p>Throughout this book, when we wish to emphasize the distinction between the input typed by the user and the response printed by the interpreter, we will show the latter in a block quote. </p>
</footnote></p>
<result>486</result>
<p>Expressions representing numbers may be combined with an expression representing a primitive procedure (such as <code>+</code> or <code>*</code>) to form a compound expression that represents the application of the procedure to those numbers. For example:</p>
<code>(+ 137 349)</code>
<result>486</result>
<code>(- 1000 334)</code>
<result>666</result>
<code>(* 5 99)</code>
<result>495</result>
<code>(/ 10 5)</code>
<result>2</result>
<code>(+ 2.7 10)</code>
<result>12.7</result>
<p>Expressions such as these, formed by delimiting a list of expressions within parentheses in order to denote procedure application, are called <term>combinations</term>. The leftmost element in the list is called the <term>operator</term>, and the other elements are called <term>operands</term>. The value of a combination is obtained by applying the procedure specified by the operator to the <term>arguments</term> that are the values of the operands.</p>
<p>The convention of placing the operator to the left of the operands is known as <term>prefix notation</term>, and it may be somewhat confusing at first because it departs significantly from the customary mathematical convention. Prefix notation has several advantages, however. One of them is that it can accommodate procedures that may take an arbitrary number of arguments, as in the following examples:</p>
<code>(+ 21 35 12 7)</code>
<result>75</result>
<code>(* 25 4 12)</code>
<result>1200</result>
<p>No ambiguity can arise, because the operator is always the leftmost element and the entire combination is delimited by the parentheses.</p>
<p>A second advantage of prefix notation is that it extends in a straightforward way to allow combinations to be <term>nested</term>, that is, to have combinations whose elements are themselves combinations:</p>
<code>(+ (* 3 5) (- 10 6))</code>
<result>19</result>
<p>There is no limit (in principle) to the depth of such nesting and to the overall complexity of the expressions that the Lisp interpreter can evaluate. It is we humans who get confused by still relatively simple expressions such as</p>
<code>(+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6))<expected>57</expected></code>
<p>which the interpreter would readily evaluate to be 57. We can help ourselves by writing such an expression in the form</p>
<code>(+ (* 3
      (+ (* 2 4)
         (+ 3 5)))
   (+ (- 10 7)
      6))<expected>57</expected></code>
<p>following a formatting convention known as <term>pretty-printing</term>, in which each long combination is written so that the operands are aligned vertically. The resulting indentations display clearly the structure of the expression.<footnote><p>Lisp systems typically provide features to aid the user in formatting expressions. Two especially useful features are one that automatically indents to the proper pretty-print position whenever a new line is started and one that highlights the matching left parenthesis whenever a right parenthesis is typed. </p>
</footnote></p>
<p>Even with complex expressions, the interpreter always operates in the same basic cycle: It reads an expression from the terminal, evaluates the expression, and prints the result. This mode of operation is often expressed by saying that the interpreter runs in a <term>read-eval-print loop</term>. Observe in particular that it is not necessary to explicitly instruct the interpreter to print the value of the expression.<footnote><p>Lisp obeys the convention that every expression has a value. This convention, together with the old reputation of Lisp as an inefficient language, is the source of the quip by Alan Perlis (paraphrasing Oscar Wilde) that "Lisp programmers know the value of everything but the cost of nothing." </p>
</footnote></p>
</section>
<section title="Naming and the Environment">
<p>A critical aspect of a programming language is the means it provides for using names to refer to computational objects. We say that the name identifies a <term>variable</term> whose <term>value</term> is the object.</p>
<p>In the Scheme dialect of Lisp, we name things with <code>define</code>. Typing</p>
<code>(define size 2)<extra>size</extra><expected>2</expected></code>
<p>causes the interpreter to associate the value 2 with the name <code>size</code>.<footnote><p>In this book, we do not show the interpreter's response to evaluating definitions, since this is highly implementation-dependent. </p>
</footnote> Once the name <code>size</code> has been associated with the number 2, we can refer to the value 2 by name:</p>
<code><hidden>(define size 2)</hidden>size</code>
<result>2</result>
<code><hidden>(define size 2)</hidden>(* 5 size)</code>
<result>10</result>
<p>Here are further examples of the use of <code>define</code>:</p>
<code>(define pi 3.14159)
(define radius 10)
(* pi (* radius radius))</code>
<result>314.159</result>
<code><hidden>(define pi 3.14159)
(define radius 10)</hidden>(define circumference (* 2 pi radius))
circumference</code>
<result>62.8318</result>
<p><code>Define</code> is our language's simplest means of abstraction, for it allows us to use simple names to refer to the results of compound operations, such as the <code>circumference</code> computed above. In general, computational objects may have very complex structures, and it would be extremely inconvenient to have to remember and repeat their details each time we want to use them. Indeed, complex programs are constructed by building, step by step, computational objects of increasing complexity. The interpreter makes this step-by-step program construction particularly convenient because name-object associations can be created incrementally in successive interactions. This feature encourages the incremental development and testing of programs and is largely responsible for the fact that a Lisp program usually consists of a large number of relatively simple procedures.</p>
<p>It should be clear that the possibility of associating values with symbols and later retrieving them means that the interpreter must maintain some sort of memory that keeps track of the name-object pairs. This memory is called the <term>environment</term> (more precisely the <term>global environment</term>, since we will see later that a computation may involve a number of different environments).<footnote><p>Chapter 3 will show that this notion of environment is crucial, both for understanding how the interpreter works and for implementing interpreters. </p>
</footnote></p>
</section>
<section title="Evaluating Combinations">
<p>One of our goals in this chapter is to isolate issues about thinking procedurally. As a case in point, let us consider that, in evaluating combinations, the interpreter is itself following a procedure.</p>
<ul>
<li>To evaluate a combination, do the following: </li>
</ul>
<p>1. Evaluate the subexpressions of the combination.</p>
<p>2. Apply the procedure that is the value of the leftmost subexpression (the operator) to the arguments that are the values of the other subexpressions (the operands). </p>
<p>Even this simple rule illustrates some important points about processes in general. First, observe that the first step dictates that in order to accomplish the evaluation process for a combination we must first perform the evaluation process on each element of the combination. Thus, the evaluation rule is <term>recursive</term> in nature; that is, it includes, as one of its steps, the need to invoke the rule itself.<footnote><p>It may seem strange that the evaluation rule says, as part of the first step, that we should evaluate the leftmost element of a combination, since at this point that can only be an operator such as <code>+</code> or <code>*</code> representing a built-in primitive procedure such as addition or multiplication. We will see later that it is useful to be able to work with combinations whose operators are themselves compound expressions. </p>
</footnote></p>
<p>Notice how succinctly the idea of recursion can be used to express what, in the case of a deeply nested combination, would otherwise be viewed as a rather complicated process. For example, evaluating</p>
<code>(* (+ 2 (* 4 6))
   (+ 3 5 7))<expected>390</expected></code>
<p>requires that the evaluation rule be applied to four different combinations. We can obtain a picture of this process by representing the combination in the form of a tree, as shown in figure 1.1. Each combination is represented by a node with branches corresponding to the operator and the operands of the combination stemming from it. The terminal nodes (that is, nodes with no branches stemming from them) represent either operators or numbers. Viewing evaluation in terms of the tree, we can imagine that the values of the operands percolate upward, starting from the terminal nodes and then combining at higher and higher levels. In general, we shall see that recursion is a very powerful technique for dealing with hierarchical, treelike objects. In fact, the "percolate values upward" form of the evaluation rule is an example of a general kind of process known as <term>tree accumulation</term>.</p>
<figure image="ch1-Z-G-1.gif"><caption>Tree representation, showing the value of each subcombination.</caption></figure>
<p>Next, observe that the repeated application of the first step brings us to the point where we need to evaluate, not combinations, but primitive expressions such as numerals, built-in operators, or other names. We take care of the primitive cases by stipulating that</p>
<ul>
<li>the values of numerals are the numbers that they name, </li>
<li>the values of built-in operators are the machine instruction sequences that carry out the corresponding operations, and </li>
<li>the values of other names are the objects associated with those names in the environment. </li>
</ul>
<p>We may regard the second rule as a special case of the third one by stipulating that symbols such as <code>+</code> and <code>*</code> are also included in the global environment, and are associated with the sequences of machine instructions that are their "values." The key point to notice is the role of the environment in determining the meaning of the symbols in expressions. In an interactive language such as Lisp, it is meaningless to speak of the value of an expression such as <code>(+ x 1)</code> without specifying any information about the environment that would provide a meaning for the symbol <code>x</code> (or even for the symbol <code>+</code>). As we shall see in chapter 3, the general notion of the environment as providing a context in which evaluation takes place will play an important role in our understanding of program execution.</p>
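<p>As a small illustration (the particular binding chosen here is arbitrary), the expression <code>(+ x 1)</code> becomes meaningful once the environment associates <code>x</code> with a value:</p>
<code>(define x 10)
(+ x 1)</code>
<result>11</result>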
<p>Notice that the evaluation rule given above does not handle definitions. For instance, evaluating <code>(define x 3)</code> does not apply <code>define</code> to two arguments, one of which is the value of the symbol <code>x</code> and the other of which is 3, since the purpose of the <code>define</code> is precisely to associate <code>x</code> with a value. (That is, <code>(define x 3)</code> is not a combination.)</p>
<p>Such exceptions to the general evaluation rule are called <term>special forms</term>. <code>Define</code> is the only example of a special form that we have seen so far, but we will meet others shortly. Each special form has its own evaluation rule. The various kinds of expressions (each with its associated evaluation rule) constitute the syntax of the programming language. In comparison with most other programming languages, Lisp has a very simple syntax; that is, the evaluation rule for expressions can be described by a simple general rule together with specialized rules for a small number of special forms.<footnote><p>Special syntactic forms that are simply convenient alternative surface structures for things that can be written in more uniform ways are sometimes called <term>syntactic sugar</term>, to use a phrase coined by Peter Landin. In comparison with users of other languages, Lisp programmers, as a rule, are less concerned with matters of syntax. (By contrast, examine any Pascal manual and notice how much of it is devoted to descriptions of syntax.) This disdain for syntax is due partly to the flexibility of Lisp, which makes it easy to change surface syntax, and partly to the observation that many "convenient" syntactic constructs, which make the language less uniform, end up causing more trouble than they are worth when programs become large and complex. In the words of Alan Perlis, "Syntactic sugar causes cancer of the semicolon." </p>
</footnote></p>
</section>
<section title="Compound Procedures">
<p>We have identified in Lisp some of the elements that must appear in any powerful programming language:</p>
<ul>
<li>Numbers and arithmetic operations are primitive data and procedures. </li>
<li>Nesting of combinations provides a means of combining operations. </li>
<li>Definitions that associate names with values provide a limited means of abstraction. </li>
</ul>
<p>Now we will learn about <term>procedure definitions</term>, a much more powerful abstraction technique by which a compound operation can be given a name and then referred to as a unit.</p>
<p>We begin by examining how to express the idea of "squaring." We might say, "To square something, multiply it by itself." This is expressed in our language as </p>
<code>(define (square x) (* x x))<extra>(square 3)</extra><expected>9</expected></code>
<p>We can understand this in the following way:</p>
<code valid="false">(define (square x) (* x x))
To square something, multiply it by itself.</code>
<p>We have here a <term>compound procedure</term>, which has been given the name <code>square</code>. The procedure represents the operation of multiplying something by itself. The thing to be multiplied is given a local name, <code>x</code>, which plays the same role that a pronoun plays in natural language. Evaluating the definition creates this compound procedure and associates it with the name <code>square</code>.<footnote><p>Observe that there are two different operations being combined here: we are creating the procedure, and we are giving it the name <code>square</code>. It is possible, indeed important, to be able to separate these two notions -- to create procedures without naming them, and to give names to procedures that have already been created. We will see how to do this in section 1.3.2. </p>
</footnote></p>
<p>The general form of a procedure definition is</p>
<code valid="false">(define (<name> <formal parameters>) <body>)</code>
<p>The <<term>name</term>> is a symbol to be associated with the procedure definition in the environment.<footnote><p>Throughout this book, we will describe the general syntax of expressions by using symbols delimited by angle brackets -- e.g., <<term>name</term>> -- to denote the "slots" in the expression to be filled in when such an expression is actually used. </p>
</footnote> The <<term>formal parameters</term>> are the names used within the body of the procedure to refer to the corresponding arguments of the procedure. The <<term>body</term>> is an expression that will yield the value of the procedure application when the formal parameters are replaced by the actual arguments to which the procedure is applied.<footnote><p>More generally, the body of the procedure can be a sequence of expressions. In this case, the interpreter evaluates each expression in the sequence in turn and returns the value of the final expression as the value of the procedure application. </p>
</footnote> The <<term>name</term>> and the <<term>formal parameters</term>> are grouped within parentheses, just as they would be in an actual call to the procedure being defined.</p>
<p>Having defined <code>square</code>, we can now use it:</p>
<code><hidden>(define (square x) (* x x))</hidden>(square 21)</code>
<result>441</result>
<code><hidden>(define (square x) (* x x))</hidden>(square (+ 2 5))</code>
<result>49</result>
<code><hidden>(define (square x) (* x x))</hidden>(square (square 3))</code>
<result>81</result>
<p>We can also use <code>square</code> as a building block in defining other procedures. For example, <term>x</term><sup>2</sup> + <term>y</term><sup>2</sup> can be expressed as</p>
<code valid="false">(+ (square x) (square y))</code>
<p>We can easily define a procedure <code>sum-of-squares</code> that, given any two numbers as arguments, produces the sum of their squares:</p>
<code><hidden>(define (square x) (* x x))</hidden>(define (sum-of-squares x y)
  (+ (square x) (square y)))
(sum-of-squares 3 4)</code>
<result>25</result>
<p>Now we can use <code>sum-of-squares</code> as a building block in constructing further procedures:</p>
<code><hidden>(define (square x) (* x x))
(define (sum-of-squares x y)
  (+ (square x) (square y)))</hidden>(define (f a)
  (sum-of-squares (+ a 1) (* a 2)))
(f 5)</code>
<result>136</result>
<p>Compound procedures are used in exactly the same way as primitive procedures. Indeed, one could not tell by looking at the definition of <code>sum-of-squares</code> given above whether <code>square</code> was built into the interpreter, like <code>+</code> and <code>*</code>, or defined as a compound procedure.</p>
</section>
<section title="The Substitution Model for Procedure Application">
<p>To evaluate a combination whose operator names a compound procedure, the interpreter follows much the same process as for combinations whose operators name primitive procedures, which we described in section 1.1.3. That is, the interpreter evaluates the elements of the combination and applies the procedure (which is the value of the operator of the combination) to the arguments (which are the values of the operands of the combination).</p>
<p>We can assume that the mechanism for applying primitive procedures to arguments is built into the interpreter. For compound procedures, the application process is as follows:</p>
<ul>
<li>To apply a compound procedure to arguments, evaluate the body of the procedure with each formal parameter replaced by the corresponding argument. </li>
</ul>
<p>To illustrate this process, let's evaluate the combination</p>
<code valid="false">(f 5)</code>
<p>where <code>f</code> is the procedure defined in section 1.1.4. We begin by retrieving the body of <code>f</code>:</p>
<code valid="false">(sum-of-squares (+ a 1) (* a 2))</code>
<p>Then we replace the formal parameter <code>a</code> by the argument 5:</p>
<code valid="false">(sum-of-squares (+ 5 1) (* 5 2))</code>
<p>Thus the problem reduces to the evaluation of a combination with two operands and an operator <code>sum-of-squares</code>. Evaluating this combination involves three subproblems. We must evaluate the operator to get the procedure to be applied, and we must evaluate the operands to get the arguments. Now <code>(+ 5 1)</code> produces 6 and <code>(* 5 2)</code> produces 10, so we must apply the <code>sum-of-squares</code> procedure to 6 and 10. These values are substituted for the formal parameters <code>x</code> and <code>y</code> in the body of <code>sum-of-squares</code>, reducing the expression to</p>
<code valid="false">(+ (square 6) (square 10))</code>
<p>If we use the definition of <code>square</code>, this reduces to</p>
<code valid="false">(+ (* 6 6) (* 10 10))</code>
<p>which reduces by multiplication to</p>
<code valid="false">(+ 36 100)</code>
<p>and finally to</p>
<code valid="false">136</code>
<p>The process we have just described is called the <term>substitution model</term> for procedure application. It can be taken as a model that determines the "meaning" of procedure application, insofar as the procedures in this chapter are concerned. However, there are two points that should be stressed:</p>
<ul>
<li>The purpose of the substitution is to help us think about procedure application, not to provide a description of how the interpreter really works. Typical interpreters do not evaluate procedure applications by manipulating the text of a procedure to substitute values for the formal parameters. In practice, the "substitution" is accomplished by using a local environment for the formal parameters. We will discuss this more fully in chapters 3 and 4 when we examine the implementation of an interpreter in detail. </li>
<li>Over the course of this book, we will present a sequence of increasingly elaborate models of how interpreters work, culminating with a complete implementation of an interpreter and compiler in chapter 5. The substitution model is only the first of these models -- a way to get started thinking formally about the evaluation process. In general, when modeling phenomena in science and engineering, we begin with simplified, incomplete models. As we examine things in greater detail, these simple models become inadequate and must be replaced by more refined models. The substitution model is no exception. In particular, when we address in chapter 3 the use of procedures with "mutable data," we will see that the substitution model breaks down and must be replaced by a more complicated model of procedure application.<footnote><p>Despite the simplicity of the substitution idea, it turns out to be surprisingly complicated to give a rigorous mathematical definition of the substitution process. The problem arises from the possibility of confusion between the names used for the formal parameters of a procedure and the (possibly identical) names used in the expressions to which the procedure may be applied. Indeed, there is a long history of erroneous definitions of <term>substitution</term> in the literature of logic and programming semantics. See Stoy 1977 for a careful discussion of substitution. </p>
</footnote> </li>
</ul>
<subsection title="Applicative order versus normal order"><p>According to the description of evaluation given in section 1.1.3, the interpreter first evaluates the operator and operands and then applies the resulting procedure to the resulting arguments. This is not the only way to perform evaluation. An alternative evaluation model would not evaluate the operands until their values were needed. Instead it would first substitute operand expressions for parameters until it obtained an expression involving only primitive operators, and would then perform the evaluation. If we used this method, the evaluation of</p>
<code valid="false">(f 5)</code>
<p>would proceed according to the sequence of expansions</p>
<code valid="false">(sum-of-squares (+ 5 1) (* 5 2))
(+ (square (+ 5 1)) (square (* 5 2)) )
(+ (* (+ 5 1) (+ 5 1)) (* (* 5 2) (* 5 2)))</code>
<p>followed by the reductions</p>
<code valid="false">(+ (* 6 6) (* 10 10))
(+ 36 100)
136</code>
<p>This gives the same answer as our previous evaluation model, but the process is different. In particular, the evaluations of <code>(+ 5 1)</code> and <code>(* 5 2)</code> are each performed twice here, corresponding to the reduction of the expression</p>
<code valid="false">(* x x)</code>
<p>with <code>x</code> replaced respectively by <code>(+ 5 1)</code> and <code>(* 5 2)</code>.</p>
<p>This alternative "fully expand and then reduce" evaluation method is known as <term>normal-order evaluation</term>, in contrast to the "evaluate the arguments and then apply" method that the interpreter actually uses, which is called <term>applicative-order evaluation</term>. It can be shown that, for procedure applications that can be modeled using substitution (including all the procedures in the first two chapters of this book) and that yield legitimate values, normal-order and applicative-order evaluation produce the same value. (See exercise 1.5 for an instance of an "illegitimate" value where normal-order and applicative-order evaluation do not give the same result.)</p>
<p>Lisp uses applicative-order evaluation, partly because of the additional efficiency obtained from avoiding multiple evaluations of expressions such as those illustrated with <code>(+ 5 1)</code> and <code>(* 5 2)</code> above and, more significantly, because normal-order evaluation becomes much more complicated to deal with when we leave the realm of procedures that can be modeled by substitution. On the other hand, normal-order evaluation can be an extremely valuable tool, and we will investigate some of its implications in chapters 3 and 4.<footnote><p>In chapter 3 we will introduce <term>stream processing</term>, which is a way of handling apparently "infinite" data structures by incorporating a limited form of normal-order evaluation. In section 4.2 we will modify the Scheme interpreter to produce a normal-order variant of Scheme. </p>
</footnote></p>
</subsection>
</section>
<section title="Conditional Expressions and Predicates">
<p>The expressive power of the class of procedures that we can define at this point is very limited, because we have no way to make tests and to perform different operations depending on the result of a test. For instance, we cannot define a procedure that computes the absolute value of a number by testing whether the number is positive, negative, or zero and taking different actions in the different cases according to the rule</p>
<image path="ch1-Z-G-2.gif"/>
<p>This construct is called a <term>case analysis</term>, and there is a special form in Lisp for notating such a case analysis. It is called <code>cond</code> (which stands for "conditional"), and it is used as follows:</p>
<code>(define (abs x)
  (cond ((> x 0) x)
        ((= x 0) 0)
        ((< x 0) (- x))))<extra>(abs -1)</extra><expected>1</expected></code>
<p>The general form of a conditional expression is</p>
<code valid="false">(cond (<p1> <e1>)
(<p2> <e2>)
(<pn> <en>))</code>
<p>consisting of the symbol <code>cond</code> followed by parenthesized pairs of expressions <code>(<<term>p</term>> <<term>e</term>>)</code> called <term>clauses</term>. The first expression in each pair is a <term>predicate</term> -- that is, an expression whose value is interpreted as either true or false.<footnote><p>"Interpreted as either true or false" means this: In Scheme, there are two distinguished values that are denoted by the constants <code>#t</code> and <code>#f</code>. When the interpreter checks a predicate's value, it interprets <code>#f</code> as false. Any other value is treated as true. (Thus, providing <code>#t</code> is logically unnecessary, but it is convenient.) In this book we will use names <code>true</code> and <code>false</code>, which are associated with the values <code>#t</code> and <code>#f</code> respectively. </p>
</footnote></p>
<p>Conditional expressions are evaluated as follows. The predicate <<term>p<sub>1</sub></term>> is evaluated first. If its value is false, then <<term>p<sub>2</sub></term>> is evaluated. If <<term>p<sub>2</sub></term>>'s value is also false, then <<term>p<sub>3</sub></term>> is evaluated. This process continues until a predicate is found whose value is true, in which case the interpreter returns the value of the corresponding <term>consequent expression</term> <<term>e</term>> of the clause as the value of the conditional expression. If none of the <<term>p</term>>'s is found to be true, the value of the <code>cond</code> is undefined.</p>
<p>The word <term>predicate</term> is used for procedures that return true or false, as well as for expressions that evaluate to true or false. The absolute-value procedure <code>abs</code> makes use of the primitive predicates <code>></code>, <code><</code>, and <code>=</code>.<footnote><p><code>Abs</code> also uses the "minus" operator <code>-</code>, which, when used with a single operand, as in <code>(- x)</code>, indicates negation. </p>
</footnote> These take two numbers as arguments and test whether the first number is, respectively, greater than, less than, or equal to the second number, returning true or false accordingly.</p>
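<p>For example, we might evaluate:</p>
<code>(> 5 1)</code>
<result>#t</result>
<code>(= 5 1)</code>
<result>#f</result>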
<p>Another way to write the absolute-value procedure is</p>
<code>(define (abs x)
  (cond ((< x 0) (- x))
        (else x)))<extra>(abs -1)</extra><expected>1</expected></code>
<p>which could be expressed in English as "If <term>x</term> is less than zero return - <term>x</term>; otherwise return <term>x</term>." <code>Else</code> is a special symbol that can be used in place of the <<term>p</term>> in the final clause of a <code>cond</code>. This causes the <code>cond</code> to return as its value the value of the corresponding <<term>e</term>> whenever all previous clauses have been bypassed. In fact, any expression that always evaluates to a true value could be used as the <<term>p</term>> here.</p>
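<p>For instance, since the constant <code>#t</code> always evaluates to a true value, it could serve in place of <code>else</code> in the final clause:</p>
<code>(define (abs x)
  (cond ((< x 0) (- x))
        (#t x)))<extra>(abs -1)</extra><expected>1</expected></code>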
<p>Here is yet another way to write the absolute-value procedure:</p>
<code>(define (abs x)
  (if (< x 0)
      (- x)
      x))<extra>(abs -1)</extra><expected>1</expected></code>
<p>This uses the special form <code>if</code>, a restricted type of conditional that can be used when there are precisely two cases in the case analysis. The general form of an <code>if</code> expression is</p>
<code valid="false">(if <predicate> <consequent> <alternative>)</code>
<p>To evaluate an <code>if</code> expression, the interpreter starts by evaluating the <<term>predicate</term>> part of the expression. If the <<term>predicate</term>> evaluates to a true value, the interpreter then evaluates the <<term>consequent</term>> and returns its value. Otherwise it evaluates the <<term>alternative</term>> and returns its value.<footnote><p>A minor difference between <code>if</code> and <code>cond</code> is that the <<term>e</term>> part of each <code>cond</code> clause may be a sequence of expressions. If the corresponding <<term>p</term>> is found to be true, the expressions <<term>e</term>> are evaluated in sequence and the value of the final expression in the sequence is returned as the value of the <code>cond</code>. In an <code>if</code> expression, however, the <<term>consequent</term>> and <<term>alternative</term>> must be single expressions. </p>
</footnote></p>
<p>In addition to primitive predicates such as <code><</code>, <code>=</code>, and <code>></code>, there are logical composition operations, which enable us to construct compound predicates. The three most frequently used are these:</p>
<ul>
<li><code>(and <<term>e<sub>1</sub></term>> ... <<term>e<sub><term>n</term></sub></term>>)</code> The interpreter evaluates the expressions <<term>e</term>> one at a time, in left-to-right order. If any <<term>e</term>> evaluates to false, the value of the <code>and</code> expression is false, and the rest of the <<term>e</term>>'s are not evaluated. If all <<term>e</term>>'s evaluate to true values, the value of the <code>and</code> expression is the value of the last one. </li>
<li><code>(or <<term>e<sub>1</sub></term>> ... <<term>e<sub><term>n</term></sub></term>>)</code> The interpreter evaluates the expressions <<term>e</term>> one at a time, in left-to-right order. If any <<term>e</term>> evaluates to a true value, that value is returned as the value of the <code>or</code> expression, and the rest of the <<term>e</term>>'s are not evaluated. If all <<term>e</term>>'s evaluate to false, the value of the <code>or</code> expression is false. </li>
<li><code>(not <<term>e</term>>)</code> The value of a <code>not</code> expression is true when the expression <<term>e</term>> evaluates to false, and false otherwise. </li>
</ul>
<p>Notice that <code>and</code> and <code>or</code> are special forms, not procedures, because the subexpressions are not necessarily all evaluated. <code>Not</code> is an ordinary procedure.</p>
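<p>The following sketch shows these rules in action: <code>and</code> yields the value of its last expression, and <code>or</code> yields the first true value without evaluating the remaining expressions (so the division by zero below is never performed):</p>
<code>(and (> 3 2) (+ 1 2))</code>
<result>3</result>
<code>(or (> 1 2) (* 2 5) (/ 1 0))</code>
<result>10</result>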
<p>As an example of how these are used, the condition that a number <term>x</term> be in the range 5 < <term>x</term> < 10 may be expressed as</p>
<code valid="false">(and (> x 5) (< x 10))</code>
<p>As another example, we can define a predicate to test whether one number is greater than or equal to another as</p>
<code>(define (>= x y)
  (or (> x y) (= x y)))<extra>(>= 2 2)</extra><expected>#t</expected></code>
<p>or alternatively as</p>
<code>(define (>= x y)
  (not (< x y)))<extra>(>= 2 3)</extra><expected>#f</expected></code>
<exercise><p>Below is a sequence of expressions. What is the result printed by the interpreter in response to each expression? Assume that the sequence is to be evaluated in the order in which it is presented.</p>
<code valid="false">10
(+ 5 3 4)
(- 9 1)
(/ 6 2)
(+ (* 2 4) (- 4 6))
(define a 3)
(define b (+ a 1))
(+ a b (* a b))
(= a b)
(if (and (> b a) (< b (* a b)))
b
a)
(cond ((= a 4) 6)
((= b 4) (+ 6 7 a))
(else 25))
(+ 2 (if (> b a) b a))
(* (cond ((> a b) a)
((< a b) b)
(else -1))
(+ a 1))</code>
</exercise>
<exercise><p>Translate the following expression into prefix form </p>
<image path="ch1-Z-G-3.gif"/>
</exercise>
<exercise><p>Define a procedure that takes three numbers as arguments and returns the sum of the squares of the two larger numbers. </p>
</exercise>
<exercise><p>Observe that our model of evaluation allows for combinations whose operators are compound expressions. Use this observation to describe the behavior of the following procedure: </p>
<code valid="false">(define (a-plus-abs-b a b)
((if (> b 0) + -) a b))</code>
</exercise>
<exercise><p>Ben Bitdiddle has invented a test to determine whether the interpreter he is faced with is using applicative-order evaluation or normal-order evaluation. He defines the following two procedures: </p>
<code valid="false">(define (p) (p))
(define (test x y)
(if (= x 0)
0
y))</code>
<p>Then he evaluates the expression </p>
<code valid="false">(test 0 (p))</code>
<p>What behavior will Ben observe with an interpreter that uses applicative-order evaluation? What behavior will he observe with an interpreter that uses normal-order evaluation? Explain your answer. (Assume that the evaluation rule for the special form <code>if</code> is the same whether the interpreter is using normal or applicative order: The predicate expression is evaluated first, and the result determines whether to evaluate the consequent or the alternative expression.) </p>
</exercise>
</section>
<section title="Example: Square Roots by Newton's Method">
<p>Procedures, as introduced above, are much like ordinary mathematical functions. They specify a value that is determined by one or more parameters. But there is an important difference between mathematical functions and computer procedures. Procedures must be effective.</p>
<p>As a case in point, consider the problem of computing square roots. We can define the square-root function as </p>
<image path="ch1-Z-G-4.gif"/>
<p>This describes a perfectly legitimate mathematical function. We could use it to recognize whether one number is the square root of another, or to derive facts about square roots in general. On the other hand, the definition does not describe a procedure. Indeed, it tells us almost nothing about how to actually find the square root of a given number. It will not help matters to rephrase this definition in pseudo-Lisp:</p>
<code valid="false">(define (sqrt x)
(the y (and (>= y 0)
(= (square y) x))))</code>
<p>This only begs the question.</p>
<p>The contrast between function and procedure is a reflection of the general distinction between describing properties of things and describing how to do things, or, as it is sometimes referred to, the distinction between declarative knowledge and imperative knowledge. In mathematics we are usually concerned with declarative (what is) descriptions, whereas in computer science we are usually concerned with imperative (how to) descriptions.<footnote><p>Declarative and imperative descriptions are intimately related, as indeed are mathematics and computer science. For instance, to say that the answer produced by a program is "correct" is to make a declarative statement about the program. There is a large amount of research aimed at establishing techniques for proving that programs are correct, and much of the technical difficulty of this subject has to do with negotiating the transition between imperative statements (from which programs are constructed) and declarative statements (which can be used to deduce things). In a related vein, an important current area in programming-language design is the exploration of so-called very high-level languages, in which one actually programs in terms of declarative statements. The idea is to make interpreters sophisticated enough so that, given "what is" knowledge specified by the programmer, they can generate "how to" knowledge automatically. This cannot be done in general, but there are important areas where progress has been made. We shall revisit this idea in chapter 4. </p>
</footnote></p>
<p>How does one compute square roots? The most common way is to use Newton's method of successive approximations, which says that whenever we have a guess <term>y</term> for the value of the square root of a number <term>x</term>, we can perform a simple manipulation to get a better guess (one closer to the actual square root) by averaging <term>y</term> with <term>x</term>/<term>y</term>.<footnote><p>This square-root algorithm is actually a special case of Newton's method, which is a general technique for finding roots of equations. The square-root algorithm itself was developed by Heron of Alexandria in the first century A.D. We will see how to express the general Newton's method as a Lisp procedure in section 1.3.4. </p>
</footnote> For example, we can compute the square root of 2 as follows. Suppose our initial guess is 1:</p>
<table class="table" border="0"><tr><td valign="top" >Guess </td><td valign="top" >Quotient </td><td valign="top" >Average</td></tr>
<tr><td valign="top" >1 </td><td valign="top" > (2/1) = 2 </td><td valign="top" >
((2 + 1)/2) = 1.5 </td></tr>
<tr><td valign="top" >1.5 </td><td valign="top" > (2/1.5) = 1.3333 </td><td valign="top" >
((1.3333 + 1.5)/2) = 1.4167 </td></tr>
<tr><td valign="top" >1.4167 </td><td valign="top" > (2/1.4167) = 1.4118 </td><td valign="top" >
((1.4167 + 1.4118)/2) = 1.4142 </td></tr>
<tr><td valign="top" >1.4142 </td><td valign="top" >...</td><td valign="top" >...</td></tr>
</table><p>Continuing this process, we obtain better and better approximations to the square root.</p>
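<p>Before formalizing this as procedures, it may help to replay the improvement rule by itself. Here is a minimal sketch (the name <code>improve-once</code> is ours, for illustration only) that reproduces the first rows of the table above:</p>
<code valid="false">(define (improve-once guess x)
  (/ (+ guess (/ x guess)) 2))   ; average guess with x/guess
(improve-once 1.0 2)             ; 1.5
(improve-once 1.5 2)             ; 1.4166...</code>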
<p>Now let's formalize the process in terms of procedures. We start with a value for the radicand (the number whose square root we are trying to compute) and a value for the guess. If the guess is good enough for our purposes, we are done; if not, we must repeat the process with an improved guess. We write this basic strategy as a procedure:</p>
<code valid="false">(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))</code>
<p>A guess is improved by averaging it with the quotient of the radicand and the old guess:</p>
<code valid="false">(define (improve guess x)
(average guess (/ x guess)))</code>
<p>where</p>
<code>(define (average x y)
(/ (+ x y) 2))<extra>(average 10 20)</extra><expected>15</expected></code>
<p>We also have to say what we mean by "good enough." The following will do for illustration, but it is not really a very good test. (See exercise 1.7.) The idea is to improve the answer until it is close enough so that its square differs from the radicand by less than a predetermined tolerance (here 0.001):<footnote><p>We will usually give predicates names ending with question marks, to help us remember that they are predicates. This is just a stylistic convention. As far as the interpreter is concerned, the question mark is just an ordinary character. </p>
</footnote></p>
<code><hidden>(define (square x) (* x x))
(define (abs x)
(if (< x 0)
(- x)
x))</hidden>(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))<extra>(good-enough? 2.9999 9)</extra><expected>#t</expected></code>
<p>Finally, we need a way to get started. For instance, we can always guess that the square root of any number is 1:</p>
<code><hidden>(define (square x) (* x x))
(define (abs x)
(if (< x 0)
(- x)
x))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))
(define (improve guess x)
(average guess (/ x guess)))
(define (average x y)
(/ (+ x y) 2))
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))</hidden>(define (sqrt x)
(sqrt-iter 1.0 x))<extra>(sqrt 9)</extra><expected>3.00009155413138</expected></code>
<p>If we type these definitions to the interpreter, we can use <code>sqrt</code> just as we can use any procedure:</p>
<code><hidden>(define (square x) (* x x))
(define (abs x)
(if (< x 0)
(- x)
x))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))
(define (improve guess x)
(average guess (/ x guess)))
(define (average x y)
(/ (+ x y) 2))
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))
(define (sqrt x)
(sqrt-iter 1.0 x))</hidden>(sqrt 9)</code>
<result>3.00009155413138</result>
<code><hidden>(define (square x) (* x x))
(define (abs x)
(if (< x 0)
(- x)
x))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))
(define (improve guess x)
(average guess (/ x guess)))
(define (average x y)
(/ (+ x y) 2))
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))
(define (sqrt x)
(sqrt-iter 1.0 x))</hidden>(sqrt (+ 100 37))</code>
<result>11.704699917758145</result>
<code><hidden>(define (square x) (* x x))
(define (abs x)
(if (< x 0)
(- x)
x))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))
(define (improve guess x)
(average guess (/ x guess)))
(define (average x y)
(/ (+ x y) 2))
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))
(define (sqrt x)
(sqrt-iter 1.0 x))</hidden>(sqrt (+ (sqrt 2) (sqrt 3)))</code>
<result>1.7739279023207892</result>
<code><hidden>(define (square x) (* x x))
(define (abs x)
(if (< x 0)
(- x)
x))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))
(define (improve guess x)
(average guess (/ x guess)))
(define (average x y)
(/ (+ x y) 2))
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))
(define (sqrt x)
(sqrt-iter 1.0 x))</hidden>(square (sqrt 1000))</code>
<result>1000.000369924366</result>
<p>The <code>sqrt</code> program also illustrates that the simple procedural language we have introduced so far is sufficient for writing any purely numerical program that one could write in, say, C or Pascal. This might seem surprising, since we have not included in our language any iterative (looping) constructs that direct the computer to do something over and over again. <code>Sqrt-iter</code>, on the other hand, demonstrates how iteration can be accomplished using no special construct other than the ordinary ability to call a procedure.<footnote><p>Readers who are worried about the efficiency issues involved in using procedure calls to implement iteration should note the remarks on "tail recursion" in section 1.2.1. </p>
</footnote> </p>
<exercise><p>Alyssa P. Hacker doesn't see why <code>if</code> needs to be provided as a special form. "Why can't I just define it as an ordinary procedure in terms of <code>cond</code>?" she asks. Alyssa's friend Eva Lu Ator claims this can indeed be done, and she defines a new version of <code>if</code>:</p>
<code valid="false">(define (new-if predicate then-clause else-clause)
(cond (predicate then-clause)
(else else-clause)))</code>
<p>Eva demonstrates the program for Alyssa:</p>
<code valid="false">(new-if (= 2 3) 0 5)</code>
<result>5</result>
<code valid="false">(new-if (= 1 1) 0 5)</code>
<result>0</result>
<p>Delighted, Alyssa uses <code>new-if</code> to rewrite the square-root program:</p>
<code valid="false">(define (sqrt-iter guess x)
(new-if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x)))</code>
<p>What happens when Alyssa attempts to use this to compute square roots? Explain. </p>
</exercise>
<exercise><p>The <code>good-enough?</code> test used in computing square roots will not be very effective for finding the square roots of very small numbers. Also, in real computers, arithmetic operations are almost always performed with limited precision. This makes our test inadequate for very large numbers. Explain these statements, with examples showing how the test fails for small and large numbers. An alternative strategy for implementing <code>good-enough?</code> is to watch how <code>guess</code> changes from one iteration to the next and to stop when the change is a very small fraction of the guess. Design a square-root procedure that uses this kind of end test. Does this work better for small and large numbers? </p>
</exercise>
<exercise><p>Newton's method for cube roots is based on the fact that if <term>y</term> is an approximation to the cube root of <term>x</term>, then a better approximation is given by the value </p>
<image path="ch1-Z-G-5.gif"/>
<p>Use this formula to implement a cube-root procedure analogous to the square-root procedure. (In section 1.3.4 we will see how to implement Newton's method in general as an abstraction of these square-root and cube-root procedures.) </p>
</exercise>
</section>
<section title="Procedures as Black-Box Abstractions">
<p><code>Sqrt</code> is our first example of a process defined by a set of mutually defined procedures. Notice that the definition of <code>sqrt-iter</code> is <term>recursive</term>; that is, the procedure is defined in terms of itself. The idea of being able to define a procedure in terms of itself may be disturbing; it may seem unclear how such a "circular" definition could make sense at all, much less specify a well-defined process to be carried out by a computer. This will be addressed more carefully in section 1.2. But first let's consider some other important points illustrated by the <code>sqrt</code> example.</p>
<p>Observe that the problem of computing square roots breaks up naturally into a number of subproblems: how to tell whether a guess is good enough, how to improve a guess, and so on. Each of these tasks is accomplished by a separate procedure. The entire <code>sqrt</code> program can be viewed as a cluster of procedures (shown in figure 1.2) that mirrors the decomposition of the problem into subproblems.</p>
<figure image="ch1-Z-G-6.gif"><caption>Procedural decomposition of the <code>sqrt</code> program.</caption></figure>
<p>The importance of this decomposition strategy is not simply that one is dividing the program into parts. After all, we could take any large program and divide it into parts -- the first ten lines, the next ten lines, the next ten lines, and so on. Rather, it is crucial that each procedure accomplishes an identifiable task that can be used as a module in defining other procedures. For example, when we define the <code>good-enough?</code> procedure in terms of <code>square</code>, we are able to regard the <code>square</code> procedure as a "black box." We are not at that moment concerned with <term>how</term> the procedure computes its result, only with the fact that it computes the square. The details of how the square is computed can be suppressed, to be considered at a later time. Indeed, as far as the <code>good-enough?</code> procedure is concerned, <code>square</code> is not quite a procedure but rather an abstraction of a procedure, a so-called <term>procedural abstraction</term>. At this level of abstraction, any procedure that computes the square is equally good.</p>
<p>Thus, considering only the values they return, the following two procedures for squaring a number should be indistinguishable. Each takes a numerical argument and produces the square of that number as the value.<footnote><p>It is not even clear which of these procedures is a more efficient implementation. This depends upon the hardware available. There are machines for which the "obvious" implementation is the less efficient one. Consider a machine that has extensive tables of logarithms and antilogarithms stored in a very efficient manner. </p>
</footnote></p>
<code>(define (square x) (* x x))<extra>(square 3)</extra><expected>9</expected></code>
<code>(define (square x)
(exp (double (log x))))
(define (double x) (+ x x))<extra>(square 3)</extra><expected>9</expected></code>
<p>So a procedure definition should be able to suppress detail. The users of the procedure may not have written the procedure themselves, but may have obtained it from another programmer as a black box. A user should not need to know how the procedure is implemented in order to use it.</p>
<subsection title="Local names"><p>One detail of a procedure's implementation that should not matter to the user of the procedure is the implementer's choice of names for the procedure's formal parameters. Thus, the following procedures should not be distinguishable:</p>
<code valid="false">(define (square x) (* x x))
(define (square y) (* y y))</code>
<p>This principle -- that the meaning of a procedure should be independent of the parameter names used by its author -- seems on the surface to be self-evident, but its consequences are profound. The simplest consequence is that the parameter names of a procedure must be local to the body of the procedure. For example, we used <code>square</code> in the definition of <code>good-enough?</code> in our square-root procedure:</p>
<code valid="false">(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))</code>
<p>The intention of the author of <code>good-enough?</code> is to determine if the square of the first argument is within a given tolerance of the second argument. We see that the author of <code>good-enough?</code> used the name <code>guess</code> to refer to the first argument and <code>x</code> to refer to the second argument. The argument of <code>square</code> is <code>guess</code>. If the author of <code>square</code> used <code>x</code> (as above) to refer to that argument, we see that the <code>x</code> in <code>good-enough?</code> must be a different <code>x</code> than the one in <code>square</code>. Running the procedure <code>square</code> must not affect the value of <code>x</code> that is used by <code>good-enough?</code>, because that value of <code>x</code> may be needed by <code>good-enough?</code> after <code>square</code> is done computing.</p>
<p>If the parameters were not local to the bodies of their respective procedures, then the parameter <code>x</code> in <code>square</code> could be confused with the parameter <code>x</code> in <code>good-enough?</code>, and the behavior of <code>good-enough?</code> would depend upon which version of <code>square</code> we used. Thus, <code>square</code> would not be the black box we desired.</p>
<p>A formal parameter of a procedure has a very special role in the procedure definition, in that it doesn't matter what name the formal parameter has. Such a name is called a <term>bound variable</term>, and we say that the procedure definition <term>binds</term> its formal parameters. The meaning of a procedure definition is unchanged if a bound variable is consistently renamed throughout the definition.<footnote><p>The concept of consistent renaming is actually subtle and difficult to define formally. Famous logicians have made embarrassing errors here. </p>
</footnote> If a variable is not bound, we say that it is <term>free</term>. The set of expressions for which a binding defines a name is called the <term>scope</term> of that name. In a procedure definition, the bound variables declared as the formal parameters of the procedure have the body of the procedure as their scope.</p>
<p>In the definition of <code>good-enough?</code> above, <code>guess</code> and <code>x</code> are bound variables but <code><</code>, <code>-</code>, <code>abs</code>, and <code>square</code> are free. The meaning of <code>good-enough?</code> should be independent of the names we choose for <code>guess</code> and <code>x</code> so long as they are distinct and different from <code><</code>, <code>-</code>, <code>abs</code>, and <code>square</code>. (If we renamed <code>guess</code> to <code>abs</code> we would have introduced a bug by <term>capturing</term> the variable <code>abs</code>. It would have changed from free to bound.) The meaning of <code>good-enough?</code> is not independent of the names of its free variables, however. It surely depends upon the fact (external to this definition) that the symbol <code>abs</code> names a procedure for computing the absolute value of a number. <code>Good-enough?</code> will compute a different function if we substitute <code>cos</code> for <code>abs</code> in its definition.</p>
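<p>To make the capturing problem concrete, here is a hypothetical (deliberately broken) variant in which the bound variable <code>guess</code> has been renamed to <code>abs</code>:</p>
<code valid="false">(define (good-enough? abs x)           ; renaming guess to abs
  (< (abs (- (square abs) x)) 0.001))  ; abs is now a number, not the
                                       ; absolute-value procedure</code>
<p>Applying this version signals an error, since the number bound to <code>abs</code> is not an applicable procedure.</p>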
</subsection>
<subsection title="Internal definitions and block structure"><p>We have one kind of name isolation available to us so far: The formal parameters of a procedure are local to the body of the procedure. The square-root program illustrates another way in which we would like to control the use of names. The existing program consists of separate procedures:</p>
<code><hidden>(define (square x) (* x x))
(define (abs x) (if (< x 0) (- x) x))
(define (average x y) (/ (+ x y) 2))</hidden>(define (sqrt x)
(sqrt-iter 1.0 x))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x) x)))
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))
(define (improve guess x)
(average guess (/ x guess)))<extra>(sqrt 9)</extra><expected>3.00009155413138</expected></code>
<p>The problem with this program is that the only procedure that is important to users of <code>sqrt</code> is <code>sqrt</code>. The other procedures (<code>sqrt-iter</code>, <code>good-enough?</code>, and <code>improve</code>) only clutter up their minds. Users may not define any other procedure called <code>good-enough?</code> as part of another program that is to work together with the square-root program, because <code>sqrt</code> needs it. The problem is especially severe in the construction of large systems by many separate programmers. For example, in the construction of a large library of numerical procedures, many numerical functions are computed as successive approximations and thus might have procedures named <code>good-enough?</code> and <code>improve</code> as auxiliary procedures. We would like to localize the subprocedures, hiding them inside <code>sqrt</code> so that <code>sqrt</code> could coexist with other successive approximations, each having its own private <code>good-enough?</code> procedure. To make this possible, we allow a procedure to have internal definitions that are local to that procedure. For example, in the square-root problem we can write</p>
<code><hidden>(define (square x) (* x x))
(define (abs x) (if (< x 0) (- x) x))
(define (average x y) (/ (+ x y) 2))</hidden>(define (sqrt x)
(define (good-enough? guess x)
(< (abs (- (square guess) x)) 0.001))
(define (improve guess x)
(average guess (/ x guess)))
(define (sqrt-iter guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x) x)))
(sqrt-iter 1.0 x))<extra>(sqrt 9)</extra><expected>3.00009155413138</expected></code>
<p>Such nesting of definitions, called <term>block structure</term>, is basically the right solution to the simplest name-packaging problem. But there is a better idea lurking here. In addition to internalizing the definitions of the auxiliary procedures, we can simplify them. Since <code>x</code> is bound in the definition of <code>sqrt</code>, the procedures <code>good-enough?</code>, <code>improve</code>, and <code>sqrt-iter</code>, which are defined internally to <code>sqrt</code>, are in the scope of <code>x</code>. Thus, it is not necessary to pass <code>x</code> explicitly to each of these procedures. Instead, we allow <code>x</code> to be a free variable in the internal definitions, as shown below. Then <code>x</code> gets its value from the argument with which the enclosing procedure <code>sqrt</code> is called. This discipline is called <term>lexical scoping</term>.<footnote><p>Lexical scoping dictates that free variables in a procedure are taken to refer to bindings made by enclosing procedure definitions; that is, they are looked up in the environment in which the procedure was defined. We will see how this works in detail in chapter 3 when we study environments and the detailed behavior of the interpreter. </p>
</footnote></p>
<code><hidden>(define (square x) (* x x))
(define (abs x) (if (< x 0) (- x) x))
(define (average x y) (/ (+ x y) 2))</hidden>(define (sqrt x)
(define (good-enough? guess)
(< (abs (- (square guess) x)) 0.001))
(define (improve guess)
(average guess (/ x guess)))
(define (sqrt-iter guess)
(if (good-enough? guess)
guess
(sqrt-iter (improve guess))))
(sqrt-iter 1.0))<extra>(sqrt 9)</extra><expected>3.00009155413138</expected></code>
<p>We will use block structure extensively to help us break up large programs into tractable pieces.<footnote><p>Embedded definitions must come first in a procedure body. The management is not responsible for the consequences of running programs that intertwine definition and use. </p>
</footnote> The idea of block structure originated with the programming language Algol 60. It appears in most advanced programming languages and is an important tool for helping to organize the construction of large programs.</p>
</subsection>
</section>
</section>
<section title="Procedures and the Processes They Generate">
<p>We have now considered the elements of programming: We have used primitive arithmetic operations, we have combined these operations, and we have abstracted these composite operations by defining them as compound procedures. But that is not enough to enable us to say that we know how to program. Our situation is analogous to that of someone who has learned the rules for how the pieces move in chess but knows nothing of typical openings, tactics, or strategy. Like the novice chess player, we don't yet know the common patterns of usage in the domain. We lack the knowledge of which moves are worth making (which procedures are worth defining). We lack the experience to predict the consequences of making a move (executing a procedure).</p>
<p>The ability to visualize the consequences of the actions under consideration is crucial to becoming an expert programmer, just as it is in any synthetic, creative activity. In becoming an expert photographer, for example, one must learn how to look at a scene and know how dark each region will appear on a print for each possible choice of exposure and development conditions. Only then can one reason backward, planning framing, lighting, exposure, and development to obtain the desired effects. So it is with programming, where we are planning the course of action to be taken by a process and where we control the process by means of a program. To become experts, we must learn to visualize the processes generated by various types of procedures. Only after we have developed such a skill can we learn to reliably construct programs that exhibit the desired behavior.</p>
<p>A procedure is a pattern for the <term>local evolution</term> of a computational process. It specifies how each stage of the process is built upon the previous stage. We would like to be able to make statements about the overall, or <term>global</term>, behavior of a process whose local evolution has been specified by a procedure. This is very difficult to do in general, but we can at least try to describe some typical patterns of process evolution.</p>
<p>In this section we will examine some common "shapes" for processes generated by simple procedures. We will also investigate the rates at which these processes consume the important computational resources of time and space. The procedures we will consider are very simple. Their role is like that played by test patterns in photography: they serve as oversimplified prototypical patterns, rather than as practical examples in their own right.</p>
<section title="Linear Recursion and Iteration">
<figure image="ch1-Z-G-7.gif"><caption>A linear recursive process for computing 6!.</caption></figure>
<p>We begin by considering the factorial function, defined by</p>
<image path="ch1-Z-G-8.gif"/>
<p>There are many ways to compute factorials. One way is to make use of the observation that <term>n</term>! is equal to <term>n</term> times (<term>n</term> - 1)! for any positive integer <term>n</term>:</p>
<image path="ch1-Z-G-9.gif"/>
<p>Thus, we can compute <term>n</term>! by computing (<term>n</term> - 1)! and multiplying the result by <term>n</term>. If we add the stipulation that 1! is equal to 1, this observation translates directly into a procedure:</p>
<code>(define (factorial n)
(if (= n 1)
1
(* n (factorial (- n 1)))))<extra>(factorial 4)</extra><expected>24</expected></code>
<p>We can use the substitution model of section 1.1.5 to watch this procedure in action computing 6!, as shown in figure 1.3.</p>
<p>Now let's take a different perspective on computing factorials. We could describe a rule for computing <term>n</term>! by specifying that we first multiply 1 by 2, then multiply the result by 3, then by 4, and so on until we reach <term>n</term>. More formally, we maintain a running product, together with a counter that counts from 1 up to <term>n</term>. We can describe the computation by saying that the counter and the product simultaneously change from one step to the next according to the rule</p>
<p>product <img src="book-Z-G-D-14.gif" border="0" /> counter · product </p>
<p>counter <img src="book-Z-G-D-14.gif" border="0" /> counter + 1</p>
<p>and stipulating that <term>n</term>! is the value of the product when the counter exceeds <term>n</term>.</p>
<figure image="ch1-Z-G-10.gif"><caption>A linear iterative process for computing 6!.</caption></figure>
<p>Once again, we can recast our description as a procedure for computing factorials:<footnote><p>In a real program we would probably use the block structure introduced in the last section to hide the definition of <code>fact-iter</code>: </p>
<code>(define (factorial n)
(define (iter product counter)
(if (> counter n)
product
(iter (* counter product)
(+ counter 1))))
(iter 1 1))<extra>(factorial 4)</extra><expected>24</expected></code>
<p>We avoided doing this here so as to minimize the number of things to think about at once. </p>
</footnote></p>
<code>(define (factorial n)
(fact-iter 1 1 n))
(define (fact-iter product counter max-count)
(if (> counter max-count)
product
(fact-iter (* counter product)
(+ counter 1)
max-count)))<extra>(factorial 4)</extra><expected>24</expected></code>
<p>As before, we can use the substitution model to visualize the process of computing 6!, as shown in figure 1.4.</p>
<p>Compare the two processes. From one point of view, they seem hardly different at all. Both compute the same mathematical function on the same domain, and each requires a number of steps proportional to <term>n</term> to compute <term>n</term>!. Indeed, both processes even carry out the same sequence of multiplications, obtaining the same sequence of partial products. On the other hand, when we consider the "shapes" of the two processes, we find that they evolve quite differently.</p>
<p>Consider the first process. The substitution model reveals a shape of expansion followed by contraction, indicated by the arrow in figure 1.3. The expansion occurs as the process builds up a chain of <term>deferred operations</term> (in this case, a chain of multiplications). The contraction occurs as the operations are actually performed. This type of process, characterized by a chain of deferred operations, is called a <term>recursive process</term>. Carrying out this process requires that the interpreter keep track of the operations to be performed later on. In the computation of <term>n</term>!, the length of the chain of deferred multiplications, and hence the amount of information needed to keep track of it, grows linearly with <term>n</term> (is proportional to <term>n</term>), just like the number of steps. Such a process is called a <term>linear recursive process</term>.</p>
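<p>The shape in figure 1.3 can be written out by hand for a smaller input (we use 4 rather than 6 to keep the trace short); each line of this substitution trace grows until the deferred multiplications are finally performed:</p>
<code valid="false">(factorial 4)
(* 4 (factorial 3))
(* 4 (* 3 (factorial 2)))
(* 4 (* 3 (* 2 (factorial 1))))
(* 4 (* 3 (* 2 1)))
(* 4 (* 3 2))
(* 4 6)
24</code>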
<p>By contrast, the second process does not grow and shrink. At each step, all we need to keep track of, for any <term>n</term>, are the current values of the variables <code>product</code>, <code>counter</code>, and <code>max-count</code>. We call this an <term>iterative process</term>. In general, an iterative process is one whose state can be summarized by a fixed number of <term>state variables</term>, together with a fixed rule that describes how the state variables should be updated as the process moves from state to state and an (optional) end test that specifies conditions under which the process should terminate. In computing <term>n</term>!, the number of steps required grows linearly with <term>n</term>. Such a process is called a <term>linear iterative process</term>.</p>
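<p>The corresponding trace for <code>fact-iter</code> is flat: every line carries the complete state of the process in its three arguments.</p>
<code valid="false">(factorial 4)
(fact-iter 1 1 4)
(fact-iter 1 2 4)
(fact-iter 2 3 4)
(fact-iter 6 4 4)
(fact-iter 24 5 4)
24</code>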
<p>The contrast between the two processes can be seen in another way. In the iterative case, the program variables provide a complete description of the state of the process at any point. If we stopped the computation between steps, all we would need to do to resume the computation is to supply the interpreter with the values of the three program variables. Not so with the recursive process. In this case there is some additional "hidden" information, maintained by the interpreter and not contained in the program variables, which indicates "where the process is" in negotiating the chain of deferred operations. The longer the chain, the more information must be maintained.<footnote><p>When we discuss the implementation of procedures on register machines in chapter 5, we will see that any iterative process can be realized "in hardware" as a machine that has a fixed set of registers and no auxiliary memory. In contrast, realizing a recursive process requires a machine that uses an auxiliary data structure known as a <term>stack</term>. </p>
</footnote></p>
<p>In contrasting iteration and recursion, we must be careful not to confuse the notion of a recursive <term>process</term> with the notion of a recursive <term>procedure</term>. When we describe a procedure as recursive, we are referring to the syntactic fact that the procedure definition refers (either directly or indirectly) to the procedure itself. But when we describe a process as following a pattern that is, say, linearly recursive, we are speaking about how the process evolves, not about the syntax of how a procedure is written. It may seem disturbing that we refer to a recursive procedure such as <code>fact-iter</code> as generating an iterative process. However, the process really is iterative: Its state is captured completely by its three state variables, and an interpreter need keep track of only three variables in order to execute the process.</p>
<p>One reason that the distinction between process and procedure may be confusing is that most implementations of common languages (including Ada, Pascal, and C) are designed in such a way that the interpretation of any recursive procedure consumes an amount of memory that grows with the number of procedure calls, even when the process described is, in principle, iterative. As a consequence, these languages can describe iterative processes only by resorting to special-purpose "looping constructs" such as <code>do</code>, <code>repeat</code>, <code>until</code>, <code>for</code>, and <code>while</code>. The implementation of Scheme we shall consider in chapter 5 does not share this defect. It will execute an iterative process in constant space, even if the iterative process is described by a recursive procedure. An implementation with this property is called <term>tail-recursive</term>. With a tail-recursive implementation, iteration can be expressed using the ordinary procedure call mechanism, so that special iteration constructs are useful only as syntactic sugar.<footnote><p>Tail recursion has long been known as a compiler optimization trick. A coherent semantic basis for tail recursion was provided by Carl Hewitt (1977), who explained it in terms of the "message-passing" model of computation that we shall discuss in chapter 3. Inspired by this, Gerald Jay Sussman and Guy Lewis Steele Jr. (see Steele 1975) constructed a tail-recursive interpreter for Scheme. Steele later showed how tail recursion is a consequence of the natural way to compile procedure calls (Steele 1977). The IEEE standard for Scheme requires that Scheme implementations be tail-recursive. </p>
</footnote></p>
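<p>As a minimal sketch of this last point (an example of ours, not a procedure from the text), the following recursive procedure generates an iterative process; under a tail-recursive implementation it runs in constant space for any <term>n</term>:</p>
<code valid="false">(define (countdown n)
  (if (= n 0)
      0
      (countdown (- n 1))))   ; tail call: nothing is deferred</code>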
<exercise><p>Each of the following two procedures defines a method for adding two positive integers in terms of the procedures <code>inc</code>, which increments its argument by 1, and <code>dec</code>, which decrements its argument by 1.</p>
<code valid="false">(define (+ a b)
(if (= a 0)
b
(inc (+ (dec a) b))))</code>
<code valid="false">(define (+ a b)
(if (= a 0)
b
(+ (dec a) (inc b))))</code>
<p>Using the substitution model, illustrate the process generated by each procedure in evaluating <code>(+ 4 5)</code>. Are these processes iterative or recursive? </p>
</exercise>
<exercise><p>The following procedure computes a mathematical function called Ackermann's function.</p>
<code valid="false">(define (A x y)
(cond ((= y 0) 0)
((= x 0) (* 2 y))
((= y 1) 2)
(else (A (- x 1)
(A x (- y 1))))))</code>
<p>What are the values of the following expressions?</p>
<code valid="false">(A 1 10)
(A 2 4)
(A 3 3)</code>
<p>Consider the following procedures, where <code>A</code> is the procedure defined above: </p>
<code valid="false">(define (f n) (A 0 n))
(define (g n) (A 1 n))
(define (h n) (A 2 n))
(define (k n) (* 5 n n))</code>
<p>Give concise mathematical definitions for the functions computed by the procedures <code>f</code>, <code>g</code>, and <code>h</code> for positive integer values of <term>n</term>. For example, <code>(k n)</code> computes 5<term>n</term><sup>2</sup>. </p>
</exercise>
</section>
<section title="Tree Recursion">
<p>Another common pattern of computation is called <term>tree recursion</term>. As an example, consider computing the sequence of Fibonacci numbers, in which each number is the sum of the preceding two:</p>
<image path="ch1-Z-G-11.gif"/>
<p>In general, the Fibonacci numbers can be defined by the rule </p>
<image path="ch1-Z-G-12.gif"/>
<p>We can immediately translate this definition into a recursive procedure for computing Fibonacci numbers:</p>
<code>(define (fib n)
(cond ((= n 0) 0)
((= n 1) 1)
(else (+ (fib (- n 1))
(fib (- n 2))))))<extra>(fib 6)</extra><expected>8</expected></code>
<figure image="ch1-Z-G-13.gif"><caption>The tree-recursive process generated in computing <code>(fib 5)</code>.</caption></figure>
<p>Consider the pattern of this computation. To compute <code>(fib 5)</code>, we compute <code>(fib 4)</code> and <code>(fib 3)</code>. To compute <code>(fib 4)</code>, we compute <code>(fib 3)</code> and <code>(fib 2)</code>. In general, the evolved process looks like a tree, as shown in figure 1.5. Notice that the branches split into two at each level (except at the bottom); this reflects the fact that the <code>fib</code> procedure calls itself twice each time it is invoked.</p>
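<p>Written out in text form for the smaller case <code>(fib 4)</code>, the tree looks like this; note that the entire <code>(fib 2)</code> subtree appears twice:</p>
<code valid="false">(fib 4)
+-- (fib 3)
|   +-- (fib 2)
|   |   +-- (fib 1)   ; 1
|   |   +-- (fib 0)   ; 0
|   +-- (fib 1)       ; 1
+-- (fib 2)           ; recomputed from scratch
    +-- (fib 1)       ; 1
    +-- (fib 0)       ; 0</code>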
<p>This procedure is instructive as a prototypical tree recursion, but it is a terrible way to compute Fibonacci numbers because it does so much redundant computation. Notice in figure 1.5 that the entire computation of <code>(fib 3)</code> -- almost half the work -- is duplicated. In fact, it is not hard to show that the number of times the procedure will compute <code>(fib 1)</code> or <code>(fib 0)</code> (the number of leaves in the above tree, in general) is precisely <term>F</term><term>i</term><term>b</term>(<term>n</term> + 1). To get an idea of how bad this is, one can show that the value of <term>F</term><term>i</term><term>b</term>(<term>n</term>) grows exponentially with <term>n</term>. More precisely (see exercise 1.13), <term>F</term><term>i</term><term>b</term>(<term>n</term>) is the closest integer to <img src="book-Z-G-D-11.gif" border="0" /><sup><term>n</term></sup> /<img src="book-Z-G-D-13.gif" border="0" />5, where</p>
<image path="ch1-Z-G-14.gif"/>
<p>is the <term>golden ratio</term>, which satisfies the equation</p>
<image path="ch1-Z-G-15.gif"/>
<p>Thus, the process uses a number of steps that grows exponentially with the input. On the other hand, the space required grows only linearly with the input, because we need keep track only of which nodes are above us in the tree at any point in the computation. In general, the number of steps required by a tree-recursive process will be proportional to the number of nodes in the tree, while the space required will be proportional to the maximum depth of the tree.</p>
<p>We can also formulate an iterative process for computing the Fibonacci numbers. The idea is to use a pair of integers <term>a</term> and <term>b</term>, initialized to <term>F</term><term>i</term><term>b</term>(1) = 1 and <term>F</term><term>i</term><term>b</term>(0) = 0, and to repeatedly apply the simultaneous transformations </p>
<image path="ch1-Z-G-16.gif"/>
<p>It is not hard to show that, after applying this transformation <term>n</term> times, <term>a</term> and <term>b</term> will be equal, respectively, to <term>F</term><term>i</term><term>b</term>(<term>n</term> + 1) and <term>F</term><term>i</term><term>b</term>(<term>n</term>). Thus, we can compute Fibonacci numbers iteratively using the procedure</p>
<code>(define (fib n)
(fib-iter 1 0 n))
(define (fib-iter a b count)
(if (= count 0)
b
(fib-iter (+ a b) a (- count 1))))<extra>(fib 6)</extra><expected>8</expected></code>
<p>This second method for computing <term>F</term><term>i</term><term>b</term>(<term>n</term>) is a linear iteration. The difference in number of steps required by the two methods -- one linear in <term>n</term>, one growing as fast as <term>F</term><term>i</term><term>b</term>(<term>n</term>) itself -- is enormous, even for small inputs.</p>
<p>One should not conclude from this that tree-recursive processes are useless. When we consider processes that operate on hierarchically structured data rather than numbers, we will find that tree recursion is a natural and powerful tool.<footnote><p>An example of this was hinted at in section 1.1.3: The interpreter itself evaluates expressions using a tree-recursive process. </p>
</footnote> But even in numerical operations, tree-recursive processes can be useful in helping us to understand and design programs. For instance, although the first <code>fib</code> procedure is much less efficient than the second one, it is more straightforward, being little more than a translation into Lisp of the definition of the Fibonacci sequence. To formulate the iterative algorithm required noticing that the computation could be recast as an iteration with three state variables.</p>
<subsection title="Example: Counting change"><p>It takes only a bit of cleverness to come up with the iterative Fibonacci algorithm. In contrast, consider the following problem: How many different ways can we make change of $ 1.00, given half-dollars, quarters, dimes, nickels, and pennies? More generally, can we write a procedure to compute the number of ways to change any given amount of money?</p>
<p>This problem has a simple solution as a recursive procedure. Suppose we think of the types of coins available as arranged in some order. Then the following relation holds:</p>
<p>The number of ways to change amount <term>a</term> using <term>n</term> kinds of coins equals</p>
<ul>
<li>the number of ways to change amount <term>a</term> using all but the first kind of coin, plus </li>
<li>the number of ways to change amount <term>a</term> - <term>d</term> using all <term>n</term> kinds of coins, where <term>d</term> is the denomination of the first kind of coin. </li>
</ul>
<p>To see why this is true, observe that the ways to make change can be divided into two groups: those that do not use any of the first kind of coin, and those that do. Therefore, the total number of ways to make change for some amount is equal to the number of ways to make change for the amount without using any of the first kind of coin, plus the number of ways to make change assuming that we do use the first kind of coin. But the latter number is equal to the number of ways to make change for the amount that remains after using a coin of the first kind.</p>
<p>Thus, we can recursively reduce the problem of changing a given amount to the problem of changing smaller amounts using fewer kinds of coins. Consider this reduction rule carefully, and convince yourself that we can use it to describe an algorithm if we specify the following degenerate cases:<footnote><p>For example, work through in detail how the reduction rule applies to the problem of making change for 10 cents using pennies and nickels. </p>
</footnote></p>
<ul>
<li>If <term>a</term> is exactly 0, we should count that as 1 way to make change. </li>
<li>If <term>a</term> is less than 0, we should count that as 0 ways to make change. </li>
<li>If <term>n</term> is 0, we should count that as 0 ways to make change. </li>
</ul>
<p>We can easily translate this description into a recursive procedure:</p>
<code>(define (count-change amount)
(cc amount 5))
(define (cc amount kinds-of-coins)
(cond ((= amount 0) 1)
((or (< amount 0) (= kinds-of-coins 0)) 0)
(else (+ (cc amount
(- kinds-of-coins 1))
(cc (- amount
(first-denomination kinds-of-coins))
kinds-of-coins)))))
(define (first-denomination kinds-of-coins)
(cond ((= kinds-of-coins 1) 1)
((= kinds-of-coins 2) 5)
((= kinds-of-coins 3) 10)
((= kinds-of-coins 4) 25)
((= kinds-of-coins 5) 50)))<extra>(count-change 100)</extra><expected>292</expected></code>
<p>(The <code>first-denomination</code> procedure takes as input the number of kinds of coins available and returns the denomination of the first kind. Here we are thinking of the coins as arranged in order from largest to smallest, but any order would do as well.) We can now answer our original question about changing a dollar:</p>
<code><hidden>(define (count-change amount)
(cc amount 5))
(define (cc amount kinds-of-coins)
(cond ((= amount 0) 1)
((or (< amount 0) (= kinds-of-coins 0)) 0)
(else (+ (cc amount
(- kinds-of-coins 1))
(cc (- amount
(first-denomination kinds-of-coins))
kinds-of-coins)))))
(define (first-denomination kinds-of-coins)
(cond ((= kinds-of-coins 1) 1)
((= kinds-of-coins 2) 5)
((= kinds-of-coins 3) 10)
((= kinds-of-coins 4) 25)
((= kinds-of-coins 5) 50)))</hidden>(count-change 100)</code>
<result>292</result>
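<p>To check the reduction rule by hand, consider the footnote's small case of changing 10 cents with only pennies and nickels, that is, <code>(cc 10 2)</code>:</p>
<code valid="false">(cc 10 2)                ; pennies and nickels
(+ (cc 10 1) (cc 5 2))   ; skip nickels entirely, or spend one nickel
(+ 1 (+ (cc 5 1) (cc 0 2)))
(+ 1 (+ 1 1))
3                        ; ten pennies; five pennies and a nickel; two nickels</code>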
<p><code>Count-change</code> generates a tree-recursive process with redundancies similar to those in our first implementation of <code>fib</code>. (It will take quite a while for that 292 to be computed.) On the other hand, it is not obvious how to design a better algorithm for computing the result, and we leave this problem as a challenge. The observation that a tree-recursive process may be highly inefficient but often easy to specify and understand has led people to propose that one could get the best of both worlds by designing a "smart compiler" that could transform tree-recursive procedures into more efficient procedures that compute the same result.<footnote><p>One approach to coping with redundant computations is to arrange matters so that we automatically construct a table of values as they are computed. Each time we are asked to apply the procedure to some argument, we first look to see if the value is already stored in the table, in which case we avoid performing the redundant computation. This strategy, known as <term>tabulation</term> or <term>memoization</term>, can be implemented in a straightforward way. Tabulation can sometimes be used to transform processes that require an exponential number of steps (such as <code>count-change</code>) into processes whose space and time requirements grow linearly with the input. See exercise 3.27. </p>
</footnote></p>
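<p>As a sketch of the tabulation idea described in the footnote (an illustration of ours, relying on the <code>first-denomination</code> procedure defined above; it uses an association list together with <code>assoc</code>, <code>let</code>, <code>set!</code>, and <code>cond</code>'s <code>=></code> clause, features introduced later in the book):</p>
<code valid="false">(define table '())                        ; the table of computed results
(define (lookup key) (assoc key table))
(define (insert! key value)
  (set! table (cons (cons key value) table))
  value)
(define (cc-memo amount kinds-of-coins)
  (let ((key (cons amount kinds-of-coins)))
    (cond ((lookup key) => cdr)           ; already computed: reuse it
          ((= amount 0) (insert! key 1))
          ((or (< amount 0) (= kinds-of-coins 0)) (insert! key 0))
          (else
           (insert! key
                    (+ (cc-memo amount (- kinds-of-coins 1))
                       (cc-memo (- amount
                                   (first-denomination kinds-of-coins))
                                kinds-of-coins)))))))</code>
<p>With the table in place, each distinct pair of amount and kinds-of-coins is computed at most once, so the work grows roughly linearly with the amount rather than exponentially.</p>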
</subsection>
<exercise><p>A function <term>f</term> is defined by the rule that <term>f</term>(<term>n</term>) = <term>n</term> if <term>n</term> < 3 and <term>f</term>(<term>n</term>) = <term>f</term>(<term>n</term> - 1) + 2<term>f</term>(<term>n</term> - 2) + 3<term>f</term>(<term>n</term> - 3) if <term>n</term> ≥ 3. Write a procedure that computes <term>f</term> by means of a recursive process. Write a procedure that computes <term>f</term> by means of an iterative process. </p>
</exercise>
<exercise><p>The following pattern of numbers is called <term>Pascal's triangle</term>.</p>
<image path="ch1-Z-G-17.gif"/>
<p>The numbers at the edge of the triangle are all 1, and each number inside the triangle is the sum of the two numbers above it.<footnote><p>The elements of Pascal's triangle are called the <term>binomial coefficients</term>, because the <term>n</term>th row consists of the coefficients of the terms in the expansion of (<term>x</term> + <term>y</term>)<sup><term>n</term></sup>. This pattern for computing the coefficients appeared in Blaise Pascal's 1653 seminal work on probability theory, <term>Traité du triangle arithmétique</term>. According to Knuth (1973), the same pattern appears in the <term>Szu-yuen Yü-chien</term> ("The Precious Mirror of the Four Elements"), published by the Chinese mathematician Chu Shih-chieh in 1303, in the works of the twelfth-century Persian poet and mathematician Omar Khayyam, and in the works of the twelfth-century Hindu mathematician Bháscara Áchárya. </p>
</footnote> Write a procedure that computes elements of Pascal's triangle by means of a recursive process. </p>
</exercise>
<exercise><p>Prove that <term>F</term><term>i</term><term>b</term>(<term>n</term>) is the closest integer to <img src="book-Z-G-D-11.gif" border="0" /><sup><term>n</term></sup>/<img src="book-Z-G-D-13.gif" border="0" />5, where <img src="book-Z-G-D-11.gif" border="0" /> = (1 + <img src="book-Z-G-D-13.gif" border="0" />5)/2. Hint: Let <img src="book-Z-G-D-12.gif" border="0" /> = (1 - <img src="book-Z-G-D-13.gif" border="0" />5)/2. Use induction and the definition of the Fibonacci numbers (see section 1.2.2) to prove that <term>F</term><term>i</term><term>b</term>(<term>n</term>) = (<img src="book-Z-G-D-11.gif" border="0" /><sup><term>n</term></sup> - <img src="book-Z-G-D-12.gif" border="0" /><sup><term>n</term></sup>)/<img src="book-Z-G-D-13.gif" border="0" />5. </p>
</exercise>
</section>
<section title="Orders of Growth">
<p>The previous examples illustrate that processes can differ considerably in the rates at which they consume computational resources. One convenient way to describe this difference is to use the notion of <term>order of growth</term> to obtain a gross measure of the resources required by a process as the inputs become larger.</p>
<p>Let <term>n</term> be a parameter that measures the size of the problem, and let <term>R</term>(<term>n</term>) be the amount of resources the process requires for a problem of size <term>n</term>. In our previous examples we took <term>n</term> to be the number for which a given function is to be computed, but there are other possibilities. For instance, if our goal is to compute an approximation to the square root of a number, we might take <term>n</term> to be the number of digits of accuracy required. For matrix multiplication we might take <term>n</term> to be the number of rows in the matrices. In general there are a number of properties of the problem with respect to which it will be desirable to analyze a given process. Similarly, <term>R</term>(<term>n</term>) might measure the number of internal storage registers used, the number of elementary machine operations performed, and so on. In computers that do only a fixed number of operations at a time, the time required will be proportional to the number of elementary machine operations performed.</p>
<p>We say that <term>R</term>(<term>n</term>) has order of growth <img src="book-Z-G-D-3.gif" border="0" />(<term>f</term>(<term>n</term>)), written <term>R</term>(<term>n</term>) = <img src="book-Z-G-D-3.gif" border="0" />(<term>f</term>(<term>n</term>)) (pronounced "theta of <term>f</term>(<term>n</term>)"), if there are positive constants <term>k</term><sub>1</sub> and <term>k</term><sub>2</sub> independent of <term>n</term> such that </p>
<image path="ch1-Z-G-18.gif"/>
<p>for any sufficiently large value of <term>n</term>. (In other words, for large <term>n</term>, the value <term>R</term>(<term>n</term>) is sandwiched between <term>k</term><sub>1</sub><term>f</term>(<term>n</term>) and <term>k</term><sub>2</sub><term>f</term>(<term>n</term>).)</p>
<p>For instance, with the linear recursive process for computing factorial described in section 1.2.1, the number of steps grows proportionally to the input <term>n</term>. Thus, the steps required for this process grow as <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>). We also saw that the space required grows as <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>). For the iterative factorial, the number of steps is still <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>) but the space is <img src="book-Z-G-D-3.gif" border="0" />(1) -- that is, constant.<footnote><p>These statements mask a great deal of oversimplification. For instance, if we count process steps as "machine operations" we are making the assumption that the number of machine operations needed to perform, say, a multiplication is independent of the size of the numbers to be multiplied, which is false if the numbers are sufficiently large. Similar remarks hold for the estimates of space. Like the design and description of a process, the analysis of a process can be carried out at various levels of abstraction. </p>
</footnote> The tree-recursive Fibonacci computation requires <img src="book-Z-G-D-3.gif" border="0" />(<img src="book-Z-G-D-11.gif" border="0" /><sup><term>n</term></sup>) steps and space <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>), where <img src="book-Z-G-D-11.gif" border="0" /> is the golden ratio described in section 1.2.2.</p>
<p>Orders of growth provide only a crude description of the behavior of a process. For example, a process requiring <term>n</term><sup>2</sup> steps and a process requiring 1000<term>n</term><sup>2</sup> steps and a process requiring 3<term>n</term><sup>2</sup> + 10<term>n</term> + 17 steps all have <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term><sup>2</sup>) order of growth. On the other hand, order of growth provides a useful indication of how we may expect the behavior of the process to change as we change the size of the problem. For a <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>) (linear) process, doubling the size will roughly double the amount of resources used. For an exponential process, each increment in problem size will multiply the resource utilization by a constant factor. In the remainder of section 1.2 we will examine two algorithms whose order of growth is logarithmic, so that doubling the problem size increases the resource requirement by a constant amount.</p>
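<p>A rough empirical way to see these orders of growth (a sketch of ours; it uses <code>set!</code>, which is introduced in chapter 3) is to instrument a procedure to count its own invocations:</p>
<code valid="false">(define calls 0)
(define (fib n)
  (set! calls (+ calls 1))   ; count every invocation
  (cond ((= n 0) 0)
        ((= n 1) 1)
        (else (+ (fib (- n 1)) (fib (- n 2))))))
(fib 10)        ; 55; calls is now 177
(set! calls 0)
(fib 20)        ; 6765; calls is now 21891 -- a factor of over 100
                ; for a doubling of n, the signature of exponential growth</code>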
<exercise><p>Draw the tree illustrating the process generated by the <code>count-change</code> procedure of section 1.2.2 in making change for 11 cents. What are the orders of growth of the space and number of steps used by this process as the amount to be changed increases? </p>
</exercise>
<exercise><p>The sine of an angle (specified in radians) can be computed by making use of the approximation <code>sin</code> <term>x</term> <img src="book-Z-G-D-20.gif" border="0" /> <term>x</term> if <term>x</term> is sufficiently small, and the trigonometric identity </p>
<image path="ch1-Z-G-19.gif"/>
<p>to reduce the size of the argument of <code>sin</code>. (For purposes of this exercise an angle is considered "sufficiently small" if its magnitude is not greater than 0.1 radians.) These ideas are incorporated in the following procedures:</p>
<code valid="false">(define (cube x) (* x x x))
(define (p x) (- (* 3 x) (* 4 (cube x))))
(define (sine angle)
(if (not (> (abs angle) 0.1))
angle
(p (sine (/ angle 3.0)))))</code>
<p>a. How many times is the procedure <code>p</code> applied when <code>(sine 12.15)</code> is evaluated?</p>
<p>b. What is the order of growth in space and number of steps (as a function of <term>a</term>) used by the process generated by the <code>sine</code> procedure when <code>(sine a)</code> is evaluated? </p>
</exercise>
</section>
<section title="Exponentiation">
<p>Consider the problem of computing the exponential of a given number. We would like a procedure that takes as arguments a base <term>b</term> and a positive integer exponent <term>n</term> and computes <term>b</term><sup><term>n</term></sup>. One way to do this is via the recursive definition </p>
<image path="ch1-Z-G-20.gif"/>
<p>which translates readily into the procedure </p>
<code>(define (expt b n)
(if (= n 0)
1
(* b (expt b (- n 1)))))<extra>(expt 2 9)</extra><expected>512</expected></code>
<p>This is a linear recursive process, which requires <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>) steps and <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>) space. Just as with factorial, we can readily formulate an equivalent linear iteration:</p>
<code>(define (expt b n)
(expt-iter b n 1))
(define (expt-iter b counter product)
(if (= counter 0)
product
(expt-iter b
(- counter 1)
(* b product))))<extra>(expt 2 9)</extra><expected>512</expected></code>
<p>This version requires <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>) steps and <img src="book-Z-G-D-3.gif" border="0" />(1) space.</p>
<p>We can compute exponentials in fewer steps by using successive squaring. For instance, rather than computing <term>b</term><sup>8</sup> as </p>
<image path="ch1-Z-G-21.gif"/>
<p>we can compute it using three multiplications: </p>
<image path="ch1-Z-G-22.gif"/>
<p>This method works fine for exponents that are powers of 2. We can also take advantage of successive squaring in computing exponentials in general if we use the rule</p>
<image path="ch1-Z-G-23.gif"/>
<p>We can express this method as a procedure:</p>
<code><hidden>(define (even? n)
(= (remainder n 2) 0))
(define (square x) (* x x))</hidden>(define (fast-expt b n)
(cond ((= n 0) 1)
((even? n) (square (fast-expt b (/ n 2))))
(else (* b (fast-expt b (- n 1))))))<extra>(fast-expt 2 9)</extra><expected>512</expected></code>
<p>where the predicate to test whether an integer is even is defined in terms of the primitive procedure <code>remainder</code> by </p>
<code>(define (even? n)
(= (remainder n 2) 0))<extra>(even? 3)</extra><expected>#f</expected></code>
<p>The process evolved by <code>fast-expt</code> grows logarithmically with <term>n</term> in both space and number of steps. To see this, observe that computing <term>b</term><sup>2<term>n</term></sup> using <code>fast-expt</code> requires only one more multiplication than computing <term>b</term><sup><term>n</term></sup>. The size of the exponent we can compute therefore doubles (approximately) with every new multiplication we are allowed. Thus, the number of multiplications required for an exponent of <term>n</term> grows about as fast as the logarithm of <term>n</term> to the base 2. The process has <img src="book-Z-G-D-3.gif" border="0" />(<code>log</code> <term>n</term>) growth.<footnote><p>More precisely, the number of multiplications required is equal to 1 less than the log base 2 of <term>n</term> plus the number of ones in the binary representation of <term>n</term>. This total is always less than twice the log base 2 of <term>n</term>. The arbitrary constants <term>k</term><sub>1</sub> and <term>k</term><sub>2</sub> in the definition of order notation imply that, for a logarithmic process, the base to which logarithms are taken does not matter, so all such processes are described as <img src="book-Z-G-D-3.gif" border="0" />(<code>log</code> <term>n</term>). </p>
</footnote></p>
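<p>The logarithmic behavior is visible in a hand substitution trace of <code>(fast-expt 2 9)</code>, where the exponent falls from 9 to 0 in only five recursive calls:</p>
<code valid="false">(fast-expt 2 9)
(* 2 (fast-expt 2 8))
(* 2 (square (fast-expt 2 4)))
(* 2 (square (square (fast-expt 2 2))))
(* 2 (square (square (square (fast-expt 2 1)))))
(* 2 (square (square (square (* 2 (fast-expt 2 0))))))
(* 2 (square (square (square (* 2 1)))))
512</code>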
<p>The difference between <img src="book-Z-G-D-3.gif" border="0" />(<code>log</code> <term>n</term>) growth and <img src="book-Z-G-D-3.gif" border="0" />(<term>n</term>) growth becomes striking as <term>n</term> becomes large. For example, <code>fast-expt</code> for <term>n</term> = 1000 requires only 14 multiplications.<footnote><p>You may wonder why anyone would care about raising numbers to the 1000th power. See section 1.2.6. </p>
</footnote> It is also possible to use the idea of successive squaring to devise an iterative algorithm that computes exponentials with a logarithmic number of steps (see exercise 1.16), although, as is often the case with iterative algorithms, this is not written down so straightforwardly as the recursive algorithm.<footnote><p>This iterative algorithm is ancient. It appears in the <term>Chandah-sutra</term> by Áchárya Pingala, written before 200 B.C. See Knuth 1981, section 4.6.3, for a full discussion and analysis of this and other methods of exponentiation. </p>
</footnote> </p>
<exercise><p>Design a procedure that evolves an iterative exponentiation process that uses successive squaring and uses a logarithmic number of steps, as does <code>fast-expt</code>. (Hint: Using the observation that (<term>b</term><sup><term>n</term>/2</sup>)<sup>2</sup> = (<term>b</term><sup>2</sup>)<sup><term>n</term>/2</sup>, keep, along with the exponent <term>n</term> and the base <term>b</term>, an additional state variable <term>a</term>, and define the state transformation in such a way that the product <term>a</term> <term>b</term><sup><term>n</term></sup> is unchanged from state to state. At the beginning of the process <term>a</term> is taken to be 1, and the answer is given by the value of <term>a</term> at the end of the process. In general, the technique of defining an <term>invariant quantity</term> that remains unchanged from state to state is a powerful way to think about the design of iterative algorithms.) </p>
</exercise>
<exercise><p>The exponentiation algorithms in this section are based on performing exponentiation by means of repeated multiplication. In a similar way, one can perform integer multiplication by means of repeated addition. The following multiplication procedure (in which it is assumed that our language can only add, not multiply) is analogous to the <code>expt</code> procedure:</p>
<code valid="false">(define (* a b)
(if (= b 0)
0
(+ a (* a (- b 1)))))</code>
<p>This algorithm takes a number of steps that is linear in <code>b</code>. Now suppose we include, together with addition, operations <code>double</code>, which doubles an integer, and <code>halve</code>, which divides an (even) integer by 2. Using these, design a multiplication procedure analogous to <code>fast-expt</code> that uses a logarithmic number of steps. </p>
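<p>A possible sketch under the stated assumptions (the procedure name <code>fast-mult</code> is ours; <code>double</code> and <code>halve</code> are the operations named above, written here in terms of addition and division):</p>
<code><hidden>(define (even? n) (= (remainder n 2) 0))</hidden>(define (double x) (+ x x))
(define (halve x) (/ x 2))
(define (fast-mult a b)
  (cond ((= b 0) 0)
        ; a*b = 2 * (a * (b/2)) when b is even
        ((even? b) (double (fast-mult a (halve b))))
        ; a*b = a + a*(b-1) when b is odd
        (else (+ a (fast-mult a (- b 1))))))<extra>(fast-mult 3 7)</extra><expected>21</expected></code>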
</exercise>
<exercise><p>Using the results of exercises 1.16 and 1.17, devise a procedure that generates an iterative process for multiplying two integers in terms of adding, doubling, and halving and uses a logarithmic number of steps.<footnote><p>This algorithm, which is sometimes known as the "Russian peasant method" of multiplication, is ancient. Examples of its use are found in the Rhind Papyrus, one of the two oldest mathematical documents in existence, written about 1700 B.C. (and copied from an even older document) by an Egyptian scribe named A'h-mose. </p>
</footnote> </p>
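<p>A sketch of one such iterative procedure (the names <code>fast-mult-iter</code> and <code>acc</code> are ours), keeping the quantity <code>acc</code> + <term>a</term><term>b</term> unchanged from state to state:</p>
<code><hidden>(define (even? n) (= (remainder n 2) 0))
(define (double x) (+ x x))
(define (halve x) (/ x 2))</hidden>(define (fast-mult-iter a b)
  (define (iter acc a b)
    (cond ((= b 0) acc)
          ; acc + a*b = acc + (2a)*(b/2)
          ((even? b) (iter acc (double a) (halve b)))
          ; acc + a*b = (acc + a) + a*(b-1)
          (else (iter (+ acc a) a (- b 1)))))
  (iter 0 a b))<extra>(fast-mult-iter 3 7)</extra><expected>21</expected></code>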
</exercise>
<exercise><p>There is a clever algorithm for computing the Fibonacci numbers in a logarithmic number of steps. Recall the transformation of the state variables <term>a</term> and <term>b</term> in the <code>fib-iter</code> process of section 1.2.2: <term>a</term> <img src="book-Z-G-D-14.gif" border="0" /> <term>a</term> + <term>b</term> and <term>b</term> <img src="book-Z-G-D-14.gif" border="0" /> <term>a</term>. Call this transformation <term>T</term>, and observe that applying <term>T</term> over and over again <term>n</term> times, starting with 1 and 0, produces the pair <term>F</term><term>i</term><term>b</term>(<term>n</term> + 1) and <term>F</term><term>i</term><term>b</term>(<term>n</term>). In other words, the Fibonacci numbers are produced by applying <term>T</term><sup><term>n</term></sup>, the <term>n</term>th power of the transformation <term>T</term>, starting with the pair (1,0). Now consider <term>T</term> to be the special case of <term>p</term> = 0 and <term>q</term> = 1 in a family of transformations <term>T</term><sub><term>p</term><term>q</term></sub>, where <term>T</term><sub><term>p</term><term>q</term></sub> transforms the pair (<term>a</term>,<term>b</term>) according to <term>a</term> <img src="book-Z-G-D-14.gif" border="0" /> <term>b</term><term>q</term> + <term>a</term><term>q</term> + <term>a</term><term>p</term> and <term>b</term> <img src="book-Z-G-D-14.gif" border="0" /> <term>b</term><term>p</term> + <term>a</term><term>q</term>. Show that if we apply such a transformation <term>T</term><sub><term>p</term><term>q</term></sub> twice, the effect is the same as using a single transformation <term>T</term><sub><term>p</term>'<term>q</term>'</sub> of the same form, and compute <term>p</term>' and <term>q</term>' in terms of <term>p</term> and <term>q</term>. This gives us an explicit way to square these transformations, and thus we can compute <term>T</term><sup><term>n</term></sup> using successive squaring, as in the <code>fast-expt</code> procedure. Put this all together to complete the following procedure, which runs in a logarithmic number of steps:<footnote><p>This exercise was suggested to us by Joe Stoy, based on an example in Kaldewaij 1990. </p>
</footnote></p>
<code valid="false">(define (fib n)
(fib-iter 1 0 0 1 n))
(define (fib-iter a b p q count)
(cond ((= count 0) b)
((even? count)
(fib-iter a
b
<??> ; compute p'
<??> ; compute q'
(/ count 2)))
(else (fib-iter (+ (* b q) (* a q) (* a p))
(+ (* b p) (* a q))
p
q
(- count 1)))))</code>
</exercise>
</section>
<section title="Greatest Common Divisors">
<p>The greatest common divisor (GCD) of two integers <term>a</term> and <term>b</term> is defined to be the largest integer that divides both <term>a</term> and <term>b</term> with no remainder. For example, the GCD of 16 and 28 is 4. In chapter 2, when we investigate how to implement rational-number arithmetic, we will need to be able to compute GCDs in order to reduce rational numbers to lowest terms. (To reduce a rational number to lowest terms, we must divide both the numerator and the denominator by their GCD. For example, 16/28 reduces to 4/7.) One way to find the GCD of two integers is to factor them and search for common factors, but there is a famous algorithm that is much more efficient.</p>
<p>The idea of the algorithm is based on the observation that, if <term>r</term> is the remainder when <term>a</term> is divided by <term>b</term>, then the common divisors of <term>a</term> and <term>b</term> are precisely the same as the common divisors of <term>b</term> and <term>r</term>. Thus, we can use the equation </p>
<image path="ch1-Z-G-24.gif"/>
<p>to successively reduce the problem of computing a GCD to the problem of computing the GCD of smaller and smaller pairs of integers. For example, </p>
<image path="ch1-Z-G-25.gif"/>
<p>reduces GCD(206,40) to GCD(2,0), which is 2. It is possible to show that starting with any two positive integers and performing repeated reductions will always eventually produce a pair where the second number is 0. Then the GCD is the other number in the pair. This method for computing the GCD is known as <term>Euclid's Algorithm</term>.<footnote><p>Euclid's Algorithm is so called because it appears in Euclid's <term>Elements</term> (Book 7, ca. 300 B.C.). According to Knuth (1973), it can be considered the oldest known nontrivial algorithm. The ancient Egyptian method of multiplication (exercise 1.18) is surely older, but, as Knuth explains, Euclid's algorithm is the oldest known to have been presented as a general algorithm, rather than as a set of illustrative examples. </p>
</footnote></p>
<p>It is easy to express Euclid's Algorithm as a procedure: </p>
<code>(define (gcd a b)
(if (= b 0)
a
(gcd b (remainder a b))))<extra>(gcd 40 30)</extra><expected>10</expected></code>
<p>This generates an iterative process, whose number of steps grows as the logarithm of the numbers involved.</p>
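<p>For example, the chain of reductions pictured above appears directly as the successive calls generated by the procedure (shown here as comments):</p>
<code><hidden>(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))</hidden>; (gcd 206 40)
; (gcd 40 6)
; (gcd 6 4)
; (gcd 4 2)
; (gcd 2 0)
; => 2<extra>(gcd 206 40)</extra><expected>2</expected></code>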
<p>The fact that the number of steps required by Euclid's Algorithm has logarithmic growth bears an interesting relation to the Fibonacci numbers:</p>
<p><term>Lamé's Theorem:</term> If Euclid's Algorithm requires <term>k</term> steps to
<p>compute the GCD of some pair, then the smaller number in the pair must be greater than or equal to the <term>k</term>th Fibonacci number.<footnote><p>This theorem was proved in 1845 by Gabriel Lamé, a French mathematician and engineer known chiefly for his contributions to mathematical physics. To prove the theorem, we consider pairs (<term>a</term><sub><term>k</term></sub> ,<term>b</term><sub><term>k</term></sub>), where <term>a</term><sub><term>k</term></sub><u>></u> <term>b</term><sub><term>k</term></sub>, for which Euclid's Algorithm terminates in <term>k</term> steps. The proof is based on the claim that, if (<term>a</term><sub><term>k</term>+1</sub>, <term>b</term><sub><term>k</term>+1</sub>) <img src="book-Z-G-D-15.gif" border="0" /> (<term>a</term><sub><term>k</term></sub>, <term>b</term><sub><term>k</term></sub>) <img src="book-Z-G-D-15.gif" border="0" /> (<term>a</term><sub><term>k</term>-1</sub>, <term>b</term><sub><term>k</term>-1</sub>) are three successive pairs in the reduction process, then we must have <term>b</term><sub><term>k</term>+1</sub><u>></u> <term>b</term><sub><term>k</term></sub> + <term>b</term><sub><term>k</term>-1</sub>. To verify the claim, consider that a reduction step is defined by applying the transformation <term>a</term><sub><term>k</term>-1</sub> = <term>b</term><sub><term>k</term></sub>, <term>b</term><sub><term>k</term>-1</sub> = remainder of <term>a</term><sub><term>k</term></sub> divided by <term>b</term><sub><term>k</term></sub>. The second equation means that <term>a</term><sub><term>k</term></sub> = <term>q</term><term>b</term><sub><term>k</term></sub> + <term>b</term><sub><term>k</term>-1</sub> for some positive integer <term>q</term>. And since <term>q</term> must be at least 1 we have <term>a</term><sub><term>k</term></sub> = <term>q</term><term>b</term><sub><term>k</term></sub> + <term>b</term><sub><term>k</term>-1</sub> <u>></u> <term>b</term><sub><term>k</term></sub> + <term>b</term><sub><term>k</term>-1</sub>. But in the previous reduction step we have <term>b</term><sub><term>k</term>+1</sub> = <term>a</term><sub><term>k</term></sub>. Therefore, <term>b</term><sub><term>k</term>+1</sub> = <term>a</term><sub><term>k</term></sub><u>></u> <term>b</term><sub><term>k</term></sub> + <term>b</term><sub><term>k</term>-1</sub>. This verifies the claim. Now we can prove the theorem by induction on <term>k</term>, the number of steps that the algorithm requires to terminate. The result is true for <term>k</term> = 1, since this merely requires that <term>b</term> be at least as large as <term>F</term><term>i</term><term>b</term>(1) = 1. Now, assume that the result is true for all integers less than or equal to <term>k</term> and establish the result for <term>k</term> + 1. Let (<term>a</term><sub><term>k</term>+1</sub>, <term>b</term><sub><term>k</term>+1</sub>) <img src="book-Z-G-D-15.gif" border="0" /> (<term>a</term><sub><term>k</term></sub>, <term>b</term><sub><term>k</term></sub>) <img src="book-Z-G-D-15.gif" border="0" /> (<term>a</term><sub><term>k</term>-1</sub>, <term>b</term><sub><term>k</term>-1</sub>) be successive pairs in the reduction process. By our induction hypotheses, we have <term>b</term><sub><term>k</term>-1</sub><u>></u> <term>F</term><term>i</term><term>b</term>(<term>k</term> - 1) and <term>b</term><sub><term>k</term></sub><u>></u> <term>F</term><term>i</term><term>b</term>(<term>k</term>). 
Thus, applying the claim we just proved together with the definition of the Fibonacci numbers gives <term>b</term><sub><term>k</term>+1</sub> <u>></u> <term>b</term><sub><term>k</term></sub> + <term>b</term><sub><term>k</term>-1</sub><u>></u> <term>F</term><term>i</term><term>b</term>(<term>k</term>) + <term>F</term><term>i</term><term>b</term>(<term>k</term> - 1) = <term>F</term><term>i</term><term>b</term>(<term>k</term> + 1), which completes the proof of Lamé's Theorem. </p>
</footnote></p>
<p>We can use this theorem to get an order-of-growth estimate for Euclid's Algorithm. Let <term>n</term> be the smaller of the two inputs to the procedure. If the process takes <term>k</term> steps, then we must have <term>n</term><u>></u> <term>F</term><term>i</term><term>b</term> (<term>k</term>) <img src="book-Z-G-D-20.gif" border="0" /> <img src="book-Z-G-D-11.gif" border="0" /><sup><term>k</term></sup>/<img src="book-Z-G-D-13.gif" border="0" />5. Therefore the number of steps <term>k</term> grows as the logarithm (to the base <img src="book-Z-G-D-11.gif" border="0" />) of <term>n</term>. Hence, the order of growth is <img src="book-Z-G-D-3.gif" border="0" />(<code>log</code> <term>n</term>).</p>
<exercise><p>The process that a procedure generates is of course dependent on the rules used by the interpreter. As an example, consider the iterative <code>gcd</code> procedure given above. Suppose we were to interpret this procedure using normal-order evaluation, as discussed in section 1.1.5. (The normal-order-evaluation rule for <code>if</code> is described in exercise 1.5.) Using the substitution method (for normal order), illustrate the process generated in evaluating <code>(gcd 206 40)</code> and indicate the <code>remainder</code> operations that are actually performed. How many <code>remainder</code> operations are actually performed in the normal-order evaluation of <code>(gcd 206 40)</code>? In the applicative-order evaluation? </p>
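<p>To get started, the first few normal-order substitution steps look like this (a sketch only; completing the expansion and counting the operations is the point of the exercise):</p>
<code valid="false">; (gcd 206 40)
; (if (= 40 0) 206 (gcd 40 (remainder 206 40)))
; (gcd 40 (remainder 206 40))
; (if (= (remainder 206 40) 0)
;     40
;     (gcd (remainder 206 40)
;          (remainder 40 (remainder 206 40))))
; ... the operands accumulate unevaluated remainder calls,
; while each if predicate forces some of them to be evaluated.</code>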
</exercise>
</section>
<section title="Example: Testing for Primality">
<p>This section describes two methods for checking the primality of an integer <term>n</term>, one with order of growth <img src="book-Z-G-D-3.gif" border="0" />(<img src="book-Z-G-D-13.gif" border="0" /><term>n</term>), and a "probabilistic" algorithm with order of growth <img src="book-Z-G-D-3.gif" border="0" />(<code>log</code> <term>n</term>). The exercises at the end of this section suggest programming projects based on these algorithms.</p>
<subsection title="Searching for divisors"><p>Since ancient times, mathematicians have been fascinated by problems concerning prime numbers, and many people have worked on the problem of determining ways to test if numbers are prime. One way to test if a number is prime is to find the number's divisors. The following program finds the smallest integral divisor (greater than 1) of a given number <term>n</term>. It does this in a straightforward way, by testing <term>n</term> for divisibility by successive integers starting with 2.</p>
<code><hidden>(define (square x) (* x x))</hidden>(define (smallest-divisor n)
(find-divisor n 2))
(define (find-divisor n test-divisor)
(cond ((> (square test-divisor) n) n)
((divides? test-divisor n) test-divisor)
(else (find-divisor n (+ test-divisor 1)))))
(define (divides? a b)
(= (remainder b a) 0))<extra>(smallest-divisor 12)</extra><expected>2</expected></code>
<p>We can test whether a number is prime as follows: <term>n</term> is prime if and only if <term>n</term> is its own smallest divisor.</p>
<code><hidden>(define (smallest-divisor n)
(find-divisor n 2))
(define (find-divisor n test-divisor)
(cond ((> (square test-divisor) n) n)
((divides? test-divisor n) test-divisor)
(else (find-divisor n (+ test-divisor 1)))))
(define (divides? a b)
(= (remainder b a) 0))
(define (square x) (* x x))</hidden>(define (prime? n)
(= n (smallest-divisor n)))<extra>(prime? 13)</extra><expected>#t</expected></code>
<p>The end test for <code>find-divisor</code> is based on the fact that if <term>n</term> is not prime it must have a divisor less than or equal to <img src="book-Z-G-D-13.gif" border="0" /><term>n</term>.<footnote><p>If <term>d</term> is a divisor of <term>n</term>, then so is <term>n</term>/<term>d</term>. But <term>d</term> and <term>n</term>/<term>d</term> cannot both be greater than <img src="book-Z-G-D-13.gif" border="0" /><term>n</term>. </p>
</footnote> This means that the algorithm need only test divisors between 1 and <img src="book-Z-G-D-13.gif" border="0" /><term>n</term>. Consequently, the number of steps required to identify <term>n</term> as prime will have order of growth <img src="book-Z-G-D-3.gif" border="0" />(<img src="book-Z-G-D-13.gif" border="0" /><term>n</term>).</p>
</subsection>
<subsection title="The Fermat test"><p>The <img src="book-Z-G-D-3.gif" border="0" />(<code>log</code> <term>n</term>) primality test is based on a result from number theory known as Fermat's Little Theorem.<footnote><p>Pierre de Fermat (1601-1665) is considered to be the founder of modern number theory. He obtained many important number-theoretic results, but he usually announced just the results, without providing his proofs. Fermat's Little Theorem was stated in a letter he wrote in 1640. The first published proof was given by Euler in 1736 (and an earlier, identical proof was discovered in the unpublished manuscripts of Leibniz). The most famous of Fermat's results -- known as Fermat's Last Theorem -- was jotted down in 1637 in his copy of the book <term>Arithmetic</term> (by the third-century Greek mathematician Diophantus) with the remark "I have discovered a truly remarkable proof, but this margin is too small to contain it." Finding a proof of Fermat's Last Theorem became one of the most famous challenges in number theory. A complete solution was finally given in 1995 by Andrew Wiles of Princeton University. </p>
</footnote></p>
<p><term>Fermat's Little Theorem:</term> If <term>n</term> is a prime number and <term>a</term> is any positive integer less than <term>n</term>, then <term>a</term> raised to the <term>n</term>th power is congruent to <term>a</term> modulo <term>n</term>.</p>
<p>(Two numbers are said to be <term>congruent modulo</term> <term>n</term> if they both have the same remainder when divided by <term>n</term>. The remainder of a number <term>a</term> when divided by <term>n</term> is also referred to as the <term>remainder of</term> <term>a</term> <term>modulo</term> <term>n</term>, or simply as <term>a</term> <term>modulo</term> <term>n</term>.)</p>
<p>If <term>n</term> is not prime, then, in general, most of the numbers <term>a</term> < <term>n</term> will not satisfy the above relation. This leads to the following algorithm for testing primality: Given a number <term>n</term>, pick a random number <term>a</term> < <term>n</term> and compute the remainder of <term>a</term><sup><term>n</term></sup> modulo <term>n</term>. If the result is not equal to <term>a</term>, then <term>n</term> is certainly not prime. If it is <term>a</term>, then chances are good that <term>n</term> is prime. Now pick another random number <term>a</term> and test it with the same method. If it also satisfies the equation, then we can be even more confident that <term>n</term> is prime. By trying more and more values of <term>a</term>, we can increase our confidence in the result. This algorithm is known as the Fermat test.</p>
<p>To implement the Fermat test, we need a procedure that computes the exponential of a number modulo another number:</p>
<code><hidden>(define (even? n) (= (remainder n 2) 0))
(define (square x) (* x x))</hidden>(define (expmod base exp m)
(cond ((= exp 0) 1)
((even? exp)
(remainder (square (expmod base (/ exp 2) m))
m))
(else
(remainder (* base (expmod base (- exp 1) m))
m))))<extra>(expmod 2 9 500)</extra><expected>12</expected></code>
<p>This is very similar to the <code>fast-expt</code> procedure of section 1.2.4. It uses successive squaring, so that the number of steps grows logarithmically with the exponent.<footnote><p>The reduction steps in the cases where the exponent <term>e</term> is greater than 1 are based on the fact that, for any integers <term>x</term>, <term>y</term>, and <term>m</term>, we can find the remainder of <term>x</term> times <term>y</term> modulo <term>m</term> by computing separately the remainders of <term>x</term> modulo <term>m</term> and <term>y</term> modulo <term>m</term>, multiplying these, and then taking the remainder of the result modulo <term>m</term>. For instance, in the case where <term>e</term> is even, we compute the remainder of <term>b</term><sup><term>e</term>/2</sup> modulo <term>m</term>, square this, and take the remainder modulo <term>m</term>. This technique is useful because it means we can perform our computation without ever having to deal with numbers much larger than <term>m</term>. (Compare exercise 1.25.) </p>
</footnote></p>
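<p>The identity in the footnote is easy to state as a procedure of its own (a sketch; the name <code>mul-mod</code> is ours and is not used elsewhere in the text):</p>
<code>(define (mul-mod x y m)
  ; (x*y) modulo m, computed from the remainders of x and y,
  ; so intermediate results never grow much larger than m
  (remainder (* (remainder x m) (remainder y m)) m))<extra>(mul-mod 123 456 789)</extra><expected>69</expected></code>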
<p>The Fermat test is performed by choosing at random a number <term>a</term> between 1 and <term>n</term> - 1 inclusive and checking whether the remainder modulo <term>n</term> of the <term>n</term>th power of <term>a</term> is equal to <term>a</term>. The random number <term>a</term> is chosen using the procedure <code>random</code>, which we assume is included as a primitive in Scheme. <code>Random</code> returns a nonnegative integer less than its integer input. Hence, to obtain a random number between 1 and <term>n</term> - 1, we call <code>random</code> with an input of <term>n</term> - 1 and add 1 to the result:</p>
<code><hidden>(define (expmod base exp m)
(cond ((= exp 0) 1)
((even? exp)
(remainder (square (expmod base (/ exp 2) m))
m))
(else
(remainder (* base (expmod base (- exp 1) m))
m))))
(define (square x) (* x x))
(define (even? n) (= (remainder n 2) 0))</hidden>(define (fermat-test n)
(define (try-it a)
(= (expmod a n n) a))
(try-it (+ 1 (random (- n 1)))))<extra>(fermat-test 13)</extra><expected>#t</expected></code>
<p>The following procedure runs the test a given number of times, as specified by a parameter. Its value is true if the test succeeds every time, and false otherwise.</p>
<code><hidden>(define (expmod base exp m)
(cond ((= exp 0) 1)
((even? exp)
(remainder (square (expmod base (/ exp 2) m))
m))
(else
(remainder (* base (expmod base (- exp 1) m))
m))))
(define (even? n) (= (remainder n 2) 0))
(define (square x) (* x x))
(define (fermat-test n)
(define (try-it a)
(= (expmod a n n) a))
(try-it (+ 1 (random (- n 1)))))</hidden>(define (fast-prime? n times)
(cond ((= times 0) true)
((fermat-test n) (fast-prime? n (- times 1)))
(else false)))<extra>(fast-prime? 13 10)</extra><expected>#t</expected></code>
</subsection>
<subsection title="Probabilistic methods"><p>The Fermat test differs in character from most familiar algorithms, in which one computes an answer that is guaranteed to be correct. Here, the answer obtained is only probably correct. More precisely, if <term>n</term> ever fails the Fermat test, we can be certain that <term>n</term> is not prime. But the fact that <term>n</term> passes the test, while an extremely strong indication, is still not a guarantee that <term>n</term> is prime. What we would like to say is that for any number <term>n</term>, if we perform the test enough times and find that <term>n</term> always passes the test, then the probability of error in our primality test can be made as small as we like.</p>
<p>Unfortunately, this assertion is not quite correct. There do exist numbers that fool the Fermat test: numbers <term>n</term> that are not prime and yet have the property that <term>a</term><sup><term>n</term></sup> is congruent to <term>a</term> modulo <term>n</term> for all integers <term>a</term> < <term>n</term>. Such numbers are extremely rare, so the Fermat test is quite reliable in practice.<footnote><p>Numbers that fool the Fermat test are called <term>Carmichael numbers</term>, and little is known about them other than that they are extremely rare. There are 255 Carmichael numbers below 100,000,000. The smallest few are 561, 1105, 1729, 2465, 2821, and 6601. In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a "correct" algorithm. Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering. </p>
</footnote> There are variations of the Fermat test that cannot be fooled. In these tests, as with the Fermat method, one tests the primality of an integer <term>n</term> by choosing a random integer <term>a</term> < <term>n</term> and checking some condition that depends upon <term>n</term> and <term>a</term>. (See exercise 1.28 for an example of such a test.) On the other hand, in contrast to the Fermat test, one can prove that, for any <term>n</term>, the condition does not hold for most of the integers <term>a</term> < <term>n</term> unless <term>n</term> is prime. Thus, if <term>n</term> passes the test for some random choice of <term>a</term>, the chances are better than even that <term>n</term> is prime. If <term>n</term> passes the test for two random choices of <term>a</term>, the chances are better than 3 out of 4 that <term>n</term> is prime. By running the test with more and more randomly chosen values of <term>a</term> we can make the probability of error as small as we like.
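<p>Concretely, if each round independently leaves at most a 1/2 chance that a composite <term>n</term> passes, then <term>t</term> rounds leave at most (1/2)<sup><term>t</term></sup>, matching the 1/2 and 1/4 figures above (a simplified sketch of the bound; the procedure name is ours):</p>
<code>(define (error-bound t)
  ; upper bound on the chance that t independent rounds are all fooled
  (expt 1/2 t))<extra>(error-bound 10)</extra><expected>1/1024</expected></code>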
<p>The existence of tests for which one can prove that the chance of error becomes arbitrarily small has sparked interest in algorithms of this type, which have come to be known as <term>probabilistic algorithms</term>. There is a great deal of research activity in this area, and probabilistic algorithms have been fruitfully applied to many fields.<footnote><p>One of the most striking applications of probabilistic prime testing has been to the field of cryptography. Although it is now computationally infeasible to factor an arbitrary 200-digit number, the primality of such a number can be checked in a few seconds with the Fermat test. This fact forms the basis of a technique for constructing "unbreakable codes" suggested by Rivest, Shamir, and Adleman (1977). The resulting <term>RSA algorithm</term> has become a widely used technique for enhancing the security of electronic communications. Because of this and related developments, the study of prime numbers, once considered the epitome of a topic in "pure" mathematics to be studied only for its own sake, now turns out to have important practical applications to cryptography, electronic funds transfer, and information retrieval. </p>
</footnote></p>
</subsection>