% LaTeX source for ``Think Python: How to Think Like a Computer Scientist''
% Copyright (c)  2015  Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

%\documentclass[10pt,b5paper]{book}
\documentclass[10pt]{book}
\usepackage[width=5.5in,height=8.5in,hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{fancyhdr}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amsthm}
%\usepackage{amssymb}
\usepackage{exercise}                        % texlive-latex-extra
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{appendix}
\usepackage[bookmarks]{hyperref}

\title{Think Python}
\author{Allen B. Downey}
\newcommand{\thetitle}{Think Python: How to Think Like a Computer Scientist}
\newcommand{\theversion}{2nd Edition, Version 2.4.0}
\newcommand{\thedate}{}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows
\setlinkstext
  {\imgsrc[ALT="Previous"]{back.png}}
  {\imgsrc[ALT="Up"]{up.png}}
  {\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}

\frontmatter

% PLASTEX ONLY
\ifplastex
    \usepackage{localdef}
    \maketitle

\newcount\anchorcnt
\newcommand*{\Anchor}[1]{%
  \@bsphack%
    \Hy@GlobalStepCount\anchorcnt%
    \edef\@currentHref{anchor.\the\anchorcnt}%
    \Hy@raisedlink{\hyper@anchorstart{\@currentHref}\hyper@anchorend}%
    \M@gettitle{}\label{#1}%
    \@esphack%
}


\else
% skip the following for plastex

\newtheorem{exercise}{Exercise}[chapter]

% LATEXONLY

\input{latexonly}

\begin{latexonly}

\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

%\blankpage
%\blankpage

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think Python}\\
{\Large How to Think Like a Computer Scientist}
\end{spacing}

\vspace{0.25in}

\theversion

\thedate

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage
%\clearemptydoublepage
%\pagebreak
%\thispagestyle{empty}
%\vspace*{6in}

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think Python}\\
{\Large How to Think Like a Computer Scientist}
\end{spacing}

\vspace{0.25in}

\theversion

\thedate

\vspace{1in}


{\Large
Allen Downey\\
}


\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

%\includegraphics[width=1in]{figs/logo1.pdf}
\vfill

\end{flushright}


%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

{\small
Copyright \copyright ~2015 Allen Downey.


\vspace{0.2in}

\begin{flushleft}
Green Tea Press       \\
9 Washburn Ave        \\
Needham MA 02492
\end{flushleft}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported
License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.

The original form of this book is \LaTeX\ source code.  Compiling this
\LaTeX\ source has the effect of generating a device-independent
representation of a textbook, which can be converted to other formats
and printed.

The \LaTeX\ source for this book is available from
\url{http://www.thinkpython2.com}

\vspace{0.2in}

} % end small

\end{latexonly}


% HTMLONLY

\begin{htmlonly}

% TITLE PAGE FOR HTML VERSION

{\Large \thetitle}

{\large Allen B. Downey}

\theversion

\thedate

\setcounter{chapter}{-1}

\end{htmlonly}

\fi
% END OF THE PART WE SKIP FOR PLASTEX


\chapter{Preface}

\section*{The strange history of this book}

In January 1999 I was preparing to teach an introductory programming
class in Java.  I had taught it three times and I was getting
frustrated.  The failure rate in the class was too high and, even for
students who succeeded, the overall level of achievement was too low.

One of the problems I saw was the books.
They were too big, with too much unnecessary detail about Java, and
not enough high-level guidance about how to program.  And they all
suffered from the trap door effect: they would start out easy,
proceed gradually, and then somewhere around Chapter 5 the bottom would
fall out.  The students would get too much new material, too fast,
and I would spend the rest of the semester picking up the pieces.

Two weeks before the first day of classes, I decided to write my
own book.  My goals were:

\begin{itemize}

\item Keep it short.  It is better for students to read 10 pages
than not read 50 pages.

\item Be careful with vocabulary.  I tried to minimize jargon
and define each term at first use.

\item Build gradually. To avoid trap doors, I took the most difficult
topics and split them into a series of small steps.

\item Focus on programming, not the programming language.  I included
the minimum useful subset of Java and left out the rest.

\end{itemize}

I needed a title, so on a whim I chose {\em How to Think Like
a Computer Scientist}.

My first version was rough, but it worked.  Students did the reading,
and they understood enough that I could spend class time on the hard
topics, the interesting topics and (most important) letting the
students practice.

I released the book under the GNU Free Documentation License,
which allows users to copy, modify, and distribute the book.
\index{GNU Free Documentation License}
\index{Free Documentation License, GNU}

What happened next is the cool part.  Jeff Elkner, a high school
teacher in Virginia, adopted my book and translated it into
Python.  He sent me a copy of his translation, and I had the
unusual experience of learning Python by reading my own book.
As Green Tea Press, I published the first Python version in 2001.
\index{Elkner, Jeff}

In 2003 I started teaching at Olin College and I got to teach
Python for the first time.  The contrast with Java was striking.
Students struggled less, learned more, worked on more interesting
projects, and generally had a lot more fun.
\index{Olin College}

Since then I've continued to develop the book,
correcting errors, improving some of the examples and
adding material, especially exercises.

The result is this book, now with the less grandiose title
{\em Think Python}.  Some of the changes are:

\begin{itemize}

\item I added a section about debugging at the end of each chapter.
  These sections present general techniques for finding and avoiding
  bugs, and warnings about Python pitfalls.

\item I added more exercises, ranging from short tests of
  understanding to a few substantial projects.  Most exercises
  include a link to my solution.

\item I added a series of case studies---longer examples with
  exercises, solutions, and discussion.

\item I expanded the discussion of program development plans
  and basic design patterns.

\item I added appendices about debugging and analysis of algorithms.

\end{itemize}

The second edition of {\em Think Python} has these new features:

\begin{itemize}

\item The book and all supporting code have been updated to Python 3.

\item I added a few sections, and more details on the web, to help
beginners get started running Python in a browser, so you don't have
to deal with installing Python until you want to.

\item For Chapter~\ref{turtle} I switched from my own turtle graphics
  package, called Swampy, to a more standard Python module, {\tt
    turtle}, which is easier to install and more powerful.

\item I added a new chapter called ``The Goodies'', which introduces
some additional Python features that are not strictly necessary, but
sometimes handy.

\end{itemize}

I hope you enjoy working with this book, and that it helps
you learn to program and think like
a computer scientist, at least a little bit.


Allen B. Downey \\

Olin College \\


\section*{Acknowledgments}

Many thanks to Jeff Elkner, who
translated my Java book into Python, which got this project
started and introduced me to what has turned out to be my
favorite language.
\index{Elkner, Jeff}

Thanks also to Chris Meyers, who contributed several sections
to {\em How to Think Like a Computer Scientist}.
\index{Meyers, Chris}

Thanks to the Free Software Foundation for developing
the GNU Free Documentation License, which helped make
my collaboration with Jeff and Chris possible, and Creative
Commons for the license I am using now.
\index{GNU Free Documentation License}
\index{Free Documentation License, GNU}
\index{Creative Commons}

Thanks to the editors at Lulu who worked on
{\em How to Think Like a Computer Scientist}.

Thanks to the editors at O'Reilly Media who worked on
{\em Think Python}.

Thanks to all the students who worked with earlier
versions of this book and all the contributors (listed
below) who sent in corrections and suggestions.


\section*{Contributor List}

\index{contributors}
More than 100 sharp-eyed and thoughtful readers have sent in
suggestions and corrections over the past few years.  Their
contributions, and enthusiasm for this project, have been a
huge help.

If you have a suggestion or correction, please send email to
{\tt feedback@thinkpython.com}.  If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).

If you include at least part of the sentence the
error appears in, that makes it easy for me to search.  Page and
section numbers are fine, too, but not quite as easy to work with.
Thanks!

\begin{itemize}

\small
\item Lloyd Hugh Allen sent in a correction to Section 8.4.

\item Yvon Boulianne sent in a correction of a semantic error in
Chapter 5.

\item Fred Bremmer submitted a correction in Section 2.1.

\item Jonah Cohen wrote the Perl scripts to convert the
LaTeX source for this book into beautiful HTML.

\item Michael Conlon sent in a grammar correction in Chapter 2
and an improvement in style in Chapter 1, and he initiated discussion
on the technical aspects of interpreters.

\item Beno\^{i}t Girard sent in a
correction to a humorous mistake in Section 5.6.

\item Courtney Gleason and Katherine Smith wrote {\tt horsebet.py},
which was used as a case study in an earlier version of the book.  Their
program can now be found on the website.

\item Lee Harr submitted more corrections than we have room to list
here, and indeed he should be listed as one of the principal editors
of the text.

\item James Kaylin is a student using the text. He has submitted
numerous corrections.

\item David Kershaw fixed the broken {\tt catTwice} function in Section
3.10.

\item Eddie Lam has sent in numerous corrections to Chapters
1, 2, and 3.
He also fixed the Makefile so that it creates an index the first time it is
run and helped us set up a versioning scheme.

\item Man-Yong Lee sent in a correction to the example code in
Section 2.4.

\item David Mayo pointed out that the word ``unconsciously''
in Chapter 1 needed
to be changed to ``subconsciously''.

\item Chris McAloon sent in several corrections to Sections 3.9 and
3.10.

\item Matthew J. Moelter has been a long-time contributor who sent
in numerous corrections and suggestions to the book.

\item Simon Dicon Montford reported a missing function definition and
several typos in Chapter 3.  He also found errors in the {\tt increment}
function in Chapter 13.

\item John Ouzts corrected the definition of ``return value''
in Chapter 3.

\item Kevin Parks sent in valuable comments and suggestions as to how
to improve the distribution of the book.

\item David Pool sent in a typo in the glossary of Chapter 1, as well
as kind words of encouragement.

\item Michael Schmitt sent in a correction to the chapter on files
and exceptions.

\item Robin Shaw pointed out an error in Section 13.1, where the
printTime function was used in an example without being defined.

\item Paul Sleigh found an error in Chapter 7 and a bug in Jonah Cohen's
Perl script that generates HTML from LaTeX.

\item Craig T. Snydal is testing the text in a course at Drew
University.  He has contributed several valuable suggestions and corrections.

\item Ian Thomas and his students are using the text in a programming
course.  They are the first ones to test the chapters in the latter half
of the book, and they have made numerous corrections and suggestions.

\item Keith Verheyden sent in a correction in Chapter 3.

\item Peter Winstanley let us know about a longstanding error in
our Latin in Chapter 3.

\item Chris Wrobel made corrections to the code in the chapter on
file I/O and exceptions.

\item Moshe Zadka has made invaluable contributions to this project.
In addition to writing the first draft of the chapter on Dictionaries, he
provided continual guidance in the early stages of the book.

\item Christoph Zwerschke sent several corrections and
pedagogic suggestions, and explained the difference between {\em gleich}
and {\em selbe}.

\item James Mayer sent us a whole slew of spelling and
typographical errors, including two in the contributor list.

\item Hayden McAfee caught a potentially confusing inconsistency
between two examples.

\item Angel Arnal is part of an international team of translators
working on the Spanish version of the text.  He has also found several
errors in the English version.

\item Tauhidul Hoque and Lex Berezhny created the illustrations
in Chapter 1 and improved many of the other illustrations.

\item Dr. Michele Alzetta caught an error in Chapter 8 and sent
some interesting pedagogic comments and suggestions about Fibonacci
and Old Maid.

\item Andy Mitchell caught a typo in Chapter 1 and a broken example
in Chapter 2.

\item Kalin Harvey suggested a clarification in Chapter 7 and
caught some typos.

\item Christopher P. Smith caught several typos and helped us
update the book for Python 2.2.

\item David Hutchins caught a typo in the Foreword.

\item Gregor Lingl is teaching Python at a high school in Vienna,
Austria.  He is working on a German translation of the book,
and he caught a couple of bad errors in Chapter 5.

\item Julie Peters caught a typo in the Preface.

\item Florin Oprina sent in an improvement in {\tt makeTime},
a correction in {\tt printTime}, and a nice typo.

\item D.~J.~Webre suggested a clarification in Chapter 3.

\item Ken found a fistful of errors in Chapters 8, 9 and 11.

\item Ivo Wever caught a typo in Chapter 5 and suggested a clarification
in Chapter 3.

\item Curtis Yanko suggested a clarification in Chapter 2.

\item Ben Logan sent in a number of typos and problems with translating
the book into HTML.

\item Jason Armstrong saw the missing word in Chapter 2.

\item Louis Cordier noticed a spot in Chapter 16 where the code
didn't match the text.

\item Brian Cain suggested several clarifications in Chapters 2 and 3.

\item Rob Black sent in a passel of corrections, including some
changes for Python 2.2.

\item Jean-Philippe Rey at \'{E}cole Centrale
Paris sent a number of patches, including some updates for Python 2.2
and other thoughtful improvements.

\item Jason Mader at George Washington University made a number
of useful suggestions and corrections.

\item Jan Gundtofte-Bruun reminded us that ``a error'' is an error.

\item Abel David and Alexis Dinno reminded us that the plural of
``matrix'' is ``matrices'', not ``matrixes''.  This error was in the
book for years, but two readers with the same initials reported it on
the same day.  Weird.

\item Charles Thayer encouraged us to get rid of the semi-colons
we had put at the ends of some statements and to clean up our
use of ``argument'' and ``parameter''.

\item Roger Sperberg pointed out a twisted piece of logic in Chapter 3.

\item Sam Bull pointed out a confusing paragraph in Chapter 2.

\item Andrew Cheung pointed out two instances of ``use before def''.

\item C. Corey Capel spotted the missing word in the Third Theorem
of Debugging and a typo in Chapter 4.

\item Alessandra helped clear up some Turtle confusion.

\item Wim Champagne found a brain-o in a dictionary example.

\item Douglas Wright pointed out a problem with floor division in
{\tt arc}.

\item Jared Spindor found some jetsam at the end of a sentence.

\item Lin Peiheng sent a number of very helpful suggestions.

\item Ray Hagtvedt sent in two errors and a not-quite-error.

\item Torsten H\"{u}bsch pointed out an inconsistency in Swampy.

\item Inga Petuhhov corrected an example in Chapter 14.

\item Arne Babenhauserheide sent several helpful corrections.

\item Mark E. Casida is is good at spotting repeated words.

\item Scott Tyler filled in a that was missing.  And then sent in
a heap of corrections.

\item Gordon Shephard sent in several corrections, all in separate
emails.

\item Andrew Turner {\tt spot}ted an error in Chapter 8.

\item Adam Hobart fixed a problem with floor division in {\tt arc}.

\item Daryl Hammond and Sarah Zimmerman pointed out that I served
up {\tt math.pi} too early.  And Zim spotted a typo.

\item George Sass found a bug in a Debugging section.

\item Brian Bingham suggested Exercise~\ref{exrotatepairs}.

\item Leah Engelbert-Fenton pointed out that I used {\tt tuple}
as a variable name, contrary to my own advice.  And then found
a bunch of typos and a ``use before def''.

\item Joe Funke spotted a typo.

\item Chao-chao Chen found an inconsistency in the Fibonacci example.

\item Jeff Paine knows the difference between space and spam.

\item Lubos Pintes sent in a typo.

\item Gregg Lind and Abigail Heithoff suggested Exercise~\ref{checksum}.

\item Max Hailperin has sent in a number of corrections and
  suggestions.  Max is one of the authors of the extraordinary {\em
    Concrete Abstractions}, which you might want to read when you are
  done with this book.

\item Chotipat Pornavalai found an error in an error message.

\item Stanislaw Antol sent a list of very helpful suggestions.

\item Eric Pashman sent a number of corrections for Chapters 4--11.

\item Miguel Azevedo found some typos.

\item Jianhua Liu sent in a long list of corrections.

\item Nick King found a missing word.

\item Martin Zuther sent a long list of suggestions.

\item Adam Zimmerman found an inconsistency in my instance
of an ``instance'' and several other errors.

\item Ratnakar Tiwari suggested a footnote explaining degenerate
triangles.

\item Anurag Goel suggested another solution for \verb"is_abecedarian"
and sent some additional corrections.  And he knows how to
spell Jane Austen.

\item Kelli Kratzer spotted one of the typos.

\item Mark Griffiths pointed out a confusing example in Chapter 3.

\item Roydan Ongie found an error in my Newton's method.

\item Patryk Wolowiec helped me with a problem in the HTML version.

\item Mark Chonofsky told me about a new keyword in Python 3.

\item Russell Coleman helped me with my geometry.

\item Nam Nguyen found a typo and pointed out that I used the Decorator
pattern but didn't mention it by name.

\item St\'{e}phane Morin sent in several corrections and suggestions.

\item Paul Stoop corrected a typo in \verb+uses_only+.

\item Eric Bronner pointed out a confusion in the discussion of the
order of operations.

\item Alexandros Gezerlis set a new standard for the number and
quality of suggestions he submitted.  We are deeply grateful!

\item Gray Thomas knows his right from his left.

\item Giovanni Escobar Sosa sent a long list of corrections and
suggestions.

\item Daniel Neilson corrected an error about the order of operations.

\item Will McGinnis pointed out that {\tt polyline} was defined
differently in two places.

\item Frank Hecker pointed out an exercise that was under-specified, and
some broken links.

\item Animesh B helped me clean up a confusing example.

\item Martin Caspersen found two round-off errors.

\item Gregor Ulm sent several corrections and suggestions.

\item Dimitrios Tsirigkas suggested I clarify an exercise.

\item Carlos Tafur sent a page of corrections and suggestions.

\item Martin Nordsletten found a bug in an exercise solution.

\item Sven Hoexter pointed out that a variable named {\tt input}
shadows a built-in function.

\item Stephen Gregory pointed out the problem with {\tt cmp}
in Python 3.

\item Ishwar Bhat corrected my statement of Fermat's last theorem.

\item Andrea Zanella translated the book into Italian, and sent a
number of corrections along the way.

\item Many, many thanks to Melissa Lewis and Luciano Ramalho for
  excellent comments and suggestions on the second edition.

\item Thanks to Harry Percival from PythonAnywhere for his help
getting people started running Python in a browser.

\item Xavier Van Aubel made several useful corrections in the second
edition.

\item William Murray corrected my definition of floor division.

\item Per Starb{\"a}ck brought me up to date on universal newlines in Python 3.

\item Laurent Rosenfeld and Mihaela Rotaru translated this book into French.  Along the way, they sent many corrections and suggestions.

% ENDCONTRIB

In addition, people who spotted typos or made corrections include
Czeslaw Czapla, Dale Wilson, Francesco Carlo Cimini,
Richard Fursa, Brian McGhie, Lokesh Kumar Makani, Matthew Shultz, Viet
Le, Victor Simeone, Lars O.D. Christensen, Swarup Sahoo, Alix Etienne,
Kuang He, Wei Huang, Karen Barber, and Eric Ransom.




\end{itemize}

\normalsize
\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter

\chapter{The way of the program}

The goal of this book is to teach you to think like a computer
scientist.  This way of thinking combines some of the best features of
mathematics, engineering, and natural science.  Like mathematicians,
computer scientists use formal languages to denote ideas (specifically
computations).  Like engineers, they design things, assembling
components into systems and evaluating tradeoffs among alternatives.
Like scientists, they observe the behavior of complex systems, form
hypotheses, and test predictions.  \index{problem solving}

The single most important skill for a computer scientist is {\bf
  problem solving}.  Problem solving means the ability to formulate
problems, think creatively about solutions, and express a solution
clearly and accurately.  As it turns out, the process of learning to
program is an excellent opportunity to practice problem-solving
skills.  That's why this chapter is called ``The way of the
program''.

On one level, you will be learning to program, a useful skill by
itself.  On another level, you will use programming as a means to an
end.  As we go along, that end will become clearer.


\section{What is a program?}

A {\bf program} is a sequence of instructions that specifies how to
perform a computation.  The computation might be something
mathematical, such as solving a system of equations or finding the
roots of a polynomial, but it can also be a symbolic computation, such
as searching and replacing text in a document or something
graphical, like processing an image or playing a video.
\index{program}

The details look different in different languages, but a few basic
instructions appear in just about every language:

\begin{description}

\item[input:] Get data from the keyboard, a file, the network, or some
other device.

\item[output:] Display data on the screen, save it in a
file, send it over the network, etc.

\item[math:] Perform basic mathematical operations like addition and
multiplication.

\item[conditional execution:] Check for certain conditions and
run the appropriate code.

\item[repetition:] Perform some action repeatedly, usually with
some variation.

\end{description}

Believe it or not, that's pretty much all there is to it.  Every
program you've ever used, no matter how complicated, is made up of
instructions that look pretty much like these.  So you can think of
programming as the process of breaking a large, complex task
into smaller and smaller subtasks until the subtasks are
simple enough to be performed with one of these basic instructions.
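
As a rough illustration, here is a small sketch that uses each of the
five basic instructions once.  The function name \verb"count_long_words"
and the sample text are made up for this example, and the input comes
from a string rather than the keyboard so the sketch runs anywhere.
It uses features that later chapters explain properly, so don't worry
about the details yet.

```python
def count_long_words(text, min_length=5):
    """Count the words in text that have at least min_length letters."""
    count = 0
    for word in text.split():          # repetition: visit each word in turn
        if len(word) >= min_length:    # conditional execution: check a condition
            count = count + 1          # math: add one to the running total
    return count


text = 'the quick brown fox jumps over the lazy dog'   # input: data to work on
result = count_long_words(text)
print(result)                                          # output: display the result
```

The larger task, counting the long words in a sentence, is broken into
subtasks small enough that each line does only one of the basic things
listed above.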


\section{Running Python}

One of the challenges of getting started with Python is that you
might have to install Python and related software on your computer.
If you are familiar with your operating system, and especially
if you are comfortable with the command-line interface, you will
have no trouble installing Python.  But for beginners, it can be
painful to learn about system administration and programming at the
same time.
\index{running Python}
\index{Python!running}

To avoid that problem, I recommend that you start out running Python
in a browser.  Later, when you are comfortable with Python, I'll
make suggestions for installing Python on your computer.
\index{Python in a browser}

There are a number of web pages you can use to run Python.  If you
already have a favorite, go ahead and use it.  Otherwise I recommend
PythonAnywhere.  I provide detailed instructions for getting started
at \url{http://tinyurl.com/thinkpython2e}.
\index{PythonAnywhere}

There are two versions of Python, called Python 2 and Python 3.
They are very similar, so if you learn one, it is easy to switch
to the other.  In fact, there are only a few differences you will
encounter as a beginner.
This book is written for Python 3, but I include some notes
about Python 2.
\index{Python 2}

The Python {\bf interpreter} is a program that reads and executes
Python code.  Depending on your environment, you might start the
interpreter by clicking on an icon, or by typing {\tt python} on
a command line.
When it starts, you should see output like this:
\index{interpreter}

\begin{verbatim}
Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
\end{verbatim}
%
The first three lines contain information about the interpreter
and the operating system it's running on, so it might be different for
you.  But you should check that the version number, which is
{\tt 3.4.0} in this example, begins with 3, which indicates that
you are running Python 3.  If it begins with 2, you are running
(you guessed it) Python 2.
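
If the startup message has scrolled away, one way to make the same
check from inside the interpreter is the standard {\tt sys} module,
which this chapter does not otherwise use.  This is only a minimal
sketch; the variable name {\tt major} is chosen for illustration.

```python
import sys

# sys.version_info describes the interpreter that is running this code;
# its first element is the major version number (2 or 3).
major = sys.version_info[0]

if major == 3:
    print('You are running Python 3')
else:
    print('You are running Python', major)
```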

The last line is a {\bf prompt} that indicates that the interpreter is
ready for you to enter code.
If you type a line of code and hit Enter, the interpreter displays the
result:
\index{prompt}

\begin{verbatim}
>>> 1 + 1
2
\end{verbatim}
%
Now you're ready to get started.
From here on, I assume that you know how to start the Python
interpreter and run code.


\section{The first program}
\label{hello}
\index{Hello, World}

Traditionally, the first program you write in a new language
is called ``Hello, World!'' because all it does is display the
words ``Hello, World!''.  In Python, it looks like this:

\begin{verbatim}
>>> print('Hello, World!')
\end{verbatim}
%
This is an example of a {\bf print statement}, although it
doesn't actually print anything on paper.  It displays a result on the
screen.  In this case, the result is the words

\begin{verbatim}
Hello, World!
\end{verbatim}
%
The quotation marks in the program mark the beginning and end
of the text to be displayed; they don't appear in the result.
\index{quotation mark}
\index{print statement}
\index{statement!print}

The parentheses indicate that {\tt print} is a function.  We'll get
to functions in Chapter~\ref{funcchap}.
\index{function} \index{print function}

In Python 2, the print statement is slightly different; it is not
a function, so it doesn't use parentheses.
\index{Python 2}

\begin{verbatim}
>>> print 'Hello, World!'
\end{verbatim}
%
This distinction will make more sense soon, but that's enough to
get started.


\section{Arithmetic operators}
928
\index{operator!arithmetic}
929
\index{arithmetic operator}
930

931
After ``Hello, World'', the next step is arithmetic.  Python provides
932
{\bf operators}, which are special symbols that represent computations
933
like addition and multiplication.  
934

935
The operators {\tt +}, {\tt -}, and {\tt *} perform addition,
936
subtraction, and multiplication, as in the following examples:
937

938
\begin{verbatim}
939
>>> 40 + 2
940
42
941
>>> 43 - 1
942
42
943
>>> 6 * 7
944
42
945
\end{verbatim}
946
%
947
The operator {\tt /} performs division:
948

949
\begin{verbatim}
950
>>> 84 / 2
951
42.0
952
\end{verbatim}
953
%
954
You might wonder why the result is {\tt 42.0} instead of {\tt 42}.
955
I'll explain in the next section.
956

957
Finally, the operator {\tt **} performs exponentiation; that is,
958
it raises a number to a power:
959

960
\begin{verbatim}
961
>>> 6**2 + 6
962
42
963
\end{verbatim}
964
%
965
In some other languages, \verb"^" is used for exponentiation, but
966
in Python it is a bitwise operator called XOR.  If you are not
967
familiar with bitwise operators, the result will surprise you:
968

969
\begin{verbatim}
970
>>> 6 ^ 2
971
4
972
\end{verbatim}
973
%
974
I won't cover
975
bitwise operators in this book, but you can read about
976
them at \url{http://wiki.python.org/moin/BitwiseOperators}.
977
\index{bitwise operator}
978
\index{operator!bitwise}
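
To see why \verb"6 ^ 2" is 4, it helps to write the operands in
binary.  The following sketch is only an illustration; the variable
names are hypothetical, and the built-in {\tt format} function is used
to show each number as three binary digits.

```python
# XOR compares the binary digits of its operands: a result bit is 1
# when exactly one of the two input bits is 1.
a = 6   # binary 110
b = 2   # binary 010

xor_result = a ^ b      # bitwise XOR
power_result = a ** b   # exponentiation: 6 squared

print(format(a, '03b'), format(b, '03b'), format(xor_result, '03b'))
print(xor_result)       # 4
print(power_result)     # 36
```

Only the leftmost bit differs between 110 and 010 in a way that
survives XOR, so the result is binary 100, which is 4.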


\section{Values and types}
\index{value}
\index{type}
\index{string}

A {\bf value} is one of the basic things a program works with, like a
letter or a number.  Some values we have seen so far are {\tt 2},
{\tt 42.0}, and \verb"'Hello, World!'".

These values belong to different {\bf types}:
{\tt 2} is an {\bf integer}, {\tt 42.0} is a {\bf floating-point number},
and \verb"'Hello, World!'" is a {\bf string},
so-called because the letters it contains are strung together.
\index{integer}
\index{floating-point}

If you are not sure what type a value has, the interpreter can
tell you:

\begin{verbatim}
>>> type(2)
<class 'int'>
>>> type(42.0)
<class 'float'>
>>> type('Hello, World!')
<class 'str'>
\end{verbatim}
%
In these results, the word ``class'' is used in the sense of
a category; a type is a category of values.
\index{class}

Not surprisingly, integers belong to the type {\tt int},
strings belong to {\tt str} and floating-point
numbers belong to {\tt float}.
\index{type}
\index{string type}
\index{type!str}
\index{int type}
\index{type!int}
\index{float type}
\index{type!float}

What about values like \verb"'2'" and \verb"'42.0'"?
They look like numbers, but they are in quotation marks like
strings.
\index{quotation mark}

\begin{verbatim}
>>> type('2')
<class 'str'>
>>> type('42.0')
<class 'str'>
\end{verbatim}
%
They're strings.
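
A string that looks like a number can be turned into one.  The
following sketch, which goes slightly beyond the text above, uses the
built-in functions {\tt int} and {\tt float}; the variable names are
hypothetical.

```python
# int() and float() are built-in functions that build a number
# from a string of digits.
as_int = int('2')
as_float = float('42.0')

print(type(as_int))       # <class 'int'>
print(type(as_float))     # <class 'float'>

# The quoted versions are still strings, so + joins them end to end
# instead of adding them.
print('2' + '2')          # 22
print(as_int + as_int)    # 4
```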

When you type a large integer, you might be tempted to use commas
between groups of digits, as in {\tt 1,000,000}.  This is not a
legal {\em integer} in Python, but it is legal:

\begin{verbatim}
>>> 1,000,000
(1, 0, 0)
\end{verbatim}
%
That's not what we expected at all!  Python interprets {\tt
  1,000,000} as a comma-separated sequence of integers.  We'll learn
more about this kind of sequence later.
\index{sequence}
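
As a brief sketch, the lines below confirm what kind of value the
commas produce and show one readable alternative.  Underscores between
digits are accepted by Python 3.6 and later; the variable names are
hypothetical.

```python
# Commas build a sequence of three integers (a tuple), not one number.
with_commas = 1,000,000
print(with_commas)              # (1, 0, 0)
print(type(with_commas))        # <class 'tuple'>

# In Python 3.6 and later, underscores can group digits instead.
with_underscores = 1_000_000
print(with_underscores)         # 1000000
print(type(with_underscores))   # <class 'int'>
```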

%This is the first example we have seen of a semantic error: the code
%runs without producing an error message, but it doesn't do the
%``right'' thing.
%\index{semantic error}
%\index{error!semantic}
%\index{error message}
% TODO: use this as an example of a semantic error later



\section{Formal and natural languages}
\index{formal language}
\index{natural language}
\index{language!formal}
\index{language!natural}

{\bf Natural languages} are the languages people speak,
such as English, Spanish, and French.  They were not designed
by people (although people try to impose some order on them);
they evolved naturally.

{\bf Formal languages} are languages that are designed by people for
specific applications.  For example, the notation that mathematicians
use is a formal language that is particularly good at denoting
relationships among numbers and symbols.  Chemists use a formal
language to represent the chemical structure of molecules.  And
most importantly:

\begin{quote}
{\bf Programming languages are formal languages that have been
designed to express computations.}
\end{quote}

Formal languages tend to have strict {\bf syntax} rules that
govern the structure of statements.
For example, in mathematics the statement
$3 + 3 = 6$ has correct syntax, but
$3 + = 3 \$ 6$ does not.  In chemistry
$H_2O$ is a syntactically correct formula, but $_2Zz$ is not.
\index{syntax}

Syntax rules come in two flavors, pertaining to {\bf tokens} and
structure.  Tokens are the basic elements of the language, such as
words, numbers, and chemical elements.  One of the problems with
$3 + = 3 \$ 6$ is that \( \$ \) is not a legal token in mathematics
(at least as far as I know).  Similarly, $_2Zz$ is not legal because
there is no element with the abbreviation $Zz$.
\index{token}
\index{structure}

The second type of syntax rule pertains to the way tokens are
combined.  The equation $3 +/ 3$ is illegal because even though $+$
and $/$ are legal tokens, you can't have one right after the other.
Similarly, in a chemical formula the subscript comes after the element
name, not before.

This is @ well-structured Engli\$h
sentence with invalid t*kens in it.  This sentence all valid tokens
has, but invalid structure with.

When you read a sentence in English or a statement in a formal
language, you have to figure out the structure
(although in a natural language you do this subconsciously).  This
process is called {\bf parsing}.
\index{parse}
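
Python's standard library can illustrate both kinds of syntax rule.
The following sketch is an illustration, not part of the original
discussion: it uses the {\tt tokenize} module to list the tokens in an
expression and the built-in {\tt compile} function to check whether the
tokens are combined legally.  The helper names \verb"list_tokens" and
\verb"parses_ok" are hypothetical.  Note that Python writes equality as
{\tt ==} rather than {\tt =}.

```python
import io
import tokenize


def list_tokens(source):
    """Return (token type name, text) pairs for numbers, operators and names."""
    keep = (tokenize.NUMBER, tokenize.OP, tokenize.NAME)
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type in keep]


def parses_ok(source):
    """Return True if Python can parse source as a single expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True


# Every token is legal and the structure is legal.
print(list_tokens('3 + 3 == 6'))
print(parses_ok('3 + 3 == 6'))    # True

# + and / are legal tokens, but one cannot directly follow the other.
print(parses_ok('3 + / 3'))       # False

# $ is not a legal token in Python at all.
print(parses_ok('3 + 3 $ 6'))     # False
```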

Although formal and natural languages have many features in
common---tokens, structure, and syntax---there are some
differences:
\index{ambiguity}
\index{redundancy}
\index{literalness}

\begin{description}

\item[ambiguity:] Natural languages are full of ambiguity, which
people deal with by using contextual clues and other information.
Formal languages are designed to be nearly or completely unambiguous,
which means that any statement has exactly one meaning,
regardless of context.

\item[redundancy:] In order to make up for ambiguity and reduce
misunderstandings, natural languages employ lots of
redundancy.  As a result, they are often verbose.  Formal languages
are less redundant and more concise.

\item[literalness:] Natural languages are full of idiom and metaphor.
If I say, ``The penny dropped'', there is probably no penny and
nothing dropping (this idiom means that someone understood something
after a period of confusion).  Formal languages
mean exactly what they say.

\end{description}

Because we all grow up speaking natural languages, it is sometimes
hard to adjust to formal languages.  The difference between formal and
natural language is like the difference between poetry and prose, but
more so: \index{poetry} \index{prose}

\begin{description}

\item[Poetry:] Words are used for their sounds as well as for
their meaning, and the whole poem together creates an effect or
emotional response.  Ambiguity is not only common but often
deliberate.

\item[Prose:] The literal meaning of words is more important,
and the structure contributes more meaning.  Prose is more amenable to
analysis than poetry but still often ambiguous.

\item[Programs:] The meaning of a computer program is unambiguous
and literal, and can be understood entirely by analysis of the
tokens and structure.

\end{description}

Formal languages are more dense
than natural languages, so it takes longer to read them.  Also, the
structure is important, so it is not always best to read
from top to bottom, left to right.  Instead, learn to parse the
program in your head, identifying the tokens and interpreting the
structure.  Finally, the details matter.  Small errors in
spelling and punctuation, which you can get away
with in natural languages, can make a big difference in a formal
language.


\section{Debugging}
\index{debugging}

Programmers make mistakes.  For whimsical reasons, programming errors
are called {\bf bugs} and the process of tracking them down is called
{\bf debugging}.
\index{debugging}
\index{bug}

Programming, and especially debugging, sometimes brings out strong
emotions.  If you are struggling with a difficult bug, you might
feel angry, despondent, or embarrassed.

There is evidence that people naturally respond to computers as if
they were people.  When they work well, we think
of them as teammates, and when they are obstinate or rude, we
respond to them the same way we respond to rude,
obstinate people (Reeves and Nass, {\it The Media
    Equation: How People Treat Computers, Television, and New Media
    Like Real People and Places}).
\index{debugging!emotional response}
\index{emotional debugging}

Preparing for these reactions might help you deal with them.
One approach is to think of the computer as an employee with
certain strengths, like speed and precision, and
particular weaknesses, like lack of empathy and inability
to grasp the big picture.

Your job is to be a good manager: find ways to take advantage
of the strengths and mitigate the weaknesses.  And find ways
to use your emotions to engage with the problem,
without letting your reactions interfere with your ability
to work effectively.

Learning to debug can be frustrating, but it is a valuable skill
that is useful for many activities beyond programming.  At the
end of each chapter there is a section, like this one,
with my suggestions for debugging.  I hope they help!


\section{Glossary}

\begin{description}

\item[problem solving:]  The process of formulating a problem, finding
a solution, and expressing it.
\index{problem solving}

\item[high-level language:]  A programming language like Python that
is designed to be easy for humans to read and write.
\index{high-level language}

\item[low-level language:]  A programming language that is designed
to be easy for a computer to run; also called ``machine language'' or
``assembly language''.
\index{low-level language}

\item[portability:]  A property of a program that can run on more
than one kind of computer.
\index{portability}

\item[interpreter:]  A program that reads another program and executes
it.
\index{interpret}

\item[prompt:] Characters displayed by the interpreter to indicate
that it is ready to take input from the user.
\index{prompt}

\item[program:] A set of instructions that specifies a computation.
\index{program}

\item[print statement:]  An instruction that causes the Python
interpreter to display a value on the screen.
\index{print statement}
\index{statement!print}

\item[operator:]  A special symbol that represents a simple computation like
addition, multiplication, or string concatenation.
\index{operator}

\item[value:]  One of the basic units of data, like a number or string,
that a program manipulates.
\index{value}

\item[type:] A category of values.  The types we have seen so far
are integers (type {\tt int}), floating-point numbers (type {\tt
float}), and strings (type {\tt str}).
\index{type}

\item[integer:] A type that represents whole numbers.
\index{integer}

\item[floating-point:] A type that represents numbers with fractional
parts.
\index{floating-point}

\item[string:] A type that represents sequences of characters.
\index{string}

\item[natural language:]  Any one of the languages that people speak that
evolved naturally.
\index{natural language}

\item[formal language:]  Any one of the languages that people have designed
for specific purposes, such as representing mathematical ideas or
computer programs; all programming languages are formal languages.
\index{formal language}

\item[token:]  One of the basic elements of the syntactic structure of
a program, analogous to a word in a natural language.
\index{token}

\item[syntax:] The rules that govern the structure of a program.
\index{syntax}

\item[parse:] To examine a program and analyze the syntactic structure.
\index{parse}

\item[bug:] An error in a program.
\index{bug}

\item[debugging:] The process of finding and correcting bugs.
\index{debugging}

\end{description}


\section{Exercises}

\begin{exercise}

It is a good idea to read this book in front of a computer so you can
try out the examples as you go.

Whenever you are experimenting with a new feature, you should try
to make mistakes.  For example, in the ``Hello, World!'' program,
what happens if you leave out one of the quotation marks?  What
if you leave out both?  What if you spell {\tt print} wrong?
\index{error message}

This kind of experiment helps you remember what you read; it also
helps when you are programming, because you get to know what the error
messages mean.  It is better to make mistakes now and on purpose than
later and accidentally.

\begin{enumerate}

\item In a print statement, what happens if you leave out one
of the parentheses, or both?

\item If you are trying to print a string, what happens if you
leave out one of the quotation marks, or both?

\item You can use a minus sign to make a negative number like
{\tt -2}.  What happens if you put a plus sign before a number?
What about {\tt 2++2}?

\item In math notation, leading zeros are ok, as in {\tt 09}.
What happens if you try this in Python?  What about {\tt 011}?

\item What happens if you have two values with no operator
between them?

\end{enumerate}

\end{exercise}



\begin{exercise}

Start the Python interpreter and use it as a calculator.

\begin{enumerate}

\item How many seconds are there in 42 minutes 42 seconds?

\item How many miles are there in 10 kilometers?  Hint: there are 1.61
  kilometers in a mile.

\item If you run a 10 kilometer race in 42 minutes 42 seconds, what is
  your average pace (time per mile in minutes and seconds)?  What is
  your average speed in miles per hour?

\index{calculator}
\index{running pace}

\end{enumerate}

\end{exercise}




\chapter{Variables, expressions and statements}

One of the most powerful features of a programming language is the
ability to manipulate {\bf variables}.  A variable is a name that
refers to a value.
\index{variable}


\section{Assignment statements}
\label{variables}
\index{assignment statement}
\index{statement!assignment}

An {\bf assignment statement} creates a new variable and gives
it a value:

\begin{verbatim}
>>> message = 'And now for something completely different'
>>> n = 17
>>> pi = 3.1415926535897932
\end{verbatim}
%
This example makes three assignments.  The first assigns a string
to a new variable named {\tt message};
the second gives the integer {\tt 17} to {\tt n}; the third
assigns the (approximate) value of $\pi$ to {\tt pi}.
\index{state diagram}
\index{diagram!state}

A common way to represent variables on paper is to write the name with
an arrow pointing to its value.  This kind of figure is
called a {\bf state diagram} because it shows what state each of the
variables is in (think of it as the variable's state of mind).
Figure~\ref{fig.state2} shows the result of the previous example.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/state2.pdf}}
\caption{State diagram.}
\label{fig.state2}
\end{figure}
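
A rough text version of a state diagram can be produced with a few
print statements.  This is only a sketch of the idea, not a
reproduction of the figure; the arrow is drawn with ordinary characters
and {\tt repr} is a built-in function that shows a string with its
quotation marks.

```python
message = 'And now for something completely different'
n = 17
pi = 3.1415926535897932

# Write each name with an arrow pointing to the value it refers to.
print('message --->', repr(message))
print('n       --->', n)
print('pi      --->', pi)
```

The value displayed for {\tt pi} has fewer digits than the one that was
typed, because a floating-point number stores only an approximation.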



\section{Variable names}
\index{variable}

Programmers generally choose names for their variables that
are meaningful---they document what the variable is used for.

Variable names can be as long as you like.  They can contain
both letters and numbers, but they can't begin with a number.
It is legal to use uppercase letters, but it is conventional
to use only lower case for variable names.

The underscore character, \verb"_", can appear in a name.
It is often used in names with multiple words, such as
\verb"your_name" or \verb"airspeed_of_unladen_swallow".
\index{underscore character}

If you give a variable an illegal name, you get a syntax error:

\begin{verbatim}
>>> 76trombones = 'big parade'
SyntaxError: invalid syntax
>>> more@ = 1000000
SyntaxError: invalid syntax
>>> class = 'Advanced Theoretical Zymurgy'
SyntaxError: invalid syntax
\end{verbatim}
%
{\tt 76trombones} is illegal because it begins with a number.
{\tt more@} is illegal because it contains an illegal character, {\tt
@}.  But what's wrong with {\tt class}?

It turns out that {\tt class} is one of Python's {\bf keywords}.  The
interpreter uses keywords to recognize the structure of the program,
and they cannot be used as variable names.
\index{keyword}

Python 3 has these keywords (Python 3.7 and later also reserve
{\tt async} and {\tt await}):

\begin{verbatim}
False      class      finally    is         return
None       continue   for        lambda     try
True       def        from       nonlocal   while
and        del        global     not        with
as         elif       if         or         yield
assert     else       import     pass
break      except     in         raise
\end{verbatim}
%
You don't have to memorize this list.  In most development environments,
keywords are displayed in a different color; if you try to use one
as a variable name, you'll know.
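
You can also ask Python itself.  The following sketch uses the
standard {\tt keyword} module and the string method
{\tt isidentifier}; the helper name \verb"is_legal_variable_name" is
hypothetical and is introduced only for this illustration.

```python
import keyword


def is_legal_variable_name(name):
    """Return True if name has a legal form and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(keyword.iskeyword('class'))      # True
print(keyword.iskeyword('message'))    # False

# isidentifier checks the characters only; it does not know about keywords.
print('class'.isidentifier())          # True

print(is_legal_variable_name('76trombones'))                   # False
print(is_legal_variable_name('more@'))                         # False
print(is_legal_variable_name('class'))                         # False
print(is_legal_variable_name('airspeed_of_unladen_swallow'))   # True

# The full list for the Python version you are running:
print(keyword.kwlist)
```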


\section{Expressions and statements}

An {\bf expression} is a combination of values, variables, and operators.
A value all by itself is considered an expression, and so is
a variable, so the following are all legal expressions:
\index{expression}

\begin{verbatim}
>>> 42
42
>>> n
17
>>> n + 25
42
\end{verbatim}
%
When you type an expression at the prompt, the interpreter
{\bf evaluates} it, which means that it finds the value of
the expression.
In this example, {\tt n} has the value 17 and
{\tt n + 25} has the value 42.
\index{evaluate}

A {\bf statement} is a unit of code that has an effect, like
creating a variable or displaying a value.
\index{statement}

\begin{verbatim}
>>> n = 17
>>> print(n)
\end{verbatim}
%
The first line is an assignment statement that gives a value to
{\tt n}.  The second line is a print statement that displays the
value of {\tt n}.

When you type a statement, the interpreter {\bf executes} it,
which means that it does whatever the statement says.  In general,
statements don't have values.
\index{execute}
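
Python has two built-in functions whose names mirror this distinction:
{\tt eval} evaluates an expression and returns its value, and
{\tt exec} executes statements.  The following sketch is an
illustration that goes beyond the text; the dictionary {\tt namespace}
and the other variable names are hypothetical.

```python
# A dictionary that plays the role of the interpreter's variables.
namespace = {'n': 17}

# An expression has a value, so eval can return it.
value = eval('n + 25', namespace)

# An assignment is a statement; it has no value, so eval rejects it.
try:
    eval('n = 17', namespace)
    assignment_has_value = True
except SyntaxError:
    assignment_has_value = False

# exec runs a statement for its effect: here, creating the variable m.
exec('m = n + 25', namespace)

print(value)                  # 42
print(assignment_has_value)   # False
print(namespace['m'])         # 42
```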


\section{Script mode}

So far we have run Python in {\bf interactive mode}, which
means that you interact directly with the interpreter.
Interactive mode is a good way to get started,
but if you are working with more than a few lines of code, it can be
clumsy.
\index{interactive mode}

The alternative is to save code in a file called a {\bf script} and
then run the interpreter in {\bf script mode} to execute the script.  By
convention, Python scripts have names that end with {\tt .py}.
\index{script}
\index{script mode}

If you know how to create and run a script on your computer, you
are ready to go.  Otherwise I recommend using PythonAnywhere again.
I have posted instructions for running in script mode at
\url{http://tinyurl.com/thinkpython2e}.

Because Python provides both modes,
you can test bits of code in interactive mode before you put them
in a script.  But there are differences between interactive mode
and script mode that can be confusing.
\index{interactive mode}
\index{script mode}

For example, if you are using Python as a calculator, you might type

\begin{verbatim}
>>> miles = 26.2
>>> miles * 1.61
42.182
\end{verbatim}

The first line assigns a value to {\tt miles}, but it has no visible
effect.  The second line is an expression, so the
interpreter evaluates it and displays the result.  It turns out that a
marathon is about 42 kilometers.

But if you type the same code into a script and run it, you get no
output at all.
In script mode an expression, all by itself, has no
visible effect.  Python evaluates the expression, but it doesn't
display the result.
To display the result, you need a {\tt print} statement like this:

\begin{verbatim}
miles = 26.2
print(miles * 1.61)
\end{verbatim}
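
The difference between the two modes can be reproduced inside a single
program.  The following sketch relies on the built-in {\tt compile}
function, whose mode \verb"'single'" treats code the way the
interactive prompt does and whose mode \verb"'exec'" treats it the way
a script is treated.  The helper name \verb"run_and_capture" is
hypothetical, and the example assumes the interpreter's default
display behavior has not been customized.

```python
import contextlib
import io


def run_and_capture(source, mode):
    """Run source in the given compile mode and return what it displays."""
    code = compile(source, '<example>', mode)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {'miles': 26.2})
    return buffer.getvalue()


# 'single' behaves like the interactive prompt: the value is displayed.
interactive_output = run_and_capture('miles * 1.61', 'single')

# 'exec' behaves like a script: the value is computed and then discarded.
script_output = run_and_capture('miles * 1.61', 'exec')

# With a print statement, script mode displays the value too.
printed_output = run_and_capture('print(miles * 1.61)', 'exec')

print(repr(interactive_output))
print(repr(script_output))
print(repr(printed_output))
```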

This behavior can be confusing at first.
To check your understanding, type the following statements in the
Python interpreter and see what they do:

\begin{verbatim}
5
x = 5
x + 1
\end{verbatim}

Now put the same statements in a script and run it.  What
is the output?  Modify the script by transforming each
expression into a print statement and then run it again.


\section{Order of operations}
\index{order of operations}
\index{PEMDAS}

When an expression contains more than one operator, the order of
evaluation depends on the {\bf order of operations}.  For
mathematical operators, Python follows mathematical convention.
The acronym {\bf PEMDAS} is a useful way to
remember the rules:

\begin{itemize}

\item {\bf P}arentheses have the highest precedence and can be used
to force an expression to evaluate in the order you want. Since
expressions in parentheses are evaluated first, {\tt 2 * (3-1)} is 4,
and {\tt (1+1)**(5-2)} is 8. You can also use parentheses to make an
expression easier to read, as in {\tt (minute * 100) / 60}, even
if it doesn't change the result.

\item {\bf E}xponentiation has the next highest precedence, so
{\tt 1 + 2**3} is 9, not 27, and {\tt 2 * 3**2} is 18, not 36.

\item {\bf M}ultiplication and {\bf D}ivision have higher precedence
  than {\bf A}ddition and {\bf S}ubtraction.  So {\tt 2*3-1} is 5, not
  4, and {\tt 6+4/2} is 8, not 5.

\item Operators with the same precedence are evaluated from left to
  right (except exponentiation).  So in the expression {\tt degrees /
    2 * pi}, the division happens first and the result is multiplied
  by {\tt pi}.  To divide by $2 \pi$, you can use parentheses or write
  {\tt degrees / 2 / pi}.

\end{itemize}
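
The rules above can be checked directly.  The following sketch repeats
the examples from the list and adds one for the exception mentioned in
the last item: exponentiation is evaluated from right to left.  The
value chosen for {\tt degrees} and the variable names are hypothetical;
{\tt math.pi} comes from the standard {\tt math} module.

```python
import math

# Parentheses, exponentiation, then multiplication and division,
# then addition and subtraction.
print(2 * (3 - 1))          # 4
print((1 + 1) ** (5 - 2))   # 8
print(1 + 2**3)             # 9
print(2 * 3**2)             # 18
print(2*3 - 1)              # 5
print(6 + 4/2)              # 8.0, because / produces a floating-point number

# Same precedence: left to right.
degrees = 180               # hypothetical example value
left_to_right = degrees / 2 * math.pi        # (degrees / 2) * math.pi
divide_by_two_pi = degrees / (2 * math.pi)   # parentheses force the grouping
chained_division = degrees / 2 / math.pi     # also divides by 2 pi

# The exception: exponentiation groups from right to left.
right_to_left = 2 ** 3 ** 2                  # 2 ** (3 ** 2)
print(right_to_left)                         # 512
print((2 ** 3) ** 2)                         # 64
```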

I don't work very hard to remember the precedence of
operators.  If I can't tell by looking at the expression, I use
parentheses to make it obvious.


\section{String operations}
\index{string!operation}
\index{operator!string}

In general, you can't perform mathematical operations on strings, even
if the strings look like numbers, so the following are illegal:

\begin{verbatim}
'chinese'-'food'    'eggs'/'easy'    'third'*'a charm'
\end{verbatim}
%
But there are two exceptions, {\tt +} and {\tt *}.

The {\tt +} operator performs {\bf string concatenation}, which means
it joins the strings by linking them end-to-end.  For example:
\index{concatenation}

\begin{verbatim}
>>> first = 'throat'
>>> second = 'warbler'
>>> first + second
'throatwarbler'
\end{verbatim}
%
The {\tt *} operator also works on strings; it performs repetition.
For example, \verb"'Spam'*3" is \verb"'SpamSpamSpam'".  If one of the
values is a string, the other has to be an integer.
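
The following sketch collects the legal and illegal cases in one
place.  It uses {\tt try} and {\tt except}, which have not been
introduced yet, only so that the example keeps running after each
error; the variable names are hypothetical.

```python
first = 'throat'
second = 'warbler'

joined = first + second    # concatenation
repeated = 'Spam' * 3      # repetition
print(joined)              # throatwarbler
print(repeated)            # SpamSpamSpam

# Division is not defined for strings.
try:
    'eggs' / 'easy'
    division_allowed = True
except TypeError:
    division_allowed = False

# Repetition needs an integer; a floating-point number is not accepted.
try:
    'Spam' * 3.0
    float_repeat_allowed = True
except TypeError:
    float_repeat_allowed = False

print(division_allowed)       # False
print(float_repeat_allowed)   # False
```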

This use of {\tt +} and {\tt *} makes sense by
analogy with addition and multiplication.  Just as {\tt 4*3} is
equivalent to {\tt 4+4+4}, we expect \verb"'Spam'*3" to be the same as
\verb"'Spam'+'Spam'+'Spam'", and it is.  On the other hand, there is a
significant way in which string concatenation and repetition are
different from integer addition and multiplication.
Can you think of a property that addition has
that string concatenation does not?
\index{commutativity}


\section{Comments}
\index{comment}

As programs get bigger and more complicated, they get more difficult
to read.  Formal languages are dense, and it is often difficult to
look at a piece of code and figure out what it is doing, or why.

For this reason, it is a good idea to add notes to your programs to explain
in natural language what the program is doing.  These notes are called
{\bf comments}, and they start with the \verb"#" symbol:

\begin{verbatim}
# compute the percentage of the hour that has elapsed
percentage = (minute * 100) / 60
\end{verbatim}
%
In this case, the comment appears on a line by itself.  You can also put
comments at the end of a line:

\begin{verbatim}
percentage = (minute * 100) / 60     # percentage of an hour
\end{verbatim}
%
Everything from the {\tt \#} to the end of the line is ignored---it
has no effect on the execution of the program.

Comments are most useful when they document non-obvious features of
the code.  It is reasonable to assume that the reader can figure out
{\em what} the code does; it is more useful to explain {\em why}.

This comment is redundant with the code and useless:

\begin{verbatim}
v = 5     # assign 5 to v
\end{verbatim}
%
This comment contains useful information that is not in the code:

\begin{verbatim}
v = 5     # velocity in meters/second. 
\end{verbatim}
%
Good variable names can reduce the need for comments, but
long names can make complex expressions hard to read, so there is
a tradeoff.
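
As a sketch of that tradeoff, the following lines store the same
quantities under short and long names.  All of the names and values
are hypothetical, chosen only for illustration.

```python
# Short names need comments to carry the units.
v = 5     # velocity in meters/second
t = 12    # elapsed time in seconds
d = v * t

# Long names carry the units themselves, but the expression is
# harder to take in at a glance.
velocity_in_meters_per_second = 5
elapsed_time_in_seconds = 12
distance_in_meters = velocity_in_meters_per_second * elapsed_time_in_seconds

print(d, distance_in_meters)
```

Both versions compute the same distance; which one reads better
depends on how complicated the surrounding expressions are.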


\section{Debugging}
\index{debugging}
\index{bug}

Three kinds of errors can occur in a program: syntax errors, runtime
errors, and semantic errors.  It is useful
to distinguish between them in order to track them down more quickly.

\begin{description}

\item[Syntax error:] ``Syntax'' refers to the structure of a program
  and the rules about that structure.  For example, parentheses have
  to come in matching pairs, so {\tt (1 + 2)} is legal, but {\tt 8)}
  is a {\bf syntax error}.  \index{syntax error} \index{error!syntax}
  \index{error message}
\index{syntax}

If there is a syntax error
anywhere in your program, Python displays an error message and quits,
and you will not be able to run the program.  During the first few
weeks of your programming career, you might spend a lot of
time tracking down syntax errors.  As you gain experience, you will
make fewer errors and find them faster.


\item[Runtime error:] The second type of error is a runtime error, so
  called because the error does not appear until after the program has
  started running.  These errors are also called {\bf exceptions}
  because they usually indicate that something exceptional (and bad)
  has happened.  \index{runtime error} \index{error!runtime}
  \index{exception} \index{safe language} \index{language!safe}

Runtime errors are rare in the simple programs you will see in the
first few chapters, so it might be a while before you encounter one.


\item[Semantic error:] The third type of error is ``semantic'', which
  means related to meaning.  If there is a semantic error in your
  program, it will run without generating error messages, but it will
  not do the right thing.  It will do something else.  Specifically,
  it will do what you told it to do.  \index{semantic error}
  \index{error!semantic} \index{error message}

Identifying semantic errors can be tricky because it requires you to work
backward by looking at the output of the program and trying to figure
out what it is doing.

\end{description}
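
The following sketch shows one small instance of each kind of error.
It is an illustration only: the examples of a runtime error (division
by zero) and a semantic error (a missing pair of parentheses) are
choices made here, and {\tt try}, {\tt except}, and {\tt compile} are
used just so that all three cases fit in one program that keeps
running.

```python
import math

# 1. Syntax error: the structure is illegal, so the code cannot even
#    be compiled, let alone run.
try:
    compile('8)', '<example>', 'eval')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

# 2. Runtime error (exception): the syntax is fine, but something goes
#    wrong while the program is running.
try:
    result = 1 / 0
    runtime_ok = True
except ZeroDivisionError:
    runtime_ok = False

# 3. Semantic error: the program runs without complaint but computes
#    something other than what was intended.
degrees = 180                            # hypothetical example value
intended = degrees / (2 * math.pi)       # divide by 2 pi
mistaken = degrees / 2 * math.pi         # divides by 2, then multiplies by pi

print(syntax_ok)              # False
print(runtime_ok)             # False
print(intended == mistaken)   # False, and no error message points this out
```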


\section{Glossary}

\begin{description}

\item[variable:]  A name that refers to a value.
\index{variable}

\item[assignment:]  A statement that assigns a value to a variable.
\index{assignment}

\item[state diagram:]  A graphical representation of a set of variables and the
values they refer to.
\index{state diagram}

\item[keyword:]  A reserved word that is used to parse a
program; you cannot use keywords like {\tt if}, {\tt  def}, and {\tt while} as
variable names.
\index{keyword}

\item[operand:]  One of the values on which an operator operates.
\index{operand}

\item[expression:]  A combination of variables, operators, and values that
represents a single result.
\index{expression}

\item[evaluate:]  To simplify an expression by performing the operations
in order to yield a single value.

\item[statement:]  A section of code that represents a command or action.  So
far, the statements we have seen are assignments and print statements.
\index{statement}

\item[execute:]  To run a statement and do what it says.
\index{execute}

\item[interactive mode:] A way of using the Python interpreter by
typing code at the prompt.
\index{interactive mode}

\item[script mode:] A way of using the Python interpreter to read
code from a script and run it.
\index{script mode}

\item[script:] A program stored in a file.
\index{script}

\item[order of operations:]  Rules governing the order in which
expressions involving multiple operators and operands are evaluated.
\index{order of operations}

\item[concatenate:]  To join two operands end-to-end.
\index{concatenation}

\item[comment:]  Information in a program that is meant for other
programmers (or anyone reading the source code) and has no effect on the
execution of the program.
\index{comment}

\item[syntax error:]  An error in a program that makes it impossible
to parse (and therefore impossible to interpret).
\index{syntax error}

\item[exception:]  An error that is detected while the program is running.
\index{exception}

\item[semantics:]  The meaning of a program.
\index{semantics}

\item[semantic error:]   An error in a program that makes it do something
other than what the programmer intended.
\index{semantic error}

\end{description}


\section{Exercises}

\begin{exercise}

Repeating my advice from the previous chapter, whenever you learn
a new feature, you should try it out in interactive mode and make
errors on purpose to see what goes wrong.

\begin{itemize}

\item We've seen that {\tt n = 42} is legal.  What about {\tt 42 = n}?

\item How about {\tt x = y = 1}?

\item In some languages every statement ends with a semi-colon, {\tt ;}.
What happens if you put a semi-colon at the end of a Python statement?

\item What if you put a period at the end of a statement?

\item In math notation you can multiply $x$ and $y$ like this: $x y$.
What happens if you try that in Python?

\end{itemize}

\end{exercise}


\begin{exercise}

Practice using the Python interpreter as a calculator:
\index{calculator}

\begin{enumerate}

\item The volume of a sphere with radius $r$ is $\frac{4}{3} \pi r^3$.
  What is the volume of a sphere with radius 5?

\item Suppose the cover price of a book is \$24.95, but bookstores get a
  40\% discount.  Shipping costs \$3 for the first copy and 75 cents
  for each additional copy.  What is the total wholesale cost for
  60 copies?

\item If I leave my house at 6:52 am and run 1 mile at an easy pace
  (8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at
  easy pace again, what time do I get home for breakfast?
\index{running pace}

\end{enumerate}
\end{exercise}


\chapter{Functions}
\label{funcchap}

In the context of programming, a {\bf function} is a named sequence of
statements that performs a computation.  When you define a function,
you specify the name and the sequence of statements.  Later, you can
``call'' the function by name.
\index{function}

\section{Function calls}
\label{functionchap}
\index{function call}

We have already seen one example of a {\bf function call}:

\begin{verbatim}
>>> type(42)
<class 'int'>
\end{verbatim}
%
The name of the function is {\tt type}.  The expression in parentheses
is called the {\bf argument} of the function.  The result, for this
function, is the type of the argument.
\index{parentheses!argument in}

It is common to say that a function ``takes'' an argument and ``returns''
a result.  The result is also called the {\bf return value}.
\index{argument}
\index{return value}

Python provides functions that convert values
from one type to another.  The {\tt int} function takes any value and
converts it to an integer, if it can, or complains otherwise:
\index{conversion!type}
\index{type conversion}
\index{int function}
\index{function!int}

\begin{verbatim}
>>> int('32')
32
>>> int('Hello')
ValueError: invalid literal for int() with base 10: 'Hello'
\end{verbatim}
%
{\tt int} can convert floating-point values to integers, but it
doesn't round off; it chops off the fraction part:

\begin{verbatim}
>>> int(3.99999)
3
>>> int(-2.3)
-2
\end{verbatim}
%
{\tt float} converts integers and strings to floating-point
numbers:
\index{float function}
\index{function!float}

\begin{verbatim}
>>> float(32)
32.0
>>> float('3.14159')
3.14159
\end{verbatim}
%
Finally, {\tt str} converts its argument to a string:
\index{str function}
\index{function!str}

\begin{verbatim}
>>> str(32)
'32'
>>> str(3.14159)
'3.14159'
\end{verbatim}
%
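As a quick check of the conversions above, the following sketch
contrasts {\tt int}, which chops off the fraction part, with the built-in
{\tt round}, which rounds to the nearest integer.  {\tt round} is not
used elsewhere in this section and appears here only for comparison;
the variable names are illustrative.

\begin{verbatim}
# int() drops the fraction part; round() goes to the nearest integer.
truncated = int(3.99999)
rounded = round(3.99999)

# For negative numbers, int() moves toward zero, not downward.
negative = int(-2.3)

# Conversions can be composed: string -> float -> int.
from_string = int(float('3.14159'))

print(truncated, rounded, negative, from_string)
\end{verbatim}
%
If the conversions behave as described, this script should print
{\tt 3 4 -2 3}.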

\section{Math functions}
\index{math function}
\index{function!math}

Python has a math module that provides most of the familiar
mathematical functions.  A {\bf module} is a file that contains a
collection of related functions.
\index{module}
\index{module object}

Before we can use the functions in a module, we have to import it with
an {\bf import statement}:

\begin{verbatim}
>>> import math
\end{verbatim}
%
This statement creates a {\bf module object} named math.  If
you display the module object, you get some information about it:

\begin{verbatim}
>>> math
<module 'math' (built-in)>
\end{verbatim}
%
The module object contains the functions and variables defined in the
module.  To access one of the functions, you have to specify the name
of the module and the name of the function, separated by a dot (also
known as a period).  This format is called {\bf dot notation}.
\index{dot notation}

\begin{verbatim}
>>> ratio = signal_power / noise_power
>>> decibels = 10 * math.log10(ratio)

>>> radians = 0.7
>>> height = math.sin(radians)
\end{verbatim}
%
The first example uses \verb"math.log10" to compute
a signal-to-noise ratio in decibels (assuming that \verb"signal_power" and
\verb"noise_power" are defined).  The math module also provides {\tt log},
which computes logarithms base {\tt e}.
\index{log function}
\index{function!log}
\index{sine function}
\index{radian}
\index{trigonometric function}
\index{function!trigonometric}

The second example finds the sine of {\tt radians}.  The variable name
{\tt radians} is a hint that {\tt sin} and the other trigonometric
functions ({\tt cos}, {\tt tan}, etc.)  take arguments in radians. To
convert from degrees to radians, divide by 180 and multiply by
$\pi$:

\begin{verbatim}
>>> degrees = 45
>>> radians = degrees / 180.0 * math.pi
>>> math.sin(radians)
0.707106781187
\end{verbatim}
%
The expression {\tt math.pi} gets the variable {\tt pi} from the math
module.  Its value is a floating-point approximation
of $\pi$, accurate to about 15 digits.
\index{pi}

If you know
trigonometry, you can check the previous result by comparing it to
the square root of two, divided by two:
\index{sqrt function}
\index{function!sqrt}

\begin{verbatim}
>>> math.sqrt(2) / 2.0
0.707106781187
\end{verbatim}
%
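Both examples from this section can be run as a single script.  This
is a sketch under assumptions: the values of \verb"signal_power" and
\verb"noise_power" are made up for illustration, and
{\tt math.isclose} and {\tt math.radians}, which the text does not
use, are brought in to compare floating-point results without relying
on how many digits Python prints.

\begin{verbatim}
import math

# Hypothetical measurements, chosen so the ratio is a power of ten.
signal_power = 100.0
noise_power = 1.0
ratio = signal_power / noise_power
decibels = 10 * math.log10(ratio)

# Degrees to radians: divide by 180 and multiply by pi.
degrees = 45
radians = degrees / 180.0 * math.pi

# math.radians performs the same conversion.
same_angle = math.isclose(radians, math.radians(degrees))

# sin(45 degrees) should match sqrt(2) / 2 up to rounding error.
same_sine = math.isclose(math.sin(radians), math.sqrt(2) / 2.0)

print(decibels)
print(same_angle, same_sine)
\end{verbatim}
%
Comparing with {\tt math.isclose} rather than {\tt ==} matters here:
recent versions of Python display more digits than the 12 shown above,
and the two ways of computing the sine may differ in the last one.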

\section{Composition}
\index{composition}

So far, we have looked at the elements of a program---variables,
expressions, and statements---in isolation, without talking about how to
combine them.

One of the most useful features of programming languages is their
ability to take small building blocks and {\bf compose} them.  For
example, the argument of a function can be any kind of expression,
including arithmetic operators:

\begin{verbatim}
x = math.sin(degrees / 360.0 * 2 * math.pi)
\end{verbatim}
%
And even function calls:

\begin{verbatim}
x = math.exp(math.log(x+1))
\end{verbatim}
%
Almost anywhere you can put a value, you can put an arbitrary
expression, with one exception: the left side of an assignment
statement has to be a variable name.  Any other expression on the left
side is a syntax error (we will see exceptions to this rule
later).

\begin{verbatim}
>>> minutes = hours * 60                 # right
>>> hours * 60 = minutes                 # wrong!
SyntaxError: can't assign to operator
\end{verbatim}
%
\index{SyntaxError}
\index{exception!SyntaxError}
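
The two kinds of composition above can be combined in a short script.
This is only a sketch: the values of {\tt degrees} and {\tt x} are
arbitrary, and {\tt math.isclose}, which the text does not use, makes
the comparisons because floating-point results are exact only up to
rounding error.

\begin{verbatim}
import math

degrees = 90
x = 3.0

# An arithmetic expression as the argument of a function.
height = math.sin(degrees / 360.0 * 2 * math.pi)

# A function call as the argument of another function call.
roundtrip = math.exp(math.log(x + 1))

print(math.isclose(height, 1.0))
print(math.isclose(roundtrip, x + 1))
\end{verbatim}
%
Both lines should display {\tt True}: the sine of 90 degrees is 1, and
{\tt exp} undoes {\tt log}.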


\section{Adding new functions}

So far, we have only been using the functions that come with Python,
but it is also possible to add new functions.
A {\bf function definition} specifies the name of a new function and
the sequence of statements that run when the function is called.
\index{function}
\index{function definition}
\index{definition!function}

Here is an example:

\begin{verbatim}
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")
\end{verbatim}
%
{\tt def} is a keyword that indicates that this is a function
definition.  The name of the function is \verb"print_lyrics".  The
rules for function names are the same as for variable names: letters,
numbers and underscore are legal, but the first character
can't be a number.  You can't use a keyword as the name of a function,
and you should avoid having a variable and a function with the same
name.
\index{def keyword}
\index{keyword!def}
\index{argument}

The empty parentheses after the name indicate that this function
doesn't take any arguments.
\index{parentheses!empty}
\index{header}
\index{body}
\index{indentation}
\index{colon}

The first line of the function definition is called the {\bf header};
the rest is called the {\bf body}.  The header has to end with a colon
and the body has to be indented.  By convention, indentation is
always four spaces.  The body can contain
any number of statements.

The strings in the print statements are enclosed in double
quotes.  Single quotes and double quotes do the same thing;
most people use single quotes except in cases like this where
a single quote (which is also an apostrophe) appears in the string.

All quotation marks (single and double)
must be ``straight quotes'', usually
located next to Enter on the keyboard.  ``Curly quotes'', like
the ones in this sentence, are not legal in Python.

If you type a function definition in interactive mode, the interpreter
prints dots ({\tt ...}) to let you know that the definition
isn't complete:
\index{ellipses}

\begin{verbatim}
>>> def print_lyrics():
...     print("I'm a lumberjack, and I'm okay.")
...     print("I sleep all night and I work all day.")
...
\end{verbatim}
%
To end the function, you have to enter an empty line.

Defining a function creates a {\bf function object}, which
has type \verb"function":
\index{function type}
\index{type!function}

\begin{verbatim}
>>> print(print_lyrics)
<function print_lyrics at 0xb7e99e9c>
>>> type(print_lyrics)
<class 'function'>
\end{verbatim}
%
The syntax for calling the new function is the same as
for built-in functions:

\begin{verbatim}
>>> print_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
\end{verbatim}
%
Once you have defined a function, you can use it inside another
function.  For example, to repeat the previous refrain, we could write
a function called \verb"repeat_lyrics":

\begin{verbatim}
def repeat_lyrics():
    print_lyrics()
    print_lyrics()
\end{verbatim}
%
And then call \verb"repeat_lyrics":

\begin{verbatim}
>>> repeat_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
\end{verbatim}
%
But that's not really how the song goes.


\section{Definitions and uses}
\index{function definition}

Pulling together the code fragments from the previous section, the
whole program looks like this:

\begin{verbatim}
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

def repeat_lyrics():
    print_lyrics()
    print_lyrics()

repeat_lyrics()
\end{verbatim}
%
This program contains two function definitions: \verb"print_lyrics" and
\verb"repeat_lyrics".  Function definitions get executed just like other
statements, but the effect is to create function objects.  The statements
inside the function do not run until the function is called, and
the function definition generates no output.
\index{use before def}

As you might expect, you have to create a function before you can
run it.  In other words, the function definition has to run
before the function gets called.

As an exercise, move the last line of this program
to the top, so the function call appears before the definitions. Run
the program and see what error
message you get.

Now move the function call back to the bottom
and move the definition of \verb"print_lyrics" after the definition of
\verb"repeat_lyrics".  What happens when you run this program?


\section{Flow of execution}
\index{flow of execution}

To ensure that a function is defined before its first use,
you have to know the order statements run in, which is
called the {\bf flow of execution}.

Execution always begins at the first statement of the program.
Statements are run one at a time, in order from top to bottom.

Function definitions do not alter the flow of execution of the
program, but remember that statements inside the function don't
run until the function is called.

A function call is like a detour in the flow of execution. Instead of
going to the next statement, the flow jumps to the body of
the function, runs the statements there, and then comes back
to pick up where it left off.

That sounds simple enough, until you remember that one function can
call another.  While in the middle of one function, the program might
have to run the statements in another function.  Then, while
running that new function, the program might have to run yet
another function!

Fortunately, Python is good at keeping track of where it is, so each
time a function completes, the program picks up where it left off in
the function that called it.  When it gets to the end of the program,
it terminates.

In summary, when you read a program, you
don't always want to read from top to bottom.  Sometimes it makes
more sense if you follow the flow of execution.
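
One way to see the flow of execution is to record each step as it
happens.  The sketch below does this with a list and its {\tt append}
method, which have not been introduced yet; the function names
{\tt inner} and {\tt outer} are made up for this illustration.

\begin{verbatim}
trace = []   # collects a label for each step, in the order it runs

def inner():
    trace.append('inner body')

def outer():
    trace.append('outer start')
    inner()                      # detour into inner, then come back
    trace.append('outer end')

# The definitions above ran, but nothing has been recorded yet.
trace.append('before call')
outer()
trace.append('after call')

print(trace)
\end{verbatim}
%
The labels appear in the order the statements ran, not the order in
which they appear in the file: {\tt 'before call'} comes first even
though the function bodies are written above it.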


\section{Parameters and arguments}
\label{parameters}
\index{parameter}
\index{function parameter}
\index{argument}
\index{function argument}

Some of the functions we have seen require arguments.  For
example, when you call {\tt math.sin} you pass a number
as an argument.  Some functions take more than one argument:
{\tt math.pow} takes two, the base and the exponent.

Inside the function, the arguments are assigned to
variables called {\bf parameters}.  Here is a definition for
a function that takes an argument:
\index{parentheses!parameters in}

\begin{verbatim}
def print_twice(bruce):
    print(bruce)
    print(bruce)
\end{verbatim}
%
This function assigns the argument to a parameter
named {\tt bruce}.  When the function is called, it prints the value of
the parameter (whatever it is) twice.

This function works with any value that can be printed.

\begin{verbatim}
>>> print_twice('Spam')
Spam
Spam
>>> print_twice(42)
42
42
>>> print_twice(math.pi)
3.141592653589793
3.141592653589793
\end{verbatim}
%
The same rules of composition that apply to built-in functions also
apply to programmer-defined functions, so we can use any kind of expression
as an argument for \verb"print_twice":
\index{composition}
\index{programmer-defined function}
\index{function!programmer defined}

\begin{verbatim}
>>> print_twice('Spam '*4)
Spam Spam Spam Spam
Spam Spam Spam Spam
>>> print_twice(math.cos(math.pi))
-1.0
-1.0
\end{verbatim}
%
The argument is evaluated before the function is called, so
in the examples the expressions \verb"'Spam '*4" and
{\tt math.cos(math.pi)} are only evaluated once.
\index{argument}
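
To confirm that an argument is evaluated only once, the sketch below
passes a call to a hypothetical helper, \verb"noisy_value", that
records each time it runs.  The helper uses a list and a {\tt return}
statement, neither of which has been covered yet.

\begin{verbatim}
evaluations = []   # one entry is added each time noisy_value runs

def noisy_value():
    evaluations.append('evaluated')
    return 'Spam'

def print_twice(bruce):
    print(bruce)
    print(bruce)

# noisy_value() runs first; its result is then assigned to bruce.
print_twice(noisy_value())
print(len(evaluations))
\end{verbatim}
%
Although the value is printed twice, the last line should display 1,
because the expression was evaluated once, before the call.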

You can also use a variable as an argument:

\begin{verbatim}
>>> michael = 'Eric, the half a bee.'
>>> print_twice(michael)
Eric, the half a bee.
Eric, the half a bee.
\end{verbatim}
%
The name of the variable we pass as an argument ({\tt michael}) has
nothing to do with the name of the parameter ({\tt bruce}).  It
doesn't matter what the value was called back home (in the caller);
here in \verb"print_twice", we call everybody {\tt bruce}.


\section{Variables and parameters are local}
\index{local variable}
\index{variable!local}

When you create a variable inside a function, it is {\bf local},
which means that it only
exists inside the function.  For example:
\index{parentheses!parameters in}

\begin{verbatim}
def cat_twice(part1, part2):
    cat = part1 + part2
    print_twice(cat)
\end{verbatim}
%
This function takes two arguments, concatenates them, and prints
the result twice.  Here is an example that uses it:
\index{concatenation}

\begin{verbatim}
>>> line1 = 'Bing tiddle '
>>> line2 = 'tiddle bang.'
>>> cat_twice(line1, line2)
Bing tiddle tiddle bang.
Bing tiddle tiddle bang.
\end{verbatim}
%
When \verb"cat_twice" terminates, the variable {\tt cat}
is destroyed.  If we try to print it, we get an exception:
\index{NameError}
\index{exception!NameError}

\begin{verbatim}
>>> print(cat)
NameError: name 'cat' is not defined
\end{verbatim}
%
Parameters are also local.
For example, outside \verb"print_twice", there is no
such thing as {\tt bruce}.
\index{parameter}
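
The same point can be checked in a script.  This sketch uses a
{\tt try} statement, which has not been introduced at this point, to
catch the {\tt NameError} instead of stopping the program; the flag
\verb"cat_visible" is a name made up for this example.

\begin{verbatim}
def print_twice(bruce):
    print(bruce)
    print(bruce)

def cat_twice(part1, part2):
    cat = part1 + part2      # cat exists only while cat_twice runs
    print_twice(cat)

cat_twice('Bing tiddle ', 'tiddle bang.')

# Back at the top level, the local variable cat is gone.
try:
    cat
    cat_visible = True
except NameError:
    cat_visible = False

print(cat_visible)
\end{verbatim}
%
The last line should display {\tt False}, since {\tt cat} was never
defined outside \verb"cat_twice".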


\section{Stack diagrams}
\label{stackdiagram}
\index{stack diagram}
\index{function frame}
\index{frame}

To keep track of which variables can be used where, it is sometimes
useful to draw a {\bf stack diagram}.  Like state diagrams, stack
diagrams show the value of each variable, but they also show the
function each variable belongs to.
\index{stack diagram}
\index{diagram!stack}

Each function is represented by a {\bf frame}.  A frame is a box with
the name of a function beside it and the parameters and variables of
the function inside it.  The stack diagram for the previous example is
shown in Figure~\ref{fig.stack}.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/stack.pdf}}
\caption{Stack diagram.}
\label{fig.stack}
\end{figure}


The frames are arranged in a stack that indicates which function
called which, and so on.  In this example, \verb"print_twice"
was called by \verb"cat_twice", and \verb"cat_twice" was called by
\verb"__main__", which is a special name for the topmost frame.  When
you create a variable outside of any function, it belongs to
\verb"__main__".

\index{main}

Each parameter refers to the same value as its corresponding
argument.  So, {\tt part1} has the same value as
{\tt line1}, {\tt part2} has the same value as {\tt line2},
and {\tt bruce} has the same value as {\tt cat}.

If an error occurs during a function call, Python prints the
name of the function, the name of the function that called
it, and the name of the function that called {\em that}, all the
way back to \verb"__main__".

For example, if you try to access {\tt cat} from within
\verb"print_twice", you get a {\tt NameError}:

\begin{verbatim}
Traceback (innermost last):
  File "test.py", line 13, in __main__
    cat_twice(line1, line2)
  File "test.py", line 5, in cat_twice
    print_twice(cat)
  File "test.py", line 9, in print_twice
    print(cat)
NameError: name 'cat' is not defined
\end{verbatim}
%
This list of functions is called a {\bf traceback}.  It tells you what
program file the error occurred in, and what line, and what functions
were executing at the time.  It also shows the line of code that
caused the error.
\index{traceback}

The order of the functions in the traceback is the same as the
order of the frames in the stack diagram.  The function that is
currently running is at the bottom.
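
Python can also report the chain of calls while a program is running
normally.  The sketch below is a variant of the previous example that
uses the standard {\tt traceback} module; \verb"stack_names" is a name
invented for this illustration, and the {\tt for} loop and list
operations it relies on have not been introduced yet.

\begin{verbatim}
import traceback

stack_names = []   # filled in by print_twice

def print_twice(bruce):
    # Names of the functions that are running, outermost first.
    for frame in traceback.extract_stack():
        stack_names.append(frame.name)
    print(bruce)
    print(bruce)

def cat_twice(part1, part2):
    cat = part1 + part2
    print_twice(cat)

cat_twice('Bing tiddle ', 'tiddle bang.')
print(stack_names[-2:])
\end{verbatim}
%
The last two names should be \verb"cat_twice" and \verb"print_twice",
in the same order as the frames in the stack diagram.  Note that
current versions of Python label the outermost frame of a script
\verb"<module>" rather than \verb"__main__", and begin the report with
{\tt Traceback (most recent call last)}.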


\section{Fruitful functions and void functions}
\index{fruitful function}
\index{void function}
\index{function!fruitful}
\index{function!void}

Some of the functions we have used, such as the math functions, return
results; for lack of a better name, I call them {\bf fruitful
  functions}.  Other functions, like \verb"print_twice", perform an
action but don't return a value.  They are called {\bf void
  functions}.

When you call a fruitful function, you almost always
want to do something with the result; for example, you might
assign it to a variable or use it as part of an expression:

\begin{verbatim}
x = math.cos(radians)
golden = (math.sqrt(5) + 1) / 2
\end{verbatim}
%
When you call a function in interactive mode, Python displays
the result:

\begin{verbatim}
>>> math.sqrt(5)
2.23606797749979
\end{verbatim}
%
But in a script, if you call a fruitful function all by itself,
the return value is lost forever!

\begin{verbatim}
math.sqrt(5)
\end{verbatim}
%
This script computes the square root of 5, but since it doesn't store
or display the result, it is not very useful.
\index{interactive mode}
\index{script mode}

Void functions might display something on the screen or have some
other effect, but they don't have a return value.  If you
assign the result to a variable, you get a special value called
{\tt None}.
\index{None special value}
\index{special value!None}

\begin{verbatim}
>>> result = print_twice('Bing')
Bing
Bing
>>> print(result)
None
\end{verbatim}
%
The value {\tt None} is not the same as the string \verb"'None'".
It is a special value that has its own type:

\begin{verbatim}
>>> type(None)
<class 'NoneType'>
\end{verbatim}
%
The functions we have written so far are all void.  We will start
writing fruitful functions in a few chapters.
\index{NoneType type}
\index{type!NoneType}
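
The difference between the two kinds of function can be seen side by
side in a script.  This is a minimal sketch that reuses
\verb"print_twice" from earlier in the chapter; the {\tt is} operator
it uses to test for {\tt None} has not been introduced yet.

\begin{verbatim}
import math

def print_twice(bruce):
    print(bruce)
    print(bruce)

# A fruitful function: the return value can be stored and used.
golden = (math.sqrt(5) + 1) / 2

# A void function: the only thing to store is None.
result = print_twice('Bing')

print(result is None)      # None itself
print(result == 'None')    # not the string 'None'
\end{verbatim}
%
The last two lines should display {\tt True} and {\tt False}.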


\section{Why functions?}
\index{function!reasons for}

It may not be clear why it is worth the trouble to divide
a program into functions.  There are several reasons:

\begin{itemize}

\item Creating a new function gives you an opportunity to name a group
of statements, which makes your program easier to read and debug.

\item Functions can make a program smaller by eliminating repetitive
code.  Later, if you make a change, you only have
to make it in one place.

\item Dividing a long program into functions allows you to debug the
parts one at a time and then assemble them into a working whole.

\item Well-designed functions are often useful for many programs.
Once you write and debug one, you can reuse it.

\end{itemize}


\section{Debugging}

One of the most important skills you will acquire is debugging.
Although it can be frustrating, debugging is one of the most
intellectually rich, challenging, and interesting parts of
programming.
\index{experimental debugging}
\index{debugging!experimental}

In some ways debugging is like detective work.  You are confronted
with clues and you have to infer the processes and events that led
to the results you see.

Debugging is also like an experimental science.  Once you have an idea
about what is going wrong, you modify your program and try again.  If
your hypothesis was correct, you can predict the result of the
modification, and you take a step closer to a working program.  If
your hypothesis was wrong, you have to come up with a new one.  As
Sherlock Holmes pointed out, ``When you have eliminated the
impossible, whatever remains, however improbable, must be the truth.''
(A. Conan Doyle, {\em The Sign of Four})
\index{Holmes, Sherlock}
\index{Doyle, Arthur Conan}

For some people, programming and debugging are the same thing.  That
is, programming is the process of gradually debugging a program until
it does what you want.  The idea is that you should start with a
working program and make small modifications,
debugging them as you go.

For example, Linux is an operating system that contains millions of
lines of code, but it started out as a simple program Linus Torvalds
used to explore the Intel 80386 chip.  According to Larry Greenfield,
``One of Linus's earlier projects was a program that would switch
between printing AAAA and BBBB.  This later evolved to Linux.''
({\em The Linux Users' Guide} Beta Version 1).
\index{Linux}


\section{Glossary}

\begin{description}

\item[function:] A named sequence of statements that performs some
useful operation.  Functions may or may not take arguments and may or
may not produce a result.
\index{function}

\item[function definition:]  A statement that creates a new function,
specifying its name, parameters, and the statements it contains.
\index{function definition}

\item[function object:]  A value created by a function definition.
The name of the function is a variable that refers to a function
object.
\index{function object}

\item[header:] The first line of a function definition.
\index{header}

\item[body:] The sequence of statements inside a function definition.
\index{body}

\item[parameter:] A name used inside a function to refer to the value
passed as an argument.
\index{parameter}

\item[function call:] A statement that runs a function. It
consists of the function name followed by an argument list in
parentheses.
\index{function call}

\item[argument:]  A value provided to a function when the function is called.
This value is assigned to the corresponding parameter in the function.
\index{argument}

\item[local variable:]  A variable defined inside a function.  A local
variable can only be used inside its function.
\index{local variable}

\item[return value:]  The result of a function.  If a function call
is used as an expression, the return value is the value of
the expression.
\index{return value}

\item[fruitful function:] A function that returns a value.
\index{fruitful function}

\item[void function:] A function that always returns {\tt None}.
\index{void function}

\item[{\tt None}:]  A special value returned by void functions.
\index{None special value}
\index{special value!None}

\item[module:] A file that contains a
collection of related functions and other definitions.
\index{module}

\item[import statement:] A statement that reads a module file and creates
a module object.
\index{import statement}
\index{statement!import}

\item[module object:] A value created by an {\tt import} statement
that provides access to the values defined in a module.
\index{module object}

\item[dot notation:]  The syntax for calling a function in another
module by specifying the module name followed by a dot (period) and
the function name.
\index{dot notation}

\item[composition:] Using an expression as part of a larger expression,
or a statement as part of a larger statement.
\index{composition}

\item[flow of execution:]  The order statements run in.
\index{flow of execution}

\item[stack diagram:]  A graphical representation of a stack of functions,
their variables, and the values they refer to.
\index{stack diagram}

\item[frame:]  A box in a stack diagram that represents a function call.
It contains the local variables and parameters of the function.
\index{function frame}
\index{frame}

\item[traceback:]  A list of the functions that are executing,
printed when an exception occurs.
\index{traceback}


\end{description}


\section{Exercises}

\begin{exercise}
\index{len function}
\index{function!len}

Write a function named \verb"right_justify" that takes a string
named {\tt s} as a parameter and prints the string with enough
leading spaces so that the last letter of the string is in column 70
of the display.

\begin{verbatim}
>>> right_justify('monty')
                                                                 monty
\end{verbatim}

Hint: Use string concatenation and repetition.  Also,
Python provides a built-in function called {\tt len} that
returns the length of a string, so the value of \verb"len('monty')" is 5.

\end{exercise}


\begin{exercise}
\index{function object}
\index{object!function}

A function object is a value you can assign to a variable
or pass as an argument.  For example, \verb"do_twice" is a function
that takes a function object as an argument and calls it twice:

\begin{verbatim}
def do_twice(f):
    f()
    f()
\end{verbatim}

Here's an example that uses \verb"do_twice" to call a function
named \verb"print_spam" twice.

\begin{verbatim}
def print_spam():
    print('spam')

do_twice(print_spam)
\end{verbatim}

\begin{enumerate}

\item Type this example into a script and test it.

\item Modify \verb"do_twice" so that it takes two arguments, a
function object and a value, and calls the function twice,
passing the value as an argument.

\item Copy the definition of
\verb"print_twice" from earlier in this chapter to your script.

\item Use the modified version of \verb"do_twice" to call
\verb"print_twice" twice, passing \verb"'spam'" as an argument.

2746
\item Define a new function called 
2747
\verb"do_four" that takes a function object and a value
2748
and calls the function four times, passing the value
2749
as a parameter.  There should be only
2750
two statements in the body of this function, not four.
2751

2752
\end{enumerate}
2753

2754
Solution: \url{http://thinkpython2.com/code/do_four.py}.
2755

2756
\end{exercise}



\begin{exercise}

Note: This exercise should be
done using only the statements and other features we have learned so
far.

\begin{enumerate}

\item Write a function that draws a grid like the following:
\index{grid}

\begin{verbatim}
+ - - - - + - - - - +
|         |         |
|         |         |
|         |         |
|         |         |
+ - - - - + - - - - +
|         |         |
|         |         |
|         |         |
|         |         |
+ - - - - + - - - - +
\end{verbatim}
%
Hint: to print more than one value on a line, you can print
a comma-separated sequence of values:

\begin{verbatim}
print('+', '-')
\end{verbatim}
%
By default, {\tt print} advances to the next line, but you
can override that behavior and put a space at the end, like this:

\begin{verbatim}
print('+', end=' ')
print('-')
\end{verbatim}
%
The output of these statements is \verb"'+ -'" on the same line.
The output from the next print statement would begin on the next line.

\item Write a function that draws a similar grid
with four rows and four columns.

\end{enumerate}

Solution: \url{http://thinkpython2.com/code/grid.py}.
Credit: This exercise is based on an exercise in Oualline, {\em
    Practical C Programming, Third Edition}, O'Reilly Media, 1997.

\end{exercise}




\chapter{Case study: interface design}
\label{turtlechap}

This chapter presents a case study that demonstrates a process for
designing functions that work together.

It introduces the {\tt turtle} module, which allows you to
create images using turtle graphics.  The {\tt turtle} module is
included in most Python installations, but if you are running Python
using PythonAnywhere, you won't be able to run the turtle examples (at
least you couldn't when I wrote this).

If you have already installed Python on your computer, you should
be able to run the examples.  Otherwise, now is a good time
to install.  I have posted instructions at
\url{http://tinyurl.com/thinkpython2e}.

Code examples from this chapter are available from
\url{http://thinkpython2.com/code/polygon.py}.


\section{The turtle module}
\label{turtle}

To check whether you have the {\tt turtle} module, open the Python
interpreter and type

\begin{verbatim}
>>> import turtle
>>> bob = turtle.Turtle()
\end{verbatim}

When you run this code, it should create a new window
with a small arrow that represents the turtle.  Close the window.

Create a file named {\tt mypolygon.py} and type in the following
code:

\begin{verbatim}
import turtle
bob = turtle.Turtle()
print(bob)
turtle.mainloop()
\end{verbatim}
%
The {\tt turtle} module (with a lowercase 't') provides a function
called {\tt Turtle} (with an uppercase 'T') that creates a Turtle
object, which we assign to a variable named {\tt bob}.
Printing {\tt bob} displays something like:

\begin{verbatim}
<turtle.Turtle object at 0xb7bfbf4c>
\end{verbatim}
%
This means that {\tt bob} refers to an object with type
{\tt Turtle}
as defined in module {\tt turtle}.

\verb"mainloop" tells the window to wait for the user
to do something, although in this case there's not much for
the user to do except close the window.

Once you create a Turtle, you can call a {\bf method} to move it
around the window.  A method is similar to a function, but it
uses slightly different syntax.  For example, to move the turtle
forward:

\begin{verbatim}
bob.fd(100)
\end{verbatim}
%
The method, {\tt fd}, is associated with the turtle
object we're calling {\tt bob}.
Calling a method is like making a request: you are asking {\tt bob}
to move forward.

The argument of {\tt fd} is a distance in pixels, so the actual
size depends on your display.

Other methods you can call on a Turtle are {\tt bk} to move
backward, {\tt lt} for left turn, and {\tt rt} for right turn.  The
argument for {\tt lt} and {\tt rt} is an angle in degrees.

Also, each Turtle is holding a pen, which is
either down or up; if the pen is down, the Turtle leaves
a trail when it moves.  The methods {\tt pu} and {\tt pd}
stand for ``pen up'' and ``pen down''.

To draw a right angle, add these lines to the program
(after creating {\tt bob} and before calling \verb"mainloop"):

\begin{verbatim}
bob.fd(100)
bob.lt(90)
bob.fd(100)
\end{verbatim}
%
When you run this program, you should see {\tt bob} move east and then
north, leaving two line segments behind.

Now modify the program to draw a square.  Don't go on until
you've got it working!

%\newpage

\section{Simple repetition}
\label{repetition}
\index{repetition}

Chances are you wrote something like this:

\begin{verbatim}
bob.fd(100)
bob.lt(90)

bob.fd(100)
bob.lt(90)

bob.fd(100)
bob.lt(90)

bob.fd(100)
\end{verbatim}
%
We can do the same thing more concisely with a {\tt for} statement.
Add this example to {\tt mypolygon.py} and run it again:
\index{for loop}
\index{loop!for}
\index{statement!for}

\begin{verbatim}
for i in range(4):
    print('Hello!')
\end{verbatim}
%
You should see something like this:

\begin{verbatim}
Hello!
Hello!
Hello!
Hello!
\end{verbatim}
%
This is the simplest use of the {\tt for} statement; we will see
more later.  But that should be enough to let you rewrite your
square-drawing program.  Don't go on until you do.

Here is a {\tt for} statement that draws a square:

\begin{verbatim}
for i in range(4):
    bob.fd(100)
    bob.lt(90)
\end{verbatim}
%
The syntax of a {\tt for} statement is similar to a function
definition.  It has a header that ends with a colon and an indented
body.  The body can contain any number of statements.

A {\tt for} statement is also called a {\bf loop} because
the flow of execution runs through the body and then loops back
to the top.  In this case, it runs the body four times.
\index{loop}

This version is actually a little different from the previous
square-drawing code because it makes another turn after
drawing the last side of the square.  The extra turn takes
more time, but it simplifies the code if we do the same thing
every time through the loop.  This version also has the effect
of leaving the turtle back in the starting position, facing in
the starting direction.
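
To see why the loop leaves the turtle where it started, you can follow
its position and heading with ordinary arithmetic.  The following is
only a sketch: it does not use the {\tt turtle} module, and
\verb"simulate_square" is a made-up function that imitates {\tt fd}
and {\tt lt}, assuming the turtle starts at the origin facing east.
It also uses a {\tt return} statement, which this chapter has not
introduced.

\begin{verbatim}
import math

def simulate_square(length):
    """Imitates four rounds of fd(length) and lt(90) without
    opening a window.  Returns the final x, y and heading
    (in degrees).
    """
    x, y, heading = 0.0, 0.0, 0.0    # at the origin, facing east
    for i in range(4):
        # fd: move forward along the current heading
        x = x + length * math.cos(math.radians(heading))
        y = y + length * math.sin(math.radians(heading))
        # lt: turn left (counterclockwise)
        heading = heading + 90
    return x, y, heading

x, y, heading = simulate_square(100)
print(round(x), round(y), heading)
\end{verbatim}
%
After four sides and four left turns, the position is back at the
origin (apart from floating-point rounding error) and the heading has
increased by 360 degrees, which is one full turn.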

\section{Exercises}

The following is a series of exercises using the {\tt turtle} module.  They
are meant to be fun, but they have a point, too.  While you are
working on them, think about what the point is.

The following sections have solutions to the exercises, so
don't look until you have finished (or at least tried).

\begin{enumerate}

\item Write a function called {\tt square} that takes a parameter
named {\tt t}, which is a turtle.  It should use the turtle to draw
a square.

Write a function call that passes {\tt bob} as an argument to
{\tt square}, and then run the program again.

\item Add another parameter, named {\tt length}, to {\tt square}.
Modify the body so the length of the sides is {\tt length}, and then
modify the function call to provide a second argument.  Run the
program again.  Test your program with a range of values for {\tt
length}.

\item Make a copy of {\tt square} and change the name to {\tt
  polygon}.  Add another parameter named {\tt n} and modify the body
  so it draws an n-sided regular polygon.  Hint: The exterior angles
  of an n-sided regular polygon are $360/n$ degrees.  \index{polygon
    function} \index{function!polygon}

\item Write a function called {\tt circle} that takes a turtle,
{\tt t}, and radius, {\tt r}, as parameters and that draws an
approximate circle by calling {\tt polygon} with an appropriate
length and number of sides.  Test your function with a range of values
of {\tt r}.  \index{circle function} \index{function!circle}

Hint: figure out the circumference of the circle and make sure that
{\tt length * n = circumference}.

\item Make a more general version of {\tt circle} called {\tt arc}
that takes an additional parameter {\tt angle}, which determines
what fraction of a circle to draw.  {\tt angle} is in units of
degrees, so when {\tt angle=360}, {\tt arc} should draw a complete
circle.
\index{arc function}
\index{function!arc}

\end{enumerate}


\section{Encapsulation}

The first exercise asks you to put your square-drawing code
into a function definition and then call the function, passing
the turtle as an argument.  Here is a solution:

\begin{verbatim}
def square(t):
    for i in range(4):
        t.fd(100)
        t.lt(90)

square(bob)
\end{verbatim}
%
The innermost statements, {\tt fd} and {\tt lt}, are indented twice to
show that they are inside the {\tt for} loop, which is inside the
function definition.  The next line, {\tt square(bob)}, is flush with
the left margin, which indicates the end of both the {\tt for} loop
and the function definition.

Inside the function, {\tt t} refers to the same turtle {\tt bob}, so
{\tt t.lt(90)} has the same effect as {\tt bob.lt(90)}.  In that
case, why not
call the parameter {\tt bob}?  The idea is that {\tt t} can be any
turtle, not just {\tt bob}, so you could create a second turtle and
pass it as an argument to {\tt square}:

\begin{verbatim}
alice = turtle.Turtle()
square(alice)
\end{verbatim}
%
Wrapping a piece of code up in a function is called {\bf
encapsulation}.  One of the benefits of encapsulation is that it
attaches a name to the code, which serves as a kind of documentation.
Another advantage is that if you re-use the code, it is more concise
to call a function twice than to copy and paste the body!
\index{encapsulation}


\section{Generalization}

The next step is to add a {\tt length} parameter to {\tt square}.
Here is a solution:

\begin{verbatim}
def square(t, length):
    for i in range(4):
        t.fd(length)
        t.lt(90)

square(bob, 100)
\end{verbatim}
%
Adding a parameter to a function is called {\bf generalization}
because it makes the function more general: in the previous
version, the square is always the same size; in this version
it can be any size.
\index{generalization}

The next step is also a generalization.  Instead of drawing
squares, {\tt polygon} draws regular polygons with any number of
sides.  Here is a solution:

\begin{verbatim}
def polygon(t, n, length):
    angle = 360 / n
    for i in range(n):
        t.fd(length)
        t.lt(angle)

polygon(bob, 7, 70)
\end{verbatim}
%
This example draws a 7-sided polygon with side length 70.

If you are using Python 2, the value of {\tt angle} might be off
because of integer division.  A simple solution is to compute
{\tt angle = 360.0 / n}.  Because the numerator is a floating-point
number, the result is floating point.
\index{Python 2}

When a function has more than a few numeric arguments, it is easy to
forget what they are, or what order they should be in.  In that case
it is often a good idea to include the names of the parameters in the
argument list:

\begin{verbatim}
polygon(bob, n=7, length=70)
\end{verbatim}
%
These are called {\bf keyword arguments} because they include
the parameter names as ``keywords'' (not to be confused with
Python keywords like {\tt while} and {\tt def}).
\index{keyword argument}
\index{argument!keyword}

This syntax makes the program more readable.  It is also a reminder
about how arguments and parameters work: when you call a function, the
arguments are assigned to the parameters.
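
Here is a small sketch of how keyword arguments are matched to
parameters.  The function \verb"describe_polygon" is hypothetical; it
has the same {\tt n} and {\tt length} parameters as {\tt polygon}, but
it returns a string instead of drawing, so it runs without a turtle:

\begin{verbatim}
def describe_polygon(n, length):
    """Returns a short description; stands in for polygon."""
    return str(n) + ' sides of length ' + str(length)

# Positional arguments are assigned in order.
print(describe_polygon(7, 70))

# Keyword arguments are assigned by name, so their order
# doesn't matter.
print(describe_polygon(length=70, n=7))
\end{verbatim}
%
Both calls display the same description.  One rule to keep in mind is
that positional arguments have to come before keyword arguments in
the argument list.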


\section{Interface design}

The next step is to write {\tt circle}, which takes a radius,
{\tt r}, as a parameter.  Here is a simple solution that uses
{\tt polygon} to draw a 50-sided polygon:

\begin{verbatim}
import math

def circle(t, r):
    circumference = 2 * math.pi * r
    n = 50
    length = circumference / n
    polygon(t, n, length)
\end{verbatim}
%
The first line computes the circumference of a circle with radius
{\tt r} using the formula $2 \pi r$.  Since we use {\tt math.pi}, we
have to import {\tt math}.  By convention, {\tt import} statements
are usually at the beginning of the script.

{\tt n} is the number of line segments in our approximation of a circle,
so {\tt length} is the length of each segment.  Thus, {\tt polygon}
draws a 50-sided polygon that approximates a circle with radius {\tt r}.

One limitation of this solution is that {\tt n} is a constant, which
means that for very big circles, the line segments are too long, and
for small circles, we waste time drawing very small segments.  One
solution would be to generalize the function by taking {\tt n} as
a parameter.  This would give the user (whoever calls {\tt circle})
more control, but the interface would be less clean.
\index{interface}

The {\bf interface} of a function is a summary of how it is used: what
are the parameters?  What does the function do?  And what is the return
value?  An interface is ``clean'' if it allows the caller to do
what they want without dealing with unnecessary details.

In this example, {\tt r} belongs in the interface because it
specifies the circle to be drawn.  {\tt n} is less appropriate
because it pertains to the details of {\em how} the circle should
be rendered.

Rather than clutter up the interface, it is better
to choose an appropriate value of {\tt n}
depending on {\tt circumference}:

\begin{verbatim}
def circle(t, r):
    circumference = 2 * math.pi * r
    n = int(circumference / 3) + 3
    length = circumference / n
    polygon(t, n, length)
\end{verbatim}
%
Now the number of segments is an integer near {\tt circumference/3},
so the length of each segment is approximately 3, which is small
enough that the circles look good, but big enough to be efficient,
and acceptable for any size circle.

Adding 3 to {\tt n} guarantees that the polygon has at least 3 sides.
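
To check these claims, here is a sketch that repeats the arithmetic
from {\tt circle} without drawing anything.  The function name
\verb"circle_segments" and the list of radii are made up for this
illustration:

\begin{verbatim}
import math

def circle_segments(r):
    """Returns the number of segments and the segment length
    that circle would use for radius r.
    """
    circumference = 2 * math.pi * r
    n = int(circumference / 3) + 3
    length = circumference / n
    return n, length

for r in [0.1, 10, 100, 1000]:
    n, length = circle_segments(r)
    print(r, n, round(length, 2))
\end{verbatim}
%
For the tiny circle, {\tt n} is 3 and the segments are much shorter
than 3.  As {\tt r} grows, the segment length gets closer to 3
but stays just below it.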


\section{Refactoring}
\label{refactoring}
\index{refactoring}

When I wrote {\tt circle}, I was able to re-use {\tt polygon}
because a many-sided polygon is a good approximation of a circle.
But {\tt arc} is not as cooperative; we can't use {\tt polygon}
or {\tt circle} to draw an arc.

One alternative is to start with a copy
of {\tt polygon} and transform it into {\tt arc}.  The result
might look like this:

\begin{verbatim}
def arc(t, r, angle):
    arc_length = 2 * math.pi * r * angle / 360
    n = int(arc_length / 3) + 1
    step_length = arc_length / n
    step_angle = angle / n

    for i in range(n):
        t.fd(step_length)
        t.lt(step_angle)
\end{verbatim}
%
The second half of this function looks like {\tt polygon}, but we
can't re-use {\tt polygon} without changing the interface.  We could
generalize {\tt polygon} to take an angle as a third argument,
but then {\tt polygon} would no longer be an appropriate name!
Instead, let's call the more general function {\tt polyline}:

\begin{verbatim}
def polyline(t, n, length, angle):
    for i in range(n):
        t.fd(length)
        t.lt(angle)
\end{verbatim}
%
Now we can rewrite {\tt polygon} and {\tt arc} to use {\tt polyline}:

\begin{verbatim}
def polygon(t, n, length):
    angle = 360.0 / n
    polyline(t, n, length, angle)

def arc(t, r, angle):
    arc_length = 2 * math.pi * r * angle / 360
    n = int(arc_length / 3) + 1
    step_length = arc_length / n
    step_angle = float(angle) / n
    polyline(t, n, step_length, step_angle)
\end{verbatim}
%
Finally, we can rewrite {\tt circle} to use {\tt arc}:

\begin{verbatim}
def circle(t, r):
    arc(t, r, 360)
\end{verbatim}
%
This process---rearranging a program to improve
interfaces and facilitate code re-use---is called {\bf refactoring}.
In this case, we noticed that there was similar code in {\tt arc} and
{\tt polygon}, so we ``factored it out'' into {\tt polyline}.
\index{refactoring}

If we had planned ahead, we might have written {\tt polyline} first
and avoided refactoring, but often you don't know enough at the
beginning of a project to design all the interfaces.  Once you start
coding, you understand the problem better.  Sometimes refactoring is a
sign that you have learned something.
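
After refactoring, it is worth checking that the program still
behaves the same way.  The following sketch is one way to do that
without opening a window.  The class {\tt FakeTurtle} is hypothetical
and is not part of the {\tt turtle} module: it provides {\tt fd} and
{\tt lt} methods that add up distances and angles instead of drawing.
It uses a class definition, which this chapter has not introduced.

\begin{verbatim}
import math

class FakeTurtle:
    """Hypothetical stand-in for a Turtle: records moves."""
    def __init__(self):
        self.distance = 0.0   # total distance moved forward
        self.turn = 0.0       # total left turn, in degrees

    def fd(self, length):
        self.distance += length

    def lt(self, angle):
        self.turn += angle

def polyline(t, n, length, angle):
    for i in range(n):
        t.fd(length)
        t.lt(angle)

def arc(t, r, angle):
    arc_length = 2 * math.pi * r * angle / 360
    n = int(arc_length / 3) + 1
    step_length = arc_length / n
    step_angle = float(angle) / n
    polyline(t, n, step_length, step_angle)

def circle(t, r):
    arc(t, r, 360)

fake = FakeTurtle()
circle(fake, 50)
print(round(fake.distance, 2), round(fake.turn, 2))
\end{verbatim}
%
The total distance should be close to the circumference,
$2 \pi \cdot 50 \approx 314.16$, and the total turn should be close
to 360 degrees, which is what we expect from a complete circle.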


\section{A development plan}
\index{development plan!encapsulation and generalization}

A {\bf development plan} is a process for writing programs.  The
process we used in this case study is ``encapsulation and
generalization''.  The steps of this process are:

\begin{enumerate}

\item Start by writing a small program with no function definitions.

\item Once you get the program working, identify a coherent piece of
  it, encapsulate the piece in a function and give it a name.

\item Generalize the function by adding appropriate parameters.

\item Repeat steps 1--3 until you have a set of working functions.
Copy and paste working code to avoid retyping (and re-debugging).

\item Look for opportunities to improve the program by refactoring.
For example, if you have similar code in several places, consider
factoring it into an appropriately general function.

\end{enumerate}

This process has some drawbacks---we will see alternatives later---but
it can be useful if you don't know ahead of time how to divide the
program into functions.  This approach lets you design as you go
along.


\section{docstring}
\label{docstring}
\index{docstring}

A {\bf docstring} is a string at the beginning of a function that
explains the interface (``doc'' is short for ``documentation'').  Here
is an example:

\begin{verbatim}
def polyline(t, n, length, angle):
    """Draws n line segments with the given length and
    angle (in degrees) between them.  t is a turtle.
    """
    for i in range(n):
        t.fd(length)
        t.lt(angle)
\end{verbatim}
%
By convention, all docstrings are triple-quoted strings, also known
as multiline strings because the triple quotes allow the string
to span more than one line.
\index{quotation mark}
\index{triple-quoted string}
\index{string!triple-quoted}
\index{multiline string}
\index{string!multiline}

It is terse, but it contains the essential information
someone would need to use this function.  It explains concisely what
the function does (without getting into the details of how it does
it).  It explains what effect each parameter has on the behavior of
the function and what type each parameter should be (if it is not
obvious).

Writing this kind of documentation is an important part of interface
design.  A well-designed interface should be simple to explain;
if you have a hard time explaining one of your functions,
maybe the interface could be improved.
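
A docstring is more than a comment: Python stores it with the
function object, so you can read it back while the program is running.
As a brief sketch, the attribute \verb"__doc__" holds the docstring of
the {\tt polyline} function defined above:

\begin{verbatim}
def polyline(t, n, length, angle):
    """Draws n line segments with the given length and
    angle (in degrees) between them.  t is a turtle.
    """
    for i in range(n):
        t.fd(length)
        t.lt(angle)

# The docstring is stored with the function object.
print(polyline.__doc__)
\end{verbatim}
%
In interactive mode, the built-in function {\tt help} displays the
same text along with the function header, so typing
{\tt help(polyline)} is a quick way to look up an interface.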


\section{Debugging}
\index{debugging}
\index{interface}

An interface is like a contract between a function and a caller.
The caller agrees to provide certain parameters and the function
agrees to do certain work.

For example, {\tt polyline} requires four arguments: {\tt t} has to be
a Turtle; {\tt n} has to be an
integer; {\tt length} should be a positive number; and {\tt
  angle} has to be a number, which is understood to be in degrees.

These requirements are called {\bf preconditions} because they
are supposed to be true before the function starts executing.
Conversely, conditions at the end of the function are
{\bf postconditions}.  Postconditions include the intended
effect of the function (like drawing line segments) and any
side effects (like moving the Turtle or making other changes).
\index{precondition}
\index{postcondition}

Preconditions are the responsibility of the caller.  If the caller
violates a (properly documented!) precondition and the function
doesn't work correctly, the bug is in the caller, not the function.

If the preconditions are satisfied and the postconditions are
not, the bug is in the function.  If your pre- and postconditions
are clear, they can help with debugging.
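
One way to make preconditions explicit is to check them in code.  The
following is only a sketch: the helper \verb"check_polyline_args" is
hypothetical, it does not check {\tt t}, and it relies on {\tt if}
statements, {\tt return} statements and the built-in function
{\tt isinstance}, none of which this chapter has introduced.

\begin{verbatim}
def check_polyline_args(n, length, angle):
    """Returns a message describing the first violated
    precondition of polyline, or 'ok' if there is none.
    """
    if not isinstance(n, int):
        return 'n has to be an integer'
    if not isinstance(length, (int, float)) or length <= 0:
        return 'length should be a positive number'
    if not isinstance(angle, (int, float)):
        return 'angle has to be a number'
    return 'ok'

print(check_polyline_args(4, 100, 90))
print(check_polyline_args(4.5, 100, 90))
print(check_polyline_args(4, -100, 90))
\end{verbatim}
%
The first call satisfies the preconditions; the second and third
report which requirement the caller violated.  A check like this
tells you quickly whether a bug is in the caller or in the function.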


\section{Glossary}

\begin{description}

\item[method:] A function that is associated with an object and called
using dot notation.
\index{method}

\item[loop:] A part of a program that can run repeatedly.
\index{loop}

\item[encapsulation:] The process of transforming a sequence of
statements into a function definition.
\index{encapsulation}

\item[generalization:] The process of replacing something
unnecessarily specific (like a number) with something appropriately
general (like a variable or parameter).
\index{generalization}

\item[keyword argument:] An argument that includes the name of
the parameter as a ``keyword''.
\index{keyword argument}
\index{argument!keyword}

\item[interface:] A description of how to use a function, including
the name and descriptions of the arguments and return value.
\index{interface}

\item[refactoring:] The process of modifying a working program to
  improve function interfaces and other qualities of the code.
\index{refactoring}

\item[development plan:] A process for writing programs.
\index{development plan}

\item[docstring:] A string that appears at the top of a function
  definition to document the function's interface.
\index{docstring}

\item[precondition:] A requirement that should be satisfied by
the caller before a function starts.
\index{precondition}

\item[postcondition:] A requirement that should be satisfied by
the function before it ends.
\index{postcondition}

\end{description}


\section{Exercises}

\begin{exercise}

Download the code in this chapter from
\url{http://thinkpython2.com/code/polygon.py}.

\begin{enumerate}

\item Draw a stack diagram that shows the state of the program
while executing {\tt circle(bob, radius)}.  You can do the
arithmetic by hand or add {\tt print} statements to the code.
\index{stack diagram}

\item The version of {\tt arc} in Section~\ref{refactoring} is not
very accurate because the linear approximation of the
circle is always outside the true circle.  As a result,
the Turtle ends up a few pixels away from the correct
destination.  My solution shows a way to reduce
the effect of this error.  Read the code and see if it makes
sense to you.  If you draw a diagram, you might see how it works.

\end{enumerate}

\end{exercise}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/flowers.pdf}}
\caption{Turtle flowers.}
\label{fig.flowers}
\end{figure}

\begin{exercise}
\index{flower}

Write an appropriately general set of functions that
can draw flowers as in Figure~\ref{fig.flowers}.

Solution: \url{http://thinkpython2.com/code/flower.py},
also requires \url{http://thinkpython2.com/code/polygon.py}.

\end{exercise}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/pies.pdf}}
\caption{Turtle pies.}
\label{fig.pies}
\end{figure}


\begin{exercise}
\index{pie}

Write an appropriately general set of functions that
can draw shapes as in Figure~\ref{fig.pies}.

Solution: \url{http://thinkpython2.com/code/pie.py}.

\end{exercise}

\begin{exercise}
\index{alphabet}
\index{turtle typewriter}
\index{typewriter, turtle}

The letters of the alphabet can be constructed from a moderate number
of basic elements, like vertical and horizontal lines and a few
curves.  Design an alphabet that can be drawn with a minimal
number of basic elements and then write functions that draw the letters.

You should write one function for each letter, with names
\verb"draw_a", \verb"draw_b", etc., and put your functions
in a file named {\tt letters.py}.  You can download a
``turtle typewriter'' from \url{http://thinkpython2.com/code/typewriter.py}
to help you test your code.

You can get a solution from \url{http://thinkpython2.com/code/letters.py};
it also requires
\url{http://thinkpython2.com/code/polygon.py}.

\end{exercise}

\begin{exercise}

Read about spirals at \url{http://en.wikipedia.org/wiki/Spiral}; then
write a program that draws an Archimedean spiral (or one of the other
kinds).  Solution: \url{http://thinkpython2.com/code/spiral.py}.
\index{spiral}
\index{Archimedean spiral}

\end{exercise}


\chapter{Conditionals and recursion}

The main topic of this chapter is the {\tt if} statement, which
executes different code depending on the state of the program.
But first I want to introduce two new operators: floor division
and modulus.


\section{Floor division and modulus}

The {\bf floor division} operator, \verb"//", divides
two numbers and rounds down to an integer.  For example, suppose the
run time of a movie is 105 minutes.  You might want to know how
long that is in hours.  Conventional division
returns a floating-point number:

\begin{verbatim}
>>> minutes = 105
>>> minutes / 60
1.75
\end{verbatim}

But we don't normally write hours with decimal points.  Floor
division returns the integer number of hours, rounding down:

\begin{verbatim}
>>> minutes = 105
>>> hours = minutes // 60
>>> hours
1
\end{verbatim}

To get the remainder, you could subtract off one hour in minutes:

\begin{verbatim}
>>> remainder = minutes - hours * 60
>>> remainder
45
\end{verbatim}

\index{floor division}
\index{floating-point division}
\index{division!floor}
\index{division!floating-point}
\index{modulus operator}
\index{operator!modulus}

An alternative is to use the {\bf modulus operator}, \verb"%", which
divides two numbers and returns the remainder.

\begin{verbatim}
>>> remainder = minutes % 60
>>> remainder
45
\end{verbatim}
%
The modulus operator is more useful than it seems.  For
example, you can check whether one number is divisible by another---if
{\tt x \% y} is zero, then {\tt x} is divisible by {\tt y}.
\index{divisibility}

Also, you can extract the right-most digit
or digits from a number.  For example, {\tt x \% 10} yields the
right-most digit of {\tt x} (in base 10).  Similarly {\tt x \% 100}
yields the last two digits.

If you are using Python 2, division works differently.  The
division operator, \verb"/", performs floor division if both
operands are integers, and floating-point division if either
operand is a {\tt float}.
\index{Python 2}
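
The operators in this section can be tried out together in a short
script.  The following sketch, written for Python 3, uses the same
movie run time as above; the built-in function {\tt divmod} and the
number 1234 are additions for illustration.

\begin{verbatim}
minutes = 105

# Floor division and modulus split a duration into
# hours and minutes.
hours = minutes // 60
remainder = minutes % 60
print(hours, remainder)        # prints 1 45

# The built-in function divmod computes both at once.
print(divmod(minutes, 60))     # prints (1, 45)

# 105 is divisible by 5, so the remainder is zero.
print(minutes % 5)             # prints 0

# The right-most digit and the last two digits of 1234.
x = 1234
print(x % 10, x % 100)         # prints 4 34
\end{verbatim}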


\section{Boolean expressions}
\index{boolean expression}
\index{expression!boolean}
\index{logical operator}
\index{operator!logical}

A {\bf boolean expression} is an expression that is either true
or false.  The following examples use the
operator {\tt ==}, which compares two operands and produces
{\tt True} if they are equal and {\tt False} otherwise:

\begin{verbatim}
>>> 5 == 5
True
>>> 5 == 6
False
\end{verbatim}
%
{\tt True} and {\tt False} are special
values that belong to the type {\tt bool}; they are not strings:
\index{True special value}
\index{False special value}
\index{special value!True}
\index{special value!False}
\index{bool type}
\index{type!bool}

\begin{verbatim}
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
\end{verbatim}
%
The {\tt ==} operator is one of the {\bf relational operators}; the
others are:

\begin{verbatim}
      x != y               # x is not equal to y
      x > y                # x is greater than y
      x < y                # x is less than y
      x >= y               # x is greater than or equal to y
      x <= y               # x is less than or equal to y
\end{verbatim}
%
Although these operations are probably familiar to you, the Python
symbols are different from the mathematical symbols.  A common error
is to use a single equal sign ({\tt =}) instead of a double equal sign
({\tt ==}).  Remember that {\tt =} is an assignment operator and
{\tt ==} is a relational operator.   There is no such thing as
{\tt =<} or {\tt =>}.
\index{relational operator}
\index{operator!relational}


\section{Logical operators}
\index{logical operator}
\index{operator!logical}

There are three {\bf logical operators}: {\tt and}, {\tt
or}, and {\tt not}.  The semantics (meaning) of these operators is
similar to their meaning in English.  For example,
{\tt x > 0 and x < 10} is true only if {\tt x} is greater than 0
{\em and} less than 10.
\index{and operator}
\index{or operator}
\index{not operator}
\index{operator!and}
\index{operator!or}
\index{operator!not}

{\tt n\%2 == 0 or n\%3 == 0} is true if {\em either or both} of the
conditions is true, that is, if the number is divisible by 2 {\em or}
3.

Finally, the {\tt not} operator negates a boolean
expression, so {\tt not (x > y)} is true if {\tt x > y} is false,
that is, if {\tt x} is less than or equal to {\tt y}.

Strictly speaking, the operands of the logical operators should be
boolean expressions, but Python is not very strict.
Any nonzero number is interpreted as {\tt True}:

\begin{verbatim}
>>> 42 and True
True
\end{verbatim}
%
This flexibility can be useful, but there are some subtleties to
it that might be confusing.  You might want to avoid it (unless
you know what you are doing).
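
One of those subtleties is sketched below with arbitrary sample values: {\tt and} and {\tt or} return one of their operands rather than always producing {\tt True} or {\tt False}, while {\tt not} always produces a {\tt bool}.

```python
# 'and' returns its first operand if that operand counts as false;
# otherwise it returns the second operand.
a = 42 and 'hello'    # 42 counts as true, so the result is 'hello'
b = 0 and 'hello'     # 0 counts as false, so the result is 0

# 'or' returns its first operand if that operand counts as true;
# otherwise it returns the second operand.
c = 0 or 'default'    # 0 counts as false, so the result is 'default'

# 'not' always produces a bool.
d = not 42            # False

print(a, b, c, d)
```

Because the result is not always a {\tt bool}, code that compares it with {\tt True} or {\tt False} using {\tt ==} can behave unexpectedly.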


\section{Conditional execution}
\label{conditional.execution}

\index{conditional statement}
\index{statement!conditional}
\index{if statement}
\index{statement!if}
\index{conditional execution}
In order to write useful programs, we almost always need the ability
to check conditions and change the behavior of the program
accordingly.  {\bf Conditional statements} give us this ability.  The
simplest form is the {\tt if} statement:

\begin{verbatim}
if x > 0:
    print('x is positive')
\end{verbatim}
%
The boolean expression after {\tt if} is
called the {\bf condition}.  If it is true, the indented
statement runs.  If not, nothing happens.
\index{condition}
\index{compound statement}
\index{statement!compound}

{\tt if} statements have the same structure as function definitions:
a header followed by an indented body.  Statements like this are
called {\bf compound statements}.

There is no limit on the number of statements that can appear in
the body, but there has to be at least one.
Occasionally, it is useful to have a body with no statements (usually
as a placeholder for code you haven't written yet).  In that
case, you can use the {\tt pass} statement, which does nothing.
\index{pass statement}
\index{statement!pass}

\begin{verbatim}
if x < 0:
    pass          # TODO: need to handle negative values!
\end{verbatim}
%

\section{Alternative execution}
\label{alternative.execution}
\index{alternative execution}
\index{else keyword}
\index{keyword!else}

A second form of the {\tt if} statement is ``alternative execution'',
in which there are two possibilities and the condition determines
which one runs.  The syntax looks like this:

\begin{verbatim}
if x % 2 == 0:
    print('x is even')
else:
    print('x is odd')
\end{verbatim}
%
If the remainder when {\tt x} is divided by 2 is 0, then we know that
{\tt x} is even, and the program displays an appropriate message.  If
the condition is false, the second set of statements runs.
Since the condition must be true or false, exactly one of the
alternatives will run.  The alternatives are called {\bf
  branches}, because they are branches in the flow of execution.
\index{branch}



\section{Chained conditionals}
\index{chained conditional}
\index{conditional!chained}

Sometimes there are more than two possibilities and we need more than
two branches.  One way to express a computation like that is a {\bf
chained conditional}:

\begin{verbatim}
if x < y:
    print('x is less than y')
elif x > y:
    print('x is greater than y')
else:
    print('x and y are equal')
\end{verbatim}
%
{\tt elif} is an abbreviation of ``else if''.  Again, exactly one
branch will run.  There is no limit on the number of {\tt
elif} clauses.  If there is an {\tt else} clause, it has to be
at the end, but there doesn't have to be one.
\index{elif keyword}
\index{keyword!elif}

\begin{verbatim}
if choice == 'a':
    draw_a()
elif choice == 'b':
    draw_b()
elif choice == 'c':
    draw_c()
\end{verbatim}
%
Each condition is checked in order.  If the first is false,
the next is checked, and so on.  If one of them is
true, the corresponding branch runs and the statement
ends.  Even if more than one condition is true, only the
first true branch runs.  If none of the conditions is true and
there is no {\tt else} clause, no branch runs.
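
To see why the order of the conditions matters, consider the following sketch.  The variable {\tt label} and the thresholds are made up for illustration; the conditions overlap on purpose.

```python
x = 150

if x > 0:
    label = 'positive'
elif x > 100:
    # This condition is also true for x = 150, but the branch is
    # skipped because an earlier condition was already true.
    label = 'large'
else:
    label = 'not positive'

print(label)
```

With these conditions the second branch can never run, because any value greater than 100 is also greater than 0.  Checking {\tt x > 100} first would make both branches reachable.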


\section{Nested conditionals}
\index{nested conditional}
\index{conditional!nested}

One conditional can also be nested within another.  We could have
written the example in the previous section like this:

\begin{verbatim}
if x == y:
    print('x and y are equal')
else:
    if x < y:
        print('x is less than y')
    else:
        print('x is greater than y')
\end{verbatim}
%
The outer conditional contains two branches.  The
first branch contains a simple statement.  The second branch
contains another {\tt if} statement, which has two branches of its
own.  Those two branches are both simple statements,
although they could have been conditional statements as well.

Although the indentation of the statements makes the structure
apparent, {\bf nested conditionals} become difficult to read very
quickly.  It is a good idea to avoid them when you can.

Logical operators often provide a way to simplify nested conditional
statements.  For example, we can rewrite the following code using a
single conditional:

\begin{verbatim}
if 0 < x:
    if x < 10:
        print('x is a positive single-digit number.')
\end{verbatim}
%
The call to {\tt print} runs only if we make it past both
conditionals, so we can get the same effect with the {\tt and} operator:

\begin{verbatim}
if 0 < x and x < 10:
    print('x is a positive single-digit number.')
\end{verbatim}

For this kind of condition, Python provides a more concise option:

\begin{verbatim}
if 0 < x < 10:
    print('x is a positive single-digit number.')
\end{verbatim}
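
As a sanity check, the following sketch compares the three forms for a small range of integers.  The range from -2 to 12 and the variable names are arbitrary choices; this is an illustration, not a proof.

```python
mismatches = 0

for x in range(-2, 13):
    # Nested form
    nested = False
    if 0 < x:
        if x < 10:
            nested = True

    # Single condition with 'and'
    combined = 0 < x and x < 10

    # Chained comparison
    chained = 0 < x < 10

    if nested != combined or combined != chained:
        mismatches = mismatches + 1

print('mismatches:', mismatches)
```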


\section{Recursion}
\label{recursion}
\index{recursion}

It is legal for one function to call another;
it is also legal for a function to call itself.  It may not be obvious
why that is a good thing, but it turns out to be one of the most
magical things a program can do.
For example, look at the following function:

\begin{verbatim}
def countdown(n):
    if n <= 0:
        print('Blastoff!')
    else:
        print(n)
        countdown(n-1)
\end{verbatim}
%
If {\tt n} is 0 or negative, it outputs the word, ``Blastoff!''
Otherwise, it outputs {\tt n} and then calls a function named {\tt
countdown}---itself---passing {\tt n-1} as an argument.

What happens if we call this function like this?

\begin{verbatim}
>>> countdown(3)
\end{verbatim}
%
The execution of {\tt countdown} begins with {\tt n=3}, and since
{\tt n} is greater than 0, it outputs the value 3, and then calls itself...

\begin{quote}
The execution of {\tt countdown} begins with {\tt n=2}, and since
{\tt n} is greater than 0, it outputs the value 2, and then calls itself...

\begin{quote}
The execution of {\tt countdown} begins with {\tt n=1}, and since
{\tt n} is greater than 0, it outputs the value 1, and then calls itself...

\begin{quote}
The execution of {\tt countdown} begins with {\tt n=0}, and since {\tt
n} is not greater than 0, it outputs the word, ``Blastoff!'' and then
returns.
\end{quote}

The {\tt countdown} that got {\tt n=1} returns.
\end{quote}

The {\tt countdown} that got {\tt n=2} returns.
\end{quote}

The {\tt countdown} that got {\tt n=3} returns.

And then you're back in \verb"__main__".  So, the
total output looks like this:
\index{main}

\begin{verbatim}
3
2
1
Blastoff!
\end{verbatim}
%
A function that calls itself is {\bf recursive}; the process of
executing it is called {\bf recursion}.
\index{recursion}
\index{function!recursive}

As another example, we can write a function that prints a
string {\tt n} times.

\begin{verbatim}
def print_n(s, n):
    if n <= 0:
        return
    print(s)
    print_n(s, n-1)
\end{verbatim}
%
If {\tt n <= 0}, the {\bf return statement} exits the function.  The
flow of execution immediately returns to the caller, and the remaining
lines of the function don't run.
\index{return statement}
\index{statement!return}

The rest of the function is similar to {\tt countdown}: it displays
{\tt s} and then calls itself to display {\tt s} $n-1$ additional
times.  So the number of lines of output is {\tt 1 + (n - 1)}, which
adds up to {\tt n}.

For simple examples like this, it is probably easier to use a {\tt
for} loop.  But we will see examples later that are hard to write
with a {\tt for} loop and easy to write with recursion, so it is
good to start early.
\index{for loop}
\index{loop!for}
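
For comparison, here is a sketch of the same behavior written with a {\tt for} loop.  The name \verb"print_n_loop" is made up for this illustration.

```python
def print_n_loop(s, n):
    """Print the string s n times; print nothing if n <= 0."""
    # range(n) is empty when n is 0 or negative, so the body never
    # runs in that case, matching the base case of print_n.
    for i in range(n):
        print(s)

print_n_loop('Hello', 3)
```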


\section{Stack diagrams for recursive functions}
\label{recursive.stack}
\index{stack diagram}
\index{function frame}
\index{frame}

In Section~\ref{stackdiagram}, we used a stack diagram to represent
the state of a program during a function call.  The same kind of
diagram can help interpret a recursive function.

Every time a function gets called, Python creates a
frame to contain the function's local variables and parameters.
For a recursive function, there might be more than one frame on the
stack at the same time.

Figure~\ref{fig.stack2} shows a stack diagram for {\tt countdown} called with
{\tt n = 3}.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/stack2.pdf}}
\caption{Stack diagram.}
\label{fig.stack2}
\end{figure}


As usual, the top of the stack is the frame for \verb"__main__".
It is empty because we did not create any variables in
\verb"__main__" or pass any arguments to it.
\index{base case}
\index{recursion!base case}

The four {\tt countdown} frames have different values for the
parameter {\tt n}.  The bottom of the stack, where {\tt n=0}, is
called the {\bf base case}.  It does not make a recursive call, so
there are no more frames.

As an exercise, draw a stack diagram for \verb"print_n" called with
\verb"s = 'Hello'" and {\tt n=2}.
Then write a function called \verb"do_n" that takes a function
object and a number, {\tt n}, as arguments, and that calls
the given function {\tt n} times.
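
One way to watch the {\tt countdown} frames pile up and unwind is to add tracing.  The sketch below is a hypothetical variant, \verb"countdown_traced", with an extra parameter {\tt depth} (an assumption, not part of the original function) that controls indentation, so each level of indentation corresponds to one more frame on the stack.

```python
def countdown_traced(n, depth):
    """Like countdown, but report when each call starts and ends."""
    indent = '    ' * depth
    print(indent + 'enter countdown, n =', n)
    if n <= 0:
        print(indent + 'Blastoff!')
    else:
        # The recursive call gets its own frame, one level deeper.
        countdown_traced(n - 1, depth + 1)
    print(indent + 'leave countdown, n =', n)

countdown_traced(3, 0)
```

The most deeply indented lines come from the base case; the ``leave'' lines then appear in reverse order as each frame returns.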


\section{Infinite recursion}
\index{infinite recursion}
\index{recursion!infinite}
\index{runtime error}
\index{error!runtime}
\index{traceback}

If a recursion never reaches a base case, it goes on making
recursive calls forever, and the program never terminates.  This is
known as {\bf infinite recursion}, and it is generally not
a good idea.  Here is a minimal program with an infinite recursion:

\begin{verbatim}
def recurse():
    recurse()
\end{verbatim}
%
In most programming environments, a program with infinite recursion
does not really run forever.  Python reports an error
message when the maximum recursion depth is reached:
\index{exception!RuntimeError}
\index{RuntimeError}
\index{exception!RecursionError}
\index{RecursionError}

\begin{verbatim}
  File "<stdin>", line 2, in recurse
  File "<stdin>", line 2, in recurse
  File "<stdin>", line 2, in recurse
                  .
                  .
                  .
  File "<stdin>", line 2, in recurse
RecursionError: maximum recursion depth exceeded
\end{verbatim}
%
In Python 3.5 and later this exception is a {\tt RecursionError},
which is a kind of {\tt RuntimeError}.
This traceback is a little bigger than the one we saw in the
previous chapter.  When the error occurs, there are about 1000
{\tt recurse} frames on the stack!

If you encounter an infinite recursion by accident, review
your function to confirm that there is a base case that does not
make a recursive call.  And if there is a base case, check whether
you are guaranteed to reach it.
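
The limit on recursion depth can be inspected with {\tt sys.getrecursionlimit}.  The following sketch assumes Python 3.5 or later, where the exception raised is {\tt RecursionError}; it also uses a {\tt try} statement, which has not been introduced yet, only so that the script can keep running after the error.

```python
import sys

def recurse():
    recurse()

limit = sys.getrecursionlimit()    # commonly 1000 by default

try:
    recurse()
    hit_limit = False
except RecursionError:
    # Python stops the recursion when the limit is reached.
    hit_limit = True

print('recursion limit:', limit)
print('hit the limit:', hit_limit)
```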


\section{Keyboard input}
\index{keyboard input}

The programs we have written so far accept no input from the user.
They just do the same thing every time.

Python provides a built-in function called {\tt input} that
stops the program and
waits for the user to type something.  When the user presses {\sf
  Return} or {\sf Enter}, the program resumes and \verb"input"
returns what the user typed as a string.  In Python 2, the same
function is called \verb"raw_input".
\index{Python 2}
\index{input function}
\index{function!input}

\begin{verbatim}
>>> text = input()
What are you waiting for?
>>> text
'What are you waiting for?'
\end{verbatim}
%
Before getting input from the user, it is a good idea to print a
prompt telling the user what to type.  \verb"input" can take a
prompt as an argument:
\index{prompt}

\begin{verbatim}
>>> name = input('What...is your name?\n')
What...is your name?
Arthur, King of the Britons!
>>> name
'Arthur, King of the Britons!'
\end{verbatim}
%
The sequence \verb"\n" at the end of the prompt represents a {\bf
  newline}, which is a special character that causes a line break.
That's why the user's input appears below the prompt.  \index{newline}

If you expect the user to type an integer, you can try to convert
the return value to {\tt int}:

\begin{verbatim}
>>> prompt = 'What...is the airspeed velocity of an unladen swallow?\n'
>>> speed = input(prompt)
What...is the airspeed velocity of an unladen swallow?
42
>>> int(speed)
42
\end{verbatim}
%
But if the user types something other than a string of digits,
you get an error:

\begin{verbatim}
>>> speed = input(prompt)
What...is the airspeed velocity of an unladen swallow?
What do you mean, an African or a European swallow?
>>> int(speed)
ValueError: invalid literal for int() with base 10
\end{verbatim}
%
We will see how to handle this kind of error later.
\index{ValueError}
\index{exception!ValueError}
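
In the meantime, one way to avoid the error using only a conditional is to check the text before converting it.  The sketch below uses a hypothetical helper, \verb"to_int_or_none", which relies on the string method {\tt isdecimal} and returns a value (return values are covered in the next chapter).  As written it rejects signs, so negative numbers are treated as invalid.

```python
def to_int_or_none(text):
    """Convert text to an int if it consists only of decimal digits.

    Returns None for anything else, including the empty string
    and numbers with a sign such as '-5'.
    """
    text = text.strip()        # ignore surrounding whitespace
    if text.isdecimal():
        return int(text)
    return None

print(to_int_or_none('42'))
print(to_int_or_none('What do you mean?'))
```

In a script, the argument would typically be the string returned by {\tt input}.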


\section{Debugging}
\label{whitespace}
\index{debugging}
\index{traceback}

When a syntax or runtime error occurs, the error message contains
a lot of information, but it can be overwhelming.  The most
useful parts are usually:

\begin{itemize}

\item What kind of error it was, and

\item Where it occurred.

\end{itemize}

Syntax errors are usually easy to find, but there are a few
gotchas.  Whitespace errors can be tricky because spaces and
tabs are invisible and we are used to ignoring them.
\index{whitespace}

\begin{verbatim}
>>> x = 5
>>>  y = 6
  File "<stdin>", line 1
    y = 6
    ^
IndentationError: unexpected indent
\end{verbatim}
%
In this example, the problem is that the second line is indented by
one space.  But the error message points to {\tt y}, which is
misleading.  In general, error messages indicate where the problem was
discovered, but the actual error might be earlier in the code,
sometimes on a previous line.
\index{error!runtime}
\index{runtime error}

The same is true of runtime errors.  Suppose you are trying
to compute a signal-to-noise ratio in decibels.  The formula
is $\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} (P_{\mathrm{signal}} / P_{\mathrm{noise}})$.  In Python,
you might write something like this:

\begin{verbatim}
import math
signal_power = 9
noise_power = 10
ratio = signal_power // noise_power
decibels = 10 * math.log10(ratio)
print(decibels)
\end{verbatim}
%
When you run this program, you get an exception:
%
\index{exception!ValueError}
\index{ValueError}

\begin{verbatim}
Traceback (most recent call last):
  File "snr.py", line 5, in <module>
    decibels = 10 * math.log10(ratio)
ValueError: math domain error
\end{verbatim}
%
The error message indicates line 5, but there is nothing
wrong with that line.  To find the real error, it might be
useful to print the value of {\tt ratio}, which turns out to
be 0.  The problem is in line 4, which uses floor division
instead of floating-point division.
\index{floor division}
\index{division!floor}

You should take the time to read error messages carefully, but don't
assume that everything they say is correct.
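
Returning to the signal-to-noise example, here is a sketch of the program with floating-point division in place of floor division; everything else is unchanged.

```python
import math

signal_power = 9
noise_power = 10

# '/' performs floating-point division, so ratio is 0.9 rather than 0.
ratio = signal_power / noise_power
decibels = 10 * math.log10(ratio)
print(decibels)
```

Because the ratio is less than 1, the result is a small negative number of decibels, which is what the formula predicts when the noise is stronger than the signal.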


\section{Glossary}

\begin{description}

\item[floor division:] An operator, denoted {\tt //}, that divides two
  numbers and rounds down (toward negative infinity) to an integer.
  \index{floor division}
  \index{division!floor}

\item[modulus operator:]  An operator, denoted with a percent sign
({\tt \%}), that works on integers and returns the remainder when one
number is divided by another.
\index{modulus operator}
\index{operator!modulus}

\item[boolean expression:]  An expression whose value is either
{\tt True} or {\tt False}.
\index{boolean expression}
\index{expression!boolean}

\item[relational operator:] One of the operators that compares
its operands: {\tt ==}, {\tt !=}, {\tt >}, {\tt <}, {\tt >=}, and {\tt <=}.

\item[logical operator:] One of the operators that combines boolean
expressions: {\tt and}, {\tt or}, and {\tt not}.

\item[conditional statement:]  A statement that controls the flow of
execution depending on some condition.
\index{conditional statement}
\index{statement!conditional}

\item[condition:] The boolean expression in a conditional statement
that determines which branch runs.
\index{condition}

\item[compound statement:]  A statement that consists of a header
and a body.  The header ends with a colon (:).  The body is indented
relative to the header.
\index{compound statement}

\item[branch:] One of the alternative sequences of statements in
a conditional statement.
\index{branch}

\item[chained conditional:]  A conditional statement with a series
of alternative branches.
\index{chained conditional}
\index{conditional!chained}

\item[nested conditional:]  A conditional statement that appears
in one of the branches of another conditional statement.
\index{nested conditional}
\index{conditional!nested}

\item[return statement:] A statement that causes a function to
end immediately and return to the caller.

\item[recursion:]  The process of calling the function that is
currently executing.
\index{recursion}

\item[base case:]  A conditional branch in a
recursive function that does not make a recursive call.
\index{base case}

\item[infinite recursion:]  A recursion that doesn't have a
base case, or never reaches it.  Eventually, an infinite recursion
causes a runtime error.
\index{infinite recursion}

\end{description}

\section{Exercises}

\begin{exercise}

The {\tt time} module provides a function, also named {\tt time}, that
returns the current Greenwich Mean Time as the number of seconds
since ``the epoch'', which is
an arbitrary time used as a reference point.  On UNIX systems, the
epoch is 1 January 1970.

\begin{verbatim}
>>> import time
>>> time.time()
1437746094.5735958
\end{verbatim}

Write a script that reads the current time and converts it to
a time of day in hours, minutes, and seconds, plus the number of
days since the epoch.

\end{exercise}


\begin{exercise}
\index{Fermat's Last Theorem}

Fermat's Last Theorem says that there are no positive integers
$a$, $b$, and $c$ such that

\[ a^n + b^n = c^n \]
%
for any values of $n$ greater than 2.

\begin{enumerate}

\item Write a function named \verb"check_fermat" that takes four
parameters---{\tt a}, {\tt b}, {\tt c} and {\tt n}---and
checks to see if Fermat's theorem holds.  If
$n$ is greater than 2 and

\[ a^n + b^n = c^n \]
%
the program should print, ``Holy smokes, Fermat was wrong!''
Otherwise the program should print, ``No, that doesn't work.''

\item Write a function that prompts the user to input values
for {\tt a}, {\tt b}, {\tt c} and {\tt n}, converts them to
integers, and uses \verb"check_fermat" to check whether they
violate Fermat's theorem.

\end{enumerate}

\end{exercise}


\begin{exercise}
\index{triangle}

If you are given three sticks, you may or may not be able to arrange
them in a triangle.  For example, if one of the sticks is 12 inches
long and the other two are one inch long, you will
not be able to get the short sticks to meet in the middle.  For any
three lengths, there is a simple test to see if it is possible to form
a triangle:

\begin{quotation}
If any of the three lengths is greater than the sum of the other
  two, then you cannot form a triangle.  Otherwise, you
  can.  (If the sum of two lengths equals the third, they form
    what is called a ``degenerate'' triangle.)
\end{quotation}

\begin{enumerate}

\item Write a function named \verb"is_triangle" that takes three
  integers as arguments, and that prints either ``Yes'' or ``No'', depending
  on whether you can or cannot form a triangle from sticks with the
  given lengths.

\item Write a function that prompts the user to input three stick
  lengths, converts them to integers, and uses \verb"is_triangle" to
  check whether sticks with the given lengths can form a triangle.

\end{enumerate}

\end{exercise}

\begin{exercise}
What is the output of the following program?
Draw a stack diagram that shows the state of the program
when it prints the result.

\begin{verbatim}
def recurse(n, s):
    if n == 0:
        print(s)
    else:
        recurse(n-1, n+s)

recurse(3, 0)
\end{verbatim}

\begin{enumerate}

\item What would happen if you called this function like this: {\tt
  recurse(-1, 0)}?

\item Write a docstring that explains everything someone would need to
  know in order to use this function (and nothing else).

\end{enumerate}

\end{exercise}


The following exercises use the {\tt turtle} module, described in
Chapter~\ref{turtlechap}:
\index{TurtleWorld}

\begin{exercise}

Read the following function and see if you can figure out
what it does (see the examples in Chapter~\ref{turtlechap}).  Then run it
and see if you got it right.

\begin{verbatim}
def draw(t, length, n):
    if n == 0:
        return
    angle = 50
    t.fd(length*n)
    t.lt(angle)
    draw(t, length, n-1)
    t.rt(2*angle)
    draw(t, length, n-1)
    t.lt(angle)
    t.bk(length*n)
\end{verbatim}

\end{exercise}


\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/koch.pdf}}
\caption{A Koch curve.}
\label{fig.koch}
\end{figure}

\begin{exercise}
\index{Koch curve}

The Koch curve is a fractal that looks something like
Figure~\ref{fig.koch}.  To draw a Koch curve with length $x$, all you
have to do is

\begin{enumerate}

\item Draw a Koch curve with length $x/3$.

\item Turn left 60 degrees.

\item Draw a Koch curve with length $x/3$.

\item Turn right 120 degrees.

\item Draw a Koch curve with length $x/3$.

\item Turn left 60 degrees.

\item Draw a Koch curve with length $x/3$.

\end{enumerate}

The exception is if $x$ is less than 3: in that case,
you can just draw a straight line with length $x$.

\begin{enumerate}

\item Write a function called {\tt koch} that takes a turtle and
a length as parameters, and that uses the turtle to draw a Koch
curve with the given length.

\item Write a function called {\tt snowflake} that draws three
Koch curves to make the outline of a snowflake.

Solution: \url{http://thinkpython2.com/code/koch.py}.

\item The Koch curve can be generalized in several ways.  See
\url{http://en.wikipedia.org/wiki/Koch_snowflake} for examples and
implement your favorite.

\end{enumerate}
\end{exercise}


\chapter{Fruitful functions}
\label{fruitchap}

Many of the Python functions we have used, such as the math
functions, produce return values.  But the functions we've written
are all void: they have an effect, like printing a value
or moving a turtle, but they don't have a return value.  In
this chapter you will learn to write fruitful functions.


\section{Return values}
\index{return value}

Calling a fruitful function generates a return
value, which we usually assign to a variable or use as part of an
expression.

\begin{verbatim}
e = math.exp(1.0)
height = radius * math.sin(radians)
\end{verbatim}
%
The functions we have written so far are void.  Speaking casually,
they have no return value; more precisely,
their return value is {\tt None}.

In this chapter, we are (finally) going to write fruitful functions.
The first example is {\tt area}, which returns the area of a circle
with the given radius:

\begin{verbatim}
def area(radius):
    a = math.pi * radius**2
    return a
\end{verbatim}
%
We have seen the {\tt return} statement before, but in a fruitful
function the {\tt return} statement includes
an expression.  This statement means: ``Return immediately from
this function and use the following expression as a return value.''
The expression can be arbitrarily complicated, so we could
have written this function more concisely:
\index{return statement}
\index{statement!return}

\begin{verbatim}
def area(radius):
    return math.pi * radius**2
\end{verbatim}
%
On the other hand, {\bf temporary variables} like {\tt a} can make
debugging easier.
\index{temporary variable}
\index{variable!temporary}

Sometimes it is useful to have multiple return statements, one in each
branch of a conditional:

\begin{verbatim}
def absolute_value(x):
    if x < 0:
        return -x
    else:
        return x
\end{verbatim}
%
Since these {\tt return} statements are in an alternative conditional,
only one runs.

As soon as a return statement runs, the function
terminates without executing any subsequent statements.
Code that appears after a {\tt return} statement, or in any other place
the flow of execution can never reach, is called {\bf dead code}.
\index{dead code}
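
As a small illustration, the following variant of \verb"absolute_value" contains a line of dead code.  The extra {\tt print} call is added here only to make the point; it is not part of the function above.

```python
def absolute_value(x):
    if x < 0:
        return -x
    else:
        return x
    # Dead code: both branches above return, so the flow of
    # execution can never reach this line.
    print('this line never runs')

print(absolute_value(-3))
```

Python does not complain about dead code; the unreachable line is simply never executed.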

In a fruitful function, it is a good idea to ensure
that every possible path through the function hits a
{\tt return} statement.  For example:

\begin{verbatim}
def absolute_value(x):
    if x < 0:
        return -x
    if x > 0:
        return x
\end{verbatim}
%
This function is incorrect because if {\tt x} happens to be 0,
neither condition is true, and the function ends without hitting a
{\tt return} statement.  If the flow of execution gets to the end
of a function, the return value is {\tt None}, which is not
the absolute value of 0.
\index{None special value}
\index{special value!None}

\begin{verbatim}
>>> print(absolute_value(0))
None
\end{verbatim}
%
By the way, Python provides a built-in function called
{\tt abs} that computes absolute values.
\index{abs function}
\index{function!abs}

As an exercise, write a {\tt compare} function that
takes two values, {\tt x} and {\tt y}, and returns {\tt 1} if {\tt x > y},
{\tt 0} if {\tt x == y}, and {\tt -1} if {\tt x < y}.
\index{compare function}
\index{function!compare}


\section{Incremental development}
\label{incremental.development}
\index{development plan!incremental}

As you write larger functions, you might find yourself
spending more time debugging.

To deal with increasingly complex programs,
you might want to try a process called
{\bf incremental development}.  The goal of incremental development
is to avoid long debugging sessions by adding and testing only
a small amount of code at a time.
\index{testing!incremental development}
\index{Pythagorean theorem}

As an example, suppose you want to find the distance between two
points, given by the coordinates $(x_1, y_1)$ and $(x_2, y_2)$.
By the Pythagorean theorem, the distance is:

\begin{displaymath}
\mathrm{distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
\end{displaymath}
%
The first step is to consider what a {\tt distance} function should
look like in Python.  In other words, what are the inputs (parameters)
and what is the output (return value)?

In this case, the inputs are two points, which you can represent
using four numbers.  The return value is the distance represented by
a floating-point value.

Immediately you can write an outline of the function:

\begin{verbatim}
def distance(x1, y1, x2, y2):
    return 0.0
\end{verbatim}
%
Obviously, this version doesn't compute distances; it always returns
zero.  But it is syntactically correct, and it runs, which means that
you can test it before you make it more complicated.

To test the new function, call it with sample arguments:

\begin{verbatim}
>>> distance(1, 2, 4, 6)
0.0
\end{verbatim}
4617
%
4618
I chose these values so that the horizontal distance is 3 and the
4619
vertical distance is 4; that way, the result is 5, the hypotenuse 
4620
of a 3-4-5 triangle. When testing a function, it is
4621
useful to know the right answer.
4622
\index{testing!knowing the answer}
4623

4624
At this point we have confirmed that the function is syntactically
4625
correct, and we can start adding code to the body.
4626
A reasonable next step is to find the differences
4627
$x_2 - x_1$ and $y_2 - y_1$.  The next version stores those values in
4628
temporary variables and prints them.

\begin{verbatim}
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    print('dx is', dx)
    print('dy is', dy)
    return 0.0
\end{verbatim}
%
If the function is working, it should display \verb"dx is 3" and 
\verb"dy is 4".  If so, we know that the function is getting the right
arguments and performing the first computation correctly.  If not,
there are only a few lines to check.

Next we compute the sum of squares of {\tt dx} and {\tt dy}:

\begin{verbatim}
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx**2 + dy**2
    print('dsquared is: ', dsquared)
    return 0.0
\end{verbatim}
%
Again, you would run the program at this stage and check the output
(which should be 25).
Finally, you can import the {\tt math} module and use {\tt math.sqrt}
to compute and return the result:
\index{sqrt}
\index{function!sqrt}

\begin{verbatim}
import math

def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx**2 + dy**2
    result = math.sqrt(dsquared)
    return result
\end{verbatim}
%
If that works correctly, you are done.  Otherwise, you might
want to print the value of {\tt result} before the return
statement.

The final version of the function doesn't display anything when it
runs; it only returns a value.  The {\tt print} statements we wrote
are useful for debugging, but once you get the function working, you
should remove them.  Code like that is called {\bf scaffolding}
because it is helpful for building the program but is not part of the
final product.
\index{scaffolding}

When you start out, you should add only a line or two of code at a
time.  As you gain more experience, you might find yourself writing
and debugging bigger chunks.  Either way, incremental development
can save you a lot of debugging time.

The key aspects of the process are:

\begin{enumerate}

\item Start with a working program and make small incremental changes. 
At any point, if there is an error, you should have a good idea
where it is.

\item Use variables to hold intermediate values so you can
display and check them.

\item Once the program is working, you might want to remove some of
the scaffolding or consolidate multiple statements into compound
expressions, but only if it does not make the program difficult to
read.

\end{enumerate}
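
As a sketch of where this process ends up, here is the finished
{\tt distance} function with the scaffolding removed, followed by a
check against the known answer from the 3-4-5 triangle.  The
{\tt assert} statement and the tolerance {\tt 1e-9} are illustrative
choices of mine; they are not part of the development above.

```python
import math

def distance(x1, y1, x2, y2):
    """Return the distance between the points (x1, y1) and (x2, y2)."""
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx**2 + dy**2
    return math.sqrt(dsquared)

# Known answer: the legs are 3 and 4, so the hypotenuse should be 5.
expected = 5.0
actual = distance(1, 2, 4, 6)
assert abs(actual - expected) < 1e-9   # fails loudly if the result is wrong
print(actual)
```

Comparing with a small tolerance, rather than with {\tt ==}, is a
common precaution when the result is a floating-point value.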

As an exercise, use incremental development to write a function
called {\tt hypotenuse} that returns the length of the hypotenuse of a
right triangle given the lengths of the two legs as arguments.
Record each stage of the development process as you go.
\index{hypotenuse}



\section{Composition}
\index{composition}
\index{function composition}

As you should expect by now, you can call one function from within
another.  As an example, we'll write a function that takes two points,
the center of the circle and a point on the perimeter, and computes
the area of the circle.

Assume that the center point is stored in the variables {\tt xc} and
{\tt yc}, and the perimeter point is in {\tt xp} and {\tt yp}. The
first step is to find the radius of the circle, which is the distance
between the two points.  We just wrote a function, {\tt
distance}, that does that:

\begin{verbatim}
radius = distance(xc, yc, xp, yp)
\end{verbatim}
%
The next step is to find the area of a circle with that radius;
we just wrote that, too:

\begin{verbatim}
result = area(radius)
\end{verbatim}
%
Encapsulating these steps in a function, we get:
\index{encapsulation}

\begin{verbatim}
def circle_area(xc, yc, xp, yp):
    radius = distance(xc, yc, xp, yp)
    result = area(radius)
    return result
\end{verbatim}
%
The temporary variables {\tt radius} and {\tt result} are useful for
development and debugging, but once the program is working, we can
make it more concise by composing the function calls:

\begin{verbatim}
def circle_area(xc, yc, xp, yp):
    return area(distance(xc, yc, xp, yp))
\end{verbatim}
%
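To try the composed version on its own, the following sketch puts the
three functions in one file.  The bodies of {\tt area} and
{\tt distance} are condensed restatements, assumed to be equivalent to
the versions developed earlier in the chapter, and the sample points are
chosen so the radius is 5.

```python
import math

def area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius**2

def distance(x1, y1, x2, y2):
    """Return the distance between (x1, y1) and (x2, y2)."""
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)

def circle_area(xc, yc, xp, yp):
    """Return the area of the circle centered at (xc, yc) that
    passes through (xp, yp)."""
    return area(distance(xc, yc, xp, yp))

# Center (0, 0) and perimeter point (3, 4): the radius is 5,
# so the area should be 25 * pi, which is about 78.54.
print(circle_area(0, 0, 3, 4))
```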

\section{Boolean functions}
\label{boolean}

Functions can return booleans, which is often convenient for hiding
complicated tests inside functions.  \index{boolean function}
For example:

\begin{verbatim}
def is_divisible(x, y):
    if x % y == 0:
        return True
    else:
        return False
\end{verbatim}
%
It is common to give boolean functions names that sound like yes/no
questions; \verb"is_divisible" returns either {\tt True} or {\tt False}
to indicate whether {\tt x} is divisible by {\tt y}.

Here is an example:

\begin{verbatim}
>>> is_divisible(6, 4)
False
>>> is_divisible(6, 3)
True
\end{verbatim}
%
The result of the {\tt ==} operator is a boolean, so we can write the
function more concisely by returning it directly:

\begin{verbatim}
def is_divisible(x, y):
    return x % y == 0
\end{verbatim}
%
Boolean functions are often used in conditional statements:
\index{conditional statement}
\index{statement!conditional}

\begin{verbatim}
if is_divisible(x, y):
    print('x is divisible by y')
\end{verbatim}
%
It might be tempting to write something like:

\begin{verbatim}
if is_divisible(x, y) == True:
    print('x is divisible by y')
\end{verbatim}
%
But the extra comparison is unnecessary.

As an exercise, write a function \verb"is_between(x, y, z)" that
returns {\tt True} if $x \le y \le z$ and {\tt False} otherwise.


\section{More recursion}
\label{more.recursion}
\index{recursion}
\index{Turing complete language}
\index{language!Turing complete}
\index{Turing, Alan}
\index{Turing Thesis}

We have only covered a small subset of Python, but you might
be interested to know that this subset is a {\em complete}
programming language, which means that anything that can be
computed can be expressed in this language.  Any program ever written
could be rewritten using only the language features you have learned
so far (actually, you would need a few commands to control devices
like the mouse, disks, etc., but that's all).

Proving that claim is a nontrivial exercise first accomplished by Alan
Turing, one of the first computer scientists (some would argue that he
was a mathematician, but a lot of early computer scientists started as
mathematicians).  Accordingly, it is known as the Turing Thesis.
For a more complete (and accurate) discussion of the Turing Thesis,
I recommend Michael Sipser's book {\em Introduction to the
Theory of Computation}.

To give you an idea of what you can do with the tools you have learned
so far, we'll evaluate a few recursively defined mathematical
functions.  A recursive definition is similar to a circular
definition, in the sense that the definition contains a reference to
the thing being defined.  A truly circular definition is not very
useful:

\begin{description}

\item[vorpal:] An adjective used to describe something that is vorpal.
\index{vorpal}
\index{circular definition}
\index{definition!circular}

\end{description}

If you saw that definition in the dictionary, you might be annoyed. On
the other hand, if you looked up the definition of the factorial
function, denoted with the symbol $!$, you might get something like
this:
%
\begin{eqnarray*}
&&  0! = 1 \\
&&  n! = n (n-1)!
\end{eqnarray*}
%
This definition says that the factorial of 0 is 1, and the factorial
of any other value, $n$, is $n$ multiplied by the factorial of $n-1$.

So $3!$ is 3 times $2!$, which is 2 times $1!$, which is 1 times
$0!$. Putting it all together, $3!$ equals 3 times 2 times 1 times 1,
which is 6.
\index{factorial function}
\index{function!factorial}
\index{recursive definition}

If you can write a recursive definition of something, you can
write a Python program to evaluate it. The first step is to decide
what the parameters should be.  In this case it should be clear
that {\tt factorial} takes an integer:

\begin{verbatim}
def factorial(n):
\end{verbatim}
%
If the argument happens to be 0, all we have to do is return 1:

\begin{verbatim}
def factorial(n):
    if n == 0:
        return 1
\end{verbatim}
%
Otherwise, and this is the interesting part, we have to make a
recursive call to find the factorial of $n-1$ and then multiply it by
$n$:

\begin{verbatim}
def factorial(n):
    if n == 0:
        return 1
    else:
        recurse = factorial(n-1)
        result = n * recurse
        return result
\end{verbatim}
%
The flow of execution for this program is similar to the flow of {\tt
countdown} in Section~\ref{recursion}.  If we call {\tt factorial}
with the value 3:

Since 3 is not 0, we take the second branch and calculate the factorial
of {\tt n-1}...

\begin{quote}
Since 2 is not 0, we take the second branch and calculate the factorial of
{\tt n-1}...


  \begin{quote}
  Since 1 is not 0, we take the second branch and calculate the factorial
  of {\tt n-1}...


    \begin{quote}
    Since 0 equals 0, we take the first branch and return 1
    without making any more recursive calls.
    \end{quote}


  The return value, 1, is multiplied by $n$, which is 1, and the
  result is returned.
  \end{quote}


The return value, 1, is multiplied by $n$, which is 2, and the
result is returned.
\end{quote}


The return value, 2, is multiplied by $n$, which is 3, and the result, 6,
becomes the return value of the function call that started the whole
process.
\index{stack diagram}

Figure~\ref{fig.stack3} shows what the stack diagram looks like for
this sequence of function calls.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/stack3.pdf}}
\caption{Stack diagram.}
\label{fig.stack3}
\end{figure}

The return values are shown being passed back up the stack.  In each
frame, the return value is the value of {\tt result}, which is the
product of {\tt n} and {\tt recurse}.
\index{function frame}
\index{frame}

In the last frame, the local
variables {\tt recurse} and {\tt result} do not exist, because
the branch that creates them does not run.
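
One way to gain confidence in a function like this is to compare it
with an independent implementation.  The sketch below does that for the
first few values of $n$, using {\tt math.factorial} from the standard
library as the reference.  The compact form of {\tt factorial}, without
the temporary variables, and the range of values tested are my choices
for illustration.

```python
import math

def factorial(n):
    """Evaluate n! from the recursive definition.

    Assumes that n is a non-negative integer.
    """
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Compare with the standard library for n = 0, 1, ..., 5.
for n in range(6):
    assert factorial(n) == math.factorial(n)
    print(n, factorial(n))
```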


\section{Leap of faith}
\index{recursion}
\index{leap of faith}

Following the flow of execution is one way to read programs, but
it can quickly become overwhelming.  An
alternative is what I call the ``leap of faith''.  When you come to a
function call, instead of following the flow of execution, you {\em
assume} that the function works correctly and returns the right
result.

In fact, you are already practicing this leap of faith when you use
functions from Python's standard library.  When you call {\tt math.cos}
or {\tt math.exp}, you don't examine the bodies of those functions.
You just assume that they work because the people who wrote them
were good programmers.

The same is true when you call one of your own functions.  For
example, in Section~\ref{boolean}, we wrote a function called 
\verb"is_divisible" that determines whether one number is divisible by
another.  Once we have convinced ourselves that this function is
correct---by examining the code and testing---we can use the function
without looking at the body again.
\index{testing!leap of faith}

The same is true of recursive programs.  When you get to the recursive
call, instead of following the flow of execution, you should assume
that the recursive call works (returns the correct result) and then ask
yourself, ``Assuming that I can find the factorial of $n-1$, can I
compute the factorial of $n$?''  It is clear that you
can, by multiplying by $n$.

Of course, it's a bit strange to assume that the function works
correctly when you haven't finished writing it, but that's why
it's called a leap of faith!


\section{One more example}
\label{one.more.example}

\index{fibonacci function}
\index{function!fibonacci}
After {\tt factorial}, the most common example of a recursively
defined mathematical function is {\tt fibonacci}, which has the
following definition (see
  \url{http://en.wikipedia.org/wiki/Fibonacci_number}):
%
\begin{eqnarray*}
&& \mathrm{fibonacci}(0) = 0 \\
&& \mathrm{fibonacci}(1) = 1 \\
&& \mathrm{fibonacci}(n) = \mathrm{fibonacci}(n-1) + \mathrm{fibonacci}(n-2)
\end{eqnarray*}
%
Translated into Python, it looks like this:

\begin{verbatim}
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
\end{verbatim}
%
If you try to follow the flow of execution here, even for fairly
small values of $n$, your head explodes.  But according to the
leap of faith, if you assume that the two recursive calls
work correctly, then it is clear that you get
the right result by adding them together.
\index{flow of execution}
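
To get a feeling for why the flow of execution is so hard to follow,
it can help to count the function calls.  The sketch below uses a
hypothetical variant, \verb"fibonacci_counted", that returns two values
at once: the Fibonacci number and the number of calls it took to
compute it.  Returning a pair of values relies on a feature that has
not been covered yet, so treat this as an illustration rather than as
part of the example above.

```python
def fibonacci_counted(n):
    """Return (fibonacci(n), number of calls made to compute it)."""
    if n == 0:
        return 0, 1
    elif n == 1:
        return 1, 1
    else:
        a, calls_a = fibonacci_counted(n-1)
        b, calls_b = fibonacci_counted(n-2)
        # One call for this invocation, plus the calls made below it.
        return a + b, calls_a + calls_b + 1

for n in [5, 10, 20]:
    value, calls = fibonacci_counted(n)
    print(n, value, calls)
```

For $n = 5$ the function is called 15 times, for $n = 10$ it is called
177 times, and for $n = 20$ it is called 21891 times, which is far more
than anyone would want to trace by hand.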


\section{Checking types}
\label{guardian}

What happens if we call {\tt factorial} and give it 1.5 as an argument?
\index{type checking}
\index{error checking}
\index{factorial function}
\index{RuntimeError}
\index{RecursionError}

\begin{verbatim}
>>> factorial(1.5)
RecursionError: maximum recursion depth exceeded
\end{verbatim}
%
({\tt RecursionError} is a kind of {\tt RuntimeError}; the exact
wording of the message depends on the version of Python.)
It looks like an infinite recursion.  How can that be?  The function
has a base case---when {\tt n == 0}.  But if {\tt n} is not an integer,
we can {\em miss} the base case and recurse forever.
\index{infinite recursion}
\index{recursion!infinite}

In the first recursive call, the value of {\tt n} is 0.5.
In the next, it is -0.5.  From there, it gets smaller
(more negative), but it will never be 0.

We have two choices.  We can try to generalize the {\tt factorial}
function to work with floating-point numbers, or we can make {\tt
  factorial} check the type of its argument.  The first option
leads to the gamma function, which is a
little beyond the scope of this book.  So we'll go for the second.
\index{gamma function}

We can use the built-in function {\tt isinstance} to verify the type
of the argument.  While we're at it, we can also make sure the
argument is not negative:
\index{isinstance function}
\index{function!isinstance}

\begin{verbatim}
def factorial(n):
    if not isinstance(n, int):
        print('Factorial is only defined for integers.')
        return None
    elif n < 0:
        print('Factorial is not defined for negative integers.')
        return None
    elif n == 0:
        return 1
    else:
        return n * factorial(n-1)
\end{verbatim}
%
The first base case handles nonintegers; the
second handles negative integers.  In both cases, the program prints
an error message and returns {\tt None} to indicate that something
went wrong:

\begin{verbatim}
>>> print(factorial('fred'))
Factorial is only defined for integers.
None
>>> print(factorial(-2))
Factorial is not defined for negative integers.
None
\end{verbatim}
%
If we get past both checks, we know that $n$ is a non-negative
integer, so we can prove that the recursion terminates.
\index{guardian pattern}
\index{pattern!guardian}

This program demonstrates a pattern sometimes called a {\bf guardian}.
The first two conditionals act as guardians, protecting the code that
follows from values that might cause an error.  The guardians make it
possible to prove the correctness of the code.

In Section~\ref{raise} we will see a more flexible alternative to printing
an error message: raising an exception.
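
As a preview, and only as a sketch, here is one way the same two
guardians might look if they raised exceptions instead of printing
messages.  The choice of the built-in exceptions {\tt TypeError} and
{\tt ValueError} is my assumption about a reasonable design, and
{\tt describe} is a hypothetical helper that exists only to show what
the caller sees.

```python
def factorial(n):
    """Return n!, raising an exception if n is outside the domain."""
    if not isinstance(n, int):
        raise TypeError('Factorial is only defined for integers.')
    if n < 0:
        raise ValueError('Factorial is not defined for negative integers.')
    if n == 0:
        return 1
    return n * factorial(n-1)

def describe(arg):
    """Return factorial(arg), or the name of the exception it raises."""
    try:
        return factorial(arg)
    except (TypeError, ValueError) as err:
        return type(err).__name__

for arg in ['fred', 1.5, -2, 5]:
    print(repr(arg), describe(arg))
```

With this design the caller cannot mistake {\tt None} for a result,
because a bad argument stops the computation unless the caller handles
the exception explicitly.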


\section{Debugging}
\label{factdebug}

Breaking a large program into smaller functions creates natural
checkpoints for debugging.  If a function is not
working, there are three possibilities to consider:
\index{debugging} 

\begin{itemize}

\item There is something wrong with the arguments the function
is getting; a precondition is violated.

\item There is something wrong with the function; a postcondition
is violated.

\item There is something wrong with the return value or the
way it is being used.

\end{itemize}

To rule out the first possibility, you can add a {\tt print} statement
at the beginning of the function and display the values of the
parameters (and maybe their types).  Or you can write code
that checks the preconditions explicitly.
\index{precondition}
\index{postcondition}

If the parameters look good, add a {\tt print} statement before each
{\tt return} statement and display the return value.  If
possible, check the result by hand.  Consider calling the
function with values that make it easy to check the result
(as in Section~\ref{incremental.development}).

If the function seems to be working, look at the function call
to make sure the return value is being used correctly (or used
at all!).
\index{flow of execution}

Adding print statements at the beginning and end of a function
can help make the flow of execution more visible.
For example, here is a version of {\tt factorial} with
print statements:

\begin{verbatim}
def factorial(n):
    space = ' ' * (4 * n)
    print(space, 'factorial', n)
    if n == 0:
        print(space, 'returning 1')
        return 1
    else:
        recurse = factorial(n-1)
        result = n * recurse
        print(space, 'returning', result)
        return result
\end{verbatim}
%
{\tt space} is a string of space characters that controls the
indentation of the output.  Here is the result of {\tt factorial(4)}:

\begin{verbatim}
                 factorial 4
             factorial 3
         factorial 2
     factorial 1
 factorial 0
 returning 1
     returning 1
         returning 2
             returning 6
                 returning 24
\end{verbatim}
%
If you are confused about the flow of execution, this kind of
output can be helpful.  It takes some time to develop effective
scaffolding, but a little bit of scaffolding can save a lot of debugging.
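
The suggestion to check preconditions explicitly can be sketched with
{\tt assert} statements, which stop the program with an error if a
condition is false.  Using {\tt assert} here, and the particular
postcondition checked, are assumptions made for illustration; printing
the parameters, as described above, serves the same purpose.

```python
def factorial(n):
    # Preconditions: n must be a non-negative integer.
    assert isinstance(n, int), 'n must be an integer'
    assert n >= 0, 'n must not be negative'

    if n == 0:
        result = 1
    else:
        result = n * factorial(n-1)

    # Postcondition: the factorial of a non-negative integer is at least 1.
    assert result >= 1
    return result

print(factorial(4))
```

If a call fails one of the first two assertions, the problem is in the
caller; if it fails the last one, the problem is inside the function.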


\section{Glossary}

\begin{description}

\item[temporary variable:]  A variable used to store an intermediate value in
a complex calculation.
\index{temporary variable}
\index{variable!temporary}

\item[dead code:]  Part of a program that can never run, often because
it appears after a {\tt return} statement.
\index{dead code}

\item[incremental development:]  A program development plan intended to
avoid debugging by adding and testing only
a small amount of code at a time.
\index{incremental development}

\item[scaffolding:]  Code that is used during program development but is
not part of the final version.
\index{scaffolding}

\item[guardian:]  A programming pattern that uses a conditional
statement to check for and handle circumstances that
might cause an error.
\index{guardian pattern}
\index{pattern!guardian}

\end{description}


\section{Exercises}

\begin{exercise}

Draw a stack diagram for the following program.  What does the program print?
\index{stack diagram}

\begin{verbatim}
def b(z):
    prod = a(z, z)
    print(z, prod)
    return prod

def a(x, y):
    x = x + 1
    return x * y

def c(x, y, z):
    total = x + y + z
    square = b(total)**2
    return square

x = 1
y = x + 1
print(c(x, y+3, x+y))
\end{verbatim}

\end{exercise}


\begin{exercise}
\label{ackermann}

The Ackermann function, $A(m, n)$, is defined:

\begin{eqnarray*}
A(m, n) = \begin{cases} 
              n+1 & \mbox{if } m = 0 \\ 
        A(m-1, 1) & \mbox{if } m > 0 \mbox{ and } n = 0 \\ 
A(m-1, A(m, n-1)) & \mbox{if } m > 0 \mbox{ and } n > 0.
\end{cases} 
\end{eqnarray*}
%
See \url{http://en.wikipedia.org/wiki/Ackermann_function}.
Write a function named {\tt ack} that evaluates the Ackermann function.
Use your function to evaluate {\tt ack(3, 4)}, which should be 125.
What happens for larger values of {\tt m} and {\tt n}?
Solution: \url{http://thinkpython2.com/code/ackermann.py}.
\index{Ackermann function}
\index{function!ack}

\end{exercise}


\begin{exercise}
\label{palindrome}

A palindrome is a word that is spelled the same backward and
forward, like ``noon'' and ``redivider''.  Recursively, a word
is a palindrome if the first and last letters are the same
and the middle is a palindrome.
\index{palindrome}

The following are functions that take a string argument and
return the first, last, and middle letters:

\begin{verbatim}
def first(word):
    return word[0]

def last(word):
    return word[-1]

def middle(word):
    return word[1:-1]
\end{verbatim}
%
We'll see how they work in Chapter~\ref{strings}.

\begin{enumerate}

\item Type these functions into a file named {\tt palindrome.py}
and test them out.  What happens if you call {\tt middle} with
a string with two letters?  One letter?  What about the empty
string, which is written \verb"''" and contains no letters?

\item Write a function called \verb"is_palindrome" that takes
a string argument and returns {\tt True} if it is a palindrome
and {\tt False} otherwise.  Remember that you can use the
built-in function {\tt len} to check the length of a string.

\end{enumerate}

Solution: \url{http://thinkpython2.com/code/palindrome_soln.py}.

\end{exercise}

\begin{exercise}

A number, $a$, is a power of $b$ if it is divisible by $b$
and $a/b$ is a power of $b$.  Write a function called
\verb"is_power" that takes parameters {\tt a} and {\tt b}
and returns {\tt True} if {\tt a} is a power of {\tt b}.
Note: you will have to think about the base case.

\end{exercise}


\begin{exercise}
\index{greatest common divisor (GCD)}
\index{GCD (greatest common divisor)}

The greatest common divisor (GCD) of $a$ and $b$ is the largest number
that divides both of them with no remainder.  

One way to find the GCD of two numbers is based on the observation
that if $r$ is the remainder when $a$ is divided by $b$, then $\gcd(a,
b) = \gcd(b, r)$.  As a base case, we can use $\gcd(a, 0) = a$.

Write a function called
\verb"gcd" that takes parameters {\tt a} and {\tt b}
and returns their greatest common divisor.

Credit: This exercise is based on an example from Abelson and
Sussman's {\em Structure and Interpretation of Computer Programs}.

\end{exercise}


\chapter{Iteration}

This chapter is about iteration, which is the ability to run
a block of statements repeatedly.  We saw a kind of iteration,
using recursion, in Section~\ref{recursion}.
We saw another kind, using a {\tt for} loop,
in Section~\ref{repetition}.  In this chapter we'll see yet another
kind, using a {\tt while} statement.
But first I want to say a little more about variable assignment.


\section{Reassignment}
\index{assignment}
\index{statement!assignment}
\index{reassignment}

As you may have discovered, it is legal to make more than one
assignment to the same variable.  A new assignment makes an existing
variable refer to a new value (and stop referring to the old value).

\begin{verbatim}
>>> x = 5
>>> x
5
>>> x = 7
>>> x
7
\end{verbatim}
%
The first time we display 
{\tt x}, its value is 5; the second time, its
value is 7.

Figure~\ref{fig.assign2} shows what {\bf reassignment} looks
like in a state diagram. \index{state diagram} \index{diagram!state}

At this point I want to address a common source of
confusion.
Because Python uses the equal sign ({\tt =}) for assignment, it is
tempting to interpret a statement like {\tt a = b} as a
mathematical
proposition of equality; that is, the claim that {\tt a} and
{\tt b} are equal.  But this interpretation is wrong.
\index{equality and assignment}

First, equality is a symmetric relationship and assignment is not.  For
example, in mathematics, if $a=7$ then $7=a$.  But in Python, the
statement {\tt a = 7} is legal and {\tt 7 = a} is not.

Also, in mathematics, a proposition of equality is either true or
false for all time.  If $a=b$ now, then $a$ will always equal $b$.
In Python, an assignment statement can make two variables equal, but
they don't have to stay that way:

\begin{verbatim}
>>> a = 5
>>> b = a    # a and b are now equal
>>> a = 3    # a and b are no longer equal
>>> b
5
\end{verbatim}
%
The third line changes the value of {\tt a} but does not change the
value of {\tt b}, so they are no longer equal. 

Reassigning variables is often useful, but you should use it
with caution.  If the values of variables change frequently, it can
make the code difficult to read and debug.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/assign2.pdf}}
\caption{State diagram.}
\label{fig.assign2}
\end{figure}



\section{Updating variables}
\label{update}

\index{update}
\index{variable!updating}

A common kind of reassignment is an {\bf update},
where the new value of the variable depends on the old.

\begin{verbatim}
>>> x = x + 1
\end{verbatim}
%
This means ``get the current value of {\tt x}, add one, and then
update {\tt x} with the new value.''

If you try to update a variable that doesn't exist, you get an
error, because Python evaluates the right side before it assigns
a value to {\tt x}:

\begin{verbatim}
>>> x = x + 1
NameError: name 'x' is not defined
\end{verbatim}
%
Before you can update a variable, you have to {\bf initialize}
it, usually with a simple assignment:
\index{initialization (before update)}

\begin{verbatim}
>>> x = 0
>>> x = x + 1
\end{verbatim}
%
Updating a variable by adding 1 is called an {\bf increment};
subtracting 1 is called a {\bf decrement}.
\index{increment}
\index{decrement}
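
As a small illustration, the following sketch initializes a variable,
increments it twice, and decrements it once.  The variable name
{\tt count} is arbitrary, and the last update uses the {\tt +=}
operator, a shorthand that Python provides for this kind of update but
that the text above does not rely on.

```python
count = 0            # initialize before the first update
count = count + 1    # increment
count = count + 1    # increment again
count = count - 1    # decrement
print(count)

count += 1           # shorthand for count = count + 1
print(count)
```

The first {\tt print} displays 1 and the second displays 2.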




\section{The {\tt while} statement}
\index{statement!while}
\index{while loop}
\index{loop!while}
\index{iteration}

Computers are often used to automate repetitive tasks.  Repeating
identical or similar tasks without making errors is something that
computers do well and people do poorly.  In a computer program,
repetition is also called {\bf iteration}.

We have already seen two functions, {\tt countdown} and
\verb"print_n", that iterate using recursion.  Because iteration is so
common, Python provides language features to make it easier.
One is the {\tt for} statement we saw in Section~\ref{repetition}.
We'll get back to that later.

Another is the {\tt while} statement.  Here is a version of {\tt
countdown} that uses a {\tt while} statement:

\begin{verbatim}
def countdown(n):
    while n > 0:
        print(n)
        n = n - 1
    print('Blastoff!')
\end{verbatim}
%
You can almost read the {\tt while} statement as if it were English.
It means, ``While {\tt n} is greater than 0,
display the value of {\tt n} and then decrement
{\tt n}.  When you get to 0, display the word {\tt Blastoff!}''
\index{flow of execution}

More formally, here is the flow of execution for a {\tt while} statement:

\begin{enumerate}

\item Determine whether the condition is true or false.

\item If false, exit the {\tt while} statement
and continue execution at the next statement.

\item If the condition is true, run the
body and then go back to step 1.

\end{enumerate}

This type of flow is called a loop because the third step
loops back around to the top.  
\index{condition}
\index{loop}
\index{body}
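
To make the three steps visible, the following sketch adds scaffolding
to a {\tt while} loop like the one in {\tt countdown}.  The function
name \verb"count_checks" and the messages it prints are hypothetical; the
function returns the number of times the condition was evaluated.

```python
def count_checks(n):
    """Run a countdown loop and return how many times the condition
    n > 0 was evaluated."""
    checks = 0
    while n > 0:
        # Steps 1 and 3: the condition was true, so the body runs.
        checks = checks + 1
        print('condition true, n is', n)
        n = n - 1
    # Steps 1 and 2: the condition was false, so the loop is over
    # and execution continues here.
    checks = checks + 1
    print('condition false, loop ends')
    return checks

print(count_checks(3))
```

Starting from 3, the body runs three times but the condition is
evaluated four times, because the final evaluation is the one that
ends the loop.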
5530

5531
The body of the loop should change the value of one or more variables
5532
so that the condition becomes false eventually and the loop
5533
terminates.  Otherwise the loop will repeat forever, which is called
5534
an {\bf infinite loop}.  An endless source of amusement for computer
5535
scientists is the observation that the directions on shampoo,
5536
``Lather, rinse, repeat'', are an infinite loop.
5537
\index{infinite loop}
5538
\index{loop!infinite}
5539

5540
In the case of {\tt countdown}, we can prove that the loop
5541
terminates: if {\tt n} is zero or negative, the loop never runs.
5542
Otherwise, {\tt n} gets smaller each time through the
5543
loop, so eventually we have to get to 0.
5544

5545
For some other loops, it is not so easy to tell.  For example:
5546

5547
\begin{verbatim}
def sequence(n):
    while n != 1:
        print(n)
        if n % 2 == 0:        # n is even
            n = n // 2
        else:                 # n is odd
            n = n*3 + 1
\end{verbatim}
5556
%
5557
The condition for this loop is {\tt n != 1}, so the loop will continue
5558
until {\tt n} is {\tt 1}, which makes the condition false.
5559

5560
Each time through the loop, the program outputs the value of {\tt n}
5561
and then checks whether it is even or odd.  If it is even, {\tt n} is
5562
divided by 2.  If it is odd, the value of {\tt n} is replaced with
5563
{\tt n*3 + 1}. For example, if the argument passed to {\tt sequence}
5564
is 3, the resulting values of {\tt n} are 3, 10, 5, 16, 8, 4, 2, 1.
5565

5566
Since {\tt n} sometimes increases and sometimes decreases, there is no
5567
obvious proof that {\tt n} will ever reach 1, or that the program
5568
terminates.  For some particular values of {\tt n}, we can prove
5569
termination.  For example, if the starting value is a power of two,
5570
{\tt n} will be even every time through the loop
5571
until it reaches 1. The previous example ends with such a sequence,
5572
starting with 16.
5573
\index{Collatz conjecture}
5574

5575
The hard question is whether we can prove that this program terminates
5576
for {\em all} positive values of {\tt n}.  So far, no one has
5577
been able to prove it {\em or} disprove it!  (See
5578
  \url{http://en.wikipedia.org/wiki/Collatz_conjecture}.)
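
If you want to experiment with this question, the following sketch
counts how many updates it takes for {\tt n} to reach 1.  The name
\verb"collatz_steps" and the parameter \verb"max_steps" are
assumptions made for this illustration; the cap is there only
because termination is unproven.  The sketch uses integer
division ({\tt //}) so that {\tt n} stays an integer.

```python
def collatz_steps(n, max_steps=10000):
    """Count the updates needed for n to reach 1.

    Returns None if max_steps updates are not enough
    (max_steps is a hypothetical safety cap)."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None
        if n % 2 == 0:        # n is even
            n = n // 2
        else:                 # n is odd
            n = n*3 + 1
        steps = steps + 1
    return steps
```

For the starting value 3, the values of {\tt n} are 3, 10, 5, 16, 8,
4, 2, 1, so \verb"collatz_steps(3)" returns 7.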
5579

5580
As an exercise, rewrite the function \verb"print_n" from
5581
Section~\ref{recursion} using iteration instead of recursion.
5582

5583

5584
\section{{\tt break}}
5585
\index{break statement}
5586
\index{statement!break}
5587

5588
Sometimes you don't know it's time to end a loop until you get half
5589
way through the body.  In that case you can use the {\tt break}
5590
statement to jump out of the loop.
5591

5592
For example, suppose you want to take input from the user until they
5593
type {\tt done}.  You could write:
5594

5595
\begin{verbatim}
while True:
    line = input('> ')
    if line == 'done':
        break
    print(line)

print('Done!')
\end{verbatim}
5604
%
5605
The loop condition is {\tt True}, which is always true, so the
loop runs until it hits the {\tt break} statement.
5607

5608
Each time through, it prompts the user with an angle bracket.
5609
If the user types {\tt done}, the {\tt break} statement exits
5610
the loop.  Otherwise the program echoes whatever the user types
5611
and goes back to the top of the loop.  Here's a sample run:
5612

5613
\begin{verbatim}
5614
> not done
5615
not done
5616
> done
5617
Done!
5618
\end{verbatim}
5619
%
5620
This way of writing {\tt while} loops is common because you
5621
can check the condition anywhere in the loop (not just at the
5622
top) and you can express the stop condition affirmatively
5623
(``stop when this happens'') rather than negatively (``keep going
5624
until that happens'').
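
To try the same pattern without typing at a prompt, here is a sketch
in which a list of strings stands in for the user's input.  The
function name \verb"echo_until_done" is made up for this example, and
the sketch assumes that the list contains the string {\tt 'done'};
otherwise the index eventually runs off the end of the list.

```python
def echo_until_done(lines):
    """Collect lines until 'done' appears.

    lines is a list of strings that stands in for user input;
    it is assumed to contain 'done'."""
    echoed = []
    i = 0
    while True:
        line = lines[i]
        if line == 'done':
            break             # stop condition, checked mid-body
        echoed.append(line)
        i = i + 1
    return echoed
```

The stop condition sits in the middle of the body: each line is read
first, and only then do we decide whether to stop.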
5625

5626

5627
\section{Square roots}
5628
\label{squareroot}
5629
\index{square root}
5630

5631
Loops are often used in programs that compute
5632
numerical results by starting with an approximate answer and
5633
iteratively improving it.
5634
\index{Newton's method}
5635

5636
For example, one way of computing square roots is Newton's method.
5637
Suppose that you want to know the square root of $a$.  If you start
5638
with almost any estimate, $x$, you can compute a better
5639
estimate with the following formula:
5640

5641
\[ y = \frac{x + a/x}{2} \]
5642
%
5643
For example, if $a$ is 4 and $x$ is 3:
5644

5645
\begin{verbatim}
5646
>>> a = 4
5647
>>> x = 3
5648
>>> y = (x + a/x) / 2
5649
>>> y
5650
2.16666666667
5651
\end{verbatim}
5652
%
5653
The result is closer to the correct answer ($\sqrt{4} = 2$).  If we
5654
repeat the process with the new estimate, it gets even closer:
5655

5656
\begin{verbatim}
5657
>>> x = y
5658
>>> y = (x + a/x) / 2
5659
>>> y
5660
2.00641025641
5661
\end{verbatim}
5662
%
5663
After a few more updates, the estimate is almost exact:
5664
\index{update}
5665

5666
\begin{verbatim}
5667
>>> x = y
5668
>>> y = (x + a/x) / 2
5669
>>> y
5670
2.00001024003
5671
>>> x = y
5672
>>> y = (x + a/x) / 2
5673
>>> y
5674
2.00000000003
5675
\end{verbatim}
5676
%
5677
In general we don't know ahead of time how many steps it takes
5678
to get to the right answer, but we know when we get there
5679
because the estimate
5680
stops changing:
5681

5682
\begin{verbatim}
5683
>>> x = y
5684
>>> y = (x + a/x) / 2
5685
>>> y
5686
2.0
5687
>>> x = y
5688
>>> y = (x + a/x) / 2
5689
>>> y
5690
2.0
5691
\end{verbatim}
5692
%
5693
When {\tt y == x}, we can stop.  Here is a loop that starts
5694
with an initial estimate, {\tt x}, and improves it until it
5695
stops changing:
5696

5697
\begin{verbatim}
while True:
    print(x)
    y = (x + a/x) / 2
    if y == x:
        break
    x = y
\end{verbatim}
5705
%
5706
For most values of {\tt a} this works fine, but in general it is
5707
dangerous to test {\tt float} equality.
5708
Floating-point values are only approximately right:
5709
most rational numbers, like $1/3$, and irrational numbers, like
5710
$\sqrt{2}$, can't be represented exactly with a {\tt float}.
5711
\index{floating-point}
5712
\index{epsilon}
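
A small experiment makes the danger concrete.  In this sketch the
values 0.1, 0.2 and 0.3 are chosen only for illustration; none of
them can be represented exactly as a {\tt float}, so the sum is not
quite what you might expect.

```python
total = 0.1 + 0.2

print(total)            # not exactly 0.3
print(total == 0.3)     # exact comparison of two floats
```

The first line displays {\tt 0.30000000000000004} and the second
displays {\tt False}.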
5713

5714
Rather than checking whether {\tt x} and {\tt y} are exactly equal, it
5715
is safer to use the built-in function {\tt abs} to compute the
5716
absolute value, or magnitude, of the difference between them:
5717

5718
\begin{verbatim}
5719
    if abs(y-x) < epsilon:
5720
        break
5721
\end{verbatim}
5722
%
5723
where \verb"epsilon" has a value like {\tt 0.0000001} that
5724
determines how close is close enough.
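
Putting the pieces together, here is one way the loop might look with
the tolerance test in place.  This is a sketch, not the only way to
write it: the name \verb"newton_estimates" is made up for this
example, the default value of {\tt epsilon} is the one suggested
above, and the sketch assumes that {\tt a} is positive and that the
starting estimate {\tt x} is positive too.  Instead of printing, it
collects the estimates in a list so you can inspect them.

```python
def newton_estimates(a, x, epsilon=0.0000001):
    """Return the successive estimates of the square root of a.

    Starts from the estimate x and stops when an update changes
    the estimate by less than epsilon.  Assumes a > 0 and x > 0."""
    estimates = [x]
    while True:
        y = (x + a/x) / 2
        estimates.append(y)
        if abs(y-x) < epsilon:
            break
        x = y
    return estimates
```

With {\tt a = 4} and {\tt x = 3}, the list starts with 3 and ends with
an estimate very close to 2.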
5725

5726

5727
\section{Algorithms}
5728
\index{algorithm}
5729

5730
Newton's method is an example of an {\bf algorithm}: it is a
5731
mechanical process for solving a category of problems (in this
5732
case, computing square roots).
5733

5734
To understand what an algorithm is, it might help to start with
5735
something that is not an algorithm.  When you learned to multiply
5736
single-digit numbers, you probably memorized the multiplication table.
5737
In effect, you memorized 100 specific solutions.  That kind of
5738
knowledge is not algorithmic.
5739

5740
But if you were ``lazy'', you might have learned a few
5741
tricks.  For example, to find the product of $n$ and 9, you can
5742
write $n-1$ as the first digit and $10-n$ as the second
5743
digit.  This trick is a general solution for multiplying any
5744
single-digit number by 9.  That's an algorithm!
5745
\index{addition with carrying}
5746
\index{carrying, addition with}
5747
\index{subtraction!with borrowing}
5748
\index{borrowing, subtraction with}
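
The trick is mechanical enough to write down as a function.  In this
sketch the name \verb"times_nine" is made up for illustration, and
{\tt n} is assumed to be an integer from 1 to 9.

```python
def times_nine(n):
    """Multiply a single digit n (1 through 9) by 9 using the digit trick."""
    first_digit = n - 1
    second_digit = 10 - n
    # put the two digits side by side to form the product
    return first_digit * 10 + second_digit
```

For example, with {\tt n = 7} the digits are 6 and 3, so the result
is 63.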
5749

5750
Similarly, the techniques you learned for addition with carrying,
5751
subtraction with borrowing, and long division are all algorithms.  One
5752
of the characteristics of algorithms is that they do not require any
5753
intelligence to carry out.  They are mechanical processes where
5754
each step follows from the last according to a simple set of rules.
5755

5756
Executing algorithms is boring, but designing them is interesting,
5757
intellectually challenging, and a central part of computer science.
5758

5759
Some of the things that people do naturally, without difficulty or
5760
conscious thought, are the hardest to express algorithmically.
5761
Understanding natural language is a good example.  We all do it, but
5762
so far no one has been able to explain {\em how} we do it, at least
5763
not in the form of an algorithm.
5764

5765

5766
\section{Debugging}
5767
\label{bisectbug}
5768

5769
As you start writing bigger programs, you might find yourself
5770
spending more time debugging.  More code means more chances to
5771
make an error and more places for bugs to hide.
5772
\index{debugging!by bisection}
5773
\index{bisection, debugging by}
5774

5775
One way to cut your debugging time is ``debugging by bisection''.
5776
For example, if there are 100 lines in your program and you
5777
check them one at a time, it would take 100 steps.
5778

5779
Instead, try to break the problem in half.  Look at the middle
5780
of the program, or near it, for an intermediate value you
5781
can check.  Add a {\tt print} statement (or something else
5782
that has a verifiable effect) and run the program.
5783

5784
If the mid-point check is incorrect, there must be a problem in the
5785
first half of the program.  If it is correct, the problem is
5786
in the second half.
5787

5788
Every time you perform a check like this, you halve the number of
5789
lines you have to search.  After six steps (which is fewer than 100),
5790
you would be down to one or two lines of code, at least in theory.
5791

5792
In practice it is not always clear what
5793
the ``middle of the program'' is and not always possible to
5794
check it.  It doesn't make sense to count lines and find the
5795
exact midpoint.  Instead, think about places
5796
in the program where there might be errors and places where it
5797
is easy to put a check.  Then choose a spot where you
5798
think the chances are about the same that the bug is before
5799
or after the check.
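
The idealized version of this search can itself be written as a loop.
The following sketch is only a model of the process, not a debugging
tool: the name \verb"first_bad_line" is made up, and \verb"check_ok"
is a hypothetical function that returns {\tt True} if the state of
the program is still correct after line {\tt i}.  The sketch assumes
that lines are numbered from 1, that the state is wrong by the end of
the program, and that once the state goes wrong it stays wrong.

```python
def first_bad_line(num_lines, check_ok):
    """Find the first line after which check_ok fails, by bisection.

    check_ok(i) is a hypothetical test of the program state after
    line i.  Returns the line number and the number of checks made."""
    low, high = 1, num_lines      # the bug is somewhere in low..high
    checks = 0
    while low < high:
        mid = (low + high) // 2
        checks = checks + 1
        if check_ok(mid):
            low = mid + 1         # correct so far: the bug is later
        else:
            high = mid            # already wrong: the bug is at mid or earlier
    return low, checks
```

For a program with 100 lines, this loop never needs more than seven
checks.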
5800

5801

5802

5803

5804
\section{Glossary}
5805

5806
\begin{description}
5807

5808
\item[reassignment:] Assigning a new value to a variable that
5809
already exists.
5810
\index{reassignment}
5811

5812
\item[update:] An assignment where the new value of the variable
5813
depends on the old.
5814
\index{update}
5815

5816
\item[initialization:] An assignment that gives an initial value to
5817
a variable that will be updated.
5818
\index{initialization!variable}
5819

5820
\item[increment:] An update that increases the value of a variable
5821
(often by one).
5822
\index{increment}
5823

5824
\item[decrement:] An update that decreases the value of a variable.
5825
\index{decrement}
5826

5827
\item[iteration:] Repeated execution of a set of statements using
5828
either a recursive function call or a loop.
5829
\index{iteration}
5830

5831
\item[infinite loop:] A loop in which the terminating condition is
5832
never satisfied.
5833
\index{infinite loop}
5834

5835
\item[algorithm:]  A general process for solving a category of
5836
problems.
5837
\index{algorithm}
5838

5839
\end{description}
5840

5841

5842
\section{Exercises}
5843

5844
\begin{exercise}
5845
\index{algorithm!square root}
5846

5847
Copy the loop from Section~\ref{squareroot}
5848
and encapsulate it in a function called
5849
\verb"mysqrt" that takes {\tt a} as a parameter, chooses a
5850
reasonable value of {\tt x}, and returns an estimate of the square
5851
root of {\tt a}.  \index{encapsulation}
5852

5853
To test it, write a function named \verb"test_square_root"
5854
that prints a table like this:
5855

5856
\begin{verbatim}
a   mysqrt(a)     math.sqrt(a)  diff
-   ---------     ------------  ----
1.0 1.0           1.0           0.0
2.0 1.41421356237 1.41421356237 2.22044604925e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0           2.0           0.0
5.0 2.2360679775  2.2360679775  0.0
6.0 2.44948974278 2.44948974278 0.0
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.4408920985e-16
9.0 3.0           3.0           0.0
\end{verbatim}
5869
%
5870
The first column is a number, $a$; the second column is the square
5871
root of $a$ computed with \verb"mysqrt"; the third column is the
5872
square root computed by {\tt math.sqrt}; the fourth column is the
5873
absolute value of the difference between the two estimates.
5874
\end{exercise}
5875

5876

5877
\begin{exercise}
5878
\index{eval function}
5879
\index{function!eval}
5880

5881
The built-in function {\tt eval} takes a string and evaluates
5882
it using the Python interpreter.  For example:
5883

5884
\begin{verbatim}
5885
>>> eval('1 + 2 * 3')
5886
7
5887
>>> import math
5888
>>> eval('math.sqrt(5)')
5889
2.2360679774997898
5890
>>> eval('type(math.pi)')
5891
<class 'float'>
5892
\end{verbatim}
5893
%
5894
Write a function called \verb"eval_loop" that iteratively
5895
prompts the user, takes the resulting input and evaluates
5896
it using {\tt eval}, and prints the result.
5897

5898
It should continue until the user enters \verb"'done'", and then
5899
return the value of the last expression it evaluated.
5900

5901
\end{exercise}
5902

5903

5904
\begin{exercise}
5905
\index{Ramanujan, Srinivasa}
5906

5907
The mathematician Srinivasa Ramanujan found an
5908
infinite series
5909
that can be used to generate a numerical
5910
approximation of $1 / \pi$:
5911
\index{pi}
5912

5913
\[ \frac{1}{\pi} = \frac{2\sqrt{2}}{9801} 
\sum^\infty_{k=0} \frac{(4k)!(1103+26390k)}{(k!)^4 396^{4k}} \]
5915

5916
Write a function called \verb"estimate_pi" that uses this formula
5917
to compute and return an estimate of $\pi$.  It should use a {\tt while}
5918
loop to compute terms of the summation until the last term is
5919
smaller than {\tt 1e-15} (which is Python notation for $10^{-15}$).
5920
You can check the result by comparing it to {\tt math.pi}.
5921

5922
Solution: \url{http://thinkpython2.com/code/pi.py}.
5923

5924
\end{exercise}
5925

5926

5927
\chapter{Strings}
5928
\label{strings}
5929

5930
Strings are not like integers, floats, and booleans.  A string
5931
is a {\bf sequence}, which means it is
5932
an ordered collection of other values.  In this chapter you'll see
5933
how to access the characters that make up a string, and you'll
5934
learn about some of the methods strings provide.
5935
\index{sequence}
5936

5937

5938
\section{A string is a sequence}
5939

5940
\index{sequence}
5941
\index{character}
5942
\index{bracket operator}
5943
\index{operator!bracket}
5944
A string is a sequence of characters.  
5945
You can access the characters one at a time with the
5946
bracket operator:
5947

5948
\begin{verbatim}
5949
>>> fruit = 'banana'
5950
>>> letter = fruit[1]
5951
\end{verbatim}
5952
%
5953
The second statement selects character number 1 from {\tt
5954
fruit} and assigns it to {\tt letter}.  
5955
\index{index}
5956

5957
The expression in brackets is called an {\bf index}.  
5958
The index indicates which character in the sequence you
5959
want (hence the name).
5960

5961
But you might not get what you expect:
5962

5963
\begin{verbatim}
5964
>>> letter
5965
'a'
5966
\end{verbatim}
5967
%
5968
For most people, the first letter of \verb"'banana'" is {\tt b}, not
5969
{\tt a}.  But for computer scientists, the index is an offset from the
5970
beginning of the string, and the offset of the first letter is zero.
5971

5972
\begin{verbatim}
5973
>>> letter = fruit[0]
5974
>>> letter
5975
'b'
5976
\end{verbatim}
5977
%
5978
So {\tt b} is the 0th letter (``zero-eth'') of \verb"'banana'", {\tt
5979
  a} is the 1th letter (``one-eth''), and {\tt n} is the 2th letter
5980
(``two-eth'').  \index{index!starting at zero} \index{zero, index
5981
  starting at}
5982

5983
As an index you can use an expression that contains variables and
5984
operators:
5985
\index{index}
5986

5987
\begin{verbatim}
5988
>>> i = 1
5989
>>> fruit[i]
5990
'a'
5991
>>> fruit[i+1]
5992
'n'
5993
\end{verbatim}
5994
%
5995

5996
But the value of the index has to be an integer.  Otherwise you
5997
get:
5998
\index{exception!TypeError}
5999
\index{TypeError}
6000

6001
\begin{verbatim}
6002
>>> letter = fruit[1.5]
6003
TypeError: string indices must be integers
6004
\end{verbatim}
6005
%
6006

6007
\section{{\tt len}}
6008
\index{len function}
6009
\index{function!len}
6010

6011
{\tt len} is a built-in function that returns the number of characters
6012
in a string:
6013

6014
\begin{verbatim}
6015
>>> fruit = 'banana'
6016
>>> len(fruit)
6017
6
6018
\end{verbatim}
6019
%
6020
To get the last letter of a string, you might be tempted to try something
6021
like this:
6022
\index{exception!IndexError}
6023
\index{IndexError}
6024

6025
\begin{verbatim}
6026
>>> length = len(fruit)
6027
>>> last = fruit[length]
6028
IndexError: string index out of range
6029
\end{verbatim}
6030
%
6031
The reason for the {\tt IndexError} is that there is no letter in {\tt
6032
'banana'} with the index 6.  Since we started counting at zero, the
6033
six letters are numbered 0 to 5.  To get the last character, you have
6034
to subtract 1 from {\tt length}:
6035

6036
\begin{verbatim}
6037
>>> last = fruit[length-1]
6038
>>> last
6039
'a'
6040
\end{verbatim}
6041
%
6042
Or you can use negative indices, which count backward from
6043
the end of the string.  The expression {\tt fruit[-1]} yields the last
6044
letter, {\tt fruit[-2]} yields the second to last, and so on.
6045
\index{index!negative}
6046
\index{negative index}
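
The two ways of reaching the end of a string select the same
characters.  In this short sketch the variable names {\tt last} and
\verb"second_to_last" are chosen only for illustration.

```python
fruit = 'banana'
length = len(fruit)

last = fruit[-1]               # same character as fruit[length-1]
second_to_last = fruit[-2]     # same character as fruit[length-2]
```

Here {\tt last} is {\tt 'a'} and \verb"second_to_last" is {\tt 'n'}.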
6047

6048

6049
\section{Traversal with a {\tt for} loop}
6050
\label{for}
6051
\index{traversal}
6052
\index{loop!traversal}
6053
\index{for loop}
6054
\index{loop!for}
6055
\index{statement!for}
6056
\index{traversal}
6057

6058
A lot of computations involve processing a string one character at a
6059
time.  Often they start at the beginning, select each character in
6060
turn, do something to it, and continue until the end.  This pattern of
6061
processing is called a {\bf traversal}.  One way to write a traversal
6062
is with a {\tt while} loop:
6063

6064
\begin{verbatim}
6065
index = 0
6066
while index < len(fruit):
6067
    letter = fruit[index]
6068
    print(letter)
6069
    index = index + 1
6070
\end{verbatim}
6071
%
6072
This loop traverses the string and displays each letter on a line by
6073
itself.  The loop condition is {\tt index < len(fruit)}, so
6074
when {\tt index} is equal to the length of the string, the
6075
condition is false, and the body of the loop doesn't run.  The
6076
last character accessed is the one with the index {\tt len(fruit)-1},
6077
which is the last character in the string.
6078

6079
As an exercise, write a function that takes a string as an argument
6080
and displays the letters backward, one per line.
6081

6082
Another way to write a traversal is with a {\tt for} loop:
6083

6084
\begin{verbatim}
6085
for letter in fruit:
6086
    print(letter)
6087
\end{verbatim}
6088
%
6089
Each time through the loop, the next character in the string is assigned
6090
to the variable {\tt letter}.  The loop continues until no characters are
6091
left.
6092
\index{concatenation}
6093
\index{abecedarian}
6094
\index{McCloskey, Robert}
6095

6096
The following example shows how to use concatenation (string addition)
6097
and a {\tt for} loop to generate an abecedarian series (that is, in
6098
alphabetical order).  In Robert McCloskey's book {\em Make
6099
Way for Ducklings}, the names of the ducklings are Jack, Kack, Lack,
6100
Mack, Nack, Ouack, Pack, and Quack.  This loop outputs these names in
6101
order:
6102

6103
\begin{verbatim}
6104
prefixes = 'JKLMNOPQ'
6105
suffix = 'ack'
6106

6107
for letter in prefixes:
6108
    print(letter + suffix)
6109
\end{verbatim}
6110
%
6111
The output is:
6112

6113
\begin{verbatim}
6114
Jack
6115
Kack
6116
Lack
6117
Mack
6118
Nack
6119
Oack
6120
Pack
6121
Qack
6122
\end{verbatim}
6123
%
6124
Of course, that's not quite right because ``Ouack'' and ``Quack'' are
6125
misspelled.  As an exercise, modify the program to fix this error.
6126

6127

6128

6129
\section{String slices}
6130
\label{slice}
6131
\index{slice operator} \index{operator!slice} \index{index!slice}
6132
\index{string!slice} \index{slice!string}
6133

6134
A segment of a string is called a {\bf slice}.  Selecting a slice is
6135
similar to selecting a character:
6136

6137
\begin{verbatim}
6138
>>> s = 'Monty Python'
6139
>>> s[0:5]
6140
'Monty'
6141
>>> s[6:12]
6142
'Python'
6143
\end{verbatim}
6144
%
6145
The operator {\tt [n:m]} returns the part of the string from the 
6146
``n-eth'' character to the ``m-eth'' character, including the first but
6147
excluding the last.  This behavior is counterintuitive, but it might
6148
help to imagine the indices pointing {\em between} the
6149
characters, as in Figure~\ref{fig.banana}.
6150

6151
\begin{figure}
6152
\centerline
6153
{\includegraphics[scale=0.8]{figs/banana.pdf}}
6154
\caption{Slice indices.}
6155
\label{fig.banana}
6156
\end{figure}
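
Two consequences of this convention are worth checking for yourself.
In the following sketch the variable names are made up for
illustration, and {\tt n} and {\tt m} are assumed to lie between 0 and
the length of the string, with {\tt n} no greater than {\tt m}.

```python
s = 'Monty Python'
n, m = 6, 12

piece = s[n:m]                        # characters at indices 6 through 11
same_length = len(piece) == m - n     # a slice [n:m] has m - n characters
rejoined = s[:n] + s[n:]              # splitting at n loses nothing
```

Here {\tt piece} is {\tt 'Python'}, \verb"same_length" is {\tt True},
and {\tt rejoined} is equal to the original string.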
6157

6158
If you omit the first index (before the colon), the slice starts at
6159
the beginning of the string.  If you omit the second index, the slice
6160
goes to the end of the string:
6161

6162
\begin{verbatim}
6163
>>> fruit = 'banana'
6164
>>> fruit[:3]
6165
'ban'
6166
>>> fruit[3:]
6167
'ana'
6168
\end{verbatim}
6169
%
6170
If the first index is greater than or equal to the second, the result
is an {\bf empty string}, represented by two quotation marks:
6172
\index{quotation mark}
6173

6174
\begin{verbatim}
6175
>>> fruit = 'banana'
6176
>>> fruit[3:3]
6177
''
6178
\end{verbatim}
6179
%
6180
An empty string contains no characters and has length 0, but other
6181
than that, it is the same as any other string.
6182

6183
Continuing this example, what do you think 
6184
{\tt fruit[:]} means?  Try it and see.
6185
\index{copy!slice}
6186
\index{slice!copy}
6187

6188

6189

6190
\section{Strings are immutable}
6191
\index{mutability}
6192
\index{immutability}
6193
\index{string!immutable}
6194

6195
It is tempting to use the {\tt []} operator on the left side of an
6196
assignment, with the intention of changing a character in a string.
6197
For example:
6198
\index{TypeError}
6199
\index{exception!TypeError}
6200

6201
\begin{verbatim}
6202
>>> greeting = 'Hello, world!'
6203
>>> greeting[0] = 'J'
6204
TypeError: 'str' object does not support item assignment
6205
\end{verbatim}
6206
%
6207
The ``object'' in this case is the string and the ``item'' is
6208
the character you tried to assign.  For now, an object is
6209
the same thing as a value, but we will refine that definition
6210
later (Section~\ref{equivalence}).  
6211
\index{object}
6212
\index{item}
6213
\index{item assignment}
6214
\index{assignment!item}
6215
\index{immutability}
6216

6217
The reason for the error is that
6218
strings are {\bf immutable}, which means you can't change an
6219
existing string.  The best you can do is create a new string
6220
that is a variation on the original:
6221

6222
\begin{verbatim}
6223
>>> greeting = 'Hello, world!'
6224
>>> new_greeting = 'J' + greeting[1:]
6225
>>> new_greeting
6226
'Jello, world!'
6227
\end{verbatim}
6228
%
6229
This example concatenates a new first letter onto
6230
a slice of {\tt greeting}.  It has no effect on
6231
the original string.
6232
\index{concatenation}
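
The same idea works for a character anywhere in the string.  Here is
a sketch of a function that does it; the name \verb"replace_char" is
made up for this example, and {\tt index} is assumed to be a valid
non-negative index into {\tt s}.

```python
def replace_char(s, index, new_char):
    """Return a copy of s with the character at index replaced.

    Assumes 0 <= index < len(s).  The string s is not modified."""
    return s[:index] + new_char + s[index+1:]
```

The function builds a new string from the slice before {\tt index},
the new character, and the slice after {\tt index}.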
6233

6234

6235
\section{Searching}
6236
\label{find}
6237

6238
What does the following function do?
6239
\index{find function}
6240
\index{function!find}
6241

6242
\begin{verbatim}
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1
\end{verbatim}
6251
%
6252
In a sense, {\tt find} is the inverse of the {\tt []} operator.
6253
Instead of taking an index and extracting the corresponding character,
6254
it takes a character and finds the index where that character
6255
appears.  If the character is not found, the function returns {\tt
6256
-1}.
6257

6258
This is the first example we have seen of a {\tt return} statement
6259
inside a loop.  If {\tt word[index] == letter}, the function breaks
6260
out of the loop and returns immediately.
6261

6262
If the character doesn't appear in the string, the function
exits the loop normally and returns {\tt -1}.
6264

6265
This pattern of computation---traversing a sequence and returning
6266
when we find what we are looking for---is called a {\bf search}.
6267
\index{traversal}
6268
\index{search pattern}
6269
\index{pattern!search}
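
To see both outcomes of the search, here is the function again with
two calls.  The variable names \verb"first_n" and {\tt missing} are
chosen only for this illustration.

```python
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index          # found: leave the loop and the function
        index = index + 1
    return -1                     # reached the end without a match

first_n = find('banana', 'n')     # 'n' appears at indices 2 and 4
missing = find('banana', 'x')     # 'x' never appears
```

Because the function returns as soon as it finds a match,
\verb"first_n" is 2, the index of the first {\tt n}, and {\tt missing}
is {\tt -1}.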
6270

6271
As an exercise, modify {\tt find} so that it has a
6272
third parameter, the index in {\tt word} where it should start
6273
looking.
6274

6275

6276
\section{Looping and counting}
6277
\label{counter}
6278
\index{counter}
6279
\index{counting and looping}
6280
\index{looping and counting}
6281
\index{looping!with strings}
6282

6283
The following program counts the number of times the letter {\tt a}
6284
appears in a string:
6285

6286
\begin{verbatim}
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)
\end{verbatim}
6294
%
6295
This program demonstrates another pattern of computation called a {\bf
6296
counter}.  The variable {\tt count} is initialized to 0 and then
6297
incremented each time an {\tt a} is found.
6298
When the loop exits, {\tt count}
6299
contains the result---the total number of {\tt a}'s.
6300

6301
\index{encapsulation}
6302
As an exercise, encapsulate this code in a function named {\tt
6303
count}, and generalize it so that it accepts the string and the
6304
letter as arguments.
6305

6306
Then rewrite the function so that instead of
6307
traversing the string, it uses the three-parameter version of {\tt
6308
find} from the previous section.
6309

6310

6311
\section{String methods}
6312
\label{optional}
6313

6314
Strings provide methods that perform a variety of useful operations.
6315
A method is similar to a function---it takes arguments and
6316
returns a value---but the syntax is different.  For example, the
6317
method {\tt upper} takes a string and returns a new string with
6318
all uppercase letters.
6319
\index{method}
6320
\index{string!method}
6321

6322
Instead of the function syntax {\tt upper(word)}, it uses
6323
the method syntax {\tt word.upper()}.
6324

6325
\begin{verbatim}
6326
>>> word = 'banana'
6327
>>> new_word = word.upper()
6328
>>> new_word
6329
'BANANA'
6330
\end{verbatim}
6331
%
6332
This form of dot notation specifies the name of the method, {\tt
6333
upper}, and the name of the string to apply the method to, {\tt
6334
word}.  The empty parentheses indicate that this method takes no
6335
arguments.
6336
\index{parentheses!empty}
6337
\index{dot notation}
6338

6339
A method call is called an {\bf invocation}; in this case, we would
6340
say that we are invoking {\tt upper} on {\tt word}.
6341
\index{invocation}
6342

6343
As it turns out, there is a string method named {\tt find} that
6344
is remarkably similar to the function we wrote:
6345

6346
\begin{verbatim}
6347
>>> word = 'banana'
6348
>>> index = word.find('a')
6349
>>> index
6350
1
6351
\end{verbatim}
6352
%
6353
In this example, we invoke {\tt find} on {\tt word} and pass
the letter we are looking for as an argument.
6355

6356
Actually, the {\tt find} method is more general than our function;
6357
it can find substrings, not just characters:
6358

6359
\begin{verbatim}
6360
>>> word.find('na')
6361
2
6362
\end{verbatim}
6363
%
6364
By default, {\tt find} starts at the beginning of the string, but
6365
it can take a second argument, the index where it should start:
6366
\index{optional argument}
6367
\index{argument!optional}
6368

6369
\begin{verbatim}
6370
>>> word.find('na', 3)
6371
4
6372
\end{verbatim}
6373
%
6374
This is an example of an {\bf optional argument};
6375
{\tt find} can
6376
also take a third argument, the index where it should stop:
6377

6378
\begin{verbatim}
6379
>>> name = 'bob'
6380
>>> name.find('b', 1, 2)
6381
-1
6382
\end{verbatim}
6383
%
6384
This search fails because {\tt b} does not
6385
appear in the index range from {\tt 1} to {\tt 2}, not including {\tt
6386
2}.  Searching up to, but not including, the second index makes
6387
{\tt find} consistent with the slice operator.
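
One way to check this consistency is to compare a search with start
and stop indices against a search of the corresponding slice.  In the
following sketch the variable names are made up for illustration, and
{\tt start} and {\tt stop} are assumed to be valid non-negative
indices.

```python
word = 'banana'
start, stop = 3, 6

direct = word.find('na', start, stop)   # index within the whole word
sliced = word[start:stop].find('na')    # index within the slice 'ana'

name = 'bob'
not_found = name.find('b', 1, 2)        # searches only name[1:2], which is 'o'
```

Here {\tt direct} is 4 and {\tt sliced} is 1; they differ by {\tt
start} because the second search counts from the beginning of the
slice.  The last search yields {\tt -1}.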
6388

6389

6390

6391
\section{The {\tt in} operator}
6392
\label{inboth}
6393
\index{in operator}
6394
\index{operator!in}
6395
\index{boolean operator}
6396
\index{operator!boolean}
6397

6398
The word {\tt in} is a boolean operator that takes two strings and
6399
returns {\tt True} if the first appears as a substring in the second:
6400

6401
\begin{verbatim}
6402
>>> 'a' in 'banana'
6403
True
6404
>>> 'seed' in 'banana'
6405
False
6406
\end{verbatim}
6407
%
6408
For example, the following function prints all the
6409
letters from {\tt word1} that also appear in {\tt word2}:
6410

6411
\begin{verbatim}
def in_both(word1, word2):
    for letter in word1:
        if letter in word2:
            print(letter)
\end{verbatim}
6417
%
6418
With well-chosen variable names,
6419
Python sometimes reads like English.  You could read
6420
this loop, ``for (each) letter in (the first) word, if (the) letter 
6421
(appears) in (the second) word, print (the) letter.''
6422

6423
Here's what you get if you compare apples and oranges:
6424

6425
\begin{verbatim}
6426
>>> in_both('apples', 'oranges')
6427
a
6428
e
6429
s
6430
\end{verbatim}
6431
%
6432

6433
\section{String comparison}
6434
\index{string!comparison}
6435
\index{comparison!string}
6436

6437
The relational operators work on strings.  To see if two strings are equal:
6438

6439
\begin{verbatim}
6440
if word == 'banana':
6441
    print('All right, bananas.')
6442
\end{verbatim}
6443
%
6444
Other relational operations are useful for putting words in alphabetical
6445
order:
6446

6447
\begin{verbatim}
6448
if word < 'banana':
6449
    print('Your word, ' + word + ', comes before banana.')
6450
elif word > 'banana':
6451
    print('Your word, ' + word + ', comes after banana.')
6452
else:
6453
    print('All right, bananas.')
6454
\end{verbatim}
6455
%
6456
Python does not handle uppercase and lowercase letters the same way
6457
people do.  All the uppercase letters come before all the
6458
lowercase letters, so:
6459

6460
\begin{verbatim}
6461
Your word, Pineapple, comes before banana.
6462
\end{verbatim}
6463
%
6464
A common way to address this problem is to convert strings to a
6465
standard format, such as all lowercase, before performing the
6466
comparison.  Keep that in mind in case you have to defend yourself
6467
against a man armed with a Pineapple.
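
Here is a sketch of that approach.  The function name
\verb"comes_before" is made up for this example; it uses the string
method {\tt lower}, which returns a new string with all lowercase
letters, as the standard format.

```python
def comes_before(word1, word2):
    """Return True if word1 comes before word2 alphabetically,
    ignoring the difference between uppercase and lowercase."""
    return word1.lower() < word2.lower()
```

With this function, \verb"comes_before('Pineapple', 'banana')" is
{\tt False}, even though {\tt 'Pineapple' < 'banana'} is {\tt True}.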
6468

6469

6470
\section{Debugging}
6471
\index{debugging}
6472
\index{traversal}
6473

6474
When you use indices to traverse the values in a sequence,
6475
it is tricky to get the beginning and end of the traversal
6476
right.  Here is a function that is supposed to compare two
6477
words and return {\tt True} if one of the words is the reverse
6478
of the other, but it contains two errors:
6479

6480
\begin{verbatim}
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    
    i = 0
    j = len(word2)

    while j > 0:
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
\end{verbatim}
6496
%
6497
The first {\tt if} statement checks whether the words are the
6498
same length.  If not, we can return {\tt False} immediately.
6499
Otherwise, for the rest of the function, we can assume that the words
6500
are the same length.  This is an example of the guardian pattern
6501
in Section~\ref{guardian}.
6502
\index{guardian pattern}
6503
\index{pattern!guardian}
6504
\index{index}
6505

6506
{\tt i} and {\tt j} are indices: {\tt i} traverses {\tt word1}
6507
forward while {\tt j} traverses {\tt word2} backward.  If we find
6508
two letters that don't match, we can return {\tt False} immediately.
6509
If we get through the whole loop and all the letters match, we
6510
return {\tt True}.
6511

If we test this function with the words ``pots'' and ``stop'', we
expect the return value {\tt True}, but we get an IndexError:
\index{IndexError}
\index{exception!IndexError}

\begin{verbatim}
>>> is_reverse('pots', 'stop')
...
  File "reverse.py", line 15, in is_reverse
    if word1[i] != word2[j]:
IndexError: string index out of range
\end{verbatim}
%
For debugging this kind of error, my first move is to
print the values of the indices immediately before the line
where the error appears.

\begin{verbatim}
    while j > 0:
        print(i, j)        # print here
        
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1
\end{verbatim}
%
Now when I run the program again, I get more information:

\begin{verbatim}
>>> is_reverse('pots', 'stop')
0 4
...
IndexError: string index out of range
\end{verbatim}
%
The first time through the loop, the value of {\tt j} is 4,
which is out of range for the string \verb"'stop'".
The index of the last character is 3, so the
initial value for {\tt j} should be {\tt len(word2)-1}.
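
To see why 4 is out of range, it can help to probe the boundary
directly.  The following is a minimal sketch and not part of
{\tt reverse.py}; the helper name \verb"last_valid_index" is made up
for this illustration, and the {\tt try} statement it uses is only
there to keep the example from stopping at the error.

```python
def last_valid_index(word):
    """Return the largest index that works with word (hypothetical helper)."""
    return len(word) - 1

word2 = 'stop'
print(len(word2))                        # 4
print(word2[last_valid_index(word2)])    # p

try:
    word2[len(word2)]                    # the index the buggy loop starts with
except IndexError as err:
    print('IndexError:', err)
```

The length of a string is always one more than its last valid index,
which is why starting {\tt j} at {\tt len(word2)} fails on the first
comparison.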

If I fix that error and run the program again, I get:

\begin{verbatim}
>>> is_reverse('pots', 'stop')
0 3
1 2
2 1
True
\end{verbatim}
%
This time we get the right answer, but it looks like the loop only ran
three times, which is suspicious.  To get a better idea of what is
happening, it is useful to draw a state diagram.  During the first
iteration, the frame for \verb"is_reverse" is shown in
Figure~\ref{fig.state4}.  \index{state diagram} \index{diagram!state}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/state4.pdf}}
\caption{State diagram.}
\label{fig.state4}
\end{figure}

I took some license by arranging the variables in the frame
and adding dotted lines to show that the values of {\tt i} and
{\tt j} indicate characters in {\tt word1} and {\tt word2}.

Starting with this diagram, run the program on paper, changing the
values of {\tt i} and {\tt j} during each iteration.  Find and fix the
second error in this function.
\label{isreverse}


\section{Glossary}

\begin{description}

\item[object:] Something a variable can refer to.  For now,
you can use ``object'' and ``value'' interchangeably.
\index{object}

\item[sequence:] An ordered collection of
values where each value is identified by an integer index.
\index{sequence}

\item[item:] One of the values in a sequence.
\index{item}

\item[index:] An integer value used to select an item in
a sequence, such as a character in a string.  In Python
indices start from 0.
\index{index}

\item[slice:] A part of a string specified by a range of indices.
\index{slice}

\item[empty string:] A string with no characters and length 0, represented
by two quotation marks.
\index{empty string}

\item[immutable:] The property of a sequence whose items cannot
be changed.
\index{immutability}

\item[traverse:] To iterate through the items in a sequence,
performing a similar operation on each.
\index{traversal}

\item[search:] A pattern of traversal that stops
when it finds what it is looking for.
\index{search pattern}
\index{pattern!search}

\item[counter:] A variable used to count something, usually initialized
to zero and then incremented.
\index{counter}

\item[invocation:] A statement that calls a method.
\index{invocation}

\item[optional argument:] A function or method argument that is not
required.
\index{optional argument}
\index{argument!optional}

\end{description}


\section{Exercises}

\begin{exercise}
\index{string method}
\index{method!string}

Read the documentation of the string methods at
\url{http://docs.python.org/3/library/stdtypes.html#string-methods}.
You might want to experiment with some of them to make sure you
understand how they work.  {\tt strip} and {\tt replace} are
particularly useful.

The documentation uses a syntax that might be confusing.
For example, in \verb"find(sub[, start[, end]])", the brackets
indicate optional arguments.  So {\tt sub} is required, but
{\tt start} is optional, and if you include {\tt start},
then {\tt end} is optional.
\index{optional argument}
\index{argument!optional}

\end{exercise}


\begin{exercise}
\index{count method}
\index{method!count}

There is a string method called {\tt count} that is similar
to the function in Section~\ref{counter}.  Read the documentation
of this method
and write an invocation that counts the number of {\tt a}'s
in \verb"'banana'".
\end{exercise}


\begin{exercise}
\index{step size}
\index{slice operator}
\index{operator!slice}

A string slice can take a third index that specifies the ``step
size''; that is, the number of spaces between successive characters.
A step size of 2 means every other character; 3 means every third,
etc.

\begin{verbatim}
>>> fruit = 'banana'
>>> fruit[0:5:2]
'bnn'
\end{verbatim}

A step size of -1 goes through the word backwards, so
the slice \verb"[::-1]" generates a reversed string.
\index{palindrome}

Use this idiom to write a one-line version of \verb"is_palindrome"
from Exercise~\ref{palindrome}.
\end{exercise}


\begin{exercise}

The following functions are all {\em intended} to check whether a
string contains any lowercase letters, but at least some of them are
wrong.  For each function, describe what the function actually does
(assuming that the parameter is a string).

\begin{verbatim}
def any_lowercase1(s):
    for c in s:
        if c.islower():
            return True
        else:
            return False

def any_lowercase2(s):
    for c in s:
        if 'c'.islower():
            return 'True'
        else:
            return 'False'

def any_lowercase3(s):
    for c in s:
        flag = c.islower()
    return flag

def any_lowercase4(s):
    flag = False
    for c in s:
        flag = flag or c.islower()
    return flag

def any_lowercase5(s):
    for c in s:
        if not c.islower():
            return False
    return True
\end{verbatim}

\end{exercise}


\begin{exercise}
\index{letter rotation}
\index{rotation, letter}

\label{exrotate}
A Caesar cypher is a weak form of encryption that involves ``rotating'' each
letter by a fixed number of places.  To rotate a letter means
to shift it through the alphabet, wrapping around to the beginning if
necessary, so 'A' rotated by 3 is 'D' and 'Z' rotated by 1 is 'A'.

To rotate a word, rotate each letter by the same amount.
For example, ``cheer'' rotated by 7 is ``jolly'' and ``melon'' rotated
by -10 is ``cubed''.  In the movie {\em 2001: A Space Odyssey}, the
ship computer is called HAL, which is IBM rotated by -1.

%For example ``sleep''
%rotated by 9 is ``bunny'' and ``latex'' rotated by 7 is ``shale''.

Write a function called \verb"rotate_word"
that takes a string and an integer as parameters, and returns
a new string that contains the letters from the original string
rotated by the given amount.

You might want to use the built-in function {\tt ord}, which converts
a character to a numeric code, and {\tt chr}, which converts numeric
codes to characters.  Letters of the alphabet are encoded in alphabetical
order, so for example:

\begin{verbatim}
>>> ord('c') - ord('a')
2
\end{verbatim}

Because \verb"'c'" is the two-eth letter of the alphabet.  But
beware: the numeric codes for upper case letters are different.

Potentially offensive jokes on the Internet are sometimes encoded in
ROT13, which is a Caesar cypher with rotation 13.  If you are not
easily offended, find and decode some of them.  Solution:
\url{http://thinkpython2.com/code/rotate.py}.

\end{exercise}


\chapter{Case study: word play}
\label{wordplay}

This chapter presents the second case study, which involves
solving word puzzles by searching for words that have certain
properties.  For example, we'll find the longest palindromes
in English and search for words whose letters appear in
alphabetical order.  And I will present another program development
plan: reduction to a previously solved problem.


\section{Reading word lists}
\label{wordlist}

For the exercises in this chapter we need a list of English words.
There are lots of word lists available on the Web, but the one most
suitable for our purpose is one of the word lists collected and
contributed to the public domain by Grady Ward as part of the Moby
lexicon project (see \url{http://wikipedia.org/wiki/Moby_Project}).  It
is a list of 113,809 official crosswords; that is, words that are
considered valid in crossword puzzles and other word games.  In the
Moby collection, the filename is {\tt 113809of.fic}; you can download
a copy, with the simpler name {\tt words.txt}, from
\url{http://thinkpython2.com/code/words.txt}.
\index{Moby Project}
\index{crosswords}

This file is in plain text, so you can open it with a text
editor, but you can also read it from Python.  The built-in
function {\tt open} takes the name of the file as a parameter
and returns a {\bf file object} you can use to read the file.
\index{open function}
\index{function!open}
\index{plain text}
\index{text!plain}
\index{object!file}
\index{file object}

\begin{verbatim}
>>> fin = open('words.txt')
\end{verbatim}
%
{\tt fin} is a common name for a file object used for input.  The file
object provides several methods for reading, including {\tt readline},
which reads characters from the file until it gets to a newline and
returns the result as a string: \index{readline method}
\index{method!readline}

\begin{verbatim}
>>> fin.readline()
'aa\n'
\end{verbatim}
%
The first word in this particular list is ``aa'', which is a kind of
lava.  The sequence \verb"\n" represents the newline character that
separates this word from the next.

The file object keeps track of where it is in the file, so
if you call {\tt readline} again, you get the next word:

\begin{verbatim}
>>> fin.readline()
'aah\n'
\end{verbatim}
%
The next word is ``aah'', which is a perfectly legitimate
word, so stop looking at me like that.
Or, if it's the newline character that's bothering you,
we can get rid of it with the string method {\tt strip}:
\index{strip method}
\index{method!strip}

\begin{verbatim}
>>> line = fin.readline()
>>> word = line.strip()
>>> word
'aahed'
\end{verbatim}
%
You can also use a file object as part of a {\tt for} loop.
This program reads {\tt words.txt} and prints each word, one
per line:
\index{open function}
\index{function!open}

\begin{verbatim}
fin = open('words.txt')
for line in fin:
    word = line.strip()
    print(word)
\end{verbatim}
%
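The loop above leaves the file open when it finishes.  As a hedged
variation, here is a sketch that wraps the same loop in a {\tt with}
statement, which closes the file when the block ends.  The function
name \verb"read_words" and the file \verb"sample_words.txt" are
assumptions made for this example; the small sample file stands in
for {\tt words.txt} so the code runs on its own.

```python
import os
import tempfile

def read_words(filename):
    """Return the words in filename, one per line, with whitespace stripped."""
    words = []
    with open(filename) as fin:      # fin is closed when this block ends
        for line in fin:
            words.append(line.strip())
    return words

# Stand-in for words.txt (hypothetical sample file with three words).
path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')
with open(path, 'w') as fout:
    fout.write('aa\naah\naahed\n')

print(read_words(path))
```

To try it on the real word list, pass \verb"'words.txt'" instead of
{\tt path}.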

\section{Exercises}

There are solutions to these exercises in the next section.
You should at least attempt each one before you read the solutions.

\begin{exercise}
Write a program that reads {\tt words.txt} and prints only the
words with more than 20 characters (not counting whitespace).
\index{whitespace}

\end{exercise}

\begin{exercise}

In 1939 Ernest Vincent Wright published a 50,000 word novel called
{\em Gadsby} that does not contain the letter ``e''.  Since ``e'' is
the most common letter in English, that's not easy to do.

In fact, it is difficult to construct a solitary thought without using
that most common symbol.  It is slow going at first, but with caution
and hours of training you can gradually gain facility.

All right, I'll stop now.

Write a function called \verb"has_no_e" that returns {\tt True} if
the given word doesn't have the letter ``e'' in it.

Write a program that reads {\tt words.txt} and prints only the words
that have no ``e''.  Compute the percentage of words in the list
that have no ``e''.
\index{lipogram}

\end{exercise}


\begin{exercise}

Write a function named {\tt avoids}
that takes a word and a string of forbidden letters, and
that returns {\tt True} if the word doesn't use any of the forbidden
letters.

Write a program that prompts the user to enter a string
of forbidden letters and then prints the number of words that
don't contain any of them.
Can you find a combination of 5 forbidden letters that
excludes the smallest number of words?

\end{exercise}



\begin{exercise}

Write a function named \verb"uses_only" that takes a word and a
string of letters, and that returns {\tt True} if the word contains
only letters in the list.  Can you make a sentence using only the
letters {\tt acefhlo}?  Other than ``Hoe alfalfa''?

\end{exercise}


\begin{exercise}

Write a function named \verb"uses_all" that takes a word and a
string of required letters, and that returns {\tt True} if the word
uses all the required letters at least once.  How many words are there
that use all the vowels {\tt aeiou}?  How about {\tt aeiouy}?

\end{exercise}


\begin{exercise}

Write a function called \verb"is_abecedarian" that returns
{\tt True} if the letters in a word appear in alphabetical order
(double letters are ok).
How many abecedarian words are there?

\index{abecedarian}

\end{exercise}



\section{Search}
\label{search}
\index{search pattern}
\index{pattern!search}

All of the exercises in the previous section have something
in common; they can be solved with the search pattern we saw
in Section~\ref{find}.  The simplest example is:

\begin{verbatim}
def has_no_e(word):
    for letter in word:
        if letter == 'e':
            return False
    return True
\end{verbatim}
%
The {\tt for} loop traverses the characters in {\tt word}.  If we find
the letter ``e'', we can immediately return {\tt False}; otherwise we
have to go to the next letter.  If we exit the loop normally, that
means we didn't find an ``e'', so we return {\tt True}.
\index{traversal}

\index{in operator}
\index{operator!in}
You could write this function more concisely using the {\tt in}
operator, but I started with this version because it
demonstrates the logic of the search pattern.
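
For comparison, here is one way the concise version might look.  This
is a sketch of the idea, not necessarily the form the author had in
mind; it relies on {\tt not in}, the negated form of the {\tt in}
operator.

```python
def has_no_e(word):
    """Return True if word does not contain a lowercase 'e'."""
    return 'e' not in word

print(has_no_e('gadsby'))    # True
print(has_no_e('cheese'))    # False
```

The operator performs the same search as the loop, so the two versions
return the same results; the loop version just makes each step visible.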

\index{generalization}
{\tt avoids} is a more general version of \verb"has_no_e" but it
has the same structure:

\begin{verbatim}
def avoids(word, forbidden):
    for letter in word:
        if letter in forbidden:
            return False
    return True
\end{verbatim}
%
We can return {\tt False} as soon as we find a forbidden letter;
if we get to the end of the loop, we return {\tt True}.

\verb"uses_only" is similar except that the sense of the condition
is reversed:

\begin{verbatim}
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True
\end{verbatim}
%
Instead of a list of forbidden letters, we have a list of available
letters.  If we find a letter in {\tt word} that is not in
{\tt available}, we can return {\tt False}.

\verb"uses_all" is similar except that we reverse the role
of the word and the string of letters:

\begin{verbatim}
def uses_all(word, required):
    for letter in required:
        if letter not in word:
            return False
    return True
\end{verbatim}
%
Instead of traversing the letters in {\tt word}, the loop
traverses the required letters.  If any of the required letters
do not appear in the word, we can return {\tt False}.
\index{traversal}

If you were really thinking like a computer scientist, you would
have recognized that \verb"uses_all" was an instance of a
previously solved problem, and you would have written:

\begin{verbatim}
def uses_all(word, required):
    return uses_only(required, word)
\end{verbatim}
%
This is an example of a program development plan called {\bf
  reduction to a previously solved problem}, which means that you
recognize the problem you are working on as an instance of a solved
problem and apply an existing solution.  \index{reduction to a
  previously solved problem} \index{development plan!reduction}
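
One way to convince yourself that the reduction is valid is to run
both versions on the same inputs and compare the results.  The sketch
below does that; the name \verb"uses_all_loop" and the sample words
are chosen only for this illustration.

```python
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True

def uses_all_loop(word, required):
    """The explicit loop version, renamed so both can coexist."""
    for letter in required:
        if letter not in word:
            return False
    return True

def uses_all(word, required):
    """The reduction: swap the arguments and reuse uses_only."""
    return uses_only(required, word)

for word in ['sequoia', 'audio', 'python', '']:
    print(word, uses_all(word, 'aeiou'), uses_all_loop(word, 'aeiou'))
```

Agreement on a handful of words is evidence, not proof, but it is a
cheap check whenever you replace a function with a reduction.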


\section{Looping with indices}
\index{looping!with indices}
\index{index!looping with}

I wrote the functions in the previous section with {\tt for}
loops because I only needed the characters in the strings; I didn't
have to do anything with the indices.

For \verb"is_abecedarian" we have to compare adjacent letters,
which is a little tricky with a {\tt for} loop:

\begin{verbatim}
def is_abecedarian(word):
    previous = word[0]
    for c in word:
        if c < previous:
            return False
        previous = c
    return True
\end{verbatim}

An alternative is to use recursion:

\begin{verbatim}
def is_abecedarian(word):
    if len(word) <= 1:
        return True
    if word[0] > word[1]:
        return False
    return is_abecedarian(word[1:])
\end{verbatim}

Another option is to use a {\tt while} loop:

\begin{verbatim}
def is_abecedarian(word):
    i = 0
    while i < len(word)-1:
        if word[i+1] < word[i]:
            return False
        i = i+1
    return True
\end{verbatim}
%
The loop starts at {\tt i=0} and ends when {\tt i=len(word)-1}.  Each
time through the loop, it compares the $i$th character (which you can
think of as the current character) to the $(i+1)$th character (which you
can think of as the next).

If the next character is less than (alphabetically before) the current
one, then we have discovered a break in the abecedarian trend, and
we return {\tt False}.

If we get to the end of the loop without finding a fault, then the
word passes the test.  To convince yourself that the loop ends
correctly, consider an example like \verb"'flossy'".  The
length of the word is 6, so
the last time the loop runs is when {\tt i} is 4, which is the
index of the second-to-last character.  On the last iteration,
it compares the second-to-last character to the last, which is
what we want.
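
If you would rather check this by running code than by reasoning, the
following sketch records each comparison the {\tt while} loop makes.
The function \verb"abecedarian_steps" is a hypothetical helper written
for this illustration; unlike \verb"is_abecedarian", it does not stop
early, so it always shows the full range of indices.

```python
def abecedarian_steps(word):
    """Return the (i, current, next) triples the while loop would compare."""
    steps = []
    i = 0
    while i < len(word)-1:
        steps.append((i, word[i], word[i+1]))
        i = i+1
    return steps

for step in abecedarian_steps('flossy'):
    print(step)
```

For \verb"'flossy'" the last triple has index 4 and pairs the
second-to-last letter with the last one; for a one-letter word the
result is empty, because the loop body never runs.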
\index{palindrome}

Here is a version of \verb"is_palindrome" (see
Exercise~\ref{palindrome}) that uses two indices; one starts at the
beginning and goes up; the other starts at the end and goes down.

\begin{verbatim}
def is_palindrome(word):
    i = 0
    j = len(word)-1

    while i<j:
        if word[i] != word[j]:
            return False
        i = i+1
        j = j-1

    return True
\end{verbatim}

Or we could reduce to a previously solved
problem and write:
\index{reduction to a previously solved problem}
\index{development plan!reduction}

\begin{verbatim}
def is_palindrome(word):
    return is_reverse(word, word)
\end{verbatim}
%
Using \verb"is_reverse" from Section~\ref{isreverse}.
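
To try the reduction without first repairing the indexed
\verb"is_reverse", you can substitute a stand-in.  The sketch below
assumes a slice-based \verb"is_reverse" built on the \verb"[::-1]"
idiom; it is a different technique from the two-index loop and is
used here only so the example runs on its own.

```python
def is_reverse(word1, word2):
    """Stand-in: True if word2 is word1 spelled backward (slice-based)."""
    return word1 == word2[::-1]

def is_palindrome(word):
    return is_reverse(word, word)

print(is_palindrome('noon'))    # True
print(is_palindrome('stop'))    # False
print(is_palindrome(''))        # True
```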


\section{Debugging}
\index{debugging}
\index{testing!is hard}
\index{program testing}

Testing programs is hard.  The functions in this chapter are
relatively easy to test because you can check the results by hand.
Even so, it is somewhere between difficult and impossible to choose a
set of words that test for all possible errors.

Taking \verb"has_no_e" as an example, there are two obvious
cases to check: words that have an `e' should return {\tt False}, and
words that don't should return {\tt True}.  You should have no
trouble coming up with one of each.

Within each case, there are some less obvious subcases.  Among the
words that have an ``e'', you should test words with an ``e'' at the
beginning, the end, and somewhere in the middle.  You should test long
words, short words, and very short words, like the empty string.  The
empty string is an example of a {\bf special case}, which is one of
the non-obvious cases where errors often lurk.
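
The cases above can be written down as a small set of checks.  This is
a sketch under a few assumptions: the test words are my own choices,
and the checks use the {\tt assert} statement, which raises an error
if the expression after it is false.

```python
def has_no_e(word):
    for letter in word:
        if letter == 'e':
            return False
    return True

# The two obvious cases.
assert has_no_e('cat')
assert not has_no_e('bed')

# Subcases: an 'e' at the beginning, at the end, and in the middle.
assert not has_no_e('eat')
assert not has_no_e('cake')
assert not has_no_e('when')

# Very short words, including the special case of the empty string.
assert has_no_e('a')
assert not has_no_e('e')
assert has_no_e('')

print('all checks passed')
```

Passing these checks does not show that the function is correct; it
only shows that none of these particular inputs exposes a bug.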
\index{special case}

In addition to the test cases you generate, you can also test
your program with a word list like {\tt words.txt}.  By scanning
the output, you might be able to catch errors, but be careful:
you might catch one kind of error (words that should not be
included, but are) and not another (words that should be included,
but aren't).

In general, testing can help you find bugs, but it is not easy to
generate a good set of test cases, and even if you do, you can't
be sure your program is correct.
According to a legendary computer scientist:
\index{testing!and absence of bugs}

\begin{quote}
Program testing can be used to show the presence of bugs, but never to
show their absence!

--- Edsger W. Dijkstra
\end{quote}
\index{Dijkstra, Edsger}


\section{Glossary}

\begin{description}

\item[file object:] A value that represents an open file.
\index{file object}
\index{object!file}

\item[reduction to a previously solved problem:] A way of solving a
  problem by expressing it as an instance of a previously solved
  problem.  \index{reduction to a previously solved problem}
  \index{development plan!reduction}

\item[special case:] A test case that is atypical or non-obvious
(and less likely to be handled correctly).
\index{special case}

\end{description}


\section{Exercises}

\begin{exercise}
\index{Car Talk}
\index{Puzzler}
\index{double letters}

This question is based on a Puzzler that was broadcast on the radio
program {\em Car Talk}
(\url{http://www.cartalk.com/content/puzzlers}):

\begin{quote}
Give me a word with three consecutive double letters. I'll give you a
couple of words that almost qualify, but don't. For example, the word
committee, c-o-m-m-i-t-t-e-e. It would be great except for the `i' that
sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could
take out those i's it would work. But there is a word that has three
consecutive pairs of letters and to the best of my knowledge this may
be the only word. Of course there are probably 500 more but I can only
think of one. What is the word?
\end{quote}

Write a program to find it.
Solution: \url{http://thinkpython2.com/code/cartalk1.py}.

\end{exercise}


\begin{exercise}
Here's another {\em Car Talk}
Puzzler (\url{http://www.cartalk.com/content/puzzlers}):
\index{Car Talk}
\index{Puzzler}
\index{odometer}
\index{palindrome}

\begin{quote}
``I was driving on the highway the other day and I happened to
notice my odometer. Like most odometers, it shows six digits,
in whole miles only. So, if my car had 300,000
miles, for example, I'd see 3-0-0-0-0-0.

``Now, what I saw that day was very interesting. I noticed that the
last 4 digits were palindromic; that is, they read the same forward as
backward. For example, 5-4-4-5 is a palindrome, so my odometer
could have read 3-1-5-4-4-5.

``One mile later, the last 5 numbers were palindromic. For example, it
could have read 3-6-5-4-5-6.  One mile after that, the middle 4 out of
6 numbers were palindromic.  And you ready for this? One mile later,
all 6 were palindromic!

``The question is, what was on the odometer when I first looked?''
\end{quote}

Write a Python program that tests all the six-digit numbers and prints
any numbers that satisfy these requirements.
Solution: \url{http://thinkpython2.com/code/cartalk2.py}.

\end{exercise}


\begin{exercise}
Here's another {\em Car Talk} Puzzler you can solve with a
search (\url{http://www.cartalk.com/content/puzzlers}):
\index{Car Talk}
\index{Puzzler}
\index{palindrome}

\begin{quote}
``Recently I had a visit with my mom and we realized that
the two digits that make up my age when reversed resulted in her
age. For example, if she's 73, I'm 37. We wondered how often this has
happened over the years but we got sidetracked with other topics and
we never came up with an answer.

``When I got home I figured out that the digits of our ages have been
reversible six times so far. I also figured out that if we're lucky it
would happen again in a few years, and if we're really lucky it would
happen one more time after that. In other words, it would have
happened 8 times over all. So the question is, how old am I now?''

\end{quote}

Write a Python program that searches for solutions to this Puzzler.
Hint: you might find the string method {\tt zfill} useful.

Solution: \url{http://thinkpython2.com/code/cartalk3.py}.

\end{exercise}



\chapter{Lists}

This chapter presents one of Python's most useful built-in types, lists.
You will also learn more about objects and what can happen when you have
more than one name for the same object.


\section{A list is a sequence}
\label{sequence}

Like a string, a {\bf list} is a sequence of values.  In a string, the
values are characters; in a list, they can be any type.  The values in
a list are called {\bf elements} or sometimes {\bf items}.
\index{list}
\index{type!list}
\index{element}
\index{sequence}
\index{item}

There are several ways to create a new list; the simplest is to
enclose the elements in square brackets (\verb"[" and \verb"]"):

\begin{verbatim}
[10, 20, 30, 40]
['crunchy frog', 'ram bladder', 'lark vomit']
\end{verbatim}
%
The first example is a list of four integers.  The second is a list of
three strings.  The elements of a list don't have to be the same type.
The following list contains a string, a float, an integer, and
(lo!) another list:

\begin{verbatim}
['spam', 2.0, 5, [10, 20]]
\end{verbatim}
%
A list within another list is {\bf nested}.
\index{nested list}
\index{list!nested}

A list that contains no elements is
called an empty list; you can create one with empty
brackets, \verb"[]".
\index{empty list}
\index{list!empty}

As you might expect, you can assign list values to variables:

\begin{verbatim}
>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> numbers = [42, 123]
>>> empty = []
>>> print(cheeses, numbers, empty)
['Cheddar', 'Edam', 'Gouda'] [42, 123] []
\end{verbatim}
%
\index{assignment}


\section{Lists are mutable}
\label{mutable}
\index{list!element}
\index{access}
\index{index}
\index{bracket operator}
\index{operator!bracket}

The syntax for accessing the elements of a list is the same as for
accessing the characters of a string---the bracket operator.  The
expression inside the brackets specifies the index.  Remember that the
indices start at 0:

\begin{verbatim}
>>> cheeses[0]
'Cheddar'
\end{verbatim}
%
Unlike strings, lists are mutable.  When the bracket operator appears
on the left side of an assignment, it identifies the element of the
list that will be assigned.
\index{mutability}

\begin{verbatim}
>>> numbers = [42, 123]
>>> numbers[1] = 5
>>> numbers
[42, 5]
\end{verbatim}
%
The one-eth element of {\tt numbers}, which
used to be 123, is now 5.
\index{index!starting at zero}
\index{zero, index starting at}

Figure~\ref{fig.liststate} shows
the state diagram for {\tt
cheeses}, {\tt numbers} and {\tt empty}:
\index{state diagram}
\index{diagram!state}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/liststate.pdf}}
\caption{State diagram.}
\label{fig.liststate}
\end{figure}

Lists are represented by boxes with the word ``list'' outside
and the elements of the list inside.  {\tt cheeses} refers to
a list with three elements indexed 0, 1 and 2.
{\tt numbers} contains two elements; the diagram shows that the
value of the second element has been reassigned from 123 to 5.
{\tt empty} refers to a list with no elements.
\index{item assignment}
\index{assignment!item}
\index{reassignment}

List indices work the same way as string indices:

\begin{itemize}

\item Any integer expression can be used as an index.

\item If you try to read or write an element that does not exist, you
get an {\tt IndexError}.
\index{exception!IndexError}
\index{IndexError}

\item If an index has a negative value, it counts backward from the
end of the list.

\end{itemize}
\index{list!index}
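
The three rules above can be tried out in a few lines.  This is a
minimal sketch; the variable {\tt i} and the {\tt try} statement are
additions for the example, the latter only so the script keeps running
after the error.

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']

i = 1
print(cheeses[i + 1])    # any integer expression works: Gouda
print(cheeses[-1])       # -1 is the last element: Gouda
print(cheeses[-3])       # -3 counts back to the first: Cheddar

try:
    cheeses[3]           # there is no element with index 3
except IndexError as err:
    print('IndexError:', err)
```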

\index{list!membership}
\index{membership!list}
\index{in operator}
\index{operator!in}

The {\tt in} operator also works on lists.

\begin{verbatim}
>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> 'Edam' in cheeses
True
>>> 'Brie' in cheeses
False
\end{verbatim}


\section{Traversing a list}
\index{list!traversal}
\index{traversal!list}
\index{for loop}
\index{loop!for}
\index{statement!for}

The most common way to traverse the elements of a list is
with a {\tt for} loop.  The syntax is the same as for strings:

\begin{verbatim}
for cheese in cheeses:
    print(cheese)
\end{verbatim}
%
This works well if you only need to read the elements of the
list.  But if you want to write or update the elements, you
need the indices.  A common way to do that is to combine
the built-in functions {\tt range} and {\tt len}:
\index{looping!with indices}
\index{index!looping with}

\begin{verbatim}
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
\end{verbatim}
%
This loop traverses the list and updates each element.  {\tt len}
returns the number of elements in the list.  {\tt range} returns
a sequence of indices from 0 to $n-1$, where $n$ is the length of
the list.  Each time through the loop {\tt i} gets the index
of the next element.  The assignment statement in the body uses
{\tt i} to read the old value of the element and to assign the
new value.
\index{item update}
\index{update!item}
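
For a concrete run of this loop, here is a small sketch.  The starting
values in {\tt numbers} and the name {\tt indices} are made up for
illustration.

\begin{verbatim}
numbers = [42, 123, 7]

# range(len(numbers)) yields the indices 0, 1 and 2.
indices = list(range(len(numbers)))

for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2    # update each element in place

print(indices)    # [0, 1, 2]
print(numbers)    # [84, 246, 14]
\end{verbatim}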

A {\tt for} loop over an empty list never runs the body:

\begin{verbatim}
for x in []:
    print('This never happens.')
\end{verbatim}
%
Although a list can contain another list, the nested
list still counts as a single element.  The length of this list is
four:
\index{nested list}
\index{list!nested}

\begin{verbatim}
['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]
\end{verbatim}



\section{List operations}
\index{list!operation}

The {\tt +} operator concatenates lists:
\index{concatenation!list}
\index{list!concatenation}

\begin{verbatim}
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5, 6]
\end{verbatim}
%
The {\tt *} operator repeats a list a given number of times:
\index{repetition!list}
\index{list!repetition}

\begin{verbatim}
>>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
\end{verbatim}
%
The first example repeats {\tt [0]} four times.  The second example
repeats the list {\tt [1, 2, 3]} three times.


\section{List slices}
\index{slice operator}
\index{operator!slice}
\index{index!slice}
\index{list!slice}
\index{slice!list}

The slice operator also works on lists:

\begin{verbatim}
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> t[1:3]
['b', 'c']
>>> t[:4]
['a', 'b', 'c', 'd']
>>> t[3:]
['d', 'e', 'f']
\end{verbatim}
%
If you omit the first index, the slice starts at the beginning.
If you omit the second, the slice goes to the end.  So if you
omit both, the slice is a copy of the whole list.
\index{list!copy}
\index{slice!copy}
\index{copy!slice}

\begin{verbatim}
>>> t[:]
['a', 'b', 'c', 'd', 'e', 'f']
\end{verbatim}
%
Since lists are mutable, it is often useful to make a copy
before performing operations that modify lists.
\index{mutability}
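
The following sketch shows why the copy matters.  The names
{\tt original} and {\tt duplicate} are illustrative and do not appear
in the examples above.

\begin{verbatim}
original = ['a', 'b', 'c']
duplicate = original[:]    # a new list with the same elements

duplicate[0] = 'x'         # modifies only the copy

print(original)            # ['a', 'b', 'c']
print(duplicate)           # ['x', 'b', 'c']
\end{verbatim}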

A slice operator on the left side of an assignment
can update multiple elements:
\index{slice!update}
\index{update!slice}

\begin{verbatim}
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> t[1:3] = ['x', 'y']
>>> t
['a', 'x', 'y', 'd', 'e', 'f']
\end{verbatim}
%

% You can add elements to a list by squeezing them into an empty
% slice:

% % \begin{verbatim}
% >>> t = ['a', 'd', 'e', 'f']
% >>> t[1:1] = ['b', 'c']
% >>> print t
% ['a', 'b', 'c', 'd', 'e', 'f']
% \end{verbatim}
% \afterverb
%
% And you can remove elements from a list by assigning the empty list to
% them:

% % \begin{verbatim}
% >>> t = ['a', 'b', 'c', 'd', 'e', 'f']
% >>> t[1:3] = []
% >>> print t
% ['a', 'd', 'e', 'f']
% \end{verbatim}
% \afterverb
%
% But both of those operations can be expressed more clearly
% with list methods.


\section{List methods}
\index{list!method}
\index{method, list}

Python provides methods that operate on lists.  For example,
{\tt append} adds a new element to the end of a list:
\index{append method}
\index{method!append}

\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> t.append('d')
>>> t
['a', 'b', 'c', 'd']
\end{verbatim}
%
{\tt extend} takes a list as an argument and appends all of
the elements:
\index{extend method}
\index{method!extend}

\begin{verbatim}
>>> t1 = ['a', 'b', 'c']
>>> t2 = ['d', 'e']
>>> t1.extend(t2)
>>> t1
['a', 'b', 'c', 'd', 'e']
\end{verbatim}
%
This example leaves {\tt t2} unmodified.

{\tt sort} arranges the elements of the list from low to high:
\index{sort method}
\index{method!sort}

\begin{verbatim}
>>> t = ['d', 'c', 'e', 'b', 'a']
>>> t.sort()
>>> t
['a', 'b', 'c', 'd', 'e']
\end{verbatim}
%
Most list methods are void; they modify the list and return {\tt None}.
If you accidentally write {\tt t = t.sort()}, you will be disappointed
with the result.
\index{void method}
\index{method!void}
\index{None special value}
\index{special value!None}


\section{Map, filter and reduce}
\label{filter}

To add up all the numbers in a list, you can use a loop like this:

% see add.py

\begin{verbatim}
def add_all(t):
    total = 0
    for x in t:
        total += x
    return total
\end{verbatim}
%
{\tt total} is initialized to 0.  Each time through the loop,
{\tt x} gets one element from the list.  The {\tt +=} operator
provides a short way to update a variable.  This
{\bf augmented assignment statement},
\index{update operator}
\index{operator!update}
\index{assignment!augmented}
\index{augmented assignment}

\begin{verbatim}
    total += x
\end{verbatim}
%
is equivalent to

\begin{verbatim}
    total = total + x
\end{verbatim}
%
As the loop runs, {\tt total} accumulates the sum of the
elements; a variable used this way is sometimes called an
{\bf accumulator}.
\index{accumulator!sum}

Adding up the elements of a list is such a common operation
that Python provides it as a built-in function, {\tt sum}:

\begin{verbatim}
>>> t = [1, 2, 3]
>>> sum(t)
6
\end{verbatim}
%
An operation like this that combines a sequence of elements into
a single value is sometimes called {\bf reduce}.
\index{reduce pattern}
\index{pattern!reduce}
\index{traversal}

Sometimes you want to traverse one list while building
another.  For example, the following function takes a list of strings
and returns a new list that contains capitalized strings:

\begin{verbatim}
def capitalize_all(t):
    res = []
    for s in t:
        res.append(s.capitalize())
    return res
\end{verbatim}
%
{\tt res} is initialized with an empty list; each time through
the loop, we append the next element.  So {\tt res} is another
kind of accumulator.
\index{accumulator!list}

An operation like \verb"capitalize_all" is sometimes called a {\bf
map} because it ``maps'' a function (in this case the method {\tt
capitalize}) onto each of the elements in a sequence.
\index{map pattern}
\index{pattern!map}
\index{filter pattern}
\index{pattern!filter}

Another common operation is to select some of the elements from
a list and return a sublist.  For example, the following
function takes a list of strings and returns a list that contains
only the uppercase strings:

\begin{verbatim}
def only_upper(t):
    res = []
    for s in t:
        if s.isupper():
            res.append(s)
    return res
\end{verbatim}
%
{\tt isupper} is a string method that returns {\tt True} if
the string contains only uppercase letters.

An operation like \verb"only_upper" is called a {\bf filter} because
it selects some of the elements and filters out the others.

Most common list operations can be expressed as a combination
of map, filter and reduce.
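
Python also provides tools named after these patterns: the built-in
functions {\tt map} and {\tt filter}, and {\tt reduce} in the
{\tt functools} module.  The following is only a sketch of how they
correspond to the loops above; the sample list and the variable names
are made up for illustration.  In Python 3, {\tt map} and {\tt filter}
return iterators, so the sketch wraps them in {\tt list}.

\begin{verbatim}
from functools import reduce
from operator import add

t = ['spam', 'EGGS', 'Brie']

capitalized = list(map(str.capitalize, t))    # map
upper_only = list(filter(str.isupper, t))     # filter
total = reduce(add, [1, 2, 3], 0)             # reduce

print(capitalized)    # ['Spam', 'Eggs', 'Brie']
print(upper_only)     # ['EGGS']
print(total)          # 6
\end{verbatim}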


\section{Deleting elements}
\index{element deletion}
\index{deletion, element of list}

There are several ways to delete elements from a list.  If you
know the index of the element you want, you can use
{\tt pop}:
\index{pop method}
\index{method!pop}

\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> x = t.pop(1)
>>> t
['a', 'c']
>>> x
'b'
\end{verbatim}
%
{\tt pop} modifies the list and returns the element that was removed.
If you don't provide an index, it deletes and returns the
last element.
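
Here is a short sketch of {\tt pop} without an index; the name
{\tt last} is chosen only for this example.

\begin{verbatim}
t = ['a', 'b', 'c']
last = t.pop()    # no index: removes and returns the last element

print(last)       # c
print(t)          # ['a', 'b']
\end{verbatim}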

If you don't need the removed value, you can use the {\tt del}
statement:
\index{del operator}
\index{operator!del}

\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> del t[1]
>>> t
['a', 'c']
\end{verbatim}
%
If you know the element you want to remove (but not the index), you
can use {\tt remove}:
\index{remove method}
\index{method!remove}

\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> t.remove('b')
>>> t
['a', 'c']
\end{verbatim}
%
The return value from {\tt remove} is {\tt None}.
\index{None special value}
\index{special value!None}

To remove more than one element, you can use {\tt del} with
a slice index:

\begin{verbatim}
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> del t[1:5]
>>> t
['a', 'f']
\end{verbatim}
%
As usual, the slice selects all the elements up to but not
including the second index.



\section{Lists and strings}
\index{list}
\index{string}
\index{sequence}

A string is a sequence of characters and a list is a sequence
of values, but a list of characters is not the same as a
string.  To convert from a string to a list of characters,
you can use {\tt list}:
\index{list!function}
\index{function!list}

\begin{verbatim}
>>> s = 'spam'
>>> t = list(s)
>>> t
['s', 'p', 'a', 'm']
\end{verbatim}
%
Because {\tt list} is the name of a built-in function, you should
avoid using it as a variable name.  I also avoid {\tt l} because
it looks too much like {\tt 1}.  So that's why I use {\tt t}.

The {\tt list} function breaks a string into individual letters.  If
you want to break a string into words, you can use the {\tt split}
method:
\index{split method}
\index{method!split}

\begin{verbatim}
>>> s = 'pining for the fjords'
>>> t = s.split()
>>> t
['pining', 'for', 'the', 'fjords']
\end{verbatim}
%
An optional argument called a {\bf delimiter} specifies which
characters to use as word boundaries.
The following example
uses a hyphen as a delimiter:
\index{optional argument}
\index{argument!optional}
\index{delimiter}

\begin{verbatim}
>>> s = 'spam-spam-spam'
>>> delimiter = '-'
>>> t = s.split(delimiter)
>>> t
['spam', 'spam', 'spam']
\end{verbatim}
%
{\tt join} is the inverse of {\tt split}.  It takes a list of
strings and concatenates the elements.  {\tt join} is a string
method, so you have to invoke it on the delimiter and pass the
list as an argument:
\index{join method}
\index{method!join}
\index{concatenation}

\begin{verbatim}
>>> t = ['pining', 'for', 'the', 'fjords']
>>> delimiter = ' '
>>> s = delimiter.join(t)
>>> s
'pining for the fjords'
\end{verbatim}
%
In this case the delimiter is a space character, so
{\tt join} puts a space between words.  To concatenate
strings without spaces, you can use the empty string,
\verb"''", as a delimiter.
\index{empty string}
\index{string!empty}
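
As a quick check that {\tt join} undoes {\tt split} when both use the
same delimiter, here is a small sketch.  The names {\tt words},
{\tt rebuilt} and {\tt squeezed} are made up for illustration.

\begin{verbatim}
s = 'spam-spam-spam'
delimiter = '-'

words = s.split(delimiter)         # ['spam', 'spam', 'spam']
rebuilt = delimiter.join(words)    # put the hyphens back
squeezed = ''.join(words)          # empty string as delimiter

print(rebuilt == s)    # True
print(squeezed)        # spamspamspam
\end{verbatim}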


\section{Objects and values}
\label{equivalence}
\index{object}
\index{value}

If we run these assignment statements:

\begin{verbatim}
a = 'banana'
b = 'banana'
\end{verbatim}
%
we know that {\tt a} and {\tt b} both refer to a string, but we
don't know whether they refer to the {\em same} string.
There are two possible states, shown in Figure~\ref{fig.list1}.
\index{aliasing}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/list1.pdf}}
\caption{State diagram.}
\label{fig.list1}
\end{figure}

In one case, {\tt a} and {\tt b} refer to two different objects that
have the same value.  In the second case, they refer to the same
object.
\index{is operator}
\index{operator!is}

To check whether two variables refer to the same object, you can
use the {\tt is} operator.

\begin{verbatim}
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
\end{verbatim}
%
In this example, Python only created one string object, and both {\tt
  a} and {\tt b} refer to it.  But when you create two lists, you get
two objects:

\begin{verbatim}
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
\end{verbatim}
%
So the state diagram looks like Figure~\ref{fig.list2}.
\index{state diagram}
\index{diagram!state}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/list2.pdf}}
\caption{State diagram.}
\label{fig.list2}
\end{figure}

In this case we would say that the two lists are {\bf equivalent},
because they have the same elements, but not {\bf identical}, because
they are not the same object.  If two objects are identical, they are
also equivalent, but if they are equivalent, they are not necessarily
identical.
\index{equivalence}
\index{identity}

Until now, we have been using ``object'' and ``value''
interchangeably, but it is more precise to say that an object has a
value.  If you evaluate {\tt [1, 2, 3]}, you get a list
object whose value is a sequence of integers.  If another
list has the same elements, we say it has the same value, but
it is not the same object.
\index{object}
\index{value}
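
One way to see the difference in code is to compare the same two
lists with {\tt ==}, which tests whether they have the same value,
and with {\tt is}, which tests whether they are the same object.  In
this sketch the names {\tt equivalent} and {\tt identical} are chosen
only to match the vocabulary above.

\begin{verbatim}
a = [1, 2, 3]
b = [1, 2, 3]

equivalent = (a == b)    # compares the values of the two lists
identical = (a is b)     # checks whether they are one object

print(equivalent)        # True
print(identical)         # False
\end{verbatim}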


\section{Aliasing}
\index{aliasing}
\index{reference!aliasing}

If {\tt a} refers to an object and you assign {\tt b = a},
then both variables refer to the same object:

\begin{verbatim}
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
\end{verbatim}
%
The state diagram looks like Figure~\ref{fig.list3}.
\index{state diagram}
\index{diagram!state}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/list3.pdf}}
\caption{State diagram.}
\label{fig.list3}
\end{figure}

The association of a variable with an object is called a {\bf
reference}.  In this example, there are two references to the same
object.
\index{reference}

An object with more than one reference has more
than one name, so we say that the object is {\bf aliased}.
\index{mutability}

If the aliased object is mutable, changes made with one alias affect
the other:

\begin{verbatim}
>>> b[0] = 42
>>> a
[42, 2, 3]
\end{verbatim}
%
Although this behavior can be useful, it is error-prone.  In general,
it is safer to avoid aliasing when you are working with mutable
objects.
\index{immutability}

For immutable objects like strings, aliasing is not as much of a
problem.  In this example:

\begin{verbatim}
a = 'banana'
b = 'banana'
\end{verbatim}
%
It almost never makes a difference whether {\tt a} and {\tt b} refer
to the same string or not.


\section{List arguments}
\label{list.arguments}
\index{list!as argument}
\index{argument}
\index{argument!list}
\index{reference}
\index{parameter}

When you pass a list to a function, the function gets a reference to
the list.  If the function modifies the list, the caller sees
the change.  For example, \verb"delete_head" removes the first element
from a list:

\begin{verbatim}
def delete_head(t):
    del t[0]
\end{verbatim}
%
Here's how it is used:

\begin{verbatim}
>>> letters = ['a', 'b', 'c']
>>> delete_head(letters)
>>> letters
['b', 'c']
\end{verbatim}
%
The parameter {\tt t} and the variable {\tt letters} are
aliases for the same object.  The stack diagram looks like
Figure~\ref{fig.stack5}.
\index{stack diagram}
\index{diagram!stack}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/stack5.pdf}}
\caption{Stack diagram.}
\label{fig.stack5}
\end{figure}

Since the list is shared by two frames, I drew
it between them.

It is important to distinguish between operations that
modify lists and operations that create new lists.  For
example, the {\tt append} method modifies a list, but the
{\tt +} operator creates a new list.
\index{append method}
\index{method!append}
\index{list!concatenation}
\index{concatenation!list}

Here's an example using {\tt append}:
%
\begin{verbatim}
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> t1
[1, 2, 3]
>>> print(t2)
None
\end{verbatim}
%
The return value from {\tt append} is {\tt None}.

Here's an example using the {\tt +} operator:
%
\begin{verbatim}
>>> t3 = t1 + [4]
>>> t1
[1, 2, 3]
>>> t3
[1, 2, 3, 4]
\end{verbatim}
%
The result of the operator is a new list, and the original list is
unchanged.

This difference is important when you write functions that
are supposed to modify lists.  For example, this function
{\em does not} delete the head of a list:
%
\begin{verbatim}
def bad_delete_head(t):
    t = t[1:]              # WRONG!
\end{verbatim}
%
The slice operator creates a new list and the assignment
makes {\tt t} refer to it, but that doesn't affect the caller.
\index{slice operator}
\index{operator!slice}
%
\begin{verbatim}
>>> t4 = [1, 2, 3]
>>> bad_delete_head(t4)
>>> t4
[1, 2, 3]
\end{verbatim}
%
At the beginning of \verb"bad_delete_head", {\tt t} and {\tt t4}
refer to the same list.  At the end, {\tt t} refers to a new list,
but {\tt t4} still refers to the original, unmodified list.

An alternative is to write a function that creates and
returns a new list.  For
example, {\tt tail} returns all but the first
element of a list:

\begin{verbatim}
def tail(t):
    return t[1:]
\end{verbatim}
%
This function leaves the original list unmodified.
Here's how it is used:

\begin{verbatim}
>>> letters = ['a', 'b', 'c']
>>> rest = tail(letters)
>>> rest
['b', 'c']
\end{verbatim}
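
To compare the two approaches side by side, the following sketch
calls both functions from this section on the same list.  The order
of the calls is chosen only for illustration.

\begin{verbatim}
def delete_head(t):
    del t[0]              # modifies the caller's list

def tail(t):
    return t[1:]          # builds and returns a new list

letters = ['a', 'b', 'c']
rest = tail(letters)      # letters is unchanged at this point
delete_head(letters)      # now letters itself is shortened

print(rest)               # ['b', 'c']
print(letters)            # ['b', 'c']
print(rest is letters)    # False
\end{verbatim}
%
The two lists end up equivalent, but they are still different objects.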



\section{Debugging}
\index{debugging}

Careless use of lists (and other mutable objects)
can lead to long hours of debugging.  Here are some common
pitfalls and ways to avoid them:

\begin{enumerate}

\item Most list methods modify the argument and
  return {\tt None}.  This is the opposite of the string methods,
  which return a new string and leave the original alone.

If you are used to writing string code like this:

\begin{verbatim}
word = word.strip()
\end{verbatim}
%
it is tempting to write list code like this:

\begin{verbatim}
t = t.sort()           # WRONG!
\end{verbatim}
\index{sort method}
\index{method!sort}

Because {\tt sort} returns {\tt None}, the
next operation you perform with {\tt t} is likely to fail.

Before using list methods and operators, you should read the
documentation carefully and then test them in interactive mode.

\item Pick an idiom and stick with it.

Part of the problem with lists is that there are too many
ways to do things.  For example, to remove an element from
a list, you can use {\tt pop}, {\tt remove}, {\tt del},
or even a slice assignment.

To add an element, you can use the {\tt append} method or
the {\tt +} operator.  Assuming that {\tt t} is a list and
{\tt x} is a list element, these are correct:

\begin{verbatim}
t.append(x)
t = t + [x]
t += [x]
\end{verbatim}

And these are wrong:

\begin{verbatim}
t.append([x])          # WRONG!
t = t.append(x)        # WRONG!
t + [x]                # WRONG!
t = t + x              # WRONG!
\end{verbatim}

Try out each of these examples in interactive mode to make sure
you understand what they do.  Notice that only the last
one causes a runtime error; the other three are legal, but they
do the wrong thing.


\item Make copies to avoid aliasing.
\index{aliasing!copying to avoid}
\index{copy!to avoid aliasing}

If you want to use a method like {\tt sort} that modifies
the argument, but you need to keep the original list as
well, you can make a copy.

\begin{verbatim}
>>> t = [3, 1, 2]
>>> t2 = t[:]
>>> t2.sort()
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
\end{verbatim}

In this example you could also use the built-in function {\tt sorted},
which returns a new, sorted list and leaves the original alone.
\index{sorted!function}
\index{function!sorted}

\begin{verbatim}
>>> t2 = sorted(t)
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
\end{verbatim}

\end{enumerate}
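
If you would rather see the four wrong ways to add an element in a
script than in interactive mode, here is one possible sketch.  It
assumes {\tt x} is the integer 3 and starts each case from a fresh
list; the names {\tt nested}, {\tt result}, {\tt unchanged} and
{\tt outcome} are made up to hold what each line produces.

\begin{verbatim}
x = 3

t = [1, 2]
t.append([x])             # appends a nested list, not x itself
nested = t                # [1, 2, [3]]

t = [1, 2]
result = t.append(x)      # append returns None
                          # (t itself is now [1, 2, 3])

t = [1, 2]
t + [x]                   # the new list is created and discarded
unchanged = t             # still [1, 2]

t = [1, 2]
try:
    t = t + x             # a list and an int cannot be added
    outcome = 'no error'
except TypeError:
    outcome = 'TypeError'
\end{verbatim}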



\section{Glossary}

\begin{description}

\item[list:] A sequence of values.
\index{list}

\item[element:] One of the values in a list (or other sequence),
also called an item.
\index{element}

\item[nested list:] A list that is an element of another list.
\index{nested list}

\item[accumulator:] A variable used in a loop to add up or
accumulate a result.
\index{accumulator}

\item[augmented assignment:] A statement that updates the value
of a variable using an operator like \verb"+=".
\index{assignment!augmented}
\index{augmented assignment}
\index{traversal}

\item[reduce:] A processing pattern that traverses a sequence
and accumulates the elements into a single result.
\index{reduce pattern}
\index{pattern!reduce}

\item[map:] A processing pattern that traverses a sequence and
performs an operation on each element.
\index{map pattern}
\index{pattern!map}

\item[filter:] A processing pattern that traverses a list and
selects the elements that satisfy some criterion.
\index{filter pattern}
\index{pattern!filter}

\item[object:] Something a variable can refer to.  An object
has a type and a value.
\index{object}

\item[equivalent:] Having the same value.
\index{equivalent}

\item[identical:] Being the same object (which implies equivalence).
\index{identical}

\item[reference:] The association between a variable and its value.
\index{reference}

\item[aliasing:] A circumstance where two or more variables refer to the same
object.
\index{aliasing}

\item[delimiter:] A character or string used to indicate where a
string should be split.
\index{delimiter}

\end{description}


\section{Exercises}

You can download solutions to these exercises from
\url{http://thinkpython2.com/code/list_exercises.py}.

\begin{exercise}

Write a function called \verb"nested_sum" that takes a list of lists
of integers and adds up the elements from all of the nested lists.
For example:

\begin{verbatim}
>>> t = [[1, 2], [3], [4, 5, 6]]
>>> nested_sum(t)
21
\end{verbatim}

\end{exercise}

\begin{exercise}
\label{cumulative}
\index{cumulative sum}

Write a function called {\tt cumsum} that takes a list of numbers and
returns the cumulative sum; that is, a new list where the $i$th
element is the sum of the first $i+1$ elements from the original list.
For example:

\begin{verbatim}
>>> t = [1, 2, 3]
>>> cumsum(t)
[1, 3, 6]
\end{verbatim}

\end{exercise}

\begin{exercise}

Write a function called \verb"middle" that takes a list and
returns a new list that contains all but the first and last
elements.  For example:

\begin{verbatim}
>>> t = [1, 2, 3, 4]
>>> middle(t)
[2, 3]
\end{verbatim}

\end{exercise}

\begin{exercise}

Write a function called \verb"chop" that takes a list, modifies it
by removing the first and last elements, and returns {\tt None}.
For example:

\begin{verbatim}
>>> t = [1, 2, 3, 4]
>>> chop(t)
>>> t
[2, 3]
\end{verbatim}

\end{exercise}


\begin{exercise}
Write a function called \verb"is_sorted" that takes a list as a
parameter and returns {\tt True} if the list is sorted in ascending
order and {\tt False} otherwise.  For example:

\begin{verbatim}
>>> is_sorted([1, 2, 2])
True
>>> is_sorted(['b', 'a'])
False
\end{verbatim}

\end{exercise}


\begin{exercise}
\label{anagram}
\index{anagram}

Two words are anagrams if you can rearrange the letters from one
to spell the other.  Write a function called \verb"is_anagram"
that takes two strings and returns {\tt True} if they are anagrams.
\end{exercise}



\begin{exercise}
\label{duplicate}
\index{duplicate}
\index{uniqueness}

Write a function called \verb"has_duplicates" that takes
a list and returns {\tt True} if there is any element that
appears more than once.  It should not modify the original
list.

\end{exercise}


\begin{exercise}

This exercise pertains to the so-called Birthday Paradox, which you
can read about at \url{http://en.wikipedia.org/wiki/Birthday_paradox}.
\index{birthday paradox}

If there are 23 students in your class, what are the chances
that two of you have the same birthday?  You can estimate this
probability by generating random samples of 23 birthdays
and checking for matches.  Hint: you can generate random birthdays
with the {\tt randint} function in the {\tt random} module.
\index{random module}
\index{module!random}
\index{randint function}
\index{function!randint}

You can download my
solution from \url{http://thinkpython2.com/code/birthday.py}.

\end{exercise}



\begin{exercise}
\index{append method}
\index{method!append}
\index{list!concatenation}
\index{concatenation!list}

Write a function that reads the file {\tt words.txt} and builds
a list with one element per word.  Write two versions of
this function, one using the {\tt append} method and the
other using the idiom {\tt t = t + [x]}.  Which one takes
longer to run?  Why?

Solution: \url{http://thinkpython2.com/code/wordlist.py}.
\index{time module}
\index{module!time}

\end{exercise}


\begin{exercise}
\label{wordlist1}
\label{bisection}
\index{membership!bisection search}
\index{bisection search}
\index{search, bisection}
\index{membership!binary search}
\index{binary search}
\index{search, binary}

To check whether a word is in the word list, you could use
the {\tt in} operator, but it would be slow because it searches
through the words in order.

Because the words are in alphabetical order, we can speed things up
with a bisection search (also known as binary search), which is
similar to what you do when you look a word up in the dictionary
(the book, not the data structure).  You
start in the middle and check to see whether the word you are looking
for comes before the word in the middle of the list.  If so, you
search the first half of the list the same way.  Otherwise you search
the second half.

Either way, you cut the remaining search space in half.  If the
word list has 113,809 words, it will take about 17 steps to
find the word or conclude that it's not there.

Write a function called \verb"in_bisect" that takes a sorted list
and a target value and returns {\tt True} if the target is
in the list and {\tt False} if it's not.
\index{bisect module}
\index{module!bisect}

Or you could read the documentation of the {\tt bisect} module
and use that!  Solution: \url{http://thinkpython2.com/code/inlist.py}.

\end{exercise}

\begin{exercise}
\index{reverse word pair}

Two words are a ``reverse pair'' if each is the reverse of the
other.  Write a program that finds all the reverse pairs in the
word list.  Solution: \url{http://thinkpython2.com/code/reverse_pair.py}.

\end{exercise}

\begin{exercise}
\index{interlocking words}

Two words ``interlock'' if taking alternating letters from each forms
a new word.  For example, ``shoe'' and ``cold''
interlock to form ``schooled''.
Solution: \url{http://thinkpython2.com/code/interlock.py}.
Credit: This exercise is inspired by an example at \url{http://puzzlers.org}.

\begin{enumerate}

\item Write a program that finds all pairs of words that interlock.
  Hint: don't enumerate all pairs!

\item Can you find any words that are three-way interlocked; that is,
  every third letter forms a word, starting from the first, second or
  third?

\end{enumerate}
\end{exercise}


\chapter{Dictionaries}

This chapter presents another built-in type called a dictionary.
Dictionaries are one of Python's best features; they are the
building blocks of many efficient and elegant algorithms.


\section{A dictionary is a mapping}

\index{dictionary}
\index{type!dict}
\index{key}
\index{key-value pair}
\index{index}
A {\bf dictionary} is like a list, but more general.  In a list,
the indices have to be integers; in a dictionary they can
be (almost) any type.

A dictionary contains a collection of indices, which are called {\bf
  keys}, and a collection of values.  Each key is associated with a
single value.  The association of a key and a value is called a {\bf
  key-value pair} or sometimes an {\bf item}.  \index{item}

In mathematical language, a dictionary represents a {\bf mapping}
from keys to values, so you can also say that each key
``maps to'' a value.
As an example, we'll build a dictionary that maps from English
to Spanish words, so the keys and the values are all strings.

The function {\tt dict} creates a new dictionary with no items.
Because {\tt dict} is the name of a built-in function, you
should avoid using it as a variable name.
\index{dict function}
\index{function!dict}

\begin{verbatim}
>>> eng2sp = dict()
>>> eng2sp
{}
\end{verbatim}
8601

The squiggly-brackets, \verb"{}", represent an empty dictionary.
To add items to the dictionary, you can use square brackets:
\index{squiggly bracket}
\index{bracket!squiggly}

\begin{verbatim}
>>> eng2sp['one'] = 'uno'
\end{verbatim}
%
This line creates an item that maps from the key
\verb"'one'" to the value \verb"'uno'".  If we print the
dictionary again, we see a key-value pair with a colon
between the key and value:

\begin{verbatim}
>>> eng2sp
{'one': 'uno'}
\end{verbatim}
%
This output format is also an input format.  For example,
you can create a new dictionary with three items:

\begin{verbatim}
>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
\end{verbatim}
%
But if you print {\tt eng2sp}, you might be surprised:

\begin{verbatim}
>>> eng2sp
{'one': 'uno', 'three': 'tres', 'two': 'dos'}
\end{verbatim}
%
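The order of the key-value pairs might not be the same.  In versions
of Python before 3.7, the order of items in a dictionary is
unpredictable, so if you type the same example on your computer, you
might get a different result.  Since Python 3.7, dictionaries keep
their items in insertion order, but it is still best not to depend on
where an item appears.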

But that's not a problem because
the elements of a dictionary are never indexed with integer indices.
Instead, you use the keys to look up the corresponding values:

\begin{verbatim}
>>> eng2sp['two']
'dos'
\end{verbatim}
%
The key \verb"'two'" always maps to the value \verb"'dos'" so the order
of the items doesn't matter.

If the key isn't in the dictionary, you get an exception:
\index{exception!KeyError}
\index{KeyError}

\begin{verbatim}
>>> eng2sp['four']
KeyError: 'four'
\end{verbatim}
%
The {\tt len} function works on dictionaries; it returns the
number of key-value pairs:
\index{len function}
\index{function!len}

\begin{verbatim}
>>> len(eng2sp)
3
\end{verbatim}
%
The {\tt in} operator works on dictionaries, too; it tells you whether
something appears as a {\em key} in the dictionary (appearing
as a value is not good enough).
\index{membership!dictionary}
\index{in operator}
\index{operator!in}

\begin{verbatim}
>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False
\end{verbatim}
%
To see whether something appears as a value in a dictionary, you
can use the method {\tt values}, which returns a collection of
values, and then use the {\tt in} operator:
\index{values method}
\index{method!values}

\begin{verbatim}
>>> vals = eng2sp.values()
>>> 'uno' in vals
True
\end{verbatim}
%
The {\tt in} operator uses different algorithms for lists and
dictionaries.  For lists, it searches the elements of the list in
order, as in Section~\ref{find}.  As the list gets longer, the search
time gets longer in direct proportion.

Python dictionaries use a data structure
called a {\bf hashtable} that has a remarkable property: the
{\tt in} operator takes about the same amount of time no matter how
many items are in the dictionary.  I explain how that's possible
in Section~\ref{hashtable}, but the explanation might not make
sense until you've read a few more chapters.
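One way to see the difference between the two algorithms is to time
them.  The following is only a rough sketch: the size {\tt n}, the
names \verb"numbers_list" and \verb"numbers_dict", and the use of the
standard {\tt timeit} module are choices made for this illustration,
and the times you see will depend on your computer.

```python
import timeit

# Hypothetical size, chosen only for illustration.
n = 100_000
numbers_list = list(range(n))
numbers_dict = dict()
for x in numbers_list:
    numbers_dict[x] = True      # the values don't matter here

target = n - 1                  # worst case for the list: the last element

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
dict_time = timeit.timeit(lambda: target in numbers_dict, number=100)

print('list:', list_time)
print('dict:', dict_time)
```

If you increase {\tt n}, the time for the list should grow roughly in
proportion, while the time for the dictionary should stay about the
same.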


\section{Dictionary as a collection of counters}
\label{histogram}
\index{counter}

Suppose you are given a string and you want to count how many
times each letter appears.  There are several ways you could do it:

\begin{enumerate}

\item You could create 26 variables, one for each letter of the
alphabet.  Then you could traverse the string and, for each
character, increment the corresponding counter, probably using
a chained conditional.

\item You could create a list with 26 elements.  Then you could
convert each character to a number (using the built-in function
{\tt ord}), use the number as an index into the list, and increment
the appropriate counter.

\item You could create a dictionary with characters as keys
and counters as the corresponding values.  The first time you
see a character, you would add an item to the dictionary.  After
that you would increment the value of an existing item.

\end{enumerate}

Each of these options performs the same computation, but each
of them implements that computation in a different way.
\index{implementation}
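For comparison, here is a minimal sketch of the second option, the
list of 26 counters.  The function name \verb"histogram_list" is
made up for this example, and the sketch assumes the string contains
lowercase letters; any other character is simply skipped.

```python
def histogram_list(s):
    """Count lowercase letters using a list of 26 counters."""
    counts = [0] * 26
    for c in s:
        index = ord(c) - ord('a')   # 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
        if 0 <= index < 26:         # skip anything that is not a-z
            counts[index] += 1
    return counts

counts = histogram_list('brontosaurus')
print(counts[ord('o') - ord('a')])   # number of times 'o' appears
```

This version always makes room for all 26 letters, whether or not
they appear, and it has to decide what to do with characters outside
that range.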

An {\bf implementation} is a way of performing a computation;
some implementations are better than others.  For example,
an advantage of the dictionary implementation is that we don't
have to know ahead of time which letters appear in the string
and we only have to make room for the letters that do appear.

Here is what the code might look like:

\begin{verbatim}
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
\end{verbatim}
%
The name of the function is {\tt histogram}, which is a statistical
term for a collection of counters (or frequencies).
\index{histogram}
\index{frequency}
\index{traversal}

The first line of the
function creates an empty dictionary.  The {\tt for} loop traverses
the string.  Each time through the loop, if the character {\tt c} is
not in the dictionary, we create a new item with key {\tt c} and the
initial value 1 (since we have seen this letter once).  If {\tt c} is
already in the dictionary we increment {\tt d[c]}.
\index{histogram}

Here's how it works:

\begin{verbatim}
>>> h = histogram('brontosaurus')
>>> h
{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}
\end{verbatim}
%
The histogram indicates that the letters \verb"'a'" and \verb"'b'"
appear once; \verb"'o'" appears twice, and so on.


\index{get method}
\index{method!get}
Dictionaries have a method called {\tt get} that takes a key
and a default value.  If the key appears in the dictionary,
{\tt get} returns the corresponding value; otherwise it returns
the default value.  For example:

\begin{verbatim}
>>> h = histogram('a')
>>> h
{'a': 1}
>>> h.get('a', 0)
1
>>> h.get('c', 0)
0
\end{verbatim}
%
As an exercise, use {\tt get} to write {\tt histogram} more concisely.  You
should be able to eliminate the {\tt if} statement.


\section{Looping and dictionaries}
\index{dictionary!looping with}
\index{looping!with dictionaries}
\index{traversal}

If you use a dictionary in a {\tt for} statement, it traverses
the keys of the dictionary.  For example, \verb"print_hist"
prints each key and the corresponding value:

\begin{verbatim}
def print_hist(h):
    for c in h:
        print(c, h[c])
\end{verbatim}
%
Here's what the output looks like:

\begin{verbatim}
>>> h = histogram('parrot')
>>> print_hist(h)
a 1
p 1
r 2
t 1
o 1
\end{verbatim}
%
Again, the keys are in no particular order.  To traverse the keys
in sorted order, you can use the built-in function {\tt sorted}:
\index{sorted!function}
\index{function!sorted}

\begin{verbatim}
>>> for key in sorted(h):
...     print(key, h[key])
a 1
o 1
p 1
r 2
t 1
\end{verbatim}

%TODO: get this on Atlas


\section{Reverse lookup}
\label{raise}
\index{dictionary!lookup}
\index{dictionary!reverse lookup}
\index{lookup, dictionary}
\index{reverse lookup, dictionary}

Given a dictionary {\tt d} and a key {\tt k}, it is easy to
find the corresponding value {\tt v = d[k]}.  This operation
is called a {\bf lookup}.

But what if you have {\tt v} and you want to find {\tt k}?
You have two problems: first, there might be more than one
key that maps to the value {\tt v}.  Depending on the application,
you might be able to pick one, or you might have to make
a list that contains all of them.  Second, there is no
simple syntax to do a {\bf reverse lookup}; you have to search.

Here is a function that takes a value and returns the first
key that maps to that value:

\begin{verbatim}
def reverse_lookup(d, v):
    for k in d:
        if d[k] == v:
            return k
    raise LookupError()
\end{verbatim}
%
This function is yet another example of the search pattern, but it
uses a feature we haven't seen before, {\tt raise}.  The
{\bf raise statement} causes an exception; in this case it causes a
{\tt LookupError}, which is a built-in exception used to indicate
that a lookup operation failed.
\index{search}
\index{pattern!search} \index{raise statement} \index{statement!raise}
\index{exception!LookupError} \index{LookupError}

If we get to the end of the loop, that means {\tt v}
doesn't appear in the dictionary as a value, so we raise an
exception.

Here is an example of a successful reverse lookup:

\begin{verbatim}
>>> h = histogram('parrot')
>>> key = reverse_lookup(h, 2)
>>> key
'r'
\end{verbatim}
%
And an unsuccessful one:

\begin{verbatim}
>>> key = reverse_lookup(h, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in reverse_lookup
LookupError
\end{verbatim}
%
The effect when you raise an exception is the same as when
Python raises one: it prints a traceback and an error message.
\index{traceback}
\index{optional argument}
\index{argument!optional}

When you raise an exception, you can provide a detailed error
message as an optional argument.  For example:

\begin{verbatim}
>>> raise LookupError('value does not appear in the dictionary')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: value does not appear in the dictionary
\end{verbatim}
%
A reverse lookup is much slower than a forward lookup; if you
have to do it often, or if the dictionary gets big, the performance
of your program will suffer.
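If the application needs every key that maps to a value, rather than
just the first one, the same search pattern can build a list.  This
is a minimal sketch; the name \verb"reverse_lookup_all" is invented
for this example, and returning an empty list when nothing matches
(instead of raising an exception) is one possible design choice.

```python
def reverse_lookup_all(d, v):
    """Return a list of every key in d that maps to the value v."""
    keys = []
    for k in d:
        if d[k] == v:
            keys.append(k)
    return keys

hist = {'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}
print(reverse_lookup_all(hist, 1))   # all letters that appear once
print(reverse_lookup_all(hist, 3))   # no matches: an empty list
```

Like \verb"reverse_lookup", this function has to examine every item,
so it is just as slow on a big dictionary.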


\section{Dictionaries and lists}
\label{invert}

Lists can appear as values in a dictionary.  For example, if you
are given a dictionary that maps from letters to frequencies, you
might want to invert it; that is, create a dictionary that maps
from frequencies to letters.  Since there might be several letters
with the same frequency, each value in the inverted dictionary
should be a list of letters.
\index{invert dictionary}
\index{dictionary!invert}

Here is a function that inverts a dictionary:

\begin{verbatim}
def invert_dict(d):
    inverse = dict()
    for key in d:
        val = d[key]
        if val not in inverse:
            inverse[val] = [key]
        else:
            inverse[val].append(key)
    return inverse
\end{verbatim}
%
Each time through the loop, {\tt key} gets a key from {\tt d} and
{\tt val} gets the corresponding value.  If {\tt val} is not in {\tt
  inverse}, that means we haven't seen it before, so we create a new
item and initialize it with a {\bf singleton} (a list that contains a
single element).  Otherwise we have seen this value before, so we
append the corresponding key to the list.  \index{singleton}

Here is an example:

\begin{verbatim}
>>> hist = histogram('parrot')
>>> hist
{'a': 1, 'p': 1, 'r': 2, 't': 1, 'o': 1}
>>> inverse = invert_dict(hist)
>>> inverse
{1: ['a', 'p', 't', 'o'], 2: ['r']}
\end{verbatim}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/dict1.pdf}}
\caption{State diagram.}
\label{fig.dict1}
\end{figure}

Figure~\ref{fig.dict1} is a state diagram showing {\tt hist} and {\tt inverse}.
A dictionary is represented as a box with the type {\tt dict} above it
and the key-value pairs inside.  If the values are integers, floats or
strings, I draw them inside the box, but I usually draw lists
outside the box, just to keep the diagram simple.
\index{state diagram}
\index{diagram!state}

Lists can be values in a dictionary, as this example shows, but they
cannot be keys.  Here's what happens if you try:
\index{TypeError}
\index{exception!TypeError}


\begin{verbatim}
>>> t = [1, 2, 3]
>>> d = dict()
>>> d[t] = 'oops'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
\end{verbatim}
%
I mentioned earlier that a dictionary is implemented using
a hashtable and that means that the keys have to be {\bf hashable}.
\index{hash function}
\index{hashable}

A {\bf hash} is a function that takes a value (of any kind)
and returns an integer.  Dictionaries use these integers,
called hash values, to store and look up key-value pairs.
\index{immutability}

This system works fine if the keys are immutable.  But if the
keys are mutable, like lists, bad things happen.  For example,
when you create a key-value pair, Python hashes the key and
stores it in the corresponding location.  If you modify the
key and then hash it again, it would go to a different location.
In that case you might have two entries for the same key,
or you might not be able to find a key.  Either way, the
dictionary wouldn't work correctly.

That's why keys have to be hashable, and why mutable types like
lists aren't.  The simplest way to get around this limitation is to
use tuples, which we will see in the next chapter.

Since dictionaries are mutable, they can't be used as keys,
but they {\em can} be used as values.
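You can explore hashability directly with the built-in function
{\tt hash}.  The sketch below is only an illustration: the helper
\verb"is_hashable" is a made-up name, and it relies on the {\tt try}
statement, which this chapter has not introduced, to catch the
{\tt TypeError} that {\tt hash} raises for unhashable values.

```python
# Immutable values such as strings, integers and tuples are hashable.
print(hash('one') == hash('one'))   # equal values have equal hashes
print(isinstance(hash(42), int))    # a hash value is an integer

def is_hashable(value):
    """Return True if the built-in hash function accepts value."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

print(is_hashable('one'))       # string: hashable
print(is_hashable((1, 2, 3)))   # tuple of integers: hashable
print(is_hashable([1, 2, 3]))   # list: not hashable
print(is_hashable({'a': 1}))    # dictionary: not hashable
```

The actual hash values of strings can differ from one run of Python to
the next, so the example compares hashes instead of printing them.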


\section{Memos}
\label{memoize}

If you played with the {\tt fibonacci} function from
Section~\ref{one.more.example}, you might have noticed that the bigger
the argument you provide, the longer the function takes to run.
Furthermore, the run time increases quickly.
\index{fibonacci function}
\index{function!fibonacci}

To understand why, consider Figure~\ref{fig.fibonacci}, which shows
the {\bf call graph} for {\tt fibonacci} with {\tt n=4}:

\begin{figure}
\centerline
{\includegraphics[scale=0.7]{figs/fibonacci.pdf}}
\caption{Call graph.}
\label{fig.fibonacci}
\end{figure}

A call graph shows a set of function frames, with lines connecting each
frame to the frames of the functions it calls.  At the top of the
graph, {\tt fibonacci} with {\tt n=4} calls {\tt fibonacci} with {\tt
n=3} and {\tt n=2}.  In turn, {\tt fibonacci} with {\tt n=3} calls
{\tt fibonacci} with {\tt n=2} and {\tt n=1}.  And so on.
\index{function frame}
\index{frame}
\index{call graph}

Count how many times {\tt fibonacci(0)} and {\tt fibonacci(1)} are
called.  This is an inefficient solution to the problem, and it gets
worse as the argument gets bigger.
\index{memo}

One solution is to keep track of values that have already been
computed by storing them in a dictionary.  A previously computed value
that is stored for later use is called a {\bf memo}.  Here is a
``memoized'' version of {\tt fibonacci}:

\begin{verbatim}
known = {0:0, 1:1}

def fibonacci(n):
    if n in known:
        return known[n]

    res = fibonacci(n-1) + fibonacci(n-2)
    known[n] = res
    return res
\end{verbatim}
%
{\tt known} is a dictionary that keeps track of the Fibonacci
numbers we already know.  It starts with
two items: 0 maps to 0 and 1 maps to 1.

Whenever {\tt fibonacci} is called, it checks {\tt known}.
If the result is already there, it can return
immediately.  Otherwise it has to
compute the new value, add it to the dictionary, and return it.

If you run this version of {\tt fibonacci} and compare it with
the original, you will find that it is much faster.
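One way to see where the speedup comes from is to count function
calls.  The following sketch is an illustration, not the book's code:
the names \verb"fib_plain", \verb"fib_memo" and {\tt calls} are
invented here, and the plain version is written to match the usual
recursive definition.

```python
calls = {'plain': 0, 'memo': 0}   # hypothetical counters, for illustration

def fib_plain(n):
    """Fibonacci without memos, counting every call."""
    calls['plain'] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_plain(n-1) + fib_plain(n-2)

known = {0: 0, 1: 1}

def fib_memo(n):
    """Fibonacci with memos stored in known, counting every call."""
    calls['memo'] += 1
    if n in known:
        return known[n]
    res = fib_memo(n-1) + fib_memo(n-2)
    known[n] = res
    return res

print(fib_plain(20), calls['plain'])
print(fib_memo(20), calls['memo'])
```

Both versions compute the same value, 6765, but the plain version
makes 21891 calls to get there and the memoized version makes 39.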



\section{Global variables}
\index{global variable}
\index{variable!global}

In the previous example, {\tt known} is created outside the function,
so it belongs to the special frame called \verb"__main__".
Variables in \verb"__main__" are sometimes called {\bf global}
because they can be accessed from any function.  Unlike local
variables, which disappear when their function ends, global variables
persist from one function call to the next.
\index{flag}
\index{main}

It is common to use global variables for {\bf flags}; that is,
boolean variables that indicate (``flag'') whether a condition
is true.  For example, some programs use
a flag named {\tt verbose} to control the level of detail in the
output:

\begin{verbatim}
verbose = True

def example1():
    if verbose:
        print('Running example1')
\end{verbatim}
%
If you try to reassign a global variable, you might be surprised.
The following example is supposed to keep track of whether the
function has been called:
\index{reassignment}

\begin{verbatim}
been_called = False

def example2():
    been_called = True         # WRONG
\end{verbatim}
%
But if you run it you will see that the value of \verb"been_called"
doesn't change.  The problem is that {\tt example2} creates a new local
variable named \verb"been_called".  The local variable goes away when
the function ends, and has no effect on the global variable.
\index{global statement}
\index{statement!global}
\index{declaration}

To reassign a global variable inside a function you have to
{\bf declare} the global variable before you use it:

\begin{verbatim}
been_called = False

def example2():
    global been_called
    been_called = True
\end{verbatim}
%
The {\bf global statement} tells the interpreter
something like, ``In this function, when I say \verb"been_called", I
mean the global variable; don't create a local one.''
\index{update!global variable}
\index{global variable!update}

Here's an example that tries to update a global variable:

\begin{verbatim}
count = 0

def example3():
    count = count + 1          # WRONG
\end{verbatim}
%
If you run it you get an error like this one (the exact wording
depends on the version of Python):
\index{UnboundLocalError}
\index{exception!UnboundLocalError}

\begin{verbatim}
UnboundLocalError: local variable 'count' referenced before assignment
\end{verbatim}
%
Python assumes that {\tt count} is local, and under that assumption
you are reading it before writing it.  The solution, again,
is to declare {\tt count} global.
\index{counter}

\begin{verbatim}
def example3():
    global count
    count += 1
\end{verbatim}
%
If a global variable refers to a mutable value, you can modify
the value without declaring the variable:
\index{mutability}

\begin{verbatim}
known = {0:0, 1:1}

def example4():
    known[2] = 1
\end{verbatim}
%
So you can add, remove and replace elements of a global list or
dictionary, but if you want to reassign the variable, you
have to declare it:

\begin{verbatim}
def example5():
    global known
    known = dict()
\end{verbatim}
%
Global variables can be useful, but if you have a lot of them,
and you modify them frequently, they can make programs
hard to debug.
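The rules in this section can be checked with a short script.  This
is a sketch that puts the earlier fragments together; the function
names \verb"without_global", \verb"with_global" and {\tt increment}
are invented for this example.

```python
been_called = False
count = 0

def without_global():
    been_called = True      # creates a local variable; the global is untouched

def with_global():
    global been_called
    been_called = True      # reassigns the global variable

def increment():
    global count
    count += 1

without_global()
print(been_called)          # still False
with_global()
print(been_called)          # now True

increment()
increment()
print(count)                # 2
```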


\section{Debugging}
\index{debugging}

As you work with bigger datasets it can become unwieldy to
debug by printing and checking the output by hand.  Here are some
suggestions for debugging large datasets:

\begin{description}

\item[Scale down the input:] If possible, reduce the size of the
dataset.  For example, if the program reads a text file, start with
just the first 10 lines, or with the smallest example you can find.
You can either edit the files themselves, or (better) modify the
program so it reads only the first {\tt n} lines.

If there is an error, you can reduce {\tt n} to the smallest
value that manifests the error, and then increase it gradually
as you find and correct errors.

\item[Check summaries and types:] Instead of printing and checking the
entire dataset, consider printing summaries of the data: for example,
the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right
type.  For debugging this kind of error, it is often enough to print
the type of a value.

\item[Write self-checks:]  Sometimes you can write code to check
for errors automatically.  For example, if you are computing the
average of a list of numbers, you could check that the result is
not greater than the largest element in the list or less than
the smallest.  This is called a ``sanity check'' because it detects
results that are ``insane''.
\index{sanity check}
\index{consistency check}

Another kind of check compares the results of two different
computations to see if they are consistent.  This is called a
``consistency check''.

\item[Format the output:] Formatting debugging output
can make it easier to spot an error.  We saw an example in
Section~\ref{factdebug}.  Another tool you might find useful is the
{\tt pprint} module, which provides
a {\tt pprint} function that displays built-in types in
a more human-readable format ({\tt pprint} stands for
``pretty print'').
\index{pretty print}
\index{pprint module}
\index{module!pprint}

\end{description}
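As an illustration of a self-check, here is a minimal sketch of the
sanity check for an average.  The function name
\verb"checked_average" and the choice to raise a {\tt ValueError} are
assumptions made for this example.  With floating-point numbers,
rounding can push an average slightly outside the range of the data,
so a real check might need to allow a small tolerance.

```python
def checked_average(t):
    """Return the average of a nonempty list of numbers, with a sanity check."""
    avg = sum(t) / len(t)
    # Sanity check: an average can't lie outside the range of the data.
    if not (min(t) <= avg <= max(t)):
        raise ValueError('average is outside the range of the data')
    return avg

print(checked_average([1, 2, 3, 4]))
```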

Again, time you spend building scaffolding can reduce
the time you spend debugging.
\index{scaffolding}


\section{Glossary}

\begin{description}

\item[mapping:] A relationship in which each element of one set
corresponds to an element of another set.
\index{mapping}

\item[dictionary:] A mapping from keys to their
corresponding values.
\index{dictionary}

\item[key-value pair:] The representation of the mapping from
a key to a value.
\index{key-value pair}

\item[item:] In a dictionary, another name for a key-value
  pair.
\index{item!dictionary}

\item[key:] An object that appears in a dictionary as the
first part of a key-value pair.
\index{key}

\item[value:] An object that appears in a dictionary as the
second part of a key-value pair.  This is more specific than
our previous use of the word ``value''.
\index{value}

\item[implementation:] A way of performing a computation.
\index{implementation}

\item[hashtable:] The data structure used to implement Python
dictionaries.
\index{hashtable}

\item[hash function:] A function used by a hashtable to compute the
location for a key.
\index{hash function}

\item[hashable:] A type that has a hash function.  Immutable
types like integers,
floats and strings are hashable; mutable types like lists and
dictionaries are not.
\index{hashable}

\item[lookup:] A dictionary operation that takes a key and finds
the corresponding value.
\index{lookup}

\item[reverse lookup:] A dictionary operation that takes a value and finds
one or more keys that map to it.
\index{reverse lookup}

\item[raise statement:]  A statement that (deliberately) raises an exception.
\index{raise statement}
\index{statement!raise}

\item[singleton:] A list (or other sequence) with a single element.
\index{singleton}

\item[call graph:] A diagram that shows every frame created during
the execution of a program, with an arrow from each caller to
each callee.
\index{call graph}
\index{diagram!call graph}

\item[memo:] A computed value stored to avoid unnecessary future
computation.
\index{memo}

\item[global variable:]  A variable defined outside a function.  Global
variables can be accessed from any function.
\index{global variable}

\item[global statement:]  A statement that declares a variable name
global.
\index{global statement}
\index{statement!global}

\item[flag:] A boolean variable used to indicate whether a condition
is true.
\index{flag}

\item[declaration:] A statement like {\tt global} that tells the
interpreter something about a variable.
\index{declaration}

\end{description}


\section{Exercises}

\begin{exercise}
\label{wordlist2}
\index{set membership}
\index{membership!set}

Write a function that reads the words in {\tt words.txt} and
stores them as keys in a dictionary.  It doesn't matter what the
values are.  Then you can use the {\tt in} operator
as a fast way to check whether a string is in
the dictionary.

If you did Exercise~\ref{wordlist1}, you can compare the speed
of this implementation with the list {\tt in} operator and the
bisection search.

\end{exercise}


\begin{exercise}
\label{setdefault}

Read the documentation of the dictionary method {\tt setdefault}
and use it to write a more concise version of \verb"invert_dict".
Solution: \url{http://thinkpython2.com/code/invert_dict.py}.
\index{setdefault method}
\index{method!setdefault}

\end{exercise}


\begin{exercise}
Memoize the Ackermann function from Exercise~\ref{ackermann} and see if
memoization makes it possible to evaluate the function with bigger
arguments.  Hint: no.
Solution: \url{http://thinkpython2.com/code/ackermann_memo.py}.
\index{Ackermann function}
\index{function!ack}

\end{exercise}



\begin{exercise}
\index{duplicate}

If you did Exercise~\ref{duplicate}, you already have
a function named \verb"has_duplicates" that takes a list
as a parameter and returns {\tt True} if there is any object
that appears more than once in the list.

Use a dictionary to write a faster, simpler version of
\verb"has_duplicates".
Solution: \url{http://thinkpython2.com/code/has_duplicates.py}.

\end{exercise}


\begin{exercise}
\label{exrotatepairs}
\index{letter rotation}
\index{rotation!letters}

Two words are ``rotate pairs'' if you can rotate one of them
and get the other (see \verb"rotate_word" in Exercise~\ref{exrotate}).

Write a program that reads a wordlist and finds all the rotate
pairs.  Solution: \url{http://thinkpython2.com/code/rotate_pairs.py}.

\end{exercise}


\begin{exercise}
\index{Car Talk}
\index{Puzzler}

Here's another Puzzler from {\em Car Talk}
(\url{http://www.cartalk.com/content/puzzlers}):

\begin{quote}
This was sent in by a fellow named Dan O'Leary. He came upon a common
one-syllable, five-letter word recently that has the following unique
property. When you remove the first letter, the remaining letters form
a homophone of the original word, that is a word that sounds exactly
the same. Replace the first letter, that is, put it back and remove
the second letter and the result is yet another homophone of the
original word. And the question is, what's the word?

Now I'm going to give you an example that doesn't work. Let's look at
the five-letter word, `wrack.' W-R-A-C-K, you know like to `wrack with
pain.' If I remove the first letter, I am left with a four-letter
word, `R-A-C-K.' As in, `Holy cow, did you see the rack on that buck!
It must have been a nine-pointer!' It's a perfect homophone. If you
put the `w' back, and remove the `r,' instead, you're left with the
word, `wack,' which is a real word, it's just not a homophone of the
other two words.

But there is, however, at least one word that Dan and we know of,
which will yield two homophones if you remove either of the first two
letters to make two, new four-letter words. The question is, what's
the word?
\end{quote}
\index{homophone}
\index{reducible word}
\index{word, reducible}

You can use the dictionary from Exercise~\ref{wordlist2} to check
whether a string is in the word list.

To check whether two words are homophones, you can use the CMU
Pronouncing Dictionary.  You can download it from
\url{http://www.speech.cs.cmu.edu/cgi-bin/cmudict} or from
\url{http://thinkpython2.com/code/c06d} and you can also download
\url{http://thinkpython2.com/code/pronounce.py}, which provides a function
named \verb"read_dictionary" that reads the pronouncing dictionary and
returns a Python dictionary that maps from each word to a string that
describes its primary pronunciation.

Write a program that lists all the words that solve the Puzzler.
Solution: \url{http://thinkpython2.com/code/homophone.py}.

\end{exercise}



\chapter{Tuples}
9491
\label{tuplechap}
9492

9493
This chapter presents one more built-in type, the tuple, and then
9494
shows how lists, dictionaries, and tuples work together.
9495
I also present a useful feature for variable-length argument lists,
9496
the gather and scatter operators.
9497

9498
One note: there is no consensus on how to pronounce ``tuple''.
9499
Some people say ``tuh-ple'', which rhymes with ``supple''.  But
9500
in the context of programming, most people say ``too-ple'', which
9501
rhymes with ``quadruple''.
9502

9503

9504
\section{Tuples are immutable}
9505
\index{tuple}
9506
\index{type!tuple}
9507
\index{sequence}
9508

9509
A tuple is a sequence of values.  The values can be any type, and
9510
they are indexed by integers, so in that respect tuples are a lot
9511
like lists.  The important difference is that tuples are immutable.
9512
\index{mutability}
9513
\index{immutability}
9514

9515
Syntactically, a tuple is a comma-separated list of values:
9516

9517
\begin{verbatim}
>>> t = 'a', 'b', 'c', 'd', 'e'
\end{verbatim}
9520
%
9521
Although it is not necessary, it is common to enclose tuples in
9522
parentheses:
9523
\index{parentheses!tuples in}
9524

9525
\begin{verbatim}
>>> t = ('a', 'b', 'c', 'd', 'e')
\end{verbatim}
9528
%
9529
To create a tuple with a single element, you have to include a final
9530
comma:
9531
\index{singleton}
9532
\index{tuple!singleton}
9533

9534
\begin{verbatim}
>>> t1 = 'a',
>>> type(t1)
<class 'tuple'>
\end{verbatim}
9539
%
9540
A value in parentheses is not a tuple:
9541

9542
\begin{verbatim}
>>> t2 = ('a')
>>> type(t2)
<class 'str'>
\end{verbatim}
9547
%
9548
Another way to create a tuple is the built-in function {\tt tuple}.
9549
With no argument, it creates an empty tuple:
9550
\index{tuple function}
9551
\index{function!tuple}
9552

9553
\begin{verbatim}
>>> t = tuple()
>>> t
()
\end{verbatim}
9558
%
9559
If the argument is a sequence (string, list or tuple), the result
9560
is a tuple with the elements of the sequence:
9561

9562
\begin{verbatim}
>>> t = tuple('lupins')
>>> t
('l', 'u', 'p', 'i', 'n', 's')
\end{verbatim}
9567
%
9568
Because {\tt tuple} is the name of a built-in function, you should
9569
avoid using it as a variable name.
9570

9571
Most list operators also work on tuples.  The bracket operator
9572
indexes an element:
9573
\index{bracket operator}
9574
\index{operator!bracket}
9575

9576
\begin{verbatim}
>>> t = ('a', 'b', 'c', 'd', 'e')
>>> t[0]
'a'
\end{verbatim}
9581
%
9582
And the slice operator selects a range of elements.
9583
\index{slice operator}
9584
\index{operator!slice}
9585
\index{tuple!slice}
9586
\index{slice!tuple}
9587

9588
\begin{verbatim}
>>> t[1:3]
('b', 'c')
\end{verbatim}
9592
%
9593
But if you try to modify one of the elements of the tuple, you get
9594
an error:
9595
\index{exception!TypeError}
9596
\index{TypeError}
9597
\index{item assignment}
9598
\index{assignment!item}
9599

9600
\begin{verbatim}
>>> t[0] = 'A'
TypeError: 'tuple' object does not support item assignment
\end{verbatim}
9604
%
9605
Because tuples are immutable, you can't modify the elements.  But you
9606
can replace one tuple with another:
9607

9608
\begin{verbatim}
>>> t = ('A',) + t[1:]
>>> t
('A', 'b', 'c', 'd', 'e')
\end{verbatim}
9613
%
9614
This statement makes a new tuple and then makes {\tt t} refer to it.
9615

9616
The relational operators work with tuples and other sequences;
9617
Python starts by comparing the first element from each
9618
sequence.  If they are equal, it goes on to the next elements,
9619
and so on, until it finds elements that differ.  Subsequent
9620
elements are not considered (even if they are really big).
9621
\index{comparison!tuple}
9622
\index{tuple!comparison}
9623

9624
\begin{verbatim}
>>> (0, 1, 2) < (0, 3, 4)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
\end{verbatim}
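
Tuple comparison also makes it possible to sort on more than one
criterion at a time.  As a minimal sketch, the following function
(\verb"sort_by_length" is a hypothetical name used only for this
illustration) builds a list of tuples with the length of each word
first, so that sorting compares the lengths before it compares the
words themselves:

\begin{verbatim}
def sort_by_length(words):
    # build a list of (length, word) tuples
    t = []
    for word in words:
        t.append((len(word), word))

    # lengths are compared first; the words break ties
    t.sort(reverse=True)

    # collect the words, now ordered from longest to shortest
    res = []
    for pair in t:
        res.append(pair[1])
    return res

print(sort_by_length(['a', 'ccc', 'bb', 'dd']))
\end{verbatim}
%
This prints \verb"['ccc', 'dd', 'bb', 'a']".  The words {\tt 'dd'} and
{\tt 'bb'} have the same length, so their order is decided by the
second element of each tuple.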
9630

9631

9632

9633
\section{Tuple assignment}
9634
\label{tuple.assignment}
9635
\index{tuple!assignment}
9636
\index{assignment!tuple}
9637
\index{swap pattern}
9638
\index{pattern!swap}
9639

9640
It is often useful to swap the values of two variables.
9641
With conventional assignments, you have to use a temporary
9642
variable.  For example, to swap {\tt a} and {\tt b}:
9643

9644
\begin{verbatim}
>>> temp = a
>>> a = b
>>> b = temp
\end{verbatim}
9649
%
9650
This solution is cumbersome; {\bf tuple assignment} is more elegant:
9651

9652
\begin{verbatim}
>>> a, b = b, a
\end{verbatim}
9655
%
9656
The left side is a tuple of variables; the right side is a tuple of
9657
expressions.  Each value is assigned to its respective variable.  
9658
All the expressions on the right side are evaluated before any
9659
of the assignments.
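
Evaluating the right side first matters when the same variables
appear on both sides.  As a sketch, the following function
(\verb"fib_loop" is a hypothetical name chosen for this example)
computes Fibonacci numbers with a loop; each time through, the new
values of {\tt a} and {\tt b} are both computed from the old ones:

\begin{verbatim}
def fib_loop(n):
    a, b = 0, 1
    for i in range(n):
        # b and a + b are evaluated before a or b is changed
        a, b = b, a + b
    return a

print(fib_loop(10))
\end{verbatim}
%
This prints 55.  With two separate assignments, {\tt a = b}
followed by {\tt b = a + b}, the second statement would see the new
value of {\tt a} and the result would be wrong.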
9660

9661
The number of variables on the left and the number of
9662
values on the right have to be the same:
9663
\index{exception!ValueError}
9664
\index{ValueError}
9665

9666
\begin{verbatim}
>>> a, b = 1, 2, 3
ValueError: too many values to unpack (expected 2)
\end{verbatim}
9670
%
9671
More generally, the right side can be any kind of sequence
9672
(string, list or tuple).  For example, to split an email address
9673
into a user name and a domain, you could write:
9674
\index{split method}
9675
\index{method!split}
9676
\index{email address}
9677

9678
\begin{verbatim}
>>> addr = 'monty@python.org'
>>> uname, domain = addr.split('@')
\end{verbatim}
9682
%
9683
The return value from {\tt split} is a list with two elements;
9684
the first element is assigned to {\tt uname}, the second to
9685
{\tt domain}.
9686

9687
\begin{verbatim}
>>> uname
'monty'
>>> domain
'python.org'
\end{verbatim}
9693
%
9694

9695
\section{Tuples as return values}
9696
\index{tuple}
9697
\index{value!tuple}
9698
\index{return value!tuple}
9699
\index{function!tuple as return value}
9700

9701
Strictly speaking, a function can only return one value, but
9702
if the value is a tuple, the effect is the same as returning
9703
multiple values.  For example, if you want to divide two integers
9704
and compute the quotient and remainder, it is inefficient to
9705
compute {\tt x//y} and then {\tt x\%y}.  It is better to compute
9706
them both at the same time.
9707
\index{divmod}
9708

9709
The built-in function {\tt divmod} takes two arguments and
9710
returns a tuple of two values, the quotient and remainder.
9711
You can store the result as a tuple:
9712

9713
\begin{verbatim}
>>> t = divmod(7, 3)
>>> t
(2, 1)
\end{verbatim}
9718
%
9719
Or use tuple assignment to store the elements separately:
9720
\index{tuple assignment}
9721
\index{assignment!tuple}
9722

9723
\begin{verbatim}
>>> quot, rem = divmod(7, 3)
>>> quot
2
>>> rem
1
\end{verbatim}
9730
%
9731
Here is an example of a function that returns a tuple:
9732

9733
\begin{verbatim}
def min_max(t):
    return min(t), max(t)
\end{verbatim}
9737
%
9738
{\tt max} and {\tt min} are built-in functions that find
9739
the largest and smallest elements of a sequence.  \verb"min_max"
9740
computes both and returns a tuple of two values.
9741
\index{max function}
9742
\index{function!max}
9743
\index{min function}
9744
\index{function!min}
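
As a short usage sketch, tuple assignment can unpack the result
of \verb"min_max" directly.  The variable names {\tt low} and
{\tt high} and the sample list are chosen only for illustration:

\begin{verbatim}
def min_max(t):
    return min(t), max(t)

# unpack the returned tuple into two variables
low, high = min_max([3, 1, 4, 1, 5])
print(low, high)
\end{verbatim}
%
This prints {\tt 1 5}.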
9745

9746

9747
\section{Variable-length argument tuples}
9748
\label{gather}
9749
\index{variable-length argument tuple}
9750
\index{argument!variable-length tuple}
9751
\index{gather}
9752
\index{parameter!gather}
9753
\index{argument!gather}
9754

9755
Functions can take a variable number of arguments.  A parameter
9756
name that begins with {\tt *} {\bf gathers} arguments into
9757
a tuple.  For example, {\tt printall}
9758
takes any number of arguments and prints them:
9759

9760
\begin{verbatim}
def printall(*args):
    print(args)
\end{verbatim}
9764
%
9765
The gather parameter can have any name you like, but {\tt args} is
9766
conventional.  Here's how the function works:
9767

9768
\begin{verbatim}
>>> printall(1, 2.0, '3')
(1, 2.0, '3')
\end{verbatim}
9772
%
9773
The complement of gather is {\bf scatter}.  If you have a
9774
sequence of values and you want to pass it to a function
9775
as multiple arguments, you can use the {\tt *} operator.
9776
For example, {\tt divmod} takes exactly two arguments; it
9777
doesn't work with a tuple:
9778
\index{scatter}
9779
\index{argument scatter}
9780
\index{TypeError}
9781
\index{exception!TypeError}
9782

9783
\begin{verbatim}
>>> t = (7, 3)
>>> divmod(t)
TypeError: divmod expected 2 arguments, got 1
\end{verbatim}
9788
%
9789
But if you scatter the tuple, it works:
9790

9791
\begin{verbatim}
>>> divmod(*t)
(2, 1)
\end{verbatim}
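
Gather and scatter can be used together.  In the following sketch,
{\tt spread} is a hypothetical function, not a built-in: it gathers
its arguments into a tuple and returns the difference between the
largest and the smallest.  It can be called with separate arguments
or with a scattered sequence:

\begin{verbatim}
def spread(*args):
    # args is a tuple that contains all of the arguments
    return max(args) - min(args)

# separate arguments are gathered into args
print(spread(7, 3, 10))

# a tuple is scattered into separate arguments
t = (7, 3, 10)
print(spread(*t))
\end{verbatim}
%
Both calls print 7.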
9795
%
9796
Many of the built-in functions use
9797
variable-length argument tuples.  For example, {\tt max}
9798
and {\tt min} can take any number of arguments:
9799
\index{max function}
9800
\index{function!max}
9801
\index{min function}
9802
\index{function!min}
9803

9804
\begin{verbatim}
>>> max(1, 2, 3)
3
\end{verbatim}
9808
%
9809
But {\tt sum} does not.
9810
\index{sum function}
9811
\index{function!sum}
9812

9813
\begin{verbatim}
>>> sum(1, 2, 3)
TypeError: sum expected at most 2 arguments, got 3
\end{verbatim}
9817
%
9818
As an exercise, write a function called \verb"sum_all" that takes any number
9819
of arguments and returns their sum.
9820

9821

9822
\section{Lists and tuples}
9823
\index{zip function}
9824
\index{function!zip}
9825

9826
{\tt zip} is a built-in function that takes two or more sequences and
9827
interleaves them.  The name of the function refers to
9828
a zipper, which interleaves two rows of teeth.
9829

9830
This example zips a string and a list:
9831

9832
\begin{verbatim}
>>> s = 'abc'
>>> t = [0, 1, 2]
>>> zip(s, t)
<zip object at 0x7f7d0a9e7c48>
\end{verbatim}
9838
%
9839
The result is a {\bf zip object} that knows how to iterate through
9840
the pairs.  The most common use of {\tt zip} is in a {\tt for} loop:
9841

9842
\begin{verbatim}
>>> for pair in zip(s, t):
...     print(pair)
...
('a', 0)
('b', 1)
('c', 2)
\end{verbatim}
9850
%
9851
A zip object is a kind of {\bf iterator}, which is any object
9852
that iterates through a sequence.  Iterators are similar to lists in some
9853
ways, but unlike lists, you can't use an index to select an element from
9854
an iterator.
9855
\index{iterator}
9856

9857
If you want to use list operators and methods, you can
9858
use a zip object to make a list:
9859

9860
\begin{verbatim}
>>> list(zip(s, t))
[('a', 0), ('b', 1), ('c', 2)]
\end{verbatim}
9864
%
9865
The result is a list of tuples; in this example, each tuple contains
9866
a character from the string and the corresponding element from
9867
the list.
9868
\index{list!of tuples}
9869

9870
If the sequences are not the same length, the result has the
9871
length of the shorter one.
9872

9873
\begin{verbatim}
>>> list(zip('Anne', 'Elk'))
[('A', 'E'), ('n', 'l'), ('n', 'k')]
\end{verbatim}
9877
%
9878
You can use tuple assignment in a {\tt for} loop to traverse a list of
9879
tuples:
9880
\index{traversal}
9881
\index{tuple assignment}
9882
\index{assignment!tuple}
9883

9884
\begin{verbatim}
t = [('a', 0), ('b', 1), ('c', 2)]
for letter, number in t:
    print(number, letter)
\end{verbatim}
9889
%
9890
Each time through the loop, Python selects the next tuple in
9891
the list and assigns the elements to {\tt letter} and 
9892
{\tt number}.  The output of this loop is:
9893
\index{loop}
9894

9895
\begin{verbatim}
0 a
1 b
2 c
\end{verbatim}
9900
%
9901
If you combine {\tt zip}, {\tt for} and tuple assignment, you get a
9902
useful idiom for traversing two (or more) sequences at the same
9903
time.  For example, \verb"has_match" takes two sequences, {\tt t1} and
9904
{\tt t2}, and returns {\tt True} if there is an index {\tt i}
9905
such that {\tt t1[i] == t2[i]}:
9906
\index{for loop}
9907

9908
\begin{verbatim}
def has_match(t1, t2):
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False
\end{verbatim}
9915
%
9916
If you need to traverse the elements of a sequence and their
9917
indices, you can use the built-in function {\tt enumerate}:
9918
\index{traversal}
9919
\index{enumerate function}
9920
\index{function!enumerate}
9921

9922
\begin{verbatim}
for index, element in enumerate('abc'):
    print(index, element)
\end{verbatim}
9926
%
9927
The result from {\tt enumerate} is an enumerate object, which
9928
iterates a sequence of pairs; each pair contains an index (starting
9929
from 0) and an element from the given sequence.
9930
In this example, the output is
9931

9932
\begin{verbatim}
0 a
1 b
2 c
\end{verbatim}
9937
%
9938
This is the same output as the earlier loop over the list of tuples.
9939
\index{iterator}
9940
\index{object!enumerate}
9941
\index{enumerate object}
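
{\tt zip} and {\tt enumerate} can also be combined.  As a sketch,
the hypothetical function \verb"first_match" below is a variation on
\verb"has_match" that returns the index of the first match rather
than {\tt True}, or -1 if there is none.  The parentheses in the
{\tt for} statement unpack the pair produced by {\tt zip}:

\begin{verbatim}
def first_match(t1, t2):
    # enumerate yields (index, pair); pair is unpacked into x and y
    for i, (x, y) in enumerate(zip(t1, t2)):
        if x == y:
            return i
    return -1

print(first_match('abc', 'xbz'))
print(first_match('abc', 'xyz'))
\end{verbatim}
%
The first call prints 1 and the second prints -1.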
9942

9943

9944
\section{Dictionaries and tuples}
9945
\label{dictuple}
9946
\index{dictionary}
9947
\index{items method}
9948
\index{method!items}
9949
\index{key-value pair}
9950

9951
Dictionaries have a method called {\tt items} that returns a sequence of
9952
tuples, where each tuple is a key-value pair.
9953

9954
\begin{verbatim}
>>> d = {'a':0, 'b':1, 'c':2}
>>> t = d.items()
>>> t
dict_items([('c', 2), ('a', 0), ('b', 1)])
\end{verbatim}
9960
%
9961
The result is a \verb"dict_items" object, which is a view that lets
you iterate through the key-value pairs.  You can use it in a
{\tt for} loop like this:
9964
\index{iterator}
9965

9966
\begin{verbatim}
>>> for key, value in d.items():
...     print(key, value)
...
c 2
a 0
b 1
\end{verbatim}
9974
%
9975
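In versions of Python before 3.7, the order of the items is not
guaranteed, which is why they appear in no particular order here.
Since Python 3.7, dictionaries preserve insertion order.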
9977

9978
Going in the other direction, you can use a list of tuples to
9979
initialize a new dictionary: \index{dictionary!initialize}
9980

9981
\begin{verbatim}
>>> t = [('a', 0), ('c', 2), ('b', 1)]
>>> d = dict(t)
>>> d
{'a': 0, 'c': 2, 'b': 1}
\end{verbatim}
9987

9988
Combining {\tt dict} with {\tt zip} yields a concise way
9989
to create a dictionary:
9990
\index{zip function!use with dict}
9991

9992
\begin{verbatim}
>>> d = dict(zip('abc', range(3)))
>>> d
{'a': 0, 'c': 2, 'b': 1}
\end{verbatim}
9997
%
9998
The dictionary method {\tt update} also takes a list of tuples
9999
and adds them, as key-value pairs, to an existing dictionary.
10000
\index{update method}
10001
\index{method!update}
10002
\index{traverse!dictionary}
10003
\index{dictionary!traversal}
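
As a sketch of {\tt update}, with keys and values chosen only for
illustration, the following code adds one new pair and replaces
the value of a key that is already in the dictionary:

\begin{verbatim}
d = dict(zip('abc', range(3)))

# 'd' is a new key; 'a' already exists, so its value is replaced
d.update([('d', 3), ('a', 10)])
print(d['a'], d['d'], len(d))
\end{verbatim}
%
This prints {\tt 10 3 4}.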
10004

10005
It is common to use tuples as keys in dictionaries (primarily because
10006
you can't use lists).  For example, a telephone directory might map
10007
from last-name, first-name pairs to telephone numbers.  Assuming
10008
that we have defined {\tt last}, {\tt first} and {\tt number}, we
10009
could write:
10010
\index{tuple!as key in dictionary}
10011
\index{hashable}
10012

10013
\begin{verbatim}
directory[last, first] = number
\end{verbatim}
10016
%
10017
The expression in brackets is a tuple.  We could use tuple
10018
assignment to traverse this dictionary.
10019
\index{tuple!in brackets}
10020

10021
\begin{verbatim}
for last, first in directory:
    print(first, last, directory[last,first])
\end{verbatim}
10025
%
10026
This loop traverses the keys in {\tt directory}, which are tuples.  It
10027
assigns the elements of each tuple to {\tt last} and {\tt first}, then
10028
prints the name and corresponding telephone number.
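
To make the loop concrete, here is a self-contained sketch.  The
names and telephone numbers are placeholders chosen only for the
illustration:

\begin{verbatim}
directory = {}

# each key is a tuple of (last name, first name)
directory['Cleese', 'John'] = '555-0100'
directory['Chapman', 'Graham'] = '555-0101'

# each key is unpacked into last and first
for last, first in directory:
    print(first, last, directory[last, first])
\end{verbatim}
%
The loop prints one line for each entry, for example
{\tt John Cleese 555-0100}.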
10029

10030
There are two ways to represent tuples in a state diagram.  The more
10031
detailed version shows the indices and elements just as they appear in
10032
a list.  For example, the tuple \verb"('Cleese', 'John')" would appear
10033
as in Figure~\ref{fig.tuple1}.
10034
\index{state diagram}
10035
\index{diagram!state}
10036

10037
\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/tuple1.pdf}}
\caption{State diagram.}
\label{fig.tuple1}
\end{figure}
10043

10044
But in a larger diagram you might want to leave out the
10045
details.  For example, a diagram of the telephone directory might
10046
appear as in Figure~\ref{fig.dict2}.
10047

10048
\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/dict2.pdf}}
\caption{State diagram.}
\label{fig.dict2}
\end{figure}
10054

10055
Here the tuples are shown using Python syntax as a graphical
10056
shorthand.  The telephone number in the diagram is the complaints line
10057
for the BBC, so please don't call it.
10058

10059

10060
\section{Sequences of sequences}
10061
\index{sequence}
10062

10063
I have focused on lists of tuples, but almost all of the examples in
10064
this chapter also work with lists of lists, tuples of tuples, and
10065
tuples of lists.  To avoid enumerating the possible combinations, it
10066
is sometimes easier to talk about sequences of sequences.
10067

10068
In many contexts, the different kinds of sequences (strings, lists and
10069
tuples) can be used interchangeably.  So how should you choose one
10070
over the others?
10071
\index{string}
10072
\index{list}
10073
\index{tuple}
10074
\index{mutability}
10075
\index{immutability}
10076

10077
To start with the obvious, strings are more limited than other
10078
sequences because the elements have to be characters.  They are
10079
also immutable.  If you need the ability to change the characters
10080
in a string (as opposed to creating a new string), you might
10081
want to use a list of characters instead.
10082

10083
Lists are more common than tuples, mostly because they are mutable.
10084
But there are a few cases where you might prefer tuples:
10085

10086
\begin{enumerate}
10087

10088
\item In some contexts, like a {\tt return} statement, it is
10089
syntactically simpler to create a tuple than a list.
10090

10091
\item If you want to use a sequence as a dictionary key, you
10092
have to use an immutable type like a tuple or string.
10093

10094
\item If you are passing a sequence as an argument to a function,
10095
using tuples reduces the potential for unexpected behavior
10096
due to aliasing.
10097

10098
\end{enumerate}
10099

10100
Because tuples are immutable, they don't provide methods like {\tt
10101
  sort} and {\tt reverse}, which modify existing lists.  But Python
10102
provides the built-in function {\tt sorted}, which takes any sequence
10103
and returns a new list with the same elements in sorted order, and
10104
{\tt reversed}, which takes a sequence and returns an iterator that
traverses the sequence in reverse order.
10106
\index{sorted function}
10107
\index{function!sorted} \index{reversed function}
10108
\index{function!reversed}
10109
\index{iterator}
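
As a sketch of how these functions work with a tuple (the values
are chosen only for illustration), note that {\tt sorted} returns a
list and {\tt reversed} returns an iterator, so the code below uses
{\tt tuple} to convert the results back:

\begin{verbatim}
t = ('c', 'a', 'b')

print(sorted(t))            # ['a', 'b', 'c'], a new list
print(tuple(sorted(t)))     # ('a', 'b', 'c')
print(tuple(reversed(t)))   # ('b', 'a', 'c')
print(t)                    # ('c', 'a', 'b'), unchanged
\end{verbatim}
%
In each case the original tuple is left unchanged.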
10110

10111

10112
\section{Debugging}
10113
\index{debugging}
10114
\index{data structure}
10115
\index{shape error}
10116
\index{error!shape}
10117

10118
Lists, dictionaries and tuples are examples of {\bf data
10119
  structures}; in this chapter we are starting to see compound data
10120
structures, like lists of tuples, or dictionaries that contain tuples
10121
as keys and lists as values.  Compound data structures are useful, but
10122
they are prone to what I call {\bf shape errors}; that is, errors
10123
caused when a data structure has the wrong type, size, or structure.
10124
For example, if you are expecting a list with one integer and I
10125
give you a plain old integer (not in a list), it won't work.
10126
\index{structshape module}
10127
\index{module!structshape}
10128

10129
To help debug these kinds of errors, I have written a module
10130
called {\tt structshape} that provides a function, also called
10131
{\tt structshape}, that takes any kind of data structure as
10132
an argument and returns a string that summarizes its shape.
10133
You can download it from \url{http://thinkpython2.com/code/structshape.py}.
10134

10135
Here's the result for a simple list:
10136

10137
\begin{verbatim}
>>> from structshape import structshape
>>> t = [1, 2, 3]
>>> structshape(t)
'list of 3 int'
\end{verbatim}
10143
%
10144
A fancier program might write ``list of 3 int{\em s}'', but it
10145
was easier not to deal with plurals.  Here's a list of lists:
10146

10147
\begin{verbatim}
>>> t2 = [[1,2], [3,4], [5,6]]
>>> structshape(t2)
'list of 3 list of 2 int'
\end{verbatim}
10152
%
10153
If the elements of the list are not the same type,
10154
{\tt structshape} groups them, in order, by type:
10155

10156
\begin{verbatim}
>>> t3 = [1, 2, 3, 4.0, '5', '6', [7], [8], 9]
>>> structshape(t3)
'list of (3 int, float, 2 str, 2 list of int, int)'
\end{verbatim}
10161
%
10162
Here's a list of tuples:
10163

10164
\begin{verbatim}
>>> s = 'abc'
>>> lt = list(zip(t, s))
>>> structshape(lt)
'list of 3 tuple of (int, str)'
\end{verbatim}
10170
%
10171
And here's a dictionary with 3 items that map integers to strings.
10172

10173
\begin{verbatim}
>>> d = dict(lt)
>>> structshape(d)
'dict of 3 int->str'
\end{verbatim}
10178
%
10179
If you are having trouble keeping track of your data structures,
10180
{\tt structshape} can help.
10181

10182

10183
\section{Glossary}
10184

10185
\begin{description}
10186

10187
\item[tuple:] An immutable sequence of elements.
10188
\index{tuple}
10189

10190
\item[tuple assignment:] An assignment with a sequence on the
10191
right side and a tuple of variables on the left.  The right
10192
side is evaluated and then its elements are assigned to the
10193
variables on the left.
10194
\index{tuple assignment}
10195
\index{assignment!tuple}
10196

10197
\item[gather:] An operation that collects multiple arguments into a tuple.
10198
\index{gather}
10199

10200
\item[scatter:] An operation that makes a sequence behave like multiple arguments.
10201
\index{scatter}
10202

10203
\item[zip object:] The result of calling the built-in function {\tt zip};
10204
an object that iterates through a sequence of tuples.
10205
\index{zip object}
10206
\index{object!zip}
10207

10208
\item[iterator:] An object that can iterate through a sequence, but
10209
which does not provide list operators and methods.
10210
\index{iterator}
10211

10212
\item[data structure:] A collection of related values, often
10213
organized in lists, dictionaries, tuples, etc.
10214
\index{data structure}
10215

10216
\item[shape error:] An error caused because a value has the
10217
wrong shape; that is, the wrong type or size.
10218
\index{shape}
10219

10220
\end{description}
10221

10222

10223
\section{Exercises}
10224

10225
\begin{exercise}
10226

10227
Write a function called \verb"most_frequent" that takes a string and
10228
prints the letters in decreasing order of frequency.  Find text
10229
samples from several different languages and see how letter frequency
10230
varies between languages.  Compare your results with the tables at
10231
\url{http://en.wikipedia.org/wiki/Letter_frequencies}.  Solution:
10232
\url{http://thinkpython2.com/code/most_frequent.py}.  \index{letter
10233
  frequency} \index{frequency!letter}
10234

10235
\end{exercise}
10236

10237

10238
\begin{exercise}
10239
\label{anagrams}
10240
\index{anagram set}
10241
\index{set!anagram}
10242

10243
More anagrams!
10244

10245
\begin{enumerate}
10246

10247
\item Write a program
10248
that reads a word list from a file (see Section~\ref{wordlist}) and
10249
prints all the sets of words that are anagrams.
10250

10251
Here is an example of what the output might look like:
10252

10253
\begin{verbatim}
['deltas', 'desalt', 'lasted', 'salted', 'slated', 'staled']
['retainers', 'ternaries']
['generating', 'greatening']
['resmelts', 'smelters', 'termless']
\end{verbatim}
10259
%
10260
Hint: you might want to build a dictionary that maps from a
10261
collection of letters to a list of words that can be spelled with those
10262
letters.  The question is, how can you represent the collection of
10263
letters in a way that can be used as a key?
10264

10265
\item Modify the previous program so that it prints the longest list
10266
of anagrams first, followed by the second longest, and so on.
10267
\index{Scrabble}
10268
\index{bingo}
10269

10270
\item In Scrabble a ``bingo'' is when you play all seven tiles in
10271
your rack, along with a letter on the board, to form an eight-letter
10272
word.  What collection of 8 letters forms the most possible bingos?
10273

10274
% (7, ['angriest', 'astringe', 'ganister', 'gantries', 'granites',
10275
% 'ingrates', 'rangiest'])
10276

10277
Solution: \url{http://thinkpython2.com/code/anagram_sets.py}.
10278

10279
\end{enumerate}
10280
\end{exercise}
10281

10282
\begin{exercise}
10283
\index{metathesis}
10284

10285
Two words form a ``metathesis pair'' if you can transform one into the
10286
other by swapping two letters; for example, ``converse'' and
10287
``conserve''.  Write a program that finds all of the metathesis pairs
10288
in the dictionary.  Hint: don't test all pairs of words, and don't
10289
test all possible swaps.  Solution:
10290
\url{http://thinkpython2.com/code/metathesis.py}.  Credit: This
10291
exercise is inspired by an example at \url{http://puzzlers.org}.
10292

10293
\end{exercise}
10294

10295

10296
\begin{exercise}
10297
\index{Car Talk}
10298
\index{Puzzler}
10299

10300
Here's another Car Talk Puzzler
10301
(\url{http://www.cartalk.com/content/puzzlers}):
10302

10303
\begin{quote}
10304
What is the longest English word, that remains a valid English word,
10305
as you remove its letters one at a time?
10306

10307
Now, letters can be removed from either end, or the middle, but you
10308
can't rearrange any of the letters. Every time you drop a letter, you
10309
wind up with another English word. If you do that, you're eventually
10310
going to wind up with one letter and that too is going to be an
10311
English word---one that's found in the dictionary. I want to know
10312
what's the longest word and how many letters does it
10313
have?
10314

10315
I'm going to give you a little modest example: Sprite. Ok? You start
10316
off with sprite, you take a letter off, one from the interior of the
10317
word, take the r away, and we're left with the word spite, then we
10318
take the e off the end, we're left with spit, we take the s off, we're
10319
left with pit, it, and I.
10320
\end{quote}
10321
\index{reducible word}
10322
\index{word, reducible}
10323

10324
Write a program to find all words that can be reduced in this way,
10325
and then find the longest one.
10326

10327
This exercise is a little more challenging than most, so here are
10328
some suggestions:
10329

10330
\begin{enumerate}
10331

10332
\item You might want to write a function that takes a word and
10333
  computes a list of all the words that can be formed by removing one
10334
  letter.  These are the ``children'' of the word.
10335
\index{recursive definition}
10336
\index{definition!recursive}
10337

10338
\item Recursively, a word is reducible if any of its children
10339
are reducible.  As a base case, you can consider the empty
10340
string reducible.
10341

10342
\item The word list I provided, {\tt words.txt}, doesn't
contain single-letter words.  So you might want to add
``I'', ``a'', and the empty string.
10345

10346
\item To improve the performance of your program, you might want
10347
to memoize the words that are known to be reducible.
10348

10349
\end{enumerate}
10350

10351
Solution: \url{http://thinkpython2.com/code/reducible.py}.
10352

10353
\end{exercise}
10354

10355

10356

10357

10358
%\begin{exercise}
10359
%\url{http://en.wikipedia.org/wiki/Word_Ladder}
10360
%\end{exercise}
10361

10362

10363

10364

10365
\chapter{Case study: data structure selection}
10366

10367
At this point you have learned about Python's core data structures,
10368
and you have seen some of the algorithms that use them.
10369
If you would like to know more about algorithms, this might be a good
10370
time to read Chapter~\ref{algorithms}.
10371
But you don't have to read it before you go on; you can read
10372
it whenever you are interested.
10373

10374
This chapter presents a case study with exercises that let
10375
you think about choosing data structures and practice using them.
10376

10377

10378
\section{Word frequency analysis}
10379
\label{analysis}
10380

10381
As usual, you should at least attempt the exercises
10382
before you read my solutions.
10383

10384
\begin{exercise}
10385

10386
Write a program that reads a file, breaks each line into
10387
words, strips whitespace and punctuation from the words, and
10388
converts them to lowercase.
10389
\index{string module}
10390
\index{module!string}
10391

10392
Hint: The {\tt string} module provides a string named {\tt whitespace},
10393
which contains space, tab, newline, etc., and {\tt
10394
  punctuation} which contains the punctuation characters.  Let's see
10395
if we can make Python swear:
10396

10397
\begin{verbatim}
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
\end{verbatim}
10402
%
10403
Also, you might consider using the string methods {\tt strip},
10404
{\tt replace} and {\tt translate}.
10405
\index{strip method}
10406
\index{method!strip}
10407
\index{replace method}
10408
\index{method!replace}
10409
\index{translate method}
10410
\index{method!translate}
10411

10412
\end{exercise}
10413

10414

10415
\begin{exercise}
10416
\index{Project Gutenberg}
10417

10418
Go to Project Gutenberg (\url{http://gutenberg.org}) and download 
10419
your favorite out-of-copyright book in plain text format.
10420
\index{plain text}
10421
\index{text!plain}
10422

10423
Modify your program from the previous exercise to read the book
10424
you downloaded, skip over the header information at the beginning
10425
of the file, and process the rest of the words as before.
10426

10427
Then modify the program to count the total number of words in
10428
the book, and the number of times each word is used.
10429
\index{word frequency}
10430
\index{frequency!word}
10431

10432
Print the number of different words used in the book.  Compare
10433
different books by different authors, written in different eras.
10434
Which author uses the most extensive vocabulary?
10435
\end{exercise}
10436

10437

10438
\begin{exercise}
10439

10440
Modify the program from the previous exercise to print the
10441
20 most frequently used words in the book.
10442

10443
\end{exercise}
10444

10445

10446
\begin{exercise}
10447

10448
Modify the previous program to read a word list (see
10449
Section~\ref{wordlist}) and then print all the words in the book that
10450
are not in the word list.  How many of them are typos?  How many of
10451
them are common words that {\em should} be in the word list, and how
10452
many of them are really obscure?
10453

10454
\end{exercise}
10455

10456

\section{Random numbers}
\index{random number}
\index{number, random}
\index{deterministic}
\index{pseudorandom}

Given the same inputs, most computer programs generate the same
outputs every time, so they are said to be {\bf deterministic}.
Determinism is usually a good thing, since we expect the same
calculation to yield the same result.  For some applications, though,
we want the computer to be unpredictable.  Games are an obvious
example, but there are more.

Making a program truly nondeterministic turns out to be difficult,
but there are ways to make it at least seem nondeterministic.  One of
them is to use algorithms that generate {\bf pseudorandom} numbers.
Pseudorandom numbers are not truly random because they are generated
by a deterministic computation, but just by looking at the numbers it
is all but impossible to distinguish them from random.
\index{random module}
\index{module!random}

The {\tt random} module provides functions that generate
pseudorandom numbers (which I will simply call ``random'' from
here on).
\index{random function}
\index{function!random}

The function {\tt random} returns a random float
between 0.0 and 1.0 (including 0.0 but not 1.0).  Each time you
call {\tt random}, you get the next number in a long series.  To see a
sample, run this loop:

\begin{verbatim}
import random

for i in range(10):
    x = random.random()
    print(x)
\end{verbatim}
%
The function {\tt randint} takes parameters {\tt low} and
{\tt high} and returns an integer between {\tt low} and
{\tt high} (including both).
\index{randint function}
\index{function!randint}

\begin{verbatim}
>>> random.randint(5, 10)
5
>>> random.randint(5, 10)
9
\end{verbatim}
%
To choose an element from a sequence at random, you can use
{\tt choice}:
\index{choice function}
\index{function!choice}

\begin{verbatim}
>>> t = [1, 2, 3]
>>> random.choice(t)
2
>>> random.choice(t)
3
\end{verbatim}
%
The {\tt random} module also provides functions to generate
random values from continuous distributions including
Gaussian, exponential, gamma, and a few more.
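Because the numbers come from a deterministic computation, the same
starting point produces the same series.  The following sketch is not
part of the original text; it assumes the standard functions
{\tt random.seed}, {\tt random.gauss} and {\tt random.expovariate}, and
the seed 42 and the distribution parameters are arbitrary choices made
for illustration.

```python
import random

def sample(seed, n=3):
    """Return n floats drawn right after seeding the generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = sample(42)
second = sample(42)
print(first == second)         # same seed, same series

# Draws from two continuous distributions (parameters are arbitrary).
g = random.gauss(0, 1)         # mean 0, standard deviation 1
e = random.expovariate(2.0)    # rate 2.0, so the mean is 0.5
print(g, e)
```

Seeding is handy when you want a test that involves random numbers to
be repeatable.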

\begin{exercise}
\index{histogram!random choice}

Write a function named \verb"choose_from_hist" that takes
a histogram as defined in Section~\ref{histogram} and returns a
random value from the histogram, chosen with probability
in proportion to frequency.  For example, for this histogram:

\begin{verbatim}
>>> t = ['a', 'a', 'b']
>>> hist = histogram(t)
>>> hist
{'a': 2, 'b': 1}
\end{verbatim}
%
your function should return \verb"'a'" with probability $2/3$ and \verb"'b'"
with probability $1/3$.
\end{exercise}


\section{Word histogram}

You should attempt the previous exercises before you go on.
You can download my solution from
\url{http://thinkpython2.com/code/analyze_book1.py}.  You will
also need \url{http://thinkpython2.com/code/emma.txt}.

Here is a program that reads a file and builds a histogram of the
words in the file:
\index{histogram!word frequencies}

\begin{verbatim}
import string

def process_file(filename):
    hist = dict()
    fp = open(filename)
    for line in fp:
        process_line(line, hist)
    return hist

def process_line(line, hist):
    line = line.replace('-', ' ')

    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        hist[word] = hist.get(word, 0) + 1

hist = process_file('emma.txt')
\end{verbatim}
%
This program reads {\tt emma.txt}, which contains the text of {\em
  Emma} by Jane Austen.
\index{Austen, Jane}

\verb"process_file" loops through the lines of the file,
passing them one at a time to \verb"process_line".  The histogram
{\tt hist} is being used as an accumulator.
\index{accumulator!histogram}
\index{traversal}

\verb"process_line" uses the string method {\tt replace} to replace
hyphens with spaces before using {\tt split} to break the line into a
list of strings.  It traverses the list of words and uses {\tt strip}
and {\tt lower} to remove punctuation and convert to lower case.  (It
is a shorthand to say that strings are ``converted''; remember that
strings are immutable, so methods like {\tt strip} and {\tt lower}
return new strings.)

Finally, \verb"process_line" updates the histogram by creating a new
item or incrementing an existing one.
\index{update!histogram}
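To see the effect of each step, it can help to run \verb"process_line"
on a single line.  The sentence below is made up for illustration and
does not come from {\em Emma}; the function body repeats the one shown
earlier so that the sketch runs on its own.

```python
import string

def process_line(line, hist):
    # Same steps as in the text: split hyphenated words, strip
    # punctuation and whitespace, convert to lowercase, then count.
    line = line.replace('-', ' ')
    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        hist[word] = hist.get(word, 0) + 1

hist = {}
process_line('Well-being, she said; "well" enough.', hist)
print(hist)
# {'well': 2, 'being': 1, 'she': 1, 'said': 1, 'enough': 1}
```

The hyphenated word is counted as two words, and ``Well'' and
``well'' end up under the same key.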

To count the total number of words in the file, we can add up
the frequencies in the histogram:

\begin{verbatim}
def total_words(hist):
    return sum(hist.values())
\end{verbatim}
%
The number of different words is just the number of items in
the dictionary:

\begin{verbatim}
def different_words(hist):
    return len(hist)
\end{verbatim}
%
Here is some code to print the results:

\begin{verbatim}
print('Total number of words:', total_words(hist))
print('Number of different words:', different_words(hist))
\end{verbatim}
%
And the results:

\begin{verbatim}
Total number of words: 161080
Number of different words: 7214
\end{verbatim}
%

\section{Most common words}

To find the most common words, we can make a list of tuples,
where each tuple contains a word and its frequency,
and sort it.

The following function takes a histogram and returns a list of
(frequency, word) tuples, sorted from most to least frequent:

\begin{verbatim}
def most_common(hist):
    t = []
    for key, value in hist.items():
        t.append((value, key))

    t.sort(reverse=True)
    return t
\end{verbatim}

In each tuple, the frequency appears first, so the resulting list is
sorted by frequency.  Here is a loop that prints the ten most common
words:

\begin{verbatim}
t = most_common(hist)
print('The most common words are:')
for freq, word in t[:10]:
    print(word, freq, sep='\t')
\end{verbatim}
%
I use the keyword argument {\tt sep} to tell {\tt print} to use a tab
character as a ``separator'', rather than a space, so the second
column is lined up.  Here are the results from {\em Emma}:

\begin{verbatim}
The most common words are:
to      5242
the     5205
and     4897
of      4295
i       3191
a       3130
it      2529
her     2483
was     2400
she     2364
\end{verbatim}
%
This code can be simplified using the {\tt key} parameter of
the {\tt sort} method.  If you are curious, you can read about it
at \url{https://wiki.python.org/moin/HowTo/Sorting}.
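As a sketch of that simplification (one possible version, not
necessarily the one the author has in mind), the built-in function
{\tt sorted} accepts the same {\tt key} and {\tt reverse} parameters as
the list method {\tt sort}.  The histogram below is made up, and unlike
\verb"most_common" above, this version returns tuples with the word
first.

```python
def most_common(hist):
    """Return (word, freq) pairs, most frequent first."""
    return sorted(hist.items(), key=lambda pair: pair[1], reverse=True)

hist = {'to': 3, 'the': 5, 'and': 2}   # made-up counts
for word, freq in most_common(hist):
    print(word, freq, sep='\t')
```

Because only the frequency is used as the sort key, words with equal
frequency keep the order they had in the dictionary instead of being
ordered alphabetically.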

\section{Optional parameters}
\index{optional parameter}
\index{parameter!optional}

We have seen built-in functions and methods that take optional
arguments.  It is possible to write programmer-defined functions
with optional arguments, too.  For example, here is a function that
prints the most common words in a histogram:
\index{programmer-defined function}
\index{function!programmer defined}

\begin{verbatim}
def print_most_common(hist, num=10):
    t = most_common(hist)
    print('The most common words are:')
    for freq, word in t[:num]:
        print(word, freq, sep='\t')
\end{verbatim}

The first parameter is required; the second is optional.
The {\bf default value} of {\tt num} is 10.
\index{default value}
\index{value!default}

If you only provide one argument:

\begin{verbatim}
print_most_common(hist)
\end{verbatim}

{\tt num} gets the default value.  If you provide two arguments:

\begin{verbatim}
print_most_common(hist, 20)
\end{verbatim}

{\tt num} gets the value of the argument instead.  In other
words, the optional argument {\bf overrides} the default value.
\index{override}

If a function has both required and optional parameters, all
the required parameters have to come first, followed by the
optional ones.
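An optional parameter can also be overridden by name, with a keyword
argument, which is how {\tt sep} is passed to {\tt print} in the
examples above.  In this sketch the function {\tt describe}, its
{\tt header} parameter, and the histogram are hypothetical, introduced
only for illustration.

```python
def describe(hist, num=10, header='The most common words are:'):
    """Return the header followed by up to num words, most frequent first."""
    ranked = sorted(hist, key=hist.get, reverse=True)
    return [header] + ranked[:num]

hist = {'to': 3, 'the': 5, 'and': 2}   # made-up counts
print(describe(hist))                  # both defaults are used
print(describe(hist, 2))               # num overridden by position
print(describe(hist, header='Top:'))   # header overridden by name
```

In the last call {\tt num} keeps its default value even though a later
parameter is overridden.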

\section{Dictionary subtraction}
\label{dictsub}
\index{dictionary!subtraction}
\index{subtraction!dictionary}

Finding the words from the book that are not in the word list
from {\tt words.txt} is a problem you might recognize as set
subtraction; that is, we want to find all the words from one
set (the words in the book) that are not in the other (the
words in the list).

{\tt subtract} takes dictionaries {\tt d1} and {\tt d2} and returns a
new dictionary that contains all the keys from {\tt d1} that are not
in {\tt d2}.  Since we don't really care about the values, we
set them all to None.

\begin{verbatim}
def subtract(d1, d2):
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    return res
\end{verbatim}
%
To find the words in the book that are not in {\tt words.txt},
we can use \verb"process_file" to build a histogram for
{\tt words.txt}, and then subtract:

\begin{verbatim}
words = process_file('words.txt')
diff = subtract(hist, words)

print("Words in the book that aren't in the word list:")
for word in diff:
    print(word, end=' ')
\end{verbatim}
%
Here are some of the results from {\em Emma}:

\begin{verbatim}
Words in the book that aren't in the word list:
rencontre jane's blanche woodhouses disingenuousness
friend's venice apartment ...
\end{verbatim}
%
Some of these words are names and possessives.  Others, like
``rencontre'', are no longer in common use.  But a few are common
words that should really be in the list!

\begin{exercise}
\index{set}
\index{type!set}

Python provides a data structure called {\tt set} that provides many
common set operations.  You can read about them in Section~\ref{sets},
or read the documentation at
\url{http://docs.python.org/3/library/stdtypes.html#types-set}.

Write a program that uses set subtraction to find words in the book
that are not in the word list.  Solution:
\url{http://thinkpython2.com/code/analyze_book2.py}.

\end{exercise}

\section{Random words}
\label{randomwords}
\index{histogram!random choice}

To choose a random word from the histogram, the simplest algorithm
is to build a list with multiple copies of each word, according
to the observed frequency, and then choose from the list:

\begin{verbatim}
def random_word(h):
    t = []
    for word, freq in h.items():
        t.extend([word] * freq)

    return random.choice(t)
\end{verbatim}
%
The expression {\tt [word] * freq} creates a list with {\tt freq}
copies of the string {\tt word}.  The {\tt extend}
method is similar to {\tt append} except that the argument is
a sequence.

This algorithm works, but it is not very efficient; each time you
choose a random word, it rebuilds the list, which is as big as
the original book.  An obvious improvement is to build the list
once and then make multiple selections, but the list is still big.

An alternative is:

\begin{enumerate}

\item Use {\tt keys} to get a list of the words in the book.

\item Build a list that contains the cumulative sum of the word
  frequencies (see Exercise~\ref{cumulative}).  The last item
  in this list is the total number of words in the book, $n$.
  
\item Choose a random number from 1 to $n$.  Use a bisection search
  (see Exercise~\ref{bisection}) to find the index where the random
  number would be inserted in the cumulative sum.

\item Use the index to find the corresponding word in the word list.

\end{enumerate}

\begin{exercise}
\label{randhist}
\index{algorithm}

Write a program that uses this algorithm to choose a random word from
the book.  Solution:
\url{http://thinkpython2.com/code/analyze_book3.py}.
\end{exercise}
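The four steps listed above can be sketched as follows.  This is one
possible reading of the algorithm, not the author's solution; it
assumes the standard modules {\tt itertools} and {\tt bisect} in place
of the functions from the earlier exercises, and it is tried out on
the small histogram with keys \verb"'a'" and \verb"'b'" rather than on
the book.

```python
import bisect
import itertools
import random

def random_word(hist):
    """Choose a word with probability proportional to its frequency."""
    words = list(hist)                                       # step 1
    cumulative = list(itertools.accumulate(hist.values()))   # step 2
    n = cumulative[-1]              # total number of words
    x = random.randint(1, n)                                 # step 3
    index = bisect.bisect_left(cumulative, x)
    return words[index]                                      # step 4

hist = {'a': 2, 'b': 1}
counts = {'a': 0, 'b': 0}
for _ in range(3000):
    counts[random_word(hist)] += 1
print(counts)     # 'a' should come up about twice as often as 'b'
```

This version rebuilds both lists on every call; if you make many
selections, it would be better to build them once and reuse them.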

\section{Markov analysis}
\label{markov}
\index{Markov analysis}

If you choose words from the book at random, you can get a
sense of the vocabulary, but you probably won't get a sentence:

\begin{verbatim}
this the small regard harriet which knightley's it most things
\end{verbatim}
%
A series of random words seldom makes sense because there
is no relationship between successive words.  For example, in
a real sentence you would expect an article like ``the'' to
be followed by an adjective or a noun, and probably not a verb
or adverb.

One way to measure these kinds of relationships is Markov
analysis, which
characterizes, for a given sequence of words, the probability of the
words that might come next.  For example, the song {\em Eric, the Half a
  Bee} begins:

\begin{quote}
Half a bee, philosophically, \\
Must, ipso facto, half not be. \\
But half the bee has got to be \\
Vis a vis, its entity. D'you see? \\
\\
But can a bee be said to be \\
Or not to be an entire bee \\
When half the bee is not a bee \\
Due to some ancient injury? \\
\end{quote}
%
In this text,
the phrase ``half the'' is always followed by the word ``bee'',
but the phrase ``the bee'' might be followed by either
``has'' or ``is''.
\index{prefix}
\index{suffix}
\index{mapping}

The result of Markov analysis is a mapping from each prefix
(like ``half the'' and ``the bee'') to all possible suffixes
(like ``has'' and ``is'').
\index{random text}
\index{text!random}

Given this mapping, you can generate a random text by
starting with any prefix and choosing at random from the
possible suffixes.  Next, you can combine the end of the
prefix and the new suffix to form the next prefix, and repeat.

For example, if you start with the prefix ``Half a'', then the
next word has to be ``bee'', because the prefix only appears
once in the text.  The next prefix is ``a bee'', so the
next suffix might be ``philosophically'', ``be'' or ``due''.

In this example the length of the prefix is always two, but
you can do Markov analysis with any prefix length.

\begin{exercise}

Markov analysis:

\begin{enumerate}

\item Write a program to read a text from a file and perform Markov
analysis.  The result should be a dictionary that maps from
prefixes to a collection of possible suffixes.  The collection
might be a list, tuple, or dictionary; it is up to you to make
an appropriate choice.  You can test your program with prefix
length two, but you should write the program in a way that makes
it easy to try other lengths.

\item Add a function to the previous program to generate random text
based on the Markov analysis.  Here is an example from {\em Emma}
with prefix length 2:

\begin{quote}
He was very clever, be it sweetness or be angry, ashamed or only
amused, at such a stroke. She had never thought of Hannah till you
were never meant for me?" "I cannot make speeches, Emma:" he soon cut
it all himself.
\end{quote}

For this example, I left the punctuation attached to the words.
The result is almost syntactically correct, but not quite.
Semantically, it almost makes sense, but not quite.

What happens if you increase the prefix length?  Does the random
text make more sense?

\item Once your program is working, you might want to try a mash-up:
if you combine text from two or more books, the random
text you generate will blend the vocabulary and phrases from
the sources in interesting ways.
\index{mash-up}

\end{enumerate}

Credit: This case study is based on an example from Kernighan and
Pike, {\em The Practice of Programming}, Addison-Wesley, 1999.

\end{exercise}

You should attempt this exercise before you go on; then you can
download my solution from \url{http://thinkpython2.com/code/markov.py}.
You will also need \url{http://thinkpython2.com/code/emma.txt}.

\section{Data structures}
\index{data structure}

Using Markov analysis to generate random text is fun, but there is
also a point to this exercise: data structure selection.  In your
solution to the previous exercises, you had to choose:

\begin{itemize}

\item How to represent the prefixes.

\item How to represent the collection of possible suffixes.

\item How to represent the mapping from each prefix to
the collection of possible suffixes.

\end{itemize}

The last one is easy: a dictionary is the obvious choice
for a mapping from keys to corresponding values.

For the prefixes, the most obvious options are string,
list of strings, or tuple of strings.

For the suffixes,
one option is a list; another is a histogram (dictionary).
\index{implementation}

How should you choose?  The first step is to think about
the operations you will need to implement for each data structure.
For the prefixes, we need to be able to remove words from
the beginning and add to the end.  For example, if the current
prefix is ``Half a'', and the next word is ``bee'', you need
to be able to form the next prefix, ``a bee''.
\index{tuple!as key in dictionary}

Your first choice might be a list, since it is easy to add
and remove elements, but we also need to be able to use the
prefixes as keys in a dictionary, so that rules out lists.
With tuples, you can't append or remove, but you can use
the addition operator to form a new tuple:

\begin{verbatim}
def shift(prefix, word):
    return prefix[1:] + (word,)
\end{verbatim}
%
{\tt shift} takes a tuple of words, {\tt prefix}, and a string,
{\tt word}, and forms a new tuple that has all the words
in {\tt prefix} except the first, and {\tt word} added to
the end.
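Here is a short usage sketch, with the prefix taken from the song
above.  The dictionary \verb"suffix_map" is a hypothetical name used
only here; it shows that the shifted tuple works as a key where a list
would not.

```python
def shift(prefix, word):
    return prefix[1:] + (word,)

prefix = ('half', 'a')
prefix = shift(prefix, 'bee')
print(prefix)            # ('a', 'bee')

# A tuple is hashable, so it can be a dictionary key; a list cannot.
suffix_map = {prefix: ['philosophically']}
try:
    suffix_map[['a', 'bee']] = []
except TypeError as err:
    print('TypeError:', err)
```

Because {\tt shift} slices from index 1, it works for any prefix
length, including a prefix of one word.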

For the collection of suffixes, the operations we need to
perform include adding a new suffix (or increasing the frequency
of an existing one), and choosing a random suffix.

Adding a new suffix is equally easy for the list implementation
or the histogram.  Choosing a random element from a list
is easy; choosing from a histogram is harder to do
efficiently (see Exercise~\ref{randhist}).
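To make the comparison concrete, here is a sketch of both
representations.  The names \verb"suffix_list", \verb"suffix_hist",
\verb"add_to_list" and \verb"add_to_hist" are hypothetical, and the
suffix data is made up.  For the histogram, the sketch uses
{\tt random.choices}, which accepts a {\tt weights} argument (available
in Python 3.6 and later); that is an alternative to the cumulative-sum
algorithm, not something the text prescribes.

```python
import random

suffix_list = {}    # prefix -> list of suffixes, with repeats
suffix_hist = {}    # prefix -> {suffix: count}

def add_to_list(prefix, suffix):
    suffix_list.setdefault(prefix, []).append(suffix)

def add_to_hist(prefix, suffix):
    hist = suffix_hist.setdefault(prefix, {})
    hist[suffix] = hist.get(suffix, 0) + 1

prefix = ('the', 'bee')
for suffix in ['has', 'is', 'has']:     # made-up data
    add_to_list(prefix, suffix)
    add_to_hist(prefix, suffix)

# Choosing from the list needs no extra work.
from_list = random.choice(suffix_list[prefix])

# Choosing from the histogram needs the counts as weights.
hist = suffix_hist[prefix]
from_hist = random.choices(list(hist), weights=list(hist.values()))[0]
print(from_list, from_hist)
```

Both choices pick ``has'' with probability $2/3$; the list stores the
repeated suffix twice, while the histogram stores it once with a count.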

So far we have been talking mostly about ease of implementation,
but there are other factors to consider in choosing data structures.
One is run time.  Sometimes there is a theoretical reason to expect
one data structure to be faster than another; for example, I mentioned
that the {\tt in} operator is faster for dictionaries than for lists,
at least when the number of elements is large.

But often you don't know ahead of time which implementation will
be faster.  One option is to implement both of them and see which
is better.  This approach is called {\bf benchmarking}.  A practical
alternative is to choose the data structure that is
easiest to implement, and then see if it is fast enough for the
intended application.  If so, there is no need to go on.  If not,
there are tools, like the {\tt profile} module, that can identify
the places in a program that take the most time.
\index{benchmarking}
\index{profile module}
\index{module!profile}
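Here is a minimal benchmarking sketch, assuming the standard
{\tt timeit} module, which the text does not mention.  It times the
{\tt in} operator on a list and on a dictionary that hold the same
keys; the size and the repetition count are arbitrary, and the timings
depend on the machine.

```python
import timeit

n = 20000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)
target = n - 1            # worst case for the list: the last element

# Each call to timeit runs the lookup 200 times and returns seconds.
t_list = timeit.timeit(lambda: target in as_list, number=200)
t_dict = timeit.timeit(lambda: target in as_dict, number=200)
print('list:', t_list)
print('dict:', t_dict)
```

On most machines the dictionary lookup should be much faster, but the
point of benchmarking is to measure rather than to assume.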

The other factor to consider is storage space.  For example, using a
histogram for the collection of suffixes might take less space because
you only have to store each word once, no matter how many times it
appears in the text.  In some cases, saving space can also make your
program run faster, and in the extreme, your program might not run at
all if you run out of memory.  But for many applications, space is a
secondary consideration after run time.

One final thought: in this discussion, I have implied that
we should use one data structure for both analysis and generation.  But
since these are separate phases, it would also be possible to use one
structure for analysis and then convert to another structure for
generation.  This would be a net win if the time saved during
generation exceeded the time spent in conversion.

\section{Debugging}
\index{debugging}

When you are debugging a program, and especially if you are
working on a hard bug, there are five things to try:

\begin{description}

\item[Reading:] Examine your code, read it back to yourself, and
check that it says what you meant to say.

\item[Running:] Experiment by making changes and running different
versions.  Often if you display the right thing at the right place
in the program, the problem becomes obvious, but sometimes you have to
build scaffolding.

\item[Ruminating:] Take some time to think!  What kind of error
is it: syntax, runtime, or semantic?  What information can you get from
the error messages, or from the output of the program?  What kind of
error could cause the problem you're seeing?  What did you change
last, before the problem appeared?

\item[Rubberducking:] If you explain the problem to someone else, you
  sometimes find the answer before you finish asking the question.
  Often you don't need the other person; you could just talk to a rubber
  duck.  And that's the origin of the well-known strategy called {\bf
    rubber duck debugging}.  I am not making this up; see
  \url{https://en.wikipedia.org/wiki/Rubber_duck_debugging}.

\item[Retreating:] At some point, the best thing to do is back
off, undoing recent changes, until you get back to a program that
works and that you understand.  Then you can start rebuilding.

\end{description}

Beginning programmers sometimes get stuck on one of these activities
and forget the others.  Each activity comes with its own failure
mode.
\index{typographical error}

For example, reading your code might help if the problem is a
typographical error, but not if the problem is a conceptual
misunderstanding.  If you don't understand what your program does, you
can read it 100 times and never see the error, because the error is in
your head.
\index{experimental debugging}

Running experiments can help, especially if you run small, simple
tests.  But if you run experiments without thinking or reading your
code, you might fall into a pattern I call ``random walk programming'',
which is the process of making random changes until the program
does the right thing.  Needless to say, random walk programming
can take a long time.
\index{random walk programming}
\index{development plan!random walk programming}

You have to take time to think.  Debugging is like an
experimental science.  You should have at least one hypothesis about
what the problem is.  If there are two or more possibilities, try to
think of a test that would eliminate one of them.

But even the best debugging techniques will fail if there are too many
errors, or if the code you are trying to fix is too big and
complicated.  Sometimes the best option is to retreat, simplifying the
program until you get to something that works and that you
understand.

Beginning programmers are often reluctant to retreat because
they can't stand to delete a line of code (even if it's wrong).
If it makes you feel better, copy your program into another file
before you start stripping it down.  Then you can copy the pieces
back one at a time.

Finding a hard bug requires reading, running, ruminating, and
sometimes retreating.  If you get stuck on one of these activities,
try the others.

\section{Glossary}

\begin{description}

\item[deterministic:] Pertaining to a program that does the same
thing each time it runs, given the same inputs.
\index{deterministic}

\item[pseudorandom:] Pertaining to a sequence of numbers that appears
to be random, but is generated by a deterministic program.
\index{pseudorandom}

\item[default value:] The value given to an optional parameter if no
argument is provided.
\index{default value}

\item[override:] To replace a default value with an argument.
\index{override}

\item[benchmarking:] The process of choosing between data structures
by implementing alternatives and testing them on a sample of the
possible inputs.
\index{benchmarking}

\item[rubber duck debugging:] Debugging by explaining your problem
to an inanimate object such as a rubber duck.  Articulating the
problem can help you solve it, even if the rubber duck doesn't know
Python.
\index{rubber duck debugging}
\index{debugging!rubber duck}

\end{description}

\section{Exercises}

\begin{exercise}
\index{word frequency}
\index{frequency!word}
\index{Zipf's law}

The ``rank'' of a word is its position in a list of words
sorted by frequency: the most common word has rank 1, the
second most common has rank 2, etc.

Zipf's law describes a relationship between the ranks and frequencies
of words in natural languages
(\url{http://en.wikipedia.org/wiki/Zipf's_law}).  Specifically, it
predicts that the frequency, $f$, of the word with rank $r$ is:

\[ f = c r^{-s} \]
%
where $s$ and $c$ are parameters that depend on the language and the
text.  If you take the logarithm of both sides of this equation, you
get:
\index{logarithm}

\[ \log f = \log c - s \log r \]
%
So if you plot $\log f$ versus $\log r$, you should get
a straight line with slope $-s$ and intercept $\log c$.

Write a program that reads a text from a file, counts
word frequencies, and prints one line
for each word, in descending order of frequency, with
$\log f$ and $\log r$.  Use the graphing program of your
choice to plot the results and check whether they form
a straight line.  Can you estimate the value of $s$?

Solution: \url{http://thinkpython2.com/code/zipf.py}.
To run my solution, you need the plotting module {\tt matplotlib}.
If you installed Anaconda, you already have {\tt matplotlib};
otherwise you might have to install it.
\index{matplotlib}

\end{exercise}

\chapter{Files}

This chapter introduces the idea of ``persistent'' programs that
keep data in permanent storage, and shows how to use different
kinds of permanent storage, like files and databases.


\section{Persistence}
\index{file}
\index{type!file}
\index{persistence}

Most of the programs we have seen so far are transient in the
sense that they run for a short time and produce some output,
but when they end, their data disappears.  If you run the program
again, it starts with a clean slate.

Other programs are {\bf persistent}: they run for a long time
(or all the time); they keep at least some of their data
in permanent storage (a hard drive, for example); and
if they shut down and restart, they pick up where they left off.

Examples of persistent programs are operating systems, which
run pretty much whenever a computer is on, and web servers,
which run all the time, waiting for requests to come in on
the network.

One of the simplest ways for programs to maintain their data
is by reading and writing text files.  We have already seen
programs that read text files; in this chapter we will see programs
that write them.

An alternative is to store the state of the program in a database.
In this chapter I will present a simple database and a module,
{\tt pickle}, that makes it easy to store program data.
\index{pickle module}
\index{module!pickle}

\section{Reading and writing}
\index{file!reading and writing}

A text file is a sequence of characters stored on a permanent
medium like a hard drive, flash memory, or CD-ROM.  We saw how
to open and read a file in Section~\ref{wordlist}.
\index{open function}
\index{function!open}

To write a file, you have to open it with mode \verb"'w'" as the second
argument:

\begin{verbatim}
>>> fout = open('output.txt', 'w')
\end{verbatim}
%
If the file already exists, opening it in write mode clears out
the old data and starts fresh, so be careful!
If the file doesn't exist, a new one is created.

{\tt open} returns a file object that provides methods for working
with the file.
The {\tt write} method puts data into the file.

\begin{verbatim}
>>> line1 = "This here's the wattle,\n"
>>> fout.write(line1)
24
\end{verbatim}
%
The return value is the number of characters that were written.
The file object keeps track of where it is, so if
you call {\tt write} again, it adds the new data to the end of
the file.

\begin{verbatim}
>>> line2 = "the emblem of our land.\n"
>>> fout.write(line2)
24
\end{verbatim}
%
When you are done writing, you should close the file.

\begin{verbatim}
>>> fout.close()
\end{verbatim}
%
\index{close method}
\index{method!close}
%
If you don't close the file, it gets closed for you when the
program ends.
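Rather than relying on the end of the program, you can use a
{\tt with} statement, which this section has not introduced; it closes
the file when the indented block ends.  The sketch below writes to a
temporary directory (an assumption made so that it does not overwrite a
real {\tt output.txt}) and then reads the lines back.

```python
import os
import tempfile

# Write to a temporary directory so the sketch does not clobber real files.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

with open(path, 'w') as fout:
    fout.write("This here's the wattle,\n")
    fout.write("the emblem of our land.\n")
# The file is closed here, even if write had raised an exception.

with open(path) as fin:
    lines = fin.readlines()
print(lines)
```

Closing the file promptly matters most when another part of the
program, or another program, needs to read what was written.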

\section{Format operator}
\index{format operator}
\index{operator!format}

The argument of {\tt write} has to be a string, so if we want
to put other values in a file, we have to convert them to
strings.  The easiest way to do that is with {\tt str}:

\begin{verbatim}
>>> x = 52
>>> fout.write(str(x))
\end{verbatim}
%
An alternative is to use the {\bf format operator}, {\tt \%}.  When
applied to integers, {\tt \%} is the modulus operator.  But
when the first operand is a string, {\tt \%} is the format operator.
\index{format string}

The first operand is the {\bf format string}, which contains
one or more {\bf format sequences}, which specify how
the second operand is formatted.  The result is a string.
\index{format sequence}

For example, the format sequence \verb"'%d'" means that
the second operand should be formatted as a decimal
integer:

\begin{verbatim}
>>> camels = 42
>>> '%d' % camels
'42'
\end{verbatim}
%
The result is the string \verb"'42'", which is not to be confused
with the integer value {\tt 42}.

A format sequence can appear anywhere in the string,
so you can embed a value in a sentence:

\begin{verbatim}
>>> 'I have spotted %d camels.' % camels
'I have spotted 42 camels.'
\end{verbatim}
%
If there is more than one format sequence in the string,
the second operand has to be a tuple.  Each format sequence is
matched with an element of the tuple, in order.

The following example uses \verb"'%d'" to format an integer,
\verb"'%g'" to format a floating-point number, and
\verb"'%s'" to format a string:

\begin{verbatim}
>>> 'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
'In 3 years I have spotted 0.1 camels.'
\end{verbatim}
%
The number of elements in the tuple has to match the number
of format sequences in the string.  Also, the types of the
elements have to match the format sequences:
\index{exception!TypeError}
\index{TypeError}

\begin{verbatim}
>>> '%d %d %d' % (1, 2)
TypeError: not enough arguments for format string
>>> '%d' % 'dollars'
TypeError: %d format: a number is required, not str
\end{verbatim}
%
In the first example, there aren't enough elements; in the
second, the element is the wrong type.
11385

11386
For more information on the format operator, see
11387
\url{https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting}.  A more powerful alternative is the string
11388
format method, which you can read about at
11389
\url{https://docs.python.org/3/library/stdtypes.html#str.format}.
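
To compare the alternatives, here is a short sketch that builds the same sentence with the format operator, with the {\tt format} method, and with a formatted string literal (an ``f-string'', available in Python 3.6 and later).  The variable names are chosen for this illustration only.

```python
years, rate, animal = 3, 0.1, 'camels'

# The format operator: the second operand is a tuple.
with_operator = 'In %d years I have spotted %g %s.' % (years, rate, animal)

# The format method: each {} is replaced by the next argument.
with_method = 'In {} years I have spotted {} {}.'.format(years, rate, animal)

# An f-string: expressions in braces are evaluated in place.
with_fstring = f'In {years} years I have spotted {rate} {animal}.'

print(with_operator)
print(with_operator == with_method == with_fstring)

# A width and a precision can be part of the format sequence:
# 8 characters wide, 2 digits after the decimal point.
padded_operator = '%8.2f' % 3.14159
padded_method = '{:8.2f}'.format(3.14159)
print(repr(padded_operator), repr(padded_method))
```

All three approaches produce the same string here; which one to use is mostly a matter of readability.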
11390

11391
% You can specify the number of digits as part of the format sequence.
11392
% For example, the sequence \verb"'%8.2f'"
11393
% formats a floating-point number to be 8 characters long, with
11394
% 2 digits after the decimal point:
11395

11396
% % \begin{verbatim}
11397
% >>> '%8.2f' % 3.14159
11398
% '    3.14'
11399
% \end{verbatim}
11400
% \afterverb
11401
% %
11402
% The result takes up eight spaces with two
11403
% digits after the decimal point.  
11404

11405

11406
\section{Filenames and paths}
11407
\label{paths}
11408
\index{filename}
11409
\index{path}
11410
\index{directory}
11411
\index{folder}
11412

11413
Files are organized into {\bf directories} (also called ``folders'').
11414
Every running program has a ``current directory'', which is the
11415
default directory for most operations.  
11416
For example, when you open a file for reading, Python looks for it in the
11417
current directory.
11418
\index{os module}
11419
\index{module!os}
11420

11421
The {\tt os} module provides functions for working with files and
11422
directories (``os'' stands for ``operating system'').  {\tt os.getcwd}
11423
returns the name of the current directory:
11424
\index{getcwd function}
11425
\index{function!getcwd}
11426

11427
\begin{verbatim}
11428
>>> import os
11429
>>> cwd = os.getcwd()
11430
>>> cwd
11431
'/home/dinsdale'
11432
\end{verbatim}
11433
%
11434
{\tt cwd} stands for ``current working directory''.  The result in
11435
this example is {\tt /home/dinsdale}, which is the home directory of a
11436
user named {\tt dinsdale}.
11437
\index{working directory}
11438
\index{directory!working}
11439

11440
A string like \verb"'/home/dinsdale'" that identifies a file or
11441
directory is called a {\bf path}.
11442

11443
A simple filename, like {\tt memo.txt}, is also considered a path,
but it is a {\bf relative path} because it relates to the current
directory.  If the current directory is {\tt /home/dinsdale}, the
filename {\tt memo.txt} would refer to {\tt /home/dinsdale/memo.txt}.
11447
\index{relative path} \index{path!relative}
11448
\index{absolute path} \index{path!absolute}
11449

11450
A path that begins with {\tt /} does not depend on the current
11451
directory; it is called an {\bf absolute path}.  To find the absolute
11452
path to a file, you can use {\tt os.path.abspath}:
11453

11454
\begin{verbatim}
11455
>>> os.path.abspath('memo.txt')
11456
'/home/dinsdale/memo.txt'
11457
\end{verbatim}
11458
%
11459
{\tt os.path} provides other functions for working with filenames
11460
and paths.  For example,
11461
{\tt os.path.exists} checks
11462
whether a file or directory exists:
11463
\index{exists function}
11464
\index{function!exists}
11465

11466
\begin{verbatim}
11467
>>> os.path.exists('memo.txt')
11468
True
11469
\end{verbatim}
11470
%
11471
If it exists, {\tt os.path.isdir} checks whether it's a directory:
11472

11473
\begin{verbatim}
11474
>>> os.path.isdir('memo.txt')
11475
False
11476
>>> os.path.isdir('/home/dinsdale')
11477
True
11478
\end{verbatim}
11479
%
11480
Similarly, {\tt os.path.isfile} checks whether it's a file.
11481

11482
{\tt os.listdir} returns a list of the files (and other directories)
11483
in the given directory:
11484

11485
\begin{verbatim}
11486
>>> os.listdir(cwd)
11487
['music', 'photos', 'memo.txt']
11488
\end{verbatim}
11489
%
11490
To demonstrate these functions, the following example
11491
``walks'' through a directory, prints
11492
the names of all the files, and calls itself recursively on
11493
all the directories.
11494
\index{walk, directory}
11495
\index{directory!walk}
11496

11497
\begin{verbatim}
def walk(dirname):
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)

        if os.path.isfile(path):
            print(path)
        else:
            walk(path)
\end{verbatim}
11507
%
11508
{\tt os.path.join} takes a directory and a file name and joins
11509
them into a complete path.  
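
To try these functions without touching your own files, here is a sketch that builds a small directory tree in a temporary location and then collects the file paths recursively.  It is a variation on the example above that returns a list instead of printing; the function name \verb"list_files" and the file names are assumptions made for this illustration.

```python
import os
import tempfile

def list_files(dirname):
    """Return the paths of all files under dirname, found recursively."""
    paths = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            paths.append(path)
        elif os.path.isdir(path):
            paths.extend(list_files(path))
    return paths

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: one file at the top and one in a subdirectory.
    os.mkdir(os.path.join(root, 'music'))
    for relative in ['memo.txt', os.path.join('music', 'song.txt')]:
        with open(os.path.join(root, relative), 'w') as fout:
            fout.write('placeholder\n')

    # Report paths relative to the root so the output is easy to read.
    found = sorted(os.path.relpath(p, root) for p in list_files(root))

print(found)
```

Using {\tt os.path.join} rather than gluing strings together with {\tt /} keeps the code independent of the separator used by the operating system.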
11510

11511
The {\tt os} module provides a function called {\tt walk} that is
11512
similar to this one but more versatile.  As an exercise, read the
11513
documentation and use it to print the names of the files in a given
11514
directory and its subdirectories.  You can download my solution from
11515
\url{http://thinkpython2.com/code/walk.py}.
11516

11517

11518
\section{Catching exceptions}
11519
\label{catch}
11520

11521
A lot of things can go wrong when you try to read and write
11522
files.  If you try to open a file that doesn't exist, you get a
{\tt FileNotFoundError}:
\index{open function}
\index{function!open}
\index{exception!FileNotFoundError}
\index{FileNotFoundError}

\begin{verbatim}
>>> fin = open('bad_file')
FileNotFoundError: [Errno 2] No such file or directory: 'bad_file'
\end{verbatim}
11533
%
11534
If you don't have permission to access a file:
11535
\index{file!permission}
11536
\index{permission, file}
11537

11538
\begin{verbatim}
11539
>>> fout = open('/etc/passwd', 'w')
11540
PermissionError: [Errno 13] Permission denied: '/etc/passwd'
11541
\end{verbatim}
11542
%
11543
And if you try to open a directory for reading, you get
11544

11545
\begin{verbatim}
11546
>>> fin = open('/home')
11547
IsADirectoryError: [Errno 21] Is a directory: '/home'
11548
\end{verbatim}
11549
%
11550
To avoid these errors, you could use functions like {\tt os.path.exists}
11551
and {\tt os.path.isfile}, but it would take a lot of time and code
11552
to check all the possibilities (if ``{\tt Errno 21}'' is any
11553
indication, there are at least 21 things that can go wrong).
11554
\index{exception, catching}
11555
\index{try statement}
11556
\index{statement!try}
11557

11558
It is better to go ahead and try---and deal with problems if they
11559
happen---which is exactly what the {\tt try} statement does.  The
11560
syntax is similar to an {\tt if...else} statement:
11561

11562
\begin{verbatim}
try:
    fin = open('bad_file')
except:
    print('Something went wrong.')
\end{verbatim}
11568
%
11569
Python starts by executing the {\tt try} clause.  If all goes
11570
well, it skips the {\tt except} clause and proceeds.  If an
11571
exception occurs, it jumps out of the {\tt try} clause and
11572
runs the {\tt except} clause.
11573

11574
Handling an exception with a {\tt try} statement is called {\bf
11575
catching} an exception.  In this example, the {\tt except} clause
11576
prints an error message that is not very helpful.  In general,
11577
catching an exception gives you a chance to fix the problem, or try
11578
again, or at least end the program gracefully.
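
An {\tt except} clause can also name the kind of exception it handles, which makes a more helpful message possible.  The following is a sketch, not the only way to do it; the function name \verb"describe_open" and the messages are assumptions made for this illustration.  In Python 3, {\tt FileNotFoundError} and {\tt IsADirectoryError} are subclasses of {\tt OSError}.

```python
import os
import tempfile

def describe_open(filename):
    """Try to open filename for reading and report what happened."""
    try:
        fin = open(filename)
    except FileNotFoundError:
        return 'no such file'
    except IsADirectoryError:
        return 'is a directory'
    except OSError as err:
        # Any other operating-system error, such as a permission problem.
        return 'other problem: %s' % err
    else:
        # The else clause runs only if no exception occurred.
        fin.close()
        return 'ok'

with tempfile.TemporaryDirectory() as tmpdir:
    good = os.path.join(tmpdir, 'memo.txt')
    with open(good, 'w') as fout:
        fout.write('hello\n')
    missing = os.path.join(tmpdir, 'bad_file')
    results = [describe_open(good), describe_open(missing)]

print(results)
```

Python checks the {\tt except} clauses in order and runs the first one that matches, so the most specific exceptions should come first.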
11579

11580

11581
\section{Databases}
11582
\index{database}
11583

11584
A {\bf database} is a file that is organized for storing data.  Many
11585
databases are organized like a dictionary in the sense that they map
11586
from keys to values.  The biggest difference between a database and a
11587
dictionary is that the database is on disk (or other permanent
11588
storage), so it persists after the program ends.  \index{dbm
11589
  module} \index{module!dbm}
11590

11591
The module {\tt dbm} provides an interface for creating
11592
and updating database files.
11593
As an example, I'll create a database
11594
that contains captions for image files.
11595
\index{open function}
11596
\index{function!open}
11597

11598
Opening a database is similar to opening other files:
11599

11600
\begin{verbatim}
11601
>>> import dbm
11602
>>> db = dbm.open('captions', 'c')
11603
\end{verbatim}
11604
%
11605
The mode \verb"'c'" means that the database should be created if
11606
it doesn't already exist.  The result is a database object
11607
that can be used (for most operations) like a dictionary.
11608
\index{database object}
11609
\index{object!database}
11610

11611
When you create a new item, {\tt dbm} updates the database file.
11612
\index{update!database}
11613

11614
\begin{verbatim}
11615
>>> db['cleese.png'] = 'Photo of John Cleese.'
11616
\end{verbatim}
11617
%
11618
When you access one of the items, {\tt dbm} reads the file:
11619

11620
\begin{verbatim}
11621
>>> db['cleese.png']
11622
b'Photo of John Cleese.'
11623
\end{verbatim}
11624
%
11625
The result is a {\bf bytes object}, which is why it begins with {\tt
11626
  b}.  A bytes object is similar to a string in many ways.  When you
11627
get farther into Python, the difference becomes important, but for now
11628
we can ignore it.
11629
\index{bytes object}
11630
\index{object!bytes}
11631

11632
If you make another assignment to an existing key, {\tt dbm} replaces
11633
the old value:
11634

11635
\begin{verbatim}
11636
>>> db['cleese.png'] = 'Photo of John Cleese doing a silly walk.'
11637
>>> db['cleese.png']
11638
b'Photo of John Cleese doing a silly walk.'
11639
\end{verbatim}
11640
%
11641

11642
Some dictionary methods, like {\tt items}, don't work with every
kind of database object.  But the {\tt keys} method is available,
so you can loop through the keys with a {\tt for} loop:
\index{dictionary methods!dbm module}

\begin{verbatim}
for key in db.keys():
    print(key, db[key])
\end{verbatim}
11651
%
11652
As with other files, you should close the database when you are
11653
done:
11654

11655
\begin{verbatim}
11656
>>> db.close()
11657
\end{verbatim}
11658
%
11659
\index{close method}
11660
\index{method!close}
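
Putting the steps together, here is a sketch that creates a database in a temporary directory, closes it, and opens it again to show that the data persists.  It also decodes the bytes object back into a string.  The temporary location and the names {\tt raw} and {\tt caption} are assumptions made for this illustration, as is the choice of UTF-8, which is the encoding {\tt dbm} uses when it stores a string.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    name = os.path.join(tmpdir, 'captions')

    # Mode 'c' creates the database if it does not exist.
    # The with statement closes it when the block ends.
    with dbm.open(name, 'c') as db:
        db['cleese.png'] = 'Photo of John Cleese.'

    # Open it again, read-only, to show that the item was saved.
    with dbm.open(name, 'r') as db:
        raw = db['cleese.png']

# Values come back as bytes; decode converts them to a string.
caption = raw.decode('utf-8')
print(raw)
print(caption)
```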
11661

11662

11663
\section{Pickling}
11664
\index{pickling}
11665

11666
A limitation of {\tt dbm} is that the keys and values have to be
11667
strings or bytes.  If you try to use any other type, you get an error.
11668
\index{pickle module} \index{module!pickle}
11669

11670
The {\tt pickle} module can help.  It translates
almost any type of object into a bytes object suitable for storage in a
database, and then translates bytes objects back into objects.

{\tt pickle.dumps} takes an object as a parameter and returns
a representation of it as a bytes object ({\tt dumps} is short for ``dump string''):
11676

11677
\begin{verbatim}
11678
>>> import pickle
11679
>>> t = [1, 2, 3]
11680
>>> pickle.dumps(t)
11681
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
11682
\end{verbatim}
11683
%
11684
The format isn't obvious to human readers; it is meant to be
11685
easy for {\tt pickle} to interpret.  {\tt pickle.loads}
11686
(``load string'') reconstitutes the object:
11687

11688
\begin{verbatim}
11689
>>> t1 = [1, 2, 3]
11690
>>> s = pickle.dumps(t1)
11691
>>> t2 = pickle.loads(s)
11692
>>> t2
11693
[1, 2, 3]
11694
\end{verbatim}
11695
%
11696
Although the new object has the same value as the old, it is
11697
not (in general) the same object:
11698

11699
\begin{verbatim}
11700
>>> t1 == t2
11701
True
11702
>>> t1 is t2
11703
False
11704
\end{verbatim}
11705
%
11706
In other words, pickling and then unpickling has the same effect
11707
as copying the object.
11708

11709
You can use {\tt pickle} to store non-strings in a database.
11710
In fact, this combination is so common that it has been
11711
encapsulated in a module called {\tt shelve}.  
11712
\index{shelve module}
11713
\index{module!shelve}
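
As a sketch of how {\tt shelve} combines the two, the following example stores a list under a string key and reads it back.  The temporary directory, the key \verb"'primes'" and the variable names are assumptions made for this illustration.

```python
import os
import shelve
import tempfile

original = [2, 3, 5, 7]

with tempfile.TemporaryDirectory() as tmpdir:
    name = os.path.join(tmpdir, 'shelf')

    # Keys must be strings, but values can be almost any object;
    # shelve pickles them before storing them.
    with shelve.open(name) as shelf:
        shelf['primes'] = original

    # Reopen the shelf and unpickle the value.
    with shelve.open(name) as shelf:
        restored = shelf['primes']

print(restored)
print(restored == original, restored is original)
```

As with {\tt pickle.loads}, the object that comes back is equal to the original but is not the same object.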
11714

11715

11716
\section{Pipes}
11717
\index{shell}
11718
\index{pipe}
11719

11720
Most operating systems provide a command-line interface,
11721
also known as a {\bf shell}.  Shells usually provide commands
11722
to navigate the file system and launch applications.  For
11723
example, in Unix you can change directories with {\tt cd},
11724
display the contents of a directory with {\tt ls}, and launch
11725
a web browser by typing (for example) {\tt firefox}.
11726
\index{ls (Unix command)}
11727
\index{Unix command!ls}
11728

11729
Any program that you can launch from the shell can also be
11730
launched from Python using a {\bf pipe object}, which
11731
represents a running program.
11732

11733
For example, the Unix command {\tt ls -l} normally displays the
11734
contents of the current directory in long format.  You can
11735
launch {\tt ls} with {\tt os.popen}\footnote{{\tt popen} is deprecated
11736
now, which means we are supposed to stop using it and start using
11737
the {\tt subprocess} module.  But for simple cases, I find
11738
{\tt subprocess} more complicated than necessary.  So I am going
11739
to keep using {\tt popen} until they take it away.}:
11740
\index{popen function}
11741
\index{function!popen}
11742

11743
\begin{verbatim}
11744
>>> cmd = 'ls -l'
11745
>>> fp = os.popen(cmd)
11746
\end{verbatim}
11747
%
11748
The argument is a string that contains a shell command.  The
11749
return value is an object that behaves like an open
11750
file.  You can read the output from the {\tt ls} process one
11751
line at a time with {\tt readline} or get the whole thing at
11752
once with {\tt read}:
11753
\index{readline method}
11754
\index{method!readline}
11755
\index{read method}
11756
\index{method!read}
11757

11758
\begin{verbatim}
11759
>>> res = fp.read()
11760
\end{verbatim}
11761
%
11762
When you are done, you close the pipe like a file:
11763
\index{close method}
11764
\index{method!close}
11765

11766
\begin{verbatim}
11767
>>> stat = fp.close()
11768
>>> print(stat)
11769
None
11770
\end{verbatim}
11771
%
11772
The return value is the final status of the {\tt ls} process;
11773
{\tt None} means that it ended normally (with no errors).
11774

11775
For example, most Unix systems provide a command called {\tt md5sum}
11776
that reads the contents of a file and computes a ``checksum''.
11777
You can read about MD5 at \url{http://en.wikipedia.org/wiki/Md5}.  This
11778
command provides an efficient way to check whether two files
11779
have the same contents.  The probability that different contents
11780
yield the same checksum is very small (that is, unlikely to happen
11781
before the universe collapses).
11782
\index{md5}
11783
\index{checksum}
11784

11785
You can use a pipe to run {\tt md5sum} from Python and get the result:
11786

11787
\begin{verbatim}
11788
>>> filename = 'book.tex'
11789
>>> cmd = 'md5sum ' + filename
11790
>>> fp = os.popen(cmd)
11791
>>> res = fp.read()
11792
>>> stat = fp.close()
11793
>>> print(res)
11794
1e0033f0ed0656636de0d75144ba32e0  book.tex
11795
>>> print(stat)
11796
None
11797
\end{verbatim}
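
For comparison, here is a sketch of the same idea using the {\tt subprocess} module mentioned in the footnote.  So that it runs on systems without {\tt ls} or {\tt md5sum}, it launches a second Python interpreter as the child process; that choice, and the names {\tt output} and {\tt status}, are assumptions made for this illustration.

```python
import subprocess
import sys

# The command is a list: the program to run, then its arguments.
# sys.executable is the path of the running Python interpreter.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True,   # collect what the child writes
    text=True,             # return strings rather than bytes
)

output = result.stdout
status = result.returncode

print(output)
print(status)
```

One difference from {\tt os.popen}: here a process that ends normally has a status of {\tt 0} rather than {\tt None}.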
11798

11799

11800

11801
\section{Writing modules}
11802
\label{modules}
11803
\index{module, writing}
11804
\index{word count}
11805

11806
Any file that contains Python code can be imported as a module.
11807
For example, suppose you have a file named {\tt wc.py} with the following
11808
code:
11809

11810
\begin{verbatim}
def linecount(filename):
    count = 0
    for line in open(filename):
        count += 1
    return count

print(linecount('wc.py'))
\end{verbatim}
11819
%
11820
If you run this program, it reads itself and prints the number
11821
of lines in the file, which is 7.
11822
You can also import it like this:
11823

11824
\begin{verbatim}
11825
>>> import wc
11826
7
11827
\end{verbatim}
11828
%
11829
Now you have a module object {\tt wc}:
11830
\index{module object}
11831
\index{object!module}
11832

11833
\begin{verbatim}
11834
>>> wc
11835
<module 'wc' from 'wc.py'>
11836
\end{verbatim}
11837
%
11838
The module object provides \verb"linecount":
11839

11840
\begin{verbatim}
11841
>>> wc.linecount('wc.py')
11842
7
11843
\end{verbatim}
11844
%
11845
So that's how you write modules in Python.
11846

11847
The only problem with this example is that when you import
11848
the module it runs the test code at the bottom.  Normally
11849
when you import a module, it defines new functions but it
11850
doesn't run them.
11851
\index{import statement}
11852
\index{statement!import}
11853

11854
Programs that will be imported as modules often
11855
use the following idiom:
11856

11857
\begin{verbatim}
11858
if __name__ == '__main__':
11859
    print(linecount('wc.py'))
11860
\end{verbatim}
11861
%
11862
\verb"__name__" is a built-in variable that is set when the
11863
program starts.  If the program is running as a script,
11864
\verb"__name__" has the value \verb"'__main__'"; in that
11865
case, the test code runs.  Otherwise,
11866
if the module is being imported, the test code is skipped.
11867

11868
\index{name built-in variable}
11869
\index{main}
11870

11871
As an exercise, type this example into a file named {\tt wc.py} and run
11872
it as a script.  Then run the Python interpreter and
11873
{\tt import wc}.  What is the value of \verb"__name__"
11874
when the module is being imported?
11875

11876
Warning: If you import a module that has already been imported,
11877
Python does nothing.  It does not re-read the file, even if it has
11878
changed.
11879
\index{module!reload}
11880
\index{reload function}
11881
\index{function!reload}
11882

11883
If you want to reload a module, you can use the function
{\tt reload} from the {\tt importlib} module, but it can be tricky, so
the safest thing to do is restart the interpreter and then import the
module again.
11886

11887

11888
\section{Debugging}
11889
\index{debugging}
11890
\index{whitespace}
11891

11892
When you are reading and writing files, you might run into problems
11893
with whitespace.  These errors can be hard to debug because spaces,
11894
tabs and newlines are normally invisible:
11895

11896
\begin{verbatim}
11897
>>> s = '1 2\t 3\n 4'
11898
>>> print(s)
11899
1 2	 3
11900
 4
11901
\end{verbatim}
11902
\index{repr function}
11903
\index{function!repr}
11904
\index{string representation}
11905

11906
The built-in function {\tt repr} can help.  It takes any object as an
11907
argument and returns a string representation of the object.  For
11908
strings, it represents whitespace
11909
characters with backslash sequences:
11910

11911
\begin{verbatim}
11912
>>> print(repr(s))
11913
'1 2\t 3\n 4'
11914
\end{verbatim}
11915

11916
This can be helpful for debugging.
11917

11918
One other problem you might run into is that different systems
11919
use different characters to indicate the end of a line.  Some
11920
systems use a newline, represented \verb"\n".  Others use
11921
a return character, represented \verb"\r".  Some use both.
11922
If you move files between different systems, these inconsistencies
11923
can cause problems.
11924
\index{end of line character}
11925

11926
For most systems, there are applications to convert from one
11927
format to another.  You can find them (and read more about this
11928
issue) at \url{http://en.wikipedia.org/wiki/Newline}.  Or, of course, you
11929
could write one yourself.
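
As a sketch of what such a converter might look like, the following function rewrites all three line-ending conventions as a single newline.  The function name \verb"normalize_newlines" is an assumption made for this illustration.

```python
def normalize_newlines(text):
    """Return text with every line ending written as a single newline."""
    # Replace the two-character ending first; otherwise the return
    # character in '\r\n' would be converted on its own and each
    # such line ending would turn into two newlines.
    return text.replace('\r\n', '\n').replace('\r', '\n')

mixed = 'one\r\ntwo\rthree\n'
print(repr(mixed))
print(repr(normalize_newlines(mixed)))
```

Printing with {\tt repr} makes the difference between the two strings visible.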
11930

11931

11932
\section{Glossary}
11933

11934
\begin{description}
11935

11936
\item[persistent:] Pertaining to a program that runs indefinitely
11937
and keeps at least some of its data in permanent storage.
11938
\index{persistence}
11939

11940
\item[format operator:] An operator, {\tt \%}, that takes a format
11941
string and a tuple and generates a string that includes
11942
the elements of the tuple formatted as specified by the format string.
11943
\index{format operator}
11944
\index{operator!format}
11945

11946
\item[format string:] A string, used with the format operator, that
11947
contains format sequences.  
11948
\index{format string}
11949

11950
\item[format sequence:] A sequence of characters in a format string,
11951
like {\tt \%d}, that specifies how a value should be formatted.
11952
\index{format sequence}
11953

11954
\item[text file:] A sequence of characters stored in permanent
11955
storage like a hard drive.
11956
\index{text file}
11957

11958
\item[directory:] A named collection of files, also called a folder.
11959
\index{directory}
11960

11961
\item[path:] A string that identifies a file or directory.
11962
\index{path}
11963

11964
\item[relative path:] A path that starts from the current directory.
11965
\index{relative path}
11966

11967
\item[absolute path:] A path that starts from the topmost directory
11968
in the file system.
11969
\index{absolute path}
11970

11971
\item[catch:] To prevent an exception from terminating
11972
a program using the {\tt try}
11973
and {\tt except} statements.
11974
\index{catch}
11975

11976
\item[database:] A file whose contents are organized like a dictionary
11977
with keys that correspond to values.
11978
\index{database}
11979

11980
\item[bytes object:] An object similar to a string, but made up of bytes rather than characters.
11981
\index{bytes object}
11982
\index{object!bytes}
11983

11984
\item[shell:] A program that allows users to type commands and then
11985
executes them by starting other programs.
11986
\index{shell}
11987

11988
\item[pipe object:] An object that represents a running program, allowing
11989
a Python program to run commands and read the results.
11990
\index{pipe object}
11991
\index{object!pipe}
11992

11993
\end{description}
11994

11995

11996
\section{Exercises}
11997

11998
\begin{exercise}
11999

12000
Write a function called {\tt sed} that takes as arguments a pattern string,
12001
a replacement string, and two filenames; it should read the first file
12002
and write the contents into the second file (creating it if
12003
necessary).  If the pattern string appears anywhere in the file, it
12004
should be replaced with the replacement string.
12005

12006
If an error occurs while opening, reading, writing or closing files,
12007
your program should catch the exception, print an error message, and
12008
exit.  Solution: \url{http://thinkpython2.com/code/sed.py}.
12009

12010
\end{exercise}
12011

12012

12013
\begin{exercise}
12014
\index{anagram set}
12015
\index{set!anagram}
12016

12017
If you download my solution to Exercise~\ref{anagrams} from
12018
\url{http://thinkpython2.com/code/anagram_sets.py}, you'll see that it creates
12019
a dictionary that maps from a sorted string of letters to the list of
12020
words that can be spelled with those letters.  For example,
12021
\verb"'opst'" maps to the list
12022
\verb"['opts', 'post', 'pots', 'spot', 'stop', 'tops']".
12023

12024
Write a module that imports \verb"anagram_sets" and provides
12025
two new functions: \verb"store_anagrams" should store the
12026
anagram dictionary in a ``shelf''; \verb"read_anagrams" should
12027
look up a word and return a list of its anagrams.
12028
Solution: \url{http://thinkpython2.com/code/anagram_db.py}.
12029

12030
\end{exercise}
12031

12032

12033
\begin{exercise}
12034
\label{checksum}
12035
\index{MP3}
12036

12037
In a large collection of MP3 files, there may be more than one
12038
copy of the same song, stored in different directories or with
12039
different file names.  The goal of this exercise is to search for
12040
duplicates.
12041

12042
\begin{enumerate}
12043

12044
\item Write a program that searches a directory and all of its
12045
subdirectories, recursively, and returns a list of complete paths
12046
for all files with a given suffix (like {\tt .mp3}).
12047
Hint: {\tt os.path} provides several useful functions for
12048
manipulating file and path names.
12049
\index{duplicate}
12050
\index{MD5 algorithm}
12051
\index{algorithm!MD5}
12052
\index{checksum}
12053

12054
\item To recognize duplicates, you can use {\tt md5sum}
to compute a ``checksum'' for each file.  If two files have
the same checksum, they probably have the same contents.
12057
\index{md5sum}
12058

12059
\item To double-check, you can use the Unix command {\tt diff}.
12060
\index{diff}
12061

12062
\end{enumerate}
12063

12064
Solution: \url{http://thinkpython2.com/code/find_duplicates.py}.
12065

12066
\end{exercise}
12067

12068

12069

12070
\chapter{Classes and objects}
12071
\label{clobjects}
12072

12073
At this point you know how to use
12074
functions to organize code and 
12075
built-in types to organize data.  The next step is to learn
12076
``object-oriented programming'', which uses programmer-defined types
12077
to organize both code and data.  Object-oriented programming is
12078
a big topic; it will take a few chapters to get there.
12079
\index{object-oriented programming}
12080

12081
Code examples from this chapter are available from
12082
\url{http://thinkpython2.com/code/Point1.py}; solutions
12083
to the exercises are available from
12084
\url{http://thinkpython2.com/code/Point1_soln.py}.
12085

12086

12087
\section{Programmer-defined types}
12088
\label{point}
12089
\index{programmer-defined type}
12090
\index{type!programmer-defined}
12091

12092
We have used many of Python's built-in types; now we are going
12093
to define a new type.  As an example, we will create a type
12094
called {\tt Point} that represents a point in two-dimensional
12095
space.
12096
\index{point, mathematical}
12097

12098
In mathematical notation, points are often written in
12099
parentheses with a comma separating the coordinates. For example,
12100
$(0,0)$ represents the origin, and $(x,y)$ represents the
12101
point $x$ units to the right and $y$ units up from the origin.
12102

12103
There are several ways we might represent points in Python:
12104

12105
\begin{itemize}
12106

12107
\item We could store the coordinates separately in two
12108
variables, {\tt x} and {\tt y}.
12109

12110
\item We could store the coordinates as elements in a list
12111
or tuple.
12112

12113
\item We could create a new type to represent points as
12114
objects.
12115

12116
\end{itemize}
12117
\index{representation}
12118

12119
Creating a new type
12120
is more complicated than the other options, but
12121
it has advantages that will be apparent soon.
12122

12123
A programmer-defined type is also called a {\bf class}.
12124
A class definition looks like this:
12125
\index{class}
12126
\index{object!class}
12127
\index{class definition}
12128
\index{definition!class}
12129

12130
\begin{verbatim}
class Point:
    """Represents a point in 2-D space."""
\end{verbatim}
12134
%
12135
The header indicates that the new class is called {\tt Point}.
12136
The body is a docstring that explains what the class is for.
12137
You can define variables and methods inside a class definition,
12138
but we will get back to that later.
12139
\index{Point class}
12140
\index{class!Point}
12141
\index{docstring}
12142

12143
Defining a class named {\tt Point} creates a {\bf class object}.
12144

12145
\begin{verbatim}
12146
>>> Point
12147
<class '__main__.Point'>
12148
\end{verbatim}
12149
%
12150
Because {\tt Point} is defined at the top level, its ``full
12151
name'' is \verb"__main__.Point".
12152
\index{object!class}
12153
\index{class object}
12154

12155
The class object is like a factory for creating objects.  To create a
12156
Point, you call {\tt Point} as if it were a function.
12157

12158
\begin{verbatim}
12159
>>> blank = Point()
12160
>>> blank
12161
<__main__.Point object at 0xb7e9d3ac>
12162
\end{verbatim}
12163
%
12164
The return value is a reference to a Point object, which we
12165
assign to {\tt blank}.  
12166

12167
Creating a new object is called
12168
{\bf instantiation}, and the object is an {\bf instance} of
12169
the class.
12170
\index{instance}
12171
\index{instantiation}
12172

12173
When you print an instance, Python tells you what class it
12174
belongs to and where it is stored in memory (the prefix
12175
{\tt 0x} means that the following number is in hexadecimal).
12176
\index{hexadecimal}
12177

12178
Every object is an instance of some class, so ``object'' and
12179
``instance'' are interchangeable.  But in this chapter I use
12180
``instance'' to indicate that I am talking about a programmer-defined
12181
type.
12182

12183

12184
\section{Attributes}
12185
\label{attributes}
12186
\index{instance attribute}
12187
\index{attribute!instance}
12188
\index{dot notation}
12189

12190
You can assign values to an instance using dot notation:
12191

12192
\begin{verbatim}
12193
>>> blank.x = 3.0
12194
>>> blank.y = 4.0
12195
\end{verbatim}
12196
%
12197
This syntax is similar to the syntax for selecting a variable from a
12198
module, such as {\tt math.pi} or {\tt string.whitespace}.  In this case,
12199
though, we are assigning values to named elements of an object.
12200
These elements are called {\bf attributes}.
12201

12202
As a noun, ``AT-trib-ute'' is pronounced with emphasis on the first
12203
syllable, as opposed to ``a-TRIB-ute'', which is a verb.
12204

12205
Figure~\ref{fig.point} is a state diagram that shows the result of these assignments.
12206
A state diagram that shows an object and its attributes is
12207
called an {\bf object diagram}.
12208
\index{state diagram}
12209
\index{diagram!state}
12210
\index{object diagram}
12211
\index{diagram!object}
12212

12213
\begin{figure}
12214
\centerline
12215
{\includegraphics[scale=0.8]{figs/point.pdf}}
12216
\caption{Object diagram.}
12217
\label{fig.point}
12218
\end{figure}
12219

12220
The variable {\tt blank} refers to a Point object, which
12221
contains two attributes.  Each attribute refers to a
12222
floating-point number.
12223

12224
You can read the value of an attribute using the same syntax:
12225

12226
\begin{verbatim}
12227
>>> blank.y
12228
4.0
12229
>>> x = blank.x
12230
>>> x
12231
3.0
12232
\end{verbatim}
12233
%
12234
The expression {\tt blank.x} means, ``Go to the object {\tt blank}
12235
refers to and get the value of {\tt x}.''  In the example, we assign that
12236
value to a variable named {\tt x}.  There is no conflict between
12237
the variable {\tt x} and the attribute {\tt x}.
12238

12239
You can use dot notation as part of any expression.  For example:
12240

12241
\begin{verbatim}
>>> import math
>>> '(%g, %g)' % (blank.x, blank.y)
'(3, 4)'
>>> distance = math.sqrt(blank.x**2 + blank.y**2)
>>> distance
5.0
\end{verbatim}
12248
%
12249
You can pass an instance as an argument in the usual way.
12250
For example:
12251
\index{instance!as argument}
12252

12253
\begin{verbatim}
12254
def print_point(p):
12255
    print('(%g, %g)' % (p.x, p.y))
12256
\end{verbatim}
12257
%
12258
\verb"print_point" takes a point as an argument and displays it in
12259
mathematical notation.  To invoke it, you can pass {\tt blank} as
12260
an argument:
12261

12262
\begin{verbatim}
12263
>>> print_point(blank)
12264
(3.0, 4.0)
12265
\end{verbatim}
12266
%
12267
Inside the function, {\tt p} is an alias for {\tt blank}, so if
12268
the function modifies {\tt p}, {\tt blank} changes.
12269
\index{aliasing}
12270
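To see this aliasing in action, here is a minimal sketch.  The function
\verb"reset_point" is a hypothetical helper, not one of the functions
defined in this chapter; it only illustrates that a change made through
{\tt p} is visible through {\tt blank}.

```python
class Point:
    """Represents a point in 2-D space."""


def reset_point(p):
    # Hypothetical helper: modifies the object that p refers to.
    p.x = 0.0
    p.y = 0.0


blank = Point()
blank.x = 3.0
blank.y = 4.0

reset_point(blank)         # inside the call, p is an alias for blank
print(blank.x, blank.y)    # the caller sees the modified attributes
```

Because {\tt p} and {\tt blank} refer to the same object, no return
value is needed for the caller to observe the change.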

As an exercise, write a function called \verb"distance_between_points"
that takes two Points as arguments and returns the distance between
them.


\section{Rectangles}
\label{rectangles}

Sometimes it is obvious what the attributes of an object should be,
but other times you have to make decisions.  For example, imagine you
are designing a class to represent rectangles.  What attributes would
you use to specify the location and size of a rectangle?  You can
ignore angle; to keep things simple, assume that the rectangle is
either vertical or horizontal.
\index{representation}

There are at least two possibilities:

\begin{itemize}

\item You could specify one corner of the rectangle
(or the center), the width, and the height.

\item You could specify two opposing corners.

\end{itemize}

At this point it is hard to say whether either is better than
the other, so we'll implement the first one, just as an example.
\index{Rectangle class}
\index{class!Rectangle}

Here is the class definition:

\begin{verbatim}
class Rectangle:
    """Represents a rectangle.

    attributes: width, height, corner.
    """
\end{verbatim}
%
The docstring lists the attributes:  {\tt width} and
{\tt height} are numbers; {\tt corner} is a Point object that
specifies the lower-left corner.

To represent a rectangle, you have to instantiate a Rectangle
object and assign values to the attributes:

\begin{verbatim}
box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
\end{verbatim}
%
The expression {\tt box.corner.x} means,
``Go to the object {\tt box} refers to and select the attribute named
{\tt corner}; then go to that object and select the attribute named
{\tt x}.''

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/rectangle.pdf}}
\caption{Object diagram.}
\label{fig.rectangle}
\end{figure}


Figure~\ref{fig.rectangle} shows the state of this object.
An object that is an attribute of another object is {\bf embedded}.
\index{state diagram}
\index{diagram!state}
\index{object diagram}
\index{diagram!object}
\index{embedded object}
\index{object!embedded}

\section{Instances as return values}
\index{instance!as return value}
\index{return value}

Functions can return instances.  For example, \verb"find_center"
takes a {\tt Rectangle} as an argument and returns a {\tt Point}
that contains the coordinates of the center of the {\tt Rectangle}:

\begin{verbatim}
def find_center(rect):
    p = Point()
    p.x = rect.corner.x + rect.width/2
    p.y = rect.corner.y + rect.height/2
    return p
\end{verbatim}
%
Here is an example that passes {\tt box} as an argument and assigns
the resulting Point to {\tt center}:

\begin{verbatim}
>>> center = find_center(box)
>>> print_point(center)
(50, 100)
\end{verbatim}
%

\section{Objects are mutable}
\index{object!mutable}
\index{mutability}

You can change the state of an object by making an assignment to one of
its attributes.  For example, to change the size of a rectangle
without changing its position, you can modify the values of {\tt
width} and {\tt height}:

\begin{verbatim}
box.width = box.width + 50
box.height = box.height + 100
\end{verbatim}
%
You can also write functions that modify objects.  For example,
\verb"grow_rectangle" takes a Rectangle object and two numbers,
{\tt dwidth} and {\tt dheight}, and adds the numbers to the
width and height of the rectangle:

\begin{verbatim}
def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight
\end{verbatim}
%
Here is an example that demonstrates the effect:

\begin{verbatim}
>>> box.width, box.height
(150.0, 300.0)
>>> grow_rectangle(box, 50, 100)
>>> box.width, box.height
(200.0, 400.0)
\end{verbatim}
%
Inside the function, {\tt rect} is an
alias for {\tt box}, so when the function modifies {\tt rect},
{\tt box} changes.

As an exercise, write a function named \verb"move_rectangle" that takes
a Rectangle and two numbers named {\tt dx} and {\tt dy}.  It
should change the location of the rectangle by adding {\tt dx}
to the {\tt x} coordinate of {\tt corner} and adding {\tt dy}
to the {\tt y} coordinate of {\tt corner}.

\section{Copying}
\label{copying}
\index{aliasing}

Aliasing can make a program difficult to read because changes
in one place might have unexpected effects in another place.
It is hard to keep track of all the variables that might refer
to a given object.
\index{copying objects}
\index{object!copying}
\index{copy module}
\index{module!copy}

Copying an object is often an alternative to aliasing.
The {\tt copy} module contains a function called {\tt copy} that
can duplicate any object:

\begin{verbatim}
>>> p1 = Point()
>>> p1.x = 3.0
>>> p1.y = 4.0

>>> import copy
>>> p2 = copy.copy(p1)
\end{verbatim}
%
{\tt p1} and {\tt p2} contain the same data, but they are
not the same Point.

\begin{verbatim}
>>> print_point(p1)
(3, 4)
>>> print_point(p2)
(3, 4)
>>> p1 is p2
False
>>> p1 == p2
False
\end{verbatim}
%
The {\tt is} operator indicates that {\tt p1} and {\tt p2} are not the
same object, which is what we expected.  But you might have expected
{\tt ==} to yield {\tt True} because these points contain the same
data.  In that case, you will be disappointed to learn that for
instances, the default behavior of the {\tt ==} operator is the same
as the {\tt is} operator; it checks object identity, not object
equivalence.  That's because for programmer-defined types, Python doesn't
know what should be considered equivalent.  At least, not yet.
\index{is operator}
\index{operator!is}
\index{identity}
\index{equivalence}
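Until then, one way to compare Points by value is to write a small
function that compares their attributes.  The sketch below assumes a
hypothetical function named \verb"same_point"; it is one possible
definition of equivalence for Points, not something Python provides.

```python
import copy


class Point:
    """Represents a point in 2-D space."""


def same_point(p1, p2):
    # Hypothetical helper: treat two Points as equivalent
    # when both coordinates match.
    return p1.x == p2.x and p1.y == p2.y


p1 = Point()
p1.x = 3.0
p1.y = 4.0
p2 = copy.copy(p1)

print(p1 is p2)            # identity: two distinct objects
print(p1 == p2)            # default == also checks identity
print(same_point(p1, p2))  # equivalence: the data are the same
```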

If you use {\tt copy.copy} to duplicate a Rectangle, you will find
that it copies the Rectangle object but not the embedded Point.
\index{embedded object!copying}

\begin{verbatim}
>>> box2 = copy.copy(box)
>>> box2 is box
False
>>> box2.corner is box.corner
True
\end{verbatim}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/rectangle2.pdf}}
\caption{Object diagram.}
\label{fig.rectangle2}
\end{figure}

Figure~\ref{fig.rectangle2} shows what the object diagram looks like.
\index{state diagram}
\index{diagram!state}
\index{object diagram}
\index{diagram!object}
This operation is called a {\bf shallow copy} because it copies the
object and any references it contains, but not the embedded objects.
\index{shallow copy}
\index{copy!shallow}

For most applications, this is not what you want.  In this example,
invoking \verb"grow_rectangle" on one of the Rectangles would not
affect the other, but invoking \verb"move_rectangle" on either would
affect both!  This behavior is confusing and error-prone.
\index{deep copy}
\index{copy!deep}
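The following sketch makes the difference concrete.  The version of
\verb"move_rectangle" shown here is an assumption, one possible answer
to the earlier exercise, so treat it as an illustration rather than
the definitive solution.

```python
import copy


class Point:
    """Represents a point in 2-D space."""


class Rectangle:
    """Represents a rectangle.

    attributes: width, height, corner.
    """


def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight


def move_rectangle(rect, dx, dy):
    # Assumed solution to the exercise: modifies the embedded Point.
    rect.corner.x += dx
    rect.corner.y += dy


box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

box2 = copy.copy(box)                # shallow copy: corner is shared

grow_rectangle(box2, 50, 100)
print(box.width, box2.width)         # only box2 grew

move_rectangle(box2, 10, 20)
print(box.corner.x, box2.corner.x)   # both rectangles moved
```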

Fortunately, the {\tt copy} module provides a function named {\tt
deepcopy} that copies not only the object but also
the objects it refers to, and the objects {\em they} refer to,
and so on.
You will not be surprised to learn that this operation is
called a {\bf deep copy}.
\index{deepcopy function}
\index{function!deepcopy}

\begin{verbatim}
>>> box3 = copy.deepcopy(box)
>>> box3 is box
False
>>> box3.corner is box.corner
False
\end{verbatim}
%
{\tt box3} and {\tt box} are completely separate objects.

As an exercise, write a version of \verb"move_rectangle" that creates and
returns a new Rectangle instead of modifying the old one.

\section{Debugging}
\label{hasattr}
\index{debugging}

When you start working with objects, you are likely to encounter
some new exceptions.  If you try to access an attribute
that doesn't exist, you get an {\tt AttributeError}:
\index{exception!AttributeError}
\index{AttributeError}

\begin{verbatim}
>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> p.z
AttributeError: 'Point' object has no attribute 'z'
\end{verbatim}
%
If you are not sure what type an object is, you can ask:
\index{type function}
\index{function!type}

\begin{verbatim}
>>> type(p)
<class '__main__.Point'>
\end{verbatim}
%
You can also use {\tt isinstance} to check whether an object
is an instance of a class:
\index{isinstance function}
\index{function!isinstance}

\begin{verbatim}
>>> isinstance(p, Point)
True
\end{verbatim}
%
If you are not sure whether an object has a particular attribute,
you can use the built-in function {\tt hasattr}:
\index{hasattr function}
\index{function!hasattr}

\begin{verbatim}
>>> hasattr(p, 'x')
True
>>> hasattr(p, 'z')
False
\end{verbatim}
%
The first argument can be any object; the second argument is a {\em
string} that contains the name of the attribute.
\index{attribute}

You can also use a {\tt try} statement to see if the object has the
attributes you need:
\index{try statement}
\index{statement!try}

\begin{verbatim}
try:
    x = p.x
except AttributeError:
    x = 0
\end{verbatim}

This approach can make it easier to write functions that work with
different types; more on that topic is
coming up in Section~\ref{polymorphism}.
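As a side note, the built-in function {\tt getattr} accepts an optional
third argument that is returned when the attribute is missing, which
gives the same result as the {\tt try} statement above in a single
expression.  This is offered only as a sketch of an alternative; the
default value {\tt 0} is an arbitrary choice for the example.

```python
class Point:
    """Represents a point in 2-D space."""


p = Point()
p.x = 3
p.y = 4

x = getattr(p, 'x', 0)   # the attribute exists, so its value is used
z = getattr(p, 'z', 0)   # the attribute is missing, so the default is used
print(x, z)
```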

\section{Glossary}

\begin{description}

\item[class:] A programmer-defined type.  A class definition creates a new
class object.
\index{class}
\index{programmer-defined type}
\index{type!programmer-defined}

\item[class object:] An object that contains information about a
programmer-defined type.  The class object can be used to create instances
of the type.
\index{class object}
\index{object!class}

\item[instance:] An object that belongs to a class.
\index{instance}

\item[instantiate:] To create a new object.
\index{instantiate}

\item[attribute:] One of the named values associated with an object.
\index{attribute!instance}
\index{instance attribute}

\item[embedded object:] An object that is stored as an attribute
of another object.
\index{embedded object}
\index{object!embedded}

\item[shallow copy:] To copy the contents of an object, including
any references to embedded objects;
implemented by the {\tt copy} function in the {\tt copy} module.
\index{shallow copy}

\item[deep copy:] To copy the contents of an object as well as any
embedded objects, and any objects embedded in them, and so on;
implemented by the {\tt deepcopy} function in the {\tt copy} module.
\index{deep copy}

\item[object diagram:] A diagram that shows objects, their
attributes, and the values of the attributes.
\index{object diagram}
\index{diagram!object}

\end{description}

\section{Exercises}

\begin{exercise}

Write a definition for a class named {\tt Circle} with attributes
{\tt center} and {\tt radius}, where {\tt center} is a Point object
and radius is a number.

Instantiate a Circle object that represents a circle with its center
at $(150, 100)$ and radius 75.

Write a function named \verb"point_in_circle" that takes a Circle and
a Point and returns True if the Point lies in or on the boundary of
the circle.

Write a function named \verb"rect_in_circle" that takes a Circle and a
Rectangle and returns True if the Rectangle lies entirely in or on the boundary
of the circle.

Write a function named \verb"rect_circle_overlap" that takes a Circle
and a Rectangle and returns True if any of the corners of the Rectangle fall
inside the circle.  Or as a more challenging version, return True if
any part of the Rectangle falls inside the circle.

Solution: \url{http://thinkpython2.com/code/Circle.py}.

\end{exercise}


\begin{exercise}

Write a function called \verb"draw_rect" that takes a Turtle object
and a Rectangle and uses the Turtle to draw the Rectangle.  See
Chapter~\ref{turtlechap} for examples using Turtle objects.

Write a function called \verb"draw_circle" that takes a Turtle and
a Circle and draws the Circle.

Solution: \url{http://thinkpython2.com/code/draw.py}.

\end{exercise}

\chapter{Classes and functions}
\label{time}

Now that we know how to create new types, the next
step is to write functions that take programmer-defined objects
as parameters and return them as results.  In this chapter I
also present ``functional programming style'' and two new
program development plans.

Code examples from this chapter are available from
\url{http://thinkpython2.com/code/Time1.py}.
Solutions to the exercises are at
\url{http://thinkpython2.com/code/Time1_soln.py}.


\section{Time}
\label{isafter}

As another example of a programmer-defined type, we'll define a class
called {\tt Time} that records the time of day.  The class definition
looks like this: \index{programmer-defined type}
\index{type!programmer-defined} \index{Time class} \index{class!Time}

\begin{verbatim}
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """
\end{verbatim}
%
We can create a new {\tt Time} object and assign
attributes for hours, minutes, and seconds:

\begin{verbatim}
time = Time()
time.hour = 11
time.minute = 59
time.second = 30
\end{verbatim}
%
The state diagram for the {\tt Time} object looks like Figure~\ref{fig.time}.
\index{state diagram}
\index{diagram!state}
\index{object diagram}
\index{diagram!object}

As an exercise, write a function called \verb"print_time" that takes a
Time object and prints it in the form {\tt hour:minute:second}.
Hint: the format sequence \verb"'%.2d'" prints an integer using
at least two digits, including a leading zero if necessary.
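Because later examples in this chapter call \verb"print_time", here is
one possible sketch of it.  It is only one way to solve the exercise,
so you might want to try your own version before reading it.

```python
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """


def print_time(t):
    # One possible solution: %.2d pads each field to two digits.
    print('%.2d:%.2d:%.2d' % (t.hour, t.minute, t.second))


time = Time()
time.hour = 11
time.minute = 59
time.second = 30
print_time(time)

# The format sequence on its own, applied to a one-digit integer.
padded = '%.2d' % 5
print(padded)
```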

Write a boolean function called \verb"is_after" that
takes two Time objects, {\tt t1} and {\tt t2}, and
returns {\tt True} if {\tt t1} follows {\tt t2} chronologically and
{\tt False} otherwise.  Challenge: don't use an {\tt if} statement.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/time.pdf}}
\caption{Object diagram.}
\label{fig.time}
\end{figure}

\section{Pure functions}
\index{prototype and patch}
\index{development plan!prototype and patch}

In the next few sections, we'll write two functions that add time
values.  They demonstrate two kinds of functions: pure functions and
modifiers.  They also demonstrate a development plan I'll call {\bf
  prototype and patch}, which is a way of tackling a complex problem
by starting with a simple prototype and incrementally dealing with the
complications.

Here is a simple prototype of \verb"add_time":

\begin{verbatim}
def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second
    return sum
\end{verbatim}
%
The function creates a new {\tt Time} object, initializes its
attributes, and returns a reference to the new object.  This is called
a {\bf pure function} because it does not modify any of the objects
passed to it as arguments and it has no effect,
like displaying a value or getting user input,
other than returning a value.
\index{pure function}
\index{function type!pure}

To test this function, I'll create two Time objects: {\tt start}
contains the start time of a movie, like {\em Monty Python and the
Holy Grail}, and {\tt duration} contains the run time of the movie,
which is one hour 35 minutes.
\index{Monty Python and the Holy Grail}

\verb"add_time" figures out when the movie will be done.

\begin{verbatim}
>>> start = Time()
>>> start.hour = 9
>>> start.minute = 45
>>> start.second =  0

>>> duration = Time()
>>> duration.hour = 1
>>> duration.minute = 35
>>> duration.second = 0

>>> done = add_time(start, duration)
>>> print_time(done)
10:80:00
\end{verbatim}
%
The result, {\tt 10:80:00}, might not be what you were hoping
for.  The problem is that this function does not deal with cases where the
number of seconds or minutes adds up to sixty or more.  When that
happens, we have to ``carry'' the extra seconds into the minute column
or the extra minutes into the hour column.
\index{carrying, addition with}

Here's an improved version:

\begin{verbatim}
def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second

    if sum.second >= 60:
        sum.second -= 60
        sum.minute += 1

    if sum.minute >= 60:
        sum.minute -= 60
        sum.hour += 1

    return sum
\end{verbatim}
%
Although this function is correct, it is starting to get big.
We will see a shorter alternative later.

\section{Modifiers}
\label{increment}
\index{modifier}
\index{function type!modifier}

Sometimes it is useful for a function to modify the objects it gets as
parameters.  In that case, the changes are visible to the caller.
Functions that work this way are called {\bf modifiers}.
\index{increment}

{\tt increment}, which adds a given number of seconds to a {\tt Time}
object, can be written naturally as a
modifier.  Here is a rough draft:

\begin{verbatim}
def increment(time, seconds):
    time.second += seconds

    if time.second >= 60:
        time.second -= 60
        time.minute += 1

    if time.minute >= 60:
        time.minute -= 60
        time.hour += 1
\end{verbatim}
%
The first line performs the basic operation; the remainder deals
with the special cases we saw before.
\index{special case}

Is this function correct?  What happens if {\tt seconds}
is much greater than sixty?

In that case, it is not enough to carry once; we have to keep doing it
until {\tt time.second} is less than sixty.  One solution is to
replace the {\tt if} statements with {\tt while} statements.  That
would make the function correct, but not very efficient.  As an
exercise, write a correct version of {\tt increment} that doesn't
contain any loops.
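For reference, the {\tt while} version described above might look like
the following sketch.  It is the correct-but-inefficient variant, not
the loop-free solution the exercise asks for; the test value of 5000
seconds is an arbitrary choice.

```python
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """


def increment(time, seconds):
    time.second += seconds

    # Keep carrying until second is less than sixty.
    while time.second >= 60:
        time.second -= 60
        time.minute += 1

    # Keep carrying until minute is less than sixty.
    while time.minute >= 60:
        time.minute -= 60
        time.hour += 1


t = Time()
t.hour = 9
t.minute = 45
t.second = 0

increment(t, 5000)   # 5000 seconds is 1 hour, 23 minutes, 20 seconds
print(t.hour, t.minute, t.second)
```

The first loop runs once for every sixty seconds added, which is why
this version gets slower as {\tt seconds} grows.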

Anything that can be done with modifiers can also be done with pure
functions.  In fact, some programming languages only allow pure
functions.  There is some evidence that programs that use pure
functions are faster to develop and less error-prone than programs
that use modifiers.  But modifiers are convenient at times,
and functional programs tend to be less efficient.

In general, I recommend that you write pure functions whenever it is
reasonable and resort to modifiers only if there is a compelling
advantage.  This approach might be called a {\bf functional
programming style}.
\index{functional programming style}

As an exercise, write a ``pure'' version of {\tt increment} that
creates and returns a new Time object rather than modifying the
parameter.

\section{Prototyping versus planning}
\label{prototype}
\index{prototype and patch}
\index{development plan!prototype and patch}
\index{planned development}
\index{development plan!designed}

The development plan I am demonstrating is called ``prototype and
patch''.  For each function, I wrote a prototype that performed the
basic calculation and then tested it, patching errors along the
way.

This approach can be effective, especially if you don't yet have a
deep understanding of the problem.  But incremental corrections can
generate code that is unnecessarily complicated---since it deals with
many special cases---and unreliable---since it is hard to know if you
have found all the errors.

An alternative is {\bf designed development}, in which high-level
insight into the problem can make the programming much easier.  In
this case, the insight is that a Time object is really a three-digit
number in base 60 (see \url{http://en.wikipedia.org/wiki/Sexagesimal})!  The
{\tt second} attribute is the ``ones column'', the {\tt minute}
attribute is the ``sixties column'', and the {\tt hour} attribute is
the ``thirty-six hundreds column''.
\index{sexagesimal}

When we wrote \verb"add_time" and {\tt increment}, we were effectively
doing addition in base 60, which is why we had to carry from one
column to the next.
\index{carrying, addition with}

This observation suggests another approach to the whole problem---we
can convert Time objects to integers and take advantage of the fact
that the computer knows how to do integer arithmetic.

Here is a function that converts Times to integers:

\begin{verbatim}
def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds
\end{verbatim}
%
And here is a function that converts an integer to a Time
(recall that {\tt divmod} divides the first argument by the second
and returns the quotient and remainder as a tuple).
\index{divmod}

\begin{verbatim}
def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time
\end{verbatim}
%
You might have to think a bit, and run some tests, to convince
yourself that these functions are correct.  One way to test them is to
check that \verb"time_to_int(int_to_time(x)) == x" for many values of
{\tt x}.  This is an example of a consistency check.
\index{consistency check}
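A consistency check like this can be run as a short loop.  The sketch
below repeats the two conversion functions so that it is
self-contained; the range of one day's worth of seconds is an
arbitrary choice for the test.

```python
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """


def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds


def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


# Check the round trip for every second in one day.
seconds_per_day = 24 * 60 * 60
for x in range(seconds_per_day):
    assert time_to_int(int_to_time(x)) == x

print('round trip is consistent for', seconds_per_day, 'values')
```

A check like this does not prove that the functions are correct, but a
failure would show that at least one of them is wrong.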

Once you are convinced they are correct, you can use them to
rewrite \verb"add_time":

\begin{verbatim}
def add_time(t1, t2):
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
\end{verbatim}
%
This version is shorter than the original, and easier to verify.  As
an exercise, rewrite {\tt increment} using \verb"time_to_int" and
\verb"int_to_time".

In some ways, converting from base 60 to base 10 and back is harder
than just dealing with times.  Base conversion is more abstract; our
intuition for dealing with time values is better.

But if we have the insight to treat times as base 60 numbers and make
the investment of writing the conversion functions (\verb"time_to_int"
and \verb"int_to_time"), we get a program that is shorter, easier to
read and debug, and more reliable.

It is also easier to add features later.  For example, imagine
subtracting two Times to find the duration between them.  The
naive approach would be to implement subtraction with borrowing.
Using the conversion functions would be easier and more likely to be
correct.
\index{subtraction with borrowing}
\index{borrowing, subtraction with}
\index{generalization}
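As a sketch of how little code subtraction needs with this approach,
consider the following.  The names \verb"sub_time" and
\verb"make_time" are hypothetical, introduced only for this example,
and the sketch assumes the first Time is not earlier than the second.

```python
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """


def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds


def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


def make_time(hour, minute, second):
    # Hypothetical convenience helper for building test values.
    t = Time()
    t.hour = hour
    t.minute = minute
    t.second = second
    return t


def sub_time(t1, t2):
    # Hypothetical helper: duration from t2 to t1.
    # Assumes t1 is not earlier than t2.
    seconds = time_to_int(t1) - time_to_int(t2)
    return int_to_time(seconds)


start = make_time(9, 45, 0)
done = make_time(11, 20, 0)
duration = sub_time(done, start)
print(duration.hour, duration.minute, duration.second)
```

No borrowing logic appears anywhere; the integer subtraction handles
it.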

Ironically, sometimes making a problem harder (or more general) makes it
easier (because there are fewer special cases and fewer opportunities
for error).

\section{Debugging}
\index{debugging}

A Time object is well-formed if the values of {\tt minute} and {\tt
second} are between 0 and 60 (including 0 but not 60) and if
{\tt hour} is non-negative.  {\tt hour} and {\tt minute} should be
integral values, but we might allow {\tt second} to have a
fractional part.
\index{invariant}

Requirements like these are called {\bf invariants} because
they should always be true.  To put it a different way, if they
are not true, something has gone wrong.

Writing code to check invariants can help detect errors
and find their causes.  For example, you might have a function
like \verb"valid_time" that takes a Time object and returns
{\tt False} if it violates an invariant:

\begin{verbatim}
def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True
\end{verbatim}
%
At the beginning of each function you could check the
arguments to make sure they are valid:
\index{raise statement}
\index{statement!raise}

\begin{verbatim}
def add_time(t1, t2):
    if not valid_time(t1) or not valid_time(t2):
        raise ValueError('invalid Time object in add_time')
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
\end{verbatim}
%
Or you could use an {\bf assert statement}, which checks a given invariant
and raises an exception if it fails:
\index{assert statement}
\index{statement!assert}

\begin{verbatim}
def add_time(t1, t2):
    assert valid_time(t1) and valid_time(t2)
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
\end{verbatim}
%
{\tt assert} statements are useful because they distinguish
code that deals with normal conditions from code
that checks for errors.
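To see what a failed assertion looks like in practice, here is a small
sketch that passes a badly formed Time to the {\tt assert} version of
\verb"add_time".  The variable names {\tt bad} and {\tt caught} are
introduced only for this example.

```python
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """


def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds


def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True


def add_time(t1, t2):
    assert valid_time(t1) and valid_time(t2)
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)


bad = Time()
bad.hour = 1
bad.minute = 75      # violates the invariant: minute must be less than 60
bad.second = 0

try:
    add_time(bad, bad)
    caught = False
except AssertionError:
    caught = True    # the assert statement raised AssertionError

print(caught)
```

Keep in mind that Python skips {\tt assert} statements when it is run
with the {\tt -O} option, so they are best suited to catching
programming errors during development.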

\section{Glossary}

\begin{description}

\item[prototype and patch:] A development plan that involves
writing a rough draft of a program, testing, and correcting errors as
they are found.
\index{prototype and patch}

\item[designed development:] A development plan that involves
high-level insight into the problem and more planning than incremental
development or prototype development.
\index{designed development}

\item[pure function:] A function that does not modify any of the objects it
receives as arguments.  Most pure functions are fruitful.
\index{pure function}

\item[modifier:] A function that changes one or more of the objects it
  receives as arguments.  Most modifiers are void; that is, they
  return {\tt None}.  \index{modifier}

\item[functional programming style:] A style of program design in which the
majority of functions are pure.
\index{functional programming style}

\item[invariant:] A condition that should always be true during the
execution of a program.
\index{invariant}

\item[assert statement:] A statement that checks a condition and raises
an exception if it fails.
\index{assert statement}
\index{statement!assert}

\end{description}

13105
\section{Exercises}
13106

13107
Code examples from this chapter are available from
13108
\url{http://thinkpython2.com/code/Time1.py}; solutions to the
13109
exercises are available from \url{http://thinkpython2.com/code/Time1_soln.py}.
13110

13111
\begin{exercise}
13112

13113
Write a function called \verb"mul_time" that takes a Time object
13114
and a number and returns a new Time object that contains
13115
the product of the original Time and the number.
13116

13117
Then use \verb"mul_time" to write a function that takes a Time
13118
object that represents the finishing time in a race, and a number
13119
that represents the distance, and returns a Time object that represents
13120
the average pace (time per mile).
13121
\index{running pace}
13122

13123
\end{exercise}
13124

13125

13126
\begin{exercise}
13127
\index{datetime module}
13128
\index{module!datetime}
13129

13130
The {\tt datetime} module provides {\tt time} objects
13131
that are similar to the Time objects in this chapter, but
13132
they provide a rich set of methods and operators.  Read the
13133
documentation at \url{http://docs.python.org/3/library/datetime.html}.
13134

13135
\begin{enumerate}
13136

13137
\item Use the {\tt datetime} module to write a program that gets the
13138
  current date and prints the day of the week.
13139

13140
\item Write a program that takes a birthday as input and prints the
13141
  user's age and the number of days, hours, minutes and seconds until
13142
  their next birthday.
13143
\index{birthday}
13144

13145
\item For two people born on different days, there is a day when one
13146
  is twice as old as the other.  That's their Double Day.  Write a
13147
  program that takes two birth dates and computes their Double Day.
13148

13149
\item For a little more challenge, write the more general version that
  computes the day when one person is $n$ times as old as the other.
13151
\index{Double Day}
13152

13153
\end{enumerate}
13154

13155
Solution: \url{http://thinkpython2.com/code/double.py}
13156

13157
\end{exercise}
13158

13159

13160
\chapter{Classes and methods}
13161

13162
Although we are using some of Python's object-oriented features,
13163
the programs from the last two chapters are not really
13164
object-oriented because they don't represent the relationships
13165
between programmer-defined types and the functions that operate
13166
on them.  The next step is to transform those functions into
13167
methods that make the relationships explicit.
13168

13169
Code examples from this chapter are available from
13170
\url{http://thinkpython2.com/code/Time2.py}, and solutions
13171
to the exercises are in \url{http://thinkpython2.com/code/Point2_soln.py}.
13172

13173

13174
\section{Object-oriented features}
13175
\index{object-oriented programming}
13176

13177
Python is an {\bf object-oriented programming language}, which means
13178
that it provides features that support object-oriented
13179
programming, which has these defining characteristics:
13180

13181
\begin{itemize}
13182

13183
\item Programs include class and method definitions.
13184

13185
\item Most of the computation is expressed in terms of operations on
13186
  objects.
13187

13188
\item Objects often represent things
13189
in the real world, and methods often
13190
correspond to the ways things in the real world interact.
13191

13192
\end{itemize}
13193

13194
For example, the {\tt Time} class defined in Chapter~\ref{time}
13195
corresponds to the way people record the time of day, and the
13196
functions we defined correspond to the kinds of things people do with
13197
times.  Similarly, the {\tt Point} and {\tt Rectangle} classes
13198
in Chapter~\ref{clobjects}
13199
correspond to the mathematical concepts of a point and a rectangle.
13200

13201
So far, we have not taken advantage of the features Python provides to
13202
support object-oriented programming.  These
13203
features are not strictly necessary; most of them provide
13204
alternative syntax for things we have already done.  But in many cases,
13205
the alternative is more concise and more accurately conveys the
13206
structure of the program.
13207

13208
For example, in {\tt Time1.py} there is no obvious
13209
connection between the class definition and the function definitions
13210
that follow.  With some examination, it is apparent that every function
13211
takes at least one {\tt Time} object as an argument.
13212
\index{method}
13213
\index{function}
13214

13215
This observation is the motivation for {\bf methods}; a method is
13216
a function that is associated with a particular class.
13217
We have seen methods for strings, lists, dictionaries and tuples.
13218
In this chapter, we will define methods for programmer-defined types.
13219
\index{syntax}
13220
\index{semantics}
13221
\index{programmer-defined type}
13222
\index{type!programmer-defined}
13223

13224
Methods are semantically the same as functions, but there are
13225
two syntactic differences:
13226

13227
\begin{itemize}
13228

13229
\item Methods are defined inside a class definition in order
13230
to make the relationship between the class and the method explicit.
13231

13232
\item The syntax for invoking a method is different from the
13233
syntax for calling a function.
13234

13235
\end{itemize}
13236

13237
In the next few sections, we will take the functions from the previous
13238
two chapters and transform them into methods.  This transformation is
13239
purely mechanical; you can do it by following a sequence of
13240
steps.  If you are comfortable converting from one form to another,
13241
you will be able to choose the best form for whatever you are doing.
13242

13243

13244
\section{Printing objects}
13245
\index{object!printing}
13246

13247
In Chapter~\ref{time}, we defined a class named
13248
{\tt Time} and in Section~\ref{isafter}, you 
13249
wrote a function named \verb"print_time":
13250

13251
\begin{verbatim}
class Time:
    """Represents the time of day."""

def print_time(time):
    print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
\end{verbatim}
13258
%
13259
To call this function, you have to pass a {\tt Time} object as an
13260
argument:
13261

13262
\begin{verbatim}
>>> start = Time()
>>> start.hour = 9
>>> start.minute = 45
>>> start.second = 00
>>> print_time(start)
09:45:00
\end{verbatim}
13270
%
13271
To make \verb"print_time" a method, all we have to do is
13272
move the function definition inside the class definition.  Notice
13273
the change in indentation.
13274
\index{indentation}
13275

13276
\begin{verbatim}
class Time:
    def print_time(time):
        print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
\end{verbatim}
13281
%
13282
Now there are two ways to call \verb"print_time".  The first
13283
(and less common) way is to use function syntax:
13284
\index{function syntax}
13285
\index{dot notation}
13286

13287
\begin{verbatim}
>>> Time.print_time(start)
09:45:00
\end{verbatim}
13291
%
13292
In this use of dot notation, {\tt Time} is the name of the class,
and \verb"print_time" is the name of the method.  {\tt start} is
passed as an argument.
13295

13296
The second (and more concise) way is to use method syntax:
13297
\index{method syntax}
13298

13299
\begin{verbatim}
>>> start.print_time()
09:45:00
\end{verbatim}
13303
%
13304
In this use of dot notation, \verb"print_time" is the name of the
13305
method (again), and {\tt start} is the object the method is
13306
invoked on, which is called the {\bf subject}.  Just as the
13307
subject of a sentence is what the sentence is about, the subject
13308
of a method invocation is what the method is about.
13309
\index{subject}
13310

13311
Inside the method, the subject is assigned to the first
13312
parameter, so in this case {\tt start} is assigned
13313
to {\tt time}.
13314
\index{self (parameter name)}
13315
\index{parameter!self}
13316

13317
By convention, the first parameter of a method is
13318
called {\tt self}, so it would be more common to write
13319
\verb"print_time" like this:
13320

13321
\begin{verbatim}
class Time:
    def print_time(self):
        print('%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second))
\end{verbatim}
13326
%
13327
The reason for this convention is an implicit metaphor:
13328
\index{metaphor, method invocation}
13329

13330
\begin{itemize}
13331

13332
\item The syntax for a function call, \verb"print_time(start)",
13333
  suggests that the function is the active agent.  It says something
13334
  like, ``Hey \verb"print_time"!  Here's an object for you to print.''
13335

13336
\item In object-oriented programming, the objects are the active
13337
  agents.  A method invocation like \verb"start.print_time()" says
13338
  ``Hey {\tt start}!  Please print yourself.''
13339

13340
\end{itemize}
13341

13342
This change in perspective might be more polite, but it is not obvious
13343
that it is useful.  In the examples we have seen so far, it may not
13344
be.  But sometimes shifting responsibility from the functions onto the
13345
objects makes it possible to write more versatile functions (or
13346
methods), and makes it easier to maintain and reuse code.
13347

13348
As an exercise, rewrite \verb"time_to_int" (from
13349
Section~\ref{prototype}) as a method.  You might be tempted to
13350
rewrite \verb"int_to_time" as a method, too, but that doesn't
13351
really make sense because there would be no object to invoke
13352
it on.
13353

13354

13355
\section{Another example}
13356
\index{increment}
13357

13358
Here's a version of {\tt increment} (from Section~\ref{increment})
13359
rewritten as a method:
13360

13361
\begin{verbatim}
# inside class Time:

    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)
\end{verbatim}
13368
%
13369
This version assumes that \verb"time_to_int" is written
13370
as a method.  Also, note that
13371
it is a pure function, not a modifier.
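
To see what ``pure function'' means here, the following is a minimal,
self-contained sketch.  It assumes that \verb"time_to_int" has been
rewritten as a method and that \verb"int_to_time" stays a plain
function; the bodies shown are one possible version and may differ
from the ones in {\tt Time2.py}.

```python
class Time:
    """Represents the time of day."""

    def time_to_int(self):
        # Seconds since midnight.
        minutes = self.hour * 60 + self.minute
        return minutes * 60 + self.second

    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)


def int_to_time(seconds):
    # Still a function: there is no Time object to invoke it on.
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time


start = Time()
start.hour, start.minute, start.second = 9, 45, 0

later = start.increment(60)

# increment returns a new object; the subject is left unchanged.
print(start.hour, start.minute, start.second)   # 9 45 0
print(later.hour, later.minute, later.second)   # 9 46 0
```

Because {\tt increment} builds its result with \verb"int_to_time",
it never assigns to the attributes of {\tt self}.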
13372

13373
Here's how you would invoke {\tt increment}:
13374

13375
\begin{verbatim}
>>> start.print_time()
09:45:00
>>> end = start.increment(1337)
>>> end.print_time()
10:07:17
\end{verbatim}
13382
%
13383
The subject, {\tt start}, gets assigned to the first parameter,
13384
{\tt self}.  The argument, {\tt 1337}, gets assigned to the
13385
second parameter, {\tt seconds}.
13386

13387
This mechanism can be confusing, especially if you make an error.
13388
For example, if you invoke {\tt increment} with two arguments, you
13389
get:
13390
\index{exception!TypeError}
13391
\index{TypeError}
13392

13393
\begin{verbatim}
>>> end = start.increment(1337, 460)
TypeError: increment() takes 2 positional arguments but 3 were given
\end{verbatim}
13397
%
13398
The error message is initially confusing, because there are
13399
only two arguments in parentheses.  But the subject is also
13400
considered an argument, so all together that's three.
13401

13402
By the way, a {\bf positional argument} is an argument that
13403
doesn't have a parameter name; that is, it is not a keyword
13404
argument.  In this function call:
13405
\index{positional argument}
13406
\index{argument!positional}
13407

13408
\begin{verbatim}
sketch(parrot, cage, dead=True)
\end{verbatim}
13411

13412
{\tt parrot} and {\tt cage} are positional, and {\tt dead} is
13413
a keyword argument.
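
If you want to check the distinction for yourself, here is a small
sketch.  The body of {\tt sketch} is hypothetical: it gathers
positional arguments into a tuple with {\tt *args} and keyword
arguments into a dictionary with {\tt **kwargs}, and returns both so
you can see how each argument arrived.

```python
def sketch(*args, **kwargs):
    """Hypothetical stand-in: report how the arguments were passed."""
    return args, kwargs


parrot, cage = 'parrot', 'cage'
positional, keyword = sketch(parrot, cage, dead=True)

print(positional)   # ('parrot', 'cage')
print(keyword)      # {'dead': True}
```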
13414

13415

13416
\section{A more complicated example}
13417

13418
Rewriting \verb"is_after" (from Section~\ref{isafter}) is slightly
13419
more complicated because it takes two Time objects as parameters.  In
13420
this case it is conventional to name the first parameter {\tt self}
13421
and the second parameter {\tt other}: \index{other (parameter name)}
13422
\index{parameter!other}
13423

13424
\begin{verbatim}
# inside class Time:

    def is_after(self, other):
        return self.time_to_int() > other.time_to_int()
\end{verbatim}
13430
%
13431
To use this method, you have to invoke it on one object and pass
13432
the other as an argument:
13433

13434
\begin{verbatim}
>>> end.is_after(start)
True
\end{verbatim}
13438
%
13439
One nice thing about this syntax is that it almost reads
13440
like English: ``end is after start?''
13441

13442

13443
\section{The init method}
13444
\index{init method}
13445
\index{method!init}
13446

13447
The init method (short for ``initialization'') is
13448
a special method that gets invoked when an object is instantiated.  
13449
Its full name is \verb"__init__" (two underscore characters,
13450
followed by {\tt init}, and then two more underscores).  An
13451
init method for the {\tt Time} class might look like this:
13452

13453
\begin{verbatim}
# inside class Time:

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second
\end{verbatim}
13461
%
13462
It is common for the parameters of \verb"__init__"
13463
to have the same names as the attributes.  The statement
13464

13465
\begin{verbatim}
        self.hour = hour
\end{verbatim}
13468
%
13469
stores the value of the parameter {\tt hour} as an attribute
13470
of {\tt self}.
13471
\index{optional parameter}
13472
\index{parameter!optional}
13473
\index{default value}
13474
\index{override}
13475

13476
The parameters are optional, so if you call {\tt Time} with
13477
no arguments, you get the default values.
13478

13479
\begin{verbatim}
>>> time = Time()
>>> time.print_time()
00:00:00
\end{verbatim}
13484
%
13485
If you provide one argument, it overrides {\tt hour}:
13486

13487
\begin{verbatim}
>>> time = Time(9)
>>> time.print_time()
09:00:00
\end{verbatim}
13492
%
13493
If you provide two arguments, they override {\tt hour} and
13494
{\tt minute}.
13495

13496
\begin{verbatim}
>>> time = Time(9, 45)
>>> time.print_time()
09:45:00
\end{verbatim}
13501
%
13502
And if you provide three arguments, they override all three
13503
default values.
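
Positional arguments fill the parameters from left to right, so the
examples above cannot override {\tt minute} without also providing
{\tt hour}.  As a brief sketch, a keyword argument lets you override
one default and keep the others.  The class is repeated here only so
the example runs on its own.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def print_time(self):
        print('%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second))


# Only minute is overridden; hour and second keep their defaults.
time = Time(minute=45)
time.print_time()   # 00:45:00
```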
13504

13505
As an exercise, write an init method for the {\tt Point} class that takes
13506
{\tt x} and {\tt y} as optional parameters and assigns
13507
them to the corresponding attributes.
13508
\index{Point class}
13509
\index{class!Point}
13510

13511

13512
\section{The {\tt \_\_str\_\_} method}
13513
\index{str method@\_\_str\_\_ method}
13514
\index{method!\_\_str\_\_}
13515

13516
\verb"__str__" is a special method, like \verb"__init__",
13517
that is supposed to return a string representation of an object.
13518
\index{string representation}
13519

13520
For example, here is a {\tt str} method for Time objects:
13521

13522
\begin{verbatim}
# inside class Time:

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)
\end{verbatim}
13528
%
13529
When you {\tt print} an object, Python invokes the {\tt str} method:
13530
\index{print statement}
13531
\index{statement!print}
13532

13533
\begin{verbatim}
>>> time = Time(9, 45)
>>> print(time)
09:45:00
\end{verbatim}
13538
%
13539
When I write a new class, I almost always start by writing 
13540
\verb"__init__", which makes it easier to instantiate objects, and 
13541
\verb"__str__", which is useful for debugging.
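
The {\tt str} method is not tied to {\tt print}.  As a small sketch,
the built-in function {\tt str} and the string method {\tt format}
also use it, which is handy when you want the text as a value rather
than as output.  The variable names {\tt text} and {\tt message} are
only for illustration.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)


time = Time(9, 45)

text = str(time)                           # calls time.__str__()
message = 'Start time: {}'.format(time)    # also uses __str__

print(text)      # 09:45:00
print(message)   # Start time: 09:45:00
```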
13542

13543
As an exercise, write a {\tt str} method for the {\tt Point} class.
13544
Create a Point object and print it.
13545

13546

13547
\section{Operator overloading}
13548
\label{operator.overloading}
13549

13550
By defining other special methods, you can specify the behavior
13551
of operators on programmer-defined types.  For example, if you define
13552
a method named \verb"__add__" for the {\tt Time} class, you can use the
13553
{\tt +} operator on Time objects.
13554
\index{programmer-defined type}
13555
\index{type!programmer-defined}
13556

13557
Here is what the definition might look like:
13558
\index{add method}
13559
\index{method!add}
13560

13561
\begin{verbatim}
# inside class Time:

    def __add__(self, other):
        seconds = self.time_to_int() + other.time_to_int()
        return int_to_time(seconds)
\end{verbatim}
13568
%
13569
And here is how you could use it:
13570

13571
\begin{verbatim}
>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
\end{verbatim}
13577
%
13578
When you apply the {\tt +} operator to Time objects, Python invokes
13579
\verb"__add__".  When you print the result, Python invokes 
13580
\verb"__str__".  So there is a lot happening behind the scenes!
13581
\index{operator overloading}
13582

13583
Changing the behavior of an operator so that it works with
13584
programmer-defined types is called {\bf operator overloading}.  For every
13585
operator in Python there is a corresponding special method, like 
13586
\verb"__add__".  For more details, see
13587
\url{http://docs.python.org/3/reference/datamodel.html#specialnames}.
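
The same idea applies to other operators.  As an illustration that is
not part of this chapter's code, the following sketch defines
\verb"__lt__", the special method behind the {\tt <} operator, so that
Time objects can be compared directly.  It assumes the
\verb"time_to_int" method from earlier in the chapter.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def time_to_int(self):
        minutes = self.hour * 60 + self.minute
        return minutes * 60 + self.second

    def __lt__(self, other):
        # Invoked for: self < other
        return self.time_to_int() < other.time_to_int()


start = Time(9, 45)
end = Time(10, 7, 17)

print(start < end)   # True
print(end < start)   # False
```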
13588

13589
As an exercise, write an {\tt add} method for the Point class.  
13590

13591

13592
\section{Type-based dispatch}
13593

13594
In the previous section we added two Time objects, but you
13595
also might want to add an integer to a Time object.  The
13596
following is a version of \verb"__add__"
13597
that checks the type of {\tt other} and invokes either
13598
\verb"add_time" or {\tt increment}:
13599

13600
\begin{verbatim}
# inside class Time:

    def __add__(self, other):
        if isinstance(other, Time):
            return self.add_time(other)
        else:
            return self.increment(other)

    def add_time(self, other):
        seconds = self.time_to_int() + other.time_to_int()
        return int_to_time(seconds)

    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)
\end{verbatim}
13617
%
13618
The built-in function {\tt isinstance} takes a value and a
13619
class object, and returns {\tt True} if the value is an instance
13620
of the class.
13621
\index{isinstance function}
13622
\index{function!isinstance}
13623

13624
If {\tt other} is a Time object, \verb"__add__" invokes
13625
\verb"add_time".  Otherwise it assumes that the parameter
13626
is a number and invokes {\tt increment}.  This operation is
13627
called a {\bf type-based dispatch} because it dispatches the
13628
computation to different methods based on the type of the
13629
arguments.
13630
\index{type-based dispatch}
13631
\index{dispatch, type-based}
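
The version above assumes that anything that is not a Time is a
number.  A slightly more defensive variation, sketched here as an
alternative and not as the code in {\tt Time2.py}, checks for
{\tt int} explicitly and returns the built-in constant
{\tt NotImplemented} for anything else.  When a special method
returns {\tt NotImplemented}, Python raises a {\tt TypeError} if it
finds no other way to perform the operation.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)

    def time_to_int(self):
        minutes = self.hour * 60 + self.minute
        return minutes * 60 + self.second

    def __add__(self, other):
        if isinstance(other, Time):
            return self.add_time(other)
        if isinstance(other, int):
            return self.increment(other)
        # Tell Python this operand type is not supported.
        return NotImplemented

    def add_time(self, other):
        return int_to_time(self.time_to_int() + other.time_to_int())

    def increment(self, seconds):
        return int_to_time(seconds + self.time_to_int())


def int_to_time(seconds):
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return Time(hour, minute, second)


start = Time(9, 45)
print(start + Time(1, 35))   # dispatches to add_time
print(start + 1337)          # dispatches to increment

try:
    start + 'soon'           # neither a Time nor an int
except TypeError as err:
    print('TypeError:', err)
```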
13632

13633
Here are examples that use the {\tt +} operator with different
13634
types:
13635

13636
\begin{verbatim}
>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
>>> print(start + 1337)
10:07:17
\end{verbatim}
13644
%
13645
Unfortunately, this implementation of addition is not commutative.
13646
If the integer is the first operand, you get
13647
\index{commutativity}
13648

13649
\begin{verbatim}
>>> print(1337 + start)
TypeError: unsupported operand type(s) for +: 'int' and 'Time'
\end{verbatim}
13653
%
13654
The problem is, instead of asking the Time object to add an integer,
13655
Python is asking an integer to add a Time object, and it doesn't know
13656
how.  But there is a clever solution for this problem: the
13657
special method \verb"__radd__", which stands for ``right-side add''.
13658
This method is invoked when a Time object appears on the right side of
13659
the {\tt +} operator.  Here's the definition:
13660
\index{radd method}
13661
\index{method!radd}
13662

13663
\begin{verbatim}
# inside class Time:

    def __radd__(self, other):
        return self.__add__(other)
\end{verbatim}
13669
%
13670
And here's how it's used:
13671

13672
\begin{verbatim}
>>> print(1337 + start)
10:07:17
\end{verbatim}
13676
%
13677

13678
As an exercise, write an {\tt add} method for Points that works with
13679
either a Point object or a tuple:
13680

13681
\begin{itemize}
13682

13683
\item If the second operand is a Point, the method should return a new
13684
Point whose $x$ coordinate is the sum of the $x$ coordinates of the
13685
operands, and likewise for the $y$ coordinates.
13686

13687
\item If the second operand is a tuple, the method should add the
13688
first element of the tuple to the $x$ coordinate and the second
13689
element to the $y$ coordinate, and return a new Point with the result. 
13690

13691
\end{itemize}
13692

13693

13694

13695

13696
\section{Polymorphism}
13697
\label{polymorphism}
13698

13699
Type-based dispatch is useful when it is necessary, but (fortunately)
13700
it is not always necessary.  Often you can avoid it by writing functions
13701
that work correctly for arguments with different types.
13702
\index{type-based dispatch}
13703
\index{dispatch!type-based}
13704

13705
Many of the functions we wrote for strings also
13706
work for other sequence types.
13707
For example, in Section~\ref{histogram}
13708
we used {\tt histogram} to count the number of times each letter
13709
appears in a word.
13710

13711
\begin{verbatim}
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c]+1
    return d
\end{verbatim}
13721
%
13722
This function also works for lists, tuples, and even dictionaries,
13723
as long as the elements of {\tt s} are hashable, so they can be used
13724
as keys in {\tt d}.
13725

13726
\begin{verbatim}
>>> t = ['spam', 'egg', 'spam', 'spam', 'bacon', 'spam']
>>> histogram(t)
{'bacon': 1, 'egg': 1, 'spam': 4}
\end{verbatim}
13731
%
13732
Functions that work with several types are called {\bf polymorphic}.
13733
Polymorphism can facilitate code reuse.  For example, the built-in
13734
function {\tt sum}, which adds the elements of a sequence, works
13735
as long as the elements of the sequence support addition.
13736
\index{polymorphism}
13737

13738
Since Time objects provide an {\tt add} method, they work
13739
with {\tt sum}:
13740

13741
\begin{verbatim}
>>> t1 = Time(7, 43)
>>> t2 = Time(7, 41)
>>> t3 = Time(7, 37)
>>> total = sum([t1, t2, t3])
>>> print(total)
23:01:00
\end{verbatim}
13749
%
13750
In general, if all of the operations inside a function 
13751
work with a given type, the function works with that type.
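
It is worth spelling out which operations {\tt sum} needs.  By
default {\tt sum} starts from the integer 0, so its first step is
{\tt 0 + t1}, which only works because Time provides
\verb"__radd__" as well as \verb"__add__".  The following sketch
repeats a compact version of the class (the bodies are condensed, so
treat them as an assumption and not as the code in {\tt Time2.py})
and also shows the alternative of passing a Time object as the
starting value.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)

    def time_to_int(self):
        minutes = self.hour * 60 + self.minute
        return minutes * 60 + self.second

    def __add__(self, other):
        if isinstance(other, Time):
            seconds = self.time_to_int() + other.time_to_int()
        else:
            seconds = self.time_to_int() + other
        return int_to_time(seconds)

    def __radd__(self, other):
        return self.__add__(other)


def int_to_time(seconds):
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return Time(hour, minute, second)


times = [Time(7, 43), Time(7, 41), Time(7, 37)]

total = sum(times)               # first step is 0 + times[0]: uses __radd__
total_alt = sum(times, Time())   # Time start value: only __add__ is needed

print(total)       # 23:01:00
print(total_alt)   # 23:01:00
```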
13752

13753
The best kind of polymorphism is the unintentional kind, where
13754
you discover that a function you already wrote can be
13755
applied to a type you never planned for.
13756

13757

13758
\section{Debugging}
13759
\index{debugging}
13760

13761
It is legal to add attributes to objects at any point in the execution
13762
of a program, but if you have objects with the same type that don't
13763
have the same attributes, it is easy to make mistakes.
13764
It is considered a good idea to
13765
initialize all of an object's attributes in the init method.
13766
\index{init method}
13767
\index{attribute!initializing}
13768

13769
If you are not sure whether an object has a particular attribute, you
13770
can use the built-in function {\tt hasattr} (see Section~\ref{hasattr}).
13771
\index{hasattr function}
13772
\index{function!hasattr}
13773
\index{dict attribute@\_\_dict\_\_ attribute}
13774
\index{attribute!\_\_dict\_\_}
13775

13776
Another way to access attributes is the built-in function {\tt vars},
13777
which takes an object and returns a dictionary that maps from
13778
attribute names (as strings) to their values:
13779

13780
\begin{verbatim}
>>> p = Point(3, 4)
>>> vars(p)
{'y': 4, 'x': 3}
\end{verbatim}
13785
%
13786
For purposes of debugging, you might find it useful to keep this
13787
function handy:
13788

13789
\begin{verbatim}
def print_attributes(obj):
    for attr in vars(obj):
        print(attr, getattr(obj, attr))
\end{verbatim}
13794
%
13795
\verb"print_attributes" traverses the dictionary
13796
and prints each attribute name and its corresponding value.
13797
\index{traversal!dictionary}
13798
\index{dictionary!traversal}
13799

13800
The built-in function {\tt getattr} takes an object and an attribute
13801
name (as a string) and returns the attribute's value.
13802
\index{getattr function}
13803
\index{function!getattr}
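
Here is a short sketch that puts {\tt hasattr}, {\tt getattr} and
{\tt vars} side by side.  The {\tt Point} class shown is an assumed
minimal version with an init method, and the attribute {\tt z} is
added only to show what happens when objects of the same type end up
with different attributes.

```python
class Point:
    """Represents a point in 2-D space (minimal assumed version)."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


p = Point(3, 4)

has_x = hasattr(p, 'x')             # the attribute exists
has_z = hasattr(p, 'z')             # it does not exist yet
z_or_none = getattr(p, 'z', None)   # a third argument supplies a default

p.z = 5                             # legal, but easy to lose track of
attributes = vars(p)

print(has_x, has_z, z_or_none)
print(attributes)
```

The optional third argument of {\tt getattr} is returned when the
attribute is missing, which avoids an {\tt AttributeError}.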
13804

13805

13806
\section{Interface and implementation}
13807

13808
One of the goals of object-oriented design is to make software more
13809
maintainable, which means that you can keep the program working when
13810
other parts of the system change, and modify the program to meet new
13811
requirements.
13812
\index{interface}
13813
\index{implementation}
13814
\index{maintainable}
13815
\index{object-oriented design}
13816

13817
A design principle that helps achieve that goal is to keep
13818
interfaces separate from implementations.  For objects, that means
13819
that the methods a class provides should not depend on how the
13820
attributes are represented.
13821
\index{attribute}
13822

13823
For example, in this chapter we developed a class that represents
13824
a time of day.  Methods provided by this class include
13825
\verb"time_to_int", \verb"is_after", and \verb"add_time".
13826

13827
We could implement those methods in several ways.  The details of the
13828
implementation depend on how we represent time.  In this chapter, the
13829
attributes of a {\tt Time} object are {\tt hour}, {\tt minute}, and
13830
{\tt second}.
13831

13832
As an alternative, we could replace these attributes with
13833
a single integer representing the number of seconds
13834
since midnight.  This implementation would make some methods,
like \verb"is_after", easier to write, but it would make other methods
harder.
13837

13838
After you deploy a new class, you might discover a better
13839
implementation.  If other parts of the program are using your
13840
class, it might be time-consuming and error-prone to change the
13841
interface.  
13842

13843
But if you designed the interface carefully, you can
13844
change the implementation without changing the interface, which
13845
means that other parts of the program don't have to change.
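
As a rough sketch of that idea, here is a version of {\tt Time} that
stores a single integer.  The attribute name {\tt seconds} is an
assumption, and only a few methods are shown.  Code that creates Time
objects with {\tt Time(9, 45)} and uses only the methods does not
have to change.

```python
class Time:
    """Time of day, stored as seconds since midnight."""

    def __init__(self, hour=0, minute=0, second=0):
        # Same interface as before; different representation inside.
        self.seconds = hour * 3600 + minute * 60 + second

    def __str__(self):
        minutes, second = divmod(self.seconds, 60)
        hour, minute = divmod(minutes, 60)
        return '%.2d:%.2d:%.2d' % (hour, minute, second)

    def time_to_int(self):
        return self.seconds

    def is_after(self, other):
        return self.seconds > other.seconds

    def add_time(self, other):
        result = Time()
        result.seconds = self.seconds + other.seconds
        return result


start = Time(9, 45)
duration = Time(1, 35)
end = start.add_time(duration)

print(end)                   # 11:20:00
print(end.is_after(start))   # True
```

Here \verb"is_after" becomes a single comparison, while
\verb"__str__" has to do the arithmetic that the attributes used to
store.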
13846

13847

13848
\section{Glossary}
13849

13850
\begin{description}
13851

13852
\item[object-oriented language:] A language that provides features,
13853
  such as programmer-defined types and methods, that facilitate
13854
  object-oriented programming.
13855
\index{object-oriented language}
13856

13857
\item[object-oriented programming:] A style of programming in which
13858
data and the operations that manipulate it are organized into classes
13859
and methods.
13860
\index{object-oriented programming}
13861

13862
\item[method:] A function that is defined inside a class definition and
13863
is invoked on instances of that class.
13864
\index{method}
13865

13866
\item[subject:] The object a method is invoked on.
13867
\index{subject}
13868

13869
\item[positional argument:]  An argument that does not include
13870
a parameter name, so it is not a keyword argument.
13871
\index{positional argument}
13872
\index{argument!positional}
13873

13874
\item[operator overloading:] Changing the behavior of an operator like
13875
{\tt +} so it works with a programmer-defined type.
13876
\index{overloading}
13877
\index{operator!overloading}
13878

13879
\item[type-based dispatch:] A programming pattern that checks the type
13880
of an operand and invokes different functions for different types.
13881
\index{type-based dispatch}
13882

13883
\item[polymorphic:] Pertaining to a function that can work with more
13884
  than one type.  
13885
\index{polymorphism}
13886

13887
\end{description}
13888

13889

13890
\section{Exercises}
13891

13892
\begin{exercise}
13893

13894
Download the code from this chapter from
13895
\url{http://thinkpython2.com/code/Time2.py}.  Change the attributes of
13896
    {\tt Time} to be a single integer representing seconds since
13897
    midnight.  Then modify the methods (and the function
13898
    \verb"int_to_time") to work with the new implementation.  You
13899
    should not have to modify the test code in {\tt main}.  When you
13900
    are done, the output should be the same as before.  Solution:
13901
    \url{http://thinkpython2.com/code/Time2_soln.py}.
13902

13903
\end{exercise}
13904

13905

13906
\begin{exercise}
13907
\label{kangaroo}
13908
\index{default value!avoiding mutable}
13909
\index{mutable object, as default value}
13910
\index{worst bug}
13911
\index{bug!worst}
13912
\index{Kangaroo class}
13913
\index{class!Kangaroo}
13914

13915
This exercise is a cautionary tale about one of the most
13916
common, and difficult to find, errors in Python.
13917
Write a definition for a class named {\tt Kangaroo} with the following
13918
methods:
13919

13920
\begin{enumerate}
13921

13922
\item An \verb"__init__" method that initializes an attribute named 
13923
\verb"pouch_contents" to an empty list.
13924

13925
\item A method named \verb"put_in_pouch" that takes an object
13926
of any type and adds it to \verb"pouch_contents".
13927

13928
\item A \verb"__str__" method that returns a string representation
13929
of the Kangaroo object and the contents of the pouch.
13930

13931
\end{enumerate}
13932
%
13933
Test your code 
13934
by creating two {\tt Kangaroo} objects, assigning them to variables
13935
named {\tt kanga} and {\tt roo}, and then adding {\tt roo} to the
13936
contents of {\tt kanga}'s pouch.
13937

13938
Download \url{http://thinkpython2.com/code/BadKangaroo.py}.  It contains
13939
a solution to the previous problem with one big, nasty bug.
13940
Find and fix the bug.
13941

13942
If you get stuck, you can download
13943
\url{http://thinkpython2.com/code/GoodKangaroo.py}, which explains the
13944
problem and demonstrates a solution.
13945
\index{aliasing}
13946
\index{embedded object}
13947
\index{object!embedded}
13948

13949
\end{exercise}
13950

13951

13952

13953
\chapter{Inheritance}
13954

13955
The language feature most often associated with object-oriented
13956
programming is {\bf inheritance}.  Inheritance is the ability to
13957
define a new class that is a modified version of an existing class.
13958
In this chapter I demonstrate inheritance using classes that represent
13959
playing cards, decks of cards, and poker hands.
13960
\index{deck} 
13961
\index{card, playing} 
13962
\index{poker}
13963

13964
If you don't play
13965
poker, you can read about it at
13966
\url{http://en.wikipedia.org/wiki/Poker}, but you don't have to; I'll
13967
tell you what you need to know for the exercises.
13968

13969
Code examples from
13970
this chapter are available from
13971
\url{http://thinkpython2.com/code/Card.py}.
13972

13973

13974
\section{Card objects}
13975

13976
There are fifty-two cards in a deck, each of which belongs to one of
13977
four suits and one of thirteen ranks.  The suits are Spades, Hearts,
13978
Diamonds, and Clubs (in descending order in bridge).  The ranks are
13979
Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, and King.  Depending on
13980
the game that you are playing, an Ace may be higher than King
13981
or lower than 2.
13982
\index{rank}
13983
\index{suit}
13984

13985
If we want to define a new object to represent a playing card, it is
13986
obvious what the attributes should be: {\tt rank} and
13987
{\tt suit}.  It is not as obvious what type the attributes
13988
should be.  One possibility is to use strings containing words like
13989
\verb"'Spade'" for suits and \verb"'Queen'" for ranks.  One problem with
13990
this implementation is that it would not be easy to compare cards to
13991
see which had a higher rank or suit.
13992
\index{encode}
13993
\index{encrypt}
13994
\index{map to}
13995
\index{representation}
13996

13997
An alternative is to use integers to {\bf encode} the ranks and suits.
13998
In this context, ``encode'' means that we are going to define a mapping
13999
between numbers and suits, or between numbers and ranks.  This
14000
kind of encoding is not meant to be a secret (that
14001
would be ``encryption'').
14002

14003
\newcommand{\mymapsto}{$\mapsto$}
14004

14005
For example, this table shows the suits and the corresponding integer
14006
codes:
14007

14008
\begin{tabular}{l c l}
14009
Spades & \mymapsto & 3 \\
14010
Hearts & \mymapsto & 2 \\
14011
Diamonds & \mymapsto & 1 \\
14012
Clubs & \mymapsto & 0
14013
\end{tabular}
14014

14015
This code makes it easy to compare cards; because higher suits map to
14016
higher numbers, we can compare suits by comparing their codes.
14017

The mapping for ranks is fairly obvious; each of the numerical ranks
maps to the corresponding integer, and for the Ace and the face cards:

\begin{tabular}{l c l}
Ace & \mymapsto & 1 \\
Jack & \mymapsto & 11 \\
Queen & \mymapsto & 12 \\
King & \mymapsto & 13
\end{tabular}

I am using the \mymapsto~symbol to make it clear that these mappings
are not part of the Python program.  They are part of the program
design, but they don't appear explicitly in the code.
\index{Card class}
\index{class!Card}

The class definition for {\tt Card} looks like this:

\begin{verbatim}
class Card:
    """Represents a standard playing card."""

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank
\end{verbatim}
%
As usual, the init method takes an optional
parameter for each attribute.  The default card is
the 2 of Clubs.
\index{init method}
\index{method!init}

To create a Card, you call {\tt Card} with the
suit and rank of the card you want.

\begin{verbatim}
queen_of_diamonds = Card(1, 12)
\end{verbatim}
%

\section{Class attributes}
\label{class.attribute}
\index{class attribute}
\index{attribute!class}

In order to print Card objects in a way that people can easily
read, we need a mapping from the integer codes to the corresponding
ranks and suits.  A natural way to
do that is with lists of strings.  We assign these lists to {\bf class
attributes}:

\begin{verbatim}
# inside class Card:

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])
\end{verbatim}
%
Variables like \verb"suit_names" and \verb"rank_names", which are
defined inside a class but outside of any method, are called
class attributes because they are associated with the class object
{\tt Card}.
\index{instance attribute}
\index{attribute!instance}

This term distinguishes them from variables like {\tt suit} and {\tt
  rank}, which are called {\bf instance attributes} because they are
associated with a particular instance.
\index{dot notation}

Both kinds of attribute are accessed using dot notation.  For
example, in \verb"__str__", {\tt self} is a Card object,
and {\tt self.rank} is its rank.  Similarly, {\tt Card}
is a class object, and \verb"Card.rank_names" is a
list of strings associated with the class.

Every card has its own {\tt suit} and {\tt rank}, but there
is only one copy of \verb"suit_names" and \verb"rank_names".
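
As a small check of that claim, the sketch below builds two cards and
looks at what they share.  It assumes the {\tt Card} class as defined so
far; the variable names are mine, and reading a class attribute through
an instance (as in \verb"card1.suit_names") is ordinary Python attribute
lookup rather than something this chapter depends on.  The first line
of output is {\tt 11 3}, and the other two are {\tt True} because
neither instance has its own copy of the list.

\begin{verbatim}
class Card:
    """Represents a standard playing card."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

card1 = Card(2, 11)
card2 = Card(0, 3)

# instance attributes: each card stores its own values
print(card1.rank, card2.rank)

# class attributes: both cards find the same list object on the class
print(card1.suit_names is Card.suit_names)
print(card1.suit_names is card2.suit_names)
\end{verbatim}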

Putting it all together, the expression
\verb"Card.rank_names[self.rank]" means ``use the attribute {\tt rank}
from the object {\tt self} as an index into the list \verb"rank_names"
from the class {\tt Card}, and select the appropriate string.''

The first element of \verb"rank_names" is {\tt None} because there
is no card with rank zero.  By including {\tt None} as a place-keeper,
we get a mapping with the nice property that the index 2 maps to the
string \verb"'2'", and so on.  To avoid this tweak, we could have
used a dictionary instead of a list.
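
Here is a rough sketch of that alternative.  The class name
{\tt DictCard} is hypothetical and is used only to keep this variant
apart from the chapter's {\tt Card}; the point is that a dictionary
keyed by the rank codes needs no place-keeper for zero.  One
consequence of this design is that a rank of 0 raises a {\tt KeyError}
instead of printing as {\tt None}.

\begin{verbatim}
class DictCard:
    """Hypothetical variant of Card that keeps rank names in a dict."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']

    # keys are the rank codes 1 through 13; there is no entry for 0
    rank_names = {1: 'Ace', 11: 'Jack', 12: 'Queen', 13: 'King'}
    rank_names.update((n, str(n)) for n in range(2, 11))

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (DictCard.rank_names[self.rank],
                             DictCard.suit_names[self.suit])

print(DictCard(1, 12))             # Queen of Diamonds
print(0 in DictCard.rank_names)    # False
\end{verbatim}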

With the methods we have so far, we can create and print cards:

\begin{verbatim}
>>> card1 = Card(2, 11)
>>> print(card1)
Jack of Hearts
\end{verbatim}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/card1.pdf}}
\caption{Object diagram.}
\label{fig.card1}
\end{figure}

Figure~\ref{fig.card1} is a diagram of the {\tt Card} class object and
one Card instance.  {\tt Card} is a class object; its type is {\tt
  type}.  {\tt card1} is an instance of {\tt Card}, so its type is
{\tt Card}.  To save space, I didn't draw the contents of
\verb"suit_names" and \verb"rank_names".  \index{state diagram}
\index{diagram!state} \index{object diagram} \index{diagram!object}

\section{Comparing cards}
\label{comparecard}
\index{operator!relational}
\index{relational operator}

For built-in types, there are relational operators
({\tt <}, {\tt >}, {\tt ==}, etc.)
that compare
values and determine when one is greater than, less than, or equal to
another.  For programmer-defined types, we can override the behavior of
the built-in operators by providing a method named
\verb"__lt__", which stands for ``less than''.
\index{programmer-defined type}
\index{type!programmer-defined}

\verb"__lt__" takes two parameters, {\tt self} and {\tt other},
and returns {\tt True} if {\tt self} is strictly less than {\tt other}.
\index{override}
\index{operator overloading}

The correct ordering for cards is not obvious.
For example, which
is better, the 3 of Clubs or the 2 of Diamonds?  One has a higher
rank, but the other has a higher suit.  In order to compare
cards, you have to decide whether rank or suit is more important.

The answer might depend on what game you are playing, but to keep
things simple, we'll make the arbitrary choice that suit is more
important, so all of the Spades outrank all of the Diamonds,
and so on.
\index{lt method@\_\_lt\_\_ method}
\index{method!\_\_lt\_\_}

With that decided, we can write \verb"__lt__":

\begin{verbatim}
# inside class Card:

    def __lt__(self, other):
        # check the suits
        if self.suit < other.suit: return True
        if self.suit > other.suit: return False

        # suits are the same... check ranks
        return self.rank < other.rank
\end{verbatim}
%
You can write this more concisely using tuple comparison:
\index{tuple!comparison}
\index{comparison!tuple}

\begin{verbatim}
# inside class Card:

    def __lt__(self, other):
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 < t2
\end{verbatim}
%
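
To see the method at work, here is a minimal sketch.  It assumes a
stripped-down {\tt Card} that keeps only the two attributes and the
tuple version of \verb"__lt__"; the variable names are mine.  The
first two comparisons print {\tt True}, and the last line prints
{\tt 0 3}, the codes for the 3 of Clubs.

\begin{verbatim}
class Card:
    """Stripped-down Card: attributes and comparison only."""

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __lt__(self, other):
        # suit is compared first; rank only breaks ties
        return (self.suit, self.rank) < (other.suit, other.rank)

three_of_clubs = Card(0, 3)
two_of_diamonds = Card(1, 2)

# suit decides here, even though the 3 has the higher rank
print(three_of_clubs < two_of_diamonds)

# Card has no __gt__, so Python swaps the operands and uses __lt__
print(two_of_diamonds > three_of_clubs)

# min relies on the same comparison
lowest = min([two_of_diamonds, three_of_clubs])
print(lowest.suit, lowest.rank)
\end{verbatim}
%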
As an exercise, write an \verb"__lt__" method for Time objects.  You
can use tuple comparison, but you also might consider
comparing integers.

\section{Decks}
\index{list!of objects}
\index{deck, playing cards}

Now that we have Cards, the next step is to define Decks.  Since a
deck is made up of cards, it is natural for each Deck to contain a
list of cards as an attribute.
\index{init method}
\index{method!init}

The following is a class definition for {\tt Deck}.  The
init method creates the attribute {\tt cards} and generates
the standard set of fifty-two cards:
\index{composition}
\index{loop!nested}
\index{Deck class}
\index{class!Deck}

\begin{verbatim}
class Deck:

    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                card = Card(suit, rank)
                self.cards.append(card)
\end{verbatim}
%
The easiest way to populate the deck is with a nested loop.  The outer
loop enumerates the suits from 0 to 3.  The inner loop enumerates the
ranks from 1 to 13.  Each iteration
creates a new Card with the current suit and rank,
and appends it to {\tt self.cards}.
\index{append method}
\index{method!append}
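
A quick way to convince yourself that the loop does what the paragraph
says is to build a deck and inspect it.  The sketch below repeats the
{\tt Card} and {\tt Deck} definitions from this chapter so that it runs
on its own; the list comprehension at the end is my own alternative
spelling of the nested loop, included only to show that it visits the
same (suit, rank) pairs in the same order.  The four lines of output
are {\tt 52}, {\tt Ace of Clubs}, {\tt King of Spades} and {\tt True}.

\begin{verbatim}
class Card:
    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])

class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                card = Card(suit, rank)
                self.cards.append(card)

deck = Deck()
print(len(deck.cards))
print(deck.cards[0])      # first card appended
print(deck.cards[-1])     # last card appended

# same pairs, written as a comprehension (not the book's version)
pairs = [(suit, rank) for suit in range(4) for rank in range(1, 14)]
print(pairs == [(card.suit, card.rank) for card in deck.cards])
\end{verbatim}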

\section{Printing the deck}
\label{printdeck}
\index{str method@\_\_str\_\_ method}
\index{method!\_\_str\_\_}

Here is a \verb"__str__" method for {\tt Deck}:

\begin{verbatim}
# inside class Deck:

    def __str__(self):
        res = []
        for card in self.cards:
            res.append(str(card))
        return '\n'.join(res)
\end{verbatim}
%
This method demonstrates an efficient way to accumulate a large
string: building a list of strings and then using the string method
{\tt join}.  The built-in function {\tt str} invokes the
\verb"__str__" method on each card and returns the string
representation.  \index{accumulator!string} \index{string!accumulator}
\index{join method} \index{method!join} \index{newline}

Since we invoke {\tt join} on a newline character, the cards
are separated by newlines.  Here's what the result looks like:

\begin{verbatim}
>>> deck = Deck()
>>> print(deck)
Ace of Clubs
2 of Clubs
3 of Clubs
...
10 of Spades
Jack of Spades
Queen of Spades
King of Spades
\end{verbatim}
%
Even though the result appears on 52 lines, it is
one long string that contains newlines.

\section{Add, remove, shuffle and sort}

To deal cards, we would like a method that
removes a card from the deck and returns it.
The list method {\tt pop} provides a convenient way to do that:
\index{pop method}
\index{method!pop}

\begin{verbatim}
# inside class Deck:

    def pop_card(self):
        return self.cards.pop()
\end{verbatim}
%
Since {\tt pop} removes the {\em last} card in the list, we are
dealing from the bottom of the deck.
\index{append method}
\index{method!append}

To add a card, we can use the list method {\tt append}:

\begin{verbatim}
# inside class Deck:

    def add_card(self, card):
        self.cards.append(card)
\end{verbatim}
%
A method like this that uses another method without doing
much work is sometimes called a {\bf veneer}.  The metaphor
comes from woodworking, where a veneer is a thin
layer of good quality wood glued to the surface of a cheaper piece of
wood to improve the appearance.
\index{veneer}

In this case \verb"add_card" is a ``thin'' method that expresses
a list operation in terms appropriate for decks.  It
improves the appearance, or interface, of the
implementation.

As another example, we can write a Deck method named {\tt shuffle}
using the function {\tt shuffle} from the {\tt random} module:
\index{random module}
\index{module!random}
\index{shuffle function}
\index{function!shuffle}

\begin{verbatim}
# inside class Deck:

    def shuffle(self):
        random.shuffle(self.cards)
\end{verbatim}
%
Don't forget to import {\tt random}.
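
For reference, here is a self-contained sketch with the import in
place.  It assumes cut-down versions of {\tt Card} and {\tt Deck}, and
the variables {\tt before}, {\tt after} and {\tt result} are mine.  It
shows two things about this veneer: it reorders the list in place and
returns {\tt None}, and the deck holds the same 52 cards afterward.

\begin{verbatim}
import random

class Card:
    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))

    def shuffle(self):
        random.shuffle(self.cards)

deck = Deck()
before = {(card.suit, card.rank) for card in deck.cards}

result = deck.shuffle()
after = {(card.suit, card.rank) for card in deck.cards}

print(result)                              # None
print(len(deck.cards), before == after)    # 52 True
\end{verbatim}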

As an exercise, write a Deck method named {\tt sort} that uses the
list method {\tt sort} to sort the cards in a {\tt Deck}.  {\tt sort}
uses the \verb"__lt__" method we defined to determine the order.
\index{sort method} \index{method!sort}

\section{Inheritance}
\index{inheritance}
\index{object-oriented programming}

Inheritance is the ability to define a new class that is a modified
version of an existing class.  As an example, let's say we want a
class to represent a ``hand'', that is, the cards held by one player.
A hand is similar to a deck: both are made up of a collection of
cards, and both require operations like adding and removing cards.

A hand is also different from a deck; there are operations we want for
hands that don't make sense for a deck.  For example, in poker we
might compare two hands to see which one wins.  In bridge, we might
compute a score for a hand in order to make a bid.

This relationship between classes---similar, but different---lends
itself to inheritance.
To define a new class that inherits from an existing class,
you put the name of the existing class in parentheses:
\index{parentheses!parent class in}
\index{parent class}
\index{class!parent}
\index{Hand class}
\index{class!Hand}

\begin{verbatim}
class Hand(Deck):
    """Represents a hand of playing cards."""
\end{verbatim}
%
This definition indicates that {\tt Hand} inherits from {\tt Deck};
that means we can use methods like \verb"pop_card" and \verb"add_card"
for Hands as well as Decks.

When a new class inherits from an existing one, the existing
one is called the {\bf parent} and the new class is
called the {\bf child}.
\index{parent class}
\index{child class}
\index{class!child}

In this example, {\tt Hand} inherits \verb"__init__" from {\tt Deck},
but it doesn't really do what we want: instead of populating the hand
with 52 new cards, the init method for Hands should initialize {\tt
  cards} with an empty list.  \index{override} \index{init method}
\index{method!init}

If we provide an init method in the {\tt Hand} class, it overrides the
one in the {\tt Deck} class:

\begin{verbatim}
# inside class Hand:

    def __init__(self, label=''):
        self.cards = []
        self.label = label
\end{verbatim}
%
When you create a Hand, Python invokes this init method, not the
one in {\tt Deck}.

\begin{verbatim}
>>> hand = Hand('new hand')
>>> hand.cards
[]
>>> hand.label
'new hand'
\end{verbatim}
%
The other methods are inherited from {\tt Deck}, so we can use
\verb"pop_card" and \verb"add_card" to deal a card:

\begin{verbatim}
>>> deck = Deck()
>>> card = deck.pop_card()
>>> hand.add_card(card)
>>> print(hand)
King of Spades
\end{verbatim}
%
A natural next step is to encapsulate this code in a method
called \verb"move_cards":
\index{encapsulation}

\begin{verbatim}
# inside class Deck:

    def move_cards(self, hand, num):
        for i in range(num):
            hand.add_card(self.pop_card())
\end{verbatim}
%
\verb"move_cards" takes two arguments, a Hand object and the number of
cards to deal.  It modifies both {\tt self} and {\tt hand}, and
returns {\tt None}.

In some games, cards are moved from one hand to another,
or from a hand back to the deck.  You can use \verb"move_cards"
for any of these operations: {\tt self} can be either a Deck
or a Hand, and {\tt hand}, despite the name, can also be a {\tt Deck}.
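
The three directions can be sketched as follows.  The class bodies are
condensed copies of the ones in this chapter; the labels
\verb"'alice'" and \verb"'bob'" and the numbers of cards moved are
arbitrary choices of mine.  After the three moves the counts are 48
cards in the deck and 2 in each hand.

\begin{verbatim}
class Card:
    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))

    def pop_card(self):
        return self.cards.pop()

    def add_card(self, card):
        self.cards.append(card)

    def move_cards(self, hand, num):
        for i in range(num):
            hand.add_card(self.pop_card())

class Hand(Deck):
    def __init__(self, label=''):
        self.cards = []
        self.label = label

deck = Deck()
alice = Hand('alice')
bob = Hand('bob')

deck.move_cards(alice, 5)     # deck to hand
alice.move_cards(bob, 2)      # hand to hand
alice.move_cards(deck, 1)     # hand back to deck

print(len(deck.cards), len(alice.cards), len(bob.cards))
\end{verbatim}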

Inheritance is a useful feature.  Some programs that would be
repetitive without inheritance can be written more elegantly
with it.  Inheritance can facilitate code reuse, since you can
customize the behavior of parent classes without having to modify
them.  In some cases, the inheritance structure reflects the natural
structure of the problem, which makes the design easier to
understand.

On the other hand, inheritance can make programs difficult to read.
When a method is invoked, it is sometimes not clear where to find its
definition.  The relevant code may be spread across several modules.
Also, many of the things that can be done using inheritance can be
done as well or better without it.

\section{Class diagrams}
\label{class.diagram}

So far we have seen stack diagrams, which show the state of
a program, and object diagrams, which show the attributes
of an object and their values.  These diagrams represent a snapshot
in the execution of a program, so they change as the program
runs.

They are also highly detailed; for some purposes, too
detailed.  A class diagram is a more abstract representation
of the structure of a program.  Instead of showing individual
objects, it shows classes and the relationships between them.

There are several kinds of relationship between classes:

\begin{itemize}

\item Objects in one class might contain references to objects
in another class.  For example, each Rectangle contains a reference
to a Point, and each Deck contains references to many Cards.
This kind of relationship is called {\bf HAS-A}, as in, ``a Rectangle
has a Point.''

\item One class might inherit from another.  This relationship
is called {\bf IS-A}, as in, ``a Hand is a kind of a Deck.''

\item One class might depend on another in the sense that objects
in one class take objects in the second class as parameters, or
use objects in the second class as part of a computation.  This
kind of relationship is called a {\bf dependency}.

\end{itemize}
\index{IS-A relationship}
\index{HAS-A relationship}
\index{class diagram}
\index{diagram!class}

A {\bf class diagram} is a graphical representation of these
relationships.  For example, Figure~\ref{fig.class1} shows the
relationships between {\tt Card}, {\tt Deck} and {\tt Hand}.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/class1.pdf}}
\caption{Class diagram.}
\label{fig.class1}
\end{figure}

The arrow with a hollow triangle head represents an IS-A
relationship; in this case it indicates that Hand inherits
from Deck.

The standard arrow head represents a HAS-A
relationship; in this case a Deck has references to Card
objects.
\index{multiplicity (in class diagram)}

The star ({\tt *}) near the arrow head is a
{\bf multiplicity}; it indicates how many Cards a Deck has.
A multiplicity can be a simple number, like {\tt 52}, a range,
like {\tt 5..7} or a star, which indicates that a Deck can
have any number of Cards.

There are no dependencies in this diagram.  They would normally
be shown with a dashed arrow.  If there are a lot of
dependencies, they are sometimes omitted.

A more detailed diagram might show that a Deck actually
contains a {\em list} of Cards, but built-in types
like list and dict are usually not included in class diagrams.
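
The two relationships in the figure can also be probed at run time.
What follows is only a sketch: the built-in functions
{\tt issubclass} and {\tt isinstance} are standard Python, but using
them this way is my illustration and not a substitute for the diagram.
The class bodies are condensed copies of the ones in this chapter.

\begin{verbatim}
class Card:
    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))

class Hand(Deck):
    def __init__(self, label=''):
        self.cards = []
        self.label = label

# IS-A: Hand inherits from Deck, but a Deck is not a kind of Card
print(issubclass(Hand, Deck))    # True
print(issubclass(Deck, Card))    # False

# HAS-A: a Deck holds references to Card objects, 52 of them here
deck = Deck()
print(all(isinstance(card, Card) for card in deck.cards))    # True
print(len(deck.cards))                                       # 52
\end{verbatim}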

\section{Debugging}
\index{debugging}

Inheritance can make debugging difficult because when you invoke a
method on an object, it might be hard to figure out which method will
be invoked.
\index{inheritance}

Suppose you are writing a function that works with Hand objects.
You would like it to work with all kinds of Hands, like
PokerHands, BridgeHands, etc.  If you invoke a method like
{\tt shuffle}, you might get the one defined in {\tt Deck},
but if any of the subclasses override this method, you'll
get that version instead.  This behavior is usually a good
thing, but it can be confusing.

Any time you are unsure about the flow of execution through your
program, the simplest solution is to add print statements at the
beginning of the relevant methods.  If {\tt Deck.shuffle} prints a
message that says something like {\tt Running Deck.shuffle}, then as
the program runs it traces the flow of execution.
\index{flow of execution}
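
Here is a minimal sketch of that kind of tracing.  Several things in it
are assumptions made for illustration: integers stand in for Card
objects, and the {\tt PokerHand} override of {\tt shuffle} is
hypothetical, added only so that the trace differs between the two
calls.  The first call prints {\tt Running Deck.shuffle}; the second
prints {\tt Running PokerHand.shuffle} followed by
{\tt Running Deck.shuffle}.

\begin{verbatim}
import random

class Deck:
    def __init__(self):
        # integers stand in for Card objects to keep the sketch short
        self.cards = list(range(52))

    def shuffle(self):
        print('Running Deck.shuffle')
        random.shuffle(self.cards)

class Hand(Deck):
    def __init__(self, label=''):
        self.cards = []
        self.label = label

class PokerHand(Hand):
    # hypothetical override, only here to show the trace changing
    def shuffle(self):
        print('Running PokerHand.shuffle')
        Deck.shuffle(self)

Hand().shuffle()
PokerHand().shuffle()
\end{verbatim}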

As an alternative, you could use this function, which takes an
object and a method name (as a string) and returns the class that
provides the definition of the method:

\begin{verbatim}
def find_defining_class(obj, meth_name):
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty
\end{verbatim}
%
Here's an example:

\begin{verbatim}
>>> hand = Hand()
>>> find_defining_class(hand, 'shuffle')
<class '__main__.Deck'>
\end{verbatim}
%
So the {\tt shuffle} method for this Hand is the one in {\tt Deck}.
\index{mro method}
\index{method!mro}
\index{method resolution order}

\verb"find_defining_class" uses the {\tt mro} method to get the list
of class objects (types) that will be searched for methods.  ``MRO''
stands for ``method resolution order'', which is the sequence of
classes Python searches to ``resolve'' a method name.
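
To make the search order concrete, the sketch below prints the MRO for
a Hand and then looks up a few names.  {\tt Deck} and {\tt Hand} are
reduced to the parts that matter for lookup, and the name
\verb"'no_such_method'" is made up to show what happens when no class
in the list defines the method: the loop ends and the function returns
{\tt None}.

\begin{verbatim}
class Deck:
    def __init__(self):
        self.cards = []

    def shuffle(self):
        pass      # body omitted; only the lookup matters here

class Hand(Deck):
    def __init__(self, label=''):
        self.cards = []
        self.label = label

def find_defining_class(obj, meth_name):
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty

hand = Hand()

# the classes that are searched, in order
print([ty.__name__ for ty in type(hand).mro()])

print(find_defining_class(hand, '__init__').__name__)   # Hand
print(find_defining_class(hand, 'shuffle').__name__)    # Deck
print(find_defining_class(hand, 'no_such_method'))      # None
\end{verbatim}
%
The first line of output is \verb"['Hand', 'Deck', 'object']".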

Here's a design suggestion: when you override a method,
the interface of the new method should be the same as the old.  It
should take the same parameters, return the same type, and obey the
same preconditions and postconditions.  If you follow this rule, you
will find that any function designed to work with an instance of a
parent class, like a Deck, will also work with instances of child
classes like Hand and PokerHand.
\index{override}
\index{interface}
\index{precondition}
\index{postcondition}

If you violate this rule, which is called the ``Liskov substitution
principle'', your code will collapse like (sorry) a house of cards.
\index{Liskov substitution principle}

\section{Data encapsulation}

The previous chapters demonstrate a development plan we might call
``object-oriented design''.  We identified objects we needed---like
{\tt Point}, {\tt Rectangle} and {\tt Time}---and defined classes to
represent them.  In each case there is an obvious correspondence
between the object and some entity in the real world (or at least a
mathematical world).
\index{development plan!data encapsulation}

But sometimes it is less obvious what objects you need
and how they should interact.  In that case you need a different
development plan.  In the same way that we discovered function
interfaces by encapsulation and generalization, we can discover
class interfaces by {\bf data encapsulation}.
\index{data encapsulation}

Markov analysis, from Section~\ref{markov}, provides a good example.
If you download my code from \url{http://thinkpython2.com/code/markov.py},
you'll see that it uses two global variables---\verb"suffix_map" and
\verb"prefix"---that are read and written from several functions.

\begin{verbatim}
suffix_map = {}
prefix = ()
\end{verbatim}

Because these variables are global, we can only run one analysis at a
time.  If we read two texts, their prefixes and suffixes would be
added to the same data structures (which makes for some interesting
generated text).

To run multiple analyses, and keep them separate, we can encapsulate
the state of each analysis in an object.
Here's what that looks like:

\begin{verbatim}
class Markov:

    def __init__(self):
        self.suffix_map = {}
        self.prefix = ()
\end{verbatim}

Next, we transform the functions into methods.  For example,
here's \verb"process_word":

\begin{verbatim}
    def process_word(self, word, order=2):
        if len(self.prefix) < order:
            self.prefix += (word,)
            return

        try:
            self.suffix_map[self.prefix].append(word)
        except KeyError:
            # if there is no entry for this prefix, make one
            self.suffix_map[self.prefix] = [word]

        self.prefix = shift(self.prefix, word)
\end{verbatim}
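
To see that the analyses really are kept apart, here is a sketch that
puts the pieces above into one runnable file and feeds two short texts
to two {\tt Markov} objects.  The helper {\tt shift} is not shown in
this section, so the version below is an assumption about its behavior
(drop the first word of the prefix and append the new one); the two
sample sentences and the names {\tt first} and {\tt second} are also
mine.

\begin{verbatim}
def shift(t, word):
    """Assumed helper: drops the head of t and appends word."""
    return t[1:] + (word,)

class Markov:

    def __init__(self):
        self.suffix_map = {}
        self.prefix = ()

    def process_word(self, word, order=2):
        if len(self.prefix) < order:
            self.prefix += (word,)
            return

        try:
            self.suffix_map[self.prefix].append(word)
        except KeyError:
            # if there is no entry for this prefix, make one
            self.suffix_map[self.prefix] = [word]

        self.prefix = shift(self.prefix, word)

first = Markov()
second = Markov()

for word in 'the quick brown fox'.split():
    first.process_word(word)

for word in 'to be or not to be'.split():
    second.process_word(word)

# each object has its own map, so the prefixes never mix
print(first.suffix_map)
print(len(second.suffix_map))
\end{verbatim}
%
The first map has two entries, one for each prefix that was followed by
a word, and the second map has four.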

Transforming a program like this---changing the design without
changing the behavior---is another example of refactoring
(see Section~\ref{refactoring}).
\index{refactoring}

This example suggests a development plan for designing objects and
methods:

\begin{enumerate}

\item Start by writing functions that read and write global
variables (when necessary).

\item Once you get the program working, look for associations
between global variables and the functions that use them.

\item Encapsulate related variables as attributes of an object.

\item Transform the associated functions into methods of the new
class.

\end{enumerate}

As an exercise, download my Markov code from
\url{http://thinkpython2.com/code/markov.py}, and follow the steps
described above to encapsulate the global variables as attributes of a
new class called {\tt Markov}.  Solution:
\url{http://thinkpython2.com/code/markov2.py}.

\section{Glossary}

\begin{description}

\item[encode:]  To represent one set of values using another
set of values by constructing a mapping between them.
\index{encode}

\item[class attribute:] An attribute associated with a class
object.  Class attributes are defined inside
a class definition but outside any method.
\index{class attribute}
\index{attribute!class}

\item[instance attribute:] An attribute associated with an
instance of a class.
\index{instance attribute}
\index{attribute!instance}

\item[veneer:] A method or function that provides a different
interface to another function without doing much computation.
\index{veneer}

\item[inheritance:] The ability to define a new class that is a
modified version of a previously defined class.
\index{inheritance}

\item[parent class:] The class from which a child class inherits.
\index{parent class}

\item[child class:] A new class created by inheriting from an
existing class; also called a ``subclass''.
\index{child class}
\index{class!child}

\item[IS-A relationship:] A relationship between a child class
and its parent class.
\index{IS-A relationship}

\item[HAS-A relationship:] A relationship between two classes
where instances of one class contain references to instances of
the other.
\index{HAS-A relationship}

\item[dependency:] A relationship between two classes
where instances of one class use instances of the other class,
but do not store them as attributes.
\index{dependency}

\item[class diagram:] A diagram that shows the classes in a program
and the relationships between them.
\index{class diagram}
\index{diagram!class}

\item[multiplicity:] A notation in a class diagram that shows, for
a HAS-A relationship, how many references there are to instances
of another class.
\index{multiplicity (in class diagram)}

\item[data encapsulation:]  A program development plan that
involves a prototype using global variables and a final version
that makes the global variables into instance attributes.
\index{data encapsulation}
\index{development plan!data encapsulation}

\end{description}

\section{Exercises}

\begin{exercise}
For the following program, draw a UML class diagram that shows
these classes and the relationships among them.

\begin{verbatim}
class PingPongParent:
    pass

class Ping(PingPongParent):
    def __init__(self, pong):
        self.pong = pong


class Pong(PingPongParent):
    def __init__(self, pings=None):
        if pings is None:
            self.pings = []
        else:
            self.pings = pings

    def add_ping(self, ping):
        self.pings.append(ping)

pong = Pong()
ping = Ping(pong)
pong.add_ping(ping)
\end{verbatim}

\end{exercise}

\begin{exercise}
Write a Deck method called \verb"deal_hands" that
takes two parameters, the number of hands and the number of cards per
hand.  It should create the appropriate number of Hand objects, deal
the appropriate number of cards per hand, and return a list of Hands.
\end{exercise}

\begin{exercise}
\label{poker}

The following are the possible hands in poker, in increasing order
of value and decreasing order of probability:
\index{poker}

\begin{description}

\item[pair:] two cards with the same rank
\vspace{-0.05in}

\item[two pair:] two pairs of cards with the same rank
\vspace{-0.05in}

\item[three of a kind:] three cards with the same rank
\vspace{-0.05in}

\item[straight:] five cards with ranks in sequence (aces can
be high or low, so {\tt Ace-2-3-4-5} is a straight and so is {\tt
10-Jack-Queen-King-Ace}, but {\tt Queen-King-Ace-2-3} is not.)
\vspace{-0.05in}

\item[flush:] five cards with the same suit
\vspace{-0.05in}

\item[full house:] three cards with one rank, two cards with another
\vspace{-0.05in}

\item[four of a kind:] four cards with the same rank
\vspace{-0.05in}

\item[straight flush:] five cards in sequence (as defined above) and
with the same suit
\vspace{-0.05in}

\end{description}
%
The goal of these exercises is to estimate
the probability of drawing these various hands.

\begin{enumerate}

\item Download the following files from \url{http://thinkpython2.com/code}:

\begin{description}

\item[{\tt Card.py}]: A complete version of the {\tt Card},
{\tt Deck} and {\tt Hand} classes in this chapter.

\item[{\tt PokerHand.py}]: An incomplete implementation of a class
that represents a poker hand, and some code that tests it.

\end{description}
%
\item If you run {\tt PokerHand.py}, it deals seven 7-card poker hands
and checks to see if any of them contains a flush.  Read this
code carefully before you go on.

\item Add methods to {\tt PokerHand.py} named \verb"has_pair",
\verb"has_twopair", etc. that return True or False according to
whether or not the hand meets the relevant criteria.  Your code should
work correctly for ``hands'' that contain any number of cards
(although 5 and 7 are the most common sizes).

\item Write a method named {\tt classify} that figures out
the highest-value classification for a hand and sets the
{\tt label} attribute accordingly.  For example, a 7-card hand
might contain a flush and a pair; it should be labeled ``flush''.

\item When you are convinced that your classification methods are
working, the next step is to estimate the probabilities of the various
hands.  Write a function in {\tt PokerHand.py} that shuffles a deck of
cards, divides it into hands, classifies the hands, and counts the
number of times various classifications appear.

\item Print a table of the classifications and their probabilities.
Run your program with larger and larger numbers of hands until the
output values converge to a reasonable degree of accuracy.  Compare
your results to the values at \url{http://en.wikipedia.org/wiki/Hand_rankings}.

\end{enumerate}

Solution: \url{http://thinkpython2.com/code/PokerHandSoln.py}.
\end{exercise}
14894

14895

14896
\chapter{The Goodies}
14897

14898
One of my goals for this book has been to teach you as little Python
14899
as possible.  When there were two ways to do something, I picked 
14900
one and avoided mentioning the other.  Or sometimes I put the second
14901
one into an exercise.
14902

14903
Now I want to go back for some of the good bits that got left behind.
14904
Python provides a number of features that are not really necessary---you
14905
can write good code without them---but with them you can sometimes
14906
write code that's more concise, readable or efficient, and sometimes
14907
all three.
14908

14909
% TODO: add the with statement
14910

14911
\section{Conditional expressions}
14912

14913
We saw conditional statements in Section~\ref{conditional.execution}.
14914
Conditional statements are often used to choose one of two values;
14915
for example:
14916
\index{conditional expression}
14917
\index{expression!conditional}
14918

14919
\begin{verbatim}
14920
if x > 0:
    y = math.log(x)
else:
    y = float('nan')
14924
\end{verbatim}
14925

14926
This statement checks whether {\tt x} is positive.  If so, it computes
14927
{\tt math.log}.  If not, {\tt math.log} would raise a ValueError.  To
14928
avoid stopping the program, we generate a ``NaN'', which is a special
14929
floating-point value that represents ``Not a Number''.
14930
\index{NaN}
14931
\index{floating-point}
14932

14933
We can write this statement more concisely using a {\bf conditional
14934
expression}:
14935

14936
\begin{verbatim}
14937
y = math.log(x) if x > 0 else float('nan')
14938
\end{verbatim}
14939

14940
You can almost read this line like English: ``{\tt y} gets log-{\tt x}
14941
if {\tt x} is greater than 0; otherwise it gets NaN''.
14942

14943
Recursive functions can sometimes be rewritten using conditional
14944
expressions.  For example, here is a recursive version of {\tt factorial}:
14945
\index{factorial}
14946
\index{function!factorial}
14947

14948
\begin{verbatim}
14949
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
14954
\end{verbatim}
14955

14956
We can rewrite it like this:
14957

14958
\begin{verbatim}
14959
def factorial(n):
    return 1 if n == 0 else n * factorial(n-1)
14961
\end{verbatim}
14962

14963
Another use of conditional expressions is handling optional
14964
arguments.  For example, here is the init method from
14965
{\tt GoodKangaroo} (see Exercise~\ref{kangaroo}):
14966
\index{optional argument}
14967
\index{argument!optional}
14968

14969
\begin{verbatim}
14970
    def __init__(self, name, contents=None):
        self.name = name
        if contents is None:
            contents = []
        self.pouch_contents = contents
14975
\end{verbatim}
14976

14977
We can rewrite this one like this:
14978

14979
\begin{verbatim}
14980
    def __init__(self, name, contents=None):
        self.name = name
        self.pouch_contents = [] if contents is None else contents
14983
\end{verbatim}
14984

14985
In general, you can replace a conditional statement with a conditional
14986
expression if both branches contain simple expressions that are
14987
either returned or assigned to the same variable.
14988
\index{conditional statement}
14989
\index{statement!conditional}
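
As one more illustration of this rule, here is a minimal sketch of a
function written both ways.  The names \verb"absolute_statement" and
\verb"absolute_expression" are made up for this example; they are not
used elsewhere in the book.

```python
def absolute_statement(x):
    # Conditional statement: both branches return a simple expression.
    if x < 0:
        return -x
    else:
        return x

def absolute_expression(x):
    # The same logic as a single conditional expression.
    return -x if x < 0 else x

print(absolute_statement(-3), absolute_expression(-3))
```

Both versions return the same value for any number; the second just
says it in one line.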
14990

14991

14992

14993
\section{List comprehensions}
14994

14995
In Section~\ref{filter} we saw the map and filter patterns.  For
14996
example, this function takes a list of strings, maps the string method
14997
{\tt capitalize} to the elements, and returns a new list of strings:
14998

14999
\begin{verbatim}
15000
def capitalize_all(t):
    res = []
    for s in t:
        res.append(s.capitalize())
    return res
15005
\end{verbatim}
15006

15007
We can write this more concisely using a {\bf list comprehension}:
15008
\index{list comprehension}
15009

15010
\begin{verbatim}
15011
def capitalize_all(t):
    return [s.capitalize() for s in t]
15013
\end{verbatim}
15014

15015
The bracket operators indicate that we are constructing a new
15016
list.  The expression inside the brackets specifies the elements
15017
of the list, and the {\tt for} clause indicates what sequence
15018
we are traversing.
15019
\index{list}
15020
\index{for loop}
15021

15022
The syntax of a list comprehension is a little awkward because
15023
the loop variable, {\tt s} in this example, appears in the expression
15024
before we get to the definition.
15025
\index{loop variable}
15026

15027
List comprehensions can also be used for filtering.  For example,
15028
this function selects only the elements of {\tt t} that are
15029
upper case, and returns a new list:
15030
\index{filter pattern}
15031
\index{pattern!filter}
15032

15033
\begin{verbatim}
15034
def only_upper(t):
    res = []
    for s in t:
        if s.isupper():
            res.append(s)
    return res
15040
\end{verbatim}
15041

15042
We can rewrite it using a list comprehension:
15043

15044
\begin{verbatim}
15045
def only_upper(t):
    return [s for s in t if s.isupper()]
15047
\end{verbatim}
15048

15049
List comprehensions are concise and easy to read, at least for simple
15050
expressions.  And they are usually faster than the equivalent for
15051
loops, sometimes much faster.  So if you are mad at me for not
15052
mentioning them earlier, I understand.
15053

15054
But, in my defense, list comprehensions are harder to debug because
15055
you can't put a print statement inside the loop.  I suggest that you
15056
use them only if the computation is simple enough that you are likely
15057
to get it right the first time.  And for beginners that means never.
15058
\index{debugging}
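
A single list comprehension can also map and filter at the same time.
The following sketch, which uses a hypothetical function named
\verb"capitalize_lower", keeps only the lower-case strings and
capitalizes them.  The loop version is shown first for comparison.

```python
def capitalize_lower_loop(t):
    # Filter and map with an explicit loop.
    res = []
    for s in t:
        if s.islower():
            res.append(s.capitalize())
    return res

def capitalize_lower(t):
    # The same filter and map as one list comprehension.
    return [s.capitalize() for s in t if s.islower()]

print(capitalize_lower(['spam', 'EGG', 'ham']))
```

If the comprehension gives a surprising result, one option is to
fall back to the loop version, where you can add print statements.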
15059

15060

15061

15062
\section{Generator expressions}
15063

15064
{\bf Generator expressions} are similar to list comprehensions, but
15065
with parentheses instead of square brackets:
15066
\index{generator expression}
15067
\index{expression!generator}
15068

15069
\begin{verbatim}
15070
>>> g = (x**2 for x in range(5))
>>> g
<generator object <genexpr> at 0x7f4c45a786c0>
15073
\end{verbatim}
15074
%
15075
The result is a generator object that knows how to iterate through
15076
a sequence of values.  But unlike a list comprehension, it does not
15077
compute the values all at once; it waits to be asked.  The built-in
15078
function {\tt next} gets the next value from the generator:
15079
\index{generator object}
15080
\index{object!generator}
15081

15082
\begin{verbatim}
15083
>>> next(g)
0
>>> next(g)
1
15087
\end{verbatim}
15088
%
15089
When you get to the end of the sequence, {\tt next} raises a 
15090
StopIteration exception.  You can also use a {\tt for} loop to iterate
15091
through the values:
15092
\index{StopIteration}
15093
\index{exception!StopIteration}
15094

15095
\begin{verbatim}
15096
>>> for val in g:
...     print(val)
4
9
16
15101
\end{verbatim}
15102
%
15103
The generator object keeps track of where it is in the sequence,
15104
so the {\tt for} loop picks up where {\tt next} left off.  Once the
15105
generator is exhausted, it continues to raise {\tt StopIteration}:
15106

15107
\begin{verbatim}
15108
>>> next(g)
StopIteration
15110
\end{verbatim}
15111

15112
Generator expressions are often used with functions like {\tt sum},
15113
{\tt max}, and {\tt min}:
15114
\index{sum}
15115
\index{function!sum}
15116

15117
\begin{verbatim}
15118
>>> sum(x**2 for x in range(5))
30
15120
\end{verbatim}
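
The same pattern works with {\tt max} and {\tt min}.  Here is a small
sketch that uses a made-up list of words; none of the variable names
come from earlier chapters.

```python
words = ['spam', 'eggs', 'parrot']

# Each call consumes a fresh generator expression; no list is built.
longest = max(len(word) for word in words)
shortest = min(len(word) for word in words)
total = sum(len(word) for word in words)

print(longest, shortest, total)
```

When a generator expression is the only argument of a function call,
you can leave out the extra pair of parentheses, as these examples do.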
15121

15122

15123
\section{{\tt any} and {\tt all}}
15124

15125
Python provides a built-in function, {\tt any}, that takes a sequence
of boolean values and returns {\tt True} if any of the values are
{\tt True}.  It works on lists:
15128
\index{any}
15129
\index{built-in function!any}
15130

15131
\begin{verbatim}
15132
>>> any([False, False, True])
True
15134
\end{verbatim}
15135
%
15136
But it is often used with generator expressions:
15137
\index{generator expression}
15138
\index{expression!generator}
15139

15140
\begin{verbatim}
15141
>>> any(letter == 't' for letter in 'monty')
True
15143
\end{verbatim}
15144
%
15145
That example isn't very useful because it does the same thing
15146
as the {\tt in} operator.  But we could use {\tt any} to rewrite
15147
some of the search functions we wrote in Section~\ref{search}.  For
15148
example, we could write {\tt avoids} like this:
15149
\index{search pattern}
15150
\index{pattern!search}
15151

15152
\begin{verbatim}
15153
def avoids(word, forbidden):
    return not any(letter in forbidden for letter in word)
15155
\end{verbatim}
15156
%
15157
The function almost reads like English, ``{\tt word} avoids
15158
{\tt forbidden} if there are not any forbidden letters in {\tt word}.''
15159

15160
Using {\tt any} with a generator expression is efficient because
15161
it stops immediately if it finds a {\tt True} value,
15162
so it doesn't have to evaluate the whole sequence.
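
One way to see this behavior is to record which elements actually get
tested.  The following is only a sketch: the helper
\verb"is_forbidden" and the list {\tt checked} are made up for the
demonstration.

```python
checked = []

def is_forbidden(letter, forbidden):
    # Record each letter that is actually tested.
    checked.append(letter)
    return letter in forbidden

# 'x' is the second letter of 'oxygen', so any() stops there.
result = any(is_forbidden(letter, 'xyz') for letter in 'oxygen')

print(result, checked)
```

After the call, {\tt checked} contains only the letters up to and
including the first forbidden one; the rest of the word is never
examined.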
15163

15164
Python provides another built-in function, {\tt all}, that returns
15165
{\tt True} if every element of the sequence is {\tt True}.  As
15166
an exercise, use {\tt all} to re-write \verb"uses_all" from
15167
Section~\ref{search}.
15168
\index{all}
15169
\index{built-in function!all}
15170

15171

15172
\section{Sets}
15173
\label{sets}
15174

15175
In Section~\ref{dictsub} I used dictionaries to find the words
that appear in a document but not in a word list.  The function
I wrote takes {\tt d1}, which contains the words from the document
as keys, and {\tt d2}, which contains the list of words.  It
returns a dictionary that contains the keys from {\tt d1} that
are not in {\tt d2}.
15181

15182
\begin{verbatim}
15183
def subtract(d1, d2):
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    return res
15189
\end{verbatim}
15190
%
15191
In all of these dictionaries, the values are {\tt None} because
15192
we never use them.  As a result, we waste some storage space.
15193
\index{dictionary subtraction}
15194

15195
Python provides another built-in type, called a {\tt set}, that
15196
behaves like a collection of dictionary keys with no values.  Adding
15197
elements to a set is fast; so is checking membership.  And sets
15198
provide methods and operators to compute common set operations.
15199
\index{set}
15200
\index{object!set}
15201

15202
For example, set subtraction is available as a method called
15203
{\tt difference} or as an operator, {\tt -}.  So we can rewrite
15204
{\tt subtract} like this:
15205
\index{set subtraction}
15206

15207
\begin{verbatim}
15208
def subtract(d1, d2):
    return set(d1) - set(d2)
15210
\end{verbatim}
15211
%
15212
The result is a set instead of a dictionary, but for operations like
15213
iteration, the behavior is the same.
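
Here is a short sketch of the method and operator forms side by side,
along with two other common set operations.  The sets
\verb"doc_words" and \verb"word_list" are made-up examples.

```python
doc_words = {'the', 'zymurgy', 'cat'}
word_list = {'the', 'cat', 'dog'}

missing = doc_words - word_list                # difference as an operator
same_missing = doc_words.difference(word_list) # difference as a method
common = doc_words & word_list                 # intersection
combined = doc_words | word_list               # union

print(missing, common, combined)
```

The method and the operator compute the same set; the operator form
requires both operands to be sets, which is why {\tt subtract}
converts its arguments with {\tt set} first.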
15214

15215
Some of the exercises in this book can be done concisely and
15216
efficiently with sets.  For example, here is a solution to
15217
\verb"has_duplicates", from
15218
Exercise~\ref{duplicate}, that uses a dictionary:
15219

15220
\begin{verbatim}
15221
def has_duplicates(t):
    d = {}
    for x in t:
        if x in d:
            return True
        d[x] = True
    return False
15228
\end{verbatim}
15229

15230
When an element appears for the first time, it is added to the
15231
dictionary.  If the same element appears again, the function returns
15232
{\tt True}.
15233

15234
Using sets, we can write the same function like this:
15235

15236
\begin{verbatim}
15237
def has_duplicates(t):
    return len(set(t)) < len(t)
15239
\end{verbatim}
15240
%
15241
An element can only appear in a set once, so if an element in {\tt t}
15242
appears more than once, the set will be smaller than {\tt t}.  If there
15243
are no duplicates, the set will be the same size as {\tt t}.
15244
\index{duplicate}
15245

15246
We can also use sets to do some of the exercises in
15247
Chapter~\ref{wordplay}.  For example, here's a version of
15248
\verb"uses_only" with a loop:
15249

15250
\begin{verbatim}
15251
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True
15256
\end{verbatim}
15257
%
15258
\verb"uses_only" checks whether all letters in {\tt word} are
15259
in {\tt available}.  We can rewrite it like this:
15260

15261
\begin{verbatim}
15262
def uses_only(word, available):
    return set(word) <= set(available)
15264
\end{verbatim}
15265
%
15266
The \verb"<=" operator checks whether one set is a subset of another,
15267
including the possibility that they are equal, which is true if all
15268
the letters in {\tt word} appear in {\tt available}.
15269
\index{subset}
15270

15271
As an exercise, rewrite \verb"avoids" using sets.
15272

15273

15274
\section{Counters}
15275

15276
A Counter is like a set, except that if an element appears more
15277
than once, the Counter keeps track of how many times it appears.
15278
If you are familiar with the mathematical idea of a {\bf multiset},
15279
a Counter is a natural way to represent a multiset.
15280
\index{Counter}
15281
\index{object!Counter}
15282
\index{multiset}
15283

15284
Counter is defined in a standard module called {\tt collections},
15285
so you have to import it.  You can initialize a Counter with a string,
15286
list, or anything else that supports iteration:
15287
\index{collections}
15288
\index{module!collections}
15289

15290
\begin{verbatim}
15291
>>> from collections import Counter
>>> count = Counter('parrot')
>>> count
Counter({'r': 2, 't': 1, 'o': 1, 'p': 1, 'a': 1})
15295
\end{verbatim}
15296

15297
Counters behave like dictionaries in many ways; they map from each
15298
key to the number of times it appears.  As in dictionaries,
15299
the keys have to be hashable.
15300

15301
Unlike dictionaries, Counters don't raise an exception if you access
15302
an element that doesn't appear.  Instead, they return 0:
15303

15304
\begin{verbatim}
15305
>>> count['d']
0
15307
\end{verbatim}
15308

15309
We can use Counters to rewrite \verb"is_anagram" from
15310
Exercise~\ref{anagram}:
15311

15312
\begin{verbatim}
15313
def is_anagram(word1, word2):
    return Counter(word1) == Counter(word2)
15315
\end{verbatim}
15316

15317
If two words are anagrams, they contain the same letters with the same
15318
counts, so their Counters are equivalent.
15319

15320
Counters provide methods and operators to perform set-like operations,
15321
including addition, subtraction, union and intersection.  And
15322
they provide an often-useful method, \verb"most_common", which
15323
returns a list of value-frequency pairs, sorted from most common to
15324
least:
15325

15326
\begin{verbatim}
15327
>>> count = Counter('parrot')
>>> for val, freq in count.most_common(3):
...     print(val, freq)
r 2
p 1
a 1
15333
\end{verbatim}
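
The set-like operators mentioned above can be sketched as follows.
The two words are arbitrary examples, and the variable names are
chosen only for this illustration.

```python
from collections import Counter

c1 = Counter('parrot')
c2 = Counter('carrot')

total = c1 + c2    # addition: add the counts
diff = c1 - c2     # subtraction: keep only counts that stay positive
both = c1 & c2     # intersection: the minimum of each count
either = c1 | c2   # union: the maximum of each count

print(diff, total['r'])
```

Here {\tt diff} contains only {\tt 'p'}, because every other letter of
{\tt 'parrot'} appears at least as often in {\tt 'carrot'}.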
15334

15335

15336
\section{defaultdict}
15337

15338
The {\tt collections} module also provides {\tt defaultdict}, which is
15339
like a dictionary except that if you access a key that doesn't exist,
15340
it can generate a new value on the fly.
15341
\index{defaultdict}
15342
\index{object!defaultdict}
15343
\index{collections}
15344
\index{module!collections}
15345

15346
When you create a defaultdict, you provide a function that's used to
15347
create new values.  A function used to create objects is sometimes
15348
called a {\bf factory}.  The built-in functions that create lists, sets,
15349
and other types can be used as factories:
15350
\index{factory function}
15351

15352
\begin{verbatim}
15353
>>> from collections import defaultdict
>>> d = defaultdict(list)
15355
\end{verbatim}
15356

15357
Notice that the argument is {\tt list}, which is a class object,
15358
not {\tt list()}, which is a new list.  The function you provide
15359
doesn't get called unless you access a key that doesn't exist.
15360

15361
\begin{verbatim}
15362
>>> t = d['new key']
>>> t
[]
15365
\end{verbatim}
15366

15367
The new list, which we're calling {\tt t}, is also added to the
15368
dictionary.  So if we modify {\tt t}, the change appears in {\tt d}:
15369

15370
\begin{verbatim}
15371
>>> t.append('new value')
>>> d
defaultdict(<class 'list'>, {'new key': ['new value']})
15374
\end{verbatim}
15375

15376
If you are making a dictionary of lists, you can often write simpler
15377
code using {\tt defaultdict}.  In my solution to
15378
Exercise~\ref{anagrams}, which you can get from
15379
\url{http://thinkpython2.com/code/anagram_sets.py}, I make a
15380
dictionary that maps from a sorted string of letters to the list of
15381
words that can be spelled with those letters.  For example,
{\tt 'opst'} maps to the list
{\tt ['opts', 'post', 'pots', 'spot', 'stop', 'tops']}.
15384

15385
Here's the original code:
15386

15387
\begin{verbatim}
15388
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        if t not in d:
            d[t] = [word]
        else:
            d[t].append(word)
    return d
15398
\end{verbatim}
15399

15400
This can be simplified using {\tt setdefault}, which you might
15401
have used in Exercise~\ref{setdefault}:
15402
\index{setdefault}
15403

15404
\begin{verbatim}
15405
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d.setdefault(t, []).append(word)
    return d
15412
\end{verbatim}
15413

15414
This solution has the drawback that it makes a new list
15415
every time, regardless of whether it is needed.  For lists,
15416
that's no big deal, but if the factory
15417
function is complicated, it might be.
15418
\index{factory function}
15419

15420
We can avoid this problem and 
15421
simplify the code using a {\tt defaultdict}:
15422

15423
\begin{verbatim}
15424
def all_anagrams(filename):
    d = defaultdict(list)
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d[t].append(word)
    return d
15431
\end{verbatim}
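
To see the difference between {\tt setdefault} and {\tt defaultdict},
we can count how often the factory runs.  This is a sketch with a
made-up factory, \verb"make_list", that increments a global counter
each time it is called.

```python
from collections import defaultdict

calls = 0

def make_list():
    # Hypothetical factory that counts how often it is called.
    global calls
    calls += 1
    return []

words = ['opts', 'post', 'pots']

# setdefault: the default value is built on every call, needed or not.
d1 = {}
for word in words:
    d1.setdefault('opst', make_list()).append(word)
setdefault_calls = calls

# defaultdict: the factory runs only when a key is missing.
calls = 0
d2 = defaultdict(make_list)
for word in words:
    d2['opst'].append(word)
defaultdict_calls = calls

print(setdefault_calls, defaultdict_calls)
```

Both dictionaries end up with the same contents, but the
{\tt setdefault} version calls the factory once per word, and the
{\tt defaultdict} version calls it once per new key.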
15432

15433
My solution to Exercise~\ref{poker}, which you can download from
15434
\url{http://thinkpython2.com/code/PokerHandSoln.py},
15435
uses {\tt setdefault} in the function
15436
\verb"has_straightflush".  This solution has the drawback
15437
of creating a {\tt Hand} object every time through the loop, whether
15438
it is needed or not.  As an exercise, rewrite it using
15439
a defaultdict.
15440

15441

15442
\section{Named tuples}
15443

15444
Many simple objects are basically collections of related values.
15445
For example, the Point object defined in Chapter~\ref{clobjects} contains
15446
two numbers, {\tt x} and {\tt y}.  When you define a class like
15447
this, you usually start with an init method and a str method:
15448

15449
\begin{verbatim}
15450
class Point:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return '(%g, %g)' % (self.x, self.y)
15458
\end{verbatim}
15459

15460
This is a lot of code to convey a small amount of information.
15461
Python provides a more concise way to say the same thing:
15462

15463
\begin{verbatim}
15464
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
15466
\end{verbatim}
15467

15468
The first argument is the name of the class you want to create.
15469
The second is a list of the attributes Point objects should have,
15470
as strings.  The return value from {\tt namedtuple} is a class object:
15471
\index{namedtuple}
15472
\index{object!namedtuple}
15473
\index{collections}
15474
\index{module!collections}
15475

15476
\begin{verbatim}
15477
>>> Point
<class '__main__.Point'>
15479
\end{verbatim}
15480

15481
{\tt Point} automatically provides methods like \verb"__init__" and
15482
\verb"__str__" so you don't have to write them.
15483
\index{class object}
15484
\index{object!class}
15485

15486
To create a Point object, you use the Point class as a function:
15487

15488
\begin{verbatim}
15489
>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
15492
\end{verbatim}
15493

15494
The init method assigns the arguments to attributes using the names
you provided.  The str method returns a representation of the Point
object and its attributes.
15497

15498
You can access the elements of the named tuple by name:
15499

15500
\begin{verbatim}
15501
>>> p.x, p.y
(1, 2)
15503
\end{verbatim}
15504

15505
But you can also treat a named tuple as a tuple:
15506

15507
\begin{verbatim}
15508
>>> p[0], p[1]
(1, 2)

>>> x, y = p
>>> x, y
(1, 2)
15514
\end{verbatim}
15515

15516
Named tuples provide a quick way to define simple classes.
15517
The drawback is that simple classes don't always stay simple.
15518
You might decide later that you want to add methods to a named tuple.
15519
In that case, you could define a new class that inherits from
15520
the named tuple:
15521
\index{inheritance}
15522

15523
\begin{verbatim}
15524
class Pointier(Point):
    # add more methods here
    pass
15526
\end{verbatim}
15527

15528
Or you could switch to a conventional class definition.
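
As a sketch of the inheritance option, here is a subclass of the named
tuple with one added method.  The method name
\verb"distance_from_origin" is made up for this example.

```python
from collections import namedtuple
import math

Point = namedtuple('Point', ['x', 'y'])

class Pointier(Point):
    def distance_from_origin(self):
        # Hypothetical method added for illustration.
        return math.hypot(self.x, self.y)

p = Pointier(3, 4)
print(p.x, p[1], p.distance_from_origin())
```

A {\tt Pointier} object still behaves like a tuple, so access by name,
access by index, and tuple assignment all continue to work.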
15529

15530

15531
\section{Gathering keyword args}
15532

15533
In Section~\ref{gather}, we saw how to write a function that
15534
gathers its arguments into a tuple:
15535
\index{gather}
15536

15537
\begin{verbatim}
15538
def printall(*args):
    print(args)
15540
\end{verbatim}
15541
%
15542
You can call this function with any number of positional arguments
15543
(that is, arguments that don't have keywords):
15544
\index{positional argument}
15545
\index{argument!positional}
15546

15547
\begin{verbatim}
15548
>>> printall(1, 2.0, '3')
(1, 2.0, '3')
15550
\end{verbatim}
15551
%
15552
But the {\tt *} operator doesn't gather keyword arguments:
15553
\index{keyword argument}
15554
\index{argument!keyword}
15555

15556
\begin{verbatim}
15557
>>> printall(1, 2.0, third='3')
TypeError: printall() got an unexpected keyword argument 'third'
15559
\end{verbatim}
15560
%
15561
To gather keyword arguments, you can use the {\tt **} operator:
15562

15563
\begin{verbatim}
15564
def printall(*args, **kwargs):
    print(args, kwargs)
15566
\end{verbatim}
15567
%
15568
You can call the keyword gathering parameter anything you want, but
15569
{\tt kwargs} is a common choice.  The result is a dictionary that maps
15570
keywords to values:
15571

15572
\begin{verbatim}
15573
>>> printall(1, 2.0, third='3')
(1, 2.0) {'third': '3'}
15575
\end{verbatim}
15576
%
15577
If you have a dictionary of keywords and values, you can use the
scatter operator, {\tt **}, to call a function:
15579
\index{scatter}
15580

15581
\begin{verbatim}
15582
>>> d = dict(x=1, y=2)
>>> Point(**d)
Point(x=1, y=2)
15585
\end{verbatim}
15586
%
15587
Without the scatter operator, the function would treat {\tt d} as
15588
a single positional argument, so it would assign {\tt d} to
15589
{\tt x} and complain because there's nothing to assign to {\tt y}:
15590

15591
\begin{verbatim}
15592
>>> d = dict(x=1, y=2)
>>> Point(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __new__() missing 1 required positional argument: 'y'
15597
\end{verbatim}
15598
%
15599
When you are working with functions that have a large number of
15600
parameters, it is often useful to create and pass around dictionaries
15601
that specify frequently used options.
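
For example, here is a minimal sketch of that idea.  The function
\verb"draw_label" and its parameters are hypothetical; they stand in
for any function with several optional parameters.

```python
def draw_label(text, size=10, color='black', bold=False):
    # Hypothetical function with several optional parameters.
    return (text, size, color, bold)

# A dictionary of frequently used options.
heading_options = dict(size=18, bold=True)

title = draw_label('Title', **heading_options)
subtitle = draw_label('Subtitle', color='gray', **heading_options)

print(title)
print(subtitle)
```

The scattered options can be combined with ordinary keyword arguments,
as long as no keyword is given twice.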
15602

15603

15604
\section{Glossary}
15605

15606
\begin{description}
15607

15608
\item[conditional expression:] An expression that has one of two
15609
values, depending on a condition.
15610
\index{conditional expression}
15611
\index{expression!conditional}
15612

15613
\item[list comprehension:] An expression with a {\tt for} loop in square
15614
brackets that yields a new list.
15615
\index{list comprehension}
15616

15617
\item[generator expression:] An expression with a {\tt for} loop in parentheses
15618
that yields a generator object.  
15619
\index{generator expression}
15620
\index{expression!generator}
15621

15622
\item[multiset:] A mathematical entity that represents a mapping
15623
between the elements of a set and the number of times they appear.
15624

15625
\item[factory:] A function, usually passed as a parameter, used to
15626
create objects. 
15627
\index{factory}
15628

15629
\end{description}
15630

15631

15632

15633

15634
\section{Exercises}
15635

15636
\begin{exercise}
15637

15638
The following function computes the binomial
coefficient recursively.
15640

15641
\begin{verbatim}
15642
def binomial_coeff(n, k):
    """Compute the binomial coefficient "n choose k".

    n: number of trials
    k: number of successes

    returns: int
    """
    if k == 0:
        return 1
    if n == 0:
        return 0

    res = binomial_coeff(n-1, k) + binomial_coeff(n-1, k-1)
    return res
15657
\end{verbatim}
15658

15659
Rewrite the body of the function using nested conditional
15660
expressions.
15661

15662
One note: this function is not very efficient because it ends up computing
15663
the same values over and over.  You could make it more efficient by
15664
memoizing (see Section~\ref{memoize}).  But you will find that it's harder to
15665
memoize if you write it using conditional expressions.
15666

15667
\end{exercise}
15668

15669

15670

15671
\appendix
15672

15673
\chapter{Debugging}
15674
\index{debugging}
15675

15676
When you are debugging, you should distinguish among different
15677
kinds of errors in order to track them down more quickly:
15678

15679
\begin{itemize}
15680

15681
\item Syntax errors are discovered by the interpreter when it is
15682
  translating the source code into byte code.  They indicate
15683
  that there is something wrong with the structure of the program.
15684
  Example: Omitting the colon at the end of a {\tt def} statement
15685
  generates the somewhat redundant message
  {\tt SyntaxError: invalid syntax}.
15687
\index{syntax error}
15688
\index{error!syntax}
15689

15690
\item Runtime errors are produced by the interpreter if something goes
15691
  wrong while the program is running.  Most runtime error messages
15692
  include information about where the error occurred and what
15693
  functions were executing.  Example: An infinite recursion eventually
15694
  causes the runtime error ``maximum recursion depth exceeded''.
15695
\index{runtime error}
15696
\index{error!runtime}
15697
\index{exception}
15698

15699
\item Semantic errors are problems with a program that runs without
15700
  producing error messages but doesn't do the right thing.  Example:
15701
  An expression may not be evaluated in the order you expect, yielding
15702
  an incorrect result.
15703
\index{semantic error}
15704
\index{error!semantic}
15705

15706
\end{itemize}
15707

15708
The first step in debugging is to figure out which kind of
15709
error you are dealing with.  Although the following sections are
15710
organized by error type, some techniques are
15711
applicable in more than one situation.
15712

15713

15714
\section{Syntax errors}
15715
\index{error message}
15716

15717
Syntax errors are usually easy to fix once you figure out what they
15718
are.  Unfortunately, the error messages are often not helpful.
15719
The most common messages are {\tt SyntaxError: invalid syntax} and
15720
{\tt SyntaxError: invalid token}, neither of which is very informative.
15721

15722
On the other hand, the message does tell you where in the program the
15723
problem occurred.  Actually, it tells you where Python
15724
noticed a problem, which is not necessarily where the error
15725
is.  Sometimes the error is prior to the location of the error
15726
message, often on the preceding line.
15727
\index{incremental development}
15728
\index{development plan!incremental}
15729

15730
If you are building the program incrementally, you should have
15731
a good idea about where the error is.  It will be in the last
15732
line you added.
15733

15734
If you are copying code from a book, start by comparing
15735
your code to the book's code very carefully.  Check every character.
15736
At the same time, remember that the book might be wrong, so
15737
if you see something that looks like a syntax error, it might be.
15738

15739
Here are some ways to avoid the most common syntax errors:
15740
\index{syntax}
15741

15742
\begin{enumerate}
15743

15744
\item Make sure you are not using a Python keyword for a variable name.
15745
\index{keyword}
15746

15747
\item Check that you have a colon at the end of the header of every
15748
compound statement, including {\tt for}, {\tt while},
15749
{\tt if}, and {\tt def} statements.
15750
\index{header}
15751
\index{colon}
15752

15753
\item Make sure that any strings in the code have matching
15754
quotation marks.  Make sure that all quotation marks are
15755
``straight quotes'', not ``curly quotes''.
15756
\index{quotation mark}
15757

15758
\item If you have multiline strings with triple quotes (single or double), make
15759
sure you have terminated the string properly.  An unterminated string
15760
may cause an {\tt invalid token} error at the end of your program,
15761
or it may treat the following part of the program as a string until it
15762
comes to the next string.  In the second case, it might not produce an error
15763
message at all!
15764
\index{multiline string}
15765
\index{string!multiline}
15766

15767
\item An unclosed opening operator---\verb+(+, \verb+{+, or
15768
  \verb+[+---makes Python continue with the next line as part of the
15769
  current statement.  Generally, an error occurs almost immediately in
15770
  the next line.
15771

15772
\item Check for the classic {\tt =} instead of {\tt ==} inside
15773
a conditional.
15774
\index{conditional}
15775

15776
\item Check the indentation to make sure it lines up the way it
is supposed to.  Python can handle spaces and tabs, but if you mix
them it can cause problems.  The best way to avoid this problem
is to use a text editor that knows about Python and generates
consistent indentation.
15781
\index{indentation}
15782
\index{whitespace}
15783

15784
\item If you have non-ASCII characters in the code (including strings
15785
and comments), that might cause a problem, although Python 3 usually
15786
handles non-ASCII characters.  Be careful if you paste in text from
15787
a web page or other source.
15788

15789
\end{enumerate}
15790

15791
If nothing works, move on to the next section...
15792

15793

15794
\subsection{I keep making changes and it makes no difference.}

If the interpreter says there is an error and you don't see it, that
might be because you and the interpreter are not looking at the same
code.  Check your programming environment to make sure that the
program you are editing is the one Python is trying to run.

If you are not sure, try putting an obvious and deliberate syntax
error at the beginning of the program.  Now run it again.  If the
interpreter doesn't find the new error, you are not running the
new code.

There are a few likely culprits:

\begin{itemize}

\item You edited the file and forgot to save the changes before
running it again.  Some programming environments do this
for you, but some don't.

\item You changed the name of the file, but you are still running
the old name.

\item Something in your development environment is configured
incorrectly.

\item If you are writing a module and using {\tt import},
make sure you don't give your module the same name as one
of the standard Python modules.

\item If you are using {\tt import} to read a module, remember
that you have to restart the interpreter or use {\tt importlib.reload}
to read a modified file.  If you import the module again, it
doesn't do anything.
\index{module!reload}
\index{reload function}
\index{function!reload}

\end{itemize}
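
To see the last point in action, here is a minimal sketch.  The module
name {\tt scratch\_mod} and the temporary folder are made up for the
demonstration; {\tt importlib.reload} is where Python 3 keeps the
{\tt reload} function.

\begin{verbatim}
import importlib
import sys
import tempfile
from pathlib import Path

# keep this demo from reusing cached bytecode of the old file
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'scratch_mod.py'   # hypothetical module
    path.write_text('VALUE = 1\n')
    sys.path.insert(0, folder)
    importlib.invalidate_caches()

    import scratch_mod
    first = scratch_mod.VALUE

    path.write_text('VALUE = 2\n')    # "edit" the file
    import scratch_mod                # already loaded: does nothing
    second = scratch_mod.VALUE

    importlib.reload(scratch_mod)     # reads the file again
    third = scratch_mod.VALUE
    sys.path.remove(folder)

print(first, second, third)
\end{verbatim}
%
The second {\tt import} leaves the old value in place; only the call
to {\tt importlib.reload} picks up the change.
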

If you get stuck and you can't figure out what is going on, one
approach is to start again with a new program like ``Hello, World!'',
and make sure you can get a known program to run.  Then gradually add
the pieces of the original program to the new one.


\section{Runtime errors}

Once your program is syntactically correct,
Python can read it and at least start running it.  What could
possibly go wrong?


\subsection{My program does absolutely nothing.}

This problem is most common when your file consists of functions and
classes but does not actually invoke a function to start execution.
This may be intentional if you only plan to import this module to
supply classes and functions.

If it is not intentional, make sure there is a function call
in the program, and make sure the flow of execution reaches
it (see ``Flow of Execution'' below).
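
One common arrangement, sketched here with made-up function names,
puts the function call at the bottom of the file and guards it so that
it runs when the file is executed as a script but not when the file is
imported as a module:

\begin{verbatim}
def greet(name):
    return 'Hello, ' + name + '!'

def main():
    message = greet('World')
    print(message)
    return message

# without this call, running the file would define the
# functions and then do nothing
if __name__ == '__main__':
    main()
\end{verbatim}
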


\subsection{My program hangs.}
\index{infinite loop}
\index{infinite recursion}
\index{hanging}

If a program stops and seems to be doing nothing, it is ``hanging''.
Often that means that it is caught in an infinite loop or infinite
recursion.

\begin{itemize}

\item If there is a particular loop that you suspect is the
problem, add a {\tt print} statement immediately before the loop that says
``entering the loop'' and another immediately after that says
``exiting the loop''.

Run the program.  If you get the first message and not the second,
you've got an infinite loop.  Go to the ``Infinite Loop'' section
below.

\item Most of the time, an infinite recursion will cause the program
to run for a while and then produce a ``RecursionError: maximum
recursion depth exceeded'' error (before Python 3.5, this was
reported as a {\tt RuntimeError}).  If that happens, go to the
``Infinite Recursion'' section below.

If you are not getting this error but you suspect there is a problem
with a recursive method or function, you can still use the techniques
in the ``Infinite Recursion'' section.

\item If neither of those steps works, start testing other
loops and other recursive functions and methods.

\item If that doesn't work, then it is possible that
you don't understand the flow of execution in your program.
Go to the ``Flow of Execution'' section below.

\end{itemize}


\subsubsection{Infinite Loop}
\index{infinite loop}
\index{loop!infinite}
\index{condition}
\index{loop!condition}

If you think you have an infinite loop and you think you know
which loop is causing the problem, add a {\tt print} statement at
the end of the loop that prints the values of the variables in
the condition and the value of the condition.

For example:

\begin{verbatim}
while x > 0 and y < 0:
    # do something to x
    # do something to y

    print('x: ', x)
    print('y: ', y)
    print('condition: ', (x > 0 and y < 0))
\end{verbatim}
%
Now when you run the program, you will see three lines of output
for each time through the loop.  The last time through the
loop, the condition should be {\tt False}.  If the loop keeps
going, you will be able to see the values of {\tt x} and {\tt y},
and you might figure out why they are not being updated correctly.


\subsubsection{Infinite Recursion}
\index{infinite recursion}
\index{recursion!infinite}

Most of the time, infinite recursion causes the program to run
for a while and then produce a {\tt maximum recursion depth exceeded}
error.

If you suspect that a function is causing an infinite
recursion, make sure that there is a base case.
There should be some condition that causes the
function to return without making a recursive invocation.
If not, you need to rethink the algorithm and identify a base
case.

If there is a base case but the program doesn't seem to be reaching
it, add a {\tt print} statement at the beginning of the function
that prints the parameters.  Now when you run the program, you will see
a few lines of output every time the function is invoked,
and you will see the parameter values.  If the parameters are not moving
toward the base case, you will get some ideas about why not.
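
As a small illustration, here is a made-up recursive function with a
{\tt print} statement at the top.  In this version the parameter does
move toward the base case, so the trace shows {\tt n} shrinking until
the function returns:

\begin{verbatim}
def countdown(n):
    """Return the number of recursive calls made from n."""
    print('countdown called with n =', n)   # trace the parameter
    if n <= 0:                               # base case
        return 0
    return 1 + countdown(n - 1)

calls = countdown(3)
print('recursive calls:', calls)
\end{verbatim}
%
If the recursive call were {\tt countdown(n + 1)} by mistake, the
trace would show {\tt n} growing, which points directly at the
problem.
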


\subsubsection{Flow of Execution}
\index{flow of execution}

If you are not sure how the flow of execution is moving through
your program, add {\tt print} statements to the beginning of each
function with a message like ``entering function {\tt foo}'', where
{\tt foo} is the name of the function.

Now when you run the program, it will print a trace of each
function as it is invoked.
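
Here is a minimal sketch of such a trace; the three functions are
invented for the example:

\begin{verbatim}
def read_data():
    print('entering function read_data')
    return [3, 1, 2]

def process(data):
    print('entering function process')
    return sorted(data)

def main():
    print('entering function main')
    return process(read_data())

result = main()
\end{verbatim}
%
The messages appear in the order {\tt main}, {\tt read\_data},
{\tt process}, because the argument of {\tt process} is evaluated
before {\tt process} itself is called.
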


\subsection{When I run the program I get an exception.}
\index{exception}
\index{runtime error}

If something goes wrong during runtime, Python
prints a message that includes the name of the
exception, the line of the program where the problem occurred,
and a traceback.
\index{traceback}

The traceback identifies the function that is currently running, and
then the function that called it, and then the function that called
{\em that}, and so on.  In other words, it traces the sequence of
function calls that got you to where you are, including the line
number in your file where each call occurred.

The first step is to examine the place in the program where
the error occurred and see if you can figure out what happened.
These are some of the most common runtime errors:

\begin{description}

\item[NameError:]  You are trying to use a variable that doesn't
exist in the current environment.  Check if the name
is spelled right, or at least consistently.
And remember that local variables are local; you
cannot refer to them from outside the function where they are defined.
\index{NameError}
\index{exception!NameError}

\item[TypeError:] There are several possible causes:
\index{TypeError}
\index{exception!TypeError}

\begin{itemize}

\item  You are trying to use a value improperly.  Example: indexing
a string, list, or tuple with something other than an integer.
\index{index}

\item There is a mismatch between the items in a format string and
the items passed for conversion.  This can happen if either the number
of items does not match or an invalid conversion is called for.
\index{format operator}
\index{operator!format}

\item You are passing the wrong number of arguments to a function.
For methods, look at the method definition and
check that the first parameter is {\tt self}.  Then look at the
method invocation; make sure you are invoking the method on an
object with the right type and providing the other arguments
correctly.

\end{itemize}

\item[KeyError:]  You are trying to access an element of a dictionary
using a key that the dictionary does not contain.  If the keys
are strings, remember that capitalization matters.
\index{KeyError}
\index{exception!KeyError}
\index{dictionary}

\item[AttributeError:] You are trying to access an attribute or method
  that does not exist.  Check the spelling!  You can use the built-in
  function {\tt dir} to list the attributes and methods an object has,
  or {\tt vars} to see the attributes of an instance and their values.
\index{dir function}
\index{function!dir}

If an AttributeError indicates that an object has {\tt NoneType},
that means that it is {\tt None}.  So the problem is not the
attribute name, but the object.

The reason the object is {\tt None} might be that you forgot
to return a value from a function; if you get to the end of
a function without hitting a {\tt return} statement, it returns
{\tt None}.  Another common cause is using the result from
a list method, like {\tt sort}, that returns {\tt None}.
\index{AttributeError}
\index{exception!AttributeError}

\item[IndexError:] The index you are using
to access a list, string, or tuple is out of range: it is greater
than the length minus one or, for a negative index, less than the
negative of the length.  Immediately before the site of the error,
add a {\tt print} statement to display
the value of the index and the length of the sequence.
Is the sequence the right size?  Is the index the right value?
\index{IndexError}
\index{exception!IndexError}

\end{description}
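
The following sketch triggers one example of each error described
above and reports the name of the exception.  The helper
{\tt name\_of\_error} and the sample data are made up for the
illustration:

\begin{verbatim}
def name_of_error(action):
    """Call action() and return the name of the exception
    it raises, or None if it raises nothing."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return None

t = [1, 2, 3]
d = {'one': 1}

# a name that was never assigned
print(name_of_error(lambda: undefined_name))
# a list index that is not an integer
print(name_of_error(lambda: t['0']))
# a format string with more conversions than values
print(name_of_error(lambda: '%d %d' % (1,)))
# a key with the wrong capitalization
print(name_of_error(lambda: d['One']))
# sort returns None, and None has no append method
print(name_of_error(lambda: t.sort().append(4)))
# the largest valid index is len(t) - 1
print(name_of_error(lambda: t[len(t)]))
\end{verbatim}
%
The output is {\tt NameError}, {\tt TypeError}, {\tt TypeError},
{\tt KeyError}, {\tt AttributeError}, and {\tt IndexError}, in that
order.
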

The Python debugger ({\tt pdb}) is useful for tracking down
exceptions because it allows you to examine the state of the
program immediately before the error.  You can read
about {\tt pdb} at \url{https://docs.python.org/3/library/pdb.html}.
\index{debugger (pdb)}
\index{pdb (Python debugger)}


\subsection{I added so many {\tt print} statements I get inundated with
output.}
\index{print statement}
\index{statement!print}

One of the problems with using {\tt print} statements for debugging
is that you can end up buried in output.  There are two ways
to proceed: simplify the output or simplify the program.

To simplify the output, you can remove or comment out {\tt print}
statements that aren't helping, or combine them, or format
the output so it is easier to understand.

To simplify the program, there are several things you can do.  First,
scale down the problem the program is working on.  For example, if you
are searching a list, search a {\em small} list.  If the program takes
input from the user, give it the simplest input that causes the
problem.
\index{dead code}

Second, clean up the program.  Remove dead code and reorganize the
program to make it as easy to read as possible.  For example, if you
suspect that the problem is in a deeply nested part of the program,
try rewriting that part with simpler structure.  If you suspect a
large function, try splitting it into smaller functions and testing them
separately.
\index{testing!minimal test case}
\index{test case, minimal}

Often the process of finding the minimal test case leads you to the
bug.  If you find that a program works in one situation but not in
another, that gives you a clue about what is going on.

Similarly, rewriting a piece of code can help you find subtle
bugs.  If you make a change that you think shouldn't affect the
program, and it does, that can tip you off.


\section{Semantic errors}

In some ways, semantic errors are the hardest to debug,
because the interpreter provides no information
about what is wrong.  Only you know what the program is supposed to
do.
\index{semantic error}
\index{error!semantic}

The first step is to make a connection between the program
text and the behavior you are seeing.  You need a hypothesis
about what the program is actually doing.  One of the things
that makes that hard is that computers run so fast.

You will often wish that you could slow the program down to human
speed, and with some debuggers you can.  But the time it takes to
insert a few well-placed {\tt print} statements is often short compared to
setting up the debugger, inserting and removing breakpoints, and
``stepping'' the program to where the error is occurring.


\subsection{My program doesn't work.}

You should ask yourself these questions:

\begin{itemize}

\item Is there something the program was supposed to do but
which doesn't seem to be happening?  Find the section of the code
that performs that function and make sure it is executing when
you think it should.

\item Is something happening that shouldn't?  Find code in
your program that performs that function and see if it is
executing when it shouldn't.

\item Is a section of code producing an effect that is not
what you expected?  Make sure that you understand the code in
question, especially if it involves functions or methods in
other Python modules.  Read the documentation for the functions you call.
Try them out by writing simple test cases and checking the results.

\end{itemize}

In order to program, you need a mental model of how
programs work.  If you write a program that doesn't do what you expect,
often the problem is not in the program; it's in your mental
model.
\index{model, mental}
\index{mental model}

The best way to correct your mental model is to break the program
into its components (usually the functions and methods) and test
each component independently.  Once you find the discrepancy
between your model and reality, you can solve the problem.

Of course, you should be building and testing components as you
develop the program.  If you encounter a problem,
there should be only a small amount of new code
that is not known to be correct.


\subsection{I've got a big hairy expression and it doesn't
do what I expect.}
\index{expression!big and hairy}
\index{big, hairy expression}

Writing complex expressions is fine as long as they are readable,
but they can be hard to debug.  It is often a good idea to
break a complex expression into a series of assignments to
temporary variables.

For example:

\begin{verbatim}
self.hands[i].addCard(self.hands[self.findNeighbor(i)].popCard())
\end{verbatim}
%
This can be rewritten as:

\begin{verbatim}
neighbor = self.findNeighbor(i)
pickedCard = self.hands[neighbor].popCard()
self.hands[i].addCard(pickedCard)
\end{verbatim}
%
The explicit version is easier to read because the variable
names provide additional documentation, and it is easier to debug
because you can check the types of the intermediate variables
and display their values.
\index{temporary variable}
\index{variable!temporary}

Another problem that can occur with big expressions is
that the order of evaluation may not be what you expect.
For example, if you are translating the expression
$\frac{x}{2 \pi}$ into Python, you might write:

\begin{verbatim}
y = x / 2 * math.pi
\end{verbatim}
%
That is not correct because multiplication and division have
the same precedence and are evaluated from left to right.
So this expression computes $x \pi / 2$.
\index{order of operations}
\index{precedence}

A good way to debug expressions is to add parentheses to make
the order of evaluation explicit:

\begin{verbatim}
y = x / (2 * math.pi)
\end{verbatim}
%
Whenever you are not sure of the order of evaluation, use
parentheses.  Not only will the program be correct (in the sense
of doing what you intended), it will also be more readable for
other people who haven't memorized the order of operations.
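
A quick numerical check, using an arbitrary value of {\tt x} chosen
for the illustration, makes the difference between the two versions
visible:

\begin{verbatim}
import math

x = 10.0
wrong = x / 2 * math.pi      # evaluated as (x / 2) * math.pi
right = x / (2 * math.pi)    # the intended x / (2 pi)

print('without parentheses:', wrong)
print('with parentheses:   ', right)
\end{verbatim}
%
The first result is about 15.7 and the second is about 1.59, so the
two expressions are clearly not equivalent.
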


\subsection{I've got a function that doesn't return what I
expect.}
\index{return statement}
\index{statement!return}

If you have a {\tt return} statement with a complex expression,
you don't have a chance to print the result before
returning.  Again, you can use a temporary variable.  For
example, instead of:

\begin{verbatim}
return self.hands[i].removeMatches()
\end{verbatim}
%
you could write:

\begin{verbatim}
count = self.hands[i].removeMatches()
return count
\end{verbatim}
%
Now you have the opportunity to display the value of
{\tt count} before returning.


\subsection{I'm really, really stuck and I need help.}

First, try getting away from the computer for a few minutes.
Computers emit waves that affect the brain, causing these
symptoms:

\begin{itemize}

\item Frustration and rage.
\index{frustration}
\index{rage}
\index{debugging!emotional response}
\index{emotional debugging}

\item Superstitious beliefs (``the computer hates me'') and
magical thinking (``the program only works when I wear my
hat backward'').
\index{debugging!superstition}
\index{superstitious debugging}

\item Random walk programming (the attempt to program by writing
every possible program and choosing the one that does the right
thing).
\index{random walk programming}
\index{development plan!random walk programming}

\end{itemize}

If you find yourself suffering from any of these symptoms, get
up and go for a walk.  When you are calm, think about the program.
What is it doing?  What are some possible causes of that
behavior?  When was the last time you had a working program,
and what did you do next?

Sometimes it just takes time to find a bug.  I often find bugs
when I am away from the computer and let my mind wander.  Some
of the best places to find bugs are trains, showers, and in bed,
just before you fall asleep.


\subsection{No, I really need help.}

It happens.  Even the best programmers occasionally get stuck.
Sometimes you work on a program so long that you can't see the
error.  You need a fresh pair of eyes.

Before you bring someone else in, make sure you are prepared.
Your program should be as simple
as possible, and you should be working on the smallest input
that causes the error.  You should have {\tt print} statements in the
appropriate places (and the output they produce should be
comprehensible).  You should understand the problem well enough
to describe it concisely.

When you bring someone in to help, be sure to give
them the information they need:

\begin{itemize}

\item If there is an error message, what is it
and what part of the program does it indicate?

\item What was the last thing you did before this error occurred?
What were the last lines of code that you wrote, or what is
the new test case that fails?

\item What have you tried so far, and what have you learned?

\end{itemize}

When you find the bug, take a second to think about what you
could have done to find it faster.  Next time you see something
similar, you will be able to find the bug more quickly.

Remember, the goal is not just to make the program
work.  The goal is to learn how to make the program work.


\chapter{Analysis of Algorithms}
\label{algorithms}

\begin{quote}
This appendix is an edited excerpt from {\it Think Complexity}, by
Allen B. Downey, also published by O'Reilly Media (2012).  When you
are done with this book, you might want to move on to that one.
\end{quote}

{\bf Analysis of algorithms} is a branch of computer science that
studies the performance of algorithms, especially their run time and
space requirements.  See
\url{http://en.wikipedia.org/wiki/Analysis_of_algorithms}.
\index{algorithm} \index{analysis of algorithms}

The practical goal of algorithm analysis is to predict the performance
of different algorithms in order to guide design decisions.

During the 2008 United States Presidential Campaign, candidate
Barack Obama was asked to perform an impromptu analysis when
he visited Google.  Chief executive Eric Schmidt jokingly asked him
for ``the most efficient way to sort a million 32-bit integers.''
Obama had apparently been tipped off, because he quickly
replied, ``I think the bubble sort would be the wrong way to go.''
See \url{http://www.youtube.com/watch?v=k4RRi_ntQc8}.
\index{Obama, Barack}
\index{Schmidt, Eric}
\index{bubble sort}

This is true: bubble sort is conceptually simple but slow for
large datasets.  The answer Schmidt was probably looking for is
``radix sort'' (\url{http://en.wikipedia.org/wiki/Radix_sort})\footnote{
But if you get a question like this in an interview, I think
a better answer is, ``The fastest way to sort a million integers
is to use whatever sort function is provided by the language
I'm using.  Its performance is good enough for the vast majority
of applications, but if it turned out that my application was too
slow, I would use a profiler to see where the time was being
spent.  If it looked like a faster sort algorithm would have
a significant effect on performance, then I would look
around for a good implementation of radix sort.''}.
\index{radix sort}

The goal of algorithm analysis is to make meaningful
comparisons between algorithms, but there are some problems:
\index{comparing algorithms}

\begin{itemize}

\item The relative performance of the algorithms might
depend on characteristics of the hardware, so one algorithm
might be faster on Machine A, another on Machine B.
The general solution to this problem is to specify a
{\bf machine model} and analyze the number of steps, or
operations, an algorithm requires under a given model.
\index{machine model}

\item Relative performance might depend on the details of
the dataset.  For example, some sorting
algorithms run faster if the data are already partially sorted;
other algorithms run slower in this case.
A common way to avoid this problem is to analyze the
{\bf worst case} scenario.  It is sometimes useful to
analyze average case performance, but that's usually harder,
and it might not be obvious what set of cases to average over.
\index{worst case}
\index{average case}

\item Relative performance also depends on the size of the
problem.  A sorting algorithm that is fast for small lists
might be slow for long lists.
The usual solution to this problem is to express run time
(or number of operations) as a function of problem size,
and group functions into categories depending on how quickly
they grow as problem size increases.

\end{itemize}

The good thing about this kind of comparison is that it lends
itself to simple classification of algorithms.  For example,
if I know that the run time of Algorithm A tends to be
proportional to the size of the input, $n$, and Algorithm B
tends to be proportional to $n^2$, then I
expect A to be faster than B, at least for large values of $n$.

This kind of analysis comes with some caveats, but we'll get
to them later.


\section{Order of growth}

Suppose you have analyzed two algorithms and expressed
their run times in terms of the size of the input:
Algorithm A takes $100n+1$ steps to solve a problem with
size $n$; Algorithm B takes $n^2 + n + 1$ steps.
\index{order of growth}

The following table shows the run time of these algorithms
for different problem sizes:

\begin{tabular}{|r|r|r|}
\hline
Input     &   Run time of     & Run time of \\
size      &   Algorithm A     & Algorithm B \\
\hline
10        &   1 001           & 111         \\
100       &   10 001          & 10 101      \\
1 000     &   100 001         & 1 001 001   \\
10 000    &   1 000 001       & 100 010 001 \\
\hline
\end{tabular}

At $n=10$, Algorithm A looks pretty bad; it takes almost 10 times
longer than Algorithm B.  But for $n=100$ they are about the same, and
for larger values A is much better.
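
The table can be reproduced with a few lines of Python.  This is only
a sketch; the function names {\tt steps\_a} and {\tt steps\_b} are
invented here to stand for the two step counts given above.

\begin{verbatim}
def steps_a(n):
    return 100 * n + 1

def steps_b(n):
    return n**2 + n + 1

for n in [10, 100, 1000, 10000]:
    print(n, steps_a(n), steps_b(n))

# find the smallest n for which B takes more steps than A
n = 1
while steps_b(n) <= steps_a(n):
    n += 1
crossover = n
print('B is slower than A from n =', crossover)
\end{verbatim}
%
The loop stops at $n=100$; at $n=99$ the two algorithms take exactly
the same number of steps.
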

The fundamental reason is that for large values of $n$, any function
that contains an $n^2$ term will grow faster than a function whose
leading term is $n$.  The {\bf leading term} is the term with the
highest exponent.
\index{leading term}
\index{exponent}

For Algorithm A, the leading term has a large coefficient, 100, which
is why B does better than A for small $n$.  But regardless of the
coefficients, there will always be some value of $n$ where
$a n^2 > b n$, for any positive values of $a$ and $b$.
\index{leading coefficient}

The same argument applies to the non-leading terms.  Even if the run
time of Algorithm A were $n+1000000$, it would still be better than
Algorithm B for sufficiently large $n$.

In general, we expect an algorithm with a smaller leading term to be a
better algorithm for large problems, but for smaller problems, there
may be a {\bf crossover point} where another algorithm is better.  The
location of the crossover point depends on the details of the
algorithms, the inputs, and the hardware, so it is usually ignored for
purposes of algorithmic analysis.  But that doesn't mean you can forget
about it.
\index{crossover point}

If two algorithms have the same leading order term, it is hard to say
which is better; again, the answer depends on the details.  So for
algorithmic analysis, functions with the same leading term
are considered equivalent, even if they have different coefficients.

An {\bf order of growth} is a set of functions whose growth
behavior is considered equivalent.  For example, $2n$, $100n$ and $n+1$ 
belong to the same order of growth, which is written $O(n)$ in
{\bf Big-Oh notation} and often called {\bf linear} because every function
in the set grows linearly with $n$.
\index{big-oh notation}
\index{linear growth}

All functions with the leading term $n^2$ belong to $O(n^2)$; they are
called {\bf quadratic}.
\index{quadratic growth}

The following table shows some of the orders of growth that
appear most commonly in algorithmic analysis,
in increasing order of badness.
\index{badness}

\begin{tabular}{|r|r|}
\hline
Order of           & Name \\
growth             &      \\
\hline
$O(1)$             & constant \\
$O(\log_b n)$      & logarithmic (for any $b$) \\
$O(n)$             & linear \\
$O(n \log_b n)$    & linearithmic \\
$O(n^2)$           & quadratic \\
$O(n^3)$           & cubic \\
$O(c^n)$           & exponential (for any $c$) \\
\hline
\end{tabular}

For the logarithmic terms, the base of the logarithm doesn't matter;
changing bases is the equivalent of multiplying by a constant, which
doesn't change the order of growth.  Exponential functions are
different: functions with different bases, such as $2^n$ and $3^n$,
differ by more than a constant factor, so strictly speaking they do
not belong to the same order of growth, but they are usually grouped
together as exponential.
Exponential functions grow very quickly, so exponential algorithms are
only useful for small problems.
\index{logarithmic growth}
\index{exponential growth}
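
The statement about logarithm bases follows from the change-of-base
identity.  For any two bases $a$ and $b$ greater than 1,
\[
\log_a n = \frac{\log_b n}{\log_b a} = \frac{1}{\log_b a} \cdot \log_b n ,
\]
and the factor $1 / \log_b a$ does not depend on $n$, so it acts as a
constant coefficient.
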


\begin{exercise}

Read the Wikipedia page on Big-Oh notation at
\url{http://en.wikipedia.org/wiki/Big_O_notation} and
answer the following questions:

\begin{enumerate}
\item What is the order of growth of $n^3 + n^2$?
What about $1000000 n^3 + n^2$?
What about $n^3 + 1000000 n^2$?

\item What is the order of growth of $(n^2 + n) \cdot (n + 1)$?  Before
  you start multiplying, remember that you only need the leading term.

\item If $f$ is in $O(g)$, for some unspecified function $g$, what can
  we say about $af+b$, where $a$ and $b$ are constants?

\item If $f_1$ and $f_2$ are in $O(g)$, what can we say about $f_1 + f_2$?

\item If  $f_1$ is in $O(g)$
and $f_2$ is in $O(h)$,
what can we say about  $f_1 + f_2$?

\item If  $f_1$ is in $O(g)$ and $f_2$ is in $O(h)$,
what can we say about  $f_1 \cdot f_2$?
\end{enumerate}

\end{exercise}

Programmers who care about performance often find this kind of
analysis hard to swallow.  They have a point: sometimes the
coefficients and the non-leading terms make a real difference.
Sometimes the details of the hardware, the programming language, and
the characteristics of the input make a big difference.  And for small
problems, order of growth is irrelevant.

But if you keep those caveats in mind, algorithmic analysis is a
useful tool.  At least for large problems, the ``better'' algorithm
is usually better, and sometimes it is {\em much} better.  The
difference between two algorithms with the same order of growth is
usually a constant factor, but the difference between a good algorithm
and a bad algorithm is unbounded!


\section{Analysis of basic Python operations}

In Python, most arithmetic operations are constant time;
multiplication usually takes longer than addition and subtraction, and
division takes even longer, but these run times don't depend on the
magnitude of the operands.  Very large integers are an exception; in
that case the run time increases with the number of digits.
\index{analysis of primitives}

Indexing operations---reading or writing elements in a sequence
or dictionary---are also constant time, regardless of the size
of the data structure.
\index{indexing}

A {\tt for} loop that traverses a sequence or dictionary is
usually linear, as long as all of the operations in the body
of the loop are constant time.  For example, adding up the
elements of a list is linear:

\begin{verbatim}
16578
    total = 0
16579
    for x in t:
16580
        total += x
16581
\end{verbatim}
16582

16583
The built-in function {\tt sum} is also linear because it does
16584
the same thing, but it tends to be faster because it is a more
16585
efficient implementation; in the language of algorithmic analysis,
16586
it has a smaller leading coefficient.
16587

As a rule of thumb, if the body of a loop is in $O(n^a)$ then
the whole loop is in $O(n^{a+1})$.  The exception is if you can
show that the loop exits after a constant number of iterations.
If a loop runs $k$ times regardless of $n$, then
the loop is in $O(n^a)$, even for large $k$.

Multiplying by $k$ doesn't change the order of growth, and neither
does dividing.  So if the body of a loop is in $O(n^a)$ and it runs
$n/k$ times, the loop is in $O(n^{a+1})$, even for large $k$.
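
One way to check these rules is to count how many times the innermost
statement runs.  The following is only a sketch; the function names and
the default values of {\tt k} are made up for illustration.

\begin{verbatim}
def count_nested(n):
    """The body is a linear loop (a = 1), so the whole loop
    is quadratic."""
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1
    return steps

def count_fixed(n, k=1000):
    """The outer loop runs k times regardless of n, so the
    order of growth is that of the body: linear."""
    steps = 0
    for i in range(k):
        for j in range(n):
            steps += 1
    return steps

def count_fraction(n, k=10):
    """The outer loop runs n/k times, so the loop is still
    quadratic, with a smaller leading coefficient."""
    steps = 0
    for i in range(n // k):
        for j in range(n):
            steps += 1
    return steps
\end{verbatim}

Doubling {\tt n} doubles the result of \verb"count_fixed", but it
quadruples the results of \verb"count_nested" and \verb"count_fraction".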

Most string and tuple operations are linear, except indexing and {\tt
  len}, which are constant time.  The built-in functions {\tt min} and
{\tt max} are linear.  The run time of a slice operation is
proportional to the length of the output, but independent of the size
of the input.
\index{string methods}
\index{tuple methods}

String concatenation is linear; the run time depends on the sum
of the lengths of the operands.
\index{string concatenation}

All string methods are linear, but if the lengths of
the strings are bounded by a constant---for example, operations on single
characters---they are considered constant time.
The string method {\tt join} is linear; the run time depends on
the total length of the strings.
\index{join@{\tt join}}
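
To see why this matters, here is a sketch that builds the same string
two ways and tallies the cost under the model above, where each
concatenation costs the sum of the lengths of its operands.  The
function names are hypothetical, and the tallies are a cost model, not
measurements; a real interpreter may do better than the model predicts.

\begin{verbatim}
def concat_all(parts):
    """Build a string by repeated concatenation and count
    the characters copied under the cost model."""
    result = ''
    copied = 0
    for part in parts:
        copied += len(result) + len(part)
        result = result + part
    return result, copied

def join_all(parts):
    """Build the same string with join; the model cost is
    the total length of the strings."""
    result = ''.join(parts)
    return result, len(result)

parts = ['a'] * 1000
s1, cost1 = concat_all(parts)
s2, cost2 = join_all(parts)
\end{verbatim}

Both functions return the same string, but in this model the loop of
concatenations is quadratic in the number of parts, while {\tt join}
is linear.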

Most list methods are linear, but there are some exceptions:
\index{list methods}

\begin{itemize}

\item Adding an element to the end of a list is constant time on
average; when the list runs out of room, it is occasionally copied
to a bigger location, but the total time for $n$ operations
is $O(n)$, so the average time for each
operation is $O(1)$.

\item Removing an element from the end of a list is constant time.

\item Sorting is $O(n \log n)$.
\index{sorting}

\end{itemize}

Most dictionary operations and methods are constant time, but
there are some exceptions:
\index{dictionary methods}

\begin{itemize}

\item The run time of {\tt update} is
  proportional to the size of the dictionary passed as a parameter,
  not the dictionary being updated.

\item {\tt keys}, {\tt values} and {\tt items} are constant time because
  they return view objects rather than copies of the data.  But
  if you loop through a view, the loop will be linear.
\index{iterator}

\end{itemize}
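
As a small illustration, in Python 3 the object returned by {\tt keys}
is a view of the dictionary, so creating it copies nothing.  The
variable names here are made up for the example.

\begin{verbatim}
d = {'a': 1, 'b': 2}
keys = d.keys()          # constant time: nothing is copied
d.update({'c': 3})       # time depends on the size of the argument
in_view = 'c' in keys    # the view reflects the change

total = 0
for k in keys:           # traversing the view is linear
    total += d[k]
\end{verbatim}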

The performance of dictionaries is one of the minor miracles of
computer science.  We will see how they work in
Section~\ref{hashtable}.

\begin{exercise}

Read the Wikipedia page on sorting algorithms at
\url{http://en.wikipedia.org/wiki/Sorting_algorithm} and answer
the following questions:
\index{sorting}

\begin{enumerate}

\item What is a ``comparison sort?'' What is the best worst-case order
  of growth for a comparison sort?  What is the best worst-case order
  of growth for any sort algorithm?
\index{comparison sort}

\item What is the order of growth of bubble sort, and why does Barack
  Obama think it is ``the wrong way to go?''

\item What is the order of growth of radix sort?  What preconditions
  do we need to use it?

\item What is a stable sort and why might it matter in practice?
\index{stable sort}

\item What is the worst sorting algorithm (that has a name)?

\item What sort algorithm does the C library use?  What sort algorithm
  does Python use?  Are these algorithms stable?  You might have to
  Google around to find these answers.

\item Many of the non-comparison sorts are linear, so why does
  Python use an $O(n \log n)$ comparison sort?

\end{enumerate}

\end{exercise}

\section{Analysis of search algorithms}

A {\bf search} is an algorithm that takes a collection and a target
item and determines whether the target is in the collection, often
returning the index of the target.
\index{search}

The simplest search algorithm is a ``linear search'', which traverses
the items of the collection in order, stopping if it finds the target.
In the worst case it has to traverse the entire collection, so the run
time is linear.
\index{linear search}

The {\tt in} operator for sequences uses a linear search; so do string
methods like {\tt find} and {\tt count}.
\index{in@{\tt in} operator}

If the elements of the sequence are in order, you can use a {\bf
  bisection search}, which is $O(\log n)$.  Bisection search is
similar to the algorithm you might use to look a word up in a
dictionary (a paper dictionary, not the data structure).  Instead of
starting at the beginning and checking each item in order, you start
with the item in the middle and check whether the word you are looking
for comes before or after.  If it comes before, then you search the
first half of the sequence.  Otherwise you search the second half.
Either way, you cut the number of remaining items in half.
\index{bisection search}

If the sequence has 1,000,000 items, it will take about 20 steps to
find the word or conclude that it's not there.  That's about 50,000
times fewer steps than a linear search needs in the worst case.
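
Here is a minimal sketch of a bisection search that also counts its
steps, so you can check the estimate of about 20 steps.  The function
name and the test data are made up for this example.

\begin{verbatim}
def bisect_search(t, target):
    """Search the sorted list t for target.
    Returns (index or None, number of steps)."""
    low, high = 0, len(t) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if t[mid] == target:
            return mid, steps
        if target < t[mid]:
            high = mid - 1      # search the first half
        else:
            low = mid + 1       # search the second half
    return None, steps

t = list(range(0, 2000000, 2))  # 1,000,000 sorted even numbers
index, steps = bisect_search(t, 123456)
missing, steps_missing = bisect_search(t, 123457)
\end{verbatim}

Whether or not the target is present, the loop runs at most 20 times
for a list of this size.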

Bisection search can be much faster than linear search, but
it requires the sequence to be in order, which might require
extra work.

There is another data structure, called a {\bf hashtable}, that
is even faster---it can do a search in constant time---and it
doesn't require the items to be sorted.  Python dictionaries
are implemented using hashtables, which is why most dictionary
operations, including the {\tt in} operator, are constant time.

\section{Hashtables}
\label{hashtable}

To explain how hashtables work and why their performance is so
good, I start with a simple implementation of a map and
gradually improve it until it's a hashtable.
\index{hashtable}

I use Python to demonstrate these implementations, but in real
life you wouldn't write code like this in Python; you would just use a
dictionary!  So for the rest of this chapter, you have to imagine that
dictionaries don't exist and you want to implement a data structure
that maps from keys to values.  The operations you have to
implement are:

\begin{description}

\item[{\tt add(k, v)}:] Add a new item that maps from key {\tt k}
to value {\tt v}.  With a Python dictionary, {\tt d}, this operation
is written {\tt d[k] = v}.

\item[{\tt get(k)}:] Look up and return the value that corresponds
to key {\tt k}.  With a Python dictionary, {\tt d}, this operation
is written {\tt d[k]} or {\tt d.get(k)}.

\end{description}

For now, I assume that each key only appears once.
The simplest implementation of this interface uses a list of
tuples, where each tuple is a key-value pair.
\index{LinearMap@{\tt LinearMap}}

\begin{verbatim}
class LinearMap:

    def __init__(self):
        self.items = []

    def add(self, k, v):
        self.items.append((k, v))

    def get(self, k):
        for key, val in self.items:
            if key == k:
                return val
        raise KeyError
\end{verbatim}

{\tt add} appends a key-value tuple to the list of items, which
takes constant time.

{\tt get} uses a {\tt for} loop to search the list:
if it finds the target key it returns the corresponding value;
otherwise it raises a {\tt KeyError}.
So {\tt get} is linear.
\index{KeyError@{\tt KeyError}}

An alternative is to keep the list sorted by key.  Then {\tt get}
could use a bisection search, which is $O(\log n)$.  But inserting a
new item in the middle of a list is linear, so this might not be the
best option.  There are other data structures that can implement {\tt
  add} and {\tt get} in log time, but that's still not as good as
constant time, so let's move on.
\index{red-black tree}

One way to improve {\tt LinearMap} is to break the list of key-value
pairs into smaller lists.  Here's an implementation called
{\tt BetterMap}, which is a list of 100 LinearMaps.  As we'll see
in a second, the order of growth for {\tt get} is still linear,
but {\tt BetterMap} is a step on the path toward hashtables:
\index{BetterMap@{\tt BetterMap}}

\begin{verbatim}
class BetterMap:

    def __init__(self, n=100):
        self.maps = []
        for i in range(n):
            self.maps.append(LinearMap())

    def find_map(self, k):
        index = hash(k) % len(self.maps)
        return self.maps[index]

    def add(self, k, v):
        m = self.find_map(k)
        m.add(k, v)

    def get(self, k):
        m = self.find_map(k)
        return m.get(k)
\end{verbatim}

\verb"__init__" makes a list of {\tt n} {\tt LinearMap}s.

\verb"find_map" is used by {\tt add} and {\tt get}
to figure out which map to put the
new item in, or which map to search.

\verb"find_map" uses the built-in function {\tt hash}, which takes
almost any Python object and returns an integer.  A limitation of this
implementation is that it only works with hashable keys.  Mutable
types like lists and dictionaries are unhashable.
\index{hash function}

Hashable objects that are considered equivalent have the same hash
value, but the converse is not necessarily true: two objects with
different values can have the same hash value.
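
A few lines of Python illustrate these properties of {\tt hash}.  This
is only a sketch; the variable names are made up for the example.

\begin{verbatim}
# 1, 1.0 and True are considered equivalent,
# so they have the same hash value
same = hash(1) == hash(1.0) == hash(True)

# immutable objects like tuples and strings are hashable
h = hash(('a', 1))

# mutable objects like lists are not
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
\end{verbatim}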

\verb"find_map" uses the modulus operator to wrap the hash values
into the range from 0 to {\tt len(self.maps) - 1}, so the result is a legal
index into the list.  Of course, this means that many different
hash values will wrap onto the same index.  But if the hash function
spreads things out pretty evenly (which is what hash functions
are designed to do), then we expect $n/100$ items per LinearMap.
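
The following sketch wraps the hash values of 10,000 made-up keys onto
100 indices and counts how many keys land on each index.  The names
\verb"num_maps", {\tt keys} and {\tt counts} are hypothetical.  The exact
counts can differ from run to run, because Python randomizes the hash
values of strings by default.

\begin{verbatim}
from collections import Counter

num_maps = 100
keys = ['key%d' % i for i in range(10000)]

# hash values can be negative, but with a positive
# right operand the result of % never is
indices = [hash(k) % num_maps for k in keys]

# map from each index to the number of keys that wrap onto it
counts = Counter(indices)
\end{verbatim}

If the hash function spreads the keys evenly, each value in
{\tt counts} should be close to 100.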

Since the run time of {\tt LinearMap.get} is proportional to the
number of items, we expect BetterMap to be about 100 times faster
than LinearMap.  The order of growth is still linear, but the
leading coefficient is smaller.  That's nice, but still not
as good as a hashtable.

Here (finally) is the crucial idea that makes hashtables fast: if you
can keep the maximum length of the LinearMaps bounded, {\tt
  LinearMap.get} is constant time.  All you have to do is keep track
of the number of items and when the number of
items per LinearMap exceeds a threshold, resize the hashtable by
adding more LinearMaps.
\index{bounded}

Here is an implementation of a hashtable:
\index{HashMap}

\begin{verbatim}
class HashMap:

    def __init__(self):
        self.maps = BetterMap(2)
        self.num = 0

    def get(self, k):
        return self.maps.get(k)

    def add(self, k, v):
        if self.num == len(self.maps.maps):
            self.resize()

        self.maps.add(k, v)
        self.num += 1

    def resize(self):
        new_maps = BetterMap(self.num * 2)

        for m in self.maps.maps:
            for k, v in m.items:
                new_maps.add(k, v)

        self.maps = new_maps
\end{verbatim}

\verb"__init__" creates a {\tt BetterMap} and initializes {\tt num}, which keeps track of the number of items.

{\tt get} just dispatches to {\tt BetterMap}.  The real work happens
in {\tt add}, which checks the number of items and the size of the
{\tt BetterMap}: if they are equal, the average number of items per
LinearMap is 1, so it calls {\tt resize}.

{\tt resize} makes a new {\tt BetterMap}, twice as big as the previous
one, and then ``rehashes'' the items from the old map to the new.

Rehashing is necessary because changing the number of LinearMaps
changes the divisor of the modulus operator in
\verb"find_map".  That means that some objects that used
to hash into the same LinearMap will get split up (which is
what we wanted, right?).
\index{rehashing}

Rehashing is linear, so
{\tt resize} is linear, which might seem bad, since I promised
that {\tt add} would be constant time.  But remember that
we don't have to resize every time, so {\tt add} is usually
constant time and only occasionally linear.  The total amount
of work to run {\tt add} $n$ times is proportional to $n$,
so the average time of each {\tt add} is constant time!
\index{constant time}

To see how this works, think about starting with an empty
{\tt HashMap} and adding a sequence of items.  We start with 2 LinearMaps,
so the first 2 adds are fast (no resizing required).  Let's
say that they take one unit of work each.  The next add
requires a resize, so we have to rehash the first two
items (let's call that 2 more units of work) and then
add the third item (one more unit).  Adding the next item
costs 1 unit, so the total so far is
6 units of work for 4 items.

The next {\tt add} costs 5 units, but the next three
are only one unit each, so the total is 14 units for the
first 8 adds.

The next {\tt add} costs 9 units, but then we can add 7 more
before the next resize, so the total is 30 units for the
first 16 adds.

After 32 adds, the total cost is 62 units, and I hope you are starting
to see a pattern.  After $n$ adds, where $n$ is a power of two, the
total cost is $2n-2$ units, so the average work per add is
a little less than 2 units.  When $n$ is a power of two, that's
the best case; for other values of $n$ the average work is a little
higher, but that's not important.  The important thing is that it
is $O(1)$.
\index{average cost}
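
You can check this arithmetic with a short simulation.  The following
sketch does not store any items; it only counts units of work under the
cost model above (one unit per add, plus one unit per item rehashed).
The function \verb"total_cost" is hypothetical.

\begin{verbatim}
def total_cost(n, initial_size=2):
    """Count the units of work for n adds."""
    size = initial_size     # number of LinearMaps
    num = 0                 # number of items
    cost = 0
    for _ in range(n):
        if num == size:
            cost += num     # rehash every existing item
            size = num * 2  # double the number of LinearMaps
        cost += 1           # the add itself
        num += 1
    return cost

costs = [total_cost(n) for n in (4, 8, 16, 32)]
\end{verbatim}

The list {\tt costs} contains 6, 14, 30 and 62, which agrees with the
totals above.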

Figure~\ref{fig.hash} shows how this works graphically.  Each
block represents a unit of work.  The columns show the total
work for each add in order from left to right: the first two
{\tt add}s cost 1 unit each, the third costs 3 units, etc.

\begin{figure}
\centerline{\includegraphics[width=5.5in]{figs/towers.pdf}}
\caption{The cost of a hashtable add.\label{fig.hash}}
\end{figure}

The extra work of rehashing appears as a sequence of increasingly
tall towers with increasing space between them.  Now if you knock
over the towers, spreading the cost of resizing over all
adds, you can see graphically that the total cost after $n$
adds is $2n - 2$.

An important feature of this algorithm is that when we resize the
hashtable, it grows geometrically; that is, we multiply the size by a
constant.  If you increase the size
arithmetically---adding a fixed number each time---the average time
per {\tt add} is linear.
\index{geometric resizing}
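
To compare the two strategies, here is a sketch that counts units of
work (one per add, plus one per item rehashed) for a growth rule passed
as a parameter.  The function names, the parameter {\tt grow}, and the
increment of 10 are assumptions made for this example.

\begin{verbatim}
def total_cost(n, grow):
    """Units of work for n adds; grow(size) returns the
    new size when a resize is needed."""
    size, num, cost = 2, 0, 0
    for _ in range(n):
        if num == size:
            cost += num         # rehash every existing item
            size = grow(size)
        cost += 1               # the add itself
        num += 1
    return cost

def average_cost(n, grow):
    return total_cost(n, grow) / n

sizes = (1000, 10000)
geometric = [average_cost(n, lambda s: s * 2) for n in sizes]
arithmetic = [average_cost(n, lambda s: s + 10) for n in sizes]
\end{verbatim}

With geometric growth the average cost per add stays below 3 units for
both values of $n$.  With arithmetic growth, making $n$ ten times
bigger makes the average cost about ten times bigger.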

You can download my implementation of HashMap from
\url{http://thinkpython2.com/code/Map.py}, but remember that there
is no reason to use it; if you want a map, just use a Python dictionary.

\section{Glossary}

\begin{description}

\item[analysis of algorithms:] A way to compare algorithms in terms of
their run time and/or space requirements.
\index{analysis of algorithms}

\item[machine model:] A simplified representation of a computer used
to describe algorithms.
\index{machine model}

\item[worst case:] The input that makes a given algorithm run slowest (or
require the most space).
\index{worst case}

\item[leading term:] In a polynomial, the term with the highest exponent.
\index{leading term}

\item[crossover point:] The problem size where two algorithms require
the same run time or space.
\index{crossover point}

\item[order of growth:] A set of functions that all grow in a way
considered equivalent for purposes of analysis of algorithms.
For example, all functions that grow linearly belong to the same
order of growth.
\index{order of growth}

\item[Big-Oh notation:] Notation for representing an order of growth;
for example, $O(n)$ represents the set of functions that grow
linearly.
\index{Big-Oh notation}

\item[linear:] An algorithm whose run time is proportional to
problem size, at least for large problem sizes.
\index{linear}

\item[quadratic:] An algorithm whose run time is proportional to
$n^2$, where $n$ is a measure of problem size.
\index{quadratic}

\item[search:] The problem of locating an element of a collection
(like a list or dictionary) or determining that it is not present.
\index{search}

\item[hashtable:] A data structure that represents a collection of
key-value pairs and performs search in constant time.
\index{hashtable}

\end{description}

\printindex

\clearemptydoublepage
%\blankpage
%\blankpage
%\blankpage

\end{document}