% LaTeX source for ``Think DSP: Digital Signal Processing for Programmers''
% Copyright 2014 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

\documentclass[12pt]{book}
\usepackage[width=5.5in,height=8.5in,
  hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{fancyhdr}
\usepackage[bookmarks]{hyperref}

\title{Think DSP}
\author{Allen B. Downey}

\newcommand{\thetitle}{Think DSP: Digital Signal Processing in Python}
\newcommand{\theversion}{0.10.4}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows in the HTML version
\setlinkstext
{\imgsrc[ALT="Previous"]{back.png}}
{\imgsrc[ALT="Up"]{up.png}}
{\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}

\frontmatter

\ifplastex

\else
\fi

\ifplastex
\usepackage{localdef}
\maketitle

\else

\newtheoremstyle{exercise}% name of the style to be used
{\topsep}% measure of space to leave above the theorem. E.g.: 3pt
{\topsep}% measure of space to leave below the theorem. E.g.: 3pt
{}% name of font to use in the body of the theorem
{0pt}% measure of space to indent
{\bfseries}% name of head font
{}% punctuation between head and body
{ }% space after theorem head; " " = normal interword space
{}% Manually specify head

\theoremstyle{exercise}
\newtheorem{exercise}{Exercise}[chapter]

\input{latexonly}

\begin{latexonly}

\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vspace{1in}

{\Large
Allen B. Downey\\
}

\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

\vfill

\end{flushright}

%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

Copyright \copyright ~2014 Allen B.
Downey.151152153\vspace{0.2in}154155\begin{flushleft}156Green Tea Press \\1579 Washburn Ave \\158Needham MA 02492159\end{flushleft}160161Permission is granted to copy, distribute, and/or modify this document162under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported163License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.164165\vspace{0.2in}166167\end{latexonly}168169170% HTMLONLY171172\begin{htmlonly}173174% TITLE PAGE FOR HTML VERSION175176{\Large \thetitle}177178{\large Allen B. Downey}179180Version \theversion181182\vspace{0.25in}183184Copyright 2012 Allen B. Downey185186\vspace{0.25in}187188Permission is granted to copy, distribute, and/or modify this document189under the terms of the Creative Commons Attribution-NonCommercial 3.0190Unported License, which is available at191\url{http://creativecommons.org/licenses/by-nc/3.0/}.192193\setcounter{chapter}{-1}194195\end{htmlonly}196197\fi198% END OF THE PART WE SKIP FOR PLASTEX199200\chapter{Preface}201\label{preface}202203Signal processing is one of my favorite topics. It is useful204in many areas of science and engineering, and if you understand205the fundamental ideas, it provides insight into many things206we see in the world, and especially the things we hear.207208But unless you studied electrical or mechanical engineering, you209probably haven't had a chance to learn about signal processing. The210problem is that most books (and the classes that use them) present the211material bottom-up, starting with mathematical abstractions like212phasors. And they tend to be theoretical, with few applications and213little apparent relevance.214215The premise of this book is that if you know how to program, you216can use that skill to learn other things, and have fun doing it.217218With a programming-based approach, I can present the most important219ideas right away. 
By the end of the first chapter, you can analyze
sound recordings and other signals, and generate new sounds. Each
chapter introduces a new technique and an application you can
apply to real signals. At each step you learn how to use a
technique first, and then how it works.

This approach is more practical and, I hope you'll agree, more fun.


\section{Who is this book for?}

The examples and supporting code for this book are in Python. You
should know core Python and you should be
familiar with object-oriented features, at least using objects if not
defining your own.

If you are not already familiar with Python, you might want to start
with my other book, {\it Think Python}, which is an introduction to
Python for people who have never programmed, or Mark
Lutz's {\it Learning Python}, which might be better for people with
programming experience.

I use NumPy and SciPy extensively. If you are familiar with them
already, that's great, but I will also explain the functions
and data structures I use.

I assume that the reader knows basic mathematics, including complex
numbers. You don't need much calculus; if you understand the concepts
of integration and differentiation, that will do.
I use some linear algebra, but I will explain it as we
go along.


\section{Using the code}
\label{code}

The code and sound samples used in this book are available from
\url{https://github.com/AllenDowney/ThinkDSP}. Git is a version
control system that allows you to keep track of the files that
make up a project. A collection of files under Git's control is
called a {\bf repository}.
GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
\index{repository}
\index{Git}
\index{GitHub}

The GitHub homepage for my repository provides several ways to
work with the code:

\begin{itemize}

\item You can create a copy of my repository
on GitHub by pressing the {\sf Fork} button. If you don't already
have a GitHub account, you'll need to create one. After forking, you'll
have your own repository on GitHub that you can use to keep track
of code you write while working on this book. Then you can
clone the repo, which means that you copy the files
to your computer.
\index{fork}

\item Or you could clone
my repository. You don't need a GitHub account to do this, but you
won't be able to write your changes back to GitHub.
\index{clone}

\item If you don't want to use Git at all, you can download the files
in a Zip file using the button in the lower-right corner of the
GitHub page.

\end{itemize}

All of the code is written to work in both Python 2 and Python 3
with no translation.

I developed this book using Anaconda from
Continuum Analytics, which is a free Python distribution that includes
all the packages you'll need to run the code (and lots more).
I found Anaconda easy to install. By default it does a user-level
installation, not system-level, so you don't need administrative
privileges. And it supports both Python 2 and Python 3.
You can
download Anaconda from \url{http://continuum.io/downloads}.
\index{Anaconda}

If you don't want to use Anaconda, you will need the following
packages:

\begin{itemize}

\item NumPy for basic numerical computation, \url{http://www.numpy.org/};
\index{NumPy}

\item SciPy for scientific computation,
\url{http://www.scipy.org/};
\index{SciPy}

\item matplotlib for visualization, \url{http://matplotlib.org/}.
\index{matplotlib}

\end{itemize}

Although these are commonly used packages, they are not included with
all Python installations, and they can be hard to install in some
environments. If you have trouble installing them, I
recommend using Anaconda or one of the other Python distributions
that include these packages.
\index{installation}

Most exercises use Python scripts, but some also use the IPython
notebook. If you have not used IPython notebook before, I suggest
you start with the documentation at
\url{http://ipython.org/ipython-doc/stable/notebook/notebook.html}.
\index{IPython}

Good luck, and have fun!


\section*{Contributor List}

If you have a suggestion or correction, please send email to
{\tt downey@allendowney.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).
\index{contributors}

If you include at least part of the sentence the
error appears in, that makes it easy for me to search.
Page and
section numbers are fine, too, but not as easy to work with.
Thanks!

\small

\begin{itemize}

\item Before I started writing, my thoughts about this book
benefited from conversations with Boulos Harb at Google and
Aurelio Ramos, formerly at Harmonix Music Systems.

\item During the Fall 2013 semester, Nathan Lintz and Ian Daniher
worked with me on an independent study project and helped me with
the first draft of this book.

\item On Reddit's DSP forum, the anonymous user RamjetSoundwave
helped me fix a problem with my implementation of Brownian Noise.
And andodli found a typo.

\item In Spring 2015 I had the pleasure of teaching this material
along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy.
Both made many suggestions and corrections.

\item Silas Gyger corrected an arithmetic error.

% ENDCONTRIB

\end{itemize}

\normalsize

\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter


\chapter{Sounds and signals}
\label{sounds}

A {\bf signal} represents a quantity that varies in time,
or space, or both. That definition is pretty abstract, so let's start
with a concrete example: sound. Sound is variation in air pressure.
A sound signal represents variations in air pressure over time.

A microphone is a device that measures these variations and generates
an electrical signal that represents sound. A speaker is a device
that takes an electrical signal and produces sound.
Microphones and speakers are called {\bf transducers} because they
transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for
synthesizing, transforming, and analyzing signals.
I will focus on
sound signals, but the same methods apply to electronic signals,
mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like
elevation along a hiking trail. And they apply to signals in more
than one dimension, like an image, which you can think of as a signal
that varies in two-dimensional space. Or a movie, which is
a signal that varies in two-dimensional space {\it and} time.

But we start with simple one-dimensional sound.

The code for this chapter is in {\tt chap01.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp01}.


\section{Periodic signals}
\label{violin}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds1.eps}}
\caption{Segment from a recording of a bell.}
\label{fig.sounds1}
\end{figure}

We'll start with {\bf periodic signals}, which are signals that
repeat themselves after some period of time. For example, if you
strike a bell, it vibrates and generates sound. If you record
that sound and plot the transduced signal, it looks like
Figure~\ref{fig.sounds1}.

This signal resembles a {\bf sinusoid}, which means it has the same
shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration
to show three full periods, also known as {\bf cycles}.
The duration of each cycle is about 2.3 ms.

The {\bf frequency} of a signal is the number of cycles
per second, which is the inverse of the period.
The units of frequency are cycles per second, or {\bf Hertz},
abbreviated ``Hz''.

The frequency of this signal is about 439 Hz, slightly lower than 440
Hz, which is the standard tuning pitch for orchestral music. The
musical name of this note is A, or more specifically, A4.
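Since frequency is the inverse of the period, we can check these numbers
with a quick computation (plain Python; the 2.28 ms period is an
approximate reading from the plot, not a value computed from the
recording):

\begin{verbatim}
period = 2.28e-3       # approximate cycle duration in seconds
freq = 1 / period
print(round(freq, 1))  # 438.6, just below the 440 Hz tuning pitch
\end{verbatim}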
If you are
not familiar with ``scientific pitch notation'', the numerical suffix
indicates which octave the note is in. A4 is the A above middle C.
A5 is one octave higher. See
\url{http://en.wikipedia.org/wiki/Scientific_pitch_notation}.

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds2.eps}}
\caption{Segment from a recording of a violin.}
\label{fig.sounds2}
\end{figure}

A tuning fork generates a sinusoid because the vibration of the tines
is a form of simple harmonic motion. Most musical instruments
produce periodic signals, but the shape of these signals is not
sinusoidal. For example, Figure~\ref{fig.sounds2} shows a segment
from a recording of a violin playing
Boccherini's String Quintet No. 5 in E, 3rd
movement.

%\footnote{The recording is from
% \url{http://www.freesound.org/people/jcveliz/sounds/92002/}.
%I identified the piece using \url{http://www.musipedia.org}.}
% Parson's code: DUUDDUURDR

Again we can see that the signal is periodic, but the shape of the
signal is more complex. The shape of a periodic signal is called
the {\bf waveform}. Most musical instruments produce waveforms more
complex than a sinusoid. The shape of the waveform determines the
musical {\bf timbre}, which is our perception of the quality of the
sound.
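We can imitate this kind of complex waveform with a sketch in plain
NumPy (not {\tt thinkdsp}): adding harmonics to a 440 Hz fundamental
yields a waveform that is no longer sinusoidal in shape, but is still
periodic with the period of the fundamental. The framerate here is
chosen so that one period is exactly 100 samples:

\begin{verbatim}
import numpy as np

framerate = 44000    # so one 440 Hz cycle is exactly 100 samples
ts = np.arange(framerate) / framerate
ys = sum(np.sin(2 * np.pi * 440 * k * ts) / k for k in (1, 2, 3))

samples_per_period = framerate // 440
shifted = ys[samples_per_period:samples_per_period + 1000]
print(np.allclose(ys[:1000], shifted))
# True: the compound waveform repeats with the fundamental's period
\end{verbatim}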
People usually perceive complex waveforms as richer, warmer, and
more interesting than sinusoids.


\section{Spectral decomposition}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds3.eps}}
\caption{Spectrum of a segment from the violin recording.}
\label{fig.sounds3}
\end{figure}

The most important topic in this book is {\bf spectral decomposition},
which is the idea that any signal can be expressed as the sum of
sinusoids with different frequencies.

The most important mathematical idea in this book is the {\bf discrete
Fourier transform}, or {\bf DFT}, which takes a signal and produces
its {\bf spectrum}. The spectrum is the set of sinusoids that add up to
produce the signal.

And the most important algorithm in this book is the {\bf Fast
Fourier transform}, or {\bf FFT}, which is an efficient way to
compute the DFT.

For example, Figure~\ref{fig.sounds3} shows the spectrum of the violin
recording in Figure~\ref{fig.sounds2}. The x-axis is the range of
frequencies that make up the signal. The y-axis shows the strength
or {\bf amplitude} of each frequency component.

The lowest frequency component is called the {\bf fundamental
frequency}.
The fundamental frequency of this signal is near 440 Hz
(actually a little lower, or ``flat'').

In this signal the fundamental frequency has the largest amplitude,
so it is also the {\bf dominant frequency}.
Normally the perceived pitch of a sound is determined by the
fundamental frequency, even if it is not dominant.

The other spikes in the spectrum are at frequencies 880, 1320, 1760, and
2200, which are integer multiples of the fundamental.
These components are called {\bf harmonics} because they are
musically harmonious with the fundamental:

\begin{itemize}

\item 880 is the frequency of A5, one octave higher than the fundamental.

\item 1320 is approximately E6, which is a major fifth above A5.
If you are not familiar with musical intervals like ``major fifth'', see
\url{https://en.wikipedia.org/wiki/Interval_(music)}.

\item 1760 is A6, two octaves above the fundamental.

\item 2200 is approximately C$\sharp$7, which is a major third
above A6.

\end{itemize}

These harmonics make up the notes of an A major
chord, although not all in the same octave. Some of them are only
approximate because the notes that make up Western music have been
adjusted for {\bf equal temperament} (see
\url{http://en.wikipedia.org/wiki/Equal_temperament}).

Given the harmonics and their amplitudes, you can reconstruct the
signal by adding up sinusoids.
Next we'll see how.


\section{Signals}

I wrote a Python module called {\tt thinkdsp.py} that contains classes
and functions for working with signals and spectrums\footnote{The
plural of ``spectrum'' is often written ``spectra'', but I prefer
to use standard English plurals. If you are familiar with ``spectra'',
I hope my choice doesn't sound too strange.}.
You
will find it in the repository for this book (see Section~\ref{code}).

To represent signals, {\tt thinkdsp} provides a class called
{\tt Signal}, which is the parent class for several signal types,
including {\tt Sinusoid}, which represents both sine and cosine
signals.

{\tt thinkdsp} provides functions to create sine and cosine signals:

\begin{verbatim}
cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)
\end{verbatim}

{\tt freq} is frequency in Hz. {\tt amp} is amplitude in unspecified
units where 1.0 is defined as the largest amplitude we can record or
play back.

{\tt offset} is a {\bf phase offset} in radians. Phase offset
determines where in the period the signal starts. For example, a
sine signal with {\tt offset=0} starts at $\sin 0$, which is 0.
With {\tt offset=pi/2} it starts at $\sin \pi/2$, which is 1. A cosine
signal with {\tt offset=0} starts at $\cos 0$, which is 1. In fact, a
cosine signal with {\tt offset=0} is identical to a sine signal with
{\tt offset=pi/2}.

Signals have an \verb"__add__" method, so you can use the {\tt +}
operator to add them:

\begin{verbatim}
mix = sin_sig + cos_sig
\end{verbatim}

The result is a {\tt SumSignal}, which represents the sum of two
or more signals.

A Signal is basically a Python representation of a mathematical
function. Most signals are defined for all values of {\tt t},
from negative infinity to infinity.

You can't do much with a Signal until you evaluate it. In this
context, ``evaluate'' means taking a sequence of points in time, {\tt
ts}, and computing the corresponding values of the signal, {\tt ys}.
I represent {\tt ts} and {\tt ys} using NumPy arrays and encapsulate
them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in
time.
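Stripped of the class machinery, evaluating a signal just means
computing {\tt ys} from {\tt ts}. Here is a sketch in plain NumPy (not
the actual {\tt thinkdsp} implementation) that also confirms the
phase-offset claim above, since $\sin(\theta + \pi/2) = \cos \theta$:

\begin{verbatim}
import numpy as np

freq = 440
ts = np.arange(0, 0.01, 1/11025.0)   # sample times in seconds
ys = np.sin(2 * np.pi * freq * ts)   # corresponding signal values

# a cosine with offset=0 equals a sine with offset=pi/2
same = np.allclose(np.cos(2 * np.pi * freq * ts),
                   np.sin(2 * np.pi * freq * ts + np.pi / 2))
print(same)                          # True
\end{verbatim}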
Each point in time is called a {\bf frame} (a term borrowed
from movies and video). The measurement itself is called a
{\bf sample}, although ``frame'' and ``sample'' are sometimes
used interchangeably.

{\tt Signal} provides \verb"make_wave", which returns a new
Wave object:

\begin{verbatim}
wave = mix.make_wave(duration=0.5, start=0, framerate=11025)
\end{verbatim}

{\tt duration} is the length of the Wave in seconds. {\tt start} is
the start time, also in seconds. {\tt framerate} is the (integer)
number of frames per second, which is also the number of samples
per second.

11,025 frames per second is one of several framerates commonly used in
audio file formats, including Waveform Audio File (WAV) and mp3.

This example evaluates the signal from {\tt t=0} to {\tt t=0.5} at
5,513 equally-spaced frames (because 5,513 is half of 11,025, rounded
up). The time between frames, or {\bf timestep}, is {\tt 1/11025}
seconds, or about 91 $\mu$s.

{\tt Wave} provides a {\tt plot} method that uses {\tt pyplot}.
You can plot the wave like this:

\begin{verbatim}
wave.plot()
pyplot.show()
\end{verbatim}

{\tt pyplot} is part of {\tt matplotlib}; it is included in many
Python distributions, or you might have to install it.

\begin{figure}
% sounds4.py
\centerline{\includegraphics[height=2.5in]{figs/sounds4.eps}}
\caption{Segment from a mixture of two sinusoid signals.}
\label{fig.sounds4}
\end{figure}

At {\tt freq=440} there are 220 periods in 0.5 seconds, so this plot
would look like a solid block of color. To zoom in on a small number
of periods, we can use {\tt segment}, which copies a segment of a Wave
and returns a new wave:

\begin{verbatim}
period = mix.period
segment = wave.segment(start=0, duration=period*3)
\end{verbatim}

{\tt period} is a property of a Signal; it returns the period in seconds.

{\tt start} and {\tt duration} are in seconds.
This example copies
the first three periods from {\tt mix}. The result is a Wave object.

If we plot {\tt segment}, it looks like Figure~\ref{fig.sounds4}.
This signal contains two frequency components, so it is more
complicated than the signal from the tuning fork, but less complicated
than the violin.


\section{Reading and writing Waves}

{\tt thinkdsp} provides \verb"read_wave", which reads a WAV
file and returns a Wave:

\begin{verbatim}
violin_wave = thinkdsp.read_wave('input.wav')
\end{verbatim}

And {\tt Wave} provides {\tt write}, which writes a WAV file:

\begin{verbatim}
wave.write(filename='output.wav')
\end{verbatim}

You can listen to the Wave with any media player that plays WAV
files. On UNIX systems, I use {\tt aplay}, which is simple, robust,
and included in many Linux distributions.

{\tt thinkdsp} also provides \verb"play_wave", which runs
the media player as a subprocess:

\begin{verbatim}
thinkdsp.play_wave(filename='output.wav', player='aplay')
\end{verbatim}

It uses {\tt aplay} by default, but you can provide the
name of another player.


\section{Spectrums}
\label{spectrums}

{\tt Wave} provides \verb"make_spectrum", which returns a
{\tt Spectrum}:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

And {\tt Spectrum} provides {\tt plot}:

\begin{verbatim}
spectrum.plot()
thinkplot.show()
\end{verbatim}

{\tt thinkplot} is a module I wrote to provide wrappers around some of
the functions in {\tt pyplot}.
It is included in the
Git repository for this book (see Section~\ref{code}).

{\tt Spectrum} provides three methods that modify the spectrum:

\begin{itemize}

\item \verb"low_pass" applies a low-pass filter, which means that
components above a given cutoff frequency are attenuated (that is,
reduced in magnitude) by a factor.

\item \verb"high_pass" applies a high-pass filter, which means that
it attenuates components below the cutoff.

\item \verb"band_stop" attenuates components in the band of
frequencies between two cutoffs.

\end{itemize}

This example attenuates all frequencies above 600 Hz by 99\%:

\begin{verbatim}
spectrum.low_pass(cutoff=600, factor=0.01)
\end{verbatim}

A low-pass filter removes bright, high-frequency sounds, so
the result sounds muffled and darker. To hear what it sounds
like, you can convert the Spectrum back to a Wave, and then play it.

\begin{verbatim}
wave = spectrum.make_wave()
wave.play('temp.wav')
\end{verbatim}

The {\tt play} method writes the wave to a file and then plays it.
If you use IPython notebooks, you can use \verb"make_audio", which
makes an Audio widget that plays the sound.


\section{Wave objects}

\begin{figure}
% http://yuml.me/edit/5294b377
% pdftops -eps diagram1.pdf
\centerline{\includegraphics[width=3.5in]{figs/diagram1.eps}}
\caption{Relationships among the classes in {\tt thinkdsp}.}
\label{fig.diagram1}
\end{figure}

There is nothing very complicated in {\tt thinkdsp.py}. Most
of the functions it provides are thin wrappers around functions
from NumPy and SciPy.

The primary classes in {\tt thinkdsp} are Signal, Wave, and Spectrum.
Given a Signal, you can make a Wave. Given a Wave, you can
make a Spectrum, and vice versa.
These relationships are shown
in Figure~\ref{fig.diagram1}.

A Wave object contains three attributes: {\tt ys} is a NumPy array
that contains the values in the signal; {\tt ts} is an array of the
times where the signal was evaluated or sampled; and {\tt
framerate} is the number of samples per unit of time. The
unit of time is usually seconds, but it doesn't have to be. In
one of my examples, it's days.

Wave also provides three read-only properties: {\tt start},
{\tt end}, and {\tt duration}. If you modify {\tt ts}, these
properties change accordingly.

To modify a wave, you can access the {\tt ts} and {\tt ys} directly.
For example:

\begin{verbatim}
wave.ys *= 2
wave.ts += 1
\end{verbatim}

The first line scales the wave by a factor of 2, making
it louder. The second line shifts the wave in time, making it
start 1 second later.

But Wave provides methods that perform many common operations.
For example, the same two transformations could be written:

\begin{verbatim}
wave.scale(2)
wave.shift(1)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://think-dsp.com/thinkdsp.html}.


\section{Signal objects}
\label{sigobs}

Signal is a parent class that provides functions common to all
kinds of signals, like \verb"make_wave". Child classes inherit
these methods and provide {\tt evaluate}, which evaluates the
signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this
definition:

\begin{verbatim}
class Sinusoid(Signal):

    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func
\end{verbatim}

The parameters of \verb"__init__" are:

\begin{itemize}

\item {\tt freq}: frequency in cycles per second, or Hz.

\item {\tt amp}: amplitude.
The units of amplitude are arbitrary,
usually chosen so 1.0 corresponds to the maximum input from a
microphone or maximum output to a speaker.

\item {\tt offset}: indicates where in its period the signal starts;
{\tt offset} is in units of radians, for reasons I explain below.

\item {\tt func}: a Python function used
to evaluate the signal at a particular point in time. It is
usually either {\tt np.sin} or {\tt np.cos}, yielding a sine or
cosine signal.

\end{itemize}

Like many init methods, this one just tucks the parameters away for
future use.

Signal provides \verb"make_wave", which looks like
this:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)
\end{verbatim}

{\tt start} and {\tt duration} are the start time and duration
in seconds. {\tt framerate} is the number of frames (samples)
per second.

{\tt n} is the number of samples, and {\tt ts} is a NumPy array
of sample times.

To compute the {\tt ys}, \verb"make_wave" invokes {\tt evaluate},
which is provided by {\tt Sinusoid}:

\begin{verbatim}
def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys
\end{verbatim}

Let's unwind this function one step at a time:

\begin{enumerate}

\item {\tt self.freq} is frequency in cycles per second, and each
element of {\tt ts} is a time in seconds, so their product is the
number of cycles since the start time.

\item {\tt PI2} is a constant that stores $2 \pi$. Multiplying by
{\tt PI2} converts from cycles to {\bf phase}. You can think of
phase as ``cycles since the start time'' expressed in radians.
Each
cycle is $2 \pi$ radians.

\item {\tt self.offset} is the phase when $t=0$.
It has the effect of shifting the signal left or right in time.

\item If {\tt self.func} is {\tt np.sin} or {\tt np.cos}, the result is a
value between $-1$ and $+1$.

\item Multiplying by {\tt self.amp} yields a signal that ranges from
{\tt -self.amp} to {\tt +self.amp}.

\end{enumerate}

In math notation, {\tt evaluate} is written like this:
%
\[ y = A \cos (2 \pi f t + \phi_0) \]
%
where $A$ is amplitude, $f$ is frequency, $t$ is time, and $\phi_0$
is the phase offset. It may seem like I wrote a lot of code
to evaluate one simple expression, but as we'll see, this code
provides a framework for dealing with all kinds of signals, not
just sinusoids.


\section{Exercises}

Before you begin these exercises, you should download the code
for this book, following the instructions in Section~\ref{code}.

Solutions to these exercises are in {\tt chap01soln.ipynb}.

\begin{exercise}
If you have IPython, load {\tt chap01.ipynb}, read through it, and run
the examples. You can also view this notebook at
\url{http://tinyurl.com/thinkdsp01}.
\end{exercise}


\begin{exercise}
Go to \url{http://freesound.org} and download a sound sample that
includes music, speech, or other sounds that have a well-defined pitch.
Select a roughly half-second segment where the pitch is
constant. Compute and plot the spectrum of the segment you selected.
What connection can you make between the timbre of the sound and the
harmonic structure you see in the spectrum?

Use \verb"high_pass", \verb"low_pass", and \verb"band_stop" to
filter out some of the harmonics. Then convert the spectrum back
to a wave and listen to it.
How does the sound relate to the
changes you made in the spectrum?
\end{exercise}


\begin{exercise}
Synthesize a compound signal by creating SinSignal and CosSignal
objects and adding them up. Evaluate the signal to get a Wave,
and listen to it. Compute its Spectrum and plot it.
What happens if you add frequency
components that are not multiples of the fundamental?
\end{exercise}


\begin{exercise}
Write a function called {\tt stretch} that takes a Wave and a stretch
factor and speeds up or slows down the wave by modifying {\tt ts} and
{\tt framerate}. Hint: it should only take two lines of code.
\end{exercise}


\chapter{Harmonics}
\label{harmonics}

In this chapter I present several new waveforms; we will look at
their spectrums to understand their {\bf harmonic structure}, which is
the set of sinusoids they are made up of.

I'll also introduce one of the most important phenomena in digital
signal processing: aliasing. And I'll explain a little more about how
the Spectrum class works.

The code for this chapter is in {\tt chap02.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp02}.


\section{Triangle waves}
\label{triangle}

A sinusoid contains only one frequency component, so its spectrum
has only one peak. More complicated waveforms, like the
violin recording, yield DFTs with many peaks. In this section we
investigate the relationship between waveforms and their spectrums.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-1.eps}}
\caption{Segment of a triangle signal at 200 Hz.}
\label{fig.triangle.200.1}
\end{figure}

I'll start with a triangle waveform, which is like a straight-line
version of a sinusoid.
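Before looking at the triangle wave, it is worth verifying the baseline
claim numerically: the spectrum of a pure sinusoid has a single peak,
at the sinusoid's frequency. This sketch uses NumPy's FFT functions
directly rather than the {\tt Spectrum} class:

\begin{verbatim}
import numpy as np

framerate = 10000
ts = np.arange(framerate) / framerate   # one second of sample times
ys = np.sin(2 * np.pi * 200 * ts)       # a 200 Hz sinusoid

# with a one-second wave, bin k of the DFT corresponds to k Hz
amps = np.abs(np.fft.rfft(ys))
print(np.argmax(amps))                  # 200: the single peak
\end{verbatim}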
Figure~\ref{fig.triangle.200.1} shows a
triangle waveform with frequency 200 Hz.

To generate a triangle wave, you can use {\tt thinkdsp.TriangleSignal}:

\begin{verbatim}
class TriangleSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt TriangleSignal} inherits \verb"__init__" from {\tt Sinusoid},
so it takes the same arguments: {\tt freq}, {\tt amp}, and {\tt offset}.

The only difference is {\tt evaluate}. As we saw before,
{\tt ts} is the sequence of sample times where we want to
evaluate the signal.

There are many ways to generate a triangle wave. The details
are not important, but here's how {\tt evaluate} works:

\begin{enumerate}

\item {\tt cycles} is the number of cycles since the start time.
{\tt np.modf} splits the number of cycles into the fraction
part, stored in {\tt frac}, and the integer part, which is
ignored\footnote{Using an underscore as a variable name is a convention that
means, ``I don't intend to use this value.''}.

\item {\tt frac} is a sequence that ramps from 0 to 1 with the given
frequency.
Subtracting 0.5 yields values between -0.5 and 0.5.
Taking the absolute value yields a waveform that zig-zags between
0.5 and 0.

\item {\tt unbias} shifts the waveform down so it is centered at 0; then
{\tt normalize} scales it to the given amplitude, {\tt amp}.

\end{enumerate}

Here's the code that generates Figure~\ref{fig.triangle.200.1}:

\begin{verbatim}
signal = thinkdsp.TriangleSignal(200)
signal.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-2.eps}}
\caption{Spectrum of a triangle signal at 200 Hz.}
\label{fig.triangle.200.2}
\end{figure}

Next we can use the Signal to make a Wave, and use the Wave to
make a Spectrum:

\begin{verbatim}
wave = signal.make_wave(duration=0.5, framerate=10000)
spectrum = wave.make_spectrum()
spectrum.plot()
\end{verbatim}

Figure~\ref{fig.triangle.200.2} shows the result. As expected, the
highest peak is at the fundamental frequency, 200 Hz, and there
are additional peaks at harmonic frequencies, which are integer
multiples of 200.

But one surprise is that there are no peaks at the even multiples:
400, 800, etc. The harmonics of a triangle wave are all
odd multiples of the fundamental frequency, in this example
600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the
amplitude and frequency of the harmonics. Their amplitude drops off
in proportion to frequency squared. For example, the frequency ratio
of the first two harmonics (200 and 600 Hz) is 3, and the amplitude
ratio is approximately 9. The frequency ratio of the next two
harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is
approximately $1.7^2 = 2.9$.
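This falloff can be checked numerically. Here is a stand-alone sketch
(plain NumPy, not the book's {\tt thinkdsp} code) that rebuilds the
200 Hz triangle wave and compares the amplitudes of its first two
harmonics:

```python
import numpy as np

# Stand-alone sketch (plain NumPy, not the book's thinkdsp code):
# rebuild the 200 Hz triangle wave and compare harmonic amplitudes.
framerate = 10000
freq = 200
ts = np.arange(0, 0.5, 1 / framerate)

# Same recipe as TriangleSignal.evaluate: fractional cycles -> zigzag.
frac, _ = np.modf(freq * ts)
ys = np.abs(frac - 0.5)
ys = ys - ys.mean()            # unbias: center the waveform at 0
ys = ys / np.abs(ys).max()     # normalize to amplitude 1

hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(len(ys), 1 / framerate)

def amp_at(f):
    """Amplitude of the spectrum component nearest frequency f."""
    return np.abs(hs[np.argmin(np.abs(fs - f))])

ratio = amp_at(200) / amp_at(600)
print(ratio)  # approximately 9, consistent with a 1/f**2 falloff
```

The frequency ratio of these harmonics is 3, so an amplitude ratio near
$3^2 = 9$ is what the $1/f^2$ relationship predicts.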
This relationship is called the
{\bf harmonic structure}.


\section{Square waves}
\label{square}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-1.eps}}
\caption{Segment of a square signal at 100 Hz.}
\label{fig.square.100.1}
\end{figure}

{\tt thinkdsp} also provides {\tt SquareSignal}, which represents
a square signal. Here's the class definition:

\begin{verbatim}
class SquareSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys
\end{verbatim}

Like {\tt TriangleSignal}, {\tt SquareSignal} inherits
\verb"__init__" from {\tt Sinusoid}, so it takes the same
parameters.

And the {\tt evaluate} method is similar. Again, {\tt cycles} is
the number of cycles since the start time, and {\tt frac} is the
fractional part, which ramps from 0 to 1 each period.

{\tt unbias} shifts {\tt frac} so it ramps from -0.5 to 0.5,
then {\tt np.sign} maps the negative values to -1 and the
positive values to 1.
Multiplying by {\tt amp} yields a square
wave that jumps between {\tt -amp} and {\tt amp}.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-2.eps}}
\caption{Spectrum of a square signal at 100 Hz.}
\label{fig.square.100.2}
\end{figure}

Figure~\ref{fig.square.100.1} shows three periods of a square
wave with frequency 100 Hz,
and Figure~\ref{fig.square.100.2} shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics,
which is why there are peaks at 300, 500, and 700 Hz, etc.
But the amplitude of the harmonics drops off more slowly.
Specifically, amplitude drops in proportion to frequency (not frequency
squared).

The exercises at the end of this chapter give you a chance to
explore other waveforms and other harmonic structures.


\section{Aliasing}
\label{aliasing}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-1100-2.eps}}
\caption{Spectrum of a triangle signal at 1100 Hz sampled at
10,000 frames per second.}
\label{fig.triangle.1100.2}
\end{figure}

I have a confession. I chose the examples in the previous section
carefully to avoid showing you something confusing. But now it's
time to get confused.

Figure~\ref{fig.triangle.1100.2} shows the spectrum of a triangle wave
at 1100 Hz, sampled at 10,000 frames per second. The harmonics
of this wave should be at 3300, 5500, 7700, and 9900 Hz.

In the figure, there are peaks at 1100 and 3300 Hz, as expected, but
the third peak is at 4500, not 5500 Hz. The
fourth peak is at 2300, not 7700 Hz. And if you look closely, the
peak that should be at 9900 is actually at 100 Hz. What's going on?

The problem is that when you evaluate the signal at
discrete points in time, you lose information about what happened
between samples.
For low frequency components, that's not a
problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second,
you only have two samples per period. That turns out to be enough,
just barely, but if the frequency is higher, it's not.

To see why, let's generate cosine signals at 4500 and 5500 Hz,
and sample them at 10,000 frames per second:

\begin{verbatim}
framerate = 10000

signal = thinkdsp.CosSignal(4500)
duration = signal.period*5
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()

signal = thinkdsp.CosSignal(5500)
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/aliasing1.eps}}
\caption{Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames
per second. They are identical.}
\label{fig.aliasing1}
\end{figure}

Figure~\ref{fig.aliasing1} shows the result. I plotted the
samples using vertical lines, to make it easier to compare the
two Waves, and I offset them slightly in time. The problem
should be clear: the two Waves are exactly the same!

When we sample a 5500 Hz signal at 10,000 frames per second, the
result is indistinguishable from a 4500 Hz signal.
For the same reason, a 7700 Hz signal is indistinguishable
from 2300 Hz, and a 9900 Hz signal is indistinguishable from 100 Hz.

This effect is called {\bf aliasing} because when the high frequency
signal is sampled, it appears to be a low frequency signal.

In this example, the highest frequency we can measure is 5000 Hz,
which is half the sampling rate. Frequencies above 5000 Hz are folded
back below 5000 Hz, which is why this threshold is sometimes called
the ``folding frequency''. It is sometimes also called the {\bf
Nyquist frequency}.
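The folding arithmetic can be summarized in a small helper function
(a sketch of mine, not part of {\tt thinkdsp}):

```python
# A small helper (a sketch, not part of thinkdsp) that computes the
# apparent frequency of a sinusoid after sampling, by folding at the
# Nyquist frequency.
def alias(freq, framerate):
    freq %= framerate                # remove whole multiples of the framerate
    if freq > framerate / 2:
        freq = framerate - freq      # fold frequencies above the Nyquist
    return freq

# Harmonics of the 1100 Hz triangle wave, sampled at 10,000 frames/s:
for harmonic in [1100, 3300, 5500, 7700, 9900]:
    print(harmonic, '->', alias(harmonic, 10000))
```

Running this reproduces the peaks in the figure: 5500 maps to 4500,
7700 to 2300, and 9900 to 100.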
See
\url{http://en.wikipedia.org/wiki/Nyquist_frequency}.

The folding pattern continues if the aliased frequency goes below
zero. For example, the 5th harmonic of the 1100 Hz triangle wave is
at 12,100 Hz. Folded at 5000 Hz, it would appear at -2100 Hz, but it
gets folded again at 0 Hz, back to 2100 Hz. In fact, you can see a
small peak at 2100 Hz in Figure~\ref{fig.triangle.1100.2}, and the next
one at 4300 Hz.


\section{Computing the spectrum}

We have seen the Wave method \verb"make_spectrum" several times.
Here is the implementation (leaving out some details we'll get
to later):

\begin{verbatim}
from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)
\end{verbatim}

The parameter {\tt self} is a Wave object. {\tt n} is the number
of samples in the wave, and {\tt d} is the inverse of the
frame rate, which is the time between samples.

{\tt np.fft} is the NumPy module that provides functions related
to the {\bf Fast Fourier Transform} (FFT), which is an efficient
algorithm that computes the Discrete Fourier Transform (DFT).

%TODO: add a forward reference to full fft
\verb"make_spectrum" uses {\tt rfft}, which stands for ``real
FFT'', because the Wave contains real values, not complex. Later
we'll see the full FFT, which can handle complex signals.
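As a quick stand-alone sanity check of these two calls (my sketch, not
the book's {\tt Wave} class): one second of a 3 Hz cosine should put
all of its energy in the 3 Hz bin reported by {\tt rfftfreq}.

```python
import numpy as np
from numpy.fft import rfft, rfftfreq

# Stand-alone sanity check of the two calls used by make_spectrum:
# one second of a 3 Hz cosine should peak in the 3 Hz bin.
framerate = 64
ts = np.arange(framerate) / framerate
ys = np.cos(2 * np.pi * 3 * ts)

hs = rfft(ys)                          # complex amplitudes
fs = rfftfreq(len(ys), 1 / framerate)  # matching frequencies in Hz

peak = fs[np.argmax(np.abs(hs))]
print(peak)  # 3.0
```

Note that for $n$ real samples, {\tt rfft} returns $n/2 + 1$ complex
values, and {\tt rfftfreq} returns the same number of frequencies.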
The result
of {\tt rfft}, which I call {\tt hs}, is a NumPy array of complex
numbers that represents the amplitude and phase offset of each
frequency component in the wave.

The result of {\tt rfftfreq}, which I call {\tt fs}, is an array that
contains frequencies corresponding to the {\tt hs}.

To understand the values in {\tt hs}, consider these two ways to think
about complex numbers:

\begin{itemize}

\item A complex number is the sum of a real part and an imaginary
part, often written $x + iy$, where $i$ is the imaginary unit,
$\sqrt{-1}$. You can think of $x$ and $y$ as Cartesian coordinates.

\item A complex number is also the product of a magnitude and a
complex exponential, $A e^{i \phi}$, where $A$ is the {\bf
magnitude} and $\phi$ is the {\bf angle} in radians, also called
the ``argument''. You can think of $A$ and $\phi$ as polar
coordinates.

\end{itemize}

Each value in {\tt hs} corresponds to a frequency component: its
magnitude is proportional to the amplitude of the corresponding
component; its angle is the phase offset.

The Spectrum class provides two read-only properties, {\tt amps}
and {\tt angles}, which return NumPy arrays representing the
magnitudes and angles of the {\tt hs}. When we plot a Spectrum
object, we usually plot {\tt amps} versus {\tt fs}. Sometimes
it is also useful to plot {\tt angles} versus {\tt fs}.

Although it might be tempting to look at the real and imaginary
parts of {\tt hs}, you will almost never need to.
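In NumPy, the conversion to polar form is one call each. Here is a
sketch of how properties like {\tt amps} and {\tt angles} could be
computed (an assumed implementation, following the description above,
with made-up values):

```python
import numpy as np

# Sketch of the polar view of complex spectrum values (assumed
# implementation, with made-up values, following the text).
hs = np.array([1 + 1j, 2j, -3.0 + 0j])

amps = np.abs(hs)       # magnitudes A
angles = np.angle(hs)   # angles phi, in radians

print(amps)    # [sqrt(2), 2, 3]
print(angles)  # [pi/4, pi/2, pi]
```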
I encourage
you to think of the DFT as a vector of amplitudes and phase offsets
that happen to be encoded in the form of complex numbers.

To modify a Spectrum, you can access the {\tt hs} directly.
For example:

\begin{verbatim}
spectrum.hs *= 2
spectrum.hs[spectrum.fs > cutoff] = 0
\end{verbatim}

The first line multiplies the elements of {\tt hs} by 2, which
doubles the amplitudes of all components. The second line
sets to 0 only the elements of {\tt hs} where the corresponding
frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

\begin{verbatim}
spectrum.scale(2)
spectrum.low_pass(cutoff)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://think-dsp.com/thinkdsp.html}.

At this point you should have a better idea of how the Signal, Wave,
and Spectrum classes work, but I have not explained how the Fast
Fourier Transform works. That will take a few more chapters.


\section{Exercises}

Solutions to these exercises are in {\tt chap02soln.ipynb}.

\begin{exercise}
If you use IPython, load {\tt chap02.ipynb} and try out the examples.
You can also view the notebook at \url{http://tinyurl.com/thinkdsp02}.
\end{exercise}

\begin{exercise}
A sawtooth signal has a waveform that ramps up linearly from -1 to 1,
then drops to -1 and repeats. See
\url{http://en.wikipedia.org/wiki/Sawtooth_wave}.

Write a class called
{\tt SawtoothSignal} that extends {\tt Signal} and provides
{\tt evaluate} to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave. How does the harmonic
structure compare to triangle and square waves?
\end{exercise}

\begin{exercise}
Make a square signal at 1100 Hz and make a wave that samples it
at 10000 frames per second.
If you plot the spectrum, you can
see that most of the harmonics are aliased.
When you listen to the wave, can you hear the aliased harmonics?
\end{exercise}


\begin{exercise}
If you have a spectrum object, {\tt spectrum}, and print the
first few values of {\tt spectrum.fs}, you'll see that they
start at zero. So {\tt spectrum.hs[0]} is the magnitude
of the component with frequency 0. But what does that mean?

Try this experiment:

\begin{enumerate}

\item Make a triangle signal with frequency 440 and make
a Wave with duration 0.01 seconds. Plot the waveform.

\item Make a Spectrum object and print {\tt spectrum.hs[0]}.
What is the amplitude and phase of this component?

\item Set {\tt spectrum.hs[0] = 100}. Make a Wave from the
modified Spectrum and plot it. What effect does this operation
have on the waveform?

\end{enumerate}

\end{exercise}


\begin{exercise}
Write a function that takes a Spectrum as a parameter and modifies
it by dividing each element of {\tt hs} by the corresponding frequency
from {\tt fs}. Hint: since division by zero is undefined, you
might want to set {\tt spectrum.hs[0] = 0}.

Test your function using a square, triangle, or sawtooth wave.

\begin{enumerate}

\item Compute the Spectrum and plot it.

\item Modify the Spectrum using your function and plot it again.

\item Make a Wave from the modified Spectrum and listen to it. What
effect does this operation have on the signal?

\end{enumerate}

\end{exercise}

\begin{exercise}
Triangle and square waves have odd harmonics only; the sawtooth
wave has both even and odd harmonics. The harmonics of the
square and sawtooth waves drop off in proportion to $1/f$; the
harmonics of the triangle wave drop off like $1/f^2$.
Can you
find a waveform that has even and odd harmonics that drop off
like $1/f^2$?

Hint: There are two ways you could approach this: you could
construct the signal you want by adding up sinusoids, or you
could start with a signal that is similar to what you want and
modify it.
\end{exercise}


\chapter{Non-periodic signals}
\label{nonperiodic}

The signals we have worked with so far are periodic, which means
that they repeat forever. It also means that the frequency
components they contain do not change over time.
In this chapter, we consider non-periodic signals,
whose frequency components {\em do} change over time.
In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize
non-periodic signals.

The code for this chapter is in {\tt chap03.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp03}.


\section{Linear chirp}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp3.eps}}
\caption{Chirp waveform near the beginning, middle, and end.}
\label{fig.chirp3}
\end{figure}

We'll start with a {\bf chirp}, which is a signal with variable
frequency. {\tt thinkdsp} provides a Signal called Chirp that
makes a sinusoid that sweeps linearly through a range of
frequencies.

Here's an example that sweeps from 220 to 880 Hz, which
is two octaves from A3 to A5:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=880)
wave = signal.make_wave()
\end{verbatim}

Figure~\ref{fig.chirp3} shows segments of this wave near the
beginning, middle, and end. It's clear that the frequency is
increasing.

Before we go on, let's see how Chirp is implemented.
Here
is the class definition:

\begin{verbatim}
class Chirp(Signal):

    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp
\end{verbatim}

{\tt start} and {\tt end} are the frequencies, in Hz, at the start
and end of the chirp. {\tt amp} is amplitude.

Here is the function that evaluates the signal:

\begin{verbatim}
    def evaluate(self, ts):
        freqs = np.linspace(self.start, self.end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

{\tt ts} is the sequence of points in time where the signal should be
evaluated; to keep this function simple, I assume they are equally-spaced.

If the length of {\tt ts} is $n$, you can think of it as a sequence of
$n-1$ intervals of time. To compute the frequency during each
interval, I use {\tt np.linspace}, which returns a NumPy array of
$n-1$ values between {\tt start} and {\tt end}.

\verb"_evaluate" is a private method
that does the rest of the math\footnote{Beginning a method name
with an underscore makes it ``private'', indicating that it is not
part of the API that should be used outside the class definition.}:

\begin{verbatim}
    def _evaluate(self, ts, freqs):
        dts = np.diff(ts)
        dphis = PI2 * freqs * dts
        phases = np.cumsum(dphis)
        phases = np.insert(phases, 0, 0)
        ys = self.amp * np.cos(phases)
        return ys
\end{verbatim}

{\tt np.diff} computes the difference between adjacent elements
of {\tt ts}, returning the length of each interval in seconds.
If the elements of {\tt ts} are equally spaced,
the {\tt dts} are all the same.

The next step is to figure out how much the phase changes during
each interval.
In Section~\ref{sigobs} we saw that when frequency is
constant, the phase, $\phi$, increases linearly over time:
%
\[ \phi = 2 \pi f t \]
%
When frequency is a function of time, the {\em change} in phase
during a short time interval, $\Delta t$, is:
%
\[ \Delta \phi = 2 \pi f(t) \Delta t \]
%
In Python, since {\tt freqs} contains $f(t)$ and {\tt dts}
contains the time intervals, we can write

\begin{verbatim}
dphis = PI2 * freqs * dts
\end{verbatim}

Now, since {\tt dphis} contains the changes in phase, we can
get the total phase at each timestep by adding up the changes:

\begin{verbatim}
phases = np.cumsum(dphis)
phases = np.insert(phases, 0, 0)
\end{verbatim}

{\tt np.cumsum} computes the cumulative sum, which is almost
what we want, but it doesn't start at 0. So I use {\tt np.insert}
to add a 0 at the beginning.

The result is a NumPy array where the {\tt i}th element contains the
sum of the first {\tt i} terms from {\tt dphis}; that is, the total
phase at the end of the {\tt i}th interval. Finally, {\tt np.cos}
computes the amplitude of the wave as a function of phase (remember
that phase is expressed in radians).

If you know calculus, you might notice that the limit as
$\Delta t$ gets small is
%
\[ d\phi = 2 \pi f(t) dt \]
%
Dividing through by $dt$ yields
%
\[ \frac{d\phi}{dt} = 2 \pi f(t) \]
%
In other words, frequency is the derivative of phase. Conversely,
phase is the integral of frequency.
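A quick numeric check (my sketch, with arbitrarily chosen values)
illustrates this for the constant-frequency case, where the cumulative
sum reproduces the analytic phase $\phi = 2 \pi f t$ exactly:

```python
import numpy as np

# Numeric check (a sketch with arbitrary values): for constant frequency,
# the cumsum recipe reproduces the analytic phase 2*pi*f*t.
PI2 = 2 * np.pi
framerate = 1000
f = 5.0

ts = np.arange(0, 1, 1 / framerate)
freqs = np.full(len(ts) - 1, f)    # one frequency value per interval

dts = np.diff(ts)
dphis = PI2 * freqs * dts          # phase change during each interval
phases = np.insert(np.cumsum(dphis), 0, 0)

print(np.allclose(phases, PI2 * f * ts))  # True
```

For a time-varying frequency the cumulative sum is only an
approximation of the integral, but it improves as the intervals get
shorter.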
When we used {\tt cumsum}
to go from frequency to phase, we were approximating integration.


\section{Exponential chirp}

When you listen to this chirp, you might notice that the pitch
rises quickly at first and then slows down.
The chirp spans two octaves, but it only takes 1/3 s to span
the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of
frequency. As a result, the {\bf interval} we hear between two notes
depends on the {\em ratio} of their frequencies, not the difference.
``Interval'' is the musical term for the perceived difference between
two pitches.

For example, an octave is an interval where the ratio of two
pitches is 2. So the interval from 220 to 440 is one octave
and the interval from 440 to 880 is also one octave. The difference
in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear
chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency
has to increase exponentially. A signal with that shape is called
an {\bf exponential chirp}.

Here's the code that makes one:

\begin{verbatim}
class ExpoChirp(Chirp):

    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

Instead of {\tt np.linspace}, this version of {\tt evaluate} uses
{\tt np.logspace}, which creates a series of frequencies
whose logarithms are equally spaced, which means that they increase
exponentially.

That's it; everything else is the same as Chirp.
Here's the code
that makes one:

\begin{verbatim}
signal = thinkdsp.ExpoChirp(start=220, end=880)
wave = signal.make_wave(duration=1)
\end{verbatim}

You can listen to these examples in {\tt chap03.ipynb} and compare
the linear and exponential chirps.


\section{Spectrum of a chirp}
\label{sauron}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp1.eps}}
\caption{Spectrum of a one-second one-octave chirp.}
\label{fig.chirp1}
\end{figure}

What do you think happens if you compute the spectrum of a chirp?
Here's an example that constructs a one-second, one-octave chirp and
its spectrum:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.chirp1} shows the result. The spectrum has
components at every frequency from 220 to 440 Hz, with variations
that look a little like the Eye of Sauron
(see \url{http://en.wikipedia.org/wiki/Sauron}).

The spectrum is approximately flat between 220 and 440 Hz, which
indicates that the signal spends equal time at each frequency in this
range. Based on that observation, you should be able to guess what
the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal,
but it obscures the relationship between frequency and time.
For example, we cannot tell by looking at this spectrum whether
the frequency went up or down, or both.


\section{Spectrogram}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp2.eps}}
\caption{Spectrogram of a one-second one-octave chirp.}
\label{fig.chirp2}
\end{figure}

To recover the relationship between frequency and time, we can break
the chirp into segments and plot the spectrum of each segment.
The
result is called a {\bf short-time Fourier transform} (STFT).

There are several ways to visualize an STFT, but the most common
is a {\bf spectrogram}, which shows time on the x-axis and frequency
on the y-axis. Each column in the spectrogram shows the spectrum of
a short segment, using color or grayscale to represent amplitude.

As an example, I'll compute the spectrogram of this chirp:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1, framerate=11025)
\end{verbatim}

{\tt Wave} provides \verb"make_spectrogram", which returns a
{\tt Spectrogram} object:

\begin{verbatim}
spectrogram = wave.make_spectrogram(seg_length=512)
spectrogram.plot(high=700)
\end{verbatim}

\verb"seg_length" is the number of samples in each segment. I chose
512 because FFT is most efficient when the number of samples is a
power of 2.

Figure~\ref{fig.chirp2} shows the result. The x-axis shows time from
0 to 1 seconds. The y-axis shows frequency from 0 to 700 Hz. I cut
off the top part of the spectrogram; the full range goes to 5512.5 Hz,
which is half of the framerate.

The spectrogram shows clearly that frequency increases linearly
over time. Similarly, in the spectrogram of an exponential chirp, we
can see the shape of the exponential curve.

However, notice that the peak in each column is blurred across 2--3
cells. This blurring reflects the limited resolution of the
spectrogram.


\section{The Gabor limit}
\label{gabor}

The {\bf time resolution} of the spectrogram is the duration of the
segments, which corresponds to the width of the cells in the
spectrogram.
Since each segment is 512 frames, and there are 11,025
frames per second, there are 0.046 seconds per segment.

The {\bf frequency resolution} is the frequency range between
elements in the spectrum, which corresponds to the height of the
cells. With 512 frames, we get 256 frequency components over a range
from 0 to 5512.5 Hz, so the range between components is 21.5 Hz.

More generally, if $n$ is the segment length, the spectrum contains
$n/2$ components. If the framerate is $r$, the maximum frequency in
the spectrum is $r/2$. So the time resolution is $n/r$ and the
frequency resolution is
%
\[ \frac{r/2}{n/2} \]
%
which is $r/n$.

Ideally we would like time resolution to be small, so we can see rapid
changes in frequency. And we would like frequency resolution to be
small so we can see small changes in frequency. But you can't have
both. Notice that time resolution, $n/r$, is the inverse of frequency
resolution, $r/n$. So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency
resolution in half (which is good), but you double time resolution
(which is bad). Even increasing the framerate doesn't help.
You get
more samples, but the range of frequencies increases at
the same time.

This tradeoff is called the {\bf Gabor limit} and it is a fundamental
limitation of this kind of time-frequency analysis.


\section{Leakage}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing1.eps}}
\caption{Spectrum of a periodic segment of a sinusoid (left), a
non-periodic segment (middle), a windowed non-periodic segment
(right).}
\label{fig.windowing1}
\end{figure}

In order to explain how \verb"make_spectrogram" works, I have
to explain windowing; and in order to explain windowing, I have to
show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute
Spectrums, treats waves as if they are periodic; that is, it assumes
that the finite segment it operates on is a complete period from an
infinite signal that repeats over all time. In practice, this
assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the
segment. Because DFT assumes that the signal is periodic, it
implicitly connects the end of the segment back to the beginning to
make a loop.
If the end does not connect smoothly to the beginning,
the discontinuity creates additional frequency components in the
segment that are not in the signal.

As an example, let's start with a sine signal that contains only
one frequency component at 440 Hz.

\begin{verbatim}
signal = thinkdsp.SinSignal(freq=440)
\end{verbatim}

If we select a segment that happens to be an integer multiple of
the period, the end of the segment connects smoothly with the
beginning, and DFT behaves well.

\begin{verbatim}
duration = signal.period * 30
wave = signal.make_wave(duration)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.windowing1} (left) shows the result. As expected,
there is a single peak at 440 Hz.

But if the duration is not a multiple of the period, bad things
happen. With {\tt duration = signal.period * 30.25}, the signal
starts at 0 and ends at 1.

Figure~\ref{fig.windowing1} (middle) shows
the spectrum of this segment. Again, the peak is at 440 Hz, but now
there are additional components spread out from 240 to 640 Hz.
This
spread is called ``spectral leakage'', because some of the energy that
is actually at the fundamental frequency leaks into other frequencies.

In this example, leakage happens because we are using DFT on a segment
that becomes discontinuous when we treat it as periodic.


\section{Windowing}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=3.5in]{figs/windowing2.eps}}
\caption{Segment of a sinusoid (top), Hamming window (middle), product
of the segment and the window (bottom).}
\label{fig.windowing2}
\end{figure}

We can reduce leakage by smoothing out the discontinuity between
the beginning and end of the segment, and one way to do that is
{\bf windowing}.

A ``window'' is a function designed to transform a non-periodic
segment into something that can pass for periodic.
Figure~\ref{fig.windowing2} (top) shows a segment where the end does not
connect smoothly to the beginning.

Figure~\ref{fig.windowing2} (middle) shows a ``Hamming window'', one of the
more common window functions. No window function is perfect, but some
can be shown to be optimal for different applications, and Hamming
is a good, all-purpose window.

Figure~\ref{fig.windowing2} (bottom) shows the result of multiplying the
window by the original signal. Where the window is close to 1, the
signal is unchanged. Where the window is close to 0, the signal is
attenuated. Because the window tapers at both ends, the end of the
segment connects smoothly to the beginning.

Figure~\ref{fig.windowing1} (right) shows the spectrum of the windowed
signal. Windowing has reduced leakage substantially, but not
completely.

Here's what the code looks like.
{\tt Wave} provides {\tt window},
which applies a Hamming window:

\begin{verbatim}
#class Wave:
    def window(self, window):
        self.ys *= window
\end{verbatim}

And NumPy provides {\tt hamming}, which computes a Hamming window
with a given length:

\begin{verbatim}
window = np.hamming(len(wave))
wave.window(window)
\end{verbatim}

NumPy provides functions to compute other window
functions, including {\tt bartlett}, {\tt blackman}, {\tt hanning},
and {\tt kaiser}. One of the exercises at the end of this chapter
asks you to experiment with these other windows.


\section{Implementing spectrograms}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing3.eps}}
\caption{Overlapping Hamming windows.}
\label{fig.windowing3}
\end{figure}

Now that we understand windowing, we can understand the
implementation of spectrogram.
Here is the {\tt Wave} method that computes spectrograms:

\begin{verbatim}
#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        step = seg_length // 2

        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)
\end{verbatim}

This is the longest function in the book, so if you can handle
this, you can handle anything.

The parameter, {\tt self}, is a Wave object.
\verb"seg_length" is the number of samples in each segment.

{\tt window} is a Hamming window with the same length as the segments.

{\tt i} and {\tt j} are the slice indices that select segments from
the wave. {\tt step} is the offset between segments.
Since {\tt
step} is half of \verb"seg_length", the segments overlap by half.
Figure~\ref{fig.windowing3} shows what these overlapping windows look
like.

\verb"spec_map" is a dictionary that maps from a timestamp to
a Spectrum.

Inside the while loop, we select a slice from the wave and apply
the window; then we construct a Spectrum
object and add it to \verb"spec_map".  The nominal time of
each segment, {\tt t}, is the midpoint.

Then we advance {\tt i} and {\tt j}, and continue as long as {\tt j}
doesn't go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram.  Here
is the definition of Spectrogram:

\begin{verbatim}
class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length
\end{verbatim}

Like many init methods, this one just stores the
parameters as attributes.

{\tt Spectrogram} provides {\tt plot}, which generates a
pseudocolor plot with time along the x-axis and frequency along
the y-axis.

And that's how Spectrograms are implemented.


\section{Exercises}

Solutions to these exercises are in {\tt chap03soln.ipynb}.

\begin{exercise}
Run and listen to the examples in {\tt chap03.ipynb}, which is
in the repository for this book, and also available at
\url{http://tinyurl.com/thinkdsp03}.

In the leakage example, try replacing the Hamming window with one of
the other windows provided by NumPy, and see what effect they have on
leakage.
See
\url{http://docs.scipy.org/doc/numpy/reference/routines.window.html}.
\end{exercise}


\begin{exercise}
Write a class called {\tt SawtoothChirp} that extends {\tt Chirp}
and overrides {\tt evaluate} to generate a sawtooth waveform with
frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from {\tt Chirp} and
{\tt SawtoothSignal}.

Draw a sketch of what you think the spectrogram of this signal
looks like, and then plot it.  The effect of aliasing should be
visually apparent, and if you listen carefully, you can hear it.
\end{exercise}


\begin{exercise}
Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to
make a wave with duration 1 s and framerate 20 kHz.  Draw a sketch of
what you think the spectrum will look like.  Then plot the
spectrum and see if you got it right.
\end{exercise}


\begin{exercise}
In musical terminology, a ``glissando'' is a note that slides from one
pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the
first few seconds.  One suggestion: George Gershwin's {\it Rhapsody in
Blue} starts with a famous clarinet glissando, which you can download
from \url{http://archive.org/details/rhapblue11924}.
\end{exercise}


\begin{exercise}
A trombone player can play a glissando by extending the trombone
slide while blowing continuously.  As the slide extends, the total
length of the tube gets longer, and the resulting pitch is inversely
proportional to length.

Assuming that the player moves the slide at a constant speed, how
does frequency vary with time?

Write a class called {\tt TromboneGliss} that extends {\tt Chirp} and
provides {\tt evaluate}.  Make a wave that simulates a trombone
glissando from C3 up to F3 and back down to C3.
C3 is 262 Hz; F3 is
349 Hz.

Plot a spectrogram of the resulting wave.  Is a trombone glissando
more like a linear or exponential chirp?
\end{exercise}


\begin{exercise}
Make or find a recording of a series of vowel sounds and look at the
spectrogram.  Can you identify different vowels?
\end{exercise}


\chapter{Noise}

In English, ``noise'' means an unwanted or unpleasant sound.  In the
context of signal processing, it has two different senses:

\begin{enumerate}

\item As in English, it can mean an unwanted signal of any kind.  If
two signals interfere with each other, each signal would consider
the other to be noise.

\item ``Noise'' also refers to a signal that contains components at
many frequencies, so it lacks the harmonic structure of the periodic
signals we saw in previous chapters.

\end{enumerate}

This chapter is about the second kind.

The code for this chapter is in {\tt chap04.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp04}.


\section{Uncorrelated noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise0.eps}}
\caption{Waveform of uncorrelated uniform noise.}
\label{fig.whitenoise0}
\end{figure}

The simplest way to understand noise is to generate it, and the
simplest kind to generate is uncorrelated uniform noise (UU noise).
``Uniform'' means the signal contains random values from a uniform
distribution; that is, every value in the range is equally likely.
``Uncorrelated'' means that the values are independent; that is,
knowing one value provides no information about the others.

Here's a class that represents UU noise:

\begin{verbatim}
class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp,
                               len(ts))
        return ys
\end{verbatim}

{\tt UncorrelatedUniformNoise} inherits from \verb"_Noise", which
inherits from {\tt Signal}.

As usual, the evaluate function takes {\tt ts}, the times when the
signal should be evaluated.  It uses
{\tt np.random.uniform}, which generates values from a
uniform distribution.  In this example, the values are in
the range from {\tt -amp} to {\tt amp}.

The following example generates UU noise with duration 0.5
seconds at 11,025 samples per second:

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
\end{verbatim}

If you play this wave, it sounds like the static you hear if you tune
a radio between channels.  Figure~\ref{fig.whitenoise0} shows what the
waveform looks like.  As expected, it looks pretty random.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise1.eps}}
\caption{Power spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise1}
\end{figure}

Now let's take a look at the spectrum:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power()
\end{verbatim}

\verb"Spectrum.plot_power" is similar to \verb"Spectrum.plot",
except that it plots power instead of amplitude.
Power is the square of amplitude.
I am switching from amplitude to power in this chapter because
it is more conventional in the context of noise.

Figure~\ref{fig.whitenoise1} shows the result.  Like the signal, the
spectrum looks pretty random.  In fact, it {\em is} random, but we have to
be more precise about the word ``random''.  There are at least three
things we might like to know about a noise signal or its spectrum:

\begin{itemize}

\item Distribution: The distribution of a random signal is the set of
possible values and their probabilities.
For example, in the
uniform noise signal, the set of values is the range from -1 to 1,
and all values have the same probability.  An alternative is
{\bf Gaussian noise}, where the set of values is the range from negative
to positive infinity, but values near 0 are the most likely, with
probability that drops off according to the Gaussian or
``bell'' curve.

\item Correlation: Is each value in the signal independent of the
others, or are there dependencies between them?  In UU noise, the
values are independent.
An alternative is {\bf Brownian noise}, where each value is the sum
of the previous value and a random ``step''.  So if the value of the
signal is high at a particular point in time, we expect it to stay
high, and if it is low, we expect
it to stay low.

\item Relationship between power and frequency: In the spectrum of UU
noise, the power at all frequencies is drawn from the same
distribution; that is, the average power is the same for all
frequencies.
An alternative is {\bf pink noise}, where power is
inversely related to frequency; that is, the power at frequency $f$
is drawn from a distribution whose mean is proportional to $1/f$.

\end{itemize}


\section{Integrated spectrum}

For UU noise we can see the relationship between power and frequency
more clearly by looking at the {\bf integrated spectrum}, which
is a function of frequency, $f$, that shows the cumulative power in
the spectrum up to $f$.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise2.eps}}
\caption{Integrated spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise2}
\end{figure}

{\tt Spectrum} provides a method that computes the IntegratedSpectrum:

\begin{verbatim}
def make_integrated_spectrum(self):
    cs = np.cumsum(self.power)
    cs /= cs[-1]
    return IntegratedSpectrum(cs, self.fs)
\end{verbatim}

{\tt self.power} is a NumPy array containing power for each frequency.
{\tt np.cumsum} computes the cumulative sum of the powers.
Dividing through by the last element normalizes the integrated
spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum.
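The normalization step is easy to check in isolation.  Here is a
minimal, self-contained NumPy sketch (not part of {\tt thinkdsp};
the array {\tt power} stands in for {\tt self.power}):

\begin{verbatim}
import numpy as np

# stand-in for self.power: one nonnegative power per frequency
power = np.random.uniform(0, 2, size=1000)

cs = np.cumsum(power)   # cumulative power up to each frequency
cs /= cs[-1]            # normalize so the curve runs from 0 to 1
\end{verbatim}

After normalization, {\tt cs} is nondecreasing and its last element
is exactly 1.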
Here is the class definition:

\begin{verbatim}
class IntegratedSpectrum(object):
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs
\end{verbatim}

Like Spectrum, IntegratedSpectrum provides \verb"plot_power", so
we can compute and plot the integrated spectrum like this:

\begin{verbatim}
integ = spectrum.make_integrated_spectrum()
integ.plot_power()
\end{verbatim}

The result, shown in Figure~\ref{fig.whitenoise2}, is a straight line,
which indicates that power at all frequencies is constant, on average.
Noise with equal power at all frequencies is called {\bf white noise}
by analogy with light, because an equal mixture of light at all
visible frequencies is white.


\section{Brownian noise}
\label{brownian}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise0.eps}}
\caption{Waveform of Brownian noise.}
\label{fig.rednoise0}
\end{figure}

UU noise is uncorrelated, which means that each value does not depend
on the others.  An alternative is Brownian noise, in which each value
is the sum of the previous value and a random ``step''.

It is called ``Brownian'' by analogy with Brownian motion, in which a
particle suspended in a fluid moves apparently at random, due to
unseen interactions with the fluid.  Brownian motion is often
described using a {\bf random walk}, which is a mathematical model
of a path where the distance between steps is characterized by a
random distribution.

In a one-dimensional random walk, the particle moves up or down
by a random amount at each time step.
The location of the particle
at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise:
generate uncorrelated random steps and then add them up.
Here is a class definition that implements this algorithm:

\begin{verbatim}
class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt evaluate} uses {\tt np.random.uniform} to generate uncorrelated
random steps and {\tt np.cumsum} to compute their cumulative
sum.

Since the sum is likely to escape the range from -1 to
1, we have to use {\tt unbias} to shift the mean to 0, and {\tt
normalize} to get the desired maximum amplitude.

Here's the code that generates a BrownianNoise object and plots the
waveform:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()
\end{verbatim}

Figure~\ref{fig.rednoise0} shows the result.  The waveform
wanders up and down, but there is a clear correlation between
successive values.  When the amplitude is high, it tends to stay
high, and vice versa.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise3.eps}}
\caption{Spectrum of Brownian noise on a log-log scale.}
\label{fig.rednoise3}
\end{figure}

If you plot the spectrum of Brownian noise, it doesn't look like
much.  Nearly all of the power is at the lowest frequencies; on a
linear scale, the higher frequency components are not visible.

To see the shape of the spectrum more clearly, we can plot power
and frequency on a log-log scale.
Here's the code:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')
\end{verbatim}

The result is in Figure~\ref{fig.rednoise3}.  The relationship between
power and frequency is noisy, but roughly linear.

{\tt Spectrum} provides \verb"estimate_slope", which uses SciPy to compute
a least squares fit to the power spectrum:

\begin{verbatim}
#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x, y)
        return t
\end{verbatim}

It discards the first component of the spectrum because
this component corresponds to $f=0$, and $\log 0$ is undefined.

\verb"estimate_slope" returns the result from {\tt
scipy.stats.linregress}, which is an object that contains the
estimated slope and intercept, coefficient of determination ($R^2$),
p-value, and standard error.  For our purposes, we only need the
slope.

For Brownian noise, the slope of the power spectrum is -2 (we'll see
why in Chapter~\ref{diffint}), so we can write this relationship:
%
\[ \log P = k - 2 \log f \]
%
where $P$ is power, $f$ is frequency, and $k$ is the intercept
of the line, which is not important for our purposes.
Exponentiating both sides yields:
%
\[ P = K / f^{2} \]
%
where $K$ is $e^k$, but still not important.  More relevant is
that power is proportional to $1/f^2$, which is characteristic
of Brownian noise.

Brownian noise is also called {\bf red noise}, for the same reason that
white noise is called ``white''.
If you combine visible light with
power proportional to $1/f^2$, most of the power
would be at the low-frequency end of the spectrum, which is red.
Brownian noise is also sometimes called ``brown noise'', but I think
that's confusing, so I won't use it.

%TODO: refer back to this in the diff/int chapter


\section{Pink Noise}
\label{pink}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/pinknoise0.eps}}
\caption{Waveform of pink noise with $\beta=1$.}
\label{fig.pinknoise0}
\end{figure}

For red noise, the relationship between frequency
and power is
%
\[ P = K / f^{2} \]
%
There is nothing special about the exponent 2.  More generally,
we can synthesize noise with any exponent, $\beta$:
%
\[ P = K / f^{\beta} \]
%
When $\beta = 0$, power is constant at all frequencies,
so the result is white noise.  When $\beta=2$ the result is red noise.

When $\beta$ is between 0 and 2, the result is between white and
red noise, so it is called ``pink noise''.

There are several ways to generate pink noise.  The simplest is to
generate white noise and then apply a low-pass filter with the
desired exponent.  {\tt thinkdsp} provides a class that represents
a pink noise signal:

\begin{verbatim}
class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta
\end{verbatim}

{\tt amp} is the desired amplitude of the signal.
{\tt beta} is the desired exponent.
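Stripped of the class machinery, the filtering idea can be sketched
directly with NumPy's FFT functions.  This is a minimal standalone
sketch of the same approach, not the {\tt thinkdsp} implementation:

\begin{verbatim}
import numpy as np

def pink_noise(n, beta=1.0, framerate=11025):
    # generate white noise, then shape its spectrum
    white = np.random.uniform(-1, 1, n)
    hs = np.fft.rfft(white)
    fs = np.fft.rfftfreq(n, d=1/framerate)
    fs[0] = 1                       # avoid dividing by zero at f=0
    hs /= fs ** (beta / 2.0)        # dividing amplitude by f**(beta/2)
                                    # divides power by f**beta
    hs[0] = 0                       # zero out the bias component
    ys = np.fft.irfft(hs)
    return ys / np.max(np.abs(ys))  # normalize to amplitude 1
\end{verbatim}

The following sections show how the same steps are organized in
{\tt thinkdsp}.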
{\tt PinkNoise} provides
\verb"make_wave", which generates a Wave:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    signal = UncorrelatedUniformNoise()
    wave = signal.make_wave(duration, start, framerate)
    spectrum = wave.make_spectrum()

    spectrum.pink_filter(beta=self.beta)

    wave2 = spectrum.make_wave()
    wave2.unbias()
    wave2.normalize(self.amp)
    return wave2
\end{verbatim}

{\tt duration} is the length of the wave in seconds.  {\tt start} is
the start time of the wave; it is included so that \verb"make_wave"
has the same interface for all types of signal, but for random noise,
start time is irrelevant.  And {\tt framerate} is the number of
samples per second.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise-triple.eps}}
\caption{Spectrum of white, pink, and red noise on a log-log scale.}
\label{fig.noise-triple}
\end{figure}

\verb"make_wave" creates a white noise wave, computes its spectrum,
applies a filter with the desired exponent, and then converts the
filtered spectrum back to a wave.  Then it unbiases and normalizes
the wave.

{\tt Spectrum} provides \verb"pink_filter":

\begin{verbatim}
def pink_filter(self, beta=1.0):
    denom = self.fs ** (beta/2.0)
    denom[0] = 1
    self.hs /= denom
\end{verbatim}

\verb"pink_filter" divides each element of the spectrum by
$f^{\beta/2}$.  Since power is the square of amplitude, this
operation divides the power at
each component by $f^\beta$.  It treats the component
at $f=0$ as a special case, partly to avoid dividing by 0, and partly
because this element represents the bias of the signal,
which we are going to set to 0 anyway.

Figure~\ref{fig.pinknoise0} shows the resulting waveform.
Like
Brownian noise, it wanders up and down in a way that suggests
correlation between successive values, but at least visually, it looks
more random.  In the next chapter we will come back to this
observation and I will be more precise about what I mean by
``correlation'' and ``more random''.

Finally, Figure~\ref{fig.noise-triple} shows a spectrum for
white, pink, and red noise on the same log-log scale.
The relationship between the exponent, $\beta$, and the slope
of the spectrum is apparent in this figure.


\section{Gaussian noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise1.eps}}
\caption{Normal probability plot for the real and imaginary parts
of the spectrum of Gaussian noise.}
\label{fig.noise1}
\end{figure}

We started with uncorrelated uniform (UU) noise and showed that,
because its spectrum has equal power at all frequencies, on
average, UU noise is white.

But when people talk about ``white noise'', they don't always
mean UU noise.  In fact, more often they mean uncorrelated
Gaussian (UG) noise.

{\tt thinkdsp} provides an implementation of UG noise:

\begin{verbatim}
class UncorrelatedGaussianNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.normal(0, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt np.random.normal} returns a NumPy array of values from a
Gaussian distribution, in this case with mean 0 and standard deviation
{\tt self.amp}.  In theory the range of values is from negative to
positive infinity, but we expect about 99\% of the values to be
between -3 and 3.

UG noise is similar in many ways to UU noise.  The spectrum has
equal power at all frequencies, on average, so UG is also white.
And it has one other interesting property: the spectrum of UG
noise is also UG noise.
More precisely, the real and imaginary
parts of the spectrum are uncorrelated Gaussian values.

To test that claim, we can generate the spectrum of UG noise and
then generate a ``normal probability plot'', which is a graphical
way to test whether a distribution is Gaussian.

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
spectrum = wave.make_spectrum()

thinkstats2.NormalProbabilityPlot(spectrum.real)
thinkstats2.NormalProbabilityPlot(spectrum.imag)
\end{verbatim}

{\tt NormalProbabilityPlot} is provided by {\tt thinkstats2}, which is
included in the repository for this book.  If you are not familiar
with normal probability plots, you can read about them in Chapter 5
of {\it Think Stats} at \url{http://thinkstats2.com}.

Figure~\ref{fig.noise1} shows the results.  The gray lines
show a linear model fit to the data; the dark lines show the
data.

A straight line on a normal probability plot indicates
that the data come from a Gaussian distribution.  Except for
some random variation at the extremes, these lines are straight,
which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately.  In
fact, by the Central Limit Theorem, the spectrum of almost any
uncorrelated noise is approximately Gaussian, as long as the
distribution has finite mean and standard deviation, and the number of
samples is large.


\section{Exercises}

Solutions to these exercises are in {\tt chap04soln.ipynb}.

\begin{exercise}
``A Soft Murmur'' is a web site that plays a mixture of natural
noise sources, including rain, waves, wind, etc.
At
\url{http://asoftmurmur.com/about/} you can find their list
of recordings, most of which are at \url{http://freesound.org}.

Download a few of these files and compute the spectrum of each
signal.  Does the power spectrum look like white noise, pink noise,
or Brownian noise?  How does the spectrum vary over time?
\end{exercise}


\begin{exercise}
In a noise signal, the mixture of frequencies changes over time.
In the long run, we expect the power at all frequencies to be equal,
but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can
break a long signal into segments, compute the power spectrum
for each segment, and then compute the average across
the segments.  You can read more about this algorithm at
\url{http://en.wikipedia.org/wiki/Bartlett's_method}.

Implement Bartlett's method and use it to estimate the power
spectrum for a noise wave.  Hint: look at the implementation
of \verb"make_spectrogram".
\end{exercise}


\begin{exercise}
At \url{http://www.coindesk.com} you can download the daily
price of a BitCoin as a CSV file.  Read this file and compute
the spectrum of BitCoin prices as a function of time.
Does it resemble white, pink, or Brownian noise?
\end{exercise}


\begin{exercise}
A Geiger counter is a device that detects radiation.
When an ionizing particle strikes the detector, it outputs a surge of
current.  The total output at a point in time can be modeled as
uncorrelated Poisson (UP) noise, where each sample is
a random quantity from a Poisson distribution, which corresponds to the
number of particles detected during an interval.

Write a class called {\tt UncorrelatedPoissonNoise} that inherits
from \verb"thinkdsp._Noise" and provides {\tt evaluate}.  It should
use {\tt np.random.poisson} to generate random values from a Poisson
distribution.
The parameter of this function, {\tt lam}, is the
average number of particles during each interval.  You can use the
attribute {\tt amp} to specify {\tt lam}.  For example, if the
framerate is 10 kHz and {\tt amp} is 0.001, we expect about 10
``clicks'' per second.

Generate about a second of UP noise and listen to it.  For low values
of {\tt amp}, like 0.001, it should sound like a Geiger counter.  For
higher values it should sound like white noise.  Compute and plot the
power spectrum to see whether it looks like white noise.
\end{exercise}


\begin{exercise}
The algorithm in this chapter for generating pink noise is
conceptually simple but computationally expensive.  There are
more efficient alternatives, like the Voss-McCartney algorithm.
Research this method, implement it, compute the spectrum of
the result, and confirm that it has the desired relationship
between power and frequency.
\end{exercise}


\chapter{Autocorrelation}

In the previous chapter I characterized white noise as ``uncorrelated'',
which means that each value is independent of the others, and Brownian
noise as ``correlated'', because each value depends on the preceding
value.  In this chapter I define these terms more precisely and
present the {\bf autocorrelation function}, which is a useful tool
for signal analysis.

The code for this chapter is in {\tt chap05.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp05}.


\section{Correlation}

In general, correlation between variables means that if you know the
value of one, you have some information about the other.  There are
several ways to quantify correlation, but the most common is the
Pearson product-moment correlation coefficient, usually denoted
$\rho$.
For two variables, $x$ and $y$, that each contain $N$ values:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, and
$\sigma_x$ and $\sigma_y$ are their standard deviations.

Pearson's correlation is always between -1 and +1 (including both).
If $\rho$ is positive, we say that the correlation is positive,
which means that when one variable is high, the other tends to be
high.  If $\rho$ is negative, the correlation is negative, so
when one variable is high, the other tends to be low.

The magnitude of $\rho$ indicates the strength of the correlation.  If
$\rho$ is 1 or -1, the variables are perfectly correlated, which means
that if you know one, you can make a perfect prediction about the
other.  If $\rho$ is near zero, the correlation is probably weak, so
if you know one, it doesn't tell you much about the other.

I say ``probably weak'' because it is also possible that there is
a nonlinear relationship that is not captured by the coefficient
of correlation.  Nonlinear relationships are often important in
statistics, but less often relevant for signal processing, so I
won't say more about them here.

Python provides several ways to compute correlations.  {\tt
np.corrcoef} takes any number of variables and computes a {\bf
correlation matrix} that includes correlations between each pair of
variables.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr1.eps}}
\caption{Two sine waves that differ by a phase offset of 1 radian;
their coefficient of correlation is 0.54.}
\label{fig.autocorr1}
\end{figure}

I'll present an example with only two variables.
First, I define
a function that constructs sine waves with different phase offsets:

\begin{verbatim}
def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave
\end{verbatim}

Next I instantiate two waves with different offsets:

\begin{verbatim}
wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)
\end{verbatim}

Figure~\ref{fig.autocorr1} shows what the first few periods of these
waves look like.  When one wave is high, the other is usually high, so we
expect them to be correlated.

\begin{verbatim}
>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]
\end{verbatim}

The option {\tt ddof=0} indicates that {\tt corrcoef} should divide by
$N$, as in the equation above, rather than use the default, $N-1$.

The result is a correlation matrix:
the first element is the correlation of {\tt wave1}
with itself, which is always 1.  Similarly, the last element
is the correlation of {\tt wave2} with itself.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr2.eps}}
\caption{The correlation of two sine waves as a function of the
phase offset between them.  The result is a cosine.}
\label{fig.autocorr2}
\end{figure}

The off-diagonal elements contain the value we're interested in,
the correlation of {\tt wave1} and {\tt wave2}.  The value 0.54
indicates that the strength of the correlation is moderate.

As the phase offset increases, this correlation decreases until
the waves are 180 degrees out of phase, which yields correlation
-1.  Then it increases until the offset differs by 360 degrees.
At that point we have come full circle and the correlation is 1.

Figure~\ref{fig.autocorr2} shows the relationship between correlation and
phase offset for a sine wave.
The shape of that curve should look
familiar; it is a cosine.

{\tt thinkdsp} provides a simple interface for computing the
correlation between waves:

\begin{verbatim}
>>> wave1.corr(wave2)
0.54
\end{verbatim}


\section{Serial correlation}

Signals often represent measurements of quantities that vary in
time.  For example, the sound signals we've worked with represent
measurements of voltage (or current), which correspond to the changes
in air pressure we perceive as sound.

Measurements like this almost always have serial correlation, which
is the correlation between each element and the next (or the previous).
To compute serial correlation, we can shift a signal and then compute
the correlation of the shifted version with the original.

\begin{verbatim}
def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr
\end{verbatim}

\verb"serial_corr" takes a Wave object and
{\tt lag}, which is the integer number of places to shift the wave.
It computes the correlation of the wave with a shifted version
of itself.

We can test this function with the noise signals from the previous
chapter.  We expect UU noise to be uncorrelated, based on the
way it's generated (not to mention the name):

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

When I ran this example, I got 0.006, which indicates a very
small serial correlation.
You might get a different value when you run
it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous
value and a random ``step'', so we expect a strong serial
correlation:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

Sure enough, the result I got is greater than 0.999.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr3.eps}}
\caption{Serial correlation for pink noise with a range of
parameters.}
\label{fig.autocorr3}
\end{figure}

Since pink noise is in some sense between Brownian noise and UU noise,
we might expect an intermediate correlation:

\begin{verbatim}
signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

With parameter $\beta=1$, I got a serial correlation of 0.851.
As we vary the parameter from $\beta=0$, which is uncorrelated
noise, to $\beta=2$, which is Brownian, serial correlation
ranges from 0 to almost 1, as shown in Figure~\ref{fig.autocorr3}.


\section{Autocorrelation}
\label{autopink}

In the previous section we computed the correlation between each
value and the next, so we shifted the elements of the array by 1.
But we can easily compute serial correlations with
different lags.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr4.eps}}
\caption{Autocorrelation functions for pink noise with a range
of parameters.}
\label{fig.autocorr4}
\end{figure}

You can think of \verb"serial_corr" as a function that
maps from each value of lag to the corresponding correlation, and we
can evaluate that function by looping through values of {\tt lag}:

\begin{verbatim}
def autocorr(wave):
    lags = \
range(len(wave.ys)//2)2801corrs = [serial_corr(wave, lag) for lag in lags]2802return lags, corrs2803\end{verbatim}28042805{\tt autocorr} takes a Wave object and returns the autocorrelation2806function as a pair of sequences: {\tt lags} is a sequence of2807integers from 0 to half the length of the wave; {\tt corrs}2808is the sequence of serial correlations for each lag.28092810Figure~\ref{fig.autocorr4} shows autocorrelation functions for pink2811noise with three values of $\beta$. For low values of $\beta$, the2812signal is less correlated, and the autocorrelation function drops2813off to zero quickly. For larger values, serial correlation2814is stronger and drops off more slowly. With $\beta=1.7$ serial2815correlation is strong even for long lags; this phenomenon is called2816{\bf long-range dependence}, because it indicates that each value in2817the signal depends on many preceding values.281828192820\section{Autocorrelation of periodic signals}28212822The autocorrelation of pink noise has interesting mathematical2823properties, but limited applications. The autocorrelation of2824periodic signals is more useful.28252826\begin{figure}2827% autocorr.py2828\centerline{\includegraphics[height=2.5in]{figs/autocorr5.eps}}2829\caption{Spectrogram of a vocal chirp.}2830\label{fig.autocorr5}2831\end{figure}28322833As an example, I downloaded from \url{freesound.org} a recording of2834someone singing a chirp; the repository for this book includes the2835file: \url{28042__bcjordan__voicedownbew.wav}. 
You can use the
IPython notebook for this chapter, {\tt chap05.ipynb}, to play it.

Figure~\ref{fig.autocorr5} shows the spectrogram of this wave.
The fundamental frequency and some of the harmonics show up clearly.
The chirp starts near 500 Hz and drops down to about 300 Hz, roughly
from C5 to E4.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr6.eps}}
\caption{Spectrum of a segment from a vocal chirp.}
\label{fig.autocorr6}
\end{figure}

To estimate pitch at a particular point in time, we could use the
spectrum, but it doesn't work very well. To see why not, I'll take
a short segment from the wave and plot its spectrum:

\begin{verbatim}
duration = 0.01
segment = wave.segment(start=0.2, duration=duration)
spectrum = segment.make_spectrum()
spectrum.plot(high=1000)
\end{verbatim}

This segment starts at 0.2 seconds and lasts 0.01 seconds.
Figure~\ref{fig.autocorr6} shows its spectrum. There is a clear peak
near 400 Hz, but it is hard to identify the pitch precisely.
The
length of the segment is 441 samples at a framerate of 44100 Hz, so
the frequency resolution is 100 Hz (see Section~\ref{gabor}).
That means the estimated pitch might be off by 50 Hz; in musical
terms, the range from 350 Hz to 450 Hz is about 5 semitones, which is
a big difference!

We could get better frequency resolution by taking a longer segment,
but since the pitch is changing over time, we would also get ``motion
blur''; that is, the peak would spread between the start and end pitch of
the segment, as we saw in Section~\ref{sauron}.

We can estimate pitch more precisely using autocorrelation.
If a signal is periodic, we expect the autocorrelation to spike
when the lag equals the period.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr7.eps}}
\caption{Two segments from a chirp, one starting 0.0023 seconds
after the other.}
\label{fig.autocorr7}
\end{figure}

To show why that works, I'll plot two segments from the same
recording.

\begin{verbatim}
def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')
\end{verbatim}

One segment starts at 0.2 seconds; the other starts 0.0023 seconds
later. Figure~\ref{fig.autocorr7} shows the result. The segments
are similar, and their correlation is 0.99.
This result suggests
that the period is near 0.0023 seconds, which corresponds to a frequency
of 435 Hz.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr8.eps}}
\caption{Autocorrelation function for a segment from a chirp.}
\label{fig.autocorr8}
\end{figure}

For this example, I estimated the period by trial and error. To automate
the process, we can use the autocorrelation function.

\begin{verbatim}
lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)
\end{verbatim}

Figure~\ref{fig.autocorr8} shows the autocorrelation function for
the segment starting at $t=0.2$ seconds. The first peak occurs at
{\tt lag=101}. We can compute the frequency that corresponds
to that period like this:

\begin{verbatim}
lag = 101
period = lag / segment.framerate
frequency = 1 / period
\end{verbatim}

The estimated fundamental frequency is 437 Hz. To evaluate the
precision of the estimate, we can run the same computation with
lags 100 and 102, which correspond to frequencies 441 and 432 Hz.
The frequency precision using autocorrelation is less than 10 Hz,
compared with 100 Hz using the spectrum. In musical terms, the
expected error is about 30 cents (a third of a semitone).


\section{Correlation as dot product}
\label{dotproduct}

I started the chapter with this definition of Pearson's
correlation coefficient:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
Then I used $\rho$ to define serial correlation and autocorrelation.
That's consistent with how these terms are used in statistics,
but in the context of signal processing, the definitions are
a little different.

In signal processing, we are often working with unbiased signals,
where the mean is 0, and normalized signals, where the standard
deviation is 1.
In that case, the definition of $\rho$ simplifies to:
%
\[ \rho = \frac{1}{N} \sum_i x_i y_i \]
%
And it is common to simplify even further:
%
\[ r = \sum_i x_i y_i \]
%
This definition of correlation is not ``standardized'', so it doesn't
generally fall between -1 and 1. But it has other useful properties.

If you think of $x$ and $y$ as vectors, you might recognize this
formula as the {\bf dot product}, $x \cdot y$. See
\url{http://en.wikipedia.org/wiki/Dot_product}.

\newcommand{\norm}{\mathrm{norm}}

The dot product indicates the degree to which the signals are similar.
If they are normalized so their standard deviations are 1,
%
\[ x \cdot y = \cos \theta \]
%
where $\theta$ is the angle between the vectors. And that explains
why Figure~\ref{fig.autocorr2} is a cosine curve.


\section{Using NumPy}
\label{correlate}

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr9.eps}}
\caption{Autocorrelation function computed with {\tt np.correlate}.}
\label{fig.autocorr9}
\end{figure}

NumPy provides a function, {\tt correlate}, that computes
the correlation of two functions or the autocorrelation of one
function. We can use it to compute the autocorrelation of
the segment from the previous section:

\begin{verbatim}
corrs2 = np.correlate(segment.ys, segment.ys, mode='same')
\end{verbatim}

The option {\tt mode} tells {\tt correlate} what range
of {\tt lag} to use. With the value {\tt 'same'}, the
range is from $-N/2$ to $N/2$, where $N$ is the length of the
wave array.

Figure~\ref{fig.autocorr9} shows the result. It is symmetric because
the two signals are identical, so a negative lag on one has the same
effect as a positive lag on the other.
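We can check that symmetry on a small example. This is a sketch that
stands in a short cosine for the recorded segment (an odd number of
samples keeps the {\tt 'same'} window centered on lag 0):

\begin{verbatim}
import numpy as np

N = 201
ts = np.arange(N) / N
ys = np.cos(2 * np.pi * 5 * ts)    # 5 cycles of a cosine

corrs = np.correlate(ys, ys, mode='same')

# the autocorrelation is symmetric: reversing the lags changes
# nothing, and the peak is at lag 0, in the middle of the array
print(np.allclose(corrs, corrs[::-1]))
print(corrs.argmax() == N // 2)
\end{verbatim}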
To compare with the results
from {\tt autocorr}, we can select the second half:

\begin{verbatim}
N = len(corrs2)
half = corrs2[N//2:]
\end{verbatim}

If you compare Figure~\ref{fig.autocorr9} to Figure~\ref{fig.autocorr8},
you'll notice that the correlations computed by {\tt np.correlate}
get smaller as the lags increase. That's because {\tt np.correlate}
uses the unstandardized definition of correlation;
as the lag gets bigger, the
overlap between the two signals gets smaller, so the magnitude of
the correlations decreases.

We can correct that by dividing through by the lengths:

\begin{verbatim}
lengths = range(N, N//2, -1)
half /= lengths
\end{verbatim}

Finally, we can standardize the results so the correlation with
{\tt lag=0} is 1.

\begin{verbatim}
half /= half[0]
\end{verbatim}

With these adjustments, the results computed by {\tt autocorr} and
{\tt np.correlate} are nearly the same. They still differ by
1-2\%. The reason is not important, but if you are curious: {\tt autocorr}
standardizes the correlations independently for each lag; for
{\tt np.correlate}, we standardized them all at the end.

More importantly, now you know what autocorrelation is, how to
use it to estimate the fundamental period of a signal, and two
ways to compute it.


\section{Exercises}

Solutions to these exercises are in {\tt chap05soln.ipynb}.

\begin{exercise}
The IPython notebook for this chapter, {\tt chap05.ipynb}, includes
an interaction that lets you compute autocorrelations for different
lags.
Use this interaction to estimate the pitch of the vocal chirp
for a few different start times.
\end{exercise}


\begin{exercise}
The example code in \verb"chap05.ipynb" shows how to use autocorrelation
to estimate the fundamental frequency of a periodic signal.
Encapsulate this code in a function called \verb"estimate_fundamental",
and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a
spectrogram of the recording.
\end{exercise}


\begin{exercise}
If you did the exercises in the previous chapter, you downloaded
the historical price of BitCoins and estimated the power spectrum
of the price changes. Using the same data, compute the autocorrelation
of BitCoin prices. Does the autocorrelation function drop off quickly?
Is there evidence of periodic behavior?
\end{exercise}


\begin{exercise}
In the repository for this book you will find an IPython notebook
called \verb"saxophone.ipynb" that explores autocorrelation,
pitch perception, and a phenomenon called the ``missing fundamental''.
Read through this notebook and run the examples.
Try selecting
a different segment of the recording and running the examples again.
\end{exercise}



\chapter{Discrete cosine transform}
\label{dct}

The topic of this chapter is the {\bf Discrete Cosine
Transform} (DCT), which is used in MP3 and related formats for
compressing music; JPEG and similar formats for images; and the MPEG
family of formats for video.

DCT is similar in many ways to the Discrete Fourier Transform (DFT),
which we have been using for spectral analysis.
Once we learn how DCT works, it will be easier to explain DFT.

Here are the steps to get there:

\begin{enumerate}

\item We'll start with the synthesis problem: given a set of frequency
components and their amplitudes, how can we construct a wave?

\item Next we'll rewrite the synthesis problem using NumPy arrays.
This move is good for performance, and also provides insight
for the next step.

\item We'll look at the analysis problem: given a signal and a
set of frequencies, how can we find the amplitude of each frequency
component? We'll start with a solution that is conceptually simple
but slow.

\item Finally, we'll use some principles from linear algebra to find a
more efficient algorithm. If you already know linear algebra,
that's great, but I'll explain what you need as we go.

\end{enumerate}

The code for this chapter is in {\tt chap06.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp06}.


\section{Synthesis}
\label{synth1}

Suppose I give you a list of amplitudes and a list of frequencies,
and ask you to construct a signal that is the sum of these frequency
components.
Using objects in the {\tt thinkdsp} module, there is
a simple way to perform this operation, which is called {\bf synthesis}:

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

{\tt amps} is a list of amplitudes, {\tt fs} is the list
of frequencies, and {\tt ts} is the sequence
of times where the signal should be evaluated.

{\tt components} is a list of {\tt CosSignal} objects, one for
each amplitude-frequency pair. {\tt SumSignal} represents the
sum of these frequency components.

Finally, {\tt evaluate} computes the value of the signal at each
time in {\tt ts}.

We can test this function like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)
\end{verbatim}

This example makes a signal that contains a fundamental frequency at
100 Hz and three harmonics (100 Hz is a sharp G2). It renders the
signal for one second at 11,025 frames per second and puts the results
into a Wave object.

Conceptually, synthesis is pretty simple.
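As a sanity check, we can confirm that the synthesized wave really
contains these four components. This sketch uses a plain-NumPy
stand-in for {\tt synthesize1} (the name \verb"synthesize1_np" is
mine) and NumPy's FFT instead of a Spectrum object:

\begin{verbatim}
import numpy as np

def synthesize1_np(amps, fs, ts):
    # sum of cosine components, like synthesize1 without thinkdsp
    return sum(amp * np.cos(2 * np.pi * freq * ts)
               for amp, freq in zip(amps, fs))

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.arange(framerate) / framerate      # one second of samples
ys = synthesize1_np(amps, fs, ts)

# with a 1-second wave, FFT bin k corresponds to k Hz, so the four
# largest magnitudes should land in bins 100, 200, 300, and 400
hs = np.abs(np.fft.rfft(ys))
print(sorted(np.argsort(hs)[-4:]))
\end{verbatim}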
But in this form it doesn't
help much with {\bf analysis}, which is the inverse problem: given the
wave, how do we identify the frequency components and their amplitudes?


\section{Synthesis with arrays}
\label{synthesis}

\begin{figure}
\input{diagram}
\caption{Synthesis with arrays.}
\label{fig.synthesis}
\end{figure}

Here's another way to write {\tt synthesize}:

\begin{verbatim}
def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

This function looks very different, but it does the same thing.
Let's see how it works:

\begin{enumerate}

\item {\tt np.outer} computes the outer product of {\tt ts} and
{\tt fs}. The result is an array with one row for each element
of {\tt ts} and one column for each element of {\tt fs}. Each
element in the array is the product of a frequency and a time, $f t$.

\item We multiply {\tt args} by $2 \pi$ and apply {\tt cos}, so each
element of the result is $\cos (2 \pi f t)$. Since the {\tt ts} run
down the columns, each column contains a cosine signal at a
particular frequency, evaluated at a sequence of times.

\item {\tt np.dot} multiplies each row of {\tt M} by {\tt amps},
element-wise, and then adds up the products. In terms of linear
algebra, we are multiplying a matrix, {\tt M}, by a vector, {\tt amps}.
In terms of signals, we are computing the weighted sum
of frequency components.

\end{enumerate}

Figure~\ref{fig.synthesis} shows the structure of this computation.
Each row of the matrix, {\tt M}, corresponds to a time
from 0.0 to 1.0 seconds; $t_n$ is the time of the $n$th row.
Each column corresponds to a frequency from
100 to 400 Hz; $f_k$ is the frequency of the $k$th column.

I labeled the $n$th row with the letters $a$ through $d$; as an
example, the value of $a$ is $\cos [2 \pi (100) t_n]$.

The result of the dot product, {\tt ys}, is a vector with one element
for each row of {\tt M}. The $n$th element, labeled $e$, is the sum
of products:
%
\[ e = 0.6 a + 0.25 b + 0.1 c + 0.05 d \]
%
And likewise with the other elements of {\tt ys}. So each element
of {\tt ys} is the sum of four frequency components, evaluated at
a point in time, and multiplied by the corresponding amplitudes.
And that's exactly what we wanted.

We can use the code from the previous section to check that the two
versions of {\tt synthesize} produce the same results.

\begin{verbatim}
ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))
\end{verbatim}

The biggest difference between {\tt ys1} and {\tt ys2} is about
{\tt 1e-13}, which is what we expect due to floating-point errors.

Writing this computation in terms of linear algebra makes the code
smaller and faster. Linear algebra
provides concise notation for operations on matrices and vectors.
For
example, we could write {\tt synthesize} like this:
%
\begin{eqnarray*}
M &=& \cos (2 \pi t \otimes f) \\
y &=& M a
\end{eqnarray*}
%
where $a$ is a vector of amplitudes,
$t$ is a vector of times, $f$ is a vector of frequencies, and
$\otimes$ is the symbol for the outer product of two vectors.


\section{Analysis}
\label{analysis}

Now we are ready to solve the analysis problem. Suppose I give you
a wave and tell you that it is the sum of cosines with a given set
of frequencies. How would you find the amplitude for each frequency
component? In other words, given {\tt ys}, {\tt ts} and {\tt fs},
can you recover {\tt amps}?

In terms of linear algebra, the first step is the same as for
synthesis: we compute $M = \cos (2 \pi t \otimes f)$. Then we want
to find $a$ so that $y = M a$; in other words, we want to solve a
linear system. NumPy provides {\tt linalg.solve}, which does
exactly that.

Here's what the code looks like:

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

The first two lines use {\tt ts} and {\tt fs} to build the
matrix, {\tt M}. Then {\tt np.linalg.solve} computes {\tt amps}.

But there's a hitch. In general we can only solve a system of linear
equations if the matrix is square; that is, the number of equations
(rows) is the same as the number of unknowns (columns).

In this example, we have only 4 frequencies, but we evaluated the
signal at 11,025 times.
So we have many more equations than unknowns.

In general if {\tt ys}
contains more than 4 elements, it is unlikely that we can analyze it
using only 4 frequencies.

But in this case, we know that the {\tt ys} were actually generated by
adding only 4 frequency components, so we can use any 4 values from
the wave array to recover {\tt amps}.

For simplicity, I'll use the first 4 samples from the signal.
Using the values of {\tt ys}, {\tt fs} and {\tt ts} from
the previous section, we can run {\tt analyze1} like this:

\begin{verbatim}
n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])
\end{verbatim}

And sure enough, {\tt amps2} is

\begin{verbatim}
[ 0.6   0.25  0.1   0.05 ]
\end{verbatim}

This algorithm works, but it is slow. Solving a linear
system of equations takes time proportional to $n^3$, where $n$ is
the number of columns in $M$. We can do better.


\section{Orthogonal matrices}

One way to solve linear systems is by inverting matrices. The
inverse of a matrix $M$ is written $M^{-1}$, and it has the property
that $M^{-1}M = I$. $I$ is the identity matrix, which has
the value 1 on all diagonal elements and 0 everywhere else.

So, to solve the equation $y = Ma$, we can multiply both sides by
$M^{-1}$, which yields:
%
\[ M^{-1}y = M^{-1} M a \]
%
On the right side, we can replace $M^{-1}M$ with $I$:
%
\[ M^{-1}y = I a \]
%
If we multiply $I$ by any vector $a$, the result is $a$, so
%
\[ M^{-1}y = a \]
%
This implies that if we can compute $M^{-1}$ efficiently, we can find
$a$ with a simple matrix multiplication (using {\tt np.dot}). That
takes time proportional to $n^2$, which is better than $n^3$.

Inverting a matrix is slow, in general, but some special cases are
faster. In particular, if $M$ is {\bf orthogonal}, the inverse of $M$
is just the transpose of $M$, written $M^T$.
In NumPy
transposing an array is a constant-time operation. It
doesn't actually move the elements of the array; instead, it creates a
``view'' that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse;
that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we
can check whether a matrix is orthogonal by computing $M^TM$.

So let's see what the matrix looks like in {\tt synthesize2}. In
the previous example, $M$ has 11,025 rows, so it might be a good idea
to work with a smaller example:

\begin{verbatim}
def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

{\tt amps} is the same vector of amplitudes we saw before.
Since we have 4 frequency components, we'll sample the signal
at 4 points in time. That way, $M$ is square.

{\tt ts} is a vector of equally spaced sample times in the range from
0 to 1 time unit. I chose the time unit to be 1 millisecond, but it
is an arbitrary choice, and we will see in a minute that it drops out
of the computation anyway.

Since the frame rate is $N$ samples per time unit, the Nyquist
frequency is \verb"N / time_unit / 2", which is 2000 Hz in this
example. So {\tt fs} is a vector of equally spaced frequencies
between 0 and 2000 Hz.

With these values of {\tt ts} and {\tt fs}, the matrix, $M$, is:

\begin{verbatim}
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]
\end{verbatim}

You might recognize 0.707 as an approximation of $\sqrt{2}/2$,
which is $\cos \pi/4$. You also might notice that this matrix
is {\bf symmetric}, which means that the element at $(j, k)$ always
equals the element at $(k, j)$.
This implies that $M$ is its own
transpose; that is, $M^T = M$.

But sadly, $M$ is not orthogonal. If we compute $M^TM$, we get:

\begin{verbatim}
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]
\end{verbatim}

And that's not the identity matrix.


\section{DCT-IV}
\label{dctiv}

But if we choose {\tt ts} and {\tt fs} carefully,
we can make $M$ orthogonal. There are several ways to do it, which
is why there are several versions of the discrete cosine transform (DCT).

One simple option is to shift {\tt ts} and {\tt fs} by a half unit.
This version is called DCT-IV, where ``IV'' is a roman numeral
indicating that this is the fourth of eight versions of the DCT.

Here's an updated version of {\tt test1}:

\begin{verbatim}
def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

If you compare this to the previous version, you'll notice
two changes. First, I added 0.5 to {\tt ts} and {\tt fs}.
Second, I cancelled out \verb"time_units", which simplifies
the expression for {\tt fs}.

With these values, $M$ is

\begin{verbatim}
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]
\end{verbatim}

And $M^TM$ is

\begin{verbatim}
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]
\end{verbatim}

Some of the off-diagonal elements are displayed as -0, which means
that the floating-point representation is a small negative number.
This matrix is very close to $2I$, which means $M$ is almost
orthogonal; it's just off by a factor of 2. And for our purposes,
that's good enough.

Because $M$ is symmetric and (almost) orthogonal, the inverse of $M$
is just $M/2$.
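We can confirm that numerically. This quick NumPy check rebuilds $M$
from the shifted {\tt ts} and {\tt fs} and compares $M^TM$ to $2I$:

\begin{verbatim}
import numpy as np

N = 4
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
M = np.cos(2 * np.pi * np.outer(ts, fs))

# M^T M should equal 2I, so M is orthogonal up to a factor of 2
print(np.allclose(M.T @ M, 2 * np.eye(N)))
\end{verbatim}

The same check passes for any $N$, not just 4, which is why dividing
by 2 works regardless of the length of the input.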
Now we can write a more efficient version of {\tt analyze}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Instead of using {\tt np.linalg.solve}, we just multiply
by $M/2$.

Combining {\tt test2} and {\tt analyze2}, we can write an
implementation of DCT-IV:

\begin{verbatim}
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Again, {\tt ys} is the wave array. We don't have to pass
{\tt ts} and {\tt fs} as parameters; \verb"dct_iv" can
figure them out based on {\tt N}, the length of {\tt ys}.

If we've got it right, this function should solve the analysis
problem; that is, given {\tt ys} it should be able to recover
{\tt amps}. We can test it like this.

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Starting with {\tt amps}, we synthesize a wave array, then use
\verb"dct_iv" to compute {\tt amps2}. The biggest
difference between {\tt amps} and {\tt amps2} is about {\tt 1e-16},
which is what we expect due to floating-point errors.


\section{Inverse DCT}

Finally, notice that {\tt analyze2} and {\tt synthesize2} are almost
identical. The only difference is that {\tt analyze2} divides the
result by 2. We can use this insight to compute the inverse DCT:

\begin{verbatim}
def inverse_dct_iv(amps):
    return dct_iv(amps) * 2
\end{verbatim}

\verb"inverse_dct_iv" solves the synthesis problem: it
takes the vector of amplitudes and returns
the wave array, {\tt ys}.
We can test it by starting
with {\tt amps}, applying \verb"inverse_dct_iv" and \verb"dct_iv",
and testing that we get back what we started with.

\begin{verbatim}
amps = [0.6, 0.25, 0.1, 0.05]
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Again, the biggest difference is about {\tt 1e-16}.


\section{The Dct class}

\begin{figure}
% dct.py
\centerline{\includegraphics[height=2.5in]{figs/dct1.eps}}
\caption{DCT of a triangle signal at 400 Hz, sampled at 10 kHz.}
\label{fig.dct1}
\end{figure}

{\tt thinkdsp} provides a Dct class that encapsulates the
DCT in the same way the Spectrum class encapsulates the FFT.
To make a Dct object, you can invoke \verb"make_dct" on a Wave.

\begin{verbatim}
signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()
\end{verbatim}

The result is the DCT of a triangle wave at 400 Hz, shown in
Figure~\ref{fig.dct1}. The values of the DCT can be positive or negative;
a negative value in the DCT corresponds to a negated cosine or,
equivalently, to a cosine shifted by 180 degrees.

\verb"make_dct" uses DCT-II, which is the most common type of DCT,
provided by {\tt scipy.fftpack}.

\begin{verbatim}
import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)
\end{verbatim}

The results from {\tt dct} are stored in {\tt hs}.
The corresponding
frequencies, computed as in Section~\ref{dctiv}, are stored in {\tt fs}.
And then both are used to initialize the Dct object.

Dct provides \verb"make_wave", which performs the inverse DCT.
We can test it like this:

\begin{verbatim}
wave2 = dct.make_wave()
max(abs(wave.ys-wave2.ys))
\end{verbatim}

The biggest difference between {\tt wave.ys} and {\tt wave2.ys} is
about {\tt 1e-16}, which is what we expect due to floating-point errors.

\verb"make_wave" uses {\tt scipy.fftpack.idct}:

\begin{verbatim}
# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)
\end{verbatim}

By default, the inverse DCT doesn't normalize the result, so we have
to divide through by $2N$.


\section{Exercises}

For the following exercises, I provide some starter code in
{\tt chap06starter.ipynb}.
Solutions are in {\tt chap06soln.ipynb}.

\begin{exercise}
In this chapter I claim that {\tt analyze1} takes time proportional
to $n^3$ and {\tt analyze2} takes time proportional to $n^2$. To
see if that's true, run them on a range of input sizes and time
them. In IPython, you can use the ``magic command'' \verb"%timeit".

If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for {\tt analyze1} and
slope 2 for {\tt analyze2}.

You also might want to test \verb"dct_iv"
and {\tt scipy.fftpack.dct}.

\end{exercise}


\begin{exercise}
One of the major applications of the DCT is compression for both
sound and images. In its simplest form, DCT-based compression
works like this:

\begin{enumerate}

\item Break a long signal into segments.

\item Compute the DCT of each segment.

\item Identify frequency components with amplitudes so low they are
inaudible, and remove them.
Store only the frequencies and
amplitudes that remain.

\item To play back the signal, load the frequencies and amplitudes
for each segment and apply the inverse DCT.

\end{enumerate}

Implement a version of this algorithm and apply it to a recording
of music or speech. How many components can you eliminate before
the difference is perceptible?

In order to make this method practical, you need some way to store a
sparse array; that is, an array where most of the elements are zero.
SciPy provides several implementations of sparse arrays, which you can
read about at
\url{http://docs.scipy.org/doc/scipy/reference/sparse.html}.
\end{exercise}


\begin{exercise}
In the repository for this book you will find an IPython notebook
called \verb"phase.ipynb" that explores the effect of phase on sound
perception.
Read through this notebook and run the examples.
Choose another segment of sound and run the same experiments.
Can you find any general relationships between the phase structure
of a sound and how we perceive it?
\end{exercise}



\chapter{Discrete Fourier Transform}
\label{dft}

We've been using the discrete Fourier transform (DFT) since
Chapter~\ref{sounds}, but I haven't explained how it works. Now is
the time.

If you understand the discrete cosine transform (DCT), you will
understand the DFT. The only difference is that instead of using the
cosine function, we'll use the complex exponential function. I'll
start by explaining complex exponentials, then I'll follow the
same progression as in Chapter~\ref{dct}:

\begin{enumerate}

\item We'll start with the synthesis
problem: given a set of frequency components and their amplitudes,
how can we construct a signal?
The synthesis problem is
equivalent to the inverse DFT.

\item Then I'll rewrite the synthesis problem in the form of matrix
multiplication using NumPy arrays.

\item Next we'll solve the analysis problem, which is equivalent to
the DFT: given a signal, how do we find the amplitude and phase
offset of its frequency components?

\item Finally, we'll use linear algebra to find a more efficient way
to compute the DFT.

\end{enumerate}

The code for this chapter is in {\tt chap07.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp07}.


\section{Complex exponentials}

One of the more interesting moves in mathematics is the generalization
of an operation from one type to another. For example, factorial is a
function that operates on integers; the natural definition for
factorial of $n$ is the product of all integers from 1 to $n$.

If you are of a certain inclination, you might wonder how to compute
the factorial of a non-integer like 3.5. Since the natural definition
doesn't apply, you might look for other ways to compute the factorial
function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial
function that we know as the gamma function (see
\url{http://en.wikipedia.org/wiki/Gamma_function}).

Euler also found one of the most useful generalizations in applied
mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication.
For example, $\phi^3 = \phi \cdot \phi \cdot \phi$. But this
definition doesn't apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:
%
\[ e^\phi = 1 + \phi + \phi^2/2! + \phi^3/3! + ...
\]
%
This definition works with real numbers, imaginary numbers and, by a simple
extension, with complex numbers. Applying this definition
to a pure imaginary number, $i\phi$, we get
%
\[ e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + ... \]
%
By rearranging terms, we can show that this is equivalent to:
%
\[ e^{i\phi} = \cos \phi + i \sin \phi \]
%
You can see the derivation at
\url{http://en.wikipedia.org/wiki/Euler's_formula}.

This formula
implies that $e^{i\phi}$ is a complex number with magnitude 1; if you
think of it as a point in the complex plane, it is always on the unit
circle. And if you think of it as a vector, the angle in radians
between the vector and the positive x-axis is the argument, $\phi$.

In the case where the exponent is a complex number, we have:
%
\[ e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi} \]
%
where $A$ is a real number that indicates amplitude and
$e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of {\tt exp} that works with complex numbers:

\begin{verbatim}
>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)
\end{verbatim}

Python uses {\tt j} to represent the imaginary unit, rather
than {\tt i}.
A number ending in {\tt j} is considered imaginary,
so {\tt 1j} is just $i$.

When the argument to {\tt np.exp} is imaginary or complex, the
result is a complex number; specifically, a {\tt np.complex128},
which is represented by two 64-bit floating-point numbers.
In this example, the result is {\tt 0.0707+0.997j}.

Complex numbers have attributes {\tt real} and {\tt imag}:

\begin{verbatim}
>>> z.real
0.0707
>>> z.imag
0.997
\end{verbatim}

To get the magnitude, you can use the built-in function {\tt abs}
or {\tt np.absolute}:

\begin{verbatim}
>>> abs(z)
1.0
>>> np.absolute(z)
1.0
\end{verbatim}

To get the angle, you can use {\tt np.angle}:

\begin{verbatim}
>>> np.angle(z)
1.5
\end{verbatim}

This example confirms that $e^{i \phi}$ is a complex number with
magnitude 1 and angle $\phi$ radians.


\section{Complex signals}

If $\phi(t)$ is a function of time, $e^{i \phi(t)}$ is also a function
of time. Specifically,
%
\[ e^{i \phi(t)} = \cos \phi(t) + i \sin \phi(t) \]
%
This function describes a quantity that varies in time, so it is
a signal. Specifically, it is a {\bf complex exponential signal}.

In the special case where the frequency of the signal is constant,
$\phi(t)$ is $2 \pi f t$ and the result is a {\bf complex sinusoid}:
%
\[ e^{i 2 \pi f t} = \cos 2 \pi f t + i \sin 2 \pi f t \]
%
Or more generally, the signal might start at a phase offset
$\phi_0$, yielding
%
\[ e^{i (2 \pi f t + \phi_0)} \]
%
{\tt thinkdsp} provides an implementation of this signal,
{\tt ComplexSinusoid}:

\begin{verbatim}
class ComplexSinusoid(Sinusoid):

    def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys
\end{verbatim}

{\tt ComplexSinusoid} inherits \verb"__init__" from
{\tt Sinusoid}.
It provides a version of {\tt evaluate}
that is almost identical to {\tt Sinusoid.evaluate}; the
only difference is that it uses {\tt np.exp} instead of
{\tt np.sin}.

The result is a NumPy array of complex numbers:

\begin{verbatim}
>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> print(wave.ys)
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]
\end{verbatim}

The frequency of this signal is 1 cycle per second; the amplitude
is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at 4 places equally spaced between
0 and 1 second. The resulting samples are complex numbers.


\section{The synthesis problem}

Just as we did with real sinusoids, we can create compound signals
by adding up complex sinusoids with different frequencies. And that
brings us to the complex version of the synthesis problem: given the
frequency and amplitude of each complex component, how do we evaluate the
signal?

The simplest solution is to create {\tt ComplexSinusoid} objects and
add them up.

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

This function is almost identical to {\tt synthesize1} in
Section~\ref{synth1}; the only difference is that I replaced
{\tt CosSignal} with {\tt ComplexSinusoid}.

Here's an example:

\begin{verbatim}
>>> amps = np.array([0.6, 0.25, 0.1, 0.05])
>>> fs = [100, 200, 300, 400]
>>> framerate = 11025
>>> ts = np.linspace(0, 1, framerate)
>>> ys = synthesize1(amps, fs, ts)
>>> print(ys)
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

At the lowest level, a complex signal is a sequence of complex
numbers. But how should we interpret it? We have some intuition for
real signals: they represent quantities that vary in time; for
example, a sound signal represents changes in air pressure.
But nothing we measure in the world yields complex numbers.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft1.eps}}
\caption{Real and imaginary parts of a mixture of complex sinusoids.}
\label{fig.dft1}
\end{figure}

So what is a complex signal? I don't have a satisfying answer to this
question. The best I can offer is two unsatisfying
answers:

\begin{enumerate}

\item A complex signal is a mathematical abstraction that is useful
for computation and analysis, but it does not correspond directly
with anything in the real world.

\item If you like, you can think of a complex signal as a sequence of
complex numbers that contains two signals as its real and imaginary
parts.

\end{enumerate}

Taking the second point of view, we can split the previous
signal into its real and imaginary parts:

\begin{verbatim}
n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')
\end{verbatim}

Figure~\ref{fig.dft1} shows a segment of the result. The
real part is a sum of cosines; the imaginary part is
a sum of sines.
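That split follows from Euler's formula, and it is easy to spot-check numerically. Here is a minimal sketch using NumPy alone; the frequency and sample times are arbitrary choices for illustration:

```python
import numpy as np

PI2 = 2 * np.pi
f = 100                          # frequency in Hz (arbitrary)
ts = np.linspace(0, 0.01, 50)    # sample times in seconds (arbitrary)

# a complex sinusoid and its real/imaginary decomposition
z = np.exp(1j * PI2 * f * ts)
assert np.allclose(z.real, np.cos(PI2 * f * ts))
assert np.allclose(z.imag, np.sin(PI2 * f * ts))
assert np.allclose(np.abs(z), 1.0)   # always on the unit circle
```
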
Although the waveforms look different, they
contain the same frequency components in the same proportions.
To our ears, they sound the same (in general, we don't hear
phase offsets).


\section{Synthesis with matrices}
\label{synthmat}

As we saw in Section~\ref{synthesis}, we can also express the synthesis
problem in terms of matrix multiplication:

\begin{verbatim}
PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

Again, {\tt amps} is a NumPy array that contains a sequence of
amplitudes.

{\tt fs} is a sequence containing the frequencies of the
components. {\tt ts} contains the times where we will evaluate
the signal.

{\tt args} contains the outer product of {\tt ts} and {\tt fs},
with the {\tt ts} running down the rows and the {\tt fs} running
across the columns (you might want to refer back to
Figure~\ref{fig.synthesis}).

Each column of matrix {\tt M} contains a complex sinusoid with
a particular frequency, evaluated at a sequence of {\tt ts}.

When we multiply {\tt M} by the amplitudes, the result is a vector
whose elements correspond to the {\tt ts}; each element is the sum of
several complex sinusoids, evaluated at a particular time.

Here's the example from the previous section again:

\begin{verbatim}
>>> ys = synthesize2(amps, fs, ts)
>>> print(ys)
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

The result is the same.

In this example the amplitudes are real, but they could also be
complex.
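As a cross-check, the matrix product in {\tt synthesize2} should match summing the complex sinusoids explicitly. Here is a minimal, self-contained sketch using only NumPy (no {\tt thinkdsp}):

```python
import numpy as np

PI2 = 2 * np.pi

def synthesize2(amps, fs, ts):
    # each column of M is a complex sinusoid at one frequency
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return np.dot(M, amps)

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
ts = np.linspace(0, 1, 11025)

# direct sum of complex sinusoids, one term per component
ys_direct = sum(a * np.exp(1j * PI2 * f * ts) for a, f in zip(amps, fs))
assert np.allclose(synthesize2(amps, fs, ts), ys_direct)
```
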
What effect does a complex amplitude have on the result?
Remember that we can think of a complex number in two ways: either the
sum of a real and imaginary part, $x + i y$, or the product of a real
amplitude and a complex exponential, $A e^{i \phi_0}$. Using the
second interpretation, we can see what happens when we multiply
a complex amplitude by a complex sinusoid. For each frequency, $f$,
we have:
%
\[ A e^{i \phi_0} \cdot e^{i 2 \pi f t} = A e^{i (2 \pi f t + \phi_0)} \]
%
Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$
and adds the phase offset $\phi_0$.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft2.eps}}
\caption{Real part of two complex signals that differ by a phase
offset.}
\label{fig.dft2}
\end{figure}

We can test that claim by running the previous example with
$\phi_0 = 1.5$ for all frequency components:

\begin{verbatim}
phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])
\end{verbatim}

Since {\tt amps}
is an array of reals, multiplying by {\tt np.exp(1j * phi)} yields
an array of complex numbers with phase offset {\tt phi} radians, and
the same magnitudes as {\tt amps}.

Figure~\ref{fig.dft2} shows the result.
The phase offset
$\phi_0 = 1.5$ shifts the wave to the left by about one quarter of
a cycle; it also changes the waveform, because the same phase
offset applied to different frequencies changes how the frequency
components line up with each other.

Now that we have the more general solution to the synthesis problem --
one that handles complex amplitudes -- we are ready for the analysis
problem.


\section{The analysis problem}

The analysis problem is the inverse of the synthesis problem: given a
sequence of samples, $y$, and knowing the frequencies
that make up the signal, can we compute the complex amplitudes of the
components, $a$?

As we saw in Section~\ref{analysis}, we can solve this problem by forming
the synthesis matrix, $M$, and solving the system of linear
equations, $M a = y$, for $a$.

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

{\tt analyze1} takes a (possibly complex) wave array, {\tt ys}, a
sequence of real frequencies, {\tt fs}, and a sequence of real
times, {\tt ts}. It returns a sequence of complex amplitudes,
{\tt amps}.

Continuing the previous example, we can confirm that {\tt analyze1}
recovers the amplitudes we started with. For the linear system
solver to work, {\tt M} has to be square, so we need {\tt ys},
{\tt fs} and {\tt ts} to have the same length.
I'll ensure that by
slicing {\tt ys} and {\tt ts} down to the length of {\tt fs}:

\begin{verbatim}
>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> print(amps2)
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]
\end{verbatim}

These are approximately the amplitudes we started with, although
each component has a small imaginary part due to
floating-point errors.


\section{Efficient analysis}

Unfortunately, solving a linear system of equations is slow. For the
DCT, we were able to speed things up by choosing {\tt fs} and
{\tt ts} so that {\tt M} is orthogonal. That way, the inverse of {\tt M}
is the transpose of {\tt M}, and we can compute both the DCT and inverse
DCT by matrix multiplication.

We'll do the same thing for the DFT, with one small change.
Since {\tt M} is complex, we need it to be {\bf unitary}, rather
than orthogonal, which means that the inverse of {\tt M} is
the conjugate transpose of {\tt M}; we can compute it by
transposing the matrix and negating the imaginary part of each
element. See \url{http://en.wikipedia.org/wiki/Unitary_matrix}.

The NumPy methods {\tt conj} and {\tt transpose} do what we
want. Here's the code that computes {\tt M} for $N=4$ components:

\begin{verbatim}
>>> N = 4
>>> ts = np.arange(N) / N
>>> fs = np.arange(N)
>>> args = np.outer(ts, fs)
>>> M = np.exp(1j * PI2 * args)
\end{verbatim}

If $M$ is unitary, $M^*M = I$, where $M^*$ is the conjugate transpose
of $M$, and $I$ is the identity matrix.
We can test whether $M$
is unitary like this:

\begin{verbatim}
>>> MstarM = M.conj().transpose().dot(M)
\end{verbatim}

The result, within the tolerance of floating-point error, is
$4 I$, so $M$ is unitary except for an extra factor of $N$,
similar to the extra factor of 2 we found with the DCT.

We can use this result to write a faster version of {\tt analyze1}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    N = len(fs)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

And test it with appropriate values of {\tt fs} and {\tt ts}:

\begin{verbatim}
>>> N = 4
>>> amps = np.array([0.6, 0.25, 0.1, 0.05])
>>> fs = np.arange(N)
>>> ts = np.arange(N) / N
>>> ys = synthesize2(amps, fs, ts)

>>> amps3 = analyze2(ys, fs, ts)
>>> print(amps3)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

Again, the result is correct within the tolerance of floating-point
arithmetic.


\section{DFT}
\label{dftsection}

As a function, {\tt analyze2} would be hard to use because it
only works if {\tt fs} and {\tt ts} are chosen correctly.
Instead, I will rewrite it to take just {\tt ys} and compute
{\tt fs} and {\tt ts} itself.

First, I'll make a function to compute the synthesis matrix, $M$:

\begin{verbatim}
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M
\end{verbatim}

Then I'll write the function that takes {\tt ys} and returns
{\tt amps}:

\begin{verbatim}
def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

We are almost done; {\tt analyze3} computes something very
close to the DFT, with one difference.
The conventional definition
of the DFT does not divide by {\tt N}:

\begin{verbatim}
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps
\end{verbatim}

Now we can confirm that my version yields the same result as
NumPy's FFT:

\begin{verbatim}
>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

The result is close to {\tt amps * N}.
And here's the version in {\tt np.fft}:

\begin{verbatim}
>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

They are the same, within floating-point error.

The inverse DFT is almost the same, except we don't have to transpose
and conjugate $M$, and {\em now} we have to divide through by $N$:

\begin{verbatim}
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
\end{verbatim}

Finally, we can confirm that {\tt dft(idft(amps))} yields {\tt amps}:

\begin{verbatim}
>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

If I could go back in time, I might change the definition of
the DFT so it divides by $N$ and the inverse DFT doesn't. That would
be more consistent with my presentation of the synthesis and analysis
problems.

Or I might change the definition so that both operations divide
through by $\sqrt{N}$. Then the DFT and inverse DFT would be
more symmetric.

But I can't go back in time (yet!), so we're stuck with a slightly
weird convention.
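The convention can be confirmed against NumPy directly. Here is a self-contained sketch of {\tt synthesis\_matrix}, {\tt dft}, and {\tt idft} as defined above, checked against {\tt np.fft}:

```python
import numpy as np

PI2 = 2 * np.pi

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    return np.exp(1j * PI2 * np.outer(ts, fs))

def dft(ys):
    # conventional DFT: no division by N
    N = len(ys)
    return synthesis_matrix(N).conj().transpose().dot(ys)

def idft(ys):
    # inverse DFT: the division by N happens here
    N = len(ys)
    return synthesis_matrix(N).dot(ys) / N

amps = np.array([0.6, 0.25, 0.1, 0.05])
ys = idft(amps)
assert np.allclose(dft(ys), np.fft.fft(ys))  # matches NumPy's convention
assert np.allclose(dft(ys), amps)            # round trip recovers amps
```
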
For practical purposes it doesn't really
matter.


\section{The DFT is periodic}

In this chapter I presented the DFT in the form of matrix multiplication.
We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$.
When we multiply $M^{*}$ by the wave array, $y$, each element of the
result is the product of a row from $M^*$ and $y$, which we can
write in the form of a summation:
%
\[ DFT(y)[k] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
$0$ to $N-1$ and $n$ is an index of time from $0$ to $N-1$.
So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from
0 to $N-1$. We {\em could} evaluate it for other values of $k$, but
there is no point, because they start to repeat. That is, the value at
$k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n (k+N) / N) \]
%
Since there is a sum in the exponent, we can break it into two parts:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N) \]
%
In the second factor, the exponent is always an integer multiple of
$2 \pi$, so the result is always 1, and we can drop it:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
And we can see that this summation is equivalent to $DFT(y)[k]$.
So the DFT is periodic, with period $N$. You will need this result
for one of the exercises below, which asks you to implement the Fast
Fourier Transform (FFT).

As an aside, writing the DFT in the form of a summation provides an
insight into how it works. If you review the diagram in
Section~\ref{synthesis}, you'll see that each column of the synthesis matrix
is a signal evaluated at a sequence of times.
The analysis matrix is
the (conjugate) transpose of the synthesis matrix, so each {\em row}
is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of $y$ with one of the
signals in the array (see Section~\ref{dotproduct}). That is, each
element of the DFT is a correlation that quantifies the similarity of
the wave array, $y$, and a complex exponential at a particular
frequency.


\section{DFT of real signals}

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft3.eps}}
\caption{DFT of a 500 Hz sawtooth signal sampled at 10 kHz.}
\label{fig.dft3}
\end{figure}

The Spectrum class in {\tt thinkdsp} is based on {\tt np.fft.rfft},
which computes the ``real DFT''; that is, it works with real signals.
But the DFT as presented in this chapter is more general than that; it
works with complex signals.

So what happens when we apply the ``full DFT'' to a real signal?
Let's look at an example:

\begin{verbatim}
>>> signal = thinkdsp.SawtoothSignal(freq=500)
>>> wave = signal.make_wave(duration=0.1, framerate=10000)
>>> hs = dft(wave.ys)
>>> amps = np.absolute(hs)
\end{verbatim}

This code makes a sawtooth wave with frequency 500 Hz, sampled at
framerate 10 kHz. {\tt hs} contains the complex DFT of the wave;
{\tt amps} contains the amplitude at each frequency. But what
frequency do these amplitudes correspond to? If we look at the
body of {\tt dft}, we see:

\begin{verbatim}
>>> fs = np.arange(N)
\end{verbatim}

It's tempting to think that these values are the right frequencies.
The problem is that {\tt dft} doesn't know the sampling rate. The DFT
assumes that the duration of the wave is 1 time unit, so it thinks the
sampling rate is $N$ per time unit.
In order to interpret the
frequencies, we have to convert from these arbitrary time units back
to seconds, like this:

\begin{verbatim}
>>> fs = np.arange(N) * framerate / N
\end{verbatim}

With this change, the range of frequencies is from 0 to the actual
framerate, 10 kHz. Now we can plot the spectrum:

\begin{verbatim}
>>> thinkplot.plot(fs, amps)
>>> thinkplot.config(xlabel='frequency (Hz)',
                     ylabel='amplitude')
\end{verbatim}

Figure~\ref{fig.dft3} shows the amplitude of the signal for each
frequency component from 0 to 10 kHz. The left half of the figure
is what we should expect: the dominant frequency is at 500 Hz, with
harmonics dropping off like $1/f$.

But the right half of the figure is a surprise. Past 5000 Hz, the
amplitude of the harmonics starts growing again, peaking at 9500 Hz.
What's going on?

The answer: aliasing. Remember that with framerate 10000 Hz, the
folding frequency is 5000 Hz. As we saw in Section~\ref{aliasing},
a component at 5500 Hz is indistinguishable from a component
at 4500 Hz. When we evaluate the DFT at 5500 Hz, we get the same
value as at 4500 Hz. Similarly, the value at 6000 Hz is the same
as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency.
Since there is no additional information past this point, we can
save time by evaluating only the first half of the DFT,
and that's exactly what {\tt np.fft.rfft} does.


\section{Exercises}

Solutions to these exercises are in {\tt chap07soln.ipynb}.

\begin{exercise}
The notebook for this chapter, {\tt chap07.ipynb}, contains
additional examples and explanations. Read through it and run
the code.
\end{exercise}

\begin{exercise}
In this chapter, I showed how we can express the DFT and inverse DFT
as matrix multiplications.
These operations take time proportional to
$N^2$, where $N$ is the length of the wave array. That is fast enough
for many applications, but there is a faster
algorithm, the Fast Fourier Transform (FFT), which takes time
proportional to $N \log N$.

The key to the FFT is the Danielson-Lanczos lemma:
%
\[ DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) \, DFT(o)[n] \]
%
where $DFT(y)[n]$ is the $n$th element of the DFT of $y$; $e$ is a
wave array containing the even elements of $y$, and $o$ contains the
odd elements of $y$.

This lemma suggests a recursive algorithm for the DFT:

\begin{enumerate}

\item Given a wave array, $y$, split it into its even elements, $e$,
and its odd elements, $o$.

\item Compute the DFT of $e$ and $o$ by making recursive calls.

\item Compute $DFT(y)$ for each value of $n$ using the
Danielson-Lanczos lemma.

\end{enumerate}

For the base case of this recursion, you could wait until the length
of $y$ is 1. In that case, $DFT(y) = y$. Or if the length of $y$
is sufficiently small, you could compute its DFT by matrix multiplication,
possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting
with a version that is not truly recursive. In Step 2, instead of
making a recursive call, use {\tt dft}, as defined in
Section~\ref{dftsection}, or {\tt np.fft.fft}. Get Step 3 working,
and confirm that the results are consistent with the other
implementations.
Then add a base case and confirm that it works.
Finally, replace Step 2 with recursive calls.

One more hint: Remember that the DFT is periodic; you might find
{\tt np.tile} useful.

You can read more about the FFT at
\url{https://en.wikipedia.org/wiki/Fast_Fourier_transform}.

\end{exercise}


\chapter{Filtering and Convolution}

In this chapter I present one of the most important and useful
ideas related to signal processing: the Convolution Theorem.
But before we can understand the Convolution Theorem, we have to understand
convolution. I'll start with a simple example, smoothing, and
we'll go from there.

The code for this chapter is in {\tt chap08.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp08}.


\section{Smoothing}
\label{smoothing}

\begin{figure}
% convolution1.py
\centerline{\includegraphics[height=2.5in]{figs/convolution1.eps}}
\caption{Daily closing price of Facebook stock and a 30-day moving
average.}
\label{fig.convolution1}
\end{figure}

Smoothing is an operation that tries to remove short-term variations
from a signal in order to reveal long-term trends. For example, if
you plot daily changes in the price of a stock, it would look noisy;
a smoothing operator might make it easier to see whether the price
was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes
the mean of the previous $n$ values, for some value of $n$.

For example, Figure~\ref{fig.convolution1} shows the daily
closing price of Facebook from May 17, 2012 to December 8,
2015. The gray line
is the raw data; the darker line shows the 30-day moving average.
Smoothing removes the most
extreme changes and makes it easier to see long-term trends.

Smoothing operations also apply to sound signals.
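The moving average itself is easy to sketch with plain NumPy. In this sketch the random walk is a stand-in for the stock data, which is not reproduced here, and {\tt moving\_average} is my own helper, not part of {\tt thinkdsp}:

```python
import numpy as np

def moving_average(ys, n):
    # mean of each window of n consecutive values
    return np.array([ys[i:i+n].mean() for i in range(len(ys) - n + 1)])

rng = np.random.default_rng(17)
prices = np.cumsum(rng.standard_normal(1000))  # random walk as stand-in data

smoothed = moving_average(prices, 30)
# the result is shorter: one value per complete 30-element window
assert len(smoothed) == len(prices) - 30 + 1
assert np.allclose(smoothed[0], prices[:30].mean())
```
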
As an example, I'll
start with a square wave at 440 Hz. As we saw in
Section~\ref{square}, the harmonics of a square wave drop off
slowly, so it contains many high-frequency components.

\begin{verbatim}
>>> signal = thinkdsp.SquareSignal(freq=440)
>>> wave = signal.make_wave(duration=1, framerate=44100)
>>> segment = wave.segment(duration=0.01)
\end{verbatim}

{\tt wave} is a 1-second segment of the signal; {\tt segment}
is a shorter segment I'll use for plotting.

To compute the moving average of this signal, I'll create
a window with 11 elements and normalize it so the elements
add up to 1.

\begin{verbatim}
>>> window = np.ones(11)
>>> window /= sum(window)
\end{verbatim}

Now I can compute the average of the first 11 elements by
multiplying the window by the wave array:

\begin{verbatim}
>>> ys = segment.ys
>>> N = len(ys)
>>> padded = thinkdsp.zero_pad(window, N)
>>> prod = padded * ys
>>> sum(prod)
-1
\end{verbatim}

{\tt padded} is a version of the window with zeros added to
the end so it has the same length as {\tt segment.ys}.
{\tt prod} is the product of the window and the wave array.

The sum of the elementwise products is the average of the first 11
elements of the array.
Since these elements are all -1, their average
is -1.

\begin{figure}
% convolution2.py
\centerline{\includegraphics[height=2.5in]{figs/convolution2.eps}}
\caption{A square signal at 440 Hz (gray) and an 11-element
moving average.}
\label{fig.convolution2}
\end{figure}

To compute the next element of the smoothed signal, we shift the
window to the right and compute the average of the next 11 elements of
the wave array, starting with the second.

\begin{verbatim}
>>> rolled = np.roll(padded, 1)
>>> prod = rolled * ys
>>> sum(prod)
-1
\end{verbatim}

To compute the moving average, we repeat this process and store
the result in an array named {\tt smoothed}:

\begin{verbatim}
>>> smoothed = np.zeros_like(ys)
>>> rolled = padded
>>> for i in range(N):
...     smoothed[i] = sum(rolled * ys)
...     rolled = np.roll(rolled, 1)
\end{verbatim}

{\tt rolled} is a copy of {\tt padded} that gets shifted to
the right by one element each time through the loop. Inside
the loop, we multiply the segment by {\tt rolled} to select
11 elements, and then add them up.

Figure~\ref{fig.convolution2} shows the result. The gray line
is the original signal; the darker line is the smoothed signal.
The smoothed signal starts to ramp up when the leading edge of
the window reaches the first transition, and levels off when
the window crosses the transition. As a result, the transitions
are less abrupt, and the corners less sharp.
If you listen
to the smoothed signal, it sounds less buzzy and slightly muffled.


\section{Convolution}
\label{convolution}

The operation we just computed is called {\bf convolution},
and it is such a common operation that NumPy provides an
implementation that is simpler and faster than my version:

\begin{verbatim}
>>> convolved = np.convolve(ys, window, mode='valid')
>>> smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)
\end{verbatim}

{\tt np.convolve} computes the convolution of the wave
array and the window. The mode flag {\tt valid} indicates
that it should only compute values when the window and the
wave array overlap completely, so it stops when the right
edge of the window reaches the end of the wave array. Other
than that, the result is the same as in Figure~\ref{fig.convolution2}.

\newcommand{\conv}{\ast}

Actually, there is one other difference. The loop in the
previous section actually computes {\bf cross-correlation}:
%
\[ (f \star g)[n] = \sum_{m=0}^{N-1} f[m] g[n+m] \]
%
where $f$ is a wave array with length $N$, $g$ is the window,
and $\star$ is the symbol for cross-correlation. To
compute the $n$th element of the result, we shift $g$ to
the right, which is why the index is $n+m$.

The definition of convolution is slightly different:
%
\[ (f \conv g)[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]
%
The symbol $\conv$ represents convolution. The difference is in the
index of $g$: $m$ has been negated, so the summation iterates the
elements of $g$ backward (assuming that negative indices wrap around
to the end of the array).

Because the window we used in this example is symmetric,
cross-correlation and convolution yield the same result. When we use
other windows, we will have to be more careful.

You might wonder why convolution is defined the way it is.
There are two reasons:

\begin{itemize}

\item This definition comes up naturally for several applications,
especially analysis of signal-processing systems, which is
the topic of Chapter~\ref{systems}.

\item Also, this definition is the basis of the Convolution Theorem,
coming up very soon.

\end{itemize}


\section{The frequency domain}

\begin{figure}
% convolution4.py
\centerline{\includegraphics[height=2.5in]{figs/convolution4.eps}}
\caption{Spectrum of the square wave before and after smoothing.}
\label{fig.convolution4}
\end{figure}

Smoothing makes the transitions in a square signal less abrupt,
and makes the sound slightly muffled. Let's see what effect this
operation has on the spectrum. First I'll plot the spectrum
of the original wave:

\begin{verbatim}
>>> spectrum = wave.make_spectrum()
>>> spectrum.plot(color=GRAY)
\end{verbatim}

Then the smoothed wave:

\begin{verbatim}
>>> convolved = np.convolve(wave.ys, window, mode='same')
>>> smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
>>> spectrum2 = smooth.make_spectrum()
>>> spectrum2.plot()
\end{verbatim}

The mode flag {\tt same} indicates that the result should have the
same length as the input. In this example, it will include a few values
that ``wrap around'', but that's ok for now.

Figure~\ref{fig.convolution4} shows the result. The fundamental
frequency is almost unchanged; the first few harmonics are
attenuated, and the higher harmonics are almost eliminated.
So smoothing has the effect of a low-pass filter, which we
saw in Section~\ref{spectrums} and Section~\ref{pink}.

To see how much each component has been attenuated, we can
compute the ratio of the two spectrums:

\begin{verbatim}
>>> amps = spectrum.amps
>>> amps2 = spectrum2.amps
>>> ratio = amps2 / amps
>>> ratio[amps<560] = 0
>>> thinkplot.plot(ratio)
\end{verbatim}

{\tt ratio} is the ratio of the amplitudes before and after
smoothing. When {\tt amps} is small, this ratio can be big
and noisy, so for simplicity I set the ratio to 0 except
where the harmonics are.

\begin{figure}
% convolution5.py
\centerline{\includegraphics[height=2.5in]{figs/convolution5.eps}}
\caption{Ratio of spectrums for the square wave, before and after smoothing.}
\label{fig.convolution5}
\end{figure}

Figure~\ref{fig.convolution5} shows the result. As expected, the
ratio is high for low frequencies and drops off at a cutoff frequency
near 4000 Hz. But there is another feature we did not expect: above
the cutoff, the ratio bounces around between 0 and 0.2.
What's up with that?


\section{The convolution theorem}
\label{convtheorem}

\begin{figure}
% convolution6.py
\centerline{\includegraphics[height=2.5in]{figs/convolution6.eps}}
\caption{Ratio of spectrums for the square wave, before and after
smoothing, along with the DFT of the smoothing window.}
\label{fig.convolution6}
\end{figure}

\newcommand{\DFT}{\mathrm{DFT}}
\newcommand{\IDFT}{\mathrm{IDFT}}

The answer is the convolution theorem. Stated mathematically:
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
where $f$ is a wave array and $g$ is a window. In words,
the convolution theorem says that if we convolve $f$ and $g$,
and then compute the DFT, we get the same answer as computing
the DFT of $f$ and $g$, and then multiplying the results
element-wise.
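For DFTs, this identity holds exactly when the convolution is circular, which we can check directly with NumPy (a quick sketch; {\tt f} and {\tt g} here are arbitrary test arrays, not the book's wave and window):

```python
import numpy as np

rng = np.random.default_rng(17)
N = 128
f = rng.standard_normal(N)   # stand-in for a wave array
g = np.zeros(N)
g[:11] = 1 / 11              # boxcar window, zero-padded to length N

# Circular convolution computed directly from the definition,
# with the index n-m wrapping around modulo N.
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                 for n in range(N)])

# Convolution theorem: DFT(f conv g) == DFT(f) * DFT(g), element-wise.
lhs = np.fft.fft(conv)
rhs = np.fft.fft(f) * np.fft.fft(g)
print(np.allclose(lhs, rhs))  # True
```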
More concisely, convolution in the time
domain corresponds to multiplication in the frequency domain.

And that explains Figure~\ref{fig.convolution5}, because when we
convolve a wave and a window, we multiply the spectrum of
the wave by the spectrum of the window. To see how that works,
we can compute the DFT of the window:

\begin{verbatim}
padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))
\end{verbatim}

{\tt padded} contains the window, padded with zeros to be the
same length as {\tt wave}. \verb"dft_window" contains the
DFT of the smoothing window.

\newcommand{\abs}{\mathrm{abs}}

Figure~\ref{fig.convolution6} shows the result, along with the
ratios we computed in the previous section. The ratios are
exactly the amplitudes in \verb"dft_window". Mathematically:
%
\[ \abs(\DFT(f \conv g)) / \abs(\DFT(f)) = \abs(\DFT(g)) \]
%
In this context, the DFT of a window is called a {\bf filter}.
For any convolution window in the time domain, there is a
corresponding filter in the frequency domain. And for any
filter that can be expressed by element-wise multiplication in
the frequency domain, there is a corresponding window.


\section{Gaussian filter}

\begin{figure}
% convolution7.py
\centerline{\includegraphics[height=2.5in]{figs/convolution7.eps}}
\caption{Boxcar and Gaussian windows.}
\label{fig.convolution7}
\end{figure}

The moving average window we used in the previous section is a
low-pass filter, but it is not a very good one. The DFT drops off
steeply at first, but then it bounces around. Those bounces are
called {\bf sidelobes}, and they are there because the moving average
window is like a square wave, so its spectrum contains high-frequency
harmonics that drop off proportionally to $1/f$, which is relatively
slow.

We can do better with a Gaussian window.
SciPy provides functions
that compute many common convolution windows, including {\tt
gaussian}:

\begin{verbatim}
gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
\end{verbatim}

{\tt M} is the number of elements in the window; {\tt std}
is the standard deviation of the Gaussian distribution used to
compute it. Figure~\ref{fig.convolution7} shows the shape
of the window. It is a discrete approximation of the Gaussian
``bell curve''. The figure also shows the moving average window
from the previous example, which is sometimes called a
``boxcar'' window because it looks like a rectangular railway car.

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution8.eps}}
\caption{Ratio of spectrums before and after Gaussian smoothing, and
the DFT of the window.}
\label{fig.convolution8}
\end{figure}

I ran the computations from the previous sections again
with this window, and generated Figure~\ref{fig.convolution8},
which shows the ratio of the spectrums before and after
smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple
moving average. After the ratio drops off, it stays low, with almost
none of the sidelobes we saw with the boxcar window. So it does a
better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a
Gaussian curve.
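We can check this claim numerically: sample a Gaussian, take its DFT, and confirm that the log of the magnitude is quadratic in frequency, which is what it means for the magnitude to be Gaussian (a self-contained sketch that builds the window with NumPy rather than {\tt scipy.signal.gaussian}; the sizes are arbitrary):

```python
import numpy as np

# Gaussian window centered in an array of length N, with std sigma.
N, sigma = 256, 16
n = np.arange(N)
window = np.exp(-0.5 * ((n - N // 2) / sigma) ** 2)

# Magnitude of the DFT (the phase only reflects the centering shift).
amps = np.abs(np.fft.rfft(window))

# If amps is Gaussian in f, log(amps) is quadratic, so the curvature
# (log amps[0] - log amps[k]) / k**2 should be constant for small k.
ks = np.arange(1, 7)
c = (np.log(amps[0]) - np.log(amps[ks])) / ks**2
print(np.allclose(c, c[0], rtol=1e-3))  # True: log-magnitude is quadratic
```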
So the ratio drops off in proportion to $\exp(-f^2)$,
which is much faster than $1/f$.


\section{Efficient convolution}
\label{effconv}

One of the reasons the FFT is such an important algorithm is that,
combined with the Convolution Theorem, it provides an efficient
way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
So one way to compute a convolution is:
%
\[ f \conv g = \IDFT(\DFT(f) \cdot \DFT(g)) \]
%
where $\IDFT$ is the inverse DFT. A simple implementation of
convolution takes time proportional to $N^2$; this algorithm,
using FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution
both ways. As an example, I'll apply it to the BitCoin data
shown in Figure~\ref{fig.convolution1}.

\begin{verbatim}
df = pandas.read_csv('coindesk-bpi-USD-close.csv',
                     nrows=1625,
                     parse_dates=[0])
ys = df.Close.values
\end{verbatim}

This example uses Pandas to read the data from the CSV file (included
in the repository for this book). If you are not familiar with
Pandas, don't worry: I'm not going to do much with it in this book.
But if you're interested, you can learn more about it in
{\it Think Stats} at \url{http://thinkstats2.com}.

The result, {\tt df}, is a DataFrame, one of the data structures
provided by Pandas.
{\tt ys} is a NumPy array that contains daily
closing prices.

Next I'll create a Gaussian window and convolve it with {\tt ys}:

\begin{verbatim}
window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')
\end{verbatim}

\verb"fft_convolve" computes the same thing using the FFT:

\begin{verbatim}
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)
\end{verbatim}

We can test it by padding the window to the same length
as {\tt ys} and then computing the convolution:

\begin{verbatim}
padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)
\end{verbatim}

The result has $M-1$ bogus values at the beginning, where $M$ is the
length of the window. If we slice off the bogus values, the result
agrees with {\tt smoothed} with about 12 digits of precision.

\begin{verbatim}
M = len(window)
smoothed2 = smoothed2[M-1:]
\end{verbatim}


\section{Efficient autocorrelation}

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution9.eps}}
\caption{Autocorrelation functions computed by NumPy and
{\tt fft\_autocorr}.}
\label{fig.convolution9}
\end{figure}

In Section~\ref{convolution} I presented definitions of
cross-correlation and convolution, and we saw that they are
almost the same, except that in convolution the window is
reversed.

Now that we have an efficient algorithm for convolution, we
can also use it to compute cross-correlations and autocorrelations.
Using the data from the previous section, we can compute the
autocorrelation of the closing prices:

\begin{verbatim}
corrs = np.correlate(close, close, mode='same')
\end{verbatim}

With {\tt mode='same'}, the result has the same
length as {\tt close},
corresponding to lags from $-N/2$ to $N/2-1$.
The gray line in Figure~\ref{fig.convolution9} shows the result.
Except at {\tt lag=0}, there are no peaks, so there is no apparent
periodic behavior in this signal. However, the autocorrelation
function drops off slowly, suggesting that this signal resembles
pink noise, as we saw in Section~\ref{autopink}.

To compute the autocorrelation using convolution, we
have to zero-pad the signal to double its length.
This trick is necessary because the FFT is based
on the assumption that the signal is periodic; that is, that it wraps
around from the end to the beginning. With time-series data like
this, that assumption is invalid. Adding zeros, and then trimming
the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window.
In order to cancel that effect, we reverse the direction of the
window before calling \verb"fft_convolve", using {\tt np.flipud},
which flips a NumPy array. The result is a view of the array,
not a copy, so this operation is fast.

\begin{verbatim}
def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs
\end{verbatim}

The result from \verb"fft_convolve" has length $2N$. Of those,
the first and last $N/2$ are valid; the rest are the result of
zero-padding.
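Here is the whole pipeline as a self-contained check against {\tt np.correlate} on synthetic data (a sketch: the zero-padding is written out with NumPy instead of imported from {\tt thinkdsp}, and a random walk stands in for the price series):

```python
import numpy as np

def fft_convolve(signal, window):
    # Circular convolution by way of the convolution theorem.
    return np.fft.ifft(np.fft.fft(signal) * np.fft.fft(window))

def fft_autocorr(signal):
    # Zero-pad to double length so the circular wraparound is harmless,
    # then flip the window to turn convolution into correlation.
    N = len(signal)
    signal = np.concatenate([signal, np.zeros(N)])
    window = np.flipud(signal)
    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N // 2 + 1)[:N]
    return corrs.real

rng = np.random.default_rng(1)
ys = np.cumsum(rng.standard_normal(200))   # a random walk, like price data
corrs1 = np.correlate(ys, ys, mode='same')
corrs2 = fft_autocorr(ys)
print(np.allclose(corrs1, corrs2))  # True
```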
To select the valid elements, we roll the results
and select the first $N$, corresponding to lags from $-N/2$ to
$N/2-1$.

As shown in Figure~\ref{fig.convolution9}, the results from
\verb"fft_autocorr" and {\tt np.correlate} are identical (with
about 9 digits of precision).

Notice that the correlations in Figure~\ref{fig.convolution9} are
large numbers; we could normalize them (between -1 and 1) as shown
in Section~\ref{correlate}.

The strategy we used here for autocorrelation also works for
cross-correlation. Again, you have to prepare the signals by flipping
one and padding both, and then you have to trim the invalid parts of
the result. This padding and trimming is a nuisance, but that's why
libraries like NumPy provide functions to do it for you.


\section{Exercises}

Solutions to these exercises are in {\tt chap08soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap08.ipynb}.
Read through it and run the code.

It contains an interactive widget that lets you
experiment with the parameters of the Gaussian window to see
what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian,
{\tt std}, without increasing the number of elements in the window,
{\tt M}?
\end{exercise}


\begin{exercise}
In this chapter I claimed that the Fourier transform of a Gaussian
curve is also a Gaussian curve. For discrete Fourier transforms,
this relationship is approximately true.

Try it out for a few examples. What happens to the Fourier transform
as you vary {\tt std}?
\end{exercise}


\begin{exercise}
If you did the exercises in Chapter~\ref{nonperiodic}, you saw the
effect of the Hamming window, and some of the other windows provided
by NumPy, on spectral leakage.
We can get some insight into the
effect of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a
Hamming window with the same size. Zero-pad the windows and plot
their DFTs. Which window acts as a better low-pass filter? You might
find it useful to plot the DFTs on a log-$y$ scale.

Experiment with a few different windows and a few different sizes.
\end{exercise}



\chapter{Differentiation and Integration}
\label{diffint}

This chapter picks up where the previous chapter left off,
looking at the relationship between windows in the time domain
and filters in the frequency domain.

In particular, we'll look at the effect of a finite difference
window, which approximates differentiation, and the cumulative
sum operation, which approximates integration.

The code for this chapter is in {\tt chap09.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp09}.


\section{Finite differences}
\label{diffs}

In Section~\ref{smoothing}, we applied a smoothing window to
the stock price of Facebook and found that a smoothing
window in the time domain corresponds to a low-pass filter in
the frequency domain.

In this section, we'll look at daily price changes and
see that computing the difference between successive elements,
in the time domain, corresponds to a high-pass filter.

Here's the code to read the data, store it as a wave, and compute its
spectrum.

\begin{verbatim}
import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()
\end{verbatim}

This example uses Pandas to read the CSV file; the
result is a DataFrame,
{\tt df}, with columns for the opening
price, closing price, and high and low prices. I select the closing
prices and save them in a Wave object.
The framerate is 1 sample per day.

Figure~\ref{fig.diff_int1} shows
this time series and its spectrum.
Visually, the time series resembles Brownian noise (see
Section~\ref{brownian}).
And the spectrum looks like a straight
line, albeit a noisy one. The estimated slope is -1.9,
which is consistent with Brownian noise.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int1.eps}}
\caption{Daily closing price of Facebook and the spectrum of this time
series.}
\label{fig.diff_int1}
\end{figure}

Now let's compute the daily price change using {\tt np.diff}:

\begin{verbatim}
diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()
\end{verbatim}

Figure~\ref{fig.diff_int2} shows the resulting wave and its spectrum.
The daily changes resemble white noise, and the estimated slope of the
spectrum, -0.06, is near zero, which is what we expect for white
noise.

\begin{figure}
% diff_int2.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int2.eps}}
\caption{Daily price change of Facebook and the spectrum of this time series.}
\label{fig.diff_int2}
\end{figure}


\section{The frequency domain}

Computing the difference
between successive elements is the same as convolution with
the window {\tt [1, -1]}.
If the order of those elements seems backward,
remember that convolution reverses the window before applying it
to the signal.

We can see the effect of this operation in the frequency domain
by computing the DFT of the window.

\begin{verbatim}
diff_window = np.array([1.0, -1.0])
padded = thinkdsp.zero_pad(diff_window, len(close))
diff_wave = thinkdsp.Wave(padded,
                          framerate=close.framerate)
diff_filter = diff_wave.make_spectrum()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int3.eps}}
\caption{Filters corresponding to the diff and differentiate operators (left) and the integration operator (right, log-$y$ scale).}
\label{fig.diff_int3}
\end{figure}

Figure~\ref{fig.diff_int3} shows the result. The finite difference
window corresponds to a high-pass filter: its amplitude increases with
frequency, linearly for low frequencies, and then sublinearly after
that. In the next section, we'll see why.


\section{Differentiation}
\label{effdiff}

The window we used in the previous section is a
numerical approximation of the first derivative, so the filter
approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter
in the frequency domain; we can figure out what it is with a little
math.

Suppose we have a complex sinusoid with frequency $f$:
%
\[ E_f(t) = e^{2 \pi i f t} \]
%
The first derivative of $E_f$ is
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f e^{2 \pi i f t} \]
%
which we can rewrite as
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f E_f(t) \]
%
In other words, taking the derivative of $E_f$ is the same
as multiplying by $2 \pi i f$, which is a complex number
with magnitude $2 \pi f$ and angle $\pi/2$.

We can compute the filter that corresponds to differentiation,
like this:

\begin{verbatim}
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
\end{verbatim}

I started with the spectrum of {\tt close}, which has the right
size and framerate, then replaced the {\tt hs} with $2 \pi i f$.
Figure~\ref{fig.diff_int3} (left) shows this filter; it is a straight line.

As we saw in Section~\ref{synthmat}, multiplying a complex sinusoid
by a complex number
has two effects: it multiplies
the amplitude, in this case by $2 \pi f$, and shifts the phase
offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions,
each $E_f$ is an eigenfunction of the differentiation operator, with the
corresponding eigenvalue $2 \pi i f$. See
\url{http://en.wikipedia.org/wiki/Eigenfunction}.

If you are not familiar with that language, here's what it
means:

\newcommand{\op}{\mathcal{A}}

\begin{itemize}

\item An operator is a function that takes a function and returns
another function. For example, differentiation is an operator.

\item A function, $g$, is an eigenfunction of an operator, $\op$, if
applying $\op$ to $g$ has the effect of multiplying the function by
a scalar. That is, $\op g = \lambda g$.

\item In that case, the scalar $\lambda$ is the eigenvalue that
corresponds to the eigenfunction $g$.

\item A given operator might have many eigenfunctions, each with
a corresponding eigenvalue.

\end{itemize}

Because complex sinusoids are eigenfunctions of the differentiation
operator, they are easy to differentiate.
All we have to do is
multiply by a complex scalar.

For signals with more than one
component, the process is only slightly harder:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoids.

\item Compute the derivative of each component by multiplication.

\item Add up the differentiated components.

\end{enumerate}

If that process sounds familiar, that's because it is identical
to the algorithm for convolution in Section~\ref{effconv}: compute
the DFT, multiply by a filter, and compute the inverse DFT.

{\tt Spectrum} provides a method that applies the differentiation
filter:

\begin{verbatim}
# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs
\end{verbatim}

We can use it to compute the derivative of the Facebook time series:

\begin{verbatim}
deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int4.eps}}
\caption{Comparison of daily price changes computed by
{\tt np.diff} and by applying the differentiation filter.}
\label{fig.diff_int4}
\end{figure}

Figure~\ref{fig.diff_int4} compares the daily price changes computed by
{\tt np.diff} with the derivative we just computed.
I selected the first 50 values in the time series so we can see the
differences more clearly.

The derivative is noisier, because it amplifies the high frequency
components more, as shown in Figure~\ref{fig.diff_int3} (left). Also, the
first few elements of the derivative are very noisy. The problem
there is that the DFT-based derivative is based on the assumption that
the signal is periodic.
In effect, it connects the last element in
the time series back to the first element, which creates artifacts at
the boundaries.

To summarize, we have shown:

\begin{itemize}

\item Computing the difference between successive values in a signal
can be expressed as convolution with a simple window. The result is
an approximation of the first derivative.

\item Differentiation in the time domain corresponds to a simple
filter in the frequency domain. For periodic signals, the result is
the first derivative, exactly. For some non-periodic signals, it
can approximate the derivative.

\end{itemize}

Using the DFT to compute derivatives is the basis of {\bf spectral
methods} for solving differential equations (see
\url{http://en.wikipedia.org/wiki/Spectral_method}).

In particular, it is useful for the analysis of linear, time-invariant
systems, which is coming up in Chapter~\ref{systems}.


\section{Integration}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int5.eps}}
\caption{Comparison of the original time series and the integrated
derivative.}
\label{fig.diff_int5}
\end{figure}

In the previous section, we showed that differentiation in the time
domain corresponds to a simple filter in the frequency domain: it
multiplies each component by $2 \pi i f$.
Since integration is
the inverse of differentiation, it also corresponds to a simple
filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

\begin{verbatim}
integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)
\end{verbatim}

Figure~\ref{fig.diff_int3} (right) shows this filter on a log-$y$ scale,
which makes it easier to see.

{\tt Spectrum} provides a method that applies the integration filter:

\begin{verbatim}
# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs
\end{verbatim}

We can confirm that the integration filter is correct by applying it
to the spectrum of the derivative we just computed:

\begin{verbatim}
integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()
\end{verbatim}

But notice that at $f=0$, we are dividing by 0. The result in
NumPy is NaN, which is a special floating-point value that
represents ``not a number''. We can partially deal with this
problem by setting this value to 0 before converting the
spectrum back to a wave:

\begin{verbatim}
integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
\end{verbatim}

Figure~\ref{fig.diff_int5} shows this integrated derivative along with
the original time series. They are almost identical, but the
integrated derivative has been shifted down. The problem is that when
we clobbered the $f=0$ component, we set the bias of the signal to 0.
But that should not be surprising; in general, differentiation loses
information about the bias, and integration can't recover it.
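A short round trip with plain NumPy makes the lost bias concrete ({\tt rfft} stands in for the book's {\tt Spectrum} class; the 5 Hz signal and its bias of 3 are invented for the example):

```python
import numpy as np

framerate = 100
ts = np.arange(framerate) / framerate
ys = 3.0 + np.sin(2 * np.pi * 5 * ts)   # a 5 Hz sinusoid with bias 3

hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(len(ys), d=1/framerate)

# Differentiate, then integrate, in the frequency domain.
deriv_hs = 2j * np.pi * fs * hs          # multiplies the f=0 component by 0
integ_hs = np.zeros_like(deriv_hs)
integ_hs[1:] = deriv_hs[1:] / (2j * np.pi * fs[1:])   # skip f=0
recovered = np.fft.irfft(integ_hs)

# Everything comes back except the bias, which differentiation destroyed.
print(np.allclose(recovered, ys - 3.0))  # True
```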
In some
sense, the NaN at $f=0$ is telling us that this element is unknown.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int6.eps}}
\caption{A sawtooth wave and its spectrum.}
\label{fig.diff_int6}
\end{figure}

If we provide this ``constant of integration'', the results are
identical, which confirms that this integration filter is the correct
inverse of the differentiation filter.

\section{Cumulative sum}
\label{cumsum}

In the same way that the diff operator approximates differentiation,
the cumulative sum approximates integration.
I'll demonstrate with a sawtooth signal.

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)
\end{verbatim}

Figure~\ref{fig.diff_int6} shows this wave and its spectrum.

{\tt Wave} provides a method that computes the cumulative sum of
a wave array and returns a new Wave object:

\begin{verbatim}
# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)
\end{verbatim}

We can use it to compute the cumulative sum of \verb"in_wave":

\begin{verbatim}
out_wave = in_wave.cumsum()
out_wave.unbias()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int7.eps}}
\caption{A parabolic wave and its spectrum.}
\label{fig.diff_int7}
\end{figure}

Figure~\ref{fig.diff_int7} shows the resulting wave and its spectrum.
If you did the exercises in Chapter~\ref{harmonics}, this waveform should
look familiar: it's a parabolic signal.

Comparing the spectrum of the parabolic signal to the spectrum of the
sawtooth, the amplitudes of the components drop off more quickly. In
Chapter~\ref{harmonics}, we saw that the components of the sawtooth
drop off in proportion to $1/f$.
Since the cumulative sum
approximates integration, and integration filters components in
proportion to $1/f$, the components of the parabolic wave drop off in
proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds
to the cumulative sum:

\begin{verbatim}
cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs
\end{verbatim}

Because {\tt cumsum} is the inverse operation of {\tt diff}, we
start with a copy of \verb"diff_filter", which is the filter
that corresponds to the {\tt diff} operation, and then invert the
{\tt hs}.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int8.eps}}
\caption{Filters corresponding to cumulative sum and integration.}
\label{fig.diff_int8}
\end{figure}

Figure~\ref{fig.diff_int8} shows the filters corresponding to
cumulative sum and integration. The cumulative sum is a good
approximation of integration except at the highest frequencies,
where it drops off a little faster.

To confirm that this is the correct filter for the cumulative
sum, we can compare it to the ratio of the spectrum of
\verb"out_wave" to the spectrum of \verb"in_wave":

\begin{verbatim}
in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int9.eps}}
\caption{Filter corresponding to cumulative sum and actual ratios of
the before-and-after spectrums.}
\label{fig.diff_int9}
\end{figure}

And here's the method that computes the ratios:

\begin{verbatim}
def ratio(self, denom, thresh=1):
    ratio_spectrum = self.copy()
    ratio_spectrum.hs /= denom.hs
    ratio_spectrum.hs[denom.amps < thresh] = np.nan
    return ratio_spectrum
\end{verbatim}

When
{\tt denom.amps} is small, the resulting ratio is noisy,
so I set those values to NaN.

Figure~\ref{fig.diff_int9} shows the ratios and the filter
corresponding to the cumulative sum. They agree, which confirms that
inverting the filter for {\tt diff} yields the filter for {\tt
cumsum}.

Finally, we can confirm that the Convolution Theorem applies by
applying the {\tt cumsum} filter in the frequency domain:

\begin{verbatim}
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
\end{verbatim}

Within the limits of floating-point error, \verb"out_wave2" is
identical to \verb"out_wave", which we computed using {\tt cumsum}, so
the Convolution Theorem works! But note that this demonstration only
works with periodic signals.


\section{Integrating noise}

In Section~\ref{brownian}, we generated Brownian noise by computing the
cumulative sum of white noise.
Now that we understand the effect of {\tt cumsum} in the frequency
domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we
compute the cumulative sum, the amplitude of each component is divided
by $f$. Since power is the square of magnitude, the power of each
component is divided by $f^2$. So on average, the power at frequency
$f$ is proportional to $1 / f^2$:
%
\[ P_f = K / f^2 \]
%
where $K$ is a constant that's not important.
Taking the log of both sides yields:
%
\[ \log P_f = \log K - 2 \log f \]
%
And that's why, when we plot the spectrum of Brownian noise on a
log-log scale, we expect to see a straight line with slope -2, at
least approximately.

In Section~\ref{diffs} we plotted the spectrum of closing prices for
Facebook, and estimated that the slope is -1.9, which is consistent
with Brownian noise.
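We can confirm the slope prediction with a quick simulation: integrate white noise with {\tt cumsum} and fit a line to the low-frequency part of the log-log power spectrum (a sketch; the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.standard_normal(2**16)
brownian = np.cumsum(white)            # integrate white noise

power = np.abs(np.fft.rfft(brownian)) ** 2
fs = np.fft.rfftfreq(len(brownian))    # frequencies in cycles per sample

# Fit a line to log power versus log frequency over the low-frequency
# range, skipping f=0; the slope should be close to -2.
lo, hi = 1, 1000
slope, _ = np.polyfit(np.log(fs[lo:hi]), np.log(power[lo:hi]), 1)
print(slope)   # typically close to -2
```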
Many stock prices have similar spectrums.

When we use the {\tt diff} operator to compute daily changes, we
multiply the {\em amplitude} of each component by a filter proportional to
$f$, which means we multiply the {\em power} of each component by $f^2$.
On a log-log scale, this operation adds 2 to the slope of the
power spectrum, which is why the estimated slope of the result
is near 0.1 (but a little lower, because {\tt diff} only approximates
differentiation).


\section{Exercises}

Solutions to these exercises are in {\tt chap09soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap09.ipynb}.
Read through it and run the code.

In Section~\ref{cumsum}, I mentioned that some of the
examples don't work with non-periodic signals. Try replacing the
sawtooth wave, which is periodic, with the Facebook data, which is
not, and see what goes wrong.
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt diff} and
{\tt differentiate} on a signal. Create a triangle wave and plot
it. Apply {\tt diff} and plot the result. Compute the spectrum of the
triangle wave, apply {\tt differentiate}, and plot the result. Convert
the spectrum back to a wave and plot it. Are there differences between
the effect of {\tt diff} and {\tt differentiate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt cumsum} and
{\tt integrate} on a signal. Create a square wave and plot it. Apply
{\tt cumsum} and plot the result. Compute the spectrum of the square
wave, apply {\tt integrate}, and plot the result. Convert the spectrum
back to a wave and plot it. Are there differences between the effect
of {\tt cumsum} and {\tt integrate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of integrating
twice.
Create a sawtooth wave, compute its spectrum, then apply {\tt
integrate} twice. Plot the resulting wave and its spectrum. What is
the mathematical form of the wave? Why does it resemble a sinusoid?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of the 2nd
difference and 2nd derivative. Create a {\tt CubicSignal}, which is
defined in {\tt thinkdsp}. Compute the second difference by applying
{\tt diff} twice. What does the result look like? Compute the second
derivative by applying {\tt differentiate} to the spectrum twice.
Does the result look the same?

Plot the filters that correspond to the 2nd difference and the 2nd
derivative and compare them. Hint: In order to get the filters on the
same scale, use a wave with framerate 1.
\end{exercise}


\chapter{LTI systems}
\label{systems}

This chapter presents the theory of signals and systems, using
musical acoustics as an example. And it explains an
important application of the Convolution Theorem, the characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap10.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp10}.


\section{Signals and systems}

In the context of signal processing, a {\bf system} is an abstract
representation of anything that takes a signal as input and produces
a signal as output.

For example, an electronic amplifier is a circuit that takes an
electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you
can think of the room as a system that takes the sound of the
performance at the location where it is produced and produces a
somewhat different sound at the location where you hear it.

A {\bf linear, time-invariant system}\footnote{My presentation here
follows \url{http://en.wikipedia.org/wiki/LTI_system_theory}.} is a
system with these additional properties:

\begin{enumerate}

\item Linearity: If you put two inputs into the system at the same
time, the result is the sum of their outputs. Mathematically, if an
input $x_1$ produces output $y_1$ and another input $x_2$ produces
$y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and
$b$ are scalars.

\item Time invariance: The
effect of the system doesn't vary over time, or depend on the state
of the system.
So if inputs $x_1$ and $x_2$ differ by a shift in time,
their outputs $y_1$ and $y_2$ differ by the same shift, but are otherwise
identical.

\end{enumerate}

Many physical systems have these properties, at least approximately.

\begin{itemize}

\item Circuits that contain only resistors, capacitors and inductors are
LTI, to the degree that the components behave like their idealized
models.

\item Mechanical systems that contain springs, masses and
dashpots are also LTI, assuming linear springs (force proportional
to displacement) and dashpots (force proportional to velocity).

\item Also, and most relevant to applications in this book,
the media that transmit sound (including air, water
and solids) are well modeled by LTI systems.

\end{itemize}

LTI systems are described by linear differential equations, and
the solutions of those equations are complex sinusoids (see
\url{http://en.wikipedia.org/wiki/Linear_differential_equation}).

This result provides an algorithm for computing the effect of
an LTI system on an input signal:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoid components.

\item For each input component, compute the corresponding output component.

\item Add up the output components.

\end{enumerate}

At this point, I hope this algorithm sounds familiar. It's the
same algorithm we used for convolution in Section~\ref{effconv}, and
for differentiation in Section~\ref{effdiff}. This process
is called {\bf spectral decomposition} because we ``decompose''
the input signal into its spectral components.

In order to apply this process to an LTI system, we have to {\bf
characterize} the system by finding its effect on each component
of the input signal. For mechanical systems, it turns out that there
For mechanical systems, it turns out that there5679is a simple and efficient way to do that: you kick it and record5680the output.56815682Technically, the ``kick'' is called an {\bf impulse} and the5683output is called the {\bf impulse response}. You might wonder5684how a single impulse can completely characterize a system. You5685can see the answer by computing the DFT of an impulse. Here's5686a wave array with an impulse at $t=0$:56875688\begin{verbatim}5689>>> impulse = np.zeros(8)5690>>> impulse[0] = 15691>>> impulse5692[ 1. 0. 0. 0. 0. 0. 0. 0.]5693\end{verbatim}56945695And here's its spectrum:56965697\begin{verbatim}5698>>> spectrum = np.fft.fft(impulse)5699>>> spectrum5700[ 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]5701\end{verbatim}57025703That's right: an impulse is the sum of components with equal5704magnitudes at all frequencies\footnote{Not to be confused with white5705noise, which has the same average power at all frequencies, but5706varies around that average.} (and a phase offset of 0).57075708Therefore when you test a system by inputting5709an impulse, you are testing the response of the5710system at all frequencies. And you can test them all at the same5711time because the system is linear, so simultaneous tests don't5712interfere with each other.571357145715\section{Transfer functions}57165717To characterize the acoustic response of a room or open space, a5718simple way to generate an impulse is to pop a balloon or5719fire a gun. 
A gunshot puts an impulse into
the system; the sound you hear is the impulse response.

%TODO: update the numbers for the figures in this chapter
\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems6.eps}}
\caption{Waveform of a gunshot (left) and its spectrum (right).}
\label{fig.systems6}
\end{figure}

As an example, I'll use a recording of a gunshot to characterize
the room where the gun was fired, then use the impulse response
to simulate the effect of that room on a violin recording.

This example is in {\tt chap10.ipynb}, which is in the repository
for this book; you can also view it, and listen to the examples,
at \url{http://tinyurl.com/thinkdsp10}.

Here's the gunshot:

\begin{verbatim}
response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()
\end{verbatim}

I select a segment starting at 0.26 seconds to remove the silence
before the gunshot. Figure~\ref{fig.systems6} (left) shows the
waveform of the gunshot. Next we compute the DFT of {\tt response}:

\begin{verbatim}
transfer = response.make_spectrum()
transfer.plot()
\end{verbatim}

Figure~\ref{fig.systems6} (right) shows the result. This spectrum
encodes the response of the room; for each frequency, the spectrum
contains a complex number that represents an amplitude multiplier and
a phase shift. This spectrum is called a {\bf transfer
function} because it contains information about how the system transfers
the input to the output.

Now we can simulate the effect this room would have on the sound
of a violin.
Here is the violin recording we used in Section~\ref{violin}:

\begin{verbatim}
wave = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
wave.ys = wave.ys[:len(response)]
wave.normalize()
\end{verbatim}

The violin and gunshot waves were sampled at the same framerate,
44,100 Hz. And coincidentally, the duration of both is about the
same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

Now I know the magnitude and phase of each component in the
input, and I know the transfer function of the system. Their
product is the DFT of the output, which we can use to compute the
output wave:

\begin{verbatim}
output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()
\end{verbatim}

%TODO: Am I missing a figure here?
Figure~\ref{fig.systems7} shows the output of the system superimposed
on the input. The overall shape is similar, but there are substantial
differences. And those differences are clearly audible. Load
{\tt chap10.ipynb} and listen to them. One thing I find striking
about this example is that you can get a sense of what the room
was like; to me, it sounds like a long, narrow room with hard floors
and ceilings. That is, like a firing range.

There's one thing I glossed over in this example that I'll mention
in case it bothers anyone. The violin recording I started with
has already been transformed by one system: the room where it was
recorded. So what I really computed in my example is the sound
of the violin after two transformations.
To properly simulate
the sound of a violin in a different room, I should have characterized
the room where the violin was recorded and applied the inverse
of that transfer function first.


\section{Systems and convolution}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems8.eps}}
\caption{Sum of a gunshot waveform with a shifted, scaled version of
itself.}
\label{fig.systems8}
\end{figure}

If you think the previous example is black magic,
you are not alone. I've been thinking about it for a while and it
still makes my head hurt.

In the previous section, I suggested one way to think about it:

\begin{itemize}

\item An impulse is made up of components with amplitude 1 at all
frequencies.

\item The impulse response contains the sum of the responses of the
system to all of these components.

\item The transfer function, which is the DFT of the impulse response,
encodes the effect of the system on each frequency component in the form
of an amplitude multiplier and a phase shift.

\item For any input, we can compute the response of the system
by breaking the input into components, computing the response to
each component, and adding them up.

\end{itemize}

But if you don't like that, there's another way to think about
it altogether: convolution!

Suppose that instead of firing one gun, you fire two guns:
a big one with amplitude 1 at $t=0$ and a smaller one with
amplitude 0.5 at $t=1$.

We can compute the response of the system by adding up
the original impulse response and a scaled, shifted version of itself.
Here's a function that makes a shifted, scaled version of
a wave:

\begin{verbatim}
def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res
\end{verbatim}

Here's how we use it to compute the response
to a two-gun salute:

\begin{verbatim}
dt = 1
shift = dt * response.framerate
factor = 0.5
response2 = response + shifted_scaled(response, shift, factor)
\end{verbatim}

Figure~\ref{fig.systems8} shows the result. You can hear what
it sounds like in {\tt chap10.ipynb}. Not surprisingly, it
sounds like two gunshots, the first one louder than the second.

Now suppose instead of two guns, you add up 100 guns, with
a shift of 100 frames between them. This loop computes the result:

\begin{verbatim}
total = 0
for j in range(100):
    total += shifted_scaled(response, j*100, 1.0)
\end{verbatim}

At a framerate of 44,100 Hz, there are 441 gunshots per second,
so you don't hear the individual shots. Instead, it sounds
like a periodic signal at 441 Hz. If you play this example, it
sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a
series of samples, where each sample is an impulse with a different
magnitude.

As an example, I'll generate a sawtooth signal at 410 Hz:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=410)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)
\end{verbatim}

Now I'll loop through the series of impulses that make up the
sawtooth, and add up the impulse responses:

\begin{verbatim}
total = 0
for j, y in enumerate(wave.ys):
    total += shifted_scaled(response, j, y)
\end{verbatim}

The result is what it would sound like to play a sawtooth wave in a
firing range.
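The loop above is, sample by sample, exactly the computation that convolution performs. Here is a minimal check using plain NumPy arrays in place of the book's Wave objects; the short impulse response and input are made-up stand-ins for the gunshot recording and the sawtooth wave.

```python
import numpy as np

# a short, made-up impulse response and input, standing in for
# the gunshot recording and the sawtooth wave
g = np.array([1.0, 0.5, 0.25, 0.125])     # impulse response
f = np.array([0.0, 1.0, 0.5, -0.5, 1.0])  # input samples

# sum of shifted, scaled copies of the impulse response
total = np.zeros(len(f) + len(g) - 1)
for j, y in enumerate(f):
    total[j:j+len(g)] += y * g

# the same result, computed as a convolution
assert np.allclose(total, np.convolve(f, g))
```

The sum of shifted, scaled copies agrees with {\tt np.convolve}, which anticipates the general result derived below.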
Again, you can listen to it in {\tt chap10.ipynb}.

\begin{figure}
\input{diagram2}
\caption{Diagram of the sum of scaled and shifted copies of $g$.}
\label{fig.convolution}
\end{figure}

Figure~\ref{fig.convolution} shows a diagram of this computation,
where $f$ is the sawtooth, $g$ is the impulse response, and $h$
is the sum of the shifted, scaled copies of $g$.

For the example shown:

\[ h[2] = f[0]g[2] + f[1]g[1] + f[2]g[0] \]

And more generally,

\[ h[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]

You might recognize this equation from Section~\ref{convolution}. It
is the convolution of $f$ and $g$. This shows that if the input is $f$
and the impulse response of the system is $g$, the output is the
convolution of $f$ and $g$.

In summary, there are two ways to think about the effect of a system
on a signal:

\begin{enumerate}

\item The input is a sequence of impulses, so the output is the sum of
scaled, shifted copies of the impulse response; that sum is the
convolution of the input and the impulse response.

\item The DFT of the impulse response is a transfer function that
encodes the effect of the system on each frequency component as a
magnitude and phase offset.
The DFT of the input encodes the
magnitude and phase offset of the frequency components it contains.
Multiplying the DFT of the input by the transfer function yields
the DFT of the output.

\end{enumerate}

The equivalence of these two descriptions should not be a surprise;
it is basically a statement of the Convolution Theorem:
convolution of $f$ and $g$ in the time
domain corresponds to multiplication in the frequency domain.
And if you wondered why convolution is defined as it is, which
seemed backwards when we talked about smoothing and difference
windows, now you know the reason: the definition of convolution
appears naturally in the response of an LTI system to a signal.


\section{Proof of the Convolution Theorem}

Well, I've put it off long enough. It's time to prove the Convolution
Theorem (CT), which states:

\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]

where $f$ and $g$ are vectors with the same length, $N$.

I'll proceed in two steps:

\begin{enumerate}

\item I'll show that in the special case where $f$ is a complex
exponential, convolution with $g$ has the effect of multiplying
$f$ by a scalar.

\item In the more general case where $f$ is not a complex exponential,
we can use the DFT to express it as a sum of exponential components,
compute the convolution of each component (by multiplication) and
then add up the results.

\end{enumerate}

Together these steps prove the Convolution Theorem. But first, let's
assemble the pieces we'll need.
The DFT of $g$ is:
%
\[ \DFT(g) = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
0 to $N-1$ and $n$ is an index of time from 0 to $N-1$.
The result, $G$, is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I'll call $f$, is:
%
\[ \IDFT(F) = f[n] = \sum_k F[k] \exp(2 \pi i n k / N) \]
%
And here's the definition of convolution:
%
\[ (f \conv g)[n] = \sum_m f[m] g[n-m] \]
%
where $m$ is another index of time from 0 to $N-1$.
Convolution is commutative, so I could equivalently write:
%
\[ (f \conv g)[n] = \sum_m f[n-m] g[m] \]
%
Now let's consider the special case where $f$ is a complex
exponential with frequency $k$, which I'll call $e_k$:
%
\[ f[n] = e_k[n] = \exp(2 \pi i n k / N) \]
%
where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i (n-m) k / N) g[m] \]
%
We can split the first term into a product:
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N) g[m] \]
%
The first factor does not depend on $m$, so we can pull it out of the
summation:
%
\[ (e_k \conv g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N) g[m] \]
%
Now we recognize that the first factor is $e_k$, and the summation is
$G[k]$ (using $m$ as the index of time). So we can write:
%
\[ (e_k \conv g)[n] = e_k[n] G[k] \]
%
which shows that for each complex exponential, $e_k$, convolution
with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical
terms, each $e_k$ is an eigenvector of this operation, and
$G[k]$ is the corresponding eigenvalue.

Now for the second part of the proof.
If the input signal, $f$, doesn't
happen to be a complex exponential, we can express it as a sum of
complex exponentials by computing its DFT, $F$.
For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex
magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude
$F[k]$, so each output component is a complex
exponential with magnitude $F[k]G[k]$, based on the first part of
the proof.

Because the system is linear, the output is just the sum of the
output components:
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] e_k[n] \]
%
Plugging in the definition of $e_k$ yields
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] \exp(2 \pi i n k / N) \]
%
The right-hand side is the inverse DFT of the product $F G$. Thus:
%
\[ (f \conv g) = \IDFT( F G ) \]
%
Substituting $F = \DFT(f)$ and $G = \DFT(g)$:
%
\[ (f \conv g) = \IDFT( \DFT(f) \DFT(g) ) \]
%
Finally, taking the DFT of both sides yields the Convolution Theorem:
%
\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]
%
QED


\section{Exercises}

% TODO: make exercises
\begin{exercise}
\end{exercise}

\begin{exercise}
\end{exercise}

Solutions to these exercises are in {\tt chap10soln.ipynb}.


\chapter{Modulation and sampling}

In Section~\ref{aliasing} I defined the folding frequency, also called
the Nyquist frequency, and I claimed that ???

The code for this chapter is in {\tt chap11.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp11}.


\section{Convolution with impulses}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling1.eps}}
\caption{.}
\label{fig.sampling1}
\end{figure}

Figure~\ref{fig.sampling1} shows

\end{document}

Case study ideas:

\begin{itemize}

\item Pitch tracking in Rock Band

\item Auto-Tune

\item JPEG compression

\item Image recognition in Fourier space (money)

\item Application of cepstrum?

\item Application of z transform -- digital control systems

\item tilt shift effect on a photo

\end{itemize}