% LaTeX source for ``Think DSP: Digital Signal Processing for Programmers''
% Copyright 2014 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

\documentclass[12pt]{book}
\usepackage[width=5.5in,height=8.5in,
  hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{fancyhdr}
\usepackage[bookmarks]{hyperref}

\title{Think DSP}
\author{Allen B. Downey}

\newcommand{\thetitle}{Think DSP: Digital Signal Processing in Python}
\newcommand{\theversion}{1.0.5}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows in the HTML version
\setlinkstext
  {\imgsrc[ALT="Previous"]{back.png}}
  {\imgsrc[ALT="Up"]{up.png}}
  {\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}

\frontmatter

\ifplastex

\else
\fi

\ifplastex
\usepackage{localdef}
\maketitle

\else

\newtheoremstyle{exercise}% name of the style to be used
  {\topsep}% measure of space to leave above the theorem. E.g.: 3pt
  {\topsep}% measure of space to leave below the theorem. E.g.: 3pt
  {}% name of font to use in the body of the theorem
  {}% measure of space to indent
  {\bfseries}% name of head font
  {}% punctuation between head and body
  { }% space after theorem head; " " = normal interword space
  {}% Manually specify head

\theoremstyle{exercise}
\newtheorem{exercise}{Exercise}[chapter]

\input{latexonly}

\begin{latexonly}

\renewcommand{\topfraction}{0.9}
\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vspace{1in}


{\Large
Allen B. Downey\\
}


\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

\vfill

\end{flushright}


%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

Copyright \copyright ~2014 Allen B. Downey.


\vspace{0.2in}

\begin{flushleft}
Green Tea Press \\
9 Washburn Ave \\
Needham MA 02492
\end{flushleft}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported
License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.

\vspace{0.2in}

\end{latexonly}


% HTMLONLY

\begin{htmlonly}

% TITLE PAGE FOR HTML VERSION

{\Large \thetitle}

{\large Allen B. Downey}

Version \theversion

\vspace{0.25in}

Copyright 2014 Allen B. Downey

\vspace{0.25in}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0
Unported License, which is available at
\url{http://creativecommons.org/licenses/by-nc/3.0/}.

\setcounter{chapter}{-1}

\end{htmlonly}

\fi
% END OF THE PART WE SKIP FOR PLASTEX

\chapter{Preface}
\label{preface}

Signal processing is one of my favorite topics. It is useful
in many areas of science and engineering, and if you understand
the fundamental ideas, it provides insight into many things
we see in the world, and especially the things we hear.

But unless you studied electrical or mechanical engineering, you
probably haven't had a chance to learn about signal processing. The
problem is that most books (and the classes that use them) present the
material bottom-up, starting with mathematical abstractions like
phasors. And they tend to be theoretical, with few applications and
little apparent relevance.

The premise of this book is that if you know how to program, you
can use that skill to learn other things, and have fun doing it.

With a programming-based approach, I can present the most important
ideas right away.
By the end of the first chapter, you can analyze
sound recordings and other signals, and generate new sounds. Each
chapter introduces a new technique and an application you can
apply to real signals. At each step you learn how to use a
technique first, and then how it works.

This approach is more practical and, I hope you'll agree, more fun.


\section{Who is this book for?}

The examples and supporting code for this book are in Python. You
should know core Python and you should be
familiar with object-oriented features, at least using objects if not
defining your own.

If you are not already familiar with Python, you might want to start
with my other book, {\it Think Python}, which is an introduction to
Python for people who have never programmed, or Mark
Lutz's {\it Learning Python}, which might be better for people with
programming experience.

I use NumPy and SciPy extensively. If you are familiar with them
already, that's great, but I will also explain the functions
and data structures I use.

I assume that the reader knows basic mathematics, including complex
numbers. You don't need much calculus; if you understand the concepts
of integration and differentiation, that will do.
I use some linear algebra, but I will explain it as we
go along.


\section{Using the code}
\label{code}

The code and sound samples used in this book are available from
\url{https://github.com/AllenDowney/ThinkDSP}. Git is a version
control system that allows you to keep track of the files that
make up a project. A collection of files under Git's control is
called a ``repository''.
GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
\index{repository}
\index{Git}
\index{GitHub}

The GitHub homepage for my repository provides several ways to
work with the code:

\begin{itemize}

\item You can create a copy of my repository
on GitHub by pressing the {\sf Fork} button. If you don't already
have a GitHub account, you'll need to create one. After forking, you'll
have your own repository on GitHub that you can use to keep track
of code you write while working on this book. Then you can
clone the repo, which means that you copy the files
to your computer.
\index{fork}

\item Or you could clone
my repository. You don't need a GitHub account to do this, but you
won't be able to write your changes back to GitHub.
\index{clone}

\item If you don't want to use Git at all, you can download the files
in a Zip file using the button in the lower-right corner of the
GitHub page.

\end{itemize}

All of the code is written to work in both Python 2 and Python 3
with no translation.

I developed this book using Anaconda from
Continuum Analytics, which is a free Python distribution that includes
all the packages you'll need to run the code (and lots more).
I found Anaconda easy to install. By default it does a user-level
installation, not system-level, so you don't need administrative
privileges. And it supports both Python 2 and Python 3.
You can
download Anaconda from \url{https://www.anaconda.com/distribution/}.
\index{Anaconda}

If you don't want to use Anaconda, you will need the following
packages:

\begin{itemize}

\item NumPy for basic numerical computation, \url{http://www.numpy.org/};
\index{NumPy}

\item SciPy for scientific computation,
\url{http://www.scipy.org/};
\index{SciPy}

\item matplotlib for visualization, \url{http://matplotlib.org/}.
\index{matplotlib}

\end{itemize}

Although these are commonly used packages, they are not included with
all Python installations, and they can be hard to install in some
environments. If you have trouble installing them, I
recommend using Anaconda or one of the other Python distributions
that include these packages.
\index{installation}

Most exercises use Python scripts, but some also use Jupyter
notebooks. If you have not used Jupyter before, you can read about
it at \url{http://jupyter.org}.
\index{Jupyter}

There are three ways you can work with the Jupyter notebooks:

\begin{description}

\item[Run Jupyter on your computer]

If you installed Anaconda, you
probably got Jupyter by default. To check, start the server from
the command line, like this:

\begin{verbatim}
$ jupyter notebook
\end{verbatim}

If it's not installed, you can install it in Anaconda like this:

\begin{verbatim}
$ conda install jupyter
\end{verbatim}

When you start the server, it should launch your default web browser
or create a new tab in an open browser window.

\item[Run Jupyter on Binder]

Binder is a service that runs Jupyter in a virtual machine. If you
follow this link, \url{http://mybinder.org/repo/AllenDowney/ThinkDSP},
you should get a Jupyter home page with the notebooks for this book
and the supporting data and scripts.

You can run the scripts and modify them to run your own code, but the
virtual machine you run in is temporary.
Any changes you make will
disappear, along with the virtual machine, if you leave it idle for
more than about an hour.

\item[View notebooks on nbviewer]

When we refer to notebooks later in the book, we will provide links to
nbviewer, which provides a static view of the code and results. You
can use these links to read the notebooks and listen to the examples,
but you won't be able to modify or run the code, or use the
interactive widgets.

\end{description}

Good luck, and have fun!



\section*{Contributor List}

If you have a suggestion or correction, please send email to
{\tt downey@allendowney.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).
\index{contributors}

If you include at least part of the sentence the
error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not as easy to work with.
Thanks!

\small

\begin{itemize}

\item Before I started writing, my thoughts about this book
benefited from conversations with Boulos Harb at Google and
Aurelio Ramos, formerly at Harmonix Music Systems.

\item During the Fall 2013 semester, Nathan Lintz and Ian Daniher
worked with me on an independent study project and helped me with
the first draft of this book.

\item On Reddit's DSP forum, the anonymous user RamjetSoundwave
helped me fix a problem with my implementation of Brownian Noise.
And andodli found a typo.

\item In Spring 2015 I had the pleasure of teaching this material
along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy.
Both made many suggestions and corrections.

\item Silas Gyger corrected an arithmetic error.

\item Giuseppe Masetti sent a number of very helpful suggestions.

\item Eric Peters sent many helpful suggestions.

% ENDCONTRIB

\end{itemize}


Special thanks to Freesound, which is the source of many of the
sound samples I use in this book, and to the Freesound users who
uploaded those sounds. I include some of their wave files in
the GitHub repository for this book, using the original file
names, so it should be easy to find their sources.

Unfortunately, most Freesound users don't make their real names
available, so I can only thank them using their user names. Samples
used in this book were contributed by Freesound users: iluppai,
wcfl10, thirsk, docquesting, kleeb, landup, zippi1, themusicalnomad,
bcjordan, rockwehrmann, marcgascon7, jcveliz. Thank you all!

%100475__iluppai__saxophone-weep.wav
%http://www.freesound.org/people/iluppai/sounds/100475/

%105977__wcfl10__favorite-station.wav
%http://www.freesound.org/people/wcfl10/sounds/105977/

%120994__thirsk__120-oboe.wav
%http://www.freesound.org/people/Thirsk/sounds/120994/

%132736__ciccarelli__ocean-waves.wav
%http://www.freesound.org/people/ciccarelli/sounds/132736/

%180960__kleeb__gunshot.wav
%http://www.freesound.org/people/Kleeb/sounds/180960/

%18871__zippi1__sound-bell-440hz.wav
%http://www.freesound.org/people/zippi1/sounds/18871/

%253887__themusicalnomad__positive-beeps.wav
%http://www.freesound.org/people/themusicalnomad/sounds/253887/

%28042__bcjordan__voicedownbew.wav
%http://www.freesound.org/people/bcjordan/sounds/28042/

%72475__rockwehrmann__glissup02.wav
%http://www.freesound.org/people/rockwehrmann/sounds/72475/

%87778__marcgascon7__vocals.wav
%http://www.freesound.org/people/marcgascon7/sounds/87778/

%92002__jcveliz__violin-origional.wav
%http://www.freesound.org/people/jcveliz/sounds/92002/


\normalsize

\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter


\chapter{Sounds and signals}
\label{sounds}

A {\bf signal} represents a quantity that varies in time. That
definition is pretty abstract, so let's start with a concrete example:
sound. Sound is variation in air pressure. A sound signal represents
variations in air pressure over time.

A microphone is a device that measures these variations and generates
an electrical signal that represents sound. A speaker is a device
that takes an electrical signal and produces sound.
Microphones and speakers are called {\bf transducers} because they
transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for
synthesizing, transforming, and analyzing signals. I will focus on
sound signals, but the same methods apply to electronic signals,
mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like
elevation along a hiking trail. And they apply to signals in more
than one dimension, like an image, which you can think of as a signal
that varies in two-dimensional space.
Or a movie, which is
a signal that varies in two-dimensional space {\it and} time.

But we start with simple one-dimensional sound.

The code for this chapter is in {\tt chap01.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp01}.


\section{Periodic signals}
\label{violin}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds1.pdf}}
\caption{Segment from a recording of a bell.}
\label{fig.sounds1}
\end{figure}

We'll start with {\bf periodic signals}, which are signals that
repeat themselves after some period of time. For example, if you
strike a bell, it vibrates and generates sound. If you record
that sound and plot the transduced signal, it looks like
Figure~\ref{fig.sounds1}.

This signal resembles a {\bf sinusoid}, which means it has the same
shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration
to show three full repetitions, also known as {\bf cycles}.
The duration of each cycle, called the {\bf period}, is about 2.3 ms.

The {\bf frequency} of a signal is the number of cycles
per second, which is the inverse of the period.
The units of frequency are cycles per second, or {\bf Hertz},
abbreviated ``Hz''. (Strictly speaking, the number of cycles is
a dimensionless number, so a Hertz is really a ``per second'').

The frequency of this signal is about 439 Hz, slightly lower than 440
Hz, which is the standard tuning pitch for orchestral music. The
musical name of this note is A, or more specifically, A4. If you are
not familiar with ``scientific pitch notation'', the numerical suffix
indicates which octave the note is in. A4 is the A above middle C.
A5 is one octave higher.
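Because frequency is the inverse of the period, you can check these numbers against each other. Here is a quick sketch; the 439 Hz value is the approximate pitch of the bell recording, so the result is approximate too:

```python
# Period and frequency are reciprocals. 439 Hz is the approximate
# pitch of the bell recording, read off the spectrum.
freq = 439                # cycles per second (Hz)
period = 1 / freq         # seconds per cycle
print(period * 1000)      # about 2.28 ms, consistent with the 2.3 ms above
```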
See
\url{http://en.wikipedia.org/wiki/Scientific_pitch_notation}.

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds2.pdf}}
\caption{Segment from a recording of a violin.}
\label{fig.sounds2}
\end{figure}

A tuning fork generates a sinusoid because the vibration of the tines
is a form of simple harmonic motion. Most musical instruments
produce periodic signals, but the shape of these signals is not
sinusoidal. For example, Figure~\ref{fig.sounds2} shows a segment
from a recording of a violin playing
Boccherini's String Quintet No. 5 in E, 3rd
movement.

%\footnote{The recording is from
% \url{http://www.freesound.org/people/jcveliz/sounds/92002/}.
%I identified the piece using \url{http://www.musipedia.org}.}
% Parson's code: DUUDDUURDR

Again we can see that the signal is periodic, but the shape of the
signal is more complex. The shape of a periodic signal is called
the {\bf waveform}. Most musical instruments produce waveforms more
complex than a sinusoid. The shape of the waveform determines the
musical {\bf timbre}, which is our perception of the quality of the
sound. People usually perceive complex waveforms as rich, warm and
more interesting than sinusoids.


\section{Spectral decomposition}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds3.pdf}}
\caption{Spectrum of a segment from the violin recording.}
\label{fig.sounds3}
\end{figure}

The most important topic in this book is {\bf spectral decomposition},
which is the idea that any signal can be expressed as the sum of
sinusoids with different frequencies.

The most important mathematical idea in this book is the {\bf discrete
Fourier transform}, or {\bf DFT}, which takes a signal and produces
its {\bf spectrum}.
The spectrum is the set of sinusoids that add up to
produce the signal.

And the most important algorithm in this book is the {\bf Fast
Fourier transform}, or {\bf FFT}, which is an efficient way to
compute the DFT.

For example, Figure~\ref{fig.sounds3} shows the spectrum of the violin
recording in Figure~\ref{fig.sounds2}. The x-axis is the range of
frequencies that make up the signal. The y-axis shows the strength
or {\bf amplitude} of each frequency component.

The lowest frequency component is called the {\bf fundamental
frequency}. The fundamental frequency of this signal is near 440 Hz
(actually a little lower, or ``flat'').

In this signal the fundamental frequency has the largest amplitude,
so it is also the {\bf dominant frequency}.
Normally the perceived pitch of a sound is determined by the
fundamental frequency, even if it is not dominant.

The other spikes in the spectrum are at frequencies 880, 1320, 1760, and
2200, which are integer multiples of the fundamental.
These components are called {\bf harmonics} because they are
musically harmonious with the fundamental:

\begin{itemize}

\item 880 is the frequency of A5, one octave higher than the fundamental.
An {\bf octave} is a doubling in frequency.

\item 1320 is approximately E6, which is a perfect fifth above A5.
If you are not familiar with musical intervals like ``perfect fifth'', see
\url{https://en.wikipedia.org/wiki/Interval_(music)}.

\item 1760 is A6, two octaves above the fundamental.

\item 2200 is approximately C$\sharp$7, which is a major third
above A6.

\end{itemize}

These harmonics make up the notes of an A major
chord, although not all in the same octave.
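You can confirm the integer-multiple relationship with a one-line check; here I take the fundamental to be an idealized 440 Hz, although the recording is slightly flat:

```python
# Harmonics are integer multiples of the fundamental frequency.
fundamental = 440
harmonics = [n * fundamental for n in range(1, 6)]
print(harmonics)   # [440, 880, 1320, 1760, 2200]
```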
Some of them are only
approximate because the notes that make up Western music have been
adjusted for {\bf equal temperament} (see
\url{http://en.wikipedia.org/wiki/Equal_temperament}).

Given the harmonics and their amplitudes, you can reconstruct the
signal by adding up sinusoids.
Next we'll see how.


\section{Signals}

I wrote a Python module called {\tt thinkdsp.py} that contains classes
and functions for working with signals and spectrums\footnote{The
plural of ``spectrum'' is often written ``spectra'', but I prefer
to use standard English plurals. If you are familiar with ``spectra'',
I hope my choice doesn't sound too strange.}. You
will find it in the repository for this book (see Section~\ref{code}).

To represent signals, {\tt thinkdsp} provides a class called
{\tt Signal}, which is the parent class for several signal types,
including {\tt Sinusoid}, which represents both sine and cosine
signals.

{\tt thinkdsp} provides functions to create sine and cosine signals:

\begin{verbatim}
cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)
\end{verbatim}

{\tt freq} is frequency in Hz. {\tt amp} is amplitude in unspecified
units where 1.0 is defined as the largest amplitude we can record or
play back.

{\tt offset} is a {\bf phase offset} in radians. Phase offset
determines where in the period the signal starts. For example, a
sine signal with {\tt offset=0} starts at $\sin 0$, which is 0.
With {\tt offset=pi/2} it starts at $\sin \pi/2$, which is 1.

Signals have an \verb"__add__" method, so you can use the {\tt +}
operator to add them:

\begin{verbatim}
mix = sin_sig + cos_sig
\end{verbatim}

The result is a {\tt SumSignal}, which represents the sum of two
or more signals.

A Signal is basically a Python representation of a mathematical
function.
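Because a Signal is just a function of time, the mixture above can also be sketched directly with NumPy, without thinkdsp; the variable names here are mine, not part of the library:

```python
import numpy as np

framerate = 11025                       # samples per second
ts = np.arange(0, 0.5, 1 / framerate)   # half a second of sample times

# The same mixture as cos_sig + sin_sig above: a 440 Hz cosine with
# amplitude 1.0 plus an 880 Hz sine with amplitude 0.5.
mix_ys = (1.0 * np.cos(2 * np.pi * 440 * ts)
          + 0.5 * np.sin(2 * np.pi * 880 * ts))
print(len(mix_ys))   # 5513 samples
```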
Most signals are defined for all values of {\tt t},
from negative infinity to infinity.

You can't do much with a Signal until you evaluate it. In this
context, ``evaluate'' means taking a sequence of points in time, {\tt
ts}, and computing the corresponding values of the signal, {\tt ys}.
I represent {\tt ts} and {\tt ys} using NumPy arrays and encapsulate
them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in
time. Each point in time is called a {\bf frame} (a term borrowed
from movies and video). The measurement itself is called a
{\bf sample}, although ``frame'' and ``sample'' are sometimes
used interchangeably.

{\tt Signal} provides \verb"make_wave", which returns a new
Wave object:

\begin{verbatim}
wave = mix.make_wave(duration=0.5, start=0, framerate=11025)
\end{verbatim}

{\tt duration} is the length of the Wave in seconds. {\tt start} is
the start time, also in seconds. {\tt framerate} is the (integer)
number of frames per second, which is also the number of samples
per second.

11,025 frames per second is one of several framerates commonly used in
audio file formats, including Waveform Audio File (WAV) and mp3.

This example evaluates the signal from {\tt t=0} to {\tt t=0.5} at
5,513 equally-spaced frames (because 5,513 is half of 11,025).
The time between frames, or {\bf timestep}, is {\tt 1/11025} seconds, about
91 $\mu$s.

{\tt Wave} provides a {\tt plot} method that uses {\tt pyplot}.
You can plot the wave like this:

\begin{verbatim}
wave.plot()
pyplot.show()
\end{verbatim}

{\tt pyplot} is part of {\tt matplotlib}; it is included in many
Python distributions, or you might have to install it.

\begin{figure}
% sounds4.py
\centerline{\includegraphics[height=2.5in]{figs/sounds4.pdf}}
\caption{Segment from a mixture of two sinusoid signals.}
\label{fig.sounds4}
\end{figure}

At {\tt freq=440} there are 220 periods in 0.5 seconds, so this plot
would look like a solid block of color. To zoom in on a small number
of periods, we can use {\tt segment}, which copies a segment of a Wave
and returns a new wave:

\begin{verbatim}
period = mix.period
segment = wave.segment(start=0, duration=period*3)
\end{verbatim}

{\tt period} is a property of a Signal; it returns the period in seconds.

{\tt start} and {\tt duration} are in seconds. This example copies
the first three periods from {\tt mix}. The result is a Wave object.

If we plot {\tt segment}, it looks like Figure~\ref{fig.sounds4}.
This signal contains two frequency components, so it is more
complicated than the signal from the tuning fork, but less complicated
than the violin.


\section{Reading and writing Waves}

{\tt thinkdsp} provides \verb"read_wave", which reads a WAV
file and returns a Wave:

\begin{verbatim}
violin_wave = thinkdsp.read_wave('input.wav')
\end{verbatim}

And {\tt Wave} provides {\tt write}, which writes a WAV file:

\begin{verbatim}
wave.write(filename='output.wav')
\end{verbatim}

You can listen to the Wave with any media player that plays WAV
files. On UNIX systems, I use {\tt aplay}, which is simple, robust,
and included in many Linux distributions.

{\tt thinkdsp} also provides \verb"play_wave", which runs
the media player as a subprocess:

\begin{verbatim}
thinkdsp.play_wave(filename='output.wav', player='aplay')
\end{verbatim}

It uses {\tt aplay} by default, but you can provide the
name of another player.


\section{Spectrums}
\label{spectrums}

{\tt Wave} provides \verb"make_spectrum", which returns a
{\tt Spectrum}:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

And {\tt Spectrum} provides {\tt plot}:

\begin{verbatim}
spectrum.plot()
thinkplot.show()
\end{verbatim}

{\tt thinkplot} is a module I wrote to provide wrappers around some of
the functions in {\tt pyplot}. It is included in the
Git repository for this book (see Section~\ref{code}).

{\tt Spectrum} provides three methods that modify the spectrum:

\begin{itemize}

\item \verb"low_pass" applies a low-pass filter, which means that
components above a given cutoff frequency are attenuated (that is,
reduced in magnitude) by a factor.

\item \verb"high_pass" applies a high-pass filter, which means that
it attenuates components below the cutoff.

\item \verb"band_stop" attenuates components in the band of
frequencies between two cutoffs.

\end{itemize}

This example attenuates all frequencies above 600 Hz by 99\%:

\begin{verbatim}
spectrum.low_pass(cutoff=600, factor=0.01)
\end{verbatim}

A low-pass filter removes bright, high-frequency sounds, so
the result sounds muffled and darker.
To hear what it sounds
like, you can convert the Spectrum back to a Wave, and then play it.

\begin{verbatim}
wave = spectrum.make_wave()
wave.play('temp.wav')
\end{verbatim}

The {\tt play} method writes the wave to a file and then plays it.
If you use Jupyter notebooks, you can use \verb"make_audio", which
makes an Audio widget that plays the sound.


\section{Wave objects}

\begin{figure}
% http://yuml.me/edit/5294b377
% pdftops -eps uml_diagram1.pdf
\centerline{\includegraphics[width=3.5in]{figs/uml_diagram1.pdf}}
\caption{Relationships among the classes in {\tt thinkdsp}.}
\label{fig.diagram1}
\end{figure}

There is nothing very complicated in {\tt thinkdsp.py}. Most
of the functions it provides are thin wrappers around functions
from NumPy and SciPy.

The primary classes in {\tt thinkdsp} are Signal, Wave, and Spectrum.
Given a Signal, you can make a Wave. Given a Wave, you can
make a Spectrum, and vice versa. These relationships are shown
in Figure~\ref{fig.diagram1}.

A Wave object contains three attributes: {\tt ys} is a NumPy array
that contains the values in the signal; {\tt ts} is an array of the
times where the signal was evaluated or sampled; and {\tt
framerate} is the number of samples per unit of time. The
unit of time is usually seconds, but it doesn't have to be. In
one of my examples, it's days.

Wave also provides three read-only properties: {\tt start},
{\tt end}, and {\tt duration}. If you modify {\tt ts}, these
properties change accordingly.

To modify a wave, you can access the {\tt ts} and {\tt ys} directly.
For example:

\begin{verbatim}
wave.ys *= 2
wave.ts += 1
\end{verbatim}

The first line scales the wave by a factor of 2, making
it louder.
The second line shifts the wave in time, making it
start 1 second later.

But Wave provides methods that perform many common operations.
For example, the same two transformations could be written:

\begin{verbatim}
wave.scale(2)
wave.shift(1)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://greenteapress.com/thinkdsp.html}.


\section{Signal objects}
\label{sigobs}

Signal is a parent class that provides functions common to all
kinds of signals, like \verb"make_wave". Child classes inherit
these methods and provide {\tt evaluate}, which evaluates the
signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this
definition:

\begin{verbatim}
class Sinusoid(Signal):

    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func
\end{verbatim}

The parameters of \verb"__init__" are:

\begin{itemize}

\item {\tt freq}: frequency in cycles per second, or Hz.

\item {\tt amp}: amplitude. The units of amplitude are arbitrary,
usually chosen so 1.0 corresponds to the maximum input from a
microphone or maximum output to a speaker.

\item {\tt offset}: indicates where in its period the signal starts;
{\tt offset} is in units of radians, for reasons I explain below.

\item {\tt func}: a Python function used
to evaluate the signal at a particular point in time.
It is
usually either {\tt np.sin} or {\tt np.cos}, yielding a sine or
cosine signal.

\end{itemize}

Like many init methods, this one just tucks the parameters away for
future use.

Signal provides \verb"make_wave", which looks like
this:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)
\end{verbatim}

{\tt start} and {\tt duration} are the start time and duration
in seconds. {\tt framerate} is the number of frames (samples)
per second.

{\tt n} is the number of samples, and {\tt ts} is a NumPy array
of sample times.

To compute the {\tt ys}, \verb"make_wave" invokes {\tt evaluate},
which is provided by {\tt Sinusoid}:

\begin{verbatim}
def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys
\end{verbatim}

Let's unwind this function one step at a time:

\begin{enumerate}

\item {\tt self.freq} is frequency in cycles per second, and each
element of {\tt ts} is a time in seconds, so their product is the
number of cycles since the start time.

\item {\tt PI2} is a constant that stores $2 \pi$. Multiplying by
{\tt PI2} converts from cycles to {\bf phase}. You can think of
phase as ``cycles since the start time'' expressed in radians.
Each
cycle is $2 \pi$ radians.

\item {\tt self.offset} is the phase when $t=0$.
It has the effect of shifting the signal left or right in time.

\item If {\tt self.func} is {\tt np.sin} or {\tt np.cos}, the result is a
value between $-1$ and $+1$.

\item Multiplying by {\tt self.amp} yields a signal that ranges from
{\tt -self.amp} to {\tt +self.amp}.

\end{enumerate}

In math notation, {\tt evaluate} is written like this:
%
\[ y = A \cos (2 \pi f t + \phi_0) \]
%
where $A$ is amplitude, $f$ is frequency, $t$ is time, and $\phi_0$
is the phase offset.  It may seem like I wrote a lot of code
to evaluate one simple expression, but as we'll see, this code
provides a framework for dealing with all kinds of signals, not
just sinusoids.


\section{Exercises}

Before you begin these exercises, you should download the code
for this book, following the instructions in Section~\ref{code}.

Solutions to these exercises are in {\tt chap01soln.ipynb}.

\begin{exercise}
If you have Jupyter, load {\tt chap01.ipynb}, read through it, and run
the examples.  You can also view this notebook at
\url{http://tinyurl.com/thinkdsp01}.
\end{exercise}


\begin{exercise}
Go to \url{http://freesound.org} and download a sound sample that
includes music, speech, or other sounds that have a well-defined pitch.
Select a roughly half-second segment where the pitch is
constant.  Compute and plot the spectrum of the segment you selected.
What connection can you make between the timbre of the sound and the
harmonic structure you see in the spectrum?

Use \verb"high_pass", \verb"low_pass", and \verb"band_stop" to
filter out some of the harmonics.  Then convert the spectrum back
to a wave and listen to it.
How does the sound relate to the
changes you made in the spectrum?
\end{exercise}


\begin{exercise}
Synthesize a compound signal by creating SinSignal and CosSignal
objects and adding them up.  Evaluate the signal to get a Wave,
and listen to it.  Compute its Spectrum and plot it.
What happens if you add frequency
components that are not multiples of the fundamental?
\end{exercise}


\begin{exercise}
Write a function called {\tt stretch} that takes a Wave and a stretch
factor and speeds up or slows down the wave by modifying {\tt ts} and
{\tt framerate}.  Hint: it should only take two lines of code.
\end{exercise}


\chapter{Harmonics}
\label{harmonics}

In this chapter I present several new waveforms; we will look at
their spectrums to understand their {\bf harmonic structure}, which is
the set of sinusoids they are made up of.

I'll also introduce one of the most important phenomena in digital
signal processing: aliasing.  And I'll explain a little more about how
the Spectrum class works.

The code for this chapter is in {\tt chap02.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp02}.


\section{Triangle waves}
\label{triangle}

A sinusoid contains only one frequency component, so its spectrum
has only one peak.  More complicated waveforms, like the
violin recording, yield DFTs with many peaks.  In this section we
investigate the relationship between waveforms and their spectrums.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-1.pdf}}
\caption{Segment of a triangle signal at 200 Hz.}
\label{fig.triangle.200.1}
\end{figure}

I'll start with a triangle waveform, which is like a straight-line
version of a sinusoid.
Figure~\ref{fig.triangle.200.1} shows a
triangle waveform with frequency 200 Hz.

To generate a triangle wave, you can use {\tt thinkdsp.TriangleSignal}:

\begin{verbatim}
class TriangleSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt TriangleSignal} inherits \verb"__init__" from {\tt Sinusoid},
so it takes the same arguments: {\tt freq}, {\tt amp}, and {\tt offset}.

The only difference is {\tt evaluate}.  As we saw before,
{\tt ts} is the sequence of sample times where we want to
evaluate the signal.

There are many ways to generate a triangle wave.  The details
are not important, but here's how {\tt evaluate} works:

\begin{enumerate}

\item {\tt cycles} is the number of cycles since the start time.
{\tt np.modf} splits the number of cycles into the fraction
part, stored in {\tt frac}, and the integer part, which is
ignored\footnote{Using an underscore as a variable name is a convention that
means, ``I don't intend to use this value.''}.

\item {\tt frac} is a sequence that ramps from 0 to 1 with the given
frequency.  Subtracting 0.5 yields values between -0.5 and 0.5.
Taking the absolute value yields a waveform that zig-zags between
0.5 and 0.

\item {\tt unbias} shifts the waveform down so it is centered at 0; then
{\tt normalize} scales it to the given amplitude, {\tt amp}.

\end{enumerate}

Here's the code that generates Figure~\ref{fig.triangle.200.1}:

\begin{verbatim}
    signal = thinkdsp.TriangleSignal(200)
    signal.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-2.pdf}}
\caption{Spectrum of a triangle signal at 200 Hz, shown on two
vertical scales.
The version on the right cuts off the fundamental
to show the harmonics more clearly.}
\label{fig.triangle.200.2}
\end{figure}

Next we can use the Signal to make a Wave, and use the Wave to
make a Spectrum:

\begin{verbatim}
    wave = signal.make_wave(duration=0.5, framerate=10000)
    spectrum = wave.make_spectrum()
    spectrum.plot()
\end{verbatim}

Figure~\ref{fig.triangle.200.2} shows two views of the result; the
view on the right is scaled to show the harmonics more clearly.  As
expected, the highest peak is at the fundamental frequency, 200 Hz,
and there are additional peaks at harmonic frequencies, which are
integer multiples of 200.

But one surprise is that there are no peaks at the even multiples:
400, 800, etc.  The harmonics of a triangle wave are all
odd multiples of the fundamental frequency, in this example
600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the
amplitude and frequency of the harmonics.  Their amplitude drops off
in proportion to frequency squared.  For example the frequency ratio
of the first two harmonics (200 and 600 Hz) is 3, and the amplitude
ratio is approximately 9.  The frequency ratio of the next two
harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is
approximately $1.7^2 = 2.9$.  This relationship is called the
{\bf harmonic structure}.


\section{Square waves}
\label{square}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-1.pdf}}
\caption{Segment of a square signal at 100 Hz.}
\label{fig.square.100.1}
\end{figure}

{\tt thinkdsp} also provides {\tt SquareSignal}, which represents
a square signal.
Here's the class definition:

\begin{verbatim}
class SquareSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys
\end{verbatim}

Like {\tt TriangleSignal}, {\tt SquareSignal} inherits
\verb"__init__" from {\tt Sinusoid}, so it takes the same
parameters.

And the {\tt evaluate} method is similar.  Again, {\tt cycles} is
the number of cycles since the start time, and {\tt frac} is the
fractional part, which ramps from 0 to 1 each period.

{\tt unbias} shifts {\tt frac} so it ramps from -0.5 to 0.5,
then {\tt np.sign} maps the negative values to -1 and the
positive values to 1.  Multiplying by {\tt amp} yields a square
wave that jumps between {\tt -amp} and {\tt amp}.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-2.pdf}}
\caption{Spectrum of a square signal at 100 Hz.}
\label{fig.square.100.2}
\end{figure}

Figure~\ref{fig.square.100.1} shows three periods of a square
wave with frequency 100 Hz,
and Figure~\ref{fig.square.100.2} shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics,
which is why there are peaks at 300, 500, and 700 Hz, etc.
But the amplitude of the harmonics drops off more slowly.
Specifically, amplitude drops in proportion to frequency (not frequency
squared).

The exercises at the end of this chapter give you a chance to
explore other waveforms and other harmonic structures.


\section{Aliasing}
\label{aliasing}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-1100-2.pdf}}
\caption{Spectrum of a triangle signal at 1100 Hz sampled at
10,000 frames per second.
The view on the right is scaled to
show the harmonics.}
\label{fig.triangle.1100.2}
\end{figure}

I have a confession.  I chose the examples in the previous section
carefully to avoid showing you something confusing.  But now it's
time to get confused.

Figure~\ref{fig.triangle.1100.2} shows the spectrum of a triangle wave
at 1100 Hz, sampled at 10,000 frames per second.  Again, the view on
the right is scaled to show the harmonics.

The harmonics
of this wave should be at 3300, 5500, 7700, and 9900 Hz.
In the figure, there are peaks at 1100 and 3300 Hz, as expected, but
the third peak is at 4500, not 5500 Hz.  The
fourth peak is at 2300, not 7700 Hz.  And if you look closely, the
peak that should be at 9900 is actually at 100 Hz.  What's going on?

The problem is that when you evaluate the signal at
discrete points in time, you lose information about what happened
between samples.  For low frequency components, that's not a
problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second,
you only have two samples per period.  That turns out to be enough,
just barely, but if the frequency is higher, it's not.

To see why, let's generate cosine signals at 4500 and 5500 Hz,
and sample them at 10,000 frames per second:

\begin{verbatim}
    framerate = 10000

    signal = thinkdsp.CosSignal(4500)
    duration = signal.period*5
    segment = signal.make_wave(duration, framerate=framerate)
    segment.plot()

    signal = thinkdsp.CosSignal(5500)
    segment = signal.make_wave(duration, framerate=framerate)
    segment.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=3.5in]{figs/aliasing1.pdf}}
\caption{Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames
per second.
The signals are different, but the samples are identical.}
\label{fig.aliasing1}
\end{figure}

Figure~\ref{fig.aliasing1} shows the result.  I plotted the Signals
with thin gray lines and the samples using vertical lines,
to make it easier to compare the
two Waves.  The problem
should be clear: even though the Signals are different, the
Waves are identical!

When we sample a 5500 Hz signal at 10,000 frames per second, the
result is indistinguishable from a 4500 Hz signal.
For the same reason, a 7700 Hz signal is indistinguishable
from 2300 Hz, and a 9900 Hz signal is indistinguishable from 100 Hz.

This effect is called {\bf aliasing} because when the high frequency
signal is sampled, it appears to be a low frequency signal.

In this example, the highest frequency we can measure is 5000 Hz,
which is half the sampling rate.  Frequencies above 5000 Hz are folded
back below 5000 Hz, which is why this threshold is sometimes called
the ``folding frequency''.  It is sometimes also called the {\bf
Nyquist frequency}.  See
\url{http://en.wikipedia.org/wiki/Nyquist_frequency}.

The folding pattern continues if the aliased frequency goes below
zero.  For example, the 5th harmonic of the 1100 Hz triangle wave is
at 12,100 Hz.  Folded at 5000 Hz, it would appear at -2100 Hz, but it
gets folded again at 0 Hz, back to 2100 Hz.
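The folding rule can be sketched as a small helper function.  This is
my own illustration, not part of {\tt thinkdsp}; the name
\verb"fold_frequency" is hypothetical:

```python
# Hypothetical helper (not part of thinkdsp): compute where a
# frequency appears after sampling at the given framerate.
def fold_frequency(freq, framerate):
    nyquist = framerate / 2
    freq = freq % framerate       # aliases repeat every framerate Hz
    if freq > nyquist:
        freq = framerate - freq   # fold back below the Nyquist frequency
    return freq

# Harmonics of a 1100 Hz triangle wave, sampled at 10,000 frames/s:
for harmonic in [1100, 3300, 5500, 7700, 9900, 12100]:
    print(harmonic, '->', fold_frequency(harmonic, 10000))
```

Applied to the harmonics of the 1100 Hz triangle wave, it reproduces
the peaks we observed: 5500 folds to 4500, 7700 to 2300, 9900 to 100,
and 12,100 to 2100 Hz.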
In fact, you can see a
small peak at 2100 Hz in Figure~\ref{fig.triangle.1100.2}, and the next
one at 4300 Hz.


\section{Computing the spectrum}

We have seen the Wave method \verb"make_spectrum" several times.
Here is the implementation (leaving out some details we'll get
to later):

\begin{verbatim}
from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)
\end{verbatim}

The parameter {\tt self} is a Wave object.  {\tt n} is the number
of samples in the wave, and {\tt d} is the inverse of the
frame rate, which is the time between samples.

{\tt np.fft} is the NumPy module that provides functions related
to the {\bf Fast Fourier Transform} (FFT), which is an efficient
algorithm that computes the Discrete Fourier Transform (DFT).

%TODO: add a forward reference to full fft
\verb"make_spectrum" uses {\tt rfft}, which stands for ``real
FFT'', because the Wave contains real values, not complex.  Later
we'll see the full FFT, which can handle complex signals.  The result
of {\tt rfft}, which I call {\tt hs}, is a NumPy array of complex
numbers that represents the amplitude and phase offset of each
frequency component in the wave.

The result of {\tt rfftfreq}, which I call {\tt fs}, is an array that
contains frequencies corresponding to the {\tt hs}.

To understand the values in {\tt hs}, consider these two ways to think
about complex numbers:

\begin{itemize}

\item A complex number is the sum of a real part and an imaginary
part, often written $x + iy$, where $i$ is the imaginary unit,
$\sqrt{-1}$.
You can think of $x$ and $y$ as Cartesian coordinates.

\item A complex number is also the product of a magnitude and a
complex exponential, $A e^{i \phi}$, where $A$ is the {\bf
magnitude} and $\phi$ is the {\bf angle} in radians, also called
the ``argument''.  You can think of $A$ and $\phi$ as polar
coordinates.

\end{itemize}

Each value in {\tt hs} corresponds to a frequency component: its
magnitude is proportional to the amplitude of the corresponding
component; its angle is the phase offset.

The Spectrum class provides two read-only properties, {\tt amps}
and {\tt angles}, which return NumPy arrays representing the
magnitudes and angles of the {\tt hs}.  When we plot a Spectrum
object, we usually plot {\tt amps} versus {\tt fs}.  Sometimes
it is also useful to plot {\tt angles} versus {\tt fs}.

Although it might be tempting to look at the real and imaginary
parts of {\tt hs}, you will almost never need to.  I encourage
you to think of the DFT as a vector of amplitudes and phase offsets
that happen to be encoded in the form of complex numbers.

To modify a Spectrum, you can access the {\tt hs} directly.
For example:

\begin{verbatim}
    spectrum.hs *= 2
    spectrum.hs[spectrum.fs > cutoff] = 0
\end{verbatim}

The first line multiplies the elements of {\tt hs} by 2, which
doubles the amplitudes of all components.
The second line
sets to 0 only the elements of {\tt hs} where the corresponding
frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

\begin{verbatim}
    spectrum.scale(2)
    spectrum.low_pass(cutoff)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://greenteapress.com/thinkdsp.html}.

At this point you should have a better idea of how the Signal, Wave,
and Spectrum classes work, but I have not explained how the Fast
Fourier Transform works.  That will take a few more chapters.


\section{Exercises}

Solutions to these exercises are in {\tt chap02soln.ipynb}.

\begin{exercise}
If you use Jupyter, load {\tt chap02.ipynb} and try out the examples.
You can also view the notebook at \url{http://tinyurl.com/thinkdsp02}.
\end{exercise}

\begin{exercise}
A sawtooth signal has a waveform that ramps up linearly from -1 to 1,
then drops to -1 and repeats.  See
\url{http://en.wikipedia.org/wiki/Sawtooth_wave}.

Write a class called
{\tt SawtoothSignal} that extends {\tt Signal} and provides
{\tt evaluate} to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave.  How does the harmonic
structure compare to triangle and square waves?
\end{exercise}

\begin{exercise}
Make a square signal at 1100 Hz and make a wave that samples it
at 10000 frames per second.  If you plot the spectrum, you can
see that most of the harmonics are aliased.
When you listen to the wave, can you hear the aliased harmonics?
\end{exercise}


\begin{exercise}
If you have a spectrum object, {\tt spectrum}, and print the
first few values of {\tt spectrum.fs}, you'll see that they
start at zero.  So {\tt spectrum.hs[0]} is the magnitude
of the component with frequency 0.
But what does that mean?

Try this experiment:

\begin{enumerate}

\item Make a triangle signal with frequency 440 and make
a Wave with duration 0.01 seconds.  Plot the waveform.

\item Make a Spectrum object and print {\tt spectrum.hs[0]}.
What is the amplitude and phase of this component?

\item Set {\tt spectrum.hs[0] = 100}.  Make a Wave from the
modified Spectrum and plot it.  What effect does this operation
have on the waveform?

\end{enumerate}

\end{exercise}


\begin{exercise}
Write a function that takes a Spectrum as a parameter and modifies
it by dividing each element of {\tt hs} by the corresponding frequency
from {\tt fs}.  Hint: since division by zero is undefined, you
might want to set {\tt spectrum.hs[0] = 0}.

Test your function using a square, triangle, or sawtooth wave.

\begin{enumerate}

\item Compute the Spectrum and plot it.

\item Modify the Spectrum using your function and plot it again.

\item Make a Wave from the modified Spectrum and listen to it.  What
effect does this operation have on the signal?

\end{enumerate}

\end{exercise}

\begin{exercise}
Triangle and square waves have odd harmonics only; the sawtooth
wave has both even and odd harmonics.  The harmonics of the
square and sawtooth waves drop off in proportion to $1/f$; the
harmonics of the triangle wave drop off like $1/f^2$.  Can you
find a waveform that has even and odd harmonics that drop off
like $1/f^2$?

Hint: There are two ways you could approach this: you could
construct the signal you want by adding up sinusoids, or you
could start with a signal that is similar to what you want and
modify it.
\end{exercise}


\chapter{Non-periodic signals}
\label{nonperiodic}

The signals we have worked with so far are periodic, which means
that they repeat forever.
It also means that the frequency
components they contain do not change over time.
In this chapter, we consider non-periodic signals,
whose frequency components {\em do} change over time.
In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize
non-periodic signals.

The code for this chapter is in {\tt chap03.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp03}.


\section{Linear chirp}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp3.pdf}}
\caption{Chirp waveform near the beginning, middle, and end.}
\label{fig.chirp3}
\end{figure}

We'll start with a {\bf chirp}, which is a signal with variable
frequency.  {\tt thinkdsp} provides a Signal called Chirp that
makes a sinusoid that sweeps linearly through a range of
frequencies.

Here's an example that sweeps from 220 to 880 Hz, which
is two octaves from A3 to A5:

\begin{verbatim}
    signal = thinkdsp.Chirp(start=220, end=880)
    wave = signal.make_wave()
\end{verbatim}

Figure~\ref{fig.chirp3} shows segments of this wave near the
beginning, middle, and end.  It's clear that the frequency is
increasing.

Before we go on, let's see how Chirp is implemented.  Here
is the class definition:

\begin{verbatim}
class Chirp(Signal):

    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp
\end{verbatim}

{\tt start} and {\tt end} are the frequencies, in Hz, at the start
and end of the chirp.
{\tt amp} is amplitude.

Here is the function that evaluates the signal:

\begin{verbatim}
    def evaluate(self, ts):
        freqs = np.linspace(self.start, self.end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

{\tt ts} is the sequence of points in time where the signal should be
evaluated; to keep this function simple, I assume they are equally-spaced.

If the length of {\tt ts} is $n$, you can think of it as a sequence of
$n-1$ intervals of time.  To compute the frequency during each
interval, I use {\tt np.linspace}, which returns a NumPy array of
$n-1$ values between {\tt start} and {\tt end}.

\verb"_evaluate" is a private method
that does the rest of the math\footnote{Beginning a method name
with an underscore makes it ``private'', indicating that it is not
part of the API that should be used outside the class definition.}:

\begin{verbatim}
    def _evaluate(self, ts, freqs):
        dts = np.diff(ts)
        dphis = PI2 * freqs * dts
        phases = np.cumsum(dphis)
        phases = np.insert(phases, 0, 0)
        ys = self.amp * np.cos(phases)
        return ys
\end{verbatim}

{\tt np.diff} computes the difference between adjacent elements
of {\tt ts}, returning the length of each interval in seconds.
If the elements of {\tt ts} are equally spaced,
the {\tt dts} are all the same.

The next step is to figure out how much the phase changes during
each interval.
In Section~\ref{sigobs} we saw that when frequency is
constant, the phase, $\phi$, increases linearly over time:
%
\[ \phi = 2 \pi f t \]
%
When frequency is a function of time, the {\em change} in phase
during a short time interval, $\Delta t$, is:
%
\[ \Delta \phi = 2 \pi f(t) \Delta t \]
%
In Python, since {\tt freqs} contains $f(t)$ and {\tt dts}
contains the time intervals, we can write

\begin{verbatim}
    dphis = PI2 * freqs * dts
\end{verbatim}

Now, since {\tt dphis} contains the changes in phase, we can
get the total phase at each timestep by adding up the changes:

\begin{verbatim}
    phases = np.cumsum(dphis)
    phases = np.insert(phases, 0, 0)
\end{verbatim}

{\tt np.cumsum} computes the cumulative sum, which is almost
what we want, but it doesn't start at 0.  So I use {\tt np.insert}
to add a 0 at the beginning.

The result is a NumPy array where the {\tt i}th element contains the
sum of the first {\tt i} terms from {\tt dphis}; that is, the total
phase at the end of the {\tt i}th interval.  Finally, {\tt np.cos}
computes the amplitude of the wave as a function of phase (remember
that phase is expressed in radians).

If you know calculus, you might notice that the limit as
$\Delta t$ gets small is
%
\[ d\phi = 2 \pi f(t) dt \]
%
Dividing through by $dt$ yields
%
\[ \frac{d\phi}{dt} = 2 \pi f(t) \]
%
In other words, frequency is the derivative of phase.  Conversely,
phase is the integral of frequency.
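As a quick sanity check (my own sketch, not code from {\tt thinkdsp}),
we can run the same cumulative-sum steps on a constant frequency and
confirm that the result matches the closed form $\phi = 2 \pi f t$:

```python
import numpy as np

PI2 = 2 * np.pi
framerate = 11025
f = 440  # a constant frequency in Hz

ts = np.arange(framerate) / framerate  # one second of sample times
freqs = np.full(len(ts) - 1, f)        # f(t) is constant here

# The same steps used to go from frequency to phase:
dts = np.diff(ts)
dphis = PI2 * freqs * dts
phases = np.insert(np.cumsum(dphis), 0, 0)

# For constant frequency the sum telescopes to phi = 2 pi f t.
assert np.allclose(phases, PI2 * f * ts)
```

For a constant frequency the cumulative sum is exact; for a varying
frequency it is a numerical approximation of the integral.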
When we used {\tt cumsum}
to go from frequency to phase, we were approximating integration.


\section{Exponential chirp}

When you listen to this chirp, you might notice that the pitch
rises quickly at first and then slows down.
The chirp spans two octaves, but it only takes 1/3 s to span
the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of
frequency.  As a result, the {\bf interval} we hear between two notes
depends on the {\em ratio} of their frequencies, not the difference.
``Interval'' is the musical term for the perceived difference between
two pitches.

For example, an octave is an interval where the ratio of two
pitches is 2.  So the interval from 220 to 440 is one octave
and the interval from 440 to 880 is also one octave.  The difference
in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear
chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency
has to increase exponentially.  A signal with that shape is called
an {\bf exponential chirp}.

Here's the code that makes one:

\begin{verbatim}
class ExpoChirp(Chirp):

    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

Instead of {\tt np.linspace}, this version of evaluate uses
{\tt np.logspace}, which creates a series of frequencies
whose logarithms are equally spaced, which means that they increase
exponentially.

That's it; everything else is the same as Chirp.
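To see the difference between the two spacings, here is a small sketch
(my own, not from {\tt thinkdsp}) comparing {\tt np.linspace} and
{\tt np.logspace} over the same two-octave range:

```python
import numpy as np

# Five frequencies from 220 to 880 Hz, linearly and logarithmically spaced.
lin = np.linspace(220, 880, 5)
expo = np.logspace(np.log10(220), np.log10(880), 5)

print(lin)   # neighbors differ by a constant amount
print(expo)  # neighbors differ by a constant ratio

# Linear spacing: constant difference of 165 Hz between neighbors.
assert np.allclose(np.diff(lin), 165.0)

# Log spacing: two octaves in four steps, so each ratio is sqrt(2).
assert np.allclose(expo[1:] / expo[:-1], 2 ** 0.5)
```

Constant ratios between successive frequencies are exactly what
exponential growth means, which is why the perceived pitch of an
exponential chirp rises at a steady rate.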
Here's the code
that makes one:

\begin{verbatim}
    signal = thinkdsp.ExpoChirp(start=220, end=880)
    wave = signal.make_wave(duration=1)
\end{verbatim}

You can listen to these examples in {\tt chap03.ipynb} and compare
the linear and exponential chirps.


\section{Spectrum of a chirp}
\label{sauron}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp1.pdf}}
\caption{Spectrum of a one-second one-octave chirp.}
\label{fig.chirp1}
\end{figure}

What do you think happens if you compute the spectrum of a chirp?
Here's an example that constructs a one-second, one-octave chirp and
its spectrum:

\begin{verbatim}
    signal = thinkdsp.Chirp(start=220, end=440)
    wave = signal.make_wave(duration=1)
    spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.chirp1} shows the result.  The spectrum has
components at every frequency from 220 to 440 Hz, with variations
that look a little like the Eye of Sauron
(see \url{http://en.wikipedia.org/wiki/Sauron}).

The spectrum is approximately flat between 220 and 440 Hz, which
indicates that the signal spends equal time at each frequency in this
range.  Based on that observation, you should be able to guess what
the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal,
but it obscures the relationship between frequency and time.
For example, we cannot tell by looking at this spectrum whether
the frequency went up or down, or both.


\section{Spectrogram}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp2.pdf}}
\caption{Spectrogram of a one-second one-octave chirp.}
\label{fig.chirp2}
\end{figure}

To recover the relationship between frequency and time, we can break
the chirp into segments and plot the spectrum of each segment.
The
result is called a {\bf short-time Fourier transform} (STFT).

There are several ways to visualize an STFT, but the most common
is a {\bf spectrogram}, which shows time on the x-axis and frequency
on the y-axis.  Each column in the spectrogram shows the spectrum of
a short segment, using color or grayscale to represent amplitude.

As an example, I'll compute the spectrogram of this chirp:

\begin{verbatim}
    signal = thinkdsp.Chirp(start=220, end=440)
    wave = signal.make_wave(duration=1, framerate=11025)
\end{verbatim}

{\tt Wave} provides \verb"make_spectrogram", which returns a
{\tt Spectrogram} object:

\begin{verbatim}
    spectrogram = wave.make_spectrogram(seg_length=512)
    spectrogram.plot(high=700)
\end{verbatim}

\verb"seg_length" is the number of samples in each segment.  I chose
512 because FFT is most efficient when the number of samples is a
power of 2.

Figure~\ref{fig.chirp2} shows the result.  The x-axis shows time from
0 to 1 seconds.  The y-axis shows frequency from 0 to 700 Hz.  I cut
off the top part of the spectrogram; the full range goes to 5512.5 Hz,
which is half of the framerate.

The spectrogram shows clearly that frequency increases linearly
over time.  Similarly, in the spectrogram of an exponential chirp, we
can see the shape of the exponential curve.

However, notice that the peak in each column is blurred across 2--3
cells.  This blurring reflects the limited resolution of the
spectrogram.


\section{The Gabor limit}
\label{gabor}

The {\bf time resolution} of the spectrogram is the duration of the
segments, which corresponds to the width of the cells in the
spectrogram.
Since each segment is 512 frames, and there are 11,025
frames per second, the duration of each segment is about 0.046 seconds.

The {\bf frequency resolution} is the frequency range between
elements in the spectrum, which corresponds to the height of the
cells.  With 512 frames, we get 256 frequency components over a range
from 0 to 5512.5 Hz, so the range between components is about 21.5 Hz.

More generally, if $n$ is the segment length, the spectrum contains
$n/2$ components.  If the framerate is $r$, the maximum frequency in
the spectrum is $r/2$.  So the time resolution is $n/r$ and the
frequency resolution is
%
\[ \frac{r/2}{n/2} \]
%
which is $r/n$.

Ideally we would like time resolution to be small, so we can see rapid
changes in frequency.  And we would like frequency resolution to be
small so we can see small changes in frequency.  But you can't have
both.  Notice that time resolution, $n/r$, is the inverse of frequency
resolution, $r/n$.  So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency
resolution in half (which is good), but you double time resolution
(which is bad).  Even increasing the framerate doesn't help.
You get
more samples, but the range of frequencies increases at
the same time.

This tradeoff is called the {\bf Gabor limit} and it is a fundamental
limitation of this kind of time-frequency analysis.


\section{Leakage}

\begin{figure}
% chirp.py
\centerline{\includegraphics[width=5.5in]{figs/windowing1.pdf}}
\caption{Spectrum of a periodic segment of a sinusoid (left), a
non-periodic segment (middle), a windowed non-periodic segment
(right).}
\label{fig.windowing1}
\end{figure}

In order to explain how \verb"make_spectrogram" works, I have
to explain windowing; and in order to explain windowing, I have to
show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute
Spectrums, treats waves as if they are periodic; that is, it assumes
that the finite segment it operates on is a complete period from an
infinite signal that repeats over all time.  In practice, this
assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the
segment.  Because DFT assumes that the signal is periodic, it
implicitly connects the end of the segment back to the beginning to
make a loop.
If the end does not connect smoothly to the beginning,
the discontinuity creates additional frequency components in the
segment that are not in the signal.

As an example, let's start with a sine signal that contains only
one frequency component at 440 Hz.

\begin{verbatim}
signal = thinkdsp.SinSignal(freq=440)
\end{verbatim}

If we select a segment whose duration happens to be an integer
multiple of the period, the end of the segment connects smoothly with
the beginning, and DFT behaves well.

\begin{verbatim}
duration = signal.period * 30
wave = signal.make_wave(duration)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.windowing1} (left) shows the result.  As expected,
there is a single peak at 440 Hz.

But if the duration is not a multiple of the period, bad things
happen.  With {\tt duration = signal.period * 30.25}, the signal
starts at 0 and ends at 1.

Figure~\ref{fig.windowing1} (middle) shows
the spectrum of this segment.  Again, the peak is at 440 Hz, but now
there are additional components spread out from 240 to 640 Hz.
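This effect is easy to reproduce with plain NumPy, independent of the
{\tt thinkdsp} classes.  The following is a standalone sketch: the
helper \verb"peak_fraction" and the cycle counts are mine, chosen to
mirror the 30 versus 30.25 periods above.

\begin{verbatim}
import numpy as np

n = 4096

# A segment containing exactly 32 cycles: periodic, so DFT behaves well.
periodic = np.sin(2 * np.pi * 32 * np.arange(n) / n)

# A segment containing 32.25 cycles: discontinuous when looped.
leaky = np.sin(2 * np.pi * 32.25 * np.arange(n) / n)

def peak_fraction(ys):
    """Fraction of total spectral power in the largest component."""
    power = np.abs(np.fft.rfft(ys)) ** 2
    return power.max() / power.sum()

print(peak_fraction(periodic))   # close to 1
print(peak_fraction(leaky))      # noticeably smaller
\end{verbatim}

For the periodic segment, essentially all of the power lands in a
single component; for the non-periodic segment, a substantial fraction
leaks into neighboring frequencies.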
This
spread is called {\bf spectral leakage}, because some of the energy that
is actually at the fundamental frequency leaks into other frequencies.

In this example, leakage happens because we are using DFT on a segment
that becomes discontinuous when we treat it as periodic.


\section{Windowing}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=3.5in]{figs/windowing2.pdf}}
\caption{Segment of a sinusoid (top), Hamming window (middle), product
of the segment and the window (bottom).}
\label{fig.windowing2}
\end{figure}

We can reduce leakage by smoothing out the discontinuity between
the beginning and end of the segment, and one way to do that is
{\bf windowing}.

A ``window'' is a function designed to transform a non-periodic
segment into something that can pass for periodic.
Figure~\ref{fig.windowing2} (top) shows a segment where the end does not
connect smoothly to the beginning.

Figure~\ref{fig.windowing2} (middle) shows a ``Hamming window'', one of the
more common window functions.  No window function is perfect, but some
can be shown to be optimal for different applications, and Hamming
is a good, all-purpose window.

Figure~\ref{fig.windowing2} (bottom) shows the result of multiplying the
window by the original signal.  Where the window is close to 1, the
signal is unchanged.  Where the window is close to 0, the signal is
attenuated.  Because the window tapers at both ends, the end of the
segment connects smoothly to the beginning.

Figure~\ref{fig.windowing1} (right) shows the spectrum of the windowed
signal.  Windowing has reduced leakage substantially, but not
completely.

Here's what the code looks like.
{\tt Wave} provides {\tt window},
which applies a Hamming window:

\begin{verbatim}
#class Wave:
    def window(self, window):
        self.ys *= window
\end{verbatim}

And NumPy provides {\tt hamming}, which computes a Hamming window
with a given length:

\begin{verbatim}
window = np.hamming(len(wave))
wave.window(window)
\end{verbatim}

NumPy provides functions to compute other window
functions, including {\tt bartlett}, {\tt blackman}, {\tt hanning},
and {\tt kaiser}.  One of the exercises at the end of this chapter
asks you to experiment with these other windows.


\section{Implementing spectrograms}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing3.pdf}}
\caption{Overlapping Hamming windows.}
\label{fig.windowing3}
\end{figure}

Now that we understand windowing, we can understand the
implementation of spectrograms.
Here is the {\tt Wave} method that computes spectrograms:

\begin{verbatim}
#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        step = seg_length // 2

        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)
\end{verbatim}

This is the longest function in the book, so if you can handle
this, you can handle anything.

The parameter, {\tt self}, is a Wave object.
\verb"seg_length" is the number of samples in each segment.
{\tt step} uses integer division so the slice indices stay integers.

{\tt window} is a Hamming window with the same length as the segments.

{\tt i} and {\tt j} are the slice indices that select segments from
the wave.  {\tt step} is the offset between segments.
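For comparison, here is a standalone sketch of the same
sliding-segment algorithm using only NumPy.  The function name and
return format are mine: it maps each segment's midpoint (as a sample
index) to the magnitudes returned by \verb"np.fft.rfft".

\begin{verbatim}
import numpy as np

def spectrogram(ys, seg_length):
    """Map from segment midpoint (in samples) to magnitude spectrum."""
    window = np.hamming(seg_length)
    step = seg_length // 2          # segments overlap by half
    spec_map = {}

    i, j = 0, seg_length
    while j < len(ys):
        segment = ys[i:j] * window
        mid = (i + j) // 2
        spec_map[mid] = np.abs(np.fft.rfft(segment))
        i += step
        j += step

    return spec_map

# One second of a 440 Hz sinusoid at 11,025 frames per second.
ys = np.sin(2 * np.pi * 440 * np.arange(11025) / 11025)
spec_map = spectrogram(ys, seg_length=512)
\end{verbatim}

With \verb"seg_length=512", each magnitude spectrum has 257 elements
and peaks near the bin that corresponds to 440 Hz.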
Since {\tt
step} is half of \verb"seg_length", the segments overlap by half.
Figure~\ref{fig.windowing3} shows what these overlapping windows look
like.

\verb"spec_map" is a dictionary that maps from a timestamp to
a Spectrum.

Inside the while loop, we select a slice from the wave and apply
the window; then we construct a Spectrum
object and add it to \verb"spec_map".  The nominal time of
each segment, {\tt t}, is the midpoint.

Then we advance {\tt i} and {\tt j}, and continue as long as {\tt j}
doesn't go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram.  Here
is the definition of Spectrogram:

\begin{verbatim}
class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length
\end{verbatim}

Like many init methods, this one just stores the
parameters as attributes.

{\tt Spectrogram} provides {\tt plot}, which generates a
pseudocolor plot with time along the x-axis and frequency along
the y-axis.

And that's how Spectrograms are implemented.


\section{Exercises}

Solutions to these exercises are in {\tt chap03soln.ipynb}.

\begin{exercise}
Run and listen to the examples in {\tt chap03.ipynb}, which is
in the repository for this book, and also available at
\url{http://tinyurl.com/thinkdsp03}.

In the leakage example, try replacing the Hamming window with one of
the other windows provided by NumPy, and see what effect they have on
leakage.
See
\url{http://docs.scipy.org/doc/numpy/reference/routines.window.html}.
\end{exercise}


\begin{exercise}
Write a class called {\tt SawtoothChirp} that extends {\tt Chirp}
and overrides {\tt evaluate} to generate a sawtooth waveform with
frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from {\tt Chirp} and
{\tt SawtoothSignal}.

Draw a sketch of what you think the spectrogram of this signal
looks like, and then plot it.  The effect of aliasing should be
visually apparent, and if you listen carefully, you can hear it.
\end{exercise}


\begin{exercise}
Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to
make a wave with duration 1 s and framerate 20 kHz.  Draw a sketch of
what you think the spectrum will look like.  Then plot the
spectrum and see if you got it right.
\end{exercise}


\begin{exercise}
In musical terminology, a ``glissando'' is a note that slides from one
pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the
first few seconds.  One suggestion: George Gershwin's {\it Rhapsody in
Blue} starts with a famous clarinet glissando, which you can download
from \url{http://archive.org/details/rhapblue11924}.
\end{exercise}


\begin{exercise}
A trombone player can play a glissando by extending the trombone
slide while blowing continuously.  As the slide extends, the total
length of the tube gets longer, and the resulting pitch is inversely
proportional to length.

Assuming that the player moves the slide at a constant speed, how
does frequency vary with time?

Write a class called {\tt TromboneGliss} that extends {\tt Chirp} and
provides {\tt evaluate}.  Make a wave that simulates a trombone
glissando from C3 up to F3 and back down to C3.
C3 is 262 Hz; F3 is
349 Hz.

Plot a spectrogram of the resulting wave.  Is a trombone glissando
more like a linear or exponential chirp?
\end{exercise}


\begin{exercise}
Make or find a recording of a series of vowel sounds and look at the
spectrogram.  Can you identify different vowels?
\end{exercise}


\chapter{Noise}

In English, ``noise'' means an unwanted or unpleasant sound.  In the
context of signal processing, it has two different senses:

\begin{enumerate}

\item As in English, it can mean an unwanted signal of any kind.  If
two signals interfere with each other, each signal would consider
the other to be noise.

\item ``Noise'' also refers to a signal that contains components at
many frequencies, so it lacks the harmonic structure of the periodic
signals we saw in previous chapters.

\end{enumerate}

This chapter is about the second kind.

The code for this chapter is in {\tt chap04.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp04}.


\section{Uncorrelated noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise0.pdf}}
\caption{Waveform of uncorrelated uniform noise.}
\label{fig.whitenoise0}
\end{figure}

The simplest way to understand noise is to generate it, and the
simplest kind to generate is uncorrelated uniform noise (UU noise).
``Uniform'' means the signal contains random values from a uniform
distribution; that is, every value in the range is equally likely.
``Uncorrelated'' means that the values are independent; that is,
knowing one value provides no information about the others.

Here's a class that represents UU noise:

\begin{verbatim}
class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp,
                               len(ts))
        return ys
\end{verbatim}

{\tt UncorrelatedUniformNoise} inherits from \verb"_Noise", which
inherits from {\tt Signal}.

As usual, the evaluate function takes {\tt ts}, the times when the
signal should be evaluated.  It uses
{\tt np.random.uniform}, which generates values from a
uniform distribution.  In this example, the values are in
the range from {\tt -amp} to {\tt amp}.

The following example generates UU noise with duration 0.5
seconds at 11,025 samples per second.

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
\end{verbatim}

If you play this wave, it sounds like the static you hear if you tune
a radio between channels.  Figure~\ref{fig.whitenoise0} shows what the
waveform looks like.  As expected, it looks pretty random.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise1.pdf}}
\caption{Power spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise1}
\end{figure}

Now let's take a look at the spectrum:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power()
\end{verbatim}

\verb"Spectrum.plot_power" is similar to \verb"Spectrum.plot",
except that it plots power instead of amplitude.
Power is the square of amplitude.
I am switching from amplitude to power in this chapter because
it is more conventional in the context of noise.

Figure~\ref{fig.whitenoise1} shows the result.  Like the signal, the
spectrum looks pretty random.  In fact, it {\em is} random, but we have to
be more precise about the word ``random''.  There are at least three
things we might like to know about a noise signal or its spectrum:

\begin{itemize}

\item Distribution: The distribution of a random signal is the set of
possible values and their probabilities.
For example, in the
uniform noise signal, the set of values is the range from -1 to 1,
and all values have the same probability.  An alternative is
{\bf Gaussian noise}, where the set of values is the range from negative
to positive infinity, but values near 0 are the most likely, with
probability that drops off according to the Gaussian or
``bell'' curve.

\item Correlation: Is each value in the signal independent of the
others, or are there dependencies between them?  In UU noise, the
values are independent.
An alternative is {\bf Brownian noise}, where each value is the sum
of the previous value and a random ``step''.  So if the value of the
signal is high at a particular point in time, we expect it to stay
high, and if it is low, we expect
it to stay low.

\item Relationship between power and frequency: In the spectrum of UU
noise, the power at all frequencies is drawn from the same
distribution; that is, the average power is the same for all
frequencies.
An alternative is {\bf pink noise}, where power is
inversely related to frequency; that is, the power at frequency $f$
is drawn from a distribution whose mean is proportional to $1/f$.

\end{itemize}


\section{Integrated spectrum}

For UU noise we can see the relationship between power and frequency
more clearly by looking at the {\bf integrated spectrum}, which
is a function of frequency, $f$, that shows the cumulative power in
the spectrum up to $f$.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise2.pdf}}
\caption{Integrated spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise2}
\end{figure}

{\tt Spectrum} provides a method that computes the IntegratedSpectrum:

\begin{verbatim}
def make_integrated_spectrum(self):
    cs = np.cumsum(self.power)
    cs /= cs[-1]
    return IntegratedSpectrum(cs, self.fs)
\end{verbatim}

{\tt self.power} is a NumPy array containing power for each frequency.
{\tt np.cumsum} computes the cumulative sum of the powers.
Dividing through by the last element normalizes the integrated
spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum.
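The same computation is easy to check with plain NumPy (a standalone
sketch; the seed and signal length are mine).  For white noise, the
normalized cumulative power should grow roughly linearly with
frequency:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)   # seeded so the result is reproducible
ys = rng.uniform(-1, 1, 11025)   # one second of UU noise

power = np.abs(np.fft.rfft(ys)) ** 2
cs = np.cumsum(power)
cs /= cs[-1]                     # normalize so it runs from 0 to 1

# Halfway through the frequency range, about half the power
# has accumulated.
print(cs[len(cs) // 2])
\end{verbatim}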
Here is the class definition:

\begin{verbatim}
class IntegratedSpectrum(object):
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs
\end{verbatim}

Like Spectrum, IntegratedSpectrum provides \verb"plot_power", so
we can compute and plot the integrated spectrum like this:

\begin{verbatim}
integ = spectrum.make_integrated_spectrum()
integ.plot_power()
\end{verbatim}

The result, shown in Figure~\ref{fig.whitenoise2}, is a straight line,
which indicates that power at all frequencies is constant, on average.
Noise with equal power at all frequencies is called {\bf white noise}
by analogy with light, because an equal mixture of light at all
visible frequencies is white.


\section{Brownian noise}
\label{brownian}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise0.pdf}}
\caption{Waveform of Brownian noise.}
\label{fig.rednoise0}
\end{figure}

UU noise is uncorrelated, which means that each value does not depend
on the others.  An alternative is {\bf Brownian noise}, in which each value
is the sum of the previous value and a random ``step''.

It is called ``Brownian'' by analogy with Brownian motion, in which a
particle suspended in a fluid moves apparently at random, due to
unseen interactions with the fluid.  Brownian motion is often
described using a {\bf random walk}, which is a mathematical model
of a path where the distance between steps is characterized by a
random distribution.

In a one-dimensional random walk, the particle moves up or down
by a random amount at each time step.
The location of the particle
at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise:
generate uncorrelated random steps and then add them up.
Here is a class definition that implements this algorithm:

\begin{verbatim}
class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt evaluate} uses {\tt np.random.uniform} to generate an
uncorrelated signal and {\tt np.cumsum} to compute its cumulative
sum.

Since the sum is likely to escape the range from -1 to
1, we have to use {\tt unbias} to shift the mean to 0, and {\tt
normalize} to get the desired maximum amplitude.

Here's the code that generates a BrownianNoise object and plots the
waveform.

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()
\end{verbatim}

Figure~\ref{fig.rednoise0} shows the result.  The waveform
wanders up and down, but there is a clear correlation between
successive values.  When the amplitude is high, it tends to stay
high, and vice versa.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise3.pdf}}
\caption{Spectrum of Brownian noise on a linear scale (left) and
log-log scale (right).}
\label{fig.rednoise3}
\end{figure}

If you plot the spectrum of Brownian noise on a linear scale, as
in Figure~\ref{fig.rednoise3} (left), it
doesn't look like much.  Nearly all of the power is at the lowest
frequencies; the higher frequency components are not visible.

To see the shape of the spectrum more clearly, we can plot power
and frequency on a log-log scale.
Here's the code:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')
\end{verbatim}

The result is in Figure~\ref{fig.rednoise3} (right).  The relationship
between power and frequency is noisy, but roughly linear.

{\tt Spectrum} provides \verb"estimate_slope", which uses SciPy to compute
a least squares fit to the power spectrum:

\begin{verbatim}
#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x, y)
        return t
\end{verbatim}

It discards the first component of the spectrum because
this component corresponds to $f=0$, and $\log 0$ is undefined.

\verb"estimate_slope" returns the result from {\tt
scipy.stats.linregress}, which is an object that contains the
estimated slope and intercept, the correlation coefficient,
the p-value, and the standard error.  For our purposes, we only need
the slope.

For Brownian noise, the slope of the power spectrum is -2 (we'll see
why in Chapter~\ref{diffint}), so we can write this relationship:
%
\[ \log P = k - 2 \log f \]
%
where $P$ is power, $f$ is frequency, and $k$ is the intercept
of the line, which is not important for our purposes.
Exponentiating both sides yields:
%
\[ P = K / f^{2} \]
%
where $K$ is $e^k$, but still not important.  More relevant is
that power is proportional to $1/f^2$, which is characteristic
of Brownian noise.

Brownian noise is also called {\bf red noise}, for the same reason that
white noise is called ``white''.
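Without the {\tt thinkdsp} classes, the same check can be sketched
with plain NumPy.  The seed and signal length here are mine, and I use
\verb"np.polyfit" in place of \verb"linregress" to keep the sketch
self-contained:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
dys = rng.uniform(-1, 1, 16384)
ys = np.cumsum(dys)                 # Brownian noise

power = np.abs(np.fft.rfft(ys)) ** 2
fs = np.arange(len(power))          # frequency in bins

# Discard f=0, then fit log power against log frequency.
slope, intercept = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)
print(slope)                        # should be close to -2
\end{verbatim}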
If you combine visible light with
power proportional to $1/f^2$, most of the power
would be at the low-frequency end of the spectrum, which is red.
Brownian noise is also sometimes called ``brown noise'', but I think
that's confusing, so I won't use it.

%TODO: refer back to this in the diff/int chapter


\section{Pink noise}
\label{pink}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/pinknoise0.pdf}}
\caption{Waveform of pink noise with $\beta=1$.}
\label{fig.pinknoise0}
\end{figure}

For red noise, the relationship between frequency
and power is
%
\[ P = K / f^{2} \]
%
There is nothing special about the exponent 2.  More generally,
we can synthesize noise with any exponent, $\beta$.
%
\[ P = K / f^{\beta} \]
%
When $\beta = 0$, power is constant at all frequencies,
so the result is white noise.  When $\beta=2$ the result is red noise.

When $\beta$ is between 0 and 2, the result is between white and
red noise, so it is called {\bf pink noise}.

There are several ways to generate pink noise.  The simplest is to
generate white noise and then apply a low-pass filter with the
desired exponent.  {\tt thinkdsp} provides a class that represents
a pink noise signal:

\begin{verbatim}
class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta
\end{verbatim}

{\tt amp} is the desired amplitude of the signal.
{\tt beta} is the desired exponent.
{\tt PinkNoise} provides
\verb"make_wave", which generates a Wave.

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    signal = UncorrelatedUniformNoise()
    wave = signal.make_wave(duration, start, framerate)
    spectrum = wave.make_spectrum()

    spectrum.pink_filter(beta=self.beta)

    wave2 = spectrum.make_wave()
    wave2.unbias()
    wave2.normalize(self.amp)
    return wave2
\end{verbatim}

{\tt duration} is the length of the wave in seconds.  {\tt start} is
the start time of the wave; it is included so that \verb"make_wave"
has the same interface for all types of signal, but for random noise,
start time is irrelevant.  And {\tt framerate} is the number of
samples per second.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise-triple.pdf}}
\caption{Spectrum of white, pink, and red noise on a log-log scale.}
\label{fig.noise-triple}
\end{figure}

\verb"make_wave" creates a white noise wave, computes its spectrum,
applies a filter with the desired exponent, and then converts the
filtered spectrum back to a wave.  Then it unbiases and normalizes
the wave.

{\tt Spectrum} provides \verb"pink_filter":

\begin{verbatim}
def pink_filter(self, beta=1.0):
    denom = self.fs ** (beta/2.0)
    denom[0] = 1
    self.hs /= denom
\end{verbatim}

\verb"pink_filter" divides each element of the spectrum by
$f^{\beta/2}$.  Since power is the square of amplitude, this
operation divides the power at
each component by $f^\beta$.  It treats the component
at $f=0$ as a special case, partly to avoid dividing by 0, and partly
because this element represents the bias of the signal,
which we are going to set to 0 anyway.

Figure~\ref{fig.pinknoise0} shows the resulting waveform.
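The whole pipeline reduces to a few lines of plain NumPy (a
standalone sketch with $\beta = 1$; the seed and variable names are
mine).  It mirrors \verb"make_wave": white noise, the pink filter, the
inverse transform, then unbias and normalize.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0
n = 11025

# White noise, transformed to the frequency domain.
ys = rng.uniform(-1, 1, n)
hs = np.fft.rfft(ys)

# Pink filter: divide each amplitude by f**(beta/2), skipping f=0.
fs = np.arange(len(hs))
denom = fs ** (beta / 2)
denom[0] = 1
hs = hs / denom

# Back to the time domain, then unbias and normalize.
ys2 = np.fft.irfft(hs, n)
ys2 -= ys2.mean()
ys2 /= np.abs(ys2).max()
\end{verbatim}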
Like
Brownian noise, it wanders up and down in a way that suggests
correlation between successive values, but at least visually, it looks
more random.  In the next chapter we will come back to this
observation and I will be more precise about what I mean by
``correlation'' and ``more random''.

Finally, Figure~\ref{fig.noise-triple} shows a spectrum for
white, pink, and red noise on the same log-log scale.
The relationship between the exponent, $\beta$, and the slope
of the spectrum is apparent in this figure.


\section{Gaussian noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise1.pdf}}
\caption{Normal probability plot for the real and imaginary parts
of the spectrum of Gaussian noise.}
\label{fig.noise1}
\end{figure}

We started with uncorrelated uniform (UU) noise and showed that,
because its spectrum has equal power at all frequencies, on
average, UU noise is white.

But when people talk about ``white noise'', they don't always
mean UU noise.  In fact, more often they mean uncorrelated
Gaussian (UG) noise.

{\tt thinkdsp} provides an implementation of UG noise:

\begin{verbatim}
class UncorrelatedGaussianNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.normal(0, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt np.random.normal} returns a NumPy array of values from a
Gaussian distribution, in this case with mean 0 and standard deviation
{\tt self.amp}.  In theory the range of values is from negative to
positive infinity, but we expect about 99.7\% of the values to fall
within three standard deviations of the mean.

UG noise is similar in many ways to UU noise.  The spectrum has
equal power at all frequencies, on average, so UG is also white.
And it has one other interesting property: the spectrum of UG
noise is also UG noise.
More precisely, the real and imaginary
parts of the spectrum are uncorrelated Gaussian values.

To test that claim, we can generate the spectrum of UG noise and
then generate a ``normal probability plot'', which is a graphical
way to test whether a distribution is Gaussian.

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
spectrum = wave.make_spectrum()

thinkstats2.NormalProbabilityPlot(spectrum.real)
thinkstats2.NormalProbabilityPlot(spectrum.imag)
\end{verbatim}

{\tt NormalProbabilityPlot} is provided by {\tt thinkstats2}, which is
included in the repository for this book.  If you are not familiar
with normal probability plots, you can read about them in Chapter 5
of {\it Think Stats} at \url{http://thinkstats2.com}.

Figure~\ref{fig.noise1} shows the results.  The gray lines
show a linear model fit to the data; the dark lines show the
data.

A straight line on a normal probability plot indicates
that the data come from a Gaussian distribution.  Except for
some random variation at the extremes, these lines are straight,
which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately.  In
fact, by the Central Limit Theorem, the spectrum of almost any
uncorrelated noise is approximately Gaussian, as long as the
distribution has finite mean and standard deviation, and the number of
samples is large.


\section{Exercises}

Solutions to these exercises are in {\tt chap04soln.ipynb}.

\begin{exercise}
``A Soft Murmur'' is a web site that plays a mixture of natural
noise sources, including rain, waves, wind, etc.
At
\url{http://asoftmurmur.com/about/} you can find their list
of recordings, most of which are at \url{http://freesound.org}.

Download a few of these files and compute the spectrum of each
signal.  Does the power spectrum look like white noise, pink noise,
or Brownian noise?  How does the spectrum vary over time?
\end{exercise}


\begin{exercise}
In a noise signal, the mixture of frequencies changes over time.
In the long run, we expect the power at all frequencies to be equal,
but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can
break a long signal into segments, compute the power spectrum
for each segment, and then compute the average across
the segments.  You can read more about this algorithm at
\url{http://en.wikipedia.org/wiki/Bartlett's_method}.

Implement Bartlett's method and use it to estimate the power
spectrum for a noise wave.  Hint: look at the implementation
of \verb"make_spectrogram".
\end{exercise}


\begin{exercise}
At \url{http://www.coindesk.com} you can download the daily
price of a BitCoin as a CSV file.  Read this file and compute
the spectrum of BitCoin prices as a function of time.
Does it resemble white, pink, or Brownian noise?
\end{exercise}


\begin{exercise}
A Geiger counter is a device that detects radiation.
When an ionizing particle strikes the detector, it outputs a surge of
current.  The total output at a point in time can be modeled as
uncorrelated Poisson (UP) noise, where each sample is
a random quantity from a Poisson distribution, which corresponds to the
number of particles detected during an interval.

Write a class called {\tt UncorrelatedPoissonNoise} that inherits
from \verb"thinkdsp._Noise" and provides {\tt evaluate}.  It should
use {\tt np.random.poisson} to generate random values from a Poisson
distribution.
The parameter of this function, {\tt lam}, is the
average number of particles during each interval.  You can use the
attribute {\tt amp} to specify {\tt lam}.  For example, if the
framerate is 10 kHz and {\tt amp} is 0.001, we expect about 10
``clicks'' per second.

Generate about a second of UP noise and listen to it.  For low values
of {\tt amp}, like 0.001, it should sound like a Geiger counter.  For
higher values it should sound like white noise.  Compute and plot the
power spectrum to see whether it looks like white noise.
\end{exercise}


\begin{exercise}
The algorithm in this chapter for generating pink noise is
conceptually simple but computationally expensive.  There are
more efficient alternatives, like the Voss-McCartney algorithm.
Research this method, implement it, compute the spectrum of
the result, and confirm that it has the desired relationship
between power and frequency.
\end{exercise}


\chapter{Autocorrelation}

In the previous chapter I characterized white noise as ``uncorrelated'',
which means that each value is independent of the others, and Brownian
noise as ``correlated'', because each value depends on the preceding
value.  In this chapter I define these terms more precisely and
present the {\bf autocorrelation function}, which is a useful tool
for signal analysis.

The code for this chapter is in {\tt chap05.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp05}.


\section{Correlation}

In general, correlation between variables means that if you know the
value of one, you have some information about the other.  There are
several ways to quantify correlation, but the most common is the
Pearson product-moment correlation coefficient, usually denoted
$\rho$.
For two variables, $x$ and $y$, that each contain $N$ values:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, and
$\sigma_x$ and $\sigma_y$ are their standard deviations.

Pearson's correlation is always between -1 and +1 (including both).
If $\rho$ is positive, we say that the correlation is positive,
which means that when one variable is high, the other tends to be
high.  If $\rho$ is negative, the correlation is negative, so
when one variable is high, the other tends to be low.

The magnitude of $\rho$ indicates the strength of the correlation.  If
$\rho$ is 1 or -1, the variables are perfectly correlated, which means
that if you know one, you can make a perfect prediction about the
other.  If $\rho$ is near zero, the correlation is probably weak, so
if you know one, it doesn't tell you much about the other.

I say ``probably weak'' because it is also possible that there is
a nonlinear relationship that is not captured by the coefficient
of correlation.  Nonlinear relationships are often important in
statistics, but less often relevant for signal processing, so I
won't say more about them here.

Python provides several ways to compute correlations.  {\tt
np.corrcoef} takes any number of variables and computes a {\bf
correlation matrix} that includes correlations between each pair of
variables.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr1.pdf}}
\caption{Two sine waves that differ by a phase offset of 1 radian;
their coefficient of correlation is 0.54.}
\label{fig.autocorr1}
\end{figure}

I'll present an example with only two variables.
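The formula translates directly into NumPy, and the result matches
{\tt np.corrcoef} (a standalone sketch; the data values are made up):

\begin{verbatim}
import numpy as np

def pearson(xs, ys):
    """Pearson's correlation, computed from the definition."""
    dx = xs - xs.mean()
    dy = ys - ys.mean()
    # np.std divides by N by default, matching the equation above.
    return np.sum(dx * dy) / (len(xs) * xs.std() * ys.std())

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

print(pearson(xs, ys))            # 0.8, up to floating-point rounding
print(np.corrcoef(xs, ys)[0, 1])  # same value
\end{verbatim}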
First, I define
a function that constructs sine waves with different phase offsets:

\begin{verbatim}
def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave
\end{verbatim}

Next I instantiate two waves with different offsets:

\begin{verbatim}
wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)
\end{verbatim}

Figure~\ref{fig.autocorr1} shows what the first few periods of these
waves look like. When one wave is high, the other is usually high, so we
expect them to be correlated.

\begin{verbatim}
>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]
\end{verbatim}

The option {\tt ddof=0} indicates that {\tt corrcoef} should divide by
$N$, as in the equation above, rather than use the default, $N-1$.

The result is a correlation matrix:
the first element is the correlation of {\tt wave1}
with itself, which is always 1. Similarly, the last element
is the correlation of {\tt wave2} with itself.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr2.pdf}}
\caption{The correlation of two sine waves as a function of the
phase offset between them. The result is a cosine.}
\label{fig.autocorr2}
\end{figure}

The off-diagonal elements contain the value we're interested in,
the correlation of {\tt wave1} and {\tt wave2}. The value 0.54
indicates that the strength of the correlation is moderate.

As the phase offset increases, this correlation decreases until
the waves are 180 degrees out of phase, which yields correlation
-1. Then it increases until the offset differs by 360 degrees.
At that point we have come full circle and the correlation is 1.

Figure~\ref{fig.autocorr2} shows the relationship between correlation and
phase offset for a sine wave.
The shape of that curve should look
familiar; it is a cosine.

{\tt thinkdsp} provides a simple interface for computing the
correlation between waves:

\begin{verbatim}
>>> wave1.corr(wave2)
0.54
\end{verbatim}


\section{Serial correlation}

Signals often represent measurements of quantities that vary in
time. For example, the sound signals we've worked with represent
measurements of voltage (or current), which correspond to the changes
in air pressure we perceive as sound.

Measurements like this almost always have serial correlation, which
is the correlation between each element and the next (or the previous).
To compute serial correlation, we can shift a signal and then compute
the correlation of the shifted version with the original.

\begin{verbatim}
def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr
\end{verbatim}

\verb"serial_corr" takes a Wave object and
{\tt lag}, which is the integer number of places to shift the wave.
It computes the correlation of the wave with a shifted version
of itself.

We can test this function with the noise signals from the previous
chapter. We expect UU noise to be uncorrelated, based on the
way it's generated (not to mention the name):

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

When I ran this example, I got 0.006, which indicates a very
small serial correlation.
You might get a different value when you run
it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous
value and a random ``step'', so we expect a strong serial
correlation:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

Sure enough, the result I got is greater than 0.999.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr3.pdf}}
\caption{Serial correlation for pink noise with a range of
parameters.}
\label{fig.autocorr3}
\end{figure}

Since pink noise is in some sense between Brownian noise and UU noise,
we might expect an intermediate correlation:

\begin{verbatim}
signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

With parameter $\beta=1$, I got a serial correlation of 0.851.
As we vary the parameter from $\beta=0$, which is uncorrelated
noise, to $\beta=2$, which is Brownian, serial correlation
ranges from 0 to almost 1, as shown in Figure~\ref{fig.autocorr3}.


\section{Autocorrelation}
\label{autopink}

In the previous section we computed the correlation between each
value and the next, so we shifted the elements of the array by 1.
But we can easily compute serial correlations with
different lags.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr4.pdf}}
\caption{Autocorrelation functions for pink noise with a range
of parameters.}
\label{fig.autocorr4}
\end{figure}

You can think of \verb"serial_corr" as a function that
maps from each value of lag to the corresponding correlation, and we
can evaluate that function by looping through values of {\tt lag}:

\begin{verbatim}
def autocorr(wave):
    lags = range(len(wave.ys)//2)
    corrs = [serial_corr(wave, lag) for lag in lags]
    return lags, corrs
\end{verbatim}

{\tt autocorr} takes a Wave object and returns the autocorrelation
function as a pair of sequences: {\tt lags} is a sequence of
integers from 0 to half the length of the wave; {\tt corrs}
is the sequence of serial correlations for each lag.

Figure~\ref{fig.autocorr4} shows autocorrelation functions for pink
noise with three values of $\beta$. For low values of $\beta$, the
signal is less correlated, and the autocorrelation function drops
off to zero quickly. For larger values, serial correlation
is stronger and drops off more slowly. With $\beta=1.7$ serial
correlation is strong even for long lags; this phenomenon is called
{\bf long-range dependence}, because it indicates that each value in
the signal depends on many preceding values.


\section{Autocorrelation of periodic signals}

The autocorrelation of pink noise has interesting mathematical
properties, but limited applications. The autocorrelation of
periodic signals is more useful.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr5.pdf}}
\caption{Spectrogram of a vocal chirp.}
\label{fig.autocorr5}
\end{figure}

As an example, I downloaded from \url{freesound.org} a recording of
someone singing a chirp; the repository for this book includes the
file: \url{28042__bcjordan__voicedownbew.wav}.
You can use the
Jupyter notebook for this chapter, {\tt chap05.ipynb}, to play it.

Figure~\ref{fig.autocorr5} shows the spectrogram of this wave.
The fundamental frequency and some of the harmonics show up clearly.
The chirp starts near 500 Hz and drops down to about 300 Hz, roughly
from C5 to E4.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr6.pdf}}
\caption{Spectrum of a segment from a vocal chirp.}
\label{fig.autocorr6}
\end{figure}

To estimate pitch at a particular point in time, we could use the
spectrum, but it doesn't work very well. To see why not, I'll take
a short segment from the wave and plot its spectrum:

\begin{verbatim}
duration = 0.01
segment = wave.segment(start=0.2, duration=duration)
spectrum = segment.make_spectrum()
spectrum.plot(high=1000)
\end{verbatim}

This segment starts at 0.2 seconds and lasts 0.01 seconds.
Figure~\ref{fig.autocorr6} shows its spectrum. There is a clear peak
near 400 Hz, but it is hard to identify the pitch precisely.
The
length of the segment is 441 samples at a framerate of 44100 Hz, so
the frequency resolution is 100 Hz (see Section~\ref{gabor}).
That means the estimated pitch might be off by 50 Hz; in musical
terms, the range from 350 Hz to 450 Hz is about 5 semitones, which is
a big difference!

We could get better frequency resolution by taking a longer segment,
but since the pitch is changing over time, we would also get ``motion
blur''; that is, the peak would spread between the start and end pitch of
the segment, as we saw in Section~\ref{sauron}.

We can estimate pitch more precisely using autocorrelation.
If a signal is periodic, we expect the autocorrelation to spike
when the lag equals the period.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr7.pdf}}
\caption{Two segments from a chirp, one starting 0.0023 seconds
after the other.}
\label{fig.autocorr7}
\end{figure}

To show why that works, I'll plot two segments from the same
recording.

\begin{verbatim}
def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')
\end{verbatim}

One segment starts at 0.2 seconds; the other starts 0.0023 seconds
later. Figure~\ref{fig.autocorr7} shows the result. The segments
are similar, and their correlation is 0.99.
This result suggests
that the period is near 0.0023 seconds, which corresponds to a frequency
of 435 Hz.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr8.pdf}}
\caption{Autocorrelation function for a segment from a chirp.}
\label{fig.autocorr8}
\end{figure}

For this example, I estimated the period by trial and error. To automate
the process, we can use the autocorrelation function.

\begin{verbatim}
lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)
\end{verbatim}

Figure~\ref{fig.autocorr8} shows the autocorrelation function for
the segment starting at $t=0.2$ seconds. The first peak occurs at
{\tt lag=101}. We can compute the frequency that corresponds
to that period like this:

\begin{verbatim}
period = lag / segment.framerate
frequency = 1 / period
\end{verbatim}

The estimated fundamental frequency is 437 Hz. To evaluate the
precision of the estimate, we can run the same computation with
lags 100 and 102, which correspond to frequencies 432 and 441 Hz.
The frequency precision using autocorrelation is less than 10 Hz,
compared with 100 Hz using the spectrum. In musical terms, the
expected error is about 30 cents (a third of a semitone).


\section{Correlation as dot product}
\label{dotproduct}

I started the chapter with this definition of Pearson's
correlation coefficient:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
Then I used $\rho$ to define serial correlation and autocorrelation.
That's consistent with how these terms are used in statistics,
but in the context of signal processing, the definitions are
a little different.

In signal processing, we are often working with unbiased signals,
where the mean is 0, and normalized signals, where the standard
deviation is 1.
In that case, the definition of $\rho$ simplifies to:
%
\[ \rho = \frac{1}{N} \sum_i x_i y_i \]
%
And it is common to simplify even further:
%
\[ r = \sum_i x_i y_i \]
%
This definition of correlation is not ``standardized'', so it doesn't
generally fall between -1 and 1. But it has other useful properties.

If you think of $x$ and $y$ as vectors, you might recognize this
formula as the {\bf dot product}, $x \cdot y$. See
\url{http://en.wikipedia.org/wiki/Dot_product}.

\newcommand{\norm}{\mathrm{norm}}

The dot product indicates the degree to which the signals are similar.
If they are normalized so their standard deviations are 1,
%
\[ x \cdot y = \cos \theta \]
%
where $\theta$ is the angle between the vectors. And that explains
why Figure~\ref{fig.autocorr2} is a cosine curve.


\section{Using NumPy}
\label{correlate}

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr9.pdf}}
\caption{Autocorrelation function computed with {\tt np.correlate}.}
\label{fig.autocorr9}
\end{figure}

NumPy provides a function, {\tt correlate}, that computes
the correlation of two functions or the autocorrelation of one
function. We can use it to compute the autocorrelation of
the segment from the previous section:

\begin{verbatim}
corrs2 = np.correlate(segment.ys, segment.ys, mode='same')
\end{verbatim}

The option {\tt mode} tells {\tt correlate} what range
of {\tt lag} to use. With the value {\tt 'same'}, the
range is from $-N/2$ to $N/2$, where $N$ is the length of the
wave array.

Figure~\ref{fig.autocorr9} shows the result. It is symmetric because
the two signals are identical, so a negative lag on one has the same
effect as a positive lag on the other.
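Here is a self-contained sketch of that behavior (mine, not from {\tt autocorr.py}), with a synthesized cosine standing in for the chirp segment: with {\tt mode='same'}, lag 0 lands at the center of the output, where the unstandardized autocorrelation is largest.

```python
import numpy as np

framerate = 10000
ts = np.arange(5000) / framerate       # 0.5 seconds of samples
ys = np.cos(2 * np.pi * 100 * ts)      # 100 Hz cosine

corrs2 = np.correlate(ys, ys, mode='same')
N = len(corrs2)

center = N // 2    # with mode='same', lag 0 is at the center
# at lag 0 the unstandardized correlation is sum(ys**2),
# which is the largest value in the array
print(np.argmax(corrs2) == center)     # True
```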
To compare with the results
from {\tt autocorr}, we can select the second half:

\begin{verbatim}
N = len(corrs2)
half = corrs2[N//2:]
\end{verbatim}

If you compare Figure~\ref{fig.autocorr9} to Figure~\ref{fig.autocorr8},
you'll notice that the correlations computed by {\tt np.correlate}
get smaller as the lags increase. That's because {\tt np.correlate}
uses the unstandardized definition of correlation;
as the lag gets bigger, the
overlap between the two signals gets smaller, so the magnitude of
the correlations decreases.

We can correct that by dividing through by the lengths:

\begin{verbatim}
lengths = range(N, N//2, -1)
half /= lengths
\end{verbatim}

Finally, we can standardize the results so the correlation with
{\tt lag=0} is 1.

\begin{verbatim}
half /= half[0]
\end{verbatim}

With these adjustments, the results computed by {\tt autocorr} and
{\tt np.correlate} are nearly the same. They still differ by
1-2\%. The reason is not important, but if you are curious: {\tt autocorr}
standardizes the correlations independently for each lag; for
{\tt np.correlate}, we standardized them all at the end.

More importantly, now you know what autocorrelation is, how to
use it to estimate the fundamental period of a signal, and two
ways to compute it.


\section{Exercises}

Solutions to these exercises are in {\tt chap05soln.ipynb}.

\begin{exercise}
The Jupyter notebook for this chapter, {\tt chap05.ipynb}, includes
an interaction that lets you compute autocorrelations for different
lags.
Use this interaction to estimate the pitch of the vocal chirp
for a few different start times.
\end{exercise}


\begin{exercise}
The example code in \verb"chap05.ipynb" shows how to use autocorrelation
to estimate the fundamental frequency of a periodic signal.
Encapsulate this code in a function called \verb"estimate_fundamental",
and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a
spectrogram of the recording.
\end{exercise}


\begin{exercise}
If you did the exercises in the previous chapter, you downloaded
the historical price of BitCoins and estimated the power spectrum
of the price changes. Using the same data, compute the autocorrelation
of BitCoin prices. Does the autocorrelation function drop off quickly?
Is there evidence of periodic behavior?
\end{exercise}


\begin{exercise}
In the repository for this book you will find a Jupyter notebook
called \verb"saxophone.ipynb" that explores autocorrelation,
pitch perception, and a phenomenon called the {\bf missing fundamental}.
Read through this notebook and run the examples. Try selecting
a different segment of the recording and running the examples again.

Vi Hart has an excellent video called ``What is up with Noises? (The
Science and Mathematics of Sound, Frequency, and Pitch)''; it
demonstrates the missing fundamental phenomenon and explains how pitch
perception works (at least, to the degree that we know).
Watch it at
\url{https://www.youtube.com/watch?v=i_0DXxNeaQ0}.
\end{exercise}


\chapter{Discrete cosine transform}
\label{dct}

The topic of this chapter is the {\bf Discrete Cosine
Transform} (DCT), which is used in MP3 and related formats for
compressing music; JPEG and similar formats for images; and the MPEG
family of formats for video.

DCT is similar in many ways to the Discrete Fourier Transform (DFT),
which we have been using for spectral analysis.
Once we learn how DCT works, it will be easier to explain DFT.

Here are the steps to get there:

\begin{enumerate}

\item We'll start with the synthesis problem: given a set of frequency
components and their amplitudes, how can we construct a wave?

\item Next we'll rewrite the synthesis problem using NumPy arrays.
This move is good for performance, and also provides insight
for the next step.

\item We'll look at the analysis problem: given a signal and a
set of frequencies, how can we find the amplitude of each frequency
component?  We'll start with a solution that is conceptually simple
but slow.

\item Finally, we'll use some principles from linear algebra to find a
more efficient algorithm. If you already know linear algebra,
that's great, but I'll explain what you need as we go.

\end{enumerate}

The code for this chapter is in {\tt
chap06.ipynb}, which is in the repository for this book (see
Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp06}.


\section{Synthesis}
\label{synth1}

Suppose I give you a list of amplitudes and a list of frequencies,
and ask you to construct a signal that is the sum of these frequency
components.
Using objects in the {\tt thinkdsp} module, there is
a simple way to perform this operation, which is called {\bf synthesis}:

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

{\tt amps} is a list of amplitudes, {\tt fs} is the list
of frequencies, and {\tt ts} is the sequence
of times where the signal should be evaluated.

{\tt components} is a list of {\tt CosSignal} objects, one for
each amplitude-frequency pair. {\tt SumSignal} represents the
sum of these frequency components.

Finally, {\tt evaluate} computes the value of the signal at each
time in {\tt ts}.

We can test this function like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)
\end{verbatim}

This example makes a signal that contains a fundamental frequency at
100 Hz and three harmonics (100 Hz is a sharp G2). It renders the
signal for one second at 11,025 frames per second and puts the results
into a Wave object.

Conceptually, synthesis is pretty simple.
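If you want to try synthesis without {\tt thinkdsp}, the same sum of cosines can be written with NumPy alone; this is a minimal sketch of my own, and \verb"synthesize_np" is my name, not the book's.

```python
import numpy as np

def synthesize_np(amps, fs, ts):
    # sum of cosine components: the same operation as synthesize1,
    # but with no dependency on thinkdsp
    ys = np.zeros(len(ts))
    for amp, freq in zip(amps, fs):
        ys += amp * np.cos(2 * np.pi * freq * ts)
    return ys

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = synthesize_np(amps, fs, ts)
```

At $t=0$ every cosine equals 1, so the first sample is the sum of the amplitudes, 1.0, and no sample can exceed that value.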
But in this form it doesn't
help much with {\bf analysis}, which is the inverse problem: given the
wave, how do we identify the frequency components and their amplitudes?


\section{Synthesis with arrays}
\label{synthesis}

\begin{figure}
\centerline{\includegraphics[height=2.5in]{figs/diagram1.pdf}}
\caption{Synthesis with arrays.}
\label{fig.synthesis}
\end{figure}

Here's another way to write {\tt synthesize}:

\begin{verbatim}
def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

This function looks very different, but it does the same thing.
Let's see how it works:

\begin{enumerate}

\item {\tt np.outer} computes the outer product of {\tt ts} and
{\tt fs}. The result is an array with one row for each element
of {\tt ts} and one column for each element of {\tt fs}. Each
element in the array is the product of a frequency and a time, $f t$.

\item We multiply {\tt args} by $2 \pi$ and apply {\tt cos}, so each
element of the result is $\cos (2 \pi f t)$. Since the {\tt ts} run
down the columns, each column contains a cosine signal at a
particular frequency, evaluated at a sequence of times.

\item {\tt np.dot} multiplies each row of {\tt M} by {\tt amps},
element-wise, and then adds up the products. In terms of linear
algebra, we are multiplying a matrix, {\tt M}, by a vector, {\tt amps}.
In terms of signals, we are computing the weighted sum
of frequency components.

\end{enumerate}

Figure~\ref{fig.synthesis} shows the structure of this computation.
Each row of the matrix, {\tt M}, corresponds to a time
from 0.0 to 1.0 seconds; $t_n$ is the time of the $n$th row.
Each column corresponds to a frequency from
100 to 400 Hz; $f_k$ is the frequency of the $k$th column.

I labeled the $n$th row with the letters $a$ through $d$; as an
example, the value of $a$ is $\cos [2 \pi (100) t_n]$.

The result of the dot product, {\tt ys}, is a vector with one element
for each row of {\tt M}. The $n$th element, labeled $e$, is the sum
of products:
%
\[ e = 0.6 a + 0.25 b + 0.1 c + 0.05 d \]
%
And likewise with the other elements of {\tt ys}. So each element
of {\tt ys} is the sum of four frequency components, evaluated at
a point in time, and multiplied by the corresponding amplitudes.
And that's exactly what we wanted.

We can use the code from the previous section to check that the two
versions of {\tt synthesize} produce the same results.

\begin{verbatim}
ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))
\end{verbatim}

The biggest difference between {\tt ys1} and {\tt ys2} is about {\tt
1e-13}, which is what we expect due to floating-point errors.

Writing this computation in terms of linear algebra makes the code
smaller and faster. Linear algebra
provides concise notation for operations on matrices and vectors.
For
example, we could write {\tt synthesize} like this:
%
\begin{eqnarray*}
M &=& \cos (2 \pi t \otimes f) \\
y &=& M a
\end{eqnarray*}
%
where $a$ is a vector of amplitudes,
$t$ is a vector of times, $f$ is a vector of frequencies, and
$\otimes$ is the symbol for the outer product of two vectors.


\section{Analysis}
\label{analysis}

Now we are ready to solve the analysis problem. Suppose I give you
a wave and tell you that it is the sum of cosines with a given set
of frequencies. How would you find the amplitude for each frequency
component? In other words, given {\tt ys}, {\tt ts} and {\tt fs},
can you recover {\tt amps}?

In terms of linear algebra, the first step is the same as for
synthesis: we compute $M = \cos (2 \pi t \otimes f)$. Then we want
to find $a$ so that $y = M a$; in other words, we want to solve a
linear system. NumPy provides {\tt linalg.solve}, which does
exactly that.

Here's what the code looks like:

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

The first two lines use {\tt ts} and {\tt fs} to build the
matrix, {\tt M}. Then {\tt np.linalg.solve} computes {\tt amps}.

But there's a hitch. In general we can only solve a system of linear
equations if the matrix is square; that is, the number of equations
(rows) is the same as the number of unknowns (columns).

In this example, we have only 4 frequencies, but we evaluated the
signal at 11,025 times.
So we have many more equations than unknowns.

In general if {\tt ys}
contains more than 4 elements, it is unlikely that we can analyze it
using only 4 frequencies.

But in this case, we know that the {\tt ys} were actually generated by
adding only 4 frequency components, so we can use any 4 values from
the wave array to recover {\tt amps}.

For simplicity, I'll use the first 4 samples from the signal.
Using the values of {\tt ys}, {\tt fs} and {\tt ts} from
the previous section, we can run {\tt analyze1} like this:

\begin{verbatim}
n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])
\end{verbatim}

And sure enough, {\tt amps2} is

\begin{verbatim}
[ 0.6   0.25  0.1   0.05 ]
\end{verbatim}

This algorithm works, but it is slow. Solving a linear
system of equations takes time proportional to $n^3$, where $n$ is
the number of columns in $M$. We can do better.


\section{Orthogonal matrices}

One way to solve linear systems is by inverting matrices. The
inverse of a matrix $M$ is written $M^{-1}$, and it has the property
that $M^{-1}M = I$. $I$ is the identity matrix, which has
the value 1 on all diagonal elements and 0 everywhere else.

So, to solve the equation $y = Ma$, we can multiply both sides by
$M^{-1}$, which yields:
%
\[ M^{-1}y = M^{-1} M a \]
%
On the right side, we can replace $M^{-1}M$ with $I$:
%
\[ M^{-1}y = I a \]
%
If we multiply $I$ by any vector $a$, the result is $a$, so
%
\[ M^{-1}y = a \]
%
This implies that if we can compute $M^{-1}$ efficiently, we can find
$a$ with a simple matrix multiplication (using {\tt np.dot}). That
takes time proportional to $n^2$, which is better than $n^3$.

Inverting a matrix is slow, in general, but some special cases are
faster. In particular, if $M$ is {\bf orthogonal}, the inverse of $M$
is just the transpose of $M$, written $M^T$.
In NumPy
transposing an array is a constant-time operation. It
doesn't actually move the elements of the array; instead, it creates a
``view'' that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse;
that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we
can check whether a matrix is orthogonal by computing $M^TM$.

So let's see what the matrix looks like in {\tt synthesize2}. In
the previous example, $M$ has 11,025 rows, so it might be a good idea
to work with a smaller example:

\begin{verbatim}
def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

{\tt amps} is the same vector of amplitudes we saw before.
Since we have 4 frequency components, we'll sample the signal
at 4 points in time. That way, $M$ is square.

{\tt ts} is a vector of equally spaced sample times in the range from
0 to 1 time unit. I chose the time unit to be 1 millisecond, but it
is an arbitrary choice, and we will see in a minute that it drops out
of the computation anyway.

Since the frame rate is $N$ samples per time unit, the Nyquist
frequency is \verb"N / time_unit / 2", which is 2000 Hz in this
example. So {\tt fs} is a vector of equally spaced frequencies
between 0 and 2000 Hz.

With these values of {\tt ts} and {\tt fs}, the matrix, $M$, is:

\begin{verbatim}
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]
\end{verbatim}

You might recognize 0.707 as an approximation of $\sqrt{2}/2$,
which is $\cos \pi/4$. You also might notice that this matrix
is {\bf symmetric}, which means that the element at $(j, k)$ always
equals the element at $(k, j)$.
This implies that $M$ is its own
transpose; that is, $M^T = M$.

But sadly, $M$ is not orthogonal. If we compute $M^TM$, we get:

\begin{verbatim}
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]
\end{verbatim}

And that's not the identity matrix.


\section{DCT-IV}
\label{dctiv}

But if we choose {\tt ts} and {\tt fs} carefully,
we can make $M$ orthogonal. There are several ways to do it, which
is why there are several versions of the discrete cosine transform (DCT).

One simple option is to shift {\tt ts} and {\tt fs} by a half unit.
This version is called DCT-IV, where ``IV'' is a roman numeral
indicating that this is the fourth of eight versions of the DCT.

Here's an updated version of {\tt test1}:

\begin{verbatim}
def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

If you compare this to the previous version, you'll notice
two changes. First, I added 0.5 to {\tt ts} and {\tt fs}.
Second, I canceled out \verb"time_unit", which simplifies
the expression for {\tt fs}.

With these values, $M$ is

\begin{verbatim}
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]
\end{verbatim}

And $M^TM$ is

\begin{verbatim}
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]
\end{verbatim}

Some of the off-diagonal elements are displayed as -0, which means
that the floating-point representation is a small negative number.
This matrix is very close to $2I$, which means $M$ is almost
orthogonal; it's just off by a factor of 2. And for our purposes,
that's good enough.

Because $M$ is symmetric and (almost) orthogonal, the inverse of $M$
is just $M/2$.
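You can confirm this numerically; the following check (mine, not from the book) builds $M$ with the shifted sample times and frequencies and verifies that $M^T = M$ and $M^T M = 2I$, so $M (M/2) = I$.

```python
import numpy as np

N = 4
ts = (0.5 + np.arange(N)) / N    # sample times, shifted by half
fs = (0.5 + np.arange(N)) / 2    # frequencies, shifted by half
M = np.cos(2 * np.pi * np.outer(ts, fs))

# M is symmetric ...
print(np.allclose(M, M.T))                    # True
# ... and M^T M is 2I, so M/2 is the inverse of M
print(np.allclose(M.T @ M, 2 * np.eye(N)))    # True
```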
Now we can write a more efficient version of {\tt
analyze}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Instead of using {\tt np.linalg.solve}, we just multiply
by $M/2$.

Combining {\tt test2} and {\tt analyze2}, we can write an
implementation of DCT-IV:

\begin{verbatim}
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Again, {\tt ys} is the wave array. We don't have to pass
{\tt ts} and {\tt fs} as parameters; \verb"dct_iv" can
figure them out based on {\tt N}, the length of {\tt ys}.

If we've got it right, this function should solve the analysis
problem; that is, given {\tt ys} it should be able to recover
{\tt amps}. We can test it like this.

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Starting with {\tt amps}, we synthesize a wave array, then use
\verb"dct_iv" to compute {\tt amps2}. The biggest
difference between {\tt amps} and {\tt amps2} is about {\tt 1e-16},
which is what we expect due to floating-point errors.


\section{Inverse DCT}

Finally, notice that {\tt analyze2} and {\tt synthesize2} are almost
identical. The only difference is that {\tt analyze2} divides the
result by 2. We can use this insight to compute the inverse DCT:

\begin{verbatim}
def inverse_dct_iv(amps):
    return dct_iv(amps) * 2
\end{verbatim}

\verb"inverse_dct_iv" solves the synthesis problem: it
takes the vector of amplitudes and returns
the wave array, {\tt ys}.
We can test it by starting
with {\tt amps}, applying \verb"inverse_dct_iv" and \verb"dct_iv",
and testing that we get back what we started with.

\begin{verbatim}
amps = [0.6, 0.25, 0.1, 0.05]
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Again, the biggest difference is about {\tt 1e-16}.


\section{The Dct class}

\begin{figure}
% dct.py
\centerline{\includegraphics[height=2.5in]{figs/dct1.pdf}}
\caption{DCT of a triangle signal at 400 Hz, sampled at 10 kHz.}
\label{fig.dct1}
\end{figure}

{\tt thinkdsp} provides a Dct class that encapsulates the
DCT in the same way the Spectrum class encapsulates the FFT.
To make a Dct object, you can invoke \verb"make_dct" on a Wave.

\begin{verbatim}
signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()
\end{verbatim}

The result is the DCT of a triangle wave at 400 Hz, shown in
Figure~\ref{fig.dct1}. The values of the DCT can be positive or negative;
a negative value in the DCT corresponds to a negated cosine or,
equivalently, to a cosine shifted by 180 degrees.

\verb"make_dct" uses DCT-II, which is the most common type of DCT,
provided by {\tt scipy.fftpack}.

\begin{verbatim}
import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)
\end{verbatim}

The results from {\tt dct} are stored in {\tt hs}.
The corresponding
frequencies, computed as in Section~\ref{dctiv}, are stored in {\tt fs}.
And then both are used to initialize the Dct object.

Dct provides \verb"make_wave", which performs the inverse DCT.
We can test it like this:

\begin{verbatim}
wave2 = dct.make_wave()
max(abs(wave.ys - wave2.ys))
\end{verbatim}

The biggest difference between {\tt wave.ys} and {\tt wave2.ys} is
about {\tt 1e-16}, which is what we expect due to floating-point errors.

\verb"make_wave" uses {\tt scipy.fftpack.idct}:

\begin{verbatim}
# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)
\end{verbatim}

By default, the inverse DCT doesn't normalize the result, so we have
to divide through by $2N$.


\section{Exercises}

For the following exercises, I provide some starter code in
{\tt chap06starter.ipynb}.
Solutions are in {\tt chap06soln.ipynb}.

\begin{exercise}
In this chapter I claim that {\tt analyze1} takes time proportional
to $n^3$ and {\tt analyze2} takes time proportional to $n^2$. To
see if that's true, run them on a range of input sizes and time
them. In Jupyter, you can use the ``magic command'' \verb"%timeit".

If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for {\tt analyze1} and
slope 2 for {\tt analyze2}.

You also might want to test \verb"dct_iv"
and {\tt scipy.fftpack.dct}.

\end{exercise}


\begin{exercise}
One of the major applications of the DCT is compression for both
sound and images. In its simplest form, DCT-based compression
works like this:

\begin{enumerate}

\item Break a long signal into segments.

\item Compute the DCT of each segment.

\item Identify frequency components with amplitudes so low they are
inaudible, and remove them.
Store only the frequencies and
amplitudes that remain.

\item To play back the signal, load the frequencies and amplitudes
for each segment and apply the inverse DCT.

\end{enumerate}

Implement a version of this algorithm and apply it to a recording
of music or speech. How many components can you eliminate before
the difference is perceptible?

In order to make this method practical, you need some way to store a
sparse array; that is, an array where most of the elements are zero.
SciPy provides several implementations of sparse arrays, which you can
read about at
\url{http://docs.scipy.org/doc/scipy/reference/sparse.html}.
\end{exercise}


\begin{exercise}
In the repository for this book you will find a Jupyter notebook
called \verb"phase.ipynb" that explores the effect of phase on sound
perception.
Read through this notebook and run the examples.
Choose another segment of sound and run the same experiments.
Can you find any general relationships between the phase structure
of a sound and how we perceive it?
\end{exercise}



\chapter{Discrete Fourier Transform}
\label{dft}

We've been using the discrete Fourier transform (DFT) since
Chapter~\ref{sounds}, but I haven't explained how it works. Now is
the time.

If you understand the discrete cosine transform (DCT), you will
understand the DFT. The only difference is that instead of using the
cosine function, we'll use the complex exponential function. I'll
start by explaining complex exponentials, then I'll follow the
same progression as in Chapter~\ref{dct}:

\begin{enumerate}

\item We'll start with the synthesis
problem: given a set of frequency components and their amplitudes,
how can we construct a signal?
The synthesis problem is
equivalent to the inverse DFT.

\item Then I'll rewrite the synthesis problem in the form of matrix
multiplication using NumPy arrays.

\item Next we'll solve the analysis problem, which is equivalent to
the DFT: given a signal, how do we find the amplitude and phase
offset of its frequency components?

\item Finally, we'll use linear algebra to find a more efficient way
to compute the DFT.

\end{enumerate}

The code for this chapter is in {\tt chap07.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp07}.


\section{Complex exponentials}

One of the more interesting moves in mathematics is the generalization
of an operation from one type to another. For example, factorial is a
function that operates on integers; the natural definition for
factorial of $n$ is the product of all integers from 1 to $n$.

If you are of a certain inclination, you might wonder how to compute
the factorial of a non-integer like 3.5. Since the natural definition
doesn't apply, you might look for other ways to compute the factorial
function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial
function that we know as the gamma function (see
\url{http://en.wikipedia.org/wiki/Gamma_function}).

Euler also found one of the most useful generalizations in applied
mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication.
For example, $\phi^3 = \phi \cdot \phi \cdot \phi$. But this
definition doesn't apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:
%
\[ e^\phi = 1 + \phi + \phi^2/2! + \phi^3/3! + ... \]
%
This definition works with real numbers, imaginary numbers and, by a simple
extension, with complex numbers. Applying this definition
to a pure imaginary number, $i\phi$, we get
%
\[ e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + ... \]
%
By rearranging terms, we can show that this is equivalent to:
%
\[ e^{i\phi} = \cos \phi + i \sin \phi \]
%
You can see the derivation at
\url{http://en.wikipedia.org/wiki/Euler's_formula}.

This formula
implies that $e^{i\phi}$ is a complex number with magnitude 1; if you
think of it as a point in the complex plane, it is always on the unit
circle. And if you think of it as a vector, the angle in radians
between the vector and the positive x-axis is the argument, $\phi$.

In the case where the exponent is a complex number, we have:
%
\[ e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi} \]
%
where $A$ is a real number that indicates amplitude and
$e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of {\tt exp} that works with complex numbers:

\begin{verbatim}
>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)
\end{verbatim}

Python uses {\tt j} to represent the imaginary unit, rather
than {\tt i}.
A number ending in {\tt j} is considered imaginary,
so {\tt 1j} is just $i$.

When the argument to {\tt np.exp} is imaginary or complex, the
result is a complex number; specifically, a {\tt np.complex128},
which is represented by two 64-bit floating-point numbers.
In this example, the result is {\tt 0.0707+0.997j}.

Complex numbers have attributes {\tt real} and {\tt imag}:

\begin{verbatim}
>>> z.real
0.0707
>>> z.imag
0.997
\end{verbatim}

To get the magnitude, you can use the built-in function {\tt abs}
or {\tt np.absolute}:

\begin{verbatim}
>>> abs(z)
1.0
>>> np.absolute(z)
1.0
\end{verbatim}

To get the angle, you can use {\tt np.angle}:

\begin{verbatim}
>>> np.angle(z)
1.5
\end{verbatim}

This example confirms that $e^{i \phi}$ is a complex number with
magnitude 1 and angle $\phi$ radians.


\section{Complex signals}

If $\phi(t)$ is a function of time, $e^{i \phi(t)}$ is also a function
of time. Specifically,
%
\[ e^{i \phi(t)} = \cos \phi(t) + i \sin \phi(t) \]
%
This function describes a quantity that varies in time, so it is
a signal. Specifically, it is a {\bf complex exponential signal}.

In the special case where the frequency of the signal is constant,
$\phi(t)$ is $2 \pi f t$ and the result is a {\bf complex sinusoid}:
%
\[ e^{i 2 \pi f t} = \cos 2 \pi f t + i \sin 2 \pi f t \]
%
Or more generally, the signal might start at a phase offset
$\phi_0$, yielding
%
\[ e^{i (2 \pi f t + \phi_0)} \]
%
{\tt thinkdsp} provides an implementation of this signal,
{\tt ComplexSinusoid}:

\begin{verbatim}
class ComplexSinusoid(Sinusoid):

    def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys
\end{verbatim}

{\tt ComplexSinusoid} inherits \verb"__init__" from
{\tt Sinusoid}.
It provides a version of {\tt evaluate}
that is almost identical to {\tt Sinusoid.evaluate}; the
only difference is that it uses {\tt np.exp} instead of
{\tt np.sin}.

The result is a NumPy array of complex numbers:

\begin{verbatim}
>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> wave.ys
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]
\end{verbatim}

The frequency of this signal is 1 cycle per second; the amplitude
is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at 4 places equally spaced between
0 and 1 second. The resulting samples are complex numbers.


\section{The synthesis problem}

Just as we did with real sinusoids, we can create compound signals
by adding up complex sinusoids with different frequencies. And that
brings us to the complex version of the synthesis problem: given the
frequency and amplitude of each complex component, how do we evaluate the
signal?

The simplest solution is to create {\tt ComplexSinusoid} objects and
add them up.

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

This function is almost identical to {\tt synthesize1} in
Section~\ref{synth1}; the only difference is that I replaced
{\tt CosSignal} with {\tt ComplexSinusoid}.

Here's an example:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
\end{verbatim}

The result is:

\begin{verbatim}
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

At the lowest level, a complex signal is a sequence of complex
numbers. But how should we interpret it? We have some intuition for
real signals: they represent quantities that vary in time; for
example, a sound signal represents changes in air pressure.
But nothing we measure in the world yields complex numbers.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft1.pdf}}
\caption{Real and imaginary parts of a mixture of complex sinusoids.}
\label{fig.dft1}
\end{figure}

So what is a complex signal? I don't have a satisfying answer to this
question. The best I can offer is two unsatisfying
answers:

\begin{enumerate}

\item A complex signal is a mathematical abstraction that is useful
for computation and analysis, but it does not correspond directly
with anything in the real world.

\item If you like, you can think of a complex signal as a sequence of
complex numbers that contains two signals as its real and imaginary
parts.

\end{enumerate}

Taking the second point of view, we can split the previous
signal into its real and imaginary parts:

\begin{verbatim}
n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')
\end{verbatim}

Figure~\ref{fig.dft1} shows a segment of the result. The
real part is a sum of cosines; the imaginary part is
a sum of sines.
Although the waveforms look different, they
contain the same frequency components in the same proportions.
To our ears, they sound the same (in general, we don't hear
phase offsets).


\section{Synthesis with matrices}
\label{synthmat}

As we saw in Section~\ref{synthesis}, we can also express the synthesis
problem in terms of matrix multiplication:

\begin{verbatim}
PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

Again, {\tt amps} is a NumPy array that contains a sequence of
amplitudes.

{\tt fs} is a sequence containing the frequencies of the
components. {\tt ts} contains the times where we will evaluate
the signal.

{\tt args} contains the outer product of {\tt ts} and {\tt fs},
with the {\tt ts} running down the rows and the {\tt fs} running
across the columns (you might want to refer back to
Figure~\ref{fig.synthesis}).

Each column of matrix {\tt M} contains a complex sinusoid with
a particular frequency, evaluated at a sequence of {\tt ts}.

When we multiply {\tt M} by the amplitudes, the result is a vector
whose elements correspond to the {\tt ts}; each element is the sum of
several complex sinusoids, evaluated at a particular time.

Here's the example from the previous section again:

\begin{verbatim}
>>> ys = synthesize2(amps, fs, ts)
>>> ys
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

The result is the same.

In this example the amplitudes are real, but they could also be
complex.
What effect does a complex amplitude have on the result?
Remember that we can think of a complex number in two ways: either the
sum of a real and imaginary part, $x + i y$, or the product of a real
amplitude and a complex exponential, $A e^{i \phi_0}$. Using the
second interpretation, we can see what happens when we multiply
a complex amplitude by a complex sinusoid. For each frequency, $f$,
we have:
%
\[ A \exp{i \phi_0} \cdot \exp{i 2 \pi f t} = A \exp{i (2 \pi f t + \phi_0)} \]
%
Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$
and adds the phase offset $\phi_0$.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft2.pdf}}
\caption{Real part of two complex signals that differ by a phase
offset.}
\label{fig.dft2}
\end{figure}

We can test that claim by running the previous example with
$\phi_0 = 1.5$ for all frequency components:

\begin{verbatim}
phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])
\end{verbatim}

Since {\tt amps}
is an array of reals, multiplying by {\tt np.exp(1j * phi)} yields
an array of complex numbers with phase offset {\tt phi} radians, and
the same magnitudes as {\tt amps}.

Figure~\ref{fig.dft2} shows the result.
The phase offset
$\phi_0 = 1.5$ shifts the wave to the left by about one quarter of
a cycle; it also changes the waveform, because the same phase
offset applied to different frequencies changes how the frequency
components line up with each other.

Now that we have the more general solution to the synthesis problem --
one that handles complex amplitudes -- we are ready for the analysis
problem.


\section{The analysis problem}

The analysis problem is the inverse of the synthesis problem: given a
sequence of samples, $y$, and knowing the frequencies
that make up the signal, can we compute the complex amplitudes of the
components, $a$?

As we saw in Section~\ref{analysis}, we can solve this problem by forming
the synthesis matrix, $M$, and solving the system of linear
equations, $M a = y$, for $a$.

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

{\tt analyze1} takes a (possibly complex) wave array, {\tt ys}, a
sequence of real frequencies, {\tt fs}, and a sequence of real
times, {\tt ts}. It returns a sequence of complex amplitudes, {\tt
amps}.

Continuing the previous example, we can confirm that {\tt analyze1}
recovers the amplitudes we started with. For the linear system
solver to work, {\tt M} has to be square, so we need {\tt ys}, {\tt
fs} and {\tt ts} to have the same length.
I'll ensure that by
slicing {\tt ys} and {\tt ts} down to the length of {\tt fs}:

\begin{verbatim}
>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> amps2
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]
\end{verbatim}

These are approximately the amplitudes we started with, although
each component has a small imaginary part due to
floating-point errors.


\section{Efficient analysis}

Unfortunately, solving a linear system of equations is slow. For the
DCT, we were able to speed things up by choosing {\tt fs} and {\tt
ts} so that {\tt M} is orthogonal. That way, the inverse of {\tt M}
is the transpose of {\tt M}, and we can compute both DCT and inverse
DCT by matrix multiplication.

We'll do the same thing for the DFT, with one small change.
Since {\tt M} is complex, we need it to be {\bf unitary}, rather
than orthogonal, which means that the inverse of {\tt M} is
the conjugate transpose of {\tt M}, which we can compute by
transposing the matrix and negating the imaginary part of each
element. See \url{http://en.wikipedia.org/wiki/Unitary_matrix}.

The NumPy methods {\tt conj} and {\tt transpose} do what we
want. Here's the code that computes {\tt M} for $N=4$ components:

\begin{verbatim}
N = 4
ts = np.arange(N) / N
fs = np.arange(N)
args = np.outer(ts, fs)
M = np.exp(1j * PI2 * args)
\end{verbatim}

If $M$ is unitary, $M^*M = I$, where $M^*$ is the conjugate transpose
of $M$, and $I$ is the identity matrix.
We can test whether $M$
is unitary like this:

\begin{verbatim}
MstarM = M.conj().transpose().dot(M)
\end{verbatim}

The result, within the tolerance of floating-point error, is
$4 I$, so $M$ is unitary except for an extra factor of $N$,
similar to the extra factor of 2 we found with the DCT.

We can use this result to write a faster version of {\tt analyze1}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

And test it with appropriate values of {\tt fs} and {\tt ts}:

\begin{verbatim}
N = 4
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.arange(N)
ts = np.arange(N) / N
ys = synthesize2(amps, fs, ts)
amps3 = analyze2(ys, fs, ts)
\end{verbatim}

Again, the result is correct within the tolerance of floating-point
arithmetic.

\begin{verbatim}
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}


\section{DFT}
\label{dftsection}

As a function, {\tt analyze2} would be hard to use because it
only works if {\tt fs} and {\tt ts} are chosen correctly.
Instead, I will rewrite it to take just {\tt ys} and compute {\tt fs}
and {\tt ts} itself.

First, I'll make a function to compute the synthesis matrix, $M$:

\begin{verbatim}
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M
\end{verbatim}

Then I'll write the function that takes {\tt ys} and returns
{\tt amps}:

\begin{verbatim}
def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

We are almost done; {\tt analyze3} computes something very
close to the DFT, with one difference.
The conventional definition
of the DFT does not divide by {\tt N}:

\begin{verbatim}
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps
\end{verbatim}

Now we can confirm that my version yields the same result as
FFT:

\begin{verbatim}
>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

The result is close to {\tt amps * N}.
And here's the version in {\tt np.fft}:

\begin{verbatim}
>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

They are the same, within floating-point error.

The inverse DFT is almost the same, except we don't have to transpose
and conjugate $M$, and {\em now} we have to divide through by $N$:

\begin{verbatim}
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
\end{verbatim}

Finally, we can confirm that {\tt dft(idft(amps))} yields {\tt amps}.

\begin{verbatim}
>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

If I could go back in time, I might change the definition of
the DFT so it divides by $N$ and the inverse DFT doesn't. That would
be more consistent with my presentation of the synthesis and analysis
problems.

Or I might change the definition so that both operations divide
through by $\sqrt{N}$. Then the DFT and inverse DFT would be
more symmetric.

But I can't go back in time (yet!), so we're stuck with a slightly
weird convention.
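Whichever convention we adopt, the forward and inverse transforms
cancel each other. As a standalone sketch (independent of the
functions above), we can confirm that NumPy uses the same convention:
{\tt np.fft.fft} does not divide by $N$, and {\tt np.fft.ifft} does.

\begin{verbatim}
import numpy as np

amps = np.array([0.6, 0.25, 0.1, 0.05])

# fft does not divide by N; ifft does, so the
# round trip recovers the input (up to float error)
ys = np.fft.ifft(amps)
assert np.allclose(np.fft.fft(ys), amps)

# scaling check: the DC component of fft of a
# constant signal of length N is N, not 1
N = 4
assert np.allclose(np.fft.fft(np.ones(N))[0], N)
\end{verbatim}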
For practical purposes it doesn't really
matter.


\section{The DFT is periodic}

In this chapter I presented the DFT in the form of matrix multiplication.
We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$.
When we multiply $M^{*}$ by the wave array, $y$, each element of the
result is the product of a row from $M^*$ and $y$, which we can
write in the form of a summation:
%
\[ DFT(y)[k] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
$0$ to $N-1$ and $n$ is an index of time from $0$ to $N-1$.
So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from
0 to $N-1$. We {\em could} evaluate it for other values of $k$, but
there is no point, because they start to repeat. That is, the value at
$k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n (k+N) / N) \]
%
Since there is a sum in the exponent, we can break it into two factors:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N) \]
%
In the second factor, the exponent is always an integer multiple of
$2 \pi$, so the result is always 1, and we can drop it:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
And we can see that this summation is equivalent to $DFT(y)[k]$.
So the DFT is periodic, with period $N$. You will need this result
for one of the exercises below, which asks you to implement the Fast Fourier
Transform (FFT).

As an aside, writing the DFT in the form of a summation provides an
insight into how it works. If you review the diagram in
Section~\ref{synthesis}, you'll see that each column of the synthesis matrix
is a signal evaluated at a sequence of times.
The analysis matrix is
the (conjugate) transpose of the synthesis matrix, so each {\em row}
is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of $y$ with one of the
signals in the array (see Section~\ref{dotproduct}). That is, each
element of the DFT is a correlation that quantifies the similarity of
the wave array, $y$, and a complex exponential at a particular
frequency.


\section{DFT of real signals}

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft3.pdf}}
\caption{DFT of a 500 Hz sawtooth signal sampled at 10 kHz.}
\label{fig.dft3}
\end{figure}

The Spectrum class in {\tt thinkdsp} is based on {\tt np.fft.rfft},
which computes the ``real DFT''; that is, it works with real signals.
But the DFT as presented in this chapter is more general than that; it
works with complex signals.

So what happens when we apply the ``full DFT'' to a real signal?
Let's look at an example:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=500)
wave = signal.make_wave(duration=0.1, framerate=10000)
hs = dft(wave.ys)
amps = np.absolute(hs)
\end{verbatim}

This code makes a sawtooth wave with frequency 500 Hz, sampled at
framerate 10 kHz. {\tt hs} contains the complex DFT of the wave;
{\tt amps} contains the amplitude at each frequency. But what
frequencies do these amplitudes correspond to? If we look at the
body of {\tt dft}, we see:

\begin{verbatim}
fs = np.arange(N)
\end{verbatim}

It's tempting to think that these values are the right frequencies.
The problem is that {\tt dft} doesn't know the sampling rate. The DFT
assumes that the duration of the wave is 1 time unit, so it thinks the
sampling rate is $N$ per time unit.
In order to interpret the
frequencies, we have to convert from these arbitrary time units back
to seconds, like this:

\begin{verbatim}
fs = np.arange(N) * framerate / N
\end{verbatim}

With this change, the range of frequencies is from 0 to the actual
framerate, 10 kHz. Now we can plot the spectrum:

\begin{verbatim}
thinkplot.plot(fs, amps)
thinkplot.config(xlabel='frequency (Hz)',
                 ylabel='amplitude')
\end{verbatim}

Figure~\ref{fig.dft3} shows the amplitude of the signal for each
frequency component from 0 to 10 kHz. The left half of the figure
is what we should expect: the dominant frequency is at 500 Hz, with
harmonics dropping off like $1/f$.

But the right half of the figure is a surprise. Past 5000 Hz, the
amplitude of the harmonics starts growing again, peaking at 9500 Hz.
What's going on?

The answer: aliasing. Remember that with framerate 10000 Hz, the
folding frequency is 5000 Hz. As we saw in Section~\ref{aliasing},
a component at 5500 Hz is indistinguishable from a component
at 4500 Hz. When we evaluate the DFT at 5500 Hz, we get the same
value as at 4500 Hz. Similarly, the value at 6000 Hz is the same
as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency.
Since there is no additional information past this point, we can
save time by evaluating only the first half of the DFT,
and that's exactly what {\tt np.fft.rfft} does.


\section{Exercises}

Solutions to these exercises are in {\tt chap07soln.ipynb}.


\begin{exercise}
The notebook for this chapter, {\tt chap07.ipynb}, contains
additional examples and explanations. Read through it and run
the code.
\end{exercise}


\begin{exercise}
In this chapter, I showed how we can express the DFT and inverse DFT
as matrix multiplications.
These operations take time proportional to
$N^2$, where $N$ is the length of the wave array. That is fast enough
for many applications, but there is a faster
algorithm, the Fast Fourier Transform (FFT), which takes time
proportional to $N \log N$.

The key to the FFT is the Danielson-Lanczos lemma:
%
\[ DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) DFT(o)[n] \]
%
where $DFT(y)[n]$ is the $n$th element of the DFT of $y$; $e$ is a
wave array containing the even elements of $y$, and $o$ contains the
odd elements of $y$.

This lemma suggests a recursive algorithm for the DFT:

\begin{enumerate}

\item Given a wave array, $y$, split it into its even elements, $e$,
and its odd elements, $o$.

\item Compute the DFT of $e$ and $o$ by making recursive calls.

\item Compute $DFT(y)$ for each value of $n$ using the
Danielson-Lanczos lemma.

\end{enumerate}

For the base case of this recursion, you could wait until the length
of $y$ is 1. In that case, $DFT(y) = y$. Or if the length of $y$
is sufficiently small, you could compute its DFT by matrix multiplication,
possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting
with a version that is not truly recursive. In Step 2, instead of
making a recursive call, use {\tt dft}, as defined in
Section~\ref{dftsection}, or {\tt np.fft.fft}. Get Step 3 working,
and confirm that the results are consistent with the other
implementations.
Then add a base case and confirm that it works.
Finally, replace Step 2 with recursive calls.

One more hint: Remember that the DFT is periodic; you might find {\tt
np.tile} useful.

You can read more about the FFT at
\url{https://en.wikipedia.org/wiki/Fast_Fourier_transform}.

\end{exercise}


\chapter{Filtering and Convolution}

In this chapter I present one of the most important and useful
ideas related to signal processing: the Convolution Theorem.
But before we can understand the Convolution Theorem, we have to understand
convolution. I'll start with a simple example, smoothing, and
we'll go from there.

The code for this chapter is in {\tt chap08.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp08}.


\section{Smoothing}
\label{smoothing}

\begin{figure}
% convolution1.py
\centerline{\includegraphics[height=2.5in]{figs/convolution1.pdf}}
\caption{Daily closing price of Facebook stock and a 30-day moving
average.}
\label{fig.convolution1}
\end{figure}

Smoothing is an operation that tries to remove short-term variations
from a signal in order to reveal long-term trends. For example, if
you plot daily changes in the price of a stock, it would look noisy;
a smoothing operator might make it easier to see whether the price
was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes
the mean of the previous $n$ values, for some value of $n$.

For example, Figure~\ref{fig.convolution1} shows the daily
closing price of Facebook from May 17, 2012 to December 8,
2015. The gray line
is the raw data; the darker line shows the 30-day moving average.
Smoothing removes the most
extreme changes and makes it easier to see long-term trends.

Smoothing operations also apply to sound signals.
As an example, I'll
start with a square wave at 440 Hz. As we saw in
Section~\ref{square}, the harmonics of a square wave drop off
slowly, so it contains many high-frequency components.

\begin{verbatim}
signal = thinkdsp.SquareSignal(freq=440)
wave = signal.make_wave(duration=1, framerate=44100)
segment = wave.segment(duration=0.01)
\end{verbatim}

{\tt wave} is a 1-second segment of the signal; {\tt segment}
is a shorter segment I'll use for plotting.

To compute the moving average of this signal, I'll create
a window with 11 elements and normalize it so the elements
add up to 1.

\begin{verbatim}
window = np.ones(11)
window /= sum(window)
\end{verbatim}

Now I can compute the average of the first 11 elements by
multiplying the window by the wave array:

\begin{verbatim}
ys = segment.ys
N = len(ys)
padded = thinkdsp.zero_pad(window, N)
prod = padded * ys
sum(prod)
\end{verbatim}

{\tt padded} is a version of the window with zeros added to
the end so it has the same length as {\tt segment.ys}.
{\tt prod} is the product of the window and the wave array.

The sum of the elementwise products is the average of the first 11
elements of the array.
Since these elements are all -1, their average
is -1.

\begin{figure}
% convolution2.py
\centerline{\includegraphics[height=2.5in]{figs/convolution2.pdf}}
\caption{A square signal at 440 Hz (gray) and an 11-element
moving average.}
\label{fig.convolution2}
\end{figure}

To compute the next element of the smoothed signal, we shift the
window to the right and compute the average of the next 11 elements of
the wave array, starting with the second.

\begin{verbatim}
rolled = np.roll(padded, 1)
prod = rolled * ys
sum(prod)
\end{verbatim}

The result is -1 again.
To compute the moving average, we repeat this process and store
the result in an array named {\tt smoothed}:

\begin{verbatim}
smoothed = np.zeros_like(ys)
rolled = padded
for i in range(N):
    smoothed[i] = sum(rolled * ys)
    rolled = np.roll(rolled, 1)
\end{verbatim}

{\tt rolled} is a copy of {\tt padded} that gets shifted to
the right by one element each time through the loop. Inside
the loop, we multiply the segment by {\tt rolled} to select
11 elements, and then add them up.

Figure~\ref{fig.convolution2} shows the result. The gray line
is the original signal; the darker line is the smoothed signal.
The smoothed signal starts to ramp up when the leading edge of
the window reaches the first transition, and levels off when
the window crosses the transition. As a result, the transitions
are less abrupt, and the corners less sharp.
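As a self-contained check (a sketch assuming only NumPy, with a synthetic square wave standing in for the {\tt thinkdsp} objects), the rolling-window loop agrees with NumPy's convolution wherever the window does not wrap around:

```python
import numpy as np

# synthetic stand-in for segment.ys: one cycle of a square wave
ys = np.where(np.arange(100) < 50, 1.0, -1.0)
N = len(ys)

# 11-element moving-average window, zero-padded to length N
window = np.ones(11) / 11
padded = np.concatenate([window, np.zeros(N - len(window))])

# shift the padded window one step per iteration, as in the loop above
smoothed = np.zeros_like(ys)
rolled = padded
for i in range(N):
    smoothed[i] = np.sum(rolled * ys)
    rolled = np.roll(rolled, 1)

# where the window fits entirely, this matches np.convolve in 'valid' mode
valid = np.convolve(ys, window, mode='valid')
```

The last few elements of {\tt smoothed} differ because {\tt np.roll} wraps the window around to the beginning of the array.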
If you listen
to the smoothed signal, it sounds less buzzy and slightly muffled.


\section{Convolution}
\label{convolution}

The operation we just computed is called {\bf convolution},
and it is such a common operation that NumPy provides an
implementation that is simpler and faster than my version:

\begin{verbatim}
convolved = np.convolve(ys, window, mode='valid')
smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)
\end{verbatim}

{\tt np.convolve} computes the convolution of the wave
array and the window. The mode flag {\tt valid} indicates
that it should only compute values where the window and the
wave array overlap completely, so it stops when the right
edge of the window reaches the end of the wave array. Other
than that, the result is the same as in Figure~\ref{fig.convolution2}.

\newcommand{\conv}{\ast}

Actually, there is one other difference. The loop in the
previous section actually computes {\bf cross-correlation}:
%
\[ (f \star g)[n] = \sum_{m=0}^{N-1} f[m] g[n+m] \]
%
where $f$ is a wave array with length $N$, $g$ is the window,
and $\star$ is the symbol for cross-correlation. To
compute the $n$th element of the result, we shift $g$ to
the right, which is why the index is $n+m$.

The definition of convolution is slightly different:
%
\[ (f \conv g)[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]
%
The symbol $\conv$ represents convolution. The difference is in the
index of $g$: $m$ has been negated, so the summation iterates the
elements of $g$ backward (assuming that negative indices wrap around
to the end of the array).

Because the window we used in this example is symmetric,
cross-correlation and convolution yield the same result. When we use
other windows, we will have to be more careful.

You might wonder why convolution is defined the way it is.
There
are two reasons:

\begin{itemize}

\item This definition comes up naturally for several applications,
especially analysis of signal-processing systems, which is
the topic of Chapter~\ref{systems}.

\item Also, this definition is the basis of the Convolution Theorem,
coming up very soon.

\end{itemize}


\section{The frequency domain}

\begin{figure}
% convolution4.py
\centerline{\includegraphics[height=2.5in]{figs/convolution4.pdf}}
\caption{Spectrum of the square wave before and after smoothing.}
\label{fig.convolution4}
\end{figure}

Smoothing makes the transitions in a square signal less abrupt,
and makes the sound slightly muffled. Let's see what effect this
operation has on the spectrum. First I'll plot the spectrum
of the original wave:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot(color=GRAY)
\end{verbatim}

Then the smoothed wave:

\begin{verbatim}
convolved = np.convolve(wave.ys, window, mode='same')
smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()
spectrum2.plot()
\end{verbatim}

The mode flag {\tt same} indicates that the result should have the
same length as the input. In this example, it will include a few values
that ``wrap around'', but that's ok for now.

Figure~\ref{fig.convolution4} shows the result. The fundamental
frequency is almost unchanged; the first few harmonics are
attenuated, and the higher harmonics are almost eliminated.
So
smoothing has the effect of a low-pass filter, which we
saw in Section~\ref{spectrums} and Section~\ref{pink}.

To see how much each component has been attenuated, we can
compute the ratio of the two spectrums:

\begin{verbatim}
amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps
ratio[amps<560] = 0
thinkplot.plot(ratio)
\end{verbatim}

{\tt ratio} is the ratio of the amplitudes before and after
smoothing. When {\tt amps} is small, this ratio can be big
and noisy, so for simplicity I set the ratio to 0 except
where the harmonics are.

\begin{figure}
% convolution5.py
\centerline{\includegraphics[height=2.5in]{figs/convolution5.pdf}}
\caption{Ratio of spectrums for the square wave, before and after smoothing.}
\label{fig.convolution5}
\end{figure}

Figure~\ref{fig.convolution5} shows the result. As expected, the
ratio is high for low frequencies and drops off at a cutoff frequency
near 4000 Hz. But there is another feature we did not expect: above
the cutoff, the ratio bounces around between 0 and 0.2.
What's up with that?


\section{The convolution theorem}
\label{convtheorem}

\begin{figure}
% convolution6.py
\centerline{\includegraphics[height=2.5in]{figs/convolution6.pdf}}
\caption{Ratio of spectrums for the square wave, before and after
smoothing, along with the DFT of the smoothing window.}
\label{fig.convolution6}
\end{figure}

\newcommand{\DFT}{\mathrm{DFT}}
\newcommand{\IDFT}{\mathrm{IDFT}}

The answer is the Convolution Theorem. Stated mathematically:
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
where $f$ is a wave array and $g$ is a window. In words,
the Convolution Theorem says that if we convolve $f$ and $g$,
and then compute the DFT, we get the same answer as computing
the DFT of $f$ and $g$, and then multiplying the results
element-wise.
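We can spot-check the theorem numerically (a sketch assuming NumPy; note that the DFT version of the theorem holds for \emph{circular} convolution, where indices wrap around):

```python
import numpy as np

rng = np.random.default_rng(17)
N = 16
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# circular convolution, computed directly from the definition
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                 for n in range(N)])

# DFT of the convolution vs. element-wise product of the DFTs
lhs = np.fft.fft(conv)
rhs = np.fft.fft(f) * np.fft.fft(g)
```

The two sides agree to floating-point precision.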
More concisely, convolution in the time
domain corresponds to multiplication in the frequency domain.

And that explains Figure~\ref{fig.convolution5}, because when we
convolve a wave and a window, we multiply the spectrum of
the wave by the spectrum of the window. To see how that works,
we can compute the DFT of the window:

\begin{verbatim}
padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))
\end{verbatim}

{\tt padded} contains the window, padded with zeros to be the
same length as {\tt wave}. \verb"dft_window" contains the
DFT of the smoothing window.

\newcommand{\abs}{\mathrm{abs}}

Figure~\ref{fig.convolution6} shows the result, along with the
ratios we computed in the previous section. The ratios are
exactly the amplitudes in \verb"dft_window". Mathematically:
%
\[ \abs(\DFT(f \conv g)) / \abs(\DFT(f)) = \abs(\DFT(g)) \]
%
In this context, the DFT of a window is called a {\bf filter}.
For any convolution window in the time domain, there is a
corresponding filter in the frequency domain. And for any
filter that can be expressed by element-wise multiplication in
the frequency domain, there is a corresponding window.


\section{Gaussian filter}

\begin{figure}
% convolution7.py
\centerline{\includegraphics[height=2.5in]{figs/convolution7.pdf}}
\caption{Boxcar and Gaussian windows.}
\label{fig.convolution7}
\end{figure}

The moving average window we used in the previous section is a
low-pass filter, but it is not a very good one. The DFT drops off
steeply at first, but then it bounces around. Those bounces are
called {\bf sidelobes}, and they are there because the moving average
window is like a square wave, so its spectrum contains high-frequency
harmonics that drop off proportionally to $1/f$, which is relatively
slow.

We can do better with a Gaussian window.
SciPy provides functions
that compute many common convolution windows, including {\tt
gaussian}:

\begin{verbatim}
gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
\end{verbatim}

{\tt M} is the number of elements in the window; {\tt std}
is the standard deviation of the Gaussian distribution used to
compute it. Figure~\ref{fig.convolution7} shows the shape
of the window. It is a discrete approximation of the Gaussian
``bell curve''. The figure also shows the moving average window
from the previous example, which is sometimes called a
{\bf boxcar window} because it looks like a rectangular railway car.

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution8.pdf}}
\caption{Ratio of spectrums before and after Gaussian smoothing, and
the DFT of the window.}
\label{fig.convolution8}
\end{figure}

I ran the computations from the previous sections again
with this window, and generated Figure~\ref{fig.convolution8},
which shows the ratio of the spectrums before and after
smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple
moving average. After the ratio drops off, it stays low, with almost
none of the sidelobes we saw with the boxcar window. So it does a
better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a
Gaussian curve.
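We can see that numerically (a sketch assuming NumPy; the window is built directly from the Gaussian formula rather than with SciPy, and the comparison uses the continuous-transform approximation $\sigma\sqrt{2\pi}\,\exp(-(2\pi k \sigma / N)^2 / 2)$, which is accurate when the window is wide and well contained in the array):

```python
import numpy as np

N = 256          # padded length
sigma = 16       # standard deviation of the window, in samples

# Gaussian window centered in the array, so its tails are negligible
n = np.arange(N)
window = np.exp(-0.5 * ((n - N / 2) / sigma) ** 2)

# magnitude of the DFT of the window
amps = np.abs(np.fft.rfft(window))

# continuous Fourier transform of a Gaussian: another Gaussian
k = np.arange(len(amps))
approx = sigma * np.sqrt(2 * np.pi) * \
         np.exp(-0.5 * (2 * np.pi * k * sigma / N) ** 2)
```

For the first several frequency components, the DFT magnitudes match the analytic Gaussian closely.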
So the ratio drops off in proportion to $\exp(-f^2)$,
which is much faster than $1/f$.


\section{Efficient convolution}
\label{effconv}

One of the reasons the FFT is such an important algorithm is that,
combined with the Convolution Theorem, it provides an efficient
way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
So one way to compute a convolution is:
%
\[ f \conv g = \IDFT(\DFT(f) \cdot \DFT(g)) \]
%
where $\IDFT$ is the inverse DFT. A simple implementation of
convolution takes time proportional to $N^2$; this algorithm,
using FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution
both ways. As an example, I'll apply it to the BitCoin daily
closing prices (a time series similar to the Facebook data shown in
Figure~\ref{fig.convolution1}).

\begin{verbatim}
df = pandas.read_csv('coindesk-bpi-USD-close.csv',
                     nrows=1625,
                     parse_dates=[0])
ys = df.Close.values
\end{verbatim}

This example uses Pandas to read the data from the CSV file (included
in the repository for this book). If you are not familiar with
Pandas, don't worry: I'm not going to do much with it in this book.
But if you're interested, you can learn more about it in
{\it Think Stats} at \url{http://thinkstats2.com}.

The result, {\tt df}, is a DataFrame, one of the data structures
provided by Pandas.
{\tt ys} is a NumPy array that contains the daily
closing prices.

Next I'll create a Gaussian window and convolve it with {\tt ys}:

\begin{verbatim}
window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')
\end{verbatim}

\verb"fft_convolve" computes the same thing using FFT:

\begin{verbatim}
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)
\end{verbatim}

We can test it by padding the window to the same length
as {\tt ys} and then computing the convolution:

\begin{verbatim}
padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)
\end{verbatim}

The result has $M-1$ bogus values at the beginning, where $M$ is the
length of the window. If we slice off the bogus values, the result
agrees with {\tt smoothed} with about 12 digits of precision.

\begin{verbatim}
M = len(window)
smoothed2 = smoothed2[M-1:]
\end{verbatim}


\section{Efficient autocorrelation}

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution9.pdf}}
\caption{Autocorrelation functions computed by NumPy and
{\tt fft\_correlate}.}
\label{fig.convolution9}
\end{figure}

In Section~\ref{convolution} I presented definitions of
cross-correlation and convolution, and we saw that they are
almost the same, except that in convolution the window is
reversed.

Now that we have an efficient algorithm for convolution, we
can also use it to compute cross-correlations and autocorrelations.
Using the data from the previous section, we can compute the
autocorrelation of the daily closing prices:

\begin{verbatim}
corrs = np.correlate(close, close, mode='same')
\end{verbatim}

With {\tt mode='same'}, the result has the same
length as {\tt close},
corresponding to lags from $-N/2$ to $N/2-1$.
The gray line in Figure~\ref{fig.convolution9} shows the result.
Except at {\tt lag=0}, there are no peaks, so there is no apparent
periodic behavior in this signal. However, the autocorrelation
function drops off slowly, suggesting that this signal resembles
pink noise, as we saw in Section~\ref{autopink}.

To compute autocorrelation using convolution, we
have to zero-pad the signal to double the length.
This trick is necessary because the FFT is based
on the assumption that the signal is periodic; that is, that it wraps
around from the end to the beginning. With time-series data like
this, that assumption is invalid. Adding zeros, and then trimming
the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window.
In order to cancel that effect, we reverse the direction of the
window before calling \verb"fft_convolve", using {\tt np.flipud},
which flips a NumPy array. The result is a view of the array,
not a copy, so this operation is fast.

\begin{verbatim}
def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs
\end{verbatim}

The result from \verb"fft_convolve" has length $2N$. Of those,
the first and last $N/2$ are valid; the rest are the result of
zero-padding.
To select the valid elements, we roll the results
and select the first $N$, corresponding to lags from $-N/2$ to
$N/2-1$.

As shown in Figure~\ref{fig.convolution9}, the results from
\verb"fft_autocorr" and {\tt np.correlate} are identical (with
about 9 digits of precision).

Notice that the correlations in Figure~\ref{fig.convolution9} are
large numbers; we could normalize them (between -1 and 1) as shown
in Section~\ref{correlate}.

The strategy we used here for autocorrelation also works for
cross-correlation. Again, you have to prepare the signals by flipping
one and padding both, and then you have to trim the invalid parts of
the result. This padding and trimming is a nuisance, but that's why
libraries like NumPy provide functions to do it for you.


\section{Exercises}

Solutions to these exercises are in {\tt chap08soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap08.ipynb}.
Read through it and run the code.

It contains an interactive widget that lets you
experiment with the parameters of the Gaussian window to see
what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian,
{\tt std}, without increasing the number of elements in the window,
{\tt M}?
\end{exercise}


\begin{exercise}
In this chapter I claimed that the Fourier transform of a Gaussian
curve is also a Gaussian curve. For discrete Fourier transforms,
this relationship is approximately true.

Try it out for a few examples. What happens to the Fourier transform
as you vary {\tt std}?
\end{exercise}


\begin{exercise}
If you did the exercises in Chapter~\ref{nonperiodic}, you saw the
effect of the Hamming window, and some of the other windows provided
by NumPy, on spectral leakage.
We can get some insight into the
effect of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a
Hamming window with the same size. Zero-pad the windows and plot
their DFTs. Which window acts as a better low-pass filter? You might
find it useful to plot the DFTs on a log-$y$ scale.

Experiment with a few different windows and a few different sizes.
\end{exercise}



\chapter{Differentiation and Integration}
\label{diffint}

This chapter picks up where the previous chapter left off,
looking at the relationship between windows in the time domain
and filters in the frequency domain.

In particular, we'll look at the effect of a finite difference
window, which approximates differentiation, and the cumulative
sum operation, which approximates integration.

The code for this chapter is in {\tt chap09.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp09}.


\section{Finite differences}
\label{diffs}

In Section~\ref{smoothing}, we applied a smoothing window to
the stock price of Facebook and found that a smoothing
window in the time domain corresponds to a low-pass filter in
the frequency domain.

In this section, we'll look at daily price changes and
see that computing the difference between successive elements,
in the time domain, corresponds to a high-pass filter.

Here's the code to read the data, store it as a wave, and compute its
spectrum.

\begin{verbatim}
import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()
\end{verbatim}

This example uses Pandas to read the CSV file; the
result is a DataFrame,
{\tt df}, with columns for the opening
price, closing price, and high and low prices. I select the closing
prices and save them in a Wave object.
The framerate is 1 sample per day.

Figure~\ref{fig.diff_int1} shows
this time series and its spectrum.
Visually, the time series resembles Brownian noise (see
Section~\ref{brownian}).
And the spectrum looks like a straight
line, albeit a noisy one. The estimated slope is -1.9,
which is consistent with Brownian noise.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int1.pdf}}
\caption{Daily closing price of Facebook and the spectrum of this time
series.}
\label{fig.diff_int1}
\end{figure}

Now let's compute the daily price change using {\tt np.diff}:

\begin{verbatim}
diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()
\end{verbatim}

Figure~\ref{fig.diff_int2} shows the resulting wave and its spectrum.
The daily changes resemble white noise, and the estimated slope of the
spectrum, -0.06, is near zero, which is what we expect for white
noise.

\begin{figure}
% diff_int2.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int2.pdf}}
\caption{Daily price change of Facebook and the spectrum of this time series.}
\label{fig.diff_int2}
\end{figure}


\section{The frequency domain}

Computing the difference
between successive elements is the same as convolution with
the window {\tt [1, -1]}.
If the order of those elements seems backward,
remember that convolution reverses the window before applying it
to the signal.

We can see the effect of this operation in the frequency domain
by computing the DFT of the window.

\begin{verbatim}
diff_window = np.array([1.0, -1.0])
padded = thinkdsp.zero_pad(diff_window, len(close))
diff_wave = thinkdsp.Wave(padded,
                          framerate=close.framerate)
diff_filter = diff_wave.make_spectrum()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int3.pdf}}
\caption{Filters corresponding to the diff and differentiate operators (left) and integration operator (right, log-$y$ scale).}
\label{fig.diff_int3}
\end{figure}

Figure~\ref{fig.diff_int3} shows the result. The finite difference
window corresponds to a high-pass filter: its amplitude increases with
frequency, linearly for low frequencies, and then sublinearly after
that. In the next section, we'll see why.


\section{Differentiation}
\label{effdiff}

The window we used in the previous section is a
numerical approximation of the first derivative, so the filter
approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter
in the frequency domain; we can figure out what it is with a little
math.

Suppose we have a complex sinusoid with frequency $f$:
%
\[ E_f(t) = e^{2 \pi i f t} \]
%
The first derivative of $E_f$ is
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f e^{2 \pi i f t} \]
%
which we can rewrite as
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f E_f(t) \]
%
In other words, taking the derivative of $E_f$ is the same
as multiplying by $2 \pi i f$, which is a complex number
with magnitude $2 \pi f$ and angle $\pi/2$.

We can compute the filter that corresponds to differentiation
like this:

\begin{verbatim}
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
\end{verbatim}

I started with the spectrum of {\tt close}, which has the right
size and framerate, then replaced the {\tt hs} with $2 \pi i f$.
Figure~\ref{fig.diff_int3} (left) shows this filter; it is a straight line.

As we saw in Section~\ref{synthmat}, multiplying a complex sinusoid
by a complex number
has two effects: it multiplies
the amplitude, in this case by $2 \pi f$, and shifts the phase
offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions,
each $E_f$ is an eigenfunction of the differentiation operator, with the
corresponding eigenvalue $2 \pi i f$. See
\url{http://en.wikipedia.org/wiki/Eigenfunction}.

If you are not familiar with that language, here's what it
means:

\newcommand{\op}{\mathcal{A}}

\begin{itemize}

\item An operator is a function that takes a function and returns
another function. For example, differentiation is an operator.

\item A function, $g$, is an eigenfunction of an operator, $\op$, if
applying $\op$ to $g$ has the effect of multiplying the function by
a scalar. That is, $\op g = \lambda g$.

\item In that case, the scalar $\lambda$ is the eigenvalue that
corresponds to the eigenfunction $g$.

\item A given operator might have many eigenfunctions, each with
a corresponding eigenvalue.

\end{itemize}

Because complex sinusoids are eigenfunctions of the differentiation
operator, they are easy to differentiate.
All we have to do is
multiply by a complex scalar.

For signals with more than one
component, the process is only slightly harder:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoids.

\item Compute the derivative of each component by multiplication.

\item Add up the differentiated components.

\end{enumerate}

If that process sounds familiar, that's because it is identical
to the algorithm for convolution in Section~\ref{effconv}: compute
the DFT, multiply by a filter, and compute the inverse DFT.

{\tt Spectrum} provides a method that applies the differentiation
filter:

\begin{verbatim}
# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs
\end{verbatim}

We can use it to compute the derivative of the Facebook time series:

\begin{verbatim}
deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int4.pdf}}
\caption{Comparison of daily price changes computed by
{\tt np.diff} and by applying the differentiation filter.}
\label{fig.diff_int4}
\end{figure}

Figure~\ref{fig.diff_int4} compares the daily price changes computed by
{\tt np.diff} with the derivative we just computed.
I selected the first 50 values in the time series so we can see the
differences more clearly.

The derivative is noisier, because it amplifies the high-frequency
components more, as shown in Figure~\ref{fig.diff_int3} (left). Also, the
first few elements of the derivative are very noisy. The problem
there is that the DFT-based derivative is based on the assumption that
the signal is periodic.
In effect, it connects the last element in
the time series back to the first element, which creates artifacts at
the boundaries.

To summarize, we have shown:

\begin{itemize}

\item Computing the difference between successive values in a signal
can be expressed as convolution with a simple window. The result is
an approximation of the first derivative.

\item Differentiation in the time domain corresponds to a simple
filter in the frequency domain. For periodic signals, the result is
exactly the first derivative. For some non-periodic signals, it
can approximate the derivative.

\end{itemize}

Using the DFT to compute derivatives is the basis of {\bf spectral
methods} for solving differential equations (see
\url{http://en.wikipedia.org/wiki/Spectral_method}).

In particular, it is useful for the analysis of linear, time-invariant
systems, which is coming up in Chapter~\ref{systems}.


\section{Integration}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int5.pdf}}
\caption{Comparison of the original time series and the integrated
derivative.}
\label{fig.diff_int5}
\end{figure}

In the previous section, we showed that differentiation in the time
domain corresponds to a simple filter in the frequency domain: it
multiplies each component by $2 \pi i f$.
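As a quick check of the differentiation filter (a sketch assuming only NumPy, applied to a sinusoid sampled over a whole number of cycles, so the periodicity assumption holds exactly):

```python
import numpy as np

framerate = 256
f0 = 5  # integer frequency, so the signal is periodic over the window
ts = np.arange(framerate) / framerate
x = np.sin(2 * np.pi * f0 * ts)

# multiply each spectral component by 2*pi*i*f
hs = np.fft.rfft(x)
fs = np.fft.rfftfreq(len(x), d=1 / framerate)
deriv = np.fft.irfft(2j * np.pi * fs * hs)

# analytic derivative of sin(2*pi*f0*t) is 2*pi*f0*cos(2*pi*f0*t)
expected = 2 * np.pi * f0 * np.cos(2 * np.pi * f0 * ts)
```

For a periodic input, the spectral derivative matches the analytic derivative to machine precision.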
Since integration is
the inverse of differentiation, it also corresponds to a simple
filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

\begin{verbatim}
integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)
\end{verbatim}

Figure~\ref{fig.diff_int3} (right) shows this filter on a log-$y$ scale,
which makes it easier to see.

{\tt Spectrum} provides a method that applies the integration filter:

\begin{verbatim}
# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs
\end{verbatim}

We can confirm that the integration filter is correct by applying it
to the spectrum of the derivative we just computed:

\begin{verbatim}
integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()
\end{verbatim}

But notice that at $f=0$, we are dividing by 0. The result in
NumPy is NaN, which is a special floating-point value that
represents ``not a number''. We can partially deal with this
problem by setting this value to 0 before converting the
spectrum back to a wave:

\begin{verbatim}
integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
\end{verbatim}

Figure~\ref{fig.diff_int5} shows this integrated derivative along with
the original time series. They are almost identical, but the
integrated derivative has been shifted down. The problem is that when
we clobbered the $f=0$ component, we set the bias of the signal to 0.
But that should not be surprising; in general, differentiation loses
information about the bias, and integration can't recover it.
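We can demonstrate the round trip with plain NumPy (a sketch; the variable names are mine, not from {\tt thinkdsp}): differentiating and then integrating recovers the original signal except for its mean.

```python
import numpy as np

framerate = 256
ts = np.arange(framerate) / framerate
x = 3.0 + np.sin(2 * np.pi * 5 * ts)   # periodic signal with nonzero bias

fs = np.fft.rfftfreq(len(x), d=1 / framerate)
hs = np.fft.rfft(x)

# differentiate, then integrate, in the frequency domain
hs_deriv = 2j * np.pi * fs * hs          # the f=0 component is multiplied by 0
hs_integ = np.zeros_like(hs_deriv)
hs_integ[1:] = hs_deriv[1:] / (2j * np.pi * fs[1:])  # skip f=0 to avoid 0/0

recovered = np.fft.irfft(hs_integ)
```

The result matches the original signal minus its mean; the bias is gone.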
In some
sense, the NaN at $f=0$ is telling us that this element is unknown.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int6.pdf}}
\caption{A sawtooth wave and its spectrum.}
\label{fig.diff_int6}
\end{figure}

If we provide this ``constant of integration'', the results are
identical, which confirms that this integration filter is the correct
inverse of the differentiation filter.

\section{Cumulative sum}
\label{cumsum}

In the same way that the diff operator approximates differentiation,
the cumulative sum approximates integration.
I'll demonstrate with a sawtooth signal.

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)
\end{verbatim}

Figure~\ref{fig.diff_int6} shows this wave and its spectrum.

{\tt Wave} provides a method that computes the cumulative sum of
a wave array and returns a new Wave object:

\begin{verbatim}
# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)
\end{verbatim}

We can use it to compute the cumulative sum of \verb"in_wave":

\begin{verbatim}
out_wave = in_wave.cumsum()
out_wave.unbias()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int7.pdf}}
\caption{A parabolic wave and its spectrum.}
\label{fig.diff_int7}
\end{figure}

Figure~\ref{fig.diff_int7} shows the resulting wave and its spectrum.
If you did the exercises in Chapter~\ref{harmonics}, this waveform should
look familiar: it's a parabolic signal.

Comparing the spectrum of the parabolic signal to the spectrum of the
sawtooth, the amplitudes of the components drop off more quickly. In
Chapter~\ref{harmonics}, we saw that the components of the sawtooth
drop off in proportion to $1/f$.
Since the cumulative sum
approximates integration, and integration scales components in
proportion to $1/f$, the components of the parabolic wave drop off in
proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds
to the cumulative sum:

\begin{verbatim}
cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs
\end{verbatim}

Because {\tt cumsum} is the inverse operation of {\tt diff}, we
start with a copy of \verb"diff_filter", which is the filter
that corresponds to the {\tt diff} operation, and then invert the
{\tt hs}.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int8.pdf}}
\caption{Filters corresponding to cumulative sum and integration.}
\label{fig.diff_int8}
\end{figure}

Figure~\ref{fig.diff_int8} shows the filters corresponding to
cumulative sum and integration. The cumulative sum is a good
approximation of integration except at the highest frequencies,
where it drops off a little faster.

To confirm that this is the correct filter for the cumulative
sum, we can compare it to the ratio of the spectrum of
\verb"out_wave" to the spectrum of \verb"in_wave":

\begin{verbatim}
in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int9.pdf}}
\caption{Filter corresponding to cumulative sum and actual ratios of
the before-and-after spectrums.}
\label{fig.diff_int9}
\end{figure}

And here's the method that computes the ratios:

\begin{verbatim}
def ratio(self, denom, thresh=1):
    ratio_spectrum = self.copy()
    ratio_spectrum.hs /= denom.hs
    ratio_spectrum.hs[denom.amps < thresh] = np.nan
    return ratio_spectrum
\end{verbatim}

When
{\tt denom.amps} is small, the resulting ratio is noisy,
so I set those values to NaN.

Figure~\ref{fig.diff_int9} shows the ratios and the filter
corresponding to the cumulative sum. They agree, which confirms that
inverting the filter for {\tt diff} yields the filter for {\tt
cumsum}.

Finally, we can confirm that the Convolution Theorem applies by
applying the {\tt cumsum} filter in the frequency domain:

\begin{verbatim}
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
\end{verbatim}

Within the limits of floating-point error, \verb"out_wave2" is
identical to \verb"out_wave", which we computed using {\tt cumsum}, so
the Convolution Theorem works! But note that this demonstration only
works with periodic signals.


\section{Integrating noise}

In Section~\ref{brownian}, we generated Brownian noise by computing the
cumulative sum of white noise.
Now that we understand the effect of {\tt cumsum} in the frequency
domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we
compute the cumulative sum, the amplitude of each component is divided
by $f$. Since power is the square of magnitude, the power of each
component is divided by $f^2$. So on average, the power at frequency
$f$ is proportional to $1 / f^2$:
%
\[ P_f = K / f^2 \]
%
where $K$ is a constant that's not important.
Taking the log of both sides yields:
%
\[ \log P_f = \log K - 2 \log f \]
%
And that's why, when we plot the spectrum of Brownian noise on a
log-log scale, we expect to see a straight line with slope -2, at
least approximately.

In Section~\ref{diffs} we plotted the spectrum of closing prices for
Facebook, and estimated that the slope is -1.9, which is consistent
with Brownian noise.
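This prediction is easy to check numerically. Here is a sketch in plain NumPy, independent of {\tt thinkdsp}; the seed and signal length are arbitrary choices:

```python
import numpy as np

# Checking the claim that Brownian noise, the cumulative sum of
# white noise, has a log-log power spectrum with slope near -2.
rng = np.random.default_rng(17)

white = rng.uniform(-1, 1, 16384)     # uncorrelated white noise
brown = np.cumsum(white)              # Brownian noise
brown -= brown.mean()                 # remove the bias

hs = np.fft.rfft(brown)
fs = np.fft.rfftfreq(len(brown))
power = np.abs(hs) ** 2

# Fit a line to log-power versus log-frequency, skipping f=0.
slope, _ = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)
print(round(slope, 1))                # should be close to -2
```

The fitted slope varies a little from run to run (and the $f=0$ component has to be skipped, for the same reason the integration filter produced a NaN there), but it lands near the predicted -2.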
Many stock prices have similar spectrums.

When we used the {\tt diff} operator to compute daily changes, we
multiplied the {\em amplitude} of each component by a filter proportional to
$f$, which means we multiplied the {\em power} of each component by $f^2$.
On a log-log scale, this operation adds 2 to the slope of the
power spectrum, which is why the estimated slope of the result
is near 0.1 (but a little lower, because {\tt diff} only approximates
differentiation).


\section{Exercises}

The notebook for this chapter is {\tt chap09.ipynb}.
You might want to read through it and run the code.

Solutions to these exercises are in {\tt chap09soln.ipynb}.

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt diff} and
{\tt differentiate} on a signal. Create a triangle wave and plot
it. Apply {\tt diff} and plot the result. Compute the spectrum of the
triangle wave, apply {\tt differentiate}, and plot the result. Convert
the spectrum back to a wave and plot it. Are there differences between
the effect of {\tt diff} and {\tt differentiate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt cumsum} and
{\tt integrate} on a signal. Create a square wave and plot it. Apply
{\tt cumsum} and plot the result. Compute the spectrum of the square
wave, apply {\tt integrate}, and plot the result. Convert the spectrum
back to a wave and plot it. Are there differences between the effect
of {\tt cumsum} and {\tt integrate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of integrating
twice. Create a sawtooth wave, compute its spectrum, then apply {\tt
integrate} twice. Plot the resulting wave and its spectrum. What is
the mathematical form of the wave?
Why does it resemble a sinusoid?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of the 2nd
difference and 2nd derivative. Create a {\tt CubicSignal}, which is
defined in {\tt thinkdsp}. Compute the second difference by applying
{\tt diff} twice. What does the result look like? Compute the second
derivative by applying {\tt differentiate} to the spectrum twice.
Does the result look the same?

Plot the filters that correspond to the 2nd difference and the 2nd
derivative and compare them. Hint: In order to get the filters on the
same scale, use a wave with framerate 1.
\end{exercise}


\chapter{LTI systems}
\label{systems}

This chapter presents the theory of signals and systems, using
musical acoustics as an example. It explains an
important application of the Convolution Theorem: the characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap10.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp10}.


\section{Signals and systems}

In the context of signal processing, a {\bf system} is an abstract
representation of anything that takes a signal as input and produces
a signal as output.

For example, an electronic amplifier is a circuit that takes an
electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you
can think of the room as a system that takes the sound of the
performance at the location where it is generated and produces a
somewhat different sound at the location where you hear it.

A {\bf linear, time-invariant system}\footnote{My presentation here
follows \url{http://en.wikipedia.org/wiki/LTI_system_theory}.} is a
system with these two
properties:

\begin{enumerate}

\item Linearity: If you put two inputs into the system at the same
time, the result is the sum of their outputs. Mathematically, if an
input $x_1$ produces output $y_1$ and another input $x_2$ produces
$y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and
$b$ are scalars.

\item Time invariance: The
effect of the system doesn't vary over time, or depend on the state
of the system. So if inputs $x_1$ and $x_2$ differ by a shift in time,
their outputs $y_1$ and $y_2$ differ by the same shift, but are otherwise
identical.

\end{enumerate}

Many physical systems have these properties, at least approximately.

\begin{itemize}

\item Circuits that contain only resistors, capacitors and inductors are
LTI, to the degree that the components behave like their idealized
models.

\item Mechanical systems that contain springs, masses and
dashpots are also LTI, assuming linear springs (force proportional
to displacement) and dashpots (force proportional to velocity).

\item Also, and most relevant to applications in this book,
the media that transmit sound (including air, water
and solids) are well-modeled by LTI systems.

\end{itemize}

LTI systems are described by linear differential equations, and
the solutions of those equations are complex sinusoids (see
\url{http://en.wikipedia.org/wiki/Linear_differential_equation}).

This result provides an algorithm for computing the effect of
an LTI system on an input signal:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoid components.

\item For each input component, compute the corresponding output component.

\item Add up the output components.

\end{enumerate}

At this point, I hope this algorithm sounds familiar.
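The three steps can be sketched directly with NumPy. This example is mine, not from the book's code; it applies a simple LTI system, a circular 2-element moving average, both directly and by decomposing the input into DFT components, and gets the same answer:

```python
import numpy as np

# Sketch of the three-step algorithm for a circular 2-element
# moving average.  (Example signal and system are made up.)
N = 64
n = np.arange(N)
signal = np.sin(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 7 * n / N)

# Applying the system directly: average each sample with its neighbor.
direct = 0.5 * (signal + np.roll(signal, 1))

# Step 1: express the signal as a sum of complex sinusoid components.
components = np.fft.fft(signal)

# Step 2: compute each output component.  For this system the effect
# on frequency k is multiplication by G[k], the DFT of the window
# [0.5, 0.5, 0, ..., 0].
window = np.zeros(N)
window[0] = window[1] = 0.5
G = np.fft.fft(window)
out_components = components * G

# Step 3: add up the output components (the inverse FFT does the sum).
output = np.fft.ifft(out_components).real

print(np.allclose(output, direct))    # True
```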
It's the
same algorithm we used for convolution in Section~\ref{effconv}, and
for differentiation in Section~\ref{effdiff}. This process
is called {\bf spectral decomposition} because we ``decompose''
the input signal into its spectral components.

In order to apply this process to an LTI system, we have to {\bf
characterize} the system by finding its effect on each component
of the input signal. For mechanical systems, it turns out that there
is a simple and efficient way to do that: you kick it and record
the output.

Technically, the ``kick'' is called an {\bf impulse} and the
output is called the {\bf impulse response}. You might wonder
how a single impulse can completely characterize a system. You
can see the answer by computing the DFT of an impulse. Here's
a wave array with an impulse at $t=0$:

\begin{verbatim}
impulse = np.zeros(8)
impulse[0] = 1
impulse_spectrum = np.fft.fft(impulse)
\end{verbatim}

Here's the wave array:

\begin{verbatim}
[ 1.  0.  0.  0.  0.  0.  0.  0.]
\end{verbatim}

And here's its spectrum:

\begin{verbatim}
[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
\end{verbatim}

The spectrum is all ones; that is, an impulse is the sum of components
with equal magnitudes at all frequencies. This
spectrum should not be confused with white noise, which has the same
{\em average} power at all frequencies, but varies around that
average.

When you test a system by inputting
an impulse, you are testing the response of the
system at all frequencies.
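Here is a quick numerical check of that distinction between an impulse and white noise (a sketch of mine, not from {\tt chap10.ipynb}):

```python
import numpy as np

# An impulse has magnitude exactly 1 at every frequency; white noise
# only has constant power on average.  (Seed and length are arbitrary.)
rng = np.random.default_rng(3)

impulse = np.zeros(1024)
impulse[0] = 1
impulse_amps = np.abs(np.fft.fft(impulse))

noise = rng.uniform(-1, 1, 1024)
noise_amps = np.abs(np.fft.fft(noise))

print(np.allclose(impulse_amps, 1))   # True: exactly flat
print(noise_amps.std() > 0.1)         # True: varies around its average
```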
And you can test them all at the same
time because the system is linear, so simultaneous tests don't
interfere with each other.


\section{Windows and filters}
\label{winfilt}

To show why this kind of system characterization works, I
will start with a simple example: a 2-element moving average.
We can think of this operation as a system that takes a signal
as an input and produces a slightly smoother signal as an output.

In this example we know what the window is, so we can compute
the corresponding filter. But that's not usually the case; in the
next section we'll look at an example where we don't know the
window or the filter ahead of time.

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems1.pdf}}
\caption{DFT of a 2-element moving average window.}
\label{fig.systems1}
\end{figure}

Here's a window that computes a
2-element moving average (see Section~\ref{smoothing}):

\begin{verbatim}
window_array = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0,])
window = thinkdsp.Wave(window_array, framerate=8)
\end{verbatim}

We can find the corresponding filter by computing the DFT of the
window:

\begin{verbatim}
filtr = window.make_spectrum(full=True)
\end{verbatim}

Figure~\ref{fig.systems1} shows the result. The filter that corresponds to a
moving average window is a low-pass filter with the approximate shape
of a Gaussian curve.

Now imagine that we did not know the window or the corresponding filter,
and we wanted to characterize this system.
We would do that by
inputting an impulse and measuring the impulse response.

In this example, we can compute the impulse response by multiplying
the spectrum of the impulse and the filter, and then converting the result
from a spectrum to a wave:

\begin{verbatim}
product = impulse_spectrum * filtr
filtered = product.make_wave()
\end{verbatim}

Since \verb"impulse_spectrum" is all ones, the product is
identical to the filter, and the filtered wave is identical to the
window.

This example demonstrates two things:

\begin{itemize}

\item Because the spectrum of an impulse is all ones, the DFT
of the impulse response is identical to the filter that
characterizes the system.

\item Therefore, the impulse response is identical to the
convolution window that characterizes the system.

\end{itemize}


\section{Acoustic response}
\label{response}

To characterize the acoustic response of a room or open space, a
simple way to generate an impulse is to pop a balloon or
fire a gun.
A gunshot puts an impulse into
the system; the sound you hear is the impulse response.

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems6.pdf}}
\caption{Waveform of a gunshot.}
\label{fig.systems6}
\end{figure}

As an example, I'll use a recording of a gunshot to characterize
the room where the gun was fired, then use the impulse response
to simulate the effect of that room on a violin recording.

This example is in {\tt chap10.ipynb}, which is in the repository
for this book; you can also view it, and listen to the examples,
at \url{http://tinyurl.com/thinkdsp10}.

Here's the gunshot:

\begin{verbatim}
response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()
\end{verbatim}

I select a segment starting at 0.26 seconds to remove the silence
before the gunshot. Figure~\ref{fig.systems6} (left) shows the
waveform of the gunshot. Next we compute the DFT of {\tt response}:

\begin{verbatim}
transfer = response.make_spectrum()
transfer.plot()
\end{verbatim}

Figure~\ref{fig.systems6} (right) shows the result. This spectrum
encodes the response of the room; for each frequency, the spectrum
contains a complex number that represents an amplitude multiplier and
a phase shift. This spectrum is called a {\bf transfer
function} because it contains information about how the system transfers
the input to the output.

Now we can simulate the effect this room would have on the sound
of a violin. Here is the violin recording we used in Section~\ref{violin}:

\begin{verbatim}
violin = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
violin.truncate(len(response))
violin.normalize()
\end{verbatim}

The violin and gunshot waves were sampled at the same framerate,
44,100 Hz.
And coincidentally, the duration of both is about the
same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

\begin{verbatim}
spectrum = violin.make_spectrum()
\end{verbatim}

Now I know the magnitude and phase of each component in the
input, and I know the transfer function of the system. Their
product is the DFT of the output, which we can use to compute the
output wave:

\begin{verbatim}
output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()
\end{verbatim}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems7.pdf}}
\caption{The waveform of the violin recording before and after convolution.}
\label{fig.systems7}
\end{figure}

Figure~\ref{fig.systems7} shows the input (top) and output (bottom) of
the system. They are substantially different, and the differences are
clearly audible. Load {\tt chap10.ipynb} and listen to them. One
thing I find striking about this example is that you can get a sense
of what the room was like; to me, it sounds like a long, narrow room
with hard floors and ceilings. That is, like a firing range.

There's one thing I glossed over in this example that I'll mention
in case it bothers anyone. The violin recording I started with
has already been transformed by one system: the room where it was
recorded. So what I really computed in my example is the sound
of the violin after two transformations.
To properly simulate
the sound of a violin in a different room, I should have characterized
the room where the violin was recorded and applied the inverse
of that transfer function first.


\section{Systems and convolution}
\label{sysconv}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems8.pdf}}
\caption{Sum of a wave and a shifted, scaled copy.}
\label{fig.systems8}
\end{figure}

If you think the previous example is black magic,
you are not alone. I've been thinking about it for a while and it
still makes my head hurt.

In the previous section, I suggested one way to think about it:

\begin{itemize}

\item An impulse is made up of components with amplitude 1 at all
frequencies.

\item The impulse response contains the sum of the responses of the
system to all of these components.

\item The transfer function, which is the DFT of the impulse response,
encodes the effect of the system on each frequency component in the form
of an amplitude multiplier and a phase shift.

\item For any input, we can compute the response of the system
by breaking the input into components, computing the response to
each component, and adding them up.

\end{itemize}

But if you don't like that, there's another way to think about
it altogether: convolution! By the Convolution Theorem, multiplication
in the frequency domain corresponds to convolution in the time
domain.
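That correspondence is easy to check numerically. Here is a sketch of mine, in plain NumPy with made-up random signals, comparing circular convolution computed by direct summation with multiplication of DFTs:

```python
import numpy as np

# Convolution Theorem check: circular convolution in the time domain
# equals multiplication of DFTs in the frequency domain.
rng = np.random.default_rng(0)
f = rng.standard_normal(128)
g = rng.standard_normal(128)

def circular_convolve(f, g):
    """Circular convolution by direct summation: h[n] = sum f[m] g[n-m]."""
    N = len(f)
    return np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                     for n in range(N)])

h_time = circular_convolve(f, g)
h_freq = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(h_time, h_freq))    # True
```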
In this example, the output of the system is the convolution
of the input and the system response.

Here are the keys to understanding why that works:

\begin{itemize}

\item You can think of the samples in the input wave as a sequence
of impulses with varying amplitude.

\item Each impulse in the input yields a copy of the impulse response,
shifted in time (because the system is time-invariant) and scaled by
the amplitude of the input.

\item The output is the sum of the shifted, scaled copies of the
impulse response. The copies add up because the system is linear.

\end{itemize}

Let's work our way up gradually: suppose that instead of firing one
gun, we fire two: a big one with amplitude 1 at $t=0$ and a
smaller one with amplitude 0.5 at $t=1$.

We can compute the response of the system by adding up
the original impulse response and a scaled, shifted copy of itself.
Here's a function that makes a shifted, scaled copy of
a wave:

\begin{verbatim}
def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res
\end{verbatim}

The parameter {\tt shift} is a time shift in seconds; {\tt factor}
is a multiplicative factor.

Here's how we use it to compute the response to a two-gun salute:

\begin{verbatim}
shift = 1
factor = 0.5
gun2 = response + shifted_scaled(response, shift, factor)
\end{verbatim}

Figure~\ref{fig.systems8} shows the result. You can hear what
it sounds like in {\tt chap10.ipynb}.
Not surprisingly, it
sounds like two gunshots, the first one louder than the second.

Now suppose instead of two guns, you add up 100 guns fired at a rate
of 441 shots per second.
This loop computes the result:

\begin{verbatim}
dt = 1 / 441
total = 0
for k in range(100):
    total += shifted_scaled(response, k*dt, 1.0)
\end{verbatim}

At 441 shots per second,
you don't hear the individual shots. Instead, it sounds
like a periodic signal at 441 Hz. If you play this example, it
sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a
series of samples, where each sample is an impulse with a different
amplitude.

As an example, I'll generate a sawtooth signal at 441 Hz:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=441)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)
\end{verbatim}

Now I'll loop through the series of impulses that make up the
sawtooth, and add up the impulse responses:

\begin{verbatim}
total = 0
for t, y in zip(wave.ts, wave.ys):
    total += shifted_scaled(response, t, y)
\end{verbatim}

The result is what it would sound like to play a sawtooth wave in a
firing range. Again, you can listen to it in {\tt chap10.ipynb}.

\begin{figure}
\centerline{\includegraphics[height=1.2in]{figs/diagram2.pdf}}
\caption{Diagram of the sum of scaled and shifted copies of $g$.}
\label{fig.convolution}
\end{figure}

Figure~\ref{fig.convolution} shows a diagram of this computation,
where $f$ is the sawtooth, $g$ is the impulse response, and $h$
is the sum of the shifted, scaled copies of $g$.

For the example shown:

\[ h[2] = f[0]g[2] + f[1]g[1] + f[2]g[0] \]

And more generally,

\[ h[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]

You might recognize this equation from Section~\ref{convolution}.
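The sum-of-shifted-copies picture can be verified in a few lines. This sketch uses plain NumPy arrays in place of Wave objects; the example input and impulse response are made up:

```python
import numpy as np

# Verifying that the sum of shifted, scaled copies of the impulse
# response equals the convolution of input and impulse response.
f = np.array([1.0, 0.5, -0.25, 2.0])    # input samples (impulse amplitudes)
g = np.array([1.0, 0.8, 0.3])           # impulse response

# Sum a copy of g for each sample, shifted to index m and scaled by f[m].
N = len(f) + len(g) - 1
h = np.zeros(N)
for m, amp in enumerate(f):
    h[m:m + len(g)] += amp * g

print(np.allclose(h, np.convolve(f, g)))    # True
```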
It
is the convolution of $f$ and $g$. This shows that if the input is $f$
and the impulse response of the system is $g$, the output is the
convolution of $f$ and $g$.

In summary, there are two ways to think about the effect of a system
on a signal:

\begin{enumerate}

\item The input is a sequence of impulses, so the output is the sum of
scaled, shifted copies of the impulse response; that sum is the
convolution of the input and the impulse response.

\item The DFT of the impulse response is a transfer function that
encodes the effect of the system on each frequency component as a
magnitude and phase offset. The DFT of the input encodes the
magnitude and phase offset of the frequency components it contains.
Multiplying the DFT of the input by the transfer function yields
the DFT of the output.

\end{enumerate}

The equivalence of these descriptions should not be a surprise;
it is basically a statement of the Convolution Theorem:
convolution of $f$ and $g$ in the time
domain corresponds to multiplication in the frequency domain.
And if you wondered why convolution is defined as it is, which
seemed backwards when we talked about smoothing and difference
windows, now you know the reason: the definition of convolution
appears naturally in the response of an LTI system to a signal.


\section{Proof of the Convolution Theorem}

Well, I've put it off long enough.
It's time to prove the Convolution
Theorem (CT), which states:

\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]

where $f$ and $g$ are vectors with the same length, $N$.

I'll proceed in two steps:

\begin{enumerate}

\item I'll show that in the special case where $f$ is a complex
exponential, convolution with $g$ has the effect of multiplying
$f$ by a scalar.

\item In the more general case where $f$ is not a complex exponential,
we can use the DFT to express it as a sum of exponential components,
compute the convolution of each component (by multiplication) and
then add up the results.

\end{enumerate}

Together these steps prove the Convolution Theorem. But first, let's
assemble the pieces we'll need. The DFT of $g$, which I'll call $G$,
is:
%
\[ \DFT(g)[k] = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
0 to $N-1$ and $n$ is an index of time from 0 to $N-1$.
The result is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I'll call $f$, is:
%
\[ \IDFT(F)[n] = f[n] = \sum_k F[k] \exp(2 \pi i n k / N) \]
%
Here's the definition of convolution:
%
\[ (f \conv g)[n] = \sum_m f[m] g[n-m] \]
%
where $m$ is another index of time from 0 to $N-1$.
Convolution is commutative, so I could equivalently write:
%
\[ (f \conv g)[n] = \sum_m f[n-m] g[m] \]
%
Now let's consider the special case where $f$ is a complex
exponential with frequency $k$, which I'll call $e_k$:
%
\[ f[n] = e_k[n] = \exp(2 \pi i n k / N) \]
%
where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i (n-m) k / N) g[m] \]
%
We can split the first term into a product:
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N) g[m] \]
%
The first half does not
depend on $m$, so we can pull it out of the
summation:
%
\[ (e_k \conv g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N) g[m] \]
%
Now we recognize that the first term is $e_k$, and the summation is
$G[k]$ (using $m$ as the index of time). So we can write:
%
\[ (e_k \conv g)[n] = e_k[n] G[k] \]
%
which shows that for each complex exponential, $e_k$, convolution
with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical
terms, each $e_k$ is an eigenvector of this operation, and
$G[k]$ is the corresponding eigenvalue (see Section~\ref{effdiff}).

Now for the second part of the proof. If the input signal, $f$, doesn't
happen to be a complex exponential, we can express it as a sum of
complex exponentials by computing its DFT, $F$.
For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex
magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude
$F[k]$, so each output component is a complex
exponential with magnitude $F[k] G[k]$, based on the first part of
the proof.

Because the system is linear, the output is just the sum of the
output components:
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] e_k[n] \]
%
Plugging in the definition of $e_k$ yields
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] \exp(2 \pi i n k / N) \]
%
The right hand side is the inverse DFT of the product $F G$. Thus:
%
\[ (f \conv g) = \IDFT( F G ) \]
%
Substituting $F = \DFT(f)$ and $G = \DFT(g)$:
%
\[ (f \conv g) = \IDFT( \DFT(f) \DFT(g) ) \]
%
Finally, taking the DFT of both sides yields the Convolution Theorem:
%
\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]
%
QED


\section{Exercises}

Solutions to these exercises are in {\tt chap10soln.ipynb}.

\begin{exercise}
In Section~\ref{sysconv} I describe convolution as the sum of shifted,
scaled copies of a signal.
Strictly speaking, this operation is
{\em linear} convolution, which does not assume that the signal
is periodic.

But in Section~\ref{response}, when we multiply the
DFT of the signal by the transfer function, that operation corresponds
to {\em circular} convolution, which assumes that the signal is
periodic. As a result, you might notice that the output contains
an extra note at the beginning, which wraps around from the end.

Fortunately, there is a standard solution to this problem. If you
add enough zeros to the end of the signal before computing the DFT,
you can avoid wrap-around and compute a linear convolution.

Modify the example in {\tt chap10.ipynb} and confirm that zero-padding
eliminates the extra note at the beginning of the output.
\end{exercise}


\begin{exercise}
The Open AIR library provides a ``centralized... on-line resource for
anyone interested in auralization and acoustical impulse response
data'' (\url{http://www.openairlib.net}). Browse their collection
of impulse response data and download one that sounds interesting.
Find a short recording that has the same sample rate as the impulse
response you downloaded.

Simulate the sound of your recording in the space where the impulse
response was measured, computed in two ways: by convolving the recording
with the impulse response and by computing the filter that corresponds
to the impulse response and multiplying by the DFT of the recording.
\end{exercise}

%TODO: one more?
%\begin{exercise}
%\end{exercise}


\chapter{Modulation and sampling}

In Section~\ref{aliasing} we saw that when a signal is sampled at
10,000 Hz, a component at 5500 Hz is indistinguishable from a
component at 4500 Hz. In this example, the folding frequency, 5000 Hz,
is half of the sampling rate.
But I didn't explain why.

This chapter explores the effect of sampling and presents the
Sampling Theorem, which explains aliasing and the folding frequency.

I'll start by exploring the effect of convolution with impulses;
I'll use that to explain amplitude modulation (AM), which
turns out to be useful for understanding the Sampling Theorem.

The code for this chapter is in {\tt chap11.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp11}.


\section{Convolution with impulses}

As we saw in Section~\ref{sysconv}, convolution of a signal with
a series of impulses has the effect of making shifted, scaled
copies of the signal.

As an example, I'll read a signal that sounds like a beep:

\begin{verbatim}
filename = '253887__themusicalnomad__positive-beeps.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
\end{verbatim}

And I'll construct a wave with four impulses:

\begin{verbatim}
imp_sig = thinkdsp.Impulses([0.005, 0.3, 0.6, 0.9],
                            amps=[1, 0.5, 0.25, 0.1])
impulses = imp_sig.make_wave(start=0, duration=1.0,
                             framerate=wave.framerate)
\end{verbatim}

And then convolve them:

\begin{verbatim}
convolved = wave.convolve(impulses)
\end{verbatim}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=3.5in]{figs/sampling1.pdf}}
\caption{The effect of convolving a signal (top left) with a series of
impulses (bottom left).
The result (right) is the sum of shifted,
scaled copies of the signal.}
\label{fig.sampling1}
\end{figure}

Figure~\ref{fig.sampling1} shows the results, with the signal in
the top left, the impulses in the lower left, and the result on
the right.

You can hear the result in {\tt chap11.ipynb}; it sounds like
a series of four beeps with decreasing loudness.

The point of this example is just to demonstrate that convolution
with impulses makes shifted, scaled copies. This result will be
useful in the next section.


\section{Amplitude modulation}
\label{am}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=5.5in]{figs/sampling2.pdf}}
\caption{Demonstration of amplitude modulation. The top row is the
spectrum of the signal; the next row is the spectrum after modulation;
the next row is the spectrum after demodulation; the last row is the
demodulated signal after low-pass filtering.}
\label{fig.sampling2}
\end{figure}

Amplitude modulation (AM) is used to broadcast AM radio, among other
applications. At the transmitter, the signal (which might contain
speech, music, etc.) is ``modulated'' by multiplying it with a cosine
signal that acts as a ``carrier wave''. The result is a
high-frequency wave that is suitable for broadcast by radio. Typical
frequencies for AM radio in the United States are 500--1600 kHz (see
\url{https://en.wikipedia.org/wiki/AM_broadcasting}).

At the receiving end, the broadcast signal is ``demodulated'' to
recover the original signal. Surprisingly, demodulation works by
multiplying the broadcast signal, again, by the same carrier wave.

To see how that works, I'll modulate a signal with a carrier wave at
10 kHz.
Here's the signal:

\begin{verbatim}
filename = '105977__wcfl10__favorite-station.wav'
wave = thinkdsp.read_wave(filename)
wave.unbias()
wave.normalize()
\end{verbatim}

And here's the carrier:

\begin{verbatim}
carrier_sig = thinkdsp.CosSignal(freq=10000)
carrier_wave = carrier_sig.make_wave(duration=wave.duration,
                                     framerate=wave.framerate)
\end{verbatim}

We can multiply them using the {\tt *} operator, which multiplies
the wave arrays elementwise:

\begin{verbatim}
modulated = wave * carrier_wave
\end{verbatim}

The result sounds pretty bad.  You can hear it in {\tt chap11.ipynb}.

Figure~\ref{fig.sampling2} shows what's happening in the frequency
domain.  The top row is the spectrum of the original signal.  The
next row is the spectrum of the modulated signal, after multiplying
by the carrier.  It contains two copies of the original spectrum,
shifted by plus and minus 10 kHz.

To understand why, recall that convolution in the time domain corresponds
to multiplication in the frequency domain.  Conversely, multiplication
in the time domain corresponds to convolution in the frequency domain.
When we multiply the signal by the carrier, we are convolving its
spectrum with the DFT of the carrier.

Since the carrier is a simple cosine wave, its DFT is two impulses, at
plus and minus 10 kHz.  Convolution with these impulses makes
shifted, scaled copies of the spectrum.  Notice that the amplitude of
the spectrum is smaller after modulation.  The energy from the original
signal is split between the copies.

We demodulate the signal by multiplying by the carrier wave again:

\begin{verbatim}
demodulated = modulated * carrier_wave
\end{verbatim}

The third row of Figure~\ref{fig.sampling2} shows the result.
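Algebraically, demodulation works because multiplying by the carrier
twice multiplies the signal by a squared cosine, and the identity
$\cos^2 \theta = (1 + \cos 2\theta)/2$ splits the product into two
parts.  With signal $s(t)$ and carrier frequency $f_c$:

\[ s(t) \, \cos^2 (2 \pi f_c t) =
   \frac{s(t)}{2} + \frac{s(t) \cos (4 \pi f_c t)}{2} \]

The first term is the original signal at half amplitude; the second
is the signal modulated up to twice the carrier frequency, which is
20 kHz in this example.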
Again,
multiplication in the time domain corresponds to convolution in the
frequency domain, which makes shifted, scaled copies of the spectrum.

Since the modulated spectrum contains two peaks, each peak gets split
in half and shifted by plus and minus 20 kHz.  Two of the copies
meet at 0 kHz and get added together; the other two copies end up
centered at plus and minus 20 kHz.

If you listen to the demodulated signal, it sounds pretty good.  The
extra copies of the spectrum add high frequency components that were
not in the original signal, but they are so high that your speakers
probably can't play them, and most people can't hear them.  But if
you have good speakers and good ears, you might.

In that case, you can get rid of the extra components by applying a
low-pass filter:

\begin{verbatim}
demodulated_spectrum = demodulated.make_spectrum(full=True)
demodulated_spectrum.low_pass(10000)
filtered = demodulated_spectrum.make_wave()
\end{verbatim}

The result is quite close to the original wave, although about half
of the power is lost after demodulating and filtering.  But that's not
a problem in practice, because much more of the power is lost in
transmitting and receiving the broadcast signal.  Since we have to
amplify the result anyway, another factor of 2 is not an issue.


\section{Sampling}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=3.0in]{figs/sampling3.pdf}}
\caption{Spectrum of a signal before (top) and after (bottom) sampling.}
\label{fig.sampling3}
\end{figure}

I explained amplitude modulation in part because it is interesting, but
mostly because it will help us understand sampling.
``Sampling'' is
the process of measuring an analog signal at a series of points in
time, usually with equal spacing.

For example, the WAV files we have used as examples were
recorded by sampling the output of a microphone using an analog-to-digital
converter (ADC).  The sampling rate for most of them is 44.1 kHz,
which is the standard rate for ``CD quality'' sound, or 48 kHz, which
is the standard for DVD sound.

At 48 kHz, the folding frequency is 24 kHz, which is higher than most
people can hear (see \url{https://en.wikipedia.org/wiki/Hearing_range}).

In most of these waves, each sample has 16 bits, so there
are $2^{16}$ distinct levels.  This ``bit depth'' turns out to be enough
that adding more bits does not improve the sound quality noticeably
(see \url{https://en.wikipedia.org/wiki/Digital_audio}).

Of course, applications other than audio signals might require higher
sampling rates, in order to capture higher frequencies, or higher
bit depth, in order to reproduce waveforms with more fidelity.

%TODO: refer to Vi Hart's video
%TODO: refer to the sampling video

To demonstrate the effect of the sampling process, I am going to start
with a wave sampled at 44.1 kHz and select samples from it at about 11 kHz.
This is not exactly the same as sampling from an analog signal, but
the effect is the same.

First I'll load a recording of a drum solo:

\begin{verbatim}
filename = '263868__kevcio__amen-break-a-160-bpm.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
\end{verbatim}

Figure~\ref{fig.sampling3} (top) shows the spectrum of this wave.
Now here's the function that samples from the wave:

\begin{verbatim}
def sample(wave, factor=4):
    ys = np.zeros(len(wave))
    ys[::factor] = wave.ys[::factor]
    return thinkdsp.Wave(ys, framerate=wave.framerate)
\end{verbatim}

I'll use it to select every fourth
element:

\begin{verbatim}
sampled = sample(wave, 4)
\end{verbatim}

The result has the same framerate as the original, but most of the
elements are zero.  If you play the sampled wave, it doesn't sound
very good.  The sampling process introduces high-frequency
components that were not in the original.

Figure~\ref{fig.sampling3} (bottom) shows the spectrum of the sampled
wave.  It contains four copies of the original spectrum (it looks like
five copies because one is split between the highest and lowest
frequencies).

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=4in]{figs/sampling9.pdf}}
\caption{The DFT of an impulse train is also an impulse train.}
\label{fig.sampling9}
\end{figure}

To understand where these copies come from, we can think of the
sampling process as multiplication with a series of impulses.  Instead
of using {\tt sample} to select every fourth element, we could use
this function to make a series of impulses, sometimes called an
{\bf impulse train}:

\begin{verbatim}
def make_impulses(wave, factor):
    ys = np.zeros(len(wave))
    ys[::factor] = 1
    ts = np.arange(len(wave)) / wave.framerate
    return thinkdsp.Wave(ys, ts, wave.framerate)
\end{verbatim}

And then multiply the original wave by the impulse train:

\begin{verbatim}
impulses = make_impulses(wave, 4)
sampled = wave * impulses
\end{verbatim}

The result is the same; it still doesn't sound very good, but now
we understand why.  Multiplication in the time domain corresponds
to convolution in the frequency domain.  When we multiply
by an impulse train, we are convolving with the DFT of an
impulse train.  As it turns out, the DFT of an impulse
train is also an impulse train.

Figure~\ref{fig.sampling9} shows two examples.
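We can check that claim numerically.  The following is a small
self-contained NumPy sketch (not code from {\tt chap11.ipynb}): an
impulse train with period 4 in a window of 64 samples has a DFT that
is nonzero at exactly 4 equally spaced frequencies.

```python
import numpy as np

# Impulse train with period 4 in a window of 64 samples.
N, factor = 64, 4
impulses = np.zeros(N)
impulses[::factor] = 1

# The DFT is another impulse train: it is nonzero only at
# multiples of N / factor, each with magnitude N / factor.
mags = np.abs(np.fft.fft(impulses))
peaks = np.nonzero(mags > 1e-9)[0]
print(peaks)        # [ 0 16 32 48]
```

The 16 impulses in the time domain transform into 4 impulses in the
frequency domain, consistent with the general pattern that more
impulses in time mean fewer impulses in frequency.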
The top row is
the impulse train in the example, with frequency 11,025 Hz.
The DFT is a train of 4 impulses, which is why we get 4 copies
of the spectrum.  The bottom row shows an impulse
train with a lower frequency, about 5512 Hz.  Its DFT is a train
of 8 impulses.  In general, more impulses in the time
domain correspond to fewer impulses in the frequency
domain.

In summary:

\begin{itemize}

\item We can think of sampling as multiplication by an impulse train.

\item Multiplying by an impulse train corresponds
to convolution with an impulse train in the frequency domain.

\item Convolution with an impulse train makes multiple copies of the
signal's spectrum.

\end{itemize}


\section{Aliasing}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=5.5in]{figs/sampling4.pdf}}
\caption{Spectrum of the drum solo (top), spectrum of the impulse
train (second row), spectrum of the sampled wave (third row),
after low-pass filtering (bottom).}
\label{fig.sampling4}
\end{figure}

In Section~\ref{am}, after demodulating an AM signal we got rid of
the extra copies of the spectrum by applying a low-pass filter.
We can do the same thing after sampling, but it turns out
not to be a perfect solution.

Figure~\ref{fig.sampling4} shows why not.  The top row is the spectrum
of the drum solo.  It contains high frequency components that extend
past 10 kHz.  When we sample this wave, we convolve the spectrum
with the impulse train (second row), which makes copies of the spectrum
(third row).
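Where the copies overlap, the overlapping components are genuinely
indistinguishable: at the sample times, a frequency above the folding
frequency takes exactly the same values as a frequency below it.
Here is a small NumPy demonstration, independent of the drum example
(the framerate and frequencies are chosen only for illustration):

```python
import numpy as np

framerate = 16                       # folding frequency is 8 Hz
ts = np.arange(framerate) / framerate

# 11 Hz is 8 + 3; 5 Hz is 8 - 3.  Sampled at 16 Hz, they alias
# to each other, so the two sets of samples are identical.
high = np.cos(2 * np.pi * 11 * ts)
low = np.cos(2 * np.pi * 5 * ts)
print(np.allclose(high, low))        # True
```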
The bottom row shows the result after applying a low-pass
filter with a cutoff at the folding frequency, 5512 Hz.

If we convert the result back to a wave, it is similar to the original
wave, but there are two problems:

\begin{itemize}

\item Because of the low-pass filter, the components above 5500 Hz
have been lost, so the result sounds muted.

\item Even the components below 5500 Hz are not quite right, because
they include contributions left over from the spectral copies we
tried to filter out.

\end{itemize}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=4.8in]{figs/sampling5.pdf}}
\caption{Spectrum of a bass guitar solo (top), its spectrum after
sampling (middle), and after filtering (bottom).}
\label{fig.sampling5}
\end{figure}

If the spectral copies overlap after sampling, we lose information
about the spectrum, and we won't be able to recover it.

But if the copies don't overlap, things work out pretty well.  As
a second example, I loaded a recording of a bass guitar solo.

Figure~\ref{fig.sampling5} shows its spectrum (top row), which contains
no visible energy above 5512 Hz.  The second row shows the spectrum of
the sampled wave, and the third row shows the spectrum after the
low-pass filter.  The amplitude is lower because we've filtered out
some of the energy, but the shape of the spectrum is almost exactly
what we started with.
And if we convert back to a wave, it sounds the same.


\section{Interpolation}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling6.pdf}}
\caption{A brick wall low-pass filter (right) and the corresponding
convolution window (left).}
\label{fig.sampling6}
\end{figure}

The low-pass filter I used in the last step is a so-called {\bf brick
wall filter}; frequencies above the cutoff are removed completely,
as if they hit a brick wall.

Figure~\ref{fig.sampling6} (right) shows what this filter looks like.
Of course, multiplication by this filter, in the frequency domain,
corresponds to convolution with a window in the time domain.  We can
find out what that window is by computing the inverse DFT of the
filter, which is shown in Figure~\ref{fig.sampling6} (left).

That function has a name; it is the normalized sinc function, or at
least a discrete approximation of it (see
\url{https://en.wikipedia.org/wiki/Sinc_function}):

\[ \mathrm{sinc}(x) = \frac{\sin \pi x}{\pi x} \]

When we apply the low-pass filter, we are convolving with a sinc
function.  We can think of this convolution as the sum of shifted,
scaled copies of the sinc function.

The value of sinc is 1 at 0 and 0 at every other integer
value of $x$.  When we shift the sinc function, we move the zero point.
When we scale it, we change the height at the zero point.
So when we add up the shifted, scaled copies, they interpolate
between the sampled points.

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling8.pdf}}
\caption{Example of interpolation by adding shifted, scaled copies of
the sinc function.}
\label{fig.sampling8}
\end{figure}

Figure~\ref{fig.sampling8} shows how that works using a short segment
of the bass guitar solo.  The line across the top is the original
wave.  The vertical gray lines show the sampled values.
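This interpolation is easy to sketch with plain NumPy, using {\tt
np.sinc}, which computes the normalized sinc function.  The following
is an illustrative sketch, not code from {\tt chap11.ipynb}; it uses a
synthetic sinusoid rather than the bass guitar recording:

```python
import numpy as np

def sinc_interpolate(samples, ts_in, ts_out):
    """Sum one shifted, scaled copy of sinc per sample."""
    dt = ts_in[1] - ts_in[0]             # sampling interval
    return sum(amp * np.sinc((ts_out - t) / dt)
               for amp, t in zip(samples, ts_in))

# Sample a 1 Hz sinusoid at 8 Hz, well above the Nyquist rate.
ts = np.arange(0, 2, 1/8)
samples = np.sin(2 * np.pi * ts)

# sinc is 1 at its own sample time and 0 at all the others, so
# the interpolation passes through every sampled value exactly.
reconstructed = sinc_interpolate(samples, ts, ts)
```

Between the sample times, the reconstruction of a finite segment is
only approximate near the edges; the exact recovery promised by the
Sampling Theorem assumes the sum runs over all samples.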
The thin
curves are the shifted, scaled copies of the sinc function.
The sum of these sinc functions is identical to the original wave.

Read that last sentence again, because it is more surprising than it
might seem.  Because we started with a signal that contained no energy
above 5512 Hz, and we sampled at 11,025 Hz, we were able to recover
the original spectrum exactly.

In this example, I started with a wave that had already been
sampled at 44,100 Hz, and I resampled it at 11,025 Hz.  After
resampling, the gap between the spectral copies is the sampling
rate, 11.025 kHz.  If the original signal contains components that
exceed half of the sampling rate, 5512 Hz, the copies overlap
and we lose information.

But if the signal is ``bandwidth limited'', that is, if it contains no
energy above 5512 Hz, the spectral copies don't overlap, we don't lose
information, and we can recover the original signal exactly.

This result is known as the Nyquist-Shannon sampling theorem (see
\url{https://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem}).

This example does not prove the Sampling Theorem, but I hope it
helps you understand what it says and why it works.

Notice that the argument I made does
not depend on the original sampling rate, 44.1 kHz.  The result
would be the same if the original had been sampled at a higher
frequency, or even if the original had been a continuous analog
signal: if we sample at framerate $f$, we can recover the original
signal exactly, as long as it contains no energy at frequencies
above $f/2$.


\section{Exercises}

Solutions to these exercises are in {\tt chap11soln.ipynb}.

\begin{exercise}
The code in this chapter is in {\tt chap11.ipynb}.
Read through
it and listen to the examples.
\end{exercise}


\begin{exercise}
Chris ``Monty'' Montgomery has an excellent video called ``D/A and A/D
| Digital Show and Tell''; it demonstrates the Sampling Theorem in
action, and presents lots of other excellent information about
sampling.  Watch it at
\url{https://www.youtube.com/watch?v=cIQ9IXSUzuM}.
\end{exercise}


\begin{exercise}
As we have seen, if you sample a signal at too low a
framerate, frequencies above the folding frequency get aliased.
Once that happens, it is no longer possible to filter out
these components, because they are indistinguishable from
lower frequencies.

It is a good idea to filter out these frequencies {\em before}
sampling; a low-pass filter used for this purpose is called
an {\bf anti-aliasing filter}.

Returning to the drum solo example, apply a low-pass filter
before sampling, then apply the low-pass filter again to remove
the spectral copies introduced by sampling.  The result should
be identical to the filtered signal.
\end{exercise}


\backmatter
\printindex

\end{document}