% LaTeX source for ``Think DSP: Digital Signal Processing for Programmers''
% Copyright 2014 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

\documentclass[12pt]{book}
\usepackage[width=5.5in,height=8.5in,
  hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{fancyhdr}
\usepackage[bookmarks]{hyperref}

\title{Think DSP}
\author{Allen B. Downey}

\newcommand{\thetitle}{Think DSP: Digital Signal Processing in Python}
\newcommand{\theversion}{0.10.4}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows in the HTML version
\setlinkstext
{\imgsrc[ALT="Previous"]{back.png}}
{\imgsrc[ALT="Up"]{up.png}}
{\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}

\frontmatter

\ifplastex

\else
\fi

\ifplastex
\usepackage{localdef}
\maketitle

\else

\newtheoremstyle{exercise}% name of the style to be used
{\topsep}% measure of space to leave above the theorem. E.g.: 3pt
{\topsep}% measure of space to leave below the theorem. E.g.: 3pt
{}% name of font to use in the body of the theorem
{0pt}% measure of space to indent
{\bfseries}% name of head font
{}% punctuation between head and body
{ }% space after theorem head; " " = normal interword space
{}% Manually specify head

\theoremstyle{exercise}
\newtheorem{exercise}{Exercise}[chapter]

\input{latexonly}

\begin{latexonly}

\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vspace{1in}

{\Large
Allen B. Downey\\
}

\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

\vfill

\end{flushright}

%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

Copyright \copyright ~2014 Allen B.
Downey.151152153\vspace{0.2in}154155\begin{flushleft}156Green Tea Press \\1579 Washburn Ave \\158Needham MA 02492159\end{flushleft}160161Permission is granted to copy, distribute, and/or modify this document162under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported163License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.164165\vspace{0.2in}166167\end{latexonly}168169170% HTMLONLY171172\begin{htmlonly}173174% TITLE PAGE FOR HTML VERSION175176{\Large \thetitle}177178{\large Allen B. Downey}179180Version \theversion181182\vspace{0.25in}183184Copyright 2012 Allen B. Downey185186\vspace{0.25in}187188Permission is granted to copy, distribute, and/or modify this document189under the terms of the Creative Commons Attribution-NonCommercial 3.0190Unported License, which is available at191\url{http://creativecommons.org/licenses/by-nc/3.0/}.192193\setcounter{chapter}{-1}194195\end{htmlonly}196197\fi198% END OF THE PART WE SKIP FOR PLASTEX199200\chapter{Preface}201\label{preface}202203Signal processing is one of my favorite topics. It is useful204in many areas of science and engineering, and if you understand205the fundamental ideas, it provides insight into many things206we see in the world, and especially the things we hear.207208But unless you studied electrical or mechanical engineering, you209probably haven't had a chance to learn about signal processing. The210problem is that most books (and the classes that use them) present the211material bottom-up, starting with mathematical abstractions like212phasors. And they tend to be theoretical, with few applications and213little apparent relevance.214215The premise of this book is that if you know how to program, you216can use that skill to learn other things, and have fun doing it.217218With a programming-based approach, I can present the most important219ideas right away. 
By the end of the first chapter, you can analyze
sound recordings and other signals, and generate new sounds. Each
chapter introduces a new technique and an application you can
apply to real signals. At each step you learn how to use a
technique first, and then how it works.

This approach is more practical and, I hope you'll agree, more fun.


\section{Who is this book for?}

The examples and supporting code for this book are in Python. You
should know core Python and you should be
familiar with object-oriented features, at least using objects if not
defining your own.

If you are not already familiar with Python, you might want to start
with my other book, {\it Think Python}, which is an introduction to
Python for people who have never programmed, or Mark
Lutz's {\it Learning Python}, which might be better for people with
programming experience.

I use NumPy and SciPy extensively. If you are familiar with them
already, that's great, but I will also explain the functions
and data structures I use.

I assume that the reader knows basic mathematics, including complex
numbers. You don't need much calculus; if you understand the concepts
of integration and differentiation, that will do.
I use some linear algebra, but I will explain it as we
go along.


\section{Using the code}
\label{code}

The code and sound samples used in this book are available from
\url{https://github.com/AllenDowney/ThinkDSP}. Git is a version
control system that allows you to keep track of the files that
make up a project. A collection of files under Git's control is
called a {\bf repository}.
GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
\index{repository}
\index{Git}
\index{GitHub}

The GitHub homepage for my repository provides several ways to
work with the code:

\begin{itemize}

\item You can create a copy of my repository
on GitHub by pressing the {\sf Fork} button. If you don't already
have a GitHub account, you'll need to create one. After forking, you'll
have your own repository on GitHub that you can use to keep track
of code you write while working on this book. Then you can
clone the repo, which means that you copy the files
to your computer.
\index{fork}

\item Or you could clone
my repository. You don't need a GitHub account to do this, but you
won't be able to write your changes back to GitHub.
\index{clone}

\item If you don't want to use Git at all, you can download the files
in a Zip file using the button in the lower-right corner of the
GitHub page.

\end{itemize}

All of the code is written to work in both Python 2 and Python 3
with no translation.

I developed this book using Anaconda from
Continuum Analytics, which is a free Python distribution that includes
all the packages you'll need to run the code (and lots more).
I found Anaconda easy to install. By default it does a user-level
installation, not system-level, so you don't need administrative
privileges. And it supports both Python 2 and Python 3.
You can
download Anaconda from \url{http://continuum.io/downloads}.
\index{Anaconda}

If you don't want to use Anaconda, you will need the following
packages:

\begin{itemize}

\item NumPy for basic numerical computation, \url{http://www.numpy.org/};
\index{NumPy}

\item SciPy for scientific computation,
\url{http://www.scipy.org/};
\index{SciPy}

\item matplotlib for visualization, \url{http://matplotlib.org/}.
\index{matplotlib}

\end{itemize}

Although these are commonly used packages, they are not included with
all Python installations, and they can be hard to install in some
environments. If you have trouble installing them, I
recommend using Anaconda or one of the other Python distributions
that include these packages.
\index{installation}

Most exercises use Python scripts, but some also use the IPython
notebook. If you have not used IPython notebook before, I suggest
you start with the documentation at
\url{http://ipython.org/ipython-doc/stable/notebook/notebook.html}.
\index{IPython}

Good luck, and have fun!


\section*{Contributor List}

If you have a suggestion or correction, please send email to
{\tt downey@allendowney.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).
\index{contributors}

If you include at least part of the sentence the
error appears in, that makes it easy for me to search.
Page and
section numbers are fine, too, but not as easy to work with.
Thanks!

\small

\begin{itemize}

\item Before I started writing, my thoughts about this book
benefited from conversations with Boulos Harb at Google and
Aurelio Ramos, formerly at Harmonix Music Systems.

\item During the Fall 2013 semester, Nathan Lintz and Ian Daniher
worked with me on an independent study project and helped me with
the first draft of this book.

\item On Reddit's DSP forum, the anonymous user RamjetSoundwave
helped me fix a problem with my implementation of Brownian Noise.
And andodli found a typo.

\item In Spring 2015 I had the pleasure of teaching this material
along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy.
Both made many suggestions and corrections.

\item Silas Gyger corrected an arithmetic error.

% ENDCONTRIB

\end{itemize}

\normalsize

\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter


\chapter{Sounds and signals}
\label{sounds}

A {\bf signal} represents a quantity that varies in time,
or space, or both. That definition is pretty abstract, so let's start
with a concrete example: sound. Sound is variation in air pressure.
A sound signal represents variations in air pressure over time.

A microphone is a device that measures these variations and generates
an electrical signal that represents sound. A speaker is a device
that takes an electrical signal and produces sound.
Microphones and speakers are called {\bf transducers} because they
transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for
synthesizing, transforming, and analyzing signals.
I will focus on
sound signals, but the same methods apply to electronic signals,
mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like
elevation along a hiking trail. And they apply to signals in more
than one dimension, like an image, which you can think of as a signal
that varies in two-dimensional space. Or a movie, which is
a signal that varies in two-dimensional space {\it and} time.

But we start with simple one-dimensional sound.

The code for this chapter is in {\tt chap01.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp01}.


\section{Periodic signals}
\label{violin}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds1.eps}}
\caption{Segment from a recording of a bell.}
\label{fig.sounds1}
\end{figure}

We'll start with {\bf periodic signals}, which are signals that
repeat themselves after some period of time. For example, if you
strike a bell, it vibrates and generates sound. If you record
that sound and plot the transduced signal, it looks like
Figure~\ref{fig.sounds1}.

This signal resembles a {\bf sinusoid}, which means it has the same
shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration
to show three full periods, also known as {\bf cycles}.
The duration of each cycle is about 2.3 ms.

The {\bf frequency} of a signal is the number of cycles
per second, which is the inverse of the period.
The units of frequency are cycles per second, or {\bf Hertz},
abbreviated ``Hz''.

The frequency of this signal is about 439 Hz, slightly lower than 440
Hz, which is the standard tuning pitch for orchestral music. The
musical name of this note is A, or more specifically, A4.
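Since frequency is the inverse of the period, we can check these numbers
with a quick computation (plain Python; the 2.28 ms period is an
approximate reading from the plot, not a value computed from the
recording):

\begin{verbatim}
period = 2.28e-3       # approximate cycle duration in seconds
freq = 1 / period
print(round(freq, 1))  # 438.6, just below the 440 Hz tuning pitch
\end{verbatim}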
If you are
not familiar with ``scientific pitch notation'', the numerical suffix
indicates which octave the note is in. A4 is the A above middle C.
A5 is one octave higher. See
\url{http://en.wikipedia.org/wiki/Scientific_pitch_notation}.

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds2.eps}}
\caption{Segment from a recording of a violin.}
\label{fig.sounds2}
\end{figure}

A tuning fork generates a sinusoid because the vibration of the tines
is a form of simple harmonic motion. Most musical instruments
produce periodic signals, but the shape of these signals is not
sinusoidal. For example, Figure~\ref{fig.sounds2} shows a segment
from a recording of a violin playing
Boccherini's String Quintet No. 5 in E, 3rd
movement.

%\footnote{The recording is from
% \url{http://www.freesound.org/people/jcveliz/sounds/92002/}.
%I identified the piece using \url{http://www.musipedia.org}.}
% Parson's code: DUUDDUURDR

Again we can see that the signal is periodic, but the shape of the
signal is more complex. The shape of a periodic signal is called
the {\bf waveform}. Most musical instruments produce waveforms more
complex than a sinusoid. The shape of the waveform determines the
musical {\bf timbre}, which is our perception of the quality of the
sound.
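We can imitate this kind of complex waveform with a sketch in plain
NumPy (not {\tt thinkdsp}): adding harmonics to a 440 Hz fundamental
yields a waveform that is no longer sinusoidal in shape, but is still
periodic with the period of the fundamental. The framerate here is
chosen so that one period is exactly 100 samples:

\begin{verbatim}
import numpy as np

framerate = 44000    # so one 440 Hz cycle is exactly 100 samples
ts = np.arange(framerate) / framerate
ys = sum(np.sin(2 * np.pi * 440 * k * ts) / k for k in (1, 2, 3))

samples_per_period = framerate // 440
shifted = ys[samples_per_period:samples_per_period + 1000]
print(np.allclose(ys[:1000], shifted))
# True: the compound waveform repeats with the fundamental's period
\end{verbatim}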
People usually perceive complex waveforms as richer, warmer, and
more interesting than sinusoids.


\section{Spectral decomposition}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds3.eps}}
\caption{Spectrum of a segment from the violin recording.}
\label{fig.sounds3}
\end{figure}

The most important topic in this book is {\bf spectral decomposition},
which is the idea that any signal can be expressed as the sum of
sinusoids with different frequencies.

The most important mathematical idea in this book is the {\bf discrete
Fourier transform}, or {\bf DFT}, which takes a signal and produces
its {\bf spectrum}. The spectrum is the set of sinusoids that add up to
produce the signal.

And the most important algorithm in this book is the {\bf Fast
Fourier transform}, or {\bf FFT}, which is an efficient way to
compute the DFT.

For example, Figure~\ref{fig.sounds3} shows the spectrum of the violin
recording in Figure~\ref{fig.sounds2}. The x-axis is the range of
frequencies that make up the signal. The y-axis shows the strength
or {\bf amplitude} of each frequency component.

The lowest frequency component is called the {\bf fundamental
frequency}.
The fundamental frequency of this signal is near 440 Hz
(actually a little lower, or ``flat'').

In this signal the fundamental frequency has the largest amplitude,
so it is also the {\bf dominant frequency}.
Normally the perceived pitch of a sound is determined by the
fundamental frequency, even if it is not dominant.

The other spikes in the spectrum are at frequencies 880, 1320, 1760, and
2200, which are integer multiples of the fundamental.
These components are called {\bf harmonics} because they are
musically harmonious with the fundamental:

\begin{itemize}

\item 880 is the frequency of A5, one octave higher than the fundamental.

\item 1320 is approximately E6, which is a major fifth above A5.
If you are not familiar with musical intervals like ``major fifth'', see
\url{https://en.wikipedia.org/wiki/Interval_(music)}.

\item 1760 is A6, two octaves above the fundamental.

\item 2200 is approximately C$\sharp$7, which is a major third
above A6.

\end{itemize}

These harmonics make up the notes of an A major
chord, although not all in the same octave. Some of them are only
approximate because the notes that make up Western music have been
adjusted for {\bf equal temperament} (see
\url{http://en.wikipedia.org/wiki/Equal_temperament}).

Given the harmonics and their amplitudes, you can reconstruct the
signal by adding up sinusoids.
Next we'll see how.


\section{Signals}

I wrote a Python module called {\tt thinkdsp.py} that contains classes
and functions for working with signals and spectrums\footnote{The
plural of ``spectrum'' is often written ``spectra'', but I prefer
to use standard English plurals. If you are familiar with ``spectra'',
I hope my choice doesn't sound too strange.}.
You
will find it in the repository for this book (see Section~\ref{code}).

To represent signals, {\tt thinkdsp} provides a class called
{\tt Signal}, which is the parent class for several signal types,
including {\tt Sinusoid}, which represents both sine and cosine
signals.

{\tt thinkdsp} provides functions to create sine and cosine signals:

\begin{verbatim}
cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)
\end{verbatim}

{\tt freq} is frequency in Hz. {\tt amp} is amplitude in unspecified
units where 1.0 is defined as the largest amplitude we can record or
play back.

{\tt offset} is a {\bf phase offset} in radians. Phase offset
determines where in the period the signal starts. For example, a
sine signal with {\tt offset=0} starts at $\sin 0$, which is 0.
With {\tt offset=pi/2} it starts at $\sin \pi/2$, which is 1. A cosine
signal with {\tt offset=0} starts at $\cos 0$, which is 1. In fact, a
cosine signal with {\tt offset=0} is identical to a sine signal with
{\tt offset=pi/2}.

Signals have an \verb"__add__" method, so you can use the {\tt +}
operator to add them:

\begin{verbatim}
mix = sin_sig + cos_sig
\end{verbatim}

The result is a {\tt SumSignal}, which represents the sum of two
or more signals.

A Signal is basically a Python representation of a mathematical
function. Most signals are defined for all values of {\tt t},
from negative infinity to infinity.

You can't do much with a Signal until you evaluate it. In this
context, ``evaluate'' means taking a sequence of points in time, {\tt
ts}, and computing the corresponding values of the signal, {\tt ys}.
I represent {\tt ts} and {\tt ys} using NumPy arrays and encapsulate
them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in
time.
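Stripped of the class machinery, evaluating a signal just means
computing {\tt ys} from {\tt ts}. Here is a sketch in plain NumPy (not
the actual {\tt thinkdsp} implementation) that also confirms the
phase-offset claim above, since $\sin(\theta + \pi/2) = \cos \theta$:

\begin{verbatim}
import numpy as np

freq = 440
ts = np.arange(0, 0.01, 1/11025.0)   # sample times in seconds
ys = np.sin(2 * np.pi * freq * ts)   # corresponding signal values

# a cosine with offset=0 equals a sine with offset=pi/2
same = np.allclose(np.cos(2 * np.pi * freq * ts),
                   np.sin(2 * np.pi * freq * ts + np.pi / 2))
print(same)                          # True
\end{verbatim}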
Each point in time is called a {\bf frame} (a term borrowed
from movies and video). The measurement itself is called a
{\bf sample}, although ``frame'' and ``sample'' are sometimes
used interchangeably.

{\tt Signal} provides \verb"make_wave", which returns a new
Wave object:

\begin{verbatim}
wave = mix.make_wave(duration=0.5, start=0, framerate=11025)
\end{verbatim}

{\tt duration} is the length of the Wave in seconds. {\tt start} is
the start time, also in seconds. {\tt framerate} is the (integer)
number of frames per second, which is also the number of samples
per second.

11,025 frames per second is one of several framerates commonly used in
audio file formats, including Waveform Audio File (WAV) and mp3.

This example evaluates the signal from {\tt t=0} to {\tt t=0.5} at
5,513 equally-spaced frames (because 5,513 is half of 11,025, rounded
up). The time between frames, or {\bf timestep}, is {\tt 1/11025}
seconds, or about 91 $\mu$s.

{\tt Wave} provides a {\tt plot} method that uses {\tt pyplot}.
You can plot the wave like this:

\begin{verbatim}
wave.plot()
pyplot.show()
\end{verbatim}

{\tt pyplot} is part of {\tt matplotlib}; it is included in many
Python distributions, or you might have to install it.

\begin{figure}
% sounds4.py
\centerline{\includegraphics[height=2.5in]{figs/sounds4.eps}}
\caption{Segment from a mixture of two sinusoid signals.}
\label{fig.sounds4}
\end{figure}

At {\tt freq=440} there are 220 periods in 0.5 seconds, so this plot
would look like a solid block of color. To zoom in on a small number
of periods, we can use {\tt segment}, which copies a segment of a Wave
and returns a new wave:

\begin{verbatim}
period = mix.period
segment = wave.segment(start=0, duration=period*3)
\end{verbatim}

{\tt period} is a property of a Signal; it returns the period in seconds.

{\tt start} and {\tt duration} are in seconds.
This example copies
the first three periods from {\tt mix}. The result is a Wave object.

If we plot {\tt segment}, it looks like Figure~\ref{fig.sounds4}.
This signal contains two frequency components, so it is more
complicated than the signal from the tuning fork, but less complicated
than the violin.


\section{Reading and writing Waves}

{\tt thinkdsp} provides \verb"read_wave", which reads a WAV
file and returns a Wave:

\begin{verbatim}
violin_wave = thinkdsp.read_wave('input.wav')
\end{verbatim}

And {\tt Wave} provides {\tt write}, which writes a WAV file:

\begin{verbatim}
wave.write(filename='output.wav')
\end{verbatim}

You can listen to the Wave with any media player that plays WAV
files. On UNIX systems, I use {\tt aplay}, which is simple, robust,
and included in many Linux distributions.

{\tt thinkdsp} also provides \verb"play_wave", which runs
the media player as a subprocess:

\begin{verbatim}
thinkdsp.play_wave(filename='output.wav', player='aplay')
\end{verbatim}

It uses {\tt aplay} by default, but you can provide the
name of another player.


\section{Spectrums}
\label{spectrums}

{\tt Wave} provides \verb"make_spectrum", which returns a
{\tt Spectrum}:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

And {\tt Spectrum} provides {\tt plot}:

\begin{verbatim}
spectrum.plot()
thinkplot.show()
\end{verbatim}

{\tt thinkplot} is a module I wrote to provide wrappers around some of
the functions in {\tt pyplot}.
It is included in the
Git repository for this book (see Section~\ref{code}).

{\tt Spectrum} provides three methods that modify the spectrum:

\begin{itemize}

\item \verb"low_pass" applies a low-pass filter, which means that
components above a given cutoff frequency are attenuated (that is,
reduced in magnitude) by a factor.

\item \verb"high_pass" applies a high-pass filter, which means that
it attenuates components below the cutoff.

\item \verb"band_stop" attenuates components in the band of
frequencies between two cutoffs.

\end{itemize}

This example attenuates all frequencies above 600 Hz by 99\%:

\begin{verbatim}
spectrum.low_pass(cutoff=600, factor=0.01)
\end{verbatim}

A low-pass filter removes bright, high-frequency sounds, so
the result sounds muffled and darker. To hear what it sounds
like, you can convert the Spectrum back to a Wave, and then play it.

\begin{verbatim}
wave = spectrum.make_wave()
wave.play('temp.wav')
\end{verbatim}

The {\tt play} method writes the wave to a file and then plays it.
If you use IPython notebooks, you can use \verb"make_audio", which
makes an Audio widget that plays the sound.


\section{Wave objects}

\begin{figure}
% http://yuml.me/edit/5294b377
% pdftops -eps diagram1.pdf
\centerline{\includegraphics[width=3.5in]{figs/diagram1.eps}}
\caption{Relationships among the classes in {\tt thinkdsp}.}
\label{fig.diagram1}
\end{figure}

There is nothing very complicated in {\tt thinkdsp.py}. Most
of the functions it provides are thin wrappers around functions
from NumPy and SciPy.

The primary classes in {\tt thinkdsp} are Signal, Wave, and Spectrum.
Given a Signal, you can make a Wave. Given a Wave, you can
make a Spectrum, and vice versa.
These relationships are shown
in Figure~\ref{fig.diagram1}.

A Wave object contains three attributes: {\tt ys} is a NumPy array
that contains the values in the signal; {\tt ts} is an array of the
times where the signal was evaluated or sampled; and {\tt
framerate} is the number of samples per unit of time. The
unit of time is usually seconds, but it doesn't have to be. In
one of my examples, it's days.

Wave also provides three read-only properties: {\tt start},
{\tt end}, and {\tt duration}. If you modify {\tt ts}, these
properties change accordingly.

To modify a wave, you can access the {\tt ts} and {\tt ys} directly.
For example:

\begin{verbatim}
wave.ys *= 2
wave.ts += 1
\end{verbatim}

The first line scales the wave by a factor of 2, making
it louder. The second line shifts the wave in time, making it
start 1 second later.

But Wave provides methods that perform many common operations.
For example, the same two transformations could be written:

\begin{verbatim}
wave.scale(2)
wave.shift(1)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://think-dsp.com/thinkdsp.html}.


\section{Signal objects}
\label{sigobs}

Signal is a parent class that provides functions common to all
kinds of signals, like \verb"make_wave". Child classes inherit
these methods and provide {\tt evaluate}, which evaluates the
signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this
definition:

\begin{verbatim}
class Sinusoid(Signal):

    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func
\end{verbatim}

The parameters of \verb"__init__" are:

\begin{itemize}

\item {\tt freq}: frequency in cycles per second, or Hz.

\item {\tt amp}: amplitude.
The units of amplitude are arbitrary,
usually chosen so 1.0 corresponds to the maximum input from a
microphone or maximum output to a speaker.

\item {\tt offset}: indicates where in its period the signal starts;
{\tt offset} is in units of radians, for reasons I explain below.

\item {\tt func}: a Python function used
to evaluate the signal at a particular point in time. It is
usually either {\tt np.sin} or {\tt np.cos}, yielding a sine or
cosine signal.

\end{itemize}

Like many init methods, this one just tucks the parameters away for
future use.

Signal provides \verb"make_wave", which looks like
this:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)
\end{verbatim}

{\tt start} and {\tt duration} are the start time and duration
in seconds. {\tt framerate} is the number of frames (samples)
per second.

{\tt n} is the number of samples, and {\tt ts} is a NumPy array
of sample times.

To compute the {\tt ys}, \verb"make_wave" invokes {\tt evaluate},
which is provided by {\tt Sinusoid}:

\begin{verbatim}
def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys
\end{verbatim}

Let's unwind this function one step at a time:

\begin{enumerate}

\item {\tt self.freq} is frequency in cycles per second, and each
element of {\tt ts} is a time in seconds, so their product is the
number of cycles since the start time.

\item {\tt PI2} is a constant that stores $2 \pi$. Multiplying by
{\tt PI2} converts from cycles to {\bf phase}. You can think of
phase as ``cycles since the start time'' expressed in radians.
Each
cycle is $2 \pi$ radians.

\item {\tt self.offset} is the phase when $t=0$.
It has the effect of shifting the signal left or right in time.

\item If {\tt self.func} is {\tt np.sin} or {\tt np.cos}, the result is a
value between $-1$ and $+1$.

\item Multiplying by {\tt self.amp} yields a signal that ranges from
{\tt -self.amp} to {\tt +self.amp}.

\end{enumerate}

In math notation, {\tt evaluate} is written like this:
%
\[ y = A \cos (2 \pi f t + \phi_0) \]
%
where $A$ is amplitude, $f$ is frequency, $t$ is time, and $\phi_0$
is the phase offset. It may seem like I wrote a lot of code
to evaluate one simple expression, but as we'll see, this code
provides a framework for dealing with all kinds of signals, not
just sinusoids.


\section{Exercises}

Before you begin these exercises, you should download the code
for this book, following the instructions in Section~\ref{code}.

Solutions to these exercises are in {\tt chap01soln.ipynb}.

\begin{exercise}
If you have IPython, load {\tt chap01.ipynb}, read through it, and run
the examples. You can also view this notebook at
\url{http://tinyurl.com/thinkdsp01}.
\end{exercise}


\begin{exercise}
Go to \url{http://freesound.org} and download a sound sample that
includes music, speech, or other sounds that have a well-defined pitch.
Select a roughly half-second segment where the pitch is
constant. Compute and plot the spectrum of the segment you selected.
What connection can you make between the timbre of the sound and the
harmonic structure you see in the spectrum?

Use \verb"high_pass", \verb"low_pass", and \verb"band_stop" to
filter out some of the harmonics. Then convert the spectrum back
to a wave and listen to it.
How does the sound relate to the
changes you made in the spectrum?
\end{exercise}


\begin{exercise}
Synthesize a compound signal by creating SinSignal and CosSignal
objects and adding them up. Evaluate the signal to get a Wave,
and listen to it. Compute its Spectrum and plot it.
What happens if you add frequency
components that are not multiples of the fundamental?
\end{exercise}


\begin{exercise}
Write a function called {\tt stretch} that takes a Wave and a stretch
factor and speeds up or slows down the wave by modifying {\tt ts} and
{\tt framerate}. Hint: it should only take two lines of code.
\end{exercise}


\chapter{Harmonics}
\label{harmonics}

In this chapter I present several new waveforms; we will look at
their spectrums to understand their {\bf harmonic structure}, which is
the set of sinusoids they are made up of.

I'll also introduce one of the most important phenomena in digital
signal processing: aliasing. And I'll explain a little more about how
the Spectrum class works.

The code for this chapter is in {\tt chap02.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp02}.


\section{Triangle waves}
\label{triangle}

A sinusoid contains only one frequency component, so its spectrum
has only one peak. More complicated waveforms, like the
violin recording, yield DFTs with many peaks. In this section we
investigate the relationship between waveforms and their spectrums.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-1.eps}}
\caption{Segment of a triangle signal at 200 Hz.}
\label{fig.triangle.200.1}
\end{figure}

I'll start with a triangle waveform, which is like a straight-line
version of a sinusoid.
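Before looking at the triangle wave, it is worth verifying the baseline
claim numerically: the spectrum of a pure sinusoid has a single peak,
at the sinusoid's frequency. This sketch uses NumPy's FFT functions
directly rather than the {\tt Spectrum} class:

\begin{verbatim}
import numpy as np

framerate = 10000
ts = np.arange(framerate) / framerate   # one second of sample times
ys = np.sin(2 * np.pi * 200 * ts)       # a 200 Hz sinusoid

# with a one-second wave, bin k of the DFT corresponds to k Hz
amps = np.abs(np.fft.rfft(ys))
print(np.argmax(amps))                  # 200: the single peak
\end{verbatim}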
Figure~\ref{fig.triangle.200.1} shows a
triangle waveform with frequency 200 Hz.

To generate a triangle wave, you can use {\tt thinkdsp.TriangleSignal}:

\begin{verbatim}
class TriangleSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt TriangleSignal} inherits \verb"__init__" from {\tt Sinusoid},
so it takes the same arguments: {\tt freq}, {\tt amp}, and {\tt offset}.

The only difference is {\tt evaluate}. As we saw before,
{\tt ts} is the sequence of sample times where we want to
evaluate the signal.

There are many ways to generate a triangle wave. The details
are not important, but here's how {\tt evaluate} works:

\begin{enumerate}

\item {\tt cycles} is the number of cycles since the start time.
{\tt np.modf} splits the number of cycles into the fraction
part, stored in {\tt frac}, and the integer part, which is
ignored\footnote{Using an underscore as a variable name is a convention that
means, ``I don't intend to use this value.''}.

\item {\tt frac} is a sequence that ramps from 0 to 1 with the given
frequency.
Subtracting 0.5 yields values between -0.5 and 0.5.
Taking the absolute value yields a waveform that zig-zags between
0.5 and 0.

\item {\tt unbias} shifts the waveform down so it is centered at 0; then
{\tt normalize} scales it to the given amplitude, {\tt amp}.

\end{enumerate}

Here's the code that generates Figure~\ref{fig.triangle.200.1}:

\begin{verbatim}
signal = thinkdsp.TriangleSignal(200)
signal.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-2.eps}}
\caption{Spectrum of a triangle signal at 200 Hz.}
\label{fig.triangle.200.2}
\end{figure}

Next we can use the Signal to make a Wave, and use the Wave to
make a Spectrum:

\begin{verbatim}
wave = signal.make_wave(duration=0.5, framerate=10000)
spectrum = wave.make_spectrum()
spectrum.plot()
\end{verbatim}

Figure~\ref{fig.triangle.200.2} shows the result. As expected, the
highest peak is at the fundamental frequency, 200 Hz, and there
are additional peaks at harmonic frequencies, which are integer
multiples of 200.

But one surprise is that there are no peaks at the even multiples:
400, 800, etc. The harmonics of a triangle wave are all
odd multiples of the fundamental frequency, in this example
600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the
amplitude and frequency of the harmonics. Their amplitude drops off
in proportion to frequency squared. For example, the frequency ratio
of the first two harmonics (200 and 600 Hz) is 3, and the amplitude
ratio is approximately 9. The frequency ratio of the next two
harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is
approximately $1.7^2 = 2.9$.
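This falloff can be checked numerically. Here is a stand-alone sketch
(plain NumPy, not the book's {\tt thinkdsp} code) that rebuilds the
200 Hz triangle wave and compares the amplitudes of its first two
harmonics:

```python
import numpy as np

# Stand-alone sketch (plain NumPy, not the book's thinkdsp code):
# rebuild the 200 Hz triangle wave and compare harmonic amplitudes.
framerate = 10000
freq = 200
ts = np.arange(0, 0.5, 1 / framerate)

# Same recipe as TriangleSignal.evaluate: fractional cycles -> zigzag.
frac, _ = np.modf(freq * ts)
ys = np.abs(frac - 0.5)
ys = ys - ys.mean()            # unbias: center the waveform at 0
ys = ys / np.abs(ys).max()     # normalize to amplitude 1

hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(len(ys), 1 / framerate)

def amp_at(f):
    """Amplitude of the spectrum component nearest frequency f."""
    return np.abs(hs[np.argmin(np.abs(fs - f))])

ratio = amp_at(200) / amp_at(600)
print(ratio)  # approximately 9, consistent with a 1/f**2 falloff
```

The frequency ratio of these harmonics is 3, so an amplitude ratio near
$3^2 = 9$ is what the $1/f^2$ relationship predicts.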
This relationship is called the
{\bf harmonic structure}.


\section{Square waves}
\label{square}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-1.eps}}
\caption{Segment of a square signal at 100 Hz.}
\label{fig.square.100.1}
\end{figure}

{\tt thinkdsp} also provides {\tt SquareSignal}, which represents
a square signal. Here's the class definition:

\begin{verbatim}
class SquareSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys
\end{verbatim}

Like {\tt TriangleSignal}, {\tt SquareSignal} inherits
\verb"__init__" from {\tt Sinusoid}, so it takes the same
parameters.

And the {\tt evaluate} method is similar. Again, {\tt cycles} is
the number of cycles since the start time, and {\tt frac} is the
fractional part, which ramps from 0 to 1 each period.

{\tt unbias} shifts {\tt frac} so it ramps from -0.5 to 0.5,
then {\tt np.sign} maps the negative values to -1 and the
positive values to 1.
Multiplying by {\tt amp} yields a square
wave that jumps between {\tt -amp} and {\tt amp}.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-2.eps}}
\caption{Spectrum of a square signal at 100 Hz.}
\label{fig.square.100.2}
\end{figure}

Figure~\ref{fig.square.100.1} shows three periods of a square
wave with frequency 100 Hz,
and Figure~\ref{fig.square.100.2} shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics,
which is why there are peaks at 300, 500, and 700 Hz, etc.
But the amplitude of the harmonics drops off more slowly.
Specifically, amplitude drops in proportion to frequency (not frequency
squared).

The exercises at the end of this chapter give you a chance to
explore other waveforms and other harmonic structures.


\section{Aliasing}
\label{aliasing}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-1100-2.eps}}
\caption{Spectrum of a triangle signal at 1100 Hz sampled at
10,000 frames per second.}
\label{fig.triangle.1100.2}
\end{figure}

I have a confession. I chose the examples in the previous section
carefully to avoid showing you something confusing. But now it's
time to get confused.

Figure~\ref{fig.triangle.1100.2} shows the spectrum of a triangle wave
at 1100 Hz, sampled at 10,000 frames per second. The harmonics
of this wave should be at 3300, 5500, 7700, and 9900 Hz.

In the figure, there are peaks at 1100 and 3300 Hz, as expected, but
the third peak is at 4500, not 5500 Hz. The
fourth peak is at 2300, not 7700 Hz. And if you look closely, the
peak that should be at 9900 is actually at 100 Hz. What's going on?

The problem is that when you evaluate the signal at
discrete points in time, you lose information about what happened
between samples.
For low frequency components, that's not a
problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second,
you only have two samples per period. That turns out to be enough,
just barely, but if the frequency is higher, it's not.

To see why, let's generate cosine signals at 4500 and 5500 Hz,
and sample them at 10,000 frames per second:

\begin{verbatim}
framerate = 10000

signal = thinkdsp.CosSignal(4500)
duration = signal.period*5
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()

signal = thinkdsp.CosSignal(5500)
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/aliasing1.eps}}
\caption{Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames
per second. They are identical.}
\label{fig.aliasing1}
\end{figure}

Figure~\ref{fig.aliasing1} shows the result. I plotted the
samples using vertical lines, to make it easier to compare the
two Waves, and I offset them slightly in time. The problem
should be clear: the two Waves are exactly the same!

When we sample a 5500 Hz signal at 10,000 frames per second, the
result is indistinguishable from a 4500 Hz signal.
For the same reason, a 7700 Hz signal is indistinguishable
from 2300 Hz, and a 9900 Hz signal is indistinguishable from 100 Hz.

This effect is called {\bf aliasing} because when the high frequency
signal is sampled, it appears to be a low frequency signal.

In this example, the highest frequency we can measure is 5000 Hz,
which is half the sampling rate. Frequencies above 5000 Hz are folded
back below 5000 Hz, which is why this threshold is sometimes called
the ``folding frequency''. It is sometimes also called the {\bf
Nyquist frequency}.
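The folding arithmetic can be summarized in a small helper function
(a sketch of mine, not part of {\tt thinkdsp}):

```python
# A small helper (a sketch, not part of thinkdsp) that computes the
# apparent frequency of a sinusoid after sampling, by folding at the
# Nyquist frequency.
def alias(freq, framerate):
    freq %= framerate                # remove whole multiples of the framerate
    if freq > framerate / 2:
        freq = framerate - freq      # fold frequencies above the Nyquist
    return freq

# Harmonics of the 1100 Hz triangle wave, sampled at 10,000 frames/s:
for harmonic in [1100, 3300, 5500, 7700, 9900]:
    print(harmonic, '->', alias(harmonic, 10000))
```

Running this reproduces the peaks in the figure: 5500 maps to 4500,
7700 to 2300, and 9900 to 100.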
See
\url{http://en.wikipedia.org/wiki/Nyquist_frequency}.

The folding pattern continues if the aliased frequency goes below
zero. For example, the 5th harmonic of the 1100 Hz triangle wave is
at 12,100 Hz. Folded at 5000 Hz, it would appear at -2100 Hz, but it
gets folded again at 0 Hz, back to 2100 Hz. In fact, you can see a
small peak at 2100 Hz in Figure~\ref{fig.triangle.1100.2}, and the next
one at 4300 Hz.


\section{Computing the spectrum}

We have seen the Wave method \verb"make_spectrum" several times.
Here is the implementation (leaving out some details we'll get
to later):

\begin{verbatim}
from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)
\end{verbatim}

The parameter {\tt self} is a Wave object. {\tt n} is the number
of samples in the wave, and {\tt d} is the inverse of the
frame rate, which is the time between samples.

{\tt np.fft} is the NumPy module that provides functions related
to the {\bf Fast Fourier Transform} (FFT), which is an efficient
algorithm that computes the Discrete Fourier Transform (DFT).

%TODO: add a forward reference to full fft
\verb"make_spectrum" uses {\tt rfft}, which stands for ``real
FFT'', because the Wave contains real values, not complex. Later
we'll see the full FFT, which can handle complex signals.
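As a quick stand-alone sanity check of these two calls (my sketch, not
the book's {\tt Wave} class): one second of a 3 Hz cosine should put
all of its energy in the 3 Hz bin reported by {\tt rfftfreq}.

```python
import numpy as np
from numpy.fft import rfft, rfftfreq

# Stand-alone sanity check of the two calls used by make_spectrum:
# one second of a 3 Hz cosine should peak in the 3 Hz bin.
framerate = 64
ts = np.arange(framerate) / framerate
ys = np.cos(2 * np.pi * 3 * ts)

hs = rfft(ys)                          # complex amplitudes
fs = rfftfreq(len(ys), 1 / framerate)  # matching frequencies in Hz

peak = fs[np.argmax(np.abs(hs))]
print(peak)  # 3.0
```

Note that for $n$ real samples, {\tt rfft} returns $n/2 + 1$ complex
values, and {\tt rfftfreq} returns the same number of frequencies.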
The result
of {\tt rfft}, which I call {\tt hs}, is a NumPy array of complex
numbers that represents the amplitude and phase offset of each
frequency component in the wave.

The result of {\tt rfftfreq}, which I call {\tt fs}, is an array that
contains frequencies corresponding to the {\tt hs}.

To understand the values in {\tt hs}, consider these two ways to think
about complex numbers:

\begin{itemize}

\item A complex number is the sum of a real part and an imaginary
part, often written $x + iy$, where $i$ is the imaginary unit,
$\sqrt{-1}$. You can think of $x$ and $y$ as Cartesian coordinates.

\item A complex number is also the product of a magnitude and a
complex exponential, $A e^{i \phi}$, where $A$ is the {\bf
magnitude} and $\phi$ is the {\bf angle} in radians, also called
the ``argument''. You can think of $A$ and $\phi$ as polar
coordinates.

\end{itemize}

Each value in {\tt hs} corresponds to a frequency component: its
magnitude is proportional to the amplitude of the corresponding
component; its angle is the phase offset.

The Spectrum class provides two read-only properties, {\tt amps}
and {\tt angles}, which return NumPy arrays representing the
magnitudes and angles of the {\tt hs}. When we plot a Spectrum
object, we usually plot {\tt amps} versus {\tt fs}. Sometimes
it is also useful to plot {\tt angles} versus {\tt fs}.

Although it might be tempting to look at the real and imaginary
parts of {\tt hs}, you will almost never need to.
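In NumPy, the conversion to polar form is one call each. Here is a
sketch of how properties like {\tt amps} and {\tt angles} could be
computed (an assumed implementation, following the description above,
with made-up values):

```python
import numpy as np

# Sketch of the polar view of complex spectrum values (assumed
# implementation, with made-up values, following the text).
hs = np.array([1 + 1j, 2j, -3.0 + 0j])

amps = np.abs(hs)       # magnitudes A
angles = np.angle(hs)   # angles phi, in radians

print(amps)    # [sqrt(2), 2, 3]
print(angles)  # [pi/4, pi/2, pi]
```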
I encourage
you to think of the DFT as a vector of amplitudes and phase offsets
that happen to be encoded in the form of complex numbers.

To modify a Spectrum, you can access the {\tt hs} directly.
For example:

\begin{verbatim}
spectrum.hs *= 2
spectrum.hs[spectrum.fs > cutoff] = 0
\end{verbatim}

The first line multiplies the elements of {\tt hs} by 2, which
doubles the amplitudes of all components. The second line
sets to 0 only the elements of {\tt hs} where the corresponding
frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

\begin{verbatim}
spectrum.scale(2)
spectrum.low_pass(cutoff)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://think-dsp.com/thinkdsp.html}.

At this point you should have a better idea of how the Signal, Wave,
and Spectrum classes work, but I have not explained how the Fast
Fourier Transform works. That will take a few more chapters.


\section{Exercises}

Solutions to these exercises are in {\tt chap02soln.ipynb}.

\begin{exercise}
If you use IPython, load {\tt chap02.ipynb} and try out the examples.
You can also view the notebook at \url{http://tinyurl.com/thinkdsp02}.
\end{exercise}

\begin{exercise}
A sawtooth signal has a waveform that ramps up linearly from -1 to 1,
then drops to -1 and repeats. See
\url{http://en.wikipedia.org/wiki/Sawtooth_wave}.

Write a class called
{\tt SawtoothSignal} that extends {\tt Signal} and provides
{\tt evaluate} to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave. How does the harmonic
structure compare to triangle and square waves?
\end{exercise}

\begin{exercise}
Make a square signal at 1100 Hz and make a wave that samples it
at 10000 frames per second.
If you plot the spectrum, you can
see that most of the harmonics are aliased.
When you listen to the wave, can you hear the aliased harmonics?
\end{exercise}


\begin{exercise}
If you have a spectrum object, {\tt spectrum}, and print the
first few values of {\tt spectrum.fs}, you'll see that they
start at zero. So {\tt spectrum.hs[0]} is the magnitude
of the component with frequency 0. But what does that mean?

Try this experiment:

\begin{enumerate}

\item Make a triangle signal with frequency 440 and make
a Wave with duration 0.01 seconds. Plot the waveform.

\item Make a Spectrum object and print {\tt spectrum.hs[0]}.
What is the amplitude and phase of this component?

\item Set {\tt spectrum.hs[0] = 100}. Make a Wave from the
modified Spectrum and plot it. What effect does this operation
have on the waveform?

\end{enumerate}

\end{exercise}


\begin{exercise}
Write a function that takes a Spectrum as a parameter and modifies
it by dividing each element of {\tt hs} by the corresponding frequency
from {\tt fs}. Hint: since division by zero is undefined, you
might want to set {\tt spectrum.hs[0] = 0}.

Test your function using a square, triangle, or sawtooth wave.

\begin{enumerate}

\item Compute the Spectrum and plot it.

\item Modify the Spectrum using your function and plot it again.

\item Make a Wave from the modified Spectrum and listen to it. What
effect does this operation have on the signal?

\end{enumerate}

\end{exercise}

\begin{exercise}
Triangle and square waves have odd harmonics only; the sawtooth
wave has both even and odd harmonics. The harmonics of the
square and sawtooth waves drop off in proportion to $1/f$; the
harmonics of the triangle wave drop off like $1/f^2$.
Can you
find a waveform that has even and odd harmonics that drop off
like $1/f^2$?

Hint: There are two ways you could approach this: you could
construct the signal you want by adding up sinusoids, or you
could start with a signal that is similar to what you want and
modify it.
\end{exercise}


\chapter{Non-periodic signals}
\label{nonperiodic}

The signals we have worked with so far are periodic, which means
that they repeat forever. It also means that the frequency
components they contain do not change over time.
In this chapter, we consider non-periodic signals,
whose frequency components {\em do} change over time.
In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize
non-periodic signals.

The code for this chapter is in {\tt chap03.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp03}.


\section{Linear chirp}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp3.eps}}
\caption{Chirp waveform near the beginning, middle, and end.}
\label{fig.chirp3}
\end{figure}

We'll start with a {\bf chirp}, which is a signal with variable
frequency. {\tt thinkdsp} provides a Signal called Chirp that
makes a sinusoid that sweeps linearly through a range of
frequencies.

Here's an example that sweeps from 220 to 880 Hz, which
is two octaves from A3 to A5:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=880)
wave = signal.make_wave()
\end{verbatim}

Figure~\ref{fig.chirp3} shows segments of this wave near the
beginning, middle, and end. It's clear that the frequency is
increasing.

Before we go on, let's see how Chirp is implemented.
Here
is the class definition:

\begin{verbatim}
class Chirp(Signal):

    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp
\end{verbatim}

{\tt start} and {\tt end} are the frequencies, in Hz, at the start
and end of the chirp. {\tt amp} is amplitude.

Here is the function that evaluates the signal:

\begin{verbatim}
    def evaluate(self, ts):
        freqs = np.linspace(self.start, self.end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

{\tt ts} is the sequence of points in time where the signal should be
evaluated; to keep this function simple, I assume they are equally-spaced.

If the length of {\tt ts} is $n$, you can think of it as a sequence of
$n-1$ intervals of time. To compute the frequency during each
interval, I use {\tt np.linspace}, which returns a NumPy array of
$n-1$ values between {\tt start} and {\tt end}.

\verb"_evaluate" is a private method
that does the rest of the math\footnote{Beginning a method name
with an underscore makes it ``private'', indicating that it is not
part of the API that should be used outside the class definition.}:

\begin{verbatim}
    def _evaluate(self, ts, freqs):
        dts = np.diff(ts)
        dphis = PI2 * freqs * dts
        phases = np.cumsum(dphis)
        phases = np.insert(phases, 0, 0)
        ys = self.amp * np.cos(phases)
        return ys
\end{verbatim}

{\tt np.diff} computes the difference between adjacent elements
of {\tt ts}, returning the length of each interval in seconds.
If the elements of {\tt ts} are equally spaced,
the {\tt dts} are all the same.

The next step is to figure out how much the phase changes during
each interval.
In Section~\ref{sigobs} we saw that when frequency is
constant, the phase, $\phi$, increases linearly over time:
%
\[ \phi = 2 \pi f t \]
%
When frequency is a function of time, the {\em change} in phase
during a short time interval, $\Delta t$, is:
%
\[ \Delta \phi = 2 \pi f(t) \Delta t \]
%
In Python, since {\tt freqs} contains $f(t)$ and {\tt dts}
contains the time intervals, we can write

\begin{verbatim}
dphis = PI2 * freqs * dts
\end{verbatim}

Now, since {\tt dphis} contains the changes in phase, we can
get the total phase at each timestep by adding up the changes:

\begin{verbatim}
phases = np.cumsum(dphis)
phases = np.insert(phases, 0, 0)
\end{verbatim}

{\tt np.cumsum} computes the cumulative sum, which is almost
what we want, but it doesn't start at 0. So I use {\tt np.insert}
to add a 0 at the beginning.

The result is a NumPy array where the {\tt i}th element contains the
sum of the first {\tt i} terms from {\tt dphis}; that is, the total
phase at the end of the {\tt i}th interval. Finally, {\tt np.cos}
computes the amplitude of the wave as a function of phase (remember
that phase is expressed in radians).

If you know calculus, you might notice that the limit as
$\Delta t$ gets small is
%
\[ d\phi = 2 \pi f(t) dt \]
%
Dividing through by $dt$ yields
%
\[ \frac{d\phi}{dt} = 2 \pi f(t) \]
%
In other words, frequency is the derivative of phase. Conversely,
phase is the integral of frequency.
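A quick numeric check (my sketch, with arbitrarily chosen values)
illustrates this for the constant-frequency case, where the cumulative
sum reproduces the analytic phase $\phi = 2 \pi f t$ exactly:

```python
import numpy as np

# Numeric check (a sketch with arbitrary values): for constant frequency,
# the cumsum recipe reproduces the analytic phase 2*pi*f*t.
PI2 = 2 * np.pi
framerate = 1000
f = 5.0

ts = np.arange(0, 1, 1 / framerate)
freqs = np.full(len(ts) - 1, f)    # one frequency value per interval

dts = np.diff(ts)
dphis = PI2 * freqs * dts          # phase change during each interval
phases = np.insert(np.cumsum(dphis), 0, 0)

print(np.allclose(phases, PI2 * f * ts))  # True
```

For a time-varying frequency the cumulative sum is only an
approximation of the integral, but it improves as the intervals get
shorter.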
When we used {\tt cumsum}
to go from frequency to phase, we were approximating integration.


\section{Exponential chirp}

When you listen to this chirp, you might notice that the pitch
rises quickly at first and then slows down.
The chirp spans two octaves, but it only takes 1/3 s to span
the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of
frequency. As a result, the {\bf interval} we hear between two notes
depends on the {\em ratio} of their frequencies, not the difference.
``Interval'' is the musical term for the perceived difference between
two pitches.

For example, an octave is an interval where the ratio of two
pitches is 2. So the interval from 220 to 440 is one octave
and the interval from 440 to 880 is also one octave. The difference
in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear
chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency
has to increase exponentially. A signal with that shape is called
an {\bf exponential chirp}.

Here's the code that makes one:

\begin{verbatim}
class ExpoChirp(Chirp):

    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

Instead of {\tt np.linspace}, this version of {\tt evaluate} uses
{\tt np.logspace}, which creates a series of frequencies
whose logarithms are equally spaced, which means that they increase
exponentially.

That's it; everything else is the same as Chirp.
Here's the code
that makes one:

\begin{verbatim}
signal = thinkdsp.ExpoChirp(start=220, end=880)
wave = signal.make_wave(duration=1)
\end{verbatim}

You can listen to these examples in {\tt chap03.ipynb} and compare
the linear and exponential chirps.


\section{Spectrum of a chirp}
\label{sauron}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp1.eps}}
\caption{Spectrum of a one-second one-octave chirp.}
\label{fig.chirp1}
\end{figure}

What do you think happens if you compute the spectrum of a chirp?
Here's an example that constructs a one-second, one-octave chirp and
its spectrum:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.chirp1} shows the result. The spectrum has
components at every frequency from 220 to 440 Hz, with variations
that look a little like the Eye of Sauron
(see \url{http://en.wikipedia.org/wiki/Sauron}).

The spectrum is approximately flat between 220 and 440 Hz, which
indicates that the signal spends equal time at each frequency in this
range. Based on that observation, you should be able to guess what
the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal,
but it obscures the relationship between frequency and time.
For example, we cannot tell by looking at this spectrum whether
the frequency went up or down, or both.


\section{Spectrogram}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp2.eps}}
\caption{Spectrogram of a one-second one-octave chirp.}
\label{fig.chirp2}
\end{figure}

To recover the relationship between frequency and time, we can break
the chirp into segments and plot the spectrum of each segment.
The
result is called a {\bf short-time Fourier transform} (STFT).

There are several ways to visualize an STFT, but the most common
is a {\bf spectrogram}, which shows time on the x-axis and frequency
on the y-axis. Each column in the spectrogram shows the spectrum of
a short segment, using color or grayscale to represent amplitude.

As an example, I'll compute the spectrogram of this chirp:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1, framerate=11025)
\end{verbatim}

{\tt Wave} provides \verb"make_spectrogram", which returns a
{\tt Spectrogram} object:

\begin{verbatim}
spectrogram = wave.make_spectrogram(seg_length=512)
spectrogram.plot(high=700)
\end{verbatim}

\verb"seg_length" is the number of samples in each segment. I chose
512 because FFT is most efficient when the number of samples is a
power of 2.

Figure~\ref{fig.chirp2} shows the result. The x-axis shows time from
0 to 1 seconds. The y-axis shows frequency from 0 to 700 Hz. I cut
off the top part of the spectrogram; the full range goes to 5512.5 Hz,
which is half of the framerate.

The spectrogram shows clearly that frequency increases linearly
over time. Similarly, in the spectrogram of an exponential chirp, we
can see the shape of the exponential curve.

However, notice that the peak in each column is blurred across 2--3
cells. This blurring reflects the limited resolution of the
spectrogram.


\section{The Gabor limit}
\label{gabor}

The {\bf time resolution} of the spectrogram is the duration of the
segments, which corresponds to the width of the cells in the
spectrogram.
Since each segment is 512 frames, and there are 11,025
frames per second, there are 0.046 seconds per segment.

The {\bf frequency resolution} is the frequency range between
elements in the spectrum, which corresponds to the height of the
cells. With 512 frames, we get 256 frequency components over a range
from 0 to 5512.5 Hz, so the range between components is 21.5 Hz.

More generally, if $n$ is the segment length, the spectrum contains
$n/2$ components. If the framerate is $r$, the maximum frequency in
the spectrum is $r/2$. So the time resolution is $n/r$ and the
frequency resolution is
%
\[ \frac{r/2}{n/2} \]
%
which is $r/n$.

Ideally we would like time resolution to be small, so we can see rapid
changes in frequency. And we would like frequency resolution to be
small so we can see small changes in frequency. But you can't have
both. Notice that time resolution, $n/r$, is the inverse of frequency
resolution, $r/n$. So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency
resolution in half (which is good), but you double time resolution
(which is bad). Even increasing the framerate doesn't help.
You get
more samples, but the range of frequencies increases at
the same time.

This tradeoff is called the {\bf Gabor limit} and it is a fundamental
limitation of this kind of time-frequency analysis.


\section{Leakage}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing1.eps}}
\caption{Spectrum of a periodic segment of a sinusoid (left), a
non-periodic segment (middle), a windowed non-periodic segment
(right).}
\label{fig.windowing1}
\end{figure}

In order to explain how \verb"make_spectrogram" works, I have
to explain windowing; and in order to explain windowing, I have to
show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute
Spectrums, treats waves as if they are periodic; that is, it assumes
that the finite segment it operates on is a complete period from an
infinite signal that repeats over all time. In practice, this
assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the
segment. Because DFT assumes that the signal is periodic, it
implicitly connects the end of the segment back to the beginning to
make a loop.
If the end does not connect smoothly to the beginning,
the discontinuity creates additional frequency components in the
segment that are not in the signal.

As an example, let's start with a sine signal that contains only
one frequency component at 440 Hz.

\begin{verbatim}
signal = thinkdsp.SinSignal(freq=440)
\end{verbatim}

If we select a segment that happens to be an integer multiple of
the period, the end of the segment connects smoothly with the
beginning, and DFT behaves well.

\begin{verbatim}
duration = signal.period * 30
wave = signal.make_wave(duration)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.windowing1} (left) shows the result. As expected,
there is a single peak at 440 Hz.

But if the duration is not a multiple of the period, bad things
happen. With {\tt duration = signal.period * 30.25}, the signal
starts at 0 and ends at 1.

Figure~\ref{fig.windowing1} (middle) shows
the spectrum of this segment. Again, the peak is at 440 Hz, but now
there are additional components spread out from 240 to 640 Hz.
This
spread is called ``spectral leakage'', because some of the energy that
is actually at the fundamental frequency leaks into other frequencies.

In this example, leakage happens because we are using DFT on a segment
that becomes discontinuous when we treat it as periodic.


\section{Windowing}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=3.5in]{figs/windowing2.eps}}
\caption{Segment of a sinusoid (top), Hamming window (middle), product
of the segment and the window (bottom).}
\label{fig.windowing2}
\end{figure}

We can reduce leakage by smoothing out the discontinuity between
the beginning and end of the segment, and one way to do that is
{\bf windowing}.

A ``window'' is a function designed to transform a non-periodic
segment into something that can pass for periodic.
Figure~\ref{fig.windowing2} (top) shows a segment where the end does not
connect smoothly to the beginning.

Figure~\ref{fig.windowing2} (middle) shows a ``Hamming window'', one of the
more common window functions. No window function is perfect, but some
can be shown to be optimal for different applications, and Hamming
is a good, all-purpose window.

Figure~\ref{fig.windowing2} (bottom) shows the result of multiplying the
window by the original signal. Where the window is close to 1, the
signal is unchanged. Where the window is close to 0, the signal is
attenuated. Because the window tapers at both ends, the end of the
segment connects smoothly to the beginning.

Figure~\ref{fig.windowing1} (right) shows the spectrum of the windowed
signal. Windowing has reduced leakage substantially, but not
completely.

Here's what the code looks like.
{\tt Wave} provides {\tt window},
which applies a Hamming window:

\begin{verbatim}
#class Wave:
    def window(self, window):
        self.ys *= window
\end{verbatim}

And NumPy provides {\tt hamming}, which computes a Hamming window
with a given length:

\begin{verbatim}
window = np.hamming(len(wave))
wave.window(window)
\end{verbatim}

NumPy provides functions to compute other window
functions, including {\tt bartlett}, {\tt blackman}, {\tt hanning},
and {\tt kaiser}. One of the exercises at the end of this chapter
asks you to experiment with these other windows.


\section{Implementing spectrograms}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing3.eps}}
\caption{Overlapping Hamming windows.}
\label{fig.windowing3}
\end{figure}

Now that we understand windowing, we can understand the
implementation of spectrogram.
Here is the {\tt Wave} method that computes spectrograms:

\begin{verbatim}
#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        step = seg_length // 2

        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)
\end{verbatim}

This is the longest function in the book, so if you can handle
this, you can handle anything.

The parameter, {\tt self}, is a Wave object.
\verb"seg_length" is the number of samples in each segment.

{\tt window} is a Hamming window with the same length as the segments.

{\tt i} and {\tt j} are the slice indices that select segments from
the wave. {\tt step} is the offset between segments.
Since {\tt
step} is half of \verb"seg_length", the segments overlap by half.
Figure~\ref{fig.windowing3} shows what these overlapping windows look
like.

\verb"spec_map" is a dictionary that maps from a timestamp to
a Spectrum.

Inside the while loop, we select a slice from the wave and apply
the window; then we construct a Spectrum
object and add it to \verb"spec_map".  The nominal time of
each segment, {\tt t}, is the midpoint.

Then we advance {\tt i} and {\tt j}, and continue as long as {\tt j}
doesn't go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram.  Here
is the definition of Spectrogram:

\begin{verbatim}
class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length
\end{verbatim}

Like many init methods, this one just stores the
parameters as attributes.

{\tt Spectrogram} provides {\tt plot}, which generates a
pseudocolor plot with time along the x-axis and frequency along
the y-axis.

And that's how Spectrograms are implemented.


\section{Exercises}

Solutions to these exercises are in {\tt chap03soln.ipynb}.

\begin{exercise}
Run and listen to the examples in {\tt chap03.ipynb}, which is
in the repository for this book, and also available at
\url{http://tinyurl.com/thinkdsp03}.

In the leakage example, try replacing the Hamming window with one of
the other windows provided by NumPy, and see what effect they have on
leakage.
See
\url{http://docs.scipy.org/doc/numpy/reference/routines.window.html}.
\end{exercise}


\begin{exercise}
Write a class called {\tt SawtoothChirp} that extends {\tt Chirp}
and overrides {\tt evaluate} to generate a sawtooth waveform with
frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from {\tt Chirp} and
{\tt SawtoothSignal}.

Draw a sketch of what you think the spectrogram of this signal
looks like, and then plot it.  The effect of aliasing should be
visually apparent, and if you listen carefully, you can hear it.
\end{exercise}


\begin{exercise}
Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to
make a wave with duration 1 s and framerate 20 kHz.  Draw a sketch of
what you think the spectrum will look like.  Then plot the
spectrum and see if you got it right.
\end{exercise}


\begin{exercise}
In musical terminology, a ``glissando'' is a note that slides from one
pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the
first few seconds.  One suggestion: George Gershwin's {\it Rhapsody in
Blue} starts with a famous clarinet glissando, which you can download
from \url{http://archive.org/details/rhapblue11924}.
\end{exercise}


\begin{exercise}
A trombone player can play a glissando by extending the trombone
slide while blowing continuously.  As the slide extends, the total
length of the tube gets longer, and the resulting pitch is inversely
proportional to length.

Assuming that the player moves the slide at a constant speed, how
does frequency vary with time?

Write a class called {\tt TromboneGliss} that extends {\tt Chirp} and
provides {\tt evaluate}.  Make a wave that simulates a trombone
glissando from C3 up to F3 and back down to C3.
C3 is 262 Hz; F3 is
349 Hz.

Plot a spectrogram of the resulting wave.  Is a trombone glissando
more like a linear or exponential chirp?
\end{exercise}


\begin{exercise}
Make or find a recording of a series of vowel sounds and look at the
spectrogram.  Can you identify different vowels?
\end{exercise}


\chapter{Noise}

In English, ``noise'' means an unwanted or unpleasant sound.  In the
context of signal processing, it has two different senses:

\begin{enumerate}

\item As in English, it can mean an unwanted signal of any kind.  If
two signals interfere with each other, each signal would consider
the other to be noise.

\item ``Noise'' also refers to a signal that contains components at
many frequencies, so it lacks the harmonic structure of the periodic
signals we saw in previous chapters.

\end{enumerate}

This chapter is about the second kind.

The code for this chapter is in {\tt chap04.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp04}.


\section{Uncorrelated noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise0.eps}}
\caption{Waveform of uncorrelated uniform noise.}
\label{fig.whitenoise0}
\end{figure}

The simplest way to understand noise is to generate it, and the
simplest kind to generate is uncorrelated uniform noise (UU noise).
``Uniform'' means the signal contains random values from a uniform
distribution; that is, every value in the range is equally likely.
``Uncorrelated'' means that the values are independent; that is,
knowing one value provides no information about the others.

Here's a class that represents UU noise:

\begin{verbatim}
class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp,
                               len(ts))
        return ys
\end{verbatim}

{\tt UncorrelatedUniformNoise} inherits from \verb"_Noise", which
inherits from {\tt Signal}.

As usual, the evaluate function takes {\tt ts}, the times when the
signal should be evaluated.  It uses
{\tt np.random.uniform}, which generates values from a
uniform distribution.  In this example, the values are in
the range from {\tt -amp} to {\tt amp}.

The following example generates UU noise with duration 0.5
seconds at 11,025 samples per second:

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
\end{verbatim}

If you play this wave, it sounds like the static you hear if you tune
a radio between channels.  Figure~\ref{fig.whitenoise0} shows what the
waveform looks like.  As expected, it looks pretty random.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise1.eps}}
\caption{Power spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise1}
\end{figure}

Now let's take a look at the spectrum:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power()
\end{verbatim}

\verb"Spectrum.plot_power" is similar to \verb"Spectrum.plot",
except that it plots power instead of amplitude.
Power is the square of amplitude.
I am switching from amplitude to power in this chapter because
it is more conventional in the context of noise.

Figure~\ref{fig.whitenoise1} shows the result.  Like the signal, the
spectrum looks pretty random.  In fact, it {\em is} random, but we have to
be more precise about the word ``random''.  There are at least three
things we might like to know about a noise signal or its spectrum:

\begin{itemize}

\item Distribution: The distribution of a random signal is the set of
possible values and their probabilities.
For example, in the
uniform noise signal, the set of values is the range from -1 to 1,
and all values have the same probability.  An alternative is
{\bf Gaussian noise}, where the set of values is the range from negative
to positive infinity, but values near 0 are the most likely, with
probability that drops off according to the Gaussian or
``bell'' curve.

\item Correlation: Is each value in the signal independent of the
others, or are there dependencies between them?  In UU noise, the
values are independent.
An alternative is {\bf Brownian noise}, where each value is the sum
of the previous value and a random ``step''.  So if the value of the
signal is high at a particular point in time, we expect it to stay
high, and if it is low, we expect
it to stay low.

\item Relationship between power and frequency: In the spectrum of UU
noise, the power at all frequencies is drawn from the same
distribution; that is, the average power is the same for all
frequencies.
An alternative is {\bf pink noise}, where power is
inversely related to frequency; that is, the power at frequency $f$
is drawn from a distribution whose mean is proportional to $1/f$.

\end{itemize}


\section{Integrated spectrum}

For UU noise we can see the relationship between power and frequency
more clearly by looking at the {\bf integrated spectrum}, which
is a function of frequency, $f$, that shows the cumulative power in
the spectrum up to $f$.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise2.eps}}
\caption{Integrated spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise2}
\end{figure}

{\tt Spectrum} provides a method that computes the IntegratedSpectrum:

\begin{verbatim}
def make_integrated_spectrum(self):
    cs = np.cumsum(self.power)
    cs /= cs[-1]
    return IntegratedSpectrum(cs, self.fs)
\end{verbatim}

{\tt self.power} is a NumPy array containing power for each frequency.
{\tt np.cumsum} computes the cumulative sum of the powers.
Dividing through by the last element normalizes the integrated
spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum.
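The normalization step is easy to check in isolation.  Here is a
minimal, self-contained NumPy sketch (not part of {\tt thinkdsp};
the array {\tt power} stands in for {\tt self.power}):

\begin{verbatim}
import numpy as np

# stand-in for self.power: one nonnegative power per frequency
power = np.random.uniform(0, 2, size=1000)

cs = np.cumsum(power)   # cumulative power up to each frequency
cs /= cs[-1]            # normalize so the curve runs from 0 to 1
\end{verbatim}

After normalization, {\tt cs} is nondecreasing and its last element
is exactly 1.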
Here is the class definition:

\begin{verbatim}
class IntegratedSpectrum(object):
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs
\end{verbatim}

Like Spectrum, IntegratedSpectrum provides \verb"plot_power", so
we can compute and plot the integrated spectrum like this:

\begin{verbatim}
integ = spectrum.make_integrated_spectrum()
integ.plot_power()
\end{verbatim}

The result, shown in Figure~\ref{fig.whitenoise2}, is a straight line,
which indicates that power at all frequencies is constant, on average.
Noise with equal power at all frequencies is called {\bf white noise}
by analogy with light, because an equal mixture of light at all
visible frequencies is white.


\section{Brownian noise}
\label{brownian}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise0.eps}}
\caption{Waveform of Brownian noise.}
\label{fig.rednoise0}
\end{figure}

UU noise is uncorrelated, which means that each value does not depend
on the others.  An alternative is Brownian noise, in which each value
is the sum of the previous value and a random ``step''.

It is called ``Brownian'' by analogy with Brownian motion, in which a
particle suspended in a fluid moves apparently at random, due to
unseen interactions with the fluid.  Brownian motion is often
described using a {\bf random walk}, which is a mathematical model
of a path where the distance between steps is characterized by a
random distribution.

In a one-dimensional random walk, the particle moves up or down
by a random amount at each time step.
The location of the particle
at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise:
generate uncorrelated random steps and then add them up.
Here is a class definition that implements this algorithm:

\begin{verbatim}
class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt evaluate} uses {\tt np.random.uniform} to generate uncorrelated
random steps and {\tt np.cumsum} to compute their cumulative
sum.

Since the sum is likely to escape the range from -1 to
1, we have to use {\tt unbias} to shift the mean to 0, and {\tt
normalize} to get the desired maximum amplitude.

Here's the code that generates a BrownianNoise object and plots the
waveform:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()
\end{verbatim}

Figure~\ref{fig.rednoise0} shows the result.  The waveform
wanders up and down, but there is a clear correlation between
successive values.  When the amplitude is high, it tends to stay
high, and vice versa.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise3.eps}}
\caption{Spectrum of Brownian noise on a log-log scale.}
\label{fig.rednoise3}
\end{figure}

If you plot the spectrum of Brownian noise, it doesn't look like
much.  Nearly all of the power is at the lowest frequencies; on a
linear scale, the higher frequency components are not visible.

To see the shape of the spectrum more clearly, we can plot power
and frequency on a log-log scale.
Here's the code:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')
\end{verbatim}

The result is in Figure~\ref{fig.rednoise3}.  The relationship between
power and frequency is noisy, but roughly linear.

{\tt Spectrum} provides \verb"estimate_slope", which uses SciPy to compute
a least squares fit to the power spectrum:

\begin{verbatim}
#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x, y)
        return t
\end{verbatim}

It discards the first component of the spectrum because
this component corresponds to $f=0$, and $\log 0$ is undefined.

\verb"estimate_slope" returns the result from {\tt
scipy.stats.linregress}, which is an object that contains the
estimated slope and intercept, coefficient of determination ($R^2$),
p-value, and standard error.  For our purposes, we only need the
slope.

For Brownian noise, the slope of the power spectrum is -2 (we'll see
why in Chapter~\ref{diffint}), so we can write this relationship:
%
\[ \log P = k - 2 \log f \]
%
where $P$ is power, $f$ is frequency, and $k$ is the intercept
of the line, which is not important for our purposes.
Exponentiating both sides yields:
%
\[ P = K / f^{2} \]
%
where $K$ is $e^k$, but still not important.  More relevant is
that power is proportional to $1/f^2$, which is characteristic
of Brownian noise.

Brownian noise is also called {\bf red noise}, for the same reason that
white noise is called ``white''.
If you combine visible light with
power proportional to $1/f^2$, most of the power
would be at the low-frequency end of the spectrum, which is red.
Brownian noise is also sometimes called ``brown noise'', but I think
that's confusing, so I won't use it.

%TODO: refer back to this in the diff/int chapter


\section{Pink Noise}
\label{pink}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/pinknoise0.eps}}
\caption{Waveform of pink noise with $\beta=1$.}
\label{fig.pinknoise0}
\end{figure}

For red noise, the relationship between frequency
and power is
%
\[ P = K / f^{2} \]
%
There is nothing special about the exponent 2.  More generally,
we can synthesize noise with any exponent, $\beta$:
%
\[ P = K / f^{\beta} \]
%
When $\beta = 0$, power is constant at all frequencies,
so the result is white noise.  When $\beta=2$ the result is red noise.

When $\beta$ is between 0 and 2, the result is between white and
red noise, so it is called ``pink noise''.

There are several ways to generate pink noise.  The simplest is to
generate white noise and then apply a low-pass filter with the
desired exponent.  {\tt thinkdsp} provides a class that represents
a pink noise signal:

\begin{verbatim}
class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta
\end{verbatim}

{\tt amp} is the desired amplitude of the signal.
{\tt beta} is the desired exponent.
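Stripped of the class machinery, the filtering idea can be sketched
directly with NumPy's FFT functions.  This is a minimal standalone
sketch of the same approach, not the {\tt thinkdsp} implementation:

\begin{verbatim}
import numpy as np

def pink_noise(n, beta=1.0, framerate=11025):
    # generate white noise, then shape its spectrum
    white = np.random.uniform(-1, 1, n)
    hs = np.fft.rfft(white)
    fs = np.fft.rfftfreq(n, d=1/framerate)
    fs[0] = 1                       # avoid dividing by zero at f=0
    hs /= fs ** (beta / 2.0)        # dividing amplitude by f**(beta/2)
                                    # divides power by f**beta
    hs[0] = 0                       # zero out the bias component
    ys = np.fft.irfft(hs)
    return ys / np.max(np.abs(ys))  # normalize to amplitude 1
\end{verbatim}

The following sections show how the same steps are organized in
{\tt thinkdsp}.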
{\tt PinkNoise} provides
\verb"make_wave", which generates a Wave:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    signal = UncorrelatedUniformNoise()
    wave = signal.make_wave(duration, start, framerate)
    spectrum = wave.make_spectrum()

    spectrum.pink_filter(beta=self.beta)

    wave2 = spectrum.make_wave()
    wave2.unbias()
    wave2.normalize(self.amp)
    return wave2
\end{verbatim}

{\tt duration} is the length of the wave in seconds.  {\tt start} is
the start time of the wave; it is included so that \verb"make_wave"
has the same interface for all types of signal, but for random noise,
start time is irrelevant.  And {\tt framerate} is the number of
samples per second.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise-triple.eps}}
\caption{Spectrum of white, pink, and red noise on a log-log scale.}
\label{fig.noise-triple}
\end{figure}

\verb"make_wave" creates a white noise wave, computes its spectrum,
applies a filter with the desired exponent, and then converts the
filtered spectrum back to a wave.  Then it unbiases and normalizes
the wave.

{\tt Spectrum} provides \verb"pink_filter":

\begin{verbatim}
def pink_filter(self, beta=1.0):
    denom = self.fs ** (beta/2.0)
    denom[0] = 1
    self.hs /= denom
\end{verbatim}

\verb"pink_filter" divides each element of the spectrum by
$f^{\beta/2}$.  Since power is the square of amplitude, this
operation divides the power at
each component by $f^\beta$.  It treats the component
at $f=0$ as a special case, partly to avoid dividing by 0, and partly
because this element represents the bias of the signal,
which we are going to set to 0 anyway.

Figure~\ref{fig.pinknoise0} shows the resulting waveform.
Like
Brownian noise, it wanders up and down in a way that suggests
correlation between successive values, but at least visually, it looks
more random.  In the next chapter we will come back to this
observation and I will be more precise about what I mean by
``correlation'' and ``more random''.

Finally, Figure~\ref{fig.noise-triple} shows a spectrum for
white, pink, and red noise on the same log-log scale.
The relationship between the exponent, $\beta$, and the slope
of the spectrum is apparent in this figure.


\section{Gaussian noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise1.eps}}
\caption{Normal probability plot for the real and imaginary parts
of the spectrum of Gaussian noise.}
\label{fig.noise1}
\end{figure}

We started with uncorrelated uniform (UU) noise and showed that,
because its spectrum has equal power at all frequencies, on
average, UU noise is white.

But when people talk about ``white noise'', they don't always
mean UU noise.  In fact, more often they mean uncorrelated
Gaussian (UG) noise.

{\tt thinkdsp} provides an implementation of UG noise:

\begin{verbatim}
class UncorrelatedGaussianNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.normal(0, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt np.random.normal} returns a NumPy array of values from a
Gaussian distribution, in this case with mean 0 and standard deviation
{\tt self.amp}.  In theory the range of values is from negative to
positive infinity, but we expect about 99\% of the values to be
between -3 and 3.

UG noise is similar in many ways to UU noise.  The spectrum has
equal power at all frequencies, on average, so UG is also white.
And it has one other interesting property: the spectrum of UG
noise is also UG noise.
More precisely, the real and imaginary
parts of the spectrum are uncorrelated Gaussian values.

To test that claim, we can generate the spectrum of UG noise and
then generate a ``normal probability plot'', which is a graphical
way to test whether a distribution is Gaussian.

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
spectrum = wave.make_spectrum()

thinkstats2.NormalProbabilityPlot(spectrum.real)
thinkstats2.NormalProbabilityPlot(spectrum.imag)
\end{verbatim}

{\tt NormalProbabilityPlot} is provided by {\tt thinkstats2}, which is
included in the repository for this book.  If you are not familiar
with normal probability plots, you can read about them in Chapter 5
of {\it Think Stats} at \url{http://thinkstats2.com}.

Figure~\ref{fig.noise1} shows the results.  The gray lines
show a linear model fit to the data; the dark lines show the
data.

A straight line on a normal probability plot indicates
that the data come from a Gaussian distribution.  Except for
some random variation at the extremes, these lines are straight,
which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately.  In
fact, by the Central Limit Theorem, the spectrum of almost any
uncorrelated noise is approximately Gaussian, as long as the
distribution has finite mean and standard deviation, and the number of
samples is large.


\section{Exercises}

Solutions to these exercises are in {\tt chap04soln.ipynb}.

\begin{exercise}
``A Soft Murmur'' is a web site that plays a mixture of natural
noise sources, including rain, waves, wind, etc.
At
\url{http://asoftmurmur.com/about/} you can find their list
of recordings, most of which are at \url{http://freesound.org}.

Download a few of these files and compute the spectrum of each
signal.  Does the power spectrum look like white noise, pink noise,
or Brownian noise?  How does the spectrum vary over time?
\end{exercise}


\begin{exercise}
In a noise signal, the mixture of frequencies changes over time.
In the long run, we expect the power at all frequencies to be equal,
but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can
break a long signal into segments, compute the power spectrum
for each segment, and then compute the average across
the segments.  You can read more about this algorithm at
\url{http://en.wikipedia.org/wiki/Bartlett's_method}.

Implement Bartlett's method and use it to estimate the power
spectrum for a noise wave.  Hint: look at the implementation
of \verb"make_spectrogram".
\end{exercise}


\begin{exercise}
At \url{http://www.coindesk.com} you can download the daily
price of a BitCoin as a CSV file.  Read this file and compute
the spectrum of BitCoin prices as a function of time.
Does it resemble white, pink, or Brownian noise?
\end{exercise}


\begin{exercise}
A Geiger counter is a device that detects radiation.
When an ionizing particle strikes the detector, it outputs a surge of
current.  The total output at a point in time can be modeled as
uncorrelated Poisson (UP) noise, where each sample is
a random quantity from a Poisson distribution, which corresponds to the
number of particles detected during an interval.

Write a class called {\tt UncorrelatedPoissonNoise} that inherits
from \verb"thinkdsp._Noise" and provides {\tt evaluate}.  It should
use {\tt np.random.poisson} to generate random values from a Poisson
distribution.
The parameter of this function, {\tt lam}, is the
average number of particles during each interval.  You can use the
attribute {\tt amp} to specify {\tt lam}.  For example, if the
framerate is 10 kHz and {\tt amp} is 0.001, we expect about 10
``clicks'' per second.

Generate about a second of UP noise and listen to it.  For low values
of {\tt amp}, like 0.001, it should sound like a Geiger counter.  For
higher values it should sound like white noise.  Compute and plot the
power spectrum to see whether it looks like white noise.
\end{exercise}


\begin{exercise}
The algorithm in this chapter for generating pink noise is
conceptually simple but computationally expensive.  There are
more efficient alternatives, like the Voss-McCartney algorithm.
Research this method, implement it, compute the spectrum of
the result, and confirm that it has the desired relationship
between power and frequency.
\end{exercise}


\chapter{Autocorrelation}

In the previous chapter I characterized white noise as ``uncorrelated'',
which means that each value is independent of the others, and Brownian
noise as ``correlated'', because each value depends on the preceding
value.  In this chapter I define these terms more precisely and
present the {\bf autocorrelation function}, which is a useful tool
for signal analysis.

The code for this chapter is in {\tt chap05.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp05}.


\section{Correlation}

In general, correlation between variables means that if you know the
value of one, you have some information about the other.  There are
several ways to quantify correlation, but the most common is the
Pearson product-moment correlation coefficient, usually denoted
$\rho$.
For two variables, $x$ and $y$, that each contain $N$ values:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, and
$\sigma_x$ and $\sigma_y$ are their standard deviations.

Pearson's correlation is always between -1 and +1 (including both).
If $\rho$ is positive, we say that the correlation is positive,
which means that when one variable is high, the other tends to be
high.  If $\rho$ is negative, the correlation is negative, so
when one variable is high, the other tends to be low.

The magnitude of $\rho$ indicates the strength of the correlation.  If
$\rho$ is 1 or -1, the variables are perfectly correlated, which means
that if you know one, you can make a perfect prediction about the
other.  If $\rho$ is near zero, the correlation is probably weak, so
if you know one, it doesn't tell you much about the other.

I say ``probably weak'' because it is also possible that there is
a nonlinear relationship that is not captured by the coefficient
of correlation.  Nonlinear relationships are often important in
statistics, but less often relevant for signal processing, so I
won't say more about them here.

Python provides several ways to compute correlations.  {\tt
np.corrcoef} takes any number of variables and computes a {\bf
correlation matrix} that includes correlations between each pair of
variables.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr1.eps}}
\caption{Two sine waves that differ by a phase offset of 1 radian;
their coefficient of correlation is 0.54.}
\label{fig.autocorr1}
\end{figure}

I'll present an example with only two variables.
First, I define
a function that constructs sine waves with different phase offsets:

\begin{verbatim}
def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave
\end{verbatim}

Next I instantiate two waves with different offsets:

\begin{verbatim}
wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)
\end{verbatim}

Figure~\ref{fig.autocorr1} shows what the first few periods of these
waves look like.  When one wave is high, the other is usually high, so we
expect them to be correlated.

\begin{verbatim}
>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]
\end{verbatim}

The option {\tt ddof=0} indicates that {\tt corrcoef} should divide by
$N$, as in the equation above, rather than use the default, $N-1$.

The result is a correlation matrix:
the first element is the correlation of {\tt wave1}
with itself, which is always 1.  Similarly, the last element
is the correlation of {\tt wave2} with itself.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr2.eps}}
\caption{The correlation of two sine waves as a function of the
phase offset between them.  The result is a cosine.}
\label{fig.autocorr2}
\end{figure}

The off-diagonal elements contain the value we're interested in,
the correlation of {\tt wave1} and {\tt wave2}.  The value 0.54
indicates that the strength of the correlation is moderate.

As the phase offset increases, this correlation decreases until
the waves are 180 degrees out of phase, which yields correlation
-1.  Then it increases until the offset differs by 360 degrees.
At that point we have come full circle and the correlation is 1.

Figure~\ref{fig.autocorr2} shows the relationship between correlation and
phase offset for a sine wave.
The shape of that curve should look
familiar; it is a cosine.

{\tt thinkdsp} provides a simple interface for computing the
correlation between waves:

\begin{verbatim}
>>> wave1.corr(wave2)
0.54
\end{verbatim}


\section{Serial correlation}

Signals often represent measurements of quantities that vary in
time.  For example, the sound signals we've worked with represent
measurements of voltage (or current), which correspond to the changes
in air pressure we perceive as sound.

Measurements like this almost always have serial correlation, which
is the correlation between each element and the next (or the previous).
To compute serial correlation, we can shift a signal and then compute
the correlation of the shifted version with the original.

\begin{verbatim}
def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr
\end{verbatim}

\verb"serial_corr" takes a Wave object and
{\tt lag}, which is the integer number of places to shift the wave.
It computes the correlation of the wave with a shifted version
of itself.

We can test this function with the noise signals from the previous
chapter.  We expect UU noise to be uncorrelated, based on the
way it's generated (not to mention the name):

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

When I ran this example, I got 0.006, which indicates a very
small serial correlation.
You might get a different value when you run
it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous
value and a random ``step'', so we expect a strong serial
correlation:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

Sure enough, the result I got is greater than 0.999.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr3.eps}}
\caption{Serial correlation for pink noise with a range of
parameters.}
\label{fig.autocorr3}
\end{figure}

Since pink noise is in some sense between Brownian noise and UU noise,
we might expect an intermediate correlation:

\begin{verbatim}
signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

With parameter $\beta=1$, I got a serial correlation of 0.851.
As we vary the parameter from $\beta=0$, which is uncorrelated
noise, to $\beta=2$, which is Brownian, serial correlation
ranges from 0 to almost 1, as shown in Figure~\ref{fig.autocorr3}.


\section{Autocorrelation}
\label{autopink}

In the previous section we computed the correlation between each
value and the next, so we shifted the elements of the array by 1.
But we can easily compute serial correlations with
different lags.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr4.eps}}
\caption{Autocorrelation functions for pink noise with a range
of parameters.}
\label{fig.autocorr4}
\end{figure}

You can think of \verb"serial_corr" as a function that
maps from each value of lag to the corresponding correlation, and we
can evaluate that function by looping through values of {\tt lag}:

\begin{verbatim}
def autocorr(wave):
    lags = \
range(len(wave.ys)//2)2801corrs = [serial_corr(wave, lag) for lag in lags]2802return lags, corrs2803\end{verbatim}28042805{\tt autocorr} takes a Wave object and returns the autocorrelation2806function as a pair of sequences: {\tt lags} is a sequence of2807integers from 0 to half the length of the wave; {\tt corrs}2808is the sequence of serial correlations for each lag.28092810Figure~\ref{fig.autocorr4} shows autocorrelation functions for pink2811noise with three values of $\beta$. For low values of $\beta$, the2812signal is less correlated, and the autocorrelation function drops2813off to zero quickly. For larger values, serial correlation2814is stronger and drops off more slowly. With $\beta=1.7$ serial2815correlation is strong even for long lags; this phenomenon is called2816{\bf long-range dependence}, because it indicates that each value in2817the signal depends on many preceding values.281828192820\section{Autocorrelation of periodic signals}28212822The autocorrelation of pink noise has interesting mathematical2823properties, but limited applications. The autocorrelation of2824periodic signals is more useful.28252826\begin{figure}2827% autocorr.py2828\centerline{\includegraphics[height=2.5in]{figs/autocorr5.eps}}2829\caption{Spectrogram of a vocal chirp.}2830\label{fig.autocorr5}2831\end{figure}28322833As an example, I downloaded from \url{freesound.org} a recording of2834someone singing a chirp; the repository for this book includes the2835file: \url{28042__bcjordan__voicedownbew.wav}. 
You can use the
IPython notebook for this chapter, {\tt chap05.ipynb}, to play it.

Figure~\ref{fig.autocorr5} shows the spectrogram of this wave.
The fundamental frequency and some of the harmonics show up clearly.
The chirp starts near 500 Hz and drops down to about 300 Hz, roughly
from C5 to E4.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr6.eps}}
\caption{Spectrum of a segment from a vocal chirp.}
\label{fig.autocorr6}
\end{figure}

To estimate pitch at a particular point in time, we could use the
spectrum, but it doesn't work very well. To see why not, I'll take
a short segment from the wave and plot its spectrum:

\begin{verbatim}
duration = 0.01
segment = wave.segment(start=0.2, duration=duration)
spectrum = segment.make_spectrum()
spectrum.plot(high=1000)
\end{verbatim}

This segment starts at 0.2 seconds and lasts 0.01 seconds.
Figure~\ref{fig.autocorr6} shows its spectrum. There is a clear peak
near 400 Hz, but it is hard to identify the pitch precisely.
The
length of the segment is 441 samples at a framerate of 44100 Hz, so
the frequency resolution is 100 Hz (see Section~\ref{gabor}).
That means the estimated pitch might be off by 50 Hz; in musical
terms, the range from 350 Hz to 450 Hz is about 5 semitones, which is
a big difference!

We could get better frequency resolution by taking a longer segment,
but since the pitch is changing over time, we would also get ``motion
blur''; that is, the peak would spread between the start and end pitch of
the segment, as we saw in Section~\ref{sauron}.

We can estimate pitch more precisely using autocorrelation.
If a signal is periodic, we expect the autocorrelation to spike
when the lag equals the period.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr7.eps}}
\caption{Two segments from a chirp, one starting 0.0023 seconds
after the other.}
\label{fig.autocorr7}
\end{figure}

To show why that works, I'll plot two segments from the same
recording.

\begin{verbatim}
def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')
\end{verbatim}

One segment starts at 0.2 seconds; the other starts 0.0023 seconds
later. Figure~\ref{fig.autocorr7} shows the result. The segments
are similar, and their correlation is 0.99.
This result suggests
that the period is near 0.0023 seconds, which corresponds to a frequency
of 435 Hz.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr8.eps}}
\caption{Autocorrelation function for a segment from a chirp.}
\label{fig.autocorr8}
\end{figure}

For this example, I estimated the period by trial and error. To automate
the process, we can use the autocorrelation function.

\begin{verbatim}
lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)
\end{verbatim}

Figure~\ref{fig.autocorr8} shows the autocorrelation function for
the segment starting at $t=0.2$ seconds. The first peak occurs at
{\tt lag=101}. We can compute the frequency that corresponds
to that period like this:

\begin{verbatim}
lag = 101
period = lag / segment.framerate
frequency = 1 / period
\end{verbatim}

The estimated fundamental frequency is 437 Hz. To evaluate the
precision of the estimate, we can run the same computation with
lags 100 and 102, which correspond to frequencies 441 and 432 Hz.
The frequency precision using autocorrelation is less than 10 Hz,
compared with 100 Hz using the spectrum. In musical terms, the
expected error is about 30 cents (a third of a semitone).


\section{Correlation as dot product}
\label{dotproduct}

I started the chapter with this definition of Pearson's
correlation coefficient:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
Then I used $\rho$ to define serial correlation and autocorrelation.
That's consistent with how these terms are used in statistics,
but in the context of signal processing, the definitions are
a little different.

In signal processing, we are often working with unbiased signals,
where the mean is 0, and normalized signals, where the standard
deviation is 1.
In that case, the definition of $\rho$ simplifies to:
%
\[ \rho = \frac{1}{N} \sum_i x_i y_i \]
%
And it is common to simplify even further:
%
\[ r = \sum_i x_i y_i \]
%
This definition of correlation is not ``standardized'', so it doesn't
generally fall between -1 and 1. But it has other useful properties.

If you think of $x$ and $y$ as vectors, you might recognize this
formula as the {\bf dot product}, $x \cdot y$. See
\url{http://en.wikipedia.org/wiki/Dot_product}.

\newcommand{\norm}{\mathrm{norm}}

The dot product indicates the degree to which the signals are similar.
If they are normalized so their standard deviations are 1,
%
\[ x \cdot y = \cos \theta \]
%
where $\theta$ is the angle between the vectors. And that explains
why Figure~\ref{fig.autocorr2} is a cosine curve.


\section{Using NumPy}
\label{correlate}

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr9.eps}}
\caption{Autocorrelation function computed with {\tt np.correlate}.}
\label{fig.autocorr9}
\end{figure}

NumPy provides a function, {\tt correlate}, that computes
the correlation of two functions or the autocorrelation of one
function. We can use it to compute the autocorrelation of
the segment from the previous section:

\begin{verbatim}
corrs2 = np.correlate(segment.ys, segment.ys, mode='same')
\end{verbatim}

The option {\tt mode} tells {\tt correlate} what range
of {\tt lag} to use. With the value {\tt 'same'}, the
range is from $-N/2$ to $N/2$, where $N$ is the length of the
wave array.

Figure~\ref{fig.autocorr9} shows the result. It is symmetric because
the two signals are identical, so a negative lag on one has the same
effect as a positive lag on the other.
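We can check that symmetry on a small example. This is a sketch that
stands in a short cosine for the recorded segment (an odd number of
samples keeps the {\tt 'same'} window centered on lag 0):

\begin{verbatim}
import numpy as np

N = 201
ts = np.arange(N) / N
ys = np.cos(2 * np.pi * 5 * ts)    # 5 cycles of a cosine

corrs = np.correlate(ys, ys, mode='same')

# the autocorrelation is symmetric: reversing the lags changes
# nothing, and the peak is at lag 0, in the middle of the array
print(np.allclose(corrs, corrs[::-1]))
print(corrs.argmax() == N // 2)
\end{verbatim}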
To compare with the results
from {\tt autocorr}, we can select the second half:

\begin{verbatim}
N = len(corrs2)
half = corrs2[N//2:]
\end{verbatim}

If you compare Figure~\ref{fig.autocorr9} to Figure~\ref{fig.autocorr8},
you'll notice that the correlations computed by {\tt np.correlate}
get smaller as the lags increase. That's because {\tt np.correlate}
uses the unstandardized definition of correlation;
as the lag gets bigger, the
overlap between the two signals gets smaller, so the magnitude of
the correlations decreases.

We can correct that by dividing through by the lengths:

\begin{verbatim}
lengths = range(N, N//2, -1)
half /= lengths
\end{verbatim}

Finally, we can standardize the results so the correlation with
{\tt lag=0} is 1.

\begin{verbatim}
half /= half[0]
\end{verbatim}

With these adjustments, the results computed by {\tt autocorr} and
{\tt np.correlate} are nearly the same. They still differ by
1-2\%. The reason is not important, but if you are curious: {\tt autocorr}
standardizes the correlations independently for each lag; for
{\tt np.correlate}, we standardized them all at the end.

More importantly, now you know what autocorrelation is, how to
use it to estimate the fundamental period of a signal, and two
ways to compute it.


\section{Exercises}

Solutions to these exercises are in {\tt chap05soln.ipynb}.

\begin{exercise}
The IPython notebook for this chapter, {\tt chap05.ipynb}, includes
an interaction that lets you compute autocorrelations for different
lags.
Use this interaction to estimate the pitch of the vocal chirp
for a few different start times.
\end{exercise}


\begin{exercise}
The example code in \verb"chap05.ipynb" shows how to use autocorrelation
to estimate the fundamental frequency of a periodic signal.
Encapsulate this code in a function called \verb"estimate_fundamental",
and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a
spectrogram of the recording.
\end{exercise}


\begin{exercise}
If you did the exercises in the previous chapter, you downloaded
the historical price of BitCoins and estimated the power spectrum
of the price changes. Using the same data, compute the autocorrelation
of BitCoin prices. Does the autocorrelation function drop off quickly?
Is there evidence of periodic behavior?
\end{exercise}


\begin{exercise}
In the repository for this book you will find an IPython notebook
called \verb"saxophone.ipynb" that explores autocorrelation,
pitch perception, and a phenomenon called the ``missing fundamental''.
Read through this notebook and run the examples.
Try selecting
a different segment of the recording and running the examples again.
\end{exercise}



\chapter{Discrete cosine transform}
\label{dct}

The topic of this chapter is the {\bf Discrete Cosine
Transform} (DCT), which is used in MP3 and related formats for
compressing music; JPEG and similar formats for images; and the MPEG
family of formats for video.

DCT is similar in many ways to the Discrete Fourier Transform (DFT),
which we have been using for spectral analysis.
Once we learn how DCT works, it will be easier to explain DFT.

Here are the steps to get there:

\begin{enumerate}

\item We'll start with the synthesis problem: given a set of frequency
components and their amplitudes, how can we construct a wave?

\item Next we'll rewrite the synthesis problem using NumPy arrays.
This move is good for performance, and also provides insight
for the next step.

\item We'll look at the analysis problem: given a signal and a
set of frequencies, how can we find the amplitude of each frequency
component? We'll start with a solution that is conceptually simple
but slow.

\item Finally, we'll use some principles from linear algebra to find a
more efficient algorithm. If you already know linear algebra,
that's great, but I'll explain what you need as we go.

\end{enumerate}

The code for this chapter is in {\tt chap06.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp06}.


\section{Synthesis}
\label{synth1}

Suppose I give you a list of amplitudes and a list of frequencies,
and ask you to construct a signal that is the sum of these frequency
components.
Using objects in the {\tt thinkdsp} module, there is
a simple way to perform this operation, which is called {\bf synthesis}:

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

{\tt amps} is a list of amplitudes, {\tt fs} is the list
of frequencies, and {\tt ts} is the sequence
of times where the signal should be evaluated.

{\tt components} is a list of {\tt CosSignal} objects, one for
each amplitude-frequency pair. {\tt SumSignal} represents the
sum of these frequency components.

Finally, {\tt evaluate} computes the value of the signal at each
time in {\tt ts}.

We can test this function like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)
\end{verbatim}

This example makes a signal that contains a fundamental frequency at
100 Hz and three harmonics (100 Hz is a sharp G2). It renders the
signal for one second at 11,025 frames per second and puts the results
into a Wave object.

Conceptually, synthesis is pretty simple.
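As a sanity check, we can confirm that the synthesized wave really
contains these four components. This sketch uses a plain-NumPy
stand-in for {\tt synthesize1} (the name \verb"synthesize1_np" is
mine) and NumPy's FFT instead of a Spectrum object:

\begin{verbatim}
import numpy as np

def synthesize1_np(amps, fs, ts):
    # sum of cosine components, like synthesize1 without thinkdsp
    return sum(amp * np.cos(2 * np.pi * freq * ts)
               for amp, freq in zip(amps, fs))

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.arange(framerate) / framerate      # one second of samples
ys = synthesize1_np(amps, fs, ts)

# with a 1-second wave, FFT bin k corresponds to k Hz, so the four
# largest magnitudes should land in bins 100, 200, 300, and 400
hs = np.abs(np.fft.rfft(ys))
print(sorted(np.argsort(hs)[-4:]))
\end{verbatim}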
But in this form it doesn't
help much with {\bf analysis}, which is the inverse problem: given the
wave, how do we identify the frequency components and their amplitudes?


\section{Synthesis with arrays}
\label{synthesis}

\begin{figure}
\input{diagram}
\caption{Synthesis with arrays.}
\label{fig.synthesis}
\end{figure}

Here's another way to write {\tt synthesize}:

\begin{verbatim}
def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

This function looks very different, but it does the same thing.
Let's see how it works:

\begin{enumerate}

\item {\tt np.outer} computes the outer product of {\tt ts} and
{\tt fs}. The result is an array with one row for each element
of {\tt ts} and one column for each element of {\tt fs}. Each
element in the array is the product of a frequency and a time, $f t$.

\item We multiply {\tt args} by $2 \pi$ and apply {\tt cos}, so each
element of the result is $\cos (2 \pi f t)$. Since the {\tt ts} run
down the columns, each column contains a cosine signal at a
particular frequency, evaluated at a sequence of times.

\item {\tt np.dot} multiplies each row of {\tt M} by {\tt amps},
element-wise, and then adds up the products. In terms of linear
algebra, we are multiplying a matrix, {\tt M}, by a vector, {\tt amps}.
In terms of signals, we are computing the weighted sum
of frequency components.

\end{enumerate}

Figure~\ref{fig.synthesis} shows the structure of this computation.
Each row of the matrix, {\tt M}, corresponds to a time
from 0.0 to 1.0 seconds; $t_n$ is the time of the $n$th row.
Each column corresponds to a frequency from
100 to 400 Hz; $f_k$ is the frequency of the $k$th column.

I labeled the $n$th row with the letters $a$ through $d$; as an
example, the value of $a$ is $\cos [2 \pi (100) t_n]$.

The result of the dot product, {\tt ys}, is a vector with one element
for each row of {\tt M}. The $n$th element, labeled $e$, is the sum
of products:
%
\[ e = 0.6 a + 0.25 b + 0.1 c + 0.05 d \]
%
And likewise with the other elements of {\tt ys}. So each element
of {\tt ys} is the sum of four frequency components, evaluated at
a point in time, and multiplied by the corresponding amplitudes.
And that's exactly what we wanted.

We can use the code from the previous section to check that the two
versions of {\tt synthesize} produce the same results.

\begin{verbatim}
ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))
\end{verbatim}

The biggest difference between {\tt ys1} and {\tt ys2} is about
{\tt 1e-13}, which is what we expect due to floating-point errors.

Writing this computation in terms of linear algebra makes the code
smaller and faster. Linear algebra
provides concise notation for operations on matrices and vectors.
For
example, we could write {\tt synthesize} like this:
%
\begin{eqnarray*}
M &=& \cos (2 \pi t \otimes f) \\
y &=& M a
\end{eqnarray*}
%
where $a$ is a vector of amplitudes,
$t$ is a vector of times, $f$ is a vector of frequencies, and
$\otimes$ is the symbol for the outer product of two vectors.


\section{Analysis}
\label{analysis}

Now we are ready to solve the analysis problem. Suppose I give you
a wave and tell you that it is the sum of cosines with a given set
of frequencies. How would you find the amplitude for each frequency
component? In other words, given {\tt ys}, {\tt ts} and {\tt fs},
can you recover {\tt amps}?

In terms of linear algebra, the first step is the same as for
synthesis: we compute $M = \cos (2 \pi t \otimes f)$. Then we want
to find $a$ so that $y = M a$; in other words, we want to solve a
linear system. NumPy provides {\tt linalg.solve}, which does
exactly that.

Here's what the code looks like:

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

The first two lines use {\tt ts} and {\tt fs} to build the
matrix, {\tt M}. Then {\tt np.linalg.solve} computes {\tt amps}.

But there's a hitch. In general we can only solve a system of linear
equations if the matrix is square; that is, the number of equations
(rows) is the same as the number of unknowns (columns).

In this example, we have only 4 frequencies, but we evaluated the
signal at 11,025 times.
So we have many more equations than unknowns.

In general if {\tt ys}
contains more than 4 elements, it is unlikely that we can analyze it
using only 4 frequencies.

But in this case, we know that the {\tt ys} were actually generated by
adding only 4 frequency components, so we can use any 4 values from
the wave array to recover {\tt amps}.

For simplicity, I'll use the first 4 samples from the signal.
Using the values of {\tt ys}, {\tt fs} and {\tt ts} from
the previous section, we can run {\tt analyze1} like this:

\begin{verbatim}
n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])
\end{verbatim}

And sure enough, {\tt amps2} is

\begin{verbatim}
[ 0.6   0.25  0.1   0.05 ]
\end{verbatim}

This algorithm works, but it is slow. Solving a linear
system of equations takes time proportional to $n^3$, where $n$ is
the number of columns in $M$. We can do better.


\section{Orthogonal matrices}

One way to solve linear systems is by inverting matrices. The
inverse of a matrix $M$ is written $M^{-1}$, and it has the property
that $M^{-1}M = I$. $I$ is the identity matrix, which has
the value 1 on all diagonal elements and 0 everywhere else.

So, to solve the equation $y = Ma$, we can multiply both sides by
$M^{-1}$, which yields:
%
\[ M^{-1}y = M^{-1} M a \]
%
On the right side, we can replace $M^{-1}M$ with $I$:
%
\[ M^{-1}y = I a \]
%
If we multiply $I$ by any vector $a$, the result is $a$, so
%
\[ M^{-1}y = a \]
%
This implies that if we can compute $M^{-1}$ efficiently, we can find
$a$ with a simple matrix multiplication (using {\tt np.dot}). That
takes time proportional to $n^2$, which is better than $n^3$.

Inverting a matrix is slow, in general, but some special cases are
faster. In particular, if $M$ is {\bf orthogonal}, the inverse of $M$
is just the transpose of $M$, written $M^T$.
In NumPy
transposing an array is a constant-time operation. It
doesn't actually move the elements of the array; instead, it creates a
``view'' that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse;
that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we
can check whether a matrix is orthogonal by computing $M^TM$.

So let's see what the matrix looks like in {\tt synthesize2}. In
the previous example, $M$ has 11,025 rows, so it might be a good idea
to work with a smaller example:

\begin{verbatim}
def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

{\tt amps} is the same vector of amplitudes we saw before.
Since we have 4 frequency components, we'll sample the signal
at 4 points in time. That way, $M$ is square.

{\tt ts} is a vector of equally spaced sample times in the range from
0 to 1 time unit. I chose the time unit to be 1 millisecond, but it
is an arbitrary choice, and we will see in a minute that it drops out
of the computation anyway.

Since the frame rate is $N$ samples per time unit, the Nyquist
frequency is \verb"N / time_unit / 2", which is 2000 Hz in this
example. So {\tt fs} is a vector of equally spaced frequencies
between 0 and 2000 Hz.

With these values of {\tt ts} and {\tt fs}, the matrix, $M$, is:

\begin{verbatim}
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]
\end{verbatim}

You might recognize 0.707 as an approximation of $\sqrt{2}/2$,
which is $\cos \pi/4$. You also might notice that this matrix
is {\bf symmetric}, which means that the element at $(j, k)$ always
equals the element at $(k, j)$.
This implies that $M$ is its own
transpose; that is, $M^T = M$.

But sadly, $M$ is not orthogonal. If we compute $M^TM$, we get:

\begin{verbatim}
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]
\end{verbatim}

And that's not the identity matrix.


\section{DCT-IV}
\label{dctiv}

But if we choose {\tt ts} and {\tt fs} carefully,
we can make $M$ orthogonal. There are several ways to do it, which
is why there are several versions of the discrete cosine transform (DCT).

One simple option is to shift {\tt ts} and {\tt fs} by a half unit.
This version is called DCT-IV, where ``IV'' is a roman numeral
indicating that this is the fourth of eight versions of the DCT.

Here's an updated version of {\tt test1}:

\begin{verbatim}
def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

If you compare this to the previous version, you'll notice
two changes. First, I added 0.5 to {\tt ts} and {\tt fs}.
Second, I cancelled out \verb"time_units", which simplifies
the expression for {\tt fs}.

With these values, $M$ is

\begin{verbatim}
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]
\end{verbatim}

And $M^TM$ is

\begin{verbatim}
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]
\end{verbatim}

Some of the off-diagonal elements are displayed as -0, which means
that the floating-point representation is a small negative number.
This matrix is very close to $2I$, which means $M$ is almost
orthogonal; it's just off by a factor of 2. And for our purposes,
that's good enough.

Because $M$ is symmetric and (almost) orthogonal, the inverse of $M$
is just $M/2$.
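We can confirm that numerically. This quick NumPy check rebuilds $M$
from the shifted {\tt ts} and {\tt fs} and compares $M^TM$ to $2I$:

\begin{verbatim}
import numpy as np

N = 4
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
M = np.cos(2 * np.pi * np.outer(ts, fs))

# M^T M should equal 2I, so M is orthogonal up to a factor of 2
print(np.allclose(M.T @ M, 2 * np.eye(N)))
\end{verbatim}

The same check passes for any $N$, not just 4, which is why dividing
by 2 works regardless of the length of the input.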
Now we can write a more efficient version of {\tt analyze}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Instead of using {\tt np.linalg.solve}, we just multiply
by $M/2$.

Combining {\tt test2} and {\tt analyze2}, we can write an
implementation of DCT-IV:

\begin{verbatim}
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Again, {\tt ys} is the wave array. We don't have to pass
{\tt ts} and {\tt fs} as parameters; \verb"dct_iv" can
figure them out based on {\tt N}, the length of {\tt ys}.

If we've got it right, this function should solve the analysis
problem; that is, given {\tt ys} it should be able to recover
{\tt amps}. We can test it like this.

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Starting with {\tt amps}, we synthesize a wave array, then use
\verb"dct_iv" to compute {\tt amps2}. The biggest
difference between {\tt amps} and {\tt amps2} is about {\tt 1e-16},
which is what we expect due to floating-point errors.


\section{Inverse DCT}

Finally, notice that {\tt analyze2} and {\tt synthesize2} are almost
identical. The only difference is that {\tt analyze2} divides the
result by 2. We can use this insight to compute the inverse DCT:

\begin{verbatim}
def inverse_dct_iv(amps):
    return dct_iv(amps) * 2
\end{verbatim}

\verb"inverse_dct_iv" solves the synthesis problem: it
takes the vector of amplitudes and returns
the wave array, {\tt ys}.
We can test it by starting
with {\tt amps}, applying \verb"inverse_dct_iv" and \verb"dct_iv",
and testing that we get back what we started with.

\begin{verbatim}
amps = [0.6, 0.25, 0.1, 0.05]
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Again, the biggest difference is about {\tt 1e-16}.


\section{The Dct class}

\begin{figure}
% dct.py
\centerline{\includegraphics[height=2.5in]{figs/dct1.eps}}
\caption{DCT of a triangle signal at 400 Hz, sampled at 10 kHz.}
\label{fig.dct1}
\end{figure}

{\tt thinkdsp} provides a Dct class that encapsulates the
DCT in the same way the Spectrum class encapsulates the FFT.
To make a Dct object, you can invoke \verb"make_dct" on a Wave.

\begin{verbatim}
signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()
\end{verbatim}

The result is the DCT of a triangle wave at 400 Hz, shown in
Figure~\ref{fig.dct1}. The values of the DCT can be positive or negative;
a negative value in the DCT corresponds to a negated cosine or,
equivalently, to a cosine shifted by 180 degrees.

\verb"make_dct" uses DCT-II, which is the most common type of DCT,
provided by {\tt scipy.fftpack}.

\begin{verbatim}
import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)
\end{verbatim}

The results from {\tt dct} are stored in {\tt hs}.
The corresponding
frequencies, computed as in Section~\ref{dctiv}, are stored in {\tt fs}.
And then both are used to initialize the Dct object.

Dct provides \verb"make_wave", which performs the inverse DCT.
We can test it like this:

\begin{verbatim}
wave2 = dct.make_wave()
max(abs(wave.ys-wave2.ys))
\end{verbatim}

The biggest difference between {\tt wave.ys} and {\tt wave2.ys} is
about {\tt 1e-16}, which is what we expect due to floating-point errors.

\verb"make_wave" uses {\tt scipy.fftpack.idct}:

\begin{verbatim}
# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)
\end{verbatim}

By default, the inverse DCT doesn't normalize the result, so we have
to divide through by $2N$.


\section{Exercises}

For the following exercises, I provide some starter code in
{\tt chap06starter.ipynb}.
Solutions are in {\tt chap06soln.ipynb}.

\begin{exercise}
In this chapter I claim that {\tt analyze1} takes time proportional
to $n^3$ and {\tt analyze2} takes time proportional to $n^2$. To
see if that's true, run them on a range of input sizes and time
them. In IPython, you can use the ``magic command'' \verb"%timeit".

If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for {\tt analyze1} and
slope 2 for {\tt analyze2}.

You also might want to test \verb"dct_iv"
and {\tt scipy.fftpack.dct}.

\end{exercise}


\begin{exercise}
One of the major applications of the DCT is compression for both
sound and images. In its simplest form, DCT-based compression
works like this:

\begin{enumerate}

\item Break a long signal into segments.

\item Compute the DCT of each segment.

\item Identify frequency components with amplitudes so low they are
inaudible, and remove them.
Store only the frequencies and
amplitudes that remain.

\item To play back the signal, load the frequencies and amplitudes
for each segment and apply the inverse DCT.

\end{enumerate}

Implement a version of this algorithm and apply it to a recording
of music or speech. How many components can you eliminate before
the difference is perceptible?

In order to make this method practical, you need some way to store a
sparse array; that is, an array where most of the elements are zero.
SciPy provides several implementations of sparse arrays, which you can
read about at
\url{http://docs.scipy.org/doc/scipy/reference/sparse.html}.
\end{exercise}


\begin{exercise}
In the repository for this book you will find an IPython notebook
called \verb"phase.ipynb" that explores the effect of phase on sound
perception.
Read through this notebook and run the examples.
Choose another segment of sound and run the same experiments.
Can you find any general relationships between the phase structure
of a sound and how we perceive it?
\end{exercise}



\chapter{Discrete Fourier Transform}
\label{dft}

We've been using the discrete Fourier transform (DFT) since
Chapter~\ref{sounds}, but I haven't explained how it works. Now is
the time.

If you understand the discrete cosine transform (DCT), you will
understand the DFT. The only difference is that instead of using the
cosine function, we'll use the complex exponential function. I'll
start by explaining complex exponentials, then I'll follow the
same progression as in Chapter~\ref{dct}:

\begin{enumerate}

\item We'll start with the synthesis
problem: given a set of frequency components and their amplitudes,
how can we construct a signal?
The synthesis problem is
equivalent to the inverse DFT.

\item Then I'll rewrite the synthesis problem in the form of matrix
multiplication using NumPy arrays.

\item Next we'll solve the analysis problem, which is equivalent to
the DFT: given a signal, how do we find the amplitude and phase
offset of its frequency components?

\item Finally, we'll use linear algebra to find a more efficient way
to compute the DFT.

\end{enumerate}

The code for this chapter is in {\tt chap07.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp07}.


\section{Complex exponentials}

One of the more interesting moves in mathematics is the generalization
of an operation from one type to another. For example, factorial is a
function that operates on integers; the natural definition for
factorial of $n$ is the product of all integers from 1 to $n$.

If you are of a certain inclination, you might wonder how to compute
the factorial of a non-integer like 3.5. Since the natural definition
doesn't apply, you might look for other ways to compute the factorial
function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial
function that we know as the gamma function (see
\url{http://en.wikipedia.org/wiki/Gamma_function}).

Euler also found one of the most useful generalizations in applied
mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication.
For example, $\phi^3 = \phi \cdot \phi \cdot \phi$. But this
definition doesn't apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:
%
\[ e^\phi = 1 + \phi + \phi^2/2! + \phi^3/3! + ...
\]
%
This definition works with real numbers, imaginary numbers and, by a simple
extension, with complex numbers. Applying this definition
to a pure imaginary number, $i\phi$, we get
%
\[ e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + ... \]
%
By rearranging terms, we can show that this is equivalent to:
%
\[ e^{i\phi} = \cos \phi + i \sin \phi \]
%
You can see the derivation at
\url{http://en.wikipedia.org/wiki/Euler's_formula}.

This formula
implies that $e^{i\phi}$ is a complex number with magnitude 1; if you
think of it as a point in the complex plane, it is always on the unit
circle. And if you think of it as a vector, the angle in radians
between the vector and the positive x-axis is the argument, $\phi$.

In the case where the exponent is a complex number, we have:
%
\[ e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi} \]
%
where $A$ is a real number that indicates amplitude and
$e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of {\tt exp} that works with complex numbers:

\begin{verbatim}
>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)
\end{verbatim}

Python uses {\tt j} to represent the imaginary unit, rather
than {\tt i}.
A number ending in {\tt j} is considered imaginary,
so {\tt 1j} is just $i$.

When the argument to {\tt np.exp} is imaginary or complex, the
result is a complex number; specifically, a {\tt np.complex128},
which is represented by two 64-bit floating-point numbers.
In this example, the result is {\tt 0.0707+0.997j}.

Complex numbers have attributes {\tt real} and {\tt imag}:

\begin{verbatim}
>>> z.real
0.0707
>>> z.imag
0.997
\end{verbatim}

To get the magnitude, you can use the built-in function {\tt abs}
or {\tt np.absolute}:

\begin{verbatim}
>>> abs(z)
1.0
>>> np.absolute(z)
1.0
\end{verbatim}

To get the angle, you can use {\tt np.angle}:

\begin{verbatim}
>>> np.angle(z)
1.5
\end{verbatim}

This example confirms that $e^{i \phi}$ is a complex number with
magnitude 1 and angle $\phi$ radians.


\section{Complex signals}

If $\phi(t)$ is a function of time, $e^{i \phi(t)}$ is also a function
of time. Specifically,
%
\[ e^{i \phi(t)} = \cos \phi(t) + i \sin \phi(t) \]
%
This function describes a quantity that varies in time, so it is
a signal. Specifically, it is a {\bf complex exponential signal}.

In the special case where the frequency of the signal is constant,
$\phi(t)$ is $2 \pi f t$ and the result is a {\bf complex sinusoid}:
%
\[ e^{i 2 \pi f t} = \cos 2 \pi f t + i \sin 2 \pi f t \]
%
Or more generally, the signal might start at a phase offset
$\phi_0$, yielding
%
\[ e^{i (2 \pi f t + \phi_0)} \]
%
{\tt thinkdsp} provides an implementation of this signal,
{\tt ComplexSinusoid}:

\begin{verbatim}
class ComplexSinusoid(Sinusoid):

    def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys
\end{verbatim}

{\tt ComplexSinusoid} inherits \verb"__init__" from
{\tt Sinusoid}.
It provides a version of {\tt evaluate}
that is almost identical to {\tt Sinusoid.evaluate}; the
only difference is that it uses {\tt np.exp} instead of
{\tt np.sin}.

The result is a NumPy array of complex numbers:

\begin{verbatim}
>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> print(wave.ys)
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]
\end{verbatim}

The frequency of this signal is 1 cycle per second; the amplitude
is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at 4 places equally spaced between
0 and 1 second. The resulting samples are complex numbers.


\section{The synthesis problem}

Just as we did with real sinusoids, we can create compound signals
by adding up complex sinusoids with different frequencies. And that
brings us to the complex version of the synthesis problem: given the
frequency and amplitude of each complex component, how do we evaluate the
signal?

The simplest solution is to create {\tt ComplexSinusoid} objects and
add them up.

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

This function is almost identical to {\tt synthesize1} in
Section~\ref{synth1}; the only difference is that I replaced
{\tt CosSignal} with {\tt ComplexSinusoid}.

Here's an example:

\begin{verbatim}
>>> amps = np.array([0.6, 0.25, 0.1, 0.05])
>>> fs = [100, 200, 300, 400]
>>> framerate = 11025
>>> ts = np.linspace(0, 1, framerate)
>>> ys = synthesize1(amps, fs, ts)
>>> print(ys)
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

At the lowest level, a complex signal is a sequence of complex
numbers. But how should we interpret it? We have some intuition for
real signals: they represent quantities that vary in time; for
example, a sound signal represents changes in air pressure.
But nothing we measure in the world yields complex numbers.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft1.eps}}
\caption{Real and imaginary parts of a mixture of complex sinusoids.}
\label{fig.dft1}
\end{figure}

So what is a complex signal? I don't have a satisfying answer to this
question. The best I can offer is two unsatisfying
answers:

\begin{enumerate}

\item A complex signal is a mathematical abstraction that is useful
for computation and analysis, but it does not correspond directly
with anything in the real world.

\item If you like, you can think of a complex signal as a sequence of
complex numbers that contains two signals as its real and imaginary
parts.

\end{enumerate}

Taking the second point of view, we can split the previous
signal into its real and imaginary parts:

\begin{verbatim}
n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')
\end{verbatim}

Figure~\ref{fig.dft1} shows a segment of the result. The
real part is a sum of cosines; the imaginary part is
a sum of sines.
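That split follows from Euler's formula, and it is easy to spot-check numerically. Here is a minimal sketch using NumPy alone; the frequency and sample times are arbitrary choices for illustration:

```python
import numpy as np

PI2 = 2 * np.pi
f = 100                          # frequency in Hz (arbitrary)
ts = np.linspace(0, 0.01, 50)    # sample times in seconds (arbitrary)

# a complex sinusoid and its real/imaginary decomposition
z = np.exp(1j * PI2 * f * ts)
assert np.allclose(z.real, np.cos(PI2 * f * ts))
assert np.allclose(z.imag, np.sin(PI2 * f * ts))
assert np.allclose(np.abs(z), 1.0)   # always on the unit circle
```
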
Although the waveforms look different, they
contain the same frequency components in the same proportions.
To our ears, they sound the same (in general, we don't hear
phase offsets).


\section{Synthesis with matrices}
\label{synthmat}

As we saw in Section~\ref{synthesis}, we can also express the synthesis
problem in terms of matrix multiplication:

\begin{verbatim}
PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

Again, {\tt amps} is a NumPy array that contains a sequence of
amplitudes.

{\tt fs} is a sequence containing the frequencies of the
components. {\tt ts} contains the times where we will evaluate
the signal.

{\tt args} contains the outer product of {\tt ts} and {\tt fs},
with the {\tt ts} running down the rows and the {\tt fs} running
across the columns (you might want to refer back to
Figure~\ref{fig.synthesis}).

Each column of matrix {\tt M} contains a complex sinusoid with
a particular frequency, evaluated at a sequence of {\tt ts}.

When we multiply {\tt M} by the amplitudes, the result is a vector
whose elements correspond to the {\tt ts}; each element is the sum of
several complex sinusoids, evaluated at a particular time.

Here's the example from the previous section again:

\begin{verbatim}
>>> ys = synthesize2(amps, fs, ts)
>>> print(ys)
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

The result is the same.

In this example the amplitudes are real, but they could also be
complex.
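As a cross-check, the matrix product in {\tt synthesize2} should match summing the complex sinusoids explicitly. Here is a minimal, self-contained sketch using only NumPy (no {\tt thinkdsp}):

```python
import numpy as np

PI2 = 2 * np.pi

def synthesize2(amps, fs, ts):
    # each column of M is a complex sinusoid at one frequency
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return np.dot(M, amps)

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
ts = np.linspace(0, 1, 11025)

# direct sum of complex sinusoids, one term per component
ys_direct = sum(a * np.exp(1j * PI2 * f * ts) for a, f in zip(amps, fs))
assert np.allclose(synthesize2(amps, fs, ts), ys_direct)
```
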
What effect does a complex amplitude have on the result?
Remember that we can think of a complex number in two ways: either the
sum of a real and imaginary part, $x + i y$, or the product of a real
amplitude and a complex exponential, $A e^{i \phi_0}$. Using the
second interpretation, we can see what happens when we multiply
a complex amplitude by a complex sinusoid. For each frequency, $f$,
we have:
%
\[ A e^{i \phi_0} \cdot e^{i 2 \pi f t} = A e^{i (2 \pi f t + \phi_0)} \]
%
Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$
and adds the phase offset $\phi_0$.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft2.eps}}
\caption{Real part of two complex signals that differ by a phase
offset.}
\label{fig.dft2}
\end{figure}

We can test that claim by running the previous example with
$\phi_0 = 1.5$ for all frequency components:

\begin{verbatim}
phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])
\end{verbatim}

Since {\tt amps}
is an array of reals, multiplying by {\tt np.exp(1j * phi)} yields
an array of complex numbers with phase offset {\tt phi} radians, and
the same magnitudes as {\tt amps}.

Figure~\ref{fig.dft2} shows the result.
The phase offset
$\phi_0 = 1.5$ shifts the wave to the left by about one quarter of
a cycle; it also changes the waveform, because the same phase
offset applied to different frequencies changes how the frequency
components line up with each other.

Now that we have the more general solution to the synthesis problem --
one that handles complex amplitudes -- we are ready for the analysis
problem.


\section{The analysis problem}

The analysis problem is the inverse of the synthesis problem: given a
sequence of samples, $y$, and knowing the frequencies
that make up the signal, can we compute the complex amplitudes of the
components, $a$?

As we saw in Section~\ref{analysis}, we can solve this problem by forming
the synthesis matrix, $M$, and solving the system of linear
equations, $M a = y$, for $a$.

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

{\tt analyze1} takes a (possibly complex) wave array, {\tt ys}, a
sequence of real frequencies, {\tt fs}, and a sequence of real
times, {\tt ts}. It returns a sequence of complex amplitudes,
{\tt amps}.

Continuing the previous example, we can confirm that {\tt analyze1}
recovers the amplitudes we started with. For the linear system
solver to work, {\tt M} has to be square, so we need {\tt ys},
{\tt fs} and {\tt ts} to have the same length.
I'll ensure that by
slicing {\tt ys} and {\tt ts} down to the length of {\tt fs}:

\begin{verbatim}
>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> print(amps2)
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]
\end{verbatim}

These are approximately the amplitudes we started with, although
each component has a small imaginary part due to
floating-point errors.


\section{Efficient analysis}

Unfortunately, solving a linear system of equations is slow. For the
DCT, we were able to speed things up by choosing {\tt fs} and
{\tt ts} so that {\tt M} is orthogonal. That way, the inverse of {\tt M}
is the transpose of {\tt M}, and we can compute both the DCT and inverse
DCT by matrix multiplication.

We'll do the same thing for the DFT, with one small change.
Since {\tt M} is complex, we need it to be {\bf unitary}, rather
than orthogonal, which means that the inverse of {\tt M} is
the conjugate transpose of {\tt M}; we can compute it by
transposing the matrix and negating the imaginary part of each
element. See \url{http://en.wikipedia.org/wiki/Unitary_matrix}.

The NumPy methods {\tt conj} and {\tt transpose} do what we
want. Here's the code that computes {\tt M} for $N=4$ components:

\begin{verbatim}
>>> N = 4
>>> ts = np.arange(N) / N
>>> fs = np.arange(N)
>>> args = np.outer(ts, fs)
>>> M = np.exp(1j * PI2 * args)
\end{verbatim}

If $M$ is unitary, $M^*M = I$, where $M^*$ is the conjugate transpose
of $M$, and $I$ is the identity matrix.
We can test whether $M$
is unitary like this:

\begin{verbatim}
>>> MstarM = M.conj().transpose().dot(M)
\end{verbatim}

The result, within the tolerance of floating-point error, is
$4 I$, so $M$ is unitary except for an extra factor of $N$,
similar to the extra factor of 2 we found with the DCT.

We can use this result to write a faster version of {\tt analyze1}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    N = len(fs)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

And test it with appropriate values of {\tt fs} and {\tt ts}:

\begin{verbatim}
>>> N = 4
>>> amps = np.array([0.6, 0.25, 0.1, 0.05])
>>> fs = np.arange(N)
>>> ts = np.arange(N) / N
>>> ys = synthesize2(amps, fs, ts)

>>> amps3 = analyze2(ys, fs, ts)
>>> print(amps3)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

Again, the result is correct within the tolerance of floating-point
arithmetic.


\section{DFT}
\label{dftsection}

As a function, {\tt analyze2} would be hard to use because it
only works if {\tt fs} and {\tt ts} are chosen correctly.
Instead, I will rewrite it to take just {\tt ys} and compute
{\tt fs} and {\tt ts} itself.

First, I'll make a function to compute the synthesis matrix, $M$:

\begin{verbatim}
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M
\end{verbatim}

Then I'll write the function that takes {\tt ys} and returns
{\tt amps}:

\begin{verbatim}
def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

We are almost done; {\tt analyze3} computes something very
close to the DFT, with one difference.
The conventional definition
of the DFT does not divide by {\tt N}:

\begin{verbatim}
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps
\end{verbatim}

Now we can confirm that my version yields the same result as
NumPy's FFT:

\begin{verbatim}
>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

The result is close to {\tt amps * N}.
And here's the version in {\tt np.fft}:

\begin{verbatim}
>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

They are the same, within floating-point error.

The inverse DFT is almost the same, except we don't have to transpose
and conjugate $M$, and {\em now} we have to divide through by $N$:

\begin{verbatim}
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
\end{verbatim}

Finally, we can confirm that {\tt dft(idft(amps))} yields {\tt amps}:

\begin{verbatim}
>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

If I could go back in time, I might change the definition of
the DFT so it divides by $N$ and the inverse DFT doesn't. That would
be more consistent with my presentation of the synthesis and analysis
problems.

Or I might change the definition so that both operations divide
through by $\sqrt{N}$. Then the DFT and inverse DFT would be
more symmetric.

But I can't go back in time (yet!), so we're stuck with a slightly
weird convention.
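The convention can be confirmed against NumPy directly. Here is a self-contained sketch of {\tt synthesis\_matrix}, {\tt dft}, and {\tt idft} as defined above, checked against {\tt np.fft}:

```python
import numpy as np

PI2 = 2 * np.pi

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    return np.exp(1j * PI2 * np.outer(ts, fs))

def dft(ys):
    # conventional DFT: no division by N
    N = len(ys)
    return synthesis_matrix(N).conj().transpose().dot(ys)

def idft(ys):
    # inverse DFT: the division by N happens here
    N = len(ys)
    return synthesis_matrix(N).dot(ys) / N

amps = np.array([0.6, 0.25, 0.1, 0.05])
ys = idft(amps)
assert np.allclose(dft(ys), np.fft.fft(ys))  # matches NumPy's convention
assert np.allclose(dft(ys), amps)            # round trip recovers amps
```
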
For practical purposes it doesn't really
matter.


\section{The DFT is periodic}

In this chapter I presented the DFT in the form of matrix multiplication.
We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$.
When we multiply $M^{*}$ by the wave array, $y$, each element of the
result is the product of a row from $M^*$ and $y$, which we can
write in the form of a summation:
%
\[ DFT(y)[k] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
$0$ to $N-1$ and $n$ is an index of time from $0$ to $N-1$.
So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from
0 to $N-1$. We {\em could} evaluate it for other values of $k$, but
there is no point, because they start to repeat. That is, the value at
$k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n (k+N) / N) \]
%
Since there is a sum in the exponent, we can break it into two parts:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N) \]
%
In the second factor, the exponent is always an integer multiple of
$2 \pi$, so the result is always 1, and we can drop it:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
And we can see that this summation is equivalent to $DFT(y)[k]$.
So the DFT is periodic, with period $N$. You will need this result
for one of the exercises below, which asks you to implement the Fast
Fourier Transform (FFT).

As an aside, writing the DFT in the form of a summation provides an
insight into how it works. If you review the diagram in
Section~\ref{synthesis}, you'll see that each column of the synthesis matrix
is a signal evaluated at a sequence of times.
The analysis matrix is
the (conjugate) transpose of the synthesis matrix, so each {\em row}
is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of $y$ with one of the
signals in the array (see Section~\ref{dotproduct}). That is, each
element of the DFT is a correlation that quantifies the similarity of
the wave array, $y$, and a complex exponential at a particular
frequency.


\section{DFT of real signals}

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft3.eps}}
\caption{DFT of a 500 Hz sawtooth signal sampled at 10 kHz.}
\label{fig.dft3}
\end{figure}

The Spectrum class in {\tt thinkdsp} is based on {\tt np.fft.rfft},
which computes the ``real DFT''; that is, it works with real signals.
But the DFT as presented in this chapter is more general than that; it
works with complex signals.

So what happens when we apply the ``full DFT'' to a real signal?
Let's look at an example:

\begin{verbatim}
>>> signal = thinkdsp.SawtoothSignal(freq=500)
>>> wave = signal.make_wave(duration=0.1, framerate=10000)
>>> hs = dft(wave.ys)
>>> amps = np.absolute(hs)
\end{verbatim}

This code makes a sawtooth wave with frequency 500 Hz, sampled at
framerate 10 kHz. {\tt hs} contains the complex DFT of the wave;
{\tt amps} contains the amplitude at each frequency. But what
frequency do these amplitudes correspond to? If we look at the
body of {\tt dft}, we see:

\begin{verbatim}
>>> fs = np.arange(N)
\end{verbatim}

It's tempting to think that these values are the right frequencies.
The problem is that {\tt dft} doesn't know the sampling rate. The DFT
assumes that the duration of the wave is 1 time unit, so it thinks the
sampling rate is $N$ per time unit.
In order to interpret the
frequencies, we have to convert from these arbitrary time units back
to seconds, like this:

\begin{verbatim}
>>> fs = np.arange(N) * framerate / N
\end{verbatim}

With this change, the range of frequencies is from 0 to the actual
framerate, 10 kHz. Now we can plot the spectrum:

\begin{verbatim}
>>> thinkplot.plot(fs, amps)
>>> thinkplot.config(xlabel='frequency (Hz)',
                     ylabel='amplitude')
\end{verbatim}

Figure~\ref{fig.dft3} shows the amplitude of the signal for each
frequency component from 0 to 10 kHz. The left half of the figure
is what we should expect: the dominant frequency is at 500 Hz, with
harmonics dropping off like $1/f$.

But the right half of the figure is a surprise. Past 5000 Hz, the
amplitude of the harmonics starts growing again, peaking at 9500 Hz.
What's going on?

The answer: aliasing. Remember that with framerate 10000 Hz, the
folding frequency is 5000 Hz. As we saw in Section~\ref{aliasing},
a component at 5500 Hz is indistinguishable from a component
at 4500 Hz. When we evaluate the DFT at 5500 Hz, we get the same
value as at 4500 Hz. Similarly, the value at 6000 Hz is the same
as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency.
Since there is no additional information past this point, we can
save time by evaluating only the first half of the DFT,
and that's exactly what {\tt np.fft.rfft} does.


\section{Exercises}

Solutions to these exercises are in {\tt chap07soln.ipynb}.

\begin{exercise}
The notebook for this chapter, {\tt chap07.ipynb}, contains
additional examples and explanations. Read through it and run
the code.
\end{exercise}

\begin{exercise}
In this chapter, I showed how we can express the DFT and inverse DFT
as matrix multiplications.
These operations take time proportional to
$N^2$, where $N$ is the length of the wave array. That is fast enough
for many applications, but there is a faster
algorithm, the Fast Fourier Transform (FFT), which takes time
proportional to $N \log N$.

The key to the FFT is the Danielson-Lanczos lemma:
%
\[ DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) \, DFT(o)[n] \]
%
where $DFT(y)[n]$ is the $n$th element of the DFT of $y$; $e$ is a
wave array containing the even elements of $y$, and $o$ contains the
odd elements of $y$.

This lemma suggests a recursive algorithm for the DFT:

\begin{enumerate}

\item Given a wave array, $y$, split it into its even elements, $e$,
and its odd elements, $o$.

\item Compute the DFT of $e$ and $o$ by making recursive calls.

\item Compute $DFT(y)$ for each value of $n$ using the
Danielson-Lanczos lemma.

\end{enumerate}

For the base case of this recursion, you could wait until the length
of $y$ is 1. In that case, $DFT(y) = y$. Or if the length of $y$
is sufficiently small, you could compute its DFT by matrix multiplication,
possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting
with a version that is not truly recursive. In Step 2, instead of
making a recursive call, use {\tt dft}, as defined in
Section~\ref{dftsection}, or {\tt np.fft.fft}. Get Step 3 working,
and confirm that the results are consistent with the other
implementations.
Then add a base case and confirm that it works.
Finally, replace Step 2 with recursive calls.

One more hint: Remember that the DFT is periodic; you might find
{\tt np.tile} useful.

You can read more about the FFT at
\url{https://en.wikipedia.org/wiki/Fast_Fourier_transform}.

\end{exercise}


\chapter{Filtering and Convolution}

In this chapter I present one of the most important and useful
ideas related to signal processing: the Convolution Theorem.
But before we can understand the Convolution Theorem, we have to understand
convolution. I'll start with a simple example, smoothing, and
we'll go from there.

The code for this chapter is in {\tt chap08.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp08}.


\section{Smoothing}
\label{smoothing}

\begin{figure}
% convolution1.py
\centerline{\includegraphics[height=2.5in]{figs/convolution1.eps}}
\caption{Daily closing price of Facebook stock and a 30-day moving
average.}
\label{fig.convolution1}
\end{figure}

Smoothing is an operation that tries to remove short-term variations
from a signal in order to reveal long-term trends. For example, if
you plot daily changes in the price of a stock, it would look noisy;
a smoothing operator might make it easier to see whether the price
was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes
the mean of the previous $n$ values, for some value of $n$.

For example, Figure~\ref{fig.convolution1} shows the daily
closing price of Facebook from May 17, 2012 to December 8,
2015. The gray line
is the raw data; the darker line shows the 30-day moving average.
Smoothing removes the most
extreme changes and makes it easier to see long-term trends.

Smoothing operations also apply to sound signals.
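The moving average itself is easy to sketch with plain NumPy. In this sketch the random walk is a stand-in for the stock data, which is not reproduced here, and {\tt moving\_average} is my own helper, not part of {\tt thinkdsp}:

```python
import numpy as np

def moving_average(ys, n):
    # mean of each window of n consecutive values
    return np.array([ys[i:i+n].mean() for i in range(len(ys) - n + 1)])

rng = np.random.default_rng(17)
prices = np.cumsum(rng.standard_normal(1000))  # random walk as stand-in data

smoothed = moving_average(prices, 30)
# the result is shorter: one value per complete 30-element window
assert len(smoothed) == len(prices) - 30 + 1
assert np.allclose(smoothed[0], prices[:30].mean())
```
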
As an example, I'll
start with a square wave at 440 Hz. As we saw in
Section~\ref{square}, the harmonics of a square wave drop off
slowly, so it contains many high-frequency components.

\begin{verbatim}
>>> signal = thinkdsp.SquareSignal(freq=440)
>>> wave = signal.make_wave(duration=1, framerate=44100)
>>> segment = wave.segment(duration=0.01)
\end{verbatim}

{\tt wave} is a 1-second segment of the signal; {\tt segment}
is a shorter segment I'll use for plotting.

To compute the moving average of this signal, I'll create
a window with 11 elements and normalize it so the elements
add up to 1.

\begin{verbatim}
>>> window = np.ones(11)
>>> window /= sum(window)
\end{verbatim}

Now I can compute the average of the first 11 elements by
multiplying the window by the wave array:

\begin{verbatim}
>>> ys = segment.ys
>>> N = len(ys)
>>> padded = thinkdsp.zero_pad(window, N)
>>> prod = padded * ys
>>> sum(prod)
-1
\end{verbatim}

{\tt padded} is a version of the window with zeros added to
the end so it has the same length as {\tt segment.ys}.
{\tt prod} is the product of the window and the wave array.

The sum of the elementwise products is the average of the first 11
elements of the array.
Since these elements are all -1, their average
is -1.

\begin{figure}
% convolution2.py
\centerline{\includegraphics[height=2.5in]{figs/convolution2.eps}}
\caption{A square signal at 440 Hz (gray) and an 11-element
moving average.}
\label{fig.convolution2}
\end{figure}

To compute the next element of the smoothed signal, we shift the
window to the right and compute the average of the next 11 elements of
the wave array, starting with the second.

\begin{verbatim}
>>> rolled = np.roll(padded, 1)
>>> prod = rolled * ys
>>> sum(prod)
-1
\end{verbatim}

To compute the moving average, we repeat this process and store
the result in an array named {\tt smoothed}:

\begin{verbatim}
>>> smoothed = np.zeros_like(ys)
>>> rolled = padded
>>> for i in range(N):
...     smoothed[i] = sum(rolled * ys)
...     rolled = np.roll(rolled, 1)
\end{verbatim}

{\tt rolled} is a copy of {\tt padded} that gets shifted to
the right by one element each time through the loop. Inside
the loop, we multiply the segment by {\tt rolled} to select
11 elements, and then add them up.

Figure~\ref{fig.convolution2} shows the result. The gray line
is the original signal; the darker line is the smoothed signal.
The smoothed signal starts to ramp up when the leading edge of
the window reaches the first transition, and levels off when
the window crosses the transition. As a result, the transitions
are less abrupt, and the corners less sharp.
If you listen
to the smoothed signal, it sounds less buzzy and slightly muffled.


\section{Convolution}
\label{convolution}

The operation we just computed is called {\bf convolution},
and it is such a common operation that NumPy provides an
implementation that is simpler and faster than my version:

\begin{verbatim}
>>> convolved = np.convolve(ys, window, mode='valid')
>>> smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)
\end{verbatim}

{\tt np.convolve} computes the convolution of the wave
array and the window. The mode flag {\tt valid} indicates
that it should only compute values when the window and the
wave array overlap completely, so it stops when the right
edge of the window reaches the end of the wave array. Other
than that, the result is the same as in Figure~\ref{fig.convolution2}.

\newcommand{\conv}{\ast}

Actually, there is one other difference. The loop in the
previous section actually computes {\bf cross-correlation}:
%
\[ (f \star g)[n] = \sum_{m=0}^{N-1} f[m] g[n+m] \]
%
where $f$ is a wave array with length $N$, $g$ is the window,
and $\star$ is the symbol for cross-correlation. To
compute the $n$th element of the result, we shift $g$ to
the right, which is why the index is $n+m$.

The definition of convolution is slightly different:
%
\[ (f \conv g)[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]
%
The symbol $\conv$ represents convolution. The difference is in the
index of $g$: $m$ has been negated, so the summation iterates the
elements of $g$ backward (assuming that negative indices wrap around
to the end of the array).

Because the window we used in this example is symmetric,
cross-correlation and convolution yield the same result. When we use
other windows, we will have to be more careful.

You might wonder why convolution is defined the way it is.
There are two reasons:

\begin{itemize}

\item This definition comes up naturally for several applications,
especially analysis of signal-processing systems, which is
the topic of Chapter~\ref{systems}.

\item Also, this definition is the basis of the Convolution Theorem,
coming up very soon.

\end{itemize}


\section{The frequency domain}

\begin{figure}
% convolution4.py
\centerline{\includegraphics[height=2.5in]{figs/convolution4.eps}}
\caption{Spectrum of the square wave before and after smoothing.}
\label{fig.convolution4}
\end{figure}

Smoothing makes the transitions in a square signal less abrupt,
and makes the sound slightly muffled. Let's see what effect this
operation has on the spectrum. First I'll plot the spectrum
of the original wave:

\begin{verbatim}
>>> spectrum = wave.make_spectrum()
>>> spectrum.plot(color=GRAY)
\end{verbatim}

Then the smoothed wave:

\begin{verbatim}
>>> convolved = np.convolve(wave.ys, window, mode='same')
>>> smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
>>> spectrum2 = smooth.make_spectrum()
>>> spectrum2.plot()
\end{verbatim}

The mode flag {\tt same} indicates that the result should have the
same length as the input. In this example, it will include a few values
that ``wrap around'', but that's ok for now.

Figure~\ref{fig.convolution4} shows the result. The fundamental
frequency is almost unchanged; the first few harmonics are
attenuated, and the higher harmonics are almost eliminated.
So smoothing has the effect of a low-pass filter, which we
saw in Section~\ref{spectrums} and Section~\ref{pink}.

To see how much each component has been attenuated, we can
compute the ratio of the two spectrums:

\begin{verbatim}
>>> amps = spectrum.amps
>>> amps2 = spectrum2.amps
>>> ratio = amps2 / amps
>>> ratio[amps<560] = 0
>>> thinkplot.plot(ratio)
\end{verbatim}

{\tt ratio} is the ratio of the amplitudes before and after
smoothing. When {\tt amps} is small, this ratio can be big
and noisy, so for simplicity I set the ratio to 0 except
where the harmonics are.

\begin{figure}
% convolution5.py
\centerline{\includegraphics[height=2.5in]{figs/convolution5.eps}}
\caption{Ratio of spectrums for the square wave, before and after smoothing.}
\label{fig.convolution5}
\end{figure}

Figure~\ref{fig.convolution5} shows the result. As expected, the
ratio is high for low frequencies and drops off at a cutoff frequency
near 4000 Hz. But there is another feature we did not expect: above
the cutoff, the ratio bounces around between 0 and 0.2.
What's up with that?


\section{The convolution theorem}
\label{convtheorem}

\begin{figure}
% convolution6.py
\centerline{\includegraphics[height=2.5in]{figs/convolution6.eps}}
\caption{Ratio of spectrums for the square wave, before and after
smoothing, along with the DFT of the smoothing window.}
\label{fig.convolution6}
\end{figure}

\newcommand{\DFT}{\mathrm{DFT}}
\newcommand{\IDFT}{\mathrm{IDFT}}

The answer is the convolution theorem. Stated mathematically:
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
where $f$ is a wave array and $g$ is a window. In words,
the convolution theorem says that if we convolve $f$ and $g$,
and then compute the DFT, we get the same answer as computing
the DFT of $f$ and $g$, and then multiplying the results
element-wise.
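For DFTs, this identity holds exactly when the convolution is circular, which we can check directly with NumPy (a quick sketch; {\tt f} and {\tt g} here are arbitrary test arrays, not the book's wave and window):

```python
import numpy as np

rng = np.random.default_rng(17)
N = 128
f = rng.standard_normal(N)   # stand-in for a wave array
g = np.zeros(N)
g[:11] = 1 / 11              # boxcar window, zero-padded to length N

# Circular convolution computed directly from the definition,
# with the index n-m wrapping around modulo N.
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                 for n in range(N)])

# Convolution theorem: DFT(f conv g) == DFT(f) * DFT(g), element-wise.
lhs = np.fft.fft(conv)
rhs = np.fft.fft(f) * np.fft.fft(g)
print(np.allclose(lhs, rhs))  # True
```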
More concisely, convolution in the time
domain corresponds to multiplication in the frequency domain.

And that explains Figure~\ref{fig.convolution5}, because when we
convolve a wave and a window, we multiply the spectrum of
the wave by the spectrum of the window. To see how that works,
we can compute the DFT of the window:

\begin{verbatim}
padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))
\end{verbatim}

{\tt padded} contains the window, padded with zeros to be the
same length as {\tt wave}. \verb"dft_window" contains the
DFT of the smoothing window.

\newcommand{\abs}{\mathrm{abs}}

Figure~\ref{fig.convolution6} shows the result, along with the
ratios we computed in the previous section. The ratios are
exactly the amplitudes in \verb"dft_window". Mathematically:
%
\[ \abs(\DFT(f \conv g)) / \abs(\DFT(f)) = \abs(\DFT(g)) \]
%
In this context, the DFT of a window is called a {\bf filter}.
For any convolution window in the time domain, there is a
corresponding filter in the frequency domain. And for any
filter that can be expressed by element-wise multiplication in
the frequency domain, there is a corresponding window.


\section{Gaussian filter}

\begin{figure}
% convolution7.py
\centerline{\includegraphics[height=2.5in]{figs/convolution7.eps}}
\caption{Boxcar and Gaussian windows.}
\label{fig.convolution7}
\end{figure}

The moving average window we used in the previous section is a
low-pass filter, but it is not a very good one. The DFT drops off
steeply at first, but then it bounces around. Those bounces are
called {\bf sidelobes}, and they are there because the moving average
window is like a square wave, so its spectrum contains high-frequency
harmonics that drop off proportionally to $1/f$, which is relatively
slow.

We can do better with a Gaussian window.
SciPy provides functions
that compute many common convolution windows, including {\tt
gaussian}:

\begin{verbatim}
gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
\end{verbatim}

{\tt M} is the number of elements in the window; {\tt std}
is the standard deviation of the Gaussian distribution used to
compute it. Figure~\ref{fig.convolution7} shows the shape
of the window. It is a discrete approximation of the Gaussian
``bell curve''. The figure also shows the moving average window
from the previous example, which is sometimes called a
``boxcar'' window because it looks like a rectangular railway car.

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution8.eps}}
\caption{Ratio of spectrums before and after Gaussian smoothing, and
the DFT of the window.}
\label{fig.convolution8}
\end{figure}

I ran the computations from the previous sections again
with this window, and generated Figure~\ref{fig.convolution8},
which shows the ratio of the spectrums before and after
smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple
moving average. After the ratio drops off, it stays low, with almost
none of the sidelobes we saw with the boxcar window. So it does a
better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a
Gaussian curve.
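We can check this claim numerically: sample a Gaussian, take its DFT, and confirm that the log of the magnitude is quadratic in frequency, which is what it means for the magnitude to be Gaussian (a self-contained sketch that builds the window with NumPy rather than {\tt scipy.signal.gaussian}; the sizes are arbitrary):

```python
import numpy as np

# Gaussian window centered in an array of length N, with std sigma.
N, sigma = 256, 16
n = np.arange(N)
window = np.exp(-0.5 * ((n - N // 2) / sigma) ** 2)

# Magnitude of the DFT (the phase only reflects the centering shift).
amps = np.abs(np.fft.rfft(window))

# If amps is Gaussian in f, log(amps) is quadratic, so the curvature
# (log amps[0] - log amps[k]) / k**2 should be constant for small k.
ks = np.arange(1, 7)
c = (np.log(amps[0]) - np.log(amps[ks])) / ks**2
print(np.allclose(c, c[0], rtol=1e-3))  # True: log-magnitude is quadratic
```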
So the ratio drops off in proportion to $\exp(-f^2)$,
which is much faster than $1/f$.


\section{Efficient convolution}
\label{effconv}

One of the reasons the FFT is such an important algorithm is that,
combined with the Convolution Theorem, it provides an efficient
way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
So one way to compute a convolution is:
%
\[ f \conv g = \IDFT(\DFT(f) \cdot \DFT(g)) \]
%
where $\IDFT$ is the inverse DFT. A simple implementation of
convolution takes time proportional to $N^2$; this algorithm,
using FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution
both ways. As an example, I'll apply it to the BitCoin data
shown in Figure~\ref{fig.convolution1}.

\begin{verbatim}
df = pandas.read_csv('coindesk-bpi-USD-close.csv',
                     nrows=1625,
                     parse_dates=[0])
ys = df.Close.values
\end{verbatim}

This example uses Pandas to read the data from the CSV file (included
in the repository for this book). If you are not familiar with
Pandas, don't worry: I'm not going to do much with it in this book.
But if you're interested, you can learn more about it in
{\it Think Stats} at \url{http://thinkstats2.com}.

The result, {\tt df}, is a DataFrame, one of the data structures
provided by Pandas.
{\tt ys} is a NumPy array that contains daily
closing prices.

Next I'll create a Gaussian window and convolve it with {\tt ys}:

\begin{verbatim}
window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')
\end{verbatim}

\verb"fft_convolve" computes the same thing using the FFT:

\begin{verbatim}
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)
\end{verbatim}

We can test it by padding the window to the same length
as {\tt ys} and then computing the convolution:

\begin{verbatim}
padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)
\end{verbatim}

The result has $M-1$ bogus values at the beginning, where $M$ is the
length of the window. If we slice off the bogus values, the result
agrees with {\tt smoothed} with about 12 digits of precision.

\begin{verbatim}
M = len(window)
smoothed2 = smoothed2[M-1:]
\end{verbatim}


\section{Efficient autocorrelation}

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution9.eps}}
\caption{Autocorrelation functions computed by NumPy and
{\tt fft\_autocorr}.}
\label{fig.convolution9}
\end{figure}

In Section~\ref{convolution} I presented definitions of
cross-correlation and convolution, and we saw that they are
almost the same, except that in convolution the window is
reversed.

Now that we have an efficient algorithm for convolution, we
can also use it to compute cross-correlations and autocorrelations.
Using the data from the previous section, we can compute the
autocorrelation of the closing prices:

\begin{verbatim}
corrs = np.correlate(close, close, mode='same')
\end{verbatim}

With {\tt mode='same'}, the result has the same
length as {\tt close},
corresponding to lags from $-N/2$ to $N/2-1$.
The gray line in Figure~\ref{fig.convolution9} shows the result.
Except at {\tt lag=0}, there are no peaks, so there is no apparent
periodic behavior in this signal. However, the autocorrelation
function drops off slowly, suggesting that this signal resembles
pink noise, as we saw in Section~\ref{autopink}.

To compute the autocorrelation using convolution, we
have to zero-pad the signal to double its length.
This trick is necessary because the FFT is based
on the assumption that the signal is periodic; that is, that it wraps
around from the end to the beginning. With time-series data like
this, that assumption is invalid. Adding zeros, and then trimming
the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window.
In order to cancel that effect, we reverse the direction of the
window before calling \verb"fft_convolve", using {\tt np.flipud},
which flips a NumPy array. The result is a view of the array,
not a copy, so this operation is fast.

\begin{verbatim}
def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs
\end{verbatim}

The result from \verb"fft_convolve" has length $2N$. Of those,
the first and last $N/2$ are valid; the rest are the result of
zero-padding.
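Here is the whole pipeline as a self-contained check against {\tt np.correlate} on synthetic data (a sketch: the zero-padding is written out with NumPy instead of imported from {\tt thinkdsp}, and a random walk stands in for the price series):

```python
import numpy as np

def fft_convolve(signal, window):
    # Circular convolution by way of the convolution theorem.
    return np.fft.ifft(np.fft.fft(signal) * np.fft.fft(window))

def fft_autocorr(signal):
    # Zero-pad to double length so the circular wraparound is harmless,
    # then flip the window to turn convolution into correlation.
    N = len(signal)
    signal = np.concatenate([signal, np.zeros(N)])
    window = np.flipud(signal)
    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N // 2 + 1)[:N]
    return corrs.real

rng = np.random.default_rng(1)
ys = np.cumsum(rng.standard_normal(200))   # a random walk, like price data
corrs1 = np.correlate(ys, ys, mode='same')
corrs2 = fft_autocorr(ys)
print(np.allclose(corrs1, corrs2))  # True
```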
To select the valid elements, we roll the results
and select the first $N$, corresponding to lags from $-N/2$ to
$N/2-1$.

As shown in Figure~\ref{fig.convolution9}, the results from
\verb"fft_autocorr" and {\tt np.correlate} are identical (with
about 9 digits of precision).

Notice that the correlations in Figure~\ref{fig.convolution9} are
large numbers; we could normalize them (between -1 and 1) as shown
in Section~\ref{correlate}.

The strategy we used here for autocorrelation also works for
cross-correlation. Again, you have to prepare the signals by flipping
one and padding both, and then you have to trim the invalid parts of
the result. This padding and trimming is a nuisance, but that's why
libraries like NumPy provide functions to do it for you.


\section{Exercises}

Solutions to these exercises are in {\tt chap08soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap08.ipynb}.
Read through it and run the code.

It contains an interactive widget that lets you
experiment with the parameters of the Gaussian window to see
what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian,
{\tt std}, without increasing the number of elements in the window,
{\tt M}?
\end{exercise}


\begin{exercise}
In this chapter I claimed that the Fourier transform of a Gaussian
curve is also a Gaussian curve. For discrete Fourier transforms,
this relationship is approximately true.

Try it out for a few examples. What happens to the Fourier transform
as you vary {\tt std}?
\end{exercise}


\begin{exercise}
If you did the exercises in Chapter~\ref{nonperiodic}, you saw the
effect of the Hamming window, and some of the other windows provided
by NumPy, on spectral leakage.
We can get some insight into the
effect of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a
Hamming window with the same size. Zero-pad the windows and plot
their DFTs. Which window acts as a better low-pass filter? You might
find it useful to plot the DFTs on a log-$y$ scale.

Experiment with a few different windows and a few different sizes.
\end{exercise}



\chapter{Differentiation and Integration}
\label{diffint}

This chapter picks up where the previous chapter left off,
looking at the relationship between windows in the time domain
and filters in the frequency domain.

In particular, we'll look at the effect of a finite difference
window, which approximates differentiation, and the cumulative
sum operation, which approximates integration.

The code for this chapter is in {\tt chap09.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp09}.


\section{Finite differences}
\label{diffs}

In Section~\ref{smoothing}, we applied a smoothing window to
the stock price of Facebook and found that a smoothing
window in the time domain corresponds to a low-pass filter in
the frequency domain.

In this section, we'll look at daily price changes and
see that computing the difference between successive elements,
in the time domain, corresponds to a high-pass filter.

Here's the code to read the data, store it as a wave, and compute its
spectrum.

\begin{verbatim}
import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()
\end{verbatim}

This example uses Pandas to read the CSV file; the
result is a DataFrame,
{\tt df}, with columns for the opening
price, closing price, and high and low prices. I select the closing
prices and save them in a Wave object.
The framerate is 1 sample per day.

Figure~\ref{fig.diff_int1} shows
this time series and its spectrum.
Visually, the time series resembles Brownian noise (see
Section~\ref{brownian}).
And the spectrum looks like a straight
line, albeit a noisy one. The estimated slope is -1.9,
which is consistent with Brownian noise.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int1.eps}}
\caption{Daily closing price of Facebook and the spectrum of this time
series.}
\label{fig.diff_int1}
\end{figure}

Now let's compute the daily price change using {\tt np.diff}:

\begin{verbatim}
diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()
\end{verbatim}

Figure~\ref{fig.diff_int2} shows the resulting wave and its spectrum.
The daily changes resemble white noise, and the estimated slope of the
spectrum, -0.06, is near zero, which is what we expect for white
noise.

\begin{figure}
% diff_int2.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int2.eps}}
\caption{Daily price change of Facebook and the spectrum of this time series.}
\label{fig.diff_int2}
\end{figure}


\section{The frequency domain}

Computing the difference
between successive elements is the same as convolution with
the window {\tt [1, -1]}.
If the order of those elements seems backward,
remember that convolution reverses the window before applying it
to the signal.

We can see the effect of this operation in the frequency domain
by computing the DFT of the window.

\begin{verbatim}
diff_window = np.array([1.0, -1.0])
padded = thinkdsp.zero_pad(diff_window, len(close))
diff_wave = thinkdsp.Wave(padded,
                          framerate=close.framerate)
diff_filter = diff_wave.make_spectrum()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int3.eps}}
\caption{Filters corresponding to the diff and differentiate operators (left) and the integration operator (right, log-$y$ scale).}
\label{fig.diff_int3}
\end{figure}

Figure~\ref{fig.diff_int3} shows the result. The finite difference
window corresponds to a high-pass filter: its amplitude increases with
frequency, linearly for low frequencies, and then sublinearly after
that. In the next section, we'll see why.


\section{Differentiation}
\label{effdiff}

The window we used in the previous section is a
numerical approximation of the first derivative, so the filter
approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter
in the frequency domain; we can figure out what it is with a little
math.

Suppose we have a complex sinusoid with frequency $f$:
%
\[ E_f(t) = e^{2 \pi i f t} \]
%
The first derivative of $E_f$ is
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f e^{2 \pi i f t} \]
%
which we can rewrite as
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f E_f(t) \]
%
In other words, taking the derivative of $E_f$ is the same
as multiplying by $2 \pi i f$, which is a complex number
with magnitude $2 \pi f$ and angle $\pi/2$.

We can compute the filter that corresponds to differentiation,
like this:

\begin{verbatim}
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
\end{verbatim}

I started with the spectrum of {\tt close}, which has the right
size and framerate, then replaced the {\tt hs} with $2 \pi i f$.
Figure~\ref{fig.diff_int3} (left) shows this filter; it is a straight line.

As we saw in Section~\ref{synthmat}, multiplying a complex sinusoid
by a complex number
has two effects: it multiplies
the amplitude, in this case by $2 \pi f$, and shifts the phase
offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions,
each $E_f$ is an eigenfunction of the differentiation operator, with the
corresponding eigenvalue $2 \pi i f$. See
\url{http://en.wikipedia.org/wiki/Eigenfunction}.

If you are not familiar with that language, here's what it
means:

\newcommand{\op}{\mathcal{A}}

\begin{itemize}

\item An operator is a function that takes a function and returns
another function. For example, differentiation is an operator.

\item A function, $g$, is an eigenfunction of an operator, $\op$, if
applying $\op$ to $g$ has the effect of multiplying the function by
a scalar. That is, $\op g = \lambda g$.

\item In that case, the scalar $\lambda$ is the eigenvalue that
corresponds to the eigenfunction $g$.

\item A given operator might have many eigenfunctions, each with
a corresponding eigenvalue.

\end{itemize}

Because complex sinusoids are eigenfunctions of the differentiation
operator, they are easy to differentiate.
All we have to do is
multiply by a complex scalar.

For signals with more than one
component, the process is only slightly harder:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoids.

\item Compute the derivative of each component by multiplication.

\item Add up the differentiated components.

\end{enumerate}

If that process sounds familiar, that's because it is identical
to the algorithm for convolution in Section~\ref{effconv}: compute
the DFT, multiply by a filter, and compute the inverse DFT.

{\tt Spectrum} provides a method that applies the differentiation
filter:

\begin{verbatim}
# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs
\end{verbatim}

We can use it to compute the derivative of the Facebook time series:

\begin{verbatim}
deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int4.eps}}
\caption{Comparison of daily price changes computed by
{\tt np.diff} and by applying the differentiation filter.}
\label{fig.diff_int4}
\end{figure}

Figure~\ref{fig.diff_int4} compares the daily price changes computed by
{\tt np.diff} with the derivative we just computed.
I selected the first 50 values in the time series so we can see the
differences more clearly.

The derivative is noisier, because it amplifies the high frequency
components more, as shown in Figure~\ref{fig.diff_int3} (left). Also, the
first few elements of the derivative are very noisy. The problem
there is that the DFT-based derivative is based on the assumption that
the signal is periodic.
In effect, it connects the last element in
the time series back to the first element, which creates artifacts at
the boundaries.

To summarize, we have shown:

\begin{itemize}

\item Computing the difference between successive values in a signal
can be expressed as convolution with a simple window. The result is
an approximation of the first derivative.

\item Differentiation in the time domain corresponds to a simple
filter in the frequency domain. For periodic signals, the result is
the first derivative, exactly. For some non-periodic signals, it
can approximate the derivative.

\end{itemize}

Using the DFT to compute derivatives is the basis of {\bf spectral
methods} for solving differential equations (see
\url{http://en.wikipedia.org/wiki/Spectral_method}).

In particular, it is useful for the analysis of linear, time-invariant
systems, which is coming up in Chapter~\ref{systems}.


\section{Integration}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int5.eps}}
\caption{Comparison of the original time series and the integrated
derivative.}
\label{fig.diff_int5}
\end{figure}

In the previous section, we showed that differentiation in the time
domain corresponds to a simple filter in the frequency domain: it
multiplies each component by $2 \pi i f$.
Since integration is
the inverse of differentiation, it also corresponds to a simple
filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

\begin{verbatim}
integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)
\end{verbatim}

Figure~\ref{fig.diff_int3} (right) shows this filter on a log-$y$ scale,
which makes it easier to see.

{\tt Spectrum} provides a method that applies the integration filter:

\begin{verbatim}
# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs
\end{verbatim}

We can confirm that the integration filter is correct by applying it
to the spectrum of the derivative we just computed:

\begin{verbatim}
integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()
\end{verbatim}

But notice that at $f=0$, we are dividing by 0. The result in
NumPy is NaN, which is a special floating-point value that
represents ``not a number''. We can partially deal with this
problem by setting this value to 0 before converting the
spectrum back to a wave:

\begin{verbatim}
integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
\end{verbatim}

Figure~\ref{fig.diff_int5} shows this integrated derivative along with
the original time series. They are almost identical, but the
integrated derivative has been shifted down. The problem is that when
we clobbered the $f=0$ component, we set the bias of the signal to 0.
But that should not be surprising; in general, differentiation loses
information about the bias, and integration can't recover it.
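A short round trip with plain NumPy makes the lost bias concrete ({\tt rfft} stands in for the book's {\tt Spectrum} class; the 5 Hz signal and its bias of 3 are invented for the example):

```python
import numpy as np

framerate = 100
ts = np.arange(framerate) / framerate
ys = 3.0 + np.sin(2 * np.pi * 5 * ts)   # a 5 Hz sinusoid with bias 3

hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(len(ys), d=1/framerate)

# Differentiate, then integrate, in the frequency domain.
deriv_hs = 2j * np.pi * fs * hs          # multiplies the f=0 component by 0
integ_hs = np.zeros_like(deriv_hs)
integ_hs[1:] = deriv_hs[1:] / (2j * np.pi * fs[1:])   # skip f=0
recovered = np.fft.irfft(integ_hs)

# Everything comes back except the bias, which differentiation destroyed.
print(np.allclose(recovered, ys - 3.0))  # True
```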
In some
sense, the NaN at $f=0$ is telling us that this element is unknown.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int6.eps}}
\caption{A sawtooth wave and its spectrum.}
\label{fig.diff_int6}
\end{figure}

If we provide this ``constant of integration'', the results are
identical, which confirms that this integration filter is the correct
inverse of the differentiation filter.

\section{Cumulative sum}
\label{cumsum}

In the same way that the diff operator approximates differentiation,
the cumulative sum approximates integration.
I'll demonstrate with a sawtooth signal.

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)
\end{verbatim}

Figure~\ref{fig.diff_int6} shows this wave and its spectrum.

{\tt Wave} provides a method that computes the cumulative sum of
a wave array and returns a new Wave object:

\begin{verbatim}
# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)
\end{verbatim}

We can use it to compute the cumulative sum of \verb"in_wave":

\begin{verbatim}
out_wave = in_wave.cumsum()
out_wave.unbias()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int7.eps}}
\caption{A parabolic wave and its spectrum.}
\label{fig.diff_int7}
\end{figure}

Figure~\ref{fig.diff_int7} shows the resulting wave and its spectrum.
If you did the exercises in Chapter~\ref{harmonics}, this waveform should
look familiar: it's a parabolic signal.

Comparing the spectrum of the parabolic signal to the spectrum of the
sawtooth, the amplitudes of the components drop off more quickly. In
Chapter~\ref{harmonics}, we saw that the components of the sawtooth
drop off in proportion to $1/f$.
Since the cumulative sum
approximates integration, and integration filters components in
proportion to $1/f$, the components of the parabolic wave drop off in
proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds
to the cumulative sum:

\begin{verbatim}
cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs
\end{verbatim}

Because {\tt cumsum} is the inverse operation of {\tt diff}, we
start with a copy of \verb"diff_filter", which is the filter
that corresponds to the {\tt diff} operation, and then invert the
{\tt hs}.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int8.eps}}
\caption{Filters corresponding to cumulative sum and integration.}
\label{fig.diff_int8}
\end{figure}

Figure~\ref{fig.diff_int8} shows the filters corresponding to
cumulative sum and integration. The cumulative sum is a good
approximation of integration except at the highest frequencies,
where it drops off a little faster.

To confirm that this is the correct filter for the cumulative
sum, we can compare it to the ratio of the spectrum of
\verb"out_wave" to the spectrum of \verb"in_wave":

\begin{verbatim}
in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int9.eps}}
\caption{Filter corresponding to cumulative sum and actual ratios of
the before-and-after spectrums.}
\label{fig.diff_int9}
\end{figure}

And here's the method that computes the ratios:

\begin{verbatim}
def ratio(self, denom, thresh=1):
    ratio_spectrum = self.copy()
    ratio_spectrum.hs /= denom.hs
    ratio_spectrum.hs[denom.amps < thresh] = np.nan
    return ratio_spectrum
\end{verbatim}

When
{\tt denom.amps} is small, the resulting ratio is noisy,
so I set those values to NaN.

Figure~\ref{fig.diff_int9} shows the ratios and the filter
corresponding to the cumulative sum. They agree, which confirms that
inverting the filter for {\tt diff} yields the filter for {\tt
cumsum}.

Finally, we can confirm that the Convolution Theorem applies by
applying the {\tt cumsum} filter in the frequency domain:

\begin{verbatim}
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
\end{verbatim}

Within the limits of floating-point error, \verb"out_wave2" is
identical to \verb"out_wave", which we computed using {\tt cumsum}, so
the Convolution Theorem works! But note that this demonstration only
works with periodic signals.


\section{Integrating noise}

In Section~\ref{brownian}, we generated Brownian noise by computing the
cumulative sum of white noise.
Now that we understand the effect of {\tt cumsum} in the frequency
domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we
compute the cumulative sum, the amplitude of each component is divided
by $f$. Since power is the square of magnitude, the power of each
component is divided by $f^2$. So on average, the power at frequency
$f$ is proportional to $1 / f^2$:
%
\[ P_f = K / f^2 \]
%
where $K$ is a constant that's not important.
Taking the log of both sides yields:
%
\[ \log P_f = \log K - 2 \log f \]
%
And that's why, when we plot the spectrum of Brownian noise on a
log-log scale, we expect to see a straight line with slope -2, at
least approximately.

In Section~\ref{diffs} we plotted the spectrum of closing prices for
Facebook, and estimated that the slope is -1.9, which is consistent
with Brownian noise.
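We can confirm the slope prediction with a quick simulation: integrate white noise with {\tt cumsum} and fit a line to the low-frequency part of the log-log power spectrum (a sketch; the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.standard_normal(2**16)
brownian = np.cumsum(white)            # integrate white noise

power = np.abs(np.fft.rfft(brownian)) ** 2
fs = np.fft.rfftfreq(len(brownian))    # frequencies in cycles per sample

# Fit a line to log power versus log frequency over the low-frequency
# range, skipping f=0; the slope should be close to -2.
lo, hi = 1, 1000
slope, _ = np.polyfit(np.log(fs[lo:hi]), np.log(power[lo:hi]), 1)
print(slope)   # typically close to -2
```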
Many stock prices have similar spectrums.

When we use the {\tt diff} operator to compute daily changes, we
multiply the {\em amplitude} of each component by a filter proportional to
$f$, which means we multiply the {\em power} of each component by $f^2$.
On a log-log scale, this operation adds 2 to the slope of the
power spectrum, which is why the estimated slope of the result
is near 0.1 (but a little lower, because {\tt diff} only approximates
differentiation).


\section{Exercises}

Solutions to these exercises are in {\tt chap09soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap09.ipynb}.
Read through it and run the code.

In Section~\ref{cumsum}, I mentioned that some of the
examples don't work with non-periodic signals. Try replacing the
sawtooth wave, which is periodic, with the Facebook data, which is
not, and see what goes wrong.
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt diff} and
{\tt differentiate} on a signal. Create a triangle wave and plot
it. Apply {\tt diff} and plot the result. Compute the spectrum of the
triangle wave, apply {\tt differentiate}, and plot the result. Convert
the spectrum back to a wave and plot it. Are there differences between
the effect of {\tt diff} and {\tt differentiate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt cumsum} and
{\tt integrate} on a signal. Create a square wave and plot it. Apply
{\tt cumsum} and plot the result. Compute the spectrum of the square
wave, apply {\tt integrate}, and plot the result. Convert the spectrum
back to a wave and plot it. Are there differences between the effect
of {\tt cumsum} and {\tt integrate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of integrating
twice.
Create a sawtooth wave, compute its spectrum, then apply {\tt
integrate} twice. Plot the resulting wave and its spectrum. What is
the mathematical form of the wave? Why does it resemble a sinusoid?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of the 2nd
difference and 2nd derivative. Create a {\tt CubicSignal}, which is
defined in {\tt thinkdsp}. Compute the second difference by applying
{\tt diff} twice. What does the result look like? Compute the second
derivative by applying {\tt differentiate} to the spectrum twice.
Does the result look the same?

Plot the filters that correspond to the 2nd difference and the 2nd
derivative and compare them. Hint: In order to get the filters on the
same scale, use a wave with framerate 1.
\end{exercise}


\chapter{LTI systems}
\label{systems}

This chapter presents the theory of signals and systems, using
musical acoustics as an example. And it explains an
important application of the Convolution Theorem, the characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap10.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp10}.


\section{Signals and systems}

In the context of signal processing, a {\bf system} is an abstract
representation of anything that takes a signal as input and produces
a signal as output.

For example, an electronic amplifier is a circuit that takes an
electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you
can think of the room as a system that takes the sound of the
performance at the location where it is produced and produces a
somewhat different sound at the location where you hear it.

A {\bf linear, time-invariant system}\footnote{My presentation here
follows \url{http://en.wikipedia.org/wiki/LTI_system_theory}.} is a
system with these additional properties:

\begin{enumerate}

\item Linearity: If you put two inputs into the system at the same
time, the result is the sum of their outputs. Mathematically, if an
input $x_1$ produces output $y_1$ and another input $x_2$ produces
$y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and
$b$ are scalars.

\item Time invariance: The
effect of the system doesn't vary over time, or depend on the state
of the system.
So if inputs $x_1$ and $x_2$ differ by a shift in time,
their outputs $y_1$ and $y_2$ differ by the same shift, but are otherwise
identical.

\end{enumerate}

Many physical systems have these properties, at least approximately.

\begin{itemize}

\item Circuits that contain only resistors, capacitors and inductors are
LTI, to the degree that the components behave like their idealized
models.

\item Mechanical systems that contain springs, masses and
dashpots are also LTI, assuming linear springs (force proportional
to displacement) and dashpots (force proportional to velocity).

\item Also, and most relevant to applications in this book,
the media that transmit sound (including air, water
and solids) are well modeled by LTI systems.

\end{itemize}

LTI systems are described by linear differential equations, and
the solutions of those equations are complex sinusoids (see
\url{http://en.wikipedia.org/wiki/Linear_differential_equation}).

This result provides an algorithm for computing the effect of
an LTI system on an input signal:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoid components.

\item For each input component, compute the corresponding output component.

\item Add up the output components.

\end{enumerate}

At this point, I hope this algorithm sounds familiar. It's the
same algorithm we used for convolution in Section~\ref{effconv}, and
for differentiation in Section~\ref{effdiff}. This process
is called {\bf spectral decomposition} because we ``decompose''
the input signal into its spectral components.

In order to apply this process to an LTI system, we have to {\bf
characterize} the system by finding its effect on each component
of the input signal. For mechanical systems, it turns out that there
For mechanical systems, it turns out that there5679is a simple and efficient way to do that: you kick it and record5680the output.56815682Technically, the ``kick'' is called an {\bf impulse} and the5683output is called the {\bf impulse response}. You might wonder5684how a single impulse can completely characterize a system. You5685can see the answer by computing the DFT of an impulse. Here's5686a wave array with an impulse at $t=0$:56875688\begin{verbatim}5689>>> impulse = np.zeros(8)5690>>> impulse[0] = 15691>>> impulse5692[ 1. 0. 0. 0. 0. 0. 0. 0.]5693\end{verbatim}56945695And here's its spectrum:56965697\begin{verbatim}5698>>> spectrum = np.fft.fft(impulse)5699>>> spectrum5700[ 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]5701\end{verbatim}57025703That's right: an impulse is the sum of components with equal5704magnitudes at all frequencies\footnote{Not to be confused with white5705noise, which has the same average power at all frequencies, but5706varies around that average.} (and a phase offset of 0).57075708Therefore when you test a system by inputting5709an impulse, you are testing the response of the5710system at all frequencies. And you can test them all at the same5711time because the system is linear, so simultaneous tests don't5712interfere with each other.571357145715\section{Transfer functions}57165717To characterize the acoustic response of a room or open space, a5718simple way to generate an impulse is to pop a balloon or5719fire a gun. 
A gunshot puts an impulse into
the system; the sound you hear is the impulse response.

%TODO: update the numbers for the figures in this chapter
\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems6.eps}}
\caption{Waveform of a gunshot (left) and its spectrum (right).}
\label{fig.systems6}
\end{figure}

As an example, I'll use a recording of a gunshot to characterize
the room where the gun was fired, then use the impulse response
to simulate the effect of that room on a violin recording.

This example is in {\tt chap10.ipynb}, which is in the repository
for this book; you can also view it, and listen to the examples,
at \url{http://tinyurl.com/thinkdsp10}.

Here's the gunshot:

\begin{verbatim}
response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()
\end{verbatim}

I select a segment starting at 0.26 seconds to remove the silence
before the gunshot. Figure~\ref{fig.systems6} (left) shows the
waveform of the gunshot. Next we compute the DFT of {\tt response}:

\begin{verbatim}
transfer = response.make_spectrum()
transfer.plot()
\end{verbatim}

Figure~\ref{fig.systems6} (right) shows the result. This spectrum
encodes the response of the room; for each frequency, the spectrum
contains a complex number that represents an amplitude multiplier and
a phase shift. This spectrum is called a {\bf transfer
function} because it contains information about how the system transfers
the input to the output.

Now we can simulate the effect this room would have on the sound
of a violin.
Here is the violin recording we used in Section~\ref{violin}:

\begin{verbatim}
wave = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
wave.ys = wave.ys[:len(response)]
wave.normalize()
\end{verbatim}

The violin and gunshot waves were sampled at the same framerate,
44,100 Hz. And coincidentally, the duration of both is about the
same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

Now I know the magnitude and phase of each component in the
input, and I know the transfer function of the system. Their
product is the DFT of the output, which we can use to compute the
output wave:

\begin{verbatim}
output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()
\end{verbatim}

%TODO: Am I missing a figure here?
Figure~\ref{fig.systems7} shows the output of the system superimposed
on the input. The overall shape is similar, but there are substantial
differences. And those differences are clearly audible. Load
{\tt chap10.ipynb} and listen to them. One thing I find striking
about this example is that you can get a sense of what the room
was like; to me, it sounds like a long, narrow room with hard floors
and ceilings. That is, like a firing range.

There's one thing I glossed over in this example that I'll mention
in case it bothers anyone. The violin recording I started with
has already been transformed by one system: the room where it was
recorded. So what I really computed in my example is the sound
of the violin after two transformations.
To properly simulate
the sound of a violin in a different room, I should have characterized
the room where the violin was recorded and applied the inverse
of that transfer function first.


\section{Systems and convolution}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems8.eps}}
\caption{Sum of a gunshot waveform with a shifted, scaled version of
itself.}
\label{fig.systems8}
\end{figure}

If you think the previous example is black magic,
you are not alone. I've been thinking about it for a while and it
still makes my head hurt.

In the previous section, I suggested one way to think about it:

\begin{itemize}

\item An impulse is made up of components with amplitude 1 at all
frequencies.

\item The impulse response contains the sum of the responses of the
system to all of these components.

\item The transfer function, which is the DFT of the impulse response,
encodes the effect of the system on each frequency component in the form
of an amplitude multiplier and a phase shift.

\item For any input, we can compute the response of the system
by breaking the input into components, computing the response to
each component, and adding them up.

\end{itemize}

But if you don't like that, there's another way to think about
it altogether: convolution!

Suppose that instead of firing one gun, you fire two guns:
a big one with amplitude 1 at $t=0$ and a smaller one with
amplitude 0.5 at $t=1$.

We can compute the response of the system by adding up
the original impulse response and a scaled, shifted version of itself.
Here's a function that makes a shifted, scaled version of
a wave:

\begin{verbatim}
def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res
\end{verbatim}

Here's how we use it to compute the response
to a two-gun salute:

\begin{verbatim}
dt = 1
shift = dt * response.framerate
factor = 0.5
response2 = response + shifted_scaled(response, shift, factor)
\end{verbatim}

Figure~\ref{fig.systems8} shows the result. You can hear what
it sounds like in {\tt chap10.ipynb}. Not surprisingly, it
sounds like two gunshots, the first one louder than the second.

Now suppose instead of two guns, you add up 100 guns, with
a shift of 100 frames between them. This loop computes the result:

\begin{verbatim}
total = 0
for j in range(100):
    total += shifted_scaled(response, j*100, 1.0)
\end{verbatim}

At a framerate of 44,100 Hz, there are 441 gunshots per second,
so you don't hear the individual shots. Instead, it sounds
like a periodic signal at 441 Hz. If you play this example, it
sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a
series of samples, where each sample is an impulse with a different
magnitude.

As an example, I'll generate a sawtooth signal at 410 Hz:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=410)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)
\end{verbatim}

Now I'll loop through the series of impulses that make up the
sawtooth, and add up the impulse responses:

\begin{verbatim}
total = 0
for j, y in enumerate(wave.ys):
    total += shifted_scaled(response, j, y)
\end{verbatim}

The result is what it would sound like to play a sawtooth wave in a
firing range.
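The loop above is, sample by sample, exactly the computation that convolution performs. Here is a minimal check using plain NumPy arrays in place of the book's Wave objects; the short impulse response and input are made-up stand-ins for the gunshot recording and the sawtooth wave.

```python
import numpy as np

# a short, made-up impulse response and input, standing in for
# the gunshot recording and the sawtooth wave
g = np.array([1.0, 0.5, 0.25, 0.125])     # impulse response
f = np.array([0.0, 1.0, 0.5, -0.5, 1.0])  # input samples

# sum of shifted, scaled copies of the impulse response
total = np.zeros(len(f) + len(g) - 1)
for j, y in enumerate(f):
    total[j:j+len(g)] += y * g

# the same result, computed as a convolution
assert np.allclose(total, np.convolve(f, g))
```

The sum of shifted, scaled copies agrees with {\tt np.convolve}, which anticipates the general result derived below.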
Again, you can listen to it in {\tt chap10.ipynb}.

\begin{figure}
\input{diagram2}
\caption{Diagram of the sum of scaled and shifted copies of $g$.}
\label{fig.convolution}
\end{figure}

Figure~\ref{fig.convolution} shows a diagram of this computation,
where $f$ is the sawtooth, $g$ is the impulse response, and $h$
is the sum of the shifted, scaled copies of $g$.

For the example shown:

\[ h[2] = f[0]g[2] + f[1]g[1] + f[2]g[0] \]

And more generally,

\[ h[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]

You might recognize this equation from Section~\ref{convolution}. It
is the convolution of $f$ and $g$. This shows that if the input is $f$
and the impulse response of the system is $g$, the output is the
convolution of $f$ and $g$.

In summary, there are two ways to think about the effect of a system
on a signal:

\begin{enumerate}

\item The input is a sequence of impulses, so the output is the sum of
scaled, shifted copies of the impulse response; that sum is the
convolution of the input and the impulse response.

\item The DFT of the impulse response is a transfer function that
encodes the effect of the system on each frequency component as a
magnitude and phase offset.
The DFT of the input encodes the
magnitude and phase offset of the frequency components it contains.
Multiplying the DFT of the input by the transfer function yields
the DFT of the output.

\end{enumerate}

The equivalence of these two descriptions should not be a surprise;
it is basically a statement of the Convolution Theorem:
convolution of $f$ and $g$ in the time
domain corresponds to multiplication in the frequency domain.
And if you wondered why convolution is defined as it is, which
seemed backwards when we talked about smoothing and difference
windows, now you know the reason: the definition of convolution
appears naturally in the response of an LTI system to a signal.


\section{Proof of the Convolution Theorem}

Well, I've put it off long enough. It's time to prove the Convolution
Theorem (CT), which states:

\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]

where $f$ and $g$ are vectors with the same length, $N$.

I'll proceed in two steps:

\begin{enumerate}

\item I'll show that in the special case where $f$ is a complex
exponential, convolution with $g$ has the effect of multiplying
$f$ by a scalar.

\item In the more general case where $f$ is not a complex exponential,
we can use the DFT to express it as a sum of exponential components,
compute the convolution of each component (by multiplication) and
then add up the results.

\end{enumerate}

Together these steps prove the Convolution Theorem. But first, let's
assemble the pieces we'll need.
The DFT of $g$ is:
%
\[ \DFT(g) = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
0 to $N-1$ and $n$ is an index of time from 0 to $N-1$.
The result, $G$, is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I'll call $f$, is:
%
\[ \IDFT(F) = f[n] = \sum_k F[k] \exp(2 \pi i n k / N) \]
%
And here's the definition of convolution:
%
\[ (f \conv g)[n] = \sum_m f[m] g[n-m] \]
%
where $m$ is another index of time from 0 to $N-1$.
Convolution is commutative, so I could equivalently write:
%
\[ (f \conv g)[n] = \sum_m f[n-m] g[m] \]
%
Now let's consider the special case where $f$ is a complex
exponential with frequency $k$, which I'll call $e_k$:
%
\[ f[n] = e_k[n] = \exp(2 \pi i n k / N) \]
%
where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i (n-m) k / N) g[m] \]
%
We can split the first term into a product:
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N) g[m] \]
%
The first factor does not depend on $m$, so we can pull it out of the
summation:
%
\[ (e_k \conv g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N) g[m] \]
%
Now we recognize that the first factor is $e_k$, and the summation is
$G[k]$ (using $m$ as the index of time). So we can write:
%
\[ (e_k \conv g)[n] = e_k[n] G[k] \]
%
which shows that for each complex exponential, $e_k$, convolution
with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical
terms, each $e_k$ is an eigenvector of this operation, and
$G[k]$ is the corresponding eigenvalue.

Now for the second part of the proof.
If the input signal, $f$, doesn't
happen to be a complex exponential, we can express it as a sum of
complex exponentials by computing its DFT, $F$.
For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex
magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude
$F[k]$, so each output component is a complex
exponential with magnitude $F[k]G[k]$, based on the first part of
the proof.

Because the system is linear, the output is just the sum of the
output components:
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] e_k[n] \]
%
Plugging in the definition of $e_k$ yields
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] \exp(2 \pi i n k / N) \]
%
The right-hand side is the inverse DFT of the product $F G$. Thus:
%
\[ (f \conv g) = \IDFT( F G ) \]
%
Substituting $F = \DFT(f)$ and $G = \DFT(g)$:
%
\[ (f \conv g) = \IDFT( \DFT(f) \DFT(g) ) \]
%
Finally, taking the DFT of both sides yields the Convolution Theorem:
%
\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]
%
QED


\section{Exercises}

% TODO: make exercises
\begin{exercise}
\end{exercise}

\begin{exercise}
\end{exercise}

Solutions to these exercises are in {\tt chap10soln.ipynb}.


\chapter{Modulation and sampling}

In Section~\ref{aliasing} I defined the folding frequency, also called
the Nyquist frequency, and I claimed that ???

The code for this chapter is in {\tt chap11.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp11}.


\section{Convolution with impulses}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling1.eps}}
\caption{.}
\label{fig.sampling1}
\end{figure}

Figure~\ref{fig.sampling1} shows

\end{document}

Case study ideas:

\begin{itemize}

\item Pitch tracking in Rock Band

\item Auto-Tune

\item JPEG compression

\item Image recognition in Fourier space (money)

\item Application of cepstrum?

\item Application of z transform -- digital control systems

\item tilt shift effect on a photo

\end{itemize}