% LaTeX source for ``Think DSP: Digital Signal Processing for Programmers''
% Copyright 2014 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

\documentclass[12pt]{book}
\usepackage[width=5.5in,height=8.5in,
  hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{fancyhdr}
\usepackage[bookmarks]{hyperref}

\title{Think DSP}
\author{Allen B. Downey}

\newcommand{\thetitle}{Think DSP: Digital Signal Processing in Python}
\newcommand{\theversion}{1.0.5}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows in the HTML version
\setlinkstext
  {\imgsrc[ALT="Previous"]{back.png}}
  {\imgsrc[ALT="Up"]{up.png}}
  {\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}

\frontmatter

\ifplastex

\else
\fi

\ifplastex
\usepackage{localdef}
\maketitle

\else

\newtheoremstyle{exercise}% name of the style to be used
  {\topsep}% measure of space to leave above the theorem. E.g.: 3pt
  {\topsep}% measure of space to leave below the theorem. E.g.: 3pt
  {}% name of font to use in the body of the theorem
  {}% measure of space to indent
  {\bfseries}% name of head font
  {}% punctuation between head and body
  { }% space after theorem head; " " = normal interword space
  {}% Manually specify head

\theoremstyle{exercise}
\newtheorem{exercise}{Exercise}[chapter]

\input{latexonly}

\begin{latexonly}

\renewcommand{\topfraction}{0.9}
\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vspace{1in}


{\Large
Allen B. Downey\\
}


\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

\vfill

\end{flushright}


%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

Copyright \copyright ~2014 Allen B. Downey.


\vspace{0.2in}

\begin{flushleft}
Green Tea Press \\
9 Washburn Ave \\
Needham MA 02492
\end{flushleft}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported
License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.

\vspace{0.2in}

\end{latexonly}


% HTMLONLY

\begin{htmlonly}

% TITLE PAGE FOR HTML VERSION

{\Large \thetitle}

{\large Allen B. Downey}

Version \theversion

\vspace{0.25in}

Copyright 2014 Allen B. Downey

\vspace{0.25in}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0
Unported License, which is available at
\url{http://creativecommons.org/licenses/by-nc/3.0/}.

\setcounter{chapter}{-1}

\end{htmlonly}

\fi
% END OF THE PART WE SKIP FOR PLASTEX

\chapter{Preface}
\label{preface}

Signal processing is one of my favorite topics. It is useful
in many areas of science and engineering, and if you understand
the fundamental ideas, it provides insight into many things
we see in the world, and especially the things we hear.

But unless you studied electrical or mechanical engineering, you
probably haven't had a chance to learn about signal processing. The
problem is that most books (and the classes that use them) present the
material bottom-up, starting with mathematical abstractions like
phasors. And they tend to be theoretical, with few applications and
little apparent relevance.

The premise of this book is that if you know how to program, you
can use that skill to learn other things, and have fun doing it.

With a programming-based approach, I can present the most important
ideas right away.
By the end of the first chapter, you can analyze
sound recordings and other signals, and generate new sounds. Each
chapter introduces a new technique and an application you can
apply to real signals. At each step you learn how to use a
technique first, and then how it works.

This approach is more practical and, I hope you'll agree, more fun.


\section{Who is this book for?}

The examples and supporting code for this book are in Python. You
should know core Python and you should be
familiar with object-oriented features, at least using objects if not
defining your own.

If you are not already familiar with Python, you might want to start
with my other book, {\it Think Python}, which is an introduction to
Python for people who have never programmed, or Mark
Lutz's {\it Learning Python}, which might be better for people with
programming experience.

I use NumPy and SciPy extensively. If you are familiar with them
already, that's great, but I will also explain the functions
and data structures I use.

I assume that the reader knows basic mathematics, including complex
numbers. You don't need much calculus; if you understand the concepts
of integration and differentiation, that will do.
I use some linear algebra, but I will explain it as we
go along.


\section{Using the code}
\label{code}

The code and sound samples used in this book are available from
\url{https://github.com/AllenDowney/ThinkDSP}. Git is a version
control system that allows you to keep track of the files that
make up a project. A collection of files under Git's control is
called a ``repository''.
GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
\index{repository}
\index{Git}
\index{GitHub}

The GitHub homepage for my repository provides several ways to
work with the code:

\begin{itemize}

\item You can create a copy of my repository
on GitHub by pressing the {\sf Fork} button. If you don't already
have a GitHub account, you'll need to create one. After forking, you'll
have your own repository on GitHub that you can use to keep track
of code you write while working on this book. Then you can
clone the repo, which means that you copy the files
to your computer.
\index{fork}

\item Or you could clone
my repository. You don't need a GitHub account to do this, but you
won't be able to write your changes back to GitHub.
\index{clone}

\item If you don't want to use Git at all, you can download the files
in a Zip file using the button in the lower-right corner of the
GitHub page.

\end{itemize}

All of the code is written to work in both Python 2 and Python 3
with no translation.

I developed this book using Anaconda from
Continuum Analytics, which is a free Python distribution that includes
all the packages you'll need to run the code (and lots more).
I found Anaconda easy to install. By default it does a user-level
installation, not system-level, so you don't need administrative
privileges. And it supports both Python 2 and Python 3.
You can
download Anaconda from \url{https://www.anaconda.com/distribution/}.
\index{Anaconda}

If you don't want to use Anaconda, you will need the following
packages:

\begin{itemize}

\item NumPy for basic numerical computation, \url{http://www.numpy.org/};
\index{NumPy}

\item SciPy for scientific computation,
\url{http://www.scipy.org/};
\index{SciPy}

\item matplotlib for visualization, \url{http://matplotlib.org/}.
\index{matplotlib}

\end{itemize}

Although these are commonly used packages, they are not included with
all Python installations, and they can be hard to install in some
environments. If you have trouble installing them, I
recommend using Anaconda or one of the other Python distributions
that include these packages.
\index{installation}

Most exercises use Python scripts, but some also use Jupyter
notebooks. If you have not used Jupyter before, you can read about
it at \url{http://jupyter.org}.
\index{Jupyter}

There are three ways you can work with the Jupyter notebooks:

\begin{description}

\item[Run Jupyter on your computer]

If you installed Anaconda, you
probably got Jupyter by default. To check, start the server from
the command line, like this:

\begin{verbatim}
$ jupyter notebook
\end{verbatim}

If it's not installed, you can install it in Anaconda like this:

\begin{verbatim}
$ conda install jupyter
\end{verbatim}

When you start the server, it should launch your default web browser
or create a new tab in an open browser window.

\item[Run Jupyter on Binder]

Binder is a service that runs Jupyter in a virtual machine. If you
follow this link, \url{http://mybinder.org/repo/AllenDowney/ThinkDSP},
you should get a Jupyter home page with the notebooks for this book
and the supporting data and scripts.

You can run the scripts and modify them to run your own code, but the
virtual machine you run in is temporary.
Any changes you make will
disappear, along with the virtual machine, if you leave it idle for
more than about an hour.

\item[View notebooks on nbviewer]

When we refer to notebooks later in the book, we will provide links to
nbviewer, which provides a static view of the code and results. You
can use these links to read the notebooks and listen to the examples,
but you won't be able to modify or run the code, or use the
interactive widgets.

\end{description}

Good luck, and have fun!



\section*{Contributor List}

If you have a suggestion or correction, please send email to
{\tt downey@allendowney.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).
\index{contributors}

If you include at least part of the sentence the
error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not as easy to work with.
Thanks!

\small

\begin{itemize}

\item Before I started writing, my thoughts about this book
benefited from conversations with Boulos Harb at Google and
Aurelio Ramos, formerly at Harmonix Music Systems.

\item During the Fall 2013 semester, Nathan Lintz and Ian Daniher
worked with me on an independent study project and helped me with
the first draft of this book.

\item On Reddit's DSP forum, the anonymous user RamjetSoundwave
helped me fix a problem with my implementation of Brownian Noise.
And andodli found a typo.

\item In Spring 2015 I had the pleasure of teaching this material
along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy.
Both made many suggestions and corrections.

\item Silas Gyger corrected an arithmetic error.

\item Giuseppe Masetti sent a number of very helpful suggestions.

\item Eric Peters sent many helpful suggestions.

% ENDCONTRIB

\end{itemize}


Special thanks to Freesound, which is the source of many of the
sound samples I use in this book, and to the Freesound users who
uploaded those sounds. I include some of their wave files in
the GitHub repository for this book, using the original file
names, so it should be easy to find their sources.

Unfortunately, most Freesound users don't make their real names
available, so I can only thank them using their user names. Samples
used in this book were contributed by Freesound users: iluppai,
wcfl10, thirsk, docquesting, kleeb, landup, zippi1, themusicalnomad,
bcjordan, rockwehrmann, marcgascon7, jcveliz. Thank you all!

%100475__iluppai__saxophone-weep.wav
%http://www.freesound.org/people/iluppai/sounds/100475/

%105977__wcfl10__favorite-station.wav
%http://www.freesound.org/people/wcfl10/sounds/105977/

%120994__thirsk__120-oboe.wav
%http://www.freesound.org/people/Thirsk/sounds/120994/

%132736__ciccarelli__ocean-waves.wav
%http://www.freesound.org/people/ciccarelli/sounds/132736/

%180960__kleeb__gunshot.wav
%http://www.freesound.org/people/Kleeb/sounds/180960/

%18871__zippi1__sound-bell-440hz.wav
%http://www.freesound.org/people/zippi1/sounds/18871/

%253887__themusicalnomad__positive-beeps.wav
%http://www.freesound.org/people/themusicalnomad/sounds/253887/

%28042__bcjordan__voicedownbew.wav
%http://www.freesound.org/people/bcjordan/sounds/28042/

%72475__rockwehrmann__glissup02.wav
%http://www.freesound.org/people/rockwehrmann/sounds/72475/

%87778__marcgascon7__vocals.wav
%http://www.freesound.org/people/marcgascon7/sounds/87778/

%92002__jcveliz__violin-origional.wav
%http://www.freesound.org/people/jcveliz/sounds/92002/


\normalsize

\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter


\chapter{Sounds and signals}
\label{sounds}

A {\bf signal} represents a quantity that varies in time. That
definition is pretty abstract, so let's start with a concrete example:
sound. Sound is variation in air pressure. A sound signal represents
variations in air pressure over time.

A microphone is a device that measures these variations and generates
an electrical signal that represents sound. A speaker is a device
that takes an electrical signal and produces sound.
Microphones and speakers are called {\bf transducers} because they
transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for
synthesizing, transforming, and analyzing signals. I will focus on
sound signals, but the same methods apply to electronic signals,
mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like
elevation along a hiking trail. And they apply to signals in more
than one dimension, like an image, which you can think of as a signal
that varies in two-dimensional space.
Or a movie, which is
a signal that varies in two-dimensional space {\it and} time.

But we start with simple one-dimensional sound.

The code for this chapter is in {\tt chap01.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp01}.


\section{Periodic signals}
\label{violin}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds1.pdf}}
\caption{Segment from a recording of a bell.}
\label{fig.sounds1}
\end{figure}

We'll start with {\bf periodic signals}, which are signals that
repeat themselves after some period of time. For example, if you
strike a bell, it vibrates and generates sound. If you record
that sound and plot the transduced signal, it looks like
Figure~\ref{fig.sounds1}.

This signal resembles a {\bf sinusoid}, which means it has the same
shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration
to show three full repetitions, also known as {\bf cycles}.
The duration of each cycle, called the {\bf period}, is about 2.3 ms.

The {\bf frequency} of a signal is the number of cycles
per second, which is the inverse of the period.
The units of frequency are cycles per second, or {\bf Hertz},
abbreviated ``Hz''. (Strictly speaking, the number of cycles is
a dimensionless number, so a Hertz is really a ``per second'').

The frequency of this signal is about 439 Hz, slightly lower than 440
Hz, which is the standard tuning pitch for orchestral music. The
musical name of this note is A, or more specifically, A4. If you are
not familiar with ``scientific pitch notation'', the numerical suffix
indicates which octave the note is in. A4 is the A above middle C.
A5 is one octave higher.
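Because frequency is the inverse of the period, you can check these numbers against each other. Here is a quick sketch; the 439 Hz value is the approximate pitch of the bell recording, so the result is approximate too:

```python
# Period and frequency are reciprocals. 439 Hz is the approximate
# pitch of the bell recording, read off the spectrum.
freq = 439                # cycles per second (Hz)
period = 1 / freq         # seconds per cycle
print(period * 1000)      # about 2.28 ms, consistent with the 2.3 ms above
```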
See
\url{http://en.wikipedia.org/wiki/Scientific_pitch_notation}.

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds2.pdf}}
\caption{Segment from a recording of a violin.}
\label{fig.sounds2}
\end{figure}

A tuning fork generates a sinusoid because the vibration of the tines
is a form of simple harmonic motion. Most musical instruments
produce periodic signals, but the shape of these signals is not
sinusoidal. For example, Figure~\ref{fig.sounds2} shows a segment
from a recording of a violin playing
Boccherini's String Quintet No. 5 in E, 3rd
movement.

%\footnote{The recording is from
% \url{http://www.freesound.org/people/jcveliz/sounds/92002/}.
%I identified the piece using \url{http://www.musipedia.org}.}
% Parson's code: DUUDDUURDR

Again we can see that the signal is periodic, but the shape of the
signal is more complex. The shape of a periodic signal is called
the {\bf waveform}. Most musical instruments produce waveforms more
complex than a sinusoid. The shape of the waveform determines the
musical {\bf timbre}, which is our perception of the quality of the
sound. People usually perceive complex waveforms as rich, warm and
more interesting than sinusoids.


\section{Spectral decomposition}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds3.pdf}}
\caption{Spectrum of a segment from the violin recording.}
\label{fig.sounds3}
\end{figure}

The most important topic in this book is {\bf spectral decomposition},
which is the idea that any signal can be expressed as the sum of
sinusoids with different frequencies.

The most important mathematical idea in this book is the {\bf discrete
Fourier transform}, or {\bf DFT}, which takes a signal and produces
its {\bf spectrum}.
The spectrum is the set of sinusoids that add up to
produce the signal.

And the most important algorithm in this book is the {\bf Fast
Fourier transform}, or {\bf FFT}, which is an efficient way to
compute the DFT.

For example, Figure~\ref{fig.sounds3} shows the spectrum of the violin
recording in Figure~\ref{fig.sounds2}. The x-axis is the range of
frequencies that make up the signal. The y-axis shows the strength
or {\bf amplitude} of each frequency component.

The lowest frequency component is called the {\bf fundamental
frequency}. The fundamental frequency of this signal is near 440 Hz
(actually a little lower, or ``flat'').

In this signal the fundamental frequency has the largest amplitude,
so it is also the {\bf dominant frequency}.
Normally the perceived pitch of a sound is determined by the
fundamental frequency, even if it is not dominant.

The other spikes in the spectrum are at frequencies 880, 1320, 1760, and
2200, which are integer multiples of the fundamental.
These components are called {\bf harmonics} because they are
musically harmonious with the fundamental:

\begin{itemize}

\item 880 is the frequency of A5, one octave higher than the fundamental.
An {\bf octave} is a doubling in frequency.

\item 1320 is approximately E6, which is a perfect fifth above A5.
If you are not familiar with musical intervals like ``perfect fifth'', see
\url{https://en.wikipedia.org/wiki/Interval_(music)}.

\item 1760 is A6, two octaves above the fundamental.

\item 2200 is approximately C$\sharp$7, which is a major third
above A6.

\end{itemize}

These harmonics make up the notes of an A major
chord, although not all in the same octave.
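You can confirm the integer-multiple relationship with a one-line check; here I take the fundamental to be an idealized 440 Hz, although the recording is slightly flat:

```python
# Harmonics are integer multiples of the fundamental frequency.
fundamental = 440
harmonics = [n * fundamental for n in range(1, 6)]
print(harmonics)   # [440, 880, 1320, 1760, 2200]
```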
Some of them are only
approximate because the notes that make up Western music have been
adjusted for {\bf equal temperament} (see
\url{http://en.wikipedia.org/wiki/Equal_temperament}).

Given the harmonics and their amplitudes, you can reconstruct the
signal by adding up sinusoids.
Next we'll see how.


\section{Signals}

I wrote a Python module called {\tt thinkdsp.py} that contains classes
and functions for working with signals and spectrums\footnote{The
plural of ``spectrum'' is often written ``spectra'', but I prefer
to use standard English plurals. If you are familiar with ``spectra'',
I hope my choice doesn't sound too strange.}. You
will find it in the repository for this book (see Section~\ref{code}).

To represent signals, {\tt thinkdsp} provides a class called
{\tt Signal}, which is the parent class for several signal types,
including {\tt Sinusoid}, which represents both sine and cosine
signals.

{\tt thinkdsp} provides functions to create sine and cosine signals:

\begin{verbatim}
cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)
\end{verbatim}

{\tt freq} is frequency in Hz. {\tt amp} is amplitude in unspecified
units where 1.0 is defined as the largest amplitude we can record or
play back.

{\tt offset} is a {\bf phase offset} in radians. Phase offset
determines where in the period the signal starts. For example, a
sine signal with {\tt offset=0} starts at $\sin 0$, which is 0.
With {\tt offset=pi/2} it starts at $\sin \pi/2$, which is 1.

Signals have an \verb"__add__" method, so you can use the {\tt +}
operator to add them:

\begin{verbatim}
mix = sin_sig + cos_sig
\end{verbatim}

The result is a {\tt SumSignal}, which represents the sum of two
or more signals.

A Signal is basically a Python representation of a mathematical
function.
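Because a Signal is just a function of time, the mixture above can also be sketched directly with NumPy, without thinkdsp; the variable names here are mine, not part of the library:

```python
import numpy as np

framerate = 11025                       # samples per second
ts = np.arange(0, 0.5, 1 / framerate)   # half a second of sample times

# The same mixture as cos_sig + sin_sig above: a 440 Hz cosine with
# amplitude 1.0 plus an 880 Hz sine with amplitude 0.5.
mix_ys = (1.0 * np.cos(2 * np.pi * 440 * ts)
          + 0.5 * np.sin(2 * np.pi * 880 * ts))
print(len(mix_ys))   # 5513 samples
```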
Most signals are defined for all values of {\tt t},
from negative infinity to infinity.

You can't do much with a Signal until you evaluate it. In this
context, ``evaluate'' means taking a sequence of points in time, {\tt
ts}, and computing the corresponding values of the signal, {\tt ys}.
I represent {\tt ts} and {\tt ys} using NumPy arrays and encapsulate
them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in
time. Each point in time is called a {\bf frame} (a term borrowed
from movies and video). The measurement itself is called a
{\bf sample}, although ``frame'' and ``sample'' are sometimes
used interchangeably.

{\tt Signal} provides \verb"make_wave", which returns a new
Wave object:

\begin{verbatim}
wave = mix.make_wave(duration=0.5, start=0, framerate=11025)
\end{verbatim}

{\tt duration} is the length of the Wave in seconds. {\tt start} is
the start time, also in seconds. {\tt framerate} is the (integer)
number of frames per second, which is also the number of samples
per second.

11,025 frames per second is one of several framerates commonly used in
audio file formats, including Waveform Audio File (WAV) and mp3.

This example evaluates the signal from {\tt t=0} to {\tt t=0.5} at
5,513 equally-spaced frames (because 5,513 is half of 11,025).
The time between frames, or {\bf timestep}, is {\tt 1/11025} seconds, about
91 $\mu$s.

{\tt Wave} provides a {\tt plot} method that uses {\tt pyplot}.
You can plot the wave like this:

\begin{verbatim}
wave.plot()
pyplot.show()
\end{verbatim}

{\tt pyplot} is part of {\tt matplotlib}; it is included in many
Python distributions, or you might have to install it.

\begin{figure}
% sounds4.py
\centerline{\includegraphics[height=2.5in]{figs/sounds4.pdf}}
\caption{Segment from a mixture of two sinusoid signals.}
\label{fig.sounds4}
\end{figure}

At {\tt freq=440} there are 220 periods in 0.5 seconds, so this plot
would look like a solid block of color. To zoom in on a small number
of periods, we can use {\tt segment}, which copies a segment of a Wave
and returns a new wave:

\begin{verbatim}
period = mix.period
segment = wave.segment(start=0, duration=period*3)
\end{verbatim}

{\tt period} is a property of a Signal; it returns the period in seconds.

{\tt start} and {\tt duration} are in seconds. This example copies
the first three periods from {\tt mix}. The result is a Wave object.

If we plot {\tt segment}, it looks like Figure~\ref{fig.sounds4}.
This signal contains two frequency components, so it is more
complicated than the signal from the tuning fork, but less complicated
than the violin.


\section{Reading and writing Waves}

{\tt thinkdsp} provides \verb"read_wave", which reads a WAV
file and returns a Wave:

\begin{verbatim}
violin_wave = thinkdsp.read_wave('input.wav')
\end{verbatim}

And {\tt Wave} provides {\tt write}, which writes a WAV file:

\begin{verbatim}
wave.write(filename='output.wav')
\end{verbatim}

You can listen to the Wave with any media player that plays WAV
files. On UNIX systems, I use {\tt aplay}, which is simple, robust,
and included in many Linux distributions.

{\tt thinkdsp} also provides \verb"play_wave", which runs
the media player as a subprocess:

\begin{verbatim}
thinkdsp.play_wave(filename='output.wav', player='aplay')
\end{verbatim}

It uses {\tt aplay} by default, but you can provide the
name of another player.


\section{Spectrums}
\label{spectrums}

{\tt Wave} provides \verb"make_spectrum", which returns a
{\tt Spectrum}:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

And {\tt Spectrum} provides {\tt plot}:

\begin{verbatim}
spectrum.plot()
thinkplot.show()
\end{verbatim}

{\tt thinkplot} is a module I wrote to provide wrappers around some of
the functions in {\tt pyplot}. It is included in the
Git repository for this book (see Section~\ref{code}).

{\tt Spectrum} provides three methods that modify the spectrum:

\begin{itemize}

\item \verb"low_pass" applies a low-pass filter, which means that
components above a given cutoff frequency are attenuated (that is,
reduced in magnitude) by a factor.

\item \verb"high_pass" applies a high-pass filter, which means that
it attenuates components below the cutoff.

\item \verb"band_stop" attenuates components in the band of
frequencies between two cutoffs.

\end{itemize}

This example attenuates all frequencies above 600 Hz by 99\%:

\begin{verbatim}
spectrum.low_pass(cutoff=600, factor=0.01)
\end{verbatim}

A low-pass filter removes bright, high-frequency sounds, so
the result sounds muffled and darker.
To hear what it sounds
like, you can convert the Spectrum back to a Wave, and then play it.

\begin{verbatim}
wave = spectrum.make_wave()
wave.play('temp.wav')
\end{verbatim}

The {\tt play} method writes the wave to a file and then plays it.
If you use Jupyter notebooks, you can use \verb"make_audio", which
makes an Audio widget that plays the sound.


\section{Wave objects}

\begin{figure}
% http://yuml.me/edit/5294b377
% pdftops -eps uml_diagram1.pdf
\centerline{\includegraphics[width=3.5in]{figs/uml_diagram1.pdf}}
\caption{Relationships among the classes in {\tt thinkdsp}.}
\label{fig.diagram1}
\end{figure}

There is nothing very complicated in {\tt thinkdsp.py}. Most
of the functions it provides are thin wrappers around functions
from NumPy and SciPy.

The primary classes in {\tt thinkdsp} are Signal, Wave, and Spectrum.
Given a Signal, you can make a Wave. Given a Wave, you can
make a Spectrum, and vice versa. These relationships are shown
in Figure~\ref{fig.diagram1}.

A Wave object contains three attributes: {\tt ys} is a NumPy array
that contains the values in the signal; {\tt ts} is an array of the
times where the signal was evaluated or sampled; and {\tt
framerate} is the number of samples per unit of time. The
unit of time is usually seconds, but it doesn't have to be. In
one of my examples, it's days.

Wave also provides three read-only properties: {\tt start},
{\tt end}, and {\tt duration}. If you modify {\tt ts}, these
properties change accordingly.

To modify a wave, you can access the {\tt ts} and {\tt ys} directly.
For example:

\begin{verbatim}
wave.ys *= 2
wave.ts += 1
\end{verbatim}

The first line scales the wave by a factor of 2, making
it louder.
The second line shifts the wave in time, making it
start 1 second later.

But Wave provides methods that perform many common operations.
For example, the same two transformations could be written:

\begin{verbatim}
wave.scale(2)
wave.shift(1)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://greenteapress.com/thinkdsp.html}.


\section{Signal objects}
\label{sigobs}

Signal is a parent class that provides functions common to all
kinds of signals, like \verb"make_wave". Child classes inherit
these methods and provide {\tt evaluate}, which evaluates the
signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this
definition:

\begin{verbatim}
class Sinusoid(Signal):

    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func
\end{verbatim}

The parameters of \verb"__init__" are:

\begin{itemize}

\item {\tt freq}: frequency in cycles per second, or Hz.

\item {\tt amp}: amplitude. The units of amplitude are arbitrary,
usually chosen so 1.0 corresponds to the maximum input from a
microphone or maximum output to a speaker.

\item {\tt offset}: indicates where in its period the signal starts;
{\tt offset} is in units of radians, for reasons I explain below.

\item {\tt func}: a Python function used
to evaluate the signal at a particular point in time.
It is
usually either {\tt np.sin} or {\tt np.cos}, yielding a sine or
cosine signal.

\end{itemize}

Like many init methods, this one just tucks the parameters away for
future use.

Signal provides \verb"make_wave", which looks like
this:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)
\end{verbatim}

{\tt start} and {\tt duration} are the start time and duration
in seconds. {\tt framerate} is the number of frames (samples)
per second.

{\tt n} is the number of samples, and {\tt ts} is a NumPy array
of sample times.

To compute the {\tt ys}, \verb"make_wave" invokes {\tt evaluate},
which is provided by {\tt Sinusoid}:

\begin{verbatim}
def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys
\end{verbatim}

Let's unwind this function one step at a time:

\begin{enumerate}

\item {\tt self.freq} is frequency in cycles per second, and each
element of {\tt ts} is a time in seconds, so their product is the
number of cycles since the start time.

\item {\tt PI2} is a constant that stores $2 \pi$. Multiplying by
{\tt PI2} converts from cycles to {\bf phase}. You can think of
phase as ``cycles since the start time'' expressed in radians.
Each
cycle is $2 \pi$ radians.

\item {\tt self.offset} is the phase when $t=0$.
It has the effect of shifting the signal left or right in time.

\item If {\tt self.func} is {\tt np.sin} or {\tt np.cos}, the result is a
value between $-1$ and $+1$.

\item Multiplying by {\tt self.amp} yields a signal that ranges from
{\tt -self.amp} to {\tt +self.amp}.

\end{enumerate}

In math notation, {\tt evaluate} is written like this:
%
\[ y = A \cos (2 \pi f t + \phi_0) \]
%
where $A$ is amplitude, $f$ is frequency, $t$ is time, and $\phi_0$
is the phase offset.  It may seem like I wrote a lot of code
to evaluate one simple expression, but as we'll see, this code
provides a framework for dealing with all kinds of signals, not
just sinusoids.


\section{Exercises}

Before you begin these exercises, you should download the code
for this book, following the instructions in Section~\ref{code}.

Solutions to these exercises are in {\tt chap01soln.ipynb}.

\begin{exercise}
If you have Jupyter, load {\tt chap01.ipynb}, read through it, and run
the examples.  You can also view this notebook at
\url{http://tinyurl.com/thinkdsp01}.
\end{exercise}


\begin{exercise}
Go to \url{http://freesound.org} and download a sound sample that
includes music, speech, or other sounds that have a well-defined pitch.
Select a roughly half-second segment where the pitch is
constant.  Compute and plot the spectrum of the segment you selected.
What connection can you make between the timbre of the sound and the
harmonic structure you see in the spectrum?

Use \verb"high_pass", \verb"low_pass", and \verb"band_stop" to
filter out some of the harmonics.  Then convert the spectrum back
to a wave and listen to it.
How does the sound relate to the
changes you made in the spectrum?
\end{exercise}


\begin{exercise}
Synthesize a compound signal by creating SinSignal and CosSignal
objects and adding them up.  Evaluate the signal to get a Wave,
and listen to it.  Compute its Spectrum and plot it.
What happens if you add frequency
components that are not multiples of the fundamental?
\end{exercise}


\begin{exercise}
Write a function called {\tt stretch} that takes a Wave and a stretch
factor and speeds up or slows down the wave by modifying {\tt ts} and
{\tt framerate}.  Hint: it should only take two lines of code.
\end{exercise}


\chapter{Harmonics}
\label{harmonics}

In this chapter I present several new waveforms; we will look at
their spectrums to understand their {\bf harmonic structure}, which is
the set of sinusoids they are made up of.

I'll also introduce one of the most important phenomena in digital
signal processing: aliasing.  And I'll explain a little more about how
the Spectrum class works.

The code for this chapter is in {\tt chap02.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp02}.


\section{Triangle waves}
\label{triangle}

A sinusoid contains only one frequency component, so its spectrum
has only one peak.  More complicated waveforms, like the
violin recording, yield DFTs with many peaks.  In this section we
investigate the relationship between waveforms and their spectrums.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-1.pdf}}
\caption{Segment of a triangle signal at 200 Hz.}
\label{fig.triangle.200.1}
\end{figure}

I'll start with a triangle waveform, which is like a straight-line
version of a sinusoid.
Figure~\ref{fig.triangle.200.1} shows a
triangle waveform with frequency 200 Hz.

To generate a triangle wave, you can use {\tt thinkdsp.TriangleSignal}:

\begin{verbatim}
class TriangleSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt TriangleSignal} inherits \verb"__init__" from {\tt Sinusoid},
so it takes the same arguments: {\tt freq}, {\tt amp}, and {\tt offset}.

The only difference is {\tt evaluate}.  As we saw before,
{\tt ts} is the sequence of sample times where we want to
evaluate the signal.

There are many ways to generate a triangle wave.  The details
are not important, but here's how {\tt evaluate} works:

\begin{enumerate}

\item {\tt cycles} is the number of cycles since the start time.
{\tt np.modf} splits the number of cycles into the fraction
part, stored in {\tt frac}, and the integer part, which is
ignored\footnote{Using an underscore as a variable name is a convention that
means, ``I don't intend to use this value.''}.

\item {\tt frac} is a sequence that ramps from 0 to 1 with the given
frequency.  Subtracting 0.5 yields values between -0.5 and 0.5.
Taking the absolute value yields a waveform that zig-zags between
0.5 and 0.

\item {\tt unbias} shifts the waveform down so it is centered at 0; then
{\tt normalize} scales it to the given amplitude, {\tt amp}.

\end{enumerate}

Here's the code that generates Figure~\ref{fig.triangle.200.1}:

\begin{verbatim}
    signal = thinkdsp.TriangleSignal(200)
    signal.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-2.pdf}}
\caption{Spectrum of a triangle signal at 200 Hz, shown on two
vertical scales.
The version on the right cuts off the fundamental
to show the harmonics more clearly.}
\label{fig.triangle.200.2}
\end{figure}

Next we can use the Signal to make a Wave, and use the Wave to
make a Spectrum:

\begin{verbatim}
    wave = signal.make_wave(duration=0.5, framerate=10000)
    spectrum = wave.make_spectrum()
    spectrum.plot()
\end{verbatim}

Figure~\ref{fig.triangle.200.2} shows two views of the result; the
view on the right is scaled to show the harmonics more clearly.  As
expected, the highest peak is at the fundamental frequency, 200 Hz,
and there are additional peaks at harmonic frequencies, which are
integer multiples of 200.

But one surprise is that there are no peaks at the even multiples:
400, 800, etc.  The harmonics of a triangle wave are all
odd multiples of the fundamental frequency, in this example
600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the
amplitude and frequency of the harmonics.  Their amplitude drops off
in proportion to frequency squared.  For example the frequency ratio
of the first two harmonics (200 and 600 Hz) is 3, and the amplitude
ratio is approximately 9.  The frequency ratio of the next two
harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is
approximately $1.7^2 = 2.9$.  This relationship is called the
{\bf harmonic structure}.


\section{Square waves}
\label{square}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-1.pdf}}
\caption{Segment of a square signal at 100 Hz.}
\label{fig.square.100.1}
\end{figure}

{\tt thinkdsp} also provides {\tt SquareSignal}, which represents
a square signal.
Here's the class definition:

\begin{verbatim}
class SquareSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys
\end{verbatim}

Like {\tt TriangleSignal}, {\tt SquareSignal} inherits
\verb"__init__" from {\tt Sinusoid}, so it takes the same
parameters.

And the {\tt evaluate} method is similar.  Again, {\tt cycles} is
the number of cycles since the start time, and {\tt frac} is the
fractional part, which ramps from 0 to 1 each period.

{\tt unbias} shifts {\tt frac} so it ramps from -0.5 to 0.5,
then {\tt np.sign} maps the negative values to -1 and the
positive values to 1.  Multiplying by {\tt amp} yields a square
wave that jumps between {\tt -amp} and {\tt amp}.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-2.pdf}}
\caption{Spectrum of a square signal at 100 Hz.}
\label{fig.square.100.2}
\end{figure}

Figure~\ref{fig.square.100.1} shows three periods of a square
wave with frequency 100 Hz,
and Figure~\ref{fig.square.100.2} shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics,
which is why there are peaks at 300, 500, and 700 Hz, etc.
But the amplitude of the harmonics drops off more slowly.
Specifically, amplitude drops in proportion to frequency (not frequency
squared).

The exercises at the end of this chapter give you a chance to
explore other waveforms and other harmonic structures.


\section{Aliasing}
\label{aliasing}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-1100-2.pdf}}
\caption{Spectrum of a triangle signal at 1100 Hz sampled at
10,000 frames per second.
The view on the right is scaled to
show the harmonics.}
\label{fig.triangle.1100.2}
\end{figure}

I have a confession.  I chose the examples in the previous section
carefully to avoid showing you something confusing.  But now it's
time to get confused.

Figure~\ref{fig.triangle.1100.2} shows the spectrum of a triangle wave
at 1100 Hz, sampled at 10,000 frames per second.  Again, the view on
the right is scaled to show the harmonics.

The harmonics
of this wave should be at 3300, 5500, 7700, and 9900 Hz.
In the figure, there are peaks at 1100 and 3300 Hz, as expected, but
the third peak is at 4500, not 5500 Hz.  The
fourth peak is at 2300, not 7700 Hz.  And if you look closely, the
peak that should be at 9900 is actually at 100 Hz.  What's going on?

The problem is that when you evaluate the signal at
discrete points in time, you lose information about what happened
between samples.  For low frequency components, that's not a
problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second,
you only have two samples per period.  That turns out to be enough,
just barely, but if the frequency is higher, it's not.

To see why, let's generate cosine signals at 4500 and 5500 Hz,
and sample them at 10,000 frames per second:

\begin{verbatim}
    framerate = 10000

    signal = thinkdsp.CosSignal(4500)
    duration = signal.period*5
    segment = signal.make_wave(duration, framerate=framerate)
    segment.plot()

    signal = thinkdsp.CosSignal(5500)
    segment = signal.make_wave(duration, framerate=framerate)
    segment.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=3.5in]{figs/aliasing1.pdf}}
\caption{Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames
per second.
The signals are different, but the samples are identical.}
\label{fig.aliasing1}
\end{figure}

Figure~\ref{fig.aliasing1} shows the result.  I plotted the Signals
with thin gray lines and the samples using vertical lines,
to make it easier to compare the
two Waves.  The problem
should be clear: even though the Signals are different, the
Waves are identical!

When we sample a 5500 Hz signal at 10,000 frames per second, the
result is indistinguishable from a 4500 Hz signal.
For the same reason, a 7700 Hz signal is indistinguishable
from 2300 Hz, and a 9900 Hz signal is indistinguishable from 100 Hz.

This effect is called {\bf aliasing} because when the high frequency
signal is sampled, it appears to be a low frequency signal.

In this example, the highest frequency we can measure is 5000 Hz,
which is half the sampling rate.  Frequencies above 5000 Hz are folded
back below 5000 Hz, which is why this threshold is sometimes called
the ``folding frequency''.  It is sometimes also called the {\bf
Nyquist frequency}.  See
\url{http://en.wikipedia.org/wiki/Nyquist_frequency}.

The folding pattern continues if the aliased frequency goes below
zero.  For example, the 5th harmonic of the 1100 Hz triangle wave is
at 12,100 Hz.  Folded at 5000 Hz, it would appear at -2100 Hz, but it
gets folded again at 0 Hz, back to 2100 Hz.
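The folding rule can be sketched as a small helper function.  This is
my own illustration, not part of {\tt thinkdsp}; the name
\verb"fold_frequency" is hypothetical:

```python
# Hypothetical helper (not part of thinkdsp): compute where a
# frequency appears after sampling at the given framerate.
def fold_frequency(freq, framerate):
    nyquist = framerate / 2
    freq = freq % framerate       # aliases repeat every framerate Hz
    if freq > nyquist:
        freq = framerate - freq   # fold back below the Nyquist frequency
    return freq

# Harmonics of a 1100 Hz triangle wave, sampled at 10,000 frames/s:
for harmonic in [1100, 3300, 5500, 7700, 9900, 12100]:
    print(harmonic, '->', fold_frequency(harmonic, 10000))
```

Applied to the harmonics of the 1100 Hz triangle wave, it reproduces
the peaks we observed: 5500 folds to 4500, 7700 to 2300, 9900 to 100,
and 12,100 to 2100 Hz.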
In fact, you can see a
small peak at 2100 Hz in Figure~\ref{fig.triangle.1100.2}, and the next
one at 4300 Hz.


\section{Computing the spectrum}

We have seen the Wave method \verb"make_spectrum" several times.
Here is the implementation (leaving out some details we'll get
to later):

\begin{verbatim}
from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)
\end{verbatim}

The parameter {\tt self} is a Wave object.  {\tt n} is the number
of samples in the wave, and {\tt d} is the inverse of the
frame rate, which is the time between samples.

{\tt np.fft} is the NumPy module that provides functions related
to the {\bf Fast Fourier Transform} (FFT), which is an efficient
algorithm that computes the Discrete Fourier Transform (DFT).

%TODO: add a forward reference to full fft
\verb"make_spectrum" uses {\tt rfft}, which stands for ``real
FFT'', because the Wave contains real values, not complex.  Later
we'll see the full FFT, which can handle complex signals.  The result
of {\tt rfft}, which I call {\tt hs}, is a NumPy array of complex
numbers that represents the amplitude and phase offset of each
frequency component in the wave.

The result of {\tt rfftfreq}, which I call {\tt fs}, is an array that
contains frequencies corresponding to the {\tt hs}.

To understand the values in {\tt hs}, consider these two ways to think
about complex numbers:

\begin{itemize}

\item A complex number is the sum of a real part and an imaginary
part, often written $x + iy$, where $i$ is the imaginary unit,
$\sqrt{-1}$.
You can think of $x$ and $y$ as Cartesian coordinates.

\item A complex number is also the product of a magnitude and a
complex exponential, $A e^{i \phi}$, where $A$ is the {\bf
magnitude} and $\phi$ is the {\bf angle} in radians, also called
the ``argument''.  You can think of $A$ and $\phi$ as polar
coordinates.

\end{itemize}

Each value in {\tt hs} corresponds to a frequency component: its
magnitude is proportional to the amplitude of the corresponding
component; its angle is the phase offset.

The Spectrum class provides two read-only properties, {\tt amps}
and {\tt angles}, which return NumPy arrays representing the
magnitudes and angles of the {\tt hs}.  When we plot a Spectrum
object, we usually plot {\tt amps} versus {\tt fs}.  Sometimes
it is also useful to plot {\tt angles} versus {\tt fs}.

Although it might be tempting to look at the real and imaginary
parts of {\tt hs}, you will almost never need to.  I encourage
you to think of the DFT as a vector of amplitudes and phase offsets
that happen to be encoded in the form of complex numbers.

To modify a Spectrum, you can access the {\tt hs} directly.
For example:

\begin{verbatim}
    spectrum.hs *= 2
    spectrum.hs[spectrum.fs > cutoff] = 0
\end{verbatim}

The first line multiplies the elements of {\tt hs} by 2, which
doubles the amplitudes of all components.
The second line
sets to 0 only the elements of {\tt hs} where the corresponding
frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

\begin{verbatim}
    spectrum.scale(2)
    spectrum.low_pass(cutoff)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://greenteapress.com/thinkdsp.html}.

At this point you should have a better idea of how the Signal, Wave,
and Spectrum classes work, but I have not explained how the Fast
Fourier Transform works.  That will take a few more chapters.


\section{Exercises}

Solutions to these exercises are in {\tt chap02soln.ipynb}.

\begin{exercise}
If you use Jupyter, load {\tt chap02.ipynb} and try out the examples.
You can also view the notebook at \url{http://tinyurl.com/thinkdsp02}.
\end{exercise}

\begin{exercise}
A sawtooth signal has a waveform that ramps up linearly from -1 to 1,
then drops to -1 and repeats.  See
\url{http://en.wikipedia.org/wiki/Sawtooth_wave}.

Write a class called
{\tt SawtoothSignal} that extends {\tt Signal} and provides
{\tt evaluate} to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave.  How does the harmonic
structure compare to triangle and square waves?
\end{exercise}

\begin{exercise}
Make a square signal at 1100 Hz and make a wave that samples it
at 10000 frames per second.  If you plot the spectrum, you can
see that most of the harmonics are aliased.
When you listen to the wave, can you hear the aliased harmonics?
\end{exercise}


\begin{exercise}
If you have a spectrum object, {\tt spectrum}, and print the
first few values of {\tt spectrum.fs}, you'll see that they
start at zero.  So {\tt spectrum.hs[0]} is the magnitude
of the component with frequency 0.
But what does that mean?

Try this experiment:

\begin{enumerate}

\item Make a triangle signal with frequency 440 and make
a Wave with duration 0.01 seconds.  Plot the waveform.

\item Make a Spectrum object and print {\tt spectrum.hs[0]}.
What is the amplitude and phase of this component?

\item Set {\tt spectrum.hs[0] = 100}.  Make a Wave from the
modified Spectrum and plot it.  What effect does this operation
have on the waveform?

\end{enumerate}

\end{exercise}


\begin{exercise}
Write a function that takes a Spectrum as a parameter and modifies
it by dividing each element of {\tt hs} by the corresponding frequency
from {\tt fs}.  Hint: since division by zero is undefined, you
might want to set {\tt spectrum.hs[0] = 0}.

Test your function using a square, triangle, or sawtooth wave.

\begin{enumerate}

\item Compute the Spectrum and plot it.

\item Modify the Spectrum using your function and plot it again.

\item Make a Wave from the modified Spectrum and listen to it.  What
effect does this operation have on the signal?

\end{enumerate}

\end{exercise}

\begin{exercise}
Triangle and square waves have odd harmonics only; the sawtooth
wave has both even and odd harmonics.  The harmonics of the
square and sawtooth waves drop off in proportion to $1/f$; the
harmonics of the triangle wave drop off like $1/f^2$.  Can you
find a waveform that has even and odd harmonics that drop off
like $1/f^2$?

Hint: There are two ways you could approach this: you could
construct the signal you want by adding up sinusoids, or you
could start with a signal that is similar to what you want and
modify it.
\end{exercise}


\chapter{Non-periodic signals}
\label{nonperiodic}

The signals we have worked with so far are periodic, which means
that they repeat forever.
It also means that the frequency
components they contain do not change over time.
In this chapter, we consider non-periodic signals,
whose frequency components {\em do} change over time.
In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize
non-periodic signals.

The code for this chapter is in {\tt chap03.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp03}.


\section{Linear chirp}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp3.pdf}}
\caption{Chirp waveform near the beginning, middle, and end.}
\label{fig.chirp3}
\end{figure}

We'll start with a {\bf chirp}, which is a signal with variable
frequency.  {\tt thinkdsp} provides a Signal called Chirp that
makes a sinusoid that sweeps linearly through a range of
frequencies.

Here's an example that sweeps from 220 to 880 Hz, which
is two octaves from A3 to A5:

\begin{verbatim}
    signal = thinkdsp.Chirp(start=220, end=880)
    wave = signal.make_wave()
\end{verbatim}

Figure~\ref{fig.chirp3} shows segments of this wave near the
beginning, middle, and end.  It's clear that the frequency is
increasing.

Before we go on, let's see how Chirp is implemented.  Here
is the class definition:

\begin{verbatim}
class Chirp(Signal):

    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp
\end{verbatim}

{\tt start} and {\tt end} are the frequencies, in Hz, at the start
and end of the chirp.
{\tt amp} is amplitude.

Here is the function that evaluates the signal:

\begin{verbatim}
    def evaluate(self, ts):
        freqs = np.linspace(self.start, self.end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

{\tt ts} is the sequence of points in time where the signal should be
evaluated; to keep this function simple, I assume they are equally-spaced.

If the length of {\tt ts} is $n$, you can think of it as a sequence of
$n-1$ intervals of time.  To compute the frequency during each
interval, I use {\tt np.linspace}, which returns a NumPy array of
$n-1$ values between {\tt start} and {\tt end}.

\verb"_evaluate" is a private method
that does the rest of the math\footnote{Beginning a method name
with an underscore makes it ``private'', indicating that it is not
part of the API that should be used outside the class definition.}:

\begin{verbatim}
    def _evaluate(self, ts, freqs):
        dts = np.diff(ts)
        dphis = PI2 * freqs * dts
        phases = np.cumsum(dphis)
        phases = np.insert(phases, 0, 0)
        ys = self.amp * np.cos(phases)
        return ys
\end{verbatim}

{\tt np.diff} computes the difference between adjacent elements
of {\tt ts}, returning the length of each interval in seconds.
If the elements of {\tt ts} are equally spaced,
the {\tt dts} are all the same.

The next step is to figure out how much the phase changes during
each interval.
In Section~\ref{sigobs} we saw that when frequency is
constant, the phase, $\phi$, increases linearly over time:
%
\[ \phi = 2 \pi f t \]
%
When frequency is a function of time, the {\em change} in phase
during a short time interval, $\Delta t$, is:
%
\[ \Delta \phi = 2 \pi f(t) \Delta t \]
%
In Python, since {\tt freqs} contains $f(t)$ and {\tt dts}
contains the time intervals, we can write

\begin{verbatim}
    dphis = PI2 * freqs * dts
\end{verbatim}

Now, since {\tt dphis} contains the changes in phase, we can
get the total phase at each timestep by adding up the changes:

\begin{verbatim}
    phases = np.cumsum(dphis)
    phases = np.insert(phases, 0, 0)
\end{verbatim}

{\tt np.cumsum} computes the cumulative sum, which is almost
what we want, but it doesn't start at 0.  So I use {\tt np.insert}
to add a 0 at the beginning.

The result is a NumPy array where the {\tt i}th element contains the
sum of the first {\tt i} terms from {\tt dphis}; that is, the total
phase at the end of the {\tt i}th interval.  Finally, {\tt np.cos}
computes the amplitude of the wave as a function of phase (remember
that phase is expressed in radians).

If you know calculus, you might notice that the limit as
$\Delta t$ gets small is
%
\[ d\phi = 2 \pi f(t) dt \]
%
Dividing through by $dt$ yields
%
\[ \frac{d\phi}{dt} = 2 \pi f(t) \]
%
In other words, frequency is the derivative of phase.  Conversely,
phase is the integral of frequency.
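As a quick sanity check (my own sketch, not code from {\tt thinkdsp}),
we can run the same cumulative-sum steps on a constant frequency and
confirm that the result matches the closed form $\phi = 2 \pi f t$:

```python
import numpy as np

PI2 = 2 * np.pi
framerate = 11025
f = 440  # a constant frequency in Hz

ts = np.arange(framerate) / framerate  # one second of sample times
freqs = np.full(len(ts) - 1, f)        # f(t) is constant here

# The same steps used to go from frequency to phase:
dts = np.diff(ts)
dphis = PI2 * freqs * dts
phases = np.insert(np.cumsum(dphis), 0, 0)

# For constant frequency the sum telescopes to phi = 2 pi f t.
assert np.allclose(phases, PI2 * f * ts)
```

For a constant frequency the cumulative sum is exact; for a varying
frequency it is a numerical approximation of the integral.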
When we used {\tt cumsum}
to go from frequency to phase, we were approximating integration.


\section{Exponential chirp}

When you listen to this chirp, you might notice that the pitch
rises quickly at first and then slows down.
The chirp spans two octaves, but it only takes 1/3 s to span
the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of
frequency.  As a result, the {\bf interval} we hear between two notes
depends on the {\em ratio} of their frequencies, not the difference.
``Interval'' is the musical term for the perceived difference between
two pitches.

For example, an octave is an interval where the ratio of two
pitches is 2.  So the interval from 220 to 440 is one octave
and the interval from 440 to 880 is also one octave.  The difference
in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear
chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency
has to increase exponentially.  A signal with that shape is called
an {\bf exponential chirp}.

Here's the code that makes one:

\begin{verbatim}
class ExpoChirp(Chirp):

    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

Instead of {\tt np.linspace}, this version of evaluate uses
{\tt np.logspace}, which creates a series of frequencies
whose logarithms are equally spaced, which means that they increase
exponentially.

That's it; everything else is the same as Chirp.
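To see the difference between the two spacings, here is a small sketch
(my own, not from {\tt thinkdsp}) comparing {\tt np.linspace} and
{\tt np.logspace} over the same two-octave range:

```python
import numpy as np

# Five frequencies from 220 to 880 Hz, linearly and logarithmically spaced.
lin = np.linspace(220, 880, 5)
expo = np.logspace(np.log10(220), np.log10(880), 5)

print(lin)   # neighbors differ by a constant amount
print(expo)  # neighbors differ by a constant ratio

# Linear spacing: constant difference of 165 Hz between neighbors.
assert np.allclose(np.diff(lin), 165.0)

# Log spacing: two octaves in four steps, so each ratio is sqrt(2).
assert np.allclose(expo[1:] / expo[:-1], 2 ** 0.5)
```

Constant ratios between successive frequencies are exactly what
exponential growth means, which is why the perceived pitch of an
exponential chirp rises at a steady rate.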
Here's the code
that makes one:

\begin{verbatim}
    signal = thinkdsp.ExpoChirp(start=220, end=880)
    wave = signal.make_wave(duration=1)
\end{verbatim}

You can listen to these examples in {\tt chap03.ipynb} and compare
the linear and exponential chirps.


\section{Spectrum of a chirp}
\label{sauron}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp1.pdf}}
\caption{Spectrum of a one-second one-octave chirp.}
\label{fig.chirp1}
\end{figure}

What do you think happens if you compute the spectrum of a chirp?
Here's an example that constructs a one-second, one-octave chirp and
its spectrum:

\begin{verbatim}
    signal = thinkdsp.Chirp(start=220, end=440)
    wave = signal.make_wave(duration=1)
    spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.chirp1} shows the result.  The spectrum has
components at every frequency from 220 to 440 Hz, with variations
that look a little like the Eye of Sauron
(see \url{http://en.wikipedia.org/wiki/Sauron}).

The spectrum is approximately flat between 220 and 440 Hz, which
indicates that the signal spends equal time at each frequency in this
range.  Based on that observation, you should be able to guess what
the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal,
but it obscures the relationship between frequency and time.
For example, we cannot tell by looking at this spectrum whether
the frequency went up or down, or both.


\section{Spectrogram}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp2.pdf}}
\caption{Spectrogram of a one-second one-octave chirp.}
\label{fig.chirp2}
\end{figure}

To recover the relationship between frequency and time, we can break
the chirp into segments and plot the spectrum of each segment.
The
result is called a {\bf short-time Fourier transform} (STFT).

There are several ways to visualize an STFT, but the most common
is a {\bf spectrogram}, which shows time on the x-axis and frequency
on the y-axis.  Each column in the spectrogram shows the spectrum of
a short segment, using color or grayscale to represent amplitude.

As an example, I'll compute the spectrogram of this chirp:

\begin{verbatim}
    signal = thinkdsp.Chirp(start=220, end=440)
    wave = signal.make_wave(duration=1, framerate=11025)
\end{verbatim}

{\tt Wave} provides \verb"make_spectrogram", which returns a
{\tt Spectrogram} object:

\begin{verbatim}
    spectrogram = wave.make_spectrogram(seg_length=512)
    spectrogram.plot(high=700)
\end{verbatim}

\verb"seg_length" is the number of samples in each segment.  I chose
512 because FFT is most efficient when the number of samples is a
power of 2.

Figure~\ref{fig.chirp2} shows the result.  The x-axis shows time from
0 to 1 seconds.  The y-axis shows frequency from 0 to 700 Hz.  I cut
off the top part of the spectrogram; the full range goes to 5512.5 Hz,
which is half of the framerate.

The spectrogram shows clearly that frequency increases linearly
over time.  Similarly, in the spectrogram of an exponential chirp, we
can see the shape of the exponential curve.

However, notice that the peak in each column is blurred across 2--3
cells.  This blurring reflects the limited resolution of the
spectrogram.


\section{The Gabor limit}
\label{gabor}

The {\bf time resolution} of the spectrogram is the duration of the
segments, which corresponds to the width of the cells in the
spectrogram.
Since each segment is 512 frames, and there are 11,025
frames per second, the duration of each segment is about 0.046 seconds.

The {\bf frequency resolution} is the frequency range between
elements in the spectrum, which corresponds to the height of the
cells.  With 512 frames, we get 256 frequency components over a range
from 0 to 5512.5 Hz, so the range between components is about 21.5 Hz.

More generally, if $n$ is the segment length, the spectrum contains
$n/2$ components.  If the framerate is $r$, the maximum frequency in
the spectrum is $r/2$.  So the time resolution is $n/r$ and the
frequency resolution is
%
\[ \frac{r/2}{n/2} \]
%
which is $r/n$.

Ideally we would like time resolution to be small, so we can see rapid
changes in frequency.  And we would like frequency resolution to be
small so we can see small changes in frequency.  But you can't have
both.  Notice that time resolution, $n/r$, is the inverse of frequency
resolution, $r/n$.  So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency
resolution in half (which is good), but you double time resolution
(which is bad).  Even increasing the framerate doesn't help.
You get
more samples, but the range of frequencies increases at
the same time.

This tradeoff is called the {\bf Gabor limit} and it is a fundamental
limitation of this kind of time-frequency analysis.


\section{Leakage}

\begin{figure}
% chirp.py
\centerline{\includegraphics[width=5.5in]{figs/windowing1.pdf}}
\caption{Spectrum of a periodic segment of a sinusoid (left), a
non-periodic segment (middle), a windowed non-periodic segment
(right).}
\label{fig.windowing1}
\end{figure}

In order to explain how \verb"make_spectrogram" works, I have
to explain windowing; and in order to explain windowing, I have to
show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute
Spectrums, treats waves as if they are periodic; that is, it assumes
that the finite segment it operates on is a complete period from an
infinite signal that repeats over all time.  In practice, this
assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the
segment.  Because DFT assumes that the signal is periodic, it
implicitly connects the end of the segment back to the beginning to
make a loop.
If the end does not connect smoothly to the beginning,
the discontinuity creates additional frequency components in the
segment that are not in the signal.

As an example, let's start with a sine signal that contains only
one frequency component at 440 Hz.

\begin{verbatim}
signal = thinkdsp.SinSignal(freq=440)
\end{verbatim}

If we select a segment whose duration happens to be an integer
multiple of the period, the end of the segment connects smoothly with
the beginning, and DFT behaves well.

\begin{verbatim}
duration = signal.period * 30
wave = signal.make_wave(duration)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.windowing1} (left) shows the result.  As expected,
there is a single peak at 440 Hz.

But if the duration is not a multiple of the period, bad things
happen.  With {\tt duration = signal.period * 30.25}, the signal
starts at 0 and ends at 1.

Figure~\ref{fig.windowing1} (middle) shows
the spectrum of this segment.  Again, the peak is at 440 Hz, but now
there are additional components spread out from 240 to 640 Hz.
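This effect is easy to reproduce with plain NumPy, independent of the
{\tt thinkdsp} classes.  The following is a standalone sketch: the
helper \verb"peak_fraction" and the cycle counts are mine, chosen to
mirror the 30 versus 30.25 periods above.

\begin{verbatim}
import numpy as np

n = 4096

# A segment containing exactly 32 cycles: periodic, so DFT behaves well.
periodic = np.sin(2 * np.pi * 32 * np.arange(n) / n)

# A segment containing 32.25 cycles: discontinuous when looped.
leaky = np.sin(2 * np.pi * 32.25 * np.arange(n) / n)

def peak_fraction(ys):
    """Fraction of total spectral power in the largest component."""
    power = np.abs(np.fft.rfft(ys)) ** 2
    return power.max() / power.sum()

print(peak_fraction(periodic))   # close to 1
print(peak_fraction(leaky))      # noticeably smaller
\end{verbatim}

For the periodic segment, essentially all of the power lands in a
single component; for the non-periodic segment, a substantial fraction
leaks into neighboring frequencies.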
This
spread is called {\bf spectral leakage}, because some of the energy that
is actually at the fundamental frequency leaks into other frequencies.

In this example, leakage happens because we are using DFT on a segment
that becomes discontinuous when we treat it as periodic.


\section{Windowing}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=3.5in]{figs/windowing2.pdf}}
\caption{Segment of a sinusoid (top), Hamming window (middle), product
of the segment and the window (bottom).}
\label{fig.windowing2}
\end{figure}

We can reduce leakage by smoothing out the discontinuity between
the beginning and end of the segment, and one way to do that is
{\bf windowing}.

A ``window'' is a function designed to transform a non-periodic
segment into something that can pass for periodic.
Figure~\ref{fig.windowing2} (top) shows a segment where the end does not
connect smoothly to the beginning.

Figure~\ref{fig.windowing2} (middle) shows a ``Hamming window'', one of the
more common window functions.  No window function is perfect, but some
can be shown to be optimal for different applications, and Hamming
is a good, all-purpose window.

Figure~\ref{fig.windowing2} (bottom) shows the result of multiplying the
window by the original signal.  Where the window is close to 1, the
signal is unchanged.  Where the window is close to 0, the signal is
attenuated.  Because the window tapers at both ends, the end of the
segment connects smoothly to the beginning.

Figure~\ref{fig.windowing1} (right) shows the spectrum of the windowed
signal.  Windowing has reduced leakage substantially, but not
completely.

Here's what the code looks like.
{\tt Wave} provides {\tt window},
which applies a Hamming window:

\begin{verbatim}
#class Wave:
    def window(self, window):
        self.ys *= window
\end{verbatim}

And NumPy provides {\tt hamming}, which computes a Hamming window
with a given length:

\begin{verbatim}
window = np.hamming(len(wave))
wave.window(window)
\end{verbatim}

NumPy provides functions to compute other window
functions, including {\tt bartlett}, {\tt blackman}, {\tt hanning},
and {\tt kaiser}.  One of the exercises at the end of this chapter
asks you to experiment with these other windows.


\section{Implementing spectrograms}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing3.pdf}}
\caption{Overlapping Hamming windows.}
\label{fig.windowing3}
\end{figure}

Now that we understand windowing, we can understand the
implementation of spectrograms.
Here is the {\tt Wave} method that computes spectrograms:

\begin{verbatim}
#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        step = seg_length // 2

        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)
\end{verbatim}

This is the longest function in the book, so if you can handle
this, you can handle anything.

The parameter, {\tt self}, is a Wave object.
\verb"seg_length" is the number of samples in each segment.
{\tt step} uses integer division so the slice indices stay integers.

{\tt window} is a Hamming window with the same length as the segments.

{\tt i} and {\tt j} are the slice indices that select segments from
the wave.  {\tt step} is the offset between segments.
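For comparison, here is a standalone sketch of the same
sliding-segment algorithm using only NumPy.  The function name and
return format are mine: it maps each segment's midpoint (as a sample
index) to the magnitudes returned by \verb"np.fft.rfft".

\begin{verbatim}
import numpy as np

def spectrogram(ys, seg_length):
    """Map from segment midpoint (in samples) to magnitude spectrum."""
    window = np.hamming(seg_length)
    step = seg_length // 2          # segments overlap by half
    spec_map = {}

    i, j = 0, seg_length
    while j < len(ys):
        segment = ys[i:j] * window
        mid = (i + j) // 2
        spec_map[mid] = np.abs(np.fft.rfft(segment))
        i += step
        j += step

    return spec_map

# One second of a 440 Hz sinusoid at 11,025 frames per second.
ys = np.sin(2 * np.pi * 440 * np.arange(11025) / 11025)
spec_map = spectrogram(ys, seg_length=512)
\end{verbatim}

With \verb"seg_length=512", each magnitude spectrum has 257 elements
and peaks near the bin that corresponds to 440 Hz.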
Since {\tt
step} is half of \verb"seg_length", the segments overlap by half.
Figure~\ref{fig.windowing3} shows what these overlapping windows look
like.

\verb"spec_map" is a dictionary that maps from a timestamp to
a Spectrum.

Inside the while loop, we select a slice from the wave and apply
the window; then we construct a Spectrum
object and add it to \verb"spec_map".  The nominal time of
each segment, {\tt t}, is the midpoint.

Then we advance {\tt i} and {\tt j}, and continue as long as {\tt j}
doesn't go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram.  Here
is the definition of Spectrogram:

\begin{verbatim}
class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length
\end{verbatim}

Like many init methods, this one just stores the
parameters as attributes.

{\tt Spectrogram} provides {\tt plot}, which generates a
pseudocolor plot with time along the x-axis and frequency along
the y-axis.

And that's how Spectrograms are implemented.


\section{Exercises}

Solutions to these exercises are in {\tt chap03soln.ipynb}.

\begin{exercise}
Run and listen to the examples in {\tt chap03.ipynb}, which is
in the repository for this book, and also available at
\url{http://tinyurl.com/thinkdsp03}.

In the leakage example, try replacing the Hamming window with one of
the other windows provided by NumPy, and see what effect they have on
leakage.
See
\url{http://docs.scipy.org/doc/numpy/reference/routines.window.html}.
\end{exercise}


\begin{exercise}
Write a class called {\tt SawtoothChirp} that extends {\tt Chirp}
and overrides {\tt evaluate} to generate a sawtooth waveform with
frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from {\tt Chirp} and
{\tt SawtoothSignal}.

Draw a sketch of what you think the spectrogram of this signal
looks like, and then plot it.  The effect of aliasing should be
visually apparent, and if you listen carefully, you can hear it.
\end{exercise}


\begin{exercise}
Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to
make a wave with duration 1 s and framerate 20 kHz.  Draw a sketch of
what you think the spectrum will look like.  Then plot the
spectrum and see if you got it right.
\end{exercise}


\begin{exercise}
In musical terminology, a ``glissando'' is a note that slides from one
pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the
first few seconds.  One suggestion: George Gershwin's {\it Rhapsody in
Blue} starts with a famous clarinet glissando, which you can download
from \url{http://archive.org/details/rhapblue11924}.
\end{exercise}


\begin{exercise}
A trombone player can play a glissando by extending the trombone
slide while blowing continuously.  As the slide extends, the total
length of the tube gets longer, and the resulting pitch is inversely
proportional to length.

Assuming that the player moves the slide at a constant speed, how
does frequency vary with time?

Write a class called {\tt TromboneGliss} that extends {\tt Chirp} and
provides {\tt evaluate}.  Make a wave that simulates a trombone
glissando from C3 up to F3 and back down to C3.
C3 is 262 Hz; F3 is
349 Hz.

Plot a spectrogram of the resulting wave.  Is a trombone glissando
more like a linear or exponential chirp?
\end{exercise}


\begin{exercise}
Make or find a recording of a series of vowel sounds and look at the
spectrogram.  Can you identify different vowels?
\end{exercise}


\chapter{Noise}

In English, ``noise'' means an unwanted or unpleasant sound.  In the
context of signal processing, it has two different senses:

\begin{enumerate}

\item As in English, it can mean an unwanted signal of any kind.  If
two signals interfere with each other, each signal would consider
the other to be noise.

\item ``Noise'' also refers to a signal that contains components at
many frequencies, so it lacks the harmonic structure of the periodic
signals we saw in previous chapters.

\end{enumerate}

This chapter is about the second kind.

The code for this chapter is in {\tt chap04.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp04}.


\section{Uncorrelated noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise0.pdf}}
\caption{Waveform of uncorrelated uniform noise.}
\label{fig.whitenoise0}
\end{figure}

The simplest way to understand noise is to generate it, and the
simplest kind to generate is uncorrelated uniform noise (UU noise).
``Uniform'' means the signal contains random values from a uniform
distribution; that is, every value in the range is equally likely.
``Uncorrelated'' means that the values are independent; that is,
knowing one value provides no information about the others.

Here's a class that represents UU noise:

\begin{verbatim}
class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp,
                               len(ts))
        return ys
\end{verbatim}

{\tt UncorrelatedUniformNoise} inherits from \verb"_Noise", which
inherits from {\tt Signal}.

As usual, the evaluate function takes {\tt ts}, the times when the
signal should be evaluated.  It uses
{\tt np.random.uniform}, which generates values from a
uniform distribution.  In this example, the values are in
the range from {\tt -amp} to {\tt amp}.

The following example generates UU noise with duration 0.5
seconds at 11,025 samples per second.

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
\end{verbatim}

If you play this wave, it sounds like the static you hear if you tune
a radio between channels.  Figure~\ref{fig.whitenoise0} shows what the
waveform looks like.  As expected, it looks pretty random.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise1.pdf}}
\caption{Power spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise1}
\end{figure}

Now let's take a look at the spectrum:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power()
\end{verbatim}

\verb"Spectrum.plot_power" is similar to \verb"Spectrum.plot",
except that it plots power instead of amplitude.
Power is the square of amplitude.
I am switching from amplitude to power in this chapter because
it is more conventional in the context of noise.

Figure~\ref{fig.whitenoise1} shows the result.  Like the signal, the
spectrum looks pretty random.  In fact, it {\em is} random, but we have to
be more precise about the word ``random''.  There are at least three
things we might like to know about a noise signal or its spectrum:

\begin{itemize}

\item Distribution: The distribution of a random signal is the set of
possible values and their probabilities.
For example, in the
uniform noise signal, the set of values is the range from -1 to 1,
and all values have the same probability.  An alternative is
{\bf Gaussian noise}, where the set of values is the range from negative
to positive infinity, but values near 0 are the most likely, with
probability that drops off according to the Gaussian or
``bell'' curve.

\item Correlation: Is each value in the signal independent of the
others, or are there dependencies between them?  In UU noise, the
values are independent.
An alternative is {\bf Brownian noise}, where each value is the sum
of the previous value and a random ``step''.  So if the value of the
signal is high at a particular point in time, we expect it to stay
high, and if it is low, we expect
it to stay low.

\item Relationship between power and frequency: In the spectrum of UU
noise, the power at all frequencies is drawn from the same
distribution; that is, the average power is the same for all
frequencies.
An alternative is {\bf pink noise}, where power is
inversely related to frequency; that is, the power at frequency $f$
is drawn from a distribution whose mean is proportional to $1/f$.

\end{itemize}


\section{Integrated spectrum}

For UU noise we can see the relationship between power and frequency
more clearly by looking at the {\bf integrated spectrum}, which
is a function of frequency, $f$, that shows the cumulative power in
the spectrum up to $f$.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise2.pdf}}
\caption{Integrated spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise2}
\end{figure}

{\tt Spectrum} provides a method that computes the IntegratedSpectrum:

\begin{verbatim}
def make_integrated_spectrum(self):
    cs = np.cumsum(self.power)
    cs /= cs[-1]
    return IntegratedSpectrum(cs, self.fs)
\end{verbatim}

{\tt self.power} is a NumPy array containing power for each frequency.
{\tt np.cumsum} computes the cumulative sum of the powers.
Dividing through by the last element normalizes the integrated
spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum.
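The same computation is easy to check with plain NumPy (a standalone
sketch; the seed and signal length are mine).  For white noise, the
normalized cumulative power should grow roughly linearly with
frequency:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)   # seeded so the result is reproducible
ys = rng.uniform(-1, 1, 11025)   # one second of UU noise

power = np.abs(np.fft.rfft(ys)) ** 2
cs = np.cumsum(power)
cs /= cs[-1]                     # normalize so it runs from 0 to 1

# Halfway through the frequency range, about half the power
# has accumulated.
print(cs[len(cs) // 2])
\end{verbatim}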
Here is the class definition:

\begin{verbatim}
class IntegratedSpectrum(object):
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs
\end{verbatim}

Like Spectrum, IntegratedSpectrum provides \verb"plot_power", so
we can compute and plot the integrated spectrum like this:

\begin{verbatim}
integ = spectrum.make_integrated_spectrum()
integ.plot_power()
\end{verbatim}

The result, shown in Figure~\ref{fig.whitenoise2}, is a straight line,
which indicates that power at all frequencies is constant, on average.
Noise with equal power at all frequencies is called {\bf white noise}
by analogy with light, because an equal mixture of light at all
visible frequencies is white.


\section{Brownian noise}
\label{brownian}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise0.pdf}}
\caption{Waveform of Brownian noise.}
\label{fig.rednoise0}
\end{figure}

UU noise is uncorrelated, which means that each value does not depend
on the others.  An alternative is {\bf Brownian noise}, in which each value
is the sum of the previous value and a random ``step''.

It is called ``Brownian'' by analogy with Brownian motion, in which a
particle suspended in a fluid moves apparently at random, due to
unseen interactions with the fluid.  Brownian motion is often
described using a {\bf random walk}, which is a mathematical model
of a path where the distance between steps is characterized by a
random distribution.

In a one-dimensional random walk, the particle moves up or down
by a random amount at each time step.
The location of the particle
at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise:
generate uncorrelated random steps and then add them up.
Here is a class definition that implements this algorithm:

\begin{verbatim}
class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt evaluate} uses {\tt np.random.uniform} to generate an
uncorrelated signal and {\tt np.cumsum} to compute its cumulative
sum.

Since the sum is likely to escape the range from -1 to
1, we have to use {\tt unbias} to shift the mean to 0, and {\tt
normalize} to get the desired maximum amplitude.

Here's the code that generates a BrownianNoise object and plots the
waveform.

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()
\end{verbatim}

Figure~\ref{fig.rednoise0} shows the result.  The waveform
wanders up and down, but there is a clear correlation between
successive values.  When the amplitude is high, it tends to stay
high, and vice versa.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise3.pdf}}
\caption{Spectrum of Brownian noise on a linear scale (left) and
log-log scale (right).}
\label{fig.rednoise3}
\end{figure}

If you plot the spectrum of Brownian noise on a linear scale, as
in Figure~\ref{fig.rednoise3} (left), it
doesn't look like much.  Nearly all of the power is at the lowest
frequencies; the higher frequency components are not visible.

To see the shape of the spectrum more clearly, we can plot power
and frequency on a log-log scale.
Here's the code:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')
\end{verbatim}

The result is in Figure~\ref{fig.rednoise3} (right).  The relationship
between power and frequency is noisy, but roughly linear.

{\tt Spectrum} provides \verb"estimate_slope", which uses SciPy to compute
a least squares fit to the power spectrum:

\begin{verbatim}
#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x, y)
        return t
\end{verbatim}

It discards the first component of the spectrum because
this component corresponds to $f=0$, and $\log 0$ is undefined.

\verb"estimate_slope" returns the result from {\tt
scipy.stats.linregress}, which is an object that contains the
estimated slope and intercept, the correlation coefficient,
the p-value, and the standard error.  For our purposes, we only need
the slope.

For Brownian noise, the slope of the power spectrum is -2 (we'll see
why in Chapter~\ref{diffint}), so we can write this relationship:
%
\[ \log P = k - 2 \log f \]
%
where $P$ is power, $f$ is frequency, and $k$ is the intercept
of the line, which is not important for our purposes.
Exponentiating both sides yields:
%
\[ P = K / f^{2} \]
%
where $K$ is $e^k$, but still not important.  More relevant is
that power is proportional to $1/f^2$, which is characteristic
of Brownian noise.

Brownian noise is also called {\bf red noise}, for the same reason that
white noise is called ``white''.
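Without the {\tt thinkdsp} classes, the same check can be sketched
with plain NumPy.  The seed and signal length here are mine, and I use
\verb"np.polyfit" in place of \verb"linregress" to keep the sketch
self-contained:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
dys = rng.uniform(-1, 1, 16384)
ys = np.cumsum(dys)                 # Brownian noise

power = np.abs(np.fft.rfft(ys)) ** 2
fs = np.arange(len(power))          # frequency in bins

# Discard f=0, then fit log power against log frequency.
slope, intercept = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)
print(slope)                        # should be close to -2
\end{verbatim}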
If you combine visible light with
power proportional to $1/f^2$, most of the power
would be at the low-frequency end of the spectrum, which is red.
Brownian noise is also sometimes called ``brown noise'', but I think
that's confusing, so I won't use it.

%TODO: refer back to this in the diff/int chapter


\section{Pink noise}
\label{pink}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/pinknoise0.pdf}}
\caption{Waveform of pink noise with $\beta=1$.}
\label{fig.pinknoise0}
\end{figure}

For red noise, the relationship between frequency
and power is
%
\[ P = K / f^{2} \]
%
There is nothing special about the exponent 2.  More generally,
we can synthesize noise with any exponent, $\beta$.
%
\[ P = K / f^{\beta} \]
%
When $\beta = 0$, power is constant at all frequencies,
so the result is white noise.  When $\beta=2$ the result is red noise.

When $\beta$ is between 0 and 2, the result is between white and
red noise, so it is called {\bf pink noise}.

There are several ways to generate pink noise.  The simplest is to
generate white noise and then apply a low-pass filter with the
desired exponent.  {\tt thinkdsp} provides a class that represents
a pink noise signal:

\begin{verbatim}
class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta
\end{verbatim}

{\tt amp} is the desired amplitude of the signal.
{\tt beta} is the desired exponent.
{\tt PinkNoise} provides
\verb"make_wave", which generates a Wave.

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    signal = UncorrelatedUniformNoise()
    wave = signal.make_wave(duration, start, framerate)
    spectrum = wave.make_spectrum()

    spectrum.pink_filter(beta=self.beta)

    wave2 = spectrum.make_wave()
    wave2.unbias()
    wave2.normalize(self.amp)
    return wave2
\end{verbatim}

{\tt duration} is the length of the wave in seconds.  {\tt start} is
the start time of the wave; it is included so that \verb"make_wave"
has the same interface for all types of signal, but for random noise,
start time is irrelevant.  And {\tt framerate} is the number of
samples per second.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise-triple.pdf}}
\caption{Spectrum of white, pink, and red noise on a log-log scale.}
\label{fig.noise-triple}
\end{figure}

\verb"make_wave" creates a white noise wave, computes its spectrum,
applies a filter with the desired exponent, and then converts the
filtered spectrum back to a wave.  Then it unbiases and normalizes
the wave.

{\tt Spectrum} provides \verb"pink_filter":

\begin{verbatim}
def pink_filter(self, beta=1.0):
    denom = self.fs ** (beta/2.0)
    denom[0] = 1
    self.hs /= denom
\end{verbatim}

\verb"pink_filter" divides each element of the spectrum by
$f^{\beta/2}$.  Since power is the square of amplitude, this
operation divides the power at
each component by $f^\beta$.  It treats the component
at $f=0$ as a special case, partly to avoid dividing by 0, and partly
because this element represents the bias of the signal,
which we are going to set to 0 anyway.

Figure~\ref{fig.pinknoise0} shows the resulting waveform.
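The whole pipeline reduces to a few lines of plain NumPy (a
standalone sketch with $\beta = 1$; the seed and variable names are
mine).  It mirrors \verb"make_wave": white noise, the pink filter, the
inverse transform, then unbias and normalize.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0
n = 11025

# White noise, transformed to the frequency domain.
ys = rng.uniform(-1, 1, n)
hs = np.fft.rfft(ys)

# Pink filter: divide each amplitude by f**(beta/2), skipping f=0.
fs = np.arange(len(hs))
denom = fs ** (beta / 2)
denom[0] = 1
hs = hs / denom

# Back to the time domain, then unbias and normalize.
ys2 = np.fft.irfft(hs, n)
ys2 -= ys2.mean()
ys2 /= np.abs(ys2).max()
\end{verbatim}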
Like
Brownian noise, it wanders up and down in a way that suggests
correlation between successive values, but at least visually, it looks
more random.  In the next chapter we will come back to this
observation and I will be more precise about what I mean by
``correlation'' and ``more random''.

Finally, Figure~\ref{fig.noise-triple} shows a spectrum for
white, pink, and red noise on the same log-log scale.
The relationship between the exponent, $\beta$, and the slope
of the spectrum is apparent in this figure.


\section{Gaussian noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise1.pdf}}
\caption{Normal probability plot for the real and imaginary parts
of the spectrum of Gaussian noise.}
\label{fig.noise1}
\end{figure}

We started with uncorrelated uniform (UU) noise and showed that,
because its spectrum has equal power at all frequencies, on
average, UU noise is white.

But when people talk about ``white noise'', they don't always
mean UU noise.  In fact, more often they mean uncorrelated
Gaussian (UG) noise.

{\tt thinkdsp} provides an implementation of UG noise:

\begin{verbatim}
class UncorrelatedGaussianNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.normal(0, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt np.random.normal} returns a NumPy array of values from a
Gaussian distribution, in this case with mean 0 and standard deviation
{\tt self.amp}.  In theory the range of values is from negative to
positive infinity, but we expect about 99.7\% of the values to fall
within three standard deviations of the mean.

UG noise is similar in many ways to UU noise.  The spectrum has
equal power at all frequencies, on average, so UG is also white.
And it has one other interesting property: the spectrum of UG
noise is also UG noise.
More precisely, the real and imaginary
parts of the spectrum are uncorrelated Gaussian values.

To test that claim, we can generate the spectrum of UG noise and
then generate a ``normal probability plot'', which is a graphical
way to test whether a distribution is Gaussian.

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
spectrum = wave.make_spectrum()

thinkstats2.NormalProbabilityPlot(spectrum.real)
thinkstats2.NormalProbabilityPlot(spectrum.imag)
\end{verbatim}

{\tt NormalProbabilityPlot} is provided by {\tt thinkstats2}, which is
included in the repository for this book.  If you are not familiar
with normal probability plots, you can read about them in Chapter 5
of {\it Think Stats} at \url{http://thinkstats2.com}.

Figure~\ref{fig.noise1} shows the results.  The gray lines
show a linear model fit to the data; the dark lines show the
data.

A straight line on a normal probability plot indicates
that the data come from a Gaussian distribution.  Except for
some random variation at the extremes, these lines are straight,
which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately.  In
fact, by the Central Limit Theorem, the spectrum of almost any
uncorrelated noise is approximately Gaussian, as long as the
distribution has finite mean and standard deviation, and the number of
samples is large.


\section{Exercises}

Solutions to these exercises are in {\tt chap04soln.ipynb}.

\begin{exercise}
``A Soft Murmur'' is a web site that plays a mixture of natural
noise sources, including rain, waves, wind, etc.
At
\url{http://asoftmurmur.com/about/} you can find their list
of recordings, most of which are at \url{http://freesound.org}.

Download a few of these files and compute the spectrum of each
signal.  Does the power spectrum look like white noise, pink noise,
or Brownian noise?  How does the spectrum vary over time?
\end{exercise}


\begin{exercise}
In a noise signal, the mixture of frequencies changes over time.
In the long run, we expect the power at all frequencies to be equal,
but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can
break a long signal into segments, compute the power spectrum
for each segment, and then compute the average across
the segments.  You can read more about this algorithm at
\url{http://en.wikipedia.org/wiki/Bartlett's_method}.

Implement Bartlett's method and use it to estimate the power
spectrum for a noise wave.  Hint: look at the implementation
of \verb"make_spectrogram".
\end{exercise}


\begin{exercise}
At \url{http://www.coindesk.com} you can download the daily
price of a BitCoin as a CSV file.  Read this file and compute
the spectrum of BitCoin prices as a function of time.
Does it resemble white, pink, or Brownian noise?
\end{exercise}


\begin{exercise}
A Geiger counter is a device that detects radiation.
When an ionizing particle strikes the detector, it outputs a surge of
current.  The total output at a point in time can be modeled as
uncorrelated Poisson (UP) noise, where each sample is
a random quantity from a Poisson distribution, which corresponds to the
number of particles detected during an interval.

Write a class called {\tt UncorrelatedPoissonNoise} that inherits
from \verb"thinkdsp._Noise" and provides {\tt evaluate}.  It should
use {\tt np.random.poisson} to generate random values from a Poisson
distribution.
The parameter of this function, {\tt lam}, is the
average number of particles during each interval.  You can use the
attribute {\tt amp} to specify {\tt lam}.  For example, if the
framerate is 10 kHz and {\tt amp} is 0.001, we expect about 10
``clicks'' per second.

Generate about a second of UP noise and listen to it.  For low values
of {\tt amp}, like 0.001, it should sound like a Geiger counter.  For
higher values it should sound like white noise.  Compute and plot the
power spectrum to see whether it looks like white noise.
\end{exercise}


\begin{exercise}
The algorithm in this chapter for generating pink noise is
conceptually simple but computationally expensive.  There are
more efficient alternatives, like the Voss-McCartney algorithm.
Research this method, implement it, compute the spectrum of
the result, and confirm that it has the desired relationship
between power and frequency.
\end{exercise}


\chapter{Autocorrelation}

In the previous chapter I characterized white noise as ``uncorrelated'',
which means that each value is independent of the others, and Brownian
noise as ``correlated'', because each value depends on the preceding
value.  In this chapter I define these terms more precisely and
present the {\bf autocorrelation function}, which is a useful tool
for signal analysis.

The code for this chapter is in {\tt chap05.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp05}.


\section{Correlation}

In general, correlation between variables means that if you know the
value of one, you have some information about the other.  There are
several ways to quantify correlation, but the most common is the
Pearson product-moment correlation coefficient, usually denoted
$\rho$.
For two variables, $x$ and $y$, that each contain $N$ values:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, and
$\sigma_x$ and $\sigma_y$ are their standard deviations.

Pearson's correlation is always between -1 and +1 (including both).
If $\rho$ is positive, we say that the correlation is positive,
which means that when one variable is high, the other tends to be
high.  If $\rho$ is negative, the correlation is negative, so
when one variable is high, the other tends to be low.

The magnitude of $\rho$ indicates the strength of the correlation.  If
$\rho$ is 1 or -1, the variables are perfectly correlated, which means
that if you know one, you can make a perfect prediction about the
other.  If $\rho$ is near zero, the correlation is probably weak, so
if you know one, it doesn't tell you much about the other.

I say ``probably weak'' because it is also possible that there is
a nonlinear relationship that is not captured by the coefficient
of correlation.  Nonlinear relationships are often important in
statistics, but less often relevant for signal processing, so I
won't say more about them here.

Python provides several ways to compute correlations.  {\tt
np.corrcoef} takes any number of variables and computes a {\bf
correlation matrix} that includes correlations between each pair of
variables.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr1.pdf}}
\caption{Two sine waves that differ by a phase offset of 1 radian;
their coefficient of correlation is 0.54.}
\label{fig.autocorr1}
\end{figure}

I'll present an example with only two variables.
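The formula translates directly into NumPy, and the result matches
{\tt np.corrcoef} (a standalone sketch; the data values are made up):

\begin{verbatim}
import numpy as np

def pearson(xs, ys):
    """Pearson's correlation, computed from the definition."""
    dx = xs - xs.mean()
    dy = ys - ys.mean()
    # np.std divides by N by default, matching the equation above.
    return np.sum(dx * dy) / (len(xs) * xs.std() * ys.std())

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

print(pearson(xs, ys))            # 0.8, up to floating-point rounding
print(np.corrcoef(xs, ys)[0, 1])  # same value
\end{verbatim}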
First, I define
a function that constructs sine waves with different phase offsets:

\begin{verbatim}
def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave
\end{verbatim}

Next I instantiate two waves with different offsets:

\begin{verbatim}
wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)
\end{verbatim}

Figure~\ref{fig.autocorr1} shows what the first few periods of these
waves look like. When one wave is high, the other is usually high, so we
expect them to be correlated.

\begin{verbatim}
>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]
\end{verbatim}

The option {\tt ddof=0} indicates that {\tt corrcoef} should divide by
$N$, as in the equation above, rather than use the default, $N-1$.

The result is a correlation matrix:
the first element is the correlation of {\tt wave1}
with itself, which is always 1. Similarly, the last element
is the correlation of {\tt wave2} with itself.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr2.pdf}}
\caption{The correlation of two sine waves as a function of the
phase offset between them. The result is a cosine.}
\label{fig.autocorr2}
\end{figure}

The off-diagonal elements contain the value we're interested in,
the correlation of {\tt wave1} and {\tt wave2}. The value 0.54
indicates that the strength of the correlation is moderate.

As the phase offset increases, this correlation decreases until
the waves are 180 degrees out of phase, which yields correlation
-1. Then it increases until the offset differs by 360 degrees.
At that point we have come full circle and the correlation is 1.

Figure~\ref{fig.autocorr2} shows the relationship between correlation and
phase offset for a sine wave.
The shape of that curve should look
familiar; it is a cosine.

{\tt thinkdsp} provides a simple interface for computing the
correlation between waves:

\begin{verbatim}
>>> wave1.corr(wave2)
0.54
\end{verbatim}


\section{Serial correlation}

Signals often represent measurements of quantities that vary in
time. For example, the sound signals we've worked with represent
measurements of voltage (or current), which correspond to the changes
in air pressure we perceive as sound.

Measurements like this almost always have serial correlation, which
is the correlation between each element and the next (or the previous).
To compute serial correlation, we can shift a signal and then compute
the correlation of the shifted version with the original.

\begin{verbatim}
def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr
\end{verbatim}

\verb"serial_corr" takes a Wave object and
{\tt lag}, which is the integer number of places to shift the wave.
It computes the correlation of the wave with a shifted version
of itself.

We can test this function with the noise signals from the previous
chapter. We expect UU noise to be uncorrelated, based on the
way it's generated (not to mention the name):

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

When I ran this example, I got 0.006, which indicates a very
small serial correlation.
You might get a different value when you run
it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous
value and a random ``step'', so we expect a strong serial
correlation:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

Sure enough, the result I got is greater than 0.999.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr3.pdf}}
\caption{Serial correlation for pink noise with a range of
parameters.}
\label{fig.autocorr3}
\end{figure}

Since pink noise is in some sense between Brownian noise and UU noise,
we might expect an intermediate correlation:

\begin{verbatim}
signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

With parameter $\beta=1$, I got a serial correlation of 0.851.
As we vary the parameter from $\beta=0$, which is uncorrelated
noise, to $\beta=2$, which is Brownian, serial correlation
ranges from 0 to almost 1, as shown in Figure~\ref{fig.autocorr3}.


\section{Autocorrelation}
\label{autopink}

In the previous section we computed the correlation between each
value and the next, so we shifted the elements of the array by 1.
But we can easily compute serial correlations with
different lags.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr4.pdf}}
\caption{Autocorrelation functions for pink noise with a range
of parameters.}
\label{fig.autocorr4}
\end{figure}

You can think of \verb"serial_corr" as a function that
maps from each value of lag to the corresponding correlation, and we
can evaluate that function by looping through values of {\tt lag}:

\begin{verbatim}
def autocorr(wave):
    lags = range(len(wave.ys)//2)
    corrs = [serial_corr(wave, lag) for lag in lags]
    return lags, corrs
\end{verbatim}

{\tt autocorr} takes a Wave object and returns the autocorrelation
function as a pair of sequences: {\tt lags} is a sequence of
integers from 0 to half the length of the wave; {\tt corrs}
is the sequence of serial correlations for each lag.

Figure~\ref{fig.autocorr4} shows autocorrelation functions for pink
noise with three values of $\beta$. For low values of $\beta$, the
signal is less correlated, and the autocorrelation function drops
off to zero quickly. For larger values, serial correlation
is stronger and drops off more slowly. With $\beta=1.7$ serial
correlation is strong even for long lags; this phenomenon is called
{\bf long-range dependence}, because it indicates that each value in
the signal depends on many preceding values.


\section{Autocorrelation of periodic signals}

The autocorrelation of pink noise has interesting mathematical
properties, but limited applications. The autocorrelation of
periodic signals is more useful.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr5.pdf}}
\caption{Spectrogram of a vocal chirp.}
\label{fig.autocorr5}
\end{figure}

As an example, I downloaded from \url{freesound.org} a recording of
someone singing a chirp; the repository for this book includes the
file: \url{28042__bcjordan__voicedownbew.wav}.
You can use the
Jupyter notebook for this chapter, {\tt chap05.ipynb}, to play it.

Figure~\ref{fig.autocorr5} shows the spectrogram of this wave.
The fundamental frequency and some of the harmonics show up clearly.
The chirp starts near 500 Hz and drops down to about 300 Hz, roughly
from C5 to E4.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr6.pdf}}
\caption{Spectrum of a segment from a vocal chirp.}
\label{fig.autocorr6}
\end{figure}

To estimate pitch at a particular point in time, we could use the
spectrum, but it doesn't work very well. To see why not, I'll take
a short segment from the wave and plot its spectrum:

\begin{verbatim}
duration = 0.01
segment = wave.segment(start=0.2, duration=duration)
spectrum = segment.make_spectrum()
spectrum.plot(high=1000)
\end{verbatim}

This segment starts at 0.2 seconds and lasts 0.01 seconds.
Figure~\ref{fig.autocorr6} shows its spectrum. There is a clear peak
near 400 Hz, but it is hard to identify the pitch precisely.
The
length of the segment is 441 samples at a framerate of 44100 Hz, so
the frequency resolution is 100 Hz (see Section~\ref{gabor}).
That means the estimated pitch might be off by 50 Hz; in musical
terms, the range from 350 Hz to 450 Hz is about 5 semitones, which is
a big difference!

We could get better frequency resolution by taking a longer segment,
but since the pitch is changing over time, we would also get ``motion
blur''; that is, the peak would spread between the start and end pitch of
the segment, as we saw in Section~\ref{sauron}.

We can estimate pitch more precisely using autocorrelation.
If a signal is periodic, we expect the autocorrelation to spike
when the lag equals the period.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr7.pdf}}
\caption{Two segments from a chirp, one starting 0.0023 seconds
after the other.}
\label{fig.autocorr7}
\end{figure}

To show why that works, I'll plot two segments from the same
recording.

\begin{verbatim}
def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')
\end{verbatim}

One segment starts at 0.2 seconds; the other starts 0.0023 seconds
later. Figure~\ref{fig.autocorr7} shows the result. The segments
are similar, and their correlation is 0.99.
This result suggests
that the period is near 0.0023 seconds, which corresponds to a frequency
of 435 Hz.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr8.pdf}}
\caption{Autocorrelation function for a segment from a chirp.}
\label{fig.autocorr8}
\end{figure}

For this example, I estimated the period by trial and error. To automate
the process, we can use the autocorrelation function.

\begin{verbatim}
lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)
\end{verbatim}

Figure~\ref{fig.autocorr8} shows the autocorrelation function for
the segment starting at $t=0.2$ seconds. The first peak occurs at
{\tt lag=101}. We can compute the frequency that corresponds
to that period like this:

\begin{verbatim}
period = lag / segment.framerate
frequency = 1 / period
\end{verbatim}

The estimated fundamental frequency is 437 Hz. To evaluate the
precision of the estimate, we can run the same computation with
lags 100 and 102, which correspond to frequencies 432 and 441 Hz.
The frequency precision using autocorrelation is less than 10 Hz,
compared with 100 Hz using the spectrum. In musical terms, the
expected error is about 30 cents (a third of a semitone).


\section{Correlation as dot product}
\label{dotproduct}

I started the chapter with this definition of Pearson's
correlation coefficient:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
Then I used $\rho$ to define serial correlation and autocorrelation.
That's consistent with how these terms are used in statistics,
but in the context of signal processing, the definitions are
a little different.

In signal processing, we are often working with unbiased signals,
where the mean is 0, and normalized signals, where the standard
deviation is 1.
In that case, the definition of $\rho$ simplifies to:
%
\[ \rho = \frac{1}{N} \sum_i x_i y_i \]
%
And it is common to simplify even further:
%
\[ r = \sum_i x_i y_i \]
%
This definition of correlation is not ``standardized'', so it doesn't
generally fall between -1 and 1. But it has other useful properties.

If you think of $x$ and $y$ as vectors, you might recognize this
formula as the {\bf dot product}, $x \cdot y$. See
\url{http://en.wikipedia.org/wiki/Dot_product}.

\newcommand{\norm}{\mathrm{norm}}

The dot product indicates the degree to which the signals are similar.
If they are normalized so their standard deviations are 1,
%
\[ x \cdot y = \cos \theta \]
%
where $\theta$ is the angle between the vectors. And that explains
why Figure~\ref{fig.autocorr2} is a cosine curve.


\section{Using NumPy}
\label{correlate}

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr9.pdf}}
\caption{Autocorrelation function computed with {\tt np.correlate}.}
\label{fig.autocorr9}
\end{figure}

NumPy provides a function, {\tt correlate}, that computes
the correlation of two functions or the autocorrelation of one
function. We can use it to compute the autocorrelation of
the segment from the previous section:

\begin{verbatim}
corrs2 = np.correlate(segment.ys, segment.ys, mode='same')
\end{verbatim}

The option {\tt mode} tells {\tt correlate} what range
of {\tt lag} to use. With the value {\tt 'same'}, the
range is from $-N/2$ to $N/2$, where $N$ is the length of the
wave array.

Figure~\ref{fig.autocorr9} shows the result. It is symmetric because
the two signals are identical, so a negative lag on one has the same
effect as a positive lag on the other.
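Here is a self-contained sketch of that behavior (mine, not from {\tt autocorr.py}), with a synthesized cosine standing in for the chirp segment: with {\tt mode='same'}, lag 0 lands at the center of the output, where the unstandardized autocorrelation is largest.

```python
import numpy as np

framerate = 10000
ts = np.arange(5000) / framerate       # 0.5 seconds of samples
ys = np.cos(2 * np.pi * 100 * ts)      # 100 Hz cosine

corrs2 = np.correlate(ys, ys, mode='same')
N = len(corrs2)

center = N // 2    # with mode='same', lag 0 is at the center
# at lag 0 the unstandardized correlation is sum(ys**2),
# which is the largest value in the array
print(np.argmax(corrs2) == center)     # True
```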
To compare with the results
from {\tt autocorr}, we can select the second half:

\begin{verbatim}
N = len(corrs2)
half = corrs2[N//2:]
\end{verbatim}

If you compare Figure~\ref{fig.autocorr9} to Figure~\ref{fig.autocorr8},
you'll notice that the correlations computed by {\tt np.correlate}
get smaller as the lags increase. That's because {\tt np.correlate}
uses the unstandardized definition of correlation;
as the lag gets bigger, the
overlap between the two signals gets smaller, so the magnitude of
the correlations decreases.

We can correct that by dividing through by the lengths:

\begin{verbatim}
lengths = range(N, N//2, -1)
half /= lengths
\end{verbatim}

Finally, we can standardize the results so the correlation with
{\tt lag=0} is 1.

\begin{verbatim}
half /= half[0]
\end{verbatim}

With these adjustments, the results computed by {\tt autocorr} and
{\tt np.correlate} are nearly the same. They still differ by
1-2\%. The reason is not important, but if you are curious: {\tt autocorr}
standardizes the correlations independently for each lag; for
{\tt np.correlate}, we standardized them all at the end.

More importantly, now you know what autocorrelation is, how to
use it to estimate the fundamental period of a signal, and two
ways to compute it.


\section{Exercises}

Solutions to these exercises are in {\tt chap05soln.ipynb}.

\begin{exercise}
The Jupyter notebook for this chapter, {\tt chap05.ipynb}, includes
an interaction that lets you compute autocorrelations for different
lags.
Use this interaction to estimate the pitch of the vocal chirp
for a few different start times.
\end{exercise}


\begin{exercise}
The example code in \verb"chap05.ipynb" shows how to use autocorrelation
to estimate the fundamental frequency of a periodic signal.
Encapsulate this code in a function called \verb"estimate_fundamental",
and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a
spectrogram of the recording.
\end{exercise}


\begin{exercise}
If you did the exercises in the previous chapter, you downloaded
the historical price of BitCoins and estimated the power spectrum
of the price changes. Using the same data, compute the autocorrelation
of BitCoin prices. Does the autocorrelation function drop off quickly?
Is there evidence of periodic behavior?
\end{exercise}


\begin{exercise}
In the repository for this book you will find a Jupyter notebook
called \verb"saxophone.ipynb" that explores autocorrelation,
pitch perception, and a phenomenon called the {\bf missing fundamental}.
Read through this notebook and run the examples. Try selecting
a different segment of the recording and running the examples again.

Vi Hart has an excellent video called ``What is up with Noises? (The
Science and Mathematics of Sound, Frequency, and Pitch)''; it
demonstrates the missing fundamental phenomenon and explains how pitch
perception works (at least, to the degree that we know).
Watch it at
\url{https://www.youtube.com/watch?v=i_0DXxNeaQ0}.
\end{exercise}


\chapter{Discrete cosine transform}
\label{dct}

The topic of this chapter is the {\bf Discrete Cosine
Transform} (DCT), which is used in MP3 and related formats for
compressing music; JPEG and similar formats for images; and the MPEG
family of formats for video.

DCT is similar in many ways to the Discrete Fourier Transform (DFT),
which we have been using for spectral analysis.
Once we learn how DCT works, it will be easier to explain DFT.

Here are the steps to get there:

\begin{enumerate}

\item We'll start with the synthesis problem: given a set of frequency
components and their amplitudes, how can we construct a wave?

\item Next we'll rewrite the synthesis problem using NumPy arrays.
This move is good for performance, and also provides insight
for the next step.

\item We'll look at the analysis problem: given a signal and a
set of frequencies, how can we find the amplitude of each frequency
component?  We'll start with a solution that is conceptually simple
but slow.

\item Finally, we'll use some principles from linear algebra to find a
more efficient algorithm. If you already know linear algebra,
that's great, but I'll explain what you need as we go.

\end{enumerate}

The code for this chapter is in {\tt
chap06.ipynb}, which is in the repository for this book (see
Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp06}.


\section{Synthesis}
\label{synth1}

Suppose I give you a list of amplitudes and a list of frequencies,
and ask you to construct a signal that is the sum of these frequency
components.
Using objects in the {\tt thinkdsp} module, there is
a simple way to perform this operation, which is called {\bf synthesis}:

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

{\tt amps} is a list of amplitudes, {\tt fs} is the list
of frequencies, and {\tt ts} is the sequence
of times where the signal should be evaluated.

{\tt components} is a list of {\tt CosSignal} objects, one for
each amplitude-frequency pair. {\tt SumSignal} represents the
sum of these frequency components.

Finally, {\tt evaluate} computes the value of the signal at each
time in {\tt ts}.

We can test this function like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)
\end{verbatim}

This example makes a signal that contains a fundamental frequency at
100 Hz and three harmonics (100 Hz is a sharp G2). It renders the
signal for one second at 11,025 frames per second and puts the results
into a Wave object.

Conceptually, synthesis is pretty simple.
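If you want to try synthesis without {\tt thinkdsp}, the same sum of cosines can be written with NumPy alone; this is a minimal sketch of my own, and \verb"synthesize_np" is my name, not the book's.

```python
import numpy as np

def synthesize_np(amps, fs, ts):
    # sum of cosine components: the same operation as synthesize1,
    # but with no dependency on thinkdsp
    ys = np.zeros(len(ts))
    for amp, freq in zip(amps, fs):
        ys += amp * np.cos(2 * np.pi * freq * ts)
    return ys

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = synthesize_np(amps, fs, ts)
```

At $t=0$ every cosine equals 1, so the first sample is the sum of the amplitudes, 1.0, and no sample can exceed that value.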
But in this form it doesn't
help much with {\bf analysis}, which is the inverse problem: given the
wave, how do we identify the frequency components and their amplitudes?


\section{Synthesis with arrays}
\label{synthesis}

\begin{figure}
\centerline{\includegraphics[height=2.5in]{figs/diagram1.pdf}}
\caption{Synthesis with arrays.}
\label{fig.synthesis}
\end{figure}

Here's another way to write {\tt synthesize}:

\begin{verbatim}
def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

This function looks very different, but it does the same thing.
Let's see how it works:

\begin{enumerate}

\item {\tt np.outer} computes the outer product of {\tt ts} and
{\tt fs}. The result is an array with one row for each element
of {\tt ts} and one column for each element of {\tt fs}. Each
element in the array is the product of a frequency and a time, $f t$.

\item We multiply {\tt args} by $2 \pi$ and apply {\tt cos}, so each
element of the result is $\cos (2 \pi f t)$. Since the {\tt ts} run
down the columns, each column contains a cosine signal at a
particular frequency, evaluated at a sequence of times.

\item {\tt np.dot} multiplies each row of {\tt M} by {\tt amps},
element-wise, and then adds up the products. In terms of linear
algebra, we are multiplying a matrix, {\tt M}, by a vector, {\tt amps}.
In terms of signals, we are computing the weighted sum
of frequency components.

\end{enumerate}

Figure~\ref{fig.synthesis} shows the structure of this computation.
Each row of the matrix, {\tt M}, corresponds to a time
from 0.0 to 1.0 seconds; $t_n$ is the time of the $n$th row.
Each column corresponds to a frequency from
100 to 400 Hz; $f_k$ is the frequency of the $k$th column.

I labeled the $n$th row with the letters $a$ through $d$; as an
example, the value of $a$ is $\cos [2 \pi (100) t_n]$.

The result of the dot product, {\tt ys}, is a vector with one element
for each row of {\tt M}. The $n$th element, labeled $e$, is the sum
of products:
%
\[ e = 0.6 a + 0.25 b + 0.1 c + 0.05 d \]
%
And likewise with the other elements of {\tt ys}. So each element
of {\tt ys} is the sum of four frequency components, evaluated at
a point in time, and multiplied by the corresponding amplitudes.
And that's exactly what we wanted.

We can use the code from the previous section to check that the two
versions of {\tt synthesize} produce the same results.

\begin{verbatim}
ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))
\end{verbatim}

The biggest difference between {\tt ys1} and {\tt ys2} is about {\tt
1e-13}, which is what we expect due to floating-point errors.

Writing this computation in terms of linear algebra makes the code
smaller and faster. Linear algebra
provides concise notation for operations on matrices and vectors.
For
example, we could write {\tt synthesize} like this:
%
\begin{eqnarray*}
M &=& \cos (2 \pi t \otimes f) \\
y &=& M a
\end{eqnarray*}
%
where $a$ is a vector of amplitudes,
$t$ is a vector of times, $f$ is a vector of frequencies, and
$\otimes$ is the symbol for the outer product of two vectors.


\section{Analysis}
\label{analysis}

Now we are ready to solve the analysis problem. Suppose I give you
a wave and tell you that it is the sum of cosines with a given set
of frequencies. How would you find the amplitude for each frequency
component? In other words, given {\tt ys}, {\tt ts} and {\tt fs},
can you recover {\tt amps}?

In terms of linear algebra, the first step is the same as for
synthesis: we compute $M = \cos (2 \pi t \otimes f)$. Then we want
to find $a$ so that $y = M a$; in other words, we want to solve a
linear system. NumPy provides {\tt linalg.solve}, which does
exactly that.

Here's what the code looks like:

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

The first two lines use {\tt ts} and {\tt fs} to build the
matrix, {\tt M}. Then {\tt np.linalg.solve} computes {\tt amps}.

But there's a hitch. In general we can only solve a system of linear
equations if the matrix is square; that is, the number of equations
(rows) is the same as the number of unknowns (columns).

In this example, we have only 4 frequencies, but we evaluated the
signal at 11,025 times.
So we have many more equations than unknowns.

In general if {\tt ys}
contains more than 4 elements, it is unlikely that we can analyze it
using only 4 frequencies.

But in this case, we know that the {\tt ys} were actually generated by
adding only 4 frequency components, so we can use any 4 values from
the wave array to recover {\tt amps}.

For simplicity, I'll use the first 4 samples from the signal.
Using the values of {\tt ys}, {\tt fs} and {\tt ts} from
the previous section, we can run {\tt analyze1} like this:

\begin{verbatim}
n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])
\end{verbatim}

And sure enough, {\tt amps2} is

\begin{verbatim}
[ 0.6   0.25  0.1   0.05 ]
\end{verbatim}

This algorithm works, but it is slow. Solving a linear
system of equations takes time proportional to $n^3$, where $n$ is
the number of columns in $M$. We can do better.


\section{Orthogonal matrices}

One way to solve linear systems is by inverting matrices. The
inverse of a matrix $M$ is written $M^{-1}$, and it has the property
that $M^{-1}M = I$. $I$ is the identity matrix, which has
the value 1 on all diagonal elements and 0 everywhere else.

So, to solve the equation $y = Ma$, we can multiply both sides by
$M^{-1}$, which yields:
%
\[ M^{-1}y = M^{-1} M a \]
%
On the right side, we can replace $M^{-1}M$ with $I$:
%
\[ M^{-1}y = I a \]
%
If we multiply $I$ by any vector $a$, the result is $a$, so
%
\[ M^{-1}y = a \]
%
This implies that if we can compute $M^{-1}$ efficiently, we can find
$a$ with a simple matrix multiplication (using {\tt np.dot}). That
takes time proportional to $n^2$, which is better than $n^3$.

Inverting a matrix is slow, in general, but some special cases are
faster. In particular, if $M$ is {\bf orthogonal}, the inverse of $M$
is just the transpose of $M$, written $M^T$.
In NumPy
transposing an array is a constant-time operation. It
doesn't actually move the elements of the array; instead, it creates a
``view'' that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse;
that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we
can check whether a matrix is orthogonal by computing $M^TM$.

So let's see what the matrix looks like in {\tt synthesize2}. In
the previous example, $M$ has 11,025 rows, so it might be a good idea
to work with a smaller example:

\begin{verbatim}
def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

{\tt amps} is the same vector of amplitudes we saw before.
Since we have 4 frequency components, we'll sample the signal
at 4 points in time. That way, $M$ is square.

{\tt ts} is a vector of equally spaced sample times in the range from
0 to 1 time unit. I chose the time unit to be 1 millisecond, but it
is an arbitrary choice, and we will see in a minute that it drops out
of the computation anyway.

Since the frame rate is $N$ samples per time unit, the Nyquist
frequency is \verb"N / time_unit / 2", which is 2000 Hz in this
example. So {\tt fs} is a vector of equally spaced frequencies
between 0 and 2000 Hz.

With these values of {\tt ts} and {\tt fs}, the matrix, $M$, is:

\begin{verbatim}
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]
\end{verbatim}

You might recognize 0.707 as an approximation of $\sqrt{2}/2$,
which is $\cos \pi/4$. You also might notice that this matrix
is {\bf symmetric}, which means that the element at $(j, k)$ always
equals the element at $(k, j)$.
This implies that $M$ is its own
transpose; that is, $M^T = M$.

But sadly, $M$ is not orthogonal. If we compute $M^TM$, we get:

\begin{verbatim}
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]
\end{verbatim}

And that's not the identity matrix.


\section{DCT-IV}
\label{dctiv}

But if we choose {\tt ts} and {\tt fs} carefully,
we can make $M$ orthogonal. There are several ways to do it, which
is why there are several versions of the discrete cosine transform (DCT).

One simple option is to shift {\tt ts} and {\tt fs} by a half unit.
This version is called DCT-IV, where ``IV'' is a roman numeral
indicating that this is the fourth of eight versions of the DCT.

Here's an updated version of {\tt test1}:

\begin{verbatim}
def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

If you compare this to the previous version, you'll notice
two changes. First, I added 0.5 to {\tt ts} and {\tt fs}.
Second, I canceled out \verb"time_unit", which simplifies
the expression for {\tt fs}.

With these values, $M$ is

\begin{verbatim}
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]
\end{verbatim}

And $M^TM$ is

\begin{verbatim}
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]
\end{verbatim}

Some of the off-diagonal elements are displayed as -0, which means
that the floating-point representation is a small negative number.
This matrix is very close to $2I$, which means $M$ is almost
orthogonal; it's just off by a factor of 2. And for our purposes,
that's good enough.

Because $M$ is symmetric and (almost) orthogonal, the inverse of $M$
is just $M/2$.
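You can confirm this numerically; the following check (mine, not from the book) builds $M$ with the shifted sample times and frequencies and verifies that $M^T = M$ and $M^T M = 2I$, so $M (M/2) = I$.

```python
import numpy as np

N = 4
ts = (0.5 + np.arange(N)) / N    # sample times, shifted by half
fs = (0.5 + np.arange(N)) / 2    # frequencies, shifted by half
M = np.cos(2 * np.pi * np.outer(ts, fs))

# M is symmetric ...
print(np.allclose(M, M.T))                    # True
# ... and M^T M is 2I, so M/2 is the inverse of M
print(np.allclose(M.T @ M, 2 * np.eye(N)))    # True
```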
Now we can write a more efficient version of {\tt
analyze}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Instead of using {\tt np.linalg.solve}, we just multiply
by $M/2$.

Combining {\tt test2} and {\tt analyze2}, we can write an
implementation of DCT-IV:

\begin{verbatim}
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Again, {\tt ys} is the wave array. We don't have to pass
{\tt ts} and {\tt fs} as parameters; \verb"dct_iv" can
figure them out based on {\tt N}, the length of {\tt ys}.

If we've got it right, this function should solve the analysis
problem; that is, given {\tt ys} it should be able to recover
{\tt amps}. We can test it like this.

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Starting with {\tt amps}, we synthesize a wave array, then use
\verb"dct_iv" to compute {\tt amps2}. The biggest
difference between {\tt amps} and {\tt amps2} is about {\tt 1e-16},
which is what we expect due to floating-point errors.


\section{Inverse DCT}

Finally, notice that {\tt analyze2} and {\tt synthesize2} are almost
identical. The only difference is that {\tt analyze2} divides the
result by 2. We can use this insight to compute the inverse DCT:

\begin{verbatim}
def inverse_dct_iv(amps):
    return dct_iv(amps) * 2
\end{verbatim}

\verb"inverse_dct_iv" solves the synthesis problem: it
takes the vector of amplitudes and returns
the wave array, {\tt ys}.
We can test it by starting
with {\tt amps}, applying \verb"inverse_dct_iv" and \verb"dct_iv",
and testing that we get back what we started with.

\begin{verbatim}
amps = [0.6, 0.25, 0.1, 0.05]
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Again, the biggest difference is about {\tt 1e-16}.


\section{The Dct class}

\begin{figure}
% dct.py
\centerline{\includegraphics[height=2.5in]{figs/dct1.pdf}}
\caption{DCT of a triangle signal at 400 Hz, sampled at 10 kHz.}
\label{fig.dct1}
\end{figure}

{\tt thinkdsp} provides a Dct class that encapsulates the
DCT in the same way the Spectrum class encapsulates the FFT.
To make a Dct object, you can invoke \verb"make_dct" on a Wave.

\begin{verbatim}
signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()
\end{verbatim}

The result is the DCT of a triangle wave at 400 Hz, shown in
Figure~\ref{fig.dct1}. The values of the DCT can be positive or negative;
a negative value in the DCT corresponds to a negated cosine or,
equivalently, to a cosine shifted by 180 degrees.

\verb"make_dct" uses DCT-II, which is the most common type of DCT,
provided by {\tt scipy.fftpack}.

\begin{verbatim}
import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)
\end{verbatim}

The results from {\tt dct} are stored in {\tt hs}.
The corresponding
frequencies, computed as in Section~\ref{dctiv}, are stored in {\tt fs}.
And then both are used to initialize the Dct object.

Dct provides \verb"make_wave", which performs the inverse DCT.
We can test it like this:

\begin{verbatim}
wave2 = dct.make_wave()
max(abs(wave.ys - wave2.ys))
\end{verbatim}

The biggest difference between {\tt wave.ys} and {\tt wave2.ys} is
about {\tt 1e-16}, which is what we expect due to floating-point errors.

\verb"make_wave" uses {\tt scipy.fftpack.idct}:

\begin{verbatim}
# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)
\end{verbatim}

By default, the inverse DCT doesn't normalize the result, so we have
to divide through by $2N$.


\section{Exercises}

For the following exercises, I provide some starter code in
{\tt chap06starter.ipynb}.
Solutions are in {\tt chap06soln.ipynb}.

\begin{exercise}
In this chapter I claim that {\tt analyze1} takes time proportional
to $n^3$ and {\tt analyze2} takes time proportional to $n^2$. To
see if that's true, run them on a range of input sizes and time
them. In Jupyter, you can use the ``magic command'' \verb"%timeit".

If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for {\tt analyze1} and
slope 2 for {\tt analyze2}.

You also might want to test \verb"dct_iv"
and {\tt scipy.fftpack.dct}.

\end{exercise}


\begin{exercise}
One of the major applications of the DCT is compression for both
sound and images. In its simplest form, DCT-based compression
works like this:

\begin{enumerate}

\item Break a long signal into segments.

\item Compute the DCT of each segment.

\item Identify frequency components with amplitudes so low they are
inaudible, and remove them.
Store only the frequencies and
amplitudes that remain.

\item To play back the signal, load the frequencies and amplitudes
for each segment and apply the inverse DCT.

\end{enumerate}

Implement a version of this algorithm and apply it to a recording
of music or speech. How many components can you eliminate before
the difference is perceptible?

In order to make this method practical, you need some way to store a
sparse array; that is, an array where most of the elements are zero.
SciPy provides several implementations of sparse arrays, which you can
read about at
\url{http://docs.scipy.org/doc/scipy/reference/sparse.html}.
\end{exercise}


\begin{exercise}
In the repository for this book you will find a Jupyter notebook
called \verb"phase.ipynb" that explores the effect of phase on sound
perception.
Read through this notebook and run the examples.
Choose another segment of sound and run the same experiments.
Can you find any general relationships between the phase structure
of a sound and how we perceive it?
\end{exercise}



\chapter{Discrete Fourier Transform}
\label{dft}

We've been using the discrete Fourier transform (DFT) since
Chapter~\ref{sounds}, but I haven't explained how it works. Now is
the time.

If you understand the discrete cosine transform (DCT), you will
understand the DFT. The only difference is that instead of using the
cosine function, we'll use the complex exponential function. I'll
start by explaining complex exponentials, then I'll follow the
same progression as in Chapter~\ref{dct}:

\begin{enumerate}

\item We'll start with the synthesis
problem: given a set of frequency components and their amplitudes,
how can we construct a signal?
The synthesis problem is
equivalent to the inverse DFT.

\item Then I'll rewrite the synthesis problem in the form of matrix
multiplication using NumPy arrays.

\item Next we'll solve the analysis problem, which is equivalent to
the DFT: given a signal, how do we find the amplitude and phase
offset of its frequency components?

\item Finally, we'll use linear algebra to find a more efficient way
to compute the DFT.

\end{enumerate}

The code for this chapter is in {\tt chap07.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp07}.


\section{Complex exponentials}

One of the more interesting moves in mathematics is the generalization
of an operation from one type to another. For example, factorial is a
function that operates on integers; the natural definition for
factorial of $n$ is the product of all integers from 1 to $n$.

If you are of a certain inclination, you might wonder how to compute
the factorial of a non-integer like 3.5. Since the natural definition
doesn't apply, you might look for other ways to compute the factorial
function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial
function that we know as the gamma function (see
\url{http://en.wikipedia.org/wiki/Gamma_function}).

Euler also found one of the most useful generalizations in applied
mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication.
For example, $\phi^3 = \phi \cdot \phi \cdot \phi$. But this
definition doesn't apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:
%
\[ e^\phi = 1 + \phi + \phi^2/2! + \phi^3/3! + ... \]
%
This definition works with real numbers, imaginary numbers and, by a simple
extension, with complex numbers. Applying this definition
to a pure imaginary number, $i\phi$, we get
%
\[ e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + ... \]
%
By rearranging terms, we can show that this is equivalent to:
%
\[ e^{i\phi} = \cos \phi + i \sin \phi \]
%
You can see the derivation at
\url{http://en.wikipedia.org/wiki/Euler's_formula}.

This formula
implies that $e^{i\phi}$ is a complex number with magnitude 1; if you
think of it as a point in the complex plane, it is always on the unit
circle. And if you think of it as a vector, the angle in radians
between the vector and the positive x-axis is the argument, $\phi$.

In the case where the exponent is a complex number, we have:
%
\[ e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi} \]
%
where $A$ is a real number that indicates amplitude and
$e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of {\tt exp} that works with complex numbers:

\begin{verbatim}
>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)
\end{verbatim}

Python uses {\tt j} to represent the imaginary unit, rather
than {\tt i}.
A number ending in {\tt j} is considered imaginary,
so {\tt 1j} is just $i$.

When the argument to {\tt np.exp} is imaginary or complex, the
result is a complex number; specifically, a {\tt np.complex128},
which is represented by two 64-bit floating-point numbers.
In this example, the result is {\tt 0.0707+0.997j}.

Complex numbers have attributes {\tt real} and {\tt imag}:

\begin{verbatim}
>>> z.real
0.0707
>>> z.imag
0.997
\end{verbatim}

To get the magnitude, you can use the built-in function {\tt abs}
or {\tt np.absolute}:

\begin{verbatim}
>>> abs(z)
1.0
>>> np.absolute(z)
1.0
\end{verbatim}

To get the angle, you can use {\tt np.angle}:

\begin{verbatim}
>>> np.angle(z)
1.5
\end{verbatim}

This example confirms that $e^{i \phi}$ is a complex number with
magnitude 1 and angle $\phi$ radians.


\section{Complex signals}

If $\phi(t)$ is a function of time, $e^{i \phi(t)}$ is also a function
of time. Specifically,
%
\[ e^{i \phi(t)} = \cos \phi(t) + i \sin \phi(t) \]
%
This function describes a quantity that varies in time, so it is
a signal. Specifically, it is a {\bf complex exponential signal}.

In the special case where the frequency of the signal is constant,
$\phi(t)$ is $2 \pi f t$ and the result is a {\bf complex sinusoid}:
%
\[ e^{i 2 \pi f t} = \cos 2 \pi f t + i \sin 2 \pi f t \]
%
Or more generally, the signal might start at a phase offset
$\phi_0$, yielding
%
\[ e^{i (2 \pi f t + \phi_0)} \]
%
{\tt thinkdsp} provides an implementation of this signal,
{\tt ComplexSinusoid}:

\begin{verbatim}
class ComplexSinusoid(Sinusoid):

    def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys
\end{verbatim}

{\tt ComplexSinusoid} inherits \verb"__init__" from
{\tt Sinusoid}.
It provides a version of {\tt evaluate}
that is almost identical to {\tt Sinusoid.evaluate}; the
only difference is that it uses {\tt np.exp} instead of
{\tt np.sin}.

The result is a NumPy array of complex numbers:

\begin{verbatim}
>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> wave.ys
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]
\end{verbatim}

The frequency of this signal is 1 cycle per second; the amplitude
is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at 4 places equally spaced between
0 and 1 second. The resulting samples are complex numbers.


\section{The synthesis problem}

Just as we did with real sinusoids, we can create compound signals
by adding up complex sinusoids with different frequencies. And that
brings us to the complex version of the synthesis problem: given the
frequency and amplitude of each complex component, how do we evaluate the
signal?

The simplest solution is to create {\tt ComplexSinusoid} objects and
add them up.

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

This function is almost identical to {\tt synthesize1} in
Section~\ref{synth1}; the only difference is that I replaced
{\tt CosSignal} with {\tt ComplexSinusoid}.

Here's an example:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
\end{verbatim}

The result is:

\begin{verbatim}
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

At the lowest level, a complex signal is a sequence of complex
numbers. But how should we interpret it? We have some intuition for
real signals: they represent quantities that vary in time; for
example, a sound signal represents changes in air pressure.
But nothing we measure in the world yields complex numbers.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft1.pdf}}
\caption{Real and imaginary parts of a mixture of complex sinusoids.}
\label{fig.dft1}
\end{figure}

So what is a complex signal? I don't have a satisfying answer to this
question. The best I can offer is two unsatisfying
answers:

\begin{enumerate}

\item A complex signal is a mathematical abstraction that is useful
for computation and analysis, but it does not correspond directly
with anything in the real world.

\item If you like, you can think of a complex signal as a sequence of
complex numbers that contains two signals as its real and imaginary
parts.

\end{enumerate}

Taking the second point of view, we can split the previous
signal into its real and imaginary parts:

\begin{verbatim}
n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')
\end{verbatim}

Figure~\ref{fig.dft1} shows a segment of the result. The
real part is a sum of cosines; the imaginary part is
a sum of sines.
Although the waveforms look different, they
contain the same frequency components in the same proportions.
To our ears, they sound the same (in general, we don't hear
phase offsets).


\section{Synthesis with matrices}
\label{synthmat}

As we saw in Section~\ref{synthesis}, we can also express the synthesis
problem in terms of matrix multiplication:

\begin{verbatim}
PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

Again, {\tt amps} is a NumPy array that contains a sequence of
amplitudes.

{\tt fs} is a sequence containing the frequencies of the
components. {\tt ts} contains the times where we will evaluate
the signal.

{\tt args} contains the outer product of {\tt ts} and {\tt fs},
with the {\tt ts} running down the rows and the {\tt fs} running
across the columns (you might want to refer back to
Figure~\ref{fig.synthesis}).

Each column of matrix {\tt M} contains a complex sinusoid with
a particular frequency, evaluated at a sequence of {\tt ts}.

When we multiply {\tt M} by the amplitudes, the result is a vector
whose elements correspond to the {\tt ts}; each element is the sum of
several complex sinusoids, evaluated at a particular time.

Here's the example from the previous section again:

\begin{verbatim}
>>> ys = synthesize2(amps, fs, ts)
>>> ys
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

The result is the same.

In this example the amplitudes are real, but they could also be
complex.
What effect does a complex amplitude have on the result?
Remember that we can think of a complex number in two ways: either the
sum of a real and imaginary part, $x + i y$, or the product of a real
amplitude and a complex exponential, $A e^{i \phi_0}$. Using the
second interpretation, we can see what happens when we multiply
a complex amplitude by a complex sinusoid. For each frequency, $f$,
we have:
%
\[ A \exp{i \phi_0} \cdot \exp{i 2 \pi f t} = A \exp{i (2 \pi f t + \phi_0)} \]
%
Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$
and adds the phase offset $\phi_0$.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft2.pdf}}
\caption{Real part of two complex signals that differ by a phase
offset.}
\label{fig.dft2}
\end{figure}

We can test that claim by running the previous example with
$\phi_0 = 1.5$ for all frequency components:

\begin{verbatim}
phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])
\end{verbatim}

Since {\tt amps}
is an array of reals, multiplying by {\tt np.exp(1j * phi)} yields
an array of complex numbers with phase offset {\tt phi} radians, and
the same magnitudes as {\tt amps}.

Figure~\ref{fig.dft2} shows the result.
The phase offset
$\phi_0 = 1.5$ shifts the wave to the left by about one quarter of
a cycle; it also changes the waveform, because the same phase
offset applied to different frequencies changes how the frequency
components line up with each other.

Now that we have the more general solution to the synthesis problem --
one that handles complex amplitudes -- we are ready for the analysis
problem.


\section{The analysis problem}

The analysis problem is the inverse of the synthesis problem: given a
sequence of samples, $y$, and knowing the frequencies
that make up the signal, can we compute the complex amplitudes of the
components, $a$?

As we saw in Section~\ref{analysis}, we can solve this problem by forming
the synthesis matrix, $M$, and solving the system of linear
equations, $M a = y$, for $a$.

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

{\tt analyze1} takes a (possibly complex) wave array, {\tt ys}, a
sequence of real frequencies, {\tt fs}, and a sequence of real
times, {\tt ts}. It returns a sequence of complex amplitudes, {\tt
amps}.

Continuing the previous example, we can confirm that {\tt analyze1}
recovers the amplitudes we started with. For the linear system
solver to work, {\tt M} has to be square, so we need {\tt ys}, {\tt
fs} and {\tt ts} to have the same length.
I'll ensure that by
slicing {\tt ys} and {\tt ts} down to the length of {\tt fs}:

\begin{verbatim}
>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> amps2
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]
\end{verbatim}

These are approximately the amplitudes we started with, although
each component has a small imaginary part due to
floating-point errors.


\section{Efficient analysis}

Unfortunately, solving a linear system of equations is slow. For the
DCT, we were able to speed things up by choosing {\tt fs} and {\tt
ts} so that {\tt M} is orthogonal. That way, the inverse of {\tt M}
is the transpose of {\tt M}, and we can compute both DCT and inverse
DCT by matrix multiplication.

We'll do the same thing for the DFT, with one small change.
Since {\tt M} is complex, we need it to be {\bf unitary}, rather
than orthogonal, which means that the inverse of {\tt M} is
the conjugate transpose of {\tt M}, which we can compute by
transposing the matrix and negating the imaginary part of each
element. See \url{http://en.wikipedia.org/wiki/Unitary_matrix}.

The NumPy methods {\tt conj} and {\tt transpose} do what we
want. Here's the code that computes {\tt M} for $N=4$ components:

\begin{verbatim}
N = 4
ts = np.arange(N) / N
fs = np.arange(N)
args = np.outer(ts, fs)
M = np.exp(1j * PI2 * args)
\end{verbatim}

If $M$ is unitary, $M^*M = I$, where $M^*$ is the conjugate transpose
of $M$, and $I$ is the identity matrix.
We can test whether $M$
is unitary like this:

\begin{verbatim}
MstarM = M.conj().transpose().dot(M)
\end{verbatim}

The result, within the tolerance of floating-point error, is
$4 I$, so $M$ is unitary except for an extra factor of $N$,
similar to the extra factor of 2 we found with the DCT.

We can use this result to write a faster version of {\tt analyze1}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

And test it with appropriate values of {\tt fs} and {\tt ts}:

\begin{verbatim}
N = 4
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.arange(N)
ts = np.arange(N) / N
ys = synthesize2(amps, fs, ts)
amps3 = analyze2(ys, fs, ts)
\end{verbatim}

Again, the result is correct within the tolerance of floating-point
arithmetic.

\begin{verbatim}
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}


\section{DFT}
\label{dftsection}

As a function, {\tt analyze2} would be hard to use because it
only works if {\tt fs} and {\tt ts} are chosen correctly.
Instead, I will rewrite it to take just {\tt ys} and compute {\tt fs}
and {\tt ts} itself.

First, I'll make a function to compute the synthesis matrix, $M$:

\begin{verbatim}
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M
\end{verbatim}

Then I'll write the function that takes {\tt ys} and returns
{\tt amps}:

\begin{verbatim}
def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

We are almost done; {\tt analyze3} computes something very
close to the DFT, with one difference.
The conventional definition
of the DFT does not divide by {\tt N}:

\begin{verbatim}
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps
\end{verbatim}

Now we can confirm that my version yields the same result as
FFT:

\begin{verbatim}
>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

The result is close to {\tt amps * N}.
And here's the version in {\tt np.fft}:

\begin{verbatim}
>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

They are the same, within floating-point error.

The inverse DFT is almost the same, except we don't have to transpose
and conjugate $M$, and {\em now} we have to divide through by $N$:

\begin{verbatim}
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
\end{verbatim}

Finally, we can confirm that {\tt dft(idft(amps))} yields {\tt amps}.

\begin{verbatim}
>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

If I could go back in time, I might change the definition of
the DFT so it divides by $N$ and the inverse DFT doesn't. That would
be more consistent with my presentation of the synthesis and analysis
problems.

Or I might change the definition so that both operations divide
through by $\sqrt{N}$. Then the DFT and inverse DFT would be
more symmetric.

But I can't go back in time (yet!), so we're stuck with a slightly
weird convention.
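Whichever convention we adopt, the forward and inverse transforms
cancel each other. As a standalone sketch (independent of the
functions above), we can confirm that NumPy uses the same convention:
{\tt np.fft.fft} does not divide by $N$, and {\tt np.fft.ifft} does.

\begin{verbatim}
import numpy as np

amps = np.array([0.6, 0.25, 0.1, 0.05])

# fft does not divide by N; ifft does, so the
# round trip recovers the input (up to float error)
ys = np.fft.ifft(amps)
assert np.allclose(np.fft.fft(ys), amps)

# scaling check: the DC component of fft of a
# constant signal of length N is N, not 1
N = 4
assert np.allclose(np.fft.fft(np.ones(N))[0], N)
\end{verbatim}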
For practical purposes it doesn't really
matter.


\section{The DFT is periodic}

In this chapter I presented the DFT in the form of matrix multiplication.
We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$.
When we multiply $M^{*}$ by the wave array, $y$, each element of the
result is the product of a row from $M^*$ and $y$, which we can
write in the form of a summation:
%
\[ DFT(y)[k] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
$0$ to $N-1$ and $n$ is an index of time from $0$ to $N-1$.
So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from
0 to $N-1$. We {\em could} evaluate it for other values of $k$, but
there is no point, because they start to repeat. That is, the value at
$k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n (k+N) / N) \]
%
Since there is a sum in the exponent, we can break it into two factors:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N) \]
%
In the second factor, the exponent is always an integer multiple of
$2 \pi$, so the result is always 1, and we can drop it:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
And we can see that this summation is equivalent to $DFT(y)[k]$.
So the DFT is periodic, with period $N$. You will need this result
for one of the exercises below, which asks you to implement the Fast Fourier
Transform (FFT).

As an aside, writing the DFT in the form of a summation provides an
insight into how it works. If you review the diagram in
Section~\ref{synthesis}, you'll see that each column of the synthesis matrix
is a signal evaluated at a sequence of times.
The analysis matrix is
the (conjugate) transpose of the synthesis matrix, so each {\em row}
is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of $y$ with one of the
signals in the array (see Section~\ref{dotproduct}). That is, each
element of the DFT is a correlation that quantifies the similarity of
the wave array, $y$, and a complex exponential at a particular
frequency.


\section{DFT of real signals}

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft3.pdf}}
\caption{DFT of a 500 Hz sawtooth signal sampled at 10 kHz.}
\label{fig.dft3}
\end{figure}

The Spectrum class in {\tt thinkdsp} is based on {\tt np.fft.rfft},
which computes the ``real DFT''; that is, it works with real signals.
But the DFT as presented in this chapter is more general than that; it
works with complex signals.

So what happens when we apply the ``full DFT'' to a real signal?
Let's look at an example:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=500)
wave = signal.make_wave(duration=0.1, framerate=10000)
hs = dft(wave.ys)
amps = np.absolute(hs)
\end{verbatim}

This code makes a sawtooth wave with frequency 500 Hz, sampled at
framerate 10 kHz. {\tt hs} contains the complex DFT of the wave;
{\tt amps} contains the amplitude at each frequency. But what
frequencies do these amplitudes correspond to? If we look at the
body of {\tt dft}, we see:

\begin{verbatim}
fs = np.arange(N)
\end{verbatim}

It's tempting to think that these values are the right frequencies.
The problem is that {\tt dft} doesn't know the sampling rate. The DFT
assumes that the duration of the wave is 1 time unit, so it thinks the
sampling rate is $N$ per time unit.
In order to interpret the
frequencies, we have to convert from these arbitrary time units back
to seconds, like this:

\begin{verbatim}
fs = np.arange(N) * framerate / N
\end{verbatim}

With this change, the range of frequencies is from 0 to the actual
framerate, 10 kHz. Now we can plot the spectrum:

\begin{verbatim}
thinkplot.plot(fs, amps)
thinkplot.config(xlabel='frequency (Hz)',
                 ylabel='amplitude')
\end{verbatim}

Figure~\ref{fig.dft3} shows the amplitude of the signal for each
frequency component from 0 to 10 kHz. The left half of the figure
is what we should expect: the dominant frequency is at 500 Hz, with
harmonics dropping off like $1/f$.

But the right half of the figure is a surprise. Past 5000 Hz, the
amplitude of the harmonics starts growing again, peaking at 9500 Hz.
What's going on?

The answer: aliasing. Remember that with framerate 10000 Hz, the
folding frequency is 5000 Hz. As we saw in Section~\ref{aliasing},
a component at 5500 Hz is indistinguishable from a component
at 4500 Hz. When we evaluate the DFT at 5500 Hz, we get the same
value as at 4500 Hz. Similarly, the value at 6000 Hz is the same
as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency.
Since there is no additional information past this point, we can
save time by evaluating only the first half of the DFT,
and that's exactly what {\tt np.fft.rfft} does.


\section{Exercises}

Solutions to these exercises are in {\tt chap07soln.ipynb}.


\begin{exercise}
The notebook for this chapter, {\tt chap07.ipynb}, contains
additional examples and explanations. Read through it and run
the code.
\end{exercise}


\begin{exercise}
In this chapter, I showed how we can express the DFT and inverse DFT
as matrix multiplications.
These operations take time proportional to
$N^2$, where $N$ is the length of the wave array. That is fast enough
for many applications, but there is a faster
algorithm, the Fast Fourier Transform (FFT), which takes time
proportional to $N \log N$.

The key to the FFT is the Danielson-Lanczos lemma:
%
\[ DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) DFT(o)[n] \]
%
where $DFT(y)[n]$ is the $n$th element of the DFT of $y$; $e$ is a
wave array containing the even elements of $y$, and $o$ contains the
odd elements of $y$.

This lemma suggests a recursive algorithm for the DFT:

\begin{enumerate}

\item Given a wave array, $y$, split it into its even elements, $e$,
and its odd elements, $o$.

\item Compute the DFT of $e$ and $o$ by making recursive calls.

\item Compute $DFT(y)$ for each value of $n$ using the
Danielson-Lanczos lemma.

\end{enumerate}

For the base case of this recursion, you could wait until the length
of $y$ is 1. In that case, $DFT(y) = y$. Or if the length of $y$
is sufficiently small, you could compute its DFT by matrix multiplication,
possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting
with a version that is not truly recursive. In Step 2, instead of
making a recursive call, use {\tt dft}, as defined in
Section~\ref{dftsection}, or {\tt np.fft.fft}. Get Step 3 working,
and confirm that the results are consistent with the other
implementations.
Then add a base case and confirm that it works.
Finally, replace Step 2 with recursive calls.

One more hint: Remember that the DFT is periodic; you might find {\tt
np.tile} useful.

You can read more about the FFT at
\url{https://en.wikipedia.org/wiki/Fast_Fourier_transform}.

\end{exercise}


\chapter{Filtering and Convolution}

In this chapter I present one of the most important and useful
ideas related to signal processing: the Convolution Theorem.
But before we can understand the Convolution Theorem, we have to understand
convolution. I'll start with a simple example, smoothing, and
we'll go from there.

The code for this chapter is in {\tt chap08.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp08}.


\section{Smoothing}
\label{smoothing}

\begin{figure}
% convolution1.py
\centerline{\includegraphics[height=2.5in]{figs/convolution1.pdf}}
\caption{Daily closing price of Facebook stock and a 30-day moving
average.}
\label{fig.convolution1}
\end{figure}

Smoothing is an operation that tries to remove short-term variations
from a signal in order to reveal long-term trends. For example, if
you plot daily changes in the price of a stock, it would look noisy;
a smoothing operator might make it easier to see whether the price
was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes
the mean of the previous $n$ values, for some value of $n$.

For example, Figure~\ref{fig.convolution1} shows the daily
closing price of Facebook from May 17, 2012 to December 8,
2015. The gray line
is the raw data; the darker line shows the 30-day moving average.
Smoothing removes the most
extreme changes and makes it easier to see long-term trends.

Smoothing operations also apply to sound signals.
As an example, I'll
start with a square wave at 440 Hz. As we saw in
Section~\ref{square}, the harmonics of a square wave drop off
slowly, so it contains many high-frequency components.

\begin{verbatim}
signal = thinkdsp.SquareSignal(freq=440)
wave = signal.make_wave(duration=1, framerate=44100)
segment = wave.segment(duration=0.01)
\end{verbatim}

{\tt wave} is a 1-second segment of the signal; {\tt segment}
is a shorter segment I'll use for plotting.

To compute the moving average of this signal, I'll create
a window with 11 elements and normalize it so the elements
add up to 1.

\begin{verbatim}
window = np.ones(11)
window /= sum(window)
\end{verbatim}

Now I can compute the average of the first 11 elements by
multiplying the window by the wave array:

\begin{verbatim}
ys = segment.ys
N = len(ys)
padded = thinkdsp.zero_pad(window, N)
prod = padded * ys
sum(prod)
\end{verbatim}

{\tt padded} is a version of the window with zeros added to
the end so it has the same length as {\tt segment.ys}.
{\tt prod} is the product of the window and the wave array.

The sum of the elementwise products is the average of the first 11
elements of the array.
Since these elements are all -1, their average
is -1.

\begin{figure}
% convolution2.py
\centerline{\includegraphics[height=2.5in]{figs/convolution2.pdf}}
\caption{A square signal at 440 Hz (gray) and an 11-element
moving average.}
\label{fig.convolution2}
\end{figure}

To compute the next element of the smoothed signal, we shift the
window to the right and compute the average of the next 11 elements of
the wave array, starting with the second.

\begin{verbatim}
rolled = np.roll(padded, 1)
prod = rolled * ys
sum(prod)
\end{verbatim}

The result is -1 again.
To compute the moving average, we repeat this process and store
the result in an array named {\tt smoothed}:

\begin{verbatim}
smoothed = np.zeros_like(ys)
rolled = padded
for i in range(N):
    smoothed[i] = sum(rolled * ys)
    rolled = np.roll(rolled, 1)
\end{verbatim}

{\tt rolled} is a copy of {\tt padded} that gets shifted to
the right by one element each time through the loop. Inside
the loop, we multiply the segment by {\tt rolled} to select
11 elements, and then add them up.

Figure~\ref{fig.convolution2} shows the result. The gray line
is the original signal; the darker line is the smoothed signal.
The smoothed signal starts to ramp up when the leading edge of
the window reaches the first transition, and levels off when
the window crosses the transition. As a result, the transitions
are less abrupt, and the corners less sharp.
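As a self-contained check (a sketch assuming only NumPy, with a synthetic square wave standing in for the {\tt thinkdsp} objects), the rolling-window loop agrees with NumPy's convolution wherever the window does not wrap around:

```python
import numpy as np

# synthetic stand-in for segment.ys: one cycle of a square wave
ys = np.where(np.arange(100) < 50, 1.0, -1.0)
N = len(ys)

# 11-element moving-average window, zero-padded to length N
window = np.ones(11) / 11
padded = np.concatenate([window, np.zeros(N - len(window))])

# shift the padded window one step per iteration, as in the loop above
smoothed = np.zeros_like(ys)
rolled = padded
for i in range(N):
    smoothed[i] = np.sum(rolled * ys)
    rolled = np.roll(rolled, 1)

# where the window fits entirely, this matches np.convolve in 'valid' mode
valid = np.convolve(ys, window, mode='valid')
```

The last few elements of {\tt smoothed} differ because {\tt np.roll} wraps the window around to the beginning of the array.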
If you listen
to the smoothed signal, it sounds less buzzy and slightly muffled.


\section{Convolution}
\label{convolution}

The operation we just computed is called {\bf convolution},
and it is such a common operation that NumPy provides an
implementation that is simpler and faster than my version:

\begin{verbatim}
convolved = np.convolve(ys, window, mode='valid')
smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)
\end{verbatim}

{\tt np.convolve} computes the convolution of the wave
array and the window. The mode flag {\tt valid} indicates
that it should only compute values where the window and the
wave array overlap completely, so it stops when the right
edge of the window reaches the end of the wave array. Other
than that, the result is the same as in Figure~\ref{fig.convolution2}.

\newcommand{\conv}{\ast}

Actually, there is one other difference. The loop in the
previous section actually computes {\bf cross-correlation}:
%
\[ (f \star g)[n] = \sum_{m=0}^{N-1} f[m] g[n+m] \]
%
where $f$ is a wave array with length $N$, $g$ is the window,
and $\star$ is the symbol for cross-correlation. To
compute the $n$th element of the result, we shift $g$ to
the right, which is why the index is $n+m$.

The definition of convolution is slightly different:
%
\[ (f \conv g)[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]
%
The symbol $\conv$ represents convolution. The difference is in the
index of $g$: $m$ has been negated, so the summation iterates the
elements of $g$ backward (assuming that negative indices wrap around
to the end of the array).

Because the window we used in this example is symmetric,
cross-correlation and convolution yield the same result. When we use
other windows, we will have to be more careful.

You might wonder why convolution is defined the way it is.
There
are two reasons:

\begin{itemize}

\item This definition comes up naturally for several applications,
especially analysis of signal-processing systems, which is
the topic of Chapter~\ref{systems}.

\item Also, this definition is the basis of the Convolution Theorem,
coming up very soon.

\end{itemize}


\section{The frequency domain}

\begin{figure}
% convolution4.py
\centerline{\includegraphics[height=2.5in]{figs/convolution4.pdf}}
\caption{Spectrum of the square wave before and after smoothing.}
\label{fig.convolution4}
\end{figure}

Smoothing makes the transitions in a square signal less abrupt,
and makes the sound slightly muffled. Let's see what effect this
operation has on the spectrum. First I'll plot the spectrum
of the original wave:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot(color=GRAY)
\end{verbatim}

Then the smoothed wave:

\begin{verbatim}
convolved = np.convolve(wave.ys, window, mode='same')
smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()
spectrum2.plot()
\end{verbatim}

The mode flag {\tt same} indicates that the result should have the
same length as the input. In this example, it will include a few values
that ``wrap around'', but that's ok for now.

Figure~\ref{fig.convolution4} shows the result. The fundamental
frequency is almost unchanged; the first few harmonics are
attenuated, and the higher harmonics are almost eliminated.
So
smoothing has the effect of a low-pass filter, which we
saw in Section~\ref{spectrums} and Section~\ref{pink}.

To see how much each component has been attenuated, we can
compute the ratio of the two spectrums:

\begin{verbatim}
amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps
ratio[amps<560] = 0
thinkplot.plot(ratio)
\end{verbatim}

{\tt ratio} is the ratio of the amplitudes before and after
smoothing. When {\tt amps} is small, this ratio can be big
and noisy, so for simplicity I set the ratio to 0 except
where the harmonics are.

\begin{figure}
% convolution5.py
\centerline{\includegraphics[height=2.5in]{figs/convolution5.pdf}}
\caption{Ratio of spectrums for the square wave, before and after smoothing.}
\label{fig.convolution5}
\end{figure}

Figure~\ref{fig.convolution5} shows the result. As expected, the
ratio is high for low frequencies and drops off at a cutoff frequency
near 4000 Hz. But there is another feature we did not expect: above
the cutoff, the ratio bounces around between 0 and 0.2.
What's up with that?


\section{The convolution theorem}
\label{convtheorem}

\begin{figure}
% convolution6.py
\centerline{\includegraphics[height=2.5in]{figs/convolution6.pdf}}
\caption{Ratio of spectrums for the square wave, before and after
smoothing, along with the DFT of the smoothing window.}
\label{fig.convolution6}
\end{figure}

\newcommand{\DFT}{\mathrm{DFT}}
\newcommand{\IDFT}{\mathrm{IDFT}}

The answer is the Convolution Theorem. Stated mathematically:
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
where $f$ is a wave array and $g$ is a window. In words,
the Convolution Theorem says that if we convolve $f$ and $g$,
and then compute the DFT, we get the same answer as computing
the DFT of $f$ and $g$, and then multiplying the results
element-wise.
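We can spot-check the theorem numerically (a sketch assuming NumPy; note that the DFT version of the theorem holds for \emph{circular} convolution, where indices wrap around):

```python
import numpy as np

rng = np.random.default_rng(17)
N = 16
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# circular convolution, computed directly from the definition
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                 for n in range(N)])

# DFT of the convolution vs. element-wise product of the DFTs
lhs = np.fft.fft(conv)
rhs = np.fft.fft(f) * np.fft.fft(g)
```

The two sides agree to floating-point precision.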
More concisely, convolution in the time
domain corresponds to multiplication in the frequency domain.

And that explains Figure~\ref{fig.convolution5}, because when we
convolve a wave and a window, we multiply the spectrum of
the wave by the spectrum of the window. To see how that works,
we can compute the DFT of the window:

\begin{verbatim}
padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))
\end{verbatim}

{\tt padded} contains the window, padded with zeros to be the
same length as {\tt wave}. \verb"dft_window" contains the
DFT of the smoothing window.

\newcommand{\abs}{\mathrm{abs}}

Figure~\ref{fig.convolution6} shows the result, along with the
ratios we computed in the previous section. The ratios are
exactly the amplitudes in \verb"dft_window". Mathematically:
%
\[ \abs(\DFT(f \conv g)) / \abs(\DFT(f)) = \abs(\DFT(g)) \]
%
In this context, the DFT of a window is called a {\bf filter}.
For any convolution window in the time domain, there is a
corresponding filter in the frequency domain. And for any
filter that can be expressed by element-wise multiplication in
the frequency domain, there is a corresponding window.


\section{Gaussian filter}

\begin{figure}
% convolution7.py
\centerline{\includegraphics[height=2.5in]{figs/convolution7.pdf}}
\caption{Boxcar and Gaussian windows.}
\label{fig.convolution7}
\end{figure}

The moving average window we used in the previous section is a
low-pass filter, but it is not a very good one. The DFT drops off
steeply at first, but then it bounces around. Those bounces are
called {\bf sidelobes}, and they are there because the moving average
window is like a square wave, so its spectrum contains high-frequency
harmonics that drop off proportionally to $1/f$, which is relatively
slow.

We can do better with a Gaussian window.
SciPy provides functions
that compute many common convolution windows, including {\tt
gaussian}:

\begin{verbatim}
gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
\end{verbatim}

{\tt M} is the number of elements in the window; {\tt std}
is the standard deviation of the Gaussian distribution used to
compute it. Figure~\ref{fig.convolution7} shows the shape
of the window. It is a discrete approximation of the Gaussian
``bell curve''. The figure also shows the moving average window
from the previous example, which is sometimes called a
{\bf boxcar window} because it looks like a rectangular railway car.

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution8.pdf}}
\caption{Ratio of spectrums before and after Gaussian smoothing, and
the DFT of the window.}
\label{fig.convolution8}
\end{figure}

I ran the computations from the previous sections again
with this window, and generated Figure~\ref{fig.convolution8},
which shows the ratio of the spectrums before and after
smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple
moving average. After the ratio drops off, it stays low, with almost
none of the sidelobes we saw with the boxcar window. So it does a
better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a
Gaussian curve.
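We can see that numerically (a sketch assuming NumPy; the window is built directly from the Gaussian formula rather than with SciPy, and the comparison uses the continuous-transform approximation $\sigma\sqrt{2\pi}\,\exp(-(2\pi k \sigma / N)^2 / 2)$, which is accurate when the window is wide and well contained in the array):

```python
import numpy as np

N = 256          # padded length
sigma = 16       # standard deviation of the window, in samples

# Gaussian window centered in the array, so its tails are negligible
n = np.arange(N)
window = np.exp(-0.5 * ((n - N / 2) / sigma) ** 2)

# magnitude of the DFT of the window
amps = np.abs(np.fft.rfft(window))

# continuous Fourier transform of a Gaussian: another Gaussian
k = np.arange(len(amps))
approx = sigma * np.sqrt(2 * np.pi) * \
         np.exp(-0.5 * (2 * np.pi * k * sigma / N) ** 2)
```

For the first several frequency components, the DFT magnitudes match the analytic Gaussian closely.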
So the ratio drops off in proportion to $\exp(-f^2)$,
which is much faster than $1/f$.


\section{Efficient convolution}
\label{effconv}

One of the reasons the FFT is such an important algorithm is that,
combined with the Convolution Theorem, it provides an efficient
way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
So one way to compute a convolution is:
%
\[ f \conv g = \IDFT(\DFT(f) \cdot \DFT(g)) \]
%
where $\IDFT$ is the inverse DFT. A simple implementation of
convolution takes time proportional to $N^2$; this algorithm,
using FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution
both ways. As an example, I'll apply it to the BitCoin daily
closing prices (a time series similar to the Facebook data shown in
Figure~\ref{fig.convolution1}).

\begin{verbatim}
df = pandas.read_csv('coindesk-bpi-USD-close.csv',
                     nrows=1625,
                     parse_dates=[0])
ys = df.Close.values
\end{verbatim}

This example uses Pandas to read the data from the CSV file (included
in the repository for this book). If you are not familiar with
Pandas, don't worry: I'm not going to do much with it in this book.
But if you're interested, you can learn more about it in
{\it Think Stats} at \url{http://thinkstats2.com}.

The result, {\tt df}, is a DataFrame, one of the data structures
provided by Pandas.
{\tt ys} is a NumPy array that contains the daily
closing prices.

Next I'll create a Gaussian window and convolve it with {\tt ys}:

\begin{verbatim}
window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')
\end{verbatim}

\verb"fft_convolve" computes the same thing using FFT:

\begin{verbatim}
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)
\end{verbatim}

We can test it by padding the window to the same length
as {\tt ys} and then computing the convolution:

\begin{verbatim}
padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)
\end{verbatim}

The result has $M-1$ bogus values at the beginning, where $M$ is the
length of the window. If we slice off the bogus values, the result
agrees with {\tt smoothed} with about 12 digits of precision.

\begin{verbatim}
M = len(window)
smoothed2 = smoothed2[M-1:]
\end{verbatim}


\section{Efficient autocorrelation}

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution9.pdf}}
\caption{Autocorrelation functions computed by NumPy and
{\tt fft\_correlate}.}
\label{fig.convolution9}
\end{figure}

In Section~\ref{convolution} I presented definitions of
cross-correlation and convolution, and we saw that they are
almost the same, except that in convolution the window is
reversed.

Now that we have an efficient algorithm for convolution, we
can also use it to compute cross-correlations and autocorrelations.
Using the data from the previous section, we can compute the
autocorrelation of the daily closing prices:

\begin{verbatim}
corrs = np.correlate(close, close, mode='same')
\end{verbatim}

With {\tt mode='same'}, the result has the same
length as {\tt close},
corresponding to lags from $-N/2$ to $N/2-1$.
The gray line in Figure~\ref{fig.convolution9} shows the result.
Except at {\tt lag=0}, there are no peaks, so there is no apparent
periodic behavior in this signal. However, the autocorrelation
function drops off slowly, suggesting that this signal resembles
pink noise, as we saw in Section~\ref{autopink}.

To compute autocorrelation using convolution, we
have to zero-pad the signal to double the length.
This trick is necessary because the FFT is based
on the assumption that the signal is periodic; that is, that it wraps
around from the end to the beginning. With time-series data like
this, that assumption is invalid. Adding zeros, and then trimming
the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window.
In order to cancel that effect, we reverse the direction of the
window before calling \verb"fft_convolve", using {\tt np.flipud},
which flips a NumPy array. The result is a view of the array,
not a copy, so this operation is fast.

\begin{verbatim}
def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs
\end{verbatim}

The result from \verb"fft_convolve" has length $2N$. Of those,
the first and last $N/2$ are valid; the rest are the result of
zero-padding.
To select the valid elements, we roll the results
and select the first $N$, corresponding to lags from $-N/2$ to
$N/2-1$.

As shown in Figure~\ref{fig.convolution9}, the results from
\verb"fft_autocorr" and {\tt np.correlate} are identical (with
about 9 digits of precision).

Notice that the correlations in Figure~\ref{fig.convolution9} are
large numbers; we could normalize them (between -1 and 1) as shown
in Section~\ref{correlate}.

The strategy we used here for autocorrelation also works for
cross-correlation. Again, you have to prepare the signals by flipping
one and padding both, and then you have to trim the invalid parts of
the result. This padding and trimming is a nuisance, but that's why
libraries like NumPy provide functions to do it for you.


\section{Exercises}

Solutions to these exercises are in {\tt chap08soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap08.ipynb}.
Read through it and run the code.

It contains an interactive widget that lets you
experiment with the parameters of the Gaussian window to see
what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian,
{\tt std}, without increasing the number of elements in the window,
{\tt M}?
\end{exercise}


\begin{exercise}
In this chapter I claimed that the Fourier transform of a Gaussian
curve is also a Gaussian curve. For discrete Fourier transforms,
this relationship is approximately true.

Try it out for a few examples. What happens to the Fourier transform
as you vary {\tt std}?
\end{exercise}


\begin{exercise}
If you did the exercises in Chapter~\ref{nonperiodic}, you saw the
effect of the Hamming window, and some of the other windows provided
by NumPy, on spectral leakage.
We can get some insight into the
effect of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a
Hamming window with the same size. Zero-pad the windows and plot
their DFTs. Which window acts as a better low-pass filter? You might
find it useful to plot the DFTs on a log-$y$ scale.

Experiment with a few different windows and a few different sizes.
\end{exercise}



\chapter{Differentiation and Integration}
\label{diffint}

This chapter picks up where the previous chapter left off,
looking at the relationship between windows in the time domain
and filters in the frequency domain.

In particular, we'll look at the effect of a finite difference
window, which approximates differentiation, and the cumulative
sum operation, which approximates integration.

The code for this chapter is in {\tt chap09.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp09}.


\section{Finite differences}
\label{diffs}

In Section~\ref{smoothing}, we applied a smoothing window to
the stock price of Facebook and found that a smoothing
window in the time domain corresponds to a low-pass filter in
the frequency domain.

In this section, we'll look at daily price changes and
see that computing the difference between successive elements,
in the time domain, corresponds to a high-pass filter.

Here's the code to read the data, store it as a wave, and compute its
spectrum.

\begin{verbatim}
import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()
\end{verbatim}

This example uses Pandas to read the CSV file; the
result is a DataFrame,
{\tt df}, with columns for the opening
price, closing price, and high and low prices. I select the closing
prices and save them in a Wave object.
The framerate is 1 sample per day.

Figure~\ref{fig.diff_int1} shows
this time series and its spectrum.
Visually, the time series resembles Brownian noise (see
Section~\ref{brownian}).
And the spectrum looks like a straight
line, albeit a noisy one. The estimated slope is -1.9,
which is consistent with Brownian noise.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int1.pdf}}
\caption{Daily closing price of Facebook and the spectrum of this time
series.}
\label{fig.diff_int1}
\end{figure}

Now let's compute the daily price change using {\tt np.diff}:

\begin{verbatim}
diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()
\end{verbatim}

Figure~\ref{fig.diff_int2} shows the resulting wave and its spectrum.
The daily changes resemble white noise, and the estimated slope of the
spectrum, -0.06, is near zero, which is what we expect for white
noise.

\begin{figure}
% diff_int2.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int2.pdf}}
\caption{Daily price change of Facebook and the spectrum of this time series.}
\label{fig.diff_int2}
\end{figure}


\section{The frequency domain}

Computing the difference
between successive elements is the same as convolution with
the window {\tt [1, -1]}.
If the order of those elements seems backward,
remember that convolution reverses the window before applying it
to the signal.

We can see the effect of this operation in the frequency domain
by computing the DFT of the window.

\begin{verbatim}
diff_window = np.array([1.0, -1.0])
padded = thinkdsp.zero_pad(diff_window, len(close))
diff_wave = thinkdsp.Wave(padded,
                          framerate=close.framerate)
diff_filter = diff_wave.make_spectrum()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int3.pdf}}
\caption{Filters corresponding to the diff and differentiate operators (left) and integration operator (right, log-$y$ scale).}
\label{fig.diff_int3}
\end{figure}

Figure~\ref{fig.diff_int3} shows the result. The finite difference
window corresponds to a high-pass filter: its amplitude increases with
frequency, linearly for low frequencies, and then sublinearly after
that. In the next section, we'll see why.


\section{Differentiation}
\label{effdiff}

The window we used in the previous section is a
numerical approximation of the first derivative, so the filter
approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter
in the frequency domain; we can figure out what it is with a little
math.

Suppose we have a complex sinusoid with frequency $f$:
%
\[ E_f(t) = e^{2 \pi i f t} \]
%
The first derivative of $E_f$ is
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f e^{2 \pi i f t} \]
%
which we can rewrite as
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f E_f(t) \]
%
In other words, taking the derivative of $E_f$ is the same
as multiplying by $2 \pi i f$, which is a complex number
with magnitude $2 \pi f$ and angle $\pi/2$.

We can compute the filter that corresponds to differentiation
like this:

\begin{verbatim}
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
\end{verbatim}

I started with the spectrum of {\tt close}, which has the right
size and framerate, then replaced the {\tt hs} with $2 \pi i f$.
Figure~\ref{fig.diff_int3} (left) shows this filter; it is a straight line.

As we saw in Section~\ref{synthmat}, multiplying a complex sinusoid
by a complex number
has two effects: it multiplies
the amplitude, in this case by $2 \pi f$, and shifts the phase
offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions,
each $E_f$ is an eigenfunction of the differentiation operator, with the
corresponding eigenvalue $2 \pi i f$. See
\url{http://en.wikipedia.org/wiki/Eigenfunction}.

If you are not familiar with that language, here's what it
means:

\newcommand{\op}{\mathcal{A}}

\begin{itemize}

\item An operator is a function that takes a function and returns
another function. For example, differentiation is an operator.

\item A function, $g$, is an eigenfunction of an operator, $\op$, if
applying $\op$ to $g$ has the effect of multiplying the function by
a scalar. That is, $\op g = \lambda g$.

\item In that case, the scalar $\lambda$ is the eigenvalue that
corresponds to the eigenfunction $g$.

\item A given operator might have many eigenfunctions, each with
a corresponding eigenvalue.

\end{itemize}

Because complex sinusoids are eigenfunctions of the differentiation
operator, they are easy to differentiate.
All we have to do is
multiply by a complex scalar.

For signals with more than one
component, the process is only slightly harder:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoids.

\item Compute the derivative of each component by multiplication.

\item Add up the differentiated components.

\end{enumerate}

If that process sounds familiar, that's because it is identical
to the algorithm for convolution in Section~\ref{effconv}: compute
the DFT, multiply by a filter, and compute the inverse DFT.

{\tt Spectrum} provides a method that applies the differentiation
filter:

\begin{verbatim}
# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs
\end{verbatim}

We can use it to compute the derivative of the Facebook time series:

\begin{verbatim}
deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int4.pdf}}
\caption{Comparison of daily price changes computed by
{\tt np.diff} and by applying the differentiation filter.}
\label{fig.diff_int4}
\end{figure}

Figure~\ref{fig.diff_int4} compares the daily price changes computed by
{\tt np.diff} with the derivative we just computed.
I selected the first 50 values in the time series so we can see the
differences more clearly.

The derivative is noisier, because it amplifies the high-frequency
components more, as shown in Figure~\ref{fig.diff_int3} (left). Also, the
first few elements of the derivative are very noisy. The problem
there is that the DFT-based derivative is based on the assumption that
the signal is periodic.
In effect, it connects the last element in
the time series back to the first element, which creates artifacts at
the boundaries.

To summarize, we have shown:

\begin{itemize}

\item Computing the difference between successive values in a signal
can be expressed as convolution with a simple window. The result is
an approximation of the first derivative.

\item Differentiation in the time domain corresponds to a simple
filter in the frequency domain. For periodic signals, the result is
exactly the first derivative. For some non-periodic signals, it
can approximate the derivative.

\end{itemize}

Using the DFT to compute derivatives is the basis of {\bf spectral
methods} for solving differential equations (see
\url{http://en.wikipedia.org/wiki/Spectral_method}).

In particular, it is useful for the analysis of linear, time-invariant
systems, which is coming up in Chapter~\ref{systems}.


\section{Integration}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int5.pdf}}
\caption{Comparison of the original time series and the integrated
derivative.}
\label{fig.diff_int5}
\end{figure}

In the previous section, we showed that differentiation in the time
domain corresponds to a simple filter in the frequency domain: it
multiplies each component by $2 \pi i f$.
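As a quick check of the differentiation filter (a sketch assuming only NumPy, applied to a sinusoid sampled over a whole number of cycles, so the periodicity assumption holds exactly):

```python
import numpy as np

framerate = 256
f0 = 5  # integer frequency, so the signal is periodic over the window
ts = np.arange(framerate) / framerate
x = np.sin(2 * np.pi * f0 * ts)

# multiply each spectral component by 2*pi*i*f
hs = np.fft.rfft(x)
fs = np.fft.rfftfreq(len(x), d=1 / framerate)
deriv = np.fft.irfft(2j * np.pi * fs * hs)

# analytic derivative of sin(2*pi*f0*t) is 2*pi*f0*cos(2*pi*f0*t)
expected = 2 * np.pi * f0 * np.cos(2 * np.pi * f0 * ts)
```

For a periodic input, the spectral derivative matches the analytic derivative to machine precision.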
Since integration is
the inverse of differentiation, it also corresponds to a simple
filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

\begin{verbatim}
integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)
\end{verbatim}

Figure~\ref{fig.diff_int3} (right) shows this filter on a log-$y$ scale,
which makes it easier to see.

{\tt Spectrum} provides a method that applies the integration filter:

\begin{verbatim}
# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs
\end{verbatim}

We can confirm that the integration filter is correct by applying it
to the spectrum of the derivative we just computed:

\begin{verbatim}
integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()
\end{verbatim}

But notice that at $f=0$, we are dividing by 0. The result in
NumPy is NaN, which is a special floating-point value that
represents ``not a number''. We can partially deal with this
problem by setting this value to 0 before converting the
spectrum back to a wave:

\begin{verbatim}
integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
\end{verbatim}

Figure~\ref{fig.diff_int5} shows this integrated derivative along with
the original time series. They are almost identical, but the
integrated derivative has been shifted down. The problem is that when
we clobbered the $f=0$ component, we set the bias of the signal to 0.
But that should not be surprising; in general, differentiation loses
information about the bias, and integration can't recover it.
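We can demonstrate the round trip with plain NumPy (a sketch; the variable names are mine, not from {\tt thinkdsp}): differentiating and then integrating recovers the original signal except for its mean.

```python
import numpy as np

framerate = 256
ts = np.arange(framerate) / framerate
x = 3.0 + np.sin(2 * np.pi * 5 * ts)   # periodic signal with nonzero bias

fs = np.fft.rfftfreq(len(x), d=1 / framerate)
hs = np.fft.rfft(x)

# differentiate, then integrate, in the frequency domain
hs_deriv = 2j * np.pi * fs * hs          # the f=0 component is multiplied by 0
hs_integ = np.zeros_like(hs_deriv)
hs_integ[1:] = hs_deriv[1:] / (2j * np.pi * fs[1:])  # skip f=0 to avoid 0/0

recovered = np.fft.irfft(hs_integ)
```

The result matches the original signal minus its mean; the bias is gone.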
In some
sense, the NaN at $f=0$ is telling us that this element is unknown.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int6.pdf}}
\caption{A sawtooth wave and its spectrum.}
\label{fig.diff_int6}
\end{figure}

If we provide this ``constant of integration'', the results are
identical, which confirms that this integration filter is the correct
inverse of the differentiation filter.

\section{Cumulative sum}
\label{cumsum}

In the same way that the diff operator approximates differentiation,
the cumulative sum approximates integration.
I'll demonstrate with a sawtooth signal.

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)
\end{verbatim}

Figure~\ref{fig.diff_int6} shows this wave and its spectrum.

{\tt Wave} provides a method that computes the cumulative sum of
a wave array and returns a new Wave object:

\begin{verbatim}
# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)
\end{verbatim}

We can use it to compute the cumulative sum of \verb"in_wave":

\begin{verbatim}
out_wave = in_wave.cumsum()
out_wave.unbias()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int7.pdf}}
\caption{A parabolic wave and its spectrum.}
\label{fig.diff_int7}
\end{figure}

Figure~\ref{fig.diff_int7} shows the resulting wave and its spectrum.
If you did the exercises in Chapter~\ref{harmonics}, this waveform should
look familiar: it's a parabolic signal.

Comparing the spectrum of the parabolic signal to the spectrum of the
sawtooth, the amplitudes of the components drop off more quickly. In
Chapter~\ref{harmonics}, we saw that the components of the sawtooth
drop off in proportion to $1/f$.
Since the cumulative sum
approximates integration, and integration scales components in
proportion to $1/f$, the components of the parabolic wave drop off in
proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds
to the cumulative sum:

\begin{verbatim}
cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs
\end{verbatim}

Because {\tt cumsum} is the inverse operation of {\tt diff}, we
start with a copy of \verb"diff_filter", which is the filter
that corresponds to the {\tt diff} operation, and then invert the
{\tt hs}.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int8.pdf}}
\caption{Filters corresponding to cumulative sum and integration.}
\label{fig.diff_int8}
\end{figure}

Figure~\ref{fig.diff_int8} shows the filters corresponding to
cumulative sum and integration. The cumulative sum is a good
approximation of integration except at the highest frequencies,
where it drops off a little faster.

To confirm that this is the correct filter for the cumulative
sum, we can compare it to the ratio of the spectrum of
\verb"out_wave" to the spectrum of \verb"in_wave":

\begin{verbatim}
in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int9.pdf}}
\caption{Filter corresponding to cumulative sum and actual ratios of
the before-and-after spectrums.}
\label{fig.diff_int9}
\end{figure}

And here's the method that computes the ratios:

\begin{verbatim}
def ratio(self, denom, thresh=1):
    ratio_spectrum = self.copy()
    ratio_spectrum.hs /= denom.hs
    ratio_spectrum.hs[denom.amps < thresh] = np.nan
    return ratio_spectrum
\end{verbatim}

When
{\tt denom.amps} is small, the resulting ratio is noisy,
so I set those values to NaN.

Figure~\ref{fig.diff_int9} shows the ratios and the filter
corresponding to the cumulative sum. They agree, which confirms that
inverting the filter for {\tt diff} yields the filter for {\tt
cumsum}.

Finally, we can confirm that the Convolution Theorem applies by
applying the {\tt cumsum} filter in the frequency domain:

\begin{verbatim}
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
\end{verbatim}

Within the limits of floating-point error, \verb"out_wave2" is
identical to \verb"out_wave", which we computed using {\tt cumsum}, so
the Convolution Theorem works! But note that this demonstration only
works with periodic signals.


\section{Integrating noise}

In Section~\ref{brownian}, we generated Brownian noise by computing the
cumulative sum of white noise.
Now that we understand the effect of {\tt cumsum} in the frequency
domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we
compute the cumulative sum, the amplitude of each component is divided
by $f$. Since power is the square of magnitude, the power of each
component is divided by $f^2$. So on average, the power at frequency
$f$ is proportional to $1 / f^2$:
%
\[ P_f = K / f^2 \]
%
where $K$ is a constant that's not important.
Taking the log of both sides yields:
%
\[ \log P_f = \log K - 2 \log f \]
%
And that's why, when we plot the spectrum of Brownian noise on a
log-log scale, we expect to see a straight line with slope -2, at
least approximately.

In Section~\ref{diffs} we plotted the spectrum of closing prices for
Facebook, and estimated that the slope is -1.9, which is consistent
with Brownian noise.
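This prediction is easy to check numerically. Here is a sketch in plain NumPy, independent of {\tt thinkdsp}; the seed and signal length are arbitrary choices:

```python
import numpy as np

# Checking the claim that Brownian noise, the cumulative sum of
# white noise, has a log-log power spectrum with slope near -2.
rng = np.random.default_rng(17)

white = rng.uniform(-1, 1, 16384)     # uncorrelated white noise
brown = np.cumsum(white)              # Brownian noise
brown -= brown.mean()                 # remove the bias

hs = np.fft.rfft(brown)
fs = np.fft.rfftfreq(len(brown))
power = np.abs(hs) ** 2

# Fit a line to log-power versus log-frequency, skipping f=0.
slope, _ = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)
print(round(slope, 1))                # should be close to -2
```

The fitted slope varies a little from run to run (and the $f=0$ component has to be skipped, for the same reason the integration filter produced a NaN there), but it lands near the predicted -2.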
Many stock prices have similar spectrums.

When we used the {\tt diff} operator to compute daily changes, we
multiplied the {\em amplitude} of each component by a filter proportional to
$f$, which means we multiplied the {\em power} of each component by $f^2$.
On a log-log scale, this operation adds 2 to the slope of the
power spectrum, which is why the estimated slope of the result
is near 0.1 (but a little lower, because {\tt diff} only approximates
differentiation).


\section{Exercises}

The notebook for this chapter is {\tt chap09.ipynb}.
You might want to read through it and run the code.

Solutions to these exercises are in {\tt chap09soln.ipynb}.

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt diff} and
{\tt differentiate} on a signal. Create a triangle wave and plot
it. Apply {\tt diff} and plot the result. Compute the spectrum of the
triangle wave, apply {\tt differentiate}, and plot the result. Convert
the spectrum back to a wave and plot it. Are there differences between
the effect of {\tt diff} and {\tt differentiate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt cumsum} and
{\tt integrate} on a signal. Create a square wave and plot it. Apply
{\tt cumsum} and plot the result. Compute the spectrum of the square
wave, apply {\tt integrate}, and plot the result. Convert the spectrum
back to a wave and plot it. Are there differences between the effect
of {\tt cumsum} and {\tt integrate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of integrating
twice. Create a sawtooth wave, compute its spectrum, then apply {\tt
integrate} twice. Plot the resulting wave and its spectrum. What is
the mathematical form of the wave?
Why does it resemble a sinusoid?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of the 2nd
difference and 2nd derivative. Create a {\tt CubicSignal}, which is
defined in {\tt thinkdsp}. Compute the second difference by applying
{\tt diff} twice. What does the result look like? Compute the second
derivative by applying {\tt differentiate} to the spectrum twice.
Does the result look the same?

Plot the filters that correspond to the 2nd difference and the 2nd
derivative and compare them. Hint: In order to get the filters on the
same scale, use a wave with framerate 1.
\end{exercise}


\chapter{LTI systems}
\label{systems}

This chapter presents the theory of signals and systems, using
musical acoustics as an example. It explains an
important application of the Convolution Theorem: the characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap10.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp10}.


\section{Signals and systems}

In the context of signal processing, a {\bf system} is an abstract
representation of anything that takes a signal as input and produces
a signal as output.

For example, an electronic amplifier is a circuit that takes an
electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you
can think of the room as a system that takes the sound of the
performance at the location where it is generated and produces a
somewhat different sound at the location where you hear it.

A {\bf linear, time-invariant system}\footnote{My presentation here
follows \url{http://en.wikipedia.org/wiki/LTI_system_theory}.} is a
system with these two
properties:

\begin{enumerate}

\item Linearity: If you put two inputs into the system at the same
time, the result is the sum of their outputs. Mathematically, if an
input $x_1$ produces output $y_1$ and another input $x_2$ produces
$y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and
$b$ are scalars.

\item Time invariance: The
effect of the system doesn't vary over time, or depend on the state
of the system. So if inputs $x_1$ and $x_2$ differ by a shift in time,
their outputs $y_1$ and $y_2$ differ by the same shift, but are otherwise
identical.

\end{enumerate}

Many physical systems have these properties, at least approximately.

\begin{itemize}

\item Circuits that contain only resistors, capacitors and inductors are
LTI, to the degree that the components behave like their idealized
models.

\item Mechanical systems that contain springs, masses and
dashpots are also LTI, assuming linear springs (force proportional
to displacement) and dashpots (force proportional to velocity).

\item Also, and most relevant to applications in this book,
the media that transmit sound (including air, water
and solids) are well-modeled by LTI systems.

\end{itemize}

LTI systems are described by linear differential equations, and
the solutions of those equations are complex sinusoids (see
\url{http://en.wikipedia.org/wiki/Linear_differential_equation}).

This result provides an algorithm for computing the effect of
an LTI system on an input signal:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoid components.

\item For each input component, compute the corresponding output component.

\item Add up the output components.

\end{enumerate}

At this point, I hope this algorithm sounds familiar.
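The three steps can be sketched directly with NumPy. This example is mine, not from the book's code; it applies a simple LTI system, a circular 2-element moving average, both directly and by decomposing the input into DFT components, and gets the same answer:

```python
import numpy as np

# Sketch of the three-step algorithm for a circular 2-element
# moving average.  (Example signal and system are made up.)
N = 64
n = np.arange(N)
signal = np.sin(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 7 * n / N)

# Applying the system directly: average each sample with its neighbor.
direct = 0.5 * (signal + np.roll(signal, 1))

# Step 1: express the signal as a sum of complex sinusoid components.
components = np.fft.fft(signal)

# Step 2: compute each output component.  For this system the effect
# on frequency k is multiplication by G[k], the DFT of the window
# [0.5, 0.5, 0, ..., 0].
window = np.zeros(N)
window[0] = window[1] = 0.5
G = np.fft.fft(window)
out_components = components * G

# Step 3: add up the output components (the inverse FFT does the sum).
output = np.fft.ifft(out_components).real

print(np.allclose(output, direct))    # True
```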
It's the
same algorithm we used for convolution in Section~\ref{effconv}, and
for differentiation in Section~\ref{effdiff}. This process
is called {\bf spectral decomposition} because we ``decompose''
the input signal into its spectral components.

In order to apply this process to an LTI system, we have to {\bf
characterize} the system by finding its effect on each component
of the input signal. For mechanical systems, it turns out that there
is a simple and efficient way to do that: you kick it and record
the output.

Technically, the ``kick'' is called an {\bf impulse} and the
output is called the {\bf impulse response}. You might wonder
how a single impulse can completely characterize a system. You
can see the answer by computing the DFT of an impulse. Here's
a wave array with an impulse at $t=0$:

\begin{verbatim}
impulse = np.zeros(8)
impulse[0] = 1
impulse_spectrum = np.fft.fft(impulse)
\end{verbatim}

Here's the wave array:

\begin{verbatim}
[ 1.  0.  0.  0.  0.  0.  0.  0.]
\end{verbatim}

And here's its spectrum:

\begin{verbatim}
[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
\end{verbatim}

The spectrum is all ones; that is, an impulse is the sum of components
with equal magnitudes at all frequencies. This
spectrum should not be confused with white noise, which has the same
{\em average} power at all frequencies, but varies around that
average.

When you test a system by inputting
an impulse, you are testing the response of the
system at all frequencies.
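Here is a quick numerical check of that distinction between an impulse and white noise (a sketch of mine, not from {\tt chap10.ipynb}):

```python
import numpy as np

# An impulse has magnitude exactly 1 at every frequency; white noise
# only has constant power on average.  (Seed and length are arbitrary.)
rng = np.random.default_rng(3)

impulse = np.zeros(1024)
impulse[0] = 1
impulse_amps = np.abs(np.fft.fft(impulse))

noise = rng.uniform(-1, 1, 1024)
noise_amps = np.abs(np.fft.fft(noise))

print(np.allclose(impulse_amps, 1))   # True: exactly flat
print(noise_amps.std() > 0.1)         # True: varies around its average
```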
And you can test them all at the same
time because the system is linear, so simultaneous tests don't
interfere with each other.


\section{Windows and filters}
\label{winfilt}

To show why this kind of system characterization works, I
will start with a simple example: a 2-element moving average.
We can think of this operation as a system that takes a signal
as an input and produces a slightly smoother signal as an output.

In this example we know what the window is, so we can compute
the corresponding filter. But that's not usually the case; in the
next section we'll look at an example where we don't know the
window or the filter ahead of time.

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems1.pdf}}
\caption{DFT of a 2-element moving average window.}
\label{fig.systems1}
\end{figure}

Here's a window that computes a
2-element moving average (see Section~\ref{smoothing}):

\begin{verbatim}
window_array = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0,])
window = thinkdsp.Wave(window_array, framerate=8)
\end{verbatim}

We can find the corresponding filter by computing the DFT of the
window:

\begin{verbatim}
filtr = window.make_spectrum(full=True)
\end{verbatim}

Figure~\ref{fig.systems1} shows the result. The filter that corresponds to a
moving average window is a low-pass filter with the approximate shape
of a Gaussian curve.

Now imagine that we did not know the window or the corresponding filter,
and we wanted to characterize this system.
We would do that by
inputting an impulse and measuring the impulse response.

In this example, we can compute the impulse response by multiplying
the spectrum of the impulse and the filter, and then converting the result
from a spectrum to a wave:

\begin{verbatim}
product = impulse_spectrum * filtr
filtered = product.make_wave()
\end{verbatim}

Since \verb"impulse_spectrum" is all ones, the product is
identical to the filter, and the filtered wave is identical to the
window.

This example demonstrates two things:

\begin{itemize}

\item Because the spectrum of an impulse is all ones, the DFT
of the impulse response is identical to the filter that
characterizes the system.

\item Therefore, the impulse response is identical to the
convolution window that characterizes the system.

\end{itemize}


\section{Acoustic response}
\label{response}

To characterize the acoustic response of a room or open space, a
simple way to generate an impulse is to pop a balloon or
fire a gun.
A gunshot puts an impulse into
the system; the sound you hear is the impulse response.

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems6.pdf}}
\caption{Waveform of a gunshot.}
\label{fig.systems6}
\end{figure}

As an example, I'll use a recording of a gunshot to characterize
the room where the gun was fired, then use the impulse response
to simulate the effect of that room on a violin recording.

This example is in {\tt chap10.ipynb}, which is in the repository
for this book; you can also view it, and listen to the examples,
at \url{http://tinyurl.com/thinkdsp10}.

Here's the gunshot:

\begin{verbatim}
response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()
\end{verbatim}

I select a segment starting at 0.26 seconds to remove the silence
before the gunshot. Figure~\ref{fig.systems6} (left) shows the
waveform of the gunshot. Next we compute the DFT of {\tt response}:

\begin{verbatim}
transfer = response.make_spectrum()
transfer.plot()
\end{verbatim}

Figure~\ref{fig.systems6} (right) shows the result. This spectrum
encodes the response of the room; for each frequency, the spectrum
contains a complex number that represents an amplitude multiplier and
a phase shift. This spectrum is called a {\bf transfer
function} because it contains information about how the system transfers
the input to the output.

Now we can simulate the effect this room would have on the sound
of a violin. Here is the violin recording we used in Section~\ref{violin}:

\begin{verbatim}
violin = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
violin.truncate(len(response))
violin.normalize()
\end{verbatim}

The violin and gunshot waves were sampled at the same framerate,
44,100 Hz.
And coincidentally, the duration of both is about the
same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

\begin{verbatim}
spectrum = violin.make_spectrum()
\end{verbatim}

Now I know the magnitude and phase of each component in the
input, and I know the transfer function of the system. Their
product is the DFT of the output, which we can use to compute the
output wave:

\begin{verbatim}
output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()
\end{verbatim}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems7.pdf}}
\caption{The waveform of the violin recording before and after convolution.}
\label{fig.systems7}
\end{figure}

Figure~\ref{fig.systems7} shows the input (top) and output (bottom) of
the system. They are substantially different, and the differences are
clearly audible. Load {\tt chap10.ipynb} and listen to them. One
thing I find striking about this example is that you can get a sense
of what the room was like; to me, it sounds like a long, narrow room
with hard floors and ceilings. That is, like a firing range.

There's one thing I glossed over in this example that I'll mention
in case it bothers anyone. The violin recording I started with
has already been transformed by one system: the room where it was
recorded. So what I really computed in my example is the sound
of the violin after two transformations.
To properly simulate
the sound of a violin in a different room, I should have characterized
the room where the violin was recorded and applied the inverse
of that transfer function first.


\section{Systems and convolution}
\label{sysconv}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems8.pdf}}
\caption{Sum of a wave and a shifted, scaled copy.}
\label{fig.systems8}
\end{figure}

If you think the previous example is black magic,
you are not alone. I've been thinking about it for a while and it
still makes my head hurt.

In the previous section, I suggested one way to think about it:

\begin{itemize}

\item An impulse is made up of components with amplitude 1 at all
frequencies.

\item The impulse response contains the sum of the responses of the
system to all of these components.

\item The transfer function, which is the DFT of the impulse response,
encodes the effect of the system on each frequency component in the form
of an amplitude multiplier and a phase shift.

\item For any input, we can compute the response of the system
by breaking the input into components, computing the response to
each component, and adding them up.

\end{itemize}

But if you don't like that, there's another way to think about
it altogether: convolution! By the Convolution Theorem, multiplication
in the frequency domain corresponds to convolution in the time
domain.
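That correspondence is easy to check numerically. Here is a sketch of mine, in plain NumPy with made-up random signals, comparing circular convolution computed by direct summation with multiplication of DFTs:

```python
import numpy as np

# Convolution Theorem check: circular convolution in the time domain
# equals multiplication of DFTs in the frequency domain.
rng = np.random.default_rng(0)
f = rng.standard_normal(128)
g = rng.standard_normal(128)

def circular_convolve(f, g):
    """Circular convolution by direct summation: h[n] = sum f[m] g[n-m]."""
    N = len(f)
    return np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                     for n in range(N)])

h_time = circular_convolve(f, g)
h_freq = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(h_time, h_freq))    # True
```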
In this example, the output of the system is the convolution
of the input and the system response.

Here are the keys to understanding why that works:

\begin{itemize}

\item You can think of the samples in the input wave as a sequence
of impulses with varying amplitude.

\item Each impulse in the input yields a copy of the impulse response,
shifted in time (because the system is time-invariant) and scaled by
the amplitude of the input.

\item The output is the sum of the shifted, scaled copies of the
impulse response. The copies add up because the system is linear.

\end{itemize}

Let's work our way up gradually: suppose that instead of firing one
gun, we fire two: a big one with amplitude 1 at $t=0$ and a
smaller one with amplitude 0.5 at $t=1$.

We can compute the response of the system by adding up
the original impulse response and a scaled, shifted copy of itself.
Here's a function that makes a shifted, scaled copy of
a wave:

\begin{verbatim}
def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res
\end{verbatim}

The parameter {\tt shift} is a time shift in seconds; {\tt factor}
is a multiplicative factor.

Here's how we use it to compute the response to a two-gun salute:

\begin{verbatim}
shift = 1
factor = 0.5
gun2 = response + shifted_scaled(response, shift, factor)
\end{verbatim}

Figure~\ref{fig.systems8} shows the result. You can hear what
it sounds like in {\tt chap10.ipynb}.
Not surprisingly, it
sounds like two gunshots, the first one louder than the second.

Now suppose instead of two guns, you add up 100 guns fired at a rate
of 441 shots per second.
This loop computes the result:

\begin{verbatim}
dt = 1 / 441
total = 0
for k in range(100):
    total += shifted_scaled(response, k*dt, 1.0)
\end{verbatim}

At 441 shots per second,
you don't hear the individual shots. Instead, it sounds
like a periodic signal at 441 Hz. If you play this example, it
sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a
series of samples, where each sample is an impulse with a different
amplitude.

As an example, I'll generate a sawtooth signal at 441 Hz:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=441)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)
\end{verbatim}

Now I'll loop through the series of impulses that make up the
sawtooth, and add up the impulse responses:

\begin{verbatim}
total = 0
for t, y in zip(wave.ts, wave.ys):
    total += shifted_scaled(response, t, y)
\end{verbatim}

The result is what it would sound like to play a sawtooth wave in a
firing range. Again, you can listen to it in {\tt chap10.ipynb}.

\begin{figure}
\centerline{\includegraphics[height=1.2in]{figs/diagram2.pdf}}
\caption{Diagram of the sum of scaled and shifted copies of $g$.}
\label{fig.convolution}
\end{figure}

Figure~\ref{fig.convolution} shows a diagram of this computation,
where $f$ is the sawtooth, $g$ is the impulse response, and $h$
is the sum of the shifted, scaled copies of $g$.

For the example shown:

\[ h[2] = f[0]g[2] + f[1]g[1] + f[2]g[0] \]

And more generally,

\[ h[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]

You might recognize this equation from Section~\ref{convolution}.
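The sum-of-shifted-copies picture can be verified in a few lines. This sketch uses plain NumPy arrays in place of Wave objects; the example input and impulse response are made up:

```python
import numpy as np

# Verifying that the sum of shifted, scaled copies of the impulse
# response equals the convolution of input and impulse response.
f = np.array([1.0, 0.5, -0.25, 2.0])    # input samples (impulse amplitudes)
g = np.array([1.0, 0.8, 0.3])           # impulse response

# Sum a copy of g for each sample, shifted to index m and scaled by f[m].
N = len(f) + len(g) - 1
h = np.zeros(N)
for m, amp in enumerate(f):
    h[m:m + len(g)] += amp * g

print(np.allclose(h, np.convolve(f, g)))    # True
```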
It
is the convolution of $f$ and $g$. This shows that if the input is $f$
and the impulse response of the system is $g$, the output is the
convolution of $f$ and $g$.

In summary, there are two ways to think about the effect of a system
on a signal:

\begin{enumerate}

\item The input is a sequence of impulses, so the output is the sum of
scaled, shifted copies of the impulse response; that sum is the
convolution of the input and the impulse response.

\item The DFT of the impulse response is a transfer function that
encodes the effect of the system on each frequency component as a
magnitude and phase offset. The DFT of the input encodes the
magnitude and phase offset of the frequency components it contains.
Multiplying the DFT of the input by the transfer function yields
the DFT of the output.

\end{enumerate}

The equivalence of these descriptions should not be a surprise;
it is basically a statement of the Convolution Theorem:
convolution of $f$ and $g$ in the time
domain corresponds to multiplication in the frequency domain.
And if you wondered why convolution is defined as it is, which
seemed backwards when we talked about smoothing and difference
windows, now you know the reason: the definition of convolution
appears naturally in the response of an LTI system to a signal.


\section{Proof of the Convolution Theorem}

Well, I've put it off long enough.
It's time to prove the Convolution
Theorem (CT), which states:

\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]

where $f$ and $g$ are vectors with the same length, $N$.

I'll proceed in two steps:

\begin{enumerate}

\item I'll show that in the special case where $f$ is a complex
exponential, convolution with $g$ has the effect of multiplying
$f$ by a scalar.

\item In the more general case where $f$ is not a complex exponential,
we can use the DFT to express it as a sum of exponential components,
compute the convolution of each component (by multiplication) and
then add up the results.

\end{enumerate}

Together these steps prove the Convolution Theorem. But first, let's
assemble the pieces we'll need. The DFT of $g$, which I'll call $G$,
is:
%
\[ \DFT(g)[k] = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
0 to $N-1$ and $n$ is an index of time from 0 to $N-1$.
The result is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I'll call $f$, is:
%
\[ \IDFT(F)[n] = f[n] = \sum_k F[k] \exp(2 \pi i n k / N) \]
%
Here's the definition of convolution:
%
\[ (f \conv g)[n] = \sum_m f[m] g[n-m] \]
%
where $m$ is another index of time from 0 to $N-1$.
Convolution is commutative, so I could equivalently write:
%
\[ (f \conv g)[n] = \sum_m f[n-m] g[m] \]
%
Now let's consider the special case where $f$ is a complex
exponential with frequency $k$, which I'll call $e_k$:
%
\[ f[n] = e_k[n] = \exp(2 \pi i n k / N) \]
%
where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i (n-m) k / N) g[m] \]
%
We can split the first term into a product:
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N) g[m] \]
%
The first half does not
depend on $m$, so we can pull it out of the
summation:
%
\[ (e_k \conv g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N) g[m] \]
%
Now we recognize that the first term is $e_k$, and the summation is
$G[k]$ (using $m$ as the index of time). So we can write:
%
\[ (e_k \conv g)[n] = e_k[n] G[k] \]
%
which shows that for each complex exponential, $e_k$, convolution
with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical
terms, each $e_k$ is an eigenvector of this operation, and
$G[k]$ is the corresponding eigenvalue (see Section~\ref{effdiff}).

Now for the second part of the proof. If the input signal, $f$, doesn't
happen to be a complex exponential, we can express it as a sum of
complex exponentials by computing its DFT, $F$.
For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex
magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude
$F[k]$, so each output component is a complex
exponential with magnitude $F[k] G[k]$, based on the first part of
the proof.

Because the system is linear, the output is just the sum of the
output components:
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] e_k[n] \]
%
Plugging in the definition of $e_k$ yields
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] \exp(2 \pi i n k / N) \]
%
The right hand side is the inverse DFT of the product $F G$. Thus:
%
\[ (f \conv g) = \IDFT( F G ) \]
%
Substituting $F = \DFT(f)$ and $G = \DFT(g)$:
%
\[ (f \conv g) = \IDFT( \DFT(f) \DFT(g) ) \]
%
Finally, taking the DFT of both sides yields the Convolution Theorem:
%
\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]
%
QED


\section{Exercises}

Solutions to these exercises are in {\tt chap10soln.ipynb}.

\begin{exercise}
In Section~\ref{sysconv} I describe convolution as the sum of shifted,
scaled copies of a signal.
Strictly speaking, this operation is
{\em linear} convolution, which does not assume that the signal
is periodic.

But in Section~\ref{response}, when we multiply the
DFT of the signal by the transfer function, that operation corresponds
to {\em circular} convolution, which assumes that the signal is
periodic. As a result, you might notice that the output contains
an extra note at the beginning, which wraps around from the end.

Fortunately, there is a standard solution to this problem. If you
add enough zeros to the end of the signal before computing the DFT,
you can avoid wrap-around and compute a linear convolution.

Modify the example in {\tt chap10.ipynb} and confirm that zero-padding
eliminates the extra note at the beginning of the output.
\end{exercise}


\begin{exercise}
The Open AIR library provides a ``centralized... on-line resource for
anyone interested in auralization and acoustical impulse response
data'' (\url{http://www.openairlib.net}). Browse their collection
of impulse response data and download one that sounds interesting.
Find a short recording that has the same sample rate as the impulse
response you downloaded.

Simulate the sound of your recording in the space where the impulse
response was measured, computed in two ways: by convolving the recording
with the impulse response and by computing the filter that corresponds
to the impulse response and multiplying by the DFT of the recording.
\end{exercise}

%TODO: one more?
%\begin{exercise}
%\end{exercise}


\chapter{Modulation and sampling}

In Section~\ref{aliasing} we saw that when a signal is sampled at
10,000 Hz, a component at 5500 Hz is indistinguishable from a
component at 4500 Hz. In this example, the folding frequency, 5000 Hz,
is half of the sampling rate.
But I didn't explain why.

This chapter explores the effect of sampling and presents the
Sampling Theorem, which explains aliasing and the folding frequency.

I'll start by exploring the effect of convolution with impulses;
I'll use that to explain amplitude modulation (AM), which
turns out to be useful for understanding the Sampling Theorem.

The code for this chapter is in {\tt chap11.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp11}.


\section{Convolution with impulses}

As we saw in Section~\ref{sysconv}, convolution of a signal with
a series of impulses has the effect of making shifted, scaled
copies of the signal.

As an example, I'll read a signal that sounds like a beep:

\begin{verbatim}
filename = '253887__themusicalnomad__positive-beeps.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
\end{verbatim}

And I'll construct a wave with four impulses:

\begin{verbatim}
imp_sig = thinkdsp.Impulses([0.005, 0.3, 0.6, 0.9],
                            amps=[1, 0.5, 0.25, 0.1])
impulses = imp_sig.make_wave(start=0, duration=1.0,
                             framerate=wave.framerate)
\end{verbatim}

And then convolve them:

\begin{verbatim}
convolved = wave.convolve(impulses)
\end{verbatim}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=3.5in]{figs/sampling1.pdf}}
\caption{The effect of convolving a signal (top left) with a series of
impulses (bottom left).
The result (right) is the sum of shifted,
scaled copies of the signal.}
\label{fig.sampling1}
\end{figure}

Figure~\ref{fig.sampling1} shows the results, with the signal in
the top left, the impulses in the lower left, and the result on
the right.

You can hear the result in {\tt chap11.ipynb}; it sounds like
a series of four beeps with decreasing loudness.

The point of this example is just to demonstrate that convolution
with impulses makes shifted, scaled copies. This result will be
useful in the next section.


\section{Amplitude modulation}
\label{am}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=5.5in]{figs/sampling2.pdf}}
\caption{Demonstration of amplitude modulation. The top row is the
spectrum of the signal; the next row is the spectrum after modulation;
the next row is the spectrum after demodulation; the last row is the
demodulated signal after low-pass filtering.}
\label{fig.sampling2}
\end{figure}

Amplitude modulation (AM) is used to broadcast AM radio, among other
applications. At the transmitter, the signal (which might contain
speech, music, etc.) is ``modulated'' by multiplying it with a cosine
signal that acts as a ``carrier wave''. The result is a
high-frequency wave that is suitable for broadcast by radio. Typical
frequencies for AM radio in the United States are 500--1600 kHz (see
\url{https://en.wikipedia.org/wiki/AM_broadcasting}).

At the receiving end, the broadcast signal is ``demodulated'' to
recover the original signal. Surprisingly, demodulation works by
multiplying the broadcast signal, again, by the same carrier wave.

To see how that works, I'll modulate a signal with a carrier wave at
10 kHz.
Here's the signal:

\begin{verbatim}
filename = '105977__wcfl10__favorite-station.wav'
wave = thinkdsp.read_wave(filename)
wave.unbias()
wave.normalize()
\end{verbatim}

And here's the carrier:

\begin{verbatim}
carrier_sig = thinkdsp.CosSignal(freq=10000)
carrier_wave = carrier_sig.make_wave(duration=wave.duration,
                                     framerate=wave.framerate)
\end{verbatim}

We can multiply them using the {\tt *} operator, which multiplies
the wave arrays elementwise:

\begin{verbatim}
modulated = wave * carrier_wave
\end{verbatim}

The result sounds pretty bad.  You can hear it in {\tt chap11.ipynb}.

Figure~\ref{fig.sampling2} shows what's happening in the frequency
domain.  The top row is the spectrum of the original signal.  The
next row is the spectrum of the modulated signal, after multiplying
by the carrier.  It contains two copies of the original spectrum,
shifted by plus and minus 10 kHz.

To understand why, recall that convolution in the time domain corresponds
to multiplication in the frequency domain.  Conversely, multiplication
in the time domain corresponds to convolution in the frequency domain.
When we multiply the signal by the carrier, we are convolving its
spectrum with the DFT of the carrier.

Since the carrier is a simple cosine wave, its DFT is two impulses, at
plus and minus 10 kHz.  Convolution with these impulses makes
shifted, scaled copies of the spectrum.  Notice that the amplitude of
the spectrum is smaller after modulation.  The energy from the original
signal is split between the copies.

We demodulate the signal by multiplying by the carrier wave again:

\begin{verbatim}
demodulated = modulated * carrier_wave
\end{verbatim}

The third row of Figure~\ref{fig.sampling2} shows the result.
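Algebraically, demodulation works because multiplying by the carrier
twice multiplies the signal by a squared cosine, and the identity
$\cos^2 \theta = (1 + \cos 2\theta)/2$ splits the product into two
parts.  With signal $s(t)$ and carrier frequency $f_c$:

\[ s(t) \, \cos^2 (2 \pi f_c t) =
   \frac{s(t)}{2} + \frac{s(t) \cos (4 \pi f_c t)}{2} \]

The first term is the original signal at half amplitude; the second
is the signal modulated up to twice the carrier frequency, which is
20 kHz in this example.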
Again,
multiplication in the time domain corresponds to convolution in the
frequency domain, which makes shifted, scaled copies of the spectrum.

Since the modulated spectrum contains two peaks, each peak gets split
in half and shifted by plus and minus 20 kHz.  Two of the copies
meet at 0 kHz and get added together; the other two copies end up
centered at plus and minus 20 kHz.

If you listen to the demodulated signal, it sounds pretty good.  The
extra copies of the spectrum add high frequency components that were
not in the original signal, but they are so high that your speakers
probably can't play them, and most people can't hear them.  But if
you have good speakers and good ears, you might.

In that case, you can get rid of the extra components by applying a
low-pass filter:

\begin{verbatim}
demodulated_spectrum = demodulated.make_spectrum(full=True)
demodulated_spectrum.low_pass(10000)
filtered = demodulated_spectrum.make_wave()
\end{verbatim}

The result is quite close to the original wave, although about half
of the power is lost after demodulating and filtering.  But that's not
a problem in practice, because much more of the power is lost in
transmitting and receiving the broadcast signal.  Since we have to
amplify the result anyway, another factor of 2 is not an issue.


\section{Sampling}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=3.0in]{figs/sampling3.pdf}}
\caption{Spectrum of a signal before (top) and after (bottom) sampling.}
\label{fig.sampling3}
\end{figure}

I explained amplitude modulation in part because it is interesting, but
mostly because it will help us understand sampling.
``Sampling'' is
the process of measuring an analog signal at a series of points in
time, usually with equal spacing.

For example, the WAV files we have used as examples were
recorded by sampling the output of a microphone using an analog-to-digital
converter (ADC).  The sampling rate for most of them is 44.1 kHz,
which is the standard rate for ``CD quality'' sound, or 48 kHz, which
is the standard for DVD sound.

At 48 kHz, the folding frequency is 24 kHz, which is higher than most
people can hear (see \url{https://en.wikipedia.org/wiki/Hearing_range}).

In most of these waves, each sample has 16 bits, so there
are $2^{16}$ distinct levels.  This ``bit depth'' turns out to be enough
that adding more bits does not improve the sound quality noticeably
(see \url{https://en.wikipedia.org/wiki/Digital_audio}).

Of course, applications other than audio signals might require higher
sampling rates, in order to capture higher frequencies, or higher
bit depth, in order to reproduce waveforms with more fidelity.

%TODO: refer to Vi Hart's video
%TODO: refer to the sampling video

To demonstrate the effect of the sampling process, I am going to start
with a wave sampled at 44.1 kHz and select samples from it at about 11 kHz.
This is not exactly the same as sampling from an analog signal, but
the effect is the same.

First I'll load a recording of a drum solo:

\begin{verbatim}
filename = '263868__kevcio__amen-break-a-160-bpm.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
\end{verbatim}

Figure~\ref{fig.sampling3} (top) shows the spectrum of this wave.
Now here's the function that samples from the wave:

\begin{verbatim}
def sample(wave, factor=4):
    ys = np.zeros(len(wave))
    ys[::factor] = wave.ys[::factor]
    return thinkdsp.Wave(ys, framerate=wave.framerate)
\end{verbatim}

I'll use it to select every fourth
element:

\begin{verbatim}
sampled = sample(wave, 4)
\end{verbatim}

The result has the same framerate as the original, but most of the
elements are zero.  If you play the sampled wave, it doesn't sound
very good.  The sampling process introduces high-frequency
components that were not in the original.

Figure~\ref{fig.sampling3} (bottom) shows the spectrum of the sampled
wave.  It contains four copies of the original spectrum (it looks like
five copies because one is split between the highest and lowest
frequencies).

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=4in]{figs/sampling9.pdf}}
\caption{The DFT of an impulse train is also an impulse train.}
\label{fig.sampling9}
\end{figure}

To understand where these copies come from, we can think of the
sampling process as multiplication with a series of impulses.  Instead
of using {\tt sample} to select every fourth element, we could use
this function to make a series of impulses, sometimes called an
{\bf impulse train}:

\begin{verbatim}
def make_impulses(wave, factor):
    ys = np.zeros(len(wave))
    ys[::factor] = 1
    ts = np.arange(len(wave)) / wave.framerate
    return thinkdsp.Wave(ys, ts, wave.framerate)
\end{verbatim}

And then multiply the original wave by the impulse train:

\begin{verbatim}
impulses = make_impulses(wave, 4)
sampled = wave * impulses
\end{verbatim}

The result is the same; it still doesn't sound very good, but now
we understand why.  Multiplication in the time domain corresponds
to convolution in the frequency domain.  When we multiply
by an impulse train, we are convolving with the DFT of an
impulse train.  As it turns out, the DFT of an impulse
train is also an impulse train.

Figure~\ref{fig.sampling9} shows two examples.
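We can check that claim numerically.  The following is a small
self-contained NumPy sketch (not code from {\tt chap11.ipynb}): an
impulse train with period 4 in a window of 64 samples has a DFT that
is nonzero at exactly 4 equally spaced frequencies.

```python
import numpy as np

# Impulse train with period 4 in a window of 64 samples.
N, factor = 64, 4
impulses = np.zeros(N)
impulses[::factor] = 1

# The DFT is another impulse train: it is nonzero only at
# multiples of N / factor, each with magnitude N / factor.
mags = np.abs(np.fft.fft(impulses))
peaks = np.nonzero(mags > 1e-9)[0]
print(peaks)        # [ 0 16 32 48]
```

The 16 impulses in the time domain transform into 4 impulses in the
frequency domain, consistent with the general pattern that more
impulses in time mean fewer impulses in frequency.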
The top row is
the impulse train in the example, with frequency 11,025 Hz.
The DFT is a train of 4 impulses, which is why we get 4 copies
of the spectrum.  The bottom row shows an impulse
train with a lower frequency, about 5512 Hz.  Its DFT is a train
of 8 impulses.  In general, more impulses in the time
domain correspond to fewer impulses in the frequency
domain.

In summary:

\begin{itemize}

\item We can think of sampling as multiplication by an impulse train.

\item Multiplying by an impulse train corresponds
to convolution with an impulse train in the frequency domain.

\item Convolution with an impulse train makes multiple copies of the
signal's spectrum.

\end{itemize}


\section{Aliasing}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=5.5in]{figs/sampling4.pdf}}
\caption{Spectrum of the drum solo (top), spectrum of the impulse
train (second row), spectrum of the sampled wave (third row),
after low-pass filtering (bottom).}
\label{fig.sampling4}
\end{figure}

In Section~\ref{am}, after demodulating an AM signal we got rid of
the extra copies of the spectrum by applying a low-pass filter.
We can do the same thing after sampling, but it turns out
not to be a perfect solution.

Figure~\ref{fig.sampling4} shows why not.  The top row is the spectrum
of the drum solo.  It contains high frequency components that extend
past 10 kHz.  When we sample this wave, we convolve the spectrum
with the impulse train (second row), which makes copies of the spectrum
(third row).
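Where the copies overlap, the overlapping components are genuinely
indistinguishable: at the sample times, a frequency above the folding
frequency takes exactly the same values as a frequency below it.
Here is a small NumPy demonstration, independent of the drum example
(the framerate and frequencies are chosen only for illustration):

```python
import numpy as np

framerate = 16                       # folding frequency is 8 Hz
ts = np.arange(framerate) / framerate

# 11 Hz is 8 + 3; 5 Hz is 8 - 3.  Sampled at 16 Hz, they alias
# to each other, so the two sets of samples are identical.
high = np.cos(2 * np.pi * 11 * ts)
low = np.cos(2 * np.pi * 5 * ts)
print(np.allclose(high, low))        # True
```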
The bottom row shows the result after applying a low-pass
filter with a cutoff at the folding frequency, 5512 Hz.

If we convert the result back to a wave, it is similar to the original
wave, but there are two problems:

\begin{itemize}

\item Because of the low-pass filter, the components above 5500 Hz
have been lost, so the result sounds muted.

\item Even the components below 5500 Hz are not quite right, because
they include contributions left over from the spectral copies we
tried to filter out.

\end{itemize}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=4.8in]{figs/sampling5.pdf}}
\caption{Spectrum of a bass guitar solo (top), its spectrum after
sampling (middle), and after filtering (bottom).}
\label{fig.sampling5}
\end{figure}

If the spectral copies overlap after sampling, we lose information
about the spectrum, and we won't be able to recover it.

But if the copies don't overlap, things work out pretty well.  As
a second example, I loaded a recording of a bass guitar solo.

Figure~\ref{fig.sampling5} shows its spectrum (top row), which contains
no visible energy above 5512 Hz.  The second row shows the spectrum of
the sampled wave, and the third row shows the spectrum after the
low-pass filter.  The amplitude is lower because we've filtered out
some of the energy, but the shape of the spectrum is almost exactly
what we started with.
And if we convert back to a wave, it sounds the same.


\section{Interpolation}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling6.pdf}}
\caption{A brick wall low-pass filter (right) and the corresponding
convolution window (left).}
\label{fig.sampling6}
\end{figure}

The low-pass filter I used in the last step is a so-called {\bf brick
wall filter}; frequencies above the cutoff are removed completely,
as if they hit a brick wall.

Figure~\ref{fig.sampling6} (right) shows what this filter looks like.
Of course, multiplication by this filter, in the frequency domain,
corresponds to convolution with a window in the time domain.  We can
find out what that window is by computing the inverse DFT of the
filter, which is shown in Figure~\ref{fig.sampling6} (left).

That function has a name; it is the normalized sinc function, or at
least a discrete approximation of it (see
\url{https://en.wikipedia.org/wiki/Sinc_function}):

\[ \mathrm{sinc}(x) = \frac{\sin \pi x}{\pi x} \]

When we apply the low-pass filter, we are convolving with a sinc
function.  We can think of this convolution as the sum of shifted,
scaled copies of the sinc function.

The value of sinc is 1 at 0 and 0 at every other integer
value of $x$.  When we shift the sinc function, we move the zero point.
When we scale it, we change the height at the zero point.
So when we add up the shifted, scaled copies, they interpolate
between the sampled points.

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling8.pdf}}
\caption{Example of interpolation by adding shifted, scaled copies of
the sinc function.}
\label{fig.sampling8}
\end{figure}

Figure~\ref{fig.sampling8} shows how that works using a short segment
of the bass guitar solo.  The line across the top is the original
wave.  The vertical gray lines show the sampled values.
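This interpolation is easy to sketch with plain NumPy, using {\tt
np.sinc}, which computes the normalized sinc function.  The following
is an illustrative sketch, not code from {\tt chap11.ipynb}; it uses a
synthetic sinusoid rather than the bass guitar recording:

```python
import numpy as np

def sinc_interpolate(samples, ts_in, ts_out):
    """Sum one shifted, scaled copy of sinc per sample."""
    dt = ts_in[1] - ts_in[0]             # sampling interval
    return sum(amp * np.sinc((ts_out - t) / dt)
               for amp, t in zip(samples, ts_in))

# Sample a 1 Hz sinusoid at 8 Hz, well above the Nyquist rate.
ts = np.arange(0, 2, 1/8)
samples = np.sin(2 * np.pi * ts)

# sinc is 1 at its own sample time and 0 at all the others, so
# the interpolation passes through every sampled value exactly.
reconstructed = sinc_interpolate(samples, ts, ts)
```

Between the sample times, the reconstruction of a finite segment is
only approximate near the edges; the exact recovery promised by the
Sampling Theorem assumes the sum runs over all samples.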
The thin
curves are the shifted, scaled copies of the sinc function.
The sum of these sinc functions is identical to the original wave.

Read that last sentence again, because it is more surprising than it
might seem.  Because we started with a signal that contained no energy
above 5512 Hz, and we sampled at 11,025 Hz, we were able to recover
the original spectrum exactly.

In this example, I started with a wave that had already been
sampled at 44,100 Hz, and I resampled it at 11,025 Hz.  After
resampling, the gap between the spectral copies is the sampling
rate, 11.025 kHz.  If the original signal contains components that
exceed half of the sampling rate, 5512 Hz, the copies overlap
and we lose information.

But if the signal is ``bandwidth limited'', that is, if it contains no
energy above 5512 Hz, the spectral copies don't overlap, we don't lose
information, and we can recover the original signal exactly.

This result is known as the Nyquist-Shannon sampling theorem (see
\url{https://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem}).

This example does not prove the Sampling Theorem, but I hope it
helps you understand what it says and why it works.

Notice that the argument I made does
not depend on the original sampling rate, 44.1 kHz.  The result
would be the same if the original had been sampled at a higher
frequency, or even if the original had been a continuous analog
signal: if we sample at framerate $f$, we can recover the original
signal exactly, as long as it contains no energy at frequencies
above $f/2$.


\section{Exercises}

Solutions to these exercises are in {\tt chap11soln.ipynb}.

\begin{exercise}
The code in this chapter is in {\tt chap11.ipynb}.
Read through
it and listen to the examples.
\end{exercise}


\begin{exercise}
Chris ``Monty'' Montgomery has an excellent video called ``D/A and A/D
| Digital Show and Tell''; it demonstrates the Sampling Theorem in
action, and presents lots of other excellent information about
sampling.  Watch it at
\url{https://www.youtube.com/watch?v=cIQ9IXSUzuM}.
\end{exercise}


\begin{exercise}
As we have seen, if you sample a signal at too low a
framerate, frequencies above the folding frequency get aliased.
Once that happens, it is no longer possible to filter out
these components, because they are indistinguishable from
lower frequencies.

It is a good idea to filter out these frequencies {\em before}
sampling; a low-pass filter used for this purpose is called
an {\bf anti-aliasing filter}.

Returning to the drum solo example, apply a low-pass filter
before sampling, then apply the low-pass filter again to remove
the spectral copies introduced by sampling.  The result should
be identical to the filtered signal.
\end{exercise}


\backmatter
\printindex

\end{document}