Introduction to Matplotlib
Get straight into plotting data; that's what we're focused on.
Video 0 will be concepts and contain details like anatomy of a figure. The rest of the videos will be pure code based.
Concepts in Matplotlib
2 ways of creating plots (pyplot & OO) - use the OO method
Plotting data (NumPy arrays), line, scatter, bar, hist, subplots
Plotting data directly with Pandas (using the pandas matplotlib wrapper)
Plotting data (pandas DataFrames) with the OO method, line, scatter, bar, hist, subplots
Customizing your plots: limits, colors, styles, legends
Saving plots
0. Concepts in Matplotlib
What is Matplotlib?
Why Matplotlib?
Anatomy of a figure
Where does Matplotlib fit into the ecosystem?
A Matplotlib workflow
1. 2 ways of creating plots
pyplot
OO
Matplotlib recommends the OO API: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html
Start by importing Matplotlib and setting up the %matplotlib inline magic command.
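Here's a minimal sketch of the import and of both ways to create a plot. The data values are made up for illustration, and the magic command is left as a comment because it is IPython syntax that only works inside a notebook.

```python
import matplotlib.pyplot as plt

# In a Jupyter notebook, run this magic in a cell so figures render inline:
# %matplotlib inline

# 1. pyplot API: plt keeps track of a "current" figure and axes for you.
plt.figure()
plt.plot([1, 2, 3, 4], [11, 22, 33, 44])

# 2. OO API: keep explicit references to the Figure and Axes objects.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [11, 22, 33, 44])
```

Both draw the same line. The OO version gives you `fig` and `ax` to work with later, which is why the rest of these notes use it.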
-> Show figure/plot anatomy here <-
2. Making the most common type of plots using NumPy arrays
Most of figuring out what kind of plot to use is getting a feel for the data, then seeing what suits it best.
Matplotlib visualizations are built off NumPy arrays. So in this section we'll build some of the most common types of plots using NumPy arrays.
line
scatter
bar
hist
subplots()
To make sure we have access to NumPy, we'll import it as np.
Line
Line is the default type of visualization in Matplotlib. Usually, unless specified otherwise, your plots will start out as lines.
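A minimal sketch of a line plot from NumPy arrays with the OO method. The data (x and x squared) and the labels are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced values between 0 and 10
x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, x**2)  # plot() draws a line unless told otherwise
ax.set(title="x squared", xlabel="x", ylabel="x**2")
```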
Scatter
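A minimal sketch of a scatter plot from NumPy arrays. The sine data is an illustrative choice, not taken from the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
# scatter() draws one marker per (x, y) pair instead of joining them with a line
points = ax.scatter(x, np.sin(x))
```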
Bar
Vertical
Horizontal
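A minimal sketch of vertical and horizontal bar plots side by side. The `prices` dictionary is hypothetical data invented for this example.

```python
import matplotlib.pyplot as plt

# Hypothetical data: product name -> price
prices = {"Almond butter": 10, "Peanut butter": 8, "Cashew butter": 12}
names = list(prices.keys())
values = list(prices.values())

fig, (ax_v, ax_h) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Vertical bars: categories on the x-axis, values as bar heights
ax_v.bar(names, values)
ax_v.set(title="Vertical", ylabel="Price ($)")

# Horizontal bars: categories on the y-axis, values as bar widths
ax_h.barh(names, values)
ax_h.set(title="Horizontal", xlabel="Price ($)")
```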
Histogram (hist)
Could show image of normal distribution here
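A minimal sketch of a histogram, assuming we sample 1,000 values from a standard normal distribution. The seed and the bin count are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the same samples come out on every run
rng = np.random.default_rng(seed=42)
x = rng.normal(size=1000)

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges and the drawn bars
counts, bin_edges, _ = ax.hist(x, bins=30)
```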
Subplots
Multiple plots on one figure https://matplotlib.org/3.1.1/gallery/recipes/create_subplots.html
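A minimal sketch of four plots on one figure using `plt.subplots()`. All the data here is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
rng = np.random.default_rng(seed=0)

# A 2x2 grid of axes on a single figure, unpacked row by row
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 5))

ax1.plot(x, x / 2)                            # line
ax2.scatter(rng.random(10), rng.random(10))   # scatter
ax3.bar(["a", "b", "c"], [10, 8, 12])         # bar
ax4.hist(rng.normal(size=1000))               # histogram
```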
3. Plotting data directly with pandas
This section uses the pandas plot() method on a DataFrame (for example df.plot()) to plot columns directly.
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
line
scatter
bar
hist
df.plot(subplots=True, figsize=(6, 6))
To plot data with pandas, we first have to import it as pd.
Now we need some data to check out.
Line
Concept
DataFrame
Often, reading things won't make sense. Practice writing code for yourself, get it out of the docs and into your workspace. See what happens when you run it.
Let's start with trying to replicate the pandas visualization documents.
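A minimal sketch in the spirit of the first example in the pandas visualization guide: a cumulative sum of random values plotted as a line. The start date and number of periods are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# A Series of 1000 random values indexed by consecutive days
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("1/1/2020", periods=1000))

# Running total, so the line wanders like a random walk
ts = ts.cumsum()

# Series.plot() draws a line by default and returns the Matplotlib Axes
ax = ts.plot()
```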
Working with actual data
Let's do a little data manipulation on our car_sales DataFrame.
Scatter
Concept
DataFrame
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-540f318a89d0> in <module>
1 # Doesn't work
----> 2 car_sales.plot(x="Odometer (KM)", y="Price", kind="scatter")
~/Desktop/ml-course/work-in-progress/env/lib/python3.7/site-packages/pandas/plotting/_core.py in __call__(self, *args, **kwargs)
736 if kind in self._dataframe_kinds:
737 if isinstance(data, ABCDataFrame):
--> 738 return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
739 else:
740 raise ValueError(
~/Desktop/ml-course/work-in-progress/env/lib/python3.7/site-packages/pandas/plotting/_matplotlib/__init__.py in plot(data, kind, **kwargs)
59 ax = plt.gca()
60 kwargs["ax"] = getattr(ax, "left_ax", ax)
---> 61 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
62 plot_obj.generate()
63 plot_obj.draw()
~/Desktop/ml-course/work-in-progress/env/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py in __init__(self, data, x, y, s, c, **kwargs)
928 # the handling of this argument later
929 s = 20
--> 930 super().__init__(data, x, y, s=s, **kwargs)
931 if is_integer(c) and not self.data.columns.holds_integer():
932 c = self.data.columns[c]
~/Desktop/ml-course/work-in-progress/env/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py in __init__(self, data, x, y, **kwargs)
870 raise ValueError(self._kind + " requires x column to be numeric")
871 if len(self.data[y]._get_numeric_data()) == 0:
--> 872 raise ValueError(self._kind + " requires y column to be numeric")
873
874 self.x = x
ValueError: scatter requires y column to be numeric
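The error says the y column has to be numeric. Here's a sketch of one way to fix it, assuming Price is stored as strings like "$4,000.00". The rows below are hypothetical stand-ins, since the real car_sales data isn't shown here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for car_sales, with Price stored as text
car_sales = pd.DataFrame({
    "Odometer (KM)": [150043, 87899, 32549, 11179],
    "Price": ["$4,000.00", "$5,000.00", "$7,000.00", "$22,000.00"],
})

# Remove the "$" and "," characters, then convert the text to numbers
car_sales["Price"] = (
    car_sales["Price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# With a numeric y column, the scatter plot now works
ax = car_sales.plot(x="Odometer (KM)", y="Price", kind="scatter")
```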
Bar
Concept
DataFrame
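A minimal sketch of a bar plot straight from a DataFrame. The random data and the column names a to d are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# One group of bars per row, one bar per column within each group
ax = df.plot.bar()
```

`df.plot(kind="bar")` does the same thing.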
Histograms
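A minimal sketch of a histogram from a DataFrame column. The `age` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
# Hypothetical data: 200 ages between 20 and 79
df = pd.DataFrame({"age": rng.integers(20, 80, size=200)})

# bins controls how many bars the value range is split into
ax = df.plot.hist(bins=10)
```

Try a few different values for `bins` to see how the shape of the distribution changes.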
Subplots
Concept
DataFrame
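A minimal sketch of `subplots=True`, which gives every column its own axes. The random data and column names are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((50, 3)), columns=["a", "b", "c"])

# Returns an array holding one Axes per column
axes = df.plot(subplots=True, figsize=(6, 6))
```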

4. Plotting with pandas using the OO method
For more complicated plots, you'll want to use the OO method.
What if we wanted a horizontal line going across with the mean of heart_disease["chol"]?
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.axhline.html
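A minimal sketch of mixing the pandas plot method with the OO method and adding the mean line with `axhline()`. The heart_disease values below are randomly generated stand-ins, and the `age` column is an assumption. Only `chol` is named in these notes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
# Hypothetical stand-in for the heart_disease DataFrame
heart_disease = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "chol": rng.integers(120, 400, size=100),
})

# Create the figure and axes first, then tell pandas to draw on that axes
fig, ax = plt.subplots(figsize=(10, 6))
heart_disease.plot(kind="scatter", x="age", y="chol", ax=ax)

# Dashed horizontal line at the mean cholesterol value
mean_line = ax.axhline(heart_disease["chol"].mean(), linestyle="--")
```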
Adding another plot to existing styled one
5. Customizing your plots
limits (xlim, ylim), colors, styles, legends
Style
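A minimal sketch of changing the style. The available style names depend on your Matplotlib version, so list them first. "ggplot" is used here as an example choice.

```python
import matplotlib.pyplot as plt

# See which style names this Matplotlib install offers
print(plt.style.available)

# Applies to every plot created after this call
plt.style.use("ggplot")

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [11, 22, 33, 44])
```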
Changing the title, legend, axes
Changing the cmap
Changing the xlim & ylim
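A minimal sketch that puts the title, legend, cmap and axis limits together on one scatter plot. The heart_disease values are randomly generated stand-ins, the `age` and `target` columns are assumptions, and the limits are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
# Hypothetical stand-in for the heart_disease DataFrame
heart_disease = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "chol": rng.integers(120, 400, size=100),
    "target": rng.integers(0, 2, size=100),
})

fig, ax = plt.subplots(figsize=(10, 6))

# c colors each point by its target value, cmap picks the color scheme
scatter = ax.scatter(heart_disease["age"], heart_disease["chol"],
                     c=heart_disease["target"], cmap="winter")

# Title and axis labels
ax.set(title="Heart Disease and Cholesterol Levels",
       xlabel="Age", ylabel="Cholesterol")

# Axis limits
ax.set_xlim([45, 100])
ax.set_ylim([100, 400])

# Legend built from the scatter's color values
ax.legend(*scatter.legend_elements(), title="Target")
```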
6. Saving plots
Saving plots to images using fig.savefig()
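A minimal sketch of saving a figure to an image file with `savefig()`. The filename is hypothetical, and the file goes to the system temp folder so nothing gets overwritten in your working directory.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [11, 22, 33, 44])

# Hypothetical output path; the file extension decides the image format
out_path = os.path.join(tempfile.gettempdir(), "simple-plot.png")
fig.savefig(out_path)
```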
If you're doing something like this often, to save writing excess code, you might put it into a function.
A function which follows the Matplotlib workflow.
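A minimal sketch of what such a function could look like. The name `plotting_workflow`, its parameters and the labels are all hypothetical. Treat it as one possible shape, not the only one.

```python
import matplotlib.pyplot as plt


def plotting_workflow(data1, data2, save_path=None):
    """Hypothetical helper that follows the Matplotlib workflow.

    1. Set up the figure and axes
    2. Plot the data
    3. Customize the plot
    4. Optionally save it
    """
    # 1. Set up the figure and axes
    fig, ax = plt.subplots(figsize=(10, 6))

    # 2. Plot the data
    ax.plot(data1, data2)

    # 3. Customize the plot
    ax.set(title="Simple plot", xlabel="x-axis", ylabel="y-axis")

    # 4. Save only if a path was given
    if save_path is not None:
        fig.savefig(save_path)

    return fig, ax


fig, ax = plotting_workflow([1, 2, 3, 4], [11, 22, 33, 44])
```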