GitHub Repository: Probability-Statistics-Jupyter-Notebook/probability-statistics-notebook
Path: blob/master/notebook-for-reviewing/chapter_6_descriptive_statistics.ipynb

Kernel: Python 3

In [34]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from scipy.ndimage import mean, median, variance
from stemgraphic import stem_graphic

Chapter 6 Descriptive Statistics

Data Presentation

Show the data

In [6]:

# Load data
data = np.array([45,62,52,72,91,88,64,65,69,59,70,63,80,70,59,87,59,69,68,69,56,59,74,60,79,56,177,61,60,78,66,61,47,63,63,57,77,67,55,55,56,39,65,60,80,41,72,77,54,81,63,70,73,76,61,75,62,59,64,61,70,65,83,61,56,64,72,90,86,63,63,63,65,80,69,62,75,59,81,79,94,63,64,55,61,66,65,72,61,76,48,92,135,67,73,66,143,82,71,51,70,71,45,64,89,66,66,65,60,64,59,93,84,47,48,65,74,57,62,79,62,68,73,54,55,78,69,69,61,186,55,68,76,70,69,61,55,61,82,83,66,59,69,61,93,76,81,65,67,51,69,77,78,63,77,61,61,66,87,53,67,78,68,80,89,77,63,67,95,54,64,63,28,73,75,65,67,62,65,88,78,75,71,72,60,53,67,81,85,71,49,70,49,58,63,105,62,72,66,79])
data = data.reshape(len(data), 1)
data_frame = pd.DataFrame(data, columns=['Service Times'])
print(data_frame)

Out[6]:

     Service Times
0               45
1               62
2               52
3               72
4               91
..             ...
195            105
196             62
197             72
198             66
199             79

[200 rows x 1 columns]

Data graphs

This section includes:

Histograms
Stem-and-leaf plots
Box plots

In [8]:

# Plot a histogram of the service times
data_frame.hist(['Service Times'], grid=False)
plt.show()

Out[8]:

In [11]:

# Plot a stem-and-leaf display of the service times
stem_graphic(data_frame['Service Times'])
plt.show()

Out[11]:
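
Because `stemgraphic` is a third-party package and its output is a figure, a plain-text stem-and-leaf display can be a useful cross-check. The following is a minimal sketch, not part of the original notebook: the helper name `stem_and_leaf` is hypothetical, and it assumes non-negative integer data with tens as stems and units as leaves.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return a text stem-and-leaf display (stems = tens, leaves = units).

    Hypothetical helper for illustration; assumes non-negative integers.
    """
    groups = defaultdict(list)
    for v in sorted(int(v) for v in values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)

    lines = []
    # Walk every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)


# First ten service times from the data cell above.
sample = [45, 62, 52, 72, 91, 88, 64, 65, 69, 59]
print(stem_and_leaf(sample))
```

Passing `data_frame['Service Times']` instead of `sample` should give the display for all 200 observations; the empty stems between the bulk of the data and the largest values make the outliers easy to spot.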

In [13]:

# Draw a horizontal box plot of the service times
plt.boxplot(data_frame, vert=False)
plt.show()

Out[13]:
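
The points a box plot draws beyond its whiskers can also be listed numerically. The sketch below assumes the common 1.5 × IQR rule, which is also the default `whis` setting in `plt.boxplot`; the function name `tukey_outliers` and the short sample are illustrative and not from the original notebook.

```python
import numpy as np


def tukey_outliers(values, k=1.5):
    """Return (lower fence, upper fence, sorted outliers) for the k * IQR rule.

    Hypothetical helper for illustration.
    """
    x = np.asarray(values, dtype=float).ravel()
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Anything strictly outside the fences is flagged as an outlier.
    outliers = np.sort(x[(x < lower) | (x > upper)])
    return lower, upper, outliers


# First ten service times plus one of the large values from the data set.
sample = [45, 62, 52, 72, 91, 88, 64, 65, 69, 59, 177]
lower, upper, outliers = tukey_outliers(sample)
print(lower, upper, outliers)
```

Calling `tukey_outliers(data)` on the full array should flag the same points that appear as individual markers in the box plot.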

Sample Statistics

Mean
Variance
Median
Trimmed Mean
Mode
Quantiles and interquartile range
Coefficient of variation
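
For reference, the usual definitions of these statistics for a sample $x_1, \dots, x_n$ with ordered values $x_{(1)} \le \dots \le x_{(n)}$ are sketched below. The notation is an assumption, not taken from the notebook; the trimmed mean is written in the form that `scipy.stats.trim_mean` computes, with $k = \lfloor r n \rfloor$ observations cut from each tail.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i
  && \text{(sample mean)} \\
s^2 &= \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
  && \text{(sample variance)} \\
\bar{x}_{\text{trim}} &= \frac{1}{n-2k} \sum_{i=k+1}^{n-k} x_{(i)},
  \quad k = \lfloor r n \rfloor
  && \text{(trimmed mean)} \\
\text{IQR} &= Q_3 - Q_1
  && \text{(interquartile range)} \\
\text{CV} &= \frac{s}{\bar{x}}
  && \text{(coefficient of variation)}
\end{align*}
```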

In [39]:

# Calculate the sample mean
mu = mean(data_frame)

# Calculate the sample variance (tstd uses the n - 1 denominator)
var = stats.tstd(data_frame) ** 2

# Calculate the median
medi = median(data_frame)

# Calculate the trimmed mean, cutting a fraction r from each tail
r = 0.05
trim_mean = stats.trim_mean(data_frame, r)

# Calculate the mode
mode = stats.mode(data_frame)

# Calculate the upper and lower quartiles
upper_quantile = stats.mstats.mquantiles(data_frame, prob=[0.75])
lower_quantile = stats.mstats.mquantiles(data_frame, prob=[0.25])

# Calculate the interquartile range
inter_quantile = upper_quantile - lower_quantile

# Calculate the coefficient of variation (variation defaults to ddof=0)
coeff = stats.variation(data_frame)

# Output
print('----- Mean -----\n{}'.format(mu))
print('----- Varn -----\n{}'.format(var))
print('----- Medi -----\n{}'.format(medi))
print('----- Trim -----\n{}'.format(trim_mean))
print('----- Mode -----\n{}'.format(mode))
print('----- Up-Q -----\n{}'.format(upper_quantile))
print('----- Lo-Q -----\n{}'.format(lower_quantile))
print('----- Inte -----\n{}'.format(inter_quantile))
print('----- Coef -----\n{}'.format(coeff))

Out[39]:

----- Mean -----
Service Times    69.345
dtype: float64
----- Varn -----
[309.31253769]
----- Medi -----
66.0
----- Trim -----
[67.88333333]
----- Mode -----
ModeResult(mode=array([[61]]), count=array([[13]]))
----- Up-Q -----
[76.]
----- Lo-Q -----
[61.]
----- Inte -----
[15.]
----- Coef -----
[0.25298522]
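
One detail in the output above: the variance is computed with the n - 1 denominator, while the printed coefficient (0.25298522) is consistent with the `ddof=0` default of `stats.variation`. Using the sample standard deviation instead would give roughly 0.2536. The sketch below, which is not part of the original notebook, recomputes the main statistics with plain NumPy and reports both versions of the coefficient; the function name `summarize` and the small sample are illustrative.

```python
import numpy as np


def summarize(values):
    """Recompute basic sample statistics with NumPy (hypothetical helper)."""
    x = np.asarray(values, dtype=float).ravel()
    mean = x.mean()
    return {
        "mean": mean,
        "variance": x.var(ddof=1),               # n - 1 denominator
        "median": np.median(x),
        "cv_sample": x.std(ddof=1) / mean,       # s / x-bar
        "cv_population": x.std(ddof=0) / mean,   # matches ddof=0
    }


# Small illustrative sample, not the service-time data.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Running `summarize(data)` on the full service-time array should reproduce the mean, variance, and median shown above, with `cv_population` matching the printed coefficient.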
