License: OTHER
Kernel: Python 3
%matplotlib inline
import pymc3 as pm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'retina'
plt.style.use(['seaborn-colorblind', 'seaborn-darkgrid'])

Code 14.1

pancake = np.array([[1, 1], [1, 0], [0, 0]])

# simulate a pancake and return randomly ordered sides
pancakes = np.asarray([np.random.permutation(pancake[np.random.choice(range(3))])
                       for i in range(10000)])
up = pancakes[:, 0]
down = pancakes[:, 1]

# compute proportion 1/1 (BB) out of all 1/1 and 1/0
num_11_10 = np.sum(up == 1)
num_11 = np.sum((up == 1) & (down == 1))
num_11 / num_11_10
0.67182846932698037
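As a quick sanity check (not part of the original code), the exact answer follows from Bayes' theorem and agrees with the simulation above:

# Analytic check: P(down side burnt | up side burnt).
# Each pancake is drawn with probability 1/3; the burnt/burnt pancake always
# shows a burnt side up, the burnt/unburnt pancake does so half of the time.
p_bb = 1/3 * 1.0        # P(BB) * P(up burnt | BB)
p_bu = 1/3 * 0.5        # P(BU) * P(up burnt | BU)
p_bb / (p_bb + p_bu)    # = 2/3, close to the simulated value above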

Code 14.2

d = pd.read_csv('Data/WaffleDivorce.csv', sep=';')
d['log_population'] = np.log(d['Population'])

_, ax = plt.subplots(1, 2, figsize=(18, 5))

# points
ax[0].scatter(d['MedianAgeMarriage'], d['Divorce'], marker='o',
              facecolor='none', edgecolors='k', linewidth=1)
# standard errors
ax[0].errorbar(d['MedianAgeMarriage'], d['Divorce'], d['Divorce SE'].values,
               ls='none', color='k', linewidth=1)
ax[0].set_xlabel('Median age marriage')
ax[0].set_ylabel('Divorce rate')
ax[0].set_ylim(4, 15)

# points
ax[1].scatter(d['log_population'], d['Divorce'], marker='o',
              facecolor='none', edgecolors='k', linewidth=1)
# standard errors
ax[1].errorbar(d['log_population'], d['Divorce'], d['Divorce SE'].values,
               ls='none', color='k', linewidth=1)
ax[1].set_xlabel('log population')
ax[1].set_ylabel('Divorce rate')
ax[1].set_ylim(4, 15);
[Figure: divorce rate (with standard errors) against median age at marriage (left) and log population (right)]

Code 14.3

div_obs = d['Divorce'].values
div_sd = d['Divorce SE'].values
R = d['Marriage'].values
A = d['MedianAgeMarriage'].values
N = len(d)

with pm.Model() as m_14_1:
    sigma = pm.HalfCauchy('sigma', 2.5)
    a = pm.Normal('a', 0., 10.)
    bA = pm.Normal('bA', 0., 10.)
    bR = pm.Normal('bR', 0., 10.)
    mu = a + bA*A + bR*R

    div_est = pm.Normal('div_est', mu, sigma, shape=N)
    obs = pm.Normal('div_obs', div_est, div_sd, observed=div_obs)

    # start value and additional kwarg for NUTS
    start = dict(div_est=div_obs)
    trace_14_1 = pm.sample(4000, tune=1000, njobs=2, start=start,
                           nuts_kwargs=dict(target_accept=.95))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
 96%|█████████▋| 4821/5000 [05:51<00:12, 13.97it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 1 contains 1 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|██████████| 5000/5000 [06:03<00:00, 13.76it/s]
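The warnings above report a few divergent transitions. A quick way to count them (a small check, assuming the sampler statistic is stored in the trace under the name 'diverging', as in PyMC3 3.x):

# Count the divergent transitions NUTS recorded during sampling
diverging = trace_14_1.get_sampler_stats('diverging')
print('Number of divergent transitions: %d' % diverging.sum())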

Code 14.4

pm.summary(trace_14_1, varnames=['div_est', 'a', 'bA', 'bR', 'sigma']).round(2)
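To visualize the shrinkage induced by the measurement-error model, one option (not part of the original notebook) is to compare the observed divorce rates with the posterior means of div_est:

# Compare observed divorce rates with the posterior mean of the estimated (true) rates
div_est_mean = trace_14_1['div_est'].mean(axis=0)

plt.figure(figsize=(6, 6))
plt.scatter(div_obs, div_est_mean, facecolor='none', edgecolors='k')
plt.plot([4, 15], [4, 15], 'k--', alpha=0.5)   # identity line
plt.xlabel('observed divorce rate')
plt.ylabel('posterior mean of div_est');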

Code 14.5

div_obs = d['Divorce'].values
div_sd = d['Divorce SE'].values
mar_obs = d['Marriage'].values
mar_sd = d['Marriage SE'].values
A = d['MedianAgeMarriage'].values
N = len(d)

with pm.Model() as m_14_2:
    sigma = pm.HalfCauchy('sigma', 2.5)
    a = pm.Normal('a', 0., 10.)
    bA = pm.Normal('bA', 0., 10.)
    bR = pm.Normal('bR', 0., 10.)

    mar_est = pm.Flat('mar_est', shape=N)
    mu = a + bA*A + bR*mar_est

    div_est = pm.Normal('div_est', mu, sigma, shape=N)
    obs1 = pm.Normal('div_obs', div_est, div_sd, observed=div_obs)
    obs2 = pm.Normal('mar_obs', mar_est, mar_sd, observed=mar_obs)

    # start value and additional kwarg for NUTS
    start = dict(div_est=div_obs, mar_est=mar_obs)
    trace_14_2 = pm.sample(4000, tune=1000, njobs=3, start=start,
                           nuts_kwargs=dict(target_accept=.95))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
 95%|█████████▍| 4747/5000 [07:42<00:24, 10.45it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 2 contains 1 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|█████████▉| 4999/5000 [08:00<00:00, 14.55it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 0 contains 2 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|██████████| 5000/5000 [08:00<00:00, 10.41it/s]
pm.traceplot(trace_14_2);
[Figure: trace plots for the parameters of m_14_2]

Code 14.6

d = pd.read_csv('Data/milk.csv', sep=';')
d.loc[:, 'neocortex.prop'] = d['neocortex.perc'] / 100
d.loc[:, 'logmass'] = np.log(d['mass'])

Code 14.7

# prep data
kcal = d['kcal.per.g'].values.copy()
logmass = d['logmass'].values.copy()

# PyMC3 can handle missing values quite naturally.
neocortex = d['neocortex.prop'].values.copy()
mask = np.isfinite(neocortex)
neocortex[~mask] = -999
neocortex = np.ma.masked_values(neocortex, value=-999)

# fit model
with pm.Model() as m_14_3:
    sigma = pm.HalfCauchy('sigma', 1.)
    sigma_N = pm.HalfCauchy('sigma_N', 1.)
    nu = pm.Normal('nu', .5, 1.)
    bN = pm.Normal('bN', 0., 10.)
    bM = pm.Normal('bM', 0., 10.)
    a = pm.Normal('a', 0., 100.)

    neocortex_ = pm.Normal('neocortex', nu, sigma_N, observed=neocortex)
    mu = a + bN*neocortex_ + bM*logmass
    kcal_ = pm.Normal('kcal', mu, sigma, observed=kcal)

    trace_14_3 = pm.sample(5000, tune=5000, njobs=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
 96%|█████████▌| 9555/10000 [03:55<00:09, 48.19it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 1 contains 189 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|█████████▉| 9996/10000 [04:04<00:00, 47.86it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:452: UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.664707186287, but should be close to 0.8. Try to increase the number of tuning steps.
  % (self._chain_id, mean_accept, target_accept))
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 0 contains 533 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|██████████| 10000/10000 [04:05<00:00, 40.81it/s]

Code 14.8

# Missing values in PyMC3 are automatically modeled as a new node named *_missing.
pm.summary(trace_14_3, varnames=['neocortex_missing', 'a', 'bN', 'bM', 'nu', 'sigma_N', 'sigma']).round(2)
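The imputed values can also be pulled directly out of the trace, for instance (a small sketch, assuming the automatically created variable is named neocortex_missing as above):

# Posterior means and standard deviations of the imputed neocortex proportions
neocortex_imputed = trace_14_3['neocortex_missing']
print(neocortex_imputed.mean(axis=0).round(2))
print(neocortex_imputed.std(axis=0).round(2))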

Code 14.9

# prep data
neocortex = np.copy(d['neocortex.prop'].values)
mask = np.isfinite(neocortex)

kcal = np.copy(d['kcal.per.g'].values[mask])
logmass = np.copy(d['logmass'].values[mask])
neocortex = neocortex[mask]

# fit model
with pm.Model() as m_14_3cc:
    sigma = pm.HalfCauchy('sigma', 1.)
    bN = pm.Normal('bN', 0., 10.)
    bM = pm.Normal('bM', 0., 10.)
    a = pm.Normal('a', 0., 100.)

    mu = a + bN*neocortex + bM*logmass
    kcal_ = pm.Normal('kcal', mu, sigma, observed=kcal)

    trace_14_3cc = pm.sample(5000, tune=5000, njobs=2)

pm.summary(trace_14_3cc, varnames=['a', 'bN', 'bM', 'sigma']).round(2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
 98%|█████████▊| 9816/10000 [02:19<00:02, 82.03it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 1 contains 7 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|█████████▉| 9994/10000 [02:21<00:00, 82.69it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 0 contains 17 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|██████████| 10000/10000 [02:21<00:00, 70.57it/s]
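A quick comparison of the slope on neocortex between the imputation model and the complete-case model (not in the original notebook) shows how dropping the incomplete rows changes the estimate:

# Posterior mean of bN under imputation (m_14_3) versus complete cases only (m_14_3cc)
print('bN with imputation:      %.2f' % trace_14_3['bN'].mean())
print('bN, complete cases only: %.2f' % trace_14_3cc['bN'].mean())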

Code 14.10

# prep data
kcal = d['kcal.per.g'].values.copy()
logmass = d['logmass'].values.copy()

neocortex = d['neocortex.prop'].values.copy()
mask = np.isfinite(neocortex)
neocortex[~mask] = -999
neocortex = np.ma.masked_values(neocortex, value=-999)

with pm.Model() as m_14_4:
    sigma = pm.HalfCauchy('sigma', 1.)
    sigma_N = pm.HalfCauchy('sigma_N', 1.)
    a_N = pm.Normal('a_N', .5, 1.)
    betas = pm.Normal('bNbMgM', 0., 10., shape=3)  # bN, bM, and gM
    a = pm.Normal('a', 0., 100.)

    nu = a_N + betas[2]*logmass
    neocortex_ = pm.Normal('neocortex', nu, sigma_N, observed=neocortex)

    mu = a + betas[0]*neocortex_ + betas[1]*logmass
    kcal_ = pm.Normal('kcal', mu, sigma, observed=kcal)

    trace_14_4 = pm.sample(5000, tune=5000, njobs=2)

pm.summary(trace_14_4, varnames=['neocortex_missing', 'a', 'bNbMgM', 'a_N', 'sigma_N', 'sigma']).round(2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
100%|█████████▉| 9994/10000 [04:11<00:00, 53.63it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:452: UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.658126861926, but should be close to 0.8. Try to increase the number of tuning steps.
  % (self._chain_id, mean_accept, target_accept))
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 0 contains 507 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
100%|██████████| 10000/10000 [04:12<00:00, 39.66it/s]
/home/osvaldo/Documentos/Proyectos/01_PyMC3/pymc3/pymc3/step_methods/hmc/nuts.py:468: UserWarning: Chain 1 contains 19 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))
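Since m_14_4 also models the relationship between neocortex and log mass, the imputed values should shift relative to m_14_3. One way to check (a sketch, not from the original code) is to compare the imputed posterior means from the two models:

# Compare imputed neocortex proportions with and without logmass in the imputation model
imp_14_3 = trace_14_3['neocortex_missing'].mean(axis=0)
imp_14_4 = trace_14_4['neocortex_missing'].mean(axis=0)

plt.figure(figsize=(6, 6))
plt.scatter(imp_14_3, imp_14_4, facecolor='none', edgecolors='k')
lims = [min(imp_14_3.min(), imp_14_4.min()), max(imp_14_3.max(), imp_14_4.max())]
plt.plot(lims, lims, 'k--', alpha=0.5)   # identity line
plt.xlabel('imputed neocortex (m_14_3)')
plt.ylabel('imputed neocortex (m_14_4)');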

Code 14.11-14

Stan related. As shown above, PyMC3 deals with missing values internally if you represent the observed data as a NumPy masked array: the missing/masked values are replaced with a new random variable that is added to the model (named *_missing).
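As a minimal illustration of that mechanism (a toy sketch, not part of the original notebook), any masked entry in an observed array becomes a free variable that PyMC3 samples along with the other parameters:

# Toy example: a masked observed array turns its missing entry into an 'x_missing' variable
toy = np.ma.masked_values([1.0, 2.0, -999.0, 4.0], value=-999.0)

with pm.Model() as toy_model:
    mu = pm.Normal('mu', 0., 10.)
    x = pm.Normal('x', mu, 1., observed=toy)   # the masked entry becomes 'x_missing'
    toy_trace = pm.sample(1000, tune=1000)

toy_trace['x_missing'].mean(axis=0)  # posterior mean of the imputed entry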

import sys, IPython, scipy, matplotlib, platform

print("This notebook was created on a computer %s running %s and using:\n"
      "Python %s\nIPython %s\nPyMC3 %s\nNumPy %s\nSciPy %s\nMatplotlib %s\n" % (
          platform.machine(), ' '.join(platform.linux_distribution()[:2]),
          sys.version[:5], IPython.__version__, pm.__version__,
          np.__version__, scipy.__version__, matplotlib.__version__))
This notebook was created on a computer x86_64 running debian stretch/sid and using:
Python 3.6.2
IPython 6.1.0
PyMC3 3.2
NumPy 1.13.3
SciPy 0.19.1
Matplotlib 2.1.0