License: OTHER
Kernel: Python 3

Examples and Exercises from Think Stats, 2nd Edition

http://thinkstats2.com

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT

from __future__ import print_function, division
import nsfg

Examples from Chapter 1

Read NSFG data into a Pandas DataFrame.

preg = nsfg.ReadFemPreg()
preg.head()

Print the column names.

preg.columns
Index(['caseid', 'pregordr', 'howpreg_n', 'howpreg_p', 'moscurrp', 'nowprgdk', 'pregend1', 'pregend2', 'nbrnaliv', 'multbrth', ... 'laborfor_i', 'religion_i', 'metro_i', 'basewgt', 'adj_mod_basewgt', 'finalwgt', 'secu_p', 'sest', 'cmintvw', 'totalwgt_lb'], dtype='object', length=244)

Select a single column name.

preg.columns[1]
'pregordr'

Select a column and check what type it is.

pregordr = preg['pregordr']
type(pregordr)
pandas.core.series.Series

Print a column.

pregordr
0        1
1        2
2        1
3        2
4        3
5        1
6        2
7        3
8        1
9        2
10       1
11       1
12       2
13       3
14       1
15       2
16       3
17       1
18       2
19       1
20       2
21       1
22       2
23       1
24       2
25       3
26       1
27       1
28       2
29       3
        ..
13563    2
13564    3
13565    1
13566    1
13567    1
13568    2
13569    1
13570    2
13571    3
13572    4
13573    1
13574    2
13575    1
13576    1
13577    2
13578    1
13579    2
13580    1
13581    2
13582    3
13583    1
13584    2
13585    1
13586    2
13587    3
13588    1
13589    2
13590    3
13591    4
13592    5
Name: pregordr, Length: 13593, dtype: int64

Select a single element from a column.

pregordr[0]
1

Select a slice from a column.

pregordr[2:5]
2    1
3    2
4    3
Name: pregordr, dtype: int64
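In the cells above, pregordr has the default integer index 0 through 13592, so a row's label and its position are the same number. The two can differ once a Series is filtered or reordered. The sketch below uses a small made-up Series (not NSFG data) to show the explicit `.loc` (by label) and `.iloc` (by position) accessors, which make the intent unambiguous.

```python
import pandas as pd

# Toy Series whose index labels run in reverse, so label != position.
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

by_label = s.loc[0]        # the element whose index label is 0
by_position = s.iloc[0]    # the first element, whatever its label
pos_slice = s.iloc[1:3]    # positions 1 and 2 (end is exclusive)

print(by_label, by_position)
print(pos_slice)
```

With a default index, as in preg, both accessors return the same elements.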

Select a column using dot notation.

pregordr = preg.pregordr

Count the number of times each value occurs.

preg.outcome.value_counts().sort_index()
1    9148
2    1862
3     120
4    1921
5     190
6     352
Name: outcome, dtype: int64

Check the values of another variable.

preg.birthwgt_lb.value_counts().sort_index()
0.0        8
1.0       40
2.0       53
3.0       98
4.0      229
5.0      697
6.0     2223
7.0     3049
8.0     1889
9.0      623
10.0     132
11.0      26
12.0      10
13.0       3
14.0       3
15.0       1
Name: birthwgt_lb, dtype: int64

Make a dictionary that maps from each respondent's caseid to a list of indices into the pregnancy DataFrame. Use it to select the pregnancy outcomes for a single respondent.

caseid = 10229
preg_map = nsfg.MakePregMap(preg)
indices = preg_map[caseid]
preg.outcome[indices].values
array([4, 4, 4, 4, 4, 4, 1])
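To make the idea of such a map concrete, here is a minimal sketch of one way to build a dictionary from case ID to row positions. The function make_index_map and the toy lists are hypothetical illustrations, not the actual implementation of nsfg.MakePregMap, and the data is made up.

```python
from collections import defaultdict


def make_index_map(caseids):
    """Map each case ID to the list of row positions where it appears."""
    index_map = defaultdict(list)
    for position, caseid in enumerate(caseids):
        index_map[caseid].append(position)
    return index_map


# Toy data: one entry per pregnancy, in row order.
toy_caseids = [1, 1, 2, 5, 5, 5]
toy_outcomes = [1, 4, 1, 4, 4, 1]

toy_map = make_index_map(toy_caseids)

# Look up the rows for respondent 5, then pull out their outcomes.
outcomes_for_5 = [toy_outcomes[i] for i in toy_map[5]]
print(toy_map[5], outcomes_for_5)
```

Building the map once costs a single pass over the rows; after that, each respondent lookup is a dictionary access rather than a scan of the whole table.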

Exercises

Select the birthord column, print the value counts, and compare to results published in the codebook.

# Solution goes here

We can also use isnull to count the number of NaN values.

preg.birthord.isnull().sum()
4445
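By default, value_counts leaves missing values out of its tally, which is why they have to be counted separately with isnull. The sketch below shows this on a small made-up Series (the values are illustrative, not NSFG data); passing dropna=False is one way to see the NaN count alongside the others.

```python
import numpy as np
import pandas as pd

# Toy data with two missing entries.
birth_order = pd.Series([1, 2, np.nan, 1, np.nan, 3])

# Default behaviour: NaN is excluded from the counts.
counts = birth_order.value_counts().sort_index()

# Count the missing entries directly.
n_missing = birth_order.isnull().sum()

# Include NaN as its own category.
counts_with_nan = birth_order.value_counts(dropna=False)

print(counts)
print(n_missing)
print(counts_with_nan)
```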

Select the prglngth column, print the value counts, and compare to results published in the codebook.

# Solution goes here

To compute the mean of a column, you can invoke the mean method on a Series. For example, here is the mean birthweight in pounds:

preg.totalwgt_lb.mean()
7.265628457623368

Create a new column named totalwgt_kg that contains birth weight in kilograms. Compute its mean. Remember that when you create a new column, you have to use dictionary syntax, not dot notation.

# Solution goes here
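The difference between dictionary syntax and dot notation when creating a column can be seen on a toy DataFrame. This is a sketch on made-up data; the column names weight_lb, weight_kg, and weight_oz are hypothetical and are not NSFG variables. Assigning through dot notation sets an ordinary Python attribute on the object instead of adding a column, and pandas emits a warning about it (suppressed here).

```python
import warnings

import pandas as pd

KG_PER_LB = 0.45359237  # exact definition of the avoirdupois pound

df = pd.DataFrame({"weight_lb": [6.0, 7.5, 8.25]})

# Dictionary (bracket) syntax creates a real column.
df["weight_kg"] = df["weight_lb"] * KG_PER_LB

# Dot notation does NOT create a column; it only sets an attribute.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.weight_oz = df["weight_lb"] * 16

print(list(df.columns))
print(df["weight_kg"].mean())
```

Dot notation remains fine for reading a column that already exists.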

nsfg.py also provides ReadFemResp, which reads the female respondents file and returns a DataFrame:

resp = nsfg.ReadFemResp()

DataFrame provides a method head that displays the first five rows:

resp.head()

Select the age_r column from resp and print the value counts. How old are the youngest and oldest respondents?

# Solution goes here

We can use the caseid to match up rows from resp and preg. For example, we can select the row from resp for caseid 2298 like this:

resp[resp.caseid==2298]

And we can get the corresponding rows from preg like this:

preg[preg.caseid==2298]
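Both selections rely on boolean indexing: the comparison produces a Series of True/False values, and indexing the DataFrame with it keeps only the rows marked True. The sketch below walks through the steps on two small made-up tables; resp_toy and preg_toy and their values are illustrative stand-ins, not the NSFG files.

```python
import pandas as pd

# Toy stand-ins: one row per respondent, one row per pregnancy.
resp_toy = pd.DataFrame({"caseid": [1, 2, 3], "age_r": [30, 25, 41]})
preg_toy = pd.DataFrame({
    "caseid": [1, 1, 3, 3, 3],
    "prglngth": [39, 38, 40, 9, 39],
})

# Step 1: the comparison yields a boolean mask, one entry per row.
mask = preg_toy.caseid == 3

# Step 2: indexing with the mask keeps the matching rows.
rows = preg_toy[mask]

# Step 3: pull a column out of the selection as a NumPy array.
lengths = rows.prglngth.values

# A respondent matches a single row, so take the first element.
age = resp_toy[resp_toy.caseid == 3].age_r.values[0]

print(lengths, age)
```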

How old is the respondent with caseid 1?

# Solution goes here

What are the pregnancy lengths for the respondent with caseid 2298?

# Solution goes here

What was the birthweight of the first baby born to the respondent with caseid 5012?

# Solution goes here