
Kernel: Python 3

Examples and Exercises from Think Stats, 2nd Edition

MIT License: https://opensource.org/licenses/MIT

In [1]:

from __future__ import print_function, division

import nsfg

Examples from Chapter 1

Read NSFG data into a Pandas DataFrame.

In [2]:

preg = nsfg.ReadFemPreg()
preg.head()

Out[2]:

Print the column names.

In [3]:

preg.columns

Out[3]:

Index(['caseid', 'pregordr', 'howpreg_n', 'howpreg_p', 'moscurrp', 'nowprgdk',
       'pregend1', 'pregend2', 'nbrnaliv', 'multbrth',
       ...
       'laborfor_i', 'religion_i', 'metro_i', 'basewgt', 'adj_mod_basewgt',
       'finalwgt', 'secu_p', 'sest', 'cmintvw', 'totalwgt_lb'],
      dtype='object', length=244)

Select a single column name.

In [4]:

preg.columns[1]

Out[4]:

'pregordr'

Select a column and check what type it is.

In [5]:

pregordr = preg['pregordr']
type(pregordr)

Out[5]:

pandas.core.series.Series

Print a column.

In [6]:

pregordr

Out[6]:

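0        1
1        2
2        1
3        2
4        3
5        1
6        2
7        3
8        1
9        2
10       1
11       1
12       2
13       3
14       1
15       2
16       3
17       1
18       2
19       1
20       2
21       1
22       2
23       1
24       2
25       3
26       1
27       1
28       2
29       3
        ..
13563    2
13564    3
13565    1
13566    1
13567    1
13568    2
13569    1
13570    2
13571    3
13572    4
13573    1
13574    2
13575    1
13576    1
13577    2
13578    1
13579    2
13580    1
13581    2
13582    3
13583    1
13584    2
13585    1
13586    2
13587    3
13588    1
13589    2
13590    3
13591    4
13592    5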
Name: pregordr, Length: 13593, dtype: int64

Select a single element from a column.

In [7]:

pregordr[0]

Out[7]:

1

Select a slice from a column.

In [8]:

pregordr[2:5]

Out[8]:

2    1
3    2
4    3
Name: pregordr, dtype: int64

Select a column using dot notation.

In [9]:

pregordr = preg.pregordr

Count the number of times each value occurs.

In [10]:

preg.outcome.value_counts().sort_index()

Out[10]:

1    9148
2    1862
3     120
4    1921
5     190
6     352
Name: outcome, dtype: int64
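To see what value_counts and sort_index each contribute, here is a small sketch on made-up data. The values are invented for illustration and are not NSFG data. value_counts orders its result by frequency, and sort_index then reorders it by the value being counted.

```python
import pandas as pd

# Toy outcome codes, invented for illustration (not NSFG data).
outcomes = pd.Series([1, 4, 1, 2, 1, 4], name='outcome')

# value_counts() tallies each distinct value, most frequent first.
counts = outcomes.value_counts()

# sort_index() reorders the tallies by the value itself (1, 2, 4).
sorted_counts = counts.sort_index()
print(sorted_counts)
```

Sorting by index makes the result easier to compare line by line with a codebook table, which typically lists values in ascending order.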

Check the values of another variable.

In [11]:

preg.birthwgt_lb.value_counts().sort_index()

Out[11]:

0.0        8
1.0       40
2.0       53
3.0       98
4.0      229
5.0      697
6.0     2223
7.0     3049
8.0     1889
9.0      623
10.0     132
11.0      26
12.0      10
13.0       3
14.0       3
15.0       1
Name: birthwgt_lb, dtype: int64

Make a dictionary that maps from each respondent's caseid to a list of indices into the pregnancy DataFrame. Use it to select the pregnancy outcomes for a single respondent.

In [12]:

caseid = 10229
preg_map = nsfg.MakePregMap(preg)
indices = preg_map[caseid]
preg.outcome[indices].values

Out[12]:

array([4, 4, 4, 4, 4, 4, 1])
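The idea behind such a map can be sketched in plain Python. This is a minimal illustration of the technique, not necessarily how nsfg.MakePregMap is implemented. The function name make_index_map and the caseids below are hypothetical.

```python
from collections import defaultdict

def make_index_map(caseids):
    """Map each caseid to the list of row positions where it appears."""
    index_map = defaultdict(list)
    for position, caseid in enumerate(caseids):
        index_map[caseid].append(position)
    return index_map

# Toy caseids, invented for illustration: respondent 10 has rows 0, 1 and 3.
caseids = [10, 10, 20, 10, 30]
index_map = make_index_map(caseids)
print(index_map[10])
```

Building the map takes one pass over the rows, after which looking up all rows for a respondent is a single dictionary access.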

Exercises

Select the birthord column, print the value counts, and compare to the results published in the codebook.

In [13]:

# Solution goes here

We can also use isnull to count the number of NaNs (missing values).

In [14]:

preg.birthord.isnull().sum()

Out[14]:

4445
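Counting NaNs separately matters because value_counts leaves them out by default. The sketch below shows this on a toy column. The data is invented for illustration and only reuses the name birthord.

```python
import pandas as pd

# Toy column with two missing values, invented for illustration.
birthord = pd.Series([1.0, float('nan'), 2.0, float('nan'), 1.0])

# isnull() gives a boolean Series; summing it counts the True entries.
n_missing = int(birthord.isnull().sum())

# value_counts() skips NaN by default, so its total is smaller than len().
n_counted = int(birthord.value_counts().sum())
print(n_missing, n_counted, len(birthord))
```

The counted and missing entries together account for every row, which is a quick consistency check when comparing against a codebook.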

Select the prglngth column, print the value counts, and compare to the results published in the codebook.

In [15]:

# Solution goes here

To compute the mean of a column, you can invoke the mean method on a Series. For example, here is the mean birthweight in pounds:

In [16]:

preg.totalwgt_lb.mean()

Out[16]:

7.265628457623368

Create a new column named totalwgt_kg that contains birth weight in kilograms. Compute its mean. Remember that when you create a new column, you have to use dictionary syntax, not dot notation.

In [17]:

# Solution goes here
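As a hedged illustration of the dictionary-syntax rule, here is a sketch on a toy DataFrame. The column names weight_lb and weight_kg and the values are hypothetical. The conversion factor is the standard definition of the pound in kilograms.

```python
import pandas as pd

# Toy weights in pounds, invented for illustration (not NSFG data).
df = pd.DataFrame({'weight_lb': [6.0, 7.5, 8.0]})

KG_PER_LB = 0.45359237  # exact by definition of the international pound

# Dictionary syntax creates the column. Assigning to df.weight_kg instead
# would only set an attribute on the object, not add a column.
df['weight_kg'] = df['weight_lb'] * KG_PER_LB

# Dot notation works for reading a column once it exists.
print(df.weight_kg.mean())
```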

nsfg.py also provides ReadFemResp, which reads the female respondents file and returns a DataFrame:

In [18]:

resp = nsfg.ReadFemResp()

DataFrame provides a method head that displays the first five rows:

In [19]:

resp.head()

Out[19]:

Select the age_r column from resp and print the value counts. How old are the youngest and oldest respondents?

In [20]:

# Solution goes here

We can use the caseid to match up rows from resp and preg. For example, we can select the row from resp for caseid 2298 like this:

In [21]:

resp[resp.caseid==2298]

Out[21]:

And we can get the corresponding rows from preg like this:

In [22]:

preg[preg.caseid==2298]

Out[22]:
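Both selections rely on boolean indexing: comparing a column to a value produces a True/False mask, and indexing with that mask keeps the matching rows. A minimal sketch on toy tables follows. The values are invented for illustration, and only the column names come from the NSFG data.

```python
import pandas as pd

# Toy tables sharing a caseid key, invented for illustration.
resp = pd.DataFrame({'caseid': [1, 2, 3], 'age_r': [30, 25, 41]})
preg = pd.DataFrame({'caseid': [2, 2, 3], 'prglngth': [39, 40, 38]})

# Comparing a column to a value yields a boolean mask, one entry per row.
mask = preg.caseid == 2

# Indexing with the mask keeps only the rows where it is True.
rows = preg[mask]
print(rows.prglngth.values)
```

A respondent appears once in resp but may appear several times in preg, so the same caseid can select one row from the first table and several from the second.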

How old is the respondent with caseid 1?

In [23]:

# Solution goes here

What are the pregnancy lengths for the respondent with caseid 2298?

In [24]:

# Solution goes here

What was the birthweight of the first baby born to the respondent with caseid 5012?

In [25]:

# Solution goes here

In [ ]:
