Folder full of pertinent coursework

Kernel: Python 2 (SageMath)

GHCND

Sean Paradiso

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('598354.csv')
data
As described by GHCND_documentation.pdf, this data is 'a composite of climate records from numerous sources that were merged and then subjected to a suite of quality assurance reviews.'
data.describe()
station = data.groupby("STATION").size() #station
#station.index
#station.describe()
The following code lists the column names in compact form for ease of reference.
data.columns
Index([u'STATION', u'STATION_NAME', u'DATE', u'MDPR', u'DAPR', u'PRCP', u'TMAX', u'TMIN', u'TOBS'], dtype='object')
Here we display the names of the stations with recorded data, along with the number of rows recorded for each. The sizes are shown so that, when selecting three stations, we can compare sample sizes and choose stations with a similar number of data points.
name = data.groupby("STATION_NAME").size()
name
STATION_NAME
ACTON CALIFORNIA CA US    610
ACTON ESCONDIDO CANYON CA US    273
ADELANTO 3.1 S CA US    22
ADIN MOUNTAIN CA US    596
ADIN RANGER STATION CA US    582
AHWAHNEE 2.5 NNW CA US    571
ALAMO 1.0 WSW CA US    5
ALBION 4.0 SE CA US    563
ALDER POINT CALIFORNIA CA US    562
ALDER SPRINGS CALIFORNIA CA US    610
ALPINE CA US    610
ALPINE CALIFORNIA CA US    610
ALTA SIERRA 0.4 WSW CA US    567
ALTA SIERRA 1.3 S CA US    45
ALTA SIERRA 1.4 SSW CA US    63
ALTA SIERRA 2.3 WSW CA US    55
ALTADENA 0.7 ESE CA US    583
ALTADENA CA US    577
ALTURAS CA US    497
ALTURAS MUNICIPAL AIRPORT CA US    608
AMBOY CA US    370
AMERICAN CANYON 0.3 S CA US    138
AMERICAN CANYON 3.5 NE CA US    16
ANAHEIM 4.9 E CA US    580
ANAHEIM 4.9 ENE CA US    602
ANAHEIM 7.3 E CA US    394
ANAHEIM CA US    610
ANAHEIM HILLS 1.1 SE CA US    38
ANDERSON 2.6 NE CA US    104
ANDERSON 8.5 WNW CA US    205
...
WINDSOR 0.6 NNE CA US    597
WINDSOR 1.2 NNW CA US    54
WINDSOR 1.4 SE CA US    609
WINDSOR 1.5 WNW CA US    81
WINDSOR 1.8 SE CA US    63
WINTERS CA US    604
WOFFORD HEIGHTS CALIFORNIA CA US    610
WOLVERTON CALIFORNIA CA US    610
WOODACRE 0.6 SW CA US    145
WOODACRE CALIFORNIA CA US    610
WOODLAND 1 WNW CA US    527
WOODLAND 2.8 SE CA US    449
WOODLAND HILLS PIERCE COLLEGE CA US    610
WOODSIDE 3.4 S CA US    558
WOODSIDE FIRE STATION 1 CA US    356
WRIGHTWOOD 1.2 WNW CA US    395
YOLLA BOLLA CALIFORNIA CA US    610
YOSEMITE LAKES 4.7 S CA US    597
YOSEMITE PARK HDQUARTERS CA US    502
YOSEMITE VILLAGE 12 W CA US    606
YREKA 0.9 WNW CA US    481
YREKA 4.5 S CA US    295
YREKA CA US    608
YUCAIPA 1.5 NNE CA US    500
YUCCA MESA CA US    577
YUCCA VALLEY 1.1 SW CA US    29
YUCCA VALLEY 2.7 ENE CA US    531
YUCCA VALLEY CA US    577
YUCCA VALLEY CALIFORNIA CA US    575
YUROK CALIFORNIA CA US    601
Length: 1345, dtype: int64
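The idea of comparing sample sizes before picking stations can be sketched on a small made-up table. The station labels, values, and the `tolerance` threshold below are invented for illustration and are not taken from 598354.csv; only the `groupby(...).size()` call mirrors the notebook.

```python
import pandas as pd

# Hypothetical toy data: three stations with different numbers of rows.
toy = pd.DataFrame({
    "STATION_NAME": ["A"] * 4 + ["B"] * 4 + ["C"] * 2,
    "TMAX": [250, 260, 255, 248, 300, 310, 305, 299, 180, 175],
})

# Number of rows recorded for each station, as in the notebook.
counts = toy.groupby("STATION_NAME").size()

# Keep stations whose row count is within `tolerance` of the largest count.
tolerance = 1  # assumed threshold, chosen only for this example
comparable = counts[counts >= counts.max() - tolerance]

print(comparable.index.tolist())
```

Filtering on the counts this way is one possible way to shortlist stations with similar coverage before choosing three of them by hand.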
The code below is an optional alternative for displaying the station names with a for loop; the loop itself is commented out.
group1 = data.groupby("STATION_NAME")
#for name, group in group1:
#    print(name)
The next three lines select the stations. The three lines after them (tmm1, tmm2, tmm3) narrow the data to the columns we are interested in, i.e. maximum and minimum temperature (TMAX and TMIN, columns 6 and 7).
selection1 = group1.get_group('AMBOY CA US') #selection1
selection2 = group1.get_group('ALTURAS CA US') #selection2
selection3 = group1.get_group('ANAHEIM CA US') #selection3
tmm1 = selection1.iloc[:,6:8] #tmm1
tmm2 = selection2.iloc[:,6:8] #tmm2
tmm3 = selection3.iloc[:,6:8] #tmm3
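Selecting by position with `iloc[:, 6:8]` relies on the column order staying fixed. As a hedged sketch, the same two columns can also be picked by name. The column names come from the `data.columns` output above; the single row of values is invented.

```python
import pandas as pd

# Toy frame with the same column order as data.columns; values are invented.
toy = pd.DataFrame(
    [["S1", "AMBOY CA US", 20140101, -9999, -9999, 0, 200, 50, -9999]],
    columns=["STATION", "STATION_NAME", "DATE", "MDPR", "DAPR",
             "PRCP", "TMAX", "TMIN", "TOBS"],
)

by_position = toy.iloc[:, 6:8]     # columns 6 and 7, as in the notebook
by_label = toy[["TMAX", "TMIN"]]   # the same columns, selected by name

print(by_position.equals(by_label))
```

The label-based form keeps working if columns are later added or reordered, at the cost of requiring the exact column names.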
Here we correct the data: it contains numerous entries of -9999, which is clearly not a recorded value but most likely a placeholder for a missing observation. To make everything readable and plottable, we replace every instance of -9999 with NaN (not a number).
Below these long columns of data is the plot of the minimum and maximum temperatures for our first selection.
corrected1 = tmm1[tmm1 > -9999]
corrected1
corrected1.plot()
[Plot: TMAX and TMIN for AMBOY CA US]
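To make the masking step concrete, here is a minimal sketch on invented readings. It shows that indexing a frame with a boolean frame keeps the shape and leaves NaN where the condition fails, and that `replace` is a more explicit way to get the same effect.

```python
import numpy as np
import pandas as pd

# Invented readings; -9999 stands in for a missing observation.
toy = pd.DataFrame({"TMAX": [250, -9999, 270], "TMIN": [100, 110, -9999]})

# Boolean-mask indexing keeps the frame's shape and puts NaN wherever the
# condition is False, which is what the notebook's expression does.
masked = toy[toy > -9999]

# replace() is a more explicit spelling of the same substitution.
replaced = toy.replace(-9999, np.nan)

print(masked)
print(masked["TMAX"].mean())  # NaN entries are skipped by default
```

Because NaN values are skipped by pandas aggregations and by plotting, the placeholders no longer distort means or the y-axis range.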
We again correct for the -9999 values and plot the necessary data for our second selection.
corrected2 = tmm2[tmm2 > -9999]
corrected2
corrected2.plot()
[Plot: TMAX and TMIN for ALTURAS CA US]
Finally, we apply the same correction and plot the data for our third station.
corrected3 = tmm3[tmm3 > -9999]
corrected3
corrected3.plot()
[Plot: TMAX and TMIN for ANAHEIM CA US]
Here we count how often each precipitation value occurs across all dates in our data; note that the placeholder -9999 alone accounts for 171378 rows. We then narrow the selection further to the required June 2015 span.
prcp = data.groupby("PRCP").size()
prcp
PRCP
-9999    171378
0    403317
2    66
3    7160
4    26
5    5698
6    19
7    10
8    3587
9    8
10    2696
11    14
12    3
13    2494
14    4
15    1761
16    5
17    3
18    1614
19    6
20    1638
21    5
22    5
23    1376
24    8
25    3093
26    9
27    9
28    1277
29    6
...
1740    1
1753    1
1758    1
1765    1
1791    1
1793    1
1808    1
1811    1
1822    1
1859    1
1872    1
1880    1
1892    1
1918    1
1923    1
2017    1
2019    1
2052    2
2090    1
2096    1
2159    1
2179    1
2256    1
2271    1
2304    1
2413    1
2558    1
2753    1
4699    1
12344    1
Length: 707, dtype: int64
date = data.iloc[:, [2, 5]]
date
June2015 = date[516:546]
June2015
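The positional slice `[516:546]` lines up with June 2015 only when the rows run one per day from the start of the record. A date-based filter does not depend on row position. The sketch below assumes that DATE is stored as an integer of the form YYYYMMDD; the notebook does not show the format, so treat this as an assumption, and the values are invented.

```python
import pandas as pd

# Assumption: DATE is an integer of the form YYYYMMDD. Values are invented.
toy = pd.DataFrame({
    "DATE": [20150530, 20150531, 20150601, 20150615, 20150630, 20150701],
    "PRCP": [0, 5, -9999, 13, 0, 25],
})

# Parse the integers into real timestamps.
dates = pd.to_datetime(toy["DATE"].astype(str), format="%Y%m%d")

# Keep the rows that fall in June 2015, wherever they sit in the table.
in_june = (dates >= pd.Timestamp("2015-06-01")) & (dates <= pd.Timestamp("2015-06-30"))
june2015 = toy[in_june]

print(june2015)
```

Applied per station, a filter like this would return the June 2015 rows even for stations with gaps or shorter records.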
As we can see from the data above, every precipitation value for June 2015 is -9999, so we double-check the data and replace these unwanted values, as seen below (the code sets them to 0; the commented-out line would mask them to NaN instead). Our initial result was confirmed, so we have no meaningful June 2015 precipitation to plot, owing to the sheer lack of data.
percip = data.iloc[:, 5:6].copy()
#corper = percip[percip > -9999]
# Set only the -9999 placeholders to 0, leaving real readings untouched
percip[percip <= -9999] = 0
corper = percip['PRCP']
junepercip = corper.iloc[516:546]
junepercip
junepercip.plot()
[Plot: PRCP for rows 516 to 545 of the full data set]
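One pitfall when replacing placeholders is that iterating over a DataFrame yields its column labels rather than its cells, so an assignment inside such a loop overwrites whole columns. The following minimal sketch, on invented readings, shows this and one way to change only the placeholder cells.

```python
import pandas as pd

# Invented precipitation readings with two -9999 placeholders.
toy = pd.DataFrame({"PRCP": [0, -9999, 13, -9999, 25]})

# Iterating over a DataFrame produces column labels, not cell values.
labels = [label for label in toy[toy <= -9999]]
print(labels)

# Assigning through a boolean mask changes only the matching cells.
cleaned = toy.copy()
cleaned[cleaned <= -9999] = 0

print(cleaned["PRCP"].tolist())
```

Working on a copy keeps the original readings available, which is useful if the placeholders should later be treated as NaN rather than 0.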
selection1
s1p = selection1.iloc[:, 5:6].copy()
# Set only the -9999 placeholders to 0
s1p[s1p <= -9999] = 0
s1corper = s1p['PRCP']
# Positional slice; empty when the station has fewer than 517 rows
s1june = s1corper.iloc[516:546]
s1june
s1p.plot()
[Plot: PRCP for AMBOY CA US]
selection2
s2p = selection2.iloc[:, 5:6].copy()
# Set only the -9999 placeholders to 0
s2p[s2p <= -9999] = 0
s2corper = s2p['PRCP']
# Positional slice; empty when the station has fewer than 517 rows
s2june = s2corper.iloc[516:546]
s2june
s2p.plot()
[Plot: PRCP for ALTURAS CA US]
selection3
s3p = selection3.iloc[:, 5:6].copy()
# Set only the -9999 placeholders to 0
s3p[s3p <= -9999] = 0
s3corper = s3p['PRCP']
# Positional slice; empty when the station has fewer than 517 rows
s3june = s3corper.iloc[516:546]
s3june
s3p.plot()
[Plot: PRCP for ANAHEIM CA US]
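The three per-station cells repeat the same steps, so they could be folded into a loop. This is only a sketch: `clean_prcp` is a hypothetical helper that is not part of the notebook, and the rows are invented, although the two station names are ones selected earlier.

```python
import pandas as pd

def clean_prcp(frame, sentinel=-9999):
    """Return the PRCP column with sentinel entries set to 0 (hypothetical helper)."""
    prcp = frame["PRCP"].copy()
    prcp[prcp <= sentinel] = 0
    return prcp

# Invented rows for two of the station names used in the notebook.
toy = pd.DataFrame({
    "STATION_NAME": ["AMBOY CA US"] * 3 + ["ANAHEIM CA US"] * 3,
    "PRCP": [0, -9999, 8, -9999, 3, 5],
})

groups = toy.groupby("STATION_NAME")
cleaned = {name: clean_prcp(groups.get_group(name))
           for name in ["AMBOY CA US", "ANAHEIM CA US"]}

for name, series in cleaned.items():
    print(name, series.tolist())
```

Setting placeholders to 0 follows the notebook's choice; using NaN instead would keep missing days distinguishable from dry ones.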