📚 The CoCalc Library - books, templates and other resources

License: OTHER
# This is just one possible solution; there are
# several ways to do this using `delayed`.
# `filenames` is assumed to be a list of CSV paths defined earlier.

import pandas as pd
from dask import compute, delayed

sums = []
counts = []
for fn in filenames:
    # Read in file (lazily: nothing is loaded until `compute` is called)
    df = delayed(pd.read_csv)(fn)

    # Group by origin airport
    by_origin = df.groupby('Origin')

    # Sum of all departure delays by origin
    total = by_origin.DepDelay.sum()

    # Number of flights by origin
    count = by_origin.DepDelay.count()

    # Save the intermediates
    sums.append(total)
    counts.append(count)

# Compute the intermediates in a single call so Dask can run the files in parallel
sums, counts = compute(sums, counts)

# Combine intermediates to get total mean-delay-per-origin
total_delays = sum(sums)
n_flights = sum(counts)
mean = total_delays / n_flights
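
The combine step above relies on the fact that a mean can be rebuilt from per-file sums and counts. The following is a minimal pure-Python sketch of that idea, without Dask or pandas. The `chunks` data and the helper names `partial_stats` and `combine` are hypothetical and exist only for illustration. One difference worth noting: adding pandas Series aligns on the index and yields NaN for labels missing from either side, so the solution above implicitly assumes every file contains every origin, whereas this sketch tolerates a missing origin.

```python
from collections import defaultdict

# Hypothetical per-file records: (origin, departure delay in minutes).
chunks = [
    [("EWR", 10.0), ("JFK", 4.0), ("EWR", 20.0)],
    [("JFK", 8.0), ("LGA", -2.0)],
]


def partial_stats(rows):
    """Return per-origin delay sums and flight counts for one chunk."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for origin, delay in rows:
        sums[origin] += delay
        counts[origin] += 1
    return sums, counts


def combine(partials):
    """Merge per-chunk (sums, counts) pairs into overall per-origin means."""
    total = defaultdict(float)
    n = defaultdict(int)
    for sums, counts in partials:
        for origin, s in sums.items():
            total[origin] += s
            n[origin] += counts[origin]
    return {origin: total[origin] / n[origin] for origin in total}


mean_delay = combine(partial_stats(rows) for rows in chunks)
print(mean_delay)
```

Plain dictionaries are used here instead of `collections.Counter` because `Counter` addition discards non-positive totals, and departure delays can be negative.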