Master Python Notebook
Common Settings and Imports
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-2-d7b84aac1a98> in <module>()
7 # Our numerical workhorses
8 import numpy as np
----> 9 import pandas as pd
10 import scipy.integrate
11
ImportError: No module named 'pandas'
To generate a floating table of contents, use this code:
Pandas
Pivot Tables
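The original cell for this heading is not available, so the following is only a minimal sketch of a pandas pivot table. The `sales` frame and its column names are invented for illustration and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only.
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "East"],
        "product": ["A", "B", "A", "B", "A"],
        "units": [10, 4, 7, 3, 5],
    }
)

# Rows are regions, columns are products, cells hold the summed units.
table = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="units",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

Swapping `aggfunc` for another aggregation such as `"mean"` changes what each cell reports without changing the layout.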
Permutations
LaTeX and Markdown
Use single dollar signs for inline LaTeX and double dollar signs for block LaTeX; inline math flows within the sentence, while block math is set on its own line.
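As an illustration, the Markdown source below would render one inline formula and one block formula. The formulas are placeholders chosen for this sketch, not the ones from the original notebook cell.

```latex
The identity $e^{i\pi} + 1 = 0$ is written inline.

$$
\int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi}
$$
```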
Markdown also supports bulleted lists:
* item 1
* other items
Plotting
XLWINGS
This is an area I need to look at in more detail.
General Python
Multiple Find and Replace
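One possible way to do several replacements in a single pass is to build one regular expression from all the search strings. This is a sketch only; the helper name `multiple_replace` and the sample text are assumptions, not the notebook's original code.

```python
import re


def multiple_replace(text, replacements):
    """Apply every old -> new pair in `replacements` in one pass over `text`."""
    if not replacements:
        return text
    # Longer keys first, so overlapping keys prefer the longest match.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # Look up the replacement for whichever key matched.
    return pattern.sub(lambda match: replacements[match.group(0)], text)


swapped = multiple_replace("cat chases dog", {"cat": "dog", "dog": "cat"})
print(swapped)
```

Because the text is scanned only once, a replacement is never itself replaced, which is what lets the two words swap cleanly. Chained `str.replace` calls would not do that.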
Sets
List All Files and Folders in a Directory
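A minimal sketch using `os.walk` from the standard library is shown here. The helper name `list_files_and_folders` is hypothetical, and the demo builds a throwaway directory so it does not depend on any local path.

```python
import os
import tempfile


def list_files_and_folders(root):
    """Return (folders, files) found under `root`, as paths relative to `root`."""
    folders, files = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames:
            folders.append(os.path.normpath(os.path.join(rel, name)))
        for name in filenames:
            files.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(folders), sorted(files)


# Demo on a temporary directory tree created just for this example.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data", "raw"))
    open(os.path.join(root, "data", "notes.txt"), "w").close()
    folders, files = list_files_and_folders(root)

print(folders)
print(files)
```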
Databases
Idioms
Vibration Data Processing
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-6-2ff34b5bf648> in <module>()
10
11 #Load Data (assumes two column array
---> 12 df = pd.read_csv(file_path,delimiter=',',header=None,names=["time","data"])
13 t = df["time"]
14 x = df["data"]
/usr/lib/python3/dist-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
461 skip_blank_lines=skip_blank_lines)
462
--> 463 return _read(filepath_or_buffer, kwds)
464
465 parser_f.__name__ = name
/usr/lib/python3/dist-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
237
238 # Create the parser.
--> 239 parser = TextFileReader(filepath_or_buffer, **kwds)
240
241 if (nrows is not None) and (chunksize is not None):
/usr/lib/python3/dist-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
551 self.options['has_index_names'] = kwds['has_index_names']
552
--> 553 self._make_engine(self.engine)
554
555 def _get_options_with_defaults(self, engine):
/usr/lib/python3/dist-packages/pandas/io/parsers.py in _make_engine(self, engine)
688 def _make_engine(self, engine='c'):
689 if engine == 'c':
--> 690 self._engine = CParserWrapper(self.f, **self.options)
691 else:
692 if engine == 'python':
/usr/lib/python3/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1050 kwds['allow_leading_cols'] = self.index_col is not False
1051
-> 1052 self._reader = _parser.TextReader(src, **kwds)
1053
1054 # XXX
parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3265)()
parser.pyx in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5703)()
OSError: File b'C:/Users/Stevens/Google Drive/Work/Nick/Software - Engineering/Python, Tensorflow, Machine Learning/vibration-data-examples-CSV/aircraft_takeoff.csv' does not exist
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-7-002640c9fd18> in <module>()
1 #Plot Data
2 plt.figure(1)
----> 3 plt.plot(t, x)
4 plt.xlabel('Time (seconds)')
5 plt.ylabel('Accel (g)')
NameError: name 't' is not defined
Statistics
If we want to plot the 95% confidence interval for the mean of our data samples, we can use the bootstrap. The basic idea is simple: draw many samples with replacement from the available data, estimate the mean from each sample, then rank-order the means and take the 2.5th and 97.5th percentile values as the bounds of the 95% confidence interval. Unlike a 95% CI calculated under normal assumptions, the bootstrap results remain robust even if the underlying data are very far from normal.
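The procedure just described could be sketched as follows. The signature `bootstrap(data, num_samples, statistic, alpha)` and the synthetic skewed data are assumptions made for this example; the notebook's original cell is not available. The statistic passed in is assumed to accept an `axis` keyword, as `np.mean` does.

```python
import numpy as np


def bootstrap(data, num_samples, statistic, alpha, rng=None):
    """Return the (low, high) bounds of the 100*(1-alpha)% bootstrap CI.

    `statistic` is assumed to accept an `axis` keyword, as np.mean does.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = len(data)
    # Each row of `idx` selects n observations with replacement.
    idx = rng.integers(0, n, size=(num_samples, n))
    # One value of the statistic per resample.
    stats = statistic(data[idx], axis=1)
    # Percentiles of the resampled statistics give the interval bounds.
    low, high = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high


# Synthetic, deliberately skewed data (an assumption made for this example).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)

low, high = bootstrap(x, 10000, np.mean, 0.05, rng=rng)
print(f"sample mean = {x.mean():.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Drawing all resamples as one index array keeps the computation vectorized, at the cost of holding a `num_samples` by `len(data)` array in memory.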
Note that the bootstrap function is a higher-order function and will return the bootstrap CI for any valid statistical function, not just the mean. For example, to find the 95% CI for the standard deviation, we only need to change np.mean to np.std in the arguments.
Permutation resampling is another form of simulation-based statistical calculation. It is often used to evaluate the p-value for the difference between two groups under the null hypothesis that the groups are invariant under label permutation. For example, in a case-control study, it can be used to find the p-value for the hypothesis that the mean of the case group differs from that of the control group when we cannot use the t-test because the distributions are highly skewed.
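A two-sided permutation test for a difference in means might look like the sketch below. The function name `permutation_test`, the number of permutations, and the lognormal sample data are all assumptions chosen for illustration.

```python
import numpy as np


def permutation_test(case, control, num_permutations=10000, rng=None):
    """Two-sided permutation p-value for a difference in group means."""
    rng = np.random.default_rng() if rng is None else rng
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = abs(case.mean() - control.mean())
    pooled = np.concatenate([case, control])
    n_case = len(case)
    count = 0
    for _ in range(num_permutations):
        # Shuffling the pooled values reassigns the group labels at random.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_case].mean() - shuffled[n_case:].mean())
        if diff >= observed:
            count += 1
    # Adding 1 to both counts includes the observed labelling itself,
    # so the estimated p-value is never exactly zero.
    return (count + 1) / (num_permutations + 1)


# Synthetic skewed groups (an assumption made for this example).
rng = np.random.default_rng(0)
case = rng.lognormal(mean=1.0, sigma=1.0, size=40)
control = rng.lognormal(mean=0.0, sigma=1.0, size=40)

p_value = permutation_test(case, control, rng=rng)
print(f"p-value = {p_value:.4f}")
```

The p-value is the fraction of random relabellings whose mean difference is at least as large as the observed one, so no distributional assumption about the data is needed.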