Professor Ioannis Paraskevopoulos
Python Matrix commands
''''''''''''''''''''''

Matrix operators in NumPy

- matrix() coerces an object into the matrix class.
- .T transposes a matrix.
- * or dot(X,Y) is the operator for matrix multiplication (when matrices are 2-dimensional).
- .I takes the inverse of a matrix. Note: the matrix must be invertible.
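As a minimal sketch of the operators listed above, assuming NumPy is imported as np and using two small made-up matrices X and Y (neither appears in the original notes):

```python
import numpy as np

# Two small illustrative matrices (values are made up for this sketch).
X = np.matrix([[2.0, 1.0], [1.0, 3.0]])
Y = np.matrix([[1.0, 0.0], [4.0, 2.0]])

X_t = X.T                # transpose
prod_star = X * Y        # for the matrix class, * is matrix multiplication
prod_dot = np.dot(X, Y)  # the same product via dot()
X_inv = X.I              # inverse; raises LinAlgError if X is singular

print(X_t)
print(prod_star)
print(X_inv)
```

Note that the NumPy documentation no longer recommends the matrix class for new code; with plain arrays the same product would be written X @ Y and the inverse np.linalg.inv(X).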
Symbolic Mathematics
Statistical Models
Statsmodels is a Python library designed for statistically oriented approaches to data analysis, with an emphasis on econometric analyses. It integrates well with the pandas and numpy libraries we covered in a previous post. It also has built-in support for many statistical tests that check the quality of the fit, and a dedicated set of plotting functions to visualize and diagnose the fit. Scikit-learn also supports linear regression, including many forms of regularized regression lacking in statsmodels, but it lacks the rich set of statistical tests and diagnostics that have been developed for linear models.
Linear regression is one of the simplest and most commonly used modeling techniques. It makes very strong assumptions about the relationship between the predictor variables (the X) and the response (the Y). It assumes that this relationship takes the form:

y = β0 + β1 * x
Ordinary Least Squares (OLS) is the simplest and most common estimator, in which the two βs are chosen to minimize the sum of the squared distances between the predicted values and the actual values. Even though this model is quite rigid and often does not reflect the true relationship, it remains a popular approach for several reasons. For one, the coefficients are computationally cheap to calculate. It is also easier to interpret than more sophisticated models, and in situations where the goal is understanding a simple model in detail, rather than estimating the response well, it can provide insight into what the model captures. Finally, in situations where there is a lot of noise, it may be hard to find the true functional form, so a constrained model can perform quite well compared to a complex model, which is more affected by noise.
The resulting model is represented as follows:

ŷ = β̂0 + β̂1 * x

Here the hats on the variables represent the fact that they are estimated from the data we have available. The βs are termed the parameters of the model or the coefficients. β0 is called the constant term or the intercept.
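To make the estimation step concrete, here is a minimal sketch of OLS for a single predictor using only NumPy. The data are synthetic and the variable names (beta0_hat, beta1_hat, rss) are chosen for this illustration; this is not the statsmodels implementation, just the textbook closed-form solution cross-checked against NumPy's least-squares solver.

```python
import numpy as np

# Synthetic data for illustration only: y = 1.5 + 2.0 * x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.5 + 2.0 * x + rng.normal(0.0, 1.0, size=200)

# Closed-form OLS estimates for one predictor.
x_mean, y_mean = x.mean(), y.mean()
beta1_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0_hat = y_mean - beta1_hat * x_mean

# Fitted values and the quantity OLS minimizes (residual sum of squares).
y_hat = beta0_hat + beta1_hat * x
rss = np.sum((y - y_hat) ** 2)

# Cross-check against NumPy's general least-squares solver.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta0_hat, beta1_hat, rss)
```

Any other choice of intercept and slope gives a larger residual sum of squares on the same data, which is what "least squares" refers to.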
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-d87738ea8dc2> in <module>()
----> 1 est.params
AttributeError: 'OLS' object has no attribute 'params'
Extract the HOVAL column (home values) from the DBF file and make it the dependent variable for the regression. Note that PySAL requires this to be an n×1 numpy array.
Extract CRIME (crime) and INC (income) vectors from the DBF to be used as independent variables in the regression. Note that PySAL requires this to be an n×j numpy array, where j is the number of independent variables (not including a constant). pysal.spreg.OLS adds a vector of ones to the independent variables passed in.
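The array shapes described above can be sketched with NumPy alone. The numbers below are made up and merely stand in for the HOVAL, CRIME and INC columns; reading the actual DBF file is not shown, and the explicit column of ones only illustrates what adding a constant amounts to (per the text, pysal.spreg.OLS does this itself).

```python
import numpy as np

# Made-up values standing in for the HOVAL, CRIME and INC columns.
hoval = [50.0, 42.0, 30.0, 35.0, 25.0]
crime = [10.0, 20.0, 30.0, 35.0, 50.0]
inc = [20.0, 18.0, 15.0, 8.0, 11.0]

# Dependent variable as an n x 1 array.
y = np.array(hoval, dtype=float).reshape(-1, 1)

# Independent variables as an n x j array (here j = 2), without a constant.
X = np.column_stack([crime, inc]).astype(float)

# What "adding a vector of ones" amounts to: an n x (j + 1) design matrix.
X_with_const = np.hstack([np.ones((X.shape[0], 1)), X])

print(y.shape, X.shape, X_with_const.shape)  # (5, 1) (5, 2) (5, 3)
```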
If the optional parameters w and spat_diag are passed to pysal.spreg.OLS, spatial diagnostics will also be computed for the regression. These include Lagrange multiplier tests and Moran’s I of the residuals. The w parameter is a PySAL spatial weights matrix. In this example, w is built directly from the shapefile columbus.shp, but w can also be read in from a GAL or GWT file. In this case a rook contiguity weights matrix is built, but PySAL also offers queen contiguity, distance weights, and k-nearest-neighbor weights, among others. In the example, the Moran’s I of the residuals is 0.204, with a standardized value of 2.592 and a p-value of 0.0095.
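To give a feel for what a rook contiguity matrix and Moran's I of the residuals measure, here is a small NumPy-only sketch. It is not PySAL's implementation: the helper names rook_weights and morans_i are hypothetical, the weights are built for a regular 4 x 4 grid rather than from columbus.shp, the input vector stands in for regression residuals (assumed to have mean zero), and no standardized value or p-value is computed.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Row-standardized rook contiguity weights for a regular grid
    (assumes at least two cells, so every cell has a neighbor)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            # Rook neighbors share an edge: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(e, W):
    """Moran's I = (n / S0) * (e' W e) / (e' e) for a mean-zero vector e."""
    e = np.asarray(e, dtype=float).ravel()
    n = e.size
    s0 = W.sum()  # sum of all weights; equals n when W is row-standardized
    return (n / s0) * (e @ W @ e) / (e @ e)

W = rook_weights(4, 4)

# Stand-in "residuals": random noise, centered so the mean is zero.
rng = np.random.default_rng(0)
e = rng.normal(size=16)
e = e - e.mean()

print(morans_i(e, W))
```

Values near zero suggest no spatial pattern in the residuals, positive values suggest that neighboring residuals tend to be similar, and negative values suggest that they tend to alternate. A checkerboard of +1 and -1 on this grid gives exactly -1.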