Exercise Sheet 2: Linear Algebra and TensorFlow Basics (Deadline: 14 Nov 14:00)
Linear Algebra (11 points)
For theoretical tasks you are encouraged to write LaTeX. Jupyter notebooks support it: simply type an expression between dollar signs or inside \begin{equation} ... \end{equation} blocks.
Alternatively, you can upload the solutions in written form as images and paste them inside cells. But if you do this, make sure that the images are of high quality, so that we can read them without any problems.
1. Orthogonal vectors (1 point)
Let $a, b$ be non-zero vectors in $\mathbb{R}^n$. Show that $a^T b = 0$ if and only if $a$ and $b$ are orthogonal vectors (i.e. the angle between them equals $\pi/2$ radians).
Hint: be careful to prove both directions: 1.) $a^T b = 0 \Rightarrow$ orthogonality and 2.) orthogonality $\Rightarrow a^T b = 0$.
2. Interpretation of matrix multiplication (2 points)
Let $B$ be the given matrix.
a. Are the column vectors of B orthogonal? Orthonormal? Is B a rotation matrix? Justify your answers. (0.5 points)
b. Calculate . (0.5 points)
c. What is the interpretation of a matrix multiplication, when the column vectors of B are orthonormal (you may want to draw a simple picture visualizing row vectors of A and column vectors of B, but you don't need to submit it)? (1 point)
3. Property of the covariance matrix (1 point)
Let $X$ be a matrix obtained by stacking all training vectors (also called the "design matrix"), as in the lecture. Obviously, it can be an arbitrary matrix. However, the covariance matrix $X^T X$ has some interesting properties; namely, it is always a positive semidefinite matrix ($v^T X^T X v \ge 0$ for all non-zero $v$). Prove this statement.
Hint: $v^T X^T X v$ can be represented as a squared norm of some vector.
4. Eigendecomposition (4 points)
Consider the following matrix $M$. Is the matrix $M$:
a. Symmetric? What does it imply for its eigendecomposition? (0.5 points)
b. Singular? What does it imply for its eigendecomposition? (0.5 points)
c. Find the eigendecomposition of M (3 points).
5. Rayleigh-Ritz principle (3 points)
In the lecture we saw that the exact solution for the decoding matrix $D$ in the PCA problem is a matrix that contains the eigenvectors corresponding to the largest eigenvalues of $X^T X$, where $X$ is the design matrix. But where does this come from?
If you want to see the full derivation of PCA, you can find it in the Deep Learning Book, chapter 2.12. In this exercise we would like to focus only on the Rayleigh-Ritz principle, which allows us to solve optimization problems of the form $\min_{\|v\|_2 = 1} v^T C v$ or $\max_{\|v\|_2 = 1} v^T C v$ using the eigendecomposition of $C$.
So prove that $\min_{\|v\|_2 = 1} v^T C v = \lambda_{\min}$, where $\lambda_{\min}$ is the smallest eigenvalue of $C$. For which vector $v$ is this minimum attained? (The proof for the maximum is analogous.)
Answer - 1
Given $a, b \in \mathbb{R}^n$, $a^T b$ is just the inner product $\langle a, b \rangle$.

To prove the claim in both directions, it is sufficient to show that $\langle a, b \rangle = \|a\|\,\|b\|\cos\theta$, where $\theta$ is the angle between $a$ and $b$: since $a$ and $b$ are non-zero, it then follows that $\langle a, b \rangle = 0 \iff \cos\theta = 0 \iff \theta = \pi/2$.

Using the Cosine Law and the distributive property of the inner product, we have:

$$\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,\|a\|\,\|b\|\cos\theta \qquad \text{(Cosine Law)}$$

$$\|a - b\|^2 = \langle a - b,\, a - b \rangle = \|a\|^2 + \|b\|^2 - 2\,\langle a, b \rangle \qquad \text{(distributivity)}$$

Equating the two expressions and cancelling the common term $\|a\|^2 + \|b\|^2$, we have $\langle a, b \rangle = \|a\|\,\|b\|\cos\theta$, which gives our result.
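As a quick numeric sanity check (not part of the required written proof), the identity $\langle a, b \rangle = \|a\|\,\|b\|\cos\theta$ can be verified in NumPy with arbitrarily chosen example vectors; the vectors below are our own illustration, not taken from the exercise sheet.

```python
import numpy as np

# Arbitrary example vectors (not from the exercise sheet) to sanity-check
# the identity  a.b = ||a|| ||b|| cos(theta)  used in the proof above.
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, -2.0])   # chosen so that a.b = 2 + 2 - 4 = 0

cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(cos_theta)

print(a @ b)                          # 0.0  -> the vectors are orthogonal
print(np.isclose(theta, np.pi / 2))   # True -> the angle is pi/2 radians
```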
Answer - 2
a.
(i)
(ii) They are orthogonal since their dot product equals $0$. They are orthonormal since, in addition, each vector has unit norm.
(iii) A square matrix $B$ is a rotation matrix iff $\det(B) = 1$ and $B$ is an orthogonal matrix. The columns of $B$ are orthonormal, so $B$ is orthogonal, and its determinant equals $1$. Therefore, we conclude that $B$ is a rotation matrix.
b.
c.
Given a rotation matrix $B$ and some vector $v$, the product $Bv$ is a counter-clockwise rotation of $v$ by the angle $\theta$. Since $B^{-1} = B^T$, multiplying by $B^T$ must be a clockwise rotation by the same angle.
We note that the $i$-th row of $AB$ is $a_i^T B = (B^T a_i)^T$, where $a_i^T$ denotes the $i$-th row of $A$, and so we conclude that the row vectors of $A$ are rotated clockwise by the operation $AB$.
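A small numeric illustration of these claims; the matrix below is a hypothetical 2-D rotation by 30 degrees, not the actual $B$ from the exercise (which is not reproduced here).

```python
import numpy as np

# Hypothetical 2-D rotation matrix: counter-clockwise rotation by 30 degrees.
phi = np.pi / 6
B = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

print(np.allclose(B.T @ B, np.eye(2)))     # True: the columns are orthonormal
print(np.isclose(np.linalg.det(B), 1.0))   # True: det(B) = 1, so B is a rotation
print(np.allclose(B.T, np.linalg.inv(B)))  # True: B^T undoes the rotation (clockwise)
```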
Answer - 3
Following the hint, for any vector $v$ we can write $v^T X^T X v = (Xv)^T (Xv) = \|Xv\|_2^2 \ge 0$, since a squared norm is always non-negative. Hence $X^T X$ is positive semidefinite.
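A minimal numeric check of this statement, using a randomly generated design matrix (hypothetical data, for illustration only).

```python
import numpy as np

# Numeric sanity check with random data: the eigenvalues of X^T X are all
# non-negative, and v^T X^T X v equals ||Xv||^2 for any v.
rng = np.random.RandomState(0)
X = rng.randn(50, 5)               # hypothetical design matrix: 50 samples, 5 features
C = X.T @ X

eigvals = np.linalg.eigvalsh(C)    # eigenvalues of the symmetric matrix C
print(np.all(eigvals >= -1e-10))   # True: all eigenvalues are (numerically) >= 0

v = rng.randn(5)
print(np.isclose(v @ C @ v, np.linalg.norm(X @ v) ** 2))  # True
```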
Answer - 4
Have:
a.
Symmetric matrices have linearly independent eigenvectors, and their normalised eigenvectors form an orthonormal set. Hence $M$ admits an eigendecomposition $M = Q \Lambda Q^T$ with an orthogonal $Q$.
b.
We find that $\det(M) = 0$; therefore $M$ is singular. This implies that $0$ is an eigenvalue of $M$.
c.
We look for an eigendecomposition of the form $M = Q \Lambda Q^{-1}$, where the eigenvectors make up the columns of $Q$, with each column corresponding to the eigenvalue at the same position in the diagonal eigenvalue matrix $\Lambda$, i.e. $M q_i = \lambda_i q_i$.
Set $\det(M - \lambda I) = 0$ to obtain the characteristic polynomial. Now solve for $\lambda$ to find its roots, i.e. the eigenvalues.
We find three eigenvalues $\lambda_1, \lambda_2, \lambda_3$ (one of them equal to $0$, since $M$ is singular). These make up the diagonal matrix $\Lambda$.
Now solve $(M - \lambda_i I)\, q_i = 0$ to obtain each eigenvector $q_i$ corresponding to each eigenvalue $\lambda_i$. We find three eigenvectors.
Next we normalize each of the eigenvectors and form the matrix $Q$ from them.
Finally, we use the symmetric-matrix property $Q^{-1} = Q^T$, so that $M = Q \Lambda Q^T$, which is our eigendecomposition of $M$.
We include an attachment of the hand calculation as well as a numeric computation for confirmation:
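Since the attachment is not reproduced here, the numeric confirmation could look like the following sketch; the 3x3 matrix below is a stand-in symmetric, singular example, not the actual $M$ from the exercise.

```python
import numpy as np

# Stand-in symmetric, singular 3x3 matrix, used only to illustrate the procedure.
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

lam, Q = np.linalg.eigh(M)       # eigenvalues (ascending) and orthonormal eigenvectors
Lambda = np.diag(lam)

print(lam)                                # one eigenvalue is 0 (M is singular)
print(np.allclose(Q @ Lambda @ Q.T, M))   # True: M = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: Q is orthogonal
```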
Answer - 5
Let $C \in \mathbb{R}^{n \times n}$ be symmetric, and let $v \in \mathbb{R}^n$ with $\|v\|_2 = 1$.
We have the eigendecomposition $C = Q \Lambda Q^T$, where $Q$ is orthogonal and $\Lambda$ is a diagonal matrix with the eigenvalues on the diagonal. WLOG, we order the eigenvalues in ascending order: $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$.
Then, writing $w = Q^T v$ (note that $\|w\|_2 = \|v\|_2 = 1$, since $Q$ is orthogonal),

$$v^T C v = v^T Q \Lambda Q^T v = w^T \Lambda w = \sum_{i=1}^{n} \lambda_i w_i^2 \;\ge\; \lambda_1 \sum_{i=1}^{n} w_i^2 = \lambda_1.$$

Using the same logic we can show that $\max_{\|v\|_2 = 1} v^T C v \le \lambda_n$.
Moreover, now let $q_1$ be the eigenvector corresponding to the eigenvalue $\lambda_1$; then, since the columns of $Q$ form an orthonormal set of eigenvectors, it is easy to show that $q_1^T C q_1 = \lambda_1$. Hence the minimum is attained at $v = q_1$ and indeed equals $\lambda_1$.
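A small numeric illustration of the Rayleigh-Ritz principle, using a randomly generated symmetric matrix $C$ (a sanity check only, not part of the proof).

```python
import numpy as np

# Check on random data that the Rayleigh quotient never drops below lambda_min
# and attains it at the corresponding eigenvector.
rng = np.random.RandomState(0)
A = rng.randn(4, 4)
C = A + A.T                        # make C symmetric

lam, Q = np.linalg.eigh(C)         # eigenvalues in ascending order
q1 = Q[:, 0]                       # unit eigenvector of the smallest eigenvalue

print(np.isclose(q1 @ C @ q1, lam[0]))        # True: minimum attained at q1

vs = rng.randn(1000, 4)
vs /= np.linalg.norm(vs, axis=1, keepdims=True)          # random unit vectors
rayleigh = np.einsum('ij,jk,ik->i', vs, C, vs)           # v^T C v for each v
print(rayleigh.min() >= lam[0] - 1e-12)       # True: never below lambda_min
```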
TensorFlow (9 points)
1. A simple computational graph in Tensorflow (2 + 1 = 3 points)
To get started with TensorFlow we ask you to specify a very simple computational graph of the given function $f(x)$ and also take its derivative. If you don't know how to proceed, please study the example given in the lecture and also the official TensorFlow getting started guide: https://www.tensorflow.org/get_started/get_started.
This exercise requires around 10 lines of code and is simple enough. However, it is very important to understand the notion of a computational graph.
Actually, this simple example will be the backbone of all neural networks in the future. All we will need to do is first specify the forward pass (f) and then take derivatives with respect to the parameters (x) in order to update them so that they minimize the loss function f. But we will come back to this in a few weeks.
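A minimal sketch of such a graph (TF1 style, as used in the course) is shown below. Since the exact $f$ from the exercise is not reproduced here, we use a stand-in $f(x) = x^2$ purely to illustrate the structure: placeholder, forward pass, tf.gradients, session run.

```python
import tensorflow as tf

# Stand-in function f(x) = x**2 (the actual f from the sheet is not reproduced here).
x = tf.placeholder(tf.float32, shape=(), name='x')   # input, fed at run time
f = tf.square(x, name='f')                            # forward pass
df_dx = tf.gradients(f, x)[0]                         # derivative of f w.r.t. x

with tf.Session() as sess:
    f_val, grad_val = sess.run([f, df_dx], feed_dict={x: 3.0})
    print(f_val, grad_val)   # 9.0, 6.0  (since d(x^2)/dx = 2x)
```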
Tensorflow Question:
1.) In the exercise above x should be a TF placeholder, but f is a TF tensor. What is the difference between a placeholder and other tensors in the computational graph? And how does a TF Variable differ from tensors? (0.5 point)
Answer:
A TF placeholder is a node that promises to provide data at some later point; its value is not defined when the graph is built and must be fed at run time (via feed_dict). It is often used to feed datasets and input variables into the graph.
A tensor is a multi-dimensional array-like structure and the basic unit of TensorFlow; in the graph, a tensor is the symbolic output of some operation and only receives a concrete value when the graph is run.
A TF Variable is a structure that stores a value which persists across session runs and can be changed later. An initial value must be specified in the declaration (and the variable must be initialized before use). It is often used to store a model's weights and to update them during training.
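The following minimal snippet (our own illustration, not part of the graded code) shows the three kinds of objects side by side.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32)    # placeholder: value must be fed at run time
w = tf.Variable(2.0)              # variable: initial value given now, mutable later
y = w * x                         # y is an ordinary tensor: symbolic output of an op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # variables must be initialized
    print(sess.run(y, feed_dict={x: 3.0}))        # 6.0
    sess.run(w.assign(5.0))                       # variables can be updated in place
    print(sess.run(y, feed_dict={x: 3.0}))        # 15.0
```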
Tensorflow Question:
2.) What does tf.Session().run(...) accept as an argument and what does it return? What is the fundamental difference between these 2 types of objects? (0.5 point)
Answer:
tf.Session().run takes as arguments the graph element(s) to be evaluated (the fetches: tensors and/or operations), as well as a feed_dict assigning values to the placeholder(s) these elements depend on. It returns the results of evaluating the given graph element(s) with the given placeholder value(s), as concrete (NumPy) values.
The fundamental difference is that the argument is a symbolic graph element, while the returned object is the concrete result of executing that element. For example, passing the tensor tf.constant([10, 20]) returns the 1-D array [10, 20].
2. A pure Python implementation (1 point)
Please do the same, but in pure Python, calculating the expression for the derivative by hand. There is no need to use the NumPy library, since we deal with a 1-dimensional x (0.5 point).
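A pure-Python counterpart of the sketch above (again with the stand-in $f(x) = x^2$, since the actual function is not reproduced here) would simply hard-code the hand-derived derivative.

```python
# Stand-in example: f(x) = x**2 with its hand-derived derivative 2*x.
def f(x):
    return x ** 2

def df_dx(x):
    return 2 * x      # derivative computed by hand, not by TensorFlow

for x in [-1.0, 0.5, 3.0]:
    print(x, f(x), df_dx(x))
```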
3. A more complicated computational graph (5 points)
Okay, you could easily take the derivative of that function by hand. But what if you need to take the derivative of the more complicated function given here with respect to x? It resembles a neural network much more than the previous example. Please implement it in TensorFlow (0.5 point) and in pure Python (0.5 point) in the same way as before (but using new cells with the code below).
Note that in the pure Python implementation you will need to somehow overcome the numerical errors for small values of x (roughly from -0.1 to 0.1). You can just skip these values.
For the pure Python implementation you will need to derive the derivative by hand (3 points). Please write down your derivation of the new $\frac{df}{dx}$ (you are encouraged to write LaTeX for it):

$$\frac{df}{dx} = \ldots$$

Hint 1: you may want to use a new variable to simplify your derivations; for example, introducing a suitable substitution can help. Don't worry if the final expression is quite ugly; it is supposed to be.
Hint 2: your derived expression for $\frac{df}{dx}$ should produce exactly the same values (up to the float type precision) as the TensorFlow derivative.
Answer:
This gives us our result.
There also exists a second interpretation of the problem statement; for that interpretation the derivative is derived analogously, although its final expression differs.
We include both interpretations in the analysis that follows.
Questions (1 point)
**1.) What is the main difference in the program structure between TensorFlow and plain Python? (0.5 point)**
The way variables are defined differs: in TF we use TF types instead of plain Python objects. More importantly, in TF we first build a computational graph, and no calculations are carried out until tf.Session().run() is executed, while in pure Python statements are executed immediately as they are interpreted.
**2.) Does TensorFlow provide numerical or automatic differentiation? What are the advantages of this way of differentiation? (0.5 point)**
TF provides automatic differentiation. The advantage of AD is that it evaluates the exact derivative expression (up to float precision) rather than a finite-difference approximation, which makes it resilient to floating-point rounding errors. We note this fact in the comparison of the '2nd interpretation' plots.
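To illustrate the difference, the following sketch compares a central finite difference against the TF gradient on a stand-in function $f(x) = e^x$ (a hypothetical example, not the function from the exercise): the finite-difference error grows again once the step size becomes too small, while the AD value does not depend on any step size.

```python
import numpy as np
import tensorflow as tf

# Stand-in function f(x) = exp(x); its exact derivative is exp(x).
x = tf.placeholder(tf.float64)
f = tf.exp(x)
df_dx = tf.gradients(f, x)[0]      # automatic differentiation: exact expression

with tf.Session() as sess:
    x0 = 1.0
    ad_value = sess.run(df_dx, feed_dict={x: x0})

# Numerical (finite-difference) differentiation suffers from floating-point
# cancellation when the step h becomes too small; AD has no such trade-off.
for h in [1e-4, 1e-8, 1e-12]:
    fd_value = (np.exp(x0 + h) - np.exp(x0 - h)) / (2 * h)
    print(h, abs(fd_value - ad_value))
```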
Submission instructions
You should provide a single Jupyter notebook as a solution. The naming should include the assignment number and matriculation IDs of all team members in the following format: assignment-1_matriculation1_matriculation_2_matriculation3.ipynb (in case of 3 team members). Make sure to keep the order matriculation1_matriculation_2_matriculation3 the same for all assignments.
Please, submit your solution to your tutor (with [NNIA][assignment-2] in email subject):
Maksym Andriushchenko [email protected]
Marius Mosbach [email protected]
Rajarshi Biswas [email protected]
Marimuthu Kalimuthu [email protected]
If you are in a team, please submit only 1 solution to only 1 tutor.