📚 The CoCalc Library - books, templates and other resources
License: OTHER
Math 157: Intro to Mathematical Software
UC San Diego, winter 2018
February 16, 2018: Number theory and cryptography (part 3 of 3)
Administrivia:
Once again, I will collect the homework no earlier than Saturday 8pm.
No lecture Monday, February 21 (university holiday). Otherwise, next week's schedule is unaffected.
Added in class:
HW 3 has been returned. Regarding any issues with the grading, contact Thomas or me.
Comments on the homework
General: if you turn off the Sage preparser while using pandas, be sure to turn it back on afterwards! Otherwise some things may fail in weird ways, e.g.,
a^b
will not behave as expected.Problem 1: if you are having trouble with the link to the Zip Code Database, you can copy the file "zips.csv" from the shared project into your course project. Also, for some reason, the capital city of Kentucky (Frankfort) does not appear; its correct zip code is 40601 (which does appear under another name).
Problem 4a: see section 2.1 of the research paper posted at https://factorable.net.
Problem 5b: you are supposed to find the probability as predicted by the conjecture. That is, look up the formula for the Artin constant on Wikipedia and use that formula to compute the value to at least five decimal digits (you can use Wikipedia's computed value to check your work).
Problem 6a: do this "by hand", not using a built-in Sage function (although you may use the latter to check your work).
Problem 6b: similarly, do this "by hand", not using a built-in Sage function.
(Pause here to field additional questions.)
RSA encryption and decryption
Last time, we saw how to use the difficulty of the discrete logarithm problem to execute a Diffie-Hellman key exchange.
This time, we will use the difficulty of integer factorization to perform RSA encryption and decryption.
To begin with, one needs two large primes and , preferably of comparable size; one then computes . The values and must be kept private, but the value of will be made public. (Having and be of comparable size is important because otherwise, one can try to use a factorization technique that looks specifically for the smaller prime factor, e.g., trial division.)
Since we know the factorization of as , we can compute the Euler phi function . This must be kept secret!
Let's say we want to have a public key for encryption and a secret key for decryption. We pick a value coprime to and make that public. We then compute the reciprocal of mod , called , and keep that private.
Now suppose someone wants to send us a secure message. The message must first be transformed into an integer (having no common factor with , but this is essentially automatic because has only huge prime factors).
Our correspondent encrypts by computing . In addition to , this uses only the public information of and .
We receive the ciphertext and then decrypt it by computing . This uses the private information of .
Finally, we translate this integer back into a string.
One can also do this in reverse: we can use the secret key to encode a message which anyone can decrypt using the public key. This is an example of authentication: the fact that the message decrypts correctly proves that we must have sent it, because (presumably) no one other than us has access to the secret information needed to compute the secret key!
The difference between theory and practice...
... is that there is no difference... in theory.
This demonstration is an oversimplification for several reasons: see https://en.wikipedia.org/wiki/RSA_(cryptosystem).
The security of RSA depends on the practical difficulty of factoring . (In principle, there might exist a method for breaking RSA without directly factoring , but no such techniques are currently known.) See https://members.loria.fr/PZimmermann/records/factor.html for some "factoring records".
As with Diffie-Hellman, the performance of RSA is inferior to that of symmetric encryption systems of comparable security. One thus typically uses RSA as a key exchange protocol rather than a direct encryption scheme.
Alice, who does not have the factorization of , chooses a random key and uses the public key to encrypt it like any other message.
Bob uses the secret key to decrypt the key. Now they have a shared secret with which to perform symmetric encryption.
Smart cards
In addition to being the most widely used public key cryptographic system on the internet, RSA can also be used in SMART CARDS.
Smart cards have a tiny chip embedded in them that allows the card to communicate with the bank through, say, the ATM.
The bank chooses two large primes and and computes the modulus It programs each card with the RSA encryption with an encryption exponent specific to that card.
The bank needs to compute which it can do since it knows and
Upon receiving the card, the customer chooses a PIN. The bank stores in each customer account the PIN and the decrypting exponent Note that the customer does not know
When the customer inserts the card at an ATM and enters the PIN, the bank retrieves from the customer file, picks a random integer and sends to the card.
The card computes and sends to the bank.
The bank computes and checks that If this holds, it accepts that the card is genuine.
Key point: The ATM only sees and and cannot recover which is the crucial piece of information that the card contains.
So why not stop here?
I mentioned that in addition to DH and RSA, there is a third commonly used public-key encryption technique based on elliptic curves. So why is a third one needed?
The reason is that we are not completely ignorant of good techniques for attacking the factorization and discrete logarithm problems, on whose hardness these two schemes depend. For example, for factorization, there is a method called the quadratic sieve which factors by finding a pair of integers such that Of course one can produce such integers in a trivial way, by starting with and choose so that either or is a multiple of . However, if one finds these and in some other way, it can very well happen that the prime factors of are distributed in some nontrivial way across and , in which case computing will expose a nontrivial factor of .
There is no obvious way to modify RSA to address this, other than to use bigger numbers to try to defeat such attacks. However, Diffie-Hellman can be abstracted in such a way that the role of multiplication mod can be replaced with some other mathematical structure. Namely, all we need is a finite set of known cardinality and a computable binary operation on which satisfies the axioms for an abelian group:
commutativity: for all , ;
associativity: for all , ;
identity: there exists some for which for all , .
inverses: for every , there is some for which .
The usual Diffie-Hellman uses and .
Well, we do need one more thing. Using , we can define exponentiation of an element by an integer by declaring that where there are terms in the product. We can then define the discrete logarithm of with respect to to be the smallest nonnegative integer such that , assuming that such an integer exists.
But we need it to be hard to compute these discrete logarithms! For instance, taking and won't work because in this case, the discrete logarithms are easy to compute. (Why?)
One particularly convenient source of "mysterious" abelian groups is elliptic curves. I will (probably) have more to say about these in a later lecture.