Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
Download

📚 The CoCalc Library - books, templates and other resources

132930 views
License: OTHER
Kernel: SageMath 8.1

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

February 16, 2018: Number theory and cryptography (part 3 of 3)

Administrivia:

  • Once again, I will collect the homework no earlier than Saturday 8pm.

  • No lecture Monday, February 21 (university holiday). Otherwise, next week's schedule is unaffected.

Added in class:

  • HW 3 has been returned. Regarding any issues with the grading, contact Thomas or me.

Comments on the homework

  • General: if you turn off the Sage preparser while using pandas, be sure to turn it back on afterwards! Otherwise some things may fail in weird ways, e.g., a^b will not behave as expected.

  • Problem 1: if you are having trouble with the link to the Zip Code Database, you can copy the file "zips.csv" from the shared project into your course project. Also, for some reason, the capital city of Kentucky (Frankfort) does not appear; its correct zip code is 40601 (which does appear under another name).

  • Problem 4a: see section 2.1 of the research paper posted at https://factorable.net.

  • Problem 5b: you are supposed to find the probability as predicted by the conjecture. That is, look up the formula for the Artin constant on Wikipedia and use that formula to compute the value to at least five decimal digits (you can use Wikipedia's computed value to check your work).

  • Problem 6a: do this "by hand", not using a built-in Sage function (although you may use the latter to check your work).

  • Problem 6b: similarly, do this "by hand", not using a built-in Sage function.

(Pause here to field additional questions.)

RSA encryption and decryption

Last time, we saw how to use the difficulty of the discrete logarithm problem to execute a Diffie-Hellman key exchange.

This time, we will use the difficulty of integer factorization to perform RSA encryption and decryption.

To begin with, one needs two large primes pp and qq, preferably of comparable size; one then computes N=pqN = pq. The values pp and qq must be kept private, but the value of NN will be made public. (Having pp and qq be of comparable size is important because otherwise, one can try to use a factorization technique that looks specifically for the smaller prime factor, e.g., trial division.)

p = random_prime(2^512) q = random_prime(2^512) N = p*q print(N)
41419943256619115371929633960915436081333601025132507684642028468484478921759407653356340195820797710576523813267027967507224851696827688138442542330986988369378488170389934195199287636368538023295982779360201026036707252228432756244181143111960337765417464689419294630990633160674129112550733873677881995741
factor(N) ## This won't work.

Since we know the factorization of NN as pqp*q, we can compute the Euler phi function φ(N)=(p1)(q1)\varphi(N) = (p-1)(q-1). This must be kept secret!

ph = (p-1)*(q-1) print(ph)
41419943256619115371929633960915436081333601025132507684642028468484478921759407653356340195820797710576523813267027967507224851696827688138442542330986973716823713220278630375682838980667992962107501946332281552777712479341039155879237151794818053742997364800935491878396649612206355197457074893870359949984

Let's say we want to have a public key for encryption and a secret key for decryption. We pick a value ee coprime to φ(N)\varphi(N) and make that public. We then compute the reciprocal of ee mod φ(N)\varphi(N), called dd, and keep that private.

e = random_prime(2^64) d = e^(-1) % ph print(e, d)
(1339493909277735437, 25840998133408742419595455758050696972236828815239533443498096594595316551634562946307796619793110913848274904189759842820342626173114347270184635151865129120878721512710431502017116977643069963214102116687602254887067213940299080830641534173635141493547159568010800796977112714903051348302459418335299763845)

Now suppose someone wants to send us a secure message. The message must first be transformed into an integer (having no common factor with NN, but this is essentially automatic because NN has only huge prime factors).

s = "MEET ME AT LA JOLLA COVE" m = Integer("".join(str(ord(i)) for i in s)) m
776969843277693265843276653274797676653267798669

Our correspondent encrypts mm by computing me(modN)m^e \pmod{N}. In addition to mm, this uses only the public information of ee and NN.

t = power_mod(m, e, N) t
27912245620487134604972897939748883075160551825565793245218458875841248341128239516630836789590935358630625874980773237593033547010589383235886952956651191391855382660476423135659391780939989306525894250067517046567212283463184215220338986475704684886746184722022536657027569410447848018345555663274438219797

We receive the ciphertext tt and then decrypt it by computing td(modN)t^d \pmod{N}. This uses the private information of dd.

m1 = power_mod(t, d, N) m1
37335539151062425937639490622555563807251994585434202062051098568250989929699930031512374192604163303503612858187761002135731957182245303713594387189655397024573668166433038547772427898250605056282073345494499306021176560559609210304086605675353030316140559333523186231111843955057748491900306707179425538293

Finally, we translate this integer back into a string.

s1 = str(m1) l = [int(s1[i:i+2]) for i in range(0, len(s1), 2)] s2 = "".join(str(chr(int(i))) for i in l) s2
'%!7\'\x0f\n>*;%?^Z>\x1978&\x07\x19\x13^:6"\x14\x14>\x05\nb8R2bc\x1dEc\x1e\x03\x0f\x0c%)\\<)?\x1e#\x03=\x1c:\x12M=\x00\x15#I\x139\x12\x16-\x1e%\r;+W\x12`7\'F\x189$D\x10@!\x03U/M\x18\x1bYR2<28\x1c\x14I"6^1]\x06\x02\x0bL8\x05;<\\\n\x1e(V<8K#\x1e\x1e\x1f=(7]!4\x1fV\x17\x0b\x0bT\'7\x05M01\x13\x00\x1eC\x07\x11^\x195R]'

One can also do this in reverse: we can use the secret key to encode a message which anyone can decrypt using the public key. This is an example of authentication: the fact that the message decrypts correctly proves that we must have sent it, because (presumably) no one other than us has access to the secret information needed to compute the secret key!

The difference between theory and practice...

... is that there is no difference... in theory.

This demonstration is an oversimplification for several reasons: see https://en.wikipedia.org/wiki/RSA_(cryptosystem).

The security of RSA depends on the practical difficulty of factoring NN. (In principle, there might exist a method for breaking RSA without directly factoring NN, but no such techniques are currently known.) See https://members.loria.fr/PZimmermann/records/factor.html for some "factoring records".

As with Diffie-Hellman, the performance of RSA is inferior to that of symmetric encryption systems of comparable security. One thus typically uses RSA as a key exchange protocol rather than a direct encryption scheme.

  • Alice, who does not have the factorization of NN, chooses a random key and uses the public key to encrypt it like any other message.

  • Bob uses the secret key to decrypt the key. Now they have a shared secret with which to perform symmetric encryption.

Smart cards

In addition to being the most widely used public key cryptographic system on the internet, RSA can also be used in SMART CARDS.

Smart cards have a tiny chip embedded in them that allows the card to communicate with the bank through, say, the ATM.

The bank chooses two large primes pp and qq and computes the modulus N=pq.N = pq. It programs each card with the RSA encryption with an encryption exponent ee specific to that card.

p = random_prime(2^512) q = random_prime(2^512) N = p*q print(N)
55753010897792513319396262605382984542348815026948328121199699562295051986355171437661537675381569099267283792638252474446434017019310425551370624570490143119427546778142657558828508437092389411621583124763345317001148342911303586862531163006421348107487835094271068476263141623862553910734429465559821137487

The bank needs to compute φ(N),\varphi(N), which it can do since it knows pp and q.q.

ph = (p-1)*(q-1) print(ph)
55753010897792513319396262605382984542348815026948328121199699562295051986355171437661537675381569099267283792638252474446434017019310425551370624570490127709715061407540052369746692500106005831329821471096444663004990919613331684123223430835259839696835669098063408028974315180224934805901519409675244883840

Upon receiving the card, the customer chooses a PIN. The bank stores in each customer account the PIN and the decrypting exponent d.d. Note that the customer does not know d.d.

e = random_prime(2^64) d = e^(-1) % ph print('encryption exponent program in the chip e =' + str(e)) print('decryption exponent stored by the bank d =' + str(d))
encryption exponent program in the chip e =10569735828442645609 decryption exponent stored by the bank d =48449062934817461599681895951503591981962314732419532782779063373486130627792046512139620899766610154788613258895302147063997053163696743066592071598636057436651204205804860313552165196541700324936453878640306277553478164479656116099559078609448012179364519981150789751595705642966068193372586423445329419609

When the customer inserts the card at an ATM and enters the PIN, the bank retrieves dd from the customer file, picks a random integer M<NM<N and sends Md(modN)M^d \pmod N to the card.

M = ZZ.random_element(1,2^100) t = power_mod (M, d , N) print(M) print(t)
327863867837269351084580819530 47365935985462719815290633365959310178550850278503051755598444720520029508947878810567057504970318032611150794210913491686157547185948592701613023581978764509295772386534792227068320661168269135882842164486164134924550476540455396680618256658569667500688579307243037787983537170081922940592280993859491583185

The card computes M=te(modN)M = t^e \pmod N and sends (M+1)e(modN)(M+1)^e \pmod N to the bank.

MC = power_mod(t,e,N) MC - M
0
r = power_mod(MC+1, e, N) print(r)
40940869208008607894887699909399057636557029661554650664200766649067960968226490030316511581538090003994578641537346768771185020252375478264432550500774396015516228655519787312269015058088021242824164813558885668203605861528575489145528218208866771818528065558298003312712996202735640299063672128420894092003

The bank computes rd(modN)=((M+1)d)e(modN)M+1(modN)r^d \pmod N = ((M+1)^d)^e \pmod N \equiv M +1 \pmod N and checks that rM=1.r - M=1. If this holds, it accepts that the card is genuine.

power_mod(r,d,N)- M

Key point: The ATM only sees Md(modN)M^d\pmod N and (M+1)e(modN)(M+1)^e\pmod N and cannot recover e,e, which is the crucial piece of information that the card contains.

So why not stop here?

I mentioned that in addition to DH and RSA, there is a third commonly used public-key encryption technique based on elliptic curves. So why is a third one needed?

The reason is that we are not completely ignorant of good techniques for attacking the factorization and discrete logarithm problems, on whose hardness these two schemes depend. For example, for factorization, there is a method called the quadratic sieve which factors NN by finding a pair of integers x,yx,y such that x2y2(modN). x^2 \equiv y^2 \pmod{N}. Of course one can produce such integers in a trivial way, by starting with xx and choose yy so that either x+yx+y or xyx-y is a multiple of NN. However, if one finds these xx and yy in some other way, it can very well happen that the prime factors of NN are distributed in some nontrivial way across x+yx+y and xyx-y, in which case computing gcd(xy,N)\gcd(x-y,N) will expose a nontrivial factor of NN.

There is no obvious way to modify RSA to address this, other than to use bigger numbers to try to defeat such attacks. However, Diffie-Hellman can be abstracted in such a way that the role of multiplication mod pp can be replaced with some other mathematical structure. Namely, all we need is a finite set SS of known cardinality nn and a computable binary operation \star on SS which satisfies the axioms for an abelian group:

  • commutativity: for all a,ba,b, ab=baa \star b = b \star a;

  • associativity: for all a,b,ca,b,c, a(bc)=(ab)ca \star (b \star c) = (a \star b) \star c;

  • identity: there exists some ee for which for all aa, ae=eaa \star e = e \star a.

  • inverses: for every aa, there is some bb for which ab=ea \star b = e.

The usual Diffie-Hellman uses S={1,,p1}S = \{1,\dots,p-1\} and ab=ab(modp)a \star b = ab \pmod{p}.

Well, we do need one more thing. Using \star, we can define exponentiation of an element aSa \in S by an integer gg by declaring that ag=aaaa^g = a \star a \star \cdots \star a where there are gg terms in the product. We can then define the discrete logarithm of bb with respect to aa to be the smallest nonnegative integer gg such that ag=ba^g = b, assuming that such an integer exists.

But we need it to be hard to compute these discrete logarithms! For instance, taking S={0,,p1}S = \{0,\dots,p-1\} and ab=a+b(modp)a \star b = a+b \pmod{p} won't work because in this case, the discrete logarithms are easy to compute. (Why?)

One particularly convenient source of "mysterious" abelian groups is elliptic curves. I will (probably) have more to say about these in a later lecture.