CoCalc -- 2018-02-16.ipynb

Download

📚 The CoCalc Library - books, templates and other resources

📚 The Library - Shared Public Version

cocalc-examples / math157 / lectures / 2018-02-16 / 2018-02-16.ipynb

¹³²⁹³⁰ views
License: OTHER

Kernel: SageMath 8.1

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

February 16, 2018: Number theory and cryptography (part 3 of 3)

Administrivia:

Once again, I will collect the homework no earlier than Saturday 8pm.
No lecture Monday, February 21 (university holiday). Otherwise, next week's schedule is unaffected.

Added in class:

HW 3 has been returned. Regarding any issues with the grading, contact Thomas or me.

Comments on the homework

General: if you turn off the Sage preparser while using pandas, be sure to turn it back on afterwards! Otherwise some things may fail in weird ways, e.g., a^b will not behave as expected.
Problem 1: if you are having trouble with the link to the Zip Code Database, you can copy the file "zips.csv" from the shared project into your course project. Also, for some reason, the capital city of Kentucky (Frankfort) does not appear; its correct zip code is 40601 (which does appear under another name).
Problem 4a: see section 2.1 of the research paper posted at https://factorable.net.
Problem 5b: you are supposed to find the probability as predicted by the conjecture. That is, look up the formula for the Artin constant on Wikipedia and use that formula to compute the value to at least five decimal digits (you can use Wikipedia's computed value to check your work).
Problem 6a: do this "by hand", not using a built-in Sage function (although you may use the latter to check your work).
Problem 6b: similarly, do this "by hand", not using a built-in Sage function.

(Pause here to field additional questions.)

RSA encryption and decryption

Last time, we saw how to use the difficulty of the discrete logarithm problem to execute a Diffie-Hellman key exchange.

This time, we will use the difficulty of integer factorization to perform RSA encryption and decryption.

To begin with, one needs two large primes $p$ and $q$ , preferably of comparable size; one then computes $N = pq$ . The values $p$ and $q$ must be kept private, but the value of $N$ will be made public. (Having $p$ and $q$ be of comparable size is important because otherwise, one can try to use a factorization technique that looks specifically for the smaller prime factor, e.g., trial division.)

In [1]:

p = random_prime(2^512)
q = random_prime(2^512)
N = p*q
print(N)

Out[1]:

41419943256619115371929633960915436081333601025132507684642028468484478921759407653356340195820797710576523813267027967507224851696827688138442542330986988369378488170389934195199287636368538023295982779360201026036707252228432756244181143111960337765417464689419294630990633160674129112550733873677881995741

factor(N) ## This won't work.

Since we know the factorization of $N$ as $p*q$ , we can compute the Euler phi function $\varphi(N) = (p-1)(q-1)$ . This must be kept secret!

In [2]:

ph = (p-1)*(q-1)
print(ph)

Out[2]:

41419943256619115371929633960915436081333601025132507684642028468484478921759407653356340195820797710576523813267027967507224851696827688138442542330986973716823713220278630375682838980667992962107501946332281552777712479341039155879237151794818053742997364800935491878396649612206355197457074893870359949984

Let's say we want to have a public key for encryption and a secret key for decryption. We pick a value $e$ coprime to $\varphi(N)$ and make that public. We then compute the reciprocal of $e$ mod $\varphi(N)$ , called $d$ , and keep that private.

In [3]:

e = random_prime(2^64)
d = e^(-1) % ph
print(e, d)

Out[3]:

(1339493909277735437, 25840998133408742419595455758050696972236828815239533443498096594595316551634562946307796619793110913848274904189759842820342626173114347270184635151865129120878721512710431502017116977643069963214102116687602254887067213940299080830641534173635141493547159568010800796977112714903051348302459418335299763845)

Now suppose someone wants to send us a secure message. The message must first be transformed into an integer (having no common factor with $N$ , but this is essentially automatic because $N$ has only huge prime factors).

In [4]:

s = "MEET ME AT LA JOLLA COVE"
m = Integer("".join(str(ord(i)) for i in s))
m

Out[4]:

776969843277693265843276653274797676653267798669

Our correspondent encrypts $m$ by computing $m^e \pmod{N}$ . In addition to $m$ , this uses only the public information of $e$ and $N$ .

In [5]:

t = power_mod(m, e, N)
t

Out[5]:

27912245620487134604972897939748883075160551825565793245218458875841248341128239516630836789590935358630625874980773237593033547010589383235886952956651191391855382660476423135659391780939989306525894250067517046567212283463184215220338986475704684886746184722022536657027569410447848018345555663274438219797

We receive the ciphertext $t$ and then decrypt it by computing $t^d \pmod{N}$ . This uses the private information of $d$ .

In [12]:

m1 = power_mod(t, d, N)
m1

Out[12]:

37335539151062425937639490622555563807251994585434202062051098568250989929699930031512374192604163303503612858187761002135731957182245303713594387189655397024573668166433038547772427898250605056282073345494499306021176560559609210304086605675353030316140559333523186231111843955057748491900306707179425538293

Finally, we translate this integer back into a string.

In [13]:

s1 = str(m1)
l = [int(s1[i:i+2]) for i in range(0, len(s1), 2)]
s2 = "".join(str(chr(int(i))) for i in l)
s2

Out[13]:

'%!7\'\x0f\n>*;%?^Z>\x1978&\x07\x19\x13^:6"\x14\x14>\x05\nb8R2bc\x1dEc\x1e\x03\x0f\x0c%)\\<)?\x1e#\x03=\x1c:\x12M=\x00\x15#I\x139\x12\x16-\x1e%\r;+W\x12`7\'F\x189$D\x10@!\x03U/M\x18\x1bYR2<28\x1c\x14I"6^1]\x06\x02\x0bL8\x05;<\\\n\x1e(V<8K#\x1e\x1e\x1f=(7]!4\x1fV\x17\x0b\x0bT\'7\x05M01\x13\x00\x1eC\x07\x11^\x195R]'

One can also do this in reverse: we can use the secret key to encode a message which anyone can decrypt using the public key. This is an example of authentication: the fact that the message decrypts correctly proves that we must have sent it, because (presumably) no one other than us has access to the secret information needed to compute the secret key!

The difference between theory and practice...

... is that there is no difference... in theory.

This demonstration is an oversimplification for several reasons: see https://en.wikipedia.org/wiki/RSA_(cryptosystem).

The security of RSA depends on the practical difficulty of factoring $N$ . (In principle, there might exist a method for breaking RSA without directly factoring $N$ , but no such techniques are currently known.) See https://members.loria.fr/PZimmermann/records/factor.html for some "factoring records".

As with Diffie-Hellman, the performance of RSA is inferior to that of symmetric encryption systems of comparable security. One thus typically uses RSA as a key exchange protocol rather than a direct encryption scheme.

Alice, who does not have the factorization of $N$ , chooses a random key and uses the public key to encrypt it like any other message.
Bob uses the secret key to decrypt the key. Now they have a shared secret with which to perform symmetric encryption.

Smart cards

In addition to being the most widely used public key cryptographic system on the internet, RSA can also be used in SMART CARDS.

Smart cards have a tiny chip embedded in them that allows the card to communicate with the bank through, say, the ATM.

The bank chooses two large primes $p$ and $q$ and computes the modulus $N = pq.$ It programs each card with the RSA encryption with an encryption exponent $e$ specific to that card.

In [15]:

p = random_prime(2^512)
q = random_prime(2^512)
N = p*q
print(N)

Out[15]:

55753010897792513319396262605382984542348815026948328121199699562295051986355171437661537675381569099267283792638252474446434017019310425551370624570490143119427546778142657558828508437092389411621583124763345317001148342911303586862531163006421348107487835094271068476263141623862553910734429465559821137487

The bank needs to compute $\varphi(N),$ which it can do since it knows $p$ and $q.$

In [16]:

ph = (p-1)*(q-1)
print(ph)

Out[16]:

55753010897792513319396262605382984542348815026948328121199699562295051986355171437661537675381569099267283792638252474446434017019310425551370624570490127709715061407540052369746692500106005831329821471096444663004990919613331684123223430835259839696835669098063408028974315180224934805901519409675244883840

Upon receiving the card, the customer chooses a PIN. The bank stores in each customer account the PIN and the decrypting exponent $d.$ Note that the customer does not know $d.$

In [17]:

e = random_prime(2^64)
d = e^(-1) % ph
print('encryption exponent program in the chip e =' + str(e))
print('decryption exponent stored by the bank d =' + str(d))

Out[17]:

encryption exponent program in the chip e =10569735828442645609
decryption exponent stored by the bank d =48449062934817461599681895951503591981962314732419532782779063373486130627792046512139620899766610154788613258895302147063997053163696743066592071598636057436651204205804860313552165196541700324936453878640306277553478164479656116099559078609448012179364519981150789751595705642966068193372586423445329419609

When the customer inserts the card at an ATM and enters the PIN, the bank retrieves $d$ from the customer file, picks a random integer $M<N$ and sends $M^d \pmod N$ to the card.

In [18]:

M = ZZ.random_element(1,2^100)
t = power_mod (M, d , N)
print(M)
print(t)

Out[18]:

327863867837269351084580819530
47365935985462719815290633365959310178550850278503051755598444720520029508947878810567057504970318032611150794210913491686157547185948592701613023581978764509295772386534792227068320661168269135882842164486164134924550476540455396680618256658569667500688579307243037787983537170081922940592280993859491583185

The card computes $M = t^e \pmod N$ and sends $(M+1)^e \pmod N$ to the bank.

In [19]:

MC = power_mod(t,e,N)
MC - M

Out[19]:

0

In [20]:

r = power_mod(MC+1, e, N) 
print(r)

Out[20]:

40940869208008607894887699909399057636557029661554650664200766649067960968226490030316511581538090003994578641537346768771185020252375478264432550500774396015516228655519787312269015058088021242824164813558885668203605861528575489145528218208866771818528065558298003312712996202735640299063672128420894092003

The bank computes $r^d \pmod N = ((M+1)^d)^e \pmod N \equiv M +1 \pmod N$ and checks that $r - M=1.$ If this holds, it accepts that the card is genuine.

In [0]:

power_mod(r,d,N)- M

Key point: The ATM only sees $M^d\pmod N$ and $(M+1)^e\pmod N$ and cannot recover $e,$ which is the crucial piece of information that the card contains.

So why not stop here?

I mentioned that in addition to DH and RSA, there is a third commonly used public-key encryption technique based on elliptic curves. So why is a third one needed?

The reason is that we are not completely ignorant of good techniques for attacking the factorization and discrete logarithm problems, on whose hardness these two schemes depend. For example, for factorization, there is a method called the quadratic sieve which factors $N$ by finding a pair of integers $x,y$ such that $x^2 \equiv y^2 \pmod{N}.$ Of course one can produce such integers in a trivial way, by starting with $x$ and choose $y$ so that either $x+y$ or $x-y$ is a multiple of $N$ . However, if one finds these $x$ and $y$ in some other way, it can very well happen that the prime factors of $N$ are distributed in some nontrivial way across $x+y$ and $x-y$ , in which case computing $\gcd(x-y,N)$ will expose a nontrivial factor of $N$ .

There is no obvious way to modify RSA to address this, other than to use bigger numbers to try to defeat such attacks. However, Diffie-Hellman can be abstracted in such a way that the role of multiplication mod $p$ can be replaced with some other mathematical structure. Namely, all we need is a finite set $S$ of known cardinality $n$ and a computable binary operation $\star$ on $S$ which satisfies the axioms for an abelian group:

commutativity: for all $a,b$ , $a \star b = b \star a$ ;
associativity: for all $a,b,c$ , $a \star (b \star c) = (a \star b) \star c$ ;
identity: there exists some $e$ for which for all $a$ , $a \star e = e \star a$ .
inverses: for every $a$ , there is some $b$ for which $a \star b = e$ .

The usual Diffie-Hellman uses $S = \{1,\dots,p-1\}$ and $a \star b = ab \pmod{p}$ .

Well, we do need one more thing. Using $\star$ , we can define exponentiation of an element $a \in S$ by an integer $g$ by declaring that $a^g = a \star a \star \cdots \star a$ where there are $g$ terms in the product. We can then define the discrete logarithm of $b$ with respect to $a$ to be the smallest nonnegative integer $g$ such that $a^g = b$ , assuming that such an integer exists.

But we need it to be hard to compute these discrete logarithms! For instance, taking $S = \{0,\dots,p-1\}$ and $a \star b = a+b \pmod{p}$ won't work because in this case, the discrete logarithms are easy to compute. (Why?)

One particularly convenient source of "mysterious" abelian groups is elliptic curves. I will (probably) have more to say about these in a later lecture.

In [0]: