CoCalc -- config.py

cocalc-examples / stanford-tensorflow-tutorials / 2017 / assignments / chatbot / config.py

License: OTHER

""" A neural chatbot using a sequence-to-sequence model with an
attentional decoder.

This is based on the Google Translate TensorFlow model
https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/

Sequence-to-sequence model by Cho et al. (2014)

Created by Chip Huyen as the starter code for assignment 3,
class CS 20SI: "TensorFlow for Deep Learning Research"
cs20si.stanford.edu

This file contains the hyperparameters for the model.

See readme.md for instructions on how to run the starter code.
"""

# parameters for processing the dataset
# DATA_PATH is the author's local path; point it at your own copy of the corpus
DATA_PATH = '/Users/Chip/data/cornell movie-dialogs corpus'
CONVO_FILE = 'movie_conversations.txt'
LINE_FILE = 'movie_lines.txt'
OUTPUT_FILE = 'output_convo.txt'
PROCESSED_PATH = 'processed'    # directory for processed data
CPT_PATH = 'checkpoints'        # directory for model checkpoints

THRESHOLD = 2

PAD_ID = 0
UNK_ID = 1
START_ID = 2
EOS_ID = 3

TESTSET_SIZE = 25000

# model parameters
""" Train encoder length distribution:
[175, 92, 11883, 8387, 10656, 13613, 13480, 12850, 11802, 10165,
8973, 7731, 7005, 6073, 5521, 5020, 4530, 4421, 3746, 3474, 3192,
2724, 2587, 2413, 2252, 2015, 1816, 1728, 1555, 1392, 1327, 1248,
1128, 1084, 1010, 884, 843, 755, 705, 660, 649, 594, 558, 517, 475,
426, 444, 388, 349, 337]
These bucket sizes seem to work best.
"""
# [19530, 17449, 17585, 23444, 22884, 16435, 17085, 18291, 18931]
# BUCKETS = [(6, 8), (8, 10), (10, 12), (13, 15), (16, 19), (19, 22), (23, 26), (29, 32), (39, 44)]

# [37049, 33519, 30223, 33513, 37371]
# BUCKETS = [(8, 10), (12, 14), (16, 19), (23, 26), (39, 43)]

# BUCKETS = [(8, 10), (12, 14), (16, 19)]
BUCKETS = [(16, 19)]

NUM_LAYERS = 3
HIDDEN_SIZE = 256
BATCH_SIZE = 64

LR = 0.5
MAX_GRAD_NORM = 5.0

NUM_SAMPLES = 512
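
The listing does not say how THRESHOLD and the four reserved ids are used. The sketch below shows one plausible reading, not the assignment's actual preprocessing code. It assumes that THRESHOLD is a minimum word count for entering the vocabulary and that rarer words are encoded as UNK_ID. The token strings, build_vocab and sentence_to_ids are hypothetical names introduced for illustration.

```python
from collections import Counter

# Values copied from config.py.
THRESHOLD = 2
PAD_ID, UNK_ID, START_ID, EOS_ID = 0, 1, 2, 3

# Hypothetical token strings; the config only fixes the ids.
SPECIAL_TOKENS = ['<pad>', '<unk>', '<s>', '</s>']


def build_vocab(sentences, threshold=THRESHOLD):
    """Map words to ids, reserving ids 0-3 for the special tokens.

    Assumption: words seen fewer than `threshold` times are left out
    and are later encoded as UNK_ID.
    """
    counts = Counter(word for s in sentences for word in s.split())
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Most frequent words first; ties are broken alphabetically so the
    # resulting ids are deterministic.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= threshold:
            vocab[word] = len(vocab)
    return vocab


def sentence_to_ids(sentence, vocab):
    """Encode a sentence, falling back to UNK_ID for unknown words."""
    return [vocab.get(word, UNK_ID) for word in sentence.split()]


vocab = build_vocab(['how are you', 'how are things', 'fine thanks'])
ids = sentence_to_ids('how are you today', vocab)
print(ids)
```

With a threshold of 2, only "are" and "how" enter this toy vocabulary, so "you" and "today" both map to UNK_ID.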

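Each BUCKETS entry pairs a maximum encoder length with a maximum decoder length, following the bucketing scheme of the translate tutorial the docstring cites. The sketch below shows how a conversation pair might be assigned to a bucket and padded. It uses the commented-out five-bucket list so that more than one bucket is available. The helpers find_bucket, pad and make_example are hypothetical, and wrapping the reply in START_ID and EOS_ID is an assumption about how the decoder input is built.

```python
# Values copied from config.py; the commented-out five-bucket list is used
# here instead of the active single bucket.
PAD_ID, START_ID, EOS_ID = 0, 2, 3
BUCKETS = [(8, 10), (12, 14), (16, 19), (23, 26), (39, 43)]


def find_bucket(encoder_ids, decoder_ids, buckets=BUCKETS):
    """Return the index of the first bucket both sequences fit in, or None."""
    for bucket_id, (enc_size, dec_size) in enumerate(buckets):
        if len(encoder_ids) <= enc_size and len(decoder_ids) <= dec_size:
            return bucket_id
    return None


def pad(ids, size):
    """Right-pad a list of token ids with PAD_ID up to `size`."""
    return ids + [PAD_ID] * (size - len(ids))


def make_example(encoder_ids, reply_ids, buckets=BUCKETS):
    """Wrap the reply in START/EOS, pick a bucket and pad both sides.

    Returns None when the pair is too long for every bucket.
    """
    decoder_ids = [START_ID] + reply_ids + [EOS_ID]
    bucket_id = find_bucket(encoder_ids, decoder_ids, buckets)
    if bucket_id is None:
        return None
    enc_size, dec_size = buckets[bucket_id]
    return bucket_id, pad(encoder_ids, enc_size), pad(decoder_ids, dec_size)


example = make_example([5, 6, 7, 8, 9], [10, 11, 12])
too_long = make_example(list(range(4, 44)), [10, 11, 12])
print(example)
```

With the active setting BUCKETS = [(16, 19)], the same logic would drop every pair whose encoder side exceeds 16 tokens or whose wrapped decoder side exceeds 19.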

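MAX_GRAD_NORM reads like a bound for gradient clipping by global norm, the rule TensorFlow provides as tf.clip_by_global_norm. That reading is an assumption, since the config itself does not say how the value is consumed. The sketch below re-implements the rule in plain Python on lists of floats to show the arithmetic. It is not the model's training code, and clip_by_global_norm here is a local stand-in function.

```python
import math

MAX_GRAD_NORM = 5.0  # value copied from config.py


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Scale all gradients by one common factor so that their joint
    L2 norm does not exceed `max_norm`.

    `grads` is a list of gradients, each a plain list of floats.
    Returns the clipped gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The factor is 1.0 when the norm is already within the bound.
    scale = max_norm / max(global_norm, max_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


# Global norm is sqrt(9 + 16 + 0 + 144) = 13, so every entry is scaled by 5/13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm = clip_by_global_norm(grads)
clipped_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm, clipped_norm)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its magnitude is capped.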