%!TEX root = main.tex
\section{Introduction}
Publicly available datasets have helped the computer vision community to
compare new algorithms and develop applications. MNIST~\cite{LeNet-5} in
particular has been used thousands of times to train and evaluate models
for classification. However, even rather simple models consistently achieve
about $\SI{99.2}{\percent}$ accuracy on MNIST~\cite{TF-MNIST-2016}. The best
models classify everything except for about 20~instances correctly. This makes
meaningful statements about improvements in classifiers hard. Possible reasons
why current models perform so well on MNIST are:
\begin{enumerate*}
\item MNIST has only 10~classes,
\item there are very few (probably none) labeling errors in MNIST,
\item every class has \num{6000}~training samples,
\item the feature dimensionality is comparatively low.
\end{enumerate*}
Also, applications which need to recognize only Arabic numerals are rare.

Similar to MNIST, \dbName{} is of very low resolution. In contrast to MNIST,
the \dbNameVersion~dataset contains \dbTotalClasses~classes, including Arabic
numerals and Latin characters. Furthermore, \dbNameVersion{} has far fewer
recordings per class than MNIST and is only in black and white, whereas
MNIST is in grayscale.

\dbName{} could be used to train models for semantic segmentation of
non-cursive handwritten documents like mathematical notes or forms.

\section{Terminology}
A \textit{symbol} is an atomic semantic entity which has exactly one visual
appearance when it is handwritten. Examples of symbols are:
$\alpha, \propto, \cdot, x, \int, \sigma, \dots$
%\footnote{The first symbol is an \verb+\alpha+, the second one is a \verb+\propto+.}

While a symbol is a single semantic entity with a given visual appearance, a
glyph is a single typesetting entity. Symbols, glyphs and \LaTeX{} commands do
not map onto each other one-to-one:

\begin{itemize}
\item Two different symbols can have the same glyph. For example, the symbols
\verb+\sum+ and \verb+\Sigma+ both render to $\Sigma$, but they have different
semantics and hence they are different symbols.
\item Two different glyphs can correspond to the same semantic entity. An example is
\verb+\varphi+ ($\varphi$) and \verb+\phi+ ($\phi$): Both represent the small
Greek letter \enquote{phi}, but they exist in two different variants. Since
their visual appearance differs, \verb+\varphi+ and \verb+\phi+ are two
different symbols.
\item Examples of different \LaTeX{} commands that represent the same symbol are
\verb+\alpha+ ($\alpha$) and \verb+\upalpha+ ($\upalpha$): Both have the same
semantics and are hand-drawn the same way. This is the case for all \verb+\up+
variants of Greek letters.
\end{itemize}

All elements of the data set are called \textit{recordings} in the following.


\section{How HASY was created}
\dbName{} is derived from the HWRT dataset which was first used and described
in~\cite{Thoma:2014}. HWRT is an on-line recognition dataset, meaning it does
not contain the handwritten symbols as images, but as point-sequences. Hence
HWRT contains strictly more information than \dbName. The larger dimension
of each recording's bounding box was scaled to be \SI{32}{\pixel}. The image
was then centered within the $\SI{32}{\pixel} \times \SI{32}{\pixel}$ bounding
box.
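
The following sketch illustrates this scale-and-center step. It is not the
original rendering code; the stroke format and the helper name
\verb+render_recording+ are assumptions for illustration.

\begin{verbatim}
import numpy as np
from PIL import Image, ImageDraw

def render_recording(strokes, size=32):
    """Scale a point-sequence recording and center it in a size x size box.

    `strokes` is assumed to be a list of strokes, each a list of (x, y)
    tuples, as in an on-line recognition dataset like HWRT.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    # Scale the larger dimension of the bounding box to `size` pixels.
    scale = (size - 1) / max(width, height, 1e-9)
    # Offsets that center the scaled recording within the canvas.
    off_x = (size - width * scale) / 2 - min(xs) * scale
    off_y = (size - height * scale) / 2 - min(ys) * scale
    img = Image.new("L", (size, size), color=255)  # white canvas
    draw = ImageDraw.Draw(img)
    for stroke in strokes:
        points = [(x * scale + off_x, y * scale + off_y) for x, y in stroke]
        draw.line(points, fill=0)  # draw black strokes
    return np.array(img)
\end{verbatim}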

\begin{figure}[h]
\centering
\includegraphics*[width=\linewidth, keepaspectratio]{figures/sample-images.png}
\caption{100 recordings of the \dbNameVersion{} data set.}
\label{fig:100-data-items}
\end{figure}

HWRT contains exactly the same recordings and classes as \dbName, but \dbName{}
is rendered in order to make it easy to use.

HWRT, and hence \dbName{}, is a merged dataset. $\SI{91.93}{\percent}$ of HWRT
was collected by Detexify~\cite{Kirsch,Kirsch2014}. The remaining recordings
were collected by \href{http://write-math.com}{http://write-math.com}. Both
projects aim at helping users to find \LaTeX{} commands in cases where the
users know how to write the symbol, but not the symbol's name. The user writes
the symbol on a blank canvas in the browser (either via touch devices or with a
mouse). Then the websites show the top-$k$ symbols which the user might have
meant. The user then clicks on the correct symbol to accept it. On
\href{http://write-math.com}{write-math.com}, other users can also suggest
which symbol could be the correct one.

After collecting the data, Martin Thoma manually inspected each recording. This
manual inspection is a necessary step as anonymous web users could submit any
drawing they wanted to any symbol. This includes many creative recordings as
shown in~\cite{Kirsch,Thoma:2014} as well as loose associations. In some cases,
the correct label was unambiguous and the recording was relabeled accordingly.
In other cases, the recordings were removed from the data set.

It is not possible to determine the exact number of people who contributed
handwritten symbols to the Detexify part of the dataset. The part which was
created with \href{http://write-math.com}{write-math.com} was created by
477~user~IDs. Although user IDs are given in the dataset, they are not
reliable. On the one hand, the Detexify data has the single user ID 16925,
although many users contributed to it. Also, some users lent their phones to
others while being logged in to show how write-math.com works. This leads to
multiple users per user ID. On the other hand, some users don't register and
use write-math.com multiple times. This can lead to multiple user IDs for one
person.

\section{Classes}
The \dbNameVersion~dataset contains \dbTotalClasses~classes. Those classes include the
Latin uppercase and lowercase characters (\verb+A-Z+, \verb+a-z+), the Arabic
numerals (\verb+0-9+), 32~different types of arrows, Fraktur and calligraphic
Latin characters, brackets and more. See \cref{table:symbols-of-db-0,table:symbols-of-db-1,table:symbols-of-db-2,table:symbols-of-db-3,table:symbols-of-db-4,table:symbols-of-db-5,table:symbols-of-db-6,table:symbols-of-db-7,table:symbols-of-db-8} for more information.

\section{Data}
The \dbNameVersion~dataset contains \dbTotalInstances{} black and white images
of the size $\SI{32}{\pixel} \times \SI{32}{\pixel}$. Each image is labeled
with one of \dbTotalClasses~labels. An example of 100~elements of the
\dbNameVersion{} data set is shown in~\cref{fig:100-data-items}.

The average amount of black pixels is \SI{16}{\percent}, but this is highly
class-dependent, ranging from \SI{3.7}{\percent} for \enquote{$\dotsc$} to
\SI{59.2}{\percent} for \enquote{$\blacksquare$}.
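
Per-class statistics of this kind can be reproduced with a few lines of NumPy.
The sketch below assumes the images are loaded as an $(N, 32, 32)$ array with
values in $\{0, 255\}$, where 0~encodes black, and that \verb+labels+ holds
the class of each image; both names are illustrative.

\begin{verbatim}
import numpy as np

def black_pixel_share_by_class(images, labels):
    """Average fraction of black pixels per class.

    `images`: (N, 32, 32) uint8 array with values in {0, 255},
    where 0 is assumed to encode black. `labels`: (N,) array.
    """
    return {cls: float((images[labels == cls] == 0).mean())
            for cls in np.unique(labels)}
\end{verbatim}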

The ten classes with the most recordings are:
\[\int, \sum, \infty, \alpha, \xi, \equiv, \partial, \mathds{R}, \in, \square\]
Those symbols have \num{26780} recordings and thus account for
\SI{16}{\percent} of the data set. 47~classes have more than \num{1000}
recordings. The number of recordings of the remaining classes is distributed
as visualized in~\cref{fig:class-data-distribution}.

\begin{figure}[h]
\centering
\includegraphics*[width=\linewidth, keepaspectratio]{figures/data-dist}
\caption{Distribution of the data among classes. The 47~classes with
more than \num{1000} recordings are not shown.}
\label{fig:class-data-distribution}
\end{figure}

A weakness of \dbNameVersion{} is the amount of available data per class. For
some classes, there are only 51~elements in the test set.

The data has $32\cdot 32 = 1024$ features in $\Set{0, 255}$.
As~\cref{table:pca-explained-variance} shows, \SI{32}{\percent} of the features
can explain~\SI{90}{\percent} of the variance, \SI{54}{\percent} of the
features explain \SI{95}{\percent} of the variance and \SI{86}{\percent} of the
features explain \SI{99}{\percent} of the variance.

\begin{table}[h]
\centering
\begin{tabular}{lccc}
\toprule
Principal Components & 331 & 551 & 882 \\
Explained Variance & \SI{90}{\percent} & \SI{95}{\percent} & \SI{99}{\percent} \\
\bottomrule
\end{tabular}
\caption{The number of principal components necessary to explain
\SI{90}{\percent}, \SI{95}{\percent} and \SI{99}{\percent}
of the variance.}
\label{table:pca-explained-variance}
\end{table}
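
The numbers in \cref{table:pca-explained-variance} can be reproduced with
scikit-learn, which the baselines below also use. In the sketch, \verb+X+ is
assumed to be the $(N, 1024)$ matrix of flattened images.

\begin{verbatim}
import numpy as np
from sklearn.decomposition import PCA

# `X` is assumed to be the (N, 1024) matrix of flattened images.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
for target in (0.90, 0.95, 0.99):
    # Index of the first component at which the cumulative explained
    # variance reaches the target, converted to a component count.
    n_components = int(np.searchsorted(cumulative, target)) + 1
    print(f"{n_components} components explain {target:.0%} of the variance")
\end{verbatim}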

The Pearson correlation coefficient was calculated for all features. The
features are more correlated the closer the pixels are together, as one can see
in~\cref{fig:feature-correlation}. The block-like structure at every 32nd
feature comes from the fact that the image rows were flattened for this
visualization. The second diagonal to the right shows features which are one
pixel below in the image. Those correlations are expected as symbols are
written by continuous lines. Less easy to explain are the correlations of
high-index features with low-index features in the upper right corner of the
plot. Those correlations correspond to features in the upper left corner of
the image being correlated with features in the lower right corner. One
explanation is that symbols which have a line in the upper left corner are
likely to be~$\blacksquare$.

\begin{figure}[h]
\centering
\includegraphics*[width=\linewidth, keepaspectratio]{figures/feature-correlation.pdf}
\caption{Correlation of all $32 \cdot 32 = 1024$ features. The diagonal
shows the correlation of a feature with itself.}
\label{fig:feature-correlation}
\end{figure}
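
A sketch of this correlation analysis: \verb+np.corrcoef+ over the columns of
the flattened image matrix \verb+X+ (assumed as before) yields the
$1024 \times 1024$ matrix shown in \cref{fig:feature-correlation}. Note that
pixels which are constant over the whole dataset produce NaN rows.

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

# `X` is assumed to be the (N, 1024) matrix of flattened images.
# Feature i corresponds to pixel (i // 32, i % 32), so indices that
# differ by 32 are vertical neighbors in the image.
corr = np.corrcoef(X, rowvar=False)  # (1024, 1024) Pearson correlations
plt.imshow(corr, vmin=-1, vmax=1)
plt.colorbar()
plt.show()
\end{verbatim}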


\section{Classification Challenge}
\subsection{Evaluation}
\dbName{} defines 10~folds which should be used for calculating the accuracy
of any classifier being evaluated on \dbName{} as follows:

\begin{algorithm}[H]
\begin{algorithmic}
\Function{CrossValidation}{Folds $F$}
\State $D \gets \cup_{i=1}^{10} F_i$\Comment{Complete Dataset}
\For{($i=1$; $\;i \leq 10$; $\;i$++)}
\State $A \gets D \setminus F_i$\Comment{Train set}
\State $B \gets F_i$\Comment{Test set}
\State Train Classifier $C_i$ on $A$
\State Calculate accuracy $a_i$ of $C_i$ on $B$
\EndFor
\State \Return ($\frac{1}{10}\sum_{i=1}^{10} a_i$, $\min(a_i)$, $\max(a_i)$)
\EndFunction
\end{algorithmic}
\caption{Calculate the mean accuracy, the minimum accuracy, and the maximum
accuracy with 10-fold cross-validation}
\label{alg:seq1}
\end{algorithm}
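
A Python transcription of this protocol could look as follows. The fold
format and the factory \verb+make_classifier+ are assumptions; any model with
sklearn-style \verb+fit+/\verb+score+ methods fits the sketch.

\begin{verbatim}
import numpy as np

def cross_validation(folds, make_classifier):
    """10-fold cross-validation as in the algorithm above.

    `folds`: list of ten (X, y) pairs. `make_classifier`: illustrative
    factory returning a fresh, unfitted model.
    """
    accuracies = []
    for i in range(len(folds)):
        X_test, y_test = folds[i]  # test set B = F_i
        rest = [f for j, f in enumerate(folds) if j != i]
        X_train = np.concatenate([X for X, _ in rest])  # train set A
        y_train = np.concatenate([y for _, y in rest])
        model = make_classifier()
        model.fit(X_train, y_train)
        accuracies.append(model.score(X_test, y_test))
    return np.mean(accuracies), min(accuracies), max(accuracies)
\end{verbatim}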

\subsection{Model Baselines}
Eight standard algorithms were evaluated by their accuracy on the raw image
data. The neural networks were implemented with
TensorFlow~0.12.1~\cite{tensorflow2015-whitepaper}. All other algorithms were
implemented with sklearn~0.18.1~\cite{scikit-learn}. \Cref{table:classifier-results}
shows the results of the models being trained and tested on MNIST and also on
\dbNameVersion{}:
\begin{table}[h]
\centering
\begin{tabular}{lrrr}
\toprule
\multirow{2}{*}{Classifier} & \multicolumn{3}{c}{Test Accuracy} \\%& \multirow{2}{*}{\parbox{1.2cm}{\centering HASY\\Test time}}\\
& MNIST & HASY & min -- max\hphantom{00 } \\\midrule% &
TF-CNN & \SI{99.20}{\percent} & \SI{81.0}{\percent} & \SI{80.6}{\percent} -- \SI{81.5}{\percent}\\% & \SI{3.1}{\second}\\
Random Forest & \SI{96.41}{\percent} & \SI{62.4}{\percent} & \SI{62.1}{\percent} -- \SI{62.8}{\percent}\\% & \SI{19.0}{\second}\\
MLP (1 Layer) & \SI{89.09}{\percent} & \SI{62.2}{\percent} & \SI{61.7}{\percent} -- \SI{62.9}{\percent}\\% & \SI{7.8}{\second}\\
LDA & \SI{86.42}{\percent} & \SI{46.8}{\percent} & \SI{46.3}{\percent} -- \SI{47.7}{\percent}\\% & \SI{0.2}{\second}\\
$k$-NN ($k=3$)& \SI{92.84}{\percent} & \SI{28.4}{\percent} & \SI{27.4}{\percent} -- \SI{29.1}{\percent}\\% & \SI{196.2}{\second}\\
$k$-NN ($k=5$)& \SI{92.88}{\percent} & \SI{27.4}{\percent} & \SI{26.9}{\percent} -- \SI{28.3}{\percent}\\% & \SI{196.2}{\second}\\
QDA & \SI{55.61}{\percent} & \SI{25.4}{\percent} & \SI{24.9}{\percent} -- \SI{26.2}{\percent}\\% & \SI{94.7}{\second}\\
Decision Tree & \SI{65.40}{\percent} & \SI{11.0}{\percent} & \SI{10.4}{\percent} -- \SI{11.6}{\percent}\\% & \SI{0.0}{\second}\\
Naive Bayes & \SI{56.15}{\percent} & \SI{8.3}{\percent} & \SI{7.9}{\percent} -- \hphantom{0}\SI{8.7}{\percent}\\% & \SI{24.7}{\second}\\
AdaBoost & \SI{73.67}{\percent} & \SI{3.3}{\percent} & \SI{2.1}{\percent} -- \hphantom{0}\SI{3.9}{\percent}\\% & \SI{9.8}{\second}\\
\bottomrule
\end{tabular}
\caption{Classification results for eight classifiers.
% The test time is the mean time needed for all test samples.
The number of
test samples differs between the folds, but is $\num{16827} \pm
166$. The decision tree was trained with a maximum depth of~5. The
exact structure of the CNNs is explained
in~\cref{subsec:CNNs-Classification}. For $k$~nearest neighbors,
the number of samples per class had to be reduced to 50 for HASY
due to the extraordinarily long testing time this algorithm
needs.}
\label{table:classifier-results}
\end{table}
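
For readers who want to reproduce baselines of this kind, the following
scikit-learn setup is a sketch: apart from the decision tree's maximum depth
of~5 and the choices of~$k$, all hyperparameters are illustrative defaults and
not taken from the text.

\begin{verbatim}
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

baselines = {
    "Random Forest": RandomForestClassifier(),
    "MLP (1 Layer)": MLPClassifier(hidden_layer_sizes=(256,)),
    "LDA": LinearDiscriminantAnalysis(),
    "k-NN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "QDA": QuadraticDiscriminantAnalysis(),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Naive Bayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(),
}
# Each model can be evaluated with the cross_validation sketch above.
\end{verbatim}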

The following observations are noteworthy:
\begin{itemize}
\item All algorithms achieve much higher accuracy on MNIST than on
\dbNameVersion{}.
\item While a single Decision Tree performs much better on MNIST than
QDA, it is exactly the other way around for~\dbName{}. One possible
explanation is that MNIST has grayscale images while \dbName{} has
black and white images.
\end{itemize}


\subsection{Convolutional Neural Networks}\label{subsec:CNNs-Classification}
Convolutional Neural Networks (CNNs) are state of the art on several computer
vision benchmarks like MNIST~\cite{wan2013regularization}, CIFAR-10, CIFAR-100
and SVHN~\cite{huang2016densely},
ImageNet~2012~\cite{deep-residual-networks-2015} and more. Experiments on
\dbNameVersion{} without preprocessing also showed that even the
simplest CNNs achieve much higher accuracy on \dbNameVersion{} than all other
classifiers (see~\cref{table:classifier-results}).

\Cref{table:cnn-results} shows the 10-fold cross-validation results for four
architectures.
\begin{table}[H]
\centering
\begin{tabular}{lrrrr}
\toprule
\multirow{2}{*}{Network} & \multirow{2}{*}{Parameters} & \multicolumn{2}{c}{Test Accuracy} & \multirow{2}{*}{Time} \\
& & mean & min -- max\hphantom{00 } & \\\midrule
2-layer & \num{3023537} & \SI{73.8}{\percent} & \SI{72.9}{\percent} -- \SI{74.3}{\percent} & \SI{1.5}{\second}\\
3-layer & \num{1530609} & \SI{78.4}{\percent} & \SI{77.6}{\percent} -- \SI{79.0}{\percent} & \SI{2.4}{\second}\\
4-layer & \num{848753} & \SI{80.5}{\percent} & \SI{79.2}{\percent} -- \SI{80.7}{\percent} & \SI{2.8}{\second}\\
TF-CNN & \num{4592369} & \SI{81.0}{\percent} & \SI{80.6}{\percent} -- \SI{81.5}{\percent} & \SI{2.9}{\second}\\
\bottomrule
\end{tabular}
\caption{Classification results for CNN architectures. The test time is,
as before, the mean test time for all examples on the ten folds.}
\label{table:cnn-results}
\end{table}
The following architectures were evaluated:
\begin{itemize}
\item 2-layer: A convolutional layer with 32~filters of size $3 \times 3 \times 1$
is followed by a $2 \times 2$ max pooling layer with stride~2. The output
layer is --- as in all explored CNN architectures --- a fully
connected softmax layer with 369~neurons.
\item 3-layer: Like the 2-layer CNN, but with another convolutional layer with
64~filters of size $3 \times 3 \times 32$ followed by a $2 \times 2$ max
pooling layer with stride~2 before the output layer.
\item 4-layer: Like the 3-layer CNN, but with another convolutional layer with
128~filters of size $3 \times 3 \times 64$ followed by a $2 \times 2$ max
pooling layer with stride~2 before the output layer.
\item TF-CNN: A convolutional layer with 32~filters of size $3 \times 3 \times 1$
is followed by a $2 \times 2$ max pooling layer with stride~2.
Another convolutional layer with 64~filters of size $3 \times 3 \times 32$
and a $2 \times 2$ max pooling layer with stride~2 follow. A fully
connected layer with 1024~units and tanh activation function, a
dropout layer with dropout probability 0.5 and the output softmax
layer are last. This network is described in~\cite{tf-mnist}; a sketch
follows this list.
\end{itemize}
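
A sketch of TF-CNN in the present-day Keras API follows. The original was
implemented with TensorFlow~0.12.1, so this is a translation, not the original
code; details the text does not specify, such as the convolution activations
and padding, are assumptions.

\begin{verbatim}
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    # Convolution activations and "same" padding are assumed.
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="tanh"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(369, activation="softmax"),  # one unit per class
])
model.compile(optimizer="adam",  # ADAM is used for training, see below
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
\end{verbatim}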

For all architectures, ADAM~\cite{kingma2014adam} was used for training. The
combined training and testing time was always less than 6~hours for the 10-fold
cross-validation on an Nvidia GeForce GTX Titan Black with CUDA~8 and cuDNN~5.1.
\clearpage
\subsection{Class Difficulties}
The class-wise accuracy
\[\text{class-accuracy}(c) = \frac{\text{correctly predicted samples of class } c}{\text{total number of samples of class } c}\]
is used to estimate how difficult a class is.
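
A minimal sketch of this metric, assuming \verb+y_true+ and \verb+y_pred+ are
arrays of true and predicted class labels:

\begin{verbatim}
import numpy as np

def class_accuracy(y_true, y_pred):
    """Class-wise accuracy as defined above, per class label."""
    return {cls: float(np.mean(y_pred[y_true == cls] == cls))
            for cls in np.unique(y_true)}
\end{verbatim}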

32~classes were never classified correctly by TF-CNN and hence have
a class-accuracy of~0. They are shown in~\cref{table:hard-classes}. Some, like
\verb+\mathsection+ and \verb+\S+, are not distinguishable at all. Others, like
\verb+\Longrightarrow+ and
\verb+\Rightarrow+, are only distinguishable in some people's handwriting.
Those classes account for \SI{2.8}{\percent} of the data.

\begin{table}[h]
\centering
\begin{tabular}{lcrlc}
\toprule
\LaTeX & Rendered & Total & Confused with & \\\midrule
\verb+\mid+ & $\mid$ & 34 & \verb+|+ & $|$ \\
\verb+\triangle+ & $\triangle$ & 32 & \verb+\Delta+ & $\Delta$ \\
\verb+\mathds{1}+ & $\mathds{1}$ & 32 & \verb+\mathbb{1}+ & \includegraphics{symbols/mathbb1.pdf} \\
\verb+\checked+ & {\mbox {\wasyfamily \char 8}} & 28 & \verb+\checkmark+ & $\checkmark$ \\
\verb+\shortrightarrow+ & $\shortrightarrow$ & 28 & \verb+\rightarrow+ & $\rightarrow$ \\
\verb+\Longrightarrow+ & $\Longrightarrow$ & 27 & \verb+\Rightarrow+ & $\Rightarrow$ \\
\verb+\backslash+ & $\backslash$ & 26 & \verb+\setminus+ & $\setminus$ \\
\verb+\O+ & \O & 24 & \verb+\emptyset+ & $\emptyset$ \\
\verb+\with+ & $\with$ & 21 & \verb+\&+ & $\&$ \\
\verb+\diameter+ & {\mbox {\wasyfamily \char 31}} & 20 & \verb+\emptyset+ & $\emptyset$ \\
\verb+\triangledown+ & $\triangledown$ & 20 & \verb+\nabla+ & $\nabla$ \\
\verb+\longmapsto+ & $\longmapsto$ & 19 & \verb+\mapsto+ & $\mapsto$ \\
\verb+\dotsc+ & $\dotsc$ & 15 & \verb+\dots+ & $\dots$ \\
\verb+\fullmoon+ & {\mbox {\wasyfamily \char 35}} & 15 & \verb+\circ+ & $\circ$ \\
\verb+\varpropto+ & $\varpropto$ & 14 & \verb+\propto+ & $\propto$ \\
\verb+\mathsection+ & $\mathsection$ & 13 & \verb+\S+ & $\S$ \\
\verb+\vartriangle+ & $\vartriangle$ & 12 & \verb+\Delta+ & $\Delta$ \\
\verb+O+ & $O$ & 9 & \verb+\circ+ & $\circ$ \\
\verb+o+ & $o$ & 7 & \verb+\circ+ & $\circ$ \\
\verb+c+ & $c$ & 7 & \verb+\subset+ & $\subset$ \\
\verb+v+ & $v$ & 7 & \verb+\vee+ & $\vee$ \\
\verb+x+ & $x$ & 7 & \verb+\times+ & $\times$ \\
\verb+\mathbb{Z}+ & $\mathbb{Z}$ & 7 & \verb+\mathds{Z}+ & $\mathds{Z}$ \\
\verb+T+ & $T$ & 6 & \verb+\top+ & $\top$ \\
\verb+V+ & $V$ & 6 & \verb+\vee+ & $\vee$ \\
\verb+g+ & $g$ & 6 & \verb+9+ & $9$ \\
\verb+l+ & $l$ & 6 & \verb+|+ & $|$ \\
\verb+s+ & $s$ & 6 & \verb+\mathcal{S}+ & $\mathcal{S}$ \\
\verb+z+ & $z$ & 6 & \verb+\mathcal{Z}+ & $\mathcal{Z}$ \\
\verb+\mathbb{R}+ & $\mathbb{R}$ & 6 & \verb+\mathds{R}+ & $\mathds{R}$ \\
\verb+\mathbb{Q}+ & $\mathbb{Q}$ & 6 & \verb+\mathds{Q}+ & $\mathds{Q}$ \\
\verb+\mathbb{N}+ & $\mathbb{N}$ & 6 & \verb+\mathds{N}+ & $\mathds{N}$ \\
\bottomrule
\end{tabular}
\caption{The 32~classes which were never classified correctly by
the best CNN.}
\label{table:hard-classes}
\end{table}

In contrast, 21~classes have an accuracy of more than \SI{99}{\percent} with
TF-CNN (see~\cref{table:easy-classes}).

\begin{table}[h]
\centering
\begin{tabular}{lcr}
\toprule
\LaTeX & Rendered & Total\\\midrule
\verb+\forall + & $\forall $ & 214 \\
\verb+\sim + & $\sim $ & 201 \\
\verb+\nabla + & $\nabla $ & 122 \\
\verb+\cup + & $\cup $ & 93 \\
\verb+\neg + & $\neg $ & 85 \\
\verb+\setminus + & $\setminus $ & 52 \\
\verb+\supset + & $\supset $ & 42 \\
\verb+\vdots + & $\vdots $ & 41 \\
\verb+\boxtimes + & $\boxtimes $ & 22 \\
\verb+\nearrow + & $\nearrow $ & 21 \\
\verb+\uplus + & $\uplus $ & 19 \\
\verb+\nvDash + & $\nvDash $ & 15 \\
\verb+\AE + & \AE & 15 \\
\verb+\Vdash + & $\Vdash $ & 14 \\
\verb+\Leftarrow + & $\Leftarrow $ & 14 \\
\verb+\upharpoonright+ & $\upharpoonright$ & 14 \\
\verb+- + & $- $ & 12 \\
\verb+\guillemotleft + & $\guillemotleft $ & 11 \\
\verb+R + & $R $ & 9 \\
\verb+7 + & $7 $ & 8 \\
\verb+\blacktriangleright+ & $\blacktriangleright$ & 6 \\
\bottomrule
\end{tabular}
\caption{The 21~classes with a class-wise accuracy of more than \SI{99}{\percent}
with TF-CNN.}
\label{table:easy-classes}
\end{table}


\section{Verification Challenge}
In the setting of an online symbol recognizer like
\href{http://write-math.com}{write-math.com} it is important to recognize when
the user enters a symbol which is not known to the classifier. One way to achieve
this is by training a binary classifier to recognize whether two recordings
belong to the same symbol. This task is similar to face verification: two face
images are given and it has to be decided whether they belong to the same person.

For the verification challenge, a training-test split is given. The training
data contains images with their class labels. The test set
contains 32~symbols which were not seen by the classifier before. The elements
of the test set are pairs of recorded handwritten symbols $(r_1, r_2)$. There
are three groups of tests:
\begin{enumerate}[label=V\arabic*]
\item $r_1$ and $r_2$ both belong to symbols which are in the training set,
\item $r_1$ belongs to a symbol in the training set, but $r_2$
might not,
\item $r_1$ and $r_2$ do not belong to symbols in the training set.
\end{enumerate}

When evaluating models, a model may not take advantage of knowing whether a
recording $r_1$ or $r_2$ is an instance of one of the training symbols.
For all test sets, the following numbers should be reported: True Positive (TP),
True Negative (TN), False Positive (FP), False Negative (FN), and
Accuracy $= \frac{TP+TN}{TP+TN+FP+FN}$.
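
A minimal sketch for computing these numbers, assuming boolean arrays
\verb+y_true+ and \verb+y_pred+ where \verb+True+ means that $r_1$ and $r_2$
belong to the same symbol:

\begin{verbatim}
import numpy as np

def verification_metrics(y_true, y_pred):
    """TP, TN, FP, FN and accuracy for a binary same-symbol verifier."""
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tp, tn, fp, fn, accuracy
\end{verbatim}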


% \section{Open Questions}

% There are still a couple of open questions about \dbNameVersion:

% \begin{enumerate}
% \item What is the accuracy of human expert labelers?
% \item What is the variance between human experts labeling the samples?
% \end{enumerate}


\section{Acknowledgment}

I want to thank \enquote{Begabtenstiftung Informatik Karls\-ruhe}, the Foundation
for Gifted Informatics Students in Karlsruhe. Their support helped me to write
this work.