\documentclass[a4paper]{scrartcl}
\usepackage{amssymb, amsmath} % needed for math
\usepackage[utf8]{inputenc} % input encoding, needed for umlauts
\usepackage[english]{babel} % language-specific hyphenation and strings
\usepackage[T1]{fontenc} % needed for correct output of umlauts in the PDF
\usepackage[margin=2.5cm]{geometry} % layout
\usepackage{hyperref} % links in the text
\usepackage{color}
\usepackage{framed}
\usepackage{enumerate} % for advanced numbering of lists
\usepackage{csquotes}
\usepackage{ifxetex,ifluatex}
\usepackage{etoolbox}
\usepackage[svgnames]{xcolor}
\usepackage{tikz}
\usepackage{parskip}
\usepackage{cite}
\usepackage{fancyref}
\usepackage{mystyle}
\clubpenalty = 10000 % prevent orphans (club lines)
\widowpenalty = 10000 % prevent widows

\hypersetup{
pdfauthor = {Martin Thoma},
pdfkeywords = {Bachelor proposal, LaTeX, handwriting recognition},
pdftitle = {Proposal for a Bachelor of Science Thesis: Interactive on-line handwriting recognition of mathematical formulae}
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}
\title{Proposal for a Bachelor of Science Thesis:\\Interactive on-line handwriting recognition of mathematical formulae}
\author{Martin Thoma}
\maketitle
\section{The problem background}
Some people do not know how to typeset even
simple mathematical formulae in \LaTeX{}, such as
\[\pi\alpha=\sum_{n=-\infty}^\infty \frac{\sin^2 (c+n)\alpha}{(c+n)^2}=\int_{-\infty}^\infty \frac{\sin^2 (c+n)\alpha}{(c+n)^2}\, \text{d}n\]
or need a lot of time to do so.
Currently, there are several online
services, programs and apps that help with writing mathematical
formulae, but all the ones I know of have serious disadvantages:
\begin{itemize}
\item \href{http://detexify.kirelabs.org/classify.html}{detexify.kirelabs.org}
recognizes \textbf{only symbols}.
\item The formula editor of LibreOffice Writer 3.6, shown
in \Fref{fig:libre-office-3.6}, offers some
guidance by grouping common operations next to
a WYSIWYG editor, but it has \textbf{no handwriting recognition}.
Another drawback is that it is \textbf{not available
as an online service}, so you have to install LibreOffice,
which might not be possible on all devices.
\item The \enquote{Daum Equation Editor} (see \Fref{fig:daum-editor}) is available online
and offers guidance through the creation of equations,
but does not offer handwriting recognition. Although
it might be OpenSource, the \textbf{source code is difficult to
find}, so improving the editor yourself
is not practical. It also relies on Adobe Flash,
which is not available on many smartphones and tablet
computers.
\item Maple seems to offer handwritten symbol recognition (\href{http://www.maplesoft.com/products/maple/features/handwritten.aspx}{source}),
but I was not able to test it because
it is \textbf{not available for free}. In addition, you
have to install additional software, it does not seem to be
available for tablet computers, and it only recognizes
single symbols.
\item Wolfram Mathematica seems to be able to do complete
formula recognition, at least for simple formulae (\href{http://reference.wolfram.com/mathematica/tutorial/HandwrittenMathRecognition.html}{source}),
by using Microsoft's \href{http://windows.microsoft.com/en-ph/windows7/use-math-input-panel-to-write-and-correct-math-equations}{Math Input Panel},
but this is neither OpenSource nor available as an
online service.
Additionally, it is not
available for Linux systems, so I cannot test it.
\end{itemize}

A more comprehensive list can be found at \href{https://en.wikipedia.org/wiki/Formula_editor}{https://en.wikipedia.org/wiki/Formula\_editor}.
A problem with some of the projects presented there is that they
require the client to execute Java applets, which is a security
risk.

\begin{figure}[h]
\centering
\includegraphics*[width=5cm, keepaspectratio]{figures/libreoffice-writer.png}
\caption{LibreOffice Writer 3.6 - Formula Editor}
\label{fig:libre-office-3.6}
\end{figure}

\begin{figure}[h]
\centering
\includegraphics*[width=15cm, keepaspectratio]{figures/daum-editor.png}
\caption{Daum Equation Editor}
\label{fig:daum-editor}
\end{figure}
\break
\section{The problem statement}
What I would like to have is an interactive on-line handwriting
recognition service that is available as a web service and makes
use of touchscreens. Additionally, it should be free of charge and
OpenSource, and the source code should be documented and easy to find.
This means:
\begin{itemize}
\item \textbf{Service}: The program can be accessed over the web, so
that the user only needs a modern browser.
As a consequence, the software could be used with any
device that has a touchscreen.
\item \textbf{On-line handwriting recognition}: The service
starts recognizing while the user is still entering a formula.
\item \textbf{Interactive}: The service offers symbols and constructs
to the user before the user starts writing. These suggestions
might change depending on what the user has entered before.
\item \textbf{OpenSource}: Any license in this list: \href{http://opensource.org/licenses}{http://opensource.org/licenses}
\item \textbf{Easy to find}: Ideally, the project should have
its own domain that contains the source code, the service
and documentation.
But it might be enough to provide
a developer's email address at the top
of the source code of the delivered HTML document.
\end{itemize}

This service should also use techniques
of \enquote{Gamification} to encourage users to give as much
meta information about their formulae as possible:
\begin{itemize}
\item Which problem domain does the formula belong to, e.\,g. \enquote{Euclidean geometry}, \enquote{analysis} or \enquote{calculus}?
\item Does the formula itself have a name, e.\,g. \enquote{Pythagorean theorem}, \enquote{Fibonacci numbers} or \enquote{geometric series}?
\end{itemize}

This information should be used to create a formula database.

\section{Significance}
For me as a Linux user, there is no software that I can test and that
offers on-line, interactive math handwriting recognition. But the
need for such software is there.

There are more reasons why this bachelor's thesis matters:
Projects like \LaTeX{}, Linux, Apache or Firefox have shown that
OpenSource software can enrich the development in specific areas. The
\enquote{Browser Wars} might be the most famous result of an active
OpenSource community. Internet Explorer 6 had
a market share of over 80\% in 2003. Predecessors of Firefox and the Mozilla
Foundation already existed, but Firefox 1.0 was not released until
November 2004. After that, Firefox and other open browsers added many
features that Internet Explorer had to compete with, like tabbed browsing,
HTML4 standard conformance, support of the \texttt{<canvas>} tag and
speed of HTML rendering and JavaScript execution.\footnote{\href{http://www.evolutionoftheweb.com/}{www.evolutionoftheweb.com} offers a graphical overview. Note that support for standards like HTML4 or CSS~2 does not arrive with a single version; it is an incremental process.} Some of the problems
behind these features are interesting for science, such as many problems related
to layouts and just-in-time compilation (JIT).
With OpenSource software
that makes it easy to find its source and offers good documentation,
researchers can simply try out their ideas without first
having to struggle to get access to the source code.

Additionally, such a project might give researchers more time to
concentrate on the tasks they really want to do rather than spending
hours learning \LaTeX{}.

One last reason why this thesis matters is the formula database that
gets created by users. This database might be used in follow-up work,
e.\,g. a formula spotter for presentations or a math detector for speech.

\section{Time schedule}
\begin{itemize}
\item[70h] Literature research about on-line handwriting recognition
techniques and Gamification.
\item[5h] Defining the browsers and devices that should be supported
and the required client-side software like HTML5, CSS 3
and ECMAScript (better known as JavaScript). Also,
required input methods like touchscreens and styluses
should be mentioned.
\item[20h] Writing use cases. This includes writing example
formulae that the user should enter and the system should
be able to recognize, and finding people with different
knowledge of \LaTeX{} and from different fields who
want to participate in user tests.
\item[60h] Implementing the core of the application: handwriting
recognition of digits and symbols by using only
HTML, CSS and JavaScript on the client side. This includes implementing
a way for the user to enter new symbols and to correct the
symbol that was suggested by the recognition system.
\item[20h] Introduce testers who already know \LaTeX{} to the
current system. At this point, the system only does
symbol recognition. The testers should train it by
entering symbols like $a$--$z$, $A$--$Z$, $0$--$9$, $\alpha$--$\omega$, $A$--$\Omega$, $\cdot$, $\circ$, \dots
\item[10h] Get feedback from the users.
This feedback will not be included
in the thesis, but the resulting improvements will be documented.
\item[60h] Finding structures and ways to enter them. Examples
of structures that can be nested are sums:
\begin{verbatim}
\sum_{<some structure>}^{<another structure>} <a third structure>
\end{verbatim}
Implement the recognition of those structures.
\item[30h] Observe \enquote{fresh} testers while they try to use
the system.
\item[70h] Improving the software to fix problems that were found
in the user tests.
\item[50h] Fix bugs, improve code quality and readability as well
as documentation.
\item[45h] Usability testing: try hallway testing. The results
of these tests will be documented and will be part of the
bachelor's thesis. If possible, I would like
to let the testers use their own devices.
\item[10h] Describing open questions and ideas for how they could be
analyzed with the service that was created.
\end{itemize}

\section{Outline}
I have described the steps in which I would like to write the software,
but almost all of them include writing parts of the bachelor's thesis document.
A first draft of the outline could look like this:

\begin{enumerate}
\item Introduction
\item Definitions
\begin{enumerate}
\item Hardware: What is available and what is the distribution?
\item Software: What is available and what is the distribution?
\item Support of standards like HTML, CSS, ECMAScript, Flash, cookies, \dots
\item Choice of hardware, software and standards that should be supported as well as the choice of libraries and the required server-side software
\item Application to the domain of math recognition
\end{enumerate}
\item On-line handwriting techniques
\begin{enumerate}
\item Description of techniques in general
\item Application to the domain of math recognition
\end{enumerate}
\item Gamification techniques
\begin{enumerate}
\item Description of techniques in general
\item Application to the domain of math
recognition on the web
\end{enumerate}
\item Software Project
\begin{enumerate}
\item Structure of the code
\item Availability of documentation
\item Availability of the service
\end{enumerate}
\item Summary
\begin{enumerate}
\item Future Work
\end{enumerate}
\end{enumerate}
\break

\renewcommand\refname{Related Literature}
\nocite{*}
\bibliographystyle{itmalpha}
\bibliography{literatur}

This literature list only contains sources that seem relevant to me
at this point. As I proceed, I might find more useful sources for the different
topics, so I might add elements to this list, but also remove some.
Especially for Gamification I might read documents from
\href{http://gamification-research.org/}{gamification-research.org}.
\end{document}