GitHub Repository: drgnfrts/Singapore-Locations-NER

CHANGELOG

NOTE: This Project is complete and further updates are unlikely

Version History

[1.1.1] - 🗓️ 10/06/2022

Added

  • New folders to organise the Training and Validation Annotation Datasets

Changed

  • Updated scripts for converting annotation datasets from Doccano to spaCy v2 & spaCy v2 to v3

  • Put up notice in scripts warning against overwriting existing datasets, especially the Evaluation Dataset

  • Updated source of training and validation data for config files handling training for Models v2.0, 3.0 and 3.1
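The Doccano to spaCy v2 conversion step mentioned above can be sketched roughly as follows. This is not the repository's actual script, only a minimal illustration. It assumes a Doccano JSONL export where each record has a `text` field and a `label` field holding `[start, end, label]` triples, and the function name `doccano_to_spacy_v2`, the sample sentence and the `LOC` label are hypothetical.

```python
import json


def doccano_to_spacy_v2(jsonl_lines, label_key="label"):
    """Convert Doccano JSONL records into spaCy v2-style training tuples.

    Assumption: each record looks like
    {"text": "...", "label": [[start, end, "LABEL"], ...]}.
    Some Doccano versions use a different key, hence `label_key`.
    """
    examples = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        # spaCy v2 expects character offsets as (start, end, label) tuples
        entities = [
            (start, end, label) for start, end, label in record.get(label_key, [])
        ]
        examples.append((record["text"], {"entities": entities}))
    return examples


# Hypothetical one-record export, for illustration only
sample = ['{"text": "Lunch at Bugis Junction today", "label": [[9, 23, "LOC"]]}']
converted = doccano_to_spacy_v2(sample)
print(converted)
```

Writing the result to a new file rather than over an existing one is in keeping with the warning about overwriting datasets, especially the Evaluation Dataset.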

Deprecated

  • Geocoding Plan - project no longer continuing, placed in archives folder

[1.1.0] - 🗓️ 08/06/2022

Added

  • New dataset of locations text & annotations (suffixed with _v3.1)

  • model_v3.1, a new Enhanced NER-Centric Model trained on a larger dataset (size increased from 1598 to 2180)

  • "Gold Standard" Evaluation Annotated Dataset & Evaluation Script (golden_set.spacy)

Changed

  • Updated Streamlit Mini-App script to enable users to run Model 3.1 on Streamlit

  • Updated Documentation for Model 3.1 + how to evaluate with the "Gold Standard" set

[1.0.4] - 🗓️ 11/05/2022

Added

  • Script to run POST requests for NER with Model v3.0, built with FastAPI

[1.0.3] - 🗓️ 04/05/2022

Added

  • Test Jupyter Notebook and a Plan Outline for Geocoding Locations phase of Project

  • Proper CHANGELOG.md

Changed

[1.0.2] - 🗓️ 28/04/2022

Added

[1.0.1] - 🗓️ 24/04/2022

Changed

Fixed

  • Optimised Training Scripts and Streamlit Mini-App Script

Removed

  • Redundant file data/training_datasets/train_data_er.json

[1.0.0] - 🗓️ 14/04/2022

Added

Changed

  • Took down and reuploaded repo to remove residual Git LFS files from model_v1.0

Fixed

  • requirements.txt file, following the switchover to a standard virtualenv environment. The file previously lacked the newspaper3k, wikipediaapi and doccano packages needed for the Streamlit mini-app to function.

  • Streamlit mini-app now fully functional

[0.4.0] - 🗓️ 29/03/2022

Added

  • model_v3.0, an Enhanced NER-Centric Model

  • Scripts used for model training of model_v3.0, including scripts used to handle and convert training data from Doccano

  • Training datasets for model_v3.0. File names carry the _doccano suffix before the file extension

  • Script to run Streamlit mini-app

  • model_v1.1, a re-trained version of model_v1.0 without the excessively large vector file. Still fundamentally useless, but much smaller in size now.

Changed

  • Reorganised data subfolder and its internal subfolders

  • File names of training datasets for the Dictionary-Centric Model now carry the _er suffix before the file extension

Removed

  • model_v1.0 - excessively large vector file was clogging the repo

[0.3.0] - 🗓️ 10/03/2022

Added

  • model_v2.1, the Third (final) Dictionary-Centric NER Model. It uses model_v2.0 as a base, with an EntityRuler pipe added to act as a "Dictionary of Locations" that finds and matches locations in text

Changed

  • First Model and Second Model renamed to model_v1.0 and model_v2.0.

  • Scripts for data cleaning and model training shifted to new training_scripts subfolder.

  • Configuration files for model training shifted to new training_config subfolder and renamed by model version number.

[0.2.0] - 🗓️ 03/03/2022

Added

  • Second Dictionary-Centric NER Model custom-trained for Singapore Locations. Utilises tok2vec, tagger & parser from spaCy's pre-built en_core_web_md, and a custom-trained NER pipe.

  • requirements.txt file. An Anaconda environment was used to develop the project; future updates will refine the file further.

  • config.cfg and base_config.cfg for Second Model

[0.1.0] - 🗓️ 24/02/2022

Added

  • First iteration of scripts used to clean data for Dictionary-Centric Model

  • Locations Data

  • First Dictionary-Centric NER Model. Consists of tok2vec and ner pipes

  • config.cfg and base_config.cfg for First Model