GitHub Repository: drgnfrts/Singapore-Locations-NER
Path: blob/main/CHANGELOG.md

CHANGELOG

NOTE: This project is complete and further updates are unlikely.

Version History

[1.1.1] - 🗓️ 10/06/2022

Added

New folders to organise the Training and Validation Annotation Datasets

Changed

Updated scripts for converting annotation datasets from Doccano to spaCy v2 format, and from spaCy v2 to spaCy v3 format
Added a notice in the scripts warning against overwriting existing datasets, especially the Evaluation Dataset
Updated source of training and validation data for config files handling training for Models v2.0, 3.0 and 3.1
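
For readers unfamiliar with the Doccano to spaCy v2 step mentioned above, the following is a minimal sketch of what such a conversion can look like. It is not the repository's script. It assumes a Doccano JSONL export where each line has a `text` field and a `label` list of `[start, end, tag]` spans (the field name differs between Doccano versions), and the function name `doccano_to_spacy_v2` and the `LOC` tag are hypothetical.

```python
import json


def doccano_to_spacy_v2(jsonl_text):
    """Convert Doccano JSONL text into spaCy v2-style (text, annotations) tuples."""
    examples = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        # Assumed schema: {"text": str, "label": [[start, end, tag], ...]}
        entities = [(start, end, tag) for start, end, tag in record.get("label", [])]
        examples.append((record["text"], {"entities": entities}))
    return examples


# Hypothetical one-line export; the "LOC" tag is an assumption.
sample = '{"text": "I live in Bukit Timah.", "label": [[10, 21, "LOC"]]}'
train_data = doccano_to_spacy_v2(sample)
```

Each resulting tuple pairs the raw text with character-offset entity spans, which is the shape spaCy v2 training loops expect. Converting onward to spaCy v3 additionally requires serialising to the binary `.spacy` format.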

Deprecated

Geocoding Plan - project no longer continuing, placed in archives folder

[1.1.0] - 🗓️ 08/06/2022

Added

New dataset of locations text & annotations (suffixed with _v3.1)
model_v3.1, a new Enhanced NER-Centric Model trained on a larger dataset (1598 → 2180)
"Gold Standard" Evaluation Annotated Dataset & Evaluation Script (golden_set.spacy)

Changed

Updated Streamlit Mini-App script to enable users to run Model 3.1 on Streamlit
Updated Documentation for Model 3.1 + how to evaluate with the "Gold Standard" set

[1.0.4] - 🗓️ 11/05/2022

Added

Script to run POST requests for NER with Model v3.0, built with FastAPI

[1.0.3] - 🗓️ 04/05/2022

Added

Test Jupyter Notebook and a Plan Outline for Geocoding Locations phase of Project
Proper CHANGELOG.md

Changed

Streamlined readme.md

[1.0.2] - 🗓️ 28/04/2022

Added

LICENSE.md

[1.0.1] - 🗓️ 24/04/2022

Changed

Finalised first version of documentation.md
Updated readme.md
Reorganised certain files

Fixed

Optimised Training Scripts and Streamlit Mini-App Script

Removed

Redundant file data/training_datasets/train_data_er.json

[1.0.0] - 🗓️ 14/04/2022

Added

documentation subfolder and documentation.md

Changed

Took down and reuploaded repo to remove residual Git LFS files from model_v1.0

Fixed

requirements.txt file, following switchover to standard virtualenv environment. File lacks packages for newspaper3k, wikipediaapi and doccano to enable functioning of Streamlit mini-app.
Streamlit mini-app now fully functional

[0.4.0] - 🗓️ 29/03/2022

Added

model_v3.0, an Enhanced NER-Centric Model
Scripts used for model training of model_v3.0, including scripts used to handle and convert training data from Doccano
Training datasets for model_v3.0. File names end with the _doccano suffix before the file extension
Script to run Streamlit mini-app
model_v1.1, a re-trained version of model_v1.0 without the excessively large vector file. Still fundamentally useless, but much smaller in size now.

Changed

Reorganised data subfolder and its internal subfolders
Files containing training datasets for the Dictionary-Centric Model renamed with an _er suffix before the file extension

Removed

model_v1.0 - excessively large vector file was clogging the repo

[0.3.0] - 🗓️ 10/03/2022

Added

model_v2.1, the Third (final) Dictionary-Centric NER Model. Uses model_v2.0 as a base, with an EntityRuler pipe added to function as a "Dictionary of Locations" to find and match locations in text

Changed

First Model and Second Model renamed to model_v1.0 and model_v2.0.
Scripts for data cleaning and model training shifted to new training_scripts subfolder.
Configuration files for model training shifted to new training_config subfolder and renamed by model version number.

[0.2.0] - 🗓️ 03/03/2022

Added

Second Dictionary-Centric NER Model custom-trained for Singapore Locations. Utilises tok2vec, tagger & parser from spaCy's pre-built en_core_web_md, and a custom-trained NER pipe.
requirements.txt file. An Anaconda environment was used to develop the project; future updates will refine the file further.
config.cfg and base_config.cfg for Second Model

[0.1.0] - 🗓️ 24/02/2022

Added

First iteration of scripts used to clean data for Dictionary-Centric Model
Locations Data
First Dictionary-Centric NER Model. Consists of tok2vec and ner pipes
config.cfg and base_config.cfg for First Model
