GitHub Repository: drgnfrts/Singapore-Locations-NER
Path: blob/main/training_scripts/trial_notebook.ipynb
Kernel: Python 3.8.12 ('spacyenv')

Trial Notebook

This notebook stores all the code that I'm cobbling together or trying out before

import csv
import re

# Load the abbreviation table. A token matching any of columns 0-3 of a row
# is replaced by column 3 of that row, so column 3 holds the expanded form.
abbreviation_dictionary = []
with open("test_dict.csv", "r") as csv_file:
    csvtest = csv.reader(csv_file, delimiter=",")
    for row in csvtest:
        abbreviation_dictionary.append(row)
# The with-block closes the file on exit, so no explicit close() is needed.


def lengthen_abbreviations(text):
    # Tokenise into words, punctuation followed by a space, single spaces and
    # hyphens. Characters matching none of these alternatives (for example a
    # final full stop with no space after it) are dropped by re.findall.
    split = re.findall(r"[\w']+|[.,!?;&] | |-", text)
    print(split)
    for i, word in enumerate(split):
        for row in abbreviation_dictionary:
            if word in row[:4]:
                split[i] = row[3]
    cleaned_text = ''.join(split)
    return cleaned_text


lengthen_abbreviations("I stay at CCK Ave 3. My route to URA involves taking bus 975 to CCK Stn, then taking the NSL to JE. At JE I change to the PSR-bound train& taking the EWL to TGP.")
['I', ' ', 'stay', ' ', 'at', ' ', 'CCK', ' ', 'Ave', ' ', '3', '. ', 'My', ' ', 'route', ' ', 'to', ' ', 'URA', ' ', 'involves', ' ', 'taking', ' ', 'bus', ' ', '975', ' ', 'to', ' ', 'CCK', ' ', 'Stn', ', ', 'then', ' ', 'taking', ' ', 'the', ' ', 'NSL', ' ', 'to', ' ', 'JE', '. ', 'At', ' ', 'JE', ' ', 'I', ' ', 'change', ' ', 'to', ' ', 'the', ' ', 'PSR', '-', 'bound', ' ', 'train', '& ', 'taking', ' ', 'the', ' ', 'EWL', ' ', 'to', ' ', 'TGP']
'I stay at Choa Chu Kang Avenue 3. My route to Urban Redevelopment Authority involves taking bus 975 to Choa Chu Kang Station, then taking the North South Line to Jurong East. At Jurong East I change to the Pasir Ris-bound train& taking the East West Line to Tanjong Pagar'
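The returned string has lost the final full stop after "Tanjong Pagar", because the tokenising pattern only matches punctuation that is followed by a space. One possible variant is sketched below. It is not the notebook's own code: the `ABBREVIATIONS` dict is a hypothetical in-memory stand-in for `test_dict.csv`, filled only with the expansions visible in the output above, and `TOKEN_PATTERN` is an assumed replacement pattern that splits the text into word runs and non-word runs so that every character survives the round trip.

```python
import re

# Hypothetical stand-in for test_dict.csv: abbreviation -> expanded form.
# Only the expansions visible in the notebook output are listed here.
ABBREVIATIONS = {
    "CCK": "Choa Chu Kang",
    "Ave": "Avenue",
    "URA": "Urban Redevelopment Authority",
    "Stn": "Station",
    "NSL": "North South Line",
    "JE": "Jurong East",
    "PSR": "Pasir Ris",
    "EWL": "East West Line",
    "TGP": "Tanjong Pagar",
}

# Assumed pattern: a token is either a run of word characters/apostrophes or
# a run of anything else, so joining the tokens reproduces the input exactly.
TOKEN_PATTERN = re.compile(r"[\w']+|[^\w']+")


def lengthen_abbreviations(text, abbreviations=ABBREVIATIONS):
    """Expand known abbreviations and leave every other token unchanged."""
    tokens = TOKEN_PATTERN.findall(text)
    # dict.get falls back to the token itself when it is not an abbreviation.
    return "".join(abbreviations.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(lengthen_abbreviations("At JE I change to the PSR-bound train& taking the EWL to TGP."))
```

A dict lookup also replaces the nested scan over every row and column with a single hash lookup per token. Because each key maps to one value, an abbreviation with several meanings would still need a separate disambiguation step.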