One of my most devastating afflictions is that I am a developer. Once I get an idea, it gnaws at my brain until I either find another project or scratch that itch. Taggloo is that itch.
The Taggloo site was an experiment. It suffered the worst possible fate on the web: it was used by other people. What started out as a small site for me to combine my love of .NET and Manx Gaelic became a useful tool which cost me money but – worse – time. It conflicted with my life, my family and, worst of all, my role as Scout Leader. So I had to make the difficult decision to shut it down.
My head does not let sleeping dogs lie, so I am reviving ideas on how Taggloo could be useful again, just not in the form it was in. By combining the dataset with AI and sticking that behind an API, I can scratch multiple itches.
I’m currently working through a course on FastAI. I am enthused by one of the myths they dispel: you do not need lots of data. This contradicts my previous belief that the Manx Gaelic corpus is just not big enough. Nor is it modern enough, with a lot of the available corpus being in “old Manx Gaelic”, like the Bible. One of Taggloo’s ambitions was to catalogue modern Manx Gaelic. It aimed to add to the corpus by mining social media like Facebook and Twitter. Zuck and Musk put a stop to that when they shut off their APIs and became less open.
This is me thinking aloud, so please comment to tell me I’m wrong (or right)!
My understanding is that machine learning on language works by learning the relationships between tokens in a dataset. Simple.
So in English, one could write:
I like living on the Isle of Man
The token relationships might be:
- “I”
- “I like”
- “I like living”
- “like living”
- “on the”
- “the Isle of Man”
- “Isle of Man”
This is simplistic, and I’m sure it can be broken down even further.
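To make the idea concrete, here is a minimal sketch that lists every run of *n* consecutive words (an "n-gram") in the sentence. It assumes naive whitespace tokenisation; real tokenisers often split text into smaller sub-word pieces, and the function name `ngrams` is my own, not from any library.

```python
def ngrams(sentence, n):
    """Return every run of n consecutive words in the sentence."""
    words = sentence.split()  # naive whitespace tokenisation (assumption)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "I like living on the Isle of Man"
for n in (1, 2, 3):
    print(n, ngrams(sentence, n))
```

The bigrams (n = 2) include “I like”, “like living” and “on the”, and the trigrams include “I like living” and “Isle of Man”, which is roughly the list above.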
In Manx Gaelic, this would be:
S’mie lhiam cummal ayns Mannin
So the tokens would be:
- “S’”
- “S’mie”
- “S’mie lhiam”
- “lhiam”
- “lhiam cummal”
- “cummal”
- “cummal ayns”
- “ayns Mannin”
- “Mannin”
Given that these words (tokens) often appear together, one could predict the next word, inferring it from the probability of words appearing next to each other or in the same sentence as other words:
- S’
- mie
- lhiam
- cummal
- ayns
- Mannin
- ayns
- cummal
- lhiam
- mie
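A minimal sketch of that “probability of the next word” idea is a bigram model: count which word follows which, then predict the most frequent follower. It assumes a hypothetical tokeniser that splits “S’mie” into “S’” and “mie”, and it is trained on the single sentence above, so every probability comes out as 1.0. A real model would need a proper corpus (and would be a neural network rather than a count table).

```python
import re
from collections import Counter, defaultdict


def tokenise(text):
    # Hypothetical rule: split a leading "S’" (or "S'") off the following word
    return re.findall(r"\w+['’]|\w+", text)


def train_bigrams(sentences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenise(sentence)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return (most likely next word, its probability), or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    next_word, n = followers.most_common(1)[0]
    return next_word, n / sum(followers.values())


model = train_bigrams(["S’mie lhiam cummal ayns Mannin"])
print(predict_next(model, "S’"))      # ('mie', 1.0)
print(predict_next(model, "cummal"))  # ('ayns', 1.0)
```

With more sentences the counts spread across several followers, and the probability becomes a genuine ranking for suggestions.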
This looks feasible for an auto-correct-style interface: it can predict the next word within the same language. It might even infer the initial mutations found in Gaelic languages; for example, “Mannin” becomes “Vannin” in “Ellan Vannin”. But what about when you need to translate between languages?
| Sentence | Language | Meaning | Literal meaning |
| --- | --- | --- | --- |
| I like living on the Isle of Man | English | I like living on the Isle of Man | I like living on the Isle of Man |
| S’mie lhiam cummal ayns Mannin | Manx Gaelic | I like living in Mannin (being the Isle of Man) | Is (emphatic) good with me live in Mannin |
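One way to see why translation is harder than next-word prediction is a naive word-for-word gloss. The sketch below uses a hypothetical glossary read off the “Literal meaning” column of the table, plus a hypothetical rule that undoes only the M → V mutation from the “Mannin”/“Vannin” example. It reproduces the literal meaning, not the natural English one.

```python
import re

# Hypothetical word-by-word glossary, taken from the table's "Literal meaning"
GLOSSARY = {
    "S’": "Is (emphatic)",
    "mie": "good",
    "lhiam": "with me",
    "cummal": "live",
    "ayns": "in",
    "Mannin": "Mannin",
}


def tokenise(text):
    # Hypothetical rule: split a leading "S’" (or "S'") off the following word
    return re.findall(r"\w+['’]|\w+", text)


def lookup(token):
    if token in GLOSSARY:
        return GLOSSARY[token]
    # Hypothetical: undo the M -> V mutation seen in "Mannin" -> "Vannin" only
    if token.startswith("V") and "M" + token[1:] in GLOSSARY:
        return GLOSSARY["M" + token[1:]]
    return f"[{token}]"  # unknown words are flagged, not guessed


def literal_gloss(text):
    return " ".join(lookup(token) for token in tokenise(text))


print(literal_gloss("S’mie lhiam cummal ayns Mannin"))
# Is (emphatic) good with me live in Mannin
print(literal_gloss("Ellan Vannin"))
# [Ellan] Mannin
```

Getting from “Is good with me live in Mannin” to “I like living on the Isle of Man” needs the model to learn whole-phrase patterns, which is where the machine learning would have to earn its keep.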
So far in the FastAI course I’ve been covering image recognition: using machine learning to categorise images based on what is in the training set. That seems like it should be more complex than language, so either I just haven’t got there yet or the lightning bolt hasn’t struck.
My model is available on Hugging Face, if you ever need to distinguish a photograph of a cat from one of a dog:
https://huggingface.co/spaces/programx360/fastai-chapter2-v2
Another challenge is the popularity of Python in AI, when I’m a .NET developer. Once I figure more out, perhaps I’ll be able to roll my own, or at least call an API from JavaScript. This podcast episode convinces me I’m not going to be all alone in C#.
That said, I am definitely enjoying Jupyter Notebooks, which let you drop in Python code and annotate it with Markdown, providing an as-you-go development plan. My chapter 2 notebook is on Kaggle.