Twitter aggregation and some statistics

I know some of you noticed a slight bug in the Taggloo site for the last month or so. Due to Twitter changing the API that I used to extract #Gaelg Twitter content, the Twitter aggregation hasn’t been working for a while.

Upon further investigation, this needed some fairly extensive refactoring. I had been relying quite heavily on the particular format that the data was previously published in. This was great as I could pretty much use the same code for blogs and YouTube, too. Unfortunately, Twitter were keen on developers using the full API and as such I have had to rewrite large portions of the community content aggregation code.

Alas, this has taken longer than I had hoped. As some of you will know, I’ve had good cause to be distracted lately. However, some discipline and Diet Coke have enabled me to fix the problem. Taggloo is now collecting Twitter content once again!

As a small apology, I thought I’d share some statistics with you:

  • Twitter contributes over 97% of aggregated community content.
  • Since May 2011, over 2,000 Tweets with #Manx #Gaelg content have been aggregated, most of them within the current calendar year.
  • We’ve been encouraging users to show their support for the language in social media. This appears to be working, with a significant increase in content aggregated since March this year.
  • Saving users’ blushes, it is obvious some users have contributed real value to the community, with the top 10 Twitter users (excluding bots) posting over 50% of #Manx #Gaelg tweets!

Community Content Items aggregated over time

Remember, tweet in Manx using the #Manx #Gaelg hashtags and contribute to the island’s heritage in modern media whilst building an even stronger Manx dictionary within Taggloo.

Crowd sourcing Manx

In my previous post I hinted at the improvements made to Taggloo in the latest significant release. Key amongst these is the ability for users to contribute their own content to the dictionaries. One of the beauties of Manx in particular is its fascinating vocabulary, with different pronunciations and words used in different communities, even within such a small island. By adding content and improving existing content, we can help create a living, social dictionary.

Adding content to the Taggloo dictionaries is easy. Perform your search and at the bottom of every screen is a link to “Improve this entry”.

Improve link screenshot

If you’re not already logged in, you’ll be asked to log in. Remember, you don’t need to create an account. You can just use your Facebook login.

There are many ways to improve an entry, as shown in the screenshot. Clicking on a tile will allow you to add improved content.

Entry improvement options screenshot

You can add a definition, a plural, pronunciation, a sound clip of the word being pronounced, a translation to another language, a phrase, mutation or a web site with relevant information. Have a look at how you could improve the dictionary:

  • Add a definition: What does the word mean? This is useful for when you would like to describe the meaning of a word instead of relying on synonyms.
  • Add a plural: Plurals in Manx aren’t as straightforward as in English, so you can add how the plural form is used.
  • Add a pronunciation: Using phonetic syllables or the phonemic alphabet, define specifically how a word is pronounced.
  • Add a sound clip: Dictionaries are great for finding formal definitions of how words are pronounced, but there’s no better way than hearing someone say it. Add a sound clip to show how the word sounds “for real”.
  • Add a translation: Add a translation or synonym for the word in another of the supported languages.
  • Add a phrase: “Use it in a sentence”! A great way to understand what relationship the word has with other words in a sentence or when you could use a word.
  • Add a mutation: Languages sometimes mutate words for reasons of ease of speech or more technical reasons such as the context the word is used in. These rules aren’t always clear, so add a mutation to help other users.
  • Share a web site: Another opportunity to help other users understand how the word is used for real. A good example would be an article in which the word features heavily.

You can also add a new word that’s not already in the dictionary. If no matches are returned, you’ll have an option to add the word:

Add a new word screenshot

Over time the dictionaries will become fortified with rich content, submitted by real users of the language. Have a look at the screenshot below for the result of searching for “thie”:

Search results for “thie” screenshot

Taggloo: even more social

Hopefully you’ve seen Taggloo by now and read about how it was inspired. Taggloo was always intended to bridge the gap between translating words and the use of those words in the community. The last major feature launch was the aggregation of community content where minority languages such as Manx were used in social media. This allows a user to identify other interested people that they can connect with and for these real-life uses of language to be included in translations. It’s a neat idea and one that is starting to bear fruit now the code has been active for around 9 months.

The next step was to extend the idea of community with user-generated content and authority. The Taggloo dictionary contains tens of thousands of phrases and translations, but they were fairly static. The inclusion of community content in social media extended the richness of the dictionary, but without the structure of a dictionary.

With the latest update, users can contribute their own words and add a wide variety of improvements to existing words. For example, you can add a phrase, sound file, web site or definition. Taggloo also supports the concept of mutations and plurals. Learners and experts alike are encouraged to add common phrases, their own translations, idioms, or perhaps modern concepts such as internet terminology, to help make the dictionary richer.

Social Taggloo screenshot

But how do you know how reliable dictionary data is, if anyone can submit their own content? Content is submitted by users with a seeded vote of zero (0). Then, as other users use it, they can “vote up” the item, increasing the item’s score. Search results are sorted on this score, so the authoritative submissions are always presented first. Conversely, if a translation or resource isn’t appropriate, then it may be “voted down”.
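The voting idea above can be sketched in a few lines. This is only an illustration in Python, not the site's actual code: the `Submission` class, the `rank` function and the sample translations are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A user-contributed dictionary item (hypothetical structure)."""
    text: str
    score: int = 0  # every new item starts with a seeded vote of zero

    def vote_up(self) -> None:
        self.score += 1

    def vote_down(self) -> None:
        self.score -= 1


def rank(submissions: list[Submission]) -> list[Submission]:
    """Return submissions ordered by score, highest first."""
    return sorted(submissions, key=lambda s: s.score, reverse=True)


# Three competing (invented) submissions for the same word.
house = Submission("house")
home = Submission("home")
typo = Submission("hosue")

house.vote_up()
house.vote_up()
home.vote_up()
typo.vote_down()

ranked = [s.text for s in rank([typo, home, house])]
print(ranked)
```

Because the sort key is simply the score, an item that is voted down sinks below untouched items without needing to be deleted.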

To add to the “social” dimension of Taggloo, the site now supports Facebook authentication. You don’t need to create a new username/password if you don’t want to (though you can if you wish or are not a Facebook user). Instead, just log in with your Facebook username and password. The site will never know your password, so that’s one less thing to worry about!

If you haven’t yet come across them, check out the Facebook page or Twitter stream at @TagglooIM where you can be introduced to new phrases and keep in touch with Taggloo developments.

This last update was a big one and I hope to introduce the features in detail in the coming weeks in future blog posts. Why wait till then? Have a play …

Aggregating a “living” language

Between work, TT and various obligations I’ve finally managed to finish most of the second key phase of Taggloo.im: social content.

Language is nothing if not used in a social context, and using Manx online is no different. I’m a big fan of Social Media and particularly how it can be used to promote the visibility (if not learning/teaching) of our island’s language and heritage. Modern technology is already leveraged very well by teachers and learners of Manx using mediums such as Twitter, YouTube, blogs and Facebook.

It was always my intention to bring together what is currently a disparate and siloed set of high quality language content and showcase it alongside other, similar content. In the spirit of open-data, why not take the content, re-form it and publish it side-by-side?

Currently, Taggloo is pulling in content from:

  • Twitter (based on a set of rules which should hopefully provide fairly relevant Manx content)
  • YouTube (specifically the Gaelg YouTube channel)

Taggloo community screenshot

Try it out: http://taggloo.im/Community

Hopefully, integrating this live Manx content will encourage users not only to develop their understanding of the language by accessing previously unseen channels, but also to participate in the discussion using Twitter.

Maybe in the future I’ll also add Facebook, Flickr or Tumblr feeds, or just more content from YouTube or more blogs. If you’d like to see a particular set of data included, make sure you suggest it on our UserVoice site. The site is able to gather and parse a wide variety of data formats, and that variety will increase as more content feeds are discovered.

Remember, if you’re using Twitter to write in Manx, add the #Gaelg hashtag!
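
The exact rules used to pick out Manx tweets aren’t spelled out here, but the simplest plausible rule is a hashtag check. The sketch below is an assumption for illustration only (Python, with a made-up `looks_manx` function and `REQUIRED_TAGS` set), not the site’s real rule set.

```python
import re

# Assumed rule: a tweet qualifies if it carries the #Gaelg hashtag.
REQUIRED_TAGS = {"gaelg"}


def hashtags(text: str) -> set[str]:
    """Extract hashtags from a tweet, lower-cased and without the '#'."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}


def looks_manx(text: str) -> bool:
    """True if the tweet carries at least one of the required hashtags."""
    return bool(hashtags(text) & REQUIRED_TAGS)


print(looks_manx("Ta mee skee #Gaelg"))
print(looks_manx("I am tired #tired"))
```

Lower-casing the tags means #Gaelg, #gaelg and #GAELG are all treated the same.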

Taggloo launched

It seems that my Taggloo (http://taggloo.im) experiment has been sufficiently successful to warrant some determined effort on producing a site that I would be happy to launch and have people use. After collecting some ideas from some Manx speakers and adding a few of my own, I’ve developed the site and am happy to publish it for wider use. I’ve even had some positive feedback for my very limited design skills!

Taggloo, as the site says, is a means of bringing learners and seasoned speakers of niche languages together to help keep the language living. Taggloo is Manx Gaelic for “speech” and as such is designed to be an informal aid to existing resources that may be available, rather than a definitive or technical language resource. Currently, the site only has Manx Gaelic, but it is designed from the outset to support other languages.

My aim for the site is to create a Social Dictionary. This will be achieved by aggregating electronically published content such as Tweets, blog posts or YouTube videos and by encouraging users to get involved by submitting their own interpretations and uses of words, or their own words. I’ve been learning Manx Gaelic long enough to understand that spellings, pronunciation and meanings of words can vary between dictionaries, contexts and even regions of this small island. So-called “dead languages” are often still living, evolving and growing through use in the community and this includes the internet community.

Ultimately, I want to answer a key problem I had in trying to access useful Manx content in an electronic format, by opening the dictionaries and enabling opportunities for users to leverage this data. Users can access this data either by using the web-site or, using the comprehensive API, via mobile phone “apps” or even applets embedded in other web sites.

The site currently performs basic translations, though this will be extended over the coming weeks as I find time to introduce my intended improvements. Here’s what is on my roadmap so far:

  • Community content aggregated from Twitter, blogs and YouTube.
  • Language use “in the wild” drawn from such community content added to translation results to show context within sentences and discussions.
  • Uploadable media items, including a subset of Manx Gaelic vocabulary to prime the collection of user-submitted media.
  • Submission of words missing from the dictionary by expert speakers and learners alike and rating of community submitted content.
  • Submission of comments regarding people’s individual understanding of word meanings and uses, providing a very personal interpretation on language use.

How can you contribute?

The site is intended to be open from the outset and this includes accepting users’ comments and suggestions on how to improve the site. I’ve set up a UserVoice forum to collect users’ impressions. Maybe you have new languages or a killer feature in mind. Or, perhaps there is a bug on the site that needs to be fixed!

Any and all feedback is welcome, and you can submit your suggestions and queries at http://taggloo.uservoice.com.

Taggloo – an early look at user behaviour

Taggloo, my experiment with Manx translations, is proving to be surprisingly popular. Even at this very early stage, a select few people are using it regularly and are providing me with excellent feedback.

I thought I’d just have a quick look at the analytics I’m collecting on the usage of the site (not personally identifiable) this morning. Bearing in mind that this site is an experiment, I was surprised by the results. I was further pleased by the correlation of translations with work we’re doing in class.

At the time of writing, in the first month of use the site has had over 1,200 queries, which come from both the web-site and clients using the API, such as the Windows Phone 7 application.

The most popular word is the English “because”. This is particularly interesting as it is exactly what we’re learning in class at the moment. This word has a complex structure in its Manx form, with one translation being “er yn oyr”. Literally, “on the reason”.

The second most popular query is for the Manx “poyll faarkee”, which is “swimming pool” in English. Some queries are clearly unexplainable!

By far, the most popular platform for conducting queries is the Apple iPhone web browser, with over 450 individual requests. This is probably because most of the users I’ve asked to try out the service have Apple devices. Maybe it will encourage some kind soul to volunteer to write an iPhone client.

The Windows Phone 7 application accounts for over 150 requests, not bad for the 13 downloads this application has obtained so far. Due to the current lowly position of Windows Phone 7 in the smartphone space and the very niche community of Manx speakers who may be involved with this experiment, I’m obviously not expecting this download figure to be high!

This weekend was spent working on an improved index and rebuilding the current word lists to match it. This will give faster and more accurate lookups and paves the way for further additions to the served content in the future. Having had such surprisingly good feedback, I just wish I had the time to get stuck into the other ideas I have. It’s all very exciting: I’m working towards a social, living dictionary. Who needs Google Translate?

Open data, open dictionaries

The Isle of Man branch of the British Computer Society had a fascinating presentation on open data and mash-ups on Friday. The talk was given by Prof. Robert Barr OBE, and the gist of the session was that data should flow freely to the people in a useful data structure, yet also that the openness should be considered with attention to commercial considerations such as intellectual property and the benefits to the wider economy.

While listening to Robert, it struck me that I am in my very own battle for the extraction of data that should be more readily available. As you may know, I am learning Manx. As part of this, I am generating my own revision notes, references, blog posts and the like that may someday see the light of day. Part of this work is the development of a Manx language dictionary for Windows Phone 7.

To achieve my goal, I needed a copy of the Manx dictionary. Having asked around and researched myself, I gathered a number of links to existing on-line resources. These ranged from PDF-formatted documents to fully indexed dictionaries. The PDF version (English to Manx, Manx to English) was unsuitable because it would be difficult to accurately extract the words from the PDF “printed page”. The RoadLingua and FreeLang dictionaries appeared promising, and the dictionaries appeared to be out of copyright. But these were encoded in proprietary dictionary file formats. So ironically, even though the dictionary was “open”, the software would need to be reverse engineered to access the dictionary, itself a violation of copyright. So it was that I was left with the remaining two options that might prove useful. These were the Phil Kelly dictionary and Faragher’s. These were, however, only HTML sites. Between the two, Faragher’s seemed the best, as it provided value-added content such as use of the words within sentences and Manx phrases – ideal if you are interested in the many idioms in use in Manx Gaelic.

So it seemed that I would need to use the Faragher’s site as a “back end” to my application, essentially screen-scraping the site for translations. And indeed, to accomplish this, I would be best served if I wrote my own web site, which acted as a bridge between my Windows Phone 7 application and the dictionary itself. This would double my work, but there were good reasons: the extended platform on a server would allow me to parse the HTML from the site more reliably and, by caching words as they were requested, I could – over time – create a reliability buffer in case the original site were to fail. I set about the task and have just launched the site in a very early form of initial testing (take a look at http://taggloo.im). This was particularly challenging, as the HTML from the Faragher’s dictionary is flaky at best. However, by inserting that middle layer, I could hide this trickery from the user.
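
The caching idea can be sketched as follows. This is a minimal illustration in Python rather than the code behind the site; the `CachingBridge` class and the `fake_fetch` stand-in for the real scraping step are invented for the example.

```python
from typing import Callable, Dict, List


class CachingBridge:
    """Sits between a client and an unreliable source, remembering answers."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        # fetch is whatever scrapes and parses the remote HTML (assumed).
        self._fetch = fetch
        self._cache: Dict[str, List[str]] = {}

    def translate(self, word: str) -> List[str]:
        key = word.strip().lower()
        if key not in self._cache:
            # Only go to the source for words not requested before.
            self._cache[key] = self._fetch(key)
        return self._cache[key]


calls: List[str] = []


def fake_fetch(word: str) -> List[str]:
    """Stand-in for the scraper: works once, then the 'site' goes down."""
    calls.append(word)
    if len(calls) > 1:
        raise ConnectionError("source site unavailable")
    return ["house", "home"]


bridge = CachingBridge(fake_fetch)
first = bridge.translate("Thie")
second = bridge.translate("thie ")  # served from the cache, no second fetch
print(first, second, calls)
```

Any word that has been requested once can still be served if the source later fails, which is the “reliability buffer” described above. Words that were never requested would still fail.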

All this, because the dictionary was not available electronically in an indexed form. And this resonates with Robert Barr’s point about open data. Open data should not only be open, but also be usefully formatted to allow for its use. An unindexed dictionary is hardly a dictionary! More frustration was in the encapsulation of the indexed dictionary within copyrighted software which was quite closed! I approached RoadLingua about how they would feel about releasing the file formats to their dictionary but I received no response.

So it was with great surprise and relief that I realised that by navigating to an unpublished URL (that should have been concealed from internet users) I could extract the entire Faragher’s dictionary from the site, and put it to my own use! So, after playing with MySQL scripts in order to format them into T-SQL, I now have two 50,000 word dictionaries, one for each direction (Manx to English, English to Manx). Am I going to keep this to myself?

No. I’ve checked about copyright, and I’m informed that this is not an issue, certainly in the spirit of expanding the availability of Manx learning resources. So, as part of my Taggloo project, which already has an effective and reliable API for XML and JSON consumers, I’m going to make the entire database available for use by other applications (maybe mobile phone applications, competing with my own) and web-sites (it becomes possible to “embed” Manx dictionaries on even the simplest of sites). The final API has yet to be defined, and there will likely be changes to it in the coming weeks. This data will obviously be free for use by anyone and everyone (subject to fair use, i.e. not crashing my server), but the API will ask for one thing: the opportunity to record the words being queried. This itself, over time, will create a second rich data-set. What words are people regularly using? Do these correlate to students’ progress in classes, or do the translations point to any cultural significance such as house names, which are regularly seen in Manx, yet seldom understood?
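
Recording queries and asking which words come up most often might look something like this. It is a sketch only, in Python, with a made-up `QueryLog` class and invented sample counts; the real API has yet to be defined.

```python
from collections import Counter
from typing import List, Tuple


class QueryLog:
    """Counts how often each word is looked up (no user details kept)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, word: str) -> None:
        # Normalise so "Because" and "because " count as the same query.
        self._counts[word.strip().lower()] += 1

    def most_popular(self, n: int = 3) -> List[Tuple[str, int]]:
        return self._counts.most_common(n)


log = QueryLog()
for query in ["because", "Because", "poyll faarkee", "thie",
              "because ", "poyll faarkee"]:
    log.record(query)

top_two = log.most_popular(2)
print(top_two)
```

Only the word and a running count are stored, so the second data-set can grow without holding anything personally identifiable.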

I have many plans around this project, with further data-sets springing from them, and adding further depth to what will hopefully become a reliable and rich data-set containing both formal dictionary content and community contributions. This complements the already available learning resources for the user, particularly those found at LearnManx.com. I’ll be blogging about them very soon, hopefully in line with an exciting new blog design.

Basic verbage without the rulage

In my previous posts I’ve used “ynsaghey” (“learn”) and “gynsaghey” (“learning”). There are some further verbs that are regularly used, and knowing them can help you get by in conversation or basic tweets.

For “to go”, in English you would use “go” as the verbal-noun and imperative. That is, it is both an instruction (“Go to bed!”) and a statement (“I go to bed early”). The infinitive is “going”, as in “I am going to bed”. Obviously in Manx, this all changes.

In Manx, the same verb “goll” is used for both the verbal-noun and infinitive. So “Ta mee goll dys lhiabee”, “I go to bed” could also mean “I am going to bed”. The imperative, or commanding form, is “immee”. Therefore, “Immee dys lhiabee!”. Of course, there is no simple rule between goll -> immee as there is in English go -> going. So, learning is necessarily by rote.

The nine key verbs most often seen are below. The exclamation marks are my own to try and help distinguish the use of the word as an instruction from the original noun.

Verbal noun and infinitive                         Imperative
(Statement of fact or “-ing” form)                 (Instruction!)
çheet              come, coming                    tar       come!
goll               go, going                       immee     go!
coyrt or cur       give, giving or put, putting    cur       put!
goaill             take, taking                    gow       take!
gra                say, saying                     abbyr     say!
jannoo             do, doing                       jean      do!
clashtyn           hear, hearing                   clasht    hear!
fakin              see, seeing                     jeeagh    see! look!
feddyn or geddyn   get, getting                    fow       get!

In “çheet” we see the first appearance of the cedilla. This “çh” form has the same sound as in English “church”. This is as opposed to the Manx “Cha”, which is “ha”.

Some examples of these verbs in use:

  • “Gow my leshtal” – Take my excuse (“sorry”) (Note that this is instructive, not aggressive, despite my exclamations)
  • “Vel oo goll?” – Are you going? Equally …
  • “Nagh ren uss goll dys Doolish?” – Didn’t you go to Douglas? And …
  • “Immee dys Doolish nish!” – Go to Douglas now!

I have a great little book with these verbs in and I regularly just stop and quiz myself on them. I’m using Goodwin’s “First lessons in Manx”. You could also print this page out and test yourself.

Manx in Social Media

When I started learning (then abandoned) Manx in 2006 I struggled because it was not in everyday use, and it was quite difficult to stretch my muscles outside of “I like this”, “I did that”, etc.

So in this renewed effort of learning I’m using Social Media to create that environment. By using similar sentence structures, it’s easy to tweet feelings, thoughts and actions. For example:

  • Ta mee skee – I am tired
  • Ta mee feer skee – I am very tired
  • Ta mee goll dy valley – I am going home
  • Ta feme aym er jough! – I need a drink!

These are pushed into my Twitter feed and my Facebook wall, probably annoying many of my followers and friends.

In addition to this, I try and stretch myself out of these standard sentences by creating sentences from film quotes, famous songs, etc. I have been known to make some disastrous mistakes, particularly the quote from Breakfast at Tiffany’s: “I am a very stylish girl”, which I rendered as “Ta mee fashanagh mooar ben”. Unfortunately, due to synonyms/translation differences, that could also mean “I dress up as a big lady”. This caused much amusement to a couple of Manx learning tweeps :/

To my surprise, I found a definite interest in my tweets, from professional Manx speakers, experienced speakers and, equally importantly, learners and people who want to learn but are unsure of how to make the leap.

Adrian Cain, the Manx Language Officer, has also started to add #manx and #gaelg hashtags to his Manx tweets. This has set a precedent, with others using the same tags to help aggregate Manx tweets by interest (#manx) and language (#gaelg). Using these tags, and the retweets that such tags generate, I’ve gained a few additional followers among Manx and Scottish Gaelic speakers.

So despite some complaints by friends and followers about my Manx tweets, I’m going to continue to tweet, learn and spread the word. If you’re on Twitter, make sure you use the #manx (for Manx interest) and #gaelg (for Manx language) hashtags.

Using past tense

Having covered using the present tense, I thought it would be useful to have a look at the past tense before moving on to verbs.

The same structures seem to apply, but instead of using “ta”, “va” is used.

va mee     I was        Va mee gynsaghey     I was learning
v’ou       You were     V’ou gynsaghey       You were learning     Used when speaking to a single person for politeness
v’eh       He was       V’eh gynsaghey       He was learning
v’ee       She was      V’ee gynsaghey       She was learning
va shin    We were      Va shin gynsaghey    We were learning
va shiu    You were     Va shiu gynsaghey    You were learning     Used to address more than one person
va’d       They were    Va’d gynsaghey       They were learning

The negative form introduces “row” (as in “cow”) which means “was”, though I’m not sure if you could use “row” in the affirmative form, for example, “row mee gynsaghey”.

Also note that the singular of “You were not” has changed its form. This is to avoid confusion between “r’ou” and “row” when speaking as they both sound similar. I guess one should use the “uss” form to avoid any confusion.

cha row mee     I was not        Cha row mee gynsaghey     I was not learning
cha row uss     You were not     Cha row uss gynsaghey     You were not learning     Used when speaking to a single person for politeness
cha row eh      He was not       Cha row eh gynsaghey      He was not learning
cha row ee      She was not      Cha row ee gynsaghey      She was not learning
cha row shin    We were not      Cha row shin gynsaghey    We were not learning
cha row shiu    You were not     Cha row shiu gynsaghey    You were not learning     Used to address more than one person
cha row ad      They were not    Cha row ad gynsaghey      They were not learning
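
For anyone who likes to see a pattern written down as code, the two tables above boil down to “pick the right opening for the person, then add the verbal noun”. The Python below is just a learner’s sketch: the forms are copied from the tables, while the `past_continuous` function and the English keys are my own invention.

```python
# Openings copied from the two tables above, keyed by an English label.
AFFIRMATIVE = {
    "I": "va mee",
    "you (singular)": "v’ou",
    "he": "v’eh",
    "she": "v’ee",
    "we": "va shin",
    "you (plural)": "va shiu",
    "they": "va’d",
}

NEGATIVE = {
    "I": "cha row mee",
    "you (singular)": "cha row uss",
    "he": "cha row eh",
    "she": "cha row ee",
    "we": "cha row shin",
    "you (plural)": "cha row shiu",
    "they": "cha row ad",
}


def past_continuous(person: str, verbal_noun: str, negative: bool = False) -> str:
    """Build a 'was/were (not) ...-ing' sentence from the table forms."""
    opening = (NEGATIVE if negative else AFFIRMATIVE)[person]
    sentence = f"{opening} {verbal_noun}"
    return sentence[0].upper() + sentence[1:]


print(past_continuous("I", "gynsaghey"))
print(past_continuous("they", "gynsaghey"))
print(past_continuous("you (singular)", "gynsaghey", negative=True))
```

It is only a lookup table with a verb stuck on the end, which is really the point: at this level there is no rule to learn beyond the openings themselves.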

Updated 25 September …

If you need to use the “do” form, the table below shows some examples. I distinguish the two by another of my silly rules:

  • row = was – “W” is in both “row” and “was”
  • ren = did – Totally no pattern!

cha ren mee     I did not       Cha ren mee ynsaghey     I did not learn
cha ren uss     You did not     Cha ren uss ynsaghey     You did not learn     Used when speaking to a single person for politeness
cha ren eh      He did not      Cha ren eh ynsaghey      He did not learn
cha ren ee      She did not     Cha ren ee ynsaghey      She did not learn
cha ren shin    We did not      Cha ren shin ynsaghey    We did not learn
cha ren shiu    You did not     Cha ren shiu ynsaghey    You did not learn     Used to address more than one person
cha ren ad      They did not    Cha ren ad ynsaghey      They did not learn

So I guess that it follows that as you can use “Nagh row” for “Wasn’t?”, you could use “Nagh ren” for “Didn’t?”.

nagh ren mee?     Didn’t I?       Nagh ren mee ynsaghey?     Didn’t I learn?
nagh ren uss?     Didn’t you?     Nagh ren uss ynsaghey?     Didn’t you learn?     Used when speaking to a single person for politeness
nagh ren eh?      Didn’t he?      Nagh ren eh ynsaghey?      Didn’t he learn?
nagh ren ee?      Didn’t she?     Nagh ren ee ynsaghey?      Didn’t she learn?
nagh ren shin?    Didn’t we?      Nagh ren shin ynsaghey?    Didn’t we learn?
nagh ren shiu?    Didn’t you?     Nagh ren shiu ynsaghey?    Didn’t you learn?     Used to address more than one person
nagh ren ad?      Didn’t they?    Nagh ren ad ynsaghey?      Didn’t they learn?

I think that completes the past tense in the simplest form. I’m told that it is possible to man-handle your Manx and use these simpler forms rather than looking for the past tense verb of each stem when starting out. I’m counting on it.