The Dilemma of Dialects: Localising Content at the Regional Level

In my last blog post, I talked about my work on a recently completed localisation project for which I was asked to write training data for the British English version of a future customer service chatbot. I discussed issues such as adapting my typically formal writing style to reflect the way people would speak to such devices, but I was only referring to standard English at the time. In this post, however, the topic of conversation will move away from localisation at the national level to localisation at the regional level, which, in my opinion, is extremely pertinent when thinking about the UK, with almost all of its major cities and regions having their own distinct accent or dialect. This is definitely the case for my beloved Black Country, with most regionalisms at both the lexical and grammatical level being almost incomprehensible to those hailing from other parts of the country. I was therefore pleased when, around a month ago, the end client requested that we include more regionalisms in the data, not just because the data would be more representative of spoken language across the country, but also because of the challenges this would pose when trying to express this in writing.

Although I believe I have an extensive knowledge of regional accents and dialects - it's always fascinated me how diverse these are in the UK - I generally avoided trying to imagine how someone in Tyne and Wear, Merseyside or the Welsh Valleys, for example, would express themselves for fear of creating stilted utterances. I therefore focused my efforts on the Black Country dialect and accent, having been exposed to them throughout my entire life. I also thought long and hard about the slight linguistic differences between ourselves and our neighbours in Birmingham, drawing in particular on my experience of working as a Spanish teacher at a primary school in the Second City.

However, before I explain how I tried to express Black Country parlance in writing, I am first going to provide a bit of context. The Black Country is a region located to the west of Birmingham that's home to just over a million people, encompassing the metropolitan boroughs of Dudley, Sandwell, Wolverhampton and Walsall. The region's name was coined during the 1840s, a time when it was engulfed by the soot produced by heavy industries, including iron foundries, glass factories and coking. The name has since stuck, with us now having our own flag, a dedicated museum (The Black Country Museum) in honour of the region's rich past, and our very own day of regional celebration on 14th July (Black Country Day). The sense of regional pride around these parts is also so great that the large majority of us are deeply offended when we're mistaken for Brummies. You should also know that we have three professional football teams - West Bromwich Albion, Wolverhampton Wanderers and Walsall - and that the most famous people to hail from the region include Lenny Henry, Frank Skinner, Noddy Holder, Beverley Knight and Robert Plant. Even though I only live a stone's throw away from the Walsall-Lichfield border, i.e. the Black Country-South Staffordshire border, and although the heart of the region is unarguably in Dudley and Wolverhampton, Black Country culture is still ingrained in me, hence my concerted effort to add highly idiomatic regional language to the training data. But how was I going to do that when the dialect does not have a standardised form in written language?

Thankfully, with greetings not bearing too much relevance in the data, I didn't have to worry too much about aspects of Early Modern English - and even Middle English - that are still used by many speakers in the region today, such as the word "thou," which features in contracted form in the expression "Ow B'ist?" (How are you?). However, expressing most people's use of invented past participles and auxiliary verbs, for instance, proved challenging. Replacing "been" with "bin," for example, was relatively straightforward, but would it be okay for me to write "sin" instead of "seen" or "saw"? What about "ay" instead of "haven't"/"have not," and "day" instead of "didn't"/"did not"? I tried to write these instances of deviation from standardised grammar in accordance with how they are pronounced, but with no standardised forms, and no one else from the region on our team to consult, I just had to go with my gut instinct once the PM gave us the go-ahead. This issue also pertained to modal verbs and negative complements in the present tense; I had to check if I could use "corr" instead of "can't," and "I ay" rather than "I'm not," for instance. In addition, I had to clarify that I could use "am" instead of "are" when writing utterances in the second person singular, and "we was" rather than "we were" in the past tense.

Meanwhile, the regional pronunciation of certain words such as "phone" and "school" represented an even bigger problem. I was tempted to write "fowan" and "schuwal" to help the chatbot recognise that people in these parts, myself included, pronounce such words as if they contained two separate vowel sounds as opposed to one. But, after I consulted the PM, the solution was to maintain standardised spelling, since unlike "ay" and "corr," "fowan" and "schuwal" aren't individual words in their own right. This also meant that I couldn't spell "hospital" as "ospital," for example, in an attempt to express the fact that we do not pronounce the letter "h" at the beginning of words. We also had a debate about pronouns. We asked if we could use "ya" instead of "you" for the Midlands and the North in general, which then led me to ask whether "yow" could be included in the data, as it is very commonly used in the Black Country. I asked if "ah" could be used instead of "I" too, but, unlike "yow," this was not permitted, as it represents a change in pronunciation as opposed to a change at the lexical or grammatical level.

After all these queries had been resolved, I was quick to add aspects of my region's rich linguistic diversity to the training data. Although this required a great deal of thought, especially with the regional dialect and accent being used in their purest forms around ten miles away from where I live, I thoroughly enjoyed this challenge. Adapting my writing style to mirror spoken standard English proved difficult, but nowhere near as difficult as the challenge of writing utterances in a non-standardised, and relatively unknown, regional dialect. This project therefore proved to me once again how important it is for the final, localised product to be highly idiomatic, as well as how much hard work is required for this goal to be achieved. It has also taught me the importance of diversifying the services one offers. Knowing that this project wasn't an out-and-out translation task, I wasn't too interested in taking part at first. However, in hindsight, I've been able to add another unique experience to my CV upon which I can build and thus further my skills as a professional in the language services industry. If another project like this comes my way, I will most certainly embrace it with open arms. After all, why shouldn't I branch out a bit and try my hand at new things?

