Chatbots: AI is cool, but rules still rule

There is palpable excitement as some of the finest minds in medical Computational Linguistics take their seats in a typically sterile conference room, in a typical Boston office building, on a typical spring day.

The presentation opens with the preface that what you are about to see is a conversational agent trained entirely on message-board conversations between mothers, their fellow mothers and paediatricians. The algorithm is almost entirely unsupervised machine learning.

A beaming academic projects a simple chat window onto the wall and begins typing:

“My child isn’t smiling, should I be worried?”

The machine immediately produces the reply:

“How old is your child?”

“Just over one year old”

Just as instantly, the machine comes back:

“This is little cause for concern. It’s quite normal for children to not express themselves much before they’re two years old. If you have any other worries, though, you should consult your pediatrician”

Aware of how sophisticated the probabilistic models behind this are, the room breaks into an appreciative murmur.

One hand is raised, however, and a new query is posed:

“Tell the machine ‘my child isn’t breathing’”

The presenter tentatively types it in, and the machine instantly responds:

“How old is your child?”

Of course, this is merely a test case, and the researcher behind it shouldn’t be judged too harshly. It should equally go without saying that your child’s age is irrelevant in the face of respiratory failure, even if the bot has a helpful answer lined up afterwards.

But the researcher’s response to this failing is interesting:

“Well, a few simple rules in the machine would fix that up quickly”

If you’re not familiar with rule-based chat engines versus machine learning chat engines, here’s an almost disrespectfully simplistic explanation:

A rule-based approach takes the text provided to the machine (say, “my child isn’t breathing”) and tries to match that input word-for-word against keyword “rules” that have been configured in the machine. If a rule pattern exists that contains “my child is not breathing”, or any subset of those words (like “child not breathing”), then the bot will be satisfied and provide the answer.
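To make that concrete, here is a minimal Python sketch of keyword-rule matching. The rules, the response ids (`emergency_advice`, `ask_child_age`) and the small contraction table are all hypothetical; a real rules engine would have far richer pattern syntax. Each rule fires when every one of its keywords appears in the input, and the first matching rule wins, which is exactly how “a few simple rules” could put the breathing case ahead of the age question.

```python
import re

# Hypothetical rules: (keywords that must ALL appear in the input, response id).
# Order matters: the first matching rule wins, so urgent rules go first.
RULES = [
    ({"child", "not", "breathing"}, "emergency_advice"),
    ({"child", "not", "smiling"}, "ask_child_age"),
]

# Crude, hypothetical contraction table so "isn't" can match a rule written with "not".
CONTRACTIONS = {"isn't": "is not", "doesn't": "does not", "won't": "will not"}


def normalise(text):
    text = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight ones
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return set(re.findall(r"[a-z']+", text))


def match_rule(text, rules=RULES):
    words = normalise(text)
    for keywords, response in rules:
        if keywords <= words:  # every keyword in the rule is present in the input
            return response
    return None  # no rule matched
```

Because the rule set is just ordered data, fixing the demo’s failure is a matter of adding one line near the top of `RULES`, and anyone can read why a given answer was chosen.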

A machine learning approach relies on a fairly complex probabilistic calculation, which produces a statistical measure of how well the input matches a so-called “intent” that has been “trained” into the machine. So the machine would need an intent called something like ‘child-not-breathing’, to which it would try to match the input mathematically. The math itself is too much to get into here, but essentially the bot says something like “I’m 92% sure that ‘my child is not breathing’ matches child-not-breathing” and provides the corresponding response.
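For a feel of where a figure like “92% sure” comes from, here is a sketch of one simple intent classifier: multinomial Naive Bayes over a bag of words, with add-one smoothing. This is a stand-in chosen for illustration; nothing here says which model the researchers actually used, and the training utterances are hypothetical. The classifier scores the input against every trained intent and reports the best one together with its posterior probability as a confidence.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Lower-case and split into alphabetic tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over a bag of words, with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: list of (utterance, intent) pairs
        self.word_counts = defaultdict(Counter)
        self.intent_counts = Counter()
        self.vocab = set()
        for text, intent in examples:
            tokens = tokenise(text)
            self.intent_counts[intent] += 1
            self.word_counts[intent].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return (best_intent, confidence), where confidence is the posterior probability."""
        tokens = tokenise(text)
        total = sum(self.intent_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for intent, count in self.intent_counts.items():
            n_words = sum(self.word_counts[intent].values())
            score = math.log(count / total)  # log prior of the intent
            for token in tokens:
                # smoothed log likelihood of this token under the intent
                score += math.log((self.word_counts[intent][token] + 1) / (n_words + vocab_size))
            log_scores[intent] = score
        # normalise the log scores into probabilities that sum to 1
        peak = max(log_scores.values())
        weights = {k: math.exp(s - peak) for k, s in log_scores.items()}
        norm = sum(weights.values())
        best = max(weights, key=weights.get)
        return best, weights[best] / norm


# Hypothetical training data; a real bot would need far more examples per intent.
TRAINING = [
    ("my child is not breathing", "child-not-breathing"),
    ("my baby stopped breathing", "child-not-breathing"),
    ("baby is not breathing properly", "child-not-breathing"),
    ("my child is not smiling", "child-not-smiling"),
    ("my baby never smiles", "child-not-smiling"),
    ("why does my son not smile", "child-not-smiling"),
]

classifier = NaiveBayesIntentClassifier().fit(TRAINING)
intent, confidence = classifier.predict("My child isn't breathing")
```

Note that the confidence depends entirely on which words happened to appear in the training examples, which is why explaining or improving a given score is harder than editing a rule.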

The rule-based approach is considered embarrassingly simplistic by Computational Linguists, data scientists and even many software engineers. In fact, these techniques are decades old: keyword pattern matching can be found in many 1980s DOS-based games that relied on typed input to drive gameplay (that’s probably another, highly geeky, blog post).

When the chatbot hype began, three or four years ago now, it was characterised in part by software creators avoiding rule-based techniques like the plague. The perception was, and still is, that rule-based chatbots require more hands-on effort and don’t benefit from improvements in machine learning models. That is only partly true. There was also a fear of the embarrassment of not embracing the shiny new thing, even though the shiny new thing was unproven and prone to weird results (like Microsoft’s notorious Tay bot and the aforementioned baby breathing scare).

At Twyla we also avoided the rule-based approach for the first few months, before we realised just how much data machine learning-powered chatbots need to achieve any semblance of quality in conversation. What’s more, to train the model with enough intents (like ‘child-not-breathing’), you need someone very familiar with the subject matter to spend hours labelling a spreadsheet of real-world queries according to which intent each one matches. And you would need a considerable number of real-world examples of people describing their child not breathing for the machine to match that intent accurately. That’s basic statistics.
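The labelling work described above ends up as a table of (query, intent) rows, and the first thing to check is whether each intent has enough examples. Here is a small sketch of that check, assuming the spreadsheet is exported as CSV with hypothetical `utterance` and `intent` columns; the threshold of ten examples per intent is purely illustrative.

```python
import csv
import io
from collections import Counter

# A hypothetical export of the labelling spreadsheet: one real-world query per row.
LABELLED_CSV = """utterance,intent
my child is not breathing,child-not-breathing
my baby stopped breathing,child-not-breathing
my child is not smiling,child-not-smiling
"""

MIN_EXAMPLES = 10  # illustrative threshold, not a recommendation


def load_examples(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["utterance"], row["intent"]) for row in reader]


def under_represented(examples, minimum=MIN_EXAMPLES):
    """Return {intent: count} for every intent with fewer than `minimum` examples."""
    counts = Counter(intent for _, intent in examples)
    return {intent: n for intent, n in counts.items() if n < minimum}


examples = load_examples(LABELLED_CSV)
print(under_represented(examples))
```

On a real project, most intents in a freshly labelled sheet tend to look like this: a handful of examples each, well short of what a statistical model needs.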

A lot of chatbot technology providers might bristle at this suggestion, stating, “Actually, a model can be reliably trained using just ten ‘utterances’ (real-world examples).” That is somewhat true, yes. But that’s only half the challenge. This approach works particularly well if your chatbot isn’t expected to hold complex, multi-step dialogues or to switch topics readily. If you’re going to rely exclusively on machine learning for your chatbot to understand language, you also need to solve the challenge of how your bot knows when the subject has changed, and how it keeps track of where the user is in a conversation (so-called ‘dialogue management’).

To some degree this has been tackled in machine learning-driven solutions, but they often still rely on good old-fashioned rules to make it work (even if the rules are not keyword patterns but rather markers of context in the dialogue). Mixing techniques only makes a chatbot harder to troubleshoot when it inevitably goes wrong. And AI, being a black box, does not make that troubleshooting process any easier.
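Here is a rough sketch of what rules as “markers of context” can look like. It is a hypothetical state tracker, not any particular product’s dialogue manager: the bot remembers what it just asked for (`awaiting`), interprets the next message in that context, and treats a message that matches a different topic rule as a change of subject that resets the context. Topic names, response ids and the number-word table are all made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical topic rules: every keyword must appear for the topic to match.
TOPIC_RULES = [
    ("not-breathing", {"not", "breathing"}),
    ("not-smiling", {"not", "smiling"}),
]
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


@dataclass
class DialogueState:
    topic: Optional[str] = None     # what the conversation is currently about
    awaiting: Optional[str] = None  # context marker: what the bot just asked for
    slots: dict = field(default_factory=dict)


def words_of(text):
    # crude normalisation: curly apostrophes, then "n't" -> " not"
    text = text.lower().replace("\u2019", "'").replace("n't", " not")
    return re.findall(r"[a-z0-9]+", text)


def detect_topic(words):
    present = set(words)
    for topic, keywords in TOPIC_RULES:
        if keywords <= present:
            return topic
    return None


def parse_age(words):
    for w in words:
        if w.isdigit():
            return int(w)
        if w in NUMBER_WORDS:
            return NUMBER_WORDS[w]
    return None


def respond(state, text):
    words = words_of(text)
    topic = detect_topic(words)
    if topic is not None and topic != state.topic:
        # topic change: discard the old context before handling the new topic
        state.topic, state.awaiting, state.slots = topic, None, {}
        if topic == "not-breathing":
            return "emergency_advice"  # never ask for the child's age first
        state.awaiting = "age"
        return "ask_age"
    if state.awaiting == "age":
        age = parse_age(words)
        if age is None:
            return "ask_age_again"
        state.slots["age"] = age
        state.awaiting = None
        return "age_based_advice"
    return "fallback"
```

With this shape, the demo’s “How old is your child?” reply to a breathing emergency cannot happen: the breathing rule switches topic and clears the pending age question.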

Back at Twyla, faced with data scarcity and our own drive to get customer projects delivered and working well, we fell back on rules, almost against our will. And in truth, we never really looked back. While our software is capable of training intents and entities (and using them in dialogues), frankly, rules work faster and more reliably: not just in delivering benefit to end users and our customers from day one, but also in quickly generating nice, clean data for training machine learning models later, as people actually talk to the bot.
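One way rules can produce training data as a by-product is sketched below: every time a rule answers a message, the message and the rule’s intent label are written out as a labelled row. The rules and intent names are hypothetical, and the sketch assumes rule hits are correct labels; in practice you would still want someone to review the log before training on it.

```python
import csv
import io
import re

# Hypothetical rules mapping an intent label to the keywords that must all appear.
RULES = {
    "bill-too-high": {"bill", "high"},
    "cancel-contract": {"cancel", "contract"},
}


def match_intent(text):
    words = set(re.findall(r"[a-z]+", text.lower()))
    for intent, keywords in RULES.items():
        if keywords <= words:
            return intent
    return None


def handle_and_log(text, writer):
    """Answer via rules and record each hit as a labelled (utterance, intent) row."""
    intent = match_intent(text)
    if intent is not None:
        writer.writerow([text, intent])
    return intent


# Usage: log a few messages into an in-memory CSV (a file would work the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
for message in ["Why is my bill so high?", "Hello there", "I want to cancel my contract"]:
    handle_and_log(message, writer)
```

Unmatched messages (like “Hello there”) are simply not logged here; they are also worth collecting, since they show where new rules or intents are needed.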

The absolutism of these rules is what makes them work so well. Firstly, your results are not clouded by statistical calculations that offer no clear explanation of how to improve them. Secondly, you have greater control over the machine’s vocabulary, so you can customise synonyms and define subject matter-specific terminology (for instance, if you have a product called Galaxy, you can make sure it doesn’t get confused with space-related terminology).
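Controlling the vocabulary can be as simple as a lookup table applied before matching. In this sketch, the synonym map and the `PRODUCT:galaxy` tag are hypothetical; the idea is that each synonym collapses to one canonical word, so a single rule covers all of them, and a product name is tagged as an entity rather than left as an ordinary word.

```python
import re

# Hypothetical, customer-specific vocabulary: each synonym maps to one canonical word.
SYNONYMS = {
    "bill": "invoice",
    "statement": "invoice",
    "expensive": "high",
    "steep": "high",
}

# In this (hypothetical) telco bot's domain, "galaxy" always names a product,
# so it is tagged as an entity instead of being treated as an ordinary word.
PRODUCT_TERMS = {"galaxy": "PRODUCT:galaxy"}


def canonicalise(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return [PRODUCT_TERMS.get(t, SYNONYMS.get(t, t)) for t in tokens]


def matches(rule_keywords, text):
    """A rule written once in canonical words now covers every listed synonym."""
    return set(rule_keywords) <= set(canonicalise(text))
```

For example, a single rule `{"invoice", "high"}` matches both “Why is my statement so steep?” and “My bill is expensive”, and adding a new synonym is a one-line change that anyone can review.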

Some time ago, we carried out some research with a large German telco to see which would be quicker and more effective: a purely rule-based dialogue or a purely machine learning-based one.

We timeboxed the project to five days, start to finish, and set off on a race between a Computational Linguist (training a machine learning chatbot) and a UX designer (building a rule-based chatbot). Both worked from the same set of data and information.

After the five days, the Computational Linguist had trained ten intents to a satisfactory level, albeit with no capability for follow-on questions and no means of failing gracefully when the bot didn’t understand. The algorithm was a cool thing to behold in action: taking an input, deconstructing the language, evaluating it against the trained model and producing a confidence score for an intent match. All in the blink of an eye.

After five days, the UX designer had achieved coverage of over 150 distinct queries, as well as ‘abstractions’ of those queries. A precise query might be “Why is my bill so high?”, for which the bot can provide a ready-made answer. An abstraction might be “I have a question about my bill”, in response to which the bot can prompt the user to clarify what that question is.
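The difference between a precise query and an abstraction maps neatly onto rules of different specificity. In this sketch (rules, response text and the `bill_explanation` answer id are hypothetical), every rule whose keywords all appear is a candidate, the most specific candidate wins, an abstraction triggers a clarifying question, and anything unmatched falls through to a graceful fallback instead of a wrong answer.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    keywords: frozenset
    kind: str      # "answer" for a precise query, "clarify" for an abstraction
    response: str  # answer id or clarifying question (hypothetical content)


RULES = [
    Rule(frozenset({"bill", "high"}), "answer", "bill_explanation"),
    Rule(frozenset({"bill"}), "clarify", "What would you like to know about your bill?"),
]

FALLBACK = "Sorry, I didn't catch that. Could you rephrase?"


def handle(text, rules=RULES):
    words = set(re.findall(r"[a-z]+", text.lower()))
    candidates = [r for r in rules if r.keywords <= words]
    if not candidates:
        return ("fallback", FALLBACK)  # fail gracefully rather than guess
    # prefer the most specific matching rule (the one with the most keywords)
    best = max(candidates, key=lambda r: len(r.keywords))
    return (best.kind, best.response)
```

Ranking by specificity means the abstraction only fires when no precise rule matches, so adding a new precise query never breaks the broader clarifying prompt.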

What’s more, the UX designer had the time and headspace to work on the content somewhat, rather than simply taking answers verbatim from the dataset of historical conversations we had been given. So not only did the bot understand ‘more’, it also provided more salient, dialogue-friendly responses.

Rules are an inconvenient truth in the chatbot world, and the steadfast resistance to them may be holding back many opportunities to implement good-quality chatbots. The truth is that rules just f*****g work. That’s the sum total of it.

To finish this off with a Twyla plug, our hybrid solution means the best of both worlds is available. You want intents or entities? You have enough usable data to work from? We got you. You want a bot that works quickly and well, with graceful failure and abstract queries built in, and maybe you don’t have historical chats to work from? We got you too.

The future is conversational.