How Can Bots Understand Natural Language?

I picked up on a conversation between Alice and Bob that went like this:

Alice: “The quick brown fox jumped over the lazy dog.”
Bob: “Ok.”
Alice: “Who jumped over the dog?”
Bob: “The fox.”
Alice: “What was the color of the fox?”
Bob: “Brown.”
Alice: “What was the dog like?”
Bob: “Lazy.”

This is a pretty silly conversation for two humans to engage in. But what if Bob were a chat bot? The conversation suddenly becomes a lot more interesting. How does the bot answer the questions? It answers them by recognizing patterns shared by the statement and the questions. It can recognize those patterns because it was trained on text labeled with metadata tags, called “annotations,” that mark them. In this post, we will examine some common annotations. To inspect the annotations applied to a sentence we need a natural language analysis tool, so we will use the Stanford CoreNLP suite.

Part of Speech (POS) Tagging

A POS tagger annotates each word in a sentence with its part of speech (noun, verb, adjective, and others).

“The quick brown fox jumped over the lazy dog”

[Figure: POS tags for the statement]

Typically, the tags are more granular than the broad part of speech. For example, VBD marks a verb in the past tense. The full list and meaning of the tags can be found in the Penn Treebank POS tag reference.

These tags are useful for identifying what a question is asking about. Let’s look at the POS tags for the question “Who jumped over the dog?”

[Figure: POS tags for the question]

Do you see a pattern? Both the statement and the question contain jumped/VBD and dog/NN. The bot can see this pattern too, so it knows you are asking it something about jumping and dogs.
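The overlap described above can be sketched in a few lines of Python. This is only an illustration: the (word, tag) pairs are written by hand to stand in for a tagger's output, and the helper names `content_words` and `shared_content_words` are hypothetical, not part of CoreNLP.

```python
# Hand-written (word, tag) pairs standing in for a POS tagger's output.
# These are assumptions for illustration, not captured CoreNLP results.
STATEMENT = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"),
]
QUESTION = [
    ("Who", "WP"), ("jumped", "VBD"), ("over", "IN"), ("the", "DT"),
    ("dog", "NN"), ("?", "."),
]

# Only nouns, verbs and adjectives are compared; function words such as
# "the" and "over" say little about what the question is about.
CONTENT_PREFIXES = ("NN", "VB", "JJ")


def content_words(tagged):
    """Return the set of lowercased (word, tag) pairs for content words."""
    return {(w.lower(), t) for w, t in tagged if t.startswith(CONTENT_PREFIXES)}


def shared_content_words(statement, question):
    """Return the content words that appear in both tagged sentences."""
    return content_words(statement) & content_words(question)


print(sorted(shared_content_words(STATEMENT, QUESTION)))
# [('dog', 'NN'), ('jumped', 'VBD')]
```

Filtering to content words is a design choice made here so that shared function words do not count as a match.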

Dependency Parsing

Dependency parsing identifies the grammatical relationships (dependencies) between the words in a sentence.

“The quick brown fox jumped over the lazy dog.”

[Figure: dependency parse of the statement]

The relationships are labelled with syntactic functions. For example, “brown” is an adjective that modifies “fox,” so the relation between them is labelled “amod” (adjectival modifier).

“Who jumped over the dog?”

[Figure: dependency parse of the question “Who jumped over the dog?”]

Do you see the pattern? The tuple (“fox,” “nsubj,” “jumped”) from the statement lines up with the tuple (“who,” “nsubj,” “jumped”) from the question, so substituting “fox” for “who” gets us to the answer.
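A minimal sketch of that substitution follows. The triples are hand-written in (dependent, relation, governor) order to mirror the tuples above; they are assumed parser output rather than captured CoreNLP results, and `answer_wh_question` is a hypothetical helper.

```python
# Hand-written dependency triples in (dependent, relation, governor) order.
# They stand in for a parser's output and are assumptions for illustration.
STATEMENT_DEPS = [
    ("the", "det", "fox"), ("quick", "amod", "fox"), ("brown", "amod", "fox"),
    ("fox", "nsubj", "jumped"), ("over", "case", "dog"), ("the", "det", "dog"),
    ("lazy", "amod", "dog"), ("dog", "nmod", "jumped"),
]
QUESTION_DEPS = [
    ("who", "nsubj", "jumped"), ("over", "case", "dog"),
    ("the", "det", "dog"), ("dog", "nmod", "jumped"),
]

WH_WORDS = {"who", "what", "which"}


def answer_wh_question(statement_deps, question_deps):
    """Find the statement word that fills the question word's slot.

    For a question triple such as ("who", "nsubj", "jumped"), look for a
    statement triple with the same relation and governor and return its
    dependent. Returns None when nothing matches.
    """
    for dependent, relation, governor in question_deps:
        if dependent not in WH_WORDS:
            continue
        for s_dependent, s_relation, s_governor in statement_deps:
            if s_relation == relation and s_governor == governor:
                return s_dependent
    return None


print(answer_wh_question(STATEMENT_DEPS, QUESTION_DEPS))  # fox
```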

“What was the color of the fox?”

[Figure: dependency parse of the question “What was the color of the fox?”]

The interesting relationships here are (“brown,” “amod,” “fox”) in the statement and (“color,” “nmod,” “fox”) in the question. Both connect a word to “fox,” so, as before, we can substitute “brown” for “color” and have the answer.
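The step from “color” to “brown” needs one extra piece of knowledge: that “brown” names a color while “quick” does not. The sketch below makes that explicit with a tiny hypothetical lexicon. The triples are hand-written in (dependent, relation, governor) order, so the question's nmod link appears as ("fox", "nmod", "color"); only the question triples needed for the lookup are listed, and none of this is captured CoreNLP output.

```python
# Hypothetical lexicon mapping an attribute noun to adjectives that express it.
# A real bot would need a far larger resource; this is an assumption.
ATTRIBUTE_LEXICON = {"color": {"brown", "red", "black", "white"}}

# Hand-written triples in (dependent, relation, governor) order.
STATEMENT_DEPS = [
    ("the", "det", "fox"), ("quick", "amod", "fox"), ("brown", "amod", "fox"),
    ("fox", "nsubj", "jumped"), ("over", "case", "dog"), ("the", "det", "dog"),
    ("lazy", "amod", "dog"), ("dog", "nmod", "jumped"),
]
# Partial parse of "What was the color of the fox?" (assumed, not exhaustive).
QUESTION_DEPS = [
    ("the", "det", "color"), ("of", "case", "fox"),
    ("the", "det", "fox"), ("fox", "nmod", "color"),
]


def answer_attribute_question(statement_deps, question_deps, lexicon):
    """Answer "what is the <attribute> of <noun>?" from amod relations.

    Finds the (noun, "nmod", attribute) link in the question, then returns
    the first adjective that modifies the same noun in the statement and is
    listed under that attribute in the lexicon. Returns None otherwise.
    """
    for noun, relation, attribute in question_deps:
        if relation != "nmod" or attribute not in lexicon:
            continue
        for modifier, s_relation, s_governor in statement_deps:
            if (s_relation == "amod" and s_governor == noun
                    and modifier in lexicon[attribute]):
                return modifier
    return None


print(answer_attribute_question(STATEMENT_DEPS, QUESTION_DEPS, ATTRIBUTE_LEXICON))
# brown
```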

Named Entity Recognition

Named Entity Recognition (NER) identifies words or phrases that refer to an entity such as a person, organization, place, time, or currency. A model must be trained on an annotated corpus to identify entities, and a given model may recognize other entity types or only a subset of the ones mentioned here.

“I need a ticket to New York.”

[Figure: named entities recognized in the sentence]

“Give me twenty dollars.”

[Figure: named entities recognized in the sentence]

“I have a ten year old daughter.”

[Figure: named entities recognized in the sentence]

“John Kerry got stuck in New Delhi traffic!”

[Figure: named entities recognized in the sentence]

This is a powerful technique for bots to understand context and intent. However, both the training data and the training time it requires are substantial.
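NER taggers commonly label each token separately, so a bot usually has to merge neighbouring tokens into whole entities such as “New Delhi.” The sketch below shows one simple way to do that. The per-token labels are written by hand as an assumption of what a tagger might return for the last example, and `entity_spans` is a hypothetical helper.

```python
from itertools import groupby

# Hand-written (token, label) pairs; "O" marks a token outside any entity.
# The labels are assumed for illustration, not captured tagger output.
TAGGED = [
    ("John", "PERSON"), ("Kerry", "PERSON"), ("got", "O"), ("stuck", "O"),
    ("in", "O"), ("New", "LOCATION"), ("Delhi", "LOCATION"),
    ("traffic", "O"), ("!", "O"),
]


def entity_spans(tagged):
    """Merge runs of consecutive tokens with the same label into entities."""
    spans = []
    for label, group in groupby(tagged, key=lambda pair: pair[1]):
        if label != "O":
            text = " ".join(word for word, _ in group)
            spans.append((text, label))
    return spans


print(entity_spans(TAGGED))
# [('John Kerry', 'PERSON'), ('New Delhi', 'LOCATION')]
```

One limitation of this grouping is that two different entities of the same type sitting next to each other would be merged into one span.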

Conclusion

In this post, I have covered the basic annotations that can help uncover patterns in a knowledge base of statements and related questions. Other annotations build on these. If you want to learn more about how we are using bots that understand natural language, get in touch!

Sayantam Dey

Senior Director Engineering

Sayantam Dey is the Senior Director of Engineering at 3Pillar Global, working out of our office in Noida, India. He has been with 3Pillar for ten years, delivering enterprise products and building frameworks for accelerated software development and testing in various technologies. His current areas of interest are data analytics, messaging systems and cloud services. He has authored the ‘Spring Integration AWS’ open source project and contributes to other open source projects such as SocialAuth and SocialAuth Android.
