How can you build a constantly learning Virtual Assistant using Graph and Search techniques

Can you use the human in the loop to constantly improve the capabilities of the virtual agent?

Let us say you are developing a virtual assistant to handle customer service calls on Telphone for Hotel Chain. Your virtual assistant had to back out and take the help of a human to resolve the customer issue.

The virtual assistant can listen to the recording of the conversation between the customer service representative and the customer, converts the conversation to text using speech to text recognition techniques and analyzes the conversation for future use. 

The stored conversations/dialog are used to improve the intelligence of the software system on a continuous basis by storing the conversations in a graph data structure on an inverted index for efficient future retrieval. 

A dialog can be defined as the smallest element of the customer and business interaction. The system can build a bipartite graph with a hierarchy of dialogues. A dialog itself can be represented by two nodes and an edge between them. The dialogs are connected and branched off as new combinations rise for the business interactions across different communication platforms. The graph can be built on an inverted index data structure to support efficient text search. 

Elaborating further, to start with the opening sentence from customer service representative such as “Hello {Customer Name}! This is {Company}. How can I help you” will be represented as the root node of the graph. We note that the data in the node will have placeholders for the customer name, the business name. The placeholders in the conversation for building the graph are identified by looking for fuzzy string matches from the input dictionary consisting of inputs such as the business name, the customer name, the items served by the business, etc. The node is annotated with information about who the speaker (customer or customer service representative) was. The node will also have features such as semantic mappings of the sentence, vector computed using sentence2vec algorithm by training a convolutional neural network on the domain that the software agent is trained for. 

A different semantic response from the customer is created as a child node for the question from the customer representative. The semantic equivalence to the existing nodes on the graph can be done using learn to rank algorithms such as Lambda Mart borrowed from the search techniques after doing a first pass inexpensive ranking on the inverted index of the graph of conversation. In an implementation, the result with the highest score with Learning to Rank algorithm exceeding a certain threshold is used as a representative for the customer input. The semantic equivalence comparison and scoring is done after tokenizing, stemming, normalizing and parametrizing (recognizing placeholders) input query. Slot filling algorithms are used to parametrize the customer responses. The slot filling algorithms can use HMM/CRF models to identify part of speech tags associated with the keywords and statistical methods to identify the relationships between the words. If there is a match to an existing dialog from the customer, then the software system will store the dialog context and not create a new node. In there is not a match, than a new node is added to the node of the last conversation. 

Some tasks are simple question and answers such as “User: What is your speciality? Customer Service Representative: Our specialty is Spicy Chicken Pad Kee Mow ”. These tasks can be indexed on the graph as orphan parent-child relationships in the graph. 

One of the challenges we run into when we are building the graph to constantly learn is the change in context. If there is no change in the context, we create the node as a child of the previous node. If there is a change in the context, we need to start a new node different from the previous state in the graph. To figure out a change in the context when the customer talks to the customer service representative, we can use a Bayesian or SVM Machine Learning classifier. The classifier can be trained on crowdsourced training data using features such as the number of tokens common to current and the previous task, the matching score percentage between what the customer has said and the maximum score match of an existing dialog. To improve the accuracy of the classifier, we can train a different classifier for each domain. 

It is to be noted that the graph can be constructed manually by an interaction designer, which can then be inserted in an inverted index. In yet another implementation, a Recurrent Neural Network can be trained on the interaction between the customer and the customer service representative, if we have a lot of training data. To implement personalization to models in a recurrent neural network, user profiles can be clustered into several macro groups. We can use an unsupervised clustering algorithm such as K-Means clustering to accomplish this or create manually curated clusters based on information about the user such as age group, location, and gender of the customer. We can then boost the weight of the examples which had a positive conversion from the customer service representatives. In an implementation, this can be done by duplicating the positive inputs in the training data. The positive inputs can be characterized by things such as the order price and satisfaction from the customer. It is to be noted that the idea of personalization in neural networks is not specific to conversational customer interactions and can be used in things such as building model which send an automatic response to emails. 

The graph on the inverted index is then used to answer questions about the business by a software agent. The software agent starts from the root node of the graph and greets the customer on a call, SMS and Facebook Messenger. The customer can respond to the greeting with a question about the business by searching for the closest match to the question from the customer using techniques borrowed from information retrieval. In an implementation, this can be done using an inverted index to look up possible matches for the user input using an in-expensive algorithm to start with and then evaluating the matches with an expensive algorithm such as Gradient Boosted Decision Tree. Before hitting the inverted index, we have to run stemming, tokenization and normalization algorithm on the input query to make sure that the input can be searched properly by the algorithms looking for a match.

This was an idea I wrote in 2016, in a patent application for Vocy.AI (If you ever plan to use this technique for your company, please consider paying royalty to a poor innovator). Components such as Sentence2Vec can be replaced now with BERT and RNN can be augmented further with attention techniques.This approach gives control as well as the evolution of the virtual agent to enterprises.

with Google:

with Intuit:

Install on Clover App Market:

Install on Yext App Market :http://Install on Yext App Market: