Welcome to the first edition of CheckPoint, our Product Updates newsletter! Every few months, we’ll give you an inside look at our NLP research so that you’re up to date with what Point API has to offer.
Introducing: Reply AI
This August, we launched Point API with our flagship service, Autocomplete AI. It’s the AI engine that powers our EasyEmail Chrome extension, a Gmail plug-in that generates personalized autocomplete suggestions based on your conversation history.
Autocomplete AI has been useful for our users, but we discovered that you often want to use suggestions before typing anything. Perhaps you’re in a rush and want to reply with the click of a button. Or maybe you have writer’s block and can’t think of what to say. To solve this problem, we released another Point API service called Reply AI.
Reply AI learns from your conversation history to suggest the most relevant replies to new incoming messages, all written in your own voice. These personalized suggestions inspire quick replies without you typing a single character. As an example, here are Reply AI suggestions for a real email received by a customer support user:
This user receives tons of customer support questions about services they provide at other locations. Reply AI detects that this incoming email merits a reply about travel charges and then suggests responses in the user’s own voice. Reply AI suggestions also contain placeholders for personal information, such as Value, Location, or Phone Number.
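To make the placeholder idea concrete, here is a minimal sketch of how a suggestion with placeholders could be filled in before sending. The square-bracket syntax, the `fill_placeholders` helper, and the sample values are assumptions made for illustration; the newsletter does not describe how Reply AI represents or fills placeholders.

```python
import re

# Assumed placeholder syntax: a name in square brackets, e.g. "[Location]".
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each known placeholder with its value; leave unknown ones as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original "[Name]" text when no value is supplied.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical suggestion and values, not taken from a real Reply AI response.
suggestion = "We charge [Value] for travel to [Location]."
filled = fill_placeholders(suggestion, {"Value": "$25", "Location": "Oakland"})
print(filled)
```

Leaving unknown placeholders untouched keeps the gap visible, so the writer can fill it in by hand before sending.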
Generic vs. Personalized
Personalized suggestions are our bread and butter. But what if you don’t want to give us your personal data? Would it be possible to build an AI model that immediately works out-of-the-box?
To answer this question, we trained our Reply AI on open-source data from the Enron email dataset to support generic reply suggestions. These generic reply suggestions are commonly used phrases and sentences that don’t contain any personal information and can be used by anybody at any time.
If you’re familiar with Gmail’s Smart Reply system, you’ll probably want to compare their generic suggestions with ours. Let’s see how they stack up when faced with the same email prompt. Here are the results in Gmail:
You’re probably familiar with this one. Smart Reply identifies a single prompt in the received email and suggests three short sentences as replies. Here’s what our Reply AI suggests:
We found that our Reply AI suggestions are comparable to Gmail’s. Notice that Reply AI has suggestions for each prompt so that you can address everything that merits a reply, while Gmail is always limited to three suggestions for the entire email. In the future, we plan to carefully evaluate the semantic similarity between our suggestions and Gmail’s to get a better idea of how we compare.
Reply AI supports both generic and personalized suggestions, but we conclude that Reply AI works better when trained on a user’s conversation history. We discovered that personalized suggestions are more useful, especially for enterprise users, and are not currently offered by corporations like Google.
Reply rates — all the way up
Reply AI has shown significant improvements since its initial release in August. Our researchers and engineers have done a stellar job of making continuous improvements, day in and day out.
To evaluate our models in an offline setting, we built a system that compares our AI-generated suggestions with actual emails written by users. We count the number of “useful” suggestions, which we define as those that are identical or nearly identical to what the user actually wrote. Although not every useful suggestion is guaranteed to be chosen in a live setting, our evaluation system helps us track our improvements over time and estimate how we stack up against similar products.
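As a rough illustration of this kind of check, the sketch below marks a suggestion as useful when it closely matches a sentence the user wrote. The similarity measure (Python's `difflib.SequenceMatcher`), the 0.9 threshold, and the `is_useful` name are assumptions chosen for the example; the newsletter does not say how "nearly identical" is measured.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def is_useful(suggestion: str, written: str, threshold: float = 0.9) -> bool:
    """Return True when the suggestion is identical or nearly identical
    to the text the user actually wrote (assumed similarity threshold)."""
    a, b = normalize(suggestion), normalize(written)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


# Made-up sentences for illustration only.
print(is_useful("Sounds good to me.", "Sounds good to me!"))  # True
print(is_useful("See you tomorrow.", "I can't make it."))
```

A character-level ratio is only one possible choice; a token-based or embedding-based similarity would fit the same interface.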
The first graph shows the percentage of emails with at least one useful suggestion. Back in August, we provided useful suggestions for 19 percent of emails, while our latest model provides them for 24 percent of emails. For comparison, Smart Reply suggestions are used in just 12 percent of emails sent on mobile, although that figure measures live usage rather than offline matches.
The second graph displays the average number of useful suggestions in each set of three suggestions. The higher that number, the more likely the suggestions are to save writing time for our users. Back in August, our original model returned 0.35 useful suggestions per set, while our latest model returns 0.50. That’s a relative improvement of 42.9 percent!
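For readers who want to check the arithmetic, here is a small sketch of how both metrics and the 42.9 percent figure can be computed. The function names and the toy counts are hypothetical; only the 0.35 and 0.50 values come from the results above.

```python
def email_hit_rate(useful_counts):
    """Share of emails that received at least one useful suggestion."""
    return sum(1 for count in useful_counts if count > 0) / len(useful_counts)


def mean_useful(useful_counts):
    """Average number of useful suggestions per set of suggestions."""
    return sum(useful_counts) / len(useful_counts)


def relative_improvement(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Toy data: useful suggestions found for each of eight made-up emails.
toy_counts = [0, 1, 0, 2, 0, 0, 1, 0]
print(email_hit_rate(toy_counts))  # 0.375
print(mean_useful(toy_counts))     # 0.5

# Reported values: 0.35 useful suggestions in August, 0.50 for the latest model.
print(round(relative_improvement(0.35, 0.50), 1))  # 42.9
```

The same formula applied to the first metric gives a relative improvement of about 26.3 percent, or 5 percentage points, for the move from 19 percent to 24 percent of emails.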
Although 0.50 useful suggestions may not sound like many, we often provide more than three suggestions per email. That raises the likelihood that at least one suggestion matches what the user would write for a given incoming message.
The Next Steps
We’ve gotten positive feedback about our new improvements and releases, but we’re certainly not satisfied yet! Our researchers are hard at work improving both Autocomplete AI and Reply AI. Our mission is not only to save you time while writing messages, but also to inspire you to write at your best.
If you have any feedback or crazy feature ideas, send us a reply or hit us up on our website. Stay tuned for our next CheckPoint!