Spell Check and Autocorrect with Conditional Probability

August 16, 2016

A client of mine wanted to reduce the time it took for his vendors to upload an inventory list to his system. The system currently matches the product name with known product names in the system database, and for the names that match exactly, the inventory items are automatically updated. For the items that did not match, the user needs to search through the system database and choose the matching inventory item.

In the domain in which the client operates, the product names are not standardized and there is no accepted SKU ID for the products. The automatic matching also has a bunch of heuristics that cover the most commonly occurring mismatches, but they only apply to a subset of the mismatched products. In this blog, I will elaborate upon an approach that uses conditional probability to solve the mismatch in these product names without resorting to heuristics.

Let’s assume we have the following known names in the inventory:

Correct Product Names

Carbon Dioxide

Carbon Monoxide

Carbone Lorraine

If a user provides a name like “Carbin,” we will calculate the most likely correct work using Bayes Theorem:

P(C/W) = Probability(correctly named product/wrongly named product)

P(W/C) = Probability(wrongly named product/correctly named product)

P(C) = Probably(correctly named product)

P(W) = Probability(wrongly named product)

P(C/W) = P(W/C) * P(C) / P(W)

We can ignore the division factor P(W) as it will be constant for all the probability calculation and will have no impact when we choose the highest probability.

P(C) = [Count of correctly named product] / [Total number of individual names in product]

eg.
Probability(Carbon) = [Count of Carbon] / [Total number of individual names in product] = 2/7
Probability(Dioxide) = 1/7

Now, how do we calculate P(W/C)? We will construct all possible permutations of the words for a wrongly provided product name using the concept of edit distance. The edit distance is the minimum number of insertions, deletions, and substitutions required to transform one string into the other.

Let us understand it with an example. The wrongly provided word in need of correction is “Carbin.” So we prepare a set of all possible alterations to “Carbin” as shown below:

Delete a letter, eg. arbin, crbin, cabin, etc.
Replace a letter, eg. aarbin, barbin, etc.
Add a letter, eg. acarbin, bcarbin, etc.
Switch two letters around, eg. acrbin, crabin, cabrin, etc.

We choose all the words from the above set that are known product names. These words are Carbon and Carbone. In probabilistic terms, the probability of all the unknown product names would be 0 and they can’t be selected.

Prob(unknown word/Carbon)=0 and Prob(known word/Carbon)=1

Finally, calculate the conditional probabilities for Gruaud, Gruaudi

Prob(Carbon/Carbin) = Prob(Carbin/Carbon) * Prob(Carbon) = 1 * 2/7 = 0.28

Prob (Carbone/Carbin) = Prob(Carbin/Carbone) * Prob(Carbone) = 1 * 1/7 = 0.14

The higher probabilistic word is “Carbon.”

I hope you have enjoyed reading this. If you have any questions or queries, please leave a comment below. I highly appreciate your feedback!

AI Artificial Intelligence or Machine Learning Concept with Digital Computer Code with Chess Pieces. 3D Render

Software / Product Engineering

The Future of Digital Product Engineering: Building AI-Infused Strategic Software for a Rapidly Transforming World

April 1, 2025

AI/ML

The Importance of Guardrails for Secure and Responsible AI Applications

January 13, 2025

Healthcare

What Payers Need to Know: Adapting Commercial Offerings Under the New Administration

December 17, 2024

View all

Stay in Touch

Keep your competitive edge – subscribe to our newsletter for updates on emerging software engineering, data and AI, and cloud technology trends.

Spell Check and Autocorrect with Conditional Probability

Recent blog posts

The Future of Digital Product Engineering: Building AI-Infused Strategic Software for a Rapidly Transforming World

The Importance of Guardrails for Secure and Responsible AI Applications

What Payers Need to Know: Adapting Commercial Offerings Under the New Administration

Stay in Touch