It would be the same for other applications. We need to find the minimum edit distances for typed word, and that would be words that we can recommend to a user of our spell checking system. Then we need to calculate the edit distance of typed word and similar word (if same form does not exist). We need to have a corpus of words of a particular language with most of the forms, or a mechanism that knows how to create all the forms of a word. What we need to calculate is the minimum edit distance for that language. When it comes to creating a spell checker, we need a bit more than just the edit distance between 2 words or 2 strings. The Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string (triangle inequality).If the strings are the same size, the Hamming distance is an upper bound on the Levenshtein distance. It is zero if and only if the strings are equal.It is at most the length of the longer string.It is always at least the difference of the sizes of the two strings.The Levenshtein distance has several simple upper and lower bounds that are useful in applications which are applied with many of them. sittin → sittin g (insertion of “g” at the end).sitt en → sitt in (substitution of “i” for “e”).kitten → sitten (substitution of “s” for “k”).Note that the first element in the minimum corresponds to deletion (from to ), the second to insertion and the third to match or mismatch, depending on whether the respective symbols are the same.įor example, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there isn’t way to do it with a fewer than three edits: Mathematically, the Levenshtein distance between two strings is given by where Levenshtein distance between two words is the minimum number of single-character edits (insertion, deletion, substitution) required to change one word into the other. This type of calculating edit distance is referred to as Levenshtein edit distance. We could assign to all of these operations some value, for example for insertions and deletions, usual value is 1, and for editing is 2 (it is combined from deletion and insertions). Edit operations are character insertions, character deletion and character change. So we are measuring the distance between the two words or strings by calculating how many edit operations are needed to be applied to one string to become the other one. Main problem here is to calculate the distance between the word you get, and possible words or a word you are expecting. Then you may use edit distance to assume what speaker wanted to say. But because of noice or user’s bad English command that technology you are using recognized is not what you are expecting as possible command. For example when what you want to create application with voice commands, you may use some voice recognition API or some voice recognition technology. This thing, I would like now to talk about, can have many uses. If you are getting commands from voice, and you cannot match the word someone said perfectly, maybe after checking how distant it is to commands, you can do something reasonable. It is useful to apply it to a voice recognition api. Also, with this technology in mind, you can do many other things. Have you ever wondered what the underlying technology and math behind a spell checker are? I did, so I even built my own spell checker.
0 Comments
Leave a Reply. |