Ticket #225: spelling-formulae.rst

File spelling-formulae.rst, 1.0 KB (added by Olly Betts, 14 years ago)

Simple mathematical model of spelling errors

Line 
1.. -*- coding: UTF-8 -*-
2
3Starting from Bayes' Theorem, assuming X and A are independent::
4
5 p(X|A)·p(A) = p(X∩A) = p(A∩X) = p(A|X)·p(X)
6
7If X is "the intended word was X" and A is "the typed word was A", let m be the
8number of edits required to get from X to A, and assume each edit is equally
9likely with probability p, then we want to pick the most likely X which the
10user intended, given they actually typed A - i.e. we want to::
11
12 find X to maximise p(X|A) = p(A|X)·p(X)/p(A)
13 ⇔ maximise p(A|X)·p(X)
14 ⇔ maximise p(A|X)·freq(X)
15 ⇔ maximise pᵐ·freq(X)
16 ⇔ maximise log(p).m + log(freq(X))
17
18Since p < 1, log(p) is negative, so we want to::
19
20 minimise m + log(freq(X))/log(p)
21
22It's likely X and A aren't actually independent, and that not all edits have
23equal probability, but this does at least give us a formula which instinctively
24feels plausible and which has some theoretical justification. We should also
25be able to optimise it fairly well.
26
27We do need an estimate of the p though.