Strings 2008 2 8
Substitution matrices
Topics
- What are substitution matrices
A substitution matrix is a mathematical way of describing the odds that one character changes into another character.
Discovery questions:
|
(Peter, is the concept of odds too hard and/or too confusing?)
odds = p / (1 - p), where p is the probability of the change. Thus the odds in the example above are 1/6, or what a bookmaker calls six-to-one against.
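The formula can be checked with a minimal Python sketch. The function names `odds` and `probability` are illustrative only, and the value p = 1/7 is an assumption, chosen because it gives the odds of 1/6 quoted above.

```python
from fractions import Fraction


def odds(p):
    """Return the odds p / (1 - p) for a probability p with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability(o):
    """Invert the odds formula: p = o / (1 + o)."""
    return o / (1 + o)


# Assumed example value: a change with probability 1/7 has odds of 1/6.
example_odds = odds(Fraction(1, 7))
example_probability = probability(example_odds)
print(example_odds, example_probability)
```

Exact fractions are used here so that the conversion between probability and odds can be read off without rounding.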
- Why are odds used
Odds are used because we want to quantify how much *more likely* the change is compared to a random guess, not merely how likely the change is.
Discovery questions:
|
- How to apply them in sequence comparison
When we compare biological sequences we generally assume that all characters change independently of each other. Under this assumption we can compute the odds that one sequence changes into another by taking the odds of each character change (or non-change) at each position of the two strings and multiplying all the odds together.
(Peter, are you comfortable with introducing log-odds here? Substitution matrices are really log-odds matrices, and the values can simply be added together because log(o1*o2*o3*...) = log(o1) + log(o2) + log(o3) + ...)
The odds (log-odds) matrix (a.k.a. substitution matrix) of changing any character into any other character provides us with an exactly quantifiable measure for comparing two strings.
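The scoring described above can be sketched in Python as follows. The odds values (4 for an unchanged character, 0.5 for a changed one), the four-letter alphabet and the two example strings are all invented for illustration and are not taken from any real substitution matrix. The sketch assumes two strings of equal length, compared position by position.

```python
import math

ALPHABET = "ACGT"

# Hypothetical odds matrix: 4.0 if the character is unchanged, 0.5 otherwise.
ODDS = {(a, b): (4.0 if a == b else 0.5) for a in ALPHABET for b in ALPHABET}

# The matching log-odds matrix (base 2), computed from the odds above.
LOG_ODDS = {pair: math.log2(value) for pair, value in ODDS.items()}


def sequence_odds(s, t):
    """Multiply the per-position odds of two equal-length strings."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    result = 1.0
    for a, b in zip(s, t):
        result *= ODDS[(a, b)]
    return result


def sequence_log_odds(s, t):
    """Add the per-position log-odds of two equal-length strings."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(LOG_ODDS[(a, b)] for a, b in zip(s, t))


# Five unchanged positions and two changed ones.
total_odds = sequence_odds("GATTACA", "GACTATA")
total_log_odds = sequence_log_odds("GATTACA", "GACTATA")
print(total_odds, total_log_odds)  # 256.0 8.0
```

With these assumed values the product of the odds is 4^5 * 0.5^2 = 256 and the sum of the log-odds is 5*2 - 2 = 8, which agree because 2^8 = 256.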
Illustration example (POD's comparing 4-5 students, and summarising the comparison in the form of a tree).
--ThomasHuber 17:43, 10 January 2008 (EST)