Strings 2008 2 8

From MDWiki

Substitution matrices


A substitution matrix is a mathematical way of describing the odds that one character changes into another character.

Discovery questions:
  • If you have to choose a day of the week at random, what are the odds that you choose Sunday?

(Peter, is the concept of odds too hard and/or too confusing?)

odds = p / (1 - p), where p is the probability of the event (here, of the change). In the example above p = 1/7, so the odds are (1/7) / (6/7) = 1/6, or what a bookmaker calls six-to-one against.
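
The conversion from probability to odds can be sketched in a few lines of Python. This is only an illustration; the function name `odds` is chosen here and exact fractions are used so that the day-of-the-week result is easy to read.

```python
from fractions import Fraction

def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1 - p)

# Choosing Sunday at random from the seven days of the week.
p_sunday = Fraction(1, 7)
print(odds(p_sunday))  # 1/6
```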

  • Why are odds used?

Odds are used because we want to quantify how much *more likely* the change is than a random guess, rather than just how likely the change is.

Discovery questions:
  • What does it mean when the odds are greater than one, and what does it mean when the odds are less than one?

  • How to apply them in sequence comparison

When we compare biological sequences we generally assume that all characters change independently of each other. Under this assumption we can compute the odds that one sequence changes into another by taking the odds of the character change (or non-change) at each position of the two strings and multiplying all of these odds together.
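
A minimal sketch of this multiplication, assuming two strings of equal length over a toy two-letter alphabet. The table `ODDS`, its values and the function name `sequence_odds` are all invented for illustration and are not taken from any real substitution matrix.

```python
import math

# Hypothetical odds for each ordered pair of characters; the numbers are
# made up for this example only.
ODDS = {
    ("A", "A"): 4.0, ("A", "B"): 0.5,
    ("B", "A"): 0.5, ("B", "B"): 2.0,
}

def sequence_odds(s, t, table=ODDS):
    """Odds that string s changes into string t, assuming every
    position changes independently of the others."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    # Multiply the per-position odds together.
    return math.prod(table[(a, b)] for a, b in zip(s, t))

print(sequence_odds("AAB", "ABB"))  # 4.0 * 0.5 * 2.0 = 4.0
```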

(Peter, are you comfortable with introducing log-odds here? Substitution matrices are really log-odds matrices, and the values can simply be added together because log(o1*o2*o3*...) = log(o1) + log(o2) + log(o3) + ...)

The odds (log-odds) matrix (a.k.a. substitution matrix) for changing any character into any other character provides us with an exactly quantifiable measure for comparing two strings.
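
How a log-odds matrix turns the comparison into a simple sum can be sketched as follows. The odds values are again invented for a toy two-letter alphabet, base-2 logarithms are an arbitrary choice made here, and the names `LOG_ODDS` and `log_odds_score` are hypothetical.

```python
import math

# Hypothetical odds for each ordered pair of characters (made-up values).
ODDS = {
    ("A", "A"): 4.0, ("A", "B"): 0.5,
    ("B", "A"): 0.5, ("B", "B"): 2.0,
}

# The log-odds matrix: one logarithm per entry, computed once.
LOG_ODDS = {pair: math.log2(o) for pair, o in ODDS.items()}

def log_odds_score(s, t, matrix=LOG_ODDS):
    """Sum of the per-position log-odds for two equal-length strings."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(s, t))

score = log_odds_score("AAB", "ABB")
print(score)       # 2.0 + (-1.0) + 1.0 = 2.0
print(2 ** score)  # 4.0, the same as multiplying the odds 4.0 * 0.5 * 2.0
```

A positive score corresponds to odds greater than one and a negative score to odds smaller than one.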

Illustrative example (POD's comparing 4-5 students, and summarising the comparison in the form of a tree).

goto Tree of life

--ThomasHuber 17:43, 10 January 2008 (EST)