Strings 2008 2 8: Difference between revisions


Latest revision as of 03:29, 17 January 2008

Substitution matrices

Topics

A substitution matrix is a mathematical description of the odds that one character changes into another, with one entry for every pair of characters.


Discovery questions:
  • If you have to choose a day of the week at random, what are the odds that you choose Sunday?


(Peter, is the concept of odds too hard and/or too confusing?)

odds = p / (1 - p), where p is the probability of the change. For the example above p = 1/7, so the odds are (1/7) / (6/7) = 1/6, which a bookmaker would quote as six-to-one against.
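
The formula above can be checked with a few lines of Python. This is only a minimal sketch: the function name `odds` is made up for this illustration, and exact fractions are used so that the day-of-the-week answer comes out as 1/6 rather than a rounded decimal.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


# Choosing Sunday at random from seven days: p = 1/7.
sunday = odds(Fraction(1, 7))
print(sunday)  # 1/6
```

Trying a few other probabilities, for example 1/2 or 6/7, shows how the odds behave as p grows.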


  • Why are odds used?

Odds are used because we want to quantify how much *more likely* a change is than a random guess, not how likely the change is in absolute terms.

Discovery questions:
  • What does it mean when the odds are greater than one, and what does it mean when the odds are less than one?


  • How to apply them in sequence comparison

When we compare biological sequences, we generally assume that all characters change independently of each other. Under this assumption we can compute the odds that one sequence changes into another by taking the odds of the character change (or non-change) at each position of the two strings and multiplying all of these odds together.
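
The multiplication described above might be sketched as follows. The two-letter alphabet and the odds values in `ODDS` are invented purely for illustration and do not come from any real substitution matrix; the function name `sequence_odds` is likewise hypothetical. The sketch also assumes that the two strings have the same length and are compared position by position.

```python
from math import prod

# Hypothetical odds for each (from, to) character pair, invented for illustration.
ODDS = {
    ("A", "A"): 4.0, ("A", "G"): 0.5,
    ("G", "A"): 0.5, ("G", "G"): 4.0,
}


def sequence_odds(seq_a, seq_b, odds=ODDS):
    """Multiply the per-position odds, assuming positions change independently."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return prod(odds[(a, b)] for a, b in zip(seq_a, seq_b))


print(sequence_odds("AAG", "AGG"))  # 4.0 * 0.5 * 4.0 = 8.0
```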

(Peter, are you comfortable with introducing log-odds here? Substitution matrices are really log-odds matrices, and their values can simply be added together because log(o1*o2*o3*...) = log(o1) + log(o2) + log(o3) + ...)
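
The identity in the note above can be verified numerically. In this sketch the three odds values are hypothetical, and the base-2 logarithm is an arbitrary choice made here so that the numbers come out round; any base works the same way.

```python
import math

# Hypothetical per-position odds, invented for illustration.
odds_values = [4.0, 0.5, 4.0]

# Multiply first, then take the logarithm ...
log_of_product = math.log2(math.prod(odds_values))

# ... or take the logarithm of each value and add them up.
sum_of_logs = sum(math.log2(o) for o in odds_values)

print(log_of_product, sum_of_logs)  # 3.0 3.0
```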


The odds (log-odds) matrix, a.k.a. the substitution matrix, gives the odds of changing any character into any other character and so provides us with an exactly quantifiable measure for comparing two strings.
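
As a rough sketch of how such a matrix turns string comparison into a single number, the example below scores one query string against several candidates and ranks them. Everything in it is an assumption made for illustration: the two-letter alphabet, the odds values, the base-2 logarithm, and the names `LOG_ODDS`, `score` and `rank`. Real substitution matrices cover the full nucleotide or amino-acid alphabet.

```python
import math

# Hypothetical odds for each character pair, invented for illustration.
ODDS = {
    ("A", "A"): 4.0, ("A", "G"): 0.5,
    ("G", "A"): 0.5, ("G", "G"): 4.0,
}

# The log-odds form of the same matrix (base 2 chosen arbitrarily).
LOG_ODDS = {pair: math.log2(value) for pair, value in ODDS.items()}


def score(seq_a, seq_b, matrix=LOG_ODDS):
    """Add up the log-odds of the character pair at each position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


def rank(query, candidates, matrix=LOG_ODDS):
    """Return (candidate, score) pairs, highest score first."""
    scored = [(c, score(query, c, matrix)) for c in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(score("AAGG", "AAGG"))  # 2 + 2 + 2 + 2 = 8.0
print(score("AAGG", "AGGA"))  # 2 - 1 + 2 - 1 = 2.0
print(rank("AAGG", ["AGGA", "AAGG", "GGAA"]))
```

A higher score means the candidate is a more plausible relative of the query under the assumed matrix.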


Illustration example (POD's comparing 4-5 students, and summarising the comparison in the form of a tree).


goto Tree of life

--ThomasHuber 17:43, 10 January 2008 (EST)