Strings 2008 2 1

From MDWiki
Jump to navigationJump to search

Imtroduction to string comparison

A daily life task and key concept in science is comparing two objects. We learned earlier that a string is the simplest data structure to store information, thus by comparing two strings encoding the objects will allow us to compare them.


  • According to the Oxford English dictionary the correct spelling for the word is: Introduction. Was it spelled correctly in the title of this page?
  • The DNA fragment ACTTTAGCCAT was extracted from a sample collected from the rare Babelfish. Further analysis has shown that the Babelfish sequence is ACTTGAGCCAT while the human sequence is ACTTTAGCCAT. Was the extracted DNA sequence indeed from the Babelfish or a contaminant of human DNA?

Discovery question:

Go to Ensembl Human genome. How many base pairs does the human genome have? How long would it take you to compare the DNA of two humans if you can compare 10 nucleotides per second?


Given two strings s1 and s2, compare each pair of elements. The two strings are the same if for all pairs the characters are identical.

simple python code

def StringsAreIdentical(s1, s2):
    """ checks if two strings s1 and s2 are identical"""

    # not identical if different length
    if (len(s1) != len(s2)):
        return 0

    for i in range(len(s1)):
        if (s1[i] != s2[i]):
            return 0;        # not identical
    return 1

dna_probe = 'ACTTTAGCCAT'

dna_human = 'ACTTTAGCCAT'
dna_babelfish = 'ACTTGAGCCAT'

compare1 = StringsAreIdentical(dna_probe, dna_human)
compare2 = StringsAreIdentical(dna_probe, dna_babelfish)

if ((compare1 == 1) and (compare2 == 1)):
    print "probe DNA is identical with both human and babelfish DNA"
elif ((compare1 == 1) and (compare2 == 0)):
    print "probe DNA is identical with human DNA"
elif ((compare1 == 0) and (compare2 == 1)):
    print "probe DNA is identical with babelfish DNA"
    print "probe DNA is identical with neither human nor babelfish DNA"

goto Pattern search

--ThomasHuber 12:08, 10 January 2008 (EST)