Roytberg M.A.
Institute of the Mathematical Problems in Biology, Russian Academy of Science, 142292, Pushchino, Moscow Region, Russia, E-mail: roytberg@impb.serpukhov.su
The notion of parametric sequence alignment was introduced and investigated in papers [1, 2].
Let A be an alignment of the sequences u and v, c(A) be the number of matches in A, f(A) be the number of mismatches and d(A) be the number of indels (imserted or deleted symbols).
Weight of an alignment A is the value
(1)
where C, F, D are coefficients constituting the parameter vector of an alignment procedure.
An alignment A is called optimal relatively to the given parameters if its weight is the maximal weight possible for the given parameters.
A set P of parameter vectors is called "optimality region" if (i) there exists an alignment optimal for all parameters vectors from P; (ii) P is maximal, that is P is not a part of a larger region that still satisfies condition (i).
In [2] it was shown that the number of optimality regions for sequences u, v of lengths m and n (m <n) does not exceed where k is a coefficient.
Note that formula (1) corresponds to the global sequence alignment and takes into account only the number of deleted/inserted symbols, but not gaps.
The following scoring functions are also of interest:
- global alignment with affine gap penalty function:
(2)
(here g(A) is the number of gaps in alignment A);
- local alignments with or without taking into account the number of
gaps:
(3)
(4)
(here di(A) is the number of inner indels in alignment A,
gi(A) is the number of inner gaps in alignment A).
Proposition. Let u, v be sequences of lengths m and n respectively,
m<n. Then the number of optimality regions does not exceed:
in case (2); in case (3);
in case (4), where k1, k2, k3 are some
coefficients.