The SOV service is designed to facilitate the comparison and evaluation of secondary structure element assignments.
The similarity between the predicted (PSEQ) and observed (OSEQ) sequences is evaluated for each
conformational state (helix, strand, coil) separately and for all conformational
states combined. The measures used are:
Q3 - traditional per-residue prediction accuracy (Qindex)
SOV - Segment OVerlap measure (as defined by Zemla et al., PROTEINS:
Structure, Function, and Genetics, 34, 1999, pp. 220-223)
Q3 measure
Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted
correctly as helix, strand, coil, and for all three conformational states combined.
The definition of Qindex is as follows.
For a single conformational state:
$$Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100,$$
where i is either helix, strand or coil.
For all three states:
$$Q_3 = \frac{\text{number of residues correctly predicted}}{\text{number of all residues}} \times 100$$
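The Qindex definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the service's own code: the function name `q_index` and the result keys are hypothetical, and returning 100.0 for a state with no observed residues is an assumption inferred from the sample output at the end of this page, which reports 100.0 for STRAND although the example contains no strand residues.

```python
def q_index(observed: str, predicted: str) -> dict:
    """Per-residue accuracy in percent for H, E, C and all states combined."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    result = {}
    for state in "HEC":
        n_observed = observed.count(state)
        n_correct = sum(
            1 for o, p in zip(observed, predicted) if o == state and p == state
        )
        # Assumption: a state that never occurs in the observed string scores 100.0.
        result[state] = 100.0 * n_correct / n_observed if n_observed else 100.0
    n_correct_all = sum(1 for o, p in zip(observed, predicted) if o == p)
    result["ALL"] = 100.0 * n_correct_all / len(observed) if observed else 0.0
    return result


# Observed and predicted strings taken from the examples on this page.
scores = q_index("CCCHHHCC", "CCHHHHCC")
print(scores)
```

For the example strings, 7 of 8 residues agree overall, all 3 observed helix residues are predicted as helix, and 4 of 5 observed coil residues are predicted as coil.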
SOV measure
Segment OVerlap quantity measure for a single conformational state:
$$\mathrm{SOV}(i) = \frac{1}{N(i)} \sum_{S(i)} \frac{\mathrm{MINOV}(S1;S2) + \mathrm{DELTA}(S1;S2)}{\mathrm{MAXOV}(S1;S2)} \times \mathrm{LEN}(S1)$$
S1 and S2 are the observed and predicted secondary structure segments
(in state i, which can be either H, E or C);
LEN(S1) is the number of residues in the segment S1;
MINOV(S1;S2) is the length of actual overlap of S1 and S2, i.e.
the extent for which both segments have residues in state i,
for example H;
MAXOV(S1;S2) is the length of the total extent for which either of
the segments S1 or S2 has a residue in state i;
DELTA(S1;S2) is the integer value defined as
$$\mathrm{DELTA}(S1;S2) = \min\{\mathrm{MAXOV}(S1;S2) - \mathrm{MINOV}(S1;S2);\ \mathrm{MINOV}(S1;S2);\ \mathrm{INT}(\mathrm{LEN}(S1)/2);\ \mathrm{INT}(\mathrm{LEN}(S2)/2)\}$$
The sum is taken over S(i), all the pairs of segments {S1;S2}
where S1 and S2 have at least one residue in state i
in common;
N(i) is the number of residues in state i defined as follows:
$$N(i) = \sum_{S(i)} \mathrm{LEN}(S1) + \sum_{S'(i)} \mathrm{LEN}(S1)$$
The two sums are taken over S(i) and S'(i):
S(i) is the set of all the pairs of segments {S1;S2},
where S1 and S2 have at least one residue in state i
in common;
S'(i) is the set of segments S1 that do not produce
any segment pair.
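As a worked illustration of these definitions, consider the helix state for the example strings used later on this page (observed CCCHHHCC, predicted CCHHHHCC), with S1 taken from the observed string. There is a single pair of overlapping helix segments, so S'(H) is empty. The arithmetic below follows the formula exactly as written above; the sample output reports the result as 100.0, which suggests the service scales to percent (an inference, since the formula itself carries no factor of 100).

```latex
\begin{aligned}
&S1 = \text{observed helix, residues 4 to 6}, \quad \mathrm{LEN}(S1) = 3 \\
&S2 = \text{predicted helix, residues 3 to 6}, \quad \mathrm{LEN}(S2) = 4 \\
&\mathrm{MINOV}(S1;S2) = 3, \quad \mathrm{MAXOV}(S1;S2) = 4 \\
&\mathrm{DELTA}(S1;S2) = \min\{4 - 3;\ 3;\ \mathrm{INT}(3/2);\ \mathrm{INT}(4/2)\} = \min\{1;\ 3;\ 1;\ 2\} = 1 \\
&N(H) = \mathrm{LEN}(S1) = 3 \\
&\mathrm{SOV}(H) = \frac{1}{3} \cdot \frac{3 + 1}{4} \cdot 3 = 1
\end{aligned}
```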
Segment OVerlap quantity measure for all three states:
$$\mathrm{SOV} = \frac{1}{N} \sum_{i} \sum_{S(i)} \frac{\mathrm{MINOV}(S1;S2) + \mathrm{DELTA}(S1;S2)}{\mathrm{MAXOV}(S1;S2)} \times \mathrm{LEN}(S1)$$
where the normalization value N is a sum of N(i) over all three
conformational states (i = HELIX, STRAND, COIL):
$$N = \sum_{i} N(i)$$
SOV observed indicates that S1 is the observed fragment and S2 is the predicted one.
SOV predicted indicates that S1 is the predicted fragment and S2 is the observed one.
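The SOV definitions above can be sketched in Python as follows. This is a minimal illustration, not the service's implementation: the names `segments`, `sov_terms`, `sov` and the `s1` parameter are hypothetical, and both the scaling to percent and the value 100.0 returned when no segment exists in the requested states are assumptions inferred from the sample output at the end of this page.

```python
from itertools import groupby


def segments(ss: str, state: str):
    """Return (start, end) pairs, end exclusive, for each run of `state`."""
    runs, pos = [], 0
    for letter, group in groupby(ss):
        length = len(list(group))
        if letter == state:
            runs.append((pos, pos + length))
        pos += length
    return runs


def sov_terms(s1_string: str, s2_string: str, state: str):
    """Return (sum over S(i), N(i)) for one state; S1 comes from s1_string."""
    total, n_i = 0.0, 0
    for a1, b1 in segments(s1_string, state):
        len1, paired = b1 - a1, False
        for a2, b2 in segments(s2_string, state):
            minov = min(b1, b2) - max(a1, a2)
            if minov <= 0:
                continue  # the two segments share no residue
            paired = True
            len2 = b2 - a2
            maxov = max(b1, b2) - min(a1, a2)
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            n_i += len1  # LEN(S1) is counted once per pair in S(i)
        if not paired:
            n_i += len1  # S1 produces no pair, so it belongs to S'(i)
    return total, n_i


def sov(observed: str, predicted: str, states: str = "HEC", s1: str = "observed"):
    """SOV in percent over the given states; s1 selects which string supplies S1."""
    first, second = (observed, predicted) if s1 == "observed" else (predicted, observed)
    total, n = 0.0, 0
    for state in states:
        t, n_i = sov_terms(first, second, state)
        total += t
        n += n_i
    # Assumptions: percent scale, and 100.0 when there is nothing to evaluate.
    return 100.0 * total / n if n else 100.0


print(sov("CCCHHHCC", "CCHHHHCC"))               # all three states
print(sov("CCCHHHCC", "CCHHHHCC", states="H"))   # helix only
print(sov("HHHHHHHH", "HHCCCCHH", states="H"))   # one helix split in two
```

Passing `s1="predicted"` swaps the roles of the two strings, which corresponds to the "SOV predicted" variant described above.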
-------------------------------------------------------------------------------
Data format of prediction
The SSP (secondary structure prediction) data can be prepared
in COLUMN format:
First column: protein sequence (AA) in one-letter code
Second column: observed (OSEC) secondary structure
Third column: predicted (PSEC) secondary structure
Secondary structure conformational states can be either helix (H), strand (E) or coil (C).
Note: Alternatively, for helix assignment 'G' or 'I' can be used instead,
for strand assignment 'B' can be used instead, and
for coil assignment 'L', 'T' or 'S' can be used instead.
Spaces should be used as delimiters to separate columns.
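Reading the COLUMN format and folding the alternative letters into H, E and C could look like the sketch below. The names `STATE_MAP` and `parse_columns` are hypothetical, and rejecting unknown letters with an error is an assumption; this page does not say how the service treats them.

```python
# Alternative assignment letters mapped to the three conformational states.
STATE_MAP = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "C": "C", "L": "C", "T": "C", "S": "C",
}


def parse_columns(text: str):
    """Return (aa, osec, psec) strings from space-delimited AA/OSEC/PSEC rows."""
    aa, osec, psec = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[:3] == ["AA", "OSEC", "PSEC"]:
            continue  # skip blank or short lines and the header row
        for letter in fields[1:3]:
            if letter not in STATE_MAP:
                # Assumption: unknown state letters are treated as an error.
                raise ValueError(f"unknown secondary structure state: {letter!r}")
        aa.append(fields[0])
        osec.append(STATE_MAP[fields[1]])
        psec.append(STATE_MAP[fields[2]])
    return "".join(aa), "".join(osec), "".join(psec)


sample = "AA OSEC PSEC\nM C C\nQ L C\nT T H\nR G H\n"
print(parse_columns(sample))
```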
Example.1 of input data format:
*******************************
AA OSEC PSEC
M C C
Q C C
T C H
R H H
S H H
I H H
G C C
V C C
-------------------------------------------------------------------------------
Three other formats of the input data are also allowed:
Example.2 of input data format:
*******************************
AA OSEC PSEC NUM
M C C 1
Q C C 2
T C H 3
R H H 4
S H H 5
I H H 6
G C C 7
V C C 8
Example.3 of input data format:
*******************************
>OSEQ
CCCHHHCC
>PSEQ
CCHHHHCC
>AA
MQTRSIGV
Example.4 of input data format:
*******************************
SSP 1 M C C
SSP 2 Q C C
SSP 3 T C H
SSP 4 R H H
SSP 5 S H H
SSP 6 I H H
SSP 7 G C C
SSP 8 V C C
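The three alternative layouts can be reduced to the same three strings (AA, observed, predicted) with a small reader such as the one sketched here. The function name `parse_alternative` is hypothetical. The column order assumed for the SSP-prefixed rows (tag, number, AA, OSEC, PSEC) is read off Example.4, and allowing a sequence to continue over several lines after a '>' header is an assumption.

```python
def parse_alternative(text: str):
    """Return (aa, osec, psec) for the NUM-column, '>'-header or SSP-prefixed layout."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if rows and rows[0][0].startswith(">"):
        # Example.3 layout: a '>NAME' header line followed by the sequence.
        records, key = {}, None
        for fields in rows:
            if fields[0].startswith(">"):
                key = fields[0][1:]
                records[key] = ""
            else:
                records[key] += "".join(fields)
        return records["AA"], records["OSEQ"], records["PSEQ"]
    aa, osec, psec = "", "", ""
    for fields in rows:
        if fields[0] == "SSP":
            fields = fields[2:]  # drop the "SSP" tag and the residue number
        if len(fields) < 3 or fields[:3] == ["AA", "OSEC", "PSEC"]:
            continue  # skip short lines and the header row
        # A trailing NUM column (Example.2), if present, is ignored.
        aa, osec, psec = aa + fields[0], osec + fields[1], psec + fields[2]
    return aa, osec, psec


example3 = ">OSEQ\nCCCHHHCC\n>PSEQ\nCCHHHHCC\n>AA\nMQTRSIGV\n"
example4 = "SSP 1 M C C\nSSP 2 Q C C\nSSP 3 T C H\n"
print(parse_alternative(example3))
print(parse_alternative(example4))
```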
-------------------------------------------------------------------------------
Output:
*******
SECONDARY STRUCTURE PREDICTION
NUMBER OF RESIDUES PREDICTED: LENGTH = 8
AA OSEC PSEC NUM
M C C 1
Q C C 2
T C H 3
R H H 4
S H H 5
I H H 6
G C C 7
V C C 8
-----------------------
SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION. N_AA = 8
           ALL    HELIX   STRAND     COIL
Q3  :     87.5    100.0    100.0     80.0
SOV :    100.0    100.0    100.0    100.0
-----------------------