Technical Comments

Delta-Interacting Protein A and the Origin of Hepatitis Delta Antigen

Science  02 May 1997:
Vol. 276, Issue 5313, pp. 824-825
DOI: 10.1126/science.276.5313.824

Robert Brazas and Don Ganem (1) propose that the cellular protein, delta-interacting protein A (DIPA), interacts with hepatitis delta antigen (HDAg), affecting hepatitis delta virus (HDV) replication. Although their work provides useful information about the biology of HDV, the main conclusion, that DIPA is the cellular homolog of HDAg, is not supported by their data.

We have examined the statistical significance of the match between HDAg and DIPA protein sequences by Monte Carlo simulation. In their comparison between HDAg and DIPA protein sequences, Brazas and Ganem reported an identity of 24% and a similarity of 56%, using the GES scale, which considers hydrophobicity when determining the distance matrix for substitutions (2). We compared HDAg with 10,000 randomized DIPA sequences, using the GAP program with the same parameters as Brazas and Ganem (1) (a gap weight of 3.0 and gap length weight of 0.1). The probability distributions for identity match and for similarity values that are determined using the GES scale (1) show that the match between HDAg and DIPA is not significant (Fig. 1A): The probability for an identity match greater than or equal to 24% is 13.2% and the probability for a similarity match greater than or equal to 56% is 14.1%. This does not support the proposed common ancestral relationship between HDAg and DIPA.
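The randomization test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the statistic here is an ungapped, position-by-position percent identity rather than the GAP alignment used in the comment, and the two sequences are short made-up strings, not HDAg or DIPA. The names `percent_identity` and `shuffle_test` are hypothetical.

```python
import random


def percent_identity(a, b):
    """Percent of identical residues in an ungapped, position-by-position comparison."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def shuffle_test(query, target, statistic=percent_identity, n_trials=10000, seed=0):
    """Estimate P(statistic >= observed) when the target sequence is shuffled.

    Shuffling preserves the amino acid composition of the target, so the
    estimate reflects chance matches expected for that composition.
    """
    rng = random.Random(seed)
    observed = statistic(query, target)
    residues = list(target)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(residues)
        if statistic(query, "".join(residues)) >= observed:
            hits += 1
    return observed, hits / n_trials


if __name__ == "__main__":
    # Toy sequences for illustration only (assumed, not taken from the paper).
    query = "MSRSESKKNRGGREEILEQWV"
    target = "MKRSEGKRNGGGEELLEKWAR"
    observed, p = shuffle_test(query, target, n_trials=2000)
    print(f"observed identity = {observed:.1f}%, empirical P = {p:.3f}")
```

Passing a different `statistic` (for example, a function that runs a gapped alignment and returns its percent similarity) would bring the sketch closer to the procedure in the text.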

Figure 1

Probability distributions for (A) DIPA sequence randomized and (B) randomization based on average amino acid composition.

Furthermore, it is inappropriate to use the GES scale to determine homologous relationships between protein sequences, because convergent evolution could easily affect the hydrophobicity of a protein sequence, a relatively simple chemical property. In any case, the match between HDAg and DIPA is also not significant when the PAM-250 matrix is used (data not shown).

While various matrices may give different similarity measurements, the identity remains the same for a given alignment. However, the identity match is a result of a biased amino acid composition. A Monte Carlo simulation comparing HDAg with 10,000 random sequences of the same length as the DIPA protein sequence, drawn from the average amino acid composition of proteins overall, shows that the observed similarity is again not significant (P = 18.2%), but the identity match would have been significant (Fig. 1B). Thus the reported “match” is biased by the amino acid compositions of HDAg and DIPA. [We used the amino acid composition derived from the exon database developed from GenBank release 90, where redundant sequences are deleted by a similarity criterion of 20%. For detailed procedures, see (3).]

The three amino acid compositions are listed (Table 1). Both HDAg and DIPA have similarly biased amino acid compositions with overrepresented residues like Glu, Gly, and Arg and underrepresented His, Thr, and Tyr. This will lead to elevated identity matching between the simulated random sequences and HDAg.
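Why a shared compositional bias raises the identity expected by chance can be illustrated with a short sketch. It assumes residues are drawn independently from a fixed composition, so the chance that two aligned residues are identical is the sum over amino acids of the product of their frequencies in the two sequences. The function names and the toy biased sequence are hypothetical; the real compositions are those listed in Table 1.

```python
import random
from collections import Counter


def composition(seq):
    """Relative frequency of each residue in a sequence."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: n / total for aa, n in counts.items()}


def random_sequence(freqs, length, rng):
    """Draw a random sequence of the given length from a residue composition."""
    residues = sorted(freqs)
    weights = [freqs[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def expected_identity(freqs_a, freqs_b):
    """Chance that two independently drawn residues are identical."""
    return sum(p * freqs_b.get(aa, 0.0) for aa, p in freqs_a.items())


if __name__ == "__main__":
    uniform = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
    # Toy composition enriched in Glu, Gly, and Arg (assumed for illustration).
    biased = composition("EEEEGGGGRRRRAK")
    print(expected_identity(uniform, uniform))  # unbiased baseline
    print(expected_identity(biased, biased))    # higher under shared bias
    print(random_sequence(biased, 30, random.Random(1)))
```

With a uniform composition the chance identity per position is 1/20; when both sequences over-use the same few residues, it is larger, which is the effect the simulation in Fig. 1B is designed to expose.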

Table 1

Amino acid compositions of HDAg, DIPA, and the exon database.

We conducted a test of the effect of amino acid composition on the identical matching of the residues given by Brazas and Ganem. With the use of the amino acid compositions of HDAg and DIPA, we calculated the expected number of identical matches and then compared that expectation with the observed matches. The 47 identical matches reported (1) can be explained as chance matches with a probability of 76% that occur as a consequence of the biased amino acid composition of HDAg and DIPA (Table 2).

Table 2

Distribution of identical matches of amino acid compositions between HDAg and DIPA. $\chi^2 = 6.6$ with 10 degrees of freedom; $P = 0.76$. The expected numbers were calculated as products of expected frequency and total identical matches (47). The expected frequency is $f_{aa} = [N_{aa}(\mathrm{HDAg}) + N_{aa}(\mathrm{DIPA})] / \sum_{i=1}^{11} [N_{aa_i}(\mathrm{HDAg}) + N_{aa_i}(\mathrm{DIPA})]$, where $N_{aa}$ is the number of residues in HDAg and DIPA.
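The calculation in the Table 2 legend can be sketched as follows. The residue counts in the table are not reproduced here, so the functions take counts as arguments; their names are hypothetical. The upper-tail probability uses the standard closed form of the chi-square distribution for an even number of degrees of freedom, which is enough to check the quoted P value from the quoted statistic.

```python
import math


def expected_matches(counts_a, counts_b, total_matches):
    """Expected identical matches per residue class.

    The expected frequency of a class is its pooled count in both proteins
    divided by the pooled count over all classes considered.
    """
    classes = set(counts_a) | set(counts_b)
    pooled = {aa: counts_a.get(aa, 0) + counts_b.get(aa, 0) for aa in classes}
    grand_total = sum(pooled.values())
    return {aa: total_matches * n / grand_total for aa, n in pooled.items()}


def chi_square(observed, expected):
    """Pearson chi-square statistic over the classes in `expected`."""
    return sum((observed.get(aa, 0) - e) ** 2 / e for aa, e in expected.items())


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of chi-square for even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Statistic and degrees of freedom as quoted in the Table 2 legend.
    print(round(chi_square_sf_even_df(6.6, 10), 2))  # 0.76
```

The 11 residue classes in the legend give 10 degrees of freedom, and a statistic of 6.6 yields a tail probability of about 0.76, in agreement with the value reported.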

In an accompanying Perspective, Hugh D. Robertson proposes that spliceosomal introns could have arisen from the self-replicative viroid (4). This proposal provides a specific candidate (viroid) as an ancestor of modern introns, in line with the exon theory of genes (5) that spliceosomal introns are descendants of self-splicing introns that existed in an RNA world. However, the relationship between HDAg and DIPA cannot yet be taken as evidence for this particular scenario.


Response: Long et al. raise an important issue related to the determination of an evolutionary relationship between two proteins that share only limited amino acid identity. While we agree that the biased amino acid composition of L-HDAg and DIPA complicates the interpretation of their 24% identity (1), this compositional bias does not invalidate our inference that these two proteins are related.

The GAP program used by Long et al. is a global alignment algorithm that aligns two proteins; it uses a substitution matrix to provide a value for each aligned pair of amino acids, and in addition assesses a penalty for the introduction of a gap into the alignment [gap penalty score of −(3 + 0.1k) for a gap of length k]. The program then determines the alignment that produces the maximal score when all the alignment values and gap penalties are added together. By using only percent identity and percent similarity individually, however, Long et al. do not include other important alignment characteristics (gap number, gap length, and the combined identity and similarity) that are inherent in each alignment score. This and other factors lead to a significant underestimate of the relatedness of the two proteins.
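How such a score combines substitution values with gap penalties can be shown with a minimal sketch that scores an alignment already given as two gapped strings. It does not search for the optimal alignment and is not the GAP program. The identity-based substitution function is a stand-in for a real matrix, every gap (including end gaps) is charged, and the function names are hypothetical.

```python
def gap_penalty(k, open_cost=3.0, extend_cost=0.1):
    """Penalty for a single gap of length k: open_cost + extend_cost * k."""
    return open_cost + extend_cost * k


def identity_score(x, y):
    """Toy substitution value: 1 for identical residues, 0 otherwise."""
    return 1.0 if x == y else 0.0


def alignment_score(aligned_a, aligned_b, substitution=identity_score,
                    open_cost=3.0, extend_cost=0.1):
    """Sum of substitution values minus the penalty of each gap run."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    gap_len = 0       # length of the gap run currently being read
    gap_side = None   # which sequence holds the current gap run
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if gap_len and side != gap_side:
                # A gap in the other sequence starts a new gap run.
                score -= gap_penalty(gap_len, open_cost, extend_cost)
                gap_len = 0
            gap_side = side
            gap_len += 1
        else:
            if gap_len:
                score -= gap_penalty(gap_len, open_cost, extend_cost)
                gap_len = 0
            score += substitution(x, y)
    if gap_len:
        score -= gap_penalty(gap_len, open_cost, extend_cost)
    return score


if __name__ == "__main__":
    # Four identical pairs and one gap of length 2: 4 - (3 + 0.1 * 2).
    print(round(alignment_score("ACD--F", "ACDEEF"), 2))  # 0.8
```

Because gap number and gap length enter the total directly, two alignments with the same percent identity can receive quite different scores, which is the distinction drawn in the paragraph above.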

Like Long et al., we have compared L-HDAg to randomized (shuffled) DIPA sequences; such a pairwise comparison is justifiable because independent biological experiments pointed to a potential relationship (see below). However, we used the alignment scores rather than percent identity to determine the significance of the L-HDAg–DIPA alignment. We also used a more recent substitution matrix, the BLOSUM-62 matrix (2), which has been shown to outperform most other matrices (including PAM-250) in detecting significant protein similarities (3), together with the GAP alignment algorithm [gap penalty score of −(9 + k) for a gap of length k] (4) to align L-HDAg to 10,000 randomized DIPA sequences. Only 0.2% of the randomized DIPA–L-HDAg alignments had a score equal to or greater than the optimal L-HDAg–DIPA alignment (score = 64) (Fig. 1). (Although this statistical result would not be noteworthy in a database search, it is significant in light of the biological data favoring homology; see below.) This alignment contains 55 identical residue pairs (29.6% identity) compared to 47 for the original GAP alignment (5). Our analysis explicitly takes into account the issue of the contribution of the amino acid composition bias presented by Long et al., yet points much more strongly than theirs to a relationship between L-HDAg and DIPA.

Figure 1

GAP alignment scores using BLOSUM-62 substitution matrix with DIPA sequence randomized.

In addition to their shared compositional bias, which would be expected if indeed L-HDAg and DIPA are related, both proteins (i) are nearly identical in size, (ii) contain putative coiled coil domains, and most importantly, (iii) specifically oligomerize with each other in a manner that is dependent on the integrity of their coiled coil domains. [As we have shown (1), this interaction can have important functional consequences in vivo.] The computer-based alignment analyses do not take these facts into consideration, but biologists must do so in coming to a judgment about relatedness.

The assignment of an evolutionary relationship between two proteins of limited similarity is often difficult, and the compositional bias identified by Long et al. further complicates this task for DIPA and HDAg. However, our analysis shows that when the totality of the evidence linking the proteins is considered, the confounding effect of compositional bias is not sufficient to negate the relationship between them that we proposed (1).

