
The Genome’s Tale is Moving!

August 3, 2013

Hi to all of you who occasionally drop by,

I know I haven’t posted for more than half a year, and this is largely because I’ve been busy with what may appropriately be called “stuff.” I’m moving The Genome’s Tale to a new WordPress blog, Origins of Genomes. This is because I’d like to start from scratch so I can try a somewhat different layout and approach to blogging. Some of the articles from this site will be moved over to the new blog, but for the most part, the content there will be fresh. If I get enough traffic, I’ll consider getting the domain name. But not yet.

 

12/12/12

December 12, 2012

[Image: 12_12_12_last_date]

Convergent Evolution and Front-loading

May 22, 2012

There are many examples of convergent evolution among different life forms. Convergent evolution is simply the independent acquisition of some biological feature in different lineages. For example, both birds and bats have wings (this is an elementary example, of course), but this similarity is not due to common ancestry; it is the result of convergent evolution.

So how does convergent evolution relate to front-loading? One of the criticisms of the front-loading hypothesis is that you can’t design a genome in a unicellular organism to evolve specific organs, tissues, biochemical systems, etc., several billion years in the future. But convergent evolution neatly answers this criticism.

A classic example of convergent evolution is the eye in the octopus and the mammalian eye. They are extremely similar, structurally speaking (see the figure, below).

Figure. Source: Ogura et al., 2004.

 

The human eye and octopus eye both have the following tissues:

1. Eyelids.

2. Cornea.

3. Pupil.

4. Iris.

5. Ciliary muscle.

6. Lens.

7. Retina.

8. Optic nerve.

Furthermore, the arrangement of these parts is practically the same. Cool! And these two systems have arisen independently, through convergent evolution (i.e., they are not related through common descent; see Ogura et al., 2004). This means that these two organs have evolved as a result of the initial state of the last common ancestor of mammals and octopuses. In short, the convergent evolution of these two organs demonstrates that a genome can be programmed to evolve a given objective. If we ran the “clock of life” backwards and let it play again (to borrow from Stephen Jay Gould), human-like eyes would probably appear on the scene once again. In other words, the same system keeps popping up again and again. And this is evidence that a given objective can be front-loaded, starting with a specified initial state. The eye is a beautiful example of convergent evolution, wherein 8 separate “parts” independently came together in the same arrangement to produce the function of vision.

Are there examples of convergent evolution in biochemical systems? If so, this would provide evidence that not only can organs be front-loaded, but so too can biochemical systems. More on this later.

References

Atsushi Ogura, Kazuho Ikeo, Takashi Gojobori. Comparative Analysis of Gene Expression for Convergent Evolution of Camera Eye Between Octopus and Human. Genome Research, 14: 1555-1561 (2004).

Molecular Machines and Evolution, Part 1

May 22, 2012

I will be writing a number of essays addressing the argument that co-option is a plausible evolutionary mechanism for the origin of molecular machines. Briefly, there are many molecular machines which carry out functions that require the interaction of multiple protein components. How could these biological systems have evolved through purely non-teleological mechanisms? Co-option is often offered as a plausible evolutionary mechanism that can give rise to such molecular machines. But there are a number of problems with invoking non-teleological co-option as a general solution to the origin of molecular machines, and these problems can be summarized as follows:

1.       Complementary conformations.

2.       Pre-adaptation of components.

3.       A specific sequence of co-option events is required.

4.       Other considerations.

I will be discussing the first problem in this article; in following articles, the other problems will be considered.

The Co-option Mechanism

Molecular machines are composed of specific protein components that interact to produce biological function. Below is a figure describing a hypothetical molecular machine composed of protein components A, B, C, and D.

Figure 1. Components A, B, C, and D interact to produce a biological function.

If this biological function can only be carried out by the interaction of multiple protein components, then co-option must be invoked. Simply put, co-option involves a protein originally carrying out function X, which then undergoes a shift in function such that it now carries out function Y.  “Normal” Darwinian evolution, where for example, gene duplication gradually increases the efficiency of the system, cannot explain the origin of molecular machines that carry out functions that can only be carried out by the interaction of multiple protein components (note: when I say “normal Darwinian evolution,” I simply mean random mutation and natural selection gradually increasing the efficiency of the system; my use of the term “normal” does not imply that co-option is not a normal process, etc.). This is because the very nature of the function requires that multiple components interact, so you couldn’t start with a single component carrying out this function. It’d have to carry out another function, then interact with other proteins over evolutionary time, and undergo a shift in function such that the new function arises. And this is the essence of co-option, illustrated in Figure 2.

Figure 2. Components originally carrying out unrelated functions associate with each other step-by-incremental-step, eventually producing a novel function (function 6).

Now let’s examine if co-option offers a general solution to the origin of molecular machines.

The Problem of Protein Shapes

Molecular machines are constructed from individual protein parts. For example, the bacterial flagellum of Salmonella consists of 42 protein parts, such as MotA and MotB (the motor proteins), FlgE (universal joint), FliD (cap protein), etc. These protein parts interact in a tightly-integrated, specific manner. And to interact in precise ways, these protein parts have specific, complementary shapes consisting of knobs, crevices, protrusions, etc. Thus, in order for these molecular machines to have been co-opted from precursor parts, the precursor proteins would need to independently evolve complementary shapes prior to the co-option of the molecular machine, despite carrying out different functions. Consider Figure 3.

Figure 3.

In Figure 3 we see a “protein complex” composed of 5 components: A, B, C, D, and E. Importantly, the shapes of these proteins are complementary to each other. Component A is complementary to B, C, and D. And D is complementary to A, C, and E. Yet the co-option scenario would have us believe that parts A through E (which are shown apart from the complex in the above figure) were originally functioning in different contexts, and were independently shaped by natural selection such that their shapes just happened to be complementary. But there is no reason why, without teleology, these 5 proteins should be shaped just right prior to their co-option into the novel molecular machine. Is it really plausible to expect several proteins to independently match a given pattern – the pattern of complementarity? This pattern would have to be shaped by chance alone, since there is nothing in natural selection that would drive towards matching a pattern that will only be beneficial in the future. It is important to remember that non-teleological evolution has no foresight, unlike an engineer. And since evolutionary mechanisms cannot peer into the future, it is entirely unreasonable to expect non-teleological processes to shape these proteins in just the right way such that when they do associate, novel function appears.

How specific do protein shapes need to be?

The parts depicted in Figure 3 have very specific shapes and fit very well. But one could argue that during the evolution of a molecular machine, the parts were not quite so complementary but still managed to elicit the function. Then, over time, the parts became more tightly integrated, performing the function more efficiently. This situation is seen in Figure 4.

Figure 4.

In the above figure, components A through E are somewhat complementary to each other, but not very much. These parts associate to form the molecular complex, and then over time (represented by the large red arrow), the parts become more tightly integrated, resulting in a molecular machine that is composed of tightly integrated components. So, in this scenario, the shapes of the precursor parts do not need to be that complementary to each other – they only need to be suited to each other well enough so that there is function, even if it is only minimal. Let us now consider this possibility.

The first point I will make here is that random protein-protein binding rarely produces new biochemical functions. Of all the possible ways for 2 or more proteins to bind together, the vast majority will not offer novel biological functionality. And since we are talking about 3D space here, there are trillions of different possible protein-protein interactions between 2 proteins.

However, there is another point I want to make, and that is that as the complexity of the system increases, and more parts are co-opted into the system, the greater the constraints on evolutionary mechanisms, and the less plausible it is for that biological machine to increase in complexity. And we can tie this into the discussion of complementary shapes. Two proteins could feasibly, through chance, have roughly complementary shapes, and loosely bind, producing a novel but inefficient function. This loose conglomeration could grow in complexity by the co-option of more proteins. But as the system becomes more complex and more tightly integrated, simple binding to the system by a protein will not produce novel function. The protein must bind specifically to particular components of the complex, and of course, the more components there are, the greater the number of possible protein-protein binding interactions – the vast majority of which will be non-functional. Indeed, you might “gum up the works” if your new protein does not bind specifically enough and if its shape is not fully complementary to the proteins it will interact with.  John Bracht highlighted this way back in 2002 in a response to Ursula Goodenough on the evolution of the bacterial flagellum:

“Evolutionary explanations must describe how a new protein integrates into an old system in such a way as to allow continued functionality overall (often, both the incoming protein and the pre-existing system must be extensively modified to fit together in a coordinated way), and enhance functionality of the entire system in such a way as to provide selective advantage.”
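The point about the number of possible interactions growing with the number of components can be made concrete with a small counting sketch. It is only an illustration under a simplifying assumption: it counts which existing components a newcomer could contact (every non-empty subset), not the geometry of any binding mode, and it says nothing about which of those contacts would be functional. The function name `partner_sets` is hypothetical.

```python
from itertools import combinations

def partner_sets(components):
    """Return every non-empty set of existing components a new protein could contact."""
    return [
        combo
        for size in range(1, len(components) + 1)
        for combo in combinations(components, size)
    ]

# Count the possible partner sets for complexes of 2 to 5 components.
for n in range(2, 6):
    names = "ABCDE"[:n]
    print(n, "components:", len(partner_sets(names)), "possible partner sets")
```

For n existing components the count is 2^n - 1, so it roughly doubles with each part added to the complex.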

Furthermore, if a new protein component will interact with multiple components of the system, there are even more severe constraints on what protein shapes are allowed. The blind watchmaker would have to independently shape this new protein precisely so that when it is incorporated into the molecular machine, its shape fits well. We can construct a hypothesis that goes as follows:

The greater the number of components a protein interacts with in a biological machine, the more specified its shape must be, and consequently, the more specified its sequence must be (since the sequence is what codes for the protein shape).

Now, let’s test this hypothesis.

ATP synthase and Protein Conformation Specificity

To test this hypothesis, we will begin with the following premise: protein conformation specificity is determined by amino acid sequence specificity. In other words, since it is ultimately the amino acid sequence of the protein that determines its shape (there are other factors, but I won’t get into that right now), we’d predict that a protein that interacts with multiple components will have a greater degree of sequence conservation across taxa than a protein that only interacts with one protein.

Here’s where ATP synthase comes in. Bacterial F1F0 ATP synthases are composed of 8 components: the alpha subunit, the beta subunit, the a subunit, the b subunit, the c subunit, the gamma subunit, the delta subunit, and the epsilon subunit (see Figure 5).

Figure 5. Diagram of the ATP synthase system.

I retrieved the sequences of each of these components from UniProt. The sequences were all from three different bacterial genera: Escherichia, Shigella, and Bacillus. So, there were 3 alpha sequences, 3 beta sequences, 3 subunit a sequences, etc. The sequences of each component were then aligned using ClustalO, and the percent identity was recorded. Below is a table of the subunits, the percent sequence identity shared among the 3 sequences from each subunit, and the number of ATP synthase (ATPase) components each of these proteins interacts with.
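As a rough sketch of what a percent identity figure for three aligned sequences can mean, the snippet below counts the alignment columns in which every sequence carries the same residue. This is an assumption about the measure, not a description of how ClustalO reports it, and the sequences are made-up toy strings rather than real ATP synthase data. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned):
    """Percent of alignment columns in which all sequences share one residue.

    Assumes the sequences are already aligned (equal length, '-' for gaps).
    Columns containing a gap are never counted as identical.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(
        1
        for column in zip(*aligned)
        if column[0] != "-" and all(residue == column[0] for residue in column)
    )
    return 100.0 * identical / length

# Toy alignment (invented for illustration only).
toy = [
    "MKT-LLVAGG",
    "MKTALLVSGG",
    "MRT-LLVAGG",
]
print(percent_identity(toy))  # 7 of 10 columns are fully identical -> 70.0
```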

| Name of protein | Percent sequence identity | Number of components protein interacts with |
| --- | --- | --- |
| ATPase subunit alpha | 52.621% | 4 (beta, gamma, delta, epsilon) |
| ATPase subunit beta | 65.962% | 4 (alpha, gamma, delta, epsilon) |
| ATPase subunit a | 24.468% | 2 (c, b) |
| ATPase subunit b | 24.571% | 2 (a, delta) |
| ATPase subunit c | 39.241% | 3 (a, gamma, epsilon) |
| ATPase subunit delta | 22.162% | 3 (alpha, beta, b) |
| ATPase subunit epsilon | 34.532% | 4 (alpha, beta, gamma, c) |
| ATPase subunit gamma | 36.426% | 4 (alpha, beta, epsilon, c) |

 

The first feature that I’d like you to notice is that ATPase subunits a and b both interact with only 2 components, and they share almost exactly the same amount of sequence conservation (24.468% and 24.571%, respectively, a difference of about 0.1 percentage points). However, we do see some exceptions to the hypothesis I described above. Subunit delta interacts with 3 components but has the lowest degree of sequence conservation. And subunits gamma and epsilon both interact with 4 components but have a lower degree of sequence conservation than ATPase subunit c. Nevertheless, if we average the degrees of sequence conservation among the ATPase subunits that interact with different numbers of components, we do indeed find that, on average, the greater the number of components an ATPase subunit interacts with, the greater the degree of sequence conservation, and hence, the more conserved the 3D structure of the protein (see graph, below).

Graph. This graph lists the mean degree of sequence conservation among ATPase subunits that interact with 2, 3, and 4 components respectively.

From the above graph we can see that, generally speaking, our hypothesis is correct. The greater the number of components a protein interacts with, the more specific its sequence and shape must be. And this adds another constraint on what kinds of proteins are and are not tolerated for being co-opted into a multi-part molecular machine.
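The averaging behind the graph can be reproduced from the table with a few lines. This is a minimal sketch, using the percent identities and interaction counts exactly as listed in the table; the variable and function names are my own.

```python
from collections import defaultdict
from statistics import mean

# (percent sequence identity, number of ATPase components interacted with),
# copied from the table in this post.
subunits = {
    "alpha":   (52.621, 4),
    "beta":    (65.962, 4),
    "a":       (24.468, 2),
    "b":       (24.571, 2),
    "c":       (39.241, 3),
    "delta":   (22.162, 3),
    "epsilon": (34.532, 4),
    "gamma":   (36.426, 4),
}

def mean_identity_by_partners(table):
    """Group subunits by interaction count and average their percent identity."""
    groups = defaultdict(list)
    for identity, partners in table.values():
        groups[partners].append(identity)
    return {partners: mean(values) for partners, values in sorted(groups.items())}

means = mean_identity_by_partners(subunits)
for partners, value in means.items():
    print(f"{partners} components: {value:.1f}% mean identity")
# 2 components: 24.5% mean identity
# 3 components: 30.7% mean identity
# 4 components: 47.4% mean identity
```

Only the group means are compared here; the sketch does not attempt any test of statistical significance, and each group contains just two to four subunits.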

There is one more detail I would like to add here, regarding the matter of complementary shapes: not only must the proteins be fairly complementary to one another, but these complementary-shaped proteins must also be localized to the same subcellular location. If they are not, then the molecular complex cannot be co-opted from these precursor proteins. This, again, adds another constraint on the co-option scenario, and diminishes its plausibility as a general solution to the origin of molecular machines.

Summary

I will summarize the conclusions of this article in brief:

  • In order to function properly, molecular machines require the interaction of protein components that interlock and bind together. The shapes of the proteins are what allow proteins to fit snugly with each other, producing biological function.
  • Although loosely complementary shapes will produce function, there is a threshold at which there will be either novel functionality or no novel functionality; and the vast majority of physically possible protein shapes will be below this threshold. Further, the precursor proteins which independently evolve complementary shapes must just happen to be localized to the same subcellular location.
  • If a protein that will be co-opted into a multi-part complex will interact with multiple components of the molecular complex, then its shape must be very specific. And there are many more ways to clog up, gum up, and destroy the function of a molecular machine by tossing a protein into the mix than there are ways to enhance the function of the machine by the addition of a new protein.

To be continued…

 

Note: Some of these images are not very high quality; however, if you click on them, they will have far better quality.

On the Hippo signaling pathway

April 24, 2012

Here’s the abstract of a fairly new paper published in Cell Reports (“Premetazoan Origin of the Hippo Signaling Pathway”):

“Nonaggregative multicellularity requires strict control of cell number. The Hippo signaling pathway coordinates cell proliferation and apoptosis and is a central regulator of organ size in animals. Recent studies have shown the presence of key members of the Hippo pathway in nonbilaterian animals, but failed to identify this pathway outside Metazoa. Through comparative analyses of recently sequenced holozoan genomes, we show that Hippo pathway components, such as the kinases Hippo and Warts, the coactivator Yorkie, and the transcription factor Scalloped, were already present in the unicellular ancestors of animals. Remarkably, functional analysis of Hippo components of the amoeboid holozoan Capsaspora owczarzaki, performed in Drosophila melanogaster, demonstrate that the growth-regulatory activity of the Hippo pathway is conserved in this unicellular lineage. Our findings show that the Hippo pathway evolved well before the origin of Metazoa and highlight the importance of Hippo signaling as a key developmental mechanism predating the origin of Metazoa.”

This is interesting, especially from a front-loading perspective. The Hippo signaling pathway is an important developmental mechanism in Metazoa (animals), but all the core components of the Hippo signaling pathway have recently been found in unicellular Holozoa, which include the choanoflagellates.

A figure from the paper “Premetazoan Origin of the Hippo Signaling Pathway.”

Thus, the following Hippo pathway components have been found in unicellular organisms:

1) Hippo (kinase)

2) Warts (kinase)

3) Yorkie (coactivator)

4) Scalloped (transcription factor)

Specifically, these components have been found in Capsaspora owczarzaki. If you take a look at the above figure, you will see that this unicellular lineage is deeper-branching than the choanoflagellates. So, what are the Hippo components doing in unicellular organisms that don’t need them? This really isn’t expected from non-teleological evolution, but we’d expect this from front-loading. What’s very neat, too, is that the researchers discovered that these Hippo pathway components in Capsaspora owczarzaki can actually function in Drosophila. This is quite surprising from a non-telic viewpoint, because there’s no reason why these proteins in unicellular organisms should have the right sequence specificity to function in a very different multi-cellular organism like Drosophila. But it makes sense under the front-loading hypothesis, because we’d predict these proteins (more specifically, their ancestors) to be given a function that would conserve their sequence identity very well, such that when animals did appear on the scene, these components could be easily co-opted into a Metazoan role.

Deep Homology and Front-loading

March 29, 2012


I argue that the FLH predicts that proteins of major importance in eukaryotes and advanced multi-cellular life forms (e.g., animals, plants) will share deep homology with proteins in prokaryotes. I have discussed this prediction with various critics of the FLH, and the most common objection seems to be that non-teleological evolution also makes this prediction. I disagree, so let me explain.

Life seems to require a minimum of about 250 genes (Koonin, Eugene V. How Many Genes Can Make a Cell: The Minimal-Gene-Set Concept, 2002. Annual Reviews Collection, NCBI) – a proto-cell would not require that many genes. Thus, it would be perfectly acceptable, under the non-teleological model, that the last common ancestor of all life forms had approximately 250 genes, give or take a few. From this small genome, gene duplication events would have occurred, followed by mutations in the new genes, leading to the origin of novel proteins. Over time, then, and through gene and genome duplication/random mutation, this small genome would evolve into larger genomes. This model is perfectly compatible with the non-teleological hypothesis, and the non-teleological hypothesis does not predict otherwise. However, this model – where a minimal genome gradually evolves into the biological complexity we see today, through gene duplication, genome duplication, natural selection, and random mutation – is not compatible with the front-loading hypothesis. This is because front-loading requires that the first genomes have genes that would be used by later, more complex life forms. Of the 250 or so genes required by life, none of them could encode proteins that would be used later in multicellular life forms (excluding the proteins that are necessary to all life forms). A front-loading designer couldn’t possibly hope to “stack the deck” in favor of the appearance of plants and animals, for example, by starting out with a minimal genome.

Look at it this way. With a minimal genome of 250 genes that are involved in metabolism, transcription, translation, replication, etc., evolution could tinker with that genome in any way imaginable, so that you couldn’t really front-load anything at all with a minimal genome. You couldn’t anticipate the rise of animals and plants. Such a genome would not shape subsequent evolution. If the last common ancestor of all life forms had a minimal genome, and if you ran the tape of life back, and then played it again, a totally different course of evolution would result. But if you loaded LUCA with genes that could be used by animals and plants, you could predict that something analogous to animals and plants would arise. If you loaded this genome with hemoglobin, rhodopsin, tubulin, actin, epidermal growth factors, etc. – or analogs of these proteins – something analogous to animal life forms would probably result over deep-time.

Given that you couldn’t really front-load anything with a minimal genome consisting of about 250 genes, under the front-loading hypothesis, it is necessary that the LUCA contain unnecessary (but beneficial) genes that would later be exploited by more complex life forms. Non-teleological evolution does not require this. It has no goal, unlike front-loading. It tinkers with what is there – and if a minimal genome was all that was there, it would tinker around, eventually producing “endless forms most beautiful” as Darwin so famously put it. On the other hand, front-loading is goal-oriented: a minimal genome does not allow one to plan the origin of specific biological objectives.

  Thus, under the front-loading hypothesis, we would predict that important proteins in eukaryotes, animals, and plants will share deep homology with unnecessary but functional proteins in prokaryotes.

Non-teleological evolution does not predict this. Non-teleological evolution could explain that observation, but it does not predict this. And this is the important point to understand. There is nothing in non-teleological evolution that requires multi-cellular proteins to share deep homology with unnecessary prokaryotic proteins – but front-loading demands this. There is nothing in non-teleological evolution that requires that the LUCA have a genome larger than the minimum genome size – but for front-loading to occur, this must be the case. I conclude, then, that this prediction is made by the front-loading hypothesis, but it is not made by non-teleological evolution, and so front-loading is certainly testable.

New Article at Uncommon Descent

January 21, 2012

For the record:

Uncommon Descent (UD) member kairosfocus graciously invited me to write a guest-post for UD on the front-loading hypothesis. Here’s my article. Thanks, kairosfocus!

Molecular Clues of Teleology

January 12, 2012


Previously, I mentioned a testable prediction the front-loading hypothesis makes. I said that:

“Importantly, the front-loading hypothesis predicts that these homologs of protein components in important molecular machines will be more conserved in sequence identity than the average prokaryotic protein.” (Consider reading my previous post so that this post might be more intelligible.)

Well, I decided to tentatively test this prediction made by the front-loading hypothesis, using a small dataset consisting of only two important eukaryotic proteins: tubulin and actin. The former protein consists of more than one chain, so I only focused on the beta chain.

Here’s the deal:  both of these proteins are well-conserved among eukaryotes. They play a crucial role in eukaryotes, as well. Tubulin is a major component of cilia, and actin can be found in sarcomeres, an important component of muscle cells. Actin also plays a major role in the cytoskeleton.

Tubulin shares homology with a prokaryotic protein, FtsZ, while actin shares homology with the bacterial protein MreB.

To test the prediction I outlined earlier, we need to align tubulin, actin, FtsZ, and MreB amino acid sequences to determine the degree of sequence conservation for each protein.  So, I grabbed a couple of tubulin beta sequences from UniProt, one belonging to Volvox  (accession number: P11482) and the other to a frog (Q91575). When aligned (using ClustalW), there is 88.514% sequence identity shared between the two sequences. The same procedure was done for actin: a Volvox actin sequence (P20904) was aligned with a frog actin sequence (P04751). The result was 89.655% sequence identity shared between the two sequences. Clearly, both tubulin and actin are very highly conserved in sequence identity, with actin being just slightly more conserved.

What about the prokaryotic homologs of these two proteins? Are they also well-conserved, as is expected under the front-loading hypothesis, or are they no more conserved than the average prokaryotic protein?

When FtsZ sequences belonging to Escherichia coli (P0A9A6) and Caulobacter (P0CAU9) are aligned, there is 83.3% sequence similarity. The most common degree of sequence similarity between E. coli proteins and Caulobacter proteins falls in the 51-60% range. Thus, the fact that tubulin’s prokaryotic homolog is considerably more conserved in sequence identity than the average prokaryotic protein is exactly what we would expect under the front-loading hypothesis.

I did the same thing with actin’s prokaryotic homolog, MreB. If we align MreB sequences from E. coli (P0A9X4) and Caulobacter (Q9A821), there is 86.5% sequence similarity. Interestingly enough, the telic prediction plays out again.

But this isn’t the whole story. Recall that I also noted in my previous post that: “…the front-loading hypothesis predicts that, in general, the more highly conserved in sequence identity a protein component is in a [major eukaryotic molecular system], then the higher the degree of sequence conservation will be in its prokaryotic homolog…”

And this is exactly what we find for FtsZ and MreB. In eukaryotes, actin is slightly more conserved than tubulin, suggesting that actin is just a bit more important than tubulin. Consistent with the front-loading prediction, we find that actin’s prokaryotic homolog, MreB, is more conserved than tubulin’s prokaryotic homolog, FtsZ. This correlation is expressed in the figure below.

Figure. This illustrates the observation that actin is more conserved than tubulin, and interestingly, actin’s prokaryotic homolog, MreB, is more conserved than tubulin’s prokaryotic homolog, FtsZ, which is what the front-loading hypothesis predicts. Cool!
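The two comparisons made in this post can be written down as a small check. This is only a sketch that restates the numbers given above: the eukaryotic values are the percent identities reported for the Volvox/frog alignments, the prokaryotic values are the percent similarities reported for the E. coli/Caulobacter alignments, and 51-60% is the typical range quoted above. The values are compared as reported, and the names used are my own.

```python
# Eukaryotic conservation (Volvox vs. frog, percent identity as reported above).
eukaryotic = {"tubulin": 88.514, "actin": 89.655}

# Prokaryotic homologs (E. coli vs. Caulobacter, percent similarity as reported above).
prokaryotic_homolog = {"tubulin": ("FtsZ", 83.3), "actin": ("MreB", 86.5)}

# Most common similarity range between E. coli and Caulobacter proteins, per the post.
TYPICAL_RANGE = (51.0, 60.0)

def above_typical(value, typical=TYPICAL_RANGE):
    """True if a homolog is more conserved than the upper end of the typical range."""
    return value > typical[1]

def same_ranking(euk, prok):
    """True if proteins rank in the same order by eukaryotic and prokaryotic conservation."""
    euk_order = sorted(euk, key=euk.get)
    prok_order = sorted(prok, key=lambda name: prok[name][1])
    return euk_order == prok_order

for protein, (homolog, value) in prokaryotic_homolog.items():
    print(homolog, "above typical range:", above_typical(value))
print("same ranking in eukaryotes and prokaryotes:", same_ranking(eukaryotic, prokaryotic_homolog))
```

With only two proteins, the ranking comparison has just two possible outcomes, which is one reason a larger dataset is needed.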

Thus, we can see that the two predictions I made earlier, from a front-loading perspective, are tentatively confirmed, tantalizing us to explore this further with larger data sets. Since only a small dataset was used here, the predictions can only be considered very tentatively confirmed. But it’s a clue that teleology was involved in life’s history.

A front-loading prediction and molecular machines

January 9, 2012

Unfortunately, many proponents of ID are content with simply criticizing Darwinian evolution instead of focusing their energy on developing a robust, testable teleological hypothesis in biology. Thankfully, the front-loading hypothesis, an inherently telic hypothesis, is testable and it does make predictions about the biological world. So, let’s delve into one of the predictions it makes regarding molecular machines like cilia, etc.

First, a bit of context. Under the front-loading hypothesis, the earth was seeded with unicellular life in the distant past. These life forms had the genomic information necessary for shaping subsequent evolution. For example, such unicellular life forms would have had the genes necessary for the origin of metazoa. But what about the origin of molecular machines which are only found in non-prokaryotic life forms, such as cilia? Could such molecular systems have been front-loaded?

 

A coolish diagram of the structure of cilia. Image taken from here.

If a molecular machine has played a fairly major role in shaping evolution, then, from a front-loading perspective, it’d be a pretty safe guess to suppose that that machine was either designed into the initial cells, or it was front-loaded to exist in the future. How could something like the cilium be front-loaded? First of all, the major protein components – or homologs of these proteins – of the cilium would have to be designed into the initial cells. This would ensure that the blind watchmaker – non-teleological evolution – wouldn’t have to tinker around with genes and form a cilium from scratch, since the essential components or their homologs would be in the first cells, although they wouldn’t be functioning as a cilium: they’d probably be performing functions independent of each other.

Let’s use a simpler example. Suppose that a certain molecular machine, called X, is specific to eukaryotes. System X is composed of three basic components: A, B, and C. If a designer(s) were to front-load system X, starting with prokaryotes, homologs of components A, B, and C could be designed into the prokaryotic genome. Thus, the first life forms on earth would have had the proteins A1, B1, and C1 (the 1 at the end of each letter means that these are homologs of the components A, B, and C). However, from a rational design perspective, there is a problem. If system X is to arrive on the scene roughly a billion years after the first life forms appeared on earth, proteins A1, B1, and C1 are likely to diverge in sequence identity to such an extent that their initial sequence identity is effectively erased. This, of course, means that after this divergence, A1, B1, and C1 are extremely unlikely to be co-opted into system X, as planned, because their diverged sequence identity has totally modified their function and 3D shape, etc.

How can this problem be overcome? There’s a pretty simple solution, actually. If proteins A1, B1, and C1 are given functions such that their sequence identity is well conserved, and thus their 3D shapes, this problem is overcome.
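To illustrate the idea with a toy model (not a realistic model of protein evolution), the sketch below applies random substitutions to a made-up ancestral sequence and simply rejects any change that lands on a “constrained” site, standing in for a function that keeps the sequence conserved. Every parameter here (sequence length, number of substitutions, constrained fractions) is a hypothetical choice for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def diverge(seq, substitutions, constrained_fraction, rng):
    """Apply random substitutions, rejecting those that hit constrained sites."""
    n_constrained = int(len(seq) * constrained_fraction)
    constrained = set(rng.sample(range(len(seq)), n_constrained))
    current = list(seq)
    for _ in range(substitutions):
        pos = rng.randrange(len(current))
        if pos in constrained:
            continue  # toy assumption: selection removes changes at these sites
        current[pos] = rng.choice(AMINO_ACIDS)
    return "".join(current)

def identity(a, b):
    """Percent of positions at which two equal-length sequences match."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(300))

weakly_constrained = diverge(ancestor, 2000, 0.1, rng)
strongly_constrained = diverge(ancestor, 2000, 0.8, rng)

print("weakly constrained descendant:  ", identity(ancestor, weakly_constrained))
print("strongly constrained descendant:", identity(ancestor, strongly_constrained))
```

In this toy setting the strongly constrained descendant necessarily keeps at least 80% identity to the ancestor, because 80% of its sites never change, while the weakly constrained one drifts much further.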

From this simple solution that a rational designer would inevitably use, we can formulate a couple of predictions from the front-loading hypothesis:

a. Components of molecular machines/systems that are of major importance in eukaryotes will share deep homology with proteins in prokaryotes.

b. Importantly, the front-loading hypothesis predicts that these homologs of protein components in important molecular machines will be more conserved in sequence identity than the average prokaryotic protein.

c. Also, we can tentatively predict that the more essential a component is to a major molecular machine, the greater the degree of sequence conservation in its prokaryotic homolog. We can gauge how important a component is to a molecular machine by the level of sequence conservation the protein shows across different taxa. Thus, the front-loading hypothesis predicts that, in general, the more highly conserved a protein component of a molecular machine is, the more highly conserved its prokaryotic homolog will be.

Note that non-teleological evolution does not make the last two predictions, only the first one. Thus, we can test the front-loading hypothesis on the exclusive last two predictions it makes. If these predictions are confirmed, this will be positive evidence in favor of the front-loading hypothesis.
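Predictions b and c are both stated in terms of sequence identity, so it may help to pin down one way that quantity could be measured. The sketch below is only an illustration: it assumes the two sequences have already been aligned by some other tool, and it uses one common convention (matches divided by ungapped columns), which is my assumption rather than a definition taken from any particular paper. The function name `percent_identity` and the toy fragments are made up for this example and are not real protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap.

    Both inputs are assumed to be rows of an existing pairwise alignment,
    so they must have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (hypothetical, for illustration only).
toy_a = "MKT-LLVA"
toy_b = "MKSALLIA"

# 5 of the 7 ungapped columns are identical.
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

To test prediction b along these lines, one would compute this value for each prokaryotic homolog of a machine component and compare it against the same measure for a background set of prokaryotic proteins.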

 

 

The Bacterial Flagellum and Homology

December 25, 2011

In this brief analysis, I’m going to discuss the bacterial flagellum and the homology that a number of its components share with non-flagellar proteins. The table below lists the flagellar proteins found in the genus Salmonella, their lengths in amino acid residues, and their homologs (if any). Protein lengths were taken from UniProt, and the data on homologs were taken from Pallen and Matzke’s 2006 paper in Nature Reviews Microbiology, “From the Origin of Species to the origin of bacterial flagella.”

Flagellar protein | Length (amino acid residues) | Homology with non-flagellar proteins?
FlgA | 219 | Yes (CpaB)
FlgBCFG | 138; 134; 251; 260 | No; but homology with FlgBCEFGK
FlgD | 232 | No
FlgE | 403 | No; but homology with FlgBCFGK
FlgH | 221 | No
FlgI | 367 | No
FlgJ | 316 | No
FlgK | 553 | No; but homology with FlgBCEFG
FlgL | 317 | No; but homology with FliC
FlgM | 97 | No
FlgN | 140 | No
FlhA | 692 | Yes (YscV)
FlhB | 383 | Yes (YscU)
FlhDC | 113; 192 | Yes (other activators)
FlhE | 130 | No
FliA | 239 | Yes (RpoD, RpoH, RpoS)
FliB | 401 | No
FliC | 200 | Yes; homology with FlgL and EspA
FliD | 467 | No
FliE | 104 | No
FliF | 579 | Yes (YscJ)
FliG | 331 | Yes (MgtE)
FliH | 235 | Yes (YscL; AtpFH)
FliI | 456 | Yes (YscN; AtpD; Rho)
FliJ | 147 | Yes (YscO)
FliK | 405 | Yes (YscP)
FliL | 155 | No
FliM | 334 | Yes (FliN; YscQ)
FliN | 137 | Yes (FliM; YscQ)
FliO | 125 | No
FliP | 245 | Yes (YscR)
FliQ | 89 | Yes (YscS)
FliR | 264 | Yes (YscT)
FliS | 135 | No
FliT | 122 | No
FliZ | 183 | No
MotA | 295 | Yes (ExbB; TolQ)
MotB | 309 | Yes (ExbD; TolR; OmpA)

When these figures are added up, we get a total of 12,322 amino acid residues. Thus, it appears that the Salmonella flagellum is composed of roughly 12,322 amino acid residues. What percent of the Salmonella flagellum, in terms of amino acid residues, has absolutely no known homologs? A total of 3,195 amino acid residues belong to flagellar proteins that have no known homologs. This means that approximately 25.9% of the Salmonella flagellum lacks sequence homology.

Now, you will notice that a number of flagellar proteins only have homologs in the type III secretion system. However, the type III secretion system (TTSS) is not a precursor system to the bacterial flagellum. It probably evolved directly from the flagellar export system (do note that Gophna et al. 2003 present a dissenting view, but in my humble opinion, the evidence is certainly in favor of the hypothesis that the type III secretion system evolved from flagella). So we can ask the question: what percent of the flagellum lacks homologs or only has homologs in the TTSS, which is not a precursor system to the flagellum? A total of 2,804 amino acid residues belong to proteins that only share sequence homology with TTSS components. This is added to 3,195, to get 5,999. Thus, approximately 48.7% of the Salmonella flagellum has no known homologs in systems that would pre-date the flagellum.

Finally, we ask the question: what percent of the flagellum has no known homologs in non-flagellar systems? Note that a number of flagellar proteins only share homology with other flagellar proteins and TTSS components. For example, FliM is homologous to FliN and YscQ, and FliN is only homologous to FliM and YscQ. Since YscQ could not be a precursor protein, at least one of these two proteins does not share homology with any precursor protein. If FliM is supposed to be the precursor to FliN, then the homology FliN shares with FliM cannot be evidence that FliM itself arose through non-teleological evolution.

To arrive at the percent of the flagellum that has no homologs providing evidence of a non-telic origin of the flagellum, in cases like FliM/FliN we will count only the shorter protein. This will allow us to be as fair as possible to the non-telic position. We arrive at a total of 471 amino acid residues. Adding this to 5,999 gives 6,470, so about 52.5% of the Salmonella flagellum has no homologs that provide evidence of a non-teleological origin.
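For anyone who wants to retrace the arithmetic, here is a rough sketch of the three running totals. The lengths are copied from the table, and the grouping follows the text. Two caveats: the grand total of 12,322 is taken from the text as given (the lengths as listed in the table sum to 11,115, so the percentages hold only for the stated total), and the 471 figure is reproduced here by taking the shortest member of each flagellar-only family (FlgC, FliC, FliN), which is an inference from the numbers rather than something spelled out above. The variable and function names are just for illustration.

```python
# Grand total as stated in the text (taken as given, not recomputed).
TOTAL_RESIDUES = 12_322

# Proteins marked "No" in the table: no known homologs at all.
NO_KNOWN_HOMOLOGS = {
    "FlgD": 232, "FlgH": 221, "FlgI": 367, "FlgJ": 316, "FlgM": 97,
    "FlgN": 140, "FlhE": 130, "FliB": 401, "FliD": 467, "FliE": 104,
    "FliL": 155, "FliO": 125, "FliS": 135, "FliT": 122, "FliZ": 183,
}

# Proteins whose only listed homologs are TTSS (Ysc) components.
TTSS_ONLY = {
    "FlhA": 692, "FlhB": 383, "FliF": 579, "FliJ": 147,
    "FliK": 405, "FliP": 245, "FliQ": 89, "FliR": 264,
}

# Assumption: the shortest member of each family whose homologs are
# only other flagellar proteins (plus TTSS components).
FLAGELLAR_ONLY_SHORTEST = {"FlgC": 134, "FliC": 200, "FliN": 137}


def cumulative_percentages(groups, total):
    """Return (running residue count, percent of total) after each group."""
    running = 0
    steps = []
    for group in groups:
        running += sum(group.values())
        steps.append((running, round(100 * running / total, 1)))
    return steps


steps = cumulative_percentages(
    [NO_KNOWN_HOMOLOGS, TTSS_ONLY, FLAGELLAR_ONLY_SHORTEST],
    TOTAL_RESIDUES,
)
for residues, pct in steps:
    print(f"{residues} residues = {pct}% of {TOTAL_RESIDUES}")
```

The three printed lines correspond to the 25.9%, 48.7%, and 52.5% figures in the text.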

Conclusion

Several flagellar proteins only share structural similarity with other proteins. However, structural similarity can often be the result of convergent evolution – there are only a few thousand different protein folds, contrasted with trillions of different possible amino acid sequences.  Further, in some instances, sequence similarity can also be the result of convergent evolution.

From this brief analysis, I found that more than half of the Salmonella flagellum, in terms of amino acid residues, lacks any homologs that provide evidence that it evolved through non-teleological mechanisms. Some of the remaining homologs can hardly be called significant. The flagellar protein FliG shares only about 20% sequence similarity with its only homolog, MgtE. Also, from the angle of intelligent intervention, where the flagellum was designed at the dawn of life, the fairly significant sequence similarity that the remaining proteins share with their homologs could possibly be explained by convergent evolution. I suggest that convergent evolution at the molecular level may be more pervasive than many think.