Hi to all of you who occasionally drop by,
I know I haven’t posted for more than half a year, and this is largely because I’ve been busy with what may appropriately be called “stuff.” I’m moving The Genome’s Tale to a new WordPress blog, Origins of Genomes. This is because I’d like to start from scratch so I can try a somewhat different layout and approach to blogging. Some of the articles from this site will be moved over to the new blog, but for the most part, the content there will be fresh. If I get enough traffic, I’ll consider getting the domain name. But not yet.
There are many examples of convergent evolution among different life forms. Convergent evolution is simply the independent acquisition of some biological feature in different lineages. For example, both birds and bats have wings (an elementary example, of course), but this similarity is not due to common ancestry; rather, it is the result of convergent evolution.
So how does convergent evolution relate to front-loading? One of the criticisms of the front-loading hypothesis is that you can’t design a genome in a unicellular organism to evolve specific organs, tissues, biochemical systems, etc., several billion years in the future. But convergent evolution neatly answers this criticism.
A classic example of convergent evolution is the octopus eye and the mammalian eye. They are extremely similar, structurally speaking (see the figure below).
The human eye and octopus eye both have the following tissues:
5. Ciliary muscle.
8. Optic nerve.
Furthermore, the arrangement of these parts is practically the same. Cool! And these two systems arose independently, through convergent evolution (i.e., they are not related through common descent; see Ogura et al., 2004). This means that these two organs evolved as a result of the initial state of the last common ancestor of mammals and octopuses. In short, the convergent evolution of these two organs demonstrates that a genome can be programmed to evolve toward a given objective. If we ran the “clock of life” backwards (to borrow from Stephen Jay Gould), human-like eyes would probably appear on the scene once again. In other words, the same system keeps popping up again and again. And this is evidence that a given objective can be front-loaded, starting with a specified initial state. The eye is a beautiful example of convergent evolution, wherein 8 separate “parts” independently came together in the same arrangement to produce the function of vision.
Are there examples of convergent evolution in biochemical systems? If so, this would provide evidence that not only can organs be front-loaded, but so too can biochemical systems. More on this later.
Atsushi Ogura, Kazuho Ikeo, Takashi Gojobori. Comparative Analysis of Gene Expression for Convergent Evolution of Camera Eye Between Octopus and Human. Genome Research, 14: 1555-1561 (2004).
I will be writing a number of essays addressing the argument that co-option is a plausible evolutionary mechanism for the origin of molecular machines. Briefly, many molecular machines carry out functions that require the interaction of multiple protein components. How could these biological systems have evolved through purely non-teleological mechanisms? Co-option is often offered as a plausible evolutionary mechanism that can give rise to such molecular machines. But there are a number of problems with invoking non-teleological co-option as a general solution to the origin of molecular machines, and these problems can be summarized as follows:
1. Complementary conformations.
2. Pre-adaptation of components.
3. A specific sequence of co-option events is required.
4. Other considerations.
I will be discussing the first problem in this article; in following articles, the other problems will be considered.
The Co-option Mechanism
Molecular machines are composed of specific protein components that interact to produce biological function. Below is a figure describing a hypothetical molecular machine composed of protein components A, B, C, and D.
Figure 1. Components A, B, C, and D interact to produce a biological function.
If this biological function can only be carried out by the interaction of multiple protein components, then co-option must be invoked. Simply put, co-option involves a protein originally carrying out function X and then undergoing a shift in function such that it now carries out function Y. “Normal” Darwinian evolution (where, for example, gene duplication gradually increases the efficiency of the system) cannot explain the origin of molecular machines whose functions can only be carried out by the interaction of multiple protein components. (Note: by “normal Darwinian evolution” I simply mean random mutation and natural selection gradually increasing the efficiency of the system; my use of the term “normal” does not imply that co-option is abnormal.) This is because the very nature of the function requires that multiple components interact, so you couldn’t start with a single component carrying out this function. It would have to carry out another function, then interact with other proteins over evolutionary time, and undergo a shift in function such that the new function arises. This is the essence of co-option, illustrated in Figure 2.
Figure 2. Components originally carrying out unrelated functions associate with each other step-by-incremental-step, eventually producing a novel function (function 6).
Now let’s examine if co-option offers a general solution to the origin of molecular machines.
The Problem of Protein Shapes
Molecular machines are constructed from individual protein parts. For example, the bacterial flagellum of Salmonella consists of 42 protein parts, such as MotA and MotB (the motor proteins), FlgE (universal joint), FliD (cap protein), etc. These protein parts interact in a tightly-integrated, specific manner. And to interact in precise ways, these protein parts have specific, complementary shapes consisting of knobs, crevices, protrusions, etc. Thus, in order for these molecular machines to have been co-opted from precursor parts, the precursor proteins would need to independently evolve complementary shapes prior to the co-option of the molecular machine, despite carrying out different functions. Consider Figure 3.
In Figure 3 we see a “protein complex” composed of 5 components: A, B, C, D, and E. Importantly, the shapes of these proteins are complementary to each other. Component A is complementary to B, C, and D; and D is complementary to A, C, and E. Yet the co-option scenario would have us believe that parts A through E (shown apart from the complex in the above figure) were originally functioning in different contexts and were independently shaped by natural selection such that their shapes just happened to be complementary. But there is no reason why, without teleology, these 5 proteins should be shaped just right prior to their co-option into the novel molecular machine. Is it really plausible to expect several proteins to independently match a given pattern (the pattern of complementarity) when there is nothing in natural selection that would drive them toward matching a pattern that will only be beneficial in the future? It is important to remember that non-teleological evolution, unlike an engineer, has no foresight. And since evolutionary mechanisms cannot peer into the future, it is entirely unreasonable to expect non-teleological processes to shape these proteins in just the right way such that when they do associate, novel function appears.
How specific do protein shapes need to be?
The parts depicted in Figure 3 have very specific shapes and fit very well. But one could argue that during the evolution of a molecular machine, the parts were not quite so complementary but still managed to elicit the function. Then, over time, the parts became more tightly integrated, performing the function more efficiently. This situation is seen in Figure 4.
In the above figure, components A through E are somewhat complementary to each other, but not very much. These parts associate to form the molecular complex, and then over time (represented by the large red arrow), the parts become more tightly integrated, resulting in a molecular machine that is composed of tightly integrated components. So, in this scenario, the shapes of the precursor parts do not need to be that complementary to each other – they only need to be suited to each other well enough so that there is function, even if it is only minimal. Let us now consider this possibility.
The first point I will make here is that random protein-protein binding almost never produces new biochemical functions. Of all the possible ways for two or more proteins to bind together, the vast majority will not offer novel biological functionality. And since we are talking about 3D space here, there are trillions of different possible binding configurations between just two proteins.
However, there is another point I want to make: as the complexity of the system increases and more parts are co-opted into it, the constraints on evolutionary mechanisms grow, and it becomes less plausible for that biological machine to increase in complexity. And we can tie this into the discussion of complementary shapes. Two proteins could feasibly, through chance, have roughly complementary shapes and loosely bind, producing a novel but inefficient function. This loose conglomeration could grow in complexity by the co-option of more proteins. But as the system becomes more complex and more tightly integrated, simply binding to the system will no longer produce novel function. A protein must bind specifically to particular components of the complex, and of course, the more components there are, the greater the number of possible protein-protein binding interactions, the vast majority of which will be non-functional. Indeed, a new protein might “gum up the works” if it does not bind specifically enough and if its shape is not fully complementary to the proteins it will interact with. John Bracht highlighted this back in 2002 in a response to Ursula Goodenough on the evolution of the bacterial flagellum:
“Evolutionary explanations must describe how a new protein integrates into an old system in such a way as to allow continued functionality overall (often, both the incoming protein and the pre-existing system must be extensively modified to fit together in a coordinated way), and enhance functionality of the entire system in such a way as to provide selective advantage.”
Furthermore, if a new protein component will interact with multiple components of the system, the constraints on allowable protein shapes are even more severe. The blind watchmaker would have to independently shape this new protein precisely so that when it is incorporated into the molecular machine, its shape fits well. We can construct a hypothesis that goes as follows:
The greater the number of components a protein interacts with in a biological machine, the more specified its shape must be, and consequently the more specified its sequence must be (since the sequence is what codes for the protein's shape).
Now, let’s test this hypothesis.
ATP synthase and Protein Conformation Specificity
To test this hypothesis, we will begin with the following premise: protein conformation specificity is determined by amino acid sequence specificity. In other words, since it is ultimately the amino acid sequence of a protein that determines its shape (there are other factors, but I won't get into them right now), we'd predict that a protein that interacts with multiple components will show a greater degree of sequence conservation across taxa than a protein that interacts with only one other protein.
Here’s where ATP synthase comes in. Bacterial F1F0 ATP synthases are composed of 8 components: the alpha subunit, the beta subunit, the a subunit, the b subunit, the c subunit, the gamma subunit, the delta subunit, and the epsilon subunit (see Figure 5).
Figure 5. Diagram of the ATP synthase system.
I retrieved the sequences of each of these components from UniProt. The sequences all came from three different bacterial genera: Escherichia, Shigella, and Bacillus. So there were 3 alpha sequences, 3 beta sequences, 3 subunit a sequences, etc. The sequences of each component were then aligned using ClustalO, and the percent identity was recorded. Below is a table of the subunits, the percent sequence identity shared among the 3 sequences of each subunit, and the number of ATP synthase (ATPase) components each of these proteins interacts with.
| Name of Protein | Percent Sequence Identity | Number of components protein interacts with |
|---|---|---|
| ATPase subunit alpha | 52.621% | 4 (beta, gamma, delta, epsilon) |
| ATPase subunit beta | 65.962% | 4 (alpha, gamma, delta, epsilon) |
| ATPase subunit a | 24.468% | 2 (c, b) |
| ATPase subunit b | 24.571% | 2 (a, delta) |
| ATPase subunit c | 39.241% | 3 (a, gamma, epsilon) |
| ATPase subunit delta | 22.162% | 3 (alpha, beta, b) |
| ATPase subunit epsilon | 34.532% | 4 (alpha, beta, gamma, c) |
| ATPase subunit gamma | 36.426% | 4 (alpha, beta, epsilon, c) |
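For readers who want to reproduce these figures, here is a minimal sketch of the percent-identity calculation, assuming the sequences have already been aligned (I used ClustalO; the convention below, fully identical columns divided by alignment length, is one common choice and may differ slightly from the figure a given tool reports). The sequences in the example are toy placeholders, not the actual ATP synthase data.

```python
# Sketch: percent identity across three pre-aligned sequences.
# Assumes alignment has already been done (e.g., with Clustal Omega).

def percent_identity(aligned_seqs):
    """Percentage of alignment columns where every sequence carries the
    same residue (columns led by a gap count as mismatches)."""
    length = len(aligned_seqs[0])
    assert all(len(s) == length for s in aligned_seqs)
    identical = sum(
        1 for col in zip(*aligned_seqs)
        if col[0] != "-" and all(c == col[0] for c in col)
    )
    return 100.0 * identical / length

# Toy example: three short "aligned" sequences (hypothetical data).
seqs = ["MKV-LT", "MKVALT", "MKV-LS"]
print(round(percent_identity(seqs), 3))
```

In the toy alignment, 4 of the 6 columns are fully identical, so the function reports 66.667%.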
The first feature to notice is that ATPase subunits a and b both interact with only 2 components, and they share almost exactly the same degree of sequence conservation (24.468% and 24.571%, respectively, a difference of about 0.1%). However, there are some exceptions to the hypothesis described above. Subunit delta interacts with 3 components but has the lowest degree of sequence conservation, and subunits gamma and epsilon both interact with 4 components but are less conserved than ATPase subunit c. Nevertheless, if we average the degrees of sequence conservation among the ATPase subunits grouped by the number of components they interact with, we do indeed find that, on average, the greater the number of components an ATPase subunit interacts with, the greater the degree of sequence conservation, and hence the more conserved the 3D structure of the protein (see graph, below).
Graph. This graph lists the mean degree of sequence conservation among ATPase subunits that interact with 2, 3, and 4 components respectively.
From the above graph we can see that, generally speaking, our hypothesis is correct. The greater the number of components a protein interacts with, the more specific its sequence and shape must be. And this adds another constraint on what kinds of proteins are and are not tolerated for being co-opted into a multi-part molecular machine.
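The averaging step behind the graph can be reproduced directly from the table; here is a minimal sketch (the identity figures are copied from the table above, and the grouping by interaction count follows the text):

```python
# Mean sequence conservation grouped by number of interacting components,
# using the percent-identity figures from the ATPase table above.
from statistics import mean

# subunit name -> (percent identity, number of interacting components)
subunits = {
    "alpha":   (52.621, 4),
    "beta":    (65.962, 4),
    "a":       (24.468, 2),
    "b":       (24.571, 2),
    "c":       (39.241, 3),
    "delta":   (22.162, 3),
    "epsilon": (34.532, 4),
    "gamma":   (36.426, 4),
}

# Collect identities by interaction count, then average each group.
groups = {}
for identity, n_partners in subunits.values():
    groups.setdefault(n_partners, []).append(identity)

for n in sorted(groups):
    print(n, round(mean(groups[n]), 3))
```

The group means come out in increasing order (the 2-component subunits average lowest, the 4-component subunits highest), which is the trend the graph shows.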
There is one more detail I would like to add here, regarding the matter of complementary shapes: not only must the proteins be fairly complementary to one another, but these complementary-shaped proteins must also be localized to the same subcellular location. If they are not, then the molecular complex cannot be co-opted from these precursor proteins. This, again, adds another constraint on the co-option scenario, and diminishes its plausibility as a general solution to the origin of molecular machines.
I will summarize the conclusions of this article in brief:
- In order to function properly, molecular machines require the interaction of protein components that interlock and bind together. The shapes of the proteins are what allow them to fit snugly with each other, producing biological function.
- Although loosely complementary shapes can produce function, there is a threshold below which no novel functionality arises, and the vast majority of physically possible protein shapes fall below this threshold. Further, the precursor proteins that independently evolve complementary shapes must also happen to be localized to the same subcellular location.
- If a protein that will be co-opted into a multi-part complex will interact with multiple components of the molecular complex, then its shape must be very specific. And there are many more ways to clog up, gum up, and destroy the function of a molecular machine by tossing a protein into the mix than there are ways to enhance the function of the machine by the addition of a new protein.
To be continued…
Here’s the abstract of a fairly new paper published by Cell Reports (“Premetazoan Origin of the Hippo Signaling Pathway”):
“Nonaggregative multicellularity requires strict control of cell number. The Hippo signaling pathway coordinates cell proliferation and apoptosis and is a central regulator of organ size in animals. Recent studies have shown the presence of key members of the Hippo pathway in nonbilaterian animals, but failed to identify this pathway outside Metazoa. Through comparative analyses of recently sequenced holozoan genomes, we show that Hippo pathway components, such as the kinases Hippo and Warts, the coactivator Yorkie, and the transcription factor Scalloped, were already present in the unicellular ancestors of animals. Remarkably, functional analysis of Hippo components of the amoeboid holozoan Capsaspora owczarzaki, performed in Drosophila melanogaster, demonstrate that the growth-regulatory activity of the Hippo pathway is conserved in this unicellular lineage. Our findings show that the Hippo pathway evolved well before the origin of Metazoa and highlight the importance of Hippo signaling as a key developmental mechanism predating the origin of Metazoa.”
This is interesting, especially from a front-loading perspective. The Hippo signaling pathway is an important developmental mechanism in Metazoa (animals), but all the core components of the Hippo signaling pathway have recently been found in unicellular Holozoa, which include the choanoflagellates.
A figure from the paper “Premetazoan Origin of the Hippo Signaling Pathway.”
Thus, the following Hippo pathway components have been found in unicellular organisms:
1) Hippo (kinase)
2) Warts (kinase)
3) Yorkie (coactivator)
4) Scalloped (transcription factor)
Specifically, these components have been found in Capsaspora owczarzaki. If you take a look at the above figure, you will see that this unicellular lineage is deeper-branching than the choanoflagellates. So, what are the Hippo components doing in unicellular organisms that don’t need them? This really isn’t expected from non-teleological evolution, but we’d expect this from front-loading. What’s very neat, too, is that the researchers discovered that these Hippo pathway components in Capsaspora owczarzaki can actually function in Drosophila. This is quite surprising from a non-telic viewpoint, because there’s no reason why these proteins in unicellular organisms should have the right sequence specificity to function in a very different multi-cellular organism like Drosophila. But it makes sense under the front-loading hypothesis, because we’d predict these proteins (more specifically, their ancestors) to be given a function that would conserve their sequence identity very well, such that when animals did appear on the scene, these components could be easily co-opted into a Metazoan role.
Deep Homology and Front-loading
I argue that the front-loading hypothesis (FLH) predicts that proteins of major importance in eukaryotes and advanced multicellular life forms (e.g., animals, plants) will share deep homology with proteins in prokaryotes. I have discussed this prediction with various critics of the FLH, and the most common objection seems to be that non-teleological evolution also makes this prediction. I disagree, so let me explain.
Life seems to require a minimum of about 250 genes (Koonin, Eugene V. How Many Genes Can Make a Cell: The Minimal-Gene-Set Concept, 2002. Annual Reviews Collection, NCBI); a proto-cell would not require even that many. Thus, it would be perfectly acceptable, under the non-teleological model, for the last common ancestor of all life forms to have had approximately 250 genes, give or take a few. From this small genome, gene duplication events would have occurred, followed by mutations in the new genes, leading to the origin of novel proteins. Over time, then, through gene and genome duplication and random mutation, this small genome would evolve into larger genomes. This model is perfectly compatible with the non-teleological hypothesis, and the non-teleological hypothesis does not predict otherwise. However, this model, where a minimal genome gradually evolves into the biological complexity we see today through gene duplication, genome duplication, natural selection, and random mutation, is not compatible with the front-loading hypothesis. This is because front-loading requires that the first genomes have genes that would be used by later, more complex life forms. Of the 250 or so genes required by life, none would encode proteins that would be used later in multicellular life forms (excluding the proteins that are necessary to all life forms). A front-loading designer couldn't possibly hope to "stack the deck" in favor of the appearance of plants and animals, for example, by starting out with a minimal genome.
Look at it this way. With a minimal genome of 250 genes that are involved in metabolism, transcription, translation, replication, etc., evolution could tinker with that genome in any way imaginable, so that you couldn’t really front-load anything at all with a minimal genome. You couldn’t anticipate the rise of animals and plants. Such a genome would not shape subsequent evolution. If the last common ancestor of all life forms had a minimal genome, and if you ran the tape of life back, and then played it again, a totally different course of evolution would result. But if you loaded LUCA with genes that could be used by animals and plants, you could predict that something analogous to animals and plants would arise. If you loaded this genome with hemoglobin, rhodopsin, tubulin, actin, epidermal growth factors, etc. – or analogs of these proteins – something analogous to animal life forms would probably result over deep-time.
Given that you couldn’t really front-load anything with a minimal genome consisting of about 250 genes, under the front-loading hypothesis, it is necessary that the LUCA contain unnecessary (but beneficial) genes that would later be exploited by more complex life forms. Non-teleological evolution does not require this. It has no goal, unlike front-loading. It tinkers with what is there – and if a minimal genome was all that was there, it would tinker around, eventually producing “endless forms most beautiful” as Darwin so famously put it. On the other hand, front-loading is goal-oriented: a minimal genome does not allow one to plan the origin of specific biological objectives.
Thus, under the front-loading hypothesis, we would predict that important proteins in eukaryotes, animals, and plants will share deep homology with unnecessary but functional proteins in prokaryotes.
Non-teleological evolution does not predict this. Non-teleological evolution could explain that observation, but it does not predict this. And this is the important point to understand. There is nothing in non-teleological evolution that requires multi-cellular proteins to share deep homology with unnecessary prokaryotic proteins – but front-loading demands this. There is nothing in non-teleological evolution that requires that the LUCA have a genome larger than the minimum genome size – but for front-loading to occur, this must be the case. I conclude, then, that this prediction is made by the front-loading hypothesis, but it is not made by non-teleological evolution, and so front-loading is certainly testable.