9/9/2015 — It's been a while since the last pig-chimp newsletter, and I've been getting some messages from people who were wondering what was going on with the human genome scan. So first a bit of information about what's happening on that front.
Although we've taken a lot of time off from the project this summer, we have continued sporadically to work on (and with) the computer program we've written to survey the human, chimp and pig genomes. But, alas, the main thing that we've figured out is that the program runs too slowly to complete the task of scanning the entire human genome in a reasonable amount of time. That's just the way it goes with computer programs sometimes. In concept, they always seem good. But then, in real applications, they are sometimes too slow for us humans, who would rather not spend our entire lives waiting for results. And in this case, because we're comparing entire genomes with entire genomes, the scans do seem to take forever to run. The number of computations involved in even very sparse surveys of this nature is astronomical.
So unless we get access to more powerful computing facilities, it looks as if we're unlikely to make much progress with this approach. (Using the computers we have now, it would take us about ten years to complete even a rather sparse scan!) Since I don't have the money to go around renting supercomputers, it will be a matter of finding someone who does and who wants to finance this research. And since I've never been much of a go-getter fundraiser type, I don't know when that will happen. (A few more details about the scanning program appear below.)
In the meantime, you might want to read this detailed response that I've written to all of the various objections critics have raised with regard to the theory: http://www.macroevolution.net/prothero.html
Results of last spring's project: Some of you may remember that earlier this year we were exploring some similarities that other researchers had found between certain human chromosomal regions and certain portions of the pig genome. However, since that time, we have determined that that line of research is a dead end. Our initial hope was that by comparing just those regions, which others had found to be similar in pig and human, we could avoid comparing all of the pig and chimp genomes with all of the human genome. But no such luck. We found nothing conclusive in those particular regions. (But that's like saying we searched three haystacks out of 400 and didn't find a needle.)
Problems with Boomstick: We call the scanning program Boomstick (slang for a sawed-off shotgun) because it takes many small samples (analogous to the many small pellets of a shotgun) from the query genome (in this case the pig or the chimp genome) and surveys ("blasts") a target genome for matches to those queries.
As I've mentioned above, our tests show that this program runs very slowly. The basic problem is that genomes are so large. The pig, human and chimpanzee genomes each contain on the order of 3,000,000,000 nucleotides. Boomstick takes non-overlapping 40-nucleotide queries from the pig genome and scans the entire human genome with each such query, looking for a close match. If a match is found, its position in the human genome is recorded. Then the next 40nt query is taken and the human genome is scanned again. In the pig genome there are about 70 million such non-overlapping queries and in the chimp genome nearly 80 million. So for this example, you have to scan the entire human genome with 150 million queries. With the scans taking something like six minutes per query, that means completing the task would take about 1,700 years!
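The sampling-and-scanning loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Boomstick's actual code: the function names are hypothetical, and I am assuming here that a "close match" means a window of the target that differs from the query at no more than a fixed number of positions (`max_mismatches`), with the first such window being the one recorded.

```python
def non_overlapping_queries(genome, length=40):
    """Yield (offset, query) for consecutive non-overlapping windows."""
    for start in range(0, len(genome) - length + 1, length):
        yield start, genome[start:start + length]


def first_close_match(query, target, max_mismatches=2):
    """Scan the whole target; return the first position whose window
    differs from the query at no more than max_mismatches sites.
    (This definition of 'close' is an assumption made for the sketch.)"""
    n = len(query)
    for pos in range(len(target) - n + 1):
        mismatches = 0
        for a, b in zip(query, target[pos:pos + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too different, move on to the next window
        else:
            return pos  # inner loop finished without exceeding the limit
    return None


def scan(query_genome, target_genome, length=40, max_mismatches=2):
    """Map each query offset to the target position of its match, if any."""
    hits = {}
    for offset, query in non_overlapping_queries(query_genome, length):
        pos = first_close_match(query, target_genome, max_mismatches)
        if pos is not None:
            hits[offset] = pos
    return hits


# Tiny made-up sequences, with 4-nt queries so the example stays readable.
print(scan("AAAACCCC", "GGAAAAGGTTTT", length=4, max_mismatches=0))
```

Even in this simple form the cost is plain to see: every query triggers a full pass over the target, so the work grows with the number of queries multiplied by the length of the target genome.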
Obviously then, we have to reduce the density of the queries (to get done in a year, we'd have to take them from the pig and chimp genomes at intervals of 68,000 nts = 40 x 1700 nts), or run the program on a supercomputer (or submit it to a computing facility in many, many separate batches that could run in parallel) so that many different sub-jobs could be running simultaneously. Really, to get done anytime soon, it will probably be necessary to do both (i.e., to reduce the density of the sampling and to run the scans in many separate, simultaneous batches). But at present we have no funding, so that isn't happening. Computer time isn't free, especially large amounts of it. Sorry I can't say more, but there it is!
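The arithmetic behind these figures can be checked with a short script. It is only a back-of-the-envelope sketch: the function names are mine, and it assumes that queries run one at a time and that a year has 365.25 days. By this reckoning, 150 million queries spread over 1,700 years works out to something close to six minutes per query.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scan_years(n_queries, seconds_per_query):
    """Total run time, in years, if queries run one after another."""
    return n_queries * seconds_per_query / SECONDS_PER_YEAR


def seconds_per_query_for(n_queries, years):
    """Per-query time implied by a given total run time."""
    return years * SECONDS_PER_YEAR / n_queries


def sampling_interval(query_length, full_scan_years, target_years):
    """Spacing (in nucleotides) between query start points needed to
    shrink a full scan down to the target duration."""
    return query_length * full_scan_years / target_years


total_queries = 70_000_000 + 80_000_000  # pig + chimp, as estimated above

# Per-query time implied by the 1,700-year estimate
print(seconds_per_query_for(total_queries, 1700) / 60, "minutes per query")

# Interval needed to finish in one year instead of 1,700
print(sampling_interval(40, 1700, 1), "nt between queries")  # 68000.0
```

The same functions show the benefit of running batches in parallel: dividing `scan_years(...)` by the number of simultaneous sub-jobs gives the wall-clock time, assuming the batches are independent.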
The Obstacle: As I say, the main thing holding back our research is a lack of funding. No dough. No go. Of course, if you're interested in seeing this project move forward, you may want to make a (hopefully hefty) contribution to removing this monetary obstacle. You can do so on the donations page.
In all honesty, however, I should mention that using genetics to determine whether we are derived from an ancient multiple backcross event is a bit like using a camera in the dark. Under such circumstances we expect the genetic traces of the event to be obscure. Backcrossing, especially repeated generations of backcrossing, tends to erase traces of the non-backcross parent (in this case pig). I explain the reasons for this in more detail in the section at the bottom of this page (which has already appeared on various pages of the website). But, as I have repeatedly explained (for example, in the blue sidebar toward the bottom of this page), shifts in gene dosage resulting from hybridization could nevertheless alter phenotype in a major way even when absolutely no detectable nucleotide sequence differences are present. So, really, this scanning project is nothing more than an effort to accommodate that certain fraction of people who imagine that every problem can be solved with genetics, even though we all know that any given tool is appropriate to some tasks but not to others. In the present case, the methods I have already used (comparative anatomy and physiology)--or even one that I have not yet attempted (hybridization using artificial insemination)--are more appropriate.
Questions? If you have questions, I'd be happy to answer them. My plan is to paste your question into a questions and answers page and then to reply to it there. Have no fear, your name won't be mentioned unless you specifically request it.
Ongoing news: You may not be aware, but I make announcements about pig-chimp issues and other topics relating to Macroevolution.net on Twitter (@macroevo). So you can follow me there if you're interested. Also, I've just started a Facebook account and I plan to make ongoing announcements there as well. So like that if you like.
Best wishes to all!
Why genetics has dealt the pig-chimp theory a bad hand - A non-technical explanation of the problems involved with using genetics to detect ancient backcrossing
However, as I’ve said before, the task of deciding whether certain portions of our genome are pig-derived is likely to be difficult even if such regions are actually there. Such, unfortunately, is the expectation under the hypothesis, i.e., that the human race came into being via a process involving, first, a cross between pig and chimpanzee, then multiple generations of backcrossing to chimpanzee, and all of this happening many thousands of generations ago. It's well known that backcross hybrids are hard to identify with molecular techniques (Vähä and Primmer 2006). And in the present case, not only do the repeated generations of backcrossing to chimpanzees pose a problem, but also, because the crossing would have occurred anciently, the subsequent effects of recombination during thousands of generations of meiosis in the descendant hybrid population would make the genetic traces very difficult to detect today. Even though the gene dosages throughout the genome would be shifted toward pig, the sequence similarities would be shifted toward the backcross parent, that is, toward chimpanzee. (The reasons why dosages would shift one way even as sequence similarity shifted the other are given here.)
Nevertheless, some people have criticized me for saying that it would be hard to detect the pig in our genome, even if it were actually there. It would be a cakewalk, they say. Some of the people who’ve claimed this are even professional geneticists. But I think they imagine this to be the case only because they are unfamiliar with the genetics of hybridization and don’t really understand the problem.
In fact, it’s not easy for most people to understand the technical genetic issues relating to the question of whether humans might be pig-chimpanzee hybrids. But the muddling effects of the various events that occur in a hybridization process such as the one hypothesized can be explained by an analogy that anyone can understand.
The effect of combining two genomes. Imagine that you had two packs of 52 cards with each card bearing a unique number that sets it apart from all other cards in either pack. Call one pack the pig pack and the other the chimp pack. Now make a complete list of all the numbers in the pig pack and make a complete list of all the numbers in the chimp pack. Then shuffle the two packs together and spread the cards on the table. Under these circumstances, it would be fairly easy to show that some of the cards on the table were from the pig pack. All you’d have to do would be to pick up cards one at a time and compare the number on each with the numbers on the pig list. Once you found one such card, your search would be over.
The effect of a first backcross. Now imagine taking, at random, half the cards out of the ones you’ve spread on the table and replacing them with a new pack of chimp cards, so that only about 1/4 of the remaining cards spread on the table are pig cards. It now will be harder to find pig cards, but it can still be done with a reasonable amount of ease.
The effect of a second backcross. You take, at random, half the cards out of the ones you’ve spread on the table and replace them with a new pack of chimp cards so that only about 1/8 of the remaining cards spread on the table are pig cards. It now will be even harder to find pig cards, but it can still be done.
The effect of a third backcross. You take, at random, half the cards out of the ones you’ve spread on the table and replace them with a new pack of chimp cards so that only about 1/16 of the remaining cards spread on the table are pig cards. It now will be even more difficult to find pig cards, but it can still be done.
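The halving in the steps above follows a simple pattern, which can be written out as a short sketch. The function name is mine, and the calculation assumes, as the card analogy does, that each backcross removes exactly half of the pig cards on average.

```python
from fractions import Fraction


def pig_fraction(backcrosses):
    """Expected share of pig cards on the table: one half after the
    initial shuffle, halved again by every backcross."""
    return Fraction(1, 2) ** (backcrosses + 1)


for n in range(4):
    print(n, "backcrosses:", pig_fraction(n))
# 0 backcrosses: 1/2
# 1 backcrosses: 1/4
# 2 backcrosses: 1/8
# 3 backcrosses: 1/16
```

Carried further, the same rule gives 1/2048 after ten backcrosses, so the pig share shrinks very quickly.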
The effect of ignorance (The exact history of the process is unknown). Now suppose that you don’t know exactly how many times this repetitive process of backcrossing occurs, that is, how many times half the cards are taken out and replaced with a chimp pack, so that you don’t know exactly what fraction of the cards are likely to be from the original pig pack. The effect of this ignorance is that you cannot be sure exactly how thoroughly you will have to search before you can feel confident that no pig cards are on the table. Indeed, the only way you could be really sure would be to pick up every card.
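To put rough numbers on how thoroughly you would have to search, here is a small sketch. It rests on assumptions of my own choosing: cards are picked at random and independently (reasonable when the pack is large), the pig share is halved by each backcross, and "confident" means a 95 percent chance of having turned up at least one pig card if any are there in the expected proportion. The function name is hypothetical.

```python
import math


def cards_to_check(pig_fraction, confidence=0.95):
    """Smallest number of random picks k such that the chance of seeing
    at least one pig card, 1 - (1 - f)**k, reaches the confidence level."""
    if not 0 < pig_fraction < 1:
        raise ValueError("pig_fraction must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - pig_fraction))


# The required effort depends heavily on the unknown number of backcrosses.
for backcrosses in (1, 3, 5, 10):
    f = 0.5 ** (backcrosses + 1)
    print(backcrosses, "backcrosses:", cards_to_check(f), "cards")
```

Because the number of backcrosses is unknown, so is the pig fraction, and the search effort needed for the same level of confidence can differ by orders of magnitude from one guess to the next.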
The effect of recombination (gene conversion) during backcrossing. Now imagine that the cards were magical and that during each of the preceding steps some of the numbers on the pig cards changed to match numbers on the chimp cards so that the cards were really derived from the pig pack, but their numbers had been changed to chimp numbers.
The effect of recombination (gene conversion) during subsequent generations. Now imagine that the cards on the table go through a repeated process in which, during each round of the process, many pig cards get their numbers converted to chimp numbers and a few chimp cards get their numbers changed to pig numbers. That is, since the great majority of the DNA in a hybrid derived from multiple backcrosses to chimpanzee would be chimp-derived DNA, most conversions would be toward chimp.
The effect of thousands of generations of meiosis and gene conversion. Now imagine that this process of recombination and gene conversion is repeated many thousands of times.
The effect of thousands of generations of point mutation. Now imagine that throughout each of those thousands of repetitions, at any given stage, there is a small chance, for each of the cards, that a digit in the number on that card will change to some other digit.
The effect of representing ancient genomes with modern genomes. Now imagine that your list of pig numbers was based not on the actual pig pack that was put in at the beginning of this process, but upon a pack given to you by someone who said he thought it was similar to the original pack, but that he didn’t really know exactly how similar it might be.
The effect of a large haystack. Now imagine that there were not 52 cards in each pack, but many millions. The human, pig and chimpanzee genomes all contain on the order of 3,000,000,000 nucleotides, and it is unknown which of those nucleotides should be compared in order to evaluate this hypothesis.
Not a cakewalk.
Indeed, comparative anatomy is a much easier and far more telling method of evaluating this question. I have already made the evidence from that source available.