Some comments on AlphaFold

2020-12-02 science, proteins

Many people are asking for my opinion on the recent impressive success of AlphaFold at CASP14, perhaps incorrectly assuming that I am an expert on protein folding. I have actually never done any research in that field, but it's close enough to my research interests that I have closely followed the progress that has been made over the years. Rather than reply to everyone individually, here is a public version of my comments. They are based on the limited information on AlphaFold that is available today. I may come back to this post later and expand it.

First of all, the GDT scores obtained by AlphaFold are impressive, which is of course the reason for all the buzz at the moment. The GDT score measures how close a predicted structure is to the experimentally determined one. It is defined on a scale from 0 to 100 and can roughly be interpreted as the percentage of amino acid residues that were placed correctly. For about 2/3 of the proteins in this year's competition, AlphaFold achieved a GDT score in the 90s, whereas in the not so distant past, a score in the 70s was already considered very good. Which exact techniques were used to obtain the predicted structures is not something I can comment on: as far as I know, no technical details have been made public so far. Nor is AlphaFold a publicly available program or service that scientists could explore or apply to their own work. So all we know for now is that DeepMind, the company behind AlphaFold, has figured out a way to obtain good scores at CASP14. In the following I will assume that this is not just good luck, and that the method is applicable to a much larger class of proteins than the CASP candidates.

The scores obtained by AlphaFold are clearly a sign of significant progress. But does it mean that we have "a solution to a 50-year-old grand challenge in biology", as the press release claims? That depends on what exactly one considers that challenge to be.

If the challenge of protein folding is taken to be a purely pragmatic one, i.e. being able to predict structure from sequence, then AlphaFold is a candidate for a solution. How much of a solution will depend on further evaluations that remain to be done, on a larger range of proteins. CASP is limited to proteins for which experimental structures are (just) available. But some proteins resist experimental structure determination, for example because they have no well-defined structure at all. A robust structure prediction tool would have to identify such cases, rather than predict bogus structures. Allosteric proteins, which are proteins that can take more than one stable structure, provide another set of interesting test cases. A third case of interest is protein pairs that differ minimally in their sequence but importantly in structure. The goal of evaluating the robustness of a tool is to understand how it behaves at best, at worst, and for important edge cases, such that its users can judge the trustworthiness of its results.

For many scientists, including myself, having a black-box structure prediction tool is not sufficient to declare the protein folding problem solved. A solution requires an in-depth understanding of the mechanisms that determine protein structure. Whether or not AlphaFold can contribute to identifying these mechanisms is a question that scientists can only start to examine, and only if AlphaFold becomes sufficiently accessible and inspectable for critical examination by outside experts. I hope this will happen, and in fact I am optimistic that it will happen: the problem is important enough to deserve a serious effort by everyone involved. AlphaFold is not the end of the quest for a solution of the protein folding problem, but it could well turn out to be the beginning of a new chapter in the story.