
12 Testing AI Predictions of Protein Structure and Binding (Biochemistry)


Author: Josh T Beckham, Freshman Research Initiative, The University of Texas at Austin

Josh Beckham leads the Virtual Cures stream in the Freshman Research Initiative at UT Austin focused on infectious disease drug discovery using both computational and wet lab techniques.


Cite this Chapter: Beckham, J. T. (2025). Testing AI Predictions of Protein Structure and Binding. In K. Procko, E. N. Smith, and K. D. Patterson (Eds.), Enhancing STEM Higher Education through Artificial Intelligence. The University of Texas at Austin. https://doi.org/10.15781/7dv5-g279

Description of resource(s):

The resource is a protocol with steps and instructions for exploring protein structures with AI. The activity, which was used in my lab with first-year students, has three parts: (1) use the AlphaFold program to create an AI-generated protein model, (2) use the program DiffDock to predict how small-molecule drugs interact with that protein model, and (3) compare the drug docking results to those from a known experimental protein model (an x-ray crystallography model). It can be run as a computer lab activity or as a take-home activity for students.

Links to Resources:

Protocol: Generate and Evaluate AlphaFold Protein Model

Why I implemented this:

The advent of AI in structural biology and drug discovery is a game changer. Programs like AlphaFold can predict protein structures with high accuracy when compared to experimental results, giving scientists a model of a protein’s structure when no experimental structure exists. An AI model costs almost nothing to generate compared with the very expensive and time-consuming wet lab experimental methods. This OER protocol is meant to be a resource for teaching and using AI more broadly in the context of biochemistry, so that students can better understand the applications and the underlying concepts.

In creating this protocol I sought to incorporate AI into our research and our course on drug discovery so that these undergraduate students at the sophomore, junior and senior level could understand how AI can contribute to the field. To create this resource, I chose open-access AI programs. Once I decided on the combination of AlphaFold and DiffDock, I compared the generated protein structure output to the experimental model (from x-ray crystallography data) and found them to be very similar, meaning the AI model was of high enough quality for research purposes (Jumper et al., 2021). The AlphaFold technology will soon supplant the traditional molecular docking tools that we use, which will make it even more powerful beyond its structure prediction capabilities; training in these tools will therefore prepare students to understand the capabilities and limitations of these approaches.

My main takeaways:

The main takeaways are that AlphaFold does a better job at structure prediction than prior software, and that it will be used by us and many other researchers in the field from here on out. The student-researchers were surprised and excited that they could use AI in the context of their research projects and not just as a language-based tool. For example, in a survey after using a version of the protocol, students responded: “it has made me value the different uses of AI”; “I’m really impressed that a computer can do this.”; and “It has allowed me to analyze my work more critically.”

What else should I consider?

Timing: This protocol does take some time to get through (a couple of hours for novices), but it lends itself to being broken up over several days. For example, the background could be covered in one class and then several of the first steps could be done in shorter sessions. Students would need to save any files generated each day and be able to access them again on subsequent days.
Preparation: Several files need to be saved, so good file management and a way to save them in the cloud (Google Drive, UT Box, Dropbox) would be helpful so that they can be accessed from another computer on a different day.
Context: This protocol is great for college-level or advanced high school use.
Adaptations: A direct extension of this protocol is to use a different protein of your choosing or to dock many different positive and negative control ligands. Different molecular docking (virtual screening) programs could be used as well.
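
When many positive and negative control ligands are docked, it can help to summarize in one number whether the positive controls tend to come out ahead. The following is a minimal Python sketch of one way to do that, not part of the protocol itself. It assumes each ligand has already been reduced to a single score where higher means a better predicted pose; the function name and the example scores are hypothetical placeholders, not results from any docking run.

```python
def fraction_actives_ranked_higher(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs in which the active scores higher.

    Ties count as half a win. A value of 1.0 means every positive control
    outscored every negative control; about 0.5 means no separation.
    This pairwise fraction is equivalent to the area under the ROC curve.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for active in active_scores:
        for decoy in decoy_scores:
            if active > decoy:
                wins += 1.0
            elif active == decoy:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Placeholder numbers for illustration only (assumed: higher = better).
example_actives = [0.8, 0.3]
example_decoys = [-0.5, 0.3, -1.2]
print(round(fraction_actives_ranked_higher(example_actives, example_decoys), 3))
```

Running the same summary on the AI-generated model and on the experimental model gives two numbers that can be compared directly.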
Potential Pitfalls:

When proceeding through the activity, it can be a challenge to keep track of which file is which. Having a good naming system helps, and doing most of the work in one sitting reduces confusion about where files were saved and in what order they were created. It may also help to keep a running set of notes on which files were created.
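
One way to put a naming system and a running set of notes into practice is sketched below in Python. This is only an illustration: the naming pattern (protein, source of the model, step, optional ligand), the function names, and the example labels such as "TargetX" are all hypothetical and are not prescribed by the protocol.

```python
from datetime import date
from pathlib import Path


def make_name(protein, source, step, ext, ligand=None):
    """Build a consistent file name from a few short labels."""
    parts = [protein, source, step]
    if ligand:
        parts.append(ligand)
    # Lowercase and replace spaces so names sort and compare predictably.
    stem = "_".join(p.strip().lower().replace(" ", "-") for p in parts)
    return f"{stem}.{ext.lstrip('.')}"


def log_file(notes_path, file_name, note):
    """Append one dated, tab-separated line to a plain-text notes file."""
    line = f"{date.today().isoformat()}\t{file_name}\t{note}\n"
    with Path(notes_path).open("a", encoding="utf-8") as handle:
        handle.write(line)
    return line


# "TargetX" is a placeholder protein label.
print(make_name("TargetX", "AlphaFold", "model", ".pdb"))
# → targetx_alphafold_model.pdb
```

Calling `log_file("notes.tsv", name, "downloaded AI model")` after each save would build up the running list of files; keeping that notes file in the same cloud folder as the data means it travels with the files from day to day.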

If one intends to do this with a different protein, the PDB structure data may contain errors or unique features (such as multiple chains) that need to be ‘cleaned’ before use, which requires knowledge of x-ray crystallography data. Picking a simple protein (one chain) from a more recent deposition is usually a good strategy when starting out, because newer PDB depositions are usually of better quality.

Finding good positive and negative control ligands can be hard and requires an understanding of the biochemical methods used to determine binding. Often a journal paper on the target protein will provide good examples of positive controls. A clear positive control is the molecule bound in a PDB structure. Negative controls can be molecules that are known not to bind the protein, or randomly selected molecules that are similar in their properties to the positive controls. A good resource for positive and negative controls is the DUD-E database, with its ‘actives’ and ‘decoy’ molecules.

Lastly, this protocol does not work for covalent docking of ligands, as DiffDock does not try to covalently dock the molecules.
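
Before committing to a different protein, a quick check of how many chains a PDB file contains, and which non-water molecules are bound in it, can flag structures that will need cleaning. The Python sketch below does this by reading the fixed-width columns of the legacy PDB text format (chain identifier in column 22, residue name in columns 18 to 20). The function name and the sample lines, including the residue name "LIG", are made up for illustration; real files also list ions and buffer molecules as HETATM records, so the output is a starting point and not a list of confirmed positive controls.

```python
def summarize_pdb(pdb_text):
    """Return (sorted chain IDs, sorted non-water HETATM residue names)."""
    chains = set()
    hetero = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if len(line) <= 21:
            continue                      # too short to hold a chain ID
        if record == "ATOM":
            chains.add(line[21])          # column 22: chain identifier
        elif record == "HETATM":
            name = line[17:20].strip()    # columns 18-20: residue name
            if name != "HOH":             # skip crystallographic waters
                hetero.add(name)
    return sorted(chains), sorted(hetero)


# Hypothetical, truncated records used only to exercise the parser.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  GLY B   1",
    "HETATM  100  C1  LIG A 201",
    "HETATM  101  O   HOH A 301",
])

chains, ligands = summarize_pdb(SAMPLE_PDB)
print(chains, ligands)
# → ['A', 'B'] ['LIG']
```

For a real structure, the text would come from the downloaded file, for example `summarize_pdb(open("structure.pdb").read())`, where the file name is a placeholder. More than one chain in the output suggests the structure needs cleaning before docking.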


License


Enhancing STEM Higher Education with Artificial Intelligence Copyright © 2025 by Office of STEM Education Excellence is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.