
6. Advanced Search through the RCSB PDB

Walter Novak

Advanced searching is a powerful way to sift through the more than 200,000 structures available in the PDB. For example, you can quickly search for specific ligands or cofactors, Enzyme Classification (EC) numbers, experiment type, or even sequence and structural motifs. Further, individual advanced searches can be combined to yield highly focused results. The RCSB allows advanced searches using sequence information, structural information, chemical information, or other characteristics called “attributes.”

To use the Advanced Search Query Builder (Figure 1), simply point your web browser to the RCSB PDB website and just under the search box click the “Advanced Search” link. It can be a little daunting at first, but our instructions below will have you searching like an expert in no time.

Figure showing the advanced search tool on the RCSB site.
Figure 1: The Advanced Search Query Builder Interface on the RCSB website.

You can search for an attribute by typing in the “Type to filter and/or select an attribute” box. You can also browse categories of attributes or all attributes by clicking the single or double down arrows, respectively, at the right of the attribute box (Figure 1). In the bottom right-hand corner of the Advanced Search Query Builder, you can choose to include computationally derived structural models. The “Count” button reports the number of hits your query will generate without listing them. Finally, clicking “Search” will return your search results.
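
For readers who prefer scripting, the RCSB also offers a programmatic Search API that accepts queries as JSON. The sketch below is a minimal illustration, not the Query Builder's own code: it wraps a query in a request body that mimics the “Count” button and the computed-models option. The endpoint URL, the option names (`return_counts`, `results_content_type`), and the helper names `build_request` and `submit` are assumptions on our part and should be checked against the current Search API documentation.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB Search API (v2); verify before relying on it.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_request(query, count_only=False, include_computed_models=False):
    """Wrap a query node in a request body (hypothetical helper)."""
    content_types = ["experimental"]
    if include_computed_models:
        # Mirrors the option to include computationally derived models.
        content_types.append("computational")
    options = {"results_content_type": content_types}
    if count_only:
        # Ask only for the number of hits, like the "Count" button.
        options["return_counts"] = True
    return {"query": query, "return_type": "entry", "request_options": options}


def submit(request_body):
    """POST the request and decode the JSON reply (needs network access)."""
    data = json.dumps(request_body).encode("utf-8")
    req = urllib.request.Request(
        SEARCH_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as response:
        body = response.read()
    # An empty body is treated here as "no results".
    return json.loads(body) if body else {}


# A plain full-text query used only as a placeholder.
example_query = {
    "type": "terminal",
    "service": "full_text",
    "parameters": {"value": "triose phosphate isomerase"},
}
count_request = build_request(example_query, count_only=True)
print(json.dumps(count_request, indent=2))
```

Calling `submit(count_request)` would send the query; it is left out of the example so that the sketch runs without a network connection.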

Finally, you can combine search criteria with Boolean operators (AND, OR, or NOT) to yield results that include this AND that, this OR that, or this but NOT that. Figure 2 below gives an example of how to use Boolean operators to identify structures with a deposit date from January 1, 2021 to January 1, 2025 that also have more than 10% of the protein residues identified as Ramachandran outliers by MolProbity.

Figure showing the use of Boolean operators for advanced searches.
Figure 2: The Advanced Search Query Builder supports the use of Boolean operators.
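
The Boolean combination in Figure 2 can also be written as a nested query for the RCSB Search API. This is a hedged sketch: the helper names `attribute_node` and `combine` are our own, and the two attribute names and the shape of the date-range value are assumptions that should be confirmed against the Search API attribute list.

```python
import json


def attribute_node(attribute, operator, value, negate=False):
    """Build a single attribute search node (hypothetical helper)."""
    parameters = {"attribute": attribute, "operator": operator, "value": value}
    if negate:
        # Corresponds to the NOT operator in the Query Builder.
        parameters["negation"] = True
    return {"type": "terminal", "service": "text", "parameters": parameters}


def combine(logical_operator, *nodes):
    """Join nodes with "and" or "or" (hypothetical helper)."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {
        "type": "group",
        "logical_operator": logical_operator,
        "nodes": list(nodes),
    }


# Deposited from January 1, 2021 to January 1, 2025 (assumed attribute name).
deposit_window = attribute_node(
    "rcsb_accession_info.deposit_date",
    "range",
    {
        "from": "2021-01-01",
        "to": "2025-01-01",
        "include_lower": True,
        "include_upper": True,
    },
)

# More than 10% Ramachandran outliers (assumed attribute name).
many_outliers = attribute_node(
    "pdbx_vrpt_summary.percent_ramachandran_outliers", "greater", 10
)

query = combine("and", deposit_window, many_outliers)
print(json.dumps(query, indent=2))
```

Swapping `"and"` for `"or"` would return entries that meet either criterion, and passing `negate=True` to `attribute_node` would exclude entries that match that node.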

We provide detailed instructions for several popular searches below.

Structure Similarity

There are two ways to identify structures that share a similar fold to a particular structure: 1) use an existing PDB ID and 2) upload your own PDB file. Here we will cover using an existing PDB ID.

Figure showing Structure Similarity search features.
Figure 3: The Structural Similarity search interface.

Steps:

    1. Under the search box at the RCSB, click on the “Advanced Search” link.
    2. Click on “Structure Similarity” under Advanced Search Query Builder to expand the menu (Figure 3).
    3. Type the PDB ID for your structure of interest (e.g. “1TIM” for triose phosphate isomerase) into the “Entry ID” search box and hit “Return.”
    4. You can search by “Assembly ID” or “Chain ID.”
      • Assemblies may have multiple proteins or multiple copies of a single protein or nucleic acid. This is the default search. There may be multiple assemblies defined in your PDB entry; if so, you can select which assembly you want from the dropdown menu. If only one assembly exists, then “1” will be your only choice.
          • Note: To easily determine the biological assembly (assemblies) in your PDB, in another window navigate to the RCSB page for your structure of interest. On the left-hand side of the page you will see a cartoon of your structure. It will state the Biological Assembly at the top of this image. You can click through different assemblies (if they are present).
      • Chain refers simply to a contiguous amino acid or nucleic acid sequence. If your protein has multiple chains, you can select which one to search with. This may be especially helpful if you are just interested in one particular protein chain among many.
    5. You can then select a “Strict” or “Relaxed” search. “Strict” returns only the most closely related structures, while “Relaxed” also returns structures that are more distantly similar.
    6. Click “Count” on the far right to see how many hits meet your search criteria.
    7. To view the results, click the blue “Search” button on the far right.
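
The steps above can be sketched as a query for the RCSB Search API. This is an illustration under stated assumptions: the function name `structure_similarity_node` is ours, and the service name, the `asym_id` key used for the chain, and the operator names `strict_shape_match` and `relaxed_shape_match` should be verified against the Search API documentation.

```python
import json


def structure_similarity_node(entry_id, assembly_id=None, chain_id=None,
                              strict=True):
    """Build a structure similarity node for an existing PDB ID.

    Exactly one of assembly_id or chain_id must be given, matching the
    "Assembly ID" / "Chain ID" choice in step 4.
    """
    if (assembly_id is None) == (chain_id is None):
        raise ValueError("Give exactly one of assembly_id or chain_id")
    value = {"entry_id": entry_id.upper()}
    if assembly_id is not None:
        value["assembly_id"] = str(assembly_id)
    else:
        # Assumed key name for a chain identifier.
        value["asym_id"] = chain_id
    operator = "strict_shape_match" if strict else "relaxed_shape_match"
    return {
        "type": "terminal",
        "service": "structure",
        "parameters": {"value": value, "operator": operator},
    }


# Assembly 1 of 1TIM with a strict search (the default choices above).
assembly_query = structure_similarity_node("1TIM", assembly_id=1)

# Chain A of 1TIM with a relaxed search.
chain_query = structure_similarity_node("1TIM", chain_id="A", strict=False)

print(json.dumps(assembly_query, indent=2))
```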

Sequence Similarity

There are two ways to identify structures that share sequence similarity: 1) use an existing PDB ID and 2) paste in an amino acid sequence.

Figure showing Sequence Similarity search features.
Figure 4: The Sequence Similarity search interface.

Steps using an existing PDB ID:

      1. Under the search box at the RCSB, click on the “Advanced Search” link.
      2. Click on “Sequence Similarity” under the Advanced Search Query Builder to expand the menu (Figure 4).
      3. Type in the PDB ID you are interested in (e.g. “1TIM” for triose phosphate isomerase) in the “Entry ID” search box and hit “Return.” You should see the amino acid sequence appear in the sequence box.
      4. Click “Count” on the far right to see how many hits meet your search criteria.
      5. You can reduce the number of hits by either 1) reducing the E-value cutoff (e.g. “1E-100”) or 2) increasing the Identity cutoff (e.g. 80).
      6. To view the results, click the blue “Search” button on the far right.

Steps using your own sequence:

      1. Under the search box at the RCSB, click on the “Advanced Search” link.
      2. Click on “Sequence Similarity” under the Advanced Search Query Builder to expand the menu (Figure 4).
      3. Paste your amino acid sequence of interest in the sequence box.
      4. Click “Count” on the far right to see how many hits meet your search criteria.
      5. You can reduce the number of hits by either 1) reducing the E-value cutoff (e.g. “1E-100”) or 2) increasing the Identity cutoff (e.g. 80).
      6. To view the results, click the blue “Search” button on the far right.
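
A pasted-sequence search can be sketched for the RCSB Search API as follows. The function name `sequence_similarity_node` is our own, the example sequence is an arbitrary placeholder rather than a real entry, and the parameter names (`sequence_type`, `evalue_cutoff`, `identity_cutoff`) are assumptions to verify. We also assume the API expects the identity cutoff as a fraction, so the percentage typed into the web form (e.g. 80) is divided by 100.

```python
import json


def sequence_similarity_node(sequence, evalue_cutoff=0.1, identity_percent=0):
    """Build a protein sequence similarity node (hypothetical helper)."""
    # Remove line breaks and spaces that come along when pasting.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned.isalpha():
        raise ValueError("Sequence must contain only one-letter residue codes")
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity_percent must be between 0 and 100")
    return {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
            "value": cleaned,
            "sequence_type": "protein",
            # Accepts numbers or strings such as "1E-100".
            "evalue_cutoff": float(evalue_cutoff),
            # Assumed to be a fraction (0 to 1) rather than a percentage.
            "identity_cutoff": identity_percent / 100,
        },
    }


# Placeholder sequence for illustration only.
pasted = """MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
APILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"""

broad_query = sequence_similarity_node(pasted)
strict_query = sequence_similarity_node(
    pasted, evalue_cutoff="1E-100", identity_percent=80
)
print(json.dumps(strict_query, indent=2))
```

Lowering the E-value cutoff or raising the identity cutoff narrows the result set, as in step 5.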

Enzyme Classification Number

To identify PDB entries for a particular enzyme, it can be helpful to search by the Enzyme Classification (EC) number, since each enzyme has a unique EC number.

Figure showing EC number search features.
Figure 5: The EC number search interface.

Steps:

      1. Under the search box at the RCSB, click on the “Advanced Search” link.
      2. Click on “Structure Attributes” under the Advanced Search Query Builder to expand the menu (Figure 5).
      3. Type “Enzyme” in the search box and you should see that under “Polymer Molecular Features,” two choices appear. Click “Enzyme Classification Number.”
      4. You can enter a whole or partial EC number (e.g. 2 or 2.1.3.2). You can also include several EC numbers in the search box, separated by commas.
      5. Your search will return hits containing any of the EC numbers in the search box.
      6. You may also choose to exclude certain EC numbers in your search by clicking the “+ NOT” button.
      7. Click “Count” on the far right to see how many hits meet your search criteria.
      8. To view the results, click the blue “Search” button on the far right.
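
The EC number search, including the comma-separated list and the “+ NOT” exclusion, can be sketched as a nested query for the RCSB Search API. The helper names are ours, and the attribute name `rcsb_polymer_entity.rcsb_ec_lineage.id` is an assumption. We assume this attribute stores every level of the EC hierarchy, which is what would let a partial number such as 2 match all of class 2.

```python
import json
import re

# Assumed attribute holding each level of an entity's EC number.
EC_ATTRIBUTE = "rcsb_polymer_entity.rcsb_ec_lineage.id"


def parse_ec_list(text):
    """Split comma-separated input into whole or partial EC numbers."""
    numbers = [part.strip() for part in text.split(",") if part.strip()]
    for number in numbers:
        # One to four dot-separated integers, e.g. "2" or "2.1.3.2".
        if not re.fullmatch(r"\d+(\.\d+){0,3}", number):
            raise ValueError(f"Not an EC number: {number!r}")
    return numbers


def ec_node(ec_number, negate=False):
    """Build one EC number node; negate=True mimics the "+ NOT" button."""
    parameters = {
        "attribute": EC_ATTRIBUTE,
        "operator": "exact_match",
        "value": ec_number,
    }
    if negate:
        parameters["negation"] = True
    return {"type": "terminal", "service": "text", "parameters": parameters}


def ec_query(include, exclude=()):
    """Match any included EC number and none of the excluded ones."""
    included = [ec_node(ec) for ec in include]
    if not included:
        raise ValueError("Provide at least one EC number to include")
    any_included = {"type": "group", "logical_operator": "or", "nodes": included}
    if not exclude:
        return any_included
    excluded = [ec_node(ec, negate=True) for ec in exclude]
    return {
        "type": "group",
        "logical_operator": "and",
        "nodes": [any_included] + excluded,
    }


# Any transferase (class 2) or EC 3.5.1.2, but not EC 2.7.
query = ec_query(parse_ec_list("2, 3.5.1.2"), exclude=["2.7"])
print(json.dumps(query, indent=2))
```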

PDB composition (Protein, NA, sugar)

PDB files have many other attributes that can be used to refine searches. Using the composition of the PDB file, a search can be narrowed to files that, for example, contain DNA, contain both protein and nucleic acid, or contain an oligosaccharide.

Figure showing PDB composition search features.
Figure 6: The PDB composition search interface.

Steps:

      1. Under the search box at the RCSB, click on the “Advanced Search” link.
      2. Click on “Structure Attributes” under the Advanced Search Query Builder to expand the menu (Figure 6).
      3. Type “polymer” in the search box, then click “Entry Polymer Composition” under “Entry Features.”
      4. Under the “– Select value –” pulldown you can select the composition of the PDB that you desire.
      5. You may also choose to exclude certain compositions in your search by clicking the “+ NOT” button.
      6. Click “Count” on the far right to see how many hits meet your search criteria.
      7. To view the results, click the blue “Search” button on the far right.
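
As a rough programmatic counterpart to the steps above, a composition filter for the RCSB Search API might look like the sketch below. The attribute name `rcsb_entry_info.polymer_composition` and the value strings `"protein/NA"` and `"DNA"` are assumptions; the values offered in the “– Select value –” pulldown are the authoritative list.

```python
import json

# Assumed attribute behind "Entry Polymer Composition".
COMPOSITION_ATTRIBUTE = "rcsb_entry_info.polymer_composition"


def composition_node(value, negate=False):
    """Build a composition node; negate=True mimics the "+ NOT" button."""
    parameters = {
        "attribute": COMPOSITION_ATTRIBUTE,
        "operator": "exact_match",
        "value": value,
    }
    if negate:
        parameters["negation"] = True
    return {"type": "terminal", "service": "text", "parameters": parameters}


# Entries that contain both protein and nucleic acid (assumed value string).
protein_na = composition_node("protein/NA")

# Entries whose composition is anything other than DNA alone.
not_dna_only = composition_node("DNA", negate=True)

print(json.dumps(protein_na, indent=2))
```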

PDB size

PDB size can be searched two ways. The first method uses “polymer entity molecular weight” and will identify all PDB files that possess at least one chain with the given molecular weight. The second method uses “molecular weight per deposited model” and will only identify PDB files with that total molecular weight.

Steps using “Polymer Entity Molecular Weight”:

Figure showing polymer entity molecular weight search features.
Figure 7: The polymer entity molecular weight search interface.

      1. Under the search box at the RCSB, click on the “Advanced Search” link.
      2. Click on “Structure Attributes” under the Advanced Search Query Builder to expand the menu (Figure 7).
      3. Type “weight” in the search box, then click “Polymer Entity Molecular Weight” under “Polymer Molecular Features.” The default is to search using “=”; however, the dropdown menu allows a variety of searches (e.g. >, <, range).
      4. In the search box, enter the molecular weight in kDa that you would like to search for (e.g. 30).
      5. You may also choose to exclude certain molecular weights from your search by clicking the “+ NOT” button.
      6. Click “Count” on the far right to see how many hits meet your search criteria.
      7. To view the results, click the blue “Search” button on the far right.

Steps using “Molecular Weight per Deposited Model”:

Figure showing molecular weight per deposited model search features.
Figure 8: The molecular weight per deposited model search interface.

      1. Under the search box at the RCSB, click on the “Advanced Search” link.
      2. Click on “Structure Attributes” under the Advanced Search Query Builder to expand the menu (Figure 8).
      3. Type “weight” in the search box, then click “Molecular Weight per Deposited Model” under “Entry Features.”
      4. The default is to search using “=”; however, the dropdown menu allows a variety of searches (e.g. >, <, range).
      5. In the search box, enter the molecular weight in kDa that you would like to search for (e.g. 30).
      6. You may also choose to exclude certain molecular weights from your search by clicking the “+ NOT” button.
      7. Click “Count” on the far right to see how many hits meet your search criteria.
      8. To view the results, click the blue “Search” button on the far right.
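
Both size searches can be sketched as queries for the RCSB Search API. The helper `weight_node` is hypothetical, and the two attribute names and the operator names are assumptions that should be checked against the Search API attribute list. Weights are given in kDa, as in the web form.

```python
import json

# Assumed attributes behind the two molecular weight searches.
ATTRIBUTES = {
    "chain": "rcsb_polymer_entity.formula_weight",  # Polymer Entity Molecular Weight
    "entry": "rcsb_entry_info.molecular_weight",  # Molecular Weight per Deposited Model
}

# Assumed mapping from the dropdown symbols to API operator names.
OPERATORS = {
    "=": "equals",
    ">": "greater",
    "<": "less",
    ">=": "greater_or_equal",
    "<=": "less_or_equal",
}


def weight_node(scope, symbol, kda, upper_kda=None):
    """Build a molecular weight node for scope "chain" or "entry"."""
    if scope not in ATTRIBUTES:
        raise ValueError("scope must be 'chain' or 'entry'")
    if symbol == "range":
        if upper_kda is None or upper_kda < kda:
            raise ValueError("range needs upper_kda >= kda")
        operator = "range"
        value = {
            "from": kda,
            "to": upper_kda,
            "include_lower": True,
            "include_upper": True,
        }
    elif symbol in OPERATORS:
        operator, value = OPERATORS[symbol], kda
    else:
        raise ValueError(f"Unknown comparison: {symbol!r}")
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": ATTRIBUTES[scope],
            "operator": operator,
            "value": value,
        },
    }


# At least one chain between 29 and 31 kDa.
chain_query = weight_node("chain", "range", 29, 31)

# Total weight of the deposited model greater than 30 kDa.
entry_query = weight_node("entry", ">", 30)

print(json.dumps(chain_query, indent=2))
```

Because molecular weights are rarely round numbers, a range or a greater-than/less-than comparison is likely to be more useful in practice than an exact “=” match.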

License

Seeing the Invisible: Learning to Teach with Biomolecular Visualization Copyright © by The BioMolViz Working Group. All Rights Reserved.