Tutorial

1 User interface and utility

1.1. Home module

The Home module provides an introduction to the RNA-Protein Pocket (RPpocket) database.

1.2. Search module

The Search module consists of two parts: a search box and a description table of RNA classification. The RNA description table is the result of classifying the representative RNA-protein complexes by function, and it lists the class name, function description, and members of each class. The table is designed to help users retrieve information easily with the pull-down search box. In the pull-down search box, users can select an entry by sequence identity cutoff, RNA class, and representative RNA PDB ID. For example, selecting ‘1’, ‘rRNA’, and ‘1DK1’ in the three drop-down menus, respectively, retrieves the RNA structure whose PDB ID is 1DK1 and which belongs to the rRNA class of the dataset. A new page then displays the detailed information for 1DK1 in three parts: (1) a comprehensive information table, (2) a pocket information table, and (3) a sequence preview. The search results are described as follows.

(1) Comprehensive information table

The comprehensive information table provides general information on the 1DK1 structure, including PDB ID, PubMed, RNA class, RNA Nts, protein length, method, resolution, and a brief description.

(2) Pocket information table

The pocket information table provides the interaction surface information (binding sites of the RNA/protein and the secondary structure of the RNA/protein that interacts with the protein/RNA) and the pocket information (topology information, binding sites, and secondary structure located on the RNA/protein pocket).

(3) Sequence Preview

A sequence map is generated to show the RNA/protein binding sites and the binding fragments on the RNA/protein pockets in different colors.

1.3. Visualization module

In the Visualization module, users can upload a molecule or pocket structure file in PDB format. The available functions are listed below:
a. Users can click on the visual area, and then hold down the left mouse button to rotate the pocket structure.
b. Users can also zoom the structure in and out by scrolling the mouse wheel to facilitate observation.
c. If an atom is clicked in the visual area, the name of the atom and the residue to which it belongs are displayed.
d. The visual area can be saved as a picture by clicking the "image" button or a pdb file by clicking the "pdb" button.
e. The size of the visual area can be adjusted by clicking the "big" and "small" buttons.
f. The key residues can be highlighted in different colors by clicking "rainbow" and "crimson" buttons.
g. Users can display the pocket in several different styles by clicking the "spacefill", "wire", "ball&stick", and "cartoons" buttons.

1.4. Download module

Three summary tables are provided in the Download module. From the first table, users can download the RNA classification table (xlsx format), which summarizes the function-based RNA classification and descriptions, together with all the RNA pocket files (MRC and PDB format) of the dataset. The second table covers the pockets of each RNA-protein structure: users can download the pocket information table (xlsx format) and the pocket structure files (MRC and PDB format), which contain the structure information of all pockets of each RNA-protein structure. These MRC or PDB format files can be visualized in UCSF Chimera or PyMOL. The third table covers the statistical analysis data. Users can download the sequence and structure information of all RNA-protein complexes in this dataset (zip format), the general information on the RNA-protein complexes involved in this study (xlsx format), the RNA and protein pocket topology information (xls format), and the RNA-protein interaction information (zip format).
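Besides viewing them in UCSF Chimera or PyMOL, the downloaded PDB-format files can also be read with a short script. The following is a minimal sketch, assuming the pocket files follow the standard fixed-column ATOM/HETATM record layout of the PDB format; the function name `parse_pdb_atoms` and the example record are hypothetical and not part of RPpocket, and MRC files are not handled here.

```python
def parse_pdb_atoms(text):
    """Extract atom name, residue, chain and coordinates from PDB text.

    Assumes the standard fixed-column ATOM/HETATM layout (illustrative only).
    """
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16
                "res_name": line[17:20].strip(),  # columns 18-20
                "chain": line[21],                # column 22
                "res_seq": int(line[22:26]),      # columns 23-26
                "xyz": (float(line[30:38]),       # columns 31-38
                        float(line[38:46]),       # columns 39-46
                        float(line[46:54])),      # columns 47-54
            })
    return atoms


# Hypothetical record, assembled field by field so the columns line up:
# record, serial, gap, atom name, altLoc, residue, gap, chain, resSeq,
# iCode, gap, x, y, z.
example_line = (
    "ATOM  " "    1" " " " P  " " " "  G" " " "A" "   1" " " "   "
    "  11.104" "  13.207" "   2.100"
)

example_atoms = parse_pdb_atoms(example_line)
print(example_atoms[0]["name"], example_atoms[0]["xyz"])
```

The resulting list of dictionaries can then be filtered by chain or residue number, for example to pick out the residues reported in the pocket information table.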

1.5. Links module

The Links module provides links to other useful resources for RNA 3D structures, sequence alignment, RNA modeling, molecular dynamics, molecular docking, and molecular visualization analysis. These websites may be helpful in RNA-related drug development and vaccine design.

1.6. Tutorial module

The Tutorial module provides an introduction to using RPpocket and the abbreviations used in the RPpocket database.

1.7. Statistical module

Several statistical analyses are provided, including: the statistics of the RNA/protein binding fragments; the RNA nucleotide distributions; the number of RNA nucleotide backbone (phosphate and sugar) and side-chain (base) groups that interact with proteins; the secondary structure distributions in the RNA-protein interaction interface; the distribution of geometric information for RNA binding pockets and non-binding pockets; and the average frequency of amino acids located at the RNA-protein interface, interacting with nucleotides, or interacting with the RNA backbone (phosphate and sugar) and bases.

1.8. Contacts module

The Contacts module provides email addresses to which users can send comments or questions.

2 RPpocket database construction

1. A total of 1967 RNA-protein complexes solved by X-ray diffraction, solution NMR, or electron microscopy were extracted from the Protein Data Bank (PDB) (March 2021). To facilitate the study, we retained only the complexes composed of single-stranded RNA and protein in which the number of protein chains ranges from 1 to 10. In addition, it is difficult to determine the RNA-protein interaction and pocket information when nucleotides or amino acids are missing. Therefore, we removed any structure in which an RNA or protein chain of the complex is incomplete. After excluding RNA-protein complexes with fewer than 20 or more than 500 nucleotides or amino acids, we used the CD-HIT server to cluster these structures at an RNA sequence identity of >95%. For the representative structure of each cluster, we used the one automatically selected by CD-HIT. Thus, we obtained 74 non-redundant RNA-protein complexes. Among them, 50 structures were obtained by X-ray diffraction, 23 by NMR, and 1 by electron microscopy.

2. We identified the RNA-protein binding sites using a distance-based calculation. A nucleotide or amino acid is considered a binding site if its distance to the partner molecule (protein or RNA, respectively) is less than 4 Å.

3. The secondary structure units of RNA (stem, internal loop, hairpin loop, bulge, multiple loops, and single-stranded) were identified and generated using RNA FRABASE 2.0 and TBI-foRNA. All the protein secondary structures (loop, sheet, helix) were obtained with PyMOL.

4. The RNA pockets were detected by the 3V server using the rolling probe method. The volume and surface area were calculated by rolling two virtual probes (a shell probe and a solvent probe) around the van der Waals surface. We used the default radius values (10 Å for the shell probe and 3 Å for the solvent probe) to extract the RNA pockets. Next, the protein pockets were detected by DoGSiteScorer, which is widely used for detecting protein surface pockets and subpockets. DoGSiteScorer uses the heavy-atom coordinates to detect the pockets on the protein surface, provide the pocket mesh shape, and mark the grid points. DoGSiteScorer then calculates the volume and surface area values from the number of mesh points.

5. The binding patterns of RNA/protein sequence and secondary structure and the pocket topology information were calculated and are provided in the RPpocket server.
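The selection criteria in step 1 can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Complex` record and its field names are hypothetical, "single-stranded RNA" is read here as exactly one RNA chain, and the 20 to 500 length limits are applied to each chain separately, which the text does not specify. The CD-HIT clustering itself is not reproduced.

```python
from dataclasses import dataclass

MIN_LEN, MAX_LEN = 20, 500   # nucleotides or amino acids, from step 1
MAX_PROTEIN_CHAINS = 10      # protein chain count limit, from step 1


@dataclass
class Complex:
    """Hypothetical summary record for one RNA-protein complex."""
    pdb_id: str
    rna_length: int                 # nucleotides in the RNA chain
    protein_lengths: list           # residues in each protein chain
    n_rna_chains: int = 1
    has_missing_residues: bool = False


def passes_filters(c):
    """Return True if the complex satisfies the criteria listed in step 1."""
    if c.n_rna_chains != 1:          # assumption: one RNA chain only
        return False
    if not 1 <= len(c.protein_lengths) <= MAX_PROTEIN_CHAINS:
        return False
    if c.has_missing_residues:       # incomplete chains are removed
        return False
    # Assumption: the length limits apply to every chain individually.
    lengths = [c.rna_length, *c.protein_lengths]
    return all(MIN_LEN <= n <= MAX_LEN for n in lengths)


# Made-up entries, not records from the database.
candidates = [
    Complex("AAAA", 45, [120]),
    Complex("BBBB", 15, [120]),                             # RNA too short
    Complex("CCCC", 45, [120] * 11),                        # too many chains
    Complex("DDDD", 45, [120], has_missing_residues=True),  # incomplete
]
kept = [c.pdb_id for c in candidates if passes_filters(c)]
print(kept)
```

The complexes that pass such a filter would then go to CD-HIT for clustering by RNA sequence identity.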
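The distance-based definition in step 2 can be illustrated with a short sketch. It assumes that the distance is measured between any pair of atoms, one from the RNA and one from the protein; the text gives only the 4 Å cutoff and does not say which atoms are compared. The function name `binding_sites` and the residue labels are hypothetical.

```python
from math import dist

CUTOFF = 4.0  # Å, the cutoff given in step 2


def binding_sites(rna_atoms, protein_atoms, cutoff=CUTOFF):
    """Return (rna_sites, protein_sites) as sets of residue labels.

    Each atom is a (residue_label, (x, y, z)) pair. A residue counts as a
    binding site if any of its atoms lies closer than `cutoff` to any atom
    of the partner molecule (an assumption made for this sketch).
    """
    rna_sites, protein_sites = set(), set()
    for rna_res, rna_xyz in rna_atoms:
        for prot_res, prot_xyz in protein_atoms:
            if dist(rna_xyz, prot_xyz) < cutoff:
                rna_sites.add(rna_res)
                protein_sites.add(prot_res)
    return rna_sites, protein_sites


# Made-up coordinates for illustration only.
rna = [("A:G1", (0.0, 0.0, 0.0)), ("A:C2", (10.0, 0.0, 0.0))]
protein = [("B:LYS5", (3.0, 0.0, 0.0)), ("B:ALA6", (20.0, 0.0, 0.0))]

rna_hits, protein_hits = binding_sites(rna, protein)
print(rna_hits, protein_hits)  # {'A:G1'} {'B:LYS5'}
```

The double loop is quadratic in the number of atoms, which is adequate for chains of at most 500 residues; a spatial index would be the usual choice for larger structures.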
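Step 4 states that the pocket volume and surface area are derived from the number of mesh points. One common way to do this on a regular grid is sketched below: the volume is the point count times the volume of one grid cell, and the surface area is the number of exposed cell faces times the area of one face. This is a generic grid estimate written for illustration, not the actual DoGSiteScorer implementation, and the function names and the grid spacing used in the example are assumptions.

```python
# Offsets to the six face-sharing neighbours of a grid cell.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grid_volume(points, spacing):
    """Volume = number of occupied grid cells x volume of one cell."""
    return len(set(points)) * spacing ** 3


def grid_surface_area(points, spacing):
    """Surface area = number of exposed cell faces x area of one face."""
    occupied = set(points)
    exposed = 0
    for i, j, k in occupied:
        for di, dj, dk in NEIGHBOURS:
            if (i + di, j + dj, k + dk) not in occupied:
                exposed += 1
    return exposed * spacing ** 2


# Toy pocket: a 2 x 2 x 2 block of grid cells with 0.5 Å spacing,
# i.e. a cube with an edge of 1 Å.
pocket = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]
print(grid_volume(pocket, 0.5))        # 1.0 (Å^3)
print(grid_surface_area(pocket, 0.5))  # 6.0 (Å^2)
```

For the toy cube the estimates match the exact values (1 Å^3 and 6 Å^2); for irregular pockets the face-counting estimate of the surface area depends on the grid spacing and orientation.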