Adv

RNA 3D structure prediction

For each sequence chosen for folding, secondary structure was predicted from the MSA, and 3D models were then built with two methods: SimRNA and Rosetta. For Rosetta, 10,000 decoys were generated for the target sequence and for each homologous sequence using the Rosetta FARFAR protocol. For SimRNA, the SimRNAweb server was used with default parameters.
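The text does not name the tool used to derive per-sequence secondary structures from the MSA. As an illustration only, one simple approach is to project the alignment's consensus structure (the #=GC SS_cons line of a Stockholm file) onto each aligned sequence: gap columns are dropped, and a pair whose partner falls on a gap becomes unpaired. The helper names `pair_table` and `project_ss` are hypothetical, and WUSS pseudoknot letters (Aa, Bb, ...) are not handled in this sketch.

```python
# Closing bracket -> matching opening bracket (WUSS / dot-bracket notation).
PAIRS = {")": "(", ">": "<", "]": "[", "}": "{"}


def pair_table(ss):
    """Return {i: j, j: i} for every base pair in a bracket string."""
    stacks = {opener: [] for opener in PAIRS.values()}
    pairs = {}
    for i, ch in enumerate(ss):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in PAIRS:
            j = stacks[PAIRS[ch]].pop()  # IndexError if brackets are unbalanced
            pairs[i], pairs[j] = j, i
    return pairs


def project_ss(aligned_seq, ss_cons, gap_chars="-."):
    """Project a consensus structure onto one aligned sequence.

    Gap columns are dropped; a pair whose partner is a gap in this sequence
    becomes unpaired ('.'). Paired columns keep their bracket symbol and all
    other (unpaired) WUSS symbols become '.'.
    """
    if len(aligned_seq) != len(ss_cons):
        raise ValueError("sequence and structure differ in length")
    pairs = pair_table(ss_cons)
    out = []
    for i, nt in enumerate(aligned_seq):
        if nt in gap_chars:
            continue
        j = pairs.get(i)
        if j is None or aligned_seq[j] in gap_chars:
            out.append(".")
        else:
            out.append(ss_cons[i])
    return "".join(out)
```

For example, `project_ss("G-AAAACC", "((....))")` returns `"(.....)"`: the second G is a gap, so its partner C loses its pair.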

Both modeling steps can be performed semi-automatically with rna-tools (M.M. et al., unpublished; available at https://github.com/mmagnus/rna-tools), which provides a pipeline of tools facilitating modeling with Rosetta (https://rna-tools.readthedocs.io/en/latest/tools.html#rosetta) and SimRNA/SimRNAweb (https://rna-tools.readthedocs.io/en/latest/tools.html#simrnaweb).

evoClustRNA

usage: evoClustRNA.py [-h] [-a RNA_ALIGNMENT_FN] [-o OUTPUT_DIR]
                      [-i INPUT_DIR] [-m MAPPING_FN] [-x MATRIX_FN] [--inf]
                      [-v] [-s] [-f]

Named Arguments

`-a, --rna_alignment_fn`
	rna alignment with the extra guidance line, e.g. test_data/rp14sub.stk
`-o, --output_dir`
	output folder where motifs and structures will be saved, e.g. test_out/rp14 (with the default, out/structures and out/motifs will be created) Default: “out”
`-i, --input_dir`
	input folder with structures, e.g. test_data Default: “.”
`-m, --mapping_fn`
	a file mapping sequence names in the alignment to folders on the drive (<name in the alignment>:<folder name>); use one line per sequence
`-x, --matrix_fn`
	output file for the all-vs-all RMSD matrix Default: “”
`--inf`	use INF (Interaction Network Fidelity) instead of RMSD Default: False
`-v, --verbose`	be verbose Default: False
`-s, --save`	save motifs and structures to output_dir; this slows the program down Default: False
`-f, --flat-dir`	use a flat directory structure (structures/<all PDB files here>); PDB files are fetched based on the leading part of their names Default: False

When RNA models are loaded, models ending with ‘template.pdb’ are ignored.

evoClustRNA.get_rna_models_from_dir(directory, residues, save, output_dir, flat_dir)

@todo

This function processes the input directory folder by folder.

Ugly hack: it removes clust01-05X from the list.

Parameters:	directory, residues, save, output_dir
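A minimal sketch of the folder-by-folder loading step, assuming one subfolder per sequence and applying the rule that files ending with 'template.pdb' are ignored. It only collects file paths; residue parsing, the flat_dir mode and the clust01-05X filtering are left out, and `collect_pdb_files` is a hypothetical name, not part of evoClustRNA.

```python
import os


def collect_pdb_files(directory):
    """Map each subfolder of `directory` to the sorted PDB files it contains.

    Plain files directly inside `directory` are ignored, and so are PDB files
    whose names end with 'template.pdb'.
    """
    models = {}
    for folder in sorted(os.listdir(directory)):
        path = os.path.join(directory, folder)
        if not os.path.isdir(path):
            continue  # only subfolders hold models in this layout
        models[folder] = [
            os.path.join(path, fn)
            for fn in sorted(os.listdir(path))
            if fn.endswith(".pdb") and not fn.endswith("template.pdb")
        ]
    return models
```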

evoClustRNA.parse_num_list(s): Accept a range of numbers in the form 0-5 as a command-line argument; based on http://stackoverflow.com/questions/6512280/accept-a-range-of-numbers-in-the-form-of-0-5-using-pythons-argparse
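The StackOverflow recipe referenced above amounts to a custom argparse `type` function. A sketch along those lines, assuming ranges are inclusive (so '0-5' gives six numbers); the actual evoClustRNA implementation may differ in details such as error messages.

```python
import argparse
import re


def parse_num_list(s):
    """Turn '0-5' into [0, 1, 2, 3, 4, 5] and '7' into [7] (inclusive range)."""
    m = re.match(r"^(\d+)(?:-(\d+))?$", s)
    if not m:
        # argparse turns this into a clean "invalid value" error message
        raise argparse.ArgumentTypeError(
            f"'{s}' is not a number or a range like '0-5'")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return list(range(start, end + 1))
```

It would be wired in as, e.g., `parser.add_argument("--residues", type=parse_num_list)`, where `--residues` is a hypothetical option used only for illustration.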

evoClustRNA.sort_nicely(l)

Sort the given list in the way that humans expect.

http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
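The natural sort described in the linked post splits each string into digit and non-digit runs and compares the digit runs as integers, so "model2" sorts before "model10". A sketch of that idea; this version returns a new list, and whether evoClustRNA's version sorts in place is not stated here.

```python
import re


def sort_nicely(items):
    """Return `items` sorted in natural (human) order."""
    def alphanum_key(s):
        # re.split with a capturing group keeps the digit runs; the result
        # always alternates str, int, str, ... so keys compare safely.
        return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", s)]
    return sorted(items, key=alphanum_key)
```

For example, `sort_nicely(["model10.pdb", "model2.pdb", "model1.pdb"])` returns `['model1.pdb', 'model2.pdb', 'model10.pdb']`, whereas plain `sorted` would put `model10.pdb` second.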

evoClust_autoclustix.py

usage: evoClust_autoclustix.py [-h] [--half] [-v] matrix

Positional Arguments

matrix

A .txt file with a similarity matrix with column headers; see test_data/matrix.txt for an example. The .txt extension is needed for the auto-removal system to work.

Named Arguments

--half

50% in the 3 biggest clusters

Default: False

-v, --verbose

Default: False
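--half is documented only as "50% in the 3 biggest clusters". Reading it as "stop once the three biggest clusters hold at least half of all models", a search for a cut-off could look like the sketch below. This interpretation, the cut-off grid (start, step, max_cutoff) and the function names are assumptions; `cluster_sizes_at` stands for any function that clusters the matrix at a given cut-off and returns the resulting cluster sizes.

```python
def half_in_top3(cluster_sizes, fraction=0.5):
    """True if the 3 biggest clusters hold at least `fraction` of all models."""
    total = sum(cluster_sizes)
    top3 = sum(sorted(cluster_sizes, reverse=True)[:3])
    return total > 0 and top3 >= fraction * total


def find_cutoff(cluster_sizes_at, start=0.5, step=0.5, max_cutoff=20.0):
    """Increase the RMSD cut-off until half_in_top3 holds; None if it never does."""
    cutoff = start
    while cutoff <= max_cutoff:
        if half_in_top3(cluster_sizes_at(cutoff)):
            return cutoff
        cutoff += step
    return None
```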

evoClust_autoclustix.py implements simple interactive clustering; technically, it is a thin wrapper around evoClust_clustix.py.

evoClust_clustix.py

usage: evoClust_clustix.py [-h] [-o OUTPUT] [-c CUT_OFF] [-v] matrix

Positional Arguments

matrix

A .txt file with a similarity matrix with column headers; see test_data/matrix.txt for an example

Named Arguments

-o

Output file name, without the extension; see test_data/output.txt for an example

-c

RMSD cut-off for forming a cluster

Default: 5.0

-v, --verbose

be verbose

Default: False
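The help text does not describe the clustering algorithm itself. A common greedy scheme for cut-off-based RMSD clustering is sketched below as an assumption; it has not been checked against evoClust_clustix.py. The matrix is given as a list of lists (parsing test_data/matrix.txt is omitted because its exact format is not described here), and `greedy_cluster` is a hypothetical name.

```python
def neighbours(i, pool, matrix, cutoff):
    """Indices in `pool` within `cutoff` of model i (includes i itself)."""
    return [j for j in pool if matrix[i][j] <= cutoff]


def greedy_cluster(names, matrix, cutoff=5.0):
    """Cluster models from a symmetric all-vs-all RMSD matrix.

    Repeatedly pick the unclustered model with the most unclustered
    neighbours within `cutoff`, make it a cluster centre, and remove it
    together with its neighbours. Clusters come out biggest first, each
    with its centre listed first.
    """
    pool = list(range(len(names)))
    clusters = []
    while pool:
        centre = max(pool, key=lambda i: len(neighbours(i, pool, matrix, cutoff)))
        members = neighbours(centre, pool, matrix, cutoff)
        members.sort(key=lambda j: (j != centre, j))  # centre first
        clusters.append([names[j] for j in members])
        taken = set(members)
        pool = [j for j in pool if j not in taken]
    return clusters
```

The default of 5.0 mirrors the documented default of -c. Because the diagonal of an RMSD matrix is 0, every model is its own neighbour, so each pass removes at least one model and the loop always terminates.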

evoClust_get_models.py

Uses find in the current directory to locate the needed files.

This script creates:
- reps: representative structures of the top 5 clusters
- resp_motifs: representative motifs of the top 5 clusters

The cut-off can be added to the name of reps, e.g. reps_c2.5 (see -c, --use-cutoff-for-names).

The script currently has a second mode:

[mm] rosetta-5x$ evoClust_get_models.py -i structures/ ade_plus_ade_cleanup_mapping_pkX_*.out -n adepk
evoClust_get_models.py
--------------------------------------------------------------------------------
['adepk_min.out.10.pdb', 'adepk_min.out.5.pdb', '', 'adepk_min.out.1.pdb', '']
1_adepk_min.out.10.pdb
2_adepk_min.out.5.pdb
3_
4_adepk_min.out.1.pdb
5_
= structures == out/structures/<files>===================
cp -v structures//adepk_min.out.10.pdb reps_ns/c1_adepk_min.out.10.pdb
structures//adepk_min.out.10.pdb -> reps_ns/c1_adepk_min.out.10.pdb
cp -v structures//adepk_min.out.5.pdb reps_ns/c2_adepk_min.out.5.pdb
structures//adepk_min.out.5.pdb -> reps_ns/c2_adepk_min.out.5.pdb
cp -v structures// reps_ns/c3_
cp: structures// is a directory (not copied).
cp -v structures//adepk_min.out.1.pdb reps_ns/c4_adepk_min.out.1.pdb
structures//adepk_min.out.1.pdb -> reps_ns/c4_adepk_min.out.1.pdb
cp -v structures// reps_ns/c5_
cp: structures// is a directory (not copied).

# evoClust_get_models.py -i structures/ ade_plus_ade_cleanup_mapping_pkX_*.out -n adepk

First, the input is parsed to find the line boundaries of each cluster. These boundaries are used to select the structures that belong to a given cluster. Each cluster is then searched for a structure whose name starts with the prefix given by -n, --native-seq-only (NATIVE_SEQ_ONLY). If there is none, an empty string ('') is appended to the reps list; this is why the log above shows empty entries for clusters 3 and 5 and the failing cp commands.
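The selection step can be sketched as follows, assuming the clusters have already been parsed into lists of file names (the clustix output format is not shown here, so parsing is omitted) and that the first matching member in cluster order is taken. Unlike the log above, this sketch skips empty entries when planning copies instead of issuing cp commands that fail. `pick_reps` and `copy_plan` are hypothetical names.

```python
import os


def pick_reps(clusters, native_prefix):
    """For each cluster (a list of file names), return the first member whose
    name starts with `native_prefix`, or '' if the cluster has none."""
    return [next((m for m in members if m.startswith(native_prefix)), "")
            for members in clusters]


def copy_plan(reps, input_dir, output_dir):
    """Return (src, dst) pairs named c<cluster number>_<file>, skipping ''."""
    return [
        (os.path.join(input_dir, rep), os.path.join(output_dir, f"c{i}_{rep}"))
        for i, rep in enumerate(reps, start=1)
        if rep
    ]
```

Cluster numbering is kept from the reps list, so a skipped cluster leaves a gap (c1, c2, c4) exactly as in the log.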

OLD: It reads the out folder created by evoClustRNA.py, with a layout such as out/structures/<homologs>.

usage: evoClust_get_models.py [-h] [-i INPUT_DIR] [-o OUTPUT_PREFIX] [-c] [-s]
                              [-u] [-n NATIVE_SEQ_ONLY]
                              clustix_results_fn

Positional Arguments

`clustix_results_fn`

Named Arguments

`-i, --input_dir`
	input folder with structures, e.g. test_data Default: “out”
`-o, --output_prefix`
	output folder where motifs and structures will be saved, e.g. test_out/rp14 Default: “”
`-c, --use-cutoff-for-names`
	Default: False
`-s, --skip_motifs`
	Default: False
`-u, --skip_structures`
	Default: False
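`-n, --native-seq-only`
	in each cluster, pick as the representative a structure whose name starts with this prefix; if there is none, an empty entry is recorded for that cluster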

Python Classes used in the scripts

RNAmodel

class RNAmodel.RNAmodel(fpath, residues, save=False, output_dir='')

Example:

    >>> rna = RNAmodel("test_data/rp14/rp14_5ddp_bound_clean_ligand.pdb", [1], False, None)
    >>> rna.get_report()
    "File: rp14_5ddp_bound_clean_ligand.pdb # of atoms: 1 \nresi: 1 atom: <Atom C3'> \n"

Parameters:
- fpath – file path, string
- residues – list of residues to use (since only one atom per residue, C3’, is taken, this equals the number of atoms)
- save – boolean, whether to save to output_dir
- output_dir – string; if save is True, segments are saved to this folder

get_report(): Return a short report (a string) about the RNA model.

get_rmsd_to(other_rnamodel, output='', dont_move=False): Calculate the P-atom-based RMSD to another RNA model.

save(output_dir, verbose=True): Save structures and motifs.
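The documentation does not say how get_rmsd_to superimposes the two models. The dont_move flag suggests superposition is the default, so the sketch below assumes the standard Kabsch superposition followed by RMSD over matched atoms. It works on plain (N, 3) coordinate arrays rather than RNAmodel objects, and `superposed_rmsd` is a hypothetical name.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: centre both sets, find the rotation that best maps
    A onto B, then compute the RMSD of the rotated A against B.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of the same shape")
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    # SVD of the 3x3 covariance matrix between the centred sets.
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a_c @ rot.T - b_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a model gives an RMSD of (numerically) zero, which is a quick sanity check for any superposition code.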

RNAalignment

Example:

    # STOCKHOLM 1.0

    AACY023581040                --CGUUGGACU------AAA--------AGUCGGAAGUAAGC-----AAU-C------GCUGAAGCAACGC---
    AJ630128                     AUCGUUCAUUCGCUAUUCGCA-AAUAGCGAACGCAA--AAG------CCG-A-------CUGAAGGAACGGGAC
    target                       --CGUUGACCCAG----GAAA-----CUGGGCGGAAGUAAGGCCCAUUGCACUCCGGGCCUGAAGCAACGCG--
    #=GC SS_cons                 ::(((((,<<<<<<<.._____..>>>>>>>,,,,,,,,<<<<...._____.....>>>>,,,,)))))::::
    x                            --xxxxxxxxx-----------------xxxxxxxx--xxx------------------xxxxxxxxxxx----
    #=GC RF                      AUCGUUCAuCucccc..uuuuu..ggggaGaCGGAAGUAGGca....auaaa.....ugCCGAAGGAACGCguu
    //

The x line is used to select the residues for the RMSD calculation.

The x line could be renamed to EvoClust.
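A minimal sketch of reading a single-block (non-interleaved) Stockholm file like the example above, treating the x line as an ordinary named row and collecting #=GC lines separately. It is not the RNAalignment parser; `parse_stockholm` is a hypothetical name, and interleaved alignments and #=GF/#=GS/#=GR markup are not handled.

```python
def parse_stockholm(text):
    """Parse a single-block Stockholm alignment.

    Returns (rows, gc): rows maps sequence names (including the 'x'
    selection line) to aligned strings, in file order; gc maps #=GC tags
    (e.g. 'SS_cons', 'RF') to their strings.
    """
    rows, gc = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "//" or line.startswith("# STOCKHOLM"):
            continue
        if line.startswith("#=GC"):
            _, tag, value = line.split(None, 2)
            gc[tag] = value
        elif line.startswith("#"):
            continue  # other markup is ignored in this sketch
        else:
            name, seq = line.split(None, 1)
            rows[name] = seq.strip()
    return rows, gc
```

Because dicts keep insertion order, `list(rows)[-1]` is the last sequence line, which for the example above is the x line.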

class RNAalignment.RNAalignment(fn)

RNAalignment

get_range(seqid, offset=0, verbose=True)

Get a list of positions of the selected residues, based on the last line of the alignment.

If seqid is not found in the alignment, an exception is raised, e.g.:

Exception: Seq not found in the alignment: 'CP000879.1/21644622164546

Warning

The EvoClust (x) line has to be the last sequence line (index -1) in the alignment.
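How get_range turns the x line into residue positions can be sketched on plain strings: walk the aligned sequence, number its non-gap residues, and keep those that sit under an 'x'. Unlike the real method, which looks a sequence up by seqid, this version takes the aligned sequence and x line directly; 1-based numbering and adding `offset` to each number are assumptions.

```python
def get_range(aligned_seq, x_line, offset=0):
    """Return residue numbers (1-based, plus `offset`) of the non-gap
    residues in `aligned_seq` that sit under an 'x' in `x_line`."""
    if len(aligned_seq) != len(x_line):
        raise ValueError("sequence and x line differ in length")
    positions = []
    resnum = 0
    for nt, mark in zip(aligned_seq, x_line):
        if nt in "-.":
            continue  # gaps have no residue number
        resnum += 1
        if mark == "x":
            positions.append(resnum + offset)
    return positions
```

For `get_range("AC-GU", "-xxx-")` the result is `[2, 3]`: the gap column under the middle 'x' is skipped, so only C (residue 2) and G (residue 3) are selected.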