prot_db¶
The prot_db module contains classes to handle the protein file and the protein descriptions, which can be either generated by Prodigal or provided by Gembase. It also provides an interface that abstracts how protein sequences and descriptions are retrieved.
-
class
integron_finder.prot_db.
ProteinDB
(replicon, cfg, prot_file=None)[source]¶ Abstract class defining the interface for ProteinDB. ProteinDB provides an abstraction and a way to access the proteins corresponding to the replicon/contig CDS.
-
__getitem__
(prot_seq_id)[source]¶ Parameters: prot_seq_id (str) – the id of a protein sequence
Returns: the sequence corresponding to prot_seq_id
Return type: Bio.SeqRecord object
Raises: KeyError – when prot_seq_id does not match any sequence in the DB
-
__init__
(replicon, cfg, prot_file=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
__iter__
()[source]¶ Returns: a generator that iterates over the seq_ids of the proteins that constitute the contig. Return type: generator
-
__weakref__
¶ list of weak references to the object (if defined)
-
_make_db
()[source]¶ Returns: an index of the sequences contained in protfile corresponding to the replicon
-
_make_protfile
()[source]¶ Create a FASTA file with the proteins corresponding to the nucleic sequence (replicon)
Returns: the path of the created protein file Return type: str
-
get_description
(gene_id)[source]¶ Parameters: gene_id (str) – a protein/gene identifier
Returns: The description of the protein corresponding to the gene_id
Return type: SeqDesc namedtuple object
Raises: - IntegronError – when gene_id is not a valid Gembase gene identifier
- KeyError – if gene_id is not found in GembaseDB instance
-
protfile
¶ Returns: The absolute path to the protein file corresponding to contig id Return type: str
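The interface described above can be pictured with a minimal sketch. This is not the integron_finder implementation: `ToyProteinDB` and `InMemoryDB` are hypothetical names, the storage is a plain dict, and strings stand in for the Bio.SeqRecord objects that the real classes return.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Same field order as the SeqDesc namedtuple documented in this module.
SeqDesc = namedtuple("SeqDesc", ("id", "strand", "start", "stop"))


class ToyProteinDB(ABC):
    """Hypothetical stand-in for the ProteinDB interface (not the real class)."""

    @abstractmethod
    def __getitem__(self, prot_seq_id):
        """Return the sequence for prot_seq_id, or raise KeyError."""

    @abstractmethod
    def __iter__(self):
        """Iterate over the protein sequence ids."""

    @abstractmethod
    def get_description(self, gene_id):
        """Return a SeqDesc for gene_id, or raise KeyError."""


class InMemoryDB(ToyProteinDB):
    """Dict-backed implementation, for illustration only."""

    def __init__(self, records):
        # records maps seq_id -> (sequence, strand, start, stop)
        self._records = dict(records)

    def __getitem__(self, prot_seq_id):
        return self._records[prot_seq_id][0]

    def __iter__(self):
        return (seq_id for seq_id in self._records)

    def get_description(self, gene_id):
        _, strand, start, stop = self._records[gene_id]
        return SeqDesc(gene_id, strand, start, stop)


db = InMemoryDB({"contig_1": ("MKT", 1, 2, 181), "contig_2": ("MSA", -1, 300, 500)})
ids = list(db)
desc = db.get_description("contig_2")
```

Code written against such an interface only needs `db[seq_id]`, iteration, and `get_description`, so it does not have to know whether the proteins came from Prodigal or from a Gembase.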
-
-
class
integron_finder.prot_db.
ProdigalDB
(replicon, cfg, prot_file=None)[source]¶ Creates proteins from a replicon/contig using Prodigal and provides facilities to access them.
-
__getitem__
(prot_seq_id)[source]¶ Parameters: prot_seq_id (str) – the id of a protein sequence
Returns: the sequence corresponding to prot_seq_id
Return type: Bio.SeqRecord object
-
__iter__
()[source]¶ Returns: a generator that iterates over the seq_ids of the proteins that constitute the contig. Return type: generator
-
_make_protfile
()[source]¶ Use prodigal to generate proteins corresponding to the replicon
Returns: the path of the created protfile Return type: str
-
get_description
(gene_id)[source]¶ Parameters: gene_id (str) – a protein/gene identifier
Returns: The description of the protein corresponding to the gene_id
Return type: SeqDesc namedtuple object
Raises: - IntegronError – when gene_id is not a valid Gembase gene identifier
- KeyError – if gene_id is not found in ProdigalDB instance
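As an illustration of how a description could be derived from Prodigal output, the sketch below parses a protein FASTA header into a SeqDesc. It assumes the usual Prodigal header layout `id # start # stop # strand # attributes`. `parse_prodigal_header` is a hypothetical helper, not a function of this module, and it raises ValueError where the documented method raises IntegronError.

```python
from collections import namedtuple

# Local re-creation of SeqDesc with the documented field order.
SeqDesc = namedtuple("SeqDesc", ("id", "strand", "start", "stop"))


def parse_prodigal_header(header):
    """Hypothetical helper: turn a Prodigal FASTA header into a SeqDesc.

    Assumes the header layout 'id # start # stop # strand # attributes'.
    """
    fields = header.lstrip(">").split(" # ")
    if len(fields) < 4:
        raise ValueError(f"not a Prodigal-style header: {header!r}")
    seq_id, start, stop, strand = fields[:4]
    # SeqDesc stores the strand before the coordinates.
    return SeqDesc(seq_id, int(strand), int(start), int(stop))


# Made-up header in the assumed layout.
example = ">contig_1_1 # 2 # 181 # 1 # ID=1_1;partial=10;start_type=Edge"
desc = parse_prodigal_header(example)
```

Note that the header lists the coordinates before the strand, while SeqDesc puts the strand second, so the fields are reordered on construction.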
-
-
class
integron_finder.prot_db.
GembaseDB
(replicon, cfg, prot_file=None)[source]¶ Implements ProteinDB from a Gembase. Manages the proteins from the Proteins directory corresponding to a replicon/contig.
-
__getitem__
(prot_seq_id)[source]¶ Parameters: prot_seq_id (str) – the id of a protein sequence
Returns: the sequence corresponding to prot_seq_id
Return type: Bio.SeqRecord object
-
__init__
(replicon, cfg, prot_file=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
__iter__
()[source]¶ Returns: a generator that iterates over the seq_ids of the proteins that constitute the contig. Return type: generator
-
_make_protfile
()[source]¶ Create a FASTA file with the proteins corresponding to this sequence, from the corresponding Gembase protfile. This step is necessary because in a Gembase Draft one nucleic file can contain several contigs, but all proteins are in the same file.
Returns: the path of the created protein file Return type: str
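This step can be pictured as a filter over a shared protein FASTA file. The sketch below is an illustration under stated assumptions, not the GembaseDB code: `extract_proteins` is a hypothetical helper, and the rule that decides which proteins belong to a contig is passed in by the caller because the actual Gembase matching rule is not described here.

```python
import io


def extract_proteins(fasta_in, fasta_out, belongs_to_contig):
    """Copy the FASTA records whose id satisfies belongs_to_contig.

    fasta_in and fasta_out are text file objects; belongs_to_contig is a
    caller-supplied predicate on the record id (hypothetical rule).
    Returns the number of records written.
    """
    keep = False
    written = 0
    for line in fasta_in:
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = belongs_to_contig(seq_id)
            if keep:
                written += 1
        if keep:
            fasta_out.write(line)
    return written


# Toy data: proteins of two contigs stored in one file (ids are made up).
all_proteins = io.StringIO(
    ">ctgA_00001 first\nMKT\nLLV\n>ctgB_00001 second\nMSA\n>ctgA_00002 third\nMGG\n"
)
contig_a = io.StringIO()
count = extract_proteins(all_proteins, contig_a, lambda sid: sid.startswith("ctgA_"))
```

Streaming line by line keeps memory use flat even when the shared protein file is large.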
-
_parse_lst
()[source]¶ Parse the LSTINFO file and extract the information specific to the replicon.
-
static
gembase_complete_parser
(lst_path, sequence_id)[source]¶ Parameters: - lst_path (str) – the path of the LSTINFO file from a Gembase Complet
- sequence_id (str) – the id of the genomic sequence to analyse
Returns: the information related to the ‘valid’ CDS corresponding to the sequence_id
Return type: pandas.DataFrame object
-
static
gembase_draft_parser
(lst_path, replicon_id)[source]¶ Parameters: - lst_path (str) – the path of the LSTINFO file from a Gembase Draft
- replicon_id (str) – the id of the genomic sequence to analyse
Returns: the information related to the ‘valid’ CDS corresponding to the replicon_id
Return type: pandas.DataFrame object
-
static
gembase_sniffer
(lst_path)[source]¶ Detect the type of gembase.
Parameters: lst_path (str) – the path to the LSTINFO file corresponding to the nucleic sequence
Returns: either ‘Complet’ or ‘Draft’
-
get_description
(gene_id)[source]¶ Parameters: gene_id (str) – a protein/gene identifier
Returns: The description of the protein corresponding to the gene_id
Return type: SeqDesc namedtuple object
Raises: - IntegronError – when gene_id is not a valid Gembase gene identifier
- KeyError – if gene_id is not found in GembaseDB instance
-
-
class
integron_finder.prot_db.
SeqDesc
(id, strand, start, stop)¶
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, id, strand, start, stop)¶ Create new instance of SeqDesc(id, strand, start, stop)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new SeqDesc object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new SeqDesc object replacing specified fields with new values
-
id
¶ Alias for field number 0
-
start
¶ Alias for field number 2
-
stop
¶ Alias for field number 3
-
strand
¶ Alias for field number 1
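The members listed above are the standard namedtuple API. A short sketch of how they behave, using a local re-creation of SeqDesc with the documented field order (the values are made up):

```python
from collections import namedtuple

# Local re-creation of SeqDesc with the documented field order.
SeqDesc = namedtuple("SeqDesc", ("id", "strand", "start", "stop"))

desc = SeqDesc(id="prot_1", strand=-1, start=120, stop=480)

# Fields are reachable by name or by position (id is field 0, strand is field 1).
same_strand = desc.strand == desc[1]

# _make builds an instance from any iterable with four items.
from_list = SeqDesc._make(["prot_2", 1, 500, 800])

# _replace returns a modified copy; the original tuple is unchanged.
shifted = desc._replace(start=100)

# _asdict returns a mapping from field names to values.
as_dict = desc._asdict()
```

Because a SeqDesc is a tuple, it can also be unpacked directly, for example `seq_id, strand, start, stop = desc`.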
-