agtools API Reference
UnitigGraph
Represents a unitig-level assembly graph parsed from a GFA file.
| Attributes: |
|
|---|
Methods:
| Name | Description |
|---|---|
| from_gfa | Parse a GFA file into a UnitigGraph object. |
| get_segment_sequence | Retrieve a DNA sequence for a segment. |
| get_neighbors | Get neighboring segments of a given segment. |
| get_adjacency_matrix | Return the adjacency matrix as a matrix or a pandas DataFrame. |
| is_connected | Check if there is a path between two segments in the graph. |
| get_connected_components | Get connected components of the graph. |
| calculate_average_node_degree | Calculate the average node degree of the graph. |
| calculate_total_length | Calculate the total length of all segments in the graph. |
| calculate_average_segment_length | Calculate the average segment length. |
| calculate_n50_l50 | Calculate N50 and L50 for the segments in the graph. |
| get_gc_content | Calculate the GC content of segment sequences. |
Examples:
>>> from agtools.core.unitig_graph import UnitigGraph
>>> ug = UnitigGraph.from_gfa("assembly.gfa")
>>> ug.vcount
42
>>> ug.ecount
80
References
GFA: Graphical Fragment Assembly (GFA) Format Specification https://github.com/GFA-spec/GFA-spec
calculate_average_node_degree()
Calculate the average node degree of the graph.
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> ug.calculate_average_node_degree()
2.576374745417515
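As a rough illustration of what an average node degree measures, the sketch below applies the handshake identity for an undirected graph (mean degree = 2 × edges / vertices). The function name `average_node_degree` and the `ValueError` for an empty graph are assumptions made for this example; agtools may compute the value and report errors differently.

```python
def average_node_degree(vertex_count: int, edge_count: int) -> float:
    """Mean degree of an undirected graph.

    Every edge contributes one to the degree of each of its two endpoints,
    so the degrees sum to 2 * edge_count.
    """
    if vertex_count == 0:
        # Assumption for this sketch: an empty graph is treated as an error.
        raise ValueError("graph has no vertices")
    return 2 * edge_count / vertex_count


# A graph with 4 vertices and 3 edges (for example, a simple chain).
print(average_node_degree(4, 3))
```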
calculate_average_segment_length()
Calculate the average segment length.
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> ug.calculate_average_segment_length()
8490.319755600814
calculate_n50_l50()
Calculate N50 and L50 for the segments in the graph.
| Returns: |
|
|---|
Examples:
>>> ug.calculate_n50_l50()
(15000, 12)
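For readers unfamiliar with these statistics, here is a minimal sketch of the usual definition: sort the lengths from longest to shortest and accumulate them until at least half of the total length is covered. N50 is the length reached at that point and L50 is the number of sequences needed. The helper `n50_l50` is hypothetical and works on a plain list of lengths; it is not the agtools implementation.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    if not lengths:
        # Assumption for this sketch: an empty input is an error.
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is N50; `count` sequences were needed, which is L50.
            return length, count


# Total length is 290, so half is 145; 80 + 70 = 150 covers it.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```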
calculate_total_length()
Calculate the total length of all segments in the graph.
| Returns: |
|
|---|
Examples:
>>> ug.calculate_total_length()
350000
from_gfa(file_path)
classmethod
Parse a GFA file into a UnitigGraph object.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Examples:
>>> ug = UnitigGraph.from_gfa("assembly.gfa")
>>> ug.vcount
42
>>> ug.ecount
80
get_adjacency_matrix(type='matrix')
Return the adjacency matrix as a matrix or a pandas DataFrame.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> matrix = ug.get_adjacency_matrix()
>>> isinstance(matrix, list)
True
>>> df = ug.get_adjacency_matrix(type="pandas")
>>> df.head()
unitig_1 unitig_2 unitig_3
unitig_1 0 1 0
unitig_2 1 0 1
unitig_3 0 1 0
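To make the layout of the matrix concrete, the following sketch builds a symmetric 0/1 adjacency matrix from a list of names and undirected links. It reproduces the three-unitig pattern shown above. The function `adjacency_matrix` and the choice to record parallel links as a single 1 are assumptions for illustration; the library's own matrix may count edges differently.

```python
def adjacency_matrix(names, edges):
    """Build a symmetric 0/1 adjacency matrix as a list of lists."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for left, right in edges:
        i, j = index[left], index[right]
        # Undirected link: mark both directions.
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


names = ["unitig_1", "unitig_2", "unitig_3"]
edges = [("unitig_1", "unitig_2"), ("unitig_2", "unitig_3")]
for name, row in zip(names, adjacency_matrix(names, edges)):
    print(name, row)
```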
get_connected_components()
Get connected components of the graph.
| Returns: |
|
|---|
Examples:
>>> components = ug.get_connected_components()
>>> len(components)
3
>>> [len(c) for c in components]
[10, 8, 5]
>>> components[0]
[0, 1, 2, 3, ...]
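Connected components group vertices that can reach one another. The sketch below finds them with a breadth-first search over a plain adjacency dictionary keyed by vertex ID. The name `connected_components` and the dictionary input are assumptions for this example (every vertex is expected to appear as a key); the library delegates this work to its underlying graph object.

```python
from collections import deque


def connected_components(adjacency):
    """Return a list of components, each a sorted list of vertex IDs."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Explore everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
print(connected_components(graph))  # [[0, 1, 2], [3, 4], [5]]
```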
get_gc_content()
Calculate the GC content of segment sequences.
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> round(ug.get_gc_content(), 2)
0.42
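GC content is the fraction of bases that are G or C. The sketch below pools all sequences and divides the G and C count by the total number of bases. Whether agtools pools sequences this way or averages per-sequence values is not stated here, so treat `gc_content` and its `ValueError` as assumptions for illustration.

```python
def gc_content(sequences):
    """Fraction of G and C bases across all given sequences."""
    gc = 0
    total = 0
    for seq in sequences:
        upper = seq.upper()  # treat soft-masked (lowercase) bases the same
        gc += upper.count("G") + upper.count("C")
        total += len(upper)
    if total == 0:
        # Assumption for this sketch: no bases at all is an error.
        raise ValueError("no sequence data")
    return gc / total


# 6 of the 12 bases are G or C.
print(gc_content(["ATGC", "GGCC", "aatt"]))  # 0.5
```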
get_neighbors(seg_name)
Get neighboring segments of a given segment.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Examples:
>>> ug.get_neighbors("unitig_1")
['unitig_2', 'unitig_3']
get_path(path_name)
Retrieve the segment string and overlaps string of a path.
This method retrieves the segment string and overlaps string of a path from the original GFA file using byte offsets.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> ug.get_path("path_1")
('unitig_1+,unitig_2+,unitig_3+', '*')
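The two strings returned above correspond to the SegmentNames and Overlaps fields of a GFA 1 path (P) record. As a sketch of that record layout only, the hypothetical helpers below split a P line into its fields and break the segment string into (name, orientation) pairs. They do not show how agtools locates the line by byte offset.

```python
def parse_path_line(line):
    """Split a GFA 1 'P' line into (path name, segment string, overlaps string)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4 or fields[0] != "P":
        raise ValueError("not a GFA path (P) line")
    return fields[1], fields[2], fields[3]


def split_oriented_segments(segment_string):
    """Turn 'a+,b-' into [('a', '+'), ('b', '-')]."""
    return [(item[:-1], item[-1]) for item in segment_string.split(",")]


name, segments, overlaps = parse_path_line(
    "P\tpath_1\tunitig_1+,unitig_2+,unitig_3+\t*\n"
)
print(segments, overlaps)
print(split_oriented_segments(segments))
```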
get_segment_sequence(seg_name)
Retrieve a DNA sequence for a segment.
This method retrieves the sequence of a segment from the original GFA file using byte offsets, without loading all sequences into memory at once.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> ug.get_segment_sequence("unitig_1")[:10]
Seq('ATGCGTACGG')
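The byte-offset approach described above can be sketched as follows: scan the GFA file once, remember where each segment (S) line starts, then seek directly to that position when a sequence is requested. The functions `index_segments` and `fetch_segment_sequence` are hypothetical names, and the sketch returns a plain `str` rather than a `Seq` object; the library's internals may differ.

```python
import os
import tempfile


def index_segments(gfa_path):
    """Map each segment name to the byte offset of its 'S' line."""
    offsets = {}
    with open(gfa_path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b"S\t"):
                name = line.split(b"\t", 2)[1].decode()
                offsets[name] = offset
    return offsets


def fetch_segment_sequence(gfa_path, offsets, seg_name):
    """Seek to one 'S' line and return its sequence field."""
    with open(gfa_path, "rb") as handle:
        handle.seek(offsets[seg_name])  # KeyError if the name is unknown
        fields = handle.readline().rstrip(b"\r\n").split(b"\t")
    return fields[2].decode()


# Small demonstration file (contents invented for this example).
gfa_bytes = (
    b"H\tVN:Z:1.0\n"
    b"S\tunitig_1\tATGCGTACGGTT\n"
    b"S\tunitig_2\tGGCC\n"
    b"L\tunitig_1\t+\tunitig_2\t+\t0M\n"
)
with tempfile.NamedTemporaryFile(suffix=".gfa", delete=False) as tmp:
    tmp.write(gfa_bytes)
    demo_path = tmp.name

try:
    offsets = index_segments(demo_path)
    first_ten = fetch_segment_sequence(demo_path, offsets, "unitig_1")[:10]
finally:
    os.remove(demo_path)

print(first_ten)  # ATGCGTACGG
```

Only the name-to-offset dictionary stays in memory; each sequence is read from disk when it is asked for.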
is_connected(from_seg, to_seg)
Check if there is a path between two segments in the graph.
This method determines whether a path exists between the segment
specified by from_seg and the segment specified by to_seg
using the underlying graph's shortest path search.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Examples:
>>> ug.is_connected("unitig_1", "unitig_2")
True
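A path-existence check built on a shortest-path search can be sketched with a breadth-first search: if the search reaches the target, a path exists and its length in edges is known; if the queue empties first, the two segments are not connected. The names `shortest_path_length` and `has_path` and the dictionary-based graph are assumptions for illustration, standing in for the underlying graph library.

```python
from collections import deque


def shortest_path_length(adjacency, source, target):
    """Number of edges on a shortest path, or None if unreachable."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None


def has_path(adjacency, source, target):
    """True when some path links source and target."""
    return shortest_path_length(adjacency, source, target) is not None


graph = {
    "unitig_1": ["unitig_2"],
    "unitig_2": ["unitig_1", "unitig_3"],
    "unitig_3": ["unitig_2"],
    "unitig_4": [],
}
print(has_path(graph, "unitig_1", "unitig_3"))  # True
print(has_path(graph, "unitig_1", "unitig_4"))  # False
```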
ContigGraph
Represents a contig-level assembly graph derived from a GFA file.
| Attributes: |
|
|---|
Methods:
| Name | Description |
|---|---|
| get_contig_sequence | Retrieve a DNA sequence for a contig. |
| get_neighbors | Get neighboring contigs of a given contig. |
| get_adjacency_matrix | Return the adjacency matrix as a matrix or a pandas DataFrame. |
| is_connected | Check if there is a path between two contigs in the graph. |
| get_connected_components | Get connected components of the graph. |
| calculate_average_node_degree | Calculate the average node degree of the graph. |
| calculate_total_length | Calculate the total length of all contigs in the graph. |
| calculate_average_contig_length | Calculate the average contig length. |
| calculate_n50_l50 | Calculate N50 and L50 for the contigs in the graph. |
| get_gc_content | Calculate the GC content of contig sequences. |
calculate_average_contig_length()
Calculate the average contig length.
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> cg.calculate_average_contig_length()
40000
calculate_average_node_degree()
Calculate the average node degree of the graph.
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> cg.calculate_average_node_degree()
1
calculate_n50_l50()
Calculate N50 and L50 for the contigs in the graph.
| Returns: |
|
|---|
Examples:
>>> cg.calculate_n50_l50()
(15000, 12)
calculate_total_length()
Calculate the total length of all contigs in the graph.
| Returns: |
|
|---|
Examples:
>>> cg.calculate_total_length()
120000
get_adjacency_matrix(type='matrix')
Return the adjacency matrix as a matrix or a pandas DataFrame.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> matrix = cg.get_adjacency_matrix()
>>> isinstance(matrix, list)
True
>>> df = cg.get_adjacency_matrix(type="pandas")
>>> df.head()
contig_1 contig_2 contig_3
contig_1 0 1 0
contig_2 1 0 1
contig_3 0 1 0
get_connected_components()
Get connected components of the graph.
| Returns: |
|
|---|
Examples:
>>> components = cg.get_connected_components()
>>> len(components)
3
>>> [len(c) for c in components]
[10, 8, 5]
>>> components[0]
[0, 1, 2, 3, ...]
get_contig_sequence(contig_name)
Retrieve a DNA sequence for a contig.
This method retrieves the sequence of a contig from the contigs file using byte offsets, without loading all sequences into memory at once.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Examples:
>>> cg.get_contig_sequence("contig_1")
Seq('TTGATGCGACGTACGG')
get_gc_content()
Calculate the GC content of contig sequences.
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> cg.get_gc_content()
0.42
get_neighbors(contig_name)
Get neighboring contigs of a given contig.
| Parameters: |
|
|---|
| Returns: |
|
|---|
Examples:
>>> cg.get_neighbors("contig_1")
['contig_2', 'contig_3']
is_connected(from_contig, to_contig)
Check if there is a path between two contigs in the graph.
This method determines whether a path exists between the contig
specified by from_contig and the contig specified by to_contig
using the underlying graph's shortest path search.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> cg.is_connected("contig_1", "contig_2")
True
FastaParser
A minimal, lightweight FASTA parser with on-demand sequence retrieval.
This parser builds an index that maps sequence names to byte offsets in the file, allowing sequences to be fetched lazily without loading the entire FASTA file into memory. It works with both plain-text and gzip-compressed (.gz) FASTA files.
| Attributes: |
|
|---|
Methods:
| Name | Description |
|---|---|
| get_sequence | Retrieve a DNA sequence by sequence name. |
| get_index | Retrieve the file pointer of the DNA sequence by sequence name. |
Examples:
>>> from agtools.core.fasta_parser import FastaParser
>>> parser = FastaParser("contigs.fasta")
get_index(seq_name)
Retrieve the file pointer of the DNA sequence by sequence name.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> parser.get_index("contig_1")
8487228
get_sequence(seq_name)
Retrieve a DNA sequence by sequence name.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
Examples:
>>> seq = parser.get_sequence("contig_1")
>>> len(seq)
1500
>>> seq[:10]
Seq('TGGCTCTTCA')
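The on-demand retrieval that FastaParser provides can be sketched for plain-text FASTA as follows. The index records, for each header, the byte position where its sequence lines begin; a fetch seeks there and reads until the next header. The names `build_fasta_index` and `fetch_sequence` are hypothetical. Whether the library's stored offset points at the header or at the first sequence line is an assumption here, and gzip input is not covered.

```python
import os
import tempfile


def build_fasta_index(fasta_path):
    """Map each sequence name to the byte offset just after its header line."""
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                parts = line[1:].split()
                if parts:
                    # The name is the first whitespace-delimited token.
                    index[parts[0].decode()] = handle.tell()
    return index


def fetch_sequence(fasta_path, index, seq_name):
    """Read one record's sequence lines, stopping at the next header."""
    chunks = []
    with open(fasta_path, "rb") as handle:
        handle.seek(index[seq_name])  # KeyError if the name is unknown
        for line in handle:
            if line.startswith(b">"):
                break
            chunks.append(line.strip())
    return b"".join(chunks).decode()


# Small demonstration file (contents invented for this example).
fasta_bytes = b">contig_1 len=16\nTTGATGCG\nACGTACGG\n>contig_2\nGGCC\n"
with tempfile.NamedTemporaryFile(suffix=".fasta", delete=False) as tmp:
    tmp.write(fasta_bytes)
    demo_fasta = tmp.name

try:
    fasta_index = build_fasta_index(demo_fasta)
    contig_1 = fetch_sequence(demo_fasta, fasta_index, "contig_1")
    contig_2 = fetch_sequence(demo_fasta, fasta_index, "contig_2")
finally:
    os.remove(demo_fasta)

print(contig_1)  # TTGATGCGACGTACGG
print(contig_2)  # GGCC
```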