Filtering segments from an assembly graph
agtools can filter an assembly graph by minimum segment length. Segments shorter than the minimum length are removed, along with every link, jump, containment, path, and walk that references them. Filtering is available through the filter subcommand of the command-line interface; please refer to the CLI reference for further details on the filter subcommand.
Here is an example GFA file.
# Test GFA file
H VN:Z:1.0
# Segments
S seq1 ATGCGTATGCGTATGCGTAA
S seq2 CGTACTGACTGACTGACTGA
S seq3 TGCATGCATGCATGCATGCA
S seq4 ACGTACGTACGTACGTACGT
S seq5 GTACGTACGTACGTACGTAC
S seq6 CGTACGTACGTACGTACGTA
S seq7 TACGTACGTACGTACGTACG
S seq8 ATCGATCGATCGATCGATCG
S seq9 GCGTGCGTGCGTGCGTGCGT
S seq10 TTGCTTGCTTGCTTGCTTGC
S seqX ACGTACGTAC
L seqX + seq2 + 5M
L seq4 + seq5 + 10M
L seq6 - seq7 + 7M
L seq9 + seq10 - 5M
J seq5 + seqX - *
J seq3 - seq8 + *
J seq7 + seq2 + *
C seq1 + seqX - 5 10M
C seq2 + seqX + 2 10M
P seqpath1 seq1+,seqX+,seq3- 20M,10M,20M
P seqpath2 seq4+,seq5+,seq6+ 20M,20M,20M
W seqread1 0 * <seq6<seqX>seq9
W seqread2 0 * <seq1>seq2<seq3
To remove segments shorter than 15 bp from this file (saved here as test_graph.gfa), run the following command.
agtools filter -g test_graph.gfa -l 15 -o results/filtered_graph.gfa
The filtered graph, written to results/filtered_graph.gfa, looks as follows.
# Test GFA file
H VN:Z:1.0
# Segments
S seq1 ATGCGTATGCGTATGCGTAA
S seq2 CGTACTGACTGACTGACTGA
S seq3 TGCATGCATGCATGCATGCA
S seq4 ACGTACGTACGTACGTACGT
S seq5 GTACGTACGTACGTACGTAC
S seq6 CGTACGTACGTACGTACGTA
S seq7 TACGTACGTACGTACGTACG
S seq8 ATCGATCGATCGATCGATCG
S seq9 GCGTGCGTGCGTGCGTGCGT
S seq10 TTGCTTGCTTGCTTGCTTGC
L seq4 + seq5 + 10M
L seq6 - seq7 + 7M
L seq9 + seq10 - 5M
J seq3 - seq8 + *
J seq7 + seq2 + *
P seqpath2 seq4+,seq5+,seq6+ 20M,20M,20M
W seqread2 0 * <seq1>seq2<seq3
Note
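seqX (10 bp) is the only segment shorter than 15 bp, so it has been removed from the assembly graph together with all links, jumps, containments, paths, and walks containing it. Paths and walks are dropped whole: seqpath1 and seqread1 are gone even though their other segments (seq1, seq3, seq6, seq9) remain in the graph.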
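The filtering rule described on this page can be sketched in a few lines of Python. This is only an illustration of the behaviour shown in the example, not the agtools implementation: the function names filter_gfa_lines and referenced_segments are hypothetical, the sketch splits records on any whitespace (real GFA files are tab-separated), and it assumes every S record stores its sequence inline rather than as `*`.

```python
import re


def referenced_segments(fields):
    """Return the segment names that a non-segment GFA record refers to."""
    kind = fields[0]
    if kind in ("L", "J", "C"):
        # Links, jumps and containments name two segments.
        return [fields[1], fields[3]]
    if kind == "P":
        # Path steps look like "seq1+,seqX+,seq3-": drop the orientation sign.
        return [step[:-1] for step in re.split(r"[,;]", fields[2]) if step]
    if kind == "W":
        # Walk steps look like "<seq6<seqX>seq9"; take the first field
        # that starts with an orientation marker.
        walk = next((f for f in fields[1:] if f[0] in "<>"), "")
        return re.findall(r"[<>]([^<>]+)", walk)
    return []  # headers and unknown records reference no segments


def filter_gfa_lines(lines, min_length):
    """Drop segments shorter than min_length and every record that uses them.

    Hypothetical helper for illustration; assumes well-formed records.
    """
    lines = list(lines)

    # First pass: collect the names of all segments below the cutoff.
    short = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "S" and len(fields[2]) < min_length:
            short.add(fields[1])

    # Second pass: keep comments, surviving segments, and any record
    # that does not mention a removed segment.
    kept = []
    for line in lines:
        fields = line.split()
        if not fields or line.startswith("#"):
            kept.append(line)
        elif fields[0] == "S":
            if fields[1] not in short:
                kept.append(line)
        elif short.isdisjoint(referenced_segments(fields)):
            kept.append(line)
    return kept


# A subset of the example graph from this page.
example = """\
S seq1 ATGCGTATGCGTATGCGTAA
S seq2 CGTACTGACTGACTGACTGA
S seqX ACGTACGTAC
L seqX + seq2 + 5M
J seq7 + seq2 + *
C seq1 + seqX - 5 10M
P seqpath1 seq1+,seqX+,seq3- 20M,10M,20M
W seqread2 0 * <seq1>seq2<seq3
""".splitlines()

for line in filter_gfa_lines(example, 15):
    print(line)
```

With a cutoff of 15, the sketch keeps seq1, seq2, the seq7/seq2 jump, and the seqread2 walk, and drops seqX along with the link, containment, and path that mention it. The two-pass design mirrors the rule above: the set of short segments must be complete before any other record is judged, because a record can appear in the file before the segment it references.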