PaSGAL: Parallel Sequence to Graph Alignment

Overview

PaSGAL (Parallel Sequence to Graph Aligner) is designed to accelerate alignment of sequences to directed acyclic sequence graphs (DAGs), such as variation graphs and splicing graphs. The underlying algorithm is a parallelization of the dynamic programming procedure for sequence-to-DAG alignment. Because computing exact alignments is compute-intensive, PaSGAL uses Advanced Vector Extensions (AVX) SIMD instructions and OpenMP to achieve high alignment performance on CPUs with multiple cores and wide SIMD units. Given a set of query sequences and a reference DAG, PaSGAL produces, for each query sequence, an optimal (highest-scoring) local alignment along a path in the graph. The software and its documentation are available on GitHub. In the paper [1], we demonstrate its utility as an accurate and scalable aligner for both short and long reads to large variation graphs (e.g., MHC).
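To make the dynamic programming idea concrete, here is a minimal sequential sketch in Python of local alignment of a query to a character-labeled DAG. It is an illustration only, not PaSGAL's implementation: it has no SIMD or multithreading, it returns only the best score (no traceback), and the function name `align_to_dag`, the input layout (one character per vertex, vertices given in topological order), and the scoring values (match +1, mismatch -1, linear gap -1) are all assumptions chosen for the example.

```python
def align_to_dag(labels, edges, query, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of `query` against any path in a DAG.

    labels: one character per vertex; vertices are assumed to be listed
            in topological order (hypothetical input layout).
    edges:  iterable of (u, v) pairs with u < v.
    """
    n, m = len(labels), len(query)
    preds = [[] for _ in range(n)]
    for u, v in edges:
        if not 0 <= u < v < n:
            raise ValueError("edges must go from a lower to a higher vertex index")
        preds[v].append(u)

    # H[v][j]: best score of a local alignment that ends at vertex v
    # and at query position j (query[:j] consumed). Column 0 stays 0.
    H = [[0] * (m + 1) for _ in range(n)]
    best = 0
    for v in range(n):  # topological order guarantees predecessors are done
        for j in range(1, m + 1):
            s = match if labels[v] == query[j - 1] else mismatch
            # Start a new alignment here, or skip a query character.
            score = max(0, s, H[v][j - 1] + gap)
            for u in preds[v]:
                # Extend from a predecessor: (mis)match, or skip vertex v.
                score = max(score, H[u][j - 1] + s, H[u][j] + gap)
            H[v][j] = score
            best = max(best, score)
    return best


# A small "bubble" graph: A -> {C, G} -> T, i.e. paths ACT and AGT.
labels = "ACGT"
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(align_to_dag(labels, edges, "AGT"))  # aligns fully along A-G-T
```

The only difference from the classic sequence-to-sequence recurrence is that each cell takes a maximum over all predecessor vertices instead of a single previous row, which is why the vertices must be processed in topological order.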

Use the following links to access the data sets that we used for evaluation [1]. These sequence and graph formats are readable by PaSGAL.

  • Simulated read sets
    • LRC : L1 (fasta), L2 (fastq), L3 (fastq)
    • MHC : M1 (fasta), M2 (fastq), M3 (fastq)
  • Variation graphs : LRC (vg), MHC1 (vg), MHC2 (txt)


Publication

  1. Chirag Jain, Sanchit Misra, Haowen Zhang, Alexander Dilthey and Srinivas Aluru. "Accelerating Sequence Alignment to Graphs", IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2019.