Analysis Pipeline

SynGraph builds a colored de Bruijn graph over gene tokens to detect conserved synteny blocks across multiple genomes.

Project Structure

syn/
├── crates/
│   ├── syn-core/          # Core library (data structures, algorithms)
│   │   ├── token.rs       # Gene token types
│   │   ├── genome.rs      # Genome/chromosome representation
│   │   ├── bed.rs         # TOGA2 BED12 parser
│   │   ├── kmer.rs        # Gene k-mer construction
│   │   ├── graph.rs       # Colored de Bruijn graph
│   │   ├── compaction.rs  # Unitig compaction
│   │   └── synteny.rs     # Synteny block detection
│   ├── syn-cli/           # Command-line interface
│   └── syn-web/           # WASM + Web UI
├── data/                  # Input/output data
└── Makefile              # Build commands
            

Pipeline Steps

1. Input: TOGA2 BED12 Files (Done)

Parse TOGA2 output files containing gene annotations with embedded orthology information.

Input Format

BED12 with name field: transcript#ortholog_gene#scores$fragment

Key Filters

  • Only fragment #1 (avoid duplicates from fragmented genes)
  • Single-copy orthogroups only (filter multi-copy)
  • Collapse overlapping features to longest representative
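
The name-field layout and the first two filters above can be sketched as follows. This is an illustrative sketch only, not the code in bed.rs: the TogaName struct and both function names are hypothetical, the scores part is kept as an uninterpreted string, and "single-copy" is assumed to mean that an ortholog gene appears exactly once within one genome's records. Collapsing overlapping features is left out.

```rust
use std::collections::HashMap;

/// Pieces of a name field laid out as `transcript#ortholog_gene#scores$fragment`.
/// Hypothetical type for illustration; not taken from syn-core.
#[derive(Debug, PartialEq)]
pub struct TogaName {
    pub transcript: String,
    pub ortholog_gene: String,
    pub scores: String, // kept raw; its internal format is not assumed here
    pub fragment: u32,
}

/// Parse one name field, returning None if it does not match the layout.
pub fn parse_toga_name(name: &str) -> Option<TogaName> {
    // Split off the trailing `$fragment` suffix first.
    let (head, fragment) = name.rsplit_once('$')?;
    let fragment = fragment.parse::<u32>().ok()?;
    // The remainder is three `#`-separated parts.
    let mut parts = head.splitn(3, '#');
    let transcript = parts.next()?;
    let ortholog_gene = parts.next()?;
    let scores = parts.next()?;
    if transcript.is_empty() || ortholog_gene.is_empty() {
        return None;
    }
    Some(TogaName {
        transcript: transcript.to_string(),
        ortholog_gene: ortholog_gene.to_string(),
        scores: scores.to_string(),
        fragment,
    })
}

/// Keep fragment 1 only, then drop every ortholog gene seen more than once
/// (assumed meaning of "single-copy" within one genome).
pub fn filter_records(records: Vec<TogaName>) -> Vec<TogaName> {
    let firsts: Vec<TogaName> = records.into_iter().filter(|r| r.fragment == 1).collect();
    let mut counts: HashMap<String, usize> = HashMap::new();
    for r in &firsts {
        *counts.entry(r.ortholog_gene.clone()).or_insert(0) += 1;
    }
    firsts
        .into_iter()
        .filter(|r| counts[&r.ortholog_gene] == 1)
        .collect()
}
```

Parsing from the right for the `$` separator keeps the sketch tolerant of `$` characters that might occur earlier in the name; whether real TOGA2 names need that tolerance is not confirmed here.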

2. Build Gene Graph (Done)

Construct a colored de Bruijn graph over gene tokens.

Process

For each genome, extract k-mers of consecutive genes. Add each k-mer as an edge, colored by the genome ID.

Parameters

  • k=3 (default): each k-mer is a window of 3 consecutive genes
  • Nodes are (k-1)-mers, i.e. 2-gene contexts when k=3
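
A minimal sketch of this construction, assuming genes are plain string tokens and genome IDs are integers. The ColoredGraph type and its methods are hypothetical and are not the API of graph.rs or kmer.rs; strand and canonical k-mer orientation are ignored, and each call is assumed to cover one chromosome so that windows never span chromosome boundaries.

```rust
use std::collections::{BTreeSet, HashMap};

pub type Gene = String;
pub type GenomeId = u32;

/// Hypothetical colored graph: each edge is a gene k-mer, and its color set
/// is the set of genomes in which that k-mer occurs.
#[derive(Default)]
pub struct ColoredGraph {
    pub edges: HashMap<Vec<Gene>, BTreeSet<GenomeId>>,
}

impl ColoredGraph {
    /// Slide a window of k consecutive genes along one chromosome and
    /// record every window as an edge colored by `genome`.
    pub fn add_chromosome(&mut self, genome: GenomeId, genes: &[Gene], k: usize) {
        if k < 2 {
            return; // nodes are (k-1)-mers, so k must be at least 2
        }
        // `windows` yields nothing when the chromosome has fewer than k genes.
        for window in genes.windows(k) {
            self.edges.entry(window.to_vec()).or_default().insert(genome);
        }
    }
}

/// The two nodes an edge connects: its (k-1)-gene prefix and suffix.
/// The k-mer must be non-empty.
pub fn endpoints(kmer: &[Gene]) -> (&[Gene], &[Gene]) {
    (&kmer[..kmer.len() - 1], &kmer[1..])
}
```

With k=3, the gene order A B C D contributes the edges ABC and BCD, which meet at the node BC.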

Command

./target/release/syn build-toga \
  --bed data/genome1.bed \
  --bed data/genome2.bed \
  --k 3 \
  --output data/graph.bin

3. Compact Graph (Done)

Collapse non-branching paths into unitigs (synteny blocks).

Process

Find maximal non-branching paths in which every edge carries the same color set. Each such path becomes one unitig.
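
One way to sketch this step, under simplifying assumptions: nodes are integer IDs, and two consecutive edges are merged only when the node between them has exactly one incoming and one outgoing edge and both edges have identical color sets. The Edge type and the compact function are hypothetical; compaction.rs may handle branching, orientation, and cycles differently.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical edge record: endpoints are node IDs, colors are genome IDs.
#[derive(Clone, Debug)]
pub struct Edge {
    pub from: u32,
    pub to: u32,
    pub colors: BTreeSet<u32>,
}

impl Edge {
    pub fn new(from: u32, to: u32, colors: &[u32]) -> Self {
        Edge { from, to, colors: colors.iter().copied().collect() }
    }
}

/// Group edges into unitigs; each unitig is a list of edge indices in path order.
pub fn compact(edges: &[Edge]) -> Vec<Vec<usize>> {
    let mut out: HashMap<u32, Vec<usize>> = HashMap::new();
    let mut inc: HashMap<u32, Vec<usize>> = HashMap::new();
    for (i, e) in edges.iter().enumerate() {
        out.entry(e.from).or_default().push(i);
        inc.entry(e.to).or_default().push(i);
    }
    // A node links edge a to edge b only if it has exactly one in-edge and
    // one out-edge, they are distinct, and their color sets match.
    let link = |node: u32| -> Option<(usize, usize)> {
        match (inc.get(&node), out.get(&node)) {
            (Some(i), Some(o)) if i.len() == 1 && o.len() == 1 => {
                let (a, b) = (i[0], o[0]);
                (a != b && edges[a].colors == edges[b].colors).then_some((a, b))
            }
            _ => None,
        }
    };
    let next = |e: usize| link(edges[e].to).map(|(_, b)| b);
    let prev = |e: usize| link(edges[e].from).map(|(a, _)| a);

    let mut seen = vec![false; edges.len()];
    let mut unitigs = Vec::new();
    // Pass 0 starts at edges with no mergeable predecessor;
    // pass 1 picks up whatever is left, which can only be closed cycles.
    for pass in 0..2 {
        for start in 0..edges.len() {
            if seen[start] || (pass == 0 && prev(start).is_some()) {
                continue;
            }
            seen[start] = true;
            let mut path = vec![start];
            let mut cur = start;
            while let Some(n) = next(cur) {
                if seen[n] {
                    break;
                }
                seen[n] = true;
                path.push(n);
                cur = n;
            }
            unitigs.push(path);
        }
    }
    unitigs
}
```

Because a path stops wherever the color set changes, every edge in a unitig shares one color set, so the color set can be stored once per unitig rather than once per edge.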

Output

  • Unitigs with gene lists
  • Color sets (which genomes contain each unitig)
  • Chromosome coordinates per genome
  • Adjacency information (graph structure)

Command

./target/release/syn compact \
  --graph data/graph.bin \
  --output data/compacted.bin

4. Synteny Analysis (Done)

Classify blocks and detect conserved synteny regions.

Block Classification

  • Universal: Present in all genomes
  • Conserved: Present in most genomes (threshold configurable)
  • Singleton: Present in only one genome
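
The three classes can be expressed as a function of how many genomes contain a block. The sketch below is an assumption-laden illustration rather than the logic in synteny.rs: the threshold is modeled as a minimum fraction of genomes (min_fraction is a hypothetical parameter name), and an extra Other variant is introduced for blocks that fit none of the three listed classes.

```rust
/// Block classes from the list above, plus a hypothetical `Other` bucket
/// for blocks that are neither universal, conserved, nor singleton.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BlockClass {
    Universal,
    Conserved,
    Singleton,
    Other,
}

/// Classify a block by the number of genomes that contain it.
/// `min_fraction` is the assumed form of the configurable threshold,
/// e.g. 0.75 means "present in at least 75% of genomes".
pub fn classify(n_present: usize, n_genomes: usize, min_fraction: f64) -> BlockClass {
    if n_genomes == 0 || n_present == 0 {
        return BlockClass::Other;
    }
    if n_present == n_genomes {
        // Checked first, so a one-genome dataset yields Universal, not Singleton.
        BlockClass::Universal
    } else if n_present == 1 {
        BlockClass::Singleton
    } else if n_present as f64 / n_genomes as f64 >= min_fraction {
        BlockClass::Conserved
    } else {
        BlockClass::Other
    }
}
```

The order of the checks is a design choice: testing Universal before Singleton decides the one-genome case, and testing Singleton before Conserved keeps a very low threshold from relabeling single-genome blocks as conserved.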

Metrics

  • Total block count and universal block count
  • N50 block size (in genes)
  • Branch point count
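
For reference, N50 in genes can be computed with the standard definition: sort block sizes from largest to smallest and report the size of the block at which the running total first reaches half of all genes. The function name n50 is illustrative and is not claimed to match syn-core.

```rust
/// N50 of a set of block sizes measured in genes.
/// Returns None when there are no genes at all.
pub fn n50(block_sizes: &[usize]) -> Option<usize> {
    let total: usize = block_sizes.iter().sum();
    if total == 0 {
        return None;
    }
    // Largest blocks first.
    let mut sorted = block_sizes.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let mut running = 0;
    for size in sorted {
        running += size;
        // First block at which the running total covers at least half of all genes.
        if running * 2 >= total {
            return Some(size);
        }
    }
    None
}
```

For block sizes 2, 3, 4, 5, 6 (20 genes in total), the two largest blocks cover 11 genes, so the N50 is 5.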

5. Visualization (Done)

Interactive web browser for exploring synteny.

Features

  • Two genome tracks with blocks colored by chromosome
  • Ribbons connecting shared blocks (same block ID)
  • Gene search across all blocks
  • Click blocks to see gene lists
  • Pan and zoom navigation

Deploy

make build-wasm
make deploy

6. Rearrangement Detection (Planned)

Detect inversions, translocations, and other rearrangements.

Planned Features

  • Strand-aware analysis for inversion detection
  • Cross-chromosome block mapping for translocations
  • Breakpoint identification
  • Rearrangement classification and annotation

Quick Start

# Build everything
make build

# Run full pipeline on TOGA2 BED files
./target/release/syn build-toga \
  --bed data/HLlarFus1.sorted.bed \
  --bed data/HLtaeGutt6.sorted.bed \
  --k 3 \
  --output data/graph.bin

./target/release/syn compact \
  --graph data/graph.bin \
  --output data/compacted.bin

# Serve web interface
./target/release/syn serve --graph data/compacted.bin --port 8080

Data Flow

TOGA2 BED files          Gene Graph              Compacted Graph         Web Browser
┌─────────────┐         ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│ genome1.bed │────────▶│             │         │  Unitigs    │         │  Tracks     │
│ genome2.bed │         │  Colored    │────────▶│  + colors   │────────▶│  + ribbons  │
│ genome3.bed │────────▶│  de Bruijn  │         │  + coords   │         │  + popups   │
│    ...      │         │  Graph      │         │  + adj.     │         │  + search   │
└─────────────┘         └─────────────┘         └─────────────┘         └─────────────┘
                              │                       │
                              ▼                       ▼
                         graph.bin              compacted.bin