A group of genes from different species that evolved from a common ancestral gene. Identified by tools like OrthoFinder or extracted from TOGA2 output.
ENSGALG00010001554 (chicken gene ID used as orthogroup identifier in TOGA2)
Output format from TOGA2 (Tool to infer Orthologs from Genome Alignments). Contains gene annotations with orthology information embedded in the name field.
transcript#ortholog_gene#scores$fragment
ENSGALT00010003578#ENSGALG00010001554#19155,18478$1
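Splitting such a name field back into its parts could look like the sketch below. It assumes every record follows the layout shown here, that the scores are comma-separated integers, and that the fragment suffix is an integer. `ParsedName` and `parse_name_field` are illustrative names, not part of TOGA2.

```python
from typing import NamedTuple, Tuple


class ParsedName(NamedTuple):
    transcript: str
    ortholog_gene: str       # used as the orthogroup identifier
    scores: Tuple[int, ...]  # assumption: comma-separated integers
    fragment: int            # assumption: integer suffix after '$'


def parse_name_field(name: str) -> ParsedName:
    """Split 'transcript#ortholog_gene#scores$fragment' into its parts."""
    if "$" not in name:
        raise ValueError(f"missing '$fragment' suffix: {name!r}")
    body, _, fragment = name.rpartition("$")
    transcript, gene, scores = body.split("#")
    return ParsedName(
        transcript,
        gene,
        tuple(int(s) for s in scores.split(",")),
        int(fragment),
    )


parsed = parse_name_field("ENSGALT00010003578#ENSGALG00010001554#19155,18478$1")
print(parsed.ortholog_gene)  # ENSGALG00010001554
```

Names that do not match the assumed layout raise `ValueError` rather than being silently skipped, which makes format drift easier to notice.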
The fundamental unit in our de Bruijn graph. Represents a gene by its orthogroup ID.
BRCA1, TP53, ENSGALG00010001554
A sequence of k consecutive gene tokens from a genome. Unlike DNA k-mers, these represent gene order, not nucleotide sequence.
[BRCA1, TP53, MYC] is a 3-mer
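As a minimal sketch, gene k-mers can be enumerated with a sliding window over the ordered gene tokens of one chromosome or scaffold. The function name `gene_kmers` is hypothetical, and the input is assumed to be a plain list of token strings in genome order.

```python
def gene_kmers(tokens, k):
    """Return every window of k consecutive gene tokens as a tuple."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


kmers = gene_kmers(["BRCA1", "TP53", "MYC", "ENSGALG00010001554"], 3)
print(kmers)
# [('BRCA1', 'TP53', 'MYC'), ('TP53', 'MYC', 'ENSGALG00010001554')]
```

A sequence shorter than k yields an empty list, so short scaffolds contribute nothing to the graph.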
A graph where:
Nodes: (k-1)-mers of gene tokens
Edges: k-mers connecting nodes
Colors: Each edge is "colored" by the set of genomes that contain it
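One way to hold such a graph in memory is a dictionary from each k-mer (edge) to its color set, with the nodes implied by each k-mer's prefix and suffix. The sketch below assumes each genome is given as a list of gene-token lists, one per chromosome. `build_colored_dbg` and the toy genomes A, B, C are illustrative only.

```python
from collections import defaultdict


def build_colored_dbg(genomes, k):
    """Map each gene k-mer (edge) to the set of genomes containing it.

    genomes: dict of genome name -> list of chromosomes,
             each chromosome being a list of gene tokens in order.
    """
    edges = defaultdict(set)
    for name, chromosomes in genomes.items():
        for tokens in chromosomes:
            for i in range(len(tokens) - k + 1):
                edges[tuple(tokens[i:i + k])].add(name)
    return dict(edges)


def edge_nodes(kmer):
    """Return the (k-1)-mer nodes an edge connects: (source, target)."""
    return kmer[:-1], kmer[1:]


genomes = {
    "A": [["X", "Y", "Z"]],
    "B": [["X", "Y", "Z"]],
    "C": [["X", "Y", "W"]],
}
edges = build_colored_dbg(genomes, k=2)
print(edges[("X", "Y")] == {"A", "B", "C"})  # True
```

With k=2 the nodes are single genes, which keeps the toy example readable; larger k values give more specific nodes at the cost of fewer shared edges.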
A maximal non-branching path in the colored de Bruijn graph. This means:
1. All edges in the path have the same set of colors (same genomes)
2. At each internal node, there is exactly one way to continue the path
3. The path cannot be extended further while maintaining these properties
[X-Y] is a unitig shared by A, B, C
[Y-Z] is a unitig shared by A, B only
[Y-W] is a unitig unique to C
Key insight: A unitig with the same ID in different genomes represents the exact same conserved gene order.
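The three conditions can be turned into a simple compaction routine. This is a sketch under stated assumptions, not the project's implementation: edges are given as a dictionary from k-mer tuples to color sets, two consecutive edges are merged only when the node between them has exactly one incoming and one outgoing edge and both edges carry the same colors, and `extract_unitigs` is a hypothetical name.

```python
from collections import defaultdict


def extract_unitigs(edges):
    """Return (gene_list, colors) for each maximal non-branching path."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for kmer in edges:
        out_edges[kmer[:-1]].append(kmer)
        in_edges[kmer[1:]].append(kmer)

    def simple(node):
        # Exactly one way in and one way out of this node.
        return len(out_edges.get(node, [])) == 1 and len(in_edges.get(node, [])) == 1

    def successor(kmer):
        node = kmer[1:]
        if simple(node) and edges[out_edges[node][0]] == edges[kmer]:
            return out_edges[node][0]
        return None

    def predecessor(kmer):
        node = kmer[:-1]
        if simple(node) and edges[in_edges[node][0]] == edges[kmer]:
            return in_edges[node][0]
        return None

    unitigs, seen = [], set()
    for kmer in sorted(edges):
        if kmer in seen:
            continue
        # Walk back to the first edge of the path (stop if we loop around).
        start = kmer
        prev = predecessor(start)
        while prev is not None and prev != kmer:
            start = prev
            prev = predecessor(start)
        # Walk forward, collecting edges until the path can no longer extend.
        path = [start]
        seen.add(start)
        nxt = successor(start)
        while nxt is not None and nxt not in seen:
            path.append(nxt)
            seen.add(nxt)
            nxt = successor(nxt)
        genes = list(path[0]) + [edge[-1] for edge in path[1:]]
        unitigs.append((genes, edges[start]))
    return unitigs


toy_edges = {("X", "Y"): {"A", "B", "C"}, ("Y", "Z"): {"A", "B"}, ("Y", "W"): {"C"}}
toy_unitigs = extract_unitigs(toy_edges)
for genes, colors in toy_unitigs:
    print("-".join(genes), sorted(colors))
# X-Y ['A', 'B', 'C']
# Y-W ['C']
# Y-Z ['A', 'B']
```

The path stops at Y because two edges leave it, which reproduces the three unitigs of the example. Iterating over sorted k-mers only makes the output order deterministic; it does not affect which unitigs are found.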
A node in the graph where paths diverge - different genomes have different gene arrangements at this position. Branch points indicate potential rearrangement sites.
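Candidate branch points can be listed by counting the edges entering and leaving each node. The sketch below assumes the same edge dictionary as elsewhere (k-mer tuple to color set) and, as a simplifying assumption, reports nodes where paths converge as well as nodes where they diverge. `branch_points` is an illustrative name.

```python
from collections import Counter


def branch_points(edges):
    """Return nodes with more than one outgoing or incoming edge."""
    out_deg, in_deg = Counter(), Counter()
    for kmer in edges:
        out_deg[kmer[:-1]] += 1
        in_deg[kmer[1:]] += 1
    nodes = set(out_deg) | set(in_deg)
    return sorted(n for n in nodes if out_deg[n] > 1 or in_deg[n] > 1)


bp_edges = {("X", "Y"): {"A", "B", "C"}, ("Y", "Z"): {"A", "B"}, ("Y", "W"): {"C"}}
points = branch_points(bp_edges)
print(points)  # [('Y',)]
```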
Conservation of gene order between genomes. Two genomes are syntenic in a region if they share the same sequence of genes.
A unitig (block) that appears in multiple genomes. In the browser, blocks with the same ID in both compared genomes are connected by ribbons.
A unitig present in all analyzed genomes - represents highly conserved synteny.
A unitig present in only one genome - represents a unique gene arrangement.
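These three categories follow directly from a unitig's color set. A minimal sketch, assuming the colors are stored as a Python set of genome names; the labels "core", "shared", and "unique" and the function name `classify_block` are illustrative.

```python
def classify_block(colors, all_genomes):
    """Label a unitig by how many of the analyzed genomes contain it."""
    if set(colors) == set(all_genomes):
        return "core"      # present in every analyzed genome
    if len(colors) == 1:
        return "unique"    # present in exactly one genome
    return "shared"        # present in several, but not all


all_genomes = {"A", "B", "C"}
labels = [classify_block(c, all_genomes) for c in ({"A", "B", "C"}, {"A", "B"}, {"C"})]
print(labels)  # ['core', 'shared', 'unique']
```

A core block is also a shared block by definition; the sketch simply returns the most specific label. When only one genome is analyzed, every block is reported as "core".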
A chromosomal rearrangement in which a segment is reversed. Detection is planned by incorporating strand information into the graph.
Status: Not yet implemented
Movement of a genomic segment to a different chromosome. Can be detected when adjacent blocks in one genome are on different chromosomes in another.
Status: Not yet implemented
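Since this feature is not implemented, the following is only a sketch of how the adjacency rule could be checked, not a description of existing behavior. It assumes the blocks of one genome are available in order with their chromosome, and that each block maps to a single chromosome in the other genome. `candidate_translocations` and the block IDs are hypothetical.

```python
def candidate_translocations(ref_blocks, other_chrom):
    """Find adjacent block pairs that sit on different chromosomes elsewhere.

    ref_blocks:  list of (block_id, chromosome) in order along one genome.
    other_chrom: dict of block_id -> chromosome in the other genome.
    """
    hits = []
    for (a, chrom_a), (b, chrom_b) in zip(ref_blocks, ref_blocks[1:]):
        if chrom_a != chrom_b:
            continue  # not adjacent on the same chromosome
        if a in other_chrom and b in other_chrom and other_chrom[a] != other_chrom[b]:
            hits.append((a, b))
    return hits


ref_blocks = [("u1", "chr1"), ("u2", "chr1"), ("u3", "chr1"), ("u4", "chr2")]
other_chrom = {"u1": "chr4", "u2": "chr4", "u3": "chr9", "u4": "chr9"}
hits = candidate_translocations(ref_blocks, other_chrom)
print(hits)  # [('u2', 'u3')]
```

Pairs flagged this way are only candidates: a chromosome fusion or fission, or an assembly break, would produce the same signal.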
Linear representation of a genome showing synteny blocks as colored rectangles. Blocks are colored by chromosome and sized by gene count.
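A track layout of this kind can be computed by accumulating gene counts along the genome. The sketch below is an assumption about one possible layout, not the browser's actual code: widths are in gene units, each chromosome gets a color index in order of first appearance, and `layout_track` is a hypothetical name.

```python
def layout_track(blocks):
    """Turn ordered (block_id, chromosome, gene_count) into rectangles."""
    rects, color_of, x = [], {}, 0
    for block_id, chrom, gene_count in blocks:
        color = color_of.setdefault(chrom, len(color_of))  # one index per chromosome
        rects.append({"id": block_id, "x": x, "width": gene_count, "color": color})
        x += gene_count
    return rects


rects = layout_track([("u1", "chr1", 5), ("u2", "chr1", 3), ("u3", "chr2", 4)])
print([(r["x"], r["width"], r["color"]) for r in rects])
# [(0, 5, 0), (5, 3, 0), (8, 4, 1)]
```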
A curved connection between two genome tracks showing that the same block (same ID) exists in both genomes. Ribbons visualize synteny relationships.
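Finding which ribbons to draw amounts to matching block IDs between two tracks. A minimal sketch, assuming each track is a list of `(block_id, start, end)` tuples in track coordinates; `find_ribbons` and the sample IDs are illustrative.

```python
from collections import defaultdict


def find_ribbons(track_a, track_b):
    """Pair up the spans of every block ID present in both tracks."""
    spans_b = defaultdict(list)
    for block_id, start, end in track_b:
        spans_b[block_id].append((start, end))
    ribbons = []
    for block_id, start, end in track_a:
        for span_b in spans_b.get(block_id, []):
            ribbons.append((block_id, (start, end), span_b))
    return ribbons


track_a = [("u1", 0, 5), ("u2", 5, 8), ("u3", 8, 12)]
track_b = [("u3", 0, 4), ("u1", 4, 9), ("u9", 9, 11)]
ribbons = find_ribbons(track_a, track_b)
print(ribbons)
# [('u1', (0, 5), (4, 9)), ('u3', (8, 12), (0, 4))]
```

Blocks present in only one of the two tracks (`u2` and `u9` here) produce no ribbon.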