Biomedical and Translational Informatics Laboratory

BioBin

What is BioBin?

BioBin is a standalone command-line application that implements a collapsing method, using prior biological knowledge from a prebuilt database. Although it was developed specifically to investigate rare-variant burden in traditional genetic trait studies, BioBin can apply multiple levels of burden testing and is also useful for exploring the natural distribution of rare variants in ancestral populations.
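To make "collapsing" concrete: a burden approach sums rare-allele counts over all loci that fall in a bin, producing one value per sample per bin. The Python sketch below illustrates that generic idea only. It is not BioBin's implementation, and the input shapes, the bins, and the `max_maf` cutoff of 0.05 are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input shapes (not BioBin's file formats):
#   a locus is (chromosome, position, minor-allele frequency);
#   a bin is (name, chromosome, start, end) with an inclusive end;
#   genotypes[i][j] is the minor-allele count (0, 1, 2) at locus i for sample j,
#   or None when the call is missing.
Locus = Tuple[str, int, float]
Bin = Tuple[str, str, int, int]


def collapse_rare_variants(
    loci: Sequence[Locus],
    genotypes: Sequence[Sequence[Optional[int]]],
    bins: Sequence[Bin],
    max_maf: float = 0.05,  # hypothetical rare-variant cutoff
) -> Dict[str, List[int]]:
    """Return, for each bin, the per-sample count of rare minor alleles."""
    n_samples = len(genotypes[0]) if genotypes else 0
    burden = {name: [0] * n_samples for name, _, _, _ in bins}
    for (chrom, pos, maf), row in zip(loci, genotypes):
        if maf > max_maf:
            continue  # common variants are excluded from the burden
        for name, bin_chrom, start, end in bins:
            if chrom == bin_chrom and start <= pos <= end:
                for j, g in enumerate(row):
                    if g is not None:  # missing calls contribute nothing
                        burden[name][j] += g
    return burden


loci = [("1", 100, 0.01), ("1", 150, 0.30), ("1", 900, 0.02)]
genotypes = [[1, 0, 0], [2, 1, 0], [0, None, 1]]
bins = [("GENE_A", "1", 50, 200), ("GENE_B", "1", 800, 1000)]
print(collapse_rare_variants(loci, genotypes, bins))
# {'GENE_A': [1, 0, 0], 'GENE_B': [0, 0, 1]}
```

The common locus at position 150 (MAF 0.30) is skipped, so each bin value reflects only rare variation. The resulting per-sample bin values are what a downstream burden test would compare between cases and controls.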

Why use BioBin?

The era of successful genome-wide association studies (GWAS) has increased the field’s understanding of heritable traits, highlighted novel disease associations that were critical for further biochemical and pharmaceutical development, and advanced the understanding of genetic association and the complexity of common diseases. However, a large proportion of the variance in common complex diseases remains unexplained, and many researchers are now investigating the effects of rare variants. Collapsing and/or binning methods such as BioBin have become a popular approach because they are easily applied to case-control studies, can use whole-genome data, and allow the investigation of collective polygenic inheritance. BioBin’s approach addresses four criteria we have defined for improved binning algorithm development:

1. Complexity of interactions (both epistatic and in aggregate) of rare and common variants

2. Potential non-independence between rare variants and between bins

3. Importance and possible limitation of “user” feature definition

4. Necessity of tool flexibility

Instead of focusing on a novel statistical test, we have concentrated on biologically driven, automated bin generation. BioBin can create bins based on many features, including regulatory regions, evolutionarily conserved regions, genes, and/or pathways. Based on the user’s selected features, BioBin creates the appropriate feature-level bins using information from one or more of the source databases in our integrated database, the Library of Knowledge Integration (LOKI). Users can also apply complex binning, e.g. collapsing only exons within pathways, or performing regulatory and gene feature analyses simultaneously. BioBin’s flexible algorithm and its use of prior biological knowledge to automate bin generation give users the opportunity to test unique hypotheses.
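The feature-driven binning described above can be pictured as a lookup from each locus to every feature it overlaps. The LOKI schema is not described on this page, so the Python sketch below uses small, hypothetical in-memory tables (`GENES`, `EXONS`, `PATHWAYS`) as stand-ins. It shows one way loci could be mapped to gene and pathway bins, plus an exon-only mode that mimics "collapse only exons within pathways". It is an illustration, not BioBin's code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical stand-ins for knowledge BioBin would draw from LOKI.
GENES: Dict[str, Tuple[str, int, int]] = {   # gene -> (chrom, start, end)
    "GENE_A": ("1", 1_000, 5_000),
    "GENE_B": ("1", 8_000, 12_000),
    "GENE_C": ("2", 500, 2_500),
}
EXONS: Dict[str, List[Tuple[int, int]]] = {  # gene -> exon intervals on the gene's chrom
    "GENE_A": [(1_000, 1_200), (4_000, 4_300)],
    "GENE_B": [(8_000, 8_500)],
    "GENE_C": [(500, 700)],
}
PATHWAYS: Dict[str, Set[str]] = {            # pathway -> member genes
    "PATHWAY_X": {"GENE_A", "GENE_B"},
    "PATHWAY_Y": {"GENE_C"},
}


def assign_bins(chrom: str, pos: int, exons_only: bool = False) -> Set[str]:
    """Return the gene and pathway bins that a locus falls into.

    With exons_only=True, a locus counts toward a gene (and therefore that
    gene's pathways) only if it lies inside one of the gene's exons.
    """
    genes = set()
    for gene, (g_chrom, start, end) in GENES.items():
        if g_chrom != chrom or not start <= pos <= end:
            continue
        if exons_only and not any(s <= pos <= e for s, e in EXONS.get(gene, [])):
            continue
        genes.add(gene)
    pathways = {p for p, members in PATHWAYS.items() if members & genes}
    return genes | pathways


def build_bins(loci: List[Tuple[str, int]], exons_only: bool = False) -> Dict[str, List[Tuple[str, int]]]:
    """Group loci into feature-level bins; one locus may land in several bins."""
    bins = defaultdict(list)
    for chrom, pos in loci:
        for name in assign_bins(chrom, pos, exons_only):
            bins[name].append((chrom, pos))
    return dict(bins)


loci = [("1", 1_100), ("1", 3_000), ("1", 8_200), ("2", 2_000)]
print(build_bins(loci))                   # gene and pathway bins
print(build_bins(loci, exons_only=True))  # exons-within-pathways style binning
```

Because a locus can belong to a gene bin and to every pathway containing that gene, the bins are not independent. This is one concrete form of the "non-independence between bins" consideration listed above.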

Latest Release:

New Features in 2.3:

  • Genomic build is now detected from the VCF file when --genomic-build is not specified.
  • Summary script output now includes case/control capacity and the gene list, and is tab-separated by default.
  • Changed the representation of p-values from SKAT methods when they are very close to 0.
  • Added option --bin-constant-loci to bin loci even when they do not vary among samples with non-missing phenotypes (default: N).
  • Added option --drop-missing-phenotype-samples to drop samples which are neither case nor control for any phenotype (default: Y).
  • Added option --force-all-control to consider all phenotypes as controls (default: N).
  • Added option --ignore-build-difference to suppress the error when the specified genome build does not match the VCF (default: N).
  • Added options --include-samples and --exclude-samples to specify files containing samples (one per line) to keep or filter out, respectively.
  • Added option --set-star-referent to consider star alleles as referent rather than missing (default: Y).
  • The SKAT implementation is still in beta.
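Several of the 2.3 options concern which samples and genotypes enter the bins: --include-samples / --exclude-samples, --drop-missing-phenotype-samples, --set-star-referent, and --bin-constant-loci. The Python sketch below illustrates what each of these behaviors might look like. The function names and data shapes are hypothetical, and BioBin's actual handling may differ. One assumption is that constancy is checked only on the genotypes of samples that survive phenotype filtering, following the option description. The `*` allele is the VCF allele for a position removed by an overlapping deletion. A GT value refers to it by its index in the ALT list.

```python
import re
from typing import AbstractSet, Dict, List, Optional, Sequence, Set


def read_sample_list(path: str) -> Set[str]:
    """Read sample IDs from a file with one ID per line, skipping blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def select_samples(
    samples: Sequence[str],
    phenotypes: Dict[str, Optional[int]],  # 1 = case, 0 = control, None = neither
    include: Optional[AbstractSet[str]] = None,
    exclude: AbstractSet[str] = frozenset(),
    drop_missing: bool = True,
) -> List[str]:
    """Apply include/exclude lists, then optionally drop samples with no phenotype."""
    kept = []
    for s in samples:
        if include is not None and s not in include:
            continue
        if s in exclude:
            continue
        if drop_missing and phenotypes.get(s) is None:
            continue
        kept.append(s)
    return kept


def alt_allele_count(gt: str, alts: Sequence[str], star_is_referent: bool = True) -> Optional[int]:
    """Count non-reference alleles in a VCF GT value such as '0/1' or '1|2'.

    An allele whose ALT is '*' is counted as reference when star_is_referent
    is True; otherwise the whole genotype is treated as missing (None).
    """
    count = 0
    for allele in re.split(r"[/|]", gt):
        if allele == ".":
            return None  # missing call
        idx = int(allele)
        if idx == 0:
            continue  # reference allele
        if alts[idx - 1] == "*":
            if star_is_referent:
                continue
            return None
        count += 1
    return count


def is_constant(counts: Sequence[Optional[int]]) -> bool:
    """True if the non-missing genotype values at a locus do not vary."""
    return len({c for c in counts if c is not None}) <= 1


pheno = {"s1": 1, "s2": 0, "s3": None, "s4": 0}
print(select_samples(["s1", "s2", "s3", "s4"], pheno, exclude={"s4"}))  # ['s1', 's2']
print(alt_allele_count("0|2", ["*", "T"]))  # 1: the '*' allele is treated as reference
```

Under this sketch, a locus for which `is_constant` is True on the retained samples would be skipped unless the user asks for constant loci to be binned, mirroring the default of N for --bin-constant-loci.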

Documentation:

Archived Releases:

Archived Documentation:

Bug Fix in 2.2.1:

  • Fixed a multithreading crash when using roles and threads (multiple phenotypes) together.

Bug Fix in 2.2.0:

  • Variants that include the "FT" genotype-level filter tag are now parsed correctly. Previously released versions used VCFtools, which ignored this tag. Previous beta versions incorrectly set such variants to monomorphic, effectively ignoring the entire variant.
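In the VCF specification, FT is a per-sample genotype filter. Its value is PASS, a semicolon-separated list of failed filters, or "." when no filters were applied. The release note does not say exactly how BioBin now treats a filtered genotype. The Python sketch below assumes one plausible reading: a genotype whose FT is neither PASS nor "." becomes missing for that sample only, and the rest of the variant is kept. The helper names are hypothetical.

```python
from typing import Dict, Optional


def sample_fields(format_col: str, sample_col: str) -> Dict[str, str]:
    """Pair FORMAT keys with one sample's values, e.g. 'GT:FT' with '0/1:PASS'."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    # Trailing FORMAT fields may be omitted in a sample column; treat them as '.'.
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def filtered_genotype(format_col: str, sample_col: str) -> Optional[str]:
    """Return the sample's GT string, or None if its FT tag marks it as filtered."""
    fields = sample_fields(format_col, sample_col)
    ft = fields.get("FT", ".")
    if ft not in ("PASS", "."):
        return None  # assumption: mask only this genotype, keep the variant
    return fields.get("GT")


print(filtered_genotype("GT:FT", "0/1:PASS"))   # 0/1
print(filtered_genotype("GT:FT", "1/1:LowGQ"))  # None
```

Masking at the genotype level, rather than marking the whole variant monomorphic, keeps the information from samples whose calls passed the filter.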

New Features in 2.2:

  • Added the ability to run PheWAS while reading the VCF only once
  • Compressed VCFs are automatically detected
  • Added an output listing unlifted loci
  • Added the ability to suppress summary information in the bins file
  • Removed the dependence on the VCFtools library, along with various other performance improvements
  • Removed Genotype and Frequency reports
  • Added the ability to run statistical tests within BioBin. Tests included:
    • linear regression
    • logistic regression
    • Wilcoxon rank-sum test
    • SKAT (continuous phenotype) (beta)
    • SKAT (dichotomous phenotype) (beta)
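As an example of the kind of test now run inside BioBin, the Wilcoxon rank-sum test compares the distribution of per-sample bin values between cases and controls without assuming normality. The stdlib-only Python sketch below computes the rank-sum statistic with midranks for ties and a tie-corrected normal approximation for the two-sided p-value. It is a textbook version for illustration only. BioBin's own implementation may use exact p-values or different corrections.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p_value); a positive z means cases tend to have higher values.
    """
    pooled = list(cases) + list(controls)
    n1, n2, n = len(cases), len(controls), len(pooled)
    w = sum(midranks(pooled)[:n1])  # rank sum of the case group
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        return 0.0, 1.0  # all values tied: no evidence of a difference
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical per-sample bin values for one bin.
z, p = rank_sum_test(cases=[3, 4, 5, 6], controls=[0, 0, 1, 1])
print(round(z, 3), round(p, 4))
```

Rank-based tests suit burden values, which are small non-negative counts with many ties. The tie correction matters here because many samples typically carry zero rare alleles in a bin.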

Bug Fixes in 2.1.2:

  • Fixed the Reactome, GWAS Catalog, and liftOver loaders in LOKI.

Bug Fix in 2.1.1:

  • Fixed NaN in bins when using the built-in weighting method.

New Features in 2.1:

  • Weighting of individual loci according to MAF or user input
  • Sliding-window intergenic bins
  • Minimum MAF threshold
  • Filtering of genes / regions of interest
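The 2.1 features combine naturally: intergenic space can be tiled with sliding windows, and each locus can contribute a weighted amount to its bin when its MAF falls between a minimum and a maximum threshold. The Python sketch below illustrates these ideas under stated assumptions. The function names and window parameters are hypothetical. The built-in weight uses a Madsen-Browning-style form, 1 / sqrt(maf * (1 - maf)), as an assumed example, and BioBin's actual weighting formula may differ. The guard for MAF values of 0 or 1 avoids the division by zero that would otherwise produce NaN or infinite bin values.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sliding_windows(start: int, end: int, size: int, step: int) -> List[Tuple[int, int]]:
    """Tile [start, end] (inclusive) with windows of `size` bp advancing by `step` bp."""
    windows = []
    pos = start
    while pos <= end:
        windows.append((pos, min(pos + size - 1, end)))
        if pos + size - 1 >= end:
            break  # this window already reaches the end of the region
        pos += step
    return windows


def locus_weight(maf: float, user_weight: Optional[float] = None) -> float:
    """User-supplied weight if given, else an assumed Madsen-Browning-style weight."""
    if user_weight is not None:
        return user_weight
    if maf <= 0.0 or maf >= 1.0:
        return 0.0  # monomorphic locus: avoid dividing by zero
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def weighted_contribution(
    genotypes: Sequence[Optional[int]],
    maf: float,
    min_maf: float,
    max_maf: float,
    user_weight: Optional[float] = None,
) -> Optional[List[float]]:
    """Per-sample weighted contribution of one locus, or None if its MAF is out of range."""
    if not min_maf <= maf <= max_maf:
        return None
    w = locus_weight(maf, user_weight)
    return [0.0 if g is None else w * g for g in genotypes]


print(sliding_windows(1, 250, size=100, step=50))
# [(1, 100), (51, 150), (101, 200), (151, 250)]
print(weighted_contribution([0, 1, 2, None], maf=0.5, min_maf=0.01, max_maf=0.5))
# [0.0, 2.0, 4.0, 0.0]
```

Overlapping windows (a step smaller than the window size) let a cluster of rare variants near a window boundary still fall together in at least one bin. The rarer a variant, the larger its weight under this scheme.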