VCF Format

Variant Call Format (VCF) is a tab-delimited text format used to describe single nucleotide variants (SNVs), insertions, deletions, and other sequence variations. Its scope is deliberately narrow: it is tailored to describing variation, not genetic features (those are covered on the next page).
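For simple insertions and deletions, VCF typically writes REF and ALT with a shared leading "padding" base, so comparing their lengths gives a rough idea of the variant type. The function below is a minimal sketch under that assumption. It ignores symbolic alleles such as `<DEL>` and breakend notation, and the name `classify_variant` is hypothetical, not part of any library.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Roughly classify one REF/ALT allele pair from a VCF record.

    Simplified sketch: assumes plain base strings (A/C/G/T/N), not
    symbolic alleles such as <DEL> or breakend notation.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    # Insertion: ALT extends the shared padding base(s) in REF
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    # Deletion: REF extends the shared padding base(s) in ALT
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    # Equal-length multi-base changes or more complex edits
    return "other"


for ref, alt in [("A", "G"), ("T", "TCA"), ("GTC", "G"), ("AC", "GT")]:
    print(ref, alt, classify_variant(ref, alt))
```

Real pipelines usually normalize and left-align indels first; this sketch does not.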

There are 8 required fields for this format:

  1. Chromosome Name (CHROM)
  2. Chromosome Position (POS)
    • The 1-based position of the first base in REF
  3. ID
    • This is generally used to reference an annotated variant in dbSNP or another curated variant database; "." if there is none.
  4. Reference base(s) (REF)
    • The reference base(s) at this position
  5. Alternate base(s) (ALT)
    • The variant allele(s) found in your dataset that differ from the reference; multiple alleles are comma-separated
  6. Variant Quality (QUAL)
    • Phred-scaled quality score for the assertion made in ALT
  7. Filter (FILTER)
    • PASS if the position passed all filters, otherwise the names of the filters it failed; generally a QC measure applied by variant calling algorithms
  8. Info (INFO)
    • Additional information, generally describing the nature of the position/variants with respect to other data, written as semicolon-separated key=value entries
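To make the eight fixed columns concrete, here is a minimal Python sketch that splits one tab-delimited data line into those fields and converts a Phred-scaled QUAL value into an error probability (Q = -10·log10 P). The helper names (`parse_vcf_line`, `parse_info`, `phred_to_error_prob`) and the sample record are hypothetical; for real files, a dedicated library or tool is a better choice than hand parsing.

```python
# The eight fixed columns, in the order they appear in every data line
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> dict:
    """Split INFO 'key=value;FLAG;...' into a dict; bare flags map to True."""
    if info == ".":
        return {}
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_vcf_line(line: str) -> dict:
    """Parse the fixed columns of one VCF data line (not a '#' header line)."""
    values = line.rstrip("\n").split("\t")
    if len(values) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    record = dict(zip(FIXED_COLUMNS, values[:8]))
    record["POS"] = int(record["POS"])              # 1-based position
    record["ALT"] = record["ALT"].split(",")        # multiple ALT alleles are comma-separated
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    record["INFO"] = parse_info(record["INFO"])
    return record


def phred_to_error_prob(qual: float) -> float:
    """Convert a Phred-scaled score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-qual / 10)


# Hypothetical record, for illustration only
record = parse_vcf_line("chr1\t12345\t.\tG\tA,T\t50\tPASS\tDP=14;AF=0.5,0.1;DB")
print(record["ALT"])    # ['A', 'T']
print(record["INFO"])   # {'DP': '14', 'AF': '0.5,0.1', 'DB': True}
print(phred_to_error_prob(record["QUAL"]))
```

INFO values are left as strings here; their types are declared in the file's `##INFO` header lines, which this sketch does not read.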

Example VCF File
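A VCF file starts with meta-information lines beginning with `##` (the first is `##fileformat`), followed by a single header line beginning with `#CHROM` that names the columns, and then one data line per variant. The sketch below builds a tiny, entirely made-up VCF as a string and keeps only records whose FILTER is PASS. The contents, including the `LowQual` filter name, and the function `read_vcf_records` are hypothetical and for illustration only.

```python
# Hypothetical, minimal VCF content for illustration only (not real data)
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t60\tPASS\tDP=25",
    "chr1\t2000\t.\tT\tTCA\t12\tLowQual\tDP=4",
    "chr2\t500\t.\tGTC\tG\t45\tPASS\tDP=18",
])


def read_vcf_records(text: str):
    """Yield each data line as a dict keyed by the #CHROM header columns."""
    columns = None
    for line in text.splitlines():
        if line.startswith("##"):
            continue  # meta-information line
        if line.startswith("#"):
            columns = line[1:].split("\t")  # header line; drop the leading '#'
            continue
        if columns is None:
            raise ValueError("data line found before the #CHROM header line")
        yield dict(zip(columns, line.split("\t")))


passing = [r for r in read_vcf_records(EXAMPLE_VCF) if r["FILTER"] == "PASS"]
for r in passing:
    print(r["CHROM"], r["POS"], r["REF"], r["ALT"])
```

All values are kept as strings; converting POS and QUAL to numbers is left out to keep the example short.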

What software uses VCF?

  • Output of SNP detection tools such as [GATK](https://software.broadinstitute.org/gatk/) and [Samtools](http://samtools.github.io/)
  • Input for variant annotation and effect prediction tools such as [SnpEff](http://snpeff.sourceforge.net/)
  • [VCFtools](https://vcftools.github.io/index.html), a suite for filtering, summarizing, and comparing VCF files
  • Also the required format for [dbSNP](https://www.ncbi.nlm.nih.gov/projects/SNP/)

How are these files generated?

  • SNP callers generate these files as output.
  • Haplotyping software also reports in this format.
  • Most databases of variant information offer downloads in this format.