Variant Call Format (VCF) is a tab-delimited text format used to describe single nucleotide variants (SNVs) as well as insertions, deletions, and other sequence variations. Its scope is narrow: it is tailored to describing variants, not genetic features (those are covered on the next page).
There are 8 required fields for this format:
- Chromosome Name (CHROM)
- Chromosome Position (POS)
- Identifier (ID)
  - This is generally used to reference an annotated variant in dbSNP or another curated variant database.
- Reference base(s) (REF)
  - The reference’s base(s) at this position
- Alternate base(s) (ALT)
  - The variants found in your dataset that differ from the reference
- Variant Quality (QUAL)
  - Phred-scaled quality for the observed ALT
- Filter (FILTER)
  - Whether or not this has passed all filters, generally a QC measure in variant calling algorithms
- Additional information (INFO)
  - This is for additional information, generally describing the nature of the position/variants with respect to other data.
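The Phred scaling mentioned for the variant quality field (the column named QUAL in the VCF specification) may be easier to follow with a small conversion. The sketch below assumes the usual definition, quality = -10 · log10(probability that the ALT call is wrong); the function names are hypothetical helpers written for this page, not part of any VCF library.

```python
import math


def phred_to_error_prob(qual: float) -> float:
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-qual / 10)


def error_prob_to_phred(p_error: float) -> float:
    """Convert an error probability (0 < p_error <= 1) to a Phred-scaled quality."""
    return -10 * math.log10(p_error)


# Under this definition, every 10 points of quality is a tenfold drop in error probability.
for qual in (10, 20, 30):
    print(qual, phred_to_error_prob(qual))
```

Read this way, a higher value in the quality column means more confidence that the variant is real.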
Example VCF File
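As a rough illustration of the eight required fields, the sketch below embeds a tiny VCF as a string and splits each record into those fields. The two records are illustrative only, written in the style of the example in the VCF specification, and the column names (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) are the header names the specification uses. The parser is a minimal hand-written helper, not a replacement for a real VCF library, and it ignores any genotype columns after INFO.

```python
from typing import Dict, Iterator

# Illustrative data only: a meta-information line, the header line, and two records.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB",
    "20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017",
])

FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, object]:
    """Split 'KEY=VALUE;FLAG' pairs; keys without a value are stored as True."""
    result: Dict[str, object] = {}
    if info == ".":  # '.' marks a missing value in VCF
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_vcf(text: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per data line, skipping '##' meta lines and the '#CHROM' header."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        record: Dict[str, object] = dict(zip(FIELDS, columns[:8]))
        record["POS"] = int(columns[1])
        record["ALT"] = columns[4].split(",")  # several ALT alleles are comma-separated
        record["QUAL"] = None if columns[5] == "." else float(columns[5])
        record["INFO"] = parse_info(columns[7])
        yield record


records = list(parse_vcf(EXAMPLE_VCF))
for rec in records:
    print(rec["CHROM"], rec["POS"], rec["ID"], rec["REF"], rec["ALT"], rec["FILTER"])
```

In the first record the ID column holds a dbSNP-style identifier and FILTER is `PASS`; in the second the ID is missing (`.`) and FILTER names the filter the record failed.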
What software uses VCF?
- Output of SNP detection tools such as [GATK](https://software.broadinstitute.org/gatk/) and [Samtools](http://samtools.github.io/)
- Input for variant annotation and effect prediction tools like [SnpEff](http://snpeff.sourceforge.net/)
- Input for [VCFtools](https://vcftools.github.io/index.html), which filters and summarizes VCF files
- Also the required format for [dbSNP](https://www.ncbi.nlm.nih.gov/projects/SNP/)
How are these files generated?
- SNP callers generate these files as output.
- Haplotyping software also reports in this format.
- Any database holding variant information will generally have this format available for download.