There are a variety of different sequencing technologies, as well as many file formats used in sequence analysis. Below we describe how next-generation sequencing works and cover the file formats that are most commonly encountered, including those generated by the sequencer and by analysis programs.
This section is here to help you better understand how data are generated, what happens to them next, and how these different file formats are used.
Before beginning, it’s good to be familiar with some terminology that will be used from here on out.
read: a single sequence produced by a sequencer. Think: the sequencing machine read a molecule, and this is the sequence it thinks that molecule has.
library: a collection of DNA fragments that have been prepared for sequencing. A library generally corresponds to an individual sample.
flowcell: a chip on which DNA is loaded and provided to the sequencer.
lane: one portion of a flowcell. Separate lanes are usually used for technical replicates or for different samples.
run: an entire sequencing reaction from start to finish.
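
To see how these terms show up in real data, here is a minimal sketch that pulls the run, flowcell, and lane out of a read name. It assumes reads named in the colon-separated Illumina style (instrument, run number, flowcell ID, lane, then further fields), which is common but not universal. The function name `parse_read_origin`, the `ReadOrigin` type, and the example header are made up for illustration.

```python
from typing import NamedTuple


class ReadOrigin(NamedTuple):
    """Where a read came from, using the terminology defined above."""
    instrument: str
    run_number: int   # which run on that instrument
    flowcell_id: str  # which flowcell was loaded for the run
    lane: int         # which lane of the flowcell


def parse_read_origin(header: str) -> ReadOrigin:
    """Extract run, flowcell, and lane from an Illumina-style read name.

    Assumes the name starts with instrument:run:flowcell:lane; other
    sequencers and older pipelines name reads differently.
    """
    # Drop the leading '@' and anything after the first space.
    name = header.lstrip("@").split(" ", 1)[0]
    fields = name.split(":")
    if len(fields) < 4:
        raise ValueError(f"not an Illumina-style read name: {header!r}")
    instrument, run_number, flowcell_id, lane = fields[:4]
    return ReadOrigin(instrument, int(run_number), flowcell_id, int(lane))


# Hypothetical read name, for illustration only.
example = "@SIM:1:FCX:2:15:6329:1045 1:N:0:2"
origin = parse_read_origin(example)
print(origin.run_number, origin.flowcell_id, origin.lane)
```

Reads that share the same run number, flowcell ID, and lane were sequenced together, which is useful to know when checking for batch effects between samples.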