BED Format

The official documentation for BED format can be found here.

BED format is a simple way to attach basic features to a sequence. It consists of one line per feature, each containing 3-12 columns of data, plus optional track definition lines. BED files are generally used for user-defined sequence features as well as for graphical representations of features.

The fields are defined as follows.

Required fields

The first three fields in each feature line are required:

  1. Chromosome Name
    • Name of the chromosome or scaffold. Any valid seq\_region\_name can be used, and chromosome names can be given with or without the ‘chr’ prefix.
  2. Chromosome Start
    • Start position of the feature in standard chromosomal coordinates \(i.e. first base is 0\).
  3. Chromosome End
    • End position of the feature in standard chromosomal coordinates
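
As a rough illustration of the three required fields, the Python sketch below splits one feature line and computes the feature length. The function names (`parse_bed3`, `feature_length`) and the sample line are invented for this example. The length calculation assumes the usual BED convention that the start is 0-based and the end position is not itself part of the feature.

```python
def parse_bed3(line):
    """Return (chrom, start, end) from the first three columns of a feature line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED feature line needs at least 3 columns")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return chrom, start, end


def feature_length(start, end):
    # Assumption: 0-based start, end not included in the feature.
    return end - start


chrom, start, end = parse_bed3("chr1\t1000\t5000")
print(chrom, start, end, feature_length(start, end))  # chr1 1000 5000 4000
```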

Optional fields

Nine additional fields are optional. Note that columns cannot be empty – lower-numbered fields must always be populated if higher-numbered ones are used.

  1. Name
    • Label to be displayed under the feature, if turned on in “Configure this page”.
  2. Score
    • A score between 0 and 1000. See [track lines](http://asia.ensembl.org/info/website/upload/bed.html#tracklines), below, for ways to configure the display style of scored data.
  3. Strand
    • defined as + \(forward\) or - \(reverse\).
  4. thickStart
    • coordinate at which to start drawing the feature as a solid rectangle
  5. thickEnd
    • coordinate at which to stop drawing the feature as a solid rectangle
  6. itemRgb
    • an RGB colour value \(e.g. 0,0,255\). Only used if there is a track line with the value of itemRgb set to “on” \(case-insensitive\).
  7. blockCount
    • the number of sub-elements \(e.g. exons\) within the feature
  8. blockSizes
    • a comma-separated list of the sizes of these sub-elements
  9. blockStarts
    • a comma-separated list of the start coordinates of each sub-element, relative to Chromosome Start
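
The sketch below shows one way a full 12-column line might be read into named fields in Python, including the rule that columns cannot be left empty. The field keys, helper names and sample line are all made up for illustration. It assumes tab-separated columns, comma-separated block lists, and block starts given as offsets from the feature start.

```python
FIELD_NAMES = [
    "chrom", "start", "end", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb",
    "blockCount", "blockSizes", "blockStarts",
]
INT_FIELDS = ["start", "end", "score", "thickStart", "thickEnd", "blockCount"]


def parse_int_list(text):
    # Block lists are comma-separated; a trailing comma is tolerated.
    return [int(item) for item in text.split(",") if item]


def parse_bed_line(line):
    """Map the 3-12 columns of one feature line onto field names."""
    values = line.rstrip("\n").split("\t")
    if not 3 <= len(values) <= 12:
        raise ValueError(f"expected 3-12 columns, got {len(values)}")
    if any(value == "" for value in values):
        raise ValueError("columns cannot be empty")
    record = dict(zip(FIELD_NAMES, values))
    for key in INT_FIELDS:
        if key in record:
            record[key] = int(record[key])
    for key in ("blockSizes", "blockStarts"):
        if key in record:
            record[key] = parse_int_list(record[key])
    return record


def block_intervals(record):
    # Assumption: blockStarts are offsets relative to the feature start.
    return [
        (record["start"] + offset, record["start"] + offset + size)
        for offset, size in zip(record["blockStarts"], record["blockSizes"])
    ]


line = "chr1\t1000\t5000\tgeneA\t960\t+\t1200\t4800\t0,0,255\t2\t500,1000\t0,3000"
record = parse_bed_line(line)
print(block_intervals(record))  # [(1000, 1500), (4000, 5000)]
```

Because `zip` stops at the shorter sequence, the same function also accepts lines with fewer than 12 columns; fields that are absent are simply missing from the returned dictionary.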

Track lines

Track definition lines can be used to configure the display further, e.g. by grouping features into separate tracks. Track lines should be placed at the beginning of the list of features they are to affect.

The track line consists of the word ‘track’ followed by space-separated key=value pairs – see the example below. Valid parameters used by Ensembl are:

  • name: unique name to identify this track when parsing the file
  • description: Label to be displayed under the track in Region in Detail
  • priority: integer defining the order in which to display tracks, if multiple tracks are defined.
  • color: as RGB, hex or X11 named color
  • useScore: a value from 1 to 4, which determines how scored data will be displayed. Additional parameters may be needed, as described below.
    1. tiling array
    2. colour gradient – defaults to Yellow-Green-Blue, with 20 colour grades. Optionally you can specify the colours for the gradient (cgColour1, cgColour2, cgColour3) as either RGB, hex or X11 colour names, and the number of colour grades (cgGrades).
    3. histogram
    4. wiggle plot
  • itemRgb: if set to ‘on’ (case-insensitive), the individual RGB values defined for each feature (the itemRgb column) will be used.
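
A track line can be split into its key=value pairs with a few lines of Python, as sketched below. The function name `parse_track_line` and the sample track line are invented for this example. The sketch also assumes that values containing spaces are wrapped in double quotes, which is why it uses `shlex.split` from the standard library rather than a plain whitespace split.

```python
import shlex


def parse_track_line(line):
    """Return the key=value pairs of a track line as a dictionary."""
    tokens = shlex.split(line)  # keeps quoted values together
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    params = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    return params


# Hypothetical track line, for illustration only.
example = 'track name=peaks description="ChIP-seq peaks" priority=1 useScore=2 itemRgb=On'
params = parse_track_line(example)

# The itemRgb setting is case-insensitive, so compare in lower case.
use_item_rgb = params.get("itemRgb", "").lower() == "on"
print(params["description"], use_item_rgb)  # ChIP-seq peaks True
```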

BedGraph format

BedGraph is a suitable format for moderate amounts of scored data. It is based on the BED format (see above) with the following differences:

  1. The score is placed in column 4, not column 5
  2. Track lines are compulsory, and must include type=bedGraph. Currently the only optional parameters supported by Ensembl are:
    • name: see above
    • description: see above
    • priority: see above
    • graphType: either ‘bar’ or ‘points’.
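
To make the two differences concrete, here is a minimal Python sketch that reads bedGraph text, takes the score from column 4, and rejects input that lacks a track line with type=bedGraph. The function name `parse_bedgraph` and the sample data are hypothetical, and the sketch assumes exactly four whitespace-separated columns per data line.

```python
import shlex


def parse_bedgraph(text):
    """Return (track parameters, list of (chrom, start, end, score)) rows."""
    track = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "track":
            # Split key=value pairs; partition(...)[::2] yields (key, value).
            track = dict(tok.partition("=")[::2] for tok in shlex.split(line)[1:])
            continue
        chrom, start, end, score = fields  # the score is column 4 in bedGraph
        rows.append((chrom, int(start), int(end), float(score)))
    if track is None or track.get("type") != "bedGraph":
        raise ValueError("bedGraph data needs a track line with type=bedGraph")
    return track, rows


# Made-up sample data, for illustration only.
sample = (
    "track type=bedGraph name=coverage graphType=bar\n"
    "chr1\t0\t100\t0.5\n"
    "chr1\t100\t250\t1.25\n"
)
track, rows = parse_bedgraph(sample)
print(track["graphType"], rows[1])  # bar ('chr1', 100, 250, 1.25)
```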

What software uses BED files?

  • Alignment viewers can use these data to graphically display certain features.
  • [bedtools](http://bedtools.readthedocs.io/en/latest/index.html) uses this format to query for nearby features.
  • Some annotation files are in this format.
  • Feature detection packages use this as output.

How are these files generated?

  • Feature detection algorithms.
  • Many databases of genomic features distribute their data in this format.
  • Sometimes manually curated from alignments (via bedtools, bamtools, etc.).