VCF and BCF Formatted files

VCF is a text-based file format for representing genetic polymorphism.

VCF files can be read using VCF.Reader:

reader = VCF.Reader(open("example.vcf", "r"))
for record in reader
    # do something
end
close(reader)
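
The loop above leaves the reader open if an error is thrown inside it. A minimal sketch of one way to guard against that with try/finally follows. It assumes that the VariantCallFormat package provides the VCF module used in this section and that VCF.Reader accepts any IO object (the open(...) call above suggests so); the in-memory data and the quality threshold are hypothetical.

```julia
using VariantCallFormat  # assumption: this package provides the VCF module

# Hypothetical two-record VCF kept in memory so the sketch needs no file on disk.
vcf_text = """
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14
20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11
"""

reader = VCF.Reader(IOBuffer(vcf_text))
positions = Int[]
try
    for record in reader
        # Keep only records whose quality reaches a hypothetical threshold of 10.
        if VCF.qual(record) >= 10
            push!(positions, VCF.pos(record))
        end
    end
finally
    # Runs whether or not the loop body threw an error.
    close(reader)
end
```

With a file on disk, IOBuffer(vcf_text) would be replaced by open("example.vcf", "r") as in the example above.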

A reader first reads the header section of a file and creates a VCF.Header object. The header function is provided to access the header object of a reader:

julia> header(reader)
VariantCallFormat.Header:
  metainfo tags: fileformat fileDate source reference contig phasing INFO FILTER FORMAT
     sample IDs: NA00001 NA00002 NA00003

julia> findall(header(reader), "FORMAT")
4-element Array{VariantCallFormat.MetaInfo,1}:
 VariantCallFormat.MetaInfo:
    tag: FORMAT
  value: ID="GT" Number="1" Type="String" Description="Genotype"          
 VariantCallFormat.MetaInfo:
    tag: FORMAT
  value: ID="GQ" Number="1" Type="Integer" Description="Genotype Quality"
 VariantCallFormat.MetaInfo:
    tag: FORMAT
  value: ID="DP" Number="1" Type="Integer" Description="Read Depth"       
 VariantCallFormat.MetaInfo:
    tag: FORMAT
  value: ID="HQ" Number="2" Type="Integer" Description="Haplotype Quality"

VariantCallFormat.MetaInfo variables in the header support the following accessors:

Accessor       Description
metainfotag    tag string
metainfoval    value string
keys           keys of fields between '<' and '>'
values         values of fields between '<' and '>'
[<key>]        value of a field with key

julia> metainfo = VariantCallFormat.MetaInfo("##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">")
VariantCallFormat.MetaInfo:
    tag: FORMAT
  value: ID="GT" Number="1" Type="String" Description="Genotype"
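
The raw line passed to the constructor above has the form ##TAG=<key=value,...>, and the displayed value lists the fields found between '<' and '>'. To illustrate that structure, here is a hand-written sketch in plain Julia that splits such a value into key/value pairs. It is not the package's parser: parse_metainfo_fields is a hypothetical helper, and it assumes quoted values contain no escaped quotes.

```julia
# Hypothetical helper: split "<k1=v1,k2=\"v 2\">" into key => value pairs.
function parse_metainfo_fields(value::AbstractString)
    body = strip(value)
    # Drop the enclosing angle brackets if present.
    if startswith(body, "<") && endswith(body, ">")
        body = chop(body; head=1, tail=1)
    end
    # Split on commas that are not inside double quotes.
    parts = String[]
    buf = IOBuffer()
    inquote = false
    for c in body
        if c == '"'
            inquote = !inquote
            write(buf, c)
        elseif c == ',' && !inquote
            push!(parts, String(take!(buf)))
        else
            write(buf, c)
        end
    end
    push!(parts, String(take!(buf)))
    # Split each part at the first '=' and remove surrounding quotes.
    fields = Pair{String,String}[]
    for part in parts
        kv = split(part, '='; limit=2)
        length(kv) == 2 || continue
        push!(fields, String(kv[1]) => String(strip(kv[2], '"')))
    end
    return fields
end

fields = parse_metainfo_fields("<ID=GT,Number=1,Type=String,Description=\"Genotype\">")
id = Dict(fields)["ID"]  # look up one field by key
```

Tracking the quote state matters because a Description may itself contain commas.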

julia> metainfotag(metainfo)
"FORMAT"

julia> metainfoval(metainfo)
"<ID=GT,Number=1,Type=String,Description=\"Genotype\">"

julia> keys(metainfo)
4-element Array{String,1}:
 "ID"         
 "Number"     
 "Type"       
 "Description"

julia> metainfo["ID"]
"GT"

VCF.Record and BCF.Record variables support the following accessor functions (see the docstring of each accessor for the details):

Accessor    Description
chrom       chromosome name
pos         reference position
id          unique identifiers
ref         reference bases
alt         alternate bases
qual        Phred-scaled quality score
filter      filter status
info        additional information
infokeys    keys of additional information
format      genotype format
genotype    genotype information

julia> record = VCF.Record("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51\t1|0:48:8:51,51")
VariantCallFormat.Record:
   chromosome: 20
     position: 14370
   identifier: rs6054257
    reference: G
    alternate: A
      quality: 29.0
       filter: PASS
  information: NS=3 DP=14 AF=0.5 DB H2
       format: GT GQ DP HQ
     genotype: [1] 0|0 48 1 51,51 [2] 1|0 48 8 51,51
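
The INFO column of the record above ("NS=3;DP=14;AF=0.5;DB;H2") packs several entries separated by semicolons; flags such as DB carry no value. The following plain-Julia sketch shows how such a column can be split by hand and how string values can be converted to numbers. split_info is a hypothetical helper, not part of the package, and the numeric types chosen for DP and AF are assumptions that would normally come from the INFO lines of the header.

```julia
# Hypothetical helper: split an INFO column into key => value pairs.
# Flags (entries without '=') get an empty string as their value.
function split_info(column::AbstractString)
    entries = Pair{String,String}[]
    column == "." && return entries  # "." marks an empty INFO column
    for entry in split(column, ';'; keepempty=false)
        kv = split(entry, '='; limit=2)
        key = String(kv[1])
        val = length(kv) == 2 ? String(kv[2]) : ""
        push!(entries, key => val)
    end
    return entries
end

info = split_info("NS=3;DP=14;AF=0.5;DB;H2")
lookup = Dict(info)

# Values are strings; convert them according to the (assumed) declared types.
depth = parse(Int, lookup["DP"])
allele_freq = parse(Float64, lookup["AF"])

# Entries with an empty value are flags.
flags = [k for (k, v) in info if isempty(v)]
```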

julia> VCF.chrom(record)
"20"

julia> VCF.pos(record)
14370

julia> VCF.id(record)
1-element Array{String,1}:
 "rs6054257"

julia> VCF.ref(record)
"G"

julia> VCF.alt(record)
1-element Array{String,1}:
 "A"

julia> VCF.qual(record)
29.0
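
Because the quality score is Phred-scaled, it can be converted to an error probability with P = 10^(-QUAL/10). A short sketch in plain Julia, where phred_to_prob is a hypothetical helper name:

```julia
# Convert a Phred-scaled quality score to an error probability.
phred_to_prob(q::Real) = 10.0^(-q / 10)

# QUAL = 29.0 as returned above corresponds to roughly 1.3e-3.
p = phred_to_prob(29.0)
```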

julia> VCF.filter(record)
1-element Array{String,1}:
 "PASS"

julia> VCF.info(record)
5-element Array{Pair{String,String},1}:
 "NS"=>"3"  
 "DP"=>"14"
 "AF"=>"0.5"
 "DB"=>""   
 "H2"=>""   

julia> VCF.format(record)
4-element Array{String,1}:
 "GT"
 "GQ"
 "DP"
 "HQ"

julia> VCF.genotype(record)
2-element Array{Array{String,1},1}:
 String["0|0","48","1","51,51"]
 String["1|0","48","8","51,51"]
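
In the output above each sample's values follow the order of the FORMAT keys, so the first entry is the GT string. Genotype values are returned as strings; the plain-Julia sketch below shows one way to turn a GT string into allele indices and a phasing flag. parse_gt is a hypothetical helper, not part of the package; it assumes alleles are separated by '|' (phased) or '/' (unphased) and that '.' marks a missing allele.

```julia
# Hypothetical helper: parse a GT string such as "0|0" or "0/1".
function parse_gt(gt::AbstractString)
    phased = '|' in gt
    alleles = [a == "." ? missing : parse(Int, a) for a in split(gt, r"[|/]")]
    return (alleles = alleles, phased = phased)
end

# GT values of the two samples shown above.
gts = ["0|0", "1|0"]
parsed = parse_gt.(gts)

# Number of non-reference alleles per sample (missing alleles are not counted).
alt_counts = [count(a -> !ismissing(a) && a > 0, p.alleles) for p in parsed]
```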

julia> VCF.genotype(record, 1:2, "GT")
2-element Array{String,1}:
 "0|0"
 "1|0"

julia> VCF.genotype(record, 1:1, "GT")
1-element Array{String,1}:
 "0|0"
