the chicken genome sequence is important for several reasons.
first, the chicken shared a common ancestor with mammals ～310 million years ago (mya) at a phylogenetic distance not previously covered by other genome sequences. it therefore fills a gap in our knowledge and understanding of the evolution and conservation of genes, regulatory sequences, genomes, and karyotypes.
the chicken is also a major source of protein in the world, with billions of birds used in meat and egg production each year. it is the first livestock species to be sequenced and so leads the way for others. the sequence and the 2.8 million genetic polymorphisms defined in a parallel project are expected to benefit agriculture and cast new light on animal domestication. also, as the first bird to be sequenced, it is a model for the 9600 avian species thought to exist today. many of the features of the chicken genome and its biology make it an ideal organism for studies in development and evolution, along with applications in agriculture and medicine.
avian genomics has its origins in genetic linkage mapping (burt and cheng 1998), but our knowledge of the chicken genome has been transformed in recent years, mostly through the analysis of large numbers of partial cdna sequences (abdrakhmanov et al. 2000; tirunagaru et al. 2000; boardman et al. 2002) and culminating with the chicken genome sequence (hillier et al. 2004). these were landmark events in our understanding of avian biology, developmental biology, and the evolution of vertebrates and will facilitate applications in agriculture and medicine.
chicken research has had a significant impact on fundamental biology and the chicken has been a popular model organism for at least 100 years, for example, with the discovery of b cells and tumor viruses (brown et al. 2003). ready access to the chicken embryo using incubated eggs and the ease of manipulation make this system ideal for studies of vertebrate development (stern 2004, 2005). the chicken has been used in many of the classical studies on the molecular basis of patterning in the vertebrate embryo, in particular, the limb bud. in recent times, other model organisms, such as the mouse and zebrafish, have been in greater demand because of increased genetic resources and the ability to manipulate their genomes. the chicken est and genome programs have removed many of these limitations in the chicken. in addition, new tools such as the electroporation of chicken embryos and the use of rnai to knock down gene expression are likely to make the chicken embryo a powerful model for the molecular study of development in vertebrates (stern 2004, 2005).
during the past 80 years, modern selective breeding has made spectacular progress in both egg and meat production traits (burt 2002). during this period, world egg production increased to 795 billion/year in 2002 (commodity research bureau [crb]) and broiler meat production to 6.5 million tons/year (usda foreign agricultural service [fas]). associated with these successes have been a number of undesirable traits. in meat-type chickens, there has been an increase in the incidence of congenital disorders, such as ascites and lameness, reduced fertility, and reduced resistance to infectious disease. in egg-type chickens, there has been an increase in the incidence of osteoporosis associated with increased egg production. given the possibility that genetic progress in egg and meat production will reach its limit within the next twenty years (burt 2002), priorities in the poultry industry will be to reduce these costs and develop new products. the consumer wants high-quality products (e.g., increased egg shell strength), which requires greater uniformity and predictability in production. with an increased requirement for food safety, there will be a need to reduce the use of chemicals and antibiotics and increase genetic resistance to pathogens. these new traits are difficult and costly to measure by conventional genetic selection, and the developments in poultry genomics in the last few years promise new solutions to these problems.
in this report, key features and limitations of the draft chicken genome sequence will be discussed. more detailed reviews have been presented elsewhere from the viewpoint of genomics (burt 2004a; dequeant and pourquié 2005), developmental biology (burt 2004b; stern 2005), evolution (ellegren 2005), and genomic tools (antin and konieczka 2005).
the current draft of the chicken genome (hillier et al. 2004) was assembled using a whole-genome sequencing strategy, including bac, fosmid, and plasmid paired-end reads (washu). this approach produced a high-quality assembly, in part because of the relatively small size of the chicken genome, one third that of a typical mammal. however, it was the low repetitive dna content, only 11% compared with 40%–50% found in mammals, that was a key contributing factor to the quality of the final assembly. this sequence employed dna from a single inbred female jungle fowl (gallus gallus gallus, the ancestor of domesticated chickens; fumihito et al. 1994) and represented a 6.6-fold coverage of the genome. together with genetic and bac maps, almost 100,000 contigs were assembled into a scaffold of 907 mb, or 86% of a 1050-mb genome. in birds, it is the female that is the heterogametic sex, with single copies of the z and w chromosomes. therefore, these chromosomes were poorly represented in the final assembly. in addition, unlike the rest of the genome, the w chromosome has a high repeat content and so very little sequence was assembled. targeted sequencing of the sex chromosomes will be necessary to complete their assemblies. for autosomes, sequence coverage was 98% based on overlaps with an independent set of bac clones sequenced to high quality. overlaps with cdna clones suggested 5%–10% of genes were missing from the final assembly; gene duplications and gc-rich sequences were a particular problem. the mhc region on chromosome 16, a rich source of duplicated genes, was very poorly represented. further work to complete the chicken genome sequence to a high quality for comparative genomics and gene discovery is required.
a unique characteristic of avian genomes is the large variability in chromosome size. in addition to a pair of sex chromosomes, chickens have 38 pairs of autosomes: 5 macro-, 5 intermediate, and 28 microchromosomes. since each chromosome arm must have at least one obligate crossover, it follows that the microchromosomes will have the highest rate of recombination. comparison of genetic maps (schmid et al. 2005) and genome sequences confirms this expectation, with crossover rates of 2.8 cm/mb for macrochromosomes and 6.4 cm/mb for microchromosomes. this is in contrast to 1–2 cm/mb for most human chromosomes, making the chicken ideal for genetic linkage studies. high-resolution genetic maps will be necessary to define variation in recombination rates within chromosomes.
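the obligate-crossover argument can be made concrete with a small calculation. the sketch below assumes the standard genetic convention that one crossover per meiosis corresponds to 50 cM of map length, and for simplicity treats each chromosome as carrying a single obligate crossover; the chromosome sizes are hypothetical round numbers chosen for illustration, not values from the assembly.

```python
# One crossover per meiosis corresponds to 50 cM of genetic map length
# (standard convention; stated here as an assumption of the sketch).
CM_PER_CROSSOVER = 50.0


def min_recombination_rate(size_mb, obligate_crossovers=1):
    """Lower bound on the recombination rate (cM/Mb) implied by a given
    number of obligate crossovers on a chromosome of size_mb megabases."""
    if size_mb <= 0:
        raise ValueError("chromosome size must be positive")
    return obligate_crossovers * CM_PER_CROSSOVER / size_mb


# Hypothetical chromosome sizes in Mb, for illustration only.
examples = [("macrochromosome", 200.0),
            ("intermediate chromosome", 40.0),
            ("microchromosome", 10.0)]

for name, size_mb in examples:
    rate = min_recombination_rate(size_mb)
    print(f"{name:24s} {size_mb:6.1f} Mb  >= {rate:.2f} cM/Mb")
```

because the minimum map length is fixed while physical size varies, the lower bound on cM/Mb rises as chromosomes get smaller, which is the direction of the difference reported for macro- and microchromosomes.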
many sequence characteristics, such as %gc content, cpg island density, and gene density, show clear relationships with chromosome size and therefore recombination rate (table 1). however, we must be cautious about drawing any conclusions on cause and effect from these correlations (fazzari and greally 2004). the density of genes is highest on the microchromosomes, confirming earlier conclusions based on mapping genes (smith et al. 2000) and cpg islands (mcqueen et al. 1996). the estimated number of cpg islands based on bioinformatics approaches depends on the definition in use. in this case (hillier et al. 2004), ～70,000 cpg islands were predicted in the chicken, with 38% of these located in regions of conserved synteny with mammalian genomes. since 48% are associated with a gene, cpg island density mimics gene density and is highest on microchromosomes. conversely, sizes of introns and intergenic regions and density of repetitive elements correlate negatively with gene density and are reduced on microchromosomes. if we assume that genomes balance selective constraints favoring dna loss over those that favor expansion and that selection will be most efficient in regions of high recombination where linkage between alleles is more readily broken (hill and robertson 1966), then the correlation of the densities of genes, cpg islands, repeats, etc. with chromosome size (and therefore recombination rate) is to be expected.
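to illustrate why the predicted number of cpg islands depends on the definition in use, the sketch below applies one widely used set of thresholds (length of at least 200 bp, gc fraction of at least 0.5, and an observed/expected cpg ratio of at least 0.6). these thresholds are an assumption made for illustration; the text does not state which definition was used for the chicken predictions, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island_like(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Test a candidate region against length, GC, and obs/exp thresholds."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    candidate = "CG" * 150          # toy sequence, 300 bp
    print(cpg_stats(candidate), is_cpg_island_like(candidate))
```

tightening any of the three thresholds (for example, raising `min_len` or `min_obs_exp`) reduces the number of regions that qualify, so genome-wide counts from different definitions are not directly comparable.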
table 1. general characteristics of macro- and microchromosomes
comparison of orthologous chicken and turkey sequences revealed that different chromosome size classes are subject to different evolutionary forces (axelsson et al. 2005). microchromosomes show 18% higher sequence divergence in introns and a 26% higher rate of synonymous substitution in coding sequences than macrochromosomes, indicating that the smaller chromosomes are more susceptible to germline mutations. a possible cause for the differences in mutation rate is “biased-gene-conversion” (meunier and duret 2004), a recombination-induced mutation mechanism.
ever since the first gene maps were created (haldane 1927), comparative maps have been used to examine the evolution of the vertebrate genome. comparisons between the early gene maps of human and chicken (burt et al. 1999) suggested extensive conservation of synteny, possibly more than found between mouse and human. the comparison of chicken with mammalian and fish genomes has confirmed and extended this view (bourque et al. 2005). the estimated number of interchromosomal rearrangements between the mammalian ancestor and chicken, during an estimated period of 500 million years (myr), is almost the same as the number found in the mouse lineage, over the course of ～87 myr.
genes and proteins
a major benefit of the chicken genome sequence has been the set of gene predictions. the most conservative evidence-based approach of ensembl generated 17,709 predictions (table 2). the comparative ab initio methods, twinscan (korf et al. 2001) and sgp-2 (syntenic gene prediction-2) (parra et al. 2003), predict larger gene sets but likely include false positives. in total, there may be 20,000–23,000 genes, suggesting we still have more to learn about gene prediction (eyras et al. 2005). when used to identify novel genes missed in the current human gene set (ensembl 22,287 genes), only an additional 37 were predicted (castelo et al. 2005), which suggests we have identified most of the “conserved” genes found in birds and mammals. only 75 processed (or retrotransposed) pseudogenes were found in the chicken genome (hillier et al. 2004), compared with 15,000 in mammals. the reason for this low number may be the sequence specificity of reverse transcription by avian long interspersed elements (lines). mammalian lines are more promiscuous and able to retrotranspose most mrnas. it was hoped that the lack of pseudogenes in the chicken would help to identify functional noncoding rna genes in mammalian genomes via conservation of chromosomal gene location. (because of their noncoding character, it is difficult to distinguish functional rna genes from the large excess of rna pseudogenes in mammals by ab initio methods.) in chicken, 571 rna genes in 20 distinct families were predicted and only the mirna and snorna families (that usually lie within introns of coding genes) show conserved synteny to the extent that protein coding genes do. the lack of conserved synteny for the other noncoding rna families suggests that they may transpose throughout the genome in ways that differ from coding genes.
table 2. frequency and class of gene/protein predictions (ensembl, june 2004)
comparisons between mammals and birds can also start to address questions about gene gains/losses (hillier et al. 2004). comparisons between human, chicken, and fugu suggest a core set of almost one third of all genes (7606) is conserved in all vertebrates. these comparisons also suggest that the rates of gene loss were higher in the avian lineage and fewer gene duplications were found in birds. careful comparisons detected some genes lost from the chicken lineage, including vomeronasal receptors, caseins, and some genes of the immune system. similarly, birds have more keratins specific to feathers and mammals have lost the avidin egg proteins. the discovery that all enzymes in the urea cycle were present but apparently not used for this function in birds was perplexing.
new tools for genome analysis
important by-products of any genome project are the resources (cdna and bac clones, genetic markers, etc.) and information it provides for future research (antin and konieczka 2005). together with chromosome paints, bac clones (bprc) have been used to define cytogenetically all chicken chromosomes (masabanda et al. 2004). because of the nearly identical sizes of the microchromosomes in mitotic chromosome spreads, this was not previously feasible. a bac map with 20-fold redundancy or 91% coverage of the chicken genome has been assembled into 260 contigs (wallis et al. 2004; chickfpc). bac contig maps are under construction for other birds, including turkey, california condor, and zebra finch (edwards et al. 2005). these clones can be used to target specific genomic regions and to create whole-genome bac arrays for comparative surveys of avian genomes. these arrays may be able to classify many avian species into unique clades, a notoriously difficult task (edwards et al. 2005). from the very start, ests and cdna clones have been important (boardman et al. 2002; chickest), in particular for the prediction of chicken genes. ests have been used to create cdna microarrays (burnside et al. 2005) and design dna chips (affymetrix) for high-throughput gene expression assays. a total of 4532 full-length cdna clones (caldwell et al. 2004; hubbard et al. 2005), representing ～25% of known gene predictions in chicken, can now be used in evolutionary and functional studies (available from ark-genomics). rnai and transgenic technologies are now available in the chicken and, when combined with the accessible chicken embryo, make this a powerful system for functional studies in vivo (brown et al. 2003; nakamura et al. 2004; sang 2004; stern 2004). applying these tools and providing access to the biological information they generate is a huge and complex task.
there are a number of databases distributed throughout the world (table 3), including genome browsers (ensembl, ncbi, and ucsc), genetic maps (arkdb and chickace), gene expression (geisha), and others, but there is a need to integrate these views into a single model organism database (gmod).
applications of the chicken genome sequence
birds and mammals shared a common ancestor ～310 million years ago (mya) (hedges 2002). sequence comparisons between these groups are characterized by a high signal-to-noise ratio for the detection of functional elements. given this, the ready access to chicken embryos, and the chicken's role as a major food source, chicken genomics is likely to have major applications and benefits in comparative genomics, evolutionary biology and systematics, models of development and human disease, and agriculture.
a major reason for sequencing the chicken genome was to increase our understanding of the human genome through comparative genomics, for example, to define regions under selection such as coding and regulatory elements (hillier et al. 2004). comparisons with known functional sequences suggested that 75% of coding regions and 30%–40% of regulatory elements are conserved. only 2.5% of the chicken sequence could be aligned with that of the human (44% coding, 25% intronic, and 31% intergenic) and, given that 5% of the mammalian genome is under selection, almost all of this is likely to be of functional significance.
comparative genomics has identified ～400 ultra-conserved regions (ucr) greater than 200 bp sharing at least 95% sequence identity between human and chicken (sandelin et al. 2004). surprisingly, highly conserved noncoding regions like the ucr are often clustered and lie far from any predicted gene, within so-called “gene deserts” that are apparently free of known protein-coding genes (ovcharenko et al. 2005). genes with a role in transcriptional regulation and development flank many of these ucr and gene deserts, so these regions may represent distant regulatory signals.
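the length and identity thresholds quoted above can be applied to a pairwise alignment with a simple sliding window. the following is a minimal sketch under stated assumptions, not the method of the cited studies: it takes two already-aligned sequences of equal length (gaps written as `-`), scores every window of `window` columns, and merges overlapping qualifying windows into candidate regions. the function name and the toy input are hypothetical.

```python
def conserved_windows(aln_a, aln_b, window=200, min_identity=0.95):
    """Return merged (start, end) column spans, end exclusive, in which
    every sliding window of `window` columns has identity >= min_identity.
    Gap columns count as mismatches."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    # 1 where the two sequences carry the same base, 0 otherwise.
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    running = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            running += match[start + window - 1] - match[start - 1]
        if running / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend previous span
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    toy = "ACGT" * 100  # identical toy sequences, 400 columns
    print(conserved_windows(toy, toy))
```

each constituent window meets the identity threshold, but a merged span as a whole can fall slightly below it, so merged spans are best treated as candidates to be rescored over their full length.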
parent-specific gene expression by genomic imprinting is only found in mammals and not birds or lower vertebrates. therefore, comparison of imprinted genes in mammals with orthologs in the chicken may uncover features about the origins of imprinting. comparative mapping suggests these genes cluster on macrochromosomes in regions that preferentially undergo asynchronous dna replication (dunzinger et al. 2005). analysis of the chicken region orthologous to the imprinted mammalian ascl2–h19 region (yokomine et al. 2005) revealed extensive conservation of gene organization, except h19, a critical noncoding imprinted gene. this gene and its regulatory elements were absent from the chicken genome. these studies suggest that imprinted genes were clustered before the evolution of imprinting, an event that occurred after the divergence of birds and mammals ～310 mya. subsequently, imprinting control elements, such as the h19 gene region, must have evolved by duplication and/or transposition into these gene clusters.
a long-standing question in genome evolution concerns genome size. the chicken genome is 35% the size of the human and 45% of mouse. in part, this can be explained in terms of the low frequency of repeats, pseudogenes, segmental duplication, and gene duplications (hillier et al. 2004). however, these factors only account for 20%–25% of the variation in genome size, so other factors are at work, possibly a dearth of ancient repeats (that are no longer detectably repetitive) or reduction in cell size and energy conservation (hughes and piontkivska 2005).
developmental biology is likely to be another major beneficiary of the genome sequence (burt 2004b; stern 2005). the chicken has always been a favorite among developmental biologists (brown et al. 2003; stern 2005) because of easy access to the chick embryo and ease of manipulation. these features, when combined with the new tools of genomics, are ideal for testing gene function and predicted regulatory sequences in vivo. for example, studies on the conservation of the avian sox2 genes have identified neural specific enhancers, confirmed in vivo by electroporation of chick embryo neural tubes (uchikawa et al. 2004).
in the mouse and other model systems, whole-mount in situ hybridization screens have been useful in identifying patterns of expression that may suggest developmental functions of novel genes (emap). a similar effort has started in the chicken using the large collection of sequenced chicken ests (boardman et al. 2002; ark-genomics; chickest). data can be accessed at geisha and standard three-dimensional embryo reconstructions are under development (emap).
genetic variation and complex trait analysis
in parallel with the chicken genome sequencing project, a consortium (wong et al. 2004; wang et al. 2005b; chickvd) generated 2.8 million snps from a comparison of the red jungle fowl reference sequence and partial genome scans of silkie, broiler, and layer lines. nucleotide diversity (5 × 10⁻³ per nucleotide) was six times the rate found in humans (ellegren 2005). resequencing confirmed 94% of the total and 83% of the nonsynonymous snps. an initial surprise was that ～70% of snps were common to all breeds, suggesting an origin prior to domestication 5,000–10,000 years ago. another possibility is that their ancestry has been lost because of extensive cross breeding between asian and western poultry populations. the next steps are to verify a larger sample of snps and create high-resolution genetic and linkage disequilibrium maps of chicken populations. these assays will be used to map and identify genes controlling traits of economic and biological interest at quantitative trait loci (qtl). currently, more than 600 qtl have been mapped using microsatellites (andersson and georges 2004; hocking 2005; wang et al. 2005b). the availability of a standard set of 10,000 or more snps, combined with the ease of building large structured resource populations, holds much promise for the identification of genes controlling these traits.
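nucleotide diversity is the average number of differences per site between two sequences drawn at random from a population, so a value of 5 per 1000 nucleotides means roughly five differing sites per kilobase between any two chromosomes. the sketch below computes this statistic for a toy set of aligned sequences; the input is invented for illustration, and the function is a textbook definition rather than the consortium's pipeline.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site over all pairs of aligned,
    equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    total_diff = sum(
        sum(1 for a, b in zip(s1, s2) if a != b)
        for s1, s2 in pairs
    )
    return total_diff / (len(pairs) * length)


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT"]  # hypothetical aligned haplotypes
    print(nucleotide_diversity(toy))
```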
animal health and the avian immune system
one area that has benefited most from genomic approaches has been the characterization of the genes and proteins in the avian immune system. the mhc was the first major chicken genome sequence to be assembled (kaufman et al. 1999) and was a surprise, being relatively compact and simpler than those of mammals. since then, there has been slow progress in the isolation of avian cytokines and other signaling molecules. the main problem has been their high rate of evolution, limiting their detection using homology to mammalian sequences (staeheli et al. 2001). even now, one must be careful in concluding that avian homologs to mammalian immune genes do not exist, as several examples known from ests or directed sequencing were not found in the genome assembly. this started to change when analysis of large est data sets identified 185 immune-related sequences (lynn et al. 2003; smith et al. 2004). this compared with the 80 genes identified by tirunagaru et al. (2000) and the 28 genes listed in the review by staeheli et al. (2001). sequences included interleukins, transcription factors, chemokines, differentiation antigens, receptors, genes involved in the toll pathway, and mhc-associated genes. the discovery of il4 and other cytokines involved in the th2 response (smith et al. 2004) was a surprise, since it had previously been speculated that the chicken does not elicit a typical th2 response (staeheli et al. 2001). the receptors for il10 and il13 were also identified, indicating that the chicken probably also contained these genes, which are typical tr1 and th2 cytokines. this was confirmed by sequencing specific bac clones identified assuming conservation of synteny between chicken and mammalian genomes (avery et al. 2004; rothwell et al. 2004).
a comprehensive analysis of the chicken genome sequence has identified many cytokines, chemokines, and their receptors (hillier et al. 2004; kaiser et al. 2004, 2005; wang et al. 2005a). even genes once thought to be mammalian-specific, including il3, il7, il9, il26, csmf, lif, and cathelicidin, were found (hillier et al. 2004). these are proteins that evolve rapidly and require more effort to detect. a number of orthologs to human chemokines are absent from the chicken genome, including ccl2, 7, 8, 11, 15, 18, 23, 24, and 26 and cxcl1–7, 9, 10, and 11; these are possibly products of independent gene duplications in mammals. similarly, missing chemokine receptors included ccr1, ccr3, ccr10, cxcr3, and cxcr6. the lack of functional eosinophils correlates with the absence of the eotaxin genes (ccl11, ccl24, ccl26) and their receptor (ccr3). chickens lack lymph nodes and also the genes for the lymphotoxins (lt-α and -β) and their receptors. tnf is also absent, but its receptor, tnfrsf1a (ensgalg00000014890), is present, suggesting that further sequencing will reveal this gene in the chicken. similar analyses have been performed on the leukocyte receptor complex (nikolaidis et al. 2005) that regulates the activity of t- and b-lymphocytes and nk cells. a model of evolution by repeated birth and death of these ig-like receptor genes was proposed.
when the first issue of genome research appeared 10 years ago, avian genomics was still in a mapping phase (burt and cheng 1998). the idea of sequencing the chicken genome was only a dim possibility and comparative maps were hailed as an alternative mapping resource. as the first livestock species to be fully sequenced, the chicken genome sequence is a landmark in both avian biology and agriculture. the avian community was small but has grown rapidly in the last two years thanks to the est and genome sequencing programs. the challenge now is to keep the momentum going and to exploit these resources. the creation of aviannet, an organization to encourage the exchange of tools and resources in avian biology, is only a beginning. the chicken genome sequence was determined to inform us about the nature and function of the human genome. it has also informed us about the nature of birds and other vertebrates. with 9600 extant avian species, there is still a lot to learn. birds, in particular poultry and ducks, are a source of many infectious diseases (avian flu: web focus 2005) and genomics is going to tell us a lot about host responses to these pathogens. there is therefore a need to sequence and characterize other avian genomes. this time these sequences will be used to inform us about responses to pathogens that infect both humans and birds.
samir r. fanous