Genetic glossary


Adiposity: excessive accumulation of fat at a particular body location (i.e., regional adiposity), or the generalized accumulation of fat (i.e., obesity).

Allele: one of two or more forms of a gene or a genetic locus, typically differing in their DNA sequences; an individual’s alleles determine the distinct traits that can be passed on from parents to offspring.

Allele frequency: how often a particular allele occurs in a population, expressed as the proportion (0.0-1.0) or percentage (0-100%) of all copies of that gene or locus in the population that are that allele.
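As a worked example, the frequency of an allele at a locus with two alleles (A and B) can be computed from genotype counts. Each horse carries two copies of the locus, so AA animals contribute two copies of A, AB animals one, and BB animals none. The function name and the counts below are hypothetical and only illustrate the arithmetic.

```python
def allele_frequency(n_hom_a, n_het, n_hom_b):
    """Frequency of allele A at a biallelic autosomal locus.

    Each diploid individual carries two copies of the locus:
    AA contributes 2 copies of A, AB contributes 1, BB contributes 0.
    """
    total_copies = 2 * (n_hom_a + n_het + n_hom_b)
    if total_copies == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_hom_a + n_het) / total_copies


# Hypothetical sample of 100 horses: 36 AA, 48 AB, 16 BB
p = allele_frequency(36, 48, 16)
print(f"A: {p:.2f} ({p:.0%}), B: {1 - p:.2f}")  # A: 0.60 (60%), B: 0.40
```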

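Allelic heterogeneity: the presence of more than two alleles of the same gene in a population, different alleles of which may produce the same or a similar phenotype; an individual carries at most two of these alleles, one on each chromosome of a pair (see chromosomes).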

Artificial selection: human intervention in animal or plant reproduction to attempt to ensure that certain desirable traits are represented in successive generations; also see positive selection and genetic selection.

Association mapping: see genome-wide association (GWAS)

Autosomal: residing on or due to chromosomes that are not the sex (X or Y) chromosomes.

Autosomal dominant disease: The disease will develop when at least one copy of the abnormal or mutant allele is present on a non-sex chromosome.

Autosomal recessive disease: The disease will develop when two copies of the abnormal or mutant allele are present on a non-sex chromosome.
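These inheritance rules can be sketched by listing every combination of one allele from the sire and one from the dam, as in a Punnett square. This is a minimal sketch assuming a single autosomal locus, complete penetrance, and hypothetical allele symbols (for example, N for normal and n for mutant); the function names are illustrative only.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(sire, dam):
    """Count the 4 equally likely combinations of one sire allele and one dam allele.

    Each parent genotype is a two-character string such as "Nn".
    """
    combos = Counter()
    for from_sire, from_dam in product(sire, dam):
        # Sort the pair so that "nN" and "Nn" are counted as the same genotype
        combos["".join(sorted(from_sire + from_dam))] += 1
    return combos


def probability_affected(sire, dam, mutant, mode):
    """Chance that a foal is affected, for mode "dominant" or "recessive"."""
    if mode not in ("dominant", "recessive"):
        raise ValueError("mode must be 'dominant' or 'recessive'")
    copies_needed = 1 if mode == "dominant" else 2
    combos = offspring_genotypes(sire, dam)
    affected = sum(n for genotype, n in combos.items()
                   if genotype.count(mutant) >= copies_needed)
    return affected / sum(combos.values())


# Two carriers of a recessive allele n: 1 in 4 foals is expected to be affected
print(probability_affected("Nn", "Nn", mutant="n", mode="recessive"))  # 0.25
# A heterozygous parent with a dominant allele D bred to an unaffected dd mate
print(probability_affected("Dd", "dd", mutant="D", mode="dominant"))  # 0.5
```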

Base pair: DNA is made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). DNA bases pair up with each other in a specific manner, A with T and C with G, to form units called base pairs (Figure 1).


Figure 1. DNA is formed in a double helix. The units that make up the double helix are base pairs. Horses have approximately 2.7 billion base pairs in their genome.


Chromosomes: the genome is organized into chromosomes, which contain most of the DNA of a living organism. Chromosomes come in pairs; horses have 31 pairs of autosomes (non-sex chromosomes) and one pair of sex chromosomes (XX in mares, XY in stallions). One copy of each chromosome comes from the sire, and one copy comes from the dam. For each chromosome pair, there are 4 possible combinations that the foal can inherit from its parents (Figure 2).


Figure 2. One copy of each chromosome comes from the sire and one copy comes from the dam. There are 4 possible chromosomal combinations that the foal can inherit from its parents.


Complex genetic disease: Researchers are learning that nearly all conditions and diseases have a genetic component. Some disorders, such as HYPP and lavender foal syndrome, are caused by mutations in a single gene. The causes of many other disorders, however, are much more complex. Common medical problems in horses, such as equine metabolic syndrome, tying-up, and laminitis, do not have a single genetic cause; they are likely associated with the effects of alleles in multiple genes in combination with management and other environmental factors. Conditions caused by many contributing factors are called complex or multifactorial disorders. Although complex disorders often cluster in related individuals, they do not have a clear-cut or Mendelian pattern of inheritance. This makes it difficult to determine an animal’s risk of inheriting these disorders or passing them on to its offspring.

Conserved sequence: identical or similar nucleic acid (DNA/RNA) sequences found across species. A highly conserved sequence, or region of a gene, typically suggests that the sequence is functionally important, because changes to it have been selected against over the course of evolution.

Copy number variation: a type of genetic variation in which sections of DNA on a chromosome are duplicated and the number of repeats varies between individuals in a population. Copy number variation is due to duplication or deletion events that affect a considerable number of base pairs.

Deletion: a type of genetic variant in which a part of a chromosome or a sequence of DNA is lost during DNA replication. The number of nucleotides that are deleted can vary, from a single base to an entire piece of a chromosome.

DNA: deoxyribonucleic acid is an extremely long molecule that contains an individual’s unique genetic code. DNA is made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T) that are arranged in a linear sequence, or a strand. DNA bases from complementary strands pair up with each other, A with T and C with G, to form units called base pairs. In this way, the nucleotides are arranged in two long strands that form a spiral called a double helix (Figure 1). An individual horse’s DNA contains about 2.7 billion base pairs divided amongst all the chromosomes.
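Because of the A-T and C-G pairing rule, the sequence of one DNA strand fully determines the sequence of the other. The sketch below illustrates this; the function names are hypothetical. The two strands run in opposite directions, so by convention the partner strand is usually written reversed, as the "reverse complement". Characters other than A, C, G, and T are left unchanged.

```python
# Pairing rules: A pairs with T, C pairs with G
PAIR = str.maketrans("ACGT", "TGCA")


def complementary_strand(strand):
    """Return the paired base for each position, in the same left-to-right order."""
    return strand.upper().translate(PAIR)


def reverse_complement(strand):
    """Complementary strand written in its own 5'->3' direction (strands are antiparallel)."""
    return complementary_strand(strand)[::-1]


print(complementary_strand("ATGCCG"))  # TACGGC
print(reverse_complement("ATGCCG"))    # CGGCAT
```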


Encephalitis: inflammation of the brain.

Endocrine disrupting chemicals: Endocrine disruptors are chemicals that may interfere with the body's endocrine (hormonal) system and produce adverse developmental, reproductive, neurological, and/or immune system effects.

Fixation: when an allele reaches 100% frequency in a population.

FST: FST is a measure of the proportion of total genetic variation that can be explained by population structure, traditionally applied to identify structure among subpopulations due to common ancestors.
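For a single locus with two alleles, Wright's FST can be computed as (HT - HS) / HT, where HS is the average expected heterozygosity within subpopulations and HT is the expected heterozygosity of all subpopulations pooled. This minimal sketch assumes Hardy-Weinberg proportions and equally sized subpopulations, and the function names are hypothetical; estimators used in practice (e.g., Weir and Cockerham's) also correct for sample size.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus under Hardy-Weinberg."""
    return 2 * p * (1 - p)


def fst(subpop_freqs):
    """Wright's FST = (HT - HS) / HT for equally sized subpopulations.

    `subpop_freqs` holds the frequency of the same allele in each subpopulation.
    """
    k = len(subpop_freqs)
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / k
    p_bar = sum(subpop_freqs) / k
    h_t = expected_heterozygosity(p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst([0.5, 0.5]))  # 0.0: identical subpopulations, no structure
print(fst([0.2, 0.8]))  # approximately 0.36: substantial differentiation
```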

Founder: the first individual in which a genetic mutation arises.

Genetic architecture: a complete description of all genetic contributions to a trait, including all alleles that influence it, the magnitude of each of their effects, their allele frequencies, and their interactions with each other and the environment.

Genetic burden: the sum of all potentially deleterious alleles that are carried, mostly hidden, in the genome of an individual, or in the genomes of a population, that may be transmitted to offspring.

Genetic diversity: genetic variability within a species or population. Genetic diversity is a result of new allele combinations (e.g., recombination of chromosomes that occurs during egg and sperm production) and new genetic variants.

Genetic selection: the process by which certain phenotypes (traits) become more or less common in a population or species than other phenotypes. Genetic selection in animals can be natural (due to natural environmental pressures) or artificial (due to human-controlled breeding). Selection can also be described as positive (increasing the frequency of a trait in a population), negative (decreasing the frequency of a trait), or balancing (maintaining more than one allele, and therefore more than one form of a trait, in the population).
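The effect of selection on allele frequency can be illustrated with the standard one-locus model: if genotypes AA, AB, and BB have relative fitnesses wAA, wAB, and wBB, the frequency of A after one generation is p' = (p²·wAA + pq·wAB) / w̄, where q = 1 - p and w̄ is the mean fitness. This minimal sketch assumes random mating and a single biallelic locus; the function name and fitness values are hypothetical.

```python
def next_generation_frequency(p, w_aa, w_ab, w_bb):
    """Frequency of allele A after one generation of selection.

    Assumes Hardy-Weinberg genotype proportions before selection and
    relative fitnesses w_aa, w_ab, w_bb for genotypes AA, AB, BB.
    """
    q = 1 - p
    mean_fitness = p * p * w_aa + 2 * p * q * w_ab + q * q * w_bb
    return (p * p * w_aa + p * q * w_ab) / mean_fitness


# Positive selection for A: carriers of A leave slightly more offspring
p = 0.1
for generation in range(50):
    p = next_generation_frequency(p, 1.1, 1.05, 1.0)
print(round(p, 3))  # the frequency of A rises toward fixation
```

With equal fitnesses the frequency stays unchanged; swapping the fitness values models negative selection against A.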

Genetic variant: an alteration in DNA sequence, often used synonymously with allele. The genomes of any two individuals differ at millions of positions. Genetic variants include changes in individual nucleotides (single nucleotide polymorphisms, or SNPs) as well as larger variations, such as deletions, insertions, and copy number variations. Any of these variants may cause alterations in an individual's traits, or phenotype.

Genetic variation: the differences in alleles of genes between individual members of a population, or the frequency with which the various genetic variants occur. Genetic variation is important for the survival and adaptation of a species.

Genome-wide association (GWAS): the process by which a large set of genetic variants distributed across the genome is genotyped in a group of individuals and used to identify variants whose allele frequencies differ between individuals with and without the trait of interest (see the detailed explanation of GWAS at the end of the glossary and Figure 6).

Genome: the entirety of an individual’s genetic information. In eukaryotes (organisms whose cells have a nucleus, from yeast to multicellular animals and plants), DNA is organized into chromosomes (Figure 2), and the genome consists of DNA that includes both genes and non-coding sequences (Figure 3). In prokaryotes (microscopic single-celled organisms without a nucleus, such as bacteria) and in viruses, the genome consists of DNA (bacteria and some viruses) or RNA (some viruses).


Figure 3. Each chromosome has both genes and intergenic regions that contain non-coding sequences. Most genes code for a protein product. Horses have approximately 22,000 protein-coding genes in their genome.


Genotype: the unique combination of alleles within an individual at a particular locus.  The genotypes at a single locus (Figure 4), or more often, multiple loci, underlie particular traits or phenotypes.


Figure 4. A horse’s genotype affects its phenotype. Phenotype can be the result of the horse’s genotype at a single locus or gene. When the phenotype is controlled by a single locus it is referred to as a single gene or monogenic trait/disease. For example, gray coat color is caused by a dominant mutation in STX17. Heterozygous (G/N) and homozygous (G/G) individuals are gray. Homozygous normal (N/N or wild-type) individuals are not gray.


Genotype imputation: the statistical inference of unobserved genotypes using known haplotypes in a population. In other words, genotype imputation uses information from known genotypes in a sample to make an educated guess about genotypes at nearby loci that are unknown. Genotype imputation allows researchers to accurately evaluate for association between traits of interest and genetic markers that are not directly genotyped. See haplotype below.
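The idea behind imputation can be shown with a deliberately simplified "nearest haplotype" sketch: find the reference haplotype that best matches the SNPs that were genotyped, then copy its alleles at the SNPs that were not. This is an illustration only, with hypothetical function names and data; real imputation software uses hidden Markov models that can switch between reference haplotypes at likely recombination points.

```python
def impute_haplotype(target, reference_panel):
    """Fill None entries in `target` using the best-matching reference haplotype.

    Alleles are coded 0/1; None marks SNPs that were not genotyped.
    Simplified nearest-haplotype approach, not a full imputation model.
    """
    def mismatches(ref):
        # Compare only at positions that were actually observed
        return sum(1 for t, r in zip(target, ref) if t is not None and t != r)

    best = min(reference_panel, key=mismatches)
    return [r if t is None else t for t, r in zip(target, best)]


panel = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
print(impute_haplotype([1, None, 0, None, 1], panel))  # [1, 1, 0, 0, 1]
```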

Haplotype: a combination of alleles at neighboring loci on the same chromosome that tend to be inherited together in a population. See also the detailed explanation of GWAS (Figure 6).

Heterozygous: An individual is heterozygous at a gene locus when its genotype represents two different alleles of that gene.

Heritability: in breeding and genetics, heritability is an estimate of the proportion of variation in a phenotypic trait within a population that is due to genetic variation among individuals in that population.
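One classical way to estimate heritability from breeding records is to regress offspring phenotypes on the mid-parent value (the average of sire and dam). The slope of that regression estimates narrow-sense heritability, the share of phenotypic variation due to additive genetic effects, which is narrower than the general definition above. This minimal sketch uses hypothetical function names and made-up heights in centimeters. It also assumes that parents and offspring are not sharing environments that inflate their resemblance.

```python
def regression_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    if var == 0:
        raise ValueError("mid-parent values must vary")
    return cov / var


def heritability_midparent(sire_values, dam_values, offspring_values):
    """Narrow-sense heritability estimated as the offspring on mid-parent slope."""
    midparent = [(s + d) / 2 for s, d in zip(sire_values, dam_values)]
    return regression_slope(midparent, offspring_values)


# Hypothetical wither heights (cm) for four sire/dam/foal trios
sires = [152, 160, 168, 156]
dams = [150, 158, 164, 160]
foals = [153, 158, 163, 157]
print(round(heritability_midparent(sires, dams, foals), 2))
```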

Histopathology: a microscopic examination of the cellular and tissue alterations resulting in or causing a disease trait.

Homozygous: An individual is homozygous at a gene locus when its genotype is represented by two copies of the same allele.

Hyperinsulinemia: having excess levels of insulin circulating in the blood relative to the level of glucose.


Inbreeding: the breeding of individuals that are genetically closely related.

Insertion: the addition of one or more nucleotide base pairs into a DNA sequence.

in situ hybridization: using a labeled complementary DNA, RNA or modified nucleic acid strand (i.e., probe) to localize a specific DNA or RNA sequence in a portion or section of a tissue.

Insulin dysregulation: abnormal insulin metabolism signified by hyperinsulinemia and/or insulin resistance.

Insulin resistance: a pathological condition in which cells fail to respond normally to the hormone insulin.

Linkage disequilibrium (LD): the non-random statistical association between alleles at different loci. LD is most often due to the fact that two loci are physically close to one another on a chromosome and are less likely to be separated by genetic recombination.
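LD between two loci is commonly summarized with D = pAB - pA·pB, the difference between the observed frequency of the A-B haplotype and the frequency expected if the loci were independent, and with r² = D² / (pA(1 - pA)·pB(1 - pB)), which ranges from 0 (no LD) to 1 (complete LD). This minimal sketch assumes phased haplotypes with alleles coded 0/1; the function name and data are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) for two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) pairs
    coded 0/1; allele 1 at each locus plays the role of "A" and "B".
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Alleles always travel together: complete LD
print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))  # (0.25, 1.0)
# All four combinations equally common: no LD
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))  # (0.0, 0.0)
```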

Locus: a position or place on a gene or chromosome (plural: loci).

Mendelian or single gene trait: A Mendelian trait is one that is caused by a mutation in a single gene and follows simple Mendelian inheritance. Single gene traits typically follow an autosomal dominant, autosomal recessive, or sex-linked pattern of inheritance. Only one mutated copy of the gene is necessary to be affected by an autosomal dominant trait, whereas two mutated copies of the gene are necessary to be affected by an autosomal recessive trait. X-linked dominant and recessive traits are caused by mutations on the X chromosome and Y-linked traits are caused by mutations on the Y chromosome.

Meninges: the three membranes that envelop the brain and spinal cord.

Metabolomics: the study of the set of all metabolites and other small molecules present within an individual, cell, or tissue.

Multifactorial traits: traits or characteristics inherited as a result of genetic and environmental factors. An example is susceptibility to certain diseases (e.g., cancer).

Mutation(s): synonym of genetic variant, representing an allele, often used in association with a variant causing a disease or other trait.

Myeloencephalitis: inflammation of the brain and spinal cord.

Next-generation sequencing (NGS): also known as high-throughput sequencing; a catch-all term for a number of modern sequencing technologies that rapidly sequence the DNA or RNA in a sample. Unlike older technologies, next-generation technologies do not require a target-specific template to amplify nucleic acids, and they have revolutionized our ability to study genes and their expression.


Pearson’s correlation coefficient: a statistic measuring the strength and direction of the linear relationship between two measurements (X and Y). A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the measurements.
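Pearson's r is the covariance of X and Y divided by the product of their standard deviations. A minimal sketch with a hypothetical function name follows; in Python 3.10 and later, the standard library's `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one measurement is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: Y rises exactly with X
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: Y falls exactly as X rises
```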

Penetrance: the proportion of individuals in a population carrying a particular allele (or genotype) who actually express the associated phenotype.

Personalized medicine: separates patients with apparently similar conditions into different clinical groups based on molecular similarities, allowing medical decisions to be tailored to the individual patient based on their predicted drug response or risk of disease.

Phenotype: the measurable physical characteristic or trait that is the result of an individual’s genotype (Figure 4). A phenotype is the trait(s) or characteristic(s) that are seen or measured and can be anything from disease status (affected vs. unaffected) to physical properties such as height.

Phenocopy or phenocopies: a condition where the phenotype of an individual is altered because of an environmental factor, and thus the individual appears to have an altered genotype, though in fact it does not.

Positive selection: genetic selection that results in an increase in the frequency of a trait within a population.

Polygenic trait or disease: a trait that is known to be influenced by many different alleles in many different genes.

Precision medicine: a genome-driven technology that enables grouping of patients that are genetically similar, allowing for more targeted disease diagnosis and therapy; see also personalized medicine.

Recombination: the rearrangement of chromosomal segments during egg and sperm production to produce offspring with different gene and allele combinations than either parent. Recombination is, alongside new mutations, a major source of genetic diversity in sexually reproducing species (Figure 5).


Figure 5. Recombination occurs during meiosis resulting in the production of offspring with trait combinations that differ from those in either parent. Recombination during meiosis is facilitated by chromosomal crossover, or the exchange of genetic material between homologous chromosomes. (A) Homologous chromosomes within an individual (one from the individual’s sire and one from the dam) replicate and (B) align. (C) Chromosomal crossover allows for recombination between chromosomes. (D) At the end of meiosis, gametes with 4 different chromosomes are produced.
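The crossover step can be sketched as swapping the segments of two homologous chromosomes beyond a crossover point. This simplified model assumes a single crossover at a chosen position. It represents each chromosome as a string of hypothetical alleles at five loci, uppercase from the sire and lowercase from the dam. Real meiosis can involve several crossovers, at positions that vary from one meiosis to the next.

```python
def single_crossover(sire_homolog, dam_homolog, breakpoint):
    """Return the two recombinant chromosomes from one crossover at `breakpoint`."""
    if len(sire_homolog) != len(dam_homolog):
        raise ValueError("homologs must be the same length")
    recombinant_1 = sire_homolog[:breakpoint] + dam_homolog[breakpoint:]
    recombinant_2 = dam_homolog[:breakpoint] + sire_homolog[breakpoint:]
    return recombinant_1, recombinant_2


# Two parental chromatids plus two recombinant chromatids give four distinct gametes
gametes = ["ABCDE", "abcde", *single_crossover("ABCDE", "abcde", 2)]
print(gametes)  # ['ABCDE', 'abcde', 'ABcde', 'abCDE']
```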


Reference genome: A reference genome is a digital nucleic acid sequence based on the genome sequence of a representative example of a species. The equine reference genome is based on the genome sequence of a Thoroughbred mare, Twilight.

RNA: ribonucleic acid. RNA is assembled as a single-stranded chain of nucleotides. Cells use messenger RNA (mRNA) to carry the genetic information in DNA to the machinery that synthesizes specific proteins, and other types of RNA can promote or inhibit the expression and translation of other genes. RNA differs from DNA in that it uses uracil (U) in place of thymine (T), so its bases are guanine, uracil, adenine, and cytosine (G, U, A, and C).
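The base substitution between DNA and RNA can be illustrated by transcription. An mRNA has the same sequence as the DNA coding strand, except that U replaces T. It is built by pairing with the opposite (template) strand. This minimal sketch uses hypothetical function names and assumes both strands are written 5'->3'.

```python
def transcribe(coding_strand):
    """mRNA for a DNA coding strand: identical sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_from_template(template_strand):
    """mRNA built by pairing with the template strand (A->U, C->G, G->C, T->A).

    The result is reversed because the mRNA runs antiparallel to its template.
    """
    pairs = str.maketrans("ACGT", "UGCA")
    return template_strand.upper().translate(pairs)[::-1]


print(transcribe("ATGGCTTAA"))                # AUGGCUUAA
print(transcribe_from_template("TTAAGCCAT"))  # AUGGCUUAA
```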

RNAseq: unbiased, next generation sequencing of all RNA present in a tissue sample.

Selective breeding: (also called artificial selection) is the process by which humans direct specific breedings to selectively develop particular phenotypic traits.

Sequenom assay:  a laboratory system for simultaneous genotyping of multiple SNPs that can be custom designed by the researcher for a specific purpose.

Sex-linked traits: traits or characteristics determined by a gene located on a sex chromosome (X or Y) rather than on an autosome.

Signature of selection: a pattern of allele frequencies at a particular location on a chromosome that indicates selective pressure has been applied to that region of the genome.

Single nucleotide polymorphism (SNP): DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered (typically pronounced “snip”).
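In the simplest case, SNPs in a sample can be found by comparing its sequence with the reference sequence one position at a time. This minimal sketch uses a hypothetical function name and assumes the two sequences are already aligned, of equal length, and free of insertions or deletions. Positions are counted from 0.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) wherever the aligned sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


print(find_snps("ACGTACGT", "ACGTTCGA"))  # [(4, 'A', 'T'), (7, 'T', 'A')]
```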

SNP genotyping arrays (SNP arrays or SNP chips): a laboratory technology used to simultaneously genotype tens to hundreds of thousands of SNPs by means of small DNA probes arranged on a microarray.


Transcriptome: the sum total of all the types of RNA molecules expressed from the genes within an organism’s tissue or tissues.

Whole genome sequence: the complete DNA sequence of an individual, typically performed using next-generation sequencing technologies. 

A genome-wide association study (GWAS) is the examination of a set of genetic variants (usually SNPs) that are distributed across the genome in a group of individuals to identify variants that have different allele frequencies between individuals with and without the trait of interest. GWAS takes advantage of linkage disequilibrium to link a phenotype (the observable trait[s]) to the underlying genotype (the genetic allele[s] responsible for the trait). Linkage disequilibrium (LD) is the non-random statistical association between alleles at different loci. LD is most often due to the fact that two loci are physically close to one another in the genome and are less likely to be separated by genetic recombination. Association mapping relies on the fact that when the allele responsible for a trait (causal allele) has entered a population relatively recently from a single founding ancestor ("founder"), it will be physically linked to the surrounding genetic sequence (or haplotype) of that founder. Over generations, the length of the haplotype surrounding the causal allele is eroded by recombination (see Figure 6).

GWAS is most often performed by scanning the genome for associations between a particular phenotype and individual SNPs from a large panel of SNPs on a SNP chip. Statistical tests of association are applied to look for allele frequency differences between the groups. The most common GWAS design uses a case-control approach, which compares two large groups of individuals: a control group of healthy individuals and a case group affected by a disease. When the allele frequencies at a particular SNP are statistically different between the two groups, the variant is said to be associated with that trait. These associations must then be independently verified to show that the variants either contribute to the trait of interest directly or are in LD with the causative allele(s). GWAS can also be applied to quantitative phenotypes such as height or insulin concentrations.
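For a single SNP in a case-control design, one common test is the allelic chi-square test. It compares allele counts, two per horse, in a 2 × 2 table of cases versus controls and allele A versus allele B. The p-value comes from the upper tail of the chi-square distribution with one degree of freedom, which equals erfc(√(χ²/2)). This minimal sketch uses a hypothetical function name and made-up counts. A real GWAS repeats the test for every SNP on the chip, corrects for multiple testing and population structure, and is usually run with dedicated software such as PLINK.

```python
import math


def allelic_test(case_counts, control_counts):
    """Allelic chi-square test (1 degree of freedom) for a single SNP.

    Each argument is (count of allele A, count of allele B) summed over
    the two copies carried by each individual in that group.
    Assumes every expected count is non-zero.
    """
    table = [list(case_counts), list(control_counts)]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square with 1 df
    return chi2, p_value


# Hypothetical SNP: allele A is more common in 50 affected horses than in 50 controls
chi2, p = allelic_test(case_counts=(60, 40), control_counts=(40, 60))
print(chi2, round(p, 4))  # 8.0 0.0047
```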