Mary Gray, PhD student, Clinical Genetics Research Group
As part of my PhD, I occasionally get the opportunity to screen small cohorts of patients for small- to medium-scale genetic changes (called copy number variations), in the hope of finding something new that may explain a patient’s disease and tell us something about what the gene or genes involved do. I have just finished screening a group of patients and haven’t managed to find anything novel this time, but I thought I’d delve into the world of the copy number variation, or CNV, and why it’s interesting (and kinda weird).
Not too long ago it was thought that most genetic variation between people was due to very small-scale changes in DNA, where a single DNA letter (base) differs between people. Such a change is called a single nucleotide polymorphism, or SNP. Anything much larger, such as a 1000-base deletion or the duplication of a whole gene, was considered rare and was expected to lead to Mendelian (heritable from parent to offspring) disease. One example is Charcot-Marie-Tooth disease, a neurological condition whose most common form is caused by a duplication on chromosome 17. Research into the causes of multifactorial diseases such as schizophrenia and heart disease therefore focused on SNPs, but found that SNPs alone could not explain all of the genetic basis of these diseases: there was some ‘missing heritability’.
In 2006 came the first genome-wide survey of copy number variation, carried out in 270 healthy people. Surprisingly, it showed that about 12% of the human genome is covered by CNVs in healthy people: deletions and duplications of genes, genetic regulators and regions of unknown function. Counted by the number of DNA bases affected, CNVs are the largest contributor to genetic variation between people. It seems that for some genes we can have three copies or only one copy and be perfectly normal. But surely these increases and decreases in genes (and subsequently in their products that build our bodies) have some effect? Just a year later (2007), Stranger and colleagues published an experiment in the journal Science showing that CNVs do have a biological effect on cells. In fact, roughly 20% of the variation in gene expression (the level to which a gene is copied into messenger RNA to make proteins or other building blocks) between people could be accounted for by differences in CNVs.
More and more individuals have had their whole genome screened for copy number variation using array technology in the search for causes of common diseases like asthma. In fact, this number is now in the thousands. We now know that CNVs at certain genetic locations can contribute to your susceptibility to asthma, schizophrenia or even epilepsy. How these variations increase an individual’s risk or act to protect them is largely unknown. We certainly have a lot of data, but it takes a lot of time and hard science to prove that deletion/duplication of gene X increases a person’s risk of Y because of a decrease/increase in gene expression leading to Z biological outcome. It’s complicated to prove things when dealing in risk and protective factors and not absolutes.
The data on CNVs are out there, though, for anyone to look at and study. Take a look at the Database of Genomic Variants (DGV) and see just how widespread and normal these CNVs are (currently over 89,000 CNVs in the database!). There is now a new companion database called dbVar that will eventually hold everything DGV has but lets you study the data in a different manner. It also shows the CNVs found so far in chimpanzee, rhesus macaque, mouse, dog, pig and fruit fly. I use these databases and compare them against my patients, who have rare Mendelian diseases, in the hope of finding something in my patients that isn’t in the databases: something that isn’t normal. From time to time I get lucky, but mostly I just see how amazing it is that a person may be missing 10 genes that seem like they do something important, and yet at least 20 other people are missing the same genes and are perfectly fine. With so many genes having overlapping functions, few genes are left with no backup plan should there be too little or too much expression. Those few genes are the ones I’m chasing, so wish me luck! But for now I had better stop writing this and finish hammering 22 gigabytes’ worth of .txt files into something I can do science with.
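For the curious, the database comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not my actual pipeline: the `CNV` record, the function names, the made-up coordinates and the 50% reciprocal-overlap cut-off are all hypothetical choices, and nothing here reads the real DGV or dbVar file formats.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    """A copy number variant (hypothetical record layout)."""
    chrom: str
    start: int  # first affected base, 1-based inclusive (assumption)
    end: int    # last affected base, inclusive
    kind: str   # "loss" (deletion) or "gain" (duplication)


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two fractions of each CNV covered by the other.

    CNVs on different chromosomes, or of different kinds, score 0.0.
    """
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(shared / len_a, shared / len_b)


def novel_cnvs(patient: List[CNV], reference: List[CNV],
               threshold: float = 0.5) -> List[CNV]:
    """Keep patient CNVs that match no reference CNV at or above the threshold.

    The 0.5 default is an assumed cut-off, not a recommendation.
    """
    return [
        p for p in patient
        if all(reciprocal_overlap(p, r) < threshold for r in reference)
    ]


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    reference = [
        CNV("chr1", 1000, 2000, "loss"),
        CNV("chr2", 5000, 9000, "gain"),
    ]
    patient = [
        CNV("chr1", 1100, 2100, "loss"),  # largely matches a known loss
        CNV("chr2", 5000, 9000, "loss"),  # same region, but opposite kind
        CNV("chr3", 100, 500, "gain"),    # nothing known on this chromosome
    ]
    for cnv in novel_cnvs(patient, reference):
        print(cnv)
```

Comparing every patient CNV with every database entry like this is fine for a handful of records but far too slow for tens of thousands of them; at that scale you would sort by position or use an interval index instead.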