Stephen Chanock: Good morning. Well, thank you very much for the invitation to come to this very exciting meeting. It's a very important meeting, in my mind, and one that I'm sorry that I'm not able to be much of. But I know people from my laboratory and other laboratories in the NCI have been here, and have found this to be extremely informative. And most importantly, provocative and stimulating. Because I think, you know, the ENCODE and the mission -- and more importantly, the generation of data by ENCODE -- has really been the start, and not the finish, of a whole series of very important questions.

So, I wanted to, today, in the next 25 minutes or so, talk about ENCODE, particularly in
the study of trying to get at the complexity of cancer susceptibility. I'm not going to give you 50 examples in five hundred, you know, different levels of looking at methylation marks -- right, left, up, down, you know, inside, out, and beyond. I'm going to try and put it more in the context of how we're using it, and where it's been helpful, but at the same time, try and stimulate some of you who may not be thinking about cancer susceptibility to start thinking about it, and how and in what way to utilize the kinds of opportunities I think that ENCODE has really put in front of us.

So really, when we think about the etiology of cancer, it is really complex. You know, we think of Mendelian diseases as a particular mutation that drives a disruption of a critical
gene or pathway. And there are important consequences, and then the other 24,999 genes and their environments, and the long non-coding RNAs, and all the things that we add to that picture. But I think, particularly when we think about cancer, which is something that involves an extended period of time for the development thereof, we have to think very hard about the role of environment and lifestyle. You know, as we all know, BMI, smoking, you know, all sorts of chemicals and the like are very important carcinogens. And they're part of this equation.

There is this other side of genetic susceptibility, and there's an argument of how much is it of each? And there's some of us who think that it's probably 100 percent of both. Okay?
In other words, thinking what's the genetic susceptibility? What's the setting of your carburetor as you start out in life, and you expose yourself -- either willingly or unwillingly -- to lots of environmental challenges. And then, the lifestyle changes that you make. You know, determining your weight, exercise, and by the way, you know, in our institute, the view is physical inactivity is the next smoking. And those who have lack of physical activity, or do the wrong things, are putting themselves at higher risk for diabetes, lung cancer, breast cancer, prostate cancer, and the like.

But really, we think of the environment as the triggers, and the genetics as the set point. And I'll spend pretty much the rest of the time talking about our understanding
of what the set point is. Because we have very little understanding of really how the triggers actually interact with the genetic makeup that we have. We have a few examples, but very few, and I think this is where -- in my mind -- ENCODE needs to get environmental, so to speak, in thinking about not a model system, but how and in what way you take that information and plug it in to try and understand how critical changes take place that would lead to very important diseases. And I happen to be interested in cancer, but the same thing can apply to diabetes, to coronary artery disease, neurodegenerative disorders, arthritis, and the like. So, I think it's really critical.

And then, of course, we also know that there are stochastic events. We know that there
are errors in the program of DNA repair, which are very important. And then, all the chance issues. And how important chance is. Well, there's a lot of debate about this. Some of us are a little bit skeptical of the statement of how important chance is. I mean, I think that's more an attribution of what we don't understand, as opposed to necessarily saying that it's truly chance, from a probabilistic point of view. But we can debate that later.

So, really, when we think about this, we have to then, now, move into cancer. And in cancer, there are really four spaces that we live in. As you can see, above the line, the germline, where I'll spend pretty much most of my time talking. We know that there are a whole series of cancer syndromes. Those are important mutations in the germline that put someone at very high
risk for developing cancer. There are moderate penetrance genes that are part of an oligogenic model. And then GWAS has been very successful in cancer, and I'll talk a fair amount about that. Only a small fraction of what we have there is actually actionable. Maybe 26, 27 of the mutations do we really know what to do with. Another 50, we have an idea. And then, another 100, we think we could surmise, but we really don't have the evidence. And then, underneath, there is the second, or the third, or the fourth, or the fifth cancers -- genomes that are, indeed, the somatic alterations that live in the world of drivers and passengers. And we know that heterogeneity is very important. And we've really accumulated an extraordinary catalog with TCGA and the COSMIC data.
And then, you know, the excitement in cancer research is all about targeted therapy. But again, targeted therapy is a very early concept. And it's not going to solve everything overnight. It is a very difficult thing to do. And then, we have all the tests that are out there. So, this is sort of the space that we live in, between going from research to actually trying to figure out what's actionable. And frequency and/or penetrance is not the only way to move from discovery to clinically actionable; we have to understand the functional consequences. And again, this is where ENCODE is a very useful tool to look at many of those things, particularly when we look at the moderately penetrant genes, and some of the GWAS hits. And we'll come back to that.
So, what happens when there's more than one genome? Well, we know that we can look and see a wide panoply, landscape, of four orders of magnitude difference in the burden of genetic changes that are observed with whole genome or exome sequencing, going from pediatric tumors that are very, you know, that are fast and furious, so to speak -- rhabdoid, Ewing's sarcoma, AML, acute myelogenous leukemia. And then, those that are environmentally driven, like lung cancers and melanoma, where smoking and UV light are driving them. You can see as many as four orders of magnitude more mutational load. Now, the problem is, we don't have large enough numbers so that we can use frequency to pull out, necessarily, what we think the drivers are. We have a small smidgen of information that suggests that there are few of them.
But again, I think how and in what way we sort of look at these sort of NCIS snapshots of what's the forensic picture of a cancer is very different from how we actually get there.

So, we've known for some time that cancer is a heritable condition. We go back to Paul Broca, the wonderful French neurobiologist in the 1860s who noted in his own family extraordinary clustering of breast cancer, with his sisters, mother, grandmothers, aunts, and the like. And actually, reported this -- we didn't know what genetics were about in those days, per se, but the, you know, the astute intellect clearly pointed that out. We had ages of twin, family, and sibling studies. We saw familial clustering, such as what Joe Fraumeni had
pointed out, not necessarily for just one cancer, but for sets of cancers. And then, certainly, the Knudson hypothesis of knocking out both the germline and the somatic. You could get to cancer by having a germline inherited mutation, and then have something show up somatically. And then, of course, the positional cloning of a familial breast cancer gene in '91 -- that was, by '93, '94, really much better annotated, from Mary-Claire King and on into the world of BRCA.

So, when we look at this, we spend a lot of time trying to do positional cloning, and identifying mutated genes and cancer susceptibility syndromes. And there are about 115 or 120, sort of depending on your definition right now. And this continues to evolve. Exome sequencing
will continue to put more spots on the map. And one of the remarkable things is we really don't see, like we've seen in the infectious disease world, i.e., HLA, a concentration in a cancer region. And I think that really bespeaks how complex cancer is, and the multiple different pathways that lead to it. These are ascertained in families with rare mutations, and they have been instrumental in helping us to identify the concepts of both oncogenes, those where a single mutation would drive something, and then tumor suppressors, where you remove those things that are sort of protecting you, letting the cell loose, so to speak. And then, we also know that even in the world of BRCA1, the idea of the penetrance -- in other words, you know, what's the risk that we would see -- is not identical for each
individual, again, underscoring both the genetic and the environment. And we've seen this more in TCGA, where we clearly can see the impact of both germline and somatic mutations. When we look at the nearly 500 women with ovarian cancer, we could see a substantial fraction who had a germline variation -- germline mutations that were very important that had an impact on survival. But we could also see silencing of the gene, and most importantly, mutations and rearrangements, somatically.

So, this sort of gets us to this question of these high penetrance mutations and somatic alterations. When we start mapping them, as Naz Rahman did about a year ago, against the emerging databases like COSMIC and TCGA, nearly 50 percent of what we are assuming
are the susceptibility genes are already in COSMIC, and with a frequency that would suggest that they really are drivers. So, these are the unfortunate errors. And this is the world that's harder to understand, I think, using the world of ENCODE, in terms of regulation. These are knocking out, or doing something, creating the dominant negative in vivo, so to speak. And I think that that's a very different paradigm from where we go when we start our search for common variants in complex diseases, because we're going to build up to how we see the architecture of genetic susceptibility.

So we have very nice reproducible technologies that start out very easily when we put large collections together. And these chips give us a multiple testing problem that we've all
roared at, and had difficulty trying to figure out. We've come to this sort of quasi-conclusion of genome-wide significance. And that's been very helpful, when we start to look at, for instance, cancer. So now, there are some 490 separate loci that have been identified in more than two dozen cancers. There are another 120 that are sitting out there, from the OncoArray, where we've seen the data, and that have not yet made it to publication. So, we can see that the world of cancer susceptibility -- whether you're going from rare cancers, like some of the pediatric cancers, like Ewing's sarcoma or osteosarcoma, which clearly have these common variants with small effects, to prostate and breast cancer, where we're now able to explain a large fraction of the familial risk. And I'll come back to that in a minute.
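As a rough illustration of the multiple-testing point above: the conventional genome-wide significance cutoff of 5 × 10^-8 is what a Bonferroni correction gives for roughly one million independent tests at a family-wise error rate of 0.05. The sketch below assumes that figure; the function names, SNP identifiers, and p-values are made up for illustration and are not from the talk.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def genome_wide_hits(p_values: dict, threshold: float) -> list:
    """Return the names of SNPs whose p-value falls below the threshold."""
    return sorted(snp for snp, p in p_values.items() if p < threshold)


# Assumption: about one million independent common-variant tests per scan.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical association results, for illustration only.
example_p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}

print(genome_wide_hits(example_p_values, threshold))  # only snp_a passes
```

A p-value of 2e-6 would look striking in a single test, yet it misses the cutoff by well over an order of magnitude, which is why large pooled collections are needed to find variants with small effects.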
So, interestingly enough, for all the excitement of CNVs, back in the -- about seven years ago, it's sort of the cahoot tech [spelled phonetically] of genomic susceptibility, with respect to common variants. We've only seen one that's really been reproducible that's of a common nature. Now, when you go back to your germline susceptibility alleles that are important in these highly penetrant mutations, copy number becomes very important. But those are very rare events. So, if we're thinking of frequency against, specifically, the effect size, we see very little with respect to CNVs. Interestingly enough, just shy of about 10 percent are shared between cancers. So, again, we don't see these soft spots where we could say, other than the TERT region on chromosome 5, and 8q24, and HLA for the virally driven cancers.
We don't see regions where eight, nine, or 10 cancers are all lining up together. There seems to be, again, the suggestion of sort of perturbation of literally the sort of redundant pathways. Interestingly enough, and partly due to the way in which these studies are conducted and have collected samples, but also, I think, it's a question of the heterogeneity -- almost none of these, or only two or three, are associated with outcome.

So, raising the question: what's important for getting cancer may not be as important, once you have it, for your specific outcome. That's a whole other world of how that germline is functioning. And that's a very important question, in thinking about pharmacogenomics, for which I think the ENCODE resource is really terrific. And then, I mentioned that there
are more than 100 effected. About 120 that will be reported very soon. Now, if we look at this, we've done some work in our own -- I have Mitch Machiela, sitting here in the audience, who's spent a fair amount of time trying to look at the same question that I showed you, that over 50 percent of the known susceptibility highly penetrant mutants are actually in the COSMIC database, suggesting that something about either germline or somatic alteration takes place. Well, Mitch looked at just shy of 300 of the regions -- mapped them, looked at the genes, and used a series of different approaches to try and ask the question: is there a relationship between these common variants and somatic mutations? In other words, pointing toward specific genes being important
that we would identify through our landscape sequencing of somatic alterations. And we really don't see that.

In other words, as you can see here, these two circles look basically identical, between the GWAS genes that we've mapped, in the intervals in and around the GWAS hits of nearly 300, against a permutation analysis of genes that are not in the GWAS regions that have anything to do with cancer. And it's very different if we were to go to those 115 genes that I referred to before, where over 50 percent of them, we know, are heavily mutated and are critical, you know, as identified in the COSMIC database. So, really, you know, the interpretation is we're not necessarily looking at sufficient or required elements for developing cancer for
any single one of those hits. So, correlation does not necessarily imply causation, and that's a dangerous view in the GWAS, and one that some have continued to propagate, and some of us feel quite strongly it's not. Partly because we know that we have primarily indirect associations. We have markers. And those markers are very important, but they give us that question of how do we then prioritize?

And this is where there've been a whole score of papers that have been put out in the last few years, using the ENCODE resources and [unintelligible] and the like, to be able to try and figure out, specifically, how and in what way can we prioritize these? And here are just a couple of things, you know, the statistical approach to prioritizing, based
on pleiotropy and annotation of the ENCODE, versus actually groups that have actually tried to do this for the known loci, and see if we can see patterns. And the problem is, with what we've seen in the genome-wide association studies, we don't really see patterns. We can't say, beyond the very generic word, that they're -- that most of them look to be regulatory. But we can't say, are they really all enhancers? Are they in silencers? Are they important, you know, binding sites for transcription factors? For open or closed chromatin, and the like. So I think we have to be really sort of careful about that.

So, really, can we use this? Yes, and no, in my mind. We can use it to start the discussion. But we know that each of these GWAS signals has to go through one-by-one investigation. The old
Smith Barney television commercial, for the older people. One SNP at a time. You can't do it genome-wide and get the answer of what's the functional -- I realize that's a little bit of a dangerous thing to say in front of the ENCODE audience, but I got to say it [laughs]. All right? And we know that these things are giving us important insights into the biology, but they're not necessarily causal. But they have a functional contribution. And I think where ENCODE really allows us, I think, to more effectively be smart, is really looking at integrating the non-coding and regulatory information, and the eQTLs and the like, to be able to prioritize which variants are we going to take into the lab, and actually try and make some sense of.
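The prioritization idea described above can be sketched in a few lines: keep the variants that are well correlated with the GWAS marker, then rank them by how many independent layers of regulatory evidence they carry. This is only a toy illustration under stated assumptions, not the method used by any group mentioned in the talk. The `Variant` fields, the r² cutoff of 0.8, the equal weighting of evidence layers, and all the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str                  # hypothetical identifier
    r2_with_lead: float        # correlation (r^2) with the GWAS lead SNP
    in_open_chromatin: bool    # overlaps an open-chromatin region
    in_tf_binding_site: bool   # overlaps a transcription factor binding site
    is_eqtl: bool              # associated with expression of a nearby gene


def evidence_count(v: Variant) -> int:
    """Number of annotation layers supporting a regulatory role."""
    return int(v.in_open_chromatin) + int(v.in_tf_binding_site) + int(v.is_eqtl)


def prioritize(variants: list, min_r2: float = 0.8) -> list:
    """Drop weakly correlated variants, then rank by evidence and correlation."""
    candidates = [v for v in variants if v.r2_with_lead >= min_r2]
    return sorted(
        candidates,
        key=lambda v: (-evidence_count(v), -v.r2_with_lead, v.name),
    )


# Made-up records for illustration only.
variants = [
    Variant("var_1", 1.00, False, False, False),  # the marker itself, unannotated
    Variant("var_2", 0.95, True, True, True),
    Variant("var_3", 0.85, True, False, False),
    Variant("var_4", 0.40, True, True, True),     # weakly correlated, dropped
]

for v in prioritize(variants):
    print(v.name, evidence_count(v))
```

In this toy ranking the marker SNP comes last among the candidates, which mirrors the point that GWAS hits are mostly indirect associations. The ranking only decides what to take into the lab first; it does not establish function.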
So, here is an example -- Mila Prokunina-Olsson in our program did a beautiful job in taking one of the bladder cancer hits for the prostate stem cell antigen -- it's just by chance that that's its name. So, she mapped this, and imputed it, and then, using ENCODE, was able to look at, specifically, all the correlated variants, and see some of the activity that was in and around the promoter. And then, the risk allele actually turns out to be very important for the expression, showing the difference in both mRNA, as well as protein, as shown with immunohistochemistry.

So, this highlighted something we really hadn't thought about in bladder cancer, per se. And here is a very good example of a translational application. Because this SNP actually predicts
for the degree of expression that's measurable, that's quite significant. And one can see, actually, that on the market, they were -- I mean, not on the market. The pharmaceutical industry was trying to develop an anti-PSCA monoclonal antibody, but for different types of cancer. So, now, there's been this discussion to try and bring this in, and the regulatory issues have held us up for a bit, but here is a possible clinical trial. Taking something where you've gone from being able to map it, do the functional analysis, again, using ENCODE as part of that analytic strategy, to be able to say, "Here, we can show what we think are the functional underpinnings of this particular association." And just, this is the real lucky one. This is the one in 475 [laughs] that has jumped off the page as clinically
translatable. But hopefully, the other 474 or whatever are out there, ripe for the taking, at this time.

So, when we start thinking about this, we come back to how and in what way we use this to really look at what we know is sort of the sweep, or the architecture, of genetic susceptibility. And we clearly know, here, that in the GWAS world, we've been able to find common variants that really fit this sort of polygenic model of SNPs and SNPs, and more SNPs. Hundreds to thousands of SNPs, to explain this part of the common disease paradigm. But we also know that there are these rare damaging drivers that I made the case for, for BRCA1, TP53,
PTCH, and the like. So really, the question is: what fraction of the polygenic component contributes to each cancer? So we've done an awful lot of scanning in our world in the NCI, and we decided to look at thirteen different cancers, and then trying to use some of the newer approaches to take genotyped SNPs and explain anywhere between 10 to 50% of the variability on the liability scale. So in other words, what fraction of the genetic contribution to that particular type of a common disease, or not so common disease, can be explained by the GWAS component? And this is looking at that knowing that it's more than just the SNPs that hit genome-wide
significance, but rather there are a lot more underneath that curve. And when we look across those cancers, as seen here, we can clearly see that we can explain some fraction, and that indeed it does begin to approach what we've seen from the familial studies, all the parent-child studies that have been done epidemiologically over the last thirty years. So the shared heritability, interestingly enough, does bring to bear some very interesting questions. We can see strong shared factors between things that have an embryologic association, testes and kidneys, chronic lymphocytic leukemia and diffuse large B
cell lymphoma, but we also could see things such as adult lymphoma and bone tumors in children. So again, using this as a way to try and put our hands around where are the things that are there that we haven't appreciated as models.

And so, going forward, as we know, all models are wrong, but some are useful. Okay? And we have to start to think about how we can use this to think about how we would predict disease. Well, prediction is difficult, especially about the future, and this was said by a hero of mine, Yogi Berra, also said by not a hero of mine, Dan Quayle, but really the person who really said this first was Niels Bohr, an absolutely brilliant person. And I think in doing this we now have, with our GWAS world
as well as the highly penetrant mutations, the ability to start to map and sort of look at what the genetic architecture looks like. So here, looking at breast cancer, we now know that we can actually explain 35 to 40 percent of the excess familial risk with these hundred and sixty SNPs in breast cancer, and we know that the highly penetrant mutations explain about 10 to 15 percent. So at this point we can see that more than 50% of the risk of breast cancer in a family can be explained with the variants we already know in hand, and the polygenic models, if we keep pushing them, have a potential to add more to that, and exactly what that limit is is a very important question. If we start looking at the area
under the curves -- Gm [spelled phonetically] and Chatterjee have spent a lot of time modeling this -- we can actually see very interesting things: that the total heritability corresponds to about a two-fold sibling relative risk. At the moment we've been able to explain about 1.4 of that 2, and as we continue to put that catalog together, we think that we'd top out at about an AUC of 80. This may not be good enough for an individual patient, but for public health measures it could be very important in discriminating who would get earlier mammograms, who would get earlier interventions or preventive therapy. So again, how and in what way we use this information is, you know, in
the public health venue, I think, is coming up on the horizon. For individually counseling people on the basis of SNPs, I think we still have a long way to go, despite what 23andMe and deCODEme and others have done. One of the things: we also have to take advantage of the most important risk factor for most of our cancers, which is age. And if we look at prostate cancer, if you take the 76 SNPs, you can see you can get a substantial separation between the first and 99th percentile if you look at the distribution of those SNPs. It doesn't necessarily mean you're going to get the cancer, but it's important, in our minds, for public health implementation.

Now, when we did these
GWAS, we kept seeing these unexpected findings in the genome-wide association studies of large chromosomal abnormalities that turned out to be somatic mosaicism, part of this sort of dynamic genome, where we know that there's a subpopulation that has clonally expanded and is stable within either blood or buccal cells. And we can clearly see this, as we've done now in over a hundred and twenty-seven thousand individuals; we're able to actually see the distribution of these events hitting all the different chromosomes, but particularly the X and the Y, where it is even higher, and that's a more complicated story for another day. But we used our GWAS chips to be able to look at these greater than 2 megabase
events, and Mitch Machiela again had done some very nice work in putting this together and looking at that landscape, and being able to see that there were some recurrent events, which really raise this very important question of why people are walking around with these. And many of these, half of these, are healthy controls who have not developed cancer, per se. And, you know, what we can tolerate is an important question, related to ultimately what may be a kind of question of genomic stability in the large, as opposed to thinking in the more classical Lynch colon cancer model. So we thought that it would be very interesting to ask this question: as we see some of these events on chromosome
13 or 12 or 20 that are recurrent, could we use the ENCODE data to do breakpoint analysis, realizing that this is sort of the first, the roughest cut, so to speak, but nonetheless, is this going to be helpful for us to hone in on regions that may be more amenable to these kinds of events taking place? As we know, these events take place at the single-base level; a number of new papers in the New England Journal of Medicine and Nature Medicine in the last year have shown that single-base point mutations are clearly there. And then we, of course, know all the classical ones, neurofibromatosis and Turner syndrome and the like. So we took
688 interstitial events and 543 telomeric events, and looked at the 200 kb windows of the SNPs around them, and looked at those permutations both with respect to the region and then other regions of the genome. And what we interestingly saw was -- here is how we looked at each of the different elements, so genome-wide, looking at the recombination rate versus the permutation distributions with 95% differences. The first thing we saw was open chromatin looking fairly interesting to us, when you see where these recombination [unintelligible] moving over towards, when we start looking at these, particularly in both linear copy-neutral
and interstitial losses. We also saw that repetitive elements did bear some element on this. So again, the question of where and why these events are occurring in these kinds of places is a very difficult question, but, you know, instead of 37,000 feet, we may have moved down to 25,000 feet in thinking about what's going on in the genome. And then, interestingly enough, with gene-rich regions, we could particularly see it with respect to the telomeric events, and not as well with the interstitial losses as we thought. But again, in our minds this raised a really sort of fundamental question of really how and in what way we could look at, including the fragile sites,
different elements that may tell us: are there some regions of the genome that are going to be more sensitive and more likely to have these kinds of events taking place? Because we think of a detectable most ... with only the tip of the iceberg. We've clearly seen both large and small events, and we know of the U-shaped curve: it's seen in the very young with catastrophic diseases, as well as now in the aging population, and we see it lining up for neurodegenerative disorders .. closing, you know, to me the current challenge is thinking about the relationship between germline and somatic, knowing that the germline itself is eroding and falling
apart. The hardest thing that we have to explain, for instance, for both, I think, the highly penetrant mutations and, you know, the common variants, is this question of tissue specificity, the origin. Could it be that the effects are really mediated through the adjacent cells or immunologic modulation? Could these SNPs, for instance, be modulating the immune system? And we clearly can see selective success of immune blockade, but not total. The question of timing effects. And the hardest thing, in my mind, going back to that first slide, is the interaction with the environmental stimuli.

So let me end by saying, you know, again,
kudos to the gang that had envisioned and kept ENCODE alive despite all the different assaults and questions. I think that it's a spectacular resource for the functional basis of susceptibility. It's an opportunity to explore many novel elements, both individually and with their interactions, but also to have a call out to, I think, the value of team science in the short and long term, and the establishment of all of those extraordinary thresholds and standards that many of us can use to apply to different places. So there are obviously many people to acknowledge, particularly the wisdom of Joe Fraumeni and Bob Hoover, who over 40 years saw the value of biospecimens and of explicitly well-done
studies, giving us an opportunity to really go into the cancer susceptibility world, and certainly acknowledgments of all the things that are part of the OncoArray consortium that I've made allusion to. So why don't I stop there, and it is time for questions. Thank you.

You know what I've always been curious about: the occurrence of variations at these sites and high correlation in this specific tissue to a particular transformed state, you know, seems logical, and we heard a lot about it. But in tissues where there is no evidence of a disease state and the mutation is still there, do we know, has that been studied, in the sense that, are there other compensating factors that sort of
mitigate any misregulation going on there, or?

It's a terrific question. I think it raises two critical questions. One is, you know, how and in what way do we actually protect -- you know, why is it that, for instance, with BRCA1 it is breast and ovarian cancer and it's not any number of other tissues, per se? And, you know, this is where the question of environmental and secondary effects as well are very important, other genes that do interact or don't interact. But why we see that tissue specificity is still very much -- and I think, you know, we look at some cancers related to that and we say, you know, like Harold Varmus pointed
out in the Provocative Questions, you have some tissues where you have extraordinary turnover, for instance like the heart, and you have small intestines, and you see virtually no cancer whatsoever at all. And then, you know, you see other cancers, of the skin, where you see such a wide variety of the responses to UV light, and particularly the protection thereof in the repair mechanisms, that it really raises this question of really the timing and extent to which those damages are exposed or not exposed. So, you know, the 3D culture people are, you know, showing very interesting data that if you take a mutation, you put it in a particular 3D culture, and then you start to grow to a
certain point, there are intrinsic environmental properties that are not necessarily determined by a genetic mutation, that are not linear, that allow for the growth to begin to change, you know, its nature or its trajectory. So what we really are missing is that three-dimensional character, I think, you know, when we look at so much of this genetic information and think of it as a flat mutation. I wish I could answer your question better, but it's really, to me, the $64,000 question in cancer susceptibility: you know, why is it that we're not getting cancer in all of our tissues?
One question: to what degree, independently of breast cancer, prostate cancer, the obvious sex-specific ones, are you stratifying the risk by sex? Because there are a lot of other cancers that have sex-specific susceptibility.

You can. I mean, what's interesting, for instance: if you take the top five or six cancers and you get beyond the sex-specific ones, breast, prostate, ovarian, but you go to colon, you go to lung, you go to bladder, pancreas, you really don't see a huge difference between sexes, and you don't see that the variance or the incidence really is substantially different between them. So, you know, it is
an interesting question, and then you go to some others where, you know, if you get rarer and go down the line, you clearly see those kinds of changes. You know, I think how and in what way we can explain that -- I mean, with lung cancer, it used to do with smoking behavior, but now that men and women are smoking at comparable levels, we've seen pretty much comparable, both incidence and survival, in the West at least.

You're encouraging us to think about the role of environment in cancer. Is something known at this point about how much of this would be shared environment versus private environment, say
within family, as opposed to substantial variation across individuals? I'm curious because I'm wondering to what extent this environmental effect would be captured in family history, and to what extent it would be outside.

Well, it's an excellent question. You know, the history of linkage, as always -- you know, when we tried to do a structured analysis, before we started sequencing, and looking at the twin studies, we've always, you know, had these two categories of what's the heritable versus the environmental. The environmental, in most of the studies, has been relatively small in what could actually be characterized, and then there is the large "we just don't know." I think shared
environment is clearly important, but again it comes back to this question, you know, of the individual exposure of each person, you know, what their, so to speak, carburetor's set at. And you look at BRCA1, for instance, you know, and there we already know, you know, some fraction of the GWAS hits have very important modifying effects as we start to explain the differences in penetrance, you know, by both mutations and within families. So I think, you know, again, that the question of environment is very important in trying to sort of understand it as a model. How you apply it specifically, we still have a long way to go to be able to quantify that and put that into something that would be a
suitable individual predictive model. That's where I still worry that we have a little bit of a naive sense that we're going to be able to explain cancer risk to people individually, as opposed to at a population level, or perhaps at a familial level, in terms of where you would fall in the distribution, but, you know, as opposed to actually being able to say you're going to get it or not get it deterministically.
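The distinction drawn here, placing a person within a distribution rather than predicting an outcome, can be made concrete with a minimal polygenic score sketch. It assumes the common additive form, where each SNP contributes its risk-allele count (0, 1, or 2) times a per-allele log odds ratio. The function names, weights, and reference scores below are invented for illustration and do not come from the talk or from any published model.

```python
import bisect


def polygenic_score(risk_allele_counts: list, log_odds_ratios: list) -> float:
    """Additive score: sum over SNPs of allele count times per-allele weight."""
    if len(risk_allele_counts) != len(log_odds_ratios):
        raise ValueError("need one weight per SNP")
    return sum(c * w for c, w in zip(risk_allele_counts, log_odds_ratios))


def percentile_rank(score: float, reference_scores: list) -> float:
    """Percentage of the reference population scoring at or below `score`."""
    if not reference_scores:
        raise ValueError("reference population is empty")
    ordered = sorted(reference_scores)
    return 100.0 * bisect.bisect_right(ordered, score) / len(ordered)


# Hypothetical three-SNP example and a tiny made-up reference population.
score = polygenic_score([1, 0, 2], [0.05, 0.30, 0.10])
reference = [0.10, 0.20, 0.30, 0.40]

print(percentile_rank(score, reference))
```

The output is a position in a population, not a diagnosis: someone in a high percentile is not certain to develop the disease, and someone in a low percentile is not protected, which is why the talk frames this as a public health tool rather than an individual predictor.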