>> this morning, i'd like to talk about genome-wide association studies in transition, from what we found that we expected to find, to what we did not expect to find, and then a little bit about where we're going in terms of mapping susceptibility and addressing some of the questions i think are on georgia's mind and others' in terms of where sequencing is going to go and what the scope of the scientific questions is. and so, i have a lot to try and cover and i will go quickly, so put on your seat belts.
all right, so let's start with genetic predisposition, which is something that we've been interested in for at least a century, and for a little over 22 years we've been able, at least for some of the major cancers, to pinpoint particular regions in the genome where we know there are mutations that have a very strong effect. so, beginning in 1990 with the identification of brca1 and brca2, we've begun to see this sort of sweep of changes of both the allele frequencies, as you can see on the bottom down here, and then the effect sizes.
and this cartoon, which is not meant to be precise, is giving us an idea that there is a kind of genomic architecture to a disease like breast cancer, and similarly for prostate cancer or colon cancer; in pediatric cancers like neuroblastoma, we are beginning to see interesting, comparable, but not identical pictures. and so, i think, the key issue is that as we use the next generation of technologies to sequence, we're going to be in a position to fill this in further. so, we went, as i mentioned in 1990, i mean,
we were able to use positional cloning to identify regions with brca1 and 2. so, then, came the world of genome-wide association studies, where we have literally thousands of individuals with, and thousands without, using the microchip, you know, the microarray snp chip to identify sets of variants. and now, with the next set from the icogs chip that's out for review right now, we will probably have some 70, 75 regions in the genome that are each independently contributing a small risk of breast cancer in women.
and then, we have this space in between that we really have to look at, and this is where the sequencing is going to get particularly difficult, where we're looking at things that have a moderate to mild effect, but clearly are there in the population as well as in special populations as well as in families, and this is going to really be the hardest part of, i think, genetic susceptibility, is filling this particular part of the picture. as you notice, we don't have things that are sitting out here that are highly frequent, you know, 30 or 40 percent of the population with very strong effects.
again, this is something that the population geneticists have been looking at for years and i think the different kinds of theories and approaches have clearly been borne out: we don't have these very strong effects in high frequency in the populations except in one particular cancer where we see this, and that's testicular cancer, which is relatively rare, but there are very strong hits, and not just one, but several in testicular cancer. but as i'll show you later, it's by far the most familially driven cancer,
if one can say that, where you see the twin studies and the relationships between father and son and the like are of a much stronger nature than we see in any of the other cancers that we typically deal with. we also know that the use of the human genome project has been really quite striking in being able to identify mutations in the germline. so, there are now about 100 places in the genome, and many of these are very rare syndromes, but at least we can identify, in familial or special populations,
mutations that predispose with a very high risk to the development of one or more cancers. and for some of these, there are multiple cancers, which raises the question of genetic susceptibility: it's not just the genes acting deterministically on their own, but the question of what other genes, and more importantly, what environmental exposures are involved. and i think this is where the whole world gets far more complex and where the sequencing of that space that i've talked about before is going
to have to really rely upon exquisitely detailed phenotype information from very good epidemiologic studies. so, the investments of dceg over the last 40 years should really begin to pay off, particularly as we take many of these studies into the sequencing realm and to the next generation of gwas. so, when we look at that map, we can sort of see across the genome and it's sort of a shotgun. there are no particular cancer susceptibility regions in the genome, but it's hard evolutionarily to think of why that would be the case
and cancer is the downside to something evolutionarily. it's very hard to imagine as well. so, we don't have an hla equivalent or something that you see in immunologic or immune disorders for cancer, per se. there are some regions where some things show up like the tert region at the end of chromosome 5 or the cdkn2, you know, ab and all the other genes on the end of chromosome 9, where interestingly enough, i put, you know, i put these thunderbolts here because these are the only two of those 97 that map to the same regions that we're seeing in the gwas, okay?
so, it's telling us that there are different genes that, from an evolutionary point of view, have evolved where there are mutations in some genes whereas not in others. so, again, the complexity of this is really quite apparent. so, when we think about the age of genome-wide association studies, it's been a very exciting time for discovery. so, we have the new candidate genes and new candidate regions. about a third of what i'm going to show you in terms of what we know for genome-wide association studies in cancer
are mapping to regions where there are no genes. but that doesn't mean that they're not interesting; in my mind, they are, in many ways, more interesting because they're raising sort of fundamental biologic questions of where you would have disturbances or perturbations of particular regulatory pathways and rnas and processing, the epigenomics and the like, that may be particularly important. we know that these give us clues for mechanistic insights. and i'm changing this round because when we first saw the world of gwas,
there was a lot of ballyhooing and companies sprang up saying that they were going to be able to put these tests right into clinical practice. and unfortunately, these were far too early. and for most of them, while of very good intention, i don't think the data have really borne out that the few snps that have been identified in prostate or breast really are clinically actionable at this point. what we really have is this very interesting, i think, sociology of geneticists
and epidemiologists working together to discover biology. so, these are three different worlds that are still trying to get used to each other. and i think they're doing a pretty darn good job, but there is an accommodation and some of the transition with the iteration of the things going on. and then there is the challenge of using this for risk prediction. for individual risk, it's hard to say we are there yet. there is hope and we can see light at the end of the tunnel, similarly for public health decisions,
but i think these are going to require further studies, and more importantly, getting a more comprehensive picture of the variants for any given disease in that space, and then assessing how much of the genetic contribution to breast cancer or prostate cancer we can really identify. and it comes back to a fundamental point that i didn't include before, something that joe fraumeni said to me years ago when we really started in on this, in his wonderful sort of way, with a smirk and smile on his face, you know,
cancer is a hundred percent genetic and a hundred percent epidemiologic. he's telling me about the environmental and lifestyle exposures. we know what the genes are now and we're getting a sense of what the set points for any given individual are. and we are still trying to amass what we think of as sort of the exposome or, you know, the world of exposures and lifestyle decisions that are important in interacting with where our genetic set point puts each of us individually, at risk or not at risk. and so, that relationship is particularly difficult to go at,
but i think that that's really the future. so, if we look at genome-wide association studies, this is as of the middle of may, and i have a wonderful post-doc, charles chung, who tracks these, and we sort of follow the literature and we ended up keeping this database. there are 240 different regions for 25 cancers, and some of these are in very rare cancers like ewing's sarcoma of adolescents and neuroblastoma in children, as well as prostate, colon, and breast. and interestingly enough, there are about 85 more that are in the publication pipeline.
and then, we can see from some of the meta-analyses and studies that we and others have going on that these numbers will continue to climb. and the question is how each of these is going to play out for the specific diseases. but as you can also see, there's something of interest here, and that is these rainbow regions. and these rainbow regions are regions where we're seeing two, three, or in some circumstances, like tert, nine different cancers that have common variants in the region of chromosome 5
that are associated with one or more cancers. and in some circumstances, the exact same haplotype flips both ways: in one disease, it confers susceptibility, but in the others, it's protective. so, in my mind, this is telling us that this is the beginning of the story and that there are going to be very interesting environmental and changing interactions that are probably behind how and why we see that, particularly in the chromosome 5 region where we know that, for instance, basal cell and melanoma have that flip, which is very interesting. two different kinds of skin cancer.
in one circumstance, a haplotype appears to be protective, but it confers susceptibility to the other type of skin cancer. so, you know, a lot of exciting biology is ahead of us. so, we've learned that, you know, in conducting gwas with the agnostic approach, p values do matter to protect against this enormous sea of false positives. and the size of the study is important. and, you know, bob hoover and i continue to have this fun sort of debate back and forth.
the design "sort of" matters. to the more classical epidemiologist, you know, design matters tremendously. but in discovery, the capacity to use convenient controls and to cobble together many different kinds of studies has at least afforded us a first generation of discovery. and i think that these, you know, these issues of design are very important in thinking about how we go forward in retesting and looking in particular kinds of studies
to get a more refined understanding. for each of these regions, remember, we just have a map in front of us. and we have a marker on a map that requires mapping with either sequencing or going to the thousand genomes project or any number of interesting computational ways to begin to figure out what all the variants are and which ones could be the best candidates to go into the laboratory to really get at this essential point of collaboration, where, as i mentioned before, epidemiology meets genetics to discover biology.
and it's that next huge sort of chasm of taking each one of these variants, or the best variants, into the laboratory to begin to unravel the mysteries that is really essential to, i think, making sense of this. and i would say lastly that gwas is really not for the weak of heart or stomach because you can spend two and a half years putting together these studies, doing your analysis, and then the agnostic statistics show up, and no matter whether you had your favorite gene and you want to bet your entire mortgage on that gene being important,
the data will be the data and you have to recognize that the statistics are very important and that, you know, the priors tend to sort of fall by the wayside. so, the other thing that we've learned is that for a lot of the things that are close to genome-wide significance, when we increase the sample size, many of them continue to push in. so, john ioannidis wrote this very nice paper. and we clearly are seeing this in the prostate and breast and ovarian world, where, as we've doubled, tripled, and quadrupled the sample sizes,
many of those things that were just underneath these sort of thresholds that we use for publication, and for really committing people in the laboratory to spend one or two years working on them, have crossed over. you don't want to look at a post-doc and say, "oh, there's a possibility this is real, go spend two years and we'll figure it out later." you want a very, very small likelihood of a false positive before you send someone to work on that. and indeed, pushing more of these across that genome-wide threshold clearly,
you know, size is coming along and it's very important for that. at the same time, we know that data sharing is very important, and george, i think, made an allusion to this, wherein in our cgems study and some of the other gwas, we now have 11 of them that we've moved onto dbgap that are available through the data access committees under the data access users, you know, consents and the like. and in our minds, this is very important. we can see several hundred publications out there using the data sets that people have formal permission for, that they're able
to do creative methodologic work with, or they are applying them to non-cancer or other sort of creative kinds of cancer studies. and this is very important because these data sets are very rich. and we continue to learn more as we look at them, as smart people come up with new methods. and i think it's really critical to do this and, you know, dccps has been out front as well in pushing a number of the studies into the dbgap setup so that we really can get as many people as possible thinking about these related questions, whether they are methodologic
or scientific discovery or characterization and the like. and i think where this is going to get particularly exciting is where it's going to start to intersect with tcga and we're going to start to layer on somatic and more exposure information, so that sort of three dimensions, the germline susceptibility, the genes going awry, and what's driving it from the environment, whether it's smoking or alcohol or exposure to something or, you know, obesity and the like, are going to be able to be looked at in much more exquisite detail.
and that's really some of the excitement of, in my mind, of the next 5 to 10 years. so, when we look at the gwas map that i showed you, it's across the entire genome. again, there's no hot spot, per se. and no single cancer other than testicular really has a pattern to it, but it's really striking as i mentioned that testicular cancer, just about every hit that we see and we have four or five more that are lining up, getting ready for publication,
are in sexual development genes. so, it's very, very striking in our minds. and we can clearly see that, you know, this is a disease, as i've mentioned before, with a high estimate for heritability and it's interesting that heritability as we see the genes are all still converging upon very related genes and pathways. this is what we all had hoped, in 2005 and 6, we would see with breast cancer and be able to link it with hormone use or with obesity,
but we haven't seen those kinds of, you know, that kind of data, but testicular cancer has been sort of the best poster child for being able to put a biologic hypothesis out and seeing that the actual data is focusing genetic variation on a particular biologic pathway. so, i would just say that overall, as we delve into this in more detail, that gwas is a great start, but it's really not the end. it's not even the beginning of the end, but it's perhaps the end of the beginning.
and i think we have to keep that in mind as we go forward. so, let's take prostate cancer, a disease that will affect one in six, one in seven men in the united states. and we know that age is a very important risk factor, as are ethnic background and family history, each of which is sort of, i think, a surrogate for a genetic component, per se, but there is not a lot else that we've been able to take to the bank. there are a lot of alluring observations, but in terms of the established risk factors
for prostate cancer, those are our three. and this is where we've seen, you know, with genome-wide association studies, 48 that are published and another 24 that are coming along very quickly. so, we can see that, again, there are probably 70 or more different regions of the genome that are associated with risk for prostate cancer. and interestingly enough, some of them are in regions that have been identified in the type 2 diabetes genome-wide association studies. and the alleles are going in the opposite direction. there is an older literature suggesting an inverse relationship
between type 2 diabetes and the development of prostate cancer. so, again, there are some very interesting biologic observations that are coming out of it and some of the groups are looking very hard at these relationships, and particularly, what is it about the biology of the thada gene, or jazf1 and the like, that puts you in one category or the other, per se, with respect to risk. the other thing that we also notice is there is a particularly controversial region on chromosome 19 over klk3 and that's the gene that makes psa, the prostate-specific antigen.
okay. and depending on how your study is structured, you see that it is associated with prostate cancer or you see it, in our hands, in cohorts, associated with psa levels and the screen for psa level. and i think there have been some interesting meta-analyses, and this gives sort of the refinement of what i would call the second generation analysis, of putting studies together and looking at the designs and having very smart epidemiologists sort of better assess what the data look like. and it really suggests that the klk3 association has clearly got a component
that's driven by the psa screen. but there is also a very good literature and i think there is a suggestion that the klk3 gene itself has something to do with primary carcinogenesis in prostate cancer. and there are groups that are working on that: lloyd [inaudible] and our group at ltg and ros eeles [phonetic] in the uk and bill catalona [phonetic] and several other groups are all looking at this question, and it sort of again raises a very interesting and important opportunity where we can scan not only for cancer, but for cancer risk factors like psa.
so, there is a set of gwas going on now to ask the question, are there genetic variants that are important for being able to interpret and potentially reinterpret psa levels? so, not only would a man be evaluated for his psa level, but what does his genetic profile look like, so that you would interpret that psa level in the context of the genetic profile? and so, you know, similarly, people are looking at smoking and alcohol and the like, where we know these are very important risk factors for known cancer outcomes.
okay? so, the other thing that's been striking about prostate cancer is we haven't been able to take any of these 72 regions and be able to stick them on aggressive advanced prostate cancer, gleason 8 and above or metastatic disease. so, there's something about the development of prostate cancer that we can probably begin to parse into two different categories: that which is important for etiology and how prostate cancer starts, and then those things that are important for the sustenance, so to speak, and how prostate cancer can indeed become more aggressive
or develop in nasty or more dangerous ways as opposed to others. and there probably are genetic regions, probably very different ones, that are modulating and contributing to that risk. and i would also say, what's interesting is that of the 240 loci that i showed you before, not a single one of them to this day lines up with survivorship in that particular cancer. there are lots of reasons, some of them are power calculations, some of them are sample sizes and heterogeneity. but it's been striking that even out of 240, we haven't seen a handful jump out
and say, not only is this important for risk for breast cancer, but it's an important, you know, locus that may be important for outcome with respect to, you know, herceptin or any number of different kinds of therapeutic modalities. so, when we think about prostate cancer, we also have to now think that each of these variants explains a small proportion of the risk. so, if we take those first 48, we can suggest that it may be about 15 percent of the overall risk, and adding the next, you know, 24 may push that number to about 20.
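To make the "small proportion of the risk" arithmetic concrete, here is a minimal sketch, not the analysis behind the figures quoted in the talk. It assumes a multiplicative per-allele model, loci that combine multiplicatively, and the usual expression for the familial relative risk attributable to a single locus. The allele frequency (0.30), per-allele relative risk (1.10), overall familial relative risk (2.0), and the function names are all hypothetical placeholders.

```python
import math


def locus_frr(freq, rel_risk):
    """Familial relative risk (to a first-degree relative) attributable to one
    locus, for risk-allele frequency `freq` and per-allele relative risk
    `rel_risk`, under a multiplicative (log-additive) model."""
    q = 1.0 - freq
    return (freq * rel_risk ** 2 + q) / (freq * rel_risk + q) ** 2


def fraction_explained(loci, overall_frr):
    """Share of the log familial relative risk accounted for by `loci`,
    a list of (freq, rel_risk) pairs, assuming loci combine multiplicatively."""
    explained = sum(math.log(locus_frr(f, rr)) for f, rr in loci)
    return explained / math.log(overall_frr)


# Hypothetical inputs: every locus has frequency 0.30 and relative risk 1.10,
# and the overall familial relative risk is assumed to be 2.0.
first_48 = [(0.30, 1.10)] * 48
with_next_24 = [(0.30, 1.10)] * 72

print(f"48 loci: {fraction_explained(first_48, 2.0):.1%}")
print(f"72 loci: {fraction_explained(with_next_24, 2.0):.1%}")
```

With these made-up inputs the shares land in the same general range as the 15 and 20 percent mentioned above, but other plausible frequencies and effect sizes would shift them; the point is only that many small effects add up slowly.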
so, the question is, looking at this information: prediction is difficult, especially about the future. and this has been said by yogi berra, a hero of mine here, a great intellect. and by not a hero of mine, dan quayle [laughter], a former vice president of the united states. but the first person to really write this and talk about this was a very bright nobel laureate, and you have to be very careful in how and in what way you're going to use this information for applying it.
so, when we start looking at the discriminatory power, let's go in to some of the cohort studies. so, here in the breast and prostate cancer cohort consortium, with peter kraft and [inaudible], we have a paper that's published where we've looked at some 26 or 27 hits, going into prospective cohorts and being able to look at age and psa levels, and as you can see, the genetics, even in the older population where we know age is a very important risk factor, still only gets us to an auc of 0.6, 0.65, maybe 0.67 at best. whereas psa is out here at a stronger,
more clinically actionable level, although the whole issue of psa, which i'm not going to get into, is quite controversial at this time, but genetics is behind that. so, that's why i say genetics is not ready for primetime, per se. but let's go a step further and say, all right, what is it that we can do? so, in collaboration with the nilanjan chatterjee group, who have spent quite a bit of time very elegantly modeling the statistical basis of what we see empirically in terms of sample sizes, the effect sizes, and p values, and then projecting ahead, what would we be able to see if we were able to scan
in a much larger number of individuals? so, then what's our limit for the common variants, those that are 5 or 10 percent or greater in the population, that portion of that map that i showed you with the genomic architecture? and for breast or prostate, the best we probably will get to is an auc of about 80 percent, which is probably still less than psa, which tells us that just using common variants alone, it is going to be very difficult to make the case that you just look at that profile
and you have a clinical action. now, if you put the gail model and some other things that people have looked at, we may move that a little bit. but, again, the less common and rare variants are going to be necessary to really put this in a range that we would want to make individual decisions. one could argue that you could look at the edges here and ask the questions of public health decisions of people who are at very high risk and very low risk and consider prevention or when to start, you know, mammography and the like, those kinds of questions.
but, again, they have to be answered in very well designed studies as opposed to just inferred on the basis of several different disparate data sets at this point. when we compare this to a disease like crohn's disease, where we know there's a much stronger familial component, you can see there that the common variants really do push you up to a highly, you know, predictive level, so that you really have to think about what the clinical action could be in using a gwas set of data, particularly in individuals with crohn's disease or something like that.
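The link between many small-effect common variants and a modest auc can be sketched with a standard approximation. This is an illustration only, not the modeling described in the talk: it assumes a log-additive risk score that is roughly normally distributed, a rare disease, and independent loci, so that the score variance is V = sum of 2p(1-p)(ln OR)^2 and auc is approximately Phi(sqrt(V/2)). The allele frequency (0.30) and odds ratio (1.15) are hypothetical, as are the function names.

```python
import math


def score_variance(loci):
    """Population variance of a log-additive risk score built from independent
    loci, given as (risk-allele frequency, per-allele odds ratio) pairs."""
    return sum(2 * p * (1 - p) * math.log(odds_ratio) ** 2
               for p, odds_ratio in loci)


def approx_auc(loci):
    """Approximate AUC under a normal score and a rare disease:
    cases are shifted by V relative to controls, so AUC = Phi(sqrt(V / 2))."""
    z = math.sqrt(score_variance(loci) / 2)
    # Standard normal CDF written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical panels in which every locus has frequency 0.30 and OR 1.15.
for n_loci in (27, 72, 500):
    panel = [(0.30, 1.15)] * n_loci
    print(n_loci, "loci -> approximate AUC", round(approx_auc(panel), 3))
```

Under these assumptions auc rises slowly with the number of loci, which is one way to see why a few dozen hits sit in the 0.6 range while a disease with larger or more numerous effects can be pushed much higher.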
and this tells us that there are different diseases displaying very distinct architectures. and that's where we know we are going. so, our next generation of genome-wide association studies is not only looking at other populations, but looking at the meta-analysis of bringing all these groups together to continue to discover and shift to lower minor allele frequencies. we could probably use our current scans to get to about 2 or 2-1/2 percent, but below that, the imputation is a bit unstable and it's much harder to do
that because you need very large sample sizes. so, for the 1 percent, you're going to need 80,000 cases and 80,000 controls put together to find the things that we would expect to find. so, we see distinct differences in the underlying genetic architecture, and these are common variants and we see them in the log additive effects, but we haven't really begun to, i think, adequately look at the epistatic effects and the relationship particularly with gene-environment.
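Why a 1 percent variant needs sample sizes on this scale can be seen in a rough power calculation. The sketch below is a textbook two-proportion approximation for a simple allelic test, not the calculation used by the speaker. The per-allele odds ratio of 1.2 and the significance threshold of 5e-8 (the conventional genome-wide level) are assumptions, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test comparing risk-allele
    frequency between cases and controls (two alleles per person)."""
    norm = NormalDist()
    p_controls = maf
    # Allele frequency in cases implied by the per-allele odds ratio.
    p_cases = maf * odds_ratio / (1 - maf + maf * odds_ratio)
    std_err = math.sqrt(p_cases * (1 - p_cases) / (2 * n_cases)
                        + p_controls * (1 - p_controls) / (2 * n_controls))
    z_effect = abs(p_cases - p_controls) / std_err
    z_critical = norm.inv_cdf(1 - alpha / 2)
    # The far tail in the wrong direction is negligible and ignored here.
    return norm.cdf(z_effect - z_critical)


# Hypothetical variant: 1 percent minor allele frequency, odds ratio 1.2.
for n in (20_000, 80_000, 160_000):
    power = allelic_test_power(n, n, maf=0.01, odds_ratio=1.2)
    print(f"{n:>7} cases + {n:>7} controls: power ~ {power:.3f}")
```

With these assumed inputs, 20,000 cases and 20,000 controls give almost no power, while 80,000 of each is roughly a coin flip, so the exact numbers depend heavily on the effect size one expects to find.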
and there have been several papers from eric lander and peter visscher, and then allan [inaudible] has a paper coming out addressing this question of the relationship of gene-gene and gene-environment interactions explaining some of the missing heritability. two years ago, there was a paper and discussion that, you know, gwas had only explained a small percentage of the heritability. well, we looked at a very early time point and again, i would say that as we push on in this field we will fill in more of that space, but we have to be realistic about what limit the common variants are going
to have for being able to explain genetic susceptibility and it's probably going to vary from disease to disease. so, when we come back and look at these, we can see, for instance here, what we know in breast cancer and then let's go here to prostate cancer which looks very different. we don't have, in prostate cancer, those highly penetrant mutations that we see in breast cancer. brca2 is just beginning to show up in the prostate cancer world, particularly with respect to aggressive disease,
but we don't have that tail coming down, per se, but we have many more than we've been able to identify earlier on with genome-wide associations. and then, if you look at a disease like neuroblastoma, a very rare pediatric cancer, you clearly can see familial components explaining about 1 to 2 percent and then, you know, the common variants explaining another small percentage, and then this sort of split in between that is just in the process of being identified, and john maris and his group
at children's hospital really have been instrumental in leading this. and again, different diseases with different genetic architectures, so that's one of the mantras. and then lung cancer: we haven't been able to see a whole lot with lung cancer and i think a lot of that has to do with the very profound effect of smoking, which just makes it very hard to find things that are 1.2, 1.3, 1.4, when the effect of smoking is so profound that it's, you know, it's whitewashing what would be a lot of small genetic contributions. interestingly enough, bladder cancer which is sort
of just behind lung cancer is a very successful place where we have some 12 or 13 regions and now another handful that are coming out of some of the meta-analysis. so, they are, again, very interesting places to look at the gene-environment interactions. so, when we have these markers, we have to start thinking about what do we do with them? well, the first thing is you look and you say, what are these markers in linkage disequilibrium with,
are there coding variants, because we can make sense of a gene, that 1 or 2 percent of the genome, but almost all of them are out in the non-protein coding regions, whether they're in genes or in intergenic regions. and this is where the ability to look at the bioinformatic tools that are available in many different places, particularly, you know, ncbi and other kinds of tools, to look for regulatory or unannotated transcripts comes in, to begin to make the kinds of decisions as to how and in what way we will then do the functional analysis to begin
to ask questions about what could be the effect of this particular variant because, again, of that list of 240, only three of them turn out to have any evidence right now of being a coding variant that would change a protein, where one amino acid would shift. a number of them are beginning to emerge with first papers showing that the regulation or the promoter or something about the splicing sites looks very different and that these might be, you know, the way in which we explain why
that particular region is contributing to an increased or a decreased risk for a particular type of cancer or sets of cancers for that matter. and then, the experimental strategies are a whole world, and so, you know, we sort of have these combs with gwas where we get to a particular point and we have to turn it back on its side, turn it upside down and really ask the question, how are each individual region and the proclivities and the uniqueness of that region going to drive the questions going forward that the lab needs to look at? and so, the more comprehensive we are in being able to look at the variants
that are in linkage disequilibrium, that are correlated with what we see from the gwas, the better. and then the ones that are promising to go into the laboratory are really where we then send our next generation of the ground troops to really make sense of this. so, let's take a region like chromosome 10q11. i'm sorry about the font, from my mac to pc there is variation, but there was one particular snp that was like the second strongest
in the early prostate cancer gwas. and it has lined up in every gwas that we've seen and it's in the gene msnb-- msmb, i was about to say msnbc, but that's, you know, covering the election. and this is a particular prostate-specific serum marker that has very high expression in the prostate. and we were able to see that illumina was actually lucky enough to put the best snp on the chip, because when we go to map that region, we can't find a stronger snp, and we can now put together, in several papers from our group and several other groups, a story where you can see
that that risk allele associated with prostate cancer is also associated with lower expression of the msmb gene, by reporter assays, by emsa, in prostate tissue and tumor tissues, the basic biology really corroborating the susceptibility finding that we see in these large population-based studies. but it's interesting that right next door is another gene, ncoa4, that's been of great interest to the androgen sensitivity world, particularly in prostate cancer, and we re-sequenced this with next generation sequencing and we're able to see that, yes, that promoter looks most interesting, but matt freedman
and his group went on to show that the expression of ncoa4 is driven by that same promoter. so, there is within-site pleiotropy, so to speak, in the sense of affecting more than one particular gene, which tells us that, you know, many of these regions, in my mind, are going to perhaps be a bit more complicated than a one snp, one outcome analysis. and so, this is really where we're going. and then, we go to the next level with mike dean in ccr,
we've looked at the fact that there are these fusion transcripts that you could see on the ucsc browsers, and then have gone into the laboratory and validated that these are there and that the functional role of these may contribute to carcinogenesis. so, we can clearly see this in cell lines and in tissues and the expression of this does have, you know, interesting effects on the standard ways that we measure carcinogenesis in the laboratory, and these transcripts can be confirmed. so, there's, again, a level of complexity just in this region,
and so we have to at least posit that of those 240 or more regions that we know about, some of them are going to have similar types of complexity. so, you know, this is the hard part of cancer biology, that it only gets more complex, not simpler, as we dig deeper. and we go to 11q13 where there are a number of hits for prostate, kidney, and breast cancer, possibly endometrial cancer, and there are genes that are nearby, but the signals can't be linked to those genes directly. and interestingly enough, you know, we looked at cgems and identified one prostate signal, but we've been able to map this
and see that one became three. so, there are three independent signals, all of which independently contribute to prostate cancer risk in that particular part of 11q13. and now, there's the kidney cancer signal, which was a top hit in the kidney cancer gwas, and that kidney cancer region could now be genetically and functionally linked to the ccnd1 gene well downstream through the vhl pathway, which is very interesting particularly in kidney cancer considering the hypoxia pathway of vhl,
which is an important gene for highly penetrant mutations as well as for the basic pathophysiology of renal cancer. and now the australians have taken the breast cancer region and linked that to ccnd1 as well. but we can't push the prostate cancer that far down. the prostate cancer looks like it's more associated with ccnd2 and this gene here. but again, these are the kinds of things we see as we look very closely. when we look at genome-wide association studies, we also can ask the question,
what can we do with these to look at clinical outcomes in different ways? okay. i'm taking a little bit of a different tack now. so, the group at st. jude's asked a very interesting question. we knew that children with leukemia had different outcomes with aggressive therapy based on their genetic pop-- their population genetics background. so, children who were described as hispanic and of native american background had a poorer rate of going into remission in their first aggressive therapy. so, they took several thousand children that they had treated at st. jude's
and with the cog and looked at the genetic structure itself and just used the population genetic structure analysis and asked the question, does this indeed correlate with this clinical outcome that people had published four or five times before in clinical studies with no real genetics other than self-described ethnicity, per se? and interestingly enough, they could see this very strong relationship. and then, using that population structure at least points towards several regions in the genome that may be of great interest because they have different frequencies of the allele,
per se, in different populations. now, the other thing to point out is that the study that st. jude's had conducted also had a second retreatment. so, the group of children who were of hispanic background, who had the native american, you know, component of 10 percent or more of their genome, per se, in that admixture analysis turned out to go into remission at the same rate as everybody else in the reinduction, per se. but, again, this is telling us something about potentially the pharmacologic response to that initial therapy, per se.
and so, it raises the question of thinking about these gwas as providing preliminary biomarkers for screening, pharmacogenomics and the like. and so, there are a number of groups that are looking at this, particularly in other diseases like cardiovascular and the like. george, did you have a question? >> oh, i'm sorry. i was just wondering about the-- i mean, whether you're-- do they try and correct for things like socioeconomic background and-- >> well, it's important in, for instance, you know, some of the pre-genetic studies from the uk, it raised that question.
but here in cog-- in the cog at st. jude's, they were able to do that as a first pass. and so, the second study that's going on right now will better address that pretty interesting question. >> i see. okay. >> but it, you know, again, how you want to stratify this, again, the numbers are very important and-- >> no, no, no. no argument, i'm just, you know, no--
obviously things like obesity and diabetes and a whole pile of concomitant morbidities exist in native populations. >> right. but, again, these are children-- >> right. >> -- so the proclivity to those is very different-- >> true enough. >> -- than actually-- >> -- kids under 18. so, if this was an adult population, yeah, you have--
there's a bigger issue there. i mean, there is the question of access to care. that's been the criticism that richard holstein [phonetic] and others have raised, you know, that in different socioeconomic groups, there are different levels of care and monitoring. well, i would say that st. jude's and cog did pretty darn well in tracking any child that gets into this system. it's not like the adults, like, when you're over 18, you're grown up and if you decide to come or not come for your follow-up bone marrow
or your therapy, that's your choice. the kids aren't there. having trained as a pediatric oncologist, i know people are out on the street looking for them and tracking them down and, you know, judges will actually make children wards of the state if parents are not appropriately taking care of them and it has nothing to do with the ncf. >> yeah, i know. and then to be honest, if they are at st. jude in the first place,
it suggests they're probably not the same degree. >> yeah. yeah, exactly. okay. but i-- but it is an important question that's been raised and i think the next generation of studies, talking with mary and company, will address that. we know we can look orthogonally at all sorts of exciting things, you know, biometrics and tobacco, caffeine, alcohol, vitamin levels. demetrius albanes has been doing some very exciting things, looking at vitamins, you know, e, a, not q or f or whatever, but just having fun with you demetrius,
but i think it's very important. but what we do see, interestingly enough, is in height and weight, there are over 200 regions that have been identified, which in my mind tells us now we can actually talk about pathways as opposed to the constructs that many of us try to create, usually post hoc. here, genetically, we know of 200 regions that are important for bmi and the question is how and in what way these are related? there's an agnostic search and identification here as opposed to go networks and things that have certain biases.
i'm not saying that those aren't good resources, but i think there is an opportunity to sort of apply this and ask, informatically, about the value of how we define pathways in a circumstance where we have enough information, i think, identified by a statistical approach as opposed to, you know, the heuristic superimposition of the different kinds of values that we put on how we create pathways, per se. and that's sort of my challenge to the computational world. so far, no eggs and no tomatoes have come at me today. but in other places, i have gotten some people very hot under the collar.
now that we've done these genome-wide association studies, we've come across some unexpected findings. we've seen large chromosomal abnormalities, structural variation, aneuploidy in germline. a lot of this is work that kevin jacobs at the core genotyping facility had really sort of first started to characterize. as we built the pipeline for doing the qc analysis, there were, you know, samples where we could see, if you look here across the chromosome at the mapping of the b allele frequency
and the log r ratios, that there was in this particular region here a deletion across a good part of the chromosome. there was a subset of cells, and this is what we call genetic mosaicism: not all, but a proportion of the cells are missing that part of the chromosome, per se. and we started to see this over and over. and so kevin basically rewrote a set of programs both for the normalization and the calling off of the illumina arrays to begin to really allow us
to get to this question of looking at genetic mosaicism, and there are two papers that have recently come out in nature genetics, from our group and from the geneva consortium led by cathy laurie, where we've compared some of the common samples that we have and been able to refine what kevin had laid out and what cathy had also identified and developed in a slightly different way. and it's very reassuring, because when we look at this, what we're looking at is the log r ratio, the observed probe intensity relative to the expected intensity, and then what the b allele frequency, the relative b probe intensity, looks like.
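to make those two quantities concrete, here is a minimal python sketch, under stated assumptions: the log r ratio is taken as log2 of observed over expected total intensity, and the b allele frequency is a simple linear interpolation of a sample's allelic angle between the aa, ab, and bb cluster centers. the function names, the theta inputs, and the interpolation scheme are illustrative assumptions, not the pipeline described in the talk.

```python
import math


def log_r_ratio(observed_r: float, expected_r: float) -> float:
    """log2 of observed total probe intensity over the expected intensity.

    0 means the expected two copies; negative values suggest loss,
    positive values suggest gain.
    """
    return math.log2(observed_r / expected_r)


def b_allele_frequency(theta: float, theta_aa: float,
                       theta_ab: float, theta_bb: float) -> float:
    """Interpolate a sample's allelic angle between genotype cluster centers.

    theta_aa, theta_ab, theta_bb are hypothetical reference cluster
    positions for this snp (assumed ordered aa < ab < bb).
    Returns 0.0 for aa, 0.5 for ab, 1.0 for bb, and values in between
    for mixed cell populations.
    """
    if theta <= theta_aa:
        return 0.0
    if theta >= theta_bb:
        return 1.0
    if theta < theta_ab:
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)
```

in a sample with a single population of cells, the b allele frequency values sit in three bands, near 0, 0.5, and 1. a run of adjacent snps whose middle band splits away from 0.5 is the kind of pattern that points to a mixture of cell populations.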
for the b allele frequency, we would expect them to look, as we can see here, top, bottom, and then the heterozygotes in the middle. and we don't care whether it's a, c, g, or t, this is just looking at the distribution of any particular snp assay: are you seeing just one population of cells that has one particular genotype of ab, bb, aa, or what have you? and so, we started graphing these and as kevin built this, you know, different analytic approach, and with meredith yeager, they were able to really, i think, come up with a very strong case for being able
to confirm what we had done in validation studies with our spanish colleagues, luis [phonetic], perez [phonetic], geraldo [phonetic], and [inaudible] in spain, where we were able to identify distinct subpopulations and show this in the laboratory. and we made a cut-off of greater than 2 megabases. so, we think this is the tip of the iceberg where there are individuals who have resulting alterations in copy number and loss of heterozygosity. we've known about mosaicism for a long time in classical genetics, talking about, you know, all sorts of things from eye color to, you know, cats,
to age-old explanations through developmental disorders and catastrophic diseases, trisomy 21, turner syndrome, neurofibromatosis. but recently, there have been some groups that have published that there are these very rare familial mutations and in those families, they have a lot of this mosaicism and a risk for cancer, per se. and then, there's, on the other side, the circumstances seen in mosaicism that [inaudible] and groups have found with proteus syndrome and with ollier's disease, where we have known cancer oncogenes that have known mutations that we can see in a subset of cells in a--
of a particular cell type in an individual who has not developed cancer. they may have other manifestations that have other phenotypes, per se. and so, it's really interesting about the context of how and why mosaicism develops and, most importantly, its clinical manifestations. so, you know, to have an akt1 mutation or an idh1 mutation, but not have lung cancer or brain tumor or something, it's sort of perplexing to the cancer biologist, but the developmental geneticists, they've seen this in individuals who have very interesting types of, you know,
rare genetic highly penetrant syndromes. and instead of seeing it in germline, it's a tissue-specific set of mutations that you're seeing in skin or in kidney or whatever. and it's really, i think, a very interesting facet that tells us that the genome is probably far more complex and we can't think of it in the two-dimensional way in which we classically do. so, here, we went ahead in our spanish study
and published this two years ago showing that about 1-1/2 percent of the population had these large events, and we couldn't associate it with bladder cancer, per se. and we did the laboratory evaluation. so, we went through our very large first generation set of gwas that we conducted in dceg with some 57,000 individuals, which is now over 80,000. so, we're going to be replicating what i'm going to show you. and we can see a little over 1 percent of the population, around that range, are individuals with one of these events, and there were more
than expected individuals with multiple events. and the kinds of events that were detected [inaudible], we had a large number of cases and controls. and if we could look at particularly [inaudible] kevin had come up with a very clever way of displaying this based on the patterns of normalization that he'd used to set up the pipeline, we were able to begin to segregate what we thought were gains or losses or what's called copy-neutral loss of heterozygosity, where you have what looks like the same number of copies,
but it's only of one particular allele, per se. and uniparental disomy is one version of that. and we have limits of detection of between about 7 percent and 95 percent for the different detectable populations. so, when we started looking at this-- sorry, pc had swapped this here. what we saw was for the age at which the dna was collected, age is really important. and so, one of the conclusions here is as you get older, your genome starts to fall apart.
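the distinction between gains, losses, and copy-neutral loss of heterozygosity can be sketched with simple mixture arithmetic. the python below is an illustrative model, not the published calling method: it assumes a diploid background, a single event affecting a fraction of cells, a heterozygous snp, and ideal array signals with no noise or intensity compression. the function names are hypothetical.

```python
import math


def expected_signal(event: str, fraction: float) -> tuple[float, float]:
    """Expected (upper heterozygous baf band, log r ratio) for a mosaic event.

    event is "loss", "gain", or "cnloh"; fraction is the proportion of
    cells carrying the event (0 to 1). The favoured allele is taken to
    be b, so the upper band is returned; the lower band is 1 minus it.
    """
    p = fraction
    if event == "loss":        # affected cells keep only the b copy
        b_copies, total = 1.0, 2.0 - p
    elif event == "gain":      # affected cells carry an extra b copy
        b_copies, total = 1.0 + p, 2.0 + p
    elif event == "cnloh":     # affected cells are bb, still two copies
        b_copies, total = 1.0 + p, 2.0
    else:
        raise ValueError(f"unknown event type: {event}")
    return b_copies / total, math.log2(total / 2.0)


def fraction_from_baf(event: str, baf_upper: float) -> float:
    """Invert expected_signal: estimate the mosaic cell fraction."""
    b = baf_upper
    if event == "loss":
        return 2.0 - 1.0 / b
    if event == "gain":
        return (2.0 * b - 1.0) / (1.0 - b)
    if event == "cnloh":
        return 2.0 * b - 1.0
    raise ValueError(f"unknown event type: {event}")
```

under this model, a copy-neutral event in 7 percent of cells moves the upper heterozygous band from 0.5 to only about 0.535 with no change in log r ratio, which gives a sense of why there is a lower limit of detection. the three event types separate because losses pull the log r ratio down, gains push it up, and copy-neutral events leave it flat.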
and so, as we could see, at age 75 to 90, almost 1-1/2 to 2 percent of the population walking around have these kinds of large-scale structural mosaicism. and geneva found exactly the same thing in their study. and now, there's another study that's about to come out from europe corroborating this. i mean, the interesting question is, everybody got to about age 80 and that's where the cohorts are. and if we actually look at our data past 80,
there may be a suggestion that they start to drop off a little bit. so, we're interested now in developing some studies with octogenarians and nonagenarians and centenarians, asking the question, is this a way to identify people with a lower frequency, so that this may be an important, so to speak, you know, protective factor? but it's an interesting question about the aging genome. we could also see that we had a higher frequency in men compared to women and then when we started looking at this,
we could see that there are different types of events. so, the gains were involving entire chromosomes, whereas the copy-neutral events were in the telomeric regions, and the interstitial regions are where we had losses. so, different chro-- you know, different kinds of events are affecting different parts of the chromosome with quite distinct frequencies when we looked at this large group with 57,000 individuals, and geneva saw basically the same kind of pattern, which raises some very interesting questions about genomic stability
and what kinds of events may be important with respect to telomeric instability as opposed to whole chromosome or the interstitial, in other words, inside of a particular chromosome, per se. and when we go further, here you could see that there were individuals who had multiple events. some of them with cancer, but we saw a number of individuals who were cancer-free, who had a number of these kinds of events. so, we looked at a circos plot, with cases on the right and cancer-free individuals on the left, and you could see
that there are certain regions on chromosome 9, 13, 14, and 20 that are, so to speak, hot spots where the same regions are involved in people who have cancer and people who don't have cancer at this time, you know, raising very interesting questions, and chromosome 20 and chromosome 13 and 14 are regions of great interest to the leukemia world, in aml, cll, and cml. these fall right on the regions that are known to be translocations that have actually been identified as part of, you know, the driving force of cll. i'll come back to that in a second. we then looked at adjusted analysis overall, and comparing the individuals
who had cancer versus those who did not, we could see, you know, an effect, that there was an increased risk for developing cancer in individuals who had these mosaic events. now, we only see it in 1 to 1-1/2 percent of the population. so, the numbers are relatively small, but when we look at two of the cancers that we had the most data on, we could see both kidney and lung also went up there. and this is something that we will go forward on. now, when we looked at the hematologic cancers, the group from geneva led
by cathy laurie spent quite a bit of time going back to the cohorts, and in the sub-analysis of those 4, you could see that in individuals who had hematologic cancers, there was a much greater likelihood of detecting this, and this is early detection of disease, you know, and when you go back a little bit in the hematologic literature, you can clearly see that people who have cll, before they are clinically diagnosed with cll, will have the 13 or 20 events, per se. and so, we could see this in our untreated leukemia as well
as in the studies in geneva. we've now taken this to the y chromosome, so kevin has rewritten the programs, and very interestingly, we unfortunately can report for the men that with age, your y chromosomes undergo, with probably a higher frequency, loss of part of that y chromosome. and we spent quite a bit of time developing pcr assays to do qpcr with [inaudible] in the laboratory and we now can calibrate this and see that there may even be a few men walking around with y chromosome gain, which is something that we're trying to make sense of.
so, we're just getting ready to run this across the 80,000 individuals where we take out the 50 some thousand men and ask these questions of the relationship to cancer risk, and particularly, to smoking. so, we know that this clearly goes on. so, in our minds, the aging genome has important implications for, you know, thorough characterization for tcga. you know, when people get excited about seeing somatic events, you've got to ask that question, what does the germline look like? because there is a proportion of individuals out there that have this mosaicism,
and so could the "somatic events" potentially map back to the germline? at the same time, could these somatic events be very important as we look at individuals, as markers of genomic instability or, for that matter, for very specific cancers? and so, from our cohort studies, we have the opportunity to be able to go look at tmas, to look at the tumor tissue and ask the question, do we see the same event in the tumor that we see in either blood or buccal? and it gives us insights into the early and late events and genetic biomarkers for early detection, particularly for hematopoietic cancers.
so, you could think of this now as sort of two different types of events. one is, are there embryonic progenitors with somatic alterations that are below a threshold of detection, per se, in blood or buccal cells? i neglected to point out that about 18 percent of ours are buccal and there's no real difference between buccal and blood with respect to the frequency, which is very interesting. and then, there are unknown events that sort of trigger survival bottlenecks that lead to positive selection. and that's one way of looking at it.
or is it that, with the genomic instability of aging, the processes of repair, of sort of maintaining, you know, stable genomes, begin to break down and then you get this proliferation of suppressed populations? and we know there is something called immunosenescence with age. you lose some of your repertoire of immunoreactivity to a number of different kinds of stimuli and different kinds of challenges. and do we have a comparable sort of thing with respect to somatic events taking place in the large? and we're now sort of diving into using sequencing data
to ask these questions of-- do we see this for smaller events, and using, you know, next generation sequencing technology to address these questions as well. and the question is-- the bottom line is, the co-existence of these distinct clonal populations. so, right now, we're looking at mapping the breakpoints that we have published on and analyzing tumor pairs. and we're also going into looking at the timing dynamics from known cohort studies,
going at the blood bank and the australian twin registry to ask these questions: do people have mosaicism that is stable, with three or four measurements always at 15 percent in this person, or is there a curve that may be of real biologic consequence for that individual? and then, the confirmation of the known associations and then going at the x and y chromosome. the x will be difficult because of the question of lyonization, because even though you may have a somatic event,
if that chromosome is put to sleep, so to speak, is that biologically relevant? and so, we are sort of scratching our heads on how to approach the x. and then, going into the hematopoietic, the plco and nhl, we've just finished 9,000 cases from 5 subtypes and we'll look into the mosaicism of that, and with neal young [inaudible] aplastic anemia, which has a high risk for developing leukemia and mds. then with neil caporaso looking at the cll families. so, we think that in the next year or so, we'll have a number of new observations that help to give us some more clarity
as we now sort of traverse this. as we've seen here, for the first 10 years or so with the genome and the core genotyping facility, we really focused on genetic associations and genotyping. and we've moved into the exome world and now we're getting ready for population-based sequencing, which is going to turn this all on its side. and it's going to be much more complex in terms of the structure of those studies. and i would end with two or three slides here on the value of thresholds and significance because we've always, in genetics,
sort of had this crisis of how can we get over a threshold, declare something significant with a minimum number and go forward. gwas has taught us, i think, a number of things and reminded us of things that are hard lessons for exome and whole genome sequencing, because, i think, the bottom line here is that whole genome sequencing is going to bring this rising tide of uninterpretable variants. how and in what way are we going to classify them and use them? it's going to take laboratory investigation.
i'm not sure we're going to have diagnostic testing that's going to be so easy for a whole genome because-- let's not fool ourselves about germline genetics. it's about discovery, biology, targets, all these things, but we have to validate, we have to characterize, and then later, the clinical application. i think that we have to be very careful about thinking about targets downstream and lifestyle and environment are important. that one is out of line. so, in the not too distant future, i think we're going to look back at gwas
as the kind of "golden age" because there was a convergence of having epidemiology and genetics technology to be able to do a lot of rapid discovery. there's a lot to be learned after that, but, you know, with the allure of sequencing at hand, we're going to have a very hard time. so, i think the key issue here i want to end with is with a great president of the united states who made a very important comment, and i like this in the first person plural, "we not only use all of the brains we have, but all we can borrow."
and this is the spirit of collaboration, and being able to draw from different sources to make sense of studies at hand and to work through the art of collaboration. to that end, i will acknowledge many wonderful people from all sorts of institutions. [ silence ]