questions. our next speaker is going to be talking about functionalizing the cancer genome, and linda chin, from harvard school of medicine, is going to be the speaker.

linda chin: thank you, mark. thank you to the organizers for the opportunity to speak here. it is an honor. i want to talk a little bit about the efforts in tcga and what i believe are important next steps to understand and translate the information to impact on medicine. to start, as dr. lifton did, i'd like to remind all of us why we're here: our goals in cancer medicine, which are to prevent cancer, detect it early, because there's no question that that's where we have the biggest impact on survival, and when that fails, intervene appropriately. and we have heard today that cancer really,
ultimately is a disease of the genome and therefore it makes sense that if we can understand what the genome is telling us, we can do a better job in managing the patient, and that, in my view, is what personalized medicine is.

now there is evidence that genomics already has an impact on science, and i want to highlight a few examples here. certainly, we can start out with the example of bcr-abl, the philadelphia chromosome, that taught us the power of targeted therapy. and the herceptin example that shows us that we really need a biomarker to allow us to select the right patient population for a drug to be effective. and certainly, the poster child, the braf mutation, which really was discovered by probably the first systematic cancer genomic effort, leading
to effective drugs that are likely to be approved this year. that's nearly eight years from target discovery to a likely approved drug. compare that to the example of gleevec, which took 41 years from discovery of the target to a drug in the clinic.

so given that, i think it's clear that cancer genomics can impact dramatically on medicine. it will enable a more rational and effective approach to prevention, which is targeting the underlying etiology. it will certainly help us detect cancer early on by targeting, in a rational way, a known allele that occurs early and applying technology, whether that's serum proteomics or imaging. and when that fails and the patient comes in with
cancer, we focus a lot of effort on using the genome information to guide us to new therapeutic targets and biomarkers that allow us to select the right population of patients for that target, for drugs to get to that target. and ultimately it's quite clear that monotherapy is not going to be effective. it's not going to give us a long-term cure, therefore we need to begin to think about combinations, a co-extinction strategy, and learning from the genome will give us a rational way to go about doing that.

so it is with that hope and expectation that the nih, nhgri and nci came together to launch the pioneering effort, the tcga. the pilot project ran from 2006 to 2009 and i would say it was definitely the first coordinated effort where there
is intention to characterize the cancer, not just on one dimension but on multiple dimensions. not just looking at copy number change, or not just looking at sequence alteration, but really trying to understand what the expression patterns are, both coding and non-coding, and also in fact trying to map the promoter methylation patterns in the same sample.

and that project has moved on, and i got this recent update from the nhgri, i believe the council update. the pilot ended in 2009 and, based on its success, it was continued into its phase 2. and the phase 2 has gone on, and this is a chart of the samples entering tcga, provided by the nci office, where we are right on track to complete the 3,000 cases to study in the phase 2 project, which will be cases from these 20 tumor types from diverse
organ systems. and there's a very aggressive and ambitious plan to complete these comprehensive analyses of each of these 20 tumor projects by 2014, i guess. that's what he said.

and the current study design of the tcga is that every one of these tumor samples and the matched normal will get [unintelligible] sequence, and a good percentage of them whole genome, in addition to sequencing the transcriptome and the microrna and mapping the methylation patterns. we are also beginning to layer in proteomic analysis on some portion of the samples. so we're really trying to build a comprehensive map of what the genome is, what the transcriptome is, and hopefully link it to what the proteome is.

now how is this all possible? we've heard a lot about it earlier today. it is being
made possible by the transforming technology, the so-called next-gen sequencing or massively parallel sequencing technology, which i'm not going to go into. you've heard a lot about that already today. but one thing that hasn't been mentioned a lot, beyond the cost, is the fact that this technology doesn't just give us cheaper sequence. it gives us a lot more information and much more accurate information. this is a slide that i borrowed from [unintelligible] and it really highlights what the next-gen sequencing technology can do. it certainly can identify point mutations and small insertions and deletions, but it can do it much more sensitively and more accurately than the old capillary-based sequencing method. it can also do copy number, like the array-based technology, but in contrast to that it would do it with digital quantitation
[spelled phonetically]. it would tell you precisely what the copy number of that particular sequence is, which the array-based technology is not able to do. and importantly, it would give us one new dimension of information, which is rearrangement, that we were completely blind to with the previous technology. this is the first time we will be able to map rearrangements down to base-pair accuracy in a high-throughput manner. and then beyond that we can begin to interrogate evidence for nonhuman pathogen sequences and their roles in human disease. and certainly there's a lot of evidence in clinical medicine that infectious etiology plays a role in human disease.

so we already heard about how much the sequencing costs have come down, and eric lander has shown
how much the production of sequencing data has changed at the broad institute. but i like to use this slide, which i got from nhgri, to show how this is also changing the way we can think about what cancer genomics can do.

this is a little piece of advertisement i pulled out from the company illumina, which advertised that their machine, [unintelligible] 2000, in one single run, which takes eight days, can generate 200 gigabases of sequence. and this is a chart from nhgri reporting on their annual sequencing production for fiscal 2007. this is from all the sequencing centers, i believe, not just cancer sequencing, all of their sequencing output in 2007, and the total is 140 gigabases. that level of growth, which has continued, and i believe this is way off,
probably tenfold less than what actually happened in 2010, because this was a projection made earlier, in the second quarter of 2010. this growth has changed the way we can think about what we can do with cancer genomics. we are no longer limited by the real estate that we can sequence. we don't have to choose between the number of samples we need to provide statistical power and how many genes we can sequence. we can do them all with this technology. and i think we just heard from brad bernstein that we can apply this similarly to the epigenome. so it really has transformed the way we can think about the genome, and we can look at it at a level that we never imagined possible.

but it also created a lot of challenges: certainly storing data, transferring them, not to mention
mapping them. and then you have to analyze them: mixed sequence, variant calling, determining translocation and rearrangement. none of this is easy. none of it was easy, it's still not easy, and it's still an evolving science. and on top of that, the rate of growth is enormous. i guess i forgot to show this. this is the latest accounting from tcga alone. the sequencing output per month is about 17 terabytes.

so this is an enormous challenge, not just to deal with the data, process them and map them, but also to make sense out of them. and someone earlier this morning already asked the question about the cost of the analysis. and that clearly is a major bottleneck. and recognizing that, i think the phase 2 of tcga, at its inception, anticipated
some of this, although i don't think it anticipated the size of the challenge. but certainly it differs from the pilot in the sense that it has built and funded dedicated genome data analysis centers. so it is no longer just a collection center to generate data; they actually have data centers that are really devoted to analysis.

and moreover, the centers together are really trying to think of ways to accelerate the analysis and making sense of data, such as building an automated analysis pipeline. and here's an example: the tcga analysis pipeline that's based in the [unintelligible], where we try to automate and have fast turnaround on predefined analyses, and an example of being able to inject all the available data, over 2,000 of their data files, into a pipeline that can
do normal, analyze each of the data types, generate analysis for each single data type, in addition to correlating data types, such as which mutation is significantly correlated with survival, for example, in that particular cohort, and do so in a predefined, automated way, providing a very fast turnaround with a result that's human readable, so that this could serve as a companion to the raw data that's released to the public. because the majority of the community, which i will come back to, need to use these data and validate them and functionalize them, but it's a challenge for them to make sense out of the enormous raw data. so the analysis is important.

moreover, i think by automating and putting this in the pipeline it provides reproducible
and uniform intermediate data files that can free up the analysts in tcga and outside tcga to do higher-level analysis, such as trying to figure out the meaning of a mutation and its relationship to a gene that is known to be in the same complex and that might be altered by methylation or by genomic alterations. they can focus more on that type of analysis that really aims to understand a biological question or answer a clinical need, so that we can accelerate the process.

so with that, and with all the talks you have heard earlier today, i think it's safe to say that with the effort of tcga, along with our international colleagues, icgc, in the next 5 to 10 years we will have a complete atlas of all somatic alterations and somatic epigenome
alterations in the cancer genome of all major cancer types. but then the challenge is, how do we go from here to here? and it's obviously not a single step. and what it takes is a lot of work. and it has been called the valley of death or big death, and one of the things that i think we need to begin to think about is how to depict all this information and do it in an effective and efficient way to support and make that knowledge actionable and translatable in the clinic.

and one of the examples that i like to remind you of before i go through this example is, as i mentioned at the beginning, that the braf mutation and the development of the braf inhibitor is certainly the poster child, and we'd like to see many more of those coming out because
of the short time lag between discovery and the drug that's impacting on patient survival. but i think we need to remember that that process would not have happened in eight years if we didn't have the prior knowledge that braf is a kinase, that braf signals in the map kinase signaling pathway and that [unintelligible] is a good downstream reporter of braf activity that one can use as a target engagement and responder id throughout the entire process of drug development.

so without that knowledge we wouldn't be able to develop the drug. but for many of the genes that we are identifying now we really don't have a clue as to how they function and what they do, or whether even the [unintelligible] activity is there.
so we need to get that knowledge. and there's no easy way to do it, and i'm not sure there's a high-throughput way of doing it, but i want to now give an example of a study that we have been working on that i hope will highlight the value of taking the next step, functionalizing: not just showing activity but actually understanding how a genetic element functions in cancer, and how that information is necessary to make it a translationable [spelled phonetically] observation.

so, this is a paper published by tcga a couple of years ago, defining, using data, that [unintelligible] really represents at least four major molecular sub types, which are defined here, such as proneural, classical and mesenchymal, and which are enriched for specific genotypes, such as the classic
egfrviii mutation that's known to be associated with glioblastoma, in the classical sub type, the idh1 mutation that, as was mentioned earlier, is almost exclusively observed in the proneural sub type, and so on.

now one of the things that we were interested in, or at least somebody in the lab, johnny, was interested in, was: what is the difference? what's driving the molecular differences between these sub types? you can look at the transcriptome [unintelligible] using the tcga data and interestingly you find that the major difference on the transcriptome level really exists between, you can barely see it, [unintelligible] and proneural sub types. so he had a hypothesis that maybe some microrna is regulating a collection of genes and that's really underlying
the molecular differences between proneural and mesenchymal.

this is a hypothesis, but how do you go about testing it? well, we knew we had these data, very complex data from tcga, and we needed to make sense of them. and this is where computational modeling becomes valuable, and we went to our collaborator, jim collins, at bu, and asked him to help us use his clr network modeling algorithm, which i'm not going to go into, to try to build a regulatory map of microrna and rna in glioblastoma, which we did with his help, using just about 200 samples for which we had matched rna and microrna data, and this is the value of doing integrated genomics in tcga. you cannot do this analysis if your rna and [unintelligible] data are generated on different sets of samples.
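
the talk does not go into the network algorithm, so what follows is only a rough sketch of the general idea behind a relevance network of this kind, not the method used in the study. it scores every microrna/mrna pair by a mutual-information estimate and then asks how unusual that value is against the background of all pairs sharing the same microrna or the same mrna. everything here is an assumption made for illustration: the function names `pearson` and `clr_scores`, the gaussian approximation of mutual information from pearson correlation, and the input layout (one expression profile per gene, all over the same ordered, matched samples).

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def clr_scores(mirna, mrna):
    """Background-corrected relevance scores for every microRNA/mRNA pair.

    mirna, mrna: lists of profiles (one list per gene) measured on the
    SAME samples in the SAME order. Assumes no profile is constant.
    """
    # ASSUMPTION: mutual information is approximated from the Pearson
    # correlation under a Gaussian model, MI = -0.5 * ln(1 - r^2).
    mi = []
    for m in mirna:
        row = []
        for g in mrna:
            r = max(-0.999999, min(0.999999, pearson(m, g)))
            row.append(-0.5 * math.log(1.0 - r * r))
        mi.append(row)

    # Background distribution of MI values per microRNA (row) and per mRNA (column).
    row_stats = [(mean(row), pstdev(row)) for row in mi]
    col_stats = [(mean(col), pstdev(col)) for col in zip(*mi)]

    scores = []
    for i, row in enumerate(mi):
        out = []
        for j, value in enumerate(row):
            # Negative z-scores are clipped to zero before combining.
            z_row = max(0.0, (value - row_stats[i][0]) / row_stats[i][1])
            z_col = max(0.0, (value - col_stats[j][0]) / col_stats[j][1])
            out.append(math.hypot(z_row, z_col))
        scores.append(out)
    return scores
```

an edge would then be kept when its score passes some threshold. note that the sketch only works because each column of both inputs is the same sample, which is the point made above about matched data.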
with the network algorithm, we generated the fur ball that represents the network, which is by itself not very useful. there are 29,000 edges among several hundred microrna and thousands of messenger rna, but we had a question we wanted to use this for, so we knew how to use this information. we took these network relationships and asked how they differ between proneural and mesenchymal. without going into a lot of detail, we identified 17 microrna that really drive the separation between proneural and mesenchymal, and you can see that all these messenger rna edges connected to these 17 microrna really account for 85 to 90 percent of the genes that make up the signature genes reported by the tcga paper to define the proneural versus the mesenchymal sub types.
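
the "85 to 90 percent" figure is a coverage number: of the published signature genes, what fraction is connected by at least one network edge to one of the selected microrna. a minimal sketch of that bookkeeping follows. the function name `signature_coverage` and the toy edge list and gene names are invented for illustration and are not the study's data.

```python
def signature_coverage(edges, selected_mirnas, signature_genes):
    """Fraction of signature genes linked to at least one selected microRNA.

    edges: iterable of (mirna, gene) pairs taken from the network.
    """
    selected = set(selected_mirnas)
    signature = set(signature_genes)
    if not signature:
        return 0.0
    # Every gene that has an edge to one of the selected microRNAs.
    targets = {gene for mirna, gene in edges if mirna in selected}
    return len(targets & signature) / len(signature)


# Toy example with made-up names.
edges = [("mir-a", "gene1"), ("mir-a", "gene2"), ("mir-b", "gene3"), ("mir-c", "gene4")]
coverage = signature_coverage(edges, ["mir-a", "mir-b"], ["gene1", "gene2", "gene3", "gene4"])
print(coverage)  # 0.75
```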
so now you can go in and ask, what are these 17 microrna doing? most of them are unknown. i'll give you one example. we decided to focus on one of them, which is mir34a, because it's also a microrna that has evidence in some tumor types, it's deleted in neuroblastoma, it is expressed at a very low level in the proneural sub type, and its clr edges are enriched for the proneural signature, which i'm not showing you. so that led us to a hypothesis that mir34a can be a candidate determinant of the proneural molecular sub type.

so the first thing we have to do is prove that mir34a actually does something in glioblastoma [spelled phonetically], and we can do that very quickly by proving that it is a tumor
suppressor mir, ok? so we can do this by, for example, gain of function in human glioma cells, where we overexpress the mir and show that we eliminate tumorigenicity of these cells in vivo. we can do the converse, which is to knock down mir34a in immortalized human [unintelligible] and show that now they become transformed and tumorigenic. and we can do this in multiple cell systems, in human and mouse systems, so we proved that mir34a is a tumor suppressor in glioblastoma. but that information is not very helpful. there's not much more we can do with that information, so we decided to go a little deeper and ask, how does mir34a contribute to the proneural version of the transcriptomic signature?

now, this is perhaps a simplistic view of the regulation between microrna and
rna, but we can certainly think there could be direct interaction, where the microrna directly binds to the target gene and negatively regulates its expression, but it could also act through intermediates such as transcription factors, where you get a very different relationship. still a relationship that you can see in the network model, but it wouldn't be the same kind of negative correlation.

so i'm not going to talk about this. i will just talk about our efforts to focus on the putative direct targets of mir34a. using sequence analysis and correlations we identified two putative direct targets of mir34a, and the criteria we used are down here. now, regarding one of the criteria, one of the things we're aware of is that sequence predictions
have a very high false positive rate. so what we're asking here is that we only focus on predicted targets that are predicted by all three algorithms.

now we went on to prove, and i'm not going to show you the data, by luciferase reporter assay, that dll1 and pdgfra are, indeed, direct targets regulated by mir34a through direct binding, and we can show regulation by mir34a on the protein level of these two target genes in human and mouse cells. so they're clearly novel direct targets of mir34a.

what does that mean? is that relevant? well, the reason that we were interested in this is because of what we know about glioblastoma. this is an old review, it's very old, from 2007, and it really needs an update now, but it shows here a point that i want to make. glioblastoma
is clinically defined as two types, primary or [unintelligible] glioblastoma versus secondary, which progresses from an antecedent low grade glioma. over 5 to 10 years it becomes the same, similarly aggressive glioblastoma multiforme. the classical signature gene associated with low grade gliomas that progress to secondary gbm is pdgfra, and, what's not shown here, from the work of [unintelligible] group and then the molecular classification papers from tcga, we know that these secondary gbm, or the gbm that have idh1 mutations, are of the proneural sub type. so in other words, pdgfra is expected to be, and is, a signature gene of proneural sub type gbm.

with respect to notch signaling, which is what dll1 regulates, we can see that actually,
i don't show it here, heidi phillips's paper actually shows that notch is activated in both classical and proneural sub type gbm, and we can see that signature also in the tcga data. so we know that mir34a is lower in proneural sub type gbm and, because it's low, pdgfra signaling is up-regulated in proneural gbm, which we do see in human tumors.

so we can see the human relevance, but there is another important point. how do we know that this relationship we can see is [unintelligible] through some artificial assay, where we do reporter assays to show they truly interact? we do artificial overexpression or knockdown studies to show that one can regulate the other. i would say all of these are artificial. how do we know that these things really happen in vivo? well, it's not that easy to go in
vivo to test this, because not only can we not do it in cell lines, and cell lines in themselves are limited, for proneural gbm it's particularly problematic because there are no known proneural cell lines. and i think after many years of searching there's probably one cell line with an idh1 mutation that's out there. so the majority of cell model systems are not proneural. and that's where the genetically engineered mouse model becomes useful.

the [unintelligible] lab published two years ago a p53/pten genetically engineered model that leads to spontaneous high grade glioma. i'm not going to go into the details, but we took these mouse tumors and profiled them and asked, well, p53 and pten alterations are seen in human proneural sub type gbm based on tcga data, and when we take the mouse tumors and profile
them and ask what type of tumor they are, you can see that they are significantly enriched for proneural signature genes. in other words, the p53/pten model is a proneural model of gbm. and consistent with that, mir34a expression is very low in these tumors, and pdgfra is overexpressed. and in fact, this is an ihc from the paper showing that in the tumor part of the brain you see high level activation of pdgfra, which you don't see in normal brain. and we can show in the system that mir34a does regulate pdgfra. and the same thing with notch signaling: we can use the gem model, the p53/pten cells, and show that when mir34a is knocked down and they form tumors, those tumors have high level activation of notch signaling.

so what that means is that by using the model
system and the in vitro system, based on in silico predictions, starting with the sort of multidimensional data from tcga and using network modeling, we formulated a hypothesis. that is, mir34a defines a subset of tumors, and this subset of tumors looks proneural and they have concurrent activation of both notch and pdgfra. and we can show in the in vivo model that simulates human proneural gbm that this does happen in a real tumor.

so that gives us a framework to understand how mir34a may be contributing to the molecular signature of the proneural sub type, and importantly it gives us a hypothesis that we can test. a hypothesis that says, perhaps mir34a defines a sub type of glioblastoma that's sensitive
to combined inactivation of notch and pdgfra, and there are actually drugs targeting both pathways in the clinic.

so this is an example of starting with a biological question, leveraging tcga data and really using high-level computational mining to develop a framework to test the hypothesis. but the end is understanding how the event contributes to the tumor, to lead to a hypothesis that is translatable. now this remains to be tested, but it is a testable hypothesis and it may lead to translational impact.

so what i hope to show with that one example is that we can do cancer genomics. i think the technology is here. the capability is here. but let's not underestimate the challenge
of analysis, not just bioinformatic analysis but higher-level computational analysis that formulates hypotheses and provides a framework for experimental testing and understanding the mechanism, and ultimately it's that understanding that leads to actionable information that we can translate. now, translation is not easy, but we need to also focus on doing this part.

in the last minute, i want to come back to this little box here to say something really quickly. yes, we can get samples. we can profile and sequence the tumor and generate cancer genomic data, but what type of sample we sequence, how we collect the sample, and what information we know about the sample is also an important aspect. it's not easy
to do, but we need to think about it. i'll give you one example. right now, a lot of the community's effort really focuses on identifying new targets and new biomarkers. but we have to remember that there's another really important role that genomics can play, which is improving the way we manage patients diagnosed at an early stage. why is this important? because that's our majority. the majority of our patients are diagnosed at low stage. they are currently treated by surgery and triaged based on pathology and clinical staging, and much of our cutting-edge effort focuses on the tip of the iceberg. now, that's not unimportant, but we shouldn't forget this, and the reason is we need to do a better job. i think this was supposed to be animated, i forgot, because what we know
clinically is that pathological and clinical staging cannot identify all the patients that can be cured by surgery alone. ten to 15 percent have inherently poor prognosis and they need additional therapy [unintelligible], but we have no way of identifying those patients. what we need now is molecular characteristics to identify them for us, such as, i'm going to skip this part, such as a paper that is in the same issue this week in "nature" identifying prognostic markers. they took patients diagnosed with prostate cancer and identified their risk of recurrence so that we can enlist them more appropriately into therapy or not. that has tremendous health care, economic as well as quality of care impact.
so how does genomics help here? i think we need to think about the evolution of cancer genomes differently. i think a lot of data now show that it doesn't happen as a serial [unintelligible] event where you pick up one event and you become an early stage cancer, you pick up another mutation and now you've become more aggressive, and so on. in fact, we know from some genomic studies and some mechanistic studies that at the transition point from benign to malignant, these tumors already have numerous alterations in their genomes. and it depends on the hand that they are dealt. they are either inherently very aggressive, or they're inherently benign. that's not to say they couldn't acquire more events, but they're predestined at the beginning to behave a certain way, which means if we can get at this early stage cancer and understand
what the genome is and identify the genomic events that predict poor outcome or aggressive behavior, we can identify the high risk patients among the early diagnoses and provide them with appropriate [unintelligible] therapy, and spare the ones that can be cured without toxic downstream therapy.

now one other point that i was going to pick up, and i do want to mention that, as dr. lifton mentioned, in order to do this, we can sequence many human patients' early stage cancers and find what's different between aggressive and benign. but these are early stage tumors. they are [unintelligible] more than the late stage and you have to have very long follow-up to know whether they have good or poor prognosis. this is where the extreme
case becomes really helpful, and we can get to the extreme case using genetically engineered models. we engineer them to have black and white outcomes, and that can serve as a starting point, leveraging evolutionary conservation as another way to get us to a shorter list of candidates, which we can then take into functional studies and identify the ones that can truly drive metastasis, and it turns out these are also oncogenic by themselves, so they're really true therapeutic targets as well.

and importantly, we have shown, at least in our study, that these are, not as expected, not just prognostic in a particular lineage; they could be prognostic across lineages. this is an example of the metastasis genes that we identified in early stage melanoma that can be prognostic
in melanoma, but it turns out they can also be prognostic in three different cohorts of breast cancer, suggesting there is some fundamental process that early stage cancer cells have. if they have those genetic events, they are wired to behave more aggressively whether they are in the skin or the breast microenvironment.

so at the end i want to say: let's not forget that cancer genomics can have an impact here, to identify prognostic molecular-based markers that complement our standard of care, which is pathologic and clinical staging. and since these are deregulated in early stage cancer, they can serve as both prognostic biomarker and therapeutic target. and importantly, since we can identify the functionally active ones, through mechanistic studies we can
also predict what the right therapy is that the patient would most likely benefit from in an [unintelligible] setting.

so i'll end here by saying that i hope that we all believe, and hope to see, that cancer genomics will impact and lead us to genomic medicine or personalized medicine, but on a different level, not just in therapeutic targets but ultimately in early prevention, early detection and management of early stage patients.

i just want to say one more thing in terms of acknowledgement. the microrna mir34a study was started by johnny, from isla, from jim collins's lab, aided by sachet, who's our computational biologist, and also our team at the broad who really work on the analysis pipeline. thank
you. [applause]