nicolas stransky:
-- to find new drivers in cancer, and in particular kinase drivers in cancer because we’re most interested in kinases. so, as we’ve -- okay -- as we’ve discussed several times already in this meeting, kinase fusions, or fusions in general, are a result of genomic instability in cancer. and we know of many examples of strong drivers in cancer -- for example, bcr-abl or eml4-alk -- that are -- that are frequent in cml and lung adenocarcinoma, respectively. and these are very strong drivers because the clinical trials and approved therapies associated with these alterations are very successful therapies to treat these cancers. and rna-seq data produced by the tcga is a very powerful data set to discover new kinase
fusions, new fusions in general. and to date, there are more than 10,000 samples that have been sequenced in tcga across 33 tumor types. so, we sought to find new kinase fusions across this data set. and initially, i used some of the public -- publicly available algorithms at the time, in 2013. but for many of them, either the time to run those algorithms was way too long, above half a day or a day on eight cores per sample -- so, that was really too much to run on the entire data set -- or they didn’t have enough -- they were not sensitive enough to discover all the -- all the fusions, and they were missing a lot of the known fusions. so, we sought to develop a new algorithm that would be both very sensitive and also very fast to be able
to run across the entire data set. and so, we ran it on this -- on this data set. there are a few of these cancer types that are still under embargo from the tcga, so i’m not going to present any results about those. but i will have the -- what i’m presenting is the data on all the remaining samples. so, rapidly what the algorithm is doing is really what all the kinase fusion -- or all the fusion detection algorithms are doing. first, there is an alignment to the genome using our knowledge of the transcriptome. and this is done using the star aligner, which is a very fast aligner. so, this is done -- and so, conveniently, what the star aligner does is separating the aligned reads into two different [unintelligible], one containing all the reads -- paired reads that align perfectly to the
genome, and another [unintelligible] that only contains the chimeric reads that could be supportive of a fusion. and then, what the fusion detection algorithm does is go through this smaller [unintelligible] only containing reads that are -- that are -- that are not aligning properly to the genome and find if there are any pairs that could be supportive of a fusion. so, in a sense, it really -- it can discover any single fusion that is present in the data, and really the [unintelligible] a very powerful data set to discover fusions because, in the -- there is -- there are really a lot of reads in the data that support these fusions. and so, in the final step of the -- of the pipeline, there are some false -- there were
some false positive detection that reduces the number of false positive and passenger events that can be seen in the data. and i’m going to describe how that can be done. so, i want to spend a minute, really, here because these post-processing tests that are applied to the results are very important in order to discover what could be real driver events in those tumors, as opposed to passenger or false positive events. so, as i said, there’s -- first step in the pipeline is the fusion detection step. that is -- that -- where all the supporting reads for all the fusions are assembled and counted. and then, in the post-processing step, there are heuristics to find, first, the passenger events. and these are real fusions that exist in the data but that don’t have
the properties of real fusions. for example, they don’t have an exon-exon junction, or their coding sequence is not in frame, or they’re cutting through protein domains making the proteins unstable, or finally in the case of kinase fusions, they do not -- they would not contain a kinase domain. and i want to stress the fact that there are really many of these kinase fusions that exist in the data that do not contain a kinase domain. there are many examples of such recurrent pseudo-fusions that are a result of a copy number amplification. such examples are [unintelligible] in breast cancer or -- other kinases that are known to be amplified do have some fusions in rna-seq, but either they do not contain a kinase domain, or they result in unstable
proteins. so, it is really important to check whether the putative -- the translated sequence could be supportive of a -- of a real protein activating event. in a -- in another step, there are also heuristics to filter out false positives. and this is done using a large data set of normal samples, both from the tcga -- there are about 600 rna-seq normal samples -- and also from the gtex data -- there are 3,800 samples that have been sequenced by the gtex project. and so, the union of both data sets -- so, i’ve run the same pipeline -- fusion detection pipeline on the -- on the union of these two data sets to discover what could be false positive events and then filtered out anything that will appear at a certain frequency -- above a certain frequency in those -- in this data
set. so, in the end -- in the end, there is a list of kinase fusions that we think could be -- could be functional. i also want to mention one reason for some false positives, which is the very high expression of one of the partner genes in the fusion. and that results in very nonspecific fusion events between two genes that occur without a clear breakpoint. so, that is also a very frequent reason for which some genes might appear as fused with others. so, the output of the pipeline is presented here across the tumor types that -- across several tumor types. and what i want to show is that there’s -- as expected, there is a very -- the fusion -- the frequency of kinase fusions, or recurrent kinase fusions, varies
greatly from one cancer to another. thyroid cancer is -- as it -- as it is known, is -- contains more than 13 percent of such events. and on average, across cancers -- across solid tumors, there is about 2 percent of recurrent kinase fusions. this is a plot that presents the thresholds that have been applied in terms of the number of chimeric reads and split reads to filter for fusions. and there are two things that i want to say here. one is, the plot on the left presents the recurrent kinase fusions that have been discovered by the pipeline. and the color code separates those as -- with known fusion -- known kinase fusions, and in blue, the novel kinase fusions that we’ve discovered. and there’s -- there really is no clear bias in terms of the number of
reads. so, there -- these novel events that we’ve discovered -- they don’t have less genomic evidence. another thing i want to show is that the passenger and the single [unintelligible] -- so, the nonrecurrent kinase fusions that are present in the data -- they tend to have lower numbers of reads. so, really, applying these thresholds is necessary to filter out a lot of the false positives. but at the same time, we don’t want to miss -- we don’t want to miss novel events. so, this is the result of the entire pipeline presented in this matrix, where the genes are -- the recurrent kinase events are presented on the left across 26 tumor types at the bottom.
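The post-processing described above (read-support thresholds, checks for an exon-exon junction, an in-frame coding sequence and a retained kinase domain, and removal of events seen too often in normal samples) can be sketched roughly as follows. This is a minimal illustration and not the pipeline from the talk: the `FusionCall` fields, the function names, and every threshold value are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FusionCall:
    """One candidate fusion in one sample (hypothetical record layout)."""
    sample: str
    gene5: str                    # 5' partner gene
    gene3: str                    # 3' partner gene
    chimeric_pairs: int           # read pairs whose mates map to the two genes
    split_reads: int              # reads that span the junction itself
    exon_exon_junction: bool      # breakpoint falls on annotated exon boundaries
    in_frame: bool                # coding sequence of the 3' partner stays in frame
    kinase_domain_retained: bool  # predicted protein keeps the kinase domain


# Assumed thresholds; the talk does not state the actual values.
MIN_CHIMERIC_PAIRS = 2
MIN_SPLIT_READS = 1
MAX_NORMAL_FREQUENCY = 0.01


def normal_pair_frequency(normal_calls, n_normals):
    """Fraction of normal samples in which each gene pair was called."""
    samples_by_pair = defaultdict(set)
    for call in normal_calls:
        samples_by_pair[(call.gene5, call.gene3)].add(call.sample)
    return {pair: len(s) / n_normals for pair, s in samples_by_pair.items()}


def passes_filters(call, normal_freq):
    """True if a call survives the read-support, passenger and normal filters."""
    enough_reads = (call.chimeric_pairs >= MIN_CHIMERIC_PAIRS
                    and call.split_reads >= MIN_SPLIT_READS)
    looks_functional = (call.exon_exon_junction
                        and call.in_frame
                        and call.kinase_domain_retained)
    pair = (call.gene5, call.gene3)
    rare_in_normals = normal_freq.get(pair, 0.0) <= MAX_NORMAL_FREQUENCY
    return enough_reads and looks_functional and rare_in_normals
```

Running the same detection step on the normal samples and passing its calls to `normal_pair_frequency` gives the lookup that `passes_filters` uses to drop gene pairs that recur in normal tissue.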
and there are four -- roughly -- so, there are four types of fusions that i want to describe. first, the pipeline was able to recapitulate all the known fusions that were -- that had been described in all of these cancers, and many of which have been already described today, such as the raf1 fusions or fgfr fusions. so, we do find all of these events. and all the fusions that had been described in the -- in the tcga papers and others have been recapitulated by the -- by the pipeline, with a few notable events. for example, the prkaca fusions that have been discovered last year in fibrolamellar hcc -- there are six examples of those in the tcga data, two of which are annotated as fibrolamellar hepatocellular carcinoma. two are annotated as hcc, and two do not have clinical annotations in the -- in
the tcga. other -- so, there are many events that we were able to recapitulate this way. second, there are novel -- there are fusions that were involving known kinases but for which we found new partners. and that is really a recurring theme for many of the fusion-finding efforts. there really are a lot of gene partners that tend to be fused with kinases and other genes in general. so, that is really a message for diagnostic efforts to find those fusions because they cannot be specific for just one driver gene and its partner. they need to be really agnostic of the partner gene. third, and more importantly, there are some novel indications where we found some of the fusions that were known before. for example, the ret fusions could be found in colorectal
adenocarcinoma or breast carcinoma. another example is the -- in the fgfr3 fusions that could be found in prostate cancer in a -- in one sample. so, really, there are several examples of these fusions at low frequency in other cancers. and finally, and most importantly, there are some novel examples of kinase fusions with genes that were not known to be -- to be involved in fusions before. first, met and pik3ca, which are obviously known oncogenes, are -- and that were not known to be involved in fusions before -- we found several fusions for these two genes, six fusions in met and four in pik3ca. but there are also some other kinase fusions that are -- that we believe are drivers in those samples that involve kinases that were not known to be associated with
cancer before. and an example is fgr, for which i’m showing here four samples that harbor fusions of fgr. there is also a fifth one in a -- in a cancer type that has not been released by tcga yet. so, quickly, i want to go through a few of the examples. in ret, we were able to describe some novel indications. these are plots that describe the putative sequence of the -- of the protein -- the predicted protein. these -- so, these are known partner genes, ccdc6 and erc1, in, respectively, colorectal adenocarcinoma and breast cancer. they contain the coiled-coil domain that causes ret to dimerize, you know, to activate. we found other -- these are examples of novel partners for ret fusions. they contain the kinase domain. they contained also -- the
partner gene contributes [unintelligible] domains that create -- that cause ret to dimerize [unintelligible] activate. so, these are all in thyroid carcinoma. these are examples of the filters that we apply to verify that these fusions are functional. i also want to describe an example of met, the -- sorry, met and pik3ca fusions, starting with pik3ca actually. these fusions are interesting because they are really 5' utr fusions in some examples. tbl1xr1 is contributing only the 5' utr exon, and it is fused with the first -- the exon 2, which is also 5' utr in pik3ca. so, really, the entire coding sequence of pik3ca is expressed in those fusions. and it has the effect of over-expressing pik3ca, as is shown in these three plots. here shown in breast cancer, where pik3ca is known, of
course, to be mutated and amplified -- this would be a third way of activating pik3ca by fusions, where we see that the two samples in red and also with the arrows harboring the fusions are among the highest expressers of pik3ca. and the same is true for the two other tumor types. with met, the two -- so, we found two examples in kidney papillary carcinoma, where met is known to be a driver, highly mutated. these examples of met fusions contain a [unintelligible] domain that is causing met to dimerize and -- or to -- or to activate. there are also four other cancer types in which we found met fusions, but this one is particularly interesting because met was already known to be a driver there.
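One way to quantify the observation that fusion-positive samples are among the highest expressers of a gene is a percentile rank within the tumor type. The sketch below assumes a hypothetical mapping from sample ID to an expression value for one gene; the function name and the data layout are illustrative only and are not taken from the talk.

```python
def expression_percentile(expression, sample):
    """Percentage of the other samples whose expression is strictly lower.

    `expression` maps sample ID -> expression value of one gene within a
    single tumor type (hypothetical input format).
    """
    value = expression[sample]
    others = [v for s, v in expression.items() if s != sample]
    if not others:
        raise ValueError("need at least one other sample to rank against")
    below = sum(v < value for v in others)
    return 100.0 * below / len(others)
```

A fusion-positive sample with a percentile close to 100, and no copy number gain at the locus, would be consistent with the fusion itself driving the over-expression.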
this is additional evidence that the pik3ca fusions are real. they’re -- the four samples are shown in the -- on the left. these are only the reads that are involved in the -- in the fusion, so only chimeric and split reads involved in the pik3ca fusions, showing here the utr -- 5' utr of pik3ca. so, these are real promoter fusions causing over-expression of pik3ca. and lastly, i want to show an example of this novel fusion that we discovered, between wasf2 and fgr. so, this is also an example of a 5' utr fusion, akin to the pik3ca fusions that i just described. wasf2 -- sorry, fgr is a src family kinase. it is known to be highly expressed in some hematopoietic cells and some cancers as well. it had not been
involved with -- and had not been previously implicated in cancer, so we don’t really know its role or anything in cancer. however, we know that it is a viral oncogene homolog, so it could potentially be oncogenic. the five fusions that we found in the tcga -- [unintelligible] across five different tumor types are all in the -- in the form of wasf2 fused with fgr. so, this is a strong argument that this is a very recurrent event, and it could be oncogenic. again, as i’ve shown with pik3ca, the same is true with fgr. the samples in which the fusions are present over-express fgr at almost the highest level without copy number amplification. this is the fifth tumor type that i was talking about. and interestingly, we have also found
this fusion in a cell line and also, because we were talking about the clinical implications of some of the tcga work earlier, we have been -- we are collaborating with a large hospital that has included fgr in their -- in their clinical sequencing panel. and these fusions have also been found in patients. so, we have -- we’re hopeful that this could be discovered in additional tumor types, maybe at higher frequencies than the frequencies that i’ve shown here. and finally, because i’m over time, ntrk1/2/3 fusions are recurrent across several cancers. this is the same theme as fgfr fusions that are recurrent across cancers. so, this re-underlines the necessity of looking for these fusions in more than just specific tumor types for
diagnostic purposes. so, finally, i want to summarize my -- this talk in two -- with two parts. one, these -- there were updates to the study we published last year that i presented today. for -- first, there were additional cancer types that were analyzed and presented here. the one thing that i didn’t talk about, the fgfr2 fusions, are found in cholangiocarcinoma at a -- at a frequency of about 10 percent. cholangiocarcinoma is one of these six new cancer types presented here. jak1 fusions are also novel. they were not described before. and there are two that we found in two different cancer types between alg14 and jak1. there are two novel fgr fusions compared to the study we had published. and also, the prkaca fusions in liver cancer are
a very important feature in this -- in this analysis because there are six of these fusions in the -- in the liver cancer data set. and finally, some broader implications of this work -- this was the first pan-cancer fusion analysis. we focused on kinase events because these are strong drivers of the disease. and we believe that this could have profound -- the discovery of novel fusions could have profound implications for diagnosis, treatment, and drug discovery. finally, i want to thank the cancer genome atlas because none of this would have been possible without this data being publicly available. and also, i want to thank my other colleagues at blueprint, starting with christoph lengauer, our cso, and other colleagues that
have participated in this work. thank you.

[applause]

male speaker:
nico, great talk, and really a great illustration of how the tcga data can be used beyond the network and in -- and in drug discovery. so, thank you. and i was really interested by this fgr fusion because one of the things that’s been really mysterious to me, you know, for the last 20 years or so doing cancer genomics, is -- where are the activating src family kinase alterations in the human cancer genome? and this could be, to my knowledge, the first one. and so, what i’d like to know next is, what’s the evidence that these fgr fusions are functionally important? and
if there is evidence, are you at liberty to share any of it?

nicolas stransky:
so, what i can say is that we’re working on it. [laughs] the second thing i want to say is that actually some papers coming out of your lab show that fgr was one of the genes that, when over-expressed, could induce resistance to some therapies.

male speaker:
absolutely, yeah. resistance to [unintelligible] therapy.

nicolas stransky:
right.

male speaker:
but it’s not -- it’s -- and if i recall
correctly, it’s in a region of amplification in some egfr mutant lung cancers as well.

nicolas stransky:
right. and so, this is -- i think these cancer types are not the ones where we’re going to find fgr fusions at a high frequency. there might be a rare cancer type, just like [unintelligible], where fgr is a main driver event. there are no activating mutations that have been described in the tcga. there are really very few events. yes, it’s -- it is an interesting event.

male speaker:
actually, my question is related to this and goes along with the idea of trying to infer which of these fusions are really driving and having a major impact for the tumors harboring them. have you tried to look at the local
amplification or status of the fusion genes -- because one of the points we always find is that those that seem to have [unintelligible] biological dominant function -- clearly, those fusions don’t show these broad amplification events. they show extremely micro-focal amplification events that typically involve one or sometimes both of the regions implicated in the fusion genes. and so, have you tried to go on snp array -- you know, high-density snp array, and see whether you can identify very focal amplification events upon the [unintelligible] genes?

nicolas stransky:
right. so, for the known fusions, i haven’t done this work. but for the novel ones, i’ve been looking at them particularly closely
to know if there were some amplifications that could -- that could maybe cause these to be artifacts of -- that could maybe cause the fusions to be artifacts in the amplification. and it is not the case. there are no amplifications, or there are maybe broad amplifications around the gene but no focal amplifications. however, the snp arrays are not the -- definition of the snp arrays is not sufficient to really see these events at a very -- yeah, to see very small events. the other thing i can share is that there is one sample harboring fgr fusion that has whole-genome sequencing. and i -- so, for one of these five, there is whole-genome sequencing. and i was able to see also the fusion in that sample, which seems to be just a chromosomal
break without amplification.

male speaker:
i mean, sometimes even those [unintelligible] if you go to the single probe level, you can actually find those type of things. but it’s true, if you do a standard segmentation, it can be very [inaudible].

female speaker:
okay. thanks, everyone. and enjoy your lunch, and please come back at 2:25. thank you. thank you for the nice talk.

[end of transcript]