Alterations in oncogenes or tumor suppressor genes are the driving forces of carcinogenesis. An oncogene is a gene that causes cancer through an activating mutation or expression at high levels, whereas a tumor suppressor gene leads to cancer through loss or reduction of function. Research in cancer biology has identified hundreds of genes involved in different stages of tumorigenesis [7, 17]. The alterations in these oncogenes or tumor suppressor genes can come from a variety of sources, such as single nucleotide polymorphisms (SNPs), copy number variations (CNVs), chromosomal regions, viral integration, and gene fusions. Another type of event, the passenger mutation, also commonly occurs in tumor tissues. However, passenger mutations have no effect on tumor growth and usually hitchhike on the alteration of a nearby tumor driver gene. Distinguishing true tumor driver mutations from artefact events such as passenger mutations is an important research question for better elucidating tumor oncogenesis and evolution. As the names “oncogene” and “tumor suppressor gene” suggest, previous systematic searches for tumor driver genes have mostly adopted the paradigm that a positive association of up-regulation or gain of function with tumor proliferation and worse survival hints at a possible oncogene, whereas a negative association is expected for tumor suppressor genes. For example, Bric et al. conducted an RNA interference (RNAi) screen for tumor suppressors by selecting for small hairpin RNAs (shRNAs) capable of accelerating lymphomagenesis in a mouse model. Koso et al. mobilized the Sleeping Beauty transposon system in mice and profiled insertions that promoted medulloblastoma formation in the cerebellum. Wrzeszczynski et al. carried out a bioinformatics screen for candidate ovarian cancer oncogenes or tumor suppressors by first looking for genes with significant amplification or deletion across tumor samples. Regardless of their specific designs, most such screening studies share one common feature: they assume a monotone (either positive or negative) relationship between the end-point outcome and the genes of interest.
However, there remains the possibility that a true driver gene could exhibit a non-linear association with end-point observations. That is to say, both its up-regulation and its down-regulation can lead to aggressive tumor growth or metastasis, or vice versa. With a slight abuse of terms, “regulation” here includes any type of copy number variation, mutation, or RNA expression level change. Recently, Shen et al. used database searching and text mining to explore the existence of such genes, which can potentially perform both oncogenic and tumor suppressive functions. They identified 83 genes that have dual functional annotation according to the literature. Most of these genes are transcription factors that can both positively and negatively regulate transcription, which serves as the basis for their potential dual role in cancer development. These genes usually carry genomic mutation patterns similar to those of oncogenes, and expression patterns resembling those of tumor suppressor genes. TP53 is one example: its tumor suppressive effect, exerted by activating DNA repair proteins, arresting the cell cycle and initiating apoptosis, is well known. On the other hand, more than 80% of the somatic and germline TP53 alterations found are missense mutations rather than the nonsense or frame-shift mutations that usually lead to loss of function. The strong selection to maintain expression of the full-length p53 mutant protein, together with its accumulation in the nucleus, implies a gain-of-function, oncogenic mutation. An in vivo knock-in experiment has shown that many mutant p53 variants are essential for neoplastic transformation. Another close example is Notch, which is an oncogene in cancer types such as T cell acute lymphoblastic leukemia (ALL), and a tumor suppressor gene in other types such as B cell ALL. A more concrete example is c-Myc, whose dual role in leukemia was described by Uribesalgo et al. They showed that the c-Myc/RARα complex could function either as an activator or as a repressor depending on the c-Myc phosphorylation status.
Although, to the best of our knowledge, there is at present no solid evidence of a gene that exerts both oncogenic and tumor suppressive effects in a single cell line, the possibility cannot be ruled out. Such genes may be overlooked by traditional approaches, which assume a linear association. Even a gene that is not truly bifunctional but bears both a true function and a passenger event (e.g. a tumor suppressor gene coincidentally amplified with a nearby oncogene) can easily confound the analysis and fail to be discovered as a hit. Therefore, it is important and worthwhile to explore whether a non-linear association between genomic features and end-point outcomes exists and, if it does, how abundant it is and how it arises. As far as we know, no study has yet addressed these questions.
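The point that a linear screen can overlook such genes can be illustrated with a small numerical sketch. The data below are entirely hypothetical (a symmetric grid of expression values and a risk score that rises at both extremes); they are not taken from any cohort in this study, and the helper name `pearson` is an illustrative choice.

```python
import numpy as np

# Hypothetical toy data: expression expressed as a deviation from the
# normal-tissue level (symmetric around 0), and a risk score that is
# highest at BOTH extremes (a U-shaped relationship).
expression = np.linspace(-2.0, 2.0, 41)
risk = expression ** 2

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# A linear (monotone) view of the data: the positive and negative
# halves cancel, so the correlation is essentially zero.
linear_r = pearson(expression, risk)

# Correlating risk with the absolute deviation from normal instead
# recovers the association that the linear view misses.
deviation_r = pearson(np.abs(expression), risk)

print(f"linear r = {linear_r:.3f}, |deviation| r = {deviation_r:.3f}")
```

In this toy setting the gene would not be flagged by a screen that ranks genes on a linear association, even though its expression fully determines the risk score.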
In this study, we carried out a large-scale bioinformatics screen to search for genes that have a tri-modal association with end-point observations. First, we divided patients or cell lines into “lower than normal” (“low”), “similar to normal” (“middle”) and “higher than normal” (“high”) groups based on the expression level of each investigated gene in tumor samples relative to normal samples. To do this, we devised an algorithm based on Expectation-Maximization (EM) that takes into consideration the expression levels of both normal and tumor samples for each gene. We then focused on a specific scenario: candidate targets for which both the “low” and the “high” patient groups were associated with worse survival and higher tumor grade than the “middle” group. We termed this a “tri-modal” association.
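The grouping step can be sketched as follows. The details of the EM-based algorithm are not given in this section, so this is only a simplified illustration under stated assumptions, not the method used in the study: a three-component Gaussian mixture is fitted to the tumor samples, with the “middle” component held fixed at the mean and standard deviation of the normal samples, and only the “low” and “high” components re-estimated. The function names, initial values and toy data are all hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, broadcast over arrays."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def em_three_groups(normal, tumor, n_iter=200, tol=1e-8):
    """Assign tumor samples to 'low', 'middle' or 'high' for one gene.

    Assumption (not from the source): the 'middle' component is anchored
    at the normal-sample mean and SD; 'low' and 'high' are fitted by EM.
    """
    normal = np.asarray(normal, dtype=float)
    tumor = np.asarray(tumor, dtype=float)
    mu0, sd0 = normal.mean(), normal.std(ddof=1)

    # Component order: low, middle, high.
    means = np.array([mu0 - 2.0 * sd0, mu0, mu0 + 2.0 * sd0])
    sds = np.array([sd0, sd0, sd0])
    weights = np.full(3, 1.0 / 3.0)
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: responsibility of each component for each tumor sample.
        dens = weights * gaussian_pdf(tumor[:, None], means, sds)
        total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        resp = dens / total

        # M-step: update mixing weights for all components, but means and
        # SDs only for 'low' (0) and 'high' (2); 'middle' stays fixed.
        nk = resp.sum(axis=0)
        weights = nk / nk.sum()
        for k in (0, 2):
            if nk[k] > 1e-8:
                means[k] = (resp[:, k] * tumor).sum() / nk[k]
                var = (resp[:, k] * (tumor - means[k]) ** 2).sum() / nk[k]
                sds[k] = max(np.sqrt(var), 1e-3 * sd0)
        # Keep 'low' at or below, and 'high' at or above, the normal mean.
        means[0] = min(means[0], mu0)
        means[2] = max(means[2], mu0)

        # Stop when the log-likelihood from this E-step stops improving.
        ll = float(np.log(total).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    # Labels come from the responsibilities of the last E-step.
    labels = np.array(["low", "middle", "high"])[resp.argmax(axis=1)]
    return labels, means, sds, weights

# Hypothetical toy data: 200 normal samples and 200 tumor samples made of
# an under-expressed, a normal-like and an over-expressed subpopulation.
rng = np.random.default_rng(0)
normal_expr = rng.normal(0.0, 1.0, 200)
tumor_expr = np.concatenate([
    rng.normal(-5.0, 0.5, 50),
    rng.normal(0.0, 1.0, 100),
    rng.normal(5.0, 0.5, 50),
])
labels, means, sds, weights = em_three_groups(normal_expr, tumor_expr)
print({g: int((labels == g).sum()) for g in ("low", "middle", "high")})
```

Anchoring the middle component on the normal samples is one way to make “similar to normal” mean the same thing for every gene; a tri-modal association would then be assessed by comparing survival and tumor grade in the “low” and “high” groups against the “middle” group.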
This study focuses mainly on breast cancer, the most common type of invasive cancer in women. Breast tumors can be graded with the Nottingham Histologic Score system. In this system, a breast tumor is assigned a grade of 1, 2 or 3, where grade 3 carries the poorest prognosis. A number of tumor driver genes have previously been identified in breast cancer. For example, ERBB2, ESR1 and c-Myc are breast tumor oncogenes; p53, p27, Skp2, BRCA-1 and BRCA-2 are breast tumor suppressors [20, 32]. According to the PAM50 assay, breast cancer can be divided into five subtypes: luminal A, luminal B, HER2-enriched, basal-like, and normal-like. The basal-like subtype largely overlaps with triple-negative breast cancer, which lacks or shows low levels of ESR1 and PGR expression and lacks ERBB2 amplification. Estrogen receptor (ER)-negative breast cancer, which generally includes the basal and HER2 subtypes, is characterized by aggressive clinical behavior and resistance to hormone deprivation therapy. In our study, we replicated our analysis across an array of breast tumor patient cohorts: (1) the Metabric study, where a total of ~2000 patients are available and divided into a discovery set and a validation set; (2) the Cancer Genome Atlas (TCGA) breast cancer study, where ~1000 patients are available; (3) the GSE18229 study, where 337 breast cancer patients are available; (4) the GSE20624 study, where 344 breast cancer patients are available; (5) the GSE20685 study, where 327 breast cancer patients are available; and (6) the GSE22133 study [12, 13], where 359 breast cancer patients are available.