Abstract
Pancreatic cancer (PC) is characterized by high mortality and poor prognosis, owing to difficulties in early diagnosis and limited therapeutic options. Although there is an intervention window before preneoplastic lesions progress to invasive PC, accurate prognosis remains difficult with current biomarkers and imaging techniques. Biomarkers with strong prognostic value and convenient analysis methods are urgently required. This study therefore aimed to identify a potential prognostic biomarker for pancreatic cancer through an in-silico analysis of the Human Protein Atlas. Filtering yielded TPX2, MSLN and S100A2 as the top hits, and TPX2 was ultimately selected based on its prognostic p-value, survival analysis and gene/protein expression analysis.
Introduction
Pancreatic cancer (PC), the fourth leading cause of cancer death in Europe and the USA, accounts for 227,000 deaths per year worldwide. Without improvements in curative therapy, PC is anticipated to become the second leading cause of cancer death by 2030.1 PC most commonly occurs between 70 and 80 years of age, and the risk increases with age. Despite developments in the detection and management of PC, only about 4% of patients survive 5 years after diagnosis.2
Risk factors for PC include smoking3, male sex, family history of chronic pancreatitis and pancreatic cancer2, diabetes mellitus, non-O blood group4,5, occupational exposures6, African-American ethnicity, advancing age, a high-fat diet, diets high in meat and low in vegetables and folate, obesity, Helicobacter pylori infection and periodontal disease.2
Genomic instability results in highly heterogeneous, chemotherapy-resistant PC tumours. Invasive PC harbours KRAS activation, inactivation of tumour-suppressor genes including CDKN2A, TP53, SMAD4,1 and BRCA2,7 chromosomal losses, gene amplifications,8 and telomere shortening.9
Additionally, epigenetic dysregulation through DNA methylation, histone modifications and non-coding RNAs alters PC gene function. Examples include promoter methylation and silencing of CDKN2A,10 MLH1, CDH1, CDKN1C, RELN, SPARC and TFPI2;11–15 and promoter hypomethylation of SFN, MSLN and S100A4,16 and of mucin genes. Overexpression of miR-21, miR-34, miR-155 and miR-200 has also been linked to neoplastic progression of PC.17–20
The high fatality of PC and the advanced stage at which patients typically present have led to clinical trials assessing the best screening method for individuals with a family history of PC and pancreatic neoplasia.21,22 Unfortunately, none has proven sufficiently specific for prognosis.23–25 Most people who present with symptoms attributable to PC already have advanced disease.26
Pancreaticoduodenectomy is the primary curative treatment. Other treatments include adjuvant and neoadjuvant therapy for patients undergoing curative resection,27–30 chemoradiotherapy for localized resectable cancer,31–33 and gemcitabine-based combination treatments for advanced PC.34–36
Despite the increasing number of candidate prognostic biomarkers for PC, no molecular signature has been validated for routine clinical use. Robust performance in larger validation studies is therefore urgently needed before clinical application.
Furthermore, chromosomal gains of 20q are prominent in adenocarcinomas and PC.37 TPX2 is a candidate oncogene showing copy number–driven overexpression in non–small-cell lung cancer and PC.38,39 However, the frequency and level of its overexpression in PC have not been reported. TPX2, a microtubule-associated protein acting downstream of Ran-GTP, has a role in mitotic spindle formation and chromosome segregation. During interphase, TPX2 is localized to the nucleus through interaction with importin α/β.40 During mitosis, TPX2 binds Aurora A kinase, targeting Aurora A to the mitotic spindle microtubules.41
Thus, the aim of this study was to bioinformatically mine the Human Protein Atlas (THPA) to identify potential prognostic markers for pancreatic cancer.
Materials and methods
To conduct an in-silico analysis of THPA, the database was first queried for prognostic genes for PC. The hits were then filtered for those with protein data related to overall patient survival in the Pathology Atlas (PA). This filter was applied so that protein expression could be compared between normal and disease samples. The PA filter listed proteins with information on mRNA expression, clinical outcome and prognostic association in major cancer types, protein expression across major cancer types, and Kaplan-Meier survival plots.
The resulting hits were further filtered by tissue (IHC) reliability score, retaining genes scored as "enhanced" or "supported". An enhanced score requires staining with one or more antibodies with non-overlapping epitopes targeting the same gene, validated by orthogonal or independent-antibody methods. A supported score, in contrast, requires staining that is consistent with RNA-sequencing and protein/gene characterization data.
Next, the hits were restricted to unfavourable prognostic genes, i.e. genes whose high expression is associated with poor disease prognosis. The same procedure was applied to favourable prognostic genes.
The hits were then restricted to genes with data in the Cell Atlas (CA), which provides the subcellular localization of the gene product. Finally, the genes were limited to cancer-related genes, since the disease examined in this study is a cancer.
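The sequential filtering described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the procedure actually run on the THPA website: `GeneRecord` and its fields are hypothetical stand-ins for the annotations THPA displays, not column names from any THPA download, and the example genes are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneRecord:
    # Hypothetical per-gene summary of THPA annotations (not real THPA column names).
    gene: str
    pc_prognosis: Optional[str]   # "unfavourable", "favourable" or None
    has_pa_protein_data: bool     # protein/survival data in the Pathology Atlas
    ihc_reliability: str          # e.g. "enhanced", "supported", "approved", "uncertain"
    has_cell_atlas_data: bool     # subcellular localization available in the Cell Atlas
    cancer_related: bool          # listed among cancer-related genes


def filter_candidates(records, prognosis="unfavourable",
                      reliabilities=("enhanced", "supported")):
    """Apply the filters in order and record how many genes survive each one."""
    steps = [
        ("PC prognostic gene", lambda r: r.pc_prognosis == prognosis),
        ("protein data in PA", lambda r: r.has_pa_protein_data),
        ("IHC reliability", lambda r: r.ihc_reliability in reliabilities),
        ("data in Cell Atlas", lambda r: r.has_cell_atlas_data),
        ("cancer-related gene", lambda r: r.cancer_related),
    ]
    remaining = list(records)
    counts = []
    for name, keep in steps:
        remaining = [r for r in remaining if keep(r)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Toy records (hypothetical genes) to show the pipeline running.
toy = [
    GeneRecord("GENE_A", "unfavourable", True, "enhanced", True, True),
    GeneRecord("GENE_B", "unfavourable", True, "approved", True, True),
    GeneRecord("GENE_C", "favourable", True, "enhanced", True, True),
    GeneRecord("GENE_D", "unfavourable", True, "supported", False, True),
]
kept, counts = filter_candidates(toy)
print([r.gene for r in kept])  # ['GENE_A']
print(counts)
```

Passing `prognosis="favourable"` repeats the run for favourable prognostic genes, and `reliabilities=("enhanced",)` gives the stricter enhanced-only variant of the reliability step.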
The resultant hits were then scrutinized for function, protein, subcellular localization, isoforms, gene, RNA and protein expression in pancreatic cancer and normal tissue, correlation between RNA and protein expression, antibody quality and prognostic potential.
Results
To identify a potential biomarker for pancreatic cancer, THPA was mined and analysed using several different filters.
Filtering the PA for unfavourable prognostic PC genes yielded 669 hits. Next, filtering for genes with protein data in the PA yielded 589 hits. Refining these for an enhanced tissue (IHC) reliability score resulted in 194 hits. Restricting the hits to those with protein data in the CA led to 161 hits. Lastly, filtering for proteins encoded by cancer-related genes led to 38 genes.
The hits were scrutinized for their prognostic p-value, protein/gene/RNA expression, cellular localization and survival analysis.
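The attrition at each filtering step can be made explicit from the counts above. The short sketch below only re-expresses the reported hit numbers as percentages; it does not query THPA.

```python
# Hit counts reported in the Results after each successive THPA filter.
funnel = [
    ("unfavourable prognostic PC genes", 669),
    ("protein data in the PA", 589),
    ("enhanced IHC reliability", 194),
    ("protein data in the CA", 161),
    ("cancer-related genes", 38),
]


def retention(steps):
    """Return (step, hits, % kept from previous step, % of starting list)."""
    start = steps[0][1]
    prev = start
    rows = []
    for name, hits in steps:
        rows.append((name, hits,
                     round(100 * hits / prev, 1),
                     round(100 * hits / start, 1)))
        prev = hits
    return rows


for row in retention(funnel):
    print(row)
```

About 5.7% of the initial unfavourable hits survive all filters. The largest absolute drop is at the IHC reliability step (395 genes removed), while the final cancer-related-gene filter removes the largest fraction (about 76%).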
Figure 1. Some of the filtered genes, displayed alphabetically.
The survival scatter plots display the clinical status (dead or alive) of each patient in the cohort: patients alive at follow-up are shown in blue and patients who had died in red. The x-axis shows the gene expression (FPKM) of the tumour sample at the time of diagnosis, and the y-axis shows the follow-up time after diagnosis. The density plot above the scatter plot shows the distribution of expression levels among dead and alive patients, stratified by the cut-off indicated by the vertical dashed line. This cut-off is the FPKM value that minimizes the log-rank p-value for the survival difference between the high- and low-expression groups. The Kaplan-Meier plots summarize the association between mRNA expression level and patient survival.
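The cut-off selection described above can be sketched as follows. This is a minimal illustration, not THPA's code: it uses a standard two-group log-rank test and, as an assumption, restricts candidate cut-offs to observed values between the 20th and 80th percentiles so that neither group is very small. The function names and the toy cohort are hypothetical.

```python
import math


def logrank_p(times, events, in_group):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    stat = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df


def best_cutoff(expr, times, events, lo_q=0.2, hi_q=0.8):
    """Scan FPKM cut-offs and return (cut-off, p) with the lowest log-rank p-value.

    Assumption: candidates are limited to observed values between the
    lo_q and hi_q quantiles (simple index-based quantiles).
    """
    ranked = sorted(expr)
    lo = ranked[int(lo_q * (len(ranked) - 1))]
    hi = ranked[int(hi_q * (len(ranked) - 1))]
    best = None
    for cut in sorted({v for v in expr if lo <= v <= hi}):
        high = [v > cut for v in expr]  # True = expression above cut-off
        if all(high) or not any(high):
            continue
        p = logrank_p(times, events, high)
        if best is None or p < best[1]:
            best = (cut, p)
    return best


# Toy cohort (hypothetical): higher expression, shorter follow-up, all deaths observed.
expr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [100, 95, 90, 85, 80, 10, 9, 8, 7, 6]
events = [1] * 10
print(best_cutoff(expr, times, events))
```

Because the cut-off is chosen to minimize the p-value, the reported p-value is optimistic (a multiple-testing effect), which is one reason independent validation of any marker selected this way matters.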
Table 1. Summary of the top hits obtained from the filtering steps of the Human Protein Atlas.
Gene | Gene description | Protein class | Subcellular localisation | Prognostic p-value
TPX2 | TPX2, microtubule nucleation factor | Cancer-related genes; predicted intracellular proteins | Nucleoplasm; cytokinetic bridge; mitotic spindle; microtubule organizing centre | 2.68e-5
MSLN | Mesothelin | Cancer-related genes; disease related genes; predicted intracellular proteins; predicted membrane proteins; plasma proteins; predicted secreted proteins | Nucleoplasm; vesicles | 6.66e-4
S100A2 | S100 calcium binding protein A2 | Cancer-related genes; predicted intracellular proteins | Nucleus; nucleoli; plasma membrane; cytosol | 2.04e-4
TPX2
TPX2 was shortlisted because high expression of its protein was significantly associated with unfavourable outcomes in renal, liver, endometrial and pancreatic cancers.
In the Tissue Atlas (TA), the islets of Langerhans showed medium TPX2 protein expression, whereas no protein expression was detected in exocrine glandular cells. TPX2 RNA expression was enhanced in the testis and thymus but was absent in the 174 pancreas samples.
Further, the CA showed that the protein was localized to the nucleoplasm, microtubule organizing centre, cytokinetic bridges and mitotic spindle.
In the PA, RNA-sequencing data from The Cancer Genome Atlas (TCGA) showed TPX2 RNA expression in all tissues. Additionally, 11 of 12 samples showed high/medium protein expression. Distinct nuclear staining for this protein was observed only in a small subset of malignancies. However, no alternative antibody staining was available for this protein. In the survival scatter plot, median expression was 9.18 FPKM among patients who died and 6.2 FPKM among those alive. The average RNA level across 176 samples was 9.1 FPKM, and the cut-off was 9.13 FPKM. The survival analysis gave a p-value of 0.000027. The 5-year survival of patients with expression above the cut-off was 21%, compared with 34% for those below it. Moreover, this gene has only two transcripts.
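The 5-year survival figures above come from Kaplan-Meier estimates in the high- and low-expression groups. A minimal product-limit estimator is sketched below; the function names are hypothetical, follow-up times are assumed to be in years, and `events` marks deaths (1) versus patients censored alive at last follow-up (0).

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability) steps."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        n_at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / n_at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Read the step function at time t (e.g. t = 5 for 5-year survival)."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


# Toy example (hypothetical follow-up in years; 1 = died, 0 = censored alive).
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
print(survival_at(curve, 4))  # about 0.533
```

Applied separately to patients above and below the 9.13 FPKM cut-off, `survival_at(curve, 5)` would give each group's 5-year survival; THPA reports 21% and 34% for these groups in the TCGA data.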
MSLN
MSLN was shortlisted because high expression of its protein was significantly associated with unfavourable outcomes in renal and pancreatic cancers.
In the TA, the Islets of Langerhans and the exocrine glandular cells displayed no protein expression. Additionally, the RNA expression was enhanced in the fallopian tube and lung but it was almost negligible in the 174 pancreas samples.
Further, the CA showed that the protein was localized to vesicles and nucleoplasm.
In the PA, RNA-sequencing data from TCGA showed MSLN RNA expression in all tissues. Additionally, 9 of 12 samples showed high/medium protein expression. Most ovarian and pancreatic cancers, and a few colorectal, endometrial and stomach cancers, displayed moderate/strong cytoplasmic and membranous immunoreactivity. An alternative antibody against the same protein showed a similar staining pattern. In the survival scatter plot, median expression was 1.1e-2 FPKM among patients who died and 65 FPKM among those alive. The average RNA level across 176 samples was 126.2 FPKM, and the cut-off was 14.06 FPKM. The survival analysis gave a p-value of 0.00067. The 5-year survival of patients with expression above the cut-off was 25%, compared with 32% for those below it. This gene has six transcripts.
S100A2
S100A2 was shortlisted because high expression of its protein was significantly associated with unfavourable outcomes in pancreatic cancer.
In the TA, the Islets of Langerhans and the exocrine glandular cells displayed no protein expression. Additionally, the RNA expression was enhanced in the oesophagus, tonsil and vagina but it was absent in the 174 pancreas samples.
Further, the CA showed that the protein was localized to the nucleus, cytosol, nucleoli and plasma membrane.
In the PA, RNA-sequencing data from TCGA showed S100A2 RNA expression in all tissues. Additionally, 2 of 8 samples showed high/medium protein expression. Several cases of squamous cell carcinoma of the lung, skin, and head and neck, as well as urothelial cancers, showed strong cytoplasmic and nuclear staining. Two alternative antibodies against the same protein showed similar staining patterns. In the survival scatter plot, median expression was 8.1 FPKM among patients who died and 5.1 FPKM among those alive. The average RNA level across 176 samples was 60.6 FPKM, and the cut-off was 3.62 FPKM. The survival analysis gave a p-value of 0.00020. The 5-year survival of patients with expression above the cut-off was 21%, compared with 48% for those below it. This gene has six transcripts.
Figure 2. Survival analysis of TPX2: survival scatter plot and Kaplan-Meier plot.
Discussion
The 5-year survival of PC has remained fairly constant over the past few decades, with no improvement in prognosis. It is well established that the earliest forms of PC are curable, but only if they are detected before symptoms appear. Although a variety of strategies have been proposed, none has been established as effective, and non-invasive biomarkers are now the main hope. Multiple translational studies have explored minimally invasive or non-invasive biomarkers in biological fluids, although their prognostic performance has not been further validated.
Thus, this study aimed to identify a potential biomarker for PC through an in-silico analysis of THPA. The hits were filtered on prognostic potential and p-value, antibody validation, gene/protein expression, cellular localization and survival analysis, narrowing them down to three cancer-related genes: TPX2, MSLN and S100A2. On further scrutiny, TPX2 was selected.
TPX2 is a microtubule-associated protein with a role in the cell cycle. Upregulated TPX2 has been found in squamous cell lung carcinoma, where its expression correlates with tumour grade, stage and nodal status. Amplification of the chromosome 20q region has previously been reported in 80% of PC tumours.37 These gains were observed at the same frequency in early and advanced stages, suggesting that genes in this region might play an important role early in pancreatic carcinogenesis. Moreover, TPX2 has only two transcripts, which could simplify its detection by sequencing or antibody-based assays. Together, these features make TPX2 a promising biomarker candidate for early PC.
On closer inspection of the survival data, TPX2 had the lowest survival p-value and an intermediate cut-off, which would make the design of a suitable assay more tractable. Furthermore, of the three shortlisted genes, only TPX2 showed protein expression in normal pancreas, in the endocrine islets of Langerhans. Dysregulated TPX2 could plausibly promote PC development and progression by conferring a proliferative advantage. A larger proportion of samples showed high/medium protein expression for TPX2 (11 of 12) than for MSLN (9 of 12) or S100A2 (2 of 8). Moreover, TPX2 has only two transcripts.
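The comparison across the three candidates can be summarized by ranking them on the criteria discussed above. The values are those reported in Table 1 and the Results; the ranking is a simplified illustration of the reasoning, not a formal scoring scheme used in this study.

```python
# Values reported in Table 1 and the Results (THPA Pathology Atlas).
candidates = {
    "TPX2":   {"p": 2.68e-5, "cutoff_fpkm": 9.13,  "ihc_high_medium": (11, 12)},
    "MSLN":   {"p": 6.66e-4, "cutoff_fpkm": 14.06, "ihc_high_medium": (9, 12)},
    "S100A2": {"p": 2.04e-4, "cutoff_fpkm": 3.62,  "ihc_high_medium": (2, 8)},
}


def rank(key, descending=False):
    """Order gene names by a per-candidate criterion."""
    return sorted(candidates, key=lambda g: key(candidates[g]), reverse=descending)


by_p = rank(lambda c: c["p"])  # strongest survival association first
by_ihc = rank(lambda c: c["ihc_high_medium"][0] / c["ihc_high_medium"][1],
              descending=True)  # largest share of high/medium IHC first
by_cutoff = rank(lambda c: c["cutoff_fpkm"])  # lowest to highest cut-off

print("lowest p-value:", by_p[0])
print("highest share of high/medium IHC:", by_ihc[0])
print("intermediate cut-off:", by_cutoff[1])
```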
In the CA, TPX2 was the only one of the three proteins localized to the mitotic spindle. Given the role of TPX2 in activating Aurora A and promoting mitotic progression, TPX2 amplification is thought to confer a proliferation and growth advantage on pancreatic cancer cells compared with surrounding tissue. Furthermore, because Aurora A kinase has been shown to activate the Akt pathway,43 TPX2 overexpression may also promote cancer cell survival.44 Given the high percentage of pancreatic tumours with activated K-ras, it is possible that knockdown of TPX2 would selectively kill cancer cells.44
Although the genes were examined on numerous criteria, TPX2 fell short on a few. First, IHC staining of pancreatic tissue samples showed little variation for TPX2: most samples showed medium staining and a few showed low staining. This might prove to be a pitfall in the clinic when using IHC to stratify patients, for example by risk. Second, its tissue (IHC) reliability score was only "supported", because data were available for a single antibody. A validating antibody would have strengthened the data.
Little work has been done to explore TPX2 protein levels in pancreatic cancer cell lines and tumour samples. It would therefore be interesting to compare TPX2 expression between metastatic and non-metastatic PC cell lines using qRT-PCR and IHC. Furthermore, tissue microarrays built from microdissected PC cells at various stages, and from normal pancreatic cells of PC patients, would provide an excellent means to monitor how TPX2 expression changes over the course of the disease and how it differs between normal and diseased tissue. TPX2 expression could also be assessed for correlation with lymph node metastasis and poor survival in PC patients. Once the TPX2 profile throughout the disease has been established, validation in an independent test set and replication in case and control samples would be important to determine whether TPX2 can be used as a prognostic biomarker. Lastly, comparing TPX2 with clinically used biomarkers would show whether TPX2 adds clinical utility.
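One common way to quantify the proposed qRT-PCR comparison is the 2^-ΔΔCt (Livak) method, sketched below. This is an assumption about how such an analysis could be done, not a protocol from this study: it presumes roughly 100% amplification efficiency for both assays, and the function name, reference gene and Ct values in the example are hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene by the 2^-ddCt (Livak) method.

    Assumes both assays amplify with ~100% efficiency (doubling per cycle).
    """
    d_ct_test = ct_target_test - ct_ref_test  # normalise target to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl             # compare test sample with control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: metastatic line (test) vs non-metastatic line (control),
# each normalised to a hypothetical housekeeping reference gene.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A value above 1 would indicate higher TPX2 expression in the metastatic line; if amplification efficiencies differ markedly between assays, an efficiency-corrected method would be more appropriate.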
Thus, primary screening using circulating biomarkers followed by confirmatory prognosis based on imaging and pathology results might be the future strategy for prognosticating PC.