Molecular Epidemiology of Cancer: The Future is Now
By Robert A. Sikes, PhD
Deciphering what makes a normal cell become a tumor cell has been a central theme of cancer research for the last 25 years or more. The central assumption has been that cancer is the result of stable, heritable changes in gene/protein expression or function. Traditional research methods focused on some altered but measurable characteristic of the tumor cell, such as invasiveness, which was then used to isolate the protein(s) involved in that behavior (e.g., integrins, extracellular matrix proteins, and proteases). The corresponding genes were typically cloned only after the protein had been purified and assigned a function. Reverse genetics, a term coined to describe obtaining a nucleic acid sequence before a protein is isolated or its function is determined, has changed the way we search for meaningful changes in gene expression in cancer and other genetic disorders.
Background
So, where did it all start, and where is it going? Before the advent of the polymerase chain reaction (PCR), cloning differentially expressed genes required some form of subtractive or differential screening procedure. This process was expensive, time consuming, usually radioactive, and yielded a large number of false-positive sequences. In the post-PCR, human genome project era, several rapid and efficient methods for cloning differentially expressed sequences have been developed. The PCR-based approaches include representational difference analysis (RDA), differential display reverse transcription PCR (DD-PCR),1 and serial analysis of gene expression (SAGE).2 Falling between these approaches is CuraGen Corporation's statistically driven, oligonucleotide-directed DNA screening procedure.3 High throughput DNA sequencing and the associated advances in computer informatics, developed largely for the human genome project, have enabled rapid and cost-effective analysis of large numbers of randomly acquired cDNA clones, or expressed sequence tags (ESTs).4,5 Informatics has made it possible to assess the novelty of those ESTs, assign them rapidly to chromosomes, estimate the likelihood of tissue-specific expression, and build large microarrays of cDNAs and ESTs6,7 for comparing mRNA expression between two closely related conditions.
Because mRNAs are polyadenylated, they can be reverse transcribed efficiently using oligo d(T)N primers. DD-PCR1 adds specificity to the reverse transcription reaction by appending two more bases to the primer, oligo d(T)N-XY, so that only a subpopulation of the mRNAs is primed in any given reaction. There are only three possible choices for the penultimate base X (a T there would simply extend the primer's d(T) stretch along the poly(A) tail) and four possibilities for the last base Y, giving 3 × 4 = 12 anchored primers. A single oligo d(T) primer constructed in this manner will therefore prime only about one-twelfth of the total mRNA in any given sample. Additional short, arbitrary primers for the 5' end then further increase the specificity of the products formed during the PCR phase of the reaction. By using combinations of different 5' primers with the 12 different 3' anchors, one can efficiently amplify representative species from most of the expressed mRNAs. These products are then resolved on denaturing polyacrylamide gels, excised, cloned, and sequenced. Initially, this technique gave products of 100 to 400 bp. Improved DD-PCR techniques now yield nearly complete cDNAs in single runs. We found that to avoid artifacts (false-positive signals), the test samples need to be closely related, and the differential expression of acquired clones needs to be confirmed by additional experiments, such as RNA (northern) blots or, if an antibody is available, western blots.8
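The 3 × 4 anchoring arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a primer-design tool: the length of the d(T) stretch (11 T's, via the hypothetical parameter `n_t`) is an assumption, sequences are written 5' to 3' in DNA letters on the mRNA-sense strand, and `anchored_primers` and `matching_anchor` are hypothetical helper names.

```python
from itertools import product

# X (penultimate base) excludes T, since a T there would just extend the
# oligo d(T) stretch along the poly(A) tail; Y (last base) can be any base.
PENULTIMATE_BASES = "ACG"
LAST_BASES = "ACGT"
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def anchored_primers(n_t: int = 11) -> list[str]:
    """All oligo d(T)N-XY primers, written 5'->3'; n_t is an assumed length."""
    return ["T" * n_t + x + y for x, y in product(PENULTIMATE_BASES, LAST_BASES)]


def matching_anchor(mrna: str) -> str:
    """Return the XY pair that anneals next to this message's poly(A) tail.

    The mRNA is written 5'->3' in DNA letters. The primer pairs antiparallel,
    so X pairs with the last base before the tail and Y with the base before that.
    """
    body = mrna.rstrip("A")  # remove the poly(A) tail
    return PAIR[body[-1]] + PAIR[body[-2]]


primers = anchored_primers()
print(len(primers))                    # 12
print(matching_anchor("ATGGCAAAAAA"))  # GC
```

Because the two bases immediately upstream of any poly(A) tail match exactly one XY pair, the 12 primers split the polyadenylated messages into 12 non-overlapping pools; the one-twelfth figure assumes those pools are roughly equal in size.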
Serial Analysis of Gene Expression
SAGE is based on the statistical representation of short oligonucleotide tags within mRNA sequences.2 Velculescu and colleagues found that a tag of only 9 bp contains enough information to specify an mRNA species. cDNA is synthesized from the mRNA using a biotinylated oligo d(T) primer and cut with a restriction enzyme (the anchoring enzyme), and the 3'-most fragments are captured on streptavidin beads. The captured material is split into two fractions, and each fraction is ligated to a different linker, A or B. The fractions are then cut with a second restriction enzyme (the tagging enzyme), which releases each linker together with a short tag; the ends are blunted, and the two fractions are ligated together to form "ditags." PCR with primers to linkers A and B amplifies all the sequences between A and B. The PCR product is cut with the anchoring enzyme, and the ditags are isolated, concatenated, and cloned. These concatenated ditags are then ready for direct sequencing. The efficiency gain is essentially informational: because the ditags lie in a linear array, many can be sequenced from one concatemer clone, yielding information about several expressed mRNAs at once. The other benefit of SAGE is that the information acquired is quantitative as well as qualitative. In other words, it reports the relative abundance of an expressed sequence as well as its sequence/gene identity.
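The tag arithmetic, and the way sequenced concatemers become abundance counts, can be sketched as follows. A 9-bp tag can take 4^9 = 262,144 distinct sequences, which is why so short a tag can usually specify one mRNA species. The anchoring-site sequence `CATG` (the site cut by NlaIII, the anchoring enzyme in the original SAGE work), the fixed 18-bp ditag length, and the helper names are assumptions for illustration; real ditags vary slightly in length after blunting.

```python
from collections import Counter

# Assumptions for this sketch: the anchoring site is "CATG" (as cut by NlaIII),
# and every ditag is exactly two 9-bp tags long. Real ditags vary by a base or
# two because of the blunting step.
ANCHOR = "CATG"
TAG_LEN = 9
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tags_from_concatemer(read: str) -> list[str]:
    """Split one concatemer read at the anchoring sites and return both tags of each ditag."""
    tags = []
    for ditag in read.split(ANCHOR):
        if len(ditag) != 2 * TAG_LEN:
            continue  # skip empty pieces, flanking vector, or odd-length ditags
        tags.append(ditag[:TAG_LEN])  # first tag reads 5'->3' on this strand
        # the second tag was ligated tail-to-tail, so it appears reverse-complemented
        tags.append(reverse_complement(ditag[TAG_LEN:]))
    return tags


def tag_counts(reads: list[str]) -> Counter:
    """Tally tags across many concatemer reads; counts estimate relative mRNA abundance."""
    counts = Counter()
    for read in reads:
        counts.update(tags_from_concatemer(read))
    return counts


print(4 ** TAG_LEN)  # 262144 possible 9-bp tag sequences
```

Each concatemer read contributes several tags, which is the efficiency the paragraph describes, and the tally from `tag_counts` is what makes SAGE quantitative.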
High Throughput Application
The human genome project created the need to acquire large amounts of DNA sequence quickly, so-called high throughput application. Some investigators, such as Liew and associates, used this new technology to examine the complexity of mRNA expression in developing organs.4 In the fetal heart, Liew et al found that 47.4% of 3500 ESTs were previously undescribed or uncharacterized. More recently, Nelson and colleagues applied the same strategy of random cDNA selection and sequencing to the normal human prostate.5 They examined 1168 cDNA clones to build a profile of prostate gene expression and detected prostate-specific genes such as prostate specific antigen, human glandular kallikrein 2, prostate specific membrane antigen, and prostatic acid phosphatase. They also found that 30% of the clones matched only other ESTs and that 6% represented previously undescribed genes. Thus, about 36% of the ESTs expressed in human prostate tissue have no known function or protein product. Nelson et al have thereby laid the groundwork for determining which of these genes change their expression in prostate cancer.
Research in my laboratory has applied a similar approach to genes isolated from the urogenital sinus, the embryonic progenitor of the prostate. We have examined 728 randomly picked cDNA clones from a murine urogenital sinus library, which account for at most 678 unique cDNAs, and have observed a frequency of unique or EST-matching cDNAs similar to that described above. When we screen these sequences against RNA from LNCaP, an androgen-sensitive prostate cancer cell line, and compare it with RNA from C4-2, an LNCaP-derived androgen-independent prostate cancer cell line, only 34 candidate genes show a change in expression profile. This represents less than 5% of the cDNAs sequenced. Because the LNCaP and C4-2 cell lines approximate the progression of prostate cancer from androgen-sensitive and non-metastatic to androgen-independent and highly metastatic, these cDNAs may correspond to proteins involved in prostate cancer progression.
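The percentages in this section follow from simple ratios of the counts quoted in the text; the sketch below makes the denominators explicit. The helper `percent` is a hypothetical name, and no numbers beyond those in the text are used.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Nelson et al (human prostate): EST-only matches plus undescribed genes,
# both already expressed as percentages of the 1168 clones
no_known_function = 30 + 6

# Urogenital sinus screen: 34 candidates among 728 clones sequenced,
# which represent at most 678 unique cDNAs
print(f"{no_known_function}% of prostate ESTs lack a known function")  # 36%
print(f"{percent(34, 728):.1f}% of clones sequenced")                 # 4.7%
print(f"{percent(34, 678):.1f}% of unique cDNAs (lower bound)")       # 5.0%
```

Measured against the 728 clones sequenced, the 34 candidates come to about 4.7%, consistent with "less than 5%"; measured against the at most 678 unique cDNAs, they come to about 5.0% or slightly more, because fewer unique cDNAs would raise the fraction.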
Multigene Microarray Filters
Another direct result of these large-scale sequencing projects is the availability of many cDNA or PCR clones encoding both known and unknown genes. The most recent application of this material has been the development of multigene microarray filters for hybridization-based experiments that determine the differential expression of genes in almost any given system.9 The filters, or DNA microchips, are robotically spotted with small aliquots of cDNA or PCR fragments corresponding to as many as 10,000 individual genes or ESTs,6,9 and larger arrays are promised. The goal is to screen the estimated 100,000 genes expressed by humans quickly and efficiently to determine which factors, and what timing, influence their expression. The filters usually include control genes as landmarks/reference points so that changing signals can be easily matched to a gene spot or clone identity. Positive clones can then be ordered from the source for the investigator's use. Many filter-based microarrays are now available from sources such as Clontech or the IMAGE consortium. Caution is warranted, however: these systems still produce a low level of false-positive signals for differential expression, and confirmation by other techniques is recommended.
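One common way to turn two filter hybridizations (for example, normal versus tumor RNA) into a list of candidate differentially expressed spots is to scale each filter by its control spots and compare log ratios. The text above describes control genes only as landmarks; using them for normalization, the two-fold cutoff, and the spot names and intensities below are assumptions for illustration, and any hit would still need confirmation by an independent technique.

```python
import math


def normalize_to_controls(signals, control_ids):
    """Scale one filter's spot intensities so its control spots average 1.0.

    Assumes the control spots have a nonzero mean intensity.
    """
    control_mean = sum(signals[c] for c in control_ids) / len(control_ids)
    return {spot: value / control_mean for spot, value in signals.items()}


def differential_spots(normal, tumor, control_ids, min_fold=2.0):
    """Return {spot: log2(tumor/normal)} for spots changing at least min_fold-fold."""
    n = normalize_to_controls(normal, control_ids)
    t = normalize_to_controls(tumor, control_ids)
    threshold = math.log2(min_fold)
    hits = {}
    for spot in n:
        # skip the control spots themselves and spots with no usable signal
        if spot in control_ids or n[spot] <= 0 or t.get(spot, 0) <= 0:
            continue
        log_ratio = math.log2(t[spot] / n[spot])
        if abs(log_ratio) >= threshold:
            hits[spot] = round(log_ratio, 2)
    return hits


# Hypothetical intensities; the tumor filter is uniformly twice as bright.
normal = {"ctrl1": 100, "ctrl2": 300, "geneA": 50, "geneB": 200, "geneC": 80}
tumor = {"ctrl1": 200, "ctrl2": 600, "geneA": 400, "geneB": 100, "geneC": 170}
print(differential_spots(normal, tumor, {"ctrl1", "ctrl2"}))
# {'geneA': 2.0, 'geneB': -2.0}
```

Normalizing first keeps an overall brighter hybridization from being mistaken for uniform up-regulation: geneC looks twice as bright on the tumor filter but barely changes once both filters are scaled to their controls.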
Molecular Epidemiology
The end result of these efforts is molecular epidemiology. Biomedical researchers can rapidly screen the expression changes that occur between normal and diseased tissues. These gene expression changes can then be rigorously studied to determine their role in disease development and progression and their potential use as diagnostic/prognostic markers, or they may be developed into therapeutic targets. Growing knowledge of the human genome and the increasing density of DNA microarrays will eventually give us the ability to acquire a complete molecular snapshot of a diseased tissue. These technologies promise to provide physicians with new tools to help guide therapeutic options, as well as new therapies.
References
1. Liang P, Pardee AB. Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 1992;257:967-971.
2. Velculescu V, Zhang L, Vogelstein B, et al. Serial analysis of gene expression. Science 1995;270:484-487.
3. Milosavljevic A, Savkovic S, Crkvenjakov R, et al. DNA sequence recognition by hybridization to short oligomers: Experimental verification of the method on the E. coli genome. Genomics 1996;37:77-86.
4. Liew C, Hwang D, Fung Y, et al. A catalogue of genes expressed in the cardiovascular system as identified by expressed sequence tags. Proc Natl Acad Sci U S A 1994;91:10645-10649.
5. Nelson P, Ng WL, Schummer M, et al. An expressed-sequence-tag database of the human prostate: Sequence analysis of 1168 cDNA clones. Genomics 1998;47:12-25.
6. Ermolaeva O, Rastogi M, Pruitt K, et al. Data management and analysis for gene expression arrays. Nat Genet 1998;20:19-23.
7. Schena M, Shalon D, Heller R, et al. Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci U S A 1996;93:10614-10619.
8. Chen ME, Lin SH, Chung LWK, et al. Isolation and characterization of PAGE-1 and GAGE-7: New genes expressed in the LNCaP prostate cancer progression model that share homology with melanoma associated antigens. J Biol Chem 1998;273:(in press).
9. Lander E. Array of hope. Nat Genet 1999;21:3-4.
The field of research that seeks to determine a gene’s sequence before its protein product has been identified or assigned a function has been termed:
a. clonal genetics.
b. reverse genetics.
c. mendelian genetics.
d. hereditary genetics.