| Issue | J. Soc. Biol., Volume 196, Issue 4, 2002 |
|---|---|
| Page(s) | 303-307 |
| Section | From transcriptome to proteome: a new reading of the cell |
| DOI | https://doi.org/10.1051/jbio/2002196040303 |
| Published online | 4 April 2017 |
Study of transcriptomes by serial analysis of gene expression
Institut de Génétique Humaine, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5
Serial Analysis of Gene Expression (SAGE) identifies the full set of genes expressed in a cell or tissue sample. It relies on the sequential analysis of a large number of short cDNA fragments, each of which serves as the signature of a gene. Counting these fragments gives a precise measure of each gene's expression level. Because this analytical system is open, it requires no prior assumption about which genes are expressed. The analysis can therefore reveal as-yet-unknown transcripts, unlike systems based on hybridization to collections of nucleic acid probes, which require the analyzed genes to be characterized beforehand. These features make SAGE a tool for discovering new genes and potential markers of disease states. It allows rapid selection of new parameters for diagnosis and for evaluating the efficacy of therapeutic agents. All SAGE data can be gathered in a single database, where each new analysis benefits from all previous data. This first step toward a detailed inventory of cellular components opens new prospects for the in silico modeling of biological functions.
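As a minimal illustration of how counting tags yields expression levels, the Python sketch below converts a list of observed tags into abundances per million tags sequenced. The function name `expression_levels`, the tag sequences, and the library are hypothetical; the 14 bp tag format (`CATG` plus 10 bases) and the per-million normalization, which lets libraries of different sizes be compared, are assumptions made for this example.

```python
from collections import Counter


def expression_levels(tags, per=1_000_000):
    """Convert observed tags into abundances per `per` tags sequenced."""
    counts = Counter(tags)          # number of times each tag was seen
    total = sum(counts.values())    # library size
    return {tag: n * per / total for tag, n in counts.items()}


# Hypothetical library of 8 sequenced tags (14 bp: CATG + 10 bases).
library = ["CATGAAACCCGGGT"] * 5 + ["CATGTTTGGGCCCA"] * 2 + ["CATGACGTACGTAC"]
levels = expression_levels(library)
for tag, level in sorted(levels.items(), key=lambda kv: -kv[1]):
    print(tag, level)
```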
Abstract
The availability of whole-genome sequences is changing our understanding of cell biology. Functional genomics refers to the comprehensive analysis, at the protein level (proteome) and at the mRNA level (transcriptome), of all events associated with the expression of whole sets of genes. New methods have been developed for transcriptome analysis. Serial Analysis of Gene Expression (SAGE) is based on the massive sequential analysis of short cDNA sequence tags. Each tag is derived from a defined position within a transcript. Its size (14 bp) is sufficient to identify the corresponding gene, and the number of times each tag is observed provides an accurate measurement of that gene's expression level. Since tag populations can be extensively amplified without altering their relative proportions, SAGE can be performed with minute amounts of biological extract. Handling the mass of data generated by SAGE requires computer analysis. Software is needed to automatically detect and count tags in sequence files, and criteria for assessing the quality of the experimental data can be applied at this stage. To identify the corresponding genes, a database is built that registers all virtual tags likely to be observed, based on the current state of genome knowledge. Using standard database functions, experimental tags are easily matched to virtual tags, generating a new database of identified tags together with their expression levels. As an open system, SAGE can reveal new, as-yet-unknown transcripts. Their identification will become increasingly easy as genome annotation progresses. In the meantime, their direct characterization can be attempted, since the tag information may be sufficient to design primers for extending the unknown sequences. A major advantage of SAGE is that, because expression levels are measured without reference to an arbitrary standard, the data are acquired once and for all and are cumulative.
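The tag-detection, counting, and matching steps described above can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the authors' software: it assumes the classic SAGE design in which NlaIII (site `CATG`) is the anchoring enzyme, ditags are cloned as concatemers separated by `CATG`, and each tag is reported as `CATG` plus the next 10 bases (14 bp). The accepted ditag length range (20 to 24 bp), the rejection of repeated ditags as probable PCR duplicates, the rule that a virtual tag comes from the 3'-most `CATG` of a transcript, all function names, and all sequences and gene names are likewise assumptions for illustration.

```python
from collections import Counter

ANCHOR = "CATG"                # NlaIII site (assumed anchoring enzyme)
TAG_LEN = 10                   # bases after the anchor: CATG + 10 = 14 bp tag
MIN_DITAG, MAX_DITAG = 20, 24  # assumed acceptable ditag lengths

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_tags(reads):
    """Detect ditags between CATG sites in concatemer reads and count tags."""
    seen = set()
    counts = Counter()
    for read in reads:
        # Drop the first and last pieces: they are flanking sequence,
        # not ditags enclosed by two anchors.
        for ditag in read.upper().split(ANCHOR)[1:-1]:
            if not MIN_DITAG <= len(ditag) <= MAX_DITAG:
                continue  # quality filter: implausible ditag length
            key = min(ditag, revcomp(ditag))  # same ditag read on either strand
            if key in seen:
                continue  # quality filter: repeated ditag, likely PCR duplicate
            seen.add(key)
            # Left tag reads forward; right tag lies on the opposite strand.
            counts[ANCHOR + ditag[:TAG_LEN]] += 1
            counts[ANCHOR + revcomp(ditag[-TAG_LEN:])] += 1
    return counts


def virtual_tag(transcript):
    """Expected tag: the 3'-most CATG of a transcript plus the next 10 bases."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1 or pos + len(ANCHOR) + TAG_LEN > len(seq):
        return None  # no site, or too few bases after it (skipped for simplicity)
    return seq[pos:pos + len(ANCHOR) + TAG_LEN]


def build_virtual_db(transcripts):
    """Map each virtual tag to the names of the transcripts predicting it."""
    db = {}
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            db.setdefault(tag, []).append(name)
    return db


def identify(counts, virtual_db):
    """Join observed tags to virtual tags; unmatched tags get an empty list."""
    return {tag: (n, virtual_db.get(tag, [])) for tag, n in counts.items()}


# Hypothetical concatemer read: two distinct ditags, one repeat, one short piece.
ditag1 = "AAACCCGGGTTGGGCCCAAA"
ditag2 = "ACGTACGTACACCCGGGTTT"
read = "GGATCC" + ANCHOR + ANCHOR.join([ditag1, ditag2, ditag1, "ACGT"]) + ANCHOR + "TTAGG"

transcripts = {  # hypothetical transcript sequences, sense strand
    "geneX": "GGCATGCCTTAACATGAAACCCGGGTAAAAAA",
    "geneY": "TTCATGTTTGGGCCCAGGAAAA",
    "geneZ": "ACGTCATGAAA",  # too few bases after its CATG: no virtual tag
}

observed = extract_tags([read])
table = identify(observed, build_virtual_db(transcripts))
for tag, (n, genes) in sorted(table.items()):
    print(tag, n, genes or "unknown transcript")
```

A tag with no match in the virtual database (here `CATGACGTACGTAC`) is kept with its count rather than discarded, which is what allows an open system to flag candidate unknown transcripts.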
All publicly available data can thus be stored in a single database, facilitating whole-genome analysis of differential expression between cell types, between normal and diseased samples, or between samples with and without drug treatment. SAGE data are readily amenable to statistical comparison, making it possible to determine the level of confidence of the observed variations. A major limitation of SAGE is that, because each analysis necessarily covers the whole set of expressed genes, it is hard to apply to many samples, for example in kinetic studies or to compare the effects of large numbers of drugs. To overcome this limitation, high-throughput detection of a subset of mRNAs can be performed more rapidly by parallel hybridization of mRNAs to arrays of nucleic acids immobilized on solid supports. From this point of view, a SAGE platform is a powerful instrument for selecting the most informative subset of genes, assembling them into microarrays dedicated to a specific problem, and calibrating measurements by comparison with a standard cell model for which SAGE data are available. This approach is an attractive alternative to strategies based exclusively on pan-genomic arrays. A very large amount of SAGE data is already available, and the problem now is to extract its biological meaning. Knowledge of metabolic pathways is already organized well enough that its integration into a SAGE platform can be undertaken successfully. For other cell components and pathways, the problem lies in the lack of a controlled vocabulary to describe gene activities, starting from a clear definition of the concept of biological function itself. Progress in gene and cell ontologies is expected to facilitate computer-based extraction of biological knowledge from existing and forthcoming SAGE data.
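As one possible way to attach a confidence level to an observed variation, the sketch below compares the frequency of each tag in two libraries with a two-sided two-proportion z-test and ranks tags by p-value, which could serve as a first pass when choosing an informative subset of genes for a dedicated array. The choice of test, the function names, and the counts are assumptions for illustration, not the statistic used by the authors. The normal approximation is unreliable for very low counts, where an exact test (for example Fisher's exact test) would be more appropriate.

```python
import math


def two_proportion_test(x1, n1, x2, n2):
    """Two-sided z-test for equal tag frequencies x1/n1 and x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # tag absent from (or the only tag in) both libraries
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def rank_differential_tags(lib_a, lib_b):
    """Return (p_value, tag, z) for every tag, most significant first."""
    n_a, n_b = sum(lib_a.values()), sum(lib_b.values())
    results = []
    for tag in set(lib_a) | set(lib_b):
        z, p = two_proportion_test(lib_a.get(tag, 0), n_a, lib_b.get(tag, 0), n_b)
        results.append((p, tag, z))
    return sorted(results)


# Hypothetical libraries of 1,000 tags each (e.g. untreated vs. treated sample).
normal = {"CATGAAACCCGGGT": 50, "CATGTTTGGGCCCA": 20, "CATGACGTACGTAC": 930}
treated = {"CATGAAACCCGGGT": 5, "CATGTTTGGGCCCA": 20, "CATGACGTACGTAC": 975}

ranked = rank_differential_tags(normal, treated)
for p, tag, z in ranked:
    print(f"{tag}  z={z:+.2f}  p={p:.2g}")
```

Because the counts are absolute rather than ratios to a reference sample, the same function can be rerun against any library added later to the shared database.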
© Société de Biologie, Paris, 2002