Introduction
Acute lymphoblastic leukaemia (ALL) is an acute form of leukaemia characterised by the overproduction of lymphoblasts, which are immature, cancerous leukocytes [1]. Cytogenetic studies, immunophenotyping and genome-wide screening have highlighted the heterogeneity of ALL, which has clear clinical manifestations and therapeutic and prognostic implications [2].
This review aims to examine the literature on the use of gene expression data in the classification of ALL. A search was performed in Embase, Ovid Medline, Ovid journals database and NHS Wales’s full-text journals in October 2015. All papers containing the keywords leukaemia, acute lymphoblastic leukaemia, microarrays, and bioinformatics in the title were combined. This was supplemented with a search in Google Scholar using the same keywords. This resulted in the identification of 20 papers that met the criteria for this review.
Classification of Acute Lymphoblastic Leukaemia (ALL)
Present standards for the diagnosis of ALL combine immunophenotyping, cell morphology, and cytogenetics, as described in the 2008 World Health Organization (WHO) classification of lymphoid neoplasms [3]. This recognises three distinct subtypes based on immunophenotyping of surface markers on the lymphocytes: B-cell ALL, T-cell ALL, and Burkitt lymphoma. Lymphoid neoplasms are allocated to two main categories: those arising from B- and T-lineage lymphoid precursors and those arising from mature B, T or NK cells. ALL comes under the former, titled B- or T-lymphoblastic leukaemia/lymphoma, which includes three major categories: B-lymphoblastic leukaemia/lymphoma not otherwise specified, B-lymphoblastic leukaemia/lymphoma with recurrent genetic abnormalities, and T-lymphoblastic leukaemia/lymphoma [4]. The leukaemic variant involves the peripheral blood and bone marrow, whereas lymphoma is limited to nodal or extranodal sites, with the bone marrow being less involved. A purely leukaemic presentation is characteristic of B-lineage ALL (85%), while T-lineage presentations commonly display a lymphomatous mass in the mediastinum or other locations [5].
Diagnosis of ALL
A bone marrow assessment is the first step in the diagnosis of ALL [6]. Table 1 shows the morphological criteria for distinguishing lymphoblasts from myeloblasts. Flow cytometry is the gold standard for identifying the cell lineage and characterising the subset.
Table 1 Morphological characteristics of blast cells in ALL versus AML [7]
Feature | Lymphoblasts | Myeloblasts
General characteristics | Blast population tends to be homogeneous | Blast population is usually heterogeneous except in the undifferentiated form
Size | Variable, mainly small | Variable, usually large
Nucleus | Central, mainly round; sometimes indented, particularly in the form in adults. Nucleocytoplasmic ratio very high in children but not in adults | Tending to be round, eccentric, oval or angulated; sometimes convoluted, especially in the shape with a monocytic component. Nucleocytoplasmic ratio high in undifferentiated blast cells and in some megakaryoblasts but low in the differentiated form
Chromatin | Fine, with condensation being dispersed. Condensed in small lymphoblasts | Granular, fine and delicately dispersed
Nucleoli | Absent in small lymphoblasts. Occasionally indistinct | Usually present, often large and prominent
Cytoplasm | Basophilic, scanty. Occasionally with a single long projection | Variable. Common in monoblasts. With protrusions in megakaryoblasts and erythroblasts
Granules | Rarely present, azurophilic and negative for esterases, peroxidase and toluidine blue | Seen in differentiated forms. Positive with cytochemical stains: peroxidase in the eosinophil and neutrophil lineages; nonspecific esterase in the monocyte lineage; toluidine blue in the basophil lineage
Auer rods | Absent in all cases | May be present. Usually seen in the hypergranular promyelocytic form
Cytogenetics
Conventional cytogenetic analysis can identify variations in chromosome number, and is aided by enumeration using FISH (fluorescence in situ hybridization). Examples of numerical abnormalities include changes in the entire set of chromosomes or the loss or gain of individual chromosomes (aneuploidy) [8,9]. Cytogenetics has traditionally been used in the clinical setting to detect established chromosomal abnormalities (translocations, deletions, insertions, and inversions), for example the t(9;22)(q34;q11.2), recognised as the Ph translocation in CML [10] and ALL [11]. FISH led to the detection of the cryptic translocation t(12;21)(p13;q22), which generates the ETV6-RUNX1 (TEL-AML1) fusion [12] (Fig 1). FISH and other molecular approaches are currently used to screen for major anomalies in ALL [13]. Colour FISH techniques have shown many new chromosomal abnormalities and revealed complex karyotypes [14,15]. FISH and aCGH (array comparative genomic hybridization) have been used to detect the breakpoints involved in these rearrangements [16].
Fig 1 Frequency of various cytogenetic subtypes in pediatric ALL [12]
List of abbreviations used in the above figure
BCR-ABL1 – Breakpoint cluster region – ABL1
MLL – Mixed-lineage leukaemia
TCF3-PBX1 – Transcription factor 3 – Pre-B-cell leukaemia homeobox 1
ERG – E26 transforming sequence-related gene
IAMP21 – Intrachromosomal amplification of chromosome 21
TAL1 – T-cell ALL 1
TLX3 – T-cell leukaemia homeobox 3
TLX1 – T-cell leukaemia homeobox 1
LYL1 – Lymphocytic leukaemia derived sequence 1
ETV6-RUNX1 – E26 transformation-specific variant 6 – Runt-related transcription factor 1
CRLF2 – Cytokine receptor-like factor 2
B-ALL – B-precursor ALL
T-ALL – T-precursor ALL
The cellular origins of ALL have been established, and numerous genetic mechanisms that lead to the malignant conversion of these cells have been documented [17]. In approximately 30% of adults, ALL cells express many genes linked to T-cell differentiation. In the remaining 70%, ALL cells show markers of B-cell differentiation, and their immunoglobulin genes show specific patterns of clonal rearrangement. These features define the ALL subtype according to the lymphoid precursor cell from which it derives (T-lineage or B-lineage differentiation). The relative frequencies of specific molecular rearrangements differ between adults and children with B-lineage ALL. The BCR/ABL gene rearrangement occurs in approximately 25% of adult ALL cases and the ALL1/AF4 gene rearrangement (MLL/AF4) is seen in about 4% to 7% [18,19]. These cytogenetic abnormalities represent different mechanisms of transformation, which might explain why adults and children with ALL have different therapeutic outcomes.
Despite the common occurrence of gene rearrangements and chromosome translocations in ALL, no molecular rearrangement is found in approximately 50% of adult cases [22]. Even where the cellular origins of these leukaemias can be determined by phenotypic studies, their underlying transformation mechanisms remain unknown. Hence, extensive gene expression profiling offers a novel approach to discovering the mechanisms by which cells undergo malignant transformation [20,21].
Specific gene rearrangements and cell lineage derivation strongly influence gene expression in adult ALL. A group of kinases has been identified within subsets of ALL that represent possible targets for therapy in adult patients [22].
Obtaining gene expression data using microarrays for the classification of ALL
Basic concepts on the working of Microarrays
A microarray is normally a glass slide on which DNA molecules are fixed in an ordered manner at specific sites called spots. It contains thousands of spots, each of which may hold a few million identical copies of a DNA molecule that corresponds to a gene (Figure 2A). The spots are created by photolithography or printed onto the glass slide. Microarrays are used to determine gene expression in several ways, one of which is to compare the expression of a group of genes in cells maintained under a certain condition (condition A) with the same group of genes in a reference cell under normal conditions (condition B). Following extraction of RNA from the cells, it is reverse transcribed into cDNA (complementary DNA) by the enzyme reverse transcriptase. The nucleotides incorporated are labelled with a different fluorescent dye for each sample (Fig 2B) [23].
Both samples are hybridized onto the same glass slide. cDNA sequences in each sample hybridize to the spots that contain their complementary sequence. The quantity of cDNA bound to a spot is proportional to the original number of RNA molecules transcribed from that gene in each sample. The spots on the hybridized microarray are excited by a laser and scanned at the appropriate wavelengths to detect the differently coloured dyes [23].
The quantity of bound nucleic acid determines the level of fluorescence emitted. For example, if the quantity of cDNA for a certain gene was greater from condition A than from condition B, the spot would show the colour indicative of A. The spot would be a combination of both colours if the level of expression was the same in both conditions, and would appear black if the gene was expressed in neither.
Figure 2: Principles of the working of Microarrays [23]
Processing of information from microarray experiments
The image formed is processed as follows:
1. The spots are identified and distinguished from false signals. Scanning the microarray produces an image, which is analysed to identify the spots. The spots are arranged into sub-array groups (Figure 2A).
2. The spot to be surveyed and its local region are determined in order to estimate background hybridization. A part of the sub-array is selected to determine the spot signal and to estimate the background intensity (Fig 3).
3. After the background intensity has been removed, summary statistics are reported and a spot intensity is allocated. For each spot in each channel (green and red), a range of summary statistics is reported. Every pixel inside the selected area is taken into account, and the mean, median, and total values are recorded for the spot and for the background (Fig 3) [23].
Fig 3 A microarray slide spot
The spot is represented by a blue circle and the background by a white box. Any pixel inside the blue circle is considered to be signal from the spot. Pixels within the white box but outside the blue circle are considered to be background signal. The images are not perfect and contain false signals due to scratches, dust particles, etc. [23].
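The summary statistics described in step 3 can be illustrated with a minimal Python sketch for one channel of one spot. The function name, the pixel values and the choice to subtract the background median from the spot median (floored at zero) are assumptions made for illustration; the source does not specify which background-correction formula is used.

```python
from statistics import mean, median


def summarise_spot(pixels, spot_mask):
    """Summarise one channel of one spot (hypothetical helper).

    pixels    : 2D list of pixel intensities for the selected area (the "white box")
    spot_mask : 2D list of booleans, True where a pixel lies inside the spot circle
    """
    spot, background = [], []
    for pixel_row, mask_row in zip(pixels, spot_mask):
        for value, inside in zip(pixel_row, mask_row):
            (spot if inside else background).append(value)
    return {
        "spot_mean": mean(spot),
        "spot_median": median(spot),
        "spot_total": sum(spot),
        "background_mean": mean(background),
        "background_median": median(background),
        # Assumption: background is removed by subtracting medians,
        # and negative results are set to zero.
        "corrected": max(median(spot) - median(background), 0),
    }


# Made-up 4 x 4 area with a bright 2 x 2 spot in the centre.
pixels = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 190, 220, 11],
    [12, 10, 11, 10],
]
spot_mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
summary = summarise_spot(pixels, spot_mask)
print(summary["spot_median"], summary["background_median"], summary["corrected"])
```

Medians are used for the corrected value here because they are less sensitive than means to isolated bright pixels such as dust particles; other choices are equally possible.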
Normalisation of microarray data
Transformation of expression ratios is a sensible way to find differentially expressed genes. When genes whose expression levels should remain constant in both conditions (e.g. housekeeping genes) are compared, their average expression ratio is commonly found to diverge from 1. Normalization is the process of eliminating systematic variations that affect gene expression measurements, so that data from the two samples can be compared. A set of genes whose expression levels should remain constant is studied. From that set, a normalization factor (a number that accounts for the variability seen in the gene set) is calculated and applied to the other genes. The normalization procedure changes the data and is performed only on the background-corrected values for each spot [23] (Fig 4).
Fig 4 Gene expression data before and after normalization. Before normalization, the image had several spots of different intensities, but after normalization only spots with large differences light up [23].
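As a rough illustration of the normalization factor described above, the following Python sketch takes the mean expression ratio of a set of control genes as the factor and divides every ratio by it. Using the mean (rather than, say, the median) is an assumption, as are the function names and the numerical values; ACTB and GAPDH are used only as familiar examples of housekeeping genes.

```python
from statistics import mean


def normalisation_factor(ratios, control_genes):
    """Mean expression ratio of genes expected to stay constant (assumed definition)."""
    return mean(ratios[gene] for gene in control_genes)


def normalise(ratios, control_genes):
    """Divide every background-corrected ratio by the control-gene factor."""
    factor = normalisation_factor(ratios, control_genes)
    return {gene: ratio / factor for gene, ratio in ratios.items()}


# Made-up ratios (condition A / condition B) after background correction.
ratios = {"ACTB": 2.0, "GAPDH": 2.0, "GENE_X": 8.0, "GENE_Y": 1.0}
normalised = normalise(ratios, control_genes=["ACTB", "GAPDH"])
print(normalised)
```

In this made-up example the control genes average a ratio of 2, which suggests a systematic two-fold bias towards condition A; after normalization the control genes sit at 1 and the remaining ratios are rescaled accordingly.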
Analysis of gene expression data
After normalisation, the processed data can be shown as a gene expression matrix. Each row in the matrix corresponds to a particular gene and each column to an experimental condition or a specific time point at which gene expression has been measured. The expression levels of a gene across the different experimental conditions form the gene expression profile, and the expression levels of all genes under one experimental condition form the sample expression profile. Additional levels of annotation can be added either to the gene or to the sample [23].
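The row and column view of the matrix can be sketched in a few lines of Python. The gene names, condition names and values below are invented placeholders, and the two helper functions are hypothetical.

```python
# Rows are genes, columns are experimental conditions (all values made up).
genes = ["gene_1", "gene_2", "gene_3"]
conditions = ["condition_A", "condition_B"]
matrix = [
    [1.0, 2.0],
    [0.5, 0.4],
    [3.0, 3.1],
]


def gene_profile(matrix, genes, gene):
    """Expression of one gene across all conditions (one row)."""
    return matrix[genes.index(gene)]


def sample_profile(matrix, conditions, condition):
    """Expression of all genes under one condition (one column)."""
    column = conditions.index(condition)
    return [row[column] for row in matrix]


print(gene_profile(matrix, genes, "gene_2"))
print(sample_profile(matrix, conditions, "condition_B"))
```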
Representation of gene expression data
Data in the gene expression matrix can be represented in 5 different ways:
Type of measurement: Description of each measurement
Absolute measurement: Each cell in the matrix will denote the expression level of the gene in abstract units.
Relative measurement or expression ratio: The expression level of a gene in abstract units is normalized with respect to its expression in a reference condition, which gives the expression ratio of the gene in relative units.
Log2 (expression ratio): Information on up-regulation and down-regulation is captured and is mapped in a symmetric manner in tables representing the log2 (expression ratio) values.
Discrete values: When converting the absolute measurement to discrete numbers, a binary expression matrix of 1 and 0 can be used (1 means that the gene is expressed above a user-defined threshold, and 0 means that the gene is expressed below it). Alternatively, the values can be divided into 3 classes: +1 (positively regulated gene), 0 (not differentially regulated) and –1 (repressed gene).
Representation of expression profiles as vectors: An expression profile of a gene or sample can be represented as a vector in space [23].
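The representations listed above can be compared on a small worked example. The Python sketch below is illustrative only: the function names are hypothetical, the two-fold threshold (a log2 ratio of 1) is an assumed user-defined cut-off, and Euclidean distance is just one of several ways to compare profiles once they are treated as vectors.

```python
from math import log2, sqrt


def expression_ratio(sample, reference):
    """Relative measurement: expression in the sample divided by the reference."""
    return sample / reference


def log2_ratio(sample, reference):
    """Log2 of the expression ratio; up- and down-regulation become symmetric."""
    return log2(sample / reference)


def discretise(log_ratio, threshold=1.0):
    """Three-class coding: +1 up-regulated, -1 repressed, 0 otherwise.

    The threshold of 1.0 (a two-fold change) is an assumed user-defined value.
    """
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


def euclidean(profile_p, profile_q):
    """Distance between two expression profiles treated as vectors."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(profile_p, profile_q)))


# A four-fold increase and a four-fold decrease (made-up abstract units).
print(expression_ratio(400, 100), log2_ratio(400, 100))  # 4.0 and 2.0
print(expression_ratio(100, 400), log2_ratio(100, 400))  # 0.25 and -2.0
print(discretise(log2_ratio(400, 100)), discretise(log2_ratio(100, 400)))  # 1 and -1
```

The example shows why the log2 form is described as symmetric: a four-fold increase and a four-fold decrease give ratios of 4 and 0.25, but log2 values of +2 and –2.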
Various clustering methods used in the analysis of microarray data
Clustering is an unsupervised approach used to make meaningful biological inferences by ordering data into groups of genes that share similar expression patterns characteristic of the group. Clustering methods can be hierarchical (grouping objects into clusters and specifying the relationships between them, resembling a phylogenetic tree) or non-hierarchical (grouping objects into clusters without specifying the relationships between them). The different clustering methods are shown in Table 2 below.
Table 2 Different clustering methods used in the analysis of microarray data
Clustering method | Information about method
Hierarchical clustering: agglomerative | Each object is considered a cluster. The objects are successively fused until all are included. Based on the pairwise distances between them, objects that are similar to each other are grouped into clusters. After this is done, pairwise distances between the clusters are re-calculated, and similar clusters are grouped together until all the objects are included in a single cluster.
Single linkage clustering | The distance between two clusters is calculated as the minimum distance between all possible pairs of objects.
Complete linkage clustering | The distance between two clusters is calculated as the maximum distance between all possible pairs of objects.
Average linkage clustering | The distance between two clusters is calculated as the average of distances between all possible pairs of objects in the two clusters.
Centroid linkage clustering | The distance is calculated in two steps. First, an average expression profile is calculated for each cluster by averaging all objects in the cluster in each dimension of the expression profiles. Second, the distance between the average expression profiles of the two clusters is calculated.
Hierarchical clustering: divisive | The entire set of objects is considered as a single cluster and is broken down into several clusters with similar expression profiles. Each cluster is considered separately and the divisive process is repeated until all objects have been separated into single objects.
Non-hierarchical clustering | This is an alternative to hierarchical clustering and requires predetermination of the number of clusters. Non-hierarchical clustering groups existing objects into predefined clusters rather than organizing them into a hierarchical structure [23].
Hierarchical clustering may be agglomerative or divisive. The principles behind agglomerative and divisive hierarchical clustering are shown below (Figure 5).
Fig 5 The principle behind agglomerative and divisive clustering. The colour code represents the log2 (expression ratio), where red represents up-regulation, green represents down-regulation, and black represents no change in expression. The matrix at the top is the product of agglomerative or divisive clustering, and genes A to E are given in the final order [23].
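The agglomerative procedure and the single, complete and average linkage rules from Table 2 can be sketched in plain Python. This is a deliberately naive illustration under stated assumptions: Euclidean distance is assumed as the pairwise measure, the function names are hypothetical, the five gene profiles are invented, and a real analysis would normally use an established statistics library rather than this brute-force loop.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def cluster_distance(cluster_a, cluster_b, profiles, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    pair_distances = [
        dist(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b
    ]
    if linkage == "single":
        return min(pair_distances)
    if linkage == "complete":
        return max(pair_distances)
    if linkage == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(profiles, linkage="average"):
    """Fuse the two closest clusters repeatedly; return the merges in order."""
    clusters = [(name,) for name in profiles]  # every object starts as a cluster
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], profiles, linkage),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Made-up log2 (expression ratio) profiles for five genes in two conditions.
profiles = {
    "A": (2.0, 2.1),
    "B": (2.2, 1.9),
    "C": (-1.0, -1.2),
    "D": (-1.1, -0.9),
    "E": (0.0, 0.1),
}
for left, right in agglomerate(profiles, linkage="average"):
    print(left, "+", right)
```

With these invented values the two up-regulated genes (A, B) fuse first and the two down-regulated genes (C, D) second, whichever of the three linkage rules is chosen; the order of merges is what a dendrogram would display.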
Operational issues regarding microarrays
The introduction of microarray experiments has produced several bioinformatics challenges, including the multiple levels of replication in experimental design, the number of platforms, the treatment of the data, independent groups and data formats, accuracy and precision, the high volume of data and the ability to share it. Microarray data can be difficult to share because of poor standardization in platform production, assay protocols, and methods of analysis, which leads to a lack of interoperability. Various projects are trying to ease the analysis and exchange of data produced with non-proprietary chips, including the “Minimum Information About a Microarray Experiment” (MIAME) checklist, which defines the level of detail that should be reported, and the “MicroArray Quality Control (MAQC) Project”. The FGED (Functional GEnomics Data) Society has produced standards for presenting the results and relevant annotations of gene expression experiments. Microarray data sets are usually large, and analytical precision is influenced by many variables. Statistical challenges include the normalization of the data, for which some methods are suited only to specific platforms, and the effects of background noise.
Advances in parallel sequencing have produced RNA-Seq technology, which enables a whole transcriptome approach to characterize and quantify gene expression, unlike microarrays which require a reference genome and transcriptome to be available prior to designing the microarray [23].
The applicability of microarrays in hematologic malignancies
Yeoh et al found leukaemia subtypes of prognostic significance in pediatric ALL, specifically T-cell acute lymphoblastic leukaemia, BCR-ABL, E2A-PBX1, TEL-AML1, mixed-lineage leukaemia rearrangement, and hyperdiploidy with more than 50 chromosomes [24]. Following reanalysis of the data, a class discriminator was created with a diagnostic accuracy of 97% [25], which holds promise for clinical practice. Although gene-expression profiling is aimed at better categorising diseases, it can also lead to the identification of new therapeutic targets. Armstrong et al [26] studied the impact of a small-molecule inhibitor of FLT3 and demonstrated that it halted tumor progression.
Gene-expression profiling of diffuse large B-cell lymphoma (DLBCL) has led to its separation into groups based on cellular origin [27]. In one group the gene expression was characteristic of germinal-center B cells (germinal center B-like DLBCL), while the other group expressed genes usually induced during in vitro activation of peripheral blood B cells (activated B-like DLBCL). The discovery that patients with germinal center B-like DLBCL have considerably better overall survival than those with activated B-like DLBCL allows stratification of patients for clinical management [28]. Validation of these results reveals that the BCL6 and HGAL (GCET2) genes are specifically expressed in germinal center B cells and predict overall survival [29], while other differentially expressed genes (for example, CD10) do not [30].
Lossos et al used reverse transcription polymerase chain reaction (RT-PCR) assays to assess a group of six genes that predicted prognosis in patients with DLBCL. This revealed that only LMO2 and BCL6 from the germinal-center B-cell signature and BCL2, CCND2, and CCL3 from the activated B-cell signature had any predictive value[31].
Limitations of microarray technology
The complexity of biological systems and data handling
Gene-expression profiling studies of human diseases are complicated and carry a high risk of error [32]. Since a typical experiment handles enormous quantities of data, it is essential that experiments are tightly controlled. The steps involved in microarray analysis and the limitations of the method are shown in Fig 6.
Financial implications
Microarray chips, including Affymetrix chips, glass slide arrays and membrane arrays, are typically expensive. These technologies are rapidly improving and the costs are decreasing, leading to wider access [33].
The tissue sample
Freezing of a sample may result in RNA degradation [34]. A cell may contain only very small quantities of fragile mRNA, which can degrade soon after extraction; this drastically affects the interpretation of the data and results in different apparent levels of gene expression [35,36]. Strict adherence to experimental technique, reliable timing, consistent sampling of tissue and repetition of the experiment using a single reference RNA are ways to reduce errors [37].
The search for differentially expressed genes and reproducibility of arrays
Normal gene expression varies from one individual to the next and could be misinterpreted as pathological. The steps involved in the search for differentially expressed genes include chip production, probe hybridization, image quantification, normalization and data interpretation [38]. Inter-experimental variability will remain a concern until agreement is reached on how to standardise every step [38]. Clinicians should be aware of inconsistent molecular signatures and gene misclassification.
The list of genes included in a molecular signature depends on the statistical methods and the selection of patients in the training sets[39]. Another concern is that results from different microarrays are expressed as levels in relation to a non-standardized reference RNA. The implementation of standardized units for the level of gene expression will be an important step towards regulation of microarray data[40].
An algorithm called MuFu (MixupFixup), developed to facilitate the recognition of production errors prior to hybridization, highlights the importance of tracking all levels of an experiment to allow evaluation and correction [32]. Tools such as LinCmb, GMC (Gaussian mixture clustering) and BPCA (Bayesian principal component analysis) have been developed to impute the missing values of an experiment and improve the significance of a study [41,42,43].
Statistical analysis
The link between a genetic signature and disease outcome has been shown to be stronger in preliminary studies than in subsequent ones [44]. This could be explained by ‘overfitting’, one of the key limitations of supervised methods. Overfitting occurs when the number of parameters in a model is too high compared with the number of specimens. The model will fit the original data but not independent data, because the gene-expression pattern has been optimized to predict tumor behavior in the original data. It is therefore vital to obtain an unbiased estimate of the true error rate of a gene-expression pattern's predictive power. Approaches for finding improved predictors include ‘leave-one-out’ cross-validation, in which a gene-expression predictor is built while excluding one or more samples, and the excluded samples are then used to test it [45].
It is essential to validate a predictive gene expression pattern in an adequately large independent group of patients. One drawback of unsupervised cluster analysis is that it does not offer statistically valid quantitative information about differences in expression level between genes or classes. Although unsupervised clustering gives insight into which genes are differentially expressed, it does not quantify the level of upregulated or downregulated expression. An unselected patient cohort is vital for the validation of prognostic or predictive gene-expression profiles [46].
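Leave-one-out cross-validation can be illustrated with a short Python sketch. The classifier used here, a nearest-centroid rule with Euclidean distance, is chosen only because it is simple; it is not claimed to be the method used in the cited studies. The function names, class labels and two-gene expression profiles are all invented for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def centroid(profiles):
    """Average expression profile of a group of samples."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def predict(training_set, query):
    """Assign the query profile to the class with the nearest centroid."""
    by_label = {}
    for profile, label in training_set:
        by_label.setdefault(label, []).append(profile)
    centroids = {label: centroid(group) for label, group in by_label.items()}
    return min(centroids, key=lambda label: dist(centroids[label], query))


def leave_one_out_error(samples):
    """Hold out each sample in turn, train on the rest, and count the mistakes."""
    errors = 0
    for index, (profile, label) in enumerate(samples):
        training_set = samples[:index] + samples[index + 1:]
        if predict(training_set, profile) != label:
            errors += 1
    return errors / len(samples)


# Made-up log2 ratios for two genes in six samples from two hypothetical classes.
samples = [
    ((2.0, -1.0), "class_1"),
    ((1.8, -0.8), "class_1"),
    ((2.2, -1.2), "class_1"),
    ((-1.0, 2.0), "class_2"),
    ((-0.8, 1.8), "class_2"),
    ((-1.2, 2.2), "class_2"),
]
print(leave_one_out_error(samples))
```

Because each sample is classified by a predictor that never saw it, the resulting error rate is a less optimistic estimate than the error measured on the data used to build the predictor.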
Fig 6 The steps and limitations when performing gene expression profiling using microarray analysis [47].
Conclusion and further developments
The application of gene expression profiling in classifying leukaemias has shown great promise. Bioinformatics analysis of gene expression profiles in ALL over the past several years has been a major advance in improving diagnosis and enhancing treatment success. Different studies show different approaches in analysing data obtained. Further studies are required to find potential targets for ALL diagnosis and treatment.
Essay: Acute lymphoblastic leukaemia
- Published: 4 May 2017*
- Last Modified: 23 July 2024
Essay Sauce, Acute lymphoblastic leukaemia. Available from:<https://www.essaysauce.com/health-essays/acute-lymphoblastic-leukaemia/> [Accessed 30-01-25].
* This essay may have been previously published on EssaySauce.com and/or Essay.uk.com at an earlier date than indicated.