Wireless Capsule Endoscopy

INTRODUCTION
Wireless Capsule Endoscopy (WCE)[1] is a novel medical procedure that has revolutionized gastrointestinal (GI) diagnostics by making painless and effective visual inspection of the entire length of the small bowel (SB) a reality. In recent years, the validity of SB WCE in clinical practice has been systematically reviewed[2]. This evidence base clearly shows that WCE is invaluable in evaluating various disorders, such as Crohn’s Disease (CD) and mucosal ulcers. CD is a chronic disorder of the GI tract (GT) that may affect the deepest layers of the intestinal wall. In 45% of cases, CD lesions are located in the small intestine. One of the main characteristics of inflammatory bowel diseases, such as CD, is the evolution of extended internal inflammation into ulcers, or open sores, in the GT. CD is not lethal by itself, but the risk of serious complications is high, rendering early diagnosis and treatment essential.
Despite the great advantages of WCE and the revolution it has brought, there are challenging issues to deal with. A WCE system produces more than 55,000 images per examination, which are reviewed as a video; examining them requires more than one hour of intense labor by the expert[3]. This time-consuming task is a burden, since the clinician has to stay focused and undistracted in front of a monitor for such a long period. Moreover, there is no guarantee that all findings will be detected: it is not rare for abnormal findings to be visible in only one or two frames and thus easily missed by the physician. Automatic inspection and analysis of WCE images is therefore urgently needed, in order to reduce the clinician’s workload and minimize the possibility of missing a lesion due to lapses in concentration. Motivated by this, a number of automatic GI content interpretation methods have been proposed in the literature (see Section “Related Work”).
In this work, we introduce a novel WCE image analysis system for the recognition of lesions created by mucosal inflammation in CD. The main contributions of this paper are: i) extending the relatively limited research on CD lesion and ulcer detection; ii) developing the novel Hybrid Adaptive Filtering (HAF) for efficiently isolating the lesion-related WCE image characteristics, by applying Genetic Algorithms (GA)[4] to the representation of the WCE images in the Curvelet Transform[5] domain; and iii) examining the performance of the proposed approach, namely HAF-DLac, with respect to lesion severity. Additionally, this work extends the Differential Lacunarity (DLac)-based feature vector presented in[6] and further examines the potential of the YCbCr space for efficient lesion detection.
RELATED WORK
In the recent literature, the principal research interest in reducing the examination time of WCE data (more than 75% of the WCE-related published works[7]) concerns the detection of certain disorders of the internal mucous membrane. The major types of pathologies targeted are polyps, bleeding, ulcerations, celiac disease, and CD. As far as inflammatory tissue (i.e., ulcer and CD lesion) detection is concerned, only a small proportion of research efforts (7% for ulcers and 2% for CD[7]) target this direction, in spite of the great importance and high prevalence of such disorders. Detecting this kind of eroded tissue is very challenging, since it is characterized by huge diversity in appearance. For ulcer detection, a feature vector consisting of curvelet-based rotation invariant uniform local binary patterns (riuLBP) classified by a multilayer perceptron was proposed[8]. The detection rates are encouraging, but performance is affected by the downsides of LBP. Although riuLBP are robust to illumination variations, they rest on the assumption that the local differences between the central pixel and its neighbors are independent of the central pixel itself, which does not always hold, as the value of the central pixel may also be significant. Moreover, they lack between-scale texture information, which is highly important for medical image analysis. In[9], the authors present a segmentation scheme that uses log Gabor filters, color texture features, and a support vector machine (SVM) classifier in the Hue-Saturation-Value (HSV) space. The classification results are promising, but the dataset is rather limited (50 images) and includes perforated ulcerations, which are quite easily detected due to their clear appearance. Additionally, the HSV model suffers from some of the same shortcomings as the RGB model[10].
The authors in[11] propose bag-of-words-based local texture features (LBP and scale-invariant feature transform, SIFT) extracted in RGB space with an SVM classifier, whereas in[12], a saliency map is used along with contour and LBP data. Both approaches are affected by the weaknesses of LBP and SIFT features, which are of narrow use, since such features are often limited to relatively small regions of interest, are susceptible to noise, and yield insufficient sensitivity. In the same direction, the works[6,13,14] investigate the potential of Empirical Mode Decomposition-based structural features extracted from various color spaces, introduce texture features based on color rotation, and perform preliminary research on Curvelet-based lacunarity texture features. The reported results are promising, but the datasets used for validation are rather small. To the best of our knowledge, the main research efforts reported in the literature dealing with the detection of CD lesions are[15-17]. In[15,16], color histogram statistics, MPEG-7 features, Haralick features and a Mean-Shift algorithm are used, whereas in[17] a fusion of MPEG-7 descriptors and SVM classifiers is employed. Even though the classification performance is promising, MPEG-7 descriptors were not designed to describe medical images, so applying them to medical image analysis raises several problems. They were developed for multimedia content description; hence, in the case of images, they describe the overall image content and do not allow efficient characterization of local properties or arbitrarily shaped regions of interest. Besides, they compute descriptors within relatively large rectangular regions, which are inadequate for describing local medical image properties.
Aside from schemes developed to detect a single abnormality, there are efforts towards broad frameworks that detect multiple abnormalities, such as blood, erythema, polyps, ulcers and villous edema[18-24]. Nevertheless, none of them deals with the less straightforward lesions created by CD inflammation. Detecting multiple abnormalities is undoubtedly important for an overall computer-assisted diagnosis tool, but it is crucial that all abnormalities are detected equally well. This is extremely challenging and is not achieved by any of the aforementioned techniques, whose ulcer detection results are rather low. There is no one-size-fits-all approach, particularly for CD lesions and ulcerations, which exhibit huge diversity in appearance, with attributes (color, texture, size) varying significantly with severity and position.
From a methodological validation point of view, a serious limitation of the preceding efforts is the use of quite small databases (<250 images instead of >500[7]), which are often unbalanced and not described in detail (severity, inclusion of confusing tissue). Moreover, the inclusion of multiple, highly similar instances of the same lesion taken from the very same region of the GT is a possible source of overfitting and artificially optimistic results. Last but not least, none of the published approaches validates performance against lesion severity, apart from[17], where lesion severity classification takes place.
MATHEMATICAL BACKGROUND
Curvelet Transform
The Curvelet space is the core of the HAF section of the proposed scheme, which aims to highlight the structural and textural characteristics of CD lesions and facilitate feature extraction. Among the most popular methods for characterizing the textural appearance of surfaces are wavelets and curvelets, both of which offer multi-resolution analysis. The rationale for applying multi-resolution analysis to WCE images is that CD lesions vary greatly in appearance in terms of scale, shape, size, illumination and orientation. Additionally, the images contain a significant amount of background variation. Consequently, a robust tool is needed that can capture structural/textural data at various scales and directions.
Wavelets have been commonly used for multi-resolution two-dimensional (2D) signal analysis. The power of the wavelet transform (WT) rests on its ability to capture point singularities of piecewise smooth functions in one dimension (1D). Unfortunately, this does not carry over to two dimensions. In essence, 2D piecewise smooth signals, such as images, exhibit 1D singularities (edges) that cannot be efficiently described by wavelets: edges separate smooth regions, and while they are discontinuous across the edge, they are typically smooth curves along it. In the 2D case, wavelets are produced by a tensor product of 1D wavelets and, thus, are good at describing discontinuities at edge points, but cannot capture the smoothness along edges. In other words, the 2D WT has limited directional selectivity: it isolates only horizontal, vertical and diagonal structures in an image. Such directional selectivity is not sufficient to describe medical images.
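To make this directional limitation concrete, the following Python/NumPy sketch implements a single level of the separable 2D Haar wavelet transform (chosen only because it is the simplest wavelet; the document does not prescribe one). A vertical step edge ends up entirely in one of the three detail sub-bands, which are the only orientations a separable 2D wavelet resolves at each scale.

```python
import numpy as np


def haar_dwt2(img):
    """One level of the orthonormal, separable 2D Haar transform.

    `img` must have even height and width. Returns the approximation and the
    three detail sub-bands, named by the edge orientation they respond to.
    """
    x = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    # 1D Haar along each row (pairs of adjacent columns)
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # 1D Haar along each column (pairs of adjacent rows)
    approx = (lo[0::2] + lo[1::2]) / s
    horizontal = (lo[0::2] - lo[1::2]) / s  # changes between rows: horizontal edges
    vertical = (hi[0::2] + hi[1::2]) / s    # changes between columns: vertical edges
    diagonal = (hi[0::2] - hi[1::2]) / s    # changes in both directions
    return approx, {"horizontal": horizontal, "vertical": vertical, "diagonal": diagonal}


# A vertical step edge falling inside a column pair (between columns 2 and 3)
img = np.zeros((8, 8))
img[:, 3:] = 1.0
approx, details = haar_dwt2(img)
for name, band in details.items():
    print(name, float(np.sum(band ** 2)))
```

A diagonal or curved edge would instead have its energy spread over these three fixed orientations, with no element tuned to its actual direction, which is the gap curvelets are designed to fill.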
In an attempt to overcome this traditional weakness of the WT, Candès et al. introduced the curvelet transform (CT)[25]. Its key concept is to represent a curve as a superposition of functions of various lengths and widths obeying a specific scaling law (parabolic scaling: width ≈ length²)[8]. The continuous CT is defined by a radial window $W(r)$ and an angular window $V(\theta)$, both smooth, nonnegative and real-valued. Let $U_j$ be the Fourier transform of a function $\varphi_j(x)$; then $\varphi_j(x)$ can be regarded as a “mother” curvelet, in the sense that all curvelets at scale $2^{-j}$, orientation $\theta_l$ and position $x_k^{(j,l)}$ are obtained by rotations, scalings and translations of $\varphi_j$. A curvelet coefficient is then defined as the inner product $c(j,l,k)=\langle f,\varphi_{j,l,k}\rangle$ between an element $f\in L^2(\mathbb{R}^2)$ and a curvelet $\varphi_{j,l,k}(x)=\varphi_j\big(R_{\theta_l}(x-x_k^{(j,l)})\big)$ at scale $2^{-j}$, orientation $\theta_l$ and position $x_k^{(j,l)}=R_{\theta_l}^{-1}\big(k_1\cdot 2^{-j},\,k_2\cdot 2^{-j/2}\big)$, where $R_\theta$ is the rotation matrix by $\theta$ radians. The needle-shaped elements of the CT exhibit high directional sensitivity; hence, they depict singularities along curves more efficiently than the traditional WT and provide better texture discrimination ability than their wavelet counterparts[8]. The continuous CT can be extended to the digital domain via either the unequispaced Fast Fourier Transform (FFT) or wrapping. Both techniques have the same complexity; however, the wrapping algorithm is somewhat simpler and, thus, more popular[8].
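The curvelet position formula can be checked numerically. The Python/NumPy sketch below (the helper names are hypothetical) evaluates $x_k^{(j,l)}=R_{\theta_l}^{-1}(k_1 2^{-j}, k_2 2^{-j/2})$ and prints the two grid spacings: the fine spacing $2^{-j}$ is the square of the coarse spacing $2^{-j/2}$, so the elements become increasingly needle-like as j grows. It only computes positions; it is not a curvelet transform.

```python
import numpy as np


def rotation(theta):
    """Counter-clockwise rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def curvelet_center(j, theta, k1, k2):
    """Position x_k^(j,l) = R_theta^{-1} (k1 * 2^-j, k2 * 2^(-j/2))."""
    grid_point = np.array([k1 * 2.0 ** (-j), k2 * 2.0 ** (-j / 2)])
    # R_theta is orthogonal, so its inverse is its transpose
    return rotation(theta).T @ grid_point


for j in (2, 4, 6):
    fine, coarse = 2.0 ** (-j), 2.0 ** (-j / 2)
    print(f"j={j}: spacing {fine:.4f} x {coarse:.4f}, aspect ratio {coarse / fine:.1f}")

print(curvelet_center(4, np.pi / 2, 1, 1))  # grid point rotated by -90 degrees
```

The aspect ratio $2^{j/2}$ grows with j, which is the parabolic scaling that gives curvelets their directional sensitivity.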
DLac Analysis
DLac is a robust tool for multi-scale and translation invariant texture analysis, capable of revealing slight or sharp changes in neighboring pixels without directional limitation, which is necessary in the case of WCE data. DLac has been used in pattern discrimination problems in several scientific fields[26-28].
Lacunarity
Lacunarity (Lac), derived from the Latin word lacuna meaning “gap”, was introduced by Mandelbrot[29] as a means to discriminate textures and natural surfaces that share the same fractal dimension but vary significantly in visual appearance. Fractal dimension does not fully describe the space-filling characteristics of data, since it measures how much space is filled. Lac is a counterpart to fractal dimension that describes the texture of a fractal by measuring how data fill space; it has therefore been used as a more general technique to characterize patterns of spatial dispersion[30]. More specifically, Lac analysis evaluates the size and distribution of gaps or holes in data sets at multiple scales: the more gaps a set contains, and the broader the range of their sizes, the higher its Lac value. Beyond being an intuitive measure of “gappiness”, Lac analysis can quantify additional features of patterns, such as translational and rotational invariance and, more generally, heterogeneity. Gefen et al.[31] defined Lac as the deviation of a fractal from translational invariance. Sets with a non-uniform distribution of gaps can be considered heterogeneous and exhibit higher Lac than almost translationally invariant (homogeneous) sets. However, translational invariance is highly scale-dependent: sets that are homogeneous at small scales can be quite heterogeneous when examined at larger scales, and vice versa. Owing to its inherent characteristics, Lac can deal with this situation. From this perspective, Lac analysis can be considered a scale-dependent measure of the texture of an object[29,30]. A number of methods have been presented to calculate Lac, but the most popular algorithms are founded on the intuitively clear and simple Gliding Box Algorithm (GBA)[32]. The GBA operates on 1-D binary data; Plotnick et al.[30], however, extended the concept of Lac to real-valued datasets by applying thresholding.
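For binary data, the GBA reduces to a few lines: slide a box of size r over the data one sample at a time, record the box mass M (number of occupied sites) at every position, and compute $\Lambda(r)=E[M^2]/E[M]^2$, the moment ratio that eq. (1) later generalizes. The Python/NumPy sketch below is a minimal 1-D illustration, not the authors’ code; the two toy sequences are made up to show that clustered occupied sites (one large gap) give a higher Lac than evenly spread ones at the same density.

```python
import numpy as np


def gliding_box_lacunarity(data, r):
    """1-D gliding-box lacunarity of a binary sequence for box size r."""
    x = np.asarray(data, dtype=float)
    # Box mass at every (overlapping) box position, sliding one sample at a time
    masses = np.convolve(x, np.ones(r), mode="valid")
    return float(np.mean(masses ** 2) / np.mean(masses) ** 2)


clustered = [1, 1, 0, 0, 0, 0, 0, 0]
spread = [1, 0, 0, 0, 1, 0, 0, 0]
print(gliding_box_lacunarity(clustered, 2), gliding_box_lacunarity(spread, 2))
```

A fully occupied sequence gives Λ = 1 at every box size, the homogeneous limit; an all-zero sequence is undefined (division by zero).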
Differential Lacunarity
To process grayscale images with Lac, the most straightforward approach is to extend the original GBA to 2D and convert the grayscale images to binary through thresholding. Nonetheless, in many scientific fields, and especially in medical imaging, such thresholding discards valuable information and cannot always be performed. To address this shortcoming, Dong[33] proposed a new version of Lac, i.e., DLac, appropriate for grayscale image analysis. The calculation of DLac is based on a “Differential Box Counting” method that uses a gliding box R (r × r pixels) and a gliding window W (w × w pixels, with r < w). W scans the image, while R scans W. Both W and R move in an overlapping pattern, sliding one pixel at a time. R is used to calculate the “box mass” M of the window at every position. If Q(M,w,r) is the probability function of the distribution of M across the image, the DLac of the image at scales w, r is defined as[33]
$$\Lambda(w,r)=\frac{\sum_{M} M^{2}\,Q(M,w,r)}{\Big[\sum_{M} M\,Q(M,w,r)\Big]^{2}}. \tag{1}$$
It is common practice to calculate DLac for a variety of scales (see Section “DLac-Based Feature Vector”), forming a DLac–w curve. This curve is the multi-scale description of texture and characterises the specific space-filling pattern.
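The following Python/NumPy sketch computes eq. (1) for one window size and then the DLac-w curve. Because Q(M,w,r) is the empirical distribution of the window masses, eq. (1) reduces to $E[M^2]/E[M]^2$ over all window positions. The text does not spell out how the box mass is obtained, so the sketch assumes a differential-box-counting rule: each r × r box contributes the number of r-level intensity bins spanned between its minimum and maximum grey value, and M is the sum of these heights over all boxes inside the window. Treat that rule, the helper names and the default w range (r+1 up to the smaller image side, as in Section “Parameter Setting, HAF Realization and FV Construction”) as assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_heights(img, r):
    """Assumed DBC rule: number of height-r intensity bins spanned by each r x r box."""
    boxes = sliding_window_view(img, (r, r))  # every box position, stride 1
    top = np.floor(boxes.max(axis=(-2, -1)) / r)
    bottom = np.floor(boxes.min(axis=(-2, -1)) / r)
    return top - bottom + 1


def dlac(img, w, r=3):
    """Differential lacunarity, eq. (1), for window size w and box size r."""
    img = np.asarray(img, dtype=float)
    if not r < w <= min(img.shape):
        raise ValueError("need r < w <= smaller image side")
    heights = box_heights(img, r)
    # A w x w window holds (w - r + 1)^2 box positions; its mass M is their sum
    k = w - r + 1
    masses = sliding_window_view(heights, (k, k)).sum(axis=(-2, -1)).ravel()
    return float(np.mean(masses ** 2) / np.mean(masses) ** 2)


def dlac_curve(img, r=3, w_max=None):
    """Lambda(w) for w = r+1, ..., w_max (default: smaller image side)."""
    img = np.asarray(img, dtype=float)
    if w_max is None:
        w_max = min(img.shape)
    ws = np.arange(r + 1, w_max + 1)
    return ws, np.array([dlac(img, int(w), r) for w in ws])


rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(32, 32))
ws, lam = dlac_curve(noise)
print(ws[:3], np.round(lam[:3], 4))
```

By the Cauchy-Schwarz inequality Λ ≥ 1, with equality when every window has the same mass, e.g., for a constant image.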
THE PROPOSED HAF-DLAC APPROACH
Overview
The objective of the HAF-DLac scheme is as follows: given a region of interest (ROI) I within a WCE image, identify whether I corresponds to normal tissue or a CD lesion. The overall structure of the HAF-DLac scheme, along with a working example, is depicted in Figure 1. After a pre-processing stage, in which the RGB image is converted to YCbCr space and the chromatic channels are extracted, the WCE image is fed to the HAF section of the HAF-DLac scheme. YCbCr space was selected because it separates color (chrominance) from brightness (luminance) information and overcomes the disadvantage of high correlation between the RGB channels[34]. The role of HAF is to isolate the CD lesion-related WCE image characteristics, facilitating the feature vector extraction that follows. To achieve this, HAF incorporates a GA that acts upon the representation of WCE images in the Curvelet space. In the latter, the image is decomposed into a series of Curvelet-based sub-images at various scales and orientations. The GA then uses an energy-based or a Lacunarity curve gradient-based fitness function to select the optimum sub-images, i.e., those most closely related to the CD lesion characteristics. The HAF output consists of the selected sub-images, which can either be combined through a reconstruction process to produce a reconstructed image (R-case) or be used directly with no reconstruction (NR-case). In both scenarios, the HAF output is used as input to the DLac section of the HAF-DLac scheme. There, DLac-based analysis is performed, resulting in efficient extraction of the feature vector $\mathrm{FV}^{\mathrm{DLac}}$ for the R- and NR-case ($\mathrm{FV}^{\mathrm{R}}$ and $\mathrm{FV}^{\mathrm{NR}}$, respectively). This feature vector is then forwarded to SVM-based classification.
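The colour-space step can be sketched as below. The paper does not say which RGB-to-YCbCr variant was used; this Python/NumPy sketch assumes the full-range ITU-R BT.601 (JPEG/JFIF) equations for 8-bit images, so the exact coefficients and offsets are an assumption. Grey pixels (R = G = B) map to Cb = Cr = 128, which illustrates how chrominance is separated from luminance.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JFIF) conversion of an (..., 3) 8-bit RGB array (assumed variant)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


frame = np.array([[[255, 255, 255], [90, 90, 90]],
                  [[200, 60, 40], [0, 0, 0]]], dtype=np.uint8)
ycbcr = rgb_to_ycbcr(frame)
y_ch, cb_ch, cr_ch = ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]  # channels passed on to HAF
print(ycbcr.shape)
```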
Hybrid Adaptive Filtering (HAF)
To follow the WCE image characteristics and focus on those most related to CD lesion information, a HAF approach was developed. As the term “hybrid” suggests, HAF combines two processing tools, i.e., the CT and a simple GA optimization concept, so as to construct a filtering process adapted to specific characteristics of the filtered signal. The CT qualifies as a filter bank, since it decomposes an image into sub-images at various scales and orientations that can be interpreted as a pseudo-spectral-spatial representation[35]. To exploit this capability of the CT, a new GA-based approach was introduced for the optimized selection of sub-images that correspond to specific features of an image. The concept of decomposing a WCE image in the curvelet domain and selecting specific informative sub-images was inspired by a previous preliminary study[6], which showed that sub-images at certain scales and angles exhibit high discrimination capability. One of the most important modules of this filtering procedure is the fitness function (FF) of the GA, as it is the pivotal criterion according to which the filtering is implemented. Energy-based (EFF) and Lacunarity curve gradient-based (LFF) FFs were employed in this approach.
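The paper describes the GA only as “a simple GA optimization concept”, so the Python sketch below is a generic, hypothetical bit-string GA (binary tournament selection, one-point crossover, bit-flip mutation, elitism), not the authors’ configuration. Each chromosome S has one bit per curvelet sub-image (1 = keep), and `fitness` would be the EFF or LFF described below; all parameter values are illustrative.

```python
import random


def genetic_select(fitness, n_bits, pop_size=30, generations=50,
                   p_crossover=0.8, p_mutation=None, seed=0):
    """Maximize `fitness` over 0/1 strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    if p_mutation is None:
        p_mutation = 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    scores = [fitness(s) for s in pop]
    history = [max(scores)]

    def tournament():
        # Binary tournament on the current population (read at call time)
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    for _ in range(generations):
        elite = pop[max(range(pop_size), key=scores.__getitem__)]
        children = [elite[:]]  # elitism: the best string survives unchanged
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            children.append(child)
        pop = children
        scores = [fitness(s) for s in pop]
        history.append(max(scores))

    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], history


# Hypothetical toy fitness: reward selecting many sub-images (stand-in for EFF/LFF)
best, score, history = genetic_select(sum, n_bits=16, seed=1)
print(best, score)
```

Elitism makes the best fitness non-decreasing across generations. A fitness that is meant to be minimized, such as the EFF, would be passed negated, and some guard against the empty selection would be needed; the text does not state how this is handled.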
Energy-Based Fitness Function (EFF)
The aim of using the EFF was to perform filtering by selecting the sub-images that carry the smallest share of the image energy. Based on the results of[6], we observed that the sub-images with better performance exhibited lower mean energy than those that achieved worse results. This may be explained by the fact that sub-images with high mean energy contain abrupt and steep structures that do not convey valuable information about the texture of normal and eroded mucosa. In contrast, the low-energy sub-images are free from misleading content and are more likely to contain CD lesion-related information. This is illustrated in Figure 2A, which shows the decomposition of an ulcer image (Y channel) at scale 3 and 8 angles. The sub-images at angles 1, 4, 5 and 8, which contain less mean energy than the rest (Figure 2B), are more likely to exhibit informative texture content, since they display an apparently informative distribution of non-zero pixels and do not contain sharp changes. Conversely, the sub-images at angles 2, 3, 6 and 7 exhibit intense variations at their borders, highlighted by their greater intensity range, which may conceal the delicate textural patterns and hinder efficient feature extraction. The formula used for the EFF is:
$$f(S)=\frac{\sum_{\{r\,:\,S_r=1\}} E\{c_r^{2}\}}{\sum_{i=1}^{M} E\{c_i^{2}\}}, \tag{2}$$
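where S is the binary string (chromosome) of 1s and 0s, $\{r : S_r=1\}$ is the set of indices of the elements of S with value 1, $c_i$ represents the sub-image at angle i, $E\{\cdot\}$ denotes the mean over the sub-image pixels, and M is the total number of sub-images.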
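A direct Python/NumPy transcription of eq. (2) follows, assuming E{·} is the mean over the coefficients of a sub-image and taking |c|² so that complex-valued curvelet coefficients are also handled (an assumption; the text does not say whether a real-valued transform was used). The sub-images below are synthetic placeholders.

```python
import numpy as np


def energy_fitness(subimages, mask):
    """Eq. (2): share of total mean energy carried by the selected sub-images."""
    energies = np.array([np.mean(np.abs(np.asarray(c)) ** 2) for c in subimages])
    selected = np.asarray(mask, dtype=bool)  # one bit per sub-image
    return float(energies[selected].sum() / energies.sum())


# Synthetic sub-images of different sizes, with mean energies 1 and 4
subs = [np.ones((4, 4)), 2.0 * np.ones((2, 2))]
print(energy_fitness(subs, [1, 0]))  # selecting only the low-energy one: 0.2
```

Because low-energy selections are preferred, a GA would minimize this value (or maximize its negation).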
Lacunarity Curve Gradient-Based Fitness Function (LFF)
Apart from the EFF, a second approach was attempted, using the gradient of the estimated DLac-w curve in the FF. As mentioned before, DLac is a measure of heterogeneity and a scale-dependent measure for characterizing and discriminating textures and patterns[30]. Hence, an image with uniform patterns yields lower DLac values than an image with arbitrary and irregular patterns. The DLac analysis scale is determined by the size w of the gliding window (see Section “DLac-Based Feature Vector”). The DLac-w curve can be considered a multi-scale description of structural patterns, and its gradient can reveal the existence of specific textures and structures. For example, if an image contains microstructures with moderate differentiation across a variety of observation scales, the gradient of its DLac-w curve is expected to be lower than that of an image with more abrupt and irregular structures, which are described rather differently from various scale perspectives.
The aim of using the LFF was to capture the variations in structural and textural characteristics of WCE images. Images with a small DLac-w curve gradient may correspond to structures of normal and eroded mucosa, while DLac-w curves with a steeper slope may account for distracting content. Thus, given the capability of the DLac-w curve to indicate the presence of valuable or meaningless content, DLac-based filtering would boost the CD lesion-related information contained in the initial WCE images. This concept is supported by observations based on the results of[6], where the sub-images that provided better performance exhibited DLac-w curve gradients different from those that yielded worse results. Figure 3 depicts the boxplot of the local gradient of the DLac-w curves versus the analysis scale for efficient (black) and non-efficient (gray) sub-images in the Curvelet domain, computed from 30 randomly selected WCE images depicting CD lesions or normal tissue. Figure 3 shows that non-efficient sub-images tend to exhibit a higher slope at smaller scales and a lower slope at larger scales. The gradient of the DLac curve at scale i is calculated as the difference $Gr(i)=\Lambda(i+1)-\Lambda(i-1)$. The formula used for the LFF is
$$f(S)=\frac{Gr(4)+Gr(5)}{\sum_{i=6}^{13} Gr(i)}, \tag{3}$$
since the gradient at the first two scales considered (i = 4, 5) has to be high and the gradient at the remaining scales has to be low (based on Figure 3).
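Eq. (3) can be transcribed as below (Python). The text does not state whether the index i in Gr(i) counts curve samples or window sizes; this sketch assumes i is a 1-based position along the DLac-w curve, so the curve needs at least 14 samples. With the toy curve Λ(i) = i², Gr(i) = 4i and the fitness equals 36/304.

```python
def lacunarity_gradient_fitness(lam):
    """Eq. (3): [Gr(4) + Gr(5)] / sum_{i=6}^{13} Gr(i), with Gr(i) = L(i+1) - L(i-1).

    `lam` holds the DLac-w curve; L(i) is taken to be lam[i - 1] (1-based, assumed).
    """
    if len(lam) < 14:
        raise ValueError("need at least 14 curve samples")

    def gr(i):
        return lam[i] - lam[i - 2]  # L(i+1) - L(i-1) with 1-based L

    return (gr(4) + gr(5)) / sum(gr(i) for i in range(6, 14))


toy = [i * i for i in range(1, 15)]  # L(i) = i^2
print(lacunarity_gradient_fitness(toy))
```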
DLac-Based Feature Vector
The second part of the proposed HAF-DLac scheme is the DLac analysis, which aims to efficiently extract $\mathrm{FV}^{\mathrm{DLac}}$. As noted before, CD lesions exhibit widely diverse appearance; thus, a robust tool capable of multi-scale, translation invariant texture analysis is required. DLac is such a tool: it is simple to compute and precise, and it has previously been used successfully for WCE image analysis[6,14]. The rationale for using DLac[33] is its capability of revealing either sharp or slight changes in neighboring pixels (which characterize CD lesion texture), since, unlike the very common feature extraction tool riuLBP, it does not apply thresholding, which conceals the magnitude of changes. The downsides of riuLBP and other feature extraction approaches are presented in Section “Related Work”. Moreover, DLac is tolerant to: i) non-uniform illumination (very common in WCE images), due to the differential calculation, and ii) rotation, since the pixel arrangement within the gliding box is irrelevant. In general, DLac surpasses both the simple statistical texture approaches (e.g., Haralick features, co-occurrence matrix, etc.) and the more advanced structural ones (such as riuLBP, textons, texture spectrum, etc.), because it is based neither on plain single-scale statistical analysis of the raw pixel intensities nor on predefined structural patterns. Instead, it relies on the statistical analysis of pseudo-patterns (box mass) defined by the data itself, at multiple scales, while providing between-scale information. For these reasons, DLac is expected to produce powerful features from the HAF-enhanced WCE images and achieve advanced classification results, as evidenced by the experimental results.
To exploit the multi-scale analysis advantage of DLac, the value $\Lambda(w,r)$ (see (1)) is not calculated for a single set of parameters. In this work, we calculate $\Lambda$ versus w, with r kept constant, even though initial approaches suggested the opposite[33]. This technique[27,36] is adopted because w is the primary parameter affecting the scale of the analysis, since it determines the size of the image region over which the box mass is calculated. According to[30,36], the larger the area over which the box mass is calculated, the coarser the scale of the Lac analysis becomes. In contrast, r affects the scale of DLac analysis only to a certain degree, by determining the size of the neighborhood over which the differential height is calculated and, consequently, the sensitivity to intensity variations. Thus, our approach is able to identify slight variations in neighboring pixels (by selecting a small value for r) and to analyze structural patterns at different scales. Moreover, the $\Lambda(w)$ curve is normalized by its value at $w_{\min}$, i.e., $\Lambda^N(w)=\Lambda(w)/\Lambda(w_{\min})$, in order to secure an identical reference level and extract more efficient information[37].
The decay of $\Lambda^N(w)$ as a function of window size follows characteristic patterns for random, self-similar and structured spatial arrangements, and lacunarity functions can provide a framework for identifying such differences. Thus, the $\Lambda^N(w)$ curve itself may form the FV. To reduce the dimension of the feature space, $\Lambda^N(w)$ can instead be modelled by another function $L(w)$. The normalized DLac-w curves resemble a hyperbola; on these grounds, the function
$$L(w)=\frac{b}{w^{a}}+c,\quad w\in[w_{\min},w_{\max}], \tag{4}$$
was chosen to model the $\Lambda^N(w)$ curves[37]. Parameter a describes the convergence of $L(w)$, b represents the concavity of the hyperbola and c is the translational term. The best fit of $\Lambda^N(w)$ by the model $L(w)$ is computed as the solution of a least squares problem in which a, b and c are the unknowns[38]. Parameters a, b, c embody the global behavior of the $\Lambda^N(w)$ curve, i.e., the DLac-based texture features of a WCE image. Another way to reduce the feature space dimension established by the DLac curve is to use six common statistical measures calculated on the $\Lambda^N(w)$ curve[8,39]: mean ($\mathrm{MN}=E[X]$), standard deviation ($\mathrm{STD}=(E[(X-\mu)^2])^{1/2}$), entropy ($\mathrm{ENT}=-\sum_i p_i\log p_i$), energy ($\mathrm{ENG}=E[X^2]$), skewness ($\gamma_3=E[((X-\mu)/\sigma)^3]$, a measure of the asymmetry of the probability distribution) and kurtosis ($\gamma_4=E[((X-\mu)/\sigma)^4]$, a descriptor of the shape of the probability distribution), where $p_i$ is the probability of value $x_i$ and X is a random variable with mean $\mu$ and standard deviation $\sigma$.
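The least-squares fit of eq. (4) can be sketched without a nonlinear solver: for a fixed exponent a the model is linear in b and c, so one can solve a linear least-squares problem for every a on a grid and keep the best. This separable grid search (Python/NumPy) is a stand-in chosen for simplicity, not necessarily the solver used in [38]; the grid range and resolution are assumptions, and the synthetic curve below is made up.

```python
import numpy as np


def fit_hyperbola(w, lam_n, a_grid=None):
    """Fit L(w) = b / w**a + c to a DLac-w curve; returns (a, b, c)."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(lam_n, dtype=float)
    if a_grid is None:
        a_grid = np.arange(1, 1001) / 200.0  # a = 0.005, 0.010, ..., 5.0
    best = None
    for a in a_grid:
        design = np.column_stack([w ** (-a), np.ones_like(w)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # (b, c) for this a
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(a), float(coef[0]), float(coef[1]))
    return best[1:]


w = np.arange(4, 40)
lam_n = 2.0 / w ** 1.5 + 0.3  # synthetic curve with a = 1.5, b = 2, c = 0.3
a, b, c = fit_hyperbola(w, lam_n)
print(round(a, 3), round(b, 3), round(c, 3))
```

The resolution in a is limited by the grid; a finer grid or a local refinement around the best a would tighten it at extra cost.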
In order to draw more conclusive results about the efficiency of DLac-based FV, five different types of FVs are constructed:
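$$\mathrm{FV}_1^{\mathrm{DLac}}=[\Lambda^N(w_{\min}+1),\ldots,\Lambda^N(w_{\min}+5)], \tag{5}$$
$$\mathrm{FV}_2^{\mathrm{DLac}}=[a,b,c], \tag{6}$$
$$\mathrm{FV}_3^{\mathrm{DLac}}=[a,b,c,\Lambda^N(w_{\min}+1),\Lambda^N(w_{\min}+2),\Lambda^N(w_{\min}+3)], \tag{7}$$
$$\mathrm{FV}_4^{\mathrm{DLac}}=[\mathrm{MN},\mathrm{STD},\mathrm{ENT},\mathrm{ENG},\gamma_3,\gamma_4], \tag{8}$$
$$\mathrm{FV}_5^{\mathrm{DLac}}=[\mathrm{FV}_3^{\mathrm{DLac}},\mathrm{FV}_4^{\mathrm{DLac}}]. \tag{9}$$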
In $\mathrm{FV}_1^{\mathrm{DLac}}$, the full DLac curve is not used, in order to avoid the “curse of dimensionality” effect and because the length of the curve depends on the size of the input image/sub-image (for more details, see Section “Parameter Setting, HAF Realization and FV Construction”). $\mathrm{FV}_3^{\mathrm{DLac}}$ aims to express the “glocal”, i.e., both global (parameters a, b, c) and local (values of $\Lambda^N(w)$), behavior of the curve. As a previous study[6] has shown, this is quite an efficient way to replace the lengthy DLac-w curve without omitting crucial information. Finally, $\mathrm{FV}_5^{\mathrm{DLac}}$ constitutes an augmented version of $\mathrm{FV}_3^{\mathrm{DLac}}$ in terms of the representation of the global DLac-w curve behavior.
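The five feature vectors of eqs. (5)-(9) can be assembled from one normalized curve and its fitted (a, b, c) as sketched below (Python/NumPy). Two details are assumptions, since the text leaves them open: the entropy estimates $p_i$ from a 10-bin histogram of the curve values, and `lam_n[k]` is taken to hold $\Lambda^N(w_{\min}+k)$. The curve and the (a, b, c) values are made-up placeholders.

```python
import numpy as np


def curve_statistics(x, bins=10):
    """[MN, STD, ENT, ENG, skewness, kurtosis] of a curve (population moments)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = float(x.mean()), float(x.std())
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / x.size  # assumed histogram estimate of p_i
    z = (x - mu) / sigma
    return [mu, sigma, float(-np.sum(p * np.log(p))), float(np.mean(x ** 2)),
            float(np.mean(z ** 3)), float(np.mean(z ** 4))]


def build_feature_vectors(lam_n, a, b, c):
    """FV_1 ... FV_5 of eqs. (5)-(9); lam_n[k] = Lambda^N(w_min + k)."""
    lam_n = np.asarray(lam_n, dtype=float)
    fv1 = list(lam_n[1:6])
    fv2 = [a, b, c]
    fv3 = fv2 + list(lam_n[1:4])
    fv4 = curve_statistics(lam_n)
    fv5 = fv3 + fv4
    return fv1, fv2, fv3, fv4, fv5


w = np.arange(4, 30)
lam = 1.2 / w ** 0.8 + 0.4
lam_n = lam / lam[0]  # normalize to the value at w_min
fvs = build_feature_vectors(lam_n, 0.8, 2.0, 0.5)  # placeholder (a, b, c)
print([len(fv) for fv in fvs])  # [5, 3, 6, 6, 12]
```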
EXPERIMENTAL AND IMPLEMENTATION ISSUES
Dataset
A fundamental prerequisite for developing a robust and efficient algorithm for WCE-based lesion detection is a sufficiently rich database on which the algorithm can be tested. Unfortunately, the majority of related approaches (59%) are based on databases of fewer than 500 images[7]. Using a limited number of images, or highly correlated images, can lead to overfitting that produces an artificially optimistic, unrealistic performance.
The WCE image database used in this study contains 400 frames depicting CD-related lesions and 400 lesion-free frames acquired from 13 patients who underwent a WCE examination. Each exam was rated twice by each of two clinicians, and only the images classified into the same class all four times were selected. This procedure allowed us to assess the inter-/intra-rater variability and acquire a highly confident dataset. Moreover, the physicians, upon mutual agreement, manually delineated a region of interest (ROI) in each image. Some characteristic examples are given in Figure 4. The CD lesion images were manually annotated as mild (152 samples) or severe (248 samples) cases, based on the size and severity of the lesion. The mild case includes lesions at an early stage with vague boundaries that are difficult to recognize (Figure 4, bottom), whereas the severe case contains clearly shaped lesions (Figure 4, middle). This distinction was made in order to extensively assess the performance of the proposed scheme with respect to lesion detection difficulty. Additionally, a “total” scenario that contains all lesion images is examined, so as to assess the performance from an overall perspective. The 400 abnormal images were taken from 400 different lesion events to achieve the lowest possible similarity. The normal part of the dataset contains frames depicting both simple and confusing tissue (folds, villi, bubbles, intestinal juices/debris) to create realistic conditions and avoid artificially optimistic results.
To further validate the efficacy of the proposed scheme, two open WCE databases are used, namely CapsuleEndoscopy.org (CaEn)[41] and KID[42-44]. The CaEn database contains 6 normal and 22 CD-related lesion images (collected using the PillCam SB from Given Imaging, Israel), while the KID database contains 60 normal (30 with confusing intestinal content) and 14 CD-related lesion images (collected using the MiroCam system, IntroMedic Co., Korea).
Parameter Setting, HAF Realization and FV Construction
As far as the CT is concerned, two parameters have to be determined, i.e., the number of analysis scales and the number of analysis angles at the second scale. It is common in related applications to use three to four scales for the analysis[8]. One of the main factors determining the number of scales is the size of the input data to be processed: as the number of scales increases, the size of the computed sub-images decreases, which may have negative effects. In our approach, after exhaustive trials, we opted for four analysis scales. The number of angles differs from scale to scale. It has been shown[6] that, to avoid data redundancy and complexity, the optimum number of angles at the second scale is 8.
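The number of angles at the finer scales follows from the number chosen at the second scale. Assuming the angular-doubling rule of the CurveLab wrapping implementation (with curvelets, not wavelets, at the finest scale), the angle count doubles every second scale towards finer scales, which for four scales and 8 angles at the second scale gives 1, 8, 16 and 16 angles. This is consistent with the angle indices in the selected sub-image lists below (at most 8 at scale 2 and 16 at scales 3 and 4). The Python helper is a hypothetical illustration of that rule.

```python
import math


def curvelet_angles(n_scales, n_angles_second):
    """Angles per scale, coarsest first (assumed CurveLab wrapping rule)."""
    # Scale s = m + 2 has n_angles_second * 2**ceil(m / 2) angles
    finer = [n_angles_second * 2 ** math.ceil(m / 2) for m in range(n_scales - 1)]
    return [1] + finer  # the coarsest scale is a single isotropic sub-image


angles = curvelet_angles(4, 8)
print(angles, sum(angles))  # [1, 8, 16, 16] 41
```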
Regarding the implementation of the DLac analysis, $\Lambda(w)$ is calculated for gliding box size r = 3 pixels and gliding window sizes w = 4 to $w_{\max}$, where $w_{\max}$ is the minimum dimension of the input data. We did not choose a fixed value for $w_{\max}$ because the curvelet sub-images vary considerably in size, and we needed DLac curves as long as possible, so as to acquire more efficient FVs.
The gliding box has to be small in order to recognize the slight local spatial variations that characterize lesion tissue; the value r = 3 pixels was selected after exhaustive experiments. The gliding window size, on the other hand, has to range from small to large values so as to capture both micro- and macro-structures and achieve multi-scale information extraction. The minimum gliding window size adopted here is the smallest feasible value, i.e., r+1, so as not to miss information from the finest possible analysis scale.
To implement the curvelet sub-image selection via HAF, 25% of the dataset was used: from the 800 images in total, we randomly selected 100 normal and 100 abnormal samples, regardless of the severity class to which they belong. For each generation of the GA, the FF value was calculated over the whole set of 200 images per chromatic channel. The selected sub-images per chromatic channel and FF method were found to be (scale/angle):
{[2/(5, 6, 8), 3/(4, 5, 8, 9, 12, 13), 4/(1, 4, 5, 10, 13)]|Y; [2/(2, 6), 3/(4, 5, 9, 12, 13, 16), 4/(4, 8, 13, 16)]|Cb; [2/(1, 5, 6), 3/(1, 4, 9, 13), 4/(1, 5, 9, 13, 16)]|Cr}|EFF, and
{[1/(1), 2/(6, 7, 8), 3/(8, 9, 12, 13), 4/(1, 2, 3, 7, 9, 13)]|Y; [1/(1), 2/(2, 6), 3/(9, 12), 4/(1, 4, 8, 9, 13, 16)]|Cb; [1/(1), 2/(6), 3/(1, 4, 8, 12), 4/(1, 5, 8, 9, 13, 16)]|Cr}|LFF.
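This section describes the GA configuration only through the four parameters examined in the sensitivity analysis: IP, generations, P(0→1) and P(1→0). The sketch below is therefore a generic binary GA under stated assumptions. It uses one bit per curvelet sub-image (1 = keep), truncation selection with elitism, one-point crossover, and asymmetric bit-flip mutation driven by P(0→1) and P(1→0). The EFF and LFF fitness functions are defined elsewhere in the paper. The `fitness` callable here is a hypothetical stand-in for either of them. All names and default values are illustrative.

```python
import random


def mutate(bits, p01, p10, rng):
    """Asymmetric bit-flip mutation: 0 -> 1 with probability p01, 1 -> 0 with probability p10."""
    out = []
    for g in bits:
        if g == 0:
            out.append(1 if rng.random() < p01 else 0)
        else:
            out.append(0 if rng.random() < p10 else 1)
    return out


def ga_select(n_genes, fitness, pop_size=20, generations=30, p01=0.05, p10=0.05, seed=0):
    """Return the best binary mask found; bit k = 1 keeps curvelet sub-image k.

    `fitness` is a hypothetical callable mapping a mask to a score (higher is better);
    requires pop_size >= 2 and n_genes >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]  # initial population
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]  # truncation selection
        children = [ranked[0][:]]  # elitism: carry over the current best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            children.append(mutate(a[:cut] + b[cut:], p01, p10, rng))
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

With four scales and 8 angles at the second scale, a chromosome would have 41 genes per chromatic channel under the CurveLab angle convention, which is an assumption. The best mask found never gets worse from one generation to the next, because the elite individual is carried over.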
FV_x^DLac (x ∈ {1, 2, 3, 4, 5}) was calculated for each individual chromatic channel, for the combination of all channels, and for each feature extraction approach (R-/NR-case). For the combined channel scenario (Section “Hybrid Adaptive Filtering”), the NR-case was not considered, as it would lead to a lengthy FV and the classification procedure would suffer from the “curse of dimensionality”. For example, for FV_5^DLac and the EFF case, the resulting FV would contain 456 features (12 features per sub-image × 38 sub-images).
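The 456-feature figure can be checked by counting the sub-images in the EFF selection listed above. The sketch transcribes both selections into dictionaries, using a layout of our own choosing, and multiplies the count by the 12 features per sub-image stated for FV_5^DLac. It also checks that every selected angle index exists at its scale, assuming 1, 8, 16 and 16 angles for scales 1 to 4 (the CurveLab convention). Applying the same 12-feature assumption to the LFF selection gives 37 sub-images and 444 features. That figure is derived here and is not stated in the text.

```python
# Selected (scale -> angle indices) per channel, transcribed from the EFF and LFF lists.
EFF = {
    "Y": {2: [5, 6, 8], 3: [4, 5, 8, 9, 12, 13], 4: [1, 4, 5, 10, 13]},
    "Cb": {2: [2, 6], 3: [4, 5, 9, 12, 13, 16], 4: [4, 8, 13, 16]},
    "Cr": {2: [1, 5, 6], 3: [1, 4, 9, 13], 4: [1, 5, 9, 13, 16]},
}
LFF = {
    "Y": {1: [1], 2: [6, 7, 8], 3: [8, 9, 12, 13], 4: [1, 2, 3, 7, 9, 13]},
    "Cb": {1: [1], 2: [2, 6], 3: [9, 12], 4: [1, 4, 8, 9, 13, 16]},
    "Cr": {1: [1], 2: [6], 3: [1, 4, 8, 12], 4: [1, 5, 8, 9, 13, 16]},
}


def n_subimages(selection):
    """Total number of selected sub-images over all channels and scales."""
    return sum(len(angles) for channel in selection.values() for angles in channel.values())


def combined_fv_length(selection, features_per_subimage=12):
    """Length of the concatenated NR-case FV for the combined channel scenario."""
    return n_subimages(selection) * features_per_subimage


def check_indices(selection, angles_per_scale=(1, 8, 16, 16)):
    """True if every selected (1-based) angle index exists at its scale (assumed angle counts)."""
    return all(
        1 <= a <= angles_per_scale[scale - 1]
        for channel in selection.values()
        for scale, angles in channel.items()
        for a in angles
    )


if __name__ == "__main__":
    for name, sel in (("EFF", EFF), ("LFF", LFF)):
        print(name, n_subimages(sel), combined_fv_length(sel), check_indices(sel))
    # EFF 38 456 True
    # LFF 37 444 True
```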
Classification Setup
The classification phase of the HAF-DLac scheme is performed by an SVM classifier with a radial basis function (RBF) kernel[40]. SVMs have been used extensively in pattern recognition applications related to WCE image analysis[6,9,11,17], where they have shown superior performance. The images that did not contribute to the sub-image selection (i.e., the remaining 600 images) were used for classification. To obtain results that generalize as well as possible, 3-fold cross-validation was repeated 100 times, and the average accuracy (ACC), sensitivity (SENS), specificity (SPEC), and precision (PREC) were estimated; all values reported below are these averages.
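The evaluation protocol (3-fold cross-validation repeated 100 times, averaging ACC, SENS, SPEC and PREC) can be sketched as follows. Several choices here are assumptions. Abnormal (lesion) images are the positive class (label 1). The folds are stratified by class. Metrics are averaged over all individual folds of all repetitions. The RBF SVM is not reimplemented. It is replaced by a hypothetical `train_and_predict` callable. In practice, one could wrap the `fit`/`predict` methods of scikit-learn's `SVC(kernel="rbf")` in such a callable.

```python
import random


def confusion_metrics(y_true, y_pred):
    """ACC, SENS, SPEC and PREC with label 1 (abnormal) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else 0.0  # undefined ratios reported as 0

    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "SENS": ratio(tp, tp + fn),
        "SPEC": ratio(tn, tn + fp),
        "PREC": ratio(tp, tp + fp),
    }


def stratified_folds(y, k, rng):
    """Split sample indices into k folds, keeping the class ratio in each fold."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)
    return folds


def repeated_cv(X, y, train_and_predict, k=3, repeats=100, seed=0):
    """Average metrics over `repeats` runs of stratified k-fold cross-validation."""
    rng = random.Random(seed)
    sums = {"ACC": 0.0, "SENS": 0.0, "SPEC": 0.0, "PREC": 0.0}
    n_evals = 0
    for _ in range(repeats):
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            pred = train_and_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
            for name, value in confusion_metrics([y[i] for i in fold], pred).items():
                sums[name] += value
            n_evals += 1
    return {name: total / n_evals for name, total in sums.items()}
```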
RESULTS
The performance of the proposed scheme is evaluated by applying the CD lesion detection technique to the experimental dataset. Results are presented for every individual channel (Y, Cb, Cr) under both HAF-DLac implementation scenarios (R-/NR-case), for the combination of the channels (R-case only), and for all severity cases (mild, severe, total).
Individual Channel Case
For the individual channel case, ACC values were calculated for the R- and NR-case, for all severity scenarios, and for all FVs.
Reconstruction Case
For the R-case, the ACC values for all individual channels and all CD lesion cases are depicted in Figure 5 for the two FFs and the five FV types. For the mild lesion case, the augmented FV (FV_5^DLac) extracted from the Cr channel clearly provides the best performance (78.8% ACC). Channel Cb achieves 3.7 percentage points (pp) lower ACC than Cr, whereas channel Y delivers the worst detection accuracy (71.2%) for the same FV. These results refer to the LFF case. In contrast, the EFF scenario exhibits lower performance for all channels. This is attributed to the fact that LFF-based filtering, owing to the inherent characteristics of DLac, discerns and enhances more effectively the mucosal textural structures that differ only slightly in mild lesions. For severe lesions, the detection accuracy of HAF-DLac is, as expected, considerably higher for all channels than in the mild lesion scenario: ACC is 91.5%, 90.3% and 93.8% for the Y, Cb and Cr channels, respectively, for the LFF case and FV_5^DLac. Since severe lesions are easier to discriminate, EFF-based filtering yields results only slightly lower than the LFF ones (–0.2 to –0.9 pp), whereas in the mild lesion case the difference is –1.6 to –4.8 pp. Finally, in the total scenario, the Cr channel again provides the best performance (90.5% ACC), followed by Cb (88.0% ACC), for FV_5^DLac and LFF. The Y channel achieves 85.7% ACC for the same FV, but with EFF.
No-Reconstruction Case
The procedure followed in the R-case was also adopted in the NR-case. Figure 6 shows the ACC values for all individual channels. As in the R-case, channel Cr and the LFF approach exhibit the best ACC in the majority of cases. For mild lesions, the highest ACC values achieved are 64.8% for {EFF, FV_2^DLac, Y}, 77.8% for {LFF, FV_4^DLac, Cb} and 81.2% for {LFF, FV_3^DLac, Cr}. For severe lesions, the best ACC values are 89.1%, 87.0% and 90.2% for the Y, Cb and Cr channels, respectively, all with {EFF, FV_2^DLac}. Finally, for the total CD lesion case, the highest ACC is 86.3% for the Cr channel, followed by the Cb channel with 84.8%, for {LFF, FV_4^DLac}. The worst performance is delivered by the Y channel, with 81.5% ACC for {EFF, FV_4^DLac}.
Combined Channel Case (R-case only)
HAF-DLac was evaluated on the combined channel data in the same way as on the individual channel data. The ACC values for the combination of the Y, Cb and Cr channels and all CD lesion scenarios are depicted in Figure 7 for the two FFs (LFF: black line; EFF: gray line) and the five FV types. For mild lesions, the highest ACC with FV_5^DLac is 79.0% for the LFF approach and 75.9% for the EFF approach. For severe lesions, ACC increases by 13.7 pp and 17.3 pp (i.e., to 92.7% and 93.2%) for LFF and EFF, respectively, with FV_3^DLac. Finally, in the total case, LFF achieves 88.3% ACC with FV_5^DLac, whereas EFF achieves 85.2% ACC with the same FV.
Overall Performance
Table 1 presents the best ACC values, in the format “percent (R/NR-case – FF – FV type)”, for both the individual and combined channel cases and all three severity scenarios, giving an overall view of performance. The best mean results for each severity scenario are shown in bold. The SENS-SPEC values for Cr-mild, Cr-severe and Cr-total are 76.6%-85.8%, 95.2%-92.4%, and 91.8%-89.2%, respectively. Moreover, for comparison purposes, Table 2 presents the best classification results of the proposed scheme for all severity scenarios alongside the results of some of the most promising schemes in the literature, proposed in[6] (CurvLac),[8] (CurvLBP), and[17] (ECT). In[6], the authors employed curvelet-based Lac features extracted from single or combined sub-images in the curvelet domain; in[8], curvelet-based LBP is applied for ulcer recognition; and in[17], MPEG-7-based edge, color and texture features are used to detect CD lesions. Finally, Table 3 presents the classification results obtained by applying the above approaches to the open databases CaEn and KID.
Sensitivity Analysis
To examine the robustness of the proposed HAF-DLac approach, a sensitivity analysis was performed with respect to image noise, the DLac parameter r, and four GA parameters: initial population (IP), number of generations, P(0→1), and P(1→0). Specifically, we estimated the sensitivity of SENS and SPEC, defined as δ(X) = (|Xnew − Xbase| / Xbase) × 100%, where X is SENS or SPEC, Xbase is the base value achieved with the current settings, and Xnew is the value obtained after changing one parameter of the system. Since computing δ for each examined parameter requires a full analysis run, it was performed only for the total scenario. The base values SENS_base and SPEC_base are the highest values achieved for the total scenario (Cr channel, R-case, LFF approach, FV_5^DLac), i.e., 91.8% and 89.2%, respectively (see Section “Overall Performance”).
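The sensitivity index δ is a relative change expressed in percent. As a worked example, the sketch below applies it to the SENS and SPEC values reported for the 0.02-variance noise experiment (83% and 74.4%), against the base values of 91.8% and 89.2%. The resulting δ values, about 9.59% and 16.59%, are derived here from the reported figures. They are not stated in the text. The function name is hypothetical.

```python
def sensitivity_index(x_new: float, x_base: float) -> float:
    """delta(X) = |X_new - X_base| / X_base * 100, in percent."""
    return abs(x_new - x_base) / x_base * 100.0


# Base values: total scenario, Cr channel, R-case, LFF, FV_5^DLac.
SENS_BASE, SPEC_BASE = 91.8, 89.2

if __name__ == "__main__":
    # SENS and SPEC reported for zero-mean Gaussian noise of variance 0.02.
    print(round(sensitivity_index(83.0, SENS_BASE), 2))  # 9.59
    print(round(sensitivity_index(74.4, SPEC_BASE), 2))  # 16.59
```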
To assess resilience to noise, zero-mean Gaussian noise was added to the images. The noise variance ranged from 0.005 to 0.05, in steps of 0.001 up to 0.01 and in steps of 0.005 from 0.01 to 0.05. Figure 8A depicts the SENS and SPEC values, and Figure 8B shows the corresponding index δ. The proposed system is rather robust to noise: the sensitivities of SENS and SPEC are below 2% for noise variance 0.005 and do not exceed 6% and 2.5%, respectively, for noise variance up to 0.01. When more intense noise is added, the performance drops notably; however, even then, the SENS of 83% and SPEC of 74.4% obtained for noise variance 0.02 are quite acceptable.
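The noise protocol can be sketched as follows. The variance grid is taken directly from the text and contains 14 levels. The text does not state the intensity scale against which the variance is defined, so the sketch assumes intensities normalized to [0, 1], the convention used by MATLAB's `imnoise`. It also assumes that noisy values are clipped back to that range. Images are plain nested lists, and the helper names are hypothetical.

```python
import random


def noise_variances():
    """Variance grid: 0.005..0.010 in steps of 0.001, then 0.015..0.050 in steps of 0.005."""
    thousandths = list(range(5, 11)) + list(range(15, 51, 5))  # integers avoid float drift
    return [t / 1000 for t in thousandths]


def add_gaussian_noise(img, variance, seed=0):
    """Add zero-mean Gaussian noise to an image with intensities in [0, 1] (assumed scale), clipping to [0, 1]."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # standard deviation from variance
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


if __name__ == "__main__":
    print(len(noise_variances()), noise_variances()[:3], noise_variances()[-1])  # 14 [0.005, 0.006, 0.007] 0.05
```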
Another significant parameter of the proposed system is the size r of the DLac gliding box. Figure 8C presents the SENS and SPEC values for r ranging from 2 to 15 pixels, and Figure 8D depicts the corresponding sensitivity values. The base values correspond to r = 3. The larger the gliding box, the lower the performance. However, the HAF-DLac scheme exhibits remarkable robustness: the sensitivities of SENS and SPEC are 4.5% and 5.5%, respectively, even when the gliding box size is tripled (r = 9). In the extreme case of r = 15, SENS remains above 75% and SPEC above 83%, which still indicates effective performance.
Finally, Table 4 lists the results of the sensitivity calculations for +20% and −20% shifts of the four GA parameters. The proposed approach is very robust with respect to all of them, as the sensitivity of SENS and SPEC is below 1% in all tested cases.