Detecting Lung Disease using X-ray – Machine Learning
INTRODUCTION
Kaggle is an online community of data scientists, machine learning engineers, and other professionals, owned by Google LLC, a subsidiary of Alphabet Inc. The community provides a platform for hosting challenges, publishing datasets, an online workbench for data science, and crash courses on AI. Kaggle initially hosted only machine learning competitions, but it now offers many other services, such as a public platform for hosting datasets and job postings through its careers page. As a whole, Kaggle has fostered advancement in many fields, ranging from transport and politics to medical research and physics. Through Kaggle competitions, participants are compelled to improve beyond existing practices, producing new and more efficient methods for solving problems. Many findings from these competitions have been turned into papers, and others have taken the form of research questions. [3]
In the modern world, the fields of image processing and natural language processing have advanced to very great heights. Detecting objects within images and video clips is now a tractable task, thanks to recent developments and the availability of powerful computing resources. Fields like computer vision, artificial intelligence, and self-driving cars have grown exponentially due to advancements in existing methods and the increased availability of resources. Specialized methods for identifying objects within photographs have existed since the twentieth century, so why are these ideas gaining so much more traction now than in earlier years? One of the main reasons, as already pointed out, is recent technological progress in computation, along with the availability of the huge datasets required to perform experiments and provide insightful findings through existing methods.
Dozens of methods exist for object detection; they are broadly classified into machine learning methods and deep learning methods.
The machine learning methods are [4]:
1. Viola-Jones object detection framework based on Haar features
2. Scale-invariant feature transform (SIFT)
3. Histogram of oriented gradients (HOG) features
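To make the last of these concrete, the core of HOG can be sketched in a few lines of NumPy: compute per-pixel gradient orientation and magnitude, then pool them into per-cell orientation histograms. A real detector (e.g. Dalal-Triggs) adds block normalization and a linear SVM on top; the cell size and bin count below are illustrative choices, not fixed by the method.

```python
# Minimal sketch of HOG feature extraction in pure NumPy.
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Per-cell histograms of oriented gradients for a grayscale image."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation
    h, w = img.shape
    out = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            # Accumulate each pixel's magnitude into its orientation bin.
            idx = np.minimum((a / (180 / bins)).astype(int), bins - 1)
            out[i, j] = np.bincount(idx, weights=m, minlength=bins)
    return out

feats = hog_cells(np.random.rand(64, 128))
print(feats.shape)  # (8, 16, 9): one 9-bin histogram per 8x8 cell
```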
The following deep learning techniques are used for object detection [4]:
1. Region proposals (R-CNN, Fast R-CNN, Faster R-CNN)
2. Single Shot MultiBox Detector (SSD)
3. You Only Look Once (YOLO)
For this paper, we use Mask R-CNN, a technique that extends the Faster R-CNN deep learning method. Most of the time, object detection aims to identify the objects within an image, whether people, animals, or other objects. Understanding the semantics of a given image and identifying the class of each object at the level of individual pixels is a daunting task, but it can be achieved through semantic segmentation [5]. Semantic segmentation is very helpful in fields like robotics, computer vision, and self-driving cars, where it is necessary to understand the semantics of the environment in which an action is performed in order to achieve the desired outcome. These techniques automate the tasks at hand, thereby opening opportunities for the development of many solutions. Currently, deep learning techniques provide much better results than other methodologies. For object detection, techniques like Fast R-CNN, Faster R-CNN, and YOLO give strong results, but for pixel-level detection one method stands out among them: Mask R-CNN. Mask R-CNN extends Faster R-CNN, which uses a region proposal network (RPN) to generate object proposals, by adding a fully convolutional network branch that predicts a segmentation mask for each region of interest, in parallel with the existing branch that predicts bounding boxes and classes. This mask branch adds only a small overhead to the existing method, and the model runs at about 5 fps [6]. The same framework also generalizes to many other tasks.
Mask R-CNN can also be applied to the problem of human pose estimation and to the detection of many other general actions within an image.
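The mask branch described above (a small fully convolutional head added in parallel to Faster R-CNN's box head) can be sketched in PyTorch as follows. The 14x14 RoIAlign input and 28x28 mask output follow the paper's setup, but the channel counts and number of convolutions here are illustrative (the paper uses four 3x3 convolutions before the deconvolution):

```python
# Sketch of Mask R-CNN's per-RoI mask head: a small FCN that turns
# pooled RoI features into one binary-mask logit map per class.
import torch
import torch.nn as nn

class MaskHead(nn.Module):
    def __init__(self, in_ch=256, num_classes=2):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        # Upsample 14x14 features to 28x28, then predict per-class masks.
        self.deconv = nn.ConvTranspose2d(256, 256, 2, stride=2)
        self.predictor = nn.Conv2d(256, num_classes, 1)

    def forward(self, roi_feats):          # (N_rois, C, 14, 14)
        x = self.convs(roi_feats)
        x = torch.relu(self.deconv(x))     # (N_rois, 256, 28, 28)
        return self.predictor(x)           # (N_rois, num_classes, 28, 28)

masks = MaskHead()(torch.rand(5, 256, 14, 14))
print(masks.shape)  # torch.Size([5, 2, 28, 28])
```

Because this branch is fully convolutional and runs only on pooled RoI features, it adds little cost on top of Faster R-CNN's existing box branch.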
LITERATURE REVIEW
2.1 YOLO
(1) Paper: You Only Look Once: Unified, Real-Time Object Detection (https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf)
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives in the background.
Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are. YOLO is refreshingly simple. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes.
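The "single network, single pass" idea above comes down to how YOLO encodes its output: an SxS grid where each cell predicts B boxes (x, y, w, h, confidence) plus C shared class probabilities. The sketch below decodes such a tensor using the PASCAL VOC settings from the paper (S=7, B=2, C=20); the random values stand in for a real network's forward pass.

```python
# Sketch of decoding YOLO's single output tensor into per-box class scores.
import numpy as np

S, B, C = 7, 2, 20                      # grid size, boxes/cell, classes (VOC)
pred = np.random.rand(S, S, B * 5 + C)  # 7x7x30 tensor from one forward pass

boxes = pred[..., :B * 5].reshape(S, S, B, 5)   # per-box (x, y, w, h, conf)
class_probs = pred[..., B * 5:]                 # per-cell class probabilities

# Class-specific confidence = box confidence * cell's class probability.
scores = boxes[..., 4:5] * class_probs[:, :, None, :]
print(scores.shape)  # (7, 7, 2, 20): a score per box per class
```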
First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems.
Second, unlike sliding window and region proposal-based techniques, YOLO sees the entire image during training and test time, so it implicitly encodes contextual information about classes as well as their appearance. Fast R-CNN, a top detection method, mistakes background patches in an image for objects because it can't see the larger context. YOLO makes less than half the number of background errors compared to Fast R-CNN.
YOLO still lags behind state-of-the-art detection systems in accuracy. While it can quickly identify objects in images it struggles to precisely localize some objects, especially small ones.
2.2 YOLOv3: An Incremental Improvement
YOLOv3 is extremely fast and accurate. In mAP measured at .5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. Moreover, you can easily trade off between speed and accuracy simply by changing the size of the model, with no retraining required.
It looks at the whole image at test time so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation unlike systems like R-CNN which require thousands for a single image.
This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.
2.3 YOLO and Fast R-CNN Ensemble
(1) Paper: You Only Look Once: Unified, Real-Time Object Detection (https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf)
YOLO makes far fewer background mistakes than Fast R-CNN. By using YOLO to eliminate background detections from Fast R-CNN we get a significant boost in performance.
For every bounding box that R-CNN predicts we check to see if YOLO predicts a similar box. If it does, we give that prediction a boost based on the probability predicted by YOLO and the overlap between the two boxes.
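The combination rule above can be sketched as follows: for each Fast R-CNN box, find an overlapping YOLO box and boost the score using YOLO's predicted probability and the overlap (IoU) between the two boxes. The quoted text does not spell out the exact weighting, so the boosting formula and the 0.5 IoU threshold below are illustrative assumptions.

```python
# Sketch of boosting Fast R-CNN detections with agreeing YOLO predictions.
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def boost(rcnn_score, rcnn_box, yolo_score, yolo_box, min_iou=0.5):
    """Raise a Fast R-CNN score when YOLO predicts a similar box."""
    overlap = iou(rcnn_box, yolo_box)
    if overlap < min_iou:          # no similar YOLO box: leave score as-is
        return rcnn_score
    return rcnn_score + yolo_score * overlap   # assumed weighting

# A strongly overlapping YOLO box boosts the Fast R-CNN score.
print(boost(0.6, (0, 0, 10, 10), 0.8, (1, 1, 10, 10)))
```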
The best Fast R-CNN model achieves a mAP of 71.8% on the VOC 2007 test set. When combined with YOLO, its mAP increases by 3.2% to 75.0%.
The boost from YOLO is not simply a byproduct of model ensembling since there is little benefit from combining different versions of Fast R-CNN. Rather, it is precisely because YOLO makes different kinds of mistakes at test time that it is so effective at boosting Fast R-CNN’s performance.
Unfortunately, this combination doesn’t benefit from the speed of YOLO since we run each model separately and then combine the results. However, since YOLO is so fast it doesn’t add any significant computational time compared to Fast R-CNN.