Abstract
This research document gives a detailed explanation of cloud computing and of cloud infrastructure and architecture. It also discusses managing data growth in the cloud: the advantages and disadvantages that can arise as data in the cloud grows, and methods that can be implemented to reduce the resulting problems.
Managing cloud data growth
Introduction
Cloud computing is defined as a large pool of interconnected systems in public or private networks that provides resources such as data, networking, operating systems, applications, storage and processing power, which can be deployed quickly and easily. Virtualization is the key technology behind cloud computing, which grew out of the development of parallel computing, grid computing, virtualization and utility computing. The cloud can be described as a space where computing resources are pre-installed and provided as a service. This environment offers great flexibility and availability of computing resources at a lower cost, and, with regard to computing resources, cloud computing holds out the vision of unlimited resources.
Big data is one of the main reasons for adopting cloud computing. Large amounts of data, measured in petabytes, are uploaded to the digital world, and handling them requires a great deal of storage and computing resources.
Cloud computing is available to users on a pay-per-use, on-demand basis. These services are provided by vendors such as AWS (Amazon Web Services) and Microsoft Azure, which offer convenient access to shared resources through the internet. This saves users the cost of buying physical resources that may sit idle. Customers can access general application services, such as business and education applications, online through a web browser, while data and other software programs are stored on cloud servers in data centres. Cloud computing can improve the availability of IT resources and has many advantages over other computing techniques. The convenience of implementing and using cloud services has attracted companies to adopt them and to optimize their IT infrastructure costs, which has led to the rapid growth of cloud computing.
Cloud computing
Figure 1. Cloud Computing
Cloud computing is the delivery of computing services over the internet on a pay-as-you-go basis. Companies can rent access to cloud computing resources, such as infrastructure or data centers, from a cloud service provider rather than owning and maintaining their own IT infrastructure; they simply pay for what they use, which can save large upfront costs. Cloud computing also includes consumer services such as Gmail and cloud backup of smartphone photos.
Another benefit of cloud storage is the ability to access data from anywhere in the world at any time, without worrying about running out of space on the client’s computer. With cloud computing, the client does not need to carry physical storage devices to access data; data can be accessed from any device, at any location, over the internet.
Cloud Infrastructure
Cloud infrastructure refers to the hardware and software components required to support the computing requirements of a cloud computing model. These components include servers, storage, networking and virtualization software. Cloud infrastructure virtualizes resources and presents them logically to users through application programming interfaces (APIs) or graphical user interfaces. These virtualized resources are hosted by a service provider and delivered to users over the internet. They include virtual machines and components such as servers, memory, network switches, firewalls, load balancers and storage.
Cloud computing architecture
Figure 2. Cloud Computing Architecture
In cloud computing architecture, cloud infrastructure refers to the back-end components: the same kind of hardware found in most enterprise data centers, but on a much greater scale. These include multicore servers, persistent storage and local area network equipment such as switches and routers.
Cloud computing is offered through four deployment models: public, private, community and hybrid. To avoid application performance issues, administrators must choose the cloud service that matches their applications' traffic patterns and dependencies. They must also consider where their users will connect from and how much bandwidth is required. For example, infrequently accessed workloads can be stored in the public cloud at lower cost.
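The placement decision in the example above can be sketched as a simple rule. This is a minimal illustration only: the access threshold, the `sensitive` flag and the tier names (`private`, `public-cold`, `public-hot`) are hypothetical assumptions, not any provider's storage classes or pricing.

```python
from dataclasses import dataclass

# Hypothetical threshold: workloads read fewer times than this per month are
# treated as "less-accessed". Real thresholds depend on the provider's pricing.
COLD_ACCESS_THRESHOLD = 10


@dataclass
class Workload:
    name: str
    accesses_per_month: int
    sensitive: bool  # hypothetical flag for data that must stay in-house


def choose_placement(w: Workload) -> str:
    """Return a hypothetical deployment/tier for a workload."""
    if w.sensitive:
        # Sensitive data stays in the private cloud regardless of access rate.
        return "private"
    if w.accesses_per_month < COLD_ACCESS_THRESHOLD:
        # Rarely accessed data goes to cheaper public cloud storage.
        return "public-cold"
    return "public-hot"


if __name__ == "__main__":
    for w in [
        Workload("payroll-records", 500, sensitive=True),
        Workload("2019-log-archive", 2, sensitive=False),
        Workload("product-images", 10_000, sensitive=False),
    ]:
        print(w.name, "->", choose_placement(w))
```

A real policy would also weigh the bandwidth and user-location factors mentioned above; the rule is kept to two inputs so the trade-off stays visible.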
Public
The public cloud platform is designed for public use and is open to everyone. Providers include Microsoft, Google, Amazon, VMware, IBM and Rackspace. The service provider makes the resources available to everyone, and they are easily accessible. It raises some concerns over privacy, security and data access for customers, because it is less secure than the other deployment models; it suits small and medium-sized businesses.
Private
This model is managed and maintained by a single company or organization comprising multiple customers. The resources are available only internally and cannot be accessed by the public. A private cloud requires the organization to have its own infrastructure. However, the costs are significantly higher than for a public cloud, because it requires expert staff, training and management of the infrastructure.
Community
In the community cloud model, an organization shares its cloud infrastructure among customers who have similar interests or concerns. This infrastructure is shared and owned by different organizations, such as research groups, companies working together and government organizations.
Hybrid
A hybrid cloud combines two or more of the deployment models: public, private and community. It is well organized and allows different entities to access data over the internet while offering more secure control of applications and data.
Cloud infrastructure is present in each of these models, which represent how the computing infrastructure delivers cloud services. Cloud service models are classified into Infrastructure as a Service (IaaS), Software as a Service (SaaS) and Platform as a Service (PaaS). Each of these services provides different products to users and serves a different purpose.
Infrastructure as a Service
IaaS offers virtualized computing resources over the internet, such as servers, storage, processors, data centre space, networking and other infrastructure resources. It is a cost-effective model that provides services such as reduced hardware maintenance complexity and real-time workload balancing. It allows users to minimize the initial costs of purchasing computing hardware and software, such as servers, network devices, software licences and processing power. Instead, users can buy those resources as a fully outsourced service from cloud vendors such as Amazon Web Services (AWS).
Platform as a Service
PaaS provides programs, frameworks, integrated development environments and development tools hosted by the service provider. It gives users an externally managed platform for building applications and deploying them onto the cloud infrastructure. PaaS works similarly to IaaS, but it offers an additional level of rented functionality and shifts more costs from hardware investment to operational expenses. Some PaaS offerings are Microsoft Azure, VMware and Google App Engine.
Software as a Service
SaaS provides applications on demand. It is a software deployment model in which service providers host applications remotely and deliver them to users as a service over the network. This model is reliable and cost-effective, as the user does not need to purchase resources or to install and run the application on their own computer. The user pays only for their use during a particular session. This reduces the burden of software maintenance and support.
Cloud data growth
The rising popularity of cloud-based data storage has increased the amount of data uploaded to the cloud. Companies and individuals upload huge amounts of data every day, which makes data management extremely demanding. The need to keep this data safe, and for longer, requires organizations to integrate the way they manage their data with the way they use it.
However, the growth of data in the cloud brings its own risks, and one of the main concerns is security. Implementing cloud services involves storing critical and sensitive data, and the massive amount of data stored in the cloud makes it easier for hackers to gain access to important data. According to, a number of studies have addressed aspects of cloud security in different areas, e.g. the network, the hypervisor, guest VMs and the operating system, using various approaches derived from traditional rule-based intrusion detection systems (IDS) or statistical anomaly detection models.
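To illustrate what a statistical anomaly detection model does, the sketch below learns a baseline (mean and standard deviation) from past event counts and flags new counts that lie far from it, using a z-score. It is a minimal, generic example and not the method of any of the studies mentioned; the "failed logins per minute" metric, the sample history and the threshold of 3 standard deviations are hypothetical assumptions.

```python
import statistics


def build_baseline(history):
    """Learn 'normal' behaviour as the mean and sample standard deviation
    of past per-interval counts (needs at least two values)."""
    return statistics.mean(history), statistics.stdev(history)


def is_anomalous(count, baseline, threshold=3.0):
    """Flag a count whose z-score against the baseline exceeds `threshold`.

    threshold: hypothetical cut-off; real systems tune it to balance
    false alarms against missed attacks.
    """
    mean, stdev = baseline
    if stdev == 0:
        # History had no variation, so any different value stands out.
        return count != mean
    return abs(count - mean) / stdev > threshold


if __name__ == "__main__":
    # Hypothetical failed-login counts per minute during normal operation.
    baseline = build_baseline([4, 5, 6, 5, 4, 6, 5, 5])
    for observed in (6, 40):
        print(observed, "anomalous" if is_anomalous(observed, baseline) else "normal")
```

With this history the mean is 5 and the standard deviation about 0.76, so 6 failed logins is treated as normal while a burst of 40 is flagged. A rule-based IDS would instead match traffic against fixed signatures or rules.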
The cloud framework raises various issues of confidentiality, integrity and availability, depending on the nature of the cloud service delivery model. Data management in clouds suffers from issues such as data breaches, which affect integrity and confidentiality. A data breach occurs when secure or confidential information is released to an untrusted environment, and it can be the primary objective of a targeted attack. Typically, a cybercriminal successfully infiltrates a data source and extracts sensitive information, either by accessing a computer or network directly or by bypassing network security remotely.
Despite the benefits virtualization brings to end users, it comes with a range of threats, including exploits of security holes in virtual machines (e.g. rootkit attacks on virtual machines) and internet-based attacks that aim to compromise cloud networks (e.g. malware and DDoS attacks on cloud services).
Another problem is managing these huge amounts of data. Big data processing in the cloud may involve hundreds of application servers accessing data, which generates a massive number of read and write requests. Handling them requires hundreds of storage servers, both to distribute the read and write load and to ensure that the failure of any one of these servers does not stop the entire service.
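One widely used way to spread requests over many storage servers while tolerating the failure of individual servers is consistent hashing with replication. The sketch below is a generic illustration of that technique, not the design of any particular cloud provider; the server names, the replication factor of 3 and the 100 virtual nodes per server are hypothetical choices.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps each key to `replicas` distinct storage servers on a hash ring."""

    def __init__(self, servers, replicas=3, vnodes=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            # Several virtual nodes per server spread load more evenly.
            for v in range(vnodes):
                self._ring.append((self._hash(f"{server}#{v}"), server))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def servers_for(self, key, down=frozenset()):
        """Walk clockwise from the key's position and return up to `replicas`
        distinct servers, skipping any listed in `down`."""
        start = bisect.bisect(self._hashes, self._hash(key))
        chosen = []
        for i in range(len(self._ring)):
            _, server = self._ring[(start + i) % len(self._ring)]
            if server not in chosen and server not in down:
                chosen.append(server)
                if len(chosen) == self.replicas:
                    break
        return chosen


if __name__ == "__main__":
    ring = ConsistentHashRing([f"storage-{i}" for i in range(10)])
    key = "user:42/photo.jpg"
    owners = ring.servers_for(key)
    print("replicas:", owners)
    # If the first replica fails, the other two copies are still found,
    # and the next server on the ring takes the place of the failed one.
    print("after failure:", ring.servers_for(key, down={owners[0]}))
```

Because each key lives on several servers, reads can be served by any live replica, and adding or removing a server only moves the keys adjacent to it on the ring rather than reshuffling all data.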
Cloud storage management
Cloud Storage Management Challenges
Some of the challenges faced in adopting and growing cloud storage are:
Big Data Security Holes
With the rapid growth of data in cloud storage, the security of this data has become a major concern. Many IT managers aren’t comfortable entrusting sensitive data to someone else’s control. Big data adoption projects often put security off until later stages. Big data technologies keep evolving, yet their security is neglected in the hope that it will be provided at the application level, which in turn leads to big data security being cast aside.
Cost
While lower costs are one of the major reasons for migrating to cloud storage, the cloud isn’t always cheaper than on-premises storage. Adopting big data entails considerable expense for companies.
If a company needs its own on-premises solution, it will have to consider the costs of new hardware, software, new employees such as administrators and developers, electricity and much more. Although some of the required frameworks are open source, the company will still have to pay for the development, set-up, configuration and maintenance of the software.
If the company plans to use a cloud-based big data solution, it will still need to hire new staff to manage the solution, and to pay for the cloud services, the big data solution and maintenance of the required frameworks.
In both cases, the solution adopted must be able to support future data growth within the chosen setup, to avoid extra costs for supporting and managing large amounts of data.
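The cost trade-off described above can be made concrete with a rough model that compares cumulative costs of the two options while stored data grows each year. This is a simplified sketch: the cost structure (hardware bought per TB of new capacity plus fixed yearly operating costs on-premises; a per-TB monthly storage price plus yearly staff costs in the cloud) is an assumption, and all figures in the example call are hypothetical placeholders, not real vendor pricing.

```python
def project_costs(initial_tb, annual_growth, years,
                  onprem_capex_per_tb, onprem_opex_per_year,
                  cloud_price_per_tb_month, cloud_staff_per_year):
    """Return (on_prem_total, cloud_total) cumulative costs over `years`.

    On-premises: buy hardware for any capacity not yet provisioned, plus fixed
    yearly operating costs (staff, electricity, maintenance).
    Cloud: pay per TB stored per month, plus yearly staff to manage the solution.
    """
    on_prem = cloud = 0.0
    provisioned_tb = 0.0
    stored_tb = initial_tb
    for _ in range(years):
        # Hardware is bought only for capacity beyond what is already owned.
        if stored_tb > provisioned_tb:
            on_prem += (stored_tb - provisioned_tb) * onprem_capex_per_tb
            provisioned_tb = stored_tb
        on_prem += onprem_opex_per_year
        cloud += stored_tb * cloud_price_per_tb_month * 12 + cloud_staff_per_year
        # Data grows by a fixed fraction each year.
        stored_tb *= 1 + annual_growth
    return on_prem, cloud


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB today, 40% yearly growth, 5-year horizon.
    on_prem, cloud = project_costs(100, 0.40, 5, 1000, 50_000, 20, 30_000)
    print(f"on-premises: {on_prem:,.0f}  cloud: {cloud:,.0f}")
```

Re-running the model with different values of `annual_growth` shows how a choice that looks cheapest for today's data volume can become the more expensive one as data grows.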
Connectivity
Connectivity is also a major concern for cloud storage and management. Within their own data centers, organizations have fast connections that let them access their data quickly. In the public cloud, however, they are often forced to rely on the public internet to access their data, which is much slower.
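The effect of link speed on access to large data sets is simple arithmetic: transfer time equals data size divided by bandwidth. The sketch below compares moving 10 TB over a 100 Mbit/s internet link and over a 10 Gbit/s data-centre link; both link speeds and the data size are hypothetical examples, and real throughput is usually lower because of protocol overhead and contention.

```python
def transfer_time_hours(data_tb, link_mbps, efficiency=1.0):
    """Hours needed to move `data_tb` terabytes (10**12 bytes) over a link.

    efficiency: fraction of nominal bandwidth actually achieved (hypothetical;
    in practice it is below 1).
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


for label, mbps in [("100 Mbit/s internet link", 100),
                    ("10 Gbit/s data-centre link", 10_000)]:
    print(f"{label}: {transfer_time_hours(10, mbps):.1f} h for 10 TB")
```

This prints 222.2 hours (over nine days) for the internet link and 2.2 hours for the data-centre link, which is why bulk access to cloud-hosted data can be far slower than access within an organization's own data center.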
Future Enhancement
Based on the challenges mentioned above, first, the cost of a solution depends on a company’s specific technological requirements and business goals. For example, a company that requires flexibility can adopt cloud technology, while a company that requires extremely strict security can use an on-premises solution. The key is to recognize business needs when implementing a solution, to avoid extra costs. Second, the most obvious solution to big data security challenges is to put security first: security should be built in no later than the design stage of the solution’s architecture, to avoid security vulnerabilities.
Conclusion
In conclusion,