Chapter 1
INTRODUCTION
1.1 Introduction
Cloud storage provides an on-demand remote backup solution. However, relying on a single cloud service provider raises concerns such as a single point of failure and vendor lock-in. As suggested in this work, a logical solution is to stripe data across different cloud providers. By exploiting the diversity of multiple clouds, we can improve the fault tolerance of cloud storage. While striping data with traditional erasure codes performs well when some clouds experience short-term or foreseeable permanent failures, there are real-life cases showing that unexpected permanent failures do occur.
This work focuses on unexpected permanent cloud failures. When a cloud fails permanently, it is necessary to activate repair to maintain data redundancy and fault tolerance. A repair operation retrieves data from the existing surviving clouds over the network and reconstructs the lost data in a new cloud. Today's cloud storage providers charge users for outbound data (see the pricing models discussed in this work), so moving an enormous amount of data across clouds can introduce significant monetary costs. It is therefore important to reduce the repair traffic (i.e., the amount of data being transferred over the network during repair), and hence the monetary cost due to data migration. To minimize repair traffic, regenerating codes have been proposed for storing data redundantly in a distributed storage system (a collection of interconnected storage nodes). Each node could refer to a simple storage device, a storage site, or a cloud storage provider. Regenerating codes are built on the concept of network coding, in the sense that nodes perform encoding operations and send encoded data. During repair, each surviving node encodes its stored data chunks and sends the encoded chunks to a new node, which then regenerates the lost data. It has been shown that regenerating codes require less repair traffic than traditional erasure codes with the same fault-tolerance level. Regenerating codes have been extensively studied in the theoretical context.
However, the practical performance of regenerating codes remains uncertain. One key challenge for deploying regenerating codes in practice is that most existing regenerating codes require storage nodes to be equipped with computation capabilities for performing encoding operations during repair. On the other hand, to make regenerating codes portable to any cloud storage service, it is desirable to assume only a thin-cloud interface, in which storage nodes support merely the standard read/write functionalities. This motivates us to explore, from an applied perspective, how to practically deploy regenerating codes in multiple-cloud storage when only the thin-cloud interface is assumed.
This report presents the design and implementation of NC Cloud, a proxy-based storage system that provides fault-tolerant storage over multiple cloud storage providers. NC Cloud can interconnect different clouds and transparently stripe data across them. On top of NC Cloud, we propose the first implementable design for the functional minimum-storage regenerating (FMSR) codes. Our FMSR code implementation maintains double-fault tolerance and has the same storage cost as traditional erasure-coding schemes based on RAID-6 codes, but uses less repair traffic when recovering a single-cloud failure. In particular, it eliminates encoding operations within storage nodes during repair, while keeping the benefits of network coding in reducing repair traffic.
This is one of the first studies that puts regenerating codes in a working storage system and evaluates them in a practical setting. One tradeoff of FMSR codes is that they are nonsystematic, meaning that they store only encoded chunks formed by linear combinations of the original data chunks, and do not keep the original data chunks as systematic coding schemes do. Nevertheless, FMSR codes are mainly designed for long-term archival applications, in which 1) data backups are rarely read in practice, and 2) it is common to restore the whole file rather than parts of the file should a lost file need to be recovered. There are many real-life examples in which enterprises and organizations store an enormous amount of archival data (even on the petabyte scale) using cloud storage (e.g., see the case studies cited in this work). In August 2012, Amazon further introduced Glacier, a cloud storage offering optimized for low-cost data archiving and backup (with slow and costly data retrieval) that is being adopted by cloud backup solutions. We believe that FMSR codes provide an alternative option for enterprises and organizations to store data using multiple-cloud storage in a fault-tolerant and cost-effective manner. While this work is motivated by and designed with multiple-cloud storage in mind, we point out that FMSR codes can also find applications in general distributed storage systems where storage nodes are prone to failures and network transmission bandwidth is limited. In this case, minimizing repair traffic is important for reducing the overall repair time.
The contributions are summarized as follows:
We present a design of FMSR codes, assuming double-fault tolerance. We show that in multiple-cloud storage, FMSR codes save 25 percent of the repair cost compared to RAID-6 codes when four storage nodes are used, and up to 50 percent as the number of storage nodes further increases (a back-of-the-envelope calculation at the end of this section illustrates these figures). Meanwhile, FMSR codes maintain the same amount of storage overhead as RAID-6 codes. Note that FMSR codes can be deployed in a thin-cloud setting, as they do not require storage nodes to perform encoding during repair, while still preserving the benefits of network coding in reducing repair traffic. Thus, FMSR codes can be readily deployed in today's cloud storage services.
We describe the implementation details of how a file object can be stored via FMSR codes. In particular, we propose a two-phase checking scheme, which ensures that double-fault tolerance is maintained in the current and the next round of repair. By performing two-phase checking, we ensure that double-fault tolerance is maintained after iterative rounds of repair of node failures. We conduct simulations to validate the importance of two-phase checking.
We conduct a monetary cost analysis to show that FMSR codes effectively reduce the cost of repair when compared to traditional erasure codes, using the price models of today's cloud storage providers.
We conduct extensive experiments in both local cloud and commercial cloud settings. The results show that our FMSR code implementation adds only a small encoding overhead, which is easily masked by the file transfer time over the Internet.
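As a back-of-the-envelope check of the repair-cost figures above, consider the chunk sizes used later in Chapter 5. With double-fault tolerance (k = n-2), a file of size M is divided into k(n-k) = 2(n-2) chunks, so each chunk has size M/(2(n-2)). A conventional RAID-6 repair downloads the size of the original file, M, whereas an FMSR repair downloads one chunk from each of the n-1 surviving nodes:

FMSR repair traffic = (n-1) x M / (2(n-2))

For n = 4 this gives 3M/4, a 25 percent saving over M; as n grows, the factor (n-1)/(2(n-2)) approaches 1/2, i.e., a saving of up to 50 percent.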
1.2 Existing System
This section reviews the related work in multiple-cloud storage and failure recovery. Multiple-cloud storage: several systems have been proposed for multiple-cloud storage. HAIL provides integrity and availability guarantees for stored data. RACS uses erasure coding to mitigate vendor lock-in when switching cloud vendors: it retrieves data from the cloud that is about to fail and moves the data to a new cloud. Unlike RACS, NC Cloud excludes the failed cloud in repair. Vukolić advocates using multiple independent clouds to provide Byzantine fault tolerance. DEPSKY addresses Byzantine fault tolerance by combining encryption and erasure coding for stored data. All the above systems are built on erasure codes to provide fault tolerance, while NC Cloud takes one step further and considers regenerating codes with an emphasis on both fault tolerance and storage repair.
Minimizing I/Os: several studies propose efficient single-node failure recovery schemes that minimize the amount of data read (or I/Os) for XOR-based erasure codes. For example, one line of work proposes optimal recovery for specific RAID-6 codes and reduces the amount of data read by up to around 25 percent (compared to conventional repair, which downloads the amount of the original data) for any number of nodes. Note that our FMSR codes achieve a 25 percent saving when the number of nodes is four, and up to a 50 percent saving as the number of nodes increases. Another study proposes an enumeration-based approach to search for the best recovery solution for arbitrary XOR-based erasure codes. Efficient recovery has recently been addressed in commercial cloud storage systems. For example, new constructions of non-MDS erasure codes designed for efficient recovery have been proposed for Azure and Facebook. These codes trade storage overhead for performance and are mainly designed for data-intensive computing, whereas our work targets cloud backup applications.
1.3 Proposed System
In this report, we present the design and implementation of NC Cloud, a proxy-based storage system designed to provide fault-tolerant storage over multiple cloud storage providers. NC Cloud can interconnect different clouds and transparently stripe data across them. On top of NC Cloud, we propose the first implementable design for the functional minimum-storage regenerating (FMSR) codes. Our FMSR code implementation maintains double-fault tolerance and has the same storage cost as traditional erasure-coding schemes based on RAID-6 codes, but uses less repair traffic when recovering a single-cloud failure. In particular, it eliminates the need to perform encoding operations within storage nodes during repair. This is one of the first studies that puts regenerating codes in a working storage system and evaluates them in a practical setting. One tradeoff of FMSR codes is that they are nonsystematic, meaning that they store only encoded chunks formed by linear combinations of the original data chunks, and do not keep the original data chunks as systematic coding schemes do. Nevertheless, FMSR codes are mainly designed for long-term archival applications, in which 1) data backups are rarely read in practice, and 2) it is common to restore the whole file rather than parts of the file should a lost file need to be recovered. There are many real-life examples in which enterprises and organizations store an enormous amount of archival data using cloud storage. In August 2012, Amazon further introduced Glacier, a cloud storage offering optimized for low-cost data archiving and backup (with slow and costly data retrieval) that is being adopted by cloud backup solutions. We believe that FMSR codes provide an alternative option for enterprises and organizations to store data using multiple-cloud storage in a fault-tolerant and cost-effective manner.
1.4 Organization of the Thesis
This report focuses on the NC Cloud project. The report is preceded by a detailed table of contents, including lists of figures, tables, and a glossary.
The report is divided into seven chapters:
• Chapter 1: Introduction - Explains the NC Cloud approach for providing fault tolerance across multiple clouds using regenerating codes, and describes the existing system, the proposed system, and the organization of the thesis.
• Chapter 2: Literature survey - Gives a brief introduction to the work of authors related to NC Cloud.
• Chapter 3: Software requirement specification - Explains the functional requirements, non-functional requirements, software requirements, and hardware requirements.
• Chapter 4: Design - Explains the conceptual model, UML diagram, data flow diagram, flow chart diagram, sequence diagram, and use case diagram.
• Chapter 5: Implementation - Explains the various modules and the pseudocode related to them.
• Chapter 6: Testing - Explains all the testing required to run the project.
• Chapter 7: Results - Gives the outcome of the work done and contains snapshots of the various modules and their output.
Chapter 2
LITERATURE SURVEY
2.1 Introduction
Several systems have been proposed for multiple-cloud storage. HAIL provides integrity and availability guarantees for stored data. DEPSKY addresses Byzantine fault tolerance by combining encryption and erasure coding for stored data. RACS uses erasure coding to mitigate vendor lock-in when switching cloud vendors: it retrieves data from the cloud that is about to fail and moves the data to a new cloud. NC Cloud, in contrast, excludes the failed cloud in repair. All the above systems are based on erasure codes, while NC Cloud considers regenerating codes with an emphasis on storage repair.
Regenerating codes exploit the optimal trade-off between storage cost and repair traffic. Existing studies mainly focus on theoretical analysis. Several studies empirically evaluate random linear codes for peer-to-peer storage. However, their evaluations are mainly based on simulations. NCF implements regenerating codes, but does not consider MSR codes that are based on linear combinations. Here, we consider the F-MSR implementation, and perform empirical experiments in multiple-cloud storage.
To provide fault tolerance for cloud storage, recent studies propose to stripe data across multiple cloud vendors. However, if a cloud suffers from a permanent failure and loses all its data, the lost data must be repaired with the help of the other surviving clouds to preserve data redundancy. We present a proxy-based storage system for fault-tolerant multiple-cloud storage called NC Cloud, which achieves cost-effective repair for a permanent single-cloud failure. NC Cloud is built on top of a network-coding-based storage scheme called the functional minimum-storage regenerating (FMSR) codes, which maintain the same fault tolerance and data redundancy as traditional erasure codes (e.g., RAID-6), but use less repair traffic and, hence, incur less monetary cost due to data transfer. One key design feature of our FMSR codes is that we relax the encoding requirement of storage nodes during repair, while preserving the benefits of network coding in repair.
2.2 Remote Data Checking (RDC) for Network Coding-Based Distributed Storage Systems
Remote Data Checking (RDC) is a technique by which clients can establish that data outsourced at untrusted servers remains intact over time. RDC is useful as a prevention tool, allowing clients to periodically check if data has been damaged, and as a repair tool whenever damage has been detected. Initially proposed in the context of a single server, RDC was later extended to verify data integrity in distributed storage systems that rely on replication and on erasure coding to store data redundantly at multiple servers. Recently, a technique was proposed to add redundancy based on network coding, which offers interesting tradeoffs because of its remarkably low communication overhead for repairing corrupt servers. Unlike previous work on RDC, which focused on minimizing the costs of the prevention phase, this work takes a holistic look and initiates the investigation of RDC schemes for distributed systems that rely on network coding to minimize the combined costs of both the prevention and repair phases. The scheme is able to preserve, in an adversarial setting, the minimal communication overhead of the repair component achieved by network coding in a benign setting.
2.2.1 Disadvantages
1. This system cannot analyze all the clouds at the same time due to the lack of the FMSR technique.
2. Data loss and corruption may occur due to the lack of a RAID concept.
2.3 Enabling Data Integrity Protection in Regeneration-Coding-Based Cloud Storage
To protect outsourced data in cloud storage against corruptions, enabling integrity protection, fault tolerance, and efficient recovery for cloud storage becomes critical. Regenerating codes provide fault tolerance by striping data across multiple servers, while using less repair traffic than traditional erasure codes during failure recovery. This work therefore studies the problem of remotely checking the integrity of regenerating-coded data against corruptions under a real-life cloud storage setting. It designs and implements a practical data integrity protection (DIP) scheme for a specific regenerating code, while preserving the code's intrinsic properties of fault tolerance and repair traffic saving. The DIP scheme is designed under a Byzantine adversarial model, and enables a client to feasibly verify the integrity of random subsets of outsourced data against general or malicious corruptions. It works under the simple assumption of thin-cloud storage and allows different parameters to be fine-tuned for the performance-security trade-off. The scheme is implemented, and its overhead is evaluated in a real cloud storage test bed under different parameter choices.
2.3.1 Disadvantages
1. The DIP scheme does not support splitting and merging data across different clouds.
2. Security is weaker due to the lack of hybrid clouds.
2.4 A High-Availability and Integrity layer for Cloud Storage (HAIL)
This work introduces HAIL (High-Availability and Integrity Layer), a distributed cryptographic system that permits a set of servers to prove to a client that a stored file is intact and retrievable. HAIL strengthens, formally unifies, and streamlines distinct approaches from the cryptographic and distributed-systems communities. Proofs in HAIL are efficiently computable by servers and highly compact, typically tens or hundreds of bytes, irrespective of file size. HAIL cryptographically verifies and reactively reallocates file shares. It is robust against an active, mobile adversary, i.e., one that may progressively corrupt the full set of servers. The authors propose a strong, formal adversarial model for HAIL, together with rigorous analysis and parameter choices. They show how HAIL improves on the security and efficiency of existing tools, such as Proofs of Retrievability (PORs) deployed on individual servers.
2.4.1 Disadvantages
1. The HAIL architecture does not support network-coding-based cloud techniques.
2. The universal hash function size is only 64 bits, so an attacker can feasibly attack the hash code.
2.5 Dependable and Secure Storage in a Clouds-of-Clouds (DEPSKY)
The increasing popularity of cloud storage services has led companies that handle critical data to consider using these services for their storage needs. Medical record databases, power system historical information, and financial data are some examples of critical data that could be moved to the cloud. However, the reliability and security of data stored in the cloud remain major concerns. This paper presents DEPSKY, a system that improves the availability, integrity, and confidentiality of information stored in the cloud through the encryption, encoding, and replication of the data on diverse clouds that form a cloud-of-clouds. The system was deployed on four commercial clouds, and PlanetLab was used to run clients accessing the service from different countries. It was observed that the protocols improved the perceived availability and, in most cases, the access latency when compared with individual cloud providers. Moreover, the monetary cost of using DEPSKY in this scenario is twice the cost of using a single cloud, which is optimal and seems to be a reasonable cost, given the benefits.
2.5.1 Disadvantages
1. Secret-sharing overhead arises due to the non-unique digital signature algorithm.
2. The universal hash function size is only 64 bits, so an attacker can feasibly attack the hash code.
Chapter 3
SOFTWARE REQUIREMENT SPECIFICATION
3.1 Introduction
This chapter describes the requirements. It also specifies the hardware and software requirements that are needed to run the application properly. The Software Requirement Specification (SRS) is explained in detail, including an overview of this dissertation as well as its functional and non-functional requirements.
Functional: The proxy server controls file access at the cloud server; generates keys for uploading files; encrypts the keys; the cloud server authenticates user requests; regenerating codes restore the files; network coding is embedded in the proxy server; fault tolerance; creation of a recovery cloud; experimentation; automatic malicious-user revocation.
Non-functional: The data owner never monitors the cloud activities.
External interfaces: LAN, WAN, routers.
Performance: Information about finding file hackers, file access details, regenerated files in the cloud, revocation of the file hackers in the cloud.
Attributes: File management, the process of regenerating codes, network coding, fault tolerance, fault recovery and implementation, experimentation.
Table 3.1.1: SRS Summary
3.2 Functional Requirements
A functional requirement defines a function of a software system and how the system must behave when presented with specific inputs or conditions. These may include calculations, data manipulation and processing, and other specific functionality. The functional requirements of this system are as follows:
• The data owner uploads their data to the cloud server. For security, the data owner splits the file into four packets, encrypts the data file, and then stores it in the multiple clouds.
• The proxy server is a proxy-based design that interconnects multiple cloud repositories.
• The data storage service of a cloud is managed by the cloud service provider.
• The data owner encrypts and splits the stored data files across the multiple clouds (cs1, cs2, cs3, and cs4), shared for data consumers.
• Only a user who knows the encryption key can access the data file.
• The remote user has to use the proper key to access the files and file names. Without the proper key, the user is detected as an attacker.
3.3 Non Functional Requirements
Non-functional requirements are those requirements that are not directly concerned with the specific functions delivered by the system. They may relate to emergent system properties such as reliability, response time, and storage occupancy. Alternatively, they may define constraints on the system, such as the capability of the input/output devices and the data representations used in system interfaces. Many non-functional requirements relate to the system as a whole rather than to individual system features, which means they are often more critical than individual functional requirements. The following non-functional requirements are worthy of attention.
The key non-functional requirements are:
• Security - The system should allow secured communication between the cloud servers, the data owner, the user, and the file owner.
• Energy efficiency - The energy consumed by the users to receive the file information from the cloud server.
• Reliability - The system should be highly reliable, must not degrade the performance of the existing system, and should not lead to the hanging of the system.
3.4 Hardware Requirements:
• System : Pentium IV, 2.4 GHz
• Hard disk : 40 GB
• Floppy drive : 1.44 MB
• Monitor : 15-inch VGA colour
• Mouse : Logitech
• RAM : 1 GB
3.5 Software Requirements:
• Operating system : Windows XP
• Coding language : Java (AWT, Swing, Networking)
• Database : MS Access / MySQL
Chapter 4
DESIGN
4.1 Introduction
Input design plays a vital role in the software development life cycle and requires very careful attention from developers. The goal of input design is to feed data to the application as accurately as possible, so the inputs are designed to minimize the errors that occur while entering data. According to software engineering concepts, the input forms or screens are designed with validation controls over the input limit, range, and other related validations.
This system has input screens in almost all the modules. Error messages are developed to alert users whenever they make a mistake and to guide them in the right way, so that invalid entries are not made. This is discussed in more depth under module design.
Input design is the process of converting user-created input into a computer-based format. The goal of input design is to make data entry logical and free from errors; errors in the input are controlled by the input design. The application has been developed in a user-friendly manner. The forms have been designed in such a way that, during processing, the cursor is placed in the position where data must be entered. In certain cases, the user is also provided with an option to select an appropriate input from various alternatives related to the field. Validations are required for each data item entered.
The output from the computer is required mainly to create an efficient method of communication between the administrator and the clients. The system allows the administrator to manage clients by creating new clients, maintaining records, and providing folder-level access to each client depending on the files allotted to them.
4.2 Architecture Diagram
Figure 4.2.1: Architectural diagram showing upload and download operations
The data owner can upload the file details and can also maintain all the cloud details. The cloud servers show the cloud files, update the cloud file status, and view the status of all cloud details. The remote user performs the file download operation: remote users can take the details from the cloud and also receive the file details. Lastly, the proxy server helps to perform the repair operation, sends the status of the repaired cloud, and notifies the status of a cloud: partitioned, deleted, corrupted, or dozed off.
4.3 Class Diagram
Figure 4.3.1: Class diagram showing the members and methods of each module
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among objects. Here, the classes are the service provider, the receiver, the cloud server, and the proxy server. The service provider performs the browse, encrypt, upload, and reset operations, and has the file name, sender name, and cloud name as members. The proxy server performs viewLoad(), allocateResources(), selectCloud(), etc., and has the file name, cloud name, cloud status, etc. as members. The receiver performs the decrypt(), confirm(), and store() operations.
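The structure described above can be summarized in a minimal Java skeleton. This is an illustrative sketch only: the method bodies are placeholders, and the member names are taken from the class diagram description rather than from the actual implementation.

class ServiceProvider { // the data owner
    String fileName, senderName, cloudName;
    void browse()  { /* choose a file from a source folder */ }
    void encrypt() { /* AES-encrypt the chosen file */ }
    void upload()  { /* send the encrypted file to the proxy */ }
    void reset()   { /* clear the form fields */ }
}

class ProxyServer {
    String fileName, cloudName, cloudStatus;
    void viewLoad()          { /* show the load of each cloud */ }
    void allocateResources() { /* assign file packets to cloud servers */ }
    void selectCloud()       { /* pick a target cloud for a packet */ }
}

class Receiver { // the remote user
    void decrypt() { /* decrypt a downloaded file with the secret key */ }
    void confirm() { /* acknowledge the received file */ }
    void store()   { /* save the recovered file locally */ }
}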
4.4 Data Flow Diagram
Figure 4.4.1: Data flow diagram for uploading files by checking cloud status
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system, modeling its process aspects. A DFD can be used to create an overview of the system, which can later be elaborated. DFDs can also be used for the visualization of data processing (structured design).
A DFD shows what kind of information will be input to and output from the system, where the data will come from and go to, and where the data will be stored. It does not show information about the timing of processes or about whether processes will operate in sequence or in parallel (which is shown on a flowchart). Here, the data holder uploads the file to the cloud servers; the cloud server checks the cloud status and, if it can store the data, proceeds, otherwise another cloud is selected. This data flow is shown in the figure above.
4.5 Sequence Diagram
Figure 4.5.1: Sequence diagram showing operations between the different modules
A sequence diagram is an interaction diagram that shows how processes operate with one another and in what order. It is a construct of a message sequence chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the logical view of the system under development, and are sometimes called event diagrams or event scenarios. Here, the data holder uploads a file to the cloud servers (cs1, cs2, cs3, cs4, cs5), with the proxy acting as a client. The receiver checks the block information; the cloud server gives the blocking confirmation and the file sending response, which is forwarded to the receiver. The receiver can view and modify data and view files; if any file has to be deleted, a response is sent to the data owner.
4.6 Use Case Diagram
Figure 4.6.1: Use case diagram
A use case diagram at its simplest is a representation of a user’s interaction with the system that shows the relationship between the user and the different use cases in which the user is involved. A use case diagram can identify the different types of users of a system and the different use cases and will often be accompanied by other types of diagrams as well.
Chapter 5
IMPLEMENTATION
5.1 Introduction
The implementation phase of any project development yields the final solution. This phase involves the actual materialization of the ideas expressed in the analysis document and developed in the design phase. Implementation should be a perfect mapping of the design document into a suitable programming language in order to achieve the necessary final product; otherwise, the product may be wasted due to an incorrect choice of programming language or an unsuitable method of programming. It is therefore better for the coding phase to be directly linked to the design phase.
5.2 FMSR
Figure 5.2.1: Normal and repair operations in the cloud system
We consider a distributed, multiple-cloud storage setting from a client's perspective, in which data is striped over multiple cloud providers. A proxy serves as an interface between client applications and the clouds. If a cloud fails permanently, the proxy activates the repair operation.
We consider fault-tolerant storage based on a type of maximum distance separable (MDS) codes. A file object of size M is divided into equal-size native chunks, which are linearly combined to form code chunks. When an (n,k) MDS code is used, the native/code chunks are distributed over n (larger than k) nodes, each storing chunks of a total size M/k, such that the original file object can be reconstructed from the chunks contained in any k of the n nodes. Thus, the code tolerates the failure of any n-k nodes. The extra feature of FMSR codes is that the chunks stored in a failed node can be reconstructed by downloading less data from the surviving nodes than reconstructing the whole file. This project considers a multiple-cloud setting with two levels of reliability: fault tolerance and recovery in multiple-cloud storage.
Figure 5.2.2: RAID-6, EMSR, and FMSR code architectures
First, we assume that the multiple-cloud storage is double-fault tolerant (e.g., as in conventional RAID-6 codes) and provides data availability under the transient unavailability of at most two clouds. That is, we set k = n-2. Thus, clients can always access their data as long as no more than two clouds experience transient failures or any possible connectivity problems. We expect that such a fault-tolerance level suffices in practice. Second, we consider single-fault recovery in multiple-cloud storage, given that a permanent cloud failure is less frequent but possible.
For single-fault tolerance (i.e., k = n-1) and single-fault recovery, theoretical results show that traditional RAID-5 codes have the same data redundancy and the same repair traffic as FMSR codes.
The primary objective is to minimize the cost of storage repair (due to the migration of data over the clouds) for a permanent single-cloud failure. In this work, we compare two codes: traditional RAID-6 codes and FMSR codes with double-fault tolerance. We define the repair traffic as the amount of outbound data downloaded from the other surviving clouds during single-cloud failure recovery, and we seek to minimize the repair traffic for cost-effective repair. We do not consider the inbound traffic (i.e., the data written to a cloud), as it is free of charge for many cloud providers. We now study the repair traffic involved in different coding schemes via examples. Suppose we store a file of size M on four clouds, each viewed as a logical storage node, and let us first consider conventional RAID-6 codes, which are double-fault tolerant.
The file is divided into two native chunks (i.e., A and B) of size M/2 each, and we add two code chunks formed by linear combinations of the native chunks. Suppose now that Node 1 is down. The proxy must then download the same amount of data as the original file from two other nodes (e.g., B from Node 2 and the code chunk A+B from Node 3). It then reconstructs and stores the lost chunk A on the new node. The total storage size is 2M, while the repair traffic is M. Regenerating codes have been proposed to reduce the repair traffic. One class of regenerating codes is the exact minimum-storage regenerating (EMSR) codes. EMSR codes keep the same storage size as RAID-6 codes, while having the storage nodes send encoded chunks to the proxy so as to reduce the repair traffic. Figure 5.2.2 illustrates the double-fault-tolerant implementation of EMSR codes. The file is divided into four chunks, and the native and code chunks are allocated as shown in the figure. Suppose Node 1 is down. To repair it, each surviving node sends the XOR summation of its data chunks to the proxy, which then reconstructs the lost chunks.
Now consider the double-fault-tolerant implementation of FMSR codes. The file is divided into four native chunks, from which we construct eight distinct code chunks P1, ..., P8 formed by different linear combinations of the native chunks. Each code chunk has the same size M/4 as a native chunk. Suppose Node 1 is down. The proxy collects one code chunk from each surviving node, so it downloads three code chunks of size M/4 each. Then, the proxy regenerates two code chunks P1' and P2' formed by different linear combinations of the three downloaded code chunks. Note that P1' and P2' are still linear combinations of the native chunks. The proxy then writes P1' and P2' to the new node.
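The three examples above can be summarized as follows (n = 4 nodes, file size M, double-fault tolerance in each case):

Scheme    Total storage    Repair traffic             Encoding at nodes during repair
RAID-6    2M               M                          Not required
EMSR      2M               0.75M (3 chunks of M/4)    Required
FMSR      2M               0.75M (3 chunks of M/4)    Not required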
5.3 FMSR CODE IMPLEMENTATION
We now present the details of implementing FMSR codes in multiple-cloud storage. We specify three operations for FMSR codes on a particular file object: 1) file upload, 2) file download, and 3) repair. Each cloud repository is viewed as a logical storage node. Our implementation assumes a thin-cloud interface, such that the storage nodes (i.e., cloud repositories) only need to support basic read/write operations. Thus, the FMSR code implementation is compatible with today's cloud storage services. One property of FMSR codes is that they do not require lost chunks to be exactly reconstructed; instead, in each repair we regenerate code chunks that are not necessarily identical to those originally stored in the failed node, as long as the MDS property holds. We propose a two-phase checking scheme, which ensures that the code chunks on all nodes always satisfy the MDS property, and hence data availability, even after iterative repairs. In this section, we analyze the importance of the two-phase checking scheme.
5.3.1 File Upload
To upload a file F, we divide it into k(n-k) equal-size native chunks, denoted by F(i), where i = 1, 2, ..., k(n-k). We then encode these k(n-k) native chunks into n(n-k) code chunks, denoted by P(i), where i = 1, 2, ..., n(n-k). Each P(i) is formed by a linear combination of the k(n-k) native chunks. Specifically, we let EM = [α(i,j)] be an n(n-k) × k(n-k) encoding matrix for some coefficients α(i,j) in the Galois field GF, and let ECV(i) denote the ith row vector of EM. We then compute each P(i) as the product of ECV(i) and all the native chunks, where all arithmetic operations are performed over GF. The code chunks are evenly stored in the n storage nodes, each holding n-k chunks. We also store the whole EM in a metadata object that is then replicated to all storage nodes. A Java sketch of this encoding step follows the pseudocode below.
Begin
Browse the file from a source folder
Select a file
If the file is valid then
Read the file browsed by the data owner
Encrypt the file using the AES algorithm
Upload the file to the cloud server
Else display an error message
End
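To make the encoding step concrete, below is a minimal Java sketch for n = 4 and k = 2, so that k(n-k) = 4 native chunks are encoded into n(n-k) = 8 code chunks. It assumes arithmetic in GF(2^8) with the reduction polynomial 0x11D (the chapter only says "the Galois field GF", so the field choice is an assumption), and it fills EM with arbitrary sample coefficients rather than the actual NC Cloud values:

import java.util.Arrays;

public class FmsrEncode {
    // Multiply two field elements in GF(2^8), reducing by
    // x^8 + x^4 + x^3 + x^2 + 1 (0x11D); an assumed field choice.
    static int gfMul(int a, int b) {
        int p = 0;
        for (int i = 0; i < 8; i++) {
            if ((b & 1) != 0) p ^= a;
            boolean carry = (a & 0x80) != 0;
            a = (a << 1) & 0xFF;
            if (carry) a ^= 0x1D;
            b >>= 1;
        }
        return p;
    }

    // P(i) = sum over j of EM[i][j] * F(j), applied byte by byte.
    static byte[][] encode(int[][] em, byte[][] nativeChunks) {
        int len = nativeChunks[0].length;
        byte[][] code = new byte[em.length][len];
        for (int i = 0; i < em.length; i++)
            for (int j = 0; j < nativeChunks.length; j++)
                for (int b = 0; b < len; b++)
                    code[i][b] ^= (byte) gfMul(em[i][j], nativeChunks[j][b] & 0xFF);
        return code;
    }

    public static void main(String[] args) {
        byte[][] f = { {1, 2}, {3, 4}, {5, 6}, {7, 8} }; // four toy native chunks
        int[][] em = new int[8][4];                      // 8 x 4 encoding matrix
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 4; j++)
                em[i][j] = (i * 4 + j) % 255 + 1;        // arbitrary nonzero sample values
        byte[][] p = encode(em, f);
        System.out.println("P(1) = " + Arrays.toString(p[0]));
        // Two code chunks go to each of the 4 nodes; EM itself is
        // replicated to all nodes as metadata, as described above.
    }
}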
5.3.2 File Download
To download a file, we first download the corresponding metadata object that contains the ECVs. Then, we select any k of the n storage nodes and download the k(n-k) code chunks from those k nodes. The ECVs of the k(n-k) code chunks form a k(n-k) × k(n-k) square matrix. If the MDS property is maintained, then by definition the inverse of this square matrix must exist. Thus, we multiply the inverse of the square matrix with the code chunks and obtain the original k(n-k) native chunks. The idea is to treat FMSR codes as standard Reed-Solomon codes; this technique of creating an inverse matrix to decode the original data has been described in the tutorial. A Java sketch of this decoding step follows the pseudocode below.
Begin
View the files present in the cloud
Request the secret key for a desired file in any of the cloud servers
Select a cloud server to download the file
If the cloud server is not attacked then
Provide the file name and the IP address of the proxy
Download the file from the cloud server
Else provide the file through cloud server 5
End
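For the decoding step, the following Java sketch inverts the k(n-k) × k(n-k) square matrix of selected ECVs by Gauss-Jordan elimination over GF(2^8) and multiplies the inverse by the downloaded code chunks to recover the native chunks. As with the encoding sketch, GF(2^8) with polynomial 0x11D is an assumed field choice, and this is an illustration rather than the actual NC Cloud code:

public class FmsrDecode {
    // GF(2^8) multiplication, as in the encoding sketch.
    static int gfMul(int a, int b) {
        int p = 0;
        for (int i = 0; i < 8; i++) {
            if ((b & 1) != 0) p ^= a;
            boolean carry = (a & 0x80) != 0;
            a = (a << 1) & 0xFF;
            if (carry) a ^= 0x1D;
            b >>= 1;
        }
        return p;
    }

    // Multiplicative inverse: a^254, since the nonzero elements of
    // GF(2^8) form a group of order 255 (so a^255 = 1).
    static int gfInv(int a) {
        int r = 1;
        for (int i = 0; i < 254; i++) r = gfMul(r, a);
        return r;
    }

    // Invert a square ECV matrix by Gauss-Jordan elimination.
    // A singular matrix means the MDS property does not hold.
    static int[][] invert(int[][] m) {
        int n = m.length;
        int[][] a = new int[n][], inv = new int[n][n];
        for (int i = 0; i < n; i++) { a[i] = m[i].clone(); inv[i][i] = 1; }
        for (int col = 0; col < n; col++) {
            int piv = col;
            while (piv < n && a[piv][col] == 0) piv++;
            if (piv == n) throw new ArithmeticException("matrix is singular");
            int[] t = a[col]; a[col] = a[piv]; a[piv] = t;
            t = inv[col]; inv[col] = inv[piv]; inv[piv] = t;
            int s = gfInv(a[col][col]); // scale the pivot row to 1
            for (int j = 0; j < n; j++) {
                a[col][j] = gfMul(a[col][j], s);
                inv[col][j] = gfMul(inv[col][j], s);
            }
            for (int r = 0; r < n; r++) { // eliminate the other rows
                if (r == col || a[r][col] == 0) continue;
                int f = a[r][col];
                for (int j = 0; j < n; j++) {
                    a[r][j] ^= gfMul(f, a[col][j]);
                    inv[r][j] ^= gfMul(f, inv[col][j]);
                }
            }
        }
        return inv;
    }

    // F(i) = sum over j of inv[i][j] * P(j), applied byte by byte.
    static byte[][] decode(int[][] squareEcvs, byte[][] codeChunks) {
        int[][] inv = invert(squareEcvs);
        int len = codeChunks[0].length;
        byte[][] out = new byte[inv.length][len];
        for (int i = 0; i < inv.length; i++)
            for (int j = 0; j < codeChunks.length; j++)
                for (int b = 0; b < len; b++)
                    out[i][b] ^= (byte) gfMul(inv[i][j], codeChunks[j][b] & 0xFF);
        return out;
    }
}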
5.3.3 Iterative Repairs
We now consider the repair of FMSR codes for a file F after a permanent single-node failure. Since FMSR codes regenerate different chunks in each repair, one challenge is to ensure that the MDS property still holds even after iterative repairs. Here, we propose a two-phase checking heuristic as follows. Suppose that the (r-1)th repair is successful, and consider how to operate the rth repair for a single permanent node failure (where r >= 1). First, we check whether the new set of chunks in all storage nodes satisfies the MDS property after the rth repair. In addition, we also check whether another new set of chunks in all storage nodes would still satisfy the MDS property after the (r+1)th repair, should another single permanent node failure occur; we call this the repair MDS (rMDS) property.
Step 1: Download the encoding matrix from a surviving node.
Recall that the encoding matrix EM specifies the ECVs for constructing all code chunks via linear combinations of native chunks. We use these ECVs in the later two-phase checking. Since EM is embedded in a replicated metadata object, we can simply download the metadata object from one of the surviving nodes.
Step 2: Select one ECV from each of the n-1 surviving nodes.
Each ECV in EM corresponds to a code chunk. This step picks one ECV from each of the n-1 surviving nodes.
Step 3: Generate a repair matrix.
Construct an (n-k) × (n-1) repair matrix RM = [γ(i,j)], where each element γ(i,j) (with i = 1, 2, ..., n-k and j = 1, 2, ..., n-1) is randomly selected from GF.
Step 4: Compute the ECVs for the new code chunks and reproduce a new encoding matrix.
Multiply RM by the ECVs selected in Step 2 to construct n-k new ECVs. Then reproduce a new encoding matrix EM', formed by substituting the ECVs of the failed node in EM with the corresponding new ECVs.
Step 5: Check whether both the MDS and rMDS properties are satisfied for EM'.
Intuitively, we verify the MDS property by enumerating all subsets of k nodes to see whether each of their corresponding encoding matrices has full rank. For the rMDS property, we check that for any possible node failure (one out of n nodes), we could collect one out of the n-k chunks from each of the other n-1 surviving nodes and reconstruct the chunks in a new node such that the MDS property is maintained. If either phase fails, we return to Step 2 and repeat. We emphasize that Steps 1 to 5 deal only with the ECVs, so their overhead does not depend on the chunk size.
Step 6: Download the actual chunk data and regenerate new chunk data.
If the two-phase checking in Step 5 succeeds, we proceed to download the n-1 chunks that correspond to the ECVs selected in Step 2 from the n-1 surviving storage nodes to NC Cloud. Then, using the new ECVs computed in Step 4, we regenerate new chunks and upload them from NC Cloud to a new node.
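The MDS check used in Step 5 can be sketched in Java as follows. The sketch enumerates every subset of k nodes and tests whether the k(n-k) × k(n-k) matrix formed by their ECVs has full rank over GF(2^8); the rMDS phase applies the same check once more for each hypothetical next failure. GF(2^8) with polynomial 0x11D is again an assumed field choice, and this is an illustrative simplification of the two-phase checking, not the actual NC Cloud code:

public class MdsCheck {
    // GF(2^8) multiplication, as in the earlier sketches.
    static int gfMul(int a, int b) {
        int p = 0;
        for (int i = 0; i < 8; i++) {
            if ((b & 1) != 0) p ^= a;
            boolean carry = (a & 0x80) != 0;
            a = (a << 1) & 0xFF;
            if (carry) a ^= 0x1D;
            b >>= 1;
        }
        return p;
    }

    // Rank over GF(2^8); scaling rows by nonzero constants preserves rank,
    // so elimination avoids computing field inverses.
    static int rank(int[][] m) {
        int rows = m.length, cols = m[0].length, rk = 0;
        int[][] a = new int[rows][];
        for (int i = 0; i < rows; i++) a[i] = m[i].clone();
        for (int col = 0; col < cols && rk < rows; col++) {
            int piv = rk;
            while (piv < rows && a[piv][col] == 0) piv++;
            if (piv == rows) continue;
            int[] t = a[rk]; a[rk] = a[piv]; a[piv] = t;
            for (int r = rk + 1; r < rows; r++) {
                if (a[r][col] == 0) continue;
                int f = a[r][col], pv = a[rk][col];
                for (int j = 0; j < cols; j++)
                    a[r][j] = gfMul(pv, a[r][j]) ^ gfMul(f, a[rk][j]);
            }
            rk++;
        }
        return rk;
    }

    // MDS property: every choice of k out of n nodes yields a full-rank
    // k(n-k) x k(n-k) matrix, so any k nodes suffice to rebuild the file.
    static boolean isMds(int[][][] ecvsPerNode, int n, int k) {
        return checkSubsets(ecvsPerNode, new int[k], 0, 0, n, k);
    }

    static boolean checkSubsets(int[][][] nodes, int[] pick, int idx, int from,
                                int n, int k) {
        if (idx == k) {
            int need = k * (n - k);
            int[][] m = new int[need][];
            int r = 0;
            for (int i = 0; i < k; i++)
                for (int[] ecv : nodes[pick[i]]) m[r++] = ecv;
            return rank(m) == need;
        }
        for (int i = from; i < n; i++) {
            pick[idx] = i;
            if (!checkSubsets(nodes, pick, idx + 1, i + 1, n, k)) return false;
        }
        return true;
    }
}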
5.3.4 Modules
• NC Cloud (Network Coding Cloud)
NC Cloud is a proxy that bridges user applications and multiple clouds. Its design is built on three layers. The file system layer presents NC Cloud as a mounted drive, which can thus be easily interfaced with general user applications. The coding layer deals with the encoding and decoding functions. The storage layer deals with read/write requests to the different clouds. If any unauthorized user modifies a file in a cloud server, NC Cloud regenerates that file and sends it to the remote user via a newly created cloud.
• Server
The cloud service provider manages a cloud to provide data storage service for sharing with data consumers. To access the shared data files, data consumers download encrypted data files of their interest from the cloud and then decrypt them.
• Proxy Server
The proxy server is a proxy-based design that interconnects multiple cloud repositories, as shown in this system. The proxy serves as an interface between client applications and the clouds. If a cloud experiences a permanent failure, the proxy activates the repair operation.
• Data Owner
In this module, the data owner uploads their data to the cloud server. The data owner splits the file into four packets, encrypts the data file, and then stores it in the multiple clouds.
Begin
Browse the file from a source folder
Encrypt the file using the AES algorithm
Upload the file to the desired cloud server
End
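As an illustration of the encryption step in this module, here is a minimal Java sketch using the standard javax.crypto API (AES with a freshly generated 128-bit key and the provider's default mode; key distribution to the consumer and the four-way packet split are omitted):

import java.nio.file.Files;
import java.nio.file.Paths;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class OwnerEncrypt {
    public static void main(String[] args) throws Exception {
        // Read the file browsed by the data owner.
        byte[] plain = Files.readAllBytes(Paths.get(args[0]));

        // Generate a fresh AES key; it must later be shared with the consumer.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        // Encrypt the file contents.
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] encrypted = cipher.doFinal(plain);

        // Persist the ciphertext; the encrypted file would then be split
        // into four packets and uploaded through the proxy server.
        Files.write(Paths.get(args[0] + ".enc"), encrypted);
    }
}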
• Data Consumer (End User)
In this stage, the user can access the data file only with the encryption key. The proxy-based NC Cloud then combines all the packets and sends them to the remote user.
Begin
View the files present in each cloud server
Request the secret key for the desired file
Download the file from the cloud server through the proxy server
End
• Threat Model (Attacker)
An attacker can attempt a transient failure of a cloud by making it doze off for a particular period of time. The attacker can also attempt a permanent failure by deleting or corrupting the cloud. Any unauthorized user is considered an attacker.
Begin
Choose one of the attacks to apply to the server
Choose the desired server
Apply the attack to the server
End
Chapter 6
TESTING
6.1 Introduction
Testing is the process of trying to discover every possible fault or deficiency in a work product. It provides a way of checking the functionality of components, subassemblies, assemblies, and/or the finished product. Testing is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner.
6.2 Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. It is the testing of individual software units of the application; it is done after the completion of an individual unit before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. It performs basic tests at the component level and tests a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results. Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases. A sample unit test sketch is given after the following lists.
• Test strategy and approach
Field testing will be performed manually, and functional tests will be written in detail.
• Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages, and responses must not be delayed.
• Features to be tested
• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
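As a sample of a unit test in this style, the following JUnit sketch exercises a hypothetical splitFile() helper implementing the four-packet split described in Chapter 3. The class and method names are assumptions for illustration only:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SplitterTest {
    // Hypothetical helper: split a byte array into four nearly equal packets.
    static byte[][] splitFile(byte[] data) {
        byte[][] packets = new byte[4][];
        int base = data.length / 4, rem = data.length % 4, pos = 0;
        for (int i = 0; i < 4; i++) {
            int len = base + (i < rem ? 1 : 0);
            packets[i] = new byte[len];
            System.arraycopy(data, pos, packets[i], 0, len);
            pos += len;
        }
        return packets;
    }

    @Test
    public void splitProducesFourPacketsCoveringAllBytes() {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[][] packets = splitFile(data);
        assertEquals(4, packets.length);
        int total = 0;
        for (byte[] p : packets) total += p.length;
        assertEquals(data.length, total); // no bytes lost in the split
    }
}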
Name of the test: Upload the encrypted file
Item being tested: Upload button of the data owner
Sample input: File encrypted by the data owner
Expected output: The encrypted file should be split and uploaded to the cloud server through the proxy server
Actual output: The encrypted file is split and uploaded to the cloud server through the proxy server
Remark: Successful
Table 6.2.1: Unit test case of Data Owner
Name of the test: Download the file from the cloud server
Item being tested: Download button of the remote user
Sample input: IP of the proxy, cloud server name, file name
Expected output: The file should be fetched from the proxy server to the remote user
Actual output: The file is fetched from the proxy server to the remote user
Remark: Successful
Table 6.2.2: Unit test case of Remote User
Name of the test: Attack a cloud
Item being tested: Doze-off button of the attacker
Sample input: IP of the cloud server, cloud server name, specified time period
Expected output: The cloud should be hacked and a success message should be displayed
Actual output: The cloud is hacked and a success message is displayed
Remark: Successful
Table 6.2.3: Unit test case of Attacker
Name of the test: View files of the cloud server
Item being tested: View-file button of the cloud server
Sample input: File uploaded by the data owner
Expected output: The file name and the keys associated with the file should be displayed
Actual output: The file name and the keys associated with the file are displayed
Remark: Successful
Table 6.2.4: Unit test case of Cloud Server
6.3 Integration testing
Integration testing tests integrated software components to determine whether they actually run as one program. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is event driven and is more concerned with the basic outcome of screens or fields.
Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
Software integration testing is the incremental integration testing of two or more integrated software components on a single platform to produce failures caused by interface defects.
The main use of integration testing is to check that components or software applications (e.g., components in a software system or, one step up, software applications at the company level) interact without error.
Name of the test: Upload a file from the data owner to the cloud server
Item being tested: Data owner, cloud server
Sample input: File from a source folder
Expected output: The file uploaded by the data owner should be present in the desired cloud server
Actual output: The file uploaded by the data owner is present in the desired cloud server
Remark: Successful
Table 6.3.1: Integration testing of the Data Owner and Cloud Server modules
Name of the test: Download a file from the cloud server to the remote user
Item being tested: Remote user, cloud server
Sample input: File name, IP address of the proxy, cloud server name
Expected output: The requested file should be provided to the user by the cloud server through the proxy server
Actual output: The requested file is provided to the user by the cloud server through the proxy server
Remark: Successful
Table 6.3.2: Integration testing of the Cloud Server and Remote User modules
The main strategies of integration testing are:
6.3.1 Top down Integration
Modules are integrated by moving downward through the control hierarchy, beginning with the main program module. Modules subordinate to the main program module are incorporated into the structure in either a depth-first or breadth-first manner.
In this method, the software is tested from the main module, and individual stubs are replaced as the test proceeds downwards.
6.3.2 Bottom-up Integration
This method begins the construction and testing with the modules at the lowest level in the program structure. Since the modules are integrated from the bottom up, the processing required for modules subordinate to a given level is always available, and the need for stubs is eliminated. The following steps are followed in the bottom-up strategy:
• The low-level modules are combined into clusters that perform a specific software sub-function.
• A driver (i.e., a control program for testing) is written to coordinate test case input and output.
• The cluster is tested.
• Drivers are removed and clusters are combined, moving upward in the program structure.
The bottom-up approach tests each module individually; each module is then integrated with a main module and tested for functionality.
6.4 Functional test
Functional testing demonstrates that the functions tested are available as specified by the business and technical requirements, the system documentation, and the user manuals.
This testing is focused on the following items:
Valid input : identified classes of valid input must be accepted.
Invalid input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
The requirements focus on the organization and preparation of functional tests, key functions, and special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing.
6.5 System Test
System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
6.6 Acceptance Testing
Acceptance testing can be performed by a non-technical person, such as a tester, a manager, or even the client. For a web application, the tester needs nothing more than a web browser to check that the site works correctly.
6.7 White Box Testing
As the name suggests, white box testing is a type of testing in which the code is visible to the tester. Here, program logic is verified and developers are involved. The code is tested to verify its correctness. White box testing is a method of testing software that tests its internal structures or workings.
6.8 Black Box Testing
Black box testing tests functional and non-functional characteristics of the software without referring to the internal code of the software. Black Box testing doesn’t require knowledge of internal code/structure of the system/software. It uses external descriptions of the software like SRS(Software Requirements Specification), Software Design Documents to derive the test cases.
6.9 Output Testing
After performing validation testing, the next step is output testing of the proposed system, since no system is useful if it does not produce the required output in the specified format. The outputs generated or displayed by the system under consideration are tested by asking the users about the format they require. The output format is considered in two ways: one on screen and the other in printed format.
6.10 Validation Checking
The fields of validation checking are:
• Text Field:
Client-side validation of text fields (e.g., name, telephone, email, and number-of-years fields) can be performed using JavaScript on the input text boxes. To see whether the user has entered a value, we check whether the length of the text box content is greater than zero.
• Numeric Field:
The numeric field can contain only the digits 0 to 9; entry of any other character flashes an error message. The individual modules are checked for accuracy and for what they have to perform. Each module is subjected to a test run along with sample data. The individually tested modules are then integrated into a single system. Testing involves executing the program with real data; the existence of any program defect is inferred from the output. Testing should be planned so that all the requirements are individually tested.
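In the Java/Swing setting of this project, the numeric-field rule above can be enforced with a simple check before the input is accepted, for example:

import javax.swing.JOptionPane;

public class NumericFieldCheck {
    // True only if the text consists of the digits 0-9.
    static boolean isNumeric(String text) {
        return !text.isEmpty() && text.matches("[0-9]+");
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "12a";
        if (!isNumeric(input)) // flash an error message, as described above
            JOptionPane.showMessageDialog(null, "Only digits 0-9 are allowed");
    }
}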
A successful test is one that brings out the defects for inappropriate data and produces output revealing the errors in the system.
• Preparation of test data
• An empty database instance may not be available.
• You may find that the inserted test data is insufficient for testing some cases, such as load and performance testing.
• Data insertion may become a difficult task due to database table dependencies.
• Inserting limited test data may hide some issues that would be found with a big data set.
• When inserting data, complex procedures and queries may be required.
• Using live test data:
Live test data are those that are actually extracted from organization files. After a system is partially constructed, programmers or analysts often ask users to key in a set of data from their normal activities. Then, the systems person uses this data as a way to partially test the system. In other instances, programmers or analysts extract a set of live data from the files and have them entered themselves.
It is difficult to obtain live data in sufficient amounts to conduct extensive testing. And, although it is realistic data that will show how the system will perform for the typical processing requirement, assuming that the live data entered are in fact typical, such data generally will not test all combinations or formats that can enter the system. This bias toward typical values then does not provide a true systems test and in fact ignores the cases most likely to cause system failure.
• Using artificial test data:
Artificial test data are created solely for test purposes, since they can be generated to test all combinations of formats and values. In other words, the artificial data, which can quickly be prepared by a data-generating utility program in the information systems department, make possible the testing of all logic and control paths through the program.
The most effective test programs use artificial test data generated by persons other than those who wrote the programs. Often, an independent team of testers formulates a testing plan, using the systems specifications.
The developed package satisfied all the requirements specified in the software requirement specification and was accepted.
6.11 User Training
There are numerous methods and materials available to help you prepare and equip employees to better do their jobs whenever a new system is developed. Indeed, with so many choices out there, it can be daunting to determine which methods to use and when to use them. For this purpose the normal working of the project was demonstrated to the prospective users.
6.12 Maintenance
Maintenance covers correcting code and design errors. To reduce the need for maintenance in the long run, the user's requirements must be defined more accurately during the process of system development.
Chapter 7
RESULTS
7.1 Introduction
Results are the outcome obtained through processing the input according to the constraints of the problem. In this chapter we show the output through screenshots obtained after the execution of the different modules, and also show the user interface of each module.
7.2 Snapshots
The cloud server displays the various file details, such as the file name and associated keys, through the view-all-files button.
Snapshot 7.2.1: Cloud Server 1
The proxy server maintains a backup copy of the files that the data owner uploads and sends the files to the respective cloud servers.
Snapshot 7.2.2: Proxy Server
The data owner can browse, encrypt, and upload a file to the cloud servers through the proxy server, and can also delete the file.
Snapshot 7.2.3: Data Owner
The data owner encrypts the data, converting it into an unrecognizable format.
Snapshot 7.2.5: Data Owner showing a browsed and encrypted file
The file has been uploaded through the proxy server.
Snapshot 7.2.6: Proxy Server uploading the file to Cloud Server 1
The remote user can view the files present in the cloud and can request the secret key.
Snapshot 7.2.7: Remote User or Receiver
The hacker can doze off a cloud, and can delete or corrupt a file.
Snapshot 7.2.8: Hacker or Attacker
The hacker has succeeded in corrupting a file.
Snapshot 7.2.9: Hacker after corrupting a file
The cloud analyzer displays the status of the different clouds.
Snapshot 7.2.10: Cloud Analyzer showing the status of each cloud
On the failure of Cloud Server 3, the proxy server regenerates the data through Cloud Server 5 and provides it to the remote user.
Snapshot 7.2.11: Backup file provided by the proxy server to the remote user, after an attack, through Cloud Server 5