Chapter 1
Introduction
1.1 OVERVIEW
Cloud computing is the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer. It is defined as a type of computing that relies on sharing computing resources instead of having local servers or personal devices handle applications. Cloud computing is comparable to grid computing, a type of computing in which the unused processing cycles of all computers in a network are harnessed to solve problems too intensive for any stand-alone machine.
On-demand computing is a form of Internet-based computing in which shared resources, data, and information are provided to desktops and other devices on demand. It is a model that enables ubiquitous, on-demand access to a shared pool of configurable computing resources. Cloud computing and storage solutions provide users and enterprises with various facilities to store and process their data in third-party data centres.
A Cloud Service Provider (CSP) offers customers storage or software services via a private cloud or a public network, which means that the storage and software are accessible over the Internet.
Cloud storage services allow data owners to move their data from local computing systems to the cloud. As a result, many owners have started to store their data in the cloud, which increases the storage burden on the CSP. However, this data hosting service also introduces new security challenges. Data owners worry that their data could be lost in the cloud, because data loss can happen in any infrastructure, no matter how reliable the measures the cloud service provider takes. Moreover, the CSP may sometimes be dishonest towards the data owners: it could discard data that has not been accessed or is rarely accessed in order to save storage space, or attempt to hide security flaws such as data loss or corruption while claiming that the data is still correctly stored in the cloud.
Therefore, owners need to be convinced that their remote data is correctly stored in the cloud. Traditionally, owners checked data integrity using storage auditing protocols. However, it is inappropriate to let either the CSP or the owners conduct such auditing, because neither can be guaranteed to provide an unbiased auditing result.
A Third Party Auditor (TPA) is therefore a natural choice for storage auditing in cloud computing. The TPA can periodically check the integrity of all the data stored in the cloud on behalf of the users, ensuring storage correctness in a way that convinces both cloud service providers and owners.
Provable Data Possession (PDP) is a technique for ensuring the integrity of data in storage outsourcing. When multiple cloud service providers cooperatively store and maintain a client's data, the Cooperative Provable Data Possession (CPDP) technique supports the scalability of service and data migration in a distributed cloud.
The Advanced Encryption Standard (AES) is a symmetric encryption algorithm designed to be efficient in both hardware and software; it supports a block length of 128 bits and key lengths of 128, 192, and 256 bits. It is based on a design principle known as a substitution–permutation network, a combination of substitution and permutation operations, and is fast in both software and hardware. The key size used for an AES cipher determines the number of transformation rounds that convert the input, called the plaintext, into the final output, called the ciphertext.
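For illustration, a minimal Java sketch of AES encryption and decryption using the standard javax.crypto API is given below (Java being the project's front-end language); the 128-bit key size, CBC mode, and sample plaintext are assumptions chosen only for this example, not the project's exact configuration.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.SecureRandom;

public class AesDemo {
    public static void main(String[] args) throws Exception {
        // Generate a 128-bit AES key (192 or 256 bits may also be used).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Use CBC mode with a random IV; the IV must be kept alongside the ciphertext.
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] cipherText = cipher.doFinal("sample plaintext block".getBytes("UTF-8"));

        // Decryption reverses the transformation with the same key and IV.
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] plainText = cipher.doFinal(cipherText);
        System.out.println(new String(plainText, "UTF-8"));
    }
}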
Outsourced files are usually striped and redundantly stored across multiple servers or multiple clouds, which motivates integrity verification schemes suited to such multi-server or multi-cloud settings with different redundancy schemes, such as replication, erasure codes, and, more recently, regenerating codes.
In the regenerating-code scenario, a Data Integrity Protection (DIP) scheme was earlier designed and implemented for Functional Minimum-Storage Regenerating (FMSR) based cloud storage and adapted to the thin-cloud setting. However, it was designed for private audit: only the data owner was allowed to verify the integrity and repair the faulty servers.
Given the large size of the outsourced data and the users' constrained resource capabilities, the tasks of auditing and repair in the cloud can be daunting and expensive for users. The overhead of using cloud storage should be minimized as much as possible, so that a user does not need to perform too many operations on the outsourced data. In general, users do not want to go through the complexity of verification and repair. Auditing schemes that require users to always stay online may impede adoption in practice, especially for long-term archival storage.
The cloud user is the individual or organization that has a large amount of data to be stored in multiple clouds and has permission to access and manipulate the stored data blocks. Whenever data blocks are uploaded to the cloud, the TPA can view the blocks uploaded to the multi-cloud. When the user wants to download a file, the data from the multiple clouds is integrated and then downloaded.
Chapter 2
Literature survey
2.1 Enabling data integrity protection in regenerating-coding-based cloud storage: theory and implementation
Authors: Henry C.H. Chen and Patrick P.C. Lee
Journal: IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 2, pp. 407–416, Feb. 2014
This paper considers regenerating codes, which stripe data across multiple servers, in place of traditional erasure codes during failure recovery, and implements a Data Integrity Protection (DIP) scheme for regenerating-coded storage. The authors implement and evaluate the overhead of their DIP scheme in a real cloud storage test bed under different parameter choices, and further analyze its security strengths via mathematical models.
MERITS:
• It has less repair traffic.
• It verifies the integrity of random subsets of outsourced data against corruptions.
DEMERITS:
• It consumes more time to download data striped across multiple servers.
2.2 An efficient and secure dynamic auditing protocol for data storage in cloud computing
Authors: K. Yang and X. Jia
Journal: IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 9, pp. 1717–1726, Sept. 2013, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6311398
In this paper, the authors first design an auditing framework for cloud storage systems and propose an efficient and privacy-preserving auditing protocol. They then extend the auditing protocol to support dynamic data operations, which is efficient and provably secure in the random oracle model. They further extend the protocol to support batch auditing for multiple owners and multiple clouds without using any trusted organizer.
MERITS:
• It is more efficient and secure.
• It reduces the computational cost.
DEMERITS:
• It increases complexity.
2.3 Cooperative provable data possession for integrity verification in multicloud storage
Authors: Yan Zhu, Hongxin Hu, Gail-Joon Ahn, and Mengyang Yu
Journal: IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 12, pp. 2231–2244, Dec. 2012
Provable Data Possession (PDP) is a technique that ensures the integrity of data in storage outsourcing. In this paper, the authors address the construction of an efficient PDP scheme for distributed cloud storage that supports scalability of service and data migration, considering the case where multiple cloud service providers cooperatively store and maintain the client's data. They present a Cooperative PDP (CPDP) scheme based on homomorphic verifiable responses and a hash index hierarchy, and prove its security based on a multi-prover zero-knowledge proof system satisfying the completeness, knowledge soundness, and zero-knowledge properties.
MERITS:
• It minimizes the computation costs of clients and storage service providers.
• It has lower communication overhead in comparison with non-cooperative approaches.
DEMERITS:
• It introduces new problems when the storage nodes are distributed.
2.4 A survey on network codes for distributed storage
Authors: Alexandros G. Dimakis, Kannan Ramchandran, Yunnan Wu, and Changho Suh
Journal: Proceedings of the IEEE, vol. 99, no. 3, pp. 476–489, Mar. 2011
In distributed storage systems, redundancy is introduced to increase reliability. This gives rise to the repair problem: if a node storing encoded information fails, the same level of reliability must be maintained by creating encoded information at a new node. This amounts to a partial recovery of the code, whereas conventional erasure coding focuses on complete recovery of the information from a subset of encoded packets.
MERITS:
• The maintenance bandwidth can be reduced by orders of magnitude when compared to standard erasure codes.
DEMERITS:
• Repair considerations give rise to new design challenges.
• It is not suitable for the repair and reconstruction of data blocks over large finite-field sizes.
2.5 Compact proofs of retrievability, in Advances in Cryptology
Authors: H. Shacham and B. Waters
Journal: Springer Berlin Heidelberg, International Conference on the Theory and Application of Cryptology and Information Security, pp. 90-107, 2008
In a proof of retrievability, a data storage center convinces a verifier that it is actually storing all of the client's data. The goal is a scheme that is both efficient and provably secure, so that correctly stored data reliably passes the verification check. The scheme built on BLS signatures in the random oracle model has the shortest query and response of any proof-of-retrievability scheme with public verifiability.
The scheme built on pseudorandom functions (PRFs), secure in the standard model, has the shortest response of any proof-of-retrievability scheme with private verifiability, albeit with longer queries. Both schemes rely on homomorphic properties to aggregate a proof into one small authenticator value.
MERITS:
• Suitable for public and private verifications.
DEMERITS:
• It has no methods for effective code regeneration.
2.6 Privacy-preserving public auditing for secure cloud storage
Authors: Cong Wang, Sherman S.M. Chow, Qian Wang, Kui Ren, and Wenjing Lou
Journal: IEEE Transactions on Computers, vol. 62, no. 2, pp. 362–375, Feb. 2013
Cloud users can remotely store their data and enjoy on-demand, high-quality applications and services from a shared pool of configurable computing resources, with a reduced burden of local data storage and maintenance. This paper argues that enabling public auditability for cloud storage is of critical importance, so that users can resort to a third-party auditor (TPA) to check the integrity of outsourced data. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities towards user data privacy and should not introduce an additional online burden to the user. The authors propose a secure cloud storage system supporting privacy-preserving public auditing and further extend the result to enable the TPA to perform audits for multiple users simultaneously and efficiently. Extensive security and performance analysis shows the proposed schemes are provably secure and highly efficient.
MERITS:
• Batch auditing reduces the time complexity of auditing.
DEMERITS:
• No efficient code regeneration has been introduced.
2.7 Toward secure and dependable storage services in cloud computing
Authors: C. Wang, Q. Wang, K. Ren, N. Cao, and W. Lou
Journal: IEEE Transactions on Services Computing, vol. 5, no. 2, pp. 220–232, Apr.–Jun. 2012
Cloud storage enables users to remotely store their data and enjoy on-demand, high-quality cloud applications with a reduced burden of local hardware and software management. Although the benefits are clear, new security risks to the correctness of the data arise in the cloud.
To address this problem and achieve a secure and dependable cloud storage service, this paper proposes a flexible distributed storage integrity auditing mechanism that utilizes homomorphic tokens and distributed erasure-coded data. The proposed design allows users to audit the cloud storage with very lightweight communication and computation cost. It not only ensures a strong cloud storage correctness guarantee, but also simultaneously achieves fast data error localization, i.e., the identification of misbehaving servers. The design further supports secure and efficient dynamic operations on outsourced data, including block modification, deletion, and append.
MERITS:
• It is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.
DEMERITS:
• It has higher computational complexity.
2.8 Network coding for distributed storage systems
Authors: Alexandros G. Dimakis, P. Brighten Godfrey, Yunnan Wu, Martin O. Wainwright and Kannan Ramchandran
Journal: IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010
This paper aims to achieve less redundancy without reducing the reliability level.
In the proposed scheme, data is stored as fragments spread across nodes, so that a lost coded block can be rebuilt from the surviving nodes and stored at a new node. The systematic regenerating code minimizes storage overhead and provides reliability for dynamic nodes.
MERITS:
• It reduces the repair bandwidth.
DEMERITS:
• It provides less optimal code regeneration.
2.9 A random linear network coding approach to multicast
Authors: Tracey Ho, Muriel Médard, Ralf Koetter, David R. Karger, Michelle Effros, Jun Shi, and Ben Leong
Journal: IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4413–4430, Oct. 2006
This paper addresses the transmission and compression of information in general multi-source multicast networks. Network nodes independently and randomly select linear mappings from inputs onto output links over some finite field.
MERITS:
• The decentralized operations provide robustness to network changes or link failures.
• It provides improved success probability and robustness in the presence of redundancy.
DEMERITS:
• It has more storage overheads.
2.10 Enabling public verifiability and data dynamics for storage security in cloud computing
Authors: Qian Wang, Cong Wang, Jin Li, Kui Ren, and Wenjing Lou
Journal: Springer, Berlin, 2009, pp. 355–370
In large centralized data centers, the management of data and services may not be fully trustworthy, which introduces many new security challenges that have not been well understood. This paper addresses the problem of ensuring the integrity of data storage in cloud computing, allowing a third-party auditor (TPA), on behalf of the cloud client, to verify the integrity of the dynamic data stored in the cloud.
MERITS:
• It provides better integrity verification on dynamic data storage.
DEMERITS:
• It provides less protection on privacy against TPA.
Chapter 3
EXISTING SYSTEM
In existing cloud storage services, data owners store their large files on the remote servers of cloud service providers, which makes them worry about the integrity of the remotely stored data. To secure the remotely stored data, an effective third-party auditor (TPA) was introduced, subject to the following two fundamental requirements:
1. The TPA should be able to efficiently audit the cloud data storage without demanding a local copy of the data, and should introduce no additional online burden to the cloud user.
2. The third-party auditing process should introduce no new vulnerabilities towards user data privacy.
Figure 3.1 Cloud storage service
Network codes designed specifically for distributed storage systems have the potential to provide dramatically higher storage performance for the same availability. One main challenge in the design of such codes is the exact repair problem: if a node storing encoded information fails, encoded information must be created at a new node in order to maintain the same level of reliability. One of the main open problems in this emerging area has been the design of simple coding schemes that allow exact and low-cost repair of faulty nodes while maintaining high data rates.
Figure 3.2 Cloud data storage architecture
Current remote checking methods for regenerating-coded data only provide private auditing, requiring data owners to always stay online to perform auditing as well as repair, which is sometimes impractical.
Data owners lose ultimate control over the fate of their outsourced data; thus, the correctness, availability, and integrity of the data are put at risk. On one hand, the cloud service usually faces a broad range of internal and external attackers who may maliciously delete or corrupt users' data.
On the other hand, cloud service providers may themselves act maliciously, attempting to hide data loss or corruption and claiming that the files are still correctly stored in the cloud for reputational or financial reasons. It therefore makes great sense for users to implement an efficient protocol to perform frequent verifications of their outsourced data to ensure that the cloud indeed maintains data integrity. Many mechanisms dealing with the integrity of outsourced data without a local copy have been proposed under various system and security models. The most significant works among these studies are the PDP (provable data possession) model and the POR (proof of retrievability) model.
In the existing private-audit setting, only the data owner is allowed to verify the integrity and repair the faulty servers. Considering the large size of the outsourced data and the users' constrained resource capabilities, the tasks of auditing and repair in the cloud can be daunting and expensive for users.
In particular, users may not want to go through the intricacies of verification and repair. Such auditing schemes also require users to always stay online, which may hinder adoption in practice, especially for long-term archival storage.
To ensure data integrity while saving the users' computing resources and online burden, optimization measures are taken to improve the resilience and efficiency of the auditing scheme. Thus, the storage overhead of servers, the computational complexity of the data owner, and the communication overhead during the audit phase can be effectively reduced.
Disadvantages
• The TPA can introduce new security vulnerabilities.
• Data owners should stay online for auditing and reparation.
Chapter 4
Proposed system
The proposed secure cloud storage mainly focuses on privacy-preserving public auditing of data storage. With the prevalence of cloud computing, a considerable number of auditing tasks from different users may be assigned to the TPA. As the individual auditing of these growing tasks can become difficult and unmanageable, a natural demand is to enable the TPA to perform multiple auditing tasks efficiently in a batch manner.
Figure 4.1 system model
In the public auditing scheme for regenerating-code-based cloud storage, the integrity checking and the regeneration of failed data blocks and authenticators, exposed through a RESTful interface, are carried out by a TPA and a proxy server, respectively, on behalf of the data owner. Instead of directly adapting an existing public auditing scheme to the multi-server setting, a novel authenticator is designed that is more appropriate for regenerating codes. In addition, the encoding coefficients are encrypted to protect data privacy against the auditor. Several challenges and threats naturally arise in this new system model with the proxy.
The proposed scheme completely releases data owners from the online burden of regenerating blocks and authenticators at faulty servers by delegating that authority to a proxy. Optimization measures are taken to improve the resilience and performance of the existing auditing scheme, so that the storage overhead of servers, the computational cost of the data owner, and the communication overhead during the audit phase can be effectively reduced.
Advantages
• The proposed system supports data dynamics with public verifiability and privacy against third-party verifiers.
• It reduces the data owner’s online burden by introducing a proxy server on behalf of the data owner.
• The user can download a file of interest without concern about data integrity, with reduced repair bandwidth.
Chapter 5
System requirements
5.1 HARDWARE REQUIREMENTS
• System : Intel Pentium processor
• Hard disk : 40 GB
• RAM : 512 MB
• Monitor : 14’’ VGA Colour
• Disk space : 1 GB
5.2 SOFTWARE REQUIREMENTS
• OS : Windows 7
• Front End : JAVA
• Back End : MySQL
• IDE : Net Beans IDE
Chapter 6
SYSTEM ANALYSIS
6.1 MODULES
1. Cloud storage server
2. Metadata key generation
3. Third party auditing
4. Code regeneration
6.2 MODULES DESCRIPTION
1. CLOUD STORAGE SERVER
Cloud storage is a data storage technique in which the client's data is stored in physical storage that spans multiple servers; the physical environment is typically owned and managed by a hosting company. The cloud storage provider is responsible for keeping the client's data available and accessible to end users. People and organizations can buy or lease storage capacity from the service provider to store their data.
The data owner/user can upload files after registering with their personal details and creating a username and password. The user/owner must then log in with the registered username and password in order to upload or download files.
2. METADATA KEY GENERATION
The original file that the data owner wishes to upload is first partitioned into m blocks, each consisting of n bits of data. The partitioned blocks are then encrypted using the AES algorithm to provide stronger protection of the client's data. Whenever access is granted for a user's download request, the partitioned data is concatenated back into the original file and downloaded at the end user's side.
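A minimal Java sketch of this module is given below, assuming a fixed block size and the standard javax.crypto AES API; the class and method names, the block size, and the ECB mode are illustrative assumptions rather than the project's actual code.

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockEncryptor {
    // Split a file into fixed-size blocks (the block size is an assumption for illustration).
    static List<byte[]> partition(String path, int blockSize) throws Exception {
        byte[] all = Files.readAllBytes(Paths.get(path));
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < all.length; i += blockSize) {
            blocks.add(Arrays.copyOfRange(all, i, Math.min(i + blockSize, all.length)));
        }
        return blocks;
    }

    // Encrypt every block with the same AES key before uploading it to the cloud.
    static List<byte[]> encryptBlocks(List<byte[]> blocks, SecretKey key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding"); // mode chosen only for brevity
        cipher.init(Cipher.ENCRYPT_MODE, key);
        List<byte[]> out = new ArrayList<>();
        for (byte[] b : blocks) {
            out.add(cipher.doFinal(b));
        }
        return out;
    }
}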
3. THIRD PARTY AUDITING
The TPA is privileged to audit the uploaded files of all the data owners registered with the cloud service provider. The auditing process verifies the integrity of the uploaded data by checking file parameters such as the file type, contents, and document length. To considerably reduce the time needed for auditing, the batch auditing technique is used, which brings the auditing time down to a reasonable amount.
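The following simplified sketch conveys the idea of batch verification by comparing SHA-256 digests recorded at upload time with digests recomputed at audit time; the real auditing scheme relies on homomorphic authenticators rather than plain hashes, so this is only an assumption-level illustration.

import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Map;

public class BatchAuditor {
    // Compare the current digest of each stored block with the digest recorded at upload time.
    // Simplification: the actual scheme uses homomorphic authenticators, not plain hashes.
    static boolean batchAudit(Map<String, byte[]> storedBlocks,
                              Map<String, byte[]> expectedDigests) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (Map.Entry<String, byte[]> entry : storedBlocks.entrySet()) {
            byte[] current = sha.digest(entry.getValue());
            if (!Arrays.equals(current, expectedDigests.get(entry.getKey()))) {
                System.out.println("Integrity check failed for block " + entry.getKey());
                return false; // at least one block is corrupted or lost
            }
        }
        return true; // all audited blocks passed the check
    }
}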
4. CODE REGENERATION
In the code regeneration module, instead of storing the original file on a single server, the file is split across more than one server and stored in encrypted form. Whenever a portion of the file is corrupted, the proxy is instructed to repair or regenerate the lost data from the surviving servers by means of mathematical computations.
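As a rough illustration of rebuilding a lost piece from surviving servers, the sketch below uses simple XOR parity over equal-length blocks; actual regenerating codes such as FMSR operate over a finite field and are considerably more involved, so this stand-in is an assumption made for illustration, not the project's algorithm.

public class ParityRepair {
    // Compute a parity block as the XOR of the data blocks stored on the other servers.
    // All blocks are assumed to have the same length.
    static byte[] xorParity(byte[][] blocks) {
        byte[] parity = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    // Regenerate a single lost block: XOR the parity block with all surviving data blocks.
    static byte[] regenerate(byte[] parity, byte[][] survivingBlocks) {
        byte[] lost = parity.clone();
        for (byte[] block : survivingBlocks) {
            for (int i = 0; i < lost.length; i++) {
                lost[i] ^= block[i];
            }
        }
        return lost;
    }
}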
6.3 UML DIAGRAMS
The Unified Modeling Language (UML) is a standard visual modeling language used for modeling processes. It is used for the analysis, design, and implementation of the project work.
6.3.1 USE CASE DIAGRAM
A use case diagram lists the interactions between a role (an "actor") and the system needed to achieve a goal of the project. Here the actors are the data owner/user, the TPA, and the cloud service provider. The diagram is used at a higher level to represent missions or stakeholder goals.
Figure 6.3.1 Use case diagram
6.3.2 SEQUENCE DIAGRAM
A sequence diagram is an interaction diagram that represents how processes operate with one another and in what order. It is also known as a message sequence chart. A sequence diagram shows the time sequence of the various operations.
Figure 6.3.2 Sequence diagram
6.3.3 ACTIVITY DIAGRAM
An activity diagram describes the dynamic aspects of the system by means of the flow from one activity to another. Each activity can be represented as an operation of the system, so the control flow is drawn from one operation to the next.
Figure 6.3.3 Activity diagram
6.3.4 ER DIAGRAM
The entity–relationship model (ER model) describes the data or information aspects of the business domain and its process requirements in an abstract way that lends itself to being implemented in the MySQL database. The main components of the ER model are the entities (User, TPA, CSP, and Data Owner) and the relationships that can exist among them.
Figure 6.3.4 ER diagram
6.3.5 DATA FLOW DIAGRAM:
A data flow diagram (DFD) is a graphical representation of the flow of data through the system, modelling its process aspects. It is a preliminary step used to create an overview of the system, which can then be elaborated.
Figure 6.3.5 Data flow diagram
6.4 SCREEN SHOTS:
Start up page:
Register page:
User Login Page:
Service provider login:
File upload:
Viewing uploaded files:
File download request:
Send key to download file:
Third Party Auditing:
Chapter 7
Conclusion
7.1 CONCLUSION
The public auditing scheme used in this work can efficiently check the integrity of the original data stored across the remote cloud by delegating that privilege to the TPA. To keep the original data private from the TPA, a randomized coefficient technique is applied instead of a blinding technique. To reduce the data owner's online burden, a semi-trusted proxy server is authorized to handle the repair of maliciously corrupted data across the cloud storage servers.
7.2 FUTURE ENHANCEMENT
In future work, the existing code regeneration method can be improved by modifying the partitioning method so that the original data or file is simply split into parts, for example A, B, and C, and stored on more than one server. This can reduce the time complexity as well as the computational complexity of code regeneration to a reasonable level when regenerating and downloading data that was maliciously corrupted, so that it is recovered exactly as the original. This assures the user that the requested file has not been corrupted by any untrusted agent and can also reduce the repair bandwidth for file downloading.
References
1. H. C. H. Chen and P. P. C. Lee, “Enabling data integrity protection in regenerating-coding-based cloud storage: Theory and implementation,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 2, pp. 407–416, Feb. 2014.
2. K. Yang and X. Jia, “An efficient and secure dynamic auditing protocol for data storage in cloud computing,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 9, pp. 1717–1726, Sep. 2013.
3. Y. Zhu, H. Hu, G.-J. Ahn, and M. Yu, “Cooperative provable data possession for integrity verification in multicloud storage,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 12, pp. 2231–2244, Dec. 2012.
4. A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A survey on network codes for distributed storage,” Proc. IEEE, vol. 99, no. 3, pp. 476–489, Mar. 2011.
5. H. Shacham and B. Waters, “Compact proofs of retrievability,” in Advances in Cryptology. Berlin, Germany: Springer-Verlag, 2008, pp. 90–107.
6. C. Wang, S. S. M. Chow, Q. Wang, K. Ren, and W. Lou, “Privacy-preserving public auditing for secure cloud storage,” IEEE Trans. Comput., vol. 62, no. 2, pp. 362–375, Feb. 2013.
7. C. Wang, Q. Wang, K. Ren, N. Cao, and W. Lou, “Toward secure and dependable storage services in cloud computing,” IEEE Trans. Service Comput., vol. 5, no. 2, pp. 220–232, Apr./Jun. 2012.
8. A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
9. T. Ho et al., “A random linear network coding approach to multicast,” IEEE Trans. Inf. Theory, vol. 52, no. 10, pp. 4413–4430, Oct. 2006.
10. Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling public verifiability and data dynamics for storage security in cloud computing,” in Computer Security. Berlin, Germany: Springer-Verlag, 2009, pp. 355–370.