Data Security and Compliance Controls in Cloud Environments
Date: Feb 10, 2022
In this sample chapter from CompTIA Cloud+ CV0-003 Exam Cram, you will learn how to apply data security and compliance controls in cloud environments.
In this chapter you will learn about the different data security and compliance controls that are available in cloud environments. You will learn how encryption and integrity protections affect an organization’s data. You will also learn how to secure data by classifying and segmenting it, as well as by controlling access to it.
Also discussed in this chapter is how laws and regulations impact data security, including the concept of a legal hold. Lastly, you will learn about records management, a process in which rules are put in place to determine how long data is maintained and how to properly destroy the data when it is no longer needed.
Encryption
Encryption is the process of transforming data from its original form to a form that, when viewed, does not reveal the original data. There are three different forms of encryption:
Data at rest: Data is encrypted when it is stored. This method can either be performed by you prior to uploading the data to storage, or in some cases, it can be performed by a function that is provided by the cloud provider. When you perform the data encryption, it is your responsibility to decrypt the data when the original data is needed. When the cloud provider encrypts the data, the decryption process must be performed by the cloud provider.
Data in transit: Data is encrypted before it is sent and decrypted when received. This form of encryption can involve several different techniques; in most cloud computing environments it means the data is encrypted before transmission, typically by a protocol such as TLS (as with HTTPS) or by a network device, such as a VPN gateway, that encrypts traffic as it crosses the network.
Data in use: Data is encrypted when being actively used, which typically means while it is stored in random-access memory (RAM). Because some exploits may make data in RAM vulnerable, this form of encryption may be very important to ensuring data integrity.
Many different technologies can be used to encrypt data, and which technology you use will depend on several factors, including which cloud provider you utilize. These technologies fall into one of two methods of encryption:
Symmetric encryption: With this method you use the same key (a unique value of some sort) to both encrypt and decrypt the data.
Asymmetric encryption: With this method you use different keys to encrypt and decrypt the data. One key is referred to as the public key, and the other is called the private key. For example, if you want someone to send data to you across the network, you provide that person with your public key, which they use to encrypt the data. The only way to decrypt the data is with the private key, which you never share with anyone else.
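The defining property of symmetric encryption, that the same key both encrypts and decrypts, can be illustrated with a deliberately simple sketch. The XOR cipher below is a toy for demonstration only; it is not secure and real workloads should use a vetted algorithm such as AES:

```python
import os

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with a repeating key.
    Applying the same function with the same key a second time
    returns the original data -- the hallmark of symmetric encryption.
    NOT secure; for illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(16)                      # the shared secret key
plaintext = b"quarterly payroll data"
ciphertext = xor_cipher(plaintext, key)   # encrypt
recovered = xor_cipher(ciphertext, key)   # decrypt with the SAME key
assert recovered == plaintext
assert ciphertext != plaintext
```

With asymmetric encryption, by contrast, the encrypting (public) key cannot reverse its own operation; only the separate private key can.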
Integrity
While data encryption is focused on keeping prying eyes from seeing the original data, data integrity is focused on ensuring the data is accurate and consistent. This requires maintaining integrity through all stages of the data lifecycle: transporting, storing, retrieving, and processing the data.
Several tools can be used to ensure data integrity, including hashing algorithms, digital signatures, and file integrity monitoring (FIM).
Hashing Algorithms
A hashing algorithm is a mathematical function that, when applied to data, produces a fixed-length value that is, for practical purposes, unique to that data. Unlike encryption, in which the result can be decrypted back to the original format, hashing is one-way: the original data cannot be recovered from the hash. The purpose of a hash isn’t to hide or encrypt the data, but rather to let you verify that the data you have received matches the original.
Consider a situation in which you receive a database with sensitive information. Your organization is going to use this information to help make some critical decisions on future products. You received this data from a trusted third-party source, but how can you be certain that a “bad actor” didn’t intercept the data and inject false information?
Your third-party source could apply a hashing algorithm to the data and send the resulting hash separately. You could then apply the same hashing algorithm to the data you received and compare the result with the hash from the third party. If they match, you know the data is unaltered.
There are many different types of hashing algorithms. Each has specific advantages and disadvantages (for example, MD5 and SHA-1 are now considered cryptographically broken and should not be used where collision resistance matters), but for the CompTIA Cloud+ certification exam, you should be familiar with the names of these algorithms:
MD5
SHA-1
SHA-2
SHA-3
RIPEMD-160
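Most of the algorithms listed above are available in Python’s standard `hashlib` module, which makes it easy to sketch the verification workflow described earlier (RIPEMD-160 may also be available via `hashlib.new("ripemd160")`, depending on how the underlying OpenSSL was built):

```python
import hashlib

data = b"customer database export"

# Digests from several of the algorithm families listed above.
print(hashlib.md5(data).hexdigest())       # 128-bit digest (legacy)
print(hashlib.sha1(data).hexdigest())      # 160-bit digest (legacy)
print(hashlib.sha256(data).hexdigest())    # a member of the SHA-2 family
print(hashlib.sha3_256(data).hexdigest())  # a member of the SHA-3 family

# Integrity check: the third party sends a hash alongside the data;
# you recompute the hash over what you received and compare.
sent_hash = hashlib.sha256(data).hexdigest()
received = b"customer database export"
assert hashlib.sha256(received).hexdigest() == sent_hash   # unaltered

tampered = b"customer database export (modified)"
assert hashlib.sha256(tampered).hexdigest() != sent_hash   # detected
```

Note that even a one-byte change produces a completely different digest, which is what makes the comparison meaningful.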
Digital Signatures
Suppose a friend sends you a letter. How would you know that it really came from that person? One method is to have your friend add a signature to the bottom of the letter. If you recognize the signature, you can be more certain that it came from your friend.
Digital signatures are used in the same way but are more complicated to implement. They make use of asymmetric cryptography: the signature is created with the private key of an individual or organization, and the corresponding public key is made available through another channel. A signature created with the private key can be verified only with the matching public key, so successful verification proves the data came from the holder of the private key.
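The sign-with-private, verify-with-public flow can be sketched with textbook-sized RSA numbers. The key pair below (p = 61, q = 53, so n = 3233, e = 17, d = 2753) is a classic classroom example; it is far too small to be secure and is used here only to make the arithmetic visible:

```python
import hashlib

# Textbook RSA toy key pair (insecure, illustration only).
n, e, d = 3233, 17, 2753   # n = 61 * 53; e public exponent, d private

def sign(message: bytes) -> int:
    """Sign: transform a digest of the message with the PRIVATE key."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    """Verify: reverse the signature with the PUBLIC key and
    compare against a freshly computed digest of the message."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

msg = b"wire transfer approved"
sig = sign(msg)
assert verify(msg, sig)   # signature checks out: correct source
```

Anyone holding the public pair (n, e) can run `verify`, but only the holder of d could have produced a signature that passes it.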
File Integrity Monitoring (FIM)
In some cases, it is important to determine if data within a file has changed. The process that handles this determination is called file integrity monitoring. With FIM a checksum is created when the file is in a known state called a baseline. This checksum is a value that is based on the current contents and, in some cases, additional file attributes, such as the file owner and permissions.
To determine whether a file or a file attribute has changed, you take another checksum at some point in the future. If the new checksum matches the original, the current file is the same as the baseline. This technique can be used to detect tampering with a key operating system file or with a file downloaded from a remote server.
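A minimal FIM check can be sketched with the standard library: hash the file contents plus selected attributes (here the permission bits) to form the baseline, then recompute and compare later. Real FIM tools track many more attributes, but the mechanism is the same:

```python
import hashlib
import os
import tempfile

def file_checksum(path: str) -> str:
    """Checksum over file contents plus the permission bits, so either
    a content change or a permission change alters the result."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    h.update(oct(os.stat(path).st_mode).encode())  # include permissions
    return h.hexdigest()

# Record a baseline while the file is in a known-good state.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"server config v1")
    path = f.name

baseline = file_checksum(path)
assert file_checksum(path) == baseline   # unchanged -> checksums match

with open(path, "ab") as f:              # simulate tampering
    f.write(b" # malicious edit")
assert file_checksum(path) != baseline   # change detected
os.remove(path)
```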
Classification
Consider how you would treat data that contains credit card information compared to how you would treat data that contains comments that have been made regarding your company website. The data that contains credit card information is much more sensitive than the data that contains customer comments, so you would want to treat the data differently.
In this situation data classification becomes important. With data classification, you place data into different categories depending on how you want to treat the data. These categories can be based on rules related to how sensitive the data is, who should be able to read the data, who should be able to modify the data, and how long the data should be available. Unless you are storing data that is subject to compliance frameworks or regulations (such as SOC 2, GDPR, PCI DSS, or HIPAA), the data classification criteria are up to you. See the “Impact of Laws and Regulations” section in this chapter for more details on compliance regulations.
For example, you may consider classifying data based on who is permitted to access the data. In this case you may use the following commonly used categories:
Public: This data is available to anyone, including those who are not a part of your organization. This typically includes information found on your public website, announcements made on social media sites, and data found in your company press releases.
Internal: This data should be available only to members of your organization. An example of this data would be upcoming enhancements to a software product that your organization creates.
Confidential: This data should be available only to select individuals who have the need to access this information. This could include personally identifiable information (PII), such as an employee Social Security number. Often the rules for handling this data are also governed by compliance regulations.
Restricted: This data may seem similar to confidential data, but it is normally more related to proprietary information, company secrets, and in some cases, data that is regarded by the government as secret.
In the cloud there are different techniques to handle different types of data. These techniques could include placing different types of data into different storage locations. Chapter 12, “Storage in Cloud Environments,” will discuss different storage solutions that are typically found in a cloud environment.
You can also make use of metadata. Metadata is data that is associated with the “real data,” and it is used to describe or classify the “real data.” In cloud environments, metadata is normally created by using a feature called tags. Tags are flexible in that you can create a key-value pair that describes components of the data. Figure 8-1 demonstrates applying tags to data in AWS.
FIGURE 8-1 AWS Tags
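The key-value idea behind tags is provider-independent and can be sketched in plain Python. The object names and tag keys below are invented for illustration; in practice you would attach tags through your cloud provider’s console, CLI, or API:

```python
# Hypothetical inventory of stored objects, each carrying key-value
# tags like those a cloud provider lets you attach.
objects = [
    {"name": "press-release.txt", "tags": {"classification": "public"}},
    {"name": "payroll.db",        "tags": {"classification": "confidential",
                                           "owner": "hr"}},
    {"name": "roadmap.doc",       "tags": {"classification": "internal"}},
]

def find_by_tag(objs, key, value):
    """Return the names of objects whose tags contain the key-value pair."""
    return [o["name"] for o in objs if o["tags"].get(key) == value]

print(find_by_tag(objects, "classification", "confidential"))
# ['payroll.db']
```

Because tags are just searchable key-value pairs, the same mechanism drives classification-aware policies such as access restrictions or retention rules.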
Segmentation
In relation to data security, data segmentation is the process of placing data into different locations based on who should be able to access the data. For example, it would be a good practice to place employee PII in a different location (like a different database) from the data contained in press releases.
Data segmentation may also be a requirement for compliance regulations. For example, a regulation may require that specific data never leave a country. The reason is typically that laws govern the use of this data, and once the data leaves the country, those laws no longer have effect. In this case, data segmentation may be related to the region in which you store the data. See the “Impact of Laws and Regulations” section in this chapter for further details.
Access Control
Access control is the technique that determines who can access a resource. In terms of data access control, accessing the resource can include viewing, modifying, and destroying the data.
In most cloud environments, the definition of “who” can include both people and other resources. For example, you may have a payroll application that needs to access secure data about employees that is stored in a database. There must be access control rules in place that permit or block access for both people and resources.
People are given user accounts to access cloud resources. These user accounts are granted access to resources by using permissions.
Applications are assigned to roles, which are similar to user accounts in that permissions can be applied to roles just as they are applied to user accounts. Applications, however, are never given user accounts (although in some cloud environments a user may also be assigned to, or may assume, a role).
To learn more about how user accounts and roles impact access to resources, see Chapter 5, “Identity and Access Management.”
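The relationship described above, where permissions attach to roles and both users and applications are granted roles, can be sketched as a minimal role-based access check. All of the principal, role, and permission names below are illustrative:

```python
# Permissions attach to roles, not directly to principals.
role_permissions = {
    "payroll-app": {"employees:read"},
    "hr-admin":    {"employees:read", "employees:write"},
}

# Principals (human users and application identities) are granted roles.
principal_roles = {
    "payroll-service": ["payroll-app"],   # an application identity
    "alice":           ["hr-admin"],      # a human user account
}

def is_allowed(principal: str, permission: str) -> bool:
    """A principal is allowed if any of its roles grants the permission."""
    return any(permission in role_permissions.get(role, set())
               for role in principal_roles.get(principal, []))

assert is_allowed("payroll-service", "employees:read")       # app can read
assert not is_allowed("payroll-service", "employees:write")  # but not write
assert is_allowed("alice", "employees:write")                # admin can
```

Keeping permissions on roles rather than on individual accounts is what makes it practical to grant the same access to an application identity and a person without duplicating rules.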
Impact of Laws and Regulations
As previously mentioned, many laws and regulations govern how data is treated in an organization. They will vary depending on where your data is located. For example, the laws that govern data in the United States are different from the laws that govern data in the European Union (EU).
The laws and rules are numerous and vary based on the industry of your organization. For example, if your company is a retailer and you accept credit card payments, you will likely need to follow PCI Security Standards when dealing with credit card data. If your organization is a hospital, you will need to follow HIPAA regulations when dealing with patient data.
For the certification exam, it is likely not worthwhile to memorize individual laws and regulations; many organizations have full-time staff devoted to ensuring these laws are followed. What is most critical for the exam is being aware of the impact these laws have.
Legal Hold
Organizations cannot just delete information whenever they want. Some information, such as employee records, must be maintained for specific periods of time in the event of investigations or litigation. The term legal hold is used by an organization’s legal department to indicate how long specific data must be stored and how it should be made available in the event it is needed.
Records Management
Organizations often end up creating, gathering, and accumulating a lot of data. The volumes of information stored by an organization can result in high costs because storing data is not free. While cloud vendors provide many ways of storing data, they will charge to store data, so organizations typically do not want to keep data for longer than necessary.
Records management is the process of determining how and for how long to store data. This large topic includes data classification and encryption, as well as versioning, retention policies, and destruction policies.
Versioning
Versioning is the process of keeping track of file content changes over time. Many cloud technologies provide versioning as a feature that can be enabled, so the versioning happens automatically whenever a data record is changed.
Retention
Retention refers to a policy that determines how long data should be stored. A retention schedule defines when data is to be destroyed and how older data is stored until that time.
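A retention schedule often reduces to a mapping from a data category to a retention period, from which a destruction date can be computed. The categories and periods below are hypothetical examples, not requirements from any particular regulation:

```python
from datetime import date, timedelta

# Hypothetical retention schedule: classification -> days to retain.
retention_days = {
    "public":       365,        # keep one year
    "internal":     365 * 3,    # keep three years
    "confidential": 365 * 7,    # keep seven years
}

def destruction_date(created: date, classification: str) -> date:
    """When a record becomes eligible for destruction under the schedule."""
    return created + timedelta(days=retention_days[classification])

print(destruction_date(date(2022, 1, 1), "public"))
# 2023-01-01
```

In practice, cloud storage lifecycle policies automate this computation, transitioning or expiring objects once their retention period elapses.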
Destruction
The destruction of data must be clearly defined when developing a records management plan. The plan should clearly state both when data is to be destroyed and how it is to be destroyed: for example, by physical destruction of the media, degaussing, or zeroizing.
Write Once Read Many
Write once read many, also referred to as WORM, is a form of write protection in which the data can be written only once and then it cannot be modified. This is a critical feature when you need to ensure that data has not been tampered with after it was created.
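The WORM rule, written exactly once and never modified afterward, can be sketched as a tiny store that rejects any second write to the same key. This is a behavioral toy; real WORM protection is enforced by the storage hardware or the cloud service (for example, object-lock features), not by application code:

```python
class WormStore:
    """Toy write-once-read-many store: each key can be written exactly
    once; any later attempt to overwrite it is rejected."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        if key in self._data:
            raise PermissionError(f"{key!r} is write-once and already set")
        self._data[key] = value

    def read(self, key):
        return self._data[key]

store = WormStore()
store.write("audit-2022-02.log", "login events...")
assert store.read("audit-2022-02.log") == "login events..."

try:
    store.write("audit-2022-02.log", "tampered")   # second write rejected
except PermissionError:
    print("modification blocked")
```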
Data Loss Prevention (DLP)
Data loss prevention is the process of ensuring that sensitive data is not misused, accessed by unauthorized parties, or lost. It is designed to prevent a data breach, which may involve accessing, modifying, or destroying data. In some cases the DLP process must be clearly defined because the data is governed by laws and regulations; in other cases, DLP is simply the result of wanting to keep classified information secure.
Some cloud providers will include DLP as a software tool. For example, Google Cloud has a product called Cloud DLP, which enables you to view how data is stored and processed, configure data inspection and monitoring, and reduce the risk of data loss. In other cases, the features of DLP may be associated with a specific data-based product. For example, there are techniques that you can use for DLP when storing data in AWS S3 buckets.
Cloud Access Security Broker (CASB)
A CASB is a software tool that can be located either on premises or in the cloud. It is designed to provide an interface between cloud resources (applications) and cloud users. It monitors access to cloud resources, including data, issues warnings when a cloud resource may have been compromised, and enforces security policies.
CASBs also provide the means to perform audits, so past access to data resources can be analyzed. They are often used for compliance reporting as well because they provide insight into data access over time.
What Next?
If you want more practice on this chapter’s exam objectives before you move on, remember that you can access all of the CramQuiz questions on the companion website. You can also create a custom exam by objectives with the practice exam software. Note any objectives you struggle with and go to that objective’s material in this chapter.