DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Databricks Part 7

June 25, 2023

22. Lab – Azure Data Lake Storage Credential Passthrough

Now in this chapter, I want to go through a scenario wherein you can make use of something known as Azure AD Credential Passthrough. Earlier on, we had seen that if we wanted to fetch data from a Data Lake Gen2 storage account, we had to ensure that we had the access keys defined in a key vault.

But we can also make use of a feature known as Azure Active Directory Credential Passthrough, wherein the user who is working with the notebook is authorized to access the data in the Azure Data Lake Gen2 storage account directly. This is a much more useful security feature: the user who is executing the notebook does not need to go through the process of having access keys in place. Based on their credentials and their permissions, they will have access to the data in the Data Lake Gen2 storage account.
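For contrast, this is roughly what the key-based approach from the earlier chapters looks like in a notebook cell. This is a minimal sketch: the secret scope name, secret key name and storage account name are placeholders for whatever you configured in your own workspace.

```python
# Key-based access (the approach we used earlier): the account key is pulled
# from a Databricks secret scope backed by Azure Key Vault and attached to
# the Spark session. The scope, key and account names below are placeholders.
storage_account = "<your-storage-account>"

account_key = dbutils.secrets.get(scope="app-scope", key="storage-account-key")

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key
)
```

With credential passthrough, none of this setup is needed, as we'll see below.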

Now, I go through in detail how you can give access to the data in your Data Lake Gen2 storage account in the security section of this course. But in this chapter, we are going to see how to make use of this security feature, the Azure Active Directory Credential Passthrough feature. Now, in order to have a clean slate to test this feature, I'm going to create a new storage account. It'll be a Data Lake Gen2 storage account. Here I'll choose my resource group, give a storage account name, choose North Europe and make this locally redundant. I'll go on to Next for Advanced, where I'll enable the hierarchical namespace, then move through Networking, Data protection and Tags, go to Review + create, and hit Create. So here I have listed down all the steps that you need to perform.

The first is creating a new Data Lake storage account. Next, we need to upload a file and give the required permissions. This includes ensuring that we give the Reader role and the Storage Blob Data Reader role to the Azure admin user, and also something known as ACL permissions. Once we have the storage account in place, I'll go on to the resource, go on to my containers and create a data container. Then I'm going to upload the log CSV file, which we've already seen earlier on. Next, we need to give permissions to the Azure admin account, because when we run our notebooks here, I'm running as the Azure admin account, so I need to ensure that I give the right permissions.

Even though I'm the Azure admin, I still need to specifically grant these permissions on my data in the Data Lake Gen2 storage account. So the first thing I need to do is go on to my Data Lake storage account. Here, I have to go on to Access control, click on Add and add a role assignment. I need to choose the Reader role, search for the Azure admin user ID, and then click on Save. Then I have to add another role.

So here, another role assignment. This time I need to choose the role of Storage Blob Data Reader, again search for my admin account, and click on Save. Now, I also need to log into Azure Storage Explorer to give something known as access control list (ACL) permissions. As I mentioned, I go through all of these concepts in the security section. So at this point in time I have logged into Azure Storage Explorer as my Azure admin, and let me go on to the New Data Lake storage account.

I'm just waiting for my containers to load up. I'll go on to my Blob containers, go on to my data container, right-click and choose Manage Access Control Lists. Here I'll click on Add, search for the user and choose my user ID, that's the first one, and click on Add. Under Access, I'll choose the Read permission and hit OK, and the permissions are saved successfully. So what I did just now was use the Manage Access Control Lists option. Now I'll choose Propagate Access Control Lists, so that it propagates the access control onto all of the objects that are in this container, and I'll hit OK.
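If you would rather script the ACL step instead of clicking through Storage Explorer, something along these lines should work with the azure-storage-file-datalake Python package. A hedged sketch: the account and container names match this walkthrough, the user object ID is a placeholder you would look up in Azure AD, and it grants read plus execute (execute is what traversing directories typically requires).

```python
# Scripted equivalent of "Manage Access Control Lists" + "Propagate":
# add a read+execute ACL entry for the user on the data container and
# push it down to every object inside it.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://newdatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential()
)
container = service.get_file_system_client("data")

# update_access_control_recursive merges this entry into the existing ACLs
# of the directory and all of its children
container.get_directory_client("/").update_access_control_recursive(
    acl="user:<user-object-id>:r-x"
)
```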

Normally, this is a more secure way of working: you could have users defined in Azure Active Directory and give them selective access to the files in the Azure Data Lake Gen2 storage account. So we've done this part; we've given the required permissions and assigned all of these roles. Next, we need to create a new cluster. One very important note: this feature is only available with the Premium plan of Azure Databricks (we are using the trial Premium plan), and it only works with Azure Data Lake Gen2 storage accounts. So now, in Azure Databricks, I'll go on to the Compute section, go on to my existing cluster and terminate it.

So we have to create a new cluster. One thing to note is that we might not be able to have multiple clusters in place, based on the number of virtual cores that we can create as part of our subscription. I know that I have a limit on the number of virtual cores that I can use in a region as part of my subscription, so if I try to create another cluster while one is still running, I might get an error. So I'll go on to my clusters, and here I'll create a cluster. I'll give a cluster name and again choose Single node. Now I have to go on to the advanced options, and here I need to enable credential passthrough for user-level data access. Here I am choosing my user; this is the login ID for my Azure admin user.
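For reference, this is roughly what that cluster definition looks like if you create it through the Databricks Clusters API instead of the UI. This is an assumption-laden sketch, not the exact payload the portal sends: the workspace URL, token, runtime version and user name are placeholders, and the key line is the spark_conf entry that the credential passthrough checkbox maps to.

```python
# Sketch: creating the single-node passthrough cluster via the Clusters API.
# Assumes a personal access token in the DATABRICKS_TOKEN environment variable.
import os
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"

cluster_spec = {
    "cluster_name": "passthrough-cluster",
    "spark_version": "9.1.x-scala2.12",   # any supported runtime
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 0,                      # single node
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
        # what "Enable credential passthrough for user level data access" sets
        "spark.databricks.passthrough.enabled": "true",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
    # the user selected in the single-user access dropdown (placeholder)
    "single_user_name": "<your-admin-user>@<tenant>.onmicrosoft.com",
}

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=cluster_spec,
)
print(resp.json())  # returns the new cluster_id on success
```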

Then I'll create the cluster and wait till we have the cluster in place. Once it's there, I'll go on and create a new notebook, choose the new cluster as its cluster and hit Create. Now, here I'll take the code to create a data frame and place it here. I need to replace the storage account name: the name of my storage account is New Data Lake, so I'll make sure it's the same here. I have my log CSV file, and it is in the data container.

Now, let me run this. Here you can see all of the data. As I said, the difference here is that we have not used any access keys: there are no keys assigned to the cluster, no keys that are part of the notebook itself, and we are not making use of the secrets that are stored in Databricks. We are now purely basing our authorization on the user that is defined in Azure Active Directory. The same user that is running your notebook also has access to the data in your Data Lake Gen2 storage account. So this is another secure way in which you can access your data from your notebooks.
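For reference, the notebook cell ends up looking something like this. There are no keys and no secret lookups anywhere; the abfss path is all you need. The account, container and file names here follow this walkthrough, so adjust them to your own.

```python
# Credential passthrough read: a plain abfss:// path, no access keys,
# no secret scopes. Authorization comes from the Azure AD user running
# the notebook. "newdatalake", "data" and "Log.csv" follow the walkthrough.
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("abfss://data@newdatalake.dfs.core.windows.net/Log.csv"))

display(df)
```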

23. Lab – Running an automated job

Now, in this chapter, I want to go through jobs that are available in Azure Databricks. A job is a non-interactive way to run an application in an Azure Databricks cluster. You can run the job immediately or based on a schedule, and you can run a notebook or a JAR file in a job; the job will run on a cluster. As an example, earlier on we had run this particular notebook that would take the streaming events from Azure Event Hubs onto our table in our dedicated SQL pool. Now, let's say you want to run this as a job. The first thing I need to do is move this notebook, so I'll click on Move here, choose a shared location, hit Select and confirm the move. Currently the notebook is in the detached state. Now, in another tab, let me go on to Jobs.

Let me create a new job. The first thing I need to do is select my notebook, so I'll go on to Shared, choose my app notebook and hit Confirm. Next, we need to choose the cluster on which to run our job. You have two options: you can run it on your existing cluster, or you can create a new job cluster, which is a cluster specifically for running jobs. Since we don't want to hit any sort of limit on the number of virtual cores that we can assign to our clusters, I'll choose my existing App Cluster. Now, if I quickly go on to another tab and on to my clusters, you can see we have been working with a couple of clusters in this particular section.

At any point in time, you can go on to a running cluster and terminate it to basically stop it, and then you can start the cluster again. This is a cluster we created earlier on; it had the library installed for Azure Event Hubs. So if I go back onto clusters and on to my terminated cluster, you can again start this cluster at any point in time. What Azure Databricks does is retain the configuration of your cluster for a period of 30 days after it has been terminated, so that you can start your cluster with the same configuration at any point in time.
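If you prefer to drive this from code, the Clusters API exposes the same terminate and start actions. A small sketch, assuming a personal access token and a known cluster ID; note that the API's delete endpoint terminates the cluster but keeps its configuration, which is what the 30-day retention applies to.

```python
# Terminate and restart a cluster through the Clusters API.
import os
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# "delete" terminates the cluster; its configuration is retained
requests.post(f"{workspace_url}/api/2.0/clusters/delete",
              headers=headers, json={"cluster_id": "<cluster-id>"})

# start it again later with the same configuration
requests.post(f"{workspace_url}/api/2.0/clusters/start",
              headers=headers, json={"cluster_id": "<cluster-id>"})
```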

If you want to retain the configuration of this cluster for a longer duration, you have to choose the pin icon to pin it in Azure Databricks. You can see that you then don't even have the option to delete this particular cluster. So let me unpin it, because you should have the option to delete the cluster at any point in time. That's just a quick note when it comes to clusters. Going back onto our Jobs page, we have everything in place. Let me give a job name. For the schedule type, you can run the job on a schedule or trigger it manually; we will manually trigger this particular job. So let me hit Create. Once we have the job in place, I'll go back onto Jobs.
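The same job can also be defined through the Jobs API rather than the UI. Here is a minimal sketch against the Jobs 2.1 endpoint; the workspace URL, notebook path, cluster ID and job name are placeholders standing in for the values from this walkthrough.

```python
# Sketch: creating the job via the Jobs API (2.1), pointing at the notebook
# in the shared location and the existing all-purpose cluster.
import os
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

job_spec = {
    "name": "stream-to-sql-pool",
    "tasks": [
        {
            "task_key": "run-app-notebook",
            "notebook_task": {"notebook_path": "/Shared/AppNotebook"},
            "existing_cluster_id": "<app-cluster-id>",
        }
    ],
}

resp = requests.post(f"{workspace_url}/api/2.1/jobs/create",
                     headers=headers, json=job_spec)
print(resp.json()["job_id"])  # keep this ID to trigger runs later
```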

Now let me go ahead and start this particular job. It has started, and you can go on to the job run. Here I'm actually getting an error; if I view the details, I can see there's an internal error saying that the notebook was not found. That means we made a mistake in our job configuration. So I can go on to the job, go on to its configuration, and select the proper notebook: the app notebook in the shared location. Let me hit Confirm and click on Save. Let me go back onto Jobs and run this job again. I'll go back onto the job, and now we can see it is in the running state.
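Triggering the run manually from code is one call against the run-now endpoint; a sketch, assuming the job ID returned when the job was created.

```python
# The code equivalent of clicking "Run now" on the job.
import os
import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

resp = requests.post(f"{workspace_url}/api/2.1/jobs/run-now",
                     headers=headers, json={"job_id": "<job-id>"})
print(resp.json()["run_id"])  # track this run on the Jobs page
```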

We can see the duration, and over here we can click on View Details. So it has submitted the command to the cluster for execution; we can see it is initializing the stream, and now it is running the stream. Let me now check whether I have any data in my log table, and I can see the data in place. So this is actually running as a job on a general all-purpose cluster, but in a large organization, if you want to run jobs, you can run them on separate job clusters. For now, I'll go back on to the job and cancel this running job. Right, so in this chapter I just wanted to go through the jobs feature that is available in Azure Databricks.
