DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Databricks Part 6

19. Lab – Getting data from Azure Data Lake – Implementation

So in the previous chapter we had created our key vault, and we created a secret in the key vault. Now it’s time to create something known as a Databricks secret scope. For that I need to go on to my Databricks resource, so here I’ll go on to my Azure Databricks workspace. I have to take the workspace URL, so I’ll copy it onto the clipboard, and then I need to complete this URL to reach the page for creating the scope. Since I’m creating a scope for the secrets, I need details from my Azure Key Vault. First I have to give a name for the scope, so I just give a scope name. Next I have to give the DNS name of the Azure Key Vault; if I go on to the overview of the key vault, here I have the Vault URI.

I need to copy this and place it here. Next is the resource ID. I have to go back on to my Azure Key Vault and go on to the properties. Here, let me just hide this. And here I need to take the resource ID; note that the Vault URI is also available over here. So I’ll copy the resource ID, place it here, and hit on Create. So now it has added the datalakekey scope; I’ll hit on OK. Now here let me create a new notebook. Again, I’ll choose Scala, I’ll attach it on to my app cluster, and I’ll hit on Create. Here I have some code in place to fetch data from Azure Data Lake. I’ll just copy this and place it here.
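For reference, the scope-creation page is not linked from the Databricks UI; you reach it by appending a fragment to the workspace URL, and the form takes the two Key Vault values just mentioned. A rough sketch, with every value a placeholder for your own:

    https://<databricks-workspace-url>#secrets/createScope

    Scope Name:   <scope-name>
    DNS Name:     https://<key-vault-name>.vault.azure.net/
    Resource ID:  /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.KeyVault/vaults/<key-vault-name>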

So here now I want to connect on to our Data Lake Gen2 storage account, and I need to replace this with the name of my Data Lake account. So I’ll copy this and place it here. Here I am setting a configuration on the Spark context, mentioning my Data Lake storage account name. Then comes the name of the scope that we created earlier on; remember, we just created a Databricks secret scope, and we gave it the datalakekey name here. The last piece is the actual secret in the key vault.
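As a rough sketch in Scala, that configuration line typically looks like this; the storage account name is a placeholder, and the scope name datalakekey is the one assumed from this lab, so swap in your own values:

    // Tell Spark to authenticate to the ADLS Gen2 account with the access key,
    // fetching the key from the Key Vault-backed secret scope rather than
    // hard-coding it in the notebook
    spark.conf.set(
      "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
      dbutils.secrets.get(scope = "datalakekey", key = "<secret-name>")
    )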

So what is that secret? If I go on to my key vault and on to Secrets — even though we were just looking at access keys, don’t confuse secrets with the keys listed in the vault. Those are encryption keys which you create in the Azure Key Vault, whereas secrets are like passwords. For example, if you have a database password, you can actually store it as a secret in the key vault. So what is the name of our secret? It’s datalake2000, so I’ll place it here. And what is the file that I want to access? It’s my Log.csv file, and it is in the data container, so I’ll replace this. Let’s confirm: let’s go on to my containers.

I’ll go on to my data container, and there, in my raw folder, I have my Log.csv file. Now let me run the cell. And here we are, getting the output as desired. So again, this is a security measure: it is how you access your Azure Data Lake Gen2 account in a more secure way.
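For reference, the read in that cell looks roughly like this in Scala, with the storage account name again a placeholder and the account-key configuration from above already set:

    // Read Log.csv from the raw folder of the data container over abfss,
    // relying on the account key set in the Spark configuration earlier
    val df = spark.read
      .option("header", "true")
      .csv("abfss://data@<storage-account-name>.dfs.core.windows.net/raw/Log.csv")

    display(df)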

20. Lab – Writing data to Azure Synapse SQL Dedicated Pool

So in the last chapter we had seen how we could read data from our Data Lake Gen2 storage account. Now let’s take this a step further and write this data into a dedicated SQL pool. So here, let me now add some code. This code can be used to read data from our Data Lake storage account and then write it on to our dedicated SQL pool table. Now, when it comes to the data frame itself, let me just copy this and replace it here. Here I’m ensuring that the schema is inferred. Again, let me ensure that I change the name of my storage account and also the name of the container; the file is in the raw directory. Next I am creating another data frame wherein I’m just selecting the columns of interest. Here I am selecting all of the columns, but if you wanted to select only certain columns from your initial data frame, this is something that you could do, as in the sketch below.
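Such a selection might look like this in Scala; the column names in the select are purely illustrative, since they depend on your file:

    // Read the CSV with the header row and schema inference enabled
    val logDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("abfss://data@<storage-account-name>.dfs.core.windows.net/raw/Log.csv")

    // Build a second data frame with just the columns of interest
    // (hypothetical column names shown here)
    val selectedDf = logDf.select("Correlationid", "Operationname", "Status")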

At this point in time, though, I’m selecting all of the columns. I’ll scroll down, and here is where we need to start writing data into our table in the dedicated SQL pool. So let’s first of all go back on to my dedicated SQL pool in my databases. I’ll go on to my pool, right-click, and choose a new query. Here I’ll delete from my logdata table if any information is already available; so now I’ve deleted all of the rows. Going back on to our notebook: I said that you also need a temporary staging area, and I can actually create that temporary staging area in my Data Lake Gen2 storage account. So let me go back and create a new container. I’ll give this container the name of tempdir, so that is in place.

So here my table name is logdata, and here is my temporary container. I can also create another folder known as log, so I’ll go on to the temporary container, create a new directory, and click on Save. Now here I’m creating my connection string; internally, the connector uses Java-based (JDBC) drivers. Here I need to give the name of my workspace, so if I go on to my Synapse workspace, I’ll copy the workspace name and replace it here. Then come the name of my pool, my username, and the password. Now I’m using the write method that is available as part of your data frame, and I’m giving the mode as append. So I’m saying that if there are any new rows coming in, they should be appended on to the table. This is very useful when you’re streaming events from Azure Event Hubs, since whenever there are new events, they will be added on to the table.
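As a rough sketch, the connection string being built here might look like this in Scala; the workspace name, pool name, user, and password are all placeholders:

    // JDBC connection string for the dedicated SQL pool, built from the
    // Synapse workspace name, the pool (database) name and the SQL credentials
    val sqlDwUrl = "jdbc:sqlserver://<workspace-name>.sql.azuresynapse.net:1433;" +
      "database=<pool-name>;user=<user>@<workspace-name>;password=<password>;" +
      "encrypt=true;trustServerCertificate=false;" +
      "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30"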

The format here is that of a SQL data warehouse. Here is my connection string, here is the temp directory, and here I’m just saying to go ahead and forward whatever Azure storage credentials we have. Then I’m giving the table name, and the save method is used to write on to that table. Let me also copy the import statement, and now let me run the cell. It’s running the command, and it seems this is done. Now let me do a SELECT *, and here we can see our data in place. So in this chapter, I wanted to show you how you can take data from a file in your Azure Data Lake Gen2 storage account and, directly from your notebook, write it on to a table in your dedicated SQL pool.
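For reference, the write portion of that cell looks roughly like this; it continues from the data frame and connection string sketched above, with the table name logdata and the tempdir container following this lab:

    // Write the data frame to the dedicated SQL pool through the
    // Azure Synapse connector, staging the data in the temp directory
    selectedDf.write
      .format("com.databricks.spark.sqldw")
      .option("url", sqlDwUrl)
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "logdata")
      .option("tempDir", "abfss://tempdir@<storage-account-name>.dfs.core.windows.net/log")
      .mode("append")  // append new rows rather than overwriting the table
      .save()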

21. Lab – Stream and write to Azure Synapse SQL Dedicated Pool

So now I want to complete the notebook which I’d used for taking events from the Azure Event Hub, and I want to stream that data into a table in a dedicated SQL pool. Earlier on we also had a table in place; this was the dblog table. Now, if you don’t have the table in place, we can go ahead and recreate it. So first let me drop the table if it does exist; then, here in the code, I have the statement to recreate the table. Note that I’m representing all of the numbers as bigint, that is, big integers. Let me run it. We don’t have any data in place yet, and that’s fine.

Now, I have gone ahead and made changes on to the notebook in which we were taking the streaming data from Azure Event Hub. What are the changes I have made? Firstly, I have added that Spark configuration on to our notebook, so that our notebook can now authorize itself on to our Data Lake Gen2 storage account. Next, if I scroll down, all of the logic remains the same. Here we’re now defining my data schema to keep in line with the schema that I have in my table: when it comes to the numbers, I have defined them as the long type, since my columns now have the type of big integer (bigint), and I need to map them accordingly. Earlier it was the integer type, but I want to fit larger numbers into these columns if required.
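A sketch of what such a schema definition might look like in Scala; the column names are illustrative, the point being the use of LongType rather than IntegerType for the numeric columns:

    import org.apache.spark.sql.types._

    // Schema kept in line with the dblog table: the numeric columns are
    // bigint in the table, so LongType (not IntegerType) is used here
    val dataSchema = new StructType()
      .add("Id", LongType)
      .add("Correlationid", StringType)
      .add("Operationname", StringType)
      .add("Status", StringType)
      .add("Level", StringType)
      .add("Time", TimestampType)
      .add("Resourcegroup", StringType)
      .add("Durationms", LongType)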

Now I’ll scroll down. Here I have my table name, which I have changed; it’s now dblog. Then we have our temp directory and our connection string. When it comes to our data frame, instead of using the write function, we are now using writeStream; since we are now taking a stream of data, we have to ensure that we use this particular method. Here I am also mentioning something known as a checkpoint location, and this is very important. Remember, we are reading data from an Azure Event Hub, and let’s say one sequence of events has already been read.

To ensure that the notebook does not read the same sequence of events again, the checkpoint records exactly where it left off. It also helps if this notebook or the query crashes: it knows where to resume from. Then we use the start method to start the streaming process; before, we were using the save method to directly save the data on to our table. So these are the changes I have made. Now let me run this cell. It’s starting the streaming process, and the Spark job has started.
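For reference, the streaming write looks roughly like this; df stands for the streaming data frame built from the Event Hub earlier in the notebook, the checkpoint path is a placeholder, and dblog and tempdir follow this lab:

    // Stream the data frame into the dedicated SQL pool table; writeStream
    // replaces write, and the checkpoint location lets the query resume
    // from where it left off instead of re-reading the same events
    val streamQuery = df.writeStream
      .format("com.databricks.spark.sqldw")
      .option("url", sqlDwUrl)
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("dbTable", "dblog")
      .option("tempDir", "abfss://tempdir@<storage-account-name>.dfs.core.windows.net/log")
      .option("checkpointLocation", "/tmp_checkpoint_location")
      .start()  // start the streaming query (save() is only for batch writes)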

Let me go on to the table and see if we have any data, and here we can see we have data in place. If we do a COUNT(*) and hit on Execute, we can see the number of events that I currently have. Let me just wait for a couple of minutes and run the statement again. And now you can see we have a change in the number of rows; our data is being continuously streamed on to the table. So now I’ll go ahead and cancel the Spark job. In this chapter, I wanted to show you how you can stream your data directly from Azure Event Hubs on to a table in your dedicated SQL pool.
