DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Databricks

  • June 22, 2023

1. What is Azure Databricks

Now, before we go into Azure Databricks, let me first explain the need for Databricks itself. Databricks is a company that was founded by the original creators of Apache Spark. The Databricks service makes use of Apache Spark to provide a unified analytics platform. So let’s understand the use case for Databricks. Let’s say that you want to make use of Apache Spark for your data processing needs.

The first thing you would need to do is provision machines. You would then install the Spark engine and the required libraries on those machines, and only then could you use Apache Spark for your data processing. In such a scenario, you are responsible for provisioning the underlying machines, and you are responsible for installing the Spark engine and the required libraries.

You also have the responsibility of maintaining the underlying infrastructure. So if you need to scale the machines to cater to your data processing needs, that is something you need to take care of yourself. With Databricks, you can create this entire environment with just a few clicks.

First of all, Databricks creates the underlying compute infrastructure for you. In addition, it works with the underlying storage layer: besides having the servers in place, it provides an abstraction layer that allows Spark to interact with the underlying storage service. It also installs Spark for you, along with other libraries and frameworks that add further capabilities to Spark. For example, you could include machine learning libraries.

All of this is done by Databricks itself. It also provides a workspace for you. In this workspace you can create notebooks, users can collaborate on those notebooks, and you can create visualizations in the notebook itself. Now, you can launch Databricks either in AWS, that’s Amazon Web Services, or in Azure. And that’s where we come to Azure Databricks. Azure Databricks is a completely managed Databricks environment.

It makes use of the compute infrastructure and the Virtual Network service that are already available in Azure. So Azure Databricks is an implementation of Databricks on Azure itself. Here you can also make use of Azure security features, such as integration with Azure Active Directory and role-based access control. Right, so in this chapter I just wanted to give an introduction to Azure Databricks.

2. Clusters in Azure Databricks

Hi and welcome back. In the previous chapter, I gave an introduction to Azure Databricks. In this chapter, I want to go through some important concepts before we go into the labs, just so that you have an idea of what we are going to do there. Again, Databricks allows you to create the underlying infrastructure, with the machines in place. Those machines will have Spark installed, along with the libraries that allow you to perform your data analytics. With the Azure Databricks service, you create clusters in something known as a workspace. This cluster of machines will have the Spark engine and other components installed. Now, when it comes to the cluster itself, there are two types of nodes that get created.

First you have the worker nodes. These are the nodes that actually process the underlying tasks. Let’s say you send a particular command to the Spark engine. The work for that command will be sent on to the worker nodes, which have the responsibility of performing the underlying tasks. And then you have the driver node. The driver node has the responsibility of distributing the tasks that we send to the Spark cluster across the worker nodes. So this is one of the key concepts in Azure Databricks: we create a cluster of nodes.
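To make the driver and worker roles a little more concrete, here is a minimal sketch in plain Python. It is only an analogy and does not use Spark: threads stand in for worker nodes, and the `driver` and `worker_task` functions are names made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_task(partition):
    """Work done on one 'worker node': sum the numbers in its partition."""
    return sum(partition)


def driver(data, num_workers):
    """Driver role: split the data, hand partitions to workers, combine results."""
    # Split the data into one partition per worker (round-robin slices).
    partitions = [data[i::num_workers] for i in range(num_workers)]
    # Each thread in the pool stands in for a worker node.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(worker_task, partitions))
    # The driver gathers the partial results into the final answer.
    return partitions, partial_sums, sum(partial_sums)


partitions, partial_sums, total = driver(list(range(1, 11)), num_workers=3)
print(partial_sums)  # one partial result per worker
print(total)         # combined result assembled by the driver
```

The driver never does the summing of the raw data itself; it only decides how the work is split and puts the pieces back together, which is the division of responsibility described above.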

Now in Azure Databricks, there are two types of clusters. You can create something known as an interactive cluster, or you can create something known as a job cluster. With an interactive cluster, you can analyze your data with the help of interactive notebooks.

Also, multiple users can use the cluster and collaborate on the notebooks that get created. So this is an interactive way of analyzing your data. Whereas let’s say you just want a job to run on the cluster, without any sort of interaction from a user; then you could run that job on a job cluster. When the job needs to run, Azure Databricks will automatically start the cluster and run the job. And when the job is complete, the cluster will be terminated. So this is a cost-efficient way of running jobs on a cluster. Now, again, when it comes to interactive clusters, there are two types: you have a standard cluster and you have a high concurrency cluster.

Now, the standard cluster is recommended if you are a single user working in Azure Databricks. Yes, you can have multiple users running workloads on a standard cluster, but there is no fault isolation. That means that if a fault happens in a workload executed by one user, it might impact the workloads run by other users on the same cluster. Also, the resources of the cluster might all get allocated to a single workload. In that case, if other users are trying to execute their workloads on the cluster, those workloads might not run efficiently, because the resources are not being allocated to them.

Now, when it comes to running your notebooks, a standard cluster supports the programming languages Python, R, Scala and SQL. Then you have the high concurrency clusters. These are recommended for multiple users. So if you have multiple data engineering users who need to make use of a cluster in Azure Databricks, then you can make use of a high concurrency cluster.

Here, you have features such as fault isolation. The resources of the cluster are also shared effectively across the different user workloads. This type of cluster supports Python, R and SQL; there is no support for Scala as of yet. In the high concurrency cluster, you also have something known as table access control. Here you can grant and revoke access to data from either Python or SQL. Right, so in this chapter I just wanted to go through some important aspects of clusters in Azure Databricks.
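Going back to interactive and job clusters, the difference can be sketched as two request payloads. This is a hedged illustration, not something shown in the course: the field names follow my understanding of the Databricks Clusters and Jobs REST APIs and should be checked against the current API reference, while the cluster name, job name, notebook path, runtime version and VM size are placeholder values. Nothing is sent to any service; the payloads are only built and printed.

```python
import json

# Illustrative values only; pick a runtime and VM size offered in your workspace.
SPARK_VERSION = "11.3.x-scala2.12"
NODE_TYPE = "Standard_DS3_v2"


def interactive_cluster_spec(name, num_workers, idle_minutes):
    """Payload for a named, long-lived cluster that users attach notebooks to."""
    return {
        "cluster_name": name,
        "spark_version": SPARK_VERSION,
        "node_type_id": NODE_TYPE,
        "num_workers": num_workers,
        # Shut the cluster down after this many idle minutes to save cost.
        "autotermination_minutes": idle_minutes,
    }


def job_spec(job_name, notebook_path, num_workers):
    """Payload for a job that gets its own cluster for the duration of a run."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                # The cluster is defined inline; it is started for the run
                # and terminated when the run completes.
                "new_cluster": {
                    "spark_version": SPARK_VERSION,
                    "node_type_id": NODE_TYPE,
                    "num_workers": num_workers,
                },
            }
        ],
    }


interactive = interactive_cluster_spec("shared-analysis", 2, 60)
job = job_spec("nightly-load", "/Jobs/nightly_load", 2)
print(json.dumps(interactive, indent=2))
print(json.dumps(job, indent=2))
```

The interactive cluster has a name and an idle timeout because it outlives any single piece of work. The job cluster has neither, because its lifetime is tied to the job run.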

3. Lab – Creating a workspace

So now, in this chapter, let’s start working with Azure Databricks. The first thing we need to do is create something known as an Azure Databricks workspace. So let’s do that. In All resources, I’ll hit Create. Here I’ll search for Azure Databricks, I’ll choose that, and I’ll hit Create. Here I’ll choose my resource group. I need to give a workspace name, and I have to choose my region. So here I’ll choose North Europe.

Now, here, there are different pricing tiers in place. I’m going to choose the Trial tier, which gives us the premium features along with 14 days of free DBUs. I’ll explain the concept of DBUs at a later point in time, when we create the cluster in the workspace. For now, I’ll choose this pricing tier. I’ll go on to Networking and leave everything as is. I’ll go on to Advanced, then on to Tags, and then on to Review and Create. And let’s hit Create.
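The lab uses the portal, but the same choices (resource group, workspace name, region, pricing tier) can also be expressed as an Azure CLI command. The sketch below only assembles and prints that command; it does not run it. It assumes the `az databricks workspace create` command from the Azure CLI `databricks` extension, and the resource group and workspace names are placeholders rather than the ones used in the lab.

```python
import shlex

ALLOWED_SKUS = {"standard", "premium", "trial"}


def workspace_create_command(resource_group, name, location, sku):
    """Build the Azure CLI argument list for creating a Databricks workspace."""
    if sku not in ALLOWED_SKUS:
        raise ValueError(f"unknown pricing tier: {sku}")
    return [
        "az", "databricks", "workspace", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--location", location,
        "--sku", sku,
    ]


# Placeholder names; 'northeurope' is the CLI name for the North Europe region.
cmd = workspace_create_command("my-rg", "my-workspace", "northeurope", "trial")
print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; to actually create the workspace you would paste the printed line into a shell where you are logged in to Azure.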

So this is now going to launch our Databricks workspace. Let’s come back once we have the workspace in place. Once the workspace is in place, I’ll go to the resource. Here we need to scroll down and launch our workspace. The workspace is where you’ll actually do all of your work: you’ll create clusters, you’ll create notebooks, and you can create Spark databases and tables. So you will do all of your data engineering work here.

In this workspace, in Azure Databricks, you can see that you have the ability to create a new notebook, create a table, create a cluster, and create something known as a job. In the menu options, you can again see that you can create a notebook, a table, a cluster, or a job. Here you can see an overview of your workspace. Here you can see something known as Repos. You can look at your data, at the compute options, and at your jobs. Right, so in this chapter I just wanted to start with creating a Databricks workspace.
