Microsoft Fabric Spark Diagnostic Emitter for Logs and Metrics



Segment 1 (00:00 - 05:00)

Hey everyone, welcome! This is a new Fabric Espresso episode for data engineers and data scientists. Today, together with Jenny, we are going to talk about a very important topic for every data engineer who is working with Apache Spark. Jenny, thanks for joining us another time.

Hello! It's great to be here.

Awesome, so let's kick off the discussion. We are going to talk about the Fabric Apache Spark diagnostic emitter for logs and metrics. It's a topic important for every data engineer working with Apache Spark in the context of Microsoft Fabric, and for more advanced users who want to customize the experience, process more logs, emit metrics, and process them later on. To kick off, Jenny, can you please start with an overview of what this feature is about and why it's important?

Yeah, sure, of course. Access to Spark logs and metrics has always been a highly requested feature over the past years. The new Fabric Spark diagnostic emitter enables users to collect Spark logs, Spark metrics, and Spark job events, and allows them to access the data in their preferred locations. Previously, users could only access Spark logs through the Fabric UI or the Spark History Server for an individual Spark application. With this new feature, users can quickly query and access Spark logs and metrics programmatically, making it much easier to retrieve the data for any Spark application at any time. Additionally, they can build dashboards to visualize the Spark metrics and logs at a broader scale, such as at the workspace level, capacity level, or tenant level. This feature also allows users to quickly spot any abnormalities and create alerting and notifications based on the logs, metrics, and job events.

That's awesome. It sounds like more advanced functionality for big organizations that want to control logs at the tenant, capacity, or workspace level. So how is the diagnostic emitter integrated with existing monitoring tools?

Yeah, great question. The diagnostic emitter feature expands and extends the current Fabric monitoring capabilities. It allows users to leverage the existing, powerful Azure monitoring tools to create their own customized notifications and build whatever dashboards and reporting they want. Also, by collecting those logs and events with the emitter, users can capture logs and metrics from their Spark applications and analyze that data with the tools they are already familiar with. This flexibility also enables users to choose the approach they prefer for their specific monitoring and diagnostic needs.

Got it. In the context of Microsoft Azure, one service is super popular: Azure Log Analytics. How does Fabric Spark monitoring work with Azure Log Analytics, in more detail?

Azure Log Analytics is a tool designed for analyzing logs, with a lot of powerful capabilities. When users choose Azure Log Analytics as their monitoring destination, all the Spark logs and metrics from their Spark applications are sent to a Log Analytics workspace. We emit all that data into primary tables with a predefined schema, allowing users to easily query metrics such as CPU and memory utilization for any given moment, for any Spark application. From there, users can leverage the powerful Log Analytics query capabilities to analyze the data, create their customized dashboards, even build workbooks and set up alerts. This makes it very easy for users to identify any issues, monitor application performance, track performance trends over time, and integrate Fabric Spark monitoring with other systems they may be interested in.

That's awesome. So what I hear is that the Fabric Apache Spark diagnostic emitter enables Apache Spark to emit logs and metrics to, for example, Azure Log Analytics, and then I can process it further.
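Jenny describes querying the emitted data programmatically from the Log Analytics workspace. Below is a minimal sketch of what that could look like with the azure-monitor-query Python package; the table and column names (SparkMetrics_CL, applicationId_s, name_s, value_d) are assumptions standing in for the predefined schema mentioned above, so verify them against the tables actually emitted to your workspace.

```python
# Sketch: programmatically query Spark metrics that the diagnostic emitter
# sent to a Log Analytics workspace, using the azure-monitor-query package.
# The table/column names below (SparkMetrics_CL, applicationId_s, name_s,
# value_d) are assumptions -- check the schema emitted to your workspace.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

client = LogsQueryClient(DefaultAzureCredential())

# KQL: JVM heap usage for one Spark application over the last 24 hours.
query = """
SparkMetrics_CL
| where applicationId_s == '<spark-application-id>'
| where name_s contains 'jvm.heap.used'
| project TimeGenerated, name_s, value_d
| order by TimeGenerated asc
"""

response = client.query_workspace(
    workspace_id=WORKSPACE_ID,
    query=query,
    timespan=timedelta(hours=24),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```

The same KQL works interactively in the Log Analytics query editor, which is where the dashboards, workbooks, and alerts Jenny mentions are built.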

Segment 2 (05:00 - 10:00)

What about the two other popular destinations, Event Hubs and a storage account, is that right?

Yes. Event Hubs is another place where users can collect logs and metrics in near real time. Event Hubs is also an event-driven processing and streaming analytics service, so if a customer has another third-party tool, like Splunk, Event Hubs is the ideal option: the user can emit the data to Event Hubs and then collect it with the third-party tool. We have some customers who use this option to route the Fabric data and visualize it in their preferred separate tool for dashboarding and alerts. Azure Blob Storage, on the other hand, is very cost-effective if you want to keep the logs for a longer period of time, and since Event Hubs has certain quotas, you can also emit your data into Blob Storage, which is very cost-effective, and access the data at any time if anything goes wrong.

Yeah, that makes sense. So centralized monitoring, real-time monitoring, taking the metrics and logs from Fabric and sending them to Event Hubs, to Log Analytics, or to Azure Storage. Now I would love to see it, but before the demo, could you please share what the process is? How do you emit the logs, the job events, and the metrics? How do you get started? We know what the benefit is, but how do you do it?

Yes, sure. The onboarding process is relatively straightforward. The first step is to configure the setup for logs and metrics in the Spark environment artifact. You need to indicate what information you want to emit, such as Spark driver logs, executor logs, job events, or metrics. This is a one-time configuration. Currently the destination can be Event Hubs, Blob Storage, or Log Analytics; in the future we are going to expand to more destinations. Once this is configured, the environment can be associated with a notebook or a workspace, and then it's very easy for you to reuse it without doing the configuration again. Once this step is complete, users can run their notebooks or their batch jobs as usual. If they want to emit any custom logs, they can use a Log4j appender to generate logs tailored to their specific business requirements. After that, all the data, including custom logs, flows into the selected destination in near real time, and users can continuously monitor that data. That's the process.

That's a very good, comprehensive workflow: a few steps to configure, but the value is clear. Based on your experience working with key customers using Microsoft Fabric, could you please share the key benefits of this end-to-end process, coming directly from those customers?

The feature we are delivering is, I would say, a common practice in the industry. The key benefits include streamlined data collection, real-time visibility into Spark applications, and powerful programmatic query capabilities. With this feature, users can quickly identify any abnormalities, create custom dashboards, and receive alerts based on the logs, metrics, and job events. This flexibility also empowers users to monitor and optimize Spark workloads at scale effectively. Those are the key benefits.
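As an aside on the Event Hubs route described above: once the emitter is pointed at an event hub, any downstream consumer can pick the records up and forward them to a third-party tool. A minimal consumer sketch using the azure-eventhub Python package, with all connection values as placeholders:

```python
# Sketch: consume the emitted Spark logs/metrics from Azure Event Hubs so
# they can be forwarded to a third-party tool. All connection values are
# placeholders; run it anywhere that can reach the event hub.
from azure.eventhub import EventHubConsumerClient

CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
EVENTHUB_NAME = "<event-hub-name>"                     # placeholder

def on_event(partition_context, event):
    # Each event body is a JSON record: a Spark log line, metric, or job event.
    print(partition_context.partition_id, event.body_as_str())
    partition_context.update_checkpoint(event)

client = EventHubConsumerClient.from_connection_string(
    CONN_STR,
    consumer_group="$Default",
    eventhub_name=EVENTHUB_NAME,
)

with client:
    # Blocks until interrupted; "-1" starts from the beginning of each partition.
    client.receive(on_event=on_event, starting_position="-1")
```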
Awesome, so now it's time for the demo. Let's jump right into the demo part.

This demo walks through the process of emitting Spark logs, job events, and metrics to your preferred locations, allowing you to aggregate and consume your data. First, I need to configure the setup for the logs and metrics destinations in the Spark environment artifact. The solution currently supports Event Hubs, Azure Blob Storage, and Log Analytics, and I can configure one or multiple destinations to store the logs and metrics. For each destination, I can specify what to emit. In this Event Hubs example, I have indicated that I want to emit the Spark driver logs, executor logs, Spark job events, and metrics. I can then specify the endpoint and access credentials for the event hub. Similarly, I can do the same configuration for Log Analytics and Azure Blob Storage.
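The demo enters these settings through the environment artifact UI. For reference, the same destination configuration can be expressed as Spark properties; the property names below follow the diagnostic-emitter configuration documented for Fabric Spark, but treat this as a sketch and confirm the exact names and category values in the official documentation.

```python
# Sketch: the emitter destination settings expressed as Spark properties,
# the way they are configured in the environment artifact. The destination
# name "MyEventHub" and the secret are placeholders; confirm the exact
# property names and category values in the documentation.
spark_properties = {
    # Register one or more named emitter destinations.
    "spark.synapse.diagnostic.emitters": "MyEventHub",
    # Destination type, e.g. AzureEventHub, AzureStorage, AzureLogAnalytics.
    "spark.synapse.diagnostic.emitter.MyEventHub.type": "AzureEventHub",
    # What to emit: driver logs, executor logs, Spark job events, metrics.
    "spark.synapse.diagnostic.emitter.MyEventHub.categories":
        "DriverLog,ExecutorLog,EventLog,Metrics",
    # Endpoint credential for the destination (placeholder).
    "spark.synapse.diagnostic.emitter.MyEventHub.secret":
        "<event-hub-connection-string>",
}
```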

Segment 3 (10:00 - 14:00)

Please note that this configuration is a one-time effort. I can then associate the environment with notebooks or workspaces. Now I can run my notebooks or Spark batch jobs as I normally do. If I want to send over any custom logs, I can use the Log4j appender to emit custom Spark logs that meet my specific business needs. All the data flows into the destination in near real time.

I can see the data flowing into Event Hubs, the corresponding requests and messages, and I can also choose different time ranges to view the data. Here is a sample data set of the emitted Spark logs with a predefined schema. Based on customer needs, the solution associates the logs with the tenant ID, capacity ID, workspace ID, and artifact ID, making it easy to aggregate and report on the data. The Livy ID, or Spark application ID, is the identifier customers use to locate a particular Spark application.

For Azure Blob Storage, the data is stored per Spark application, with the Livy ID used as the root folder. Within the root folder, the corresponding executor and driver data are saved in respective folders. By drilling into the driver folder, I can see all the Spark logs, job events, and metrics files saved there. I can download or export these logs for further queries and access.

Lastly, for Log Analytics, all the data is emitted into three main tables with a predefined schema, allowing users to easily query the data. For instance, I can query the metrics table to understand my CPU and memory utilization for a particular Spark application at a given moment.

Moreover, I can do the last-mile work to further ingest the data into a Fabric eventhouse through an eventstream. As you can see here, I create an eventstream, connect the event hub I used to emit the Spark logs data as an external data source, and then add the eventhouse as a destination for querying the Spark logs and metrics. After I set up the source and destination, I can view my Spark logs and metrics in near real time. For this example, I can see all the logs associated with the Fabric Livy ID, Spark application ID, executor information, and other Spark properties. The Data Insights tab also shows my data ingestion flow volumes and locates the peak traffic hours. I can also go to the eventhouse system overview for data ingestion activities and use the KQL database to query the data. Here are the first 100 records of the table. This is the end-to-end flow for emitting and consuming Spark logs and metrics. Thank you for watching.
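The demo mentions emitting custom logs through the Log4j appender. A minimal sketch of what that can look like from a PySpark notebook, going through the Log4j API exposed on the JVM gateway; the logger name is a placeholder, and the appender wiring is whatever the environment configuration set up above provides.

```python
# Sketch: emit custom log lines from a PySpark notebook through the Log4j
# bridge exposed on the JVM gateway, so they flow through the same emitter
# pipeline as the built-in driver logs. The logger name is a placeholder;
# this assumes the Log4j 1.x-compatible API that Spark ships with.
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("com.contoso.CustomBusinessLogs")

logger.info("Order-ingest step started")
logger.warn("Late-arriving records detected in this batch")
logger.error("Schema drift found in the source table")
```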
Jenny, thanks for doing the demo, it was super great to see that functionality in action. Now just one short question: what's the release stage? Is this a generally available feature, or is it still in public preview, and do we have a timeline for GA?

This feature has already been released in preview, so please find our documentation and our blogs, try the feature, and we would love to hear your feedback. One more thing to add: we are looking forward to customer feedback to enhance the feature, so there are a few things on the roadmap. First, we are looking at a better UI experience for users to do the configuration. Second, we will potentially allow users to indicate which custom logs they want to emit, which will give users the flexibility to control the information they emit. And third, we are also looking at better integration with the Fabric monitoring capabilities, such as Fabric eventstreams and Fabric eventhouses, to give users more options: in addition to leveraging the power of Azure, we also want to strengthen near-real-time monitoring within Fabric. Those are the items on our roadmap.

Thank you so much, thank you for sharing, and thank you for joining us. For those who are watching, remember to hit the like button, subscribe to the channel, leave a comment, and suggest a topic for the next deep dive. Until next time, happy emitting logs and metrics, and make sure you stay in control of your job executions. Thanks a lot for watching!
