Apache Spark on Kubernetes: The RIGHT Way (No Master/Worker Clusters Needed)
Video description
Run Apache Spark jobs on Kubernetes with ZERO permanent infrastructure!
In this comprehensive tutorial, you'll learn how to deploy a production-ready Spark setup that creates pods ONLY when jobs run and automatically cleans up when done. Say goodbye to costly always-on Spark clusters.
🎯 WHAT YOU'LL LEARN
✅ Deploy Spark on Kubernetes WITHOUT permanent master/worker nodes
✅ Build custom Spark images with embedded PySpark jobs
✅ Submit jobs that auto-scale executors and clean up after themselves
✅ Run real-world analytics: customer segmentation, cohort analysis, revenue trends
✅ Set up the Spark History Server for job monitoring
✅ Implement proper RBAC security for production
✅ Debug and monitor jobs using kubectl and Spark UI
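
To make the "no permanent cluster" idea above concrete, here is a minimal sketch of how a job submission might be assembled. It is not the code from the video: the helper name, namespace, service account, image tag, API server address, and job path are all hypothetical placeholders. Only the spark-submit flags and the spark.* configuration keys are taken from the Spark on Kubernetes documentation.

```python
import shlex
from typing import List


def build_spark_submit(
    api_server: str,
    image: str,
    app_path: str,
    namespace: str,
    service_account: str,
    max_executors: int = 4,
    event_log_dir: str = "",
) -> List[str]:
    """Build a spark-submit argv list for Kubernetes cluster mode.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    conf = {
        # Where the driver and executor pods are created.
        "spark.kubernetes.namespace": namespace,
        # Custom image with the PySpark job baked in.
        "spark.kubernetes.container.image": image,
        # Service account the driver uses to request executor pods (RBAC).
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        # Let Spark scale executors up and down with the workload.
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    if event_log_dir:
        # Event logs are what the Spark History Server reads.
        conf["spark.eventLog.enabled"] = "true"
        conf["spark.eventLog.dir"] = event_log_dir

    argv = [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
    ]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_path)
    return argv


# Example values are assumptions, not taken from the video.
cmd = build_spark_submit(
    api_server="https://127.0.0.1:6443",
    image="my-registry/spark-jobs:latest",
    app_path="local:///opt/spark/jobs/customer_segmentation.py",
    namespace="spark-jobs",
    service_account="spark-sa",
    event_log_dir="file:///tmp/spark-events",
)
print(shlex.join(cmd))
```

The local:// scheme tells Spark the script already exists inside the container image, which matches the "embedded PySpark jobs" approach. Executor pods are removed when the application finishes; the driver pod typically stays in a completed state, so its logs remain available until it is deleted.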
TIMESTAMPS
0:00 Introduction
1:37 System Architecture
5:48 Setting up K8s
8:10 Setting up the project
10:00 K8s Namespaces
11:45 K8s Service Accounts, RBAC
17:27 Creating Spark Jobs for K8s
26:40 K8s Spark History Server
34:24 Spark Control Dashboard
42:42 K8s API layer
49:52 Spark Dashboard, Job submissions and review
56:52 Outro
🔗 RESOURCES & LINKS
• Full source code: https://buymeacoffee.com/yusuf.ganiyu/source-code-spark-k8s
• Apache Spark K8s Docs: https://spark.apache.org/docs/latest/running-on-kubernetes.html
• Kubernetes Documentation: https://kubernetes.io/docs/
• PySpark API Reference: https://spark.apache.org/docs/latest/api/python/
Like this video? Support us: https://www.youtube.com/@CodeWithYu/join
#ApacheSpark #Kubernetes #DataEngineering #BigData #PySpark #DevOps #CloudNative #K8s #DataPipelines #ETL #Tutorial