Infrastructure - Kubernetes¶

Description¶

Kubernetes (k8s) is an open-source container orchestration platform.

In simple terms:

Kubernetes manages and automates the deployment, scaling, networking, and lifecycle of containerized applications.

If `docker run` starts one container on one machine, Kubernetes manages hundreds or thousands of containers across many machines.

It solves problems like:

Where should containers run?
How many replicas should exist?
What happens if a container crashes?
How do services communicate?
How do we roll out a new version without downtime?

It uses a control loop model:

You declare the desired state (for example, which containers should run and how many replicas)
Controllers compare actual vs desired
Controllers reconcile differences
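
The three steps above can be sketched as a toy reconciliation pass. This is only an illustration of the idea, not how Kubernetes controllers are implemented; the `reconcile` function and the pod names are hypothetical.

```python
import itertools

# Hypothetical counter used only to give the toy pods unique names.
_pod_ids = itertools.count(1)


def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the pod list after one reconciliation pass."""
    pods = list(running_pods)
    # Too few pods: create new ones until the desired count is reached.
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")
    # Too many pods: remove the surplus.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


pods = reconcile(3, [])      # desired state declared: 3 replicas
pods.remove(pods[0])         # simulate a crashed pod
pods = reconcile(3, pods)    # the next pass restores the desired count
print(len(pods))             # 3
```

A real controller runs this comparison continuously, so a crash at any time is corrected on a later pass without manual intervention.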

When to use¶

You have multiple microservices
You need auto-scaling
You need high availability
You deploy frequently (CI/CD)
You need zero-downtime deployments
You run workloads across multiple machines
You need portability across cloud providers

Typical scenarios:

SaaS platforms
Microservices architectures
Machine learning workloads
Large-scale web applications
Event-driven systems

Pros¶

High availability
- Automatically restarts failed containers
Auto scaling
- HPA (Horizontal Pod Autoscaler) scales based on metrics
Rolling updates
- Zero downtime deployments
Self-healing
- Recreates crashed containers
Infrastructure abstraction
- Works on AWS, GCP, Azure, on-premises
Declarative configuration
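
The HPA item in the list above scales on the ratio between the current and the target metric value. A minimal sketch of that calculation follows; the function name is hypothetical, and the sketch ignores min/max replica bounds, stabilization windows, and pods that are not ready.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the ratio of current to target metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 90, 60))  # 6: load is 1.5x the target, scale up
print(desired_replicas(4, 62, 60))  # 4: within tolerance, no change
print(desired_replicas(4, 30, 60))  # 2: load is half the target, scale down
```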

Cons¶

High complexity
- Steep learning curve
Operational overhead
- Cluster maintenance, upgrades, network complexity
Debugging can be hard
- Distributed systems are inherently complex
Overkill for small teams

Core components¶

Control Plane¶

API Server: Entrypoint to the cluster (kubectl interacts with this)
etcd: Key-value store for cluster state
Controller Manager: Runs controllers to maintain cluster state (e.g. the ReplicaSet controller)
Scheduler: Assigns pods to nodes based on resource requirements and constraints
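
To make the scheduler's job concrete, here is a toy filter-then-score placement. It is a deliberately simplified sketch: the `Node` class, the node names, and the scoring rule are assumptions for illustration, and the real scheduler weighs many more constraints (affinity, taints, topology, and so on).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int    # free CPU in millicores
    free_mem_mi: int   # free memory in MiB


def schedule(cpu_m: int, mem_mi: int, nodes: list[Node]) -> Optional[str]:
    """Pick a node for a pod that requests cpu_m millicores and mem_mi MiB."""
    # Filter: keep only nodes that can fit the pod's resource requests.
    feasible = [n for n in nodes
                if n.free_cpu_m >= cpu_m and n.free_mem_mi >= mem_mi]
    if not feasible:
        return None  # no node fits; the pod would stay Pending
    # Score (toy rule): prefer the node with the most CPU left after placement.
    best = max(feasible, key=lambda n: n.free_cpu_m - cpu_m)
    return best.name


nodes = [
    Node("node-a", 500, 1024),
    Node("node-b", 2000, 4096),
    Node("node-c", 1500, 256),
]
print(schedule(1000, 512, nodes))  # node-b
print(schedule(4000, 512, nodes))  # None
```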

Worker Nodes¶

Kubelet: Agent that runs on each node, ensures containers are running
Container Runtime: Runs the actual containers (e.g. containerd or CRI-O; Docker Engine needs the cri-dockerd adapter since Kubernetes 1.24)
kube-proxy: Maintains network rules on each node and load-balances Service traffic across pods

Core Objects¶

Pod¶

Smallest deployable unit (one or more containers)
Containers in a pod share a network namespace (one IP address) and can share storage volumes
Ephemeral (can die anytime)

Note

You rarely create pods directly in production. Instead, you use higher level controllers like Deployments or StatefulSets that manage pods for you.
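
For illustration, a Pod with two containers and a shared volume can be described as a Python dictionary and serialized to JSON, which `kubectl` accepts alongside YAML. The names (`web`, `app`, `log-reader`, `shared-logs`), the images, and the paths are example values, not taken from any real deployment.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {
        # An emptyDir volume lives as long as the pod and is visible
        # to every container that mounts it.
        "volumes": [{"name": "shared-logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "app",
                "image": "nginx:1.27",
                "ports": [{"containerPort": 80}],
                "volumeMounts": [
                    {"name": "shared-logs", "mountPath": "/var/log/nginx"}
                ],
            },
            {
                "name": "log-reader",
                "image": "busybox:1.36",
                "command": ["sh", "-c", "tail -F /logs/access.log"],
                "volumeMounts": [
                    {"name": "shared-logs", "mountPath": "/logs"}
                ],
            },
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saved to a file such as `pod.json`, the output could be submitted with `kubectl apply -f pod.json`. In line with the note above, the same `spec` would normally sit inside a Deployment's pod template rather than being applied directly.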

ReplicaSet¶

Ensures a specified number of pod replicas are running
If one pod dies, it creates a new one to maintain the desired count

Deployment¶

Manages ReplicaSets and provides declarative updates for pods
Rolling updates
Rollbacks
Revision history (each rollout is recorded, which is what makes rollbacks possible)
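
The rolling update mentioned above replaces old pods with new ones step by step, bounded by two settings: `maxSurge` (extra pods allowed above the desired count) and `maxUnavailable` (pods allowed to be missing). The simulation below is a simplified sketch that assumes every new pod becomes ready immediately; the function name and the chosen values are illustrative.

```python
from typing import Iterator


def rolling_update(replicas: int, max_surge: int = 1,
                   max_unavailable: int = 0) -> Iterator[tuple[int, int]]:
    """Yield (old_pods, new_pods) after each step of a simplified rollout."""
    old, new = replicas, 0
    yield old, new
    while old > 0 or new < replicas:
        # Scale up: add new pods without exceeding replicas + max_surge in total.
        surge_room = replicas + max_surge - (old + new)
        up = min(surge_room, replicas - new)
        if up > 0:
            new += up
            yield old, new
        # Scale down: remove old pods while keeping enough pods available.
        removable = (old + new) - (replicas - max_unavailable)
        down = min(removable, old)
        if down > 0:
            old -= down
            yield old, new
        if up <= 0 and down <= 0:
            break  # no progress possible, e.g. both limits set to 0


print(list(rolling_update(3)))
# [(3, 0), (3, 1), (2, 1), (2, 2), (1, 2), (1, 3), (0, 3)]
```

With these values the total number of running pods never drops below the desired 3, which is what keeps the service available during the rollout.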

Service¶

Provides a stable network endpoint for a set of pods
It solves the problem that pods are ephemeral and their IPs can change.

Types:

ClusterIP (default): Internal access only
NodePort: Exposes service on a static port on each node
LoadBalancer: Provisions external load balancer (cloud provider)
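
A Service finds its pods through a label selector, so the set of backends follows the pods as they come and go. The sketch below mimics that lookup with plain dictionaries; the pod names, IPs, and the `endpoints` helper are hypothetical.

```python
pods = [
    {"name": "web-1", "ip": "10.0.0.4", "labels": {"app": "web"}},
    {"name": "web-2", "ip": "10.0.1.7", "labels": {"app": "web"}},
    {"name": "db-1", "ip": "10.0.2.9", "labels": {"app": "db"}},
]


def endpoints(selector: dict, pods: list[dict]) -> list[str]:
    """Return the IPs of pods whose labels contain every selector key/value."""
    return [
        p["ip"] for p in pods
        if all(p["labels"].get(k) == v for k, v in selector.items())
    ]


print(endpoints({"app": "web"}, pods))  # ['10.0.0.4', '10.0.1.7']

# A pod is replaced and gets a new IP; the same selector still finds it.
pods[0] = {"name": "web-3", "ip": "10.0.3.2", "labels": {"app": "web"}}
print(endpoints({"app": "web"}, pods))  # ['10.0.3.2', '10.0.1.7']
```

Clients keep using the Service's stable address while the list of pod IPs behind it changes.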

ConfigMap and Secret¶

ConfigMap: Store non-sensitive configuration data
Secret: Store sensitive data (passwords, API keys); values are base64-encoded, which is an encoding and not encryption
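
The base64 step can be reproduced in a few lines, which also shows that anyone able to read a Secret can decode it. The secret name and the password below are made-up example values.

```python
import base64
import json

password = "s3cr3t-password"  # hypothetical value for illustration

# Values in a Secret's `data` field are base64-encoded strings.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")

secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "type": "Opaque",
    "data": {"password": encoded},
}
print(json.dumps(secret, indent=2))

# Decoding needs no key, so base64 alone provides no confidentiality.
decoded = base64.b64decode(secret["data"]["password"]).decode("utf-8")
print(decoded == password)  # True
```

Protecting the values themselves relies on access control (RBAC) and, where configured, encryption at rest.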

Namespace¶

Logical isolation for resources

Ingress¶

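Manages external access to services, typically HTTP/HTTPS, by routing requests to services based on host and path (an Ingress controller must be running in the cluster)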

Example:

/api -> service A
/web -> service B
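
The path-based routing in the example above can be sketched as a prefix lookup. This toy version matches whole path segments and lets the longest matching prefix win, which mirrors the `Prefix` path type; the `route` function and the service names are hypothetical.

```python
from typing import Optional

rules = {"/api": "service-a", "/web": "service-b"}


def route(path: str, rules: dict[str, str]) -> Optional[str]:
    """Return the backend for a request path, or None if no rule matches."""
    matches = [
        prefix for prefix in rules
        # Match the prefix itself or anything below it, segment by segment.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return None
    # The longest (most specific) matching prefix takes precedence.
    return rules[max(matches, key=len)]


print(route("/api/users", rules))  # service-a
print(route("/web", rules))        # service-b
print(route("/apiary", rules))     # None: "/api" must match a whole segment
```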