At LinkedIn, we use this curriculum to onboard our entry-level talent into the SRE role.
Updated Dec 25, 2025 - HTML
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goal is to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combines software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Compilation of public failure/horror stories related to Kubernetes
A curated list of Platform Engineering Tools
A role-playing game for incident management training
A comprehensive 31-day Linux mastery curriculum designed specifically for DevOps Engineers, Site Reliability Engineers (SRE), System Administrators, and Cloud Engineers. This hands-on learning path takes you from Linux fundamentals to advanced system administration and automation.
Curated Self Study Guide for Computer Science, DevOps, SRE & SysAdmin
Complete Think
A RubyGems.org cache and private gem server
A self-hosted Infrastructure Automation Platform that brings structure, scale, and visibility to Ansible-based IaC.
A resource website dedicated to Reliability Engineering
Command-line network sniffer for Kafka. Tests DNS, TCP, TLS, SASL, protocol, produce/consume in one tool. Built for SREs and AI agents (structured JSON output).
A collection of open-source runbooks and playbooks
💪 Enhance your Claude Code superpowers with a community-driven library of skills and utility scripts for seamless skill management.
Explore data-intensive architectures
Source code of my personal website/blog, built with Hugo and hosted on GitHub Pages: shyamjos.com