Site Reliability Engineer

Date posted: September 15, 2021
Location: Boston, MA


The ideal candidate will possess:

  • BS degree in Computer Science or a related technical field involving coding, or equivalent practical experience
  • Strong data virtualization skills (preferably with experience on the Delphix platform) and data masking skills
  • 5+ years of IT infrastructure experience overall
  • 3+ years of Linux system administration experience, particularly supporting 3-tier web applications
  • Obsessive need to automate; relevant languages and tools include Bash, Python, PowerShell, and the AWS CLI
  • 1+ years of experience migrating and managing AWS workloads
  • Experience on a service desk providing a high level of customer satisfaction
  • Familiarity with AWS platforms such as EC2, VPC, S3, ELB, RDS, Route53, WorkSpaces, CloudWatch, CloudTrail, etc.
  • Fluency with modern DevOps concepts, CI/CD processes, Git commands, and container platforms (Docker, Kubernetes, ECS, etc.)
  • Comfortable working with CloudFormation or Terraform
  • Experience as an on-call DevOps, SRE, or Cloud Operations Senior Engineer (at least 3 years)
  • Experience implementing Terraform best practices for infrastructure in AWS (at least 2 years)
  • Proven track record of designing, building, sizing, optimizing, and maintaining cloud infrastructure in AWS and Azure
  • Proven experience writing automation glue code and managing production infrastructure in AWS
  • Proven track record of designing, implementing, and maintaining full build/release pipelines in a cloud environment (Jenkins/TeamCity/GitLab/GitHub Actions experience preferred)
  • Experience improving developer experience with desktop tooling and scripts
  • Knowledge of NoSQL database operations and concepts
  • Experience with MongoDB, Elasticsearch, and Redis (at least 1 year)
  • Understanding of and experience implementing security best practices in AWS, Linux, and Kubernetes, including penetration testing, internal vulnerability analysis, and incident response
  • Experience in monitoring, system performance data collection and analysis, and reporting


The person who secures this role will:

  • Work with internal teams to deliver substantially improved platforms to manage our hybrid cloud environment
  • Share responsibility for health, scalability and availability of our cloud services
  • Guide reliability practices across the entire software development lifecycle through activities such as architecture reviews, code reviews, creating platforms and frameworks, capacity planning, and chaos testing
  • Maintain the delicate balance between quality, speed, user experience, and customer expectations in a 24x7 operations environment
  • Automate deployment of AWS infrastructure and services
  • Work with the team to ensure cloud architecture meets scalability, availability, and cost requirements
  • Follow good operational practices, such as creating incident tickets and maintaining documentation as needed
  • Design, code, test, and deliver software to automate manual operational work
  • Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
  • Engage with the Development Team throughout the lifecycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
  • Identify application patterns and analytics in support of better service level objectives
  • Design self-healing and resiliency patterns
  • Design automated software and product upgrades, and release management solutions
  • Maintain service health through monitoring and follow-the-sun incident response
  • Protect the infrastructure and core applications from configuration drift using config management tooling for compliance
  • Contribute to the codebase with CloudFormation, Terraform, or potentially other automation and scripting languages



50 Thomas Patten Dr.
2nd Floor
Randolph, MA 02368