Staff Software Engineer
Software Engineering
Calgary, AB, Canada
Job Description
What's the opportunity?
We're seeking a seasoned Staff Software Engineer to join the RBC Borealis AI Platform team and own the end-to-end lifecycle of machine learning systems—from experimentation and validation through to high-throughput production serving at scale. You'll be the technical anchor for operationalizing vision language models and document processing systems that handle thousands of documents per minute, setting the bar for reliability, observability, and engineering excellence across our AI platform.
You'll lead the design and evolution of our scalable document processing platform—a production system that combines event-driven architecture, vision language models, and cloud-native infrastructure to extract intelligence from financial documents at enterprise scale:
ML System Operationalization: Own the production lifecycle of LLM and computer vision models, from integration and validation to serving, monitoring, and continuous improvement at 1000+ documents/minute throughput
Platform Architecture: Design resilient microservices using FastAPI and event-driven patterns with Apache Kafka, ensuring 99.5%+ reliability for mission-critical financial document processing
Scalable Infrastructure: Build and optimize Kubernetes-native workloads with KEDA-based autoscaling (3-50 replicas dynamically), PostgreSQL/MongoDB data layers, and S3 object storage with lifecycle management
Observability & Reliability: Establish comprehensive monitoring, alerting, and SRE practices that provide deep visibility into model performance, system health, and business metrics across distributed services
This is a rare opportunity to shape the foundation on which Canada's largest financial institution runs its most critical AI workloads, working directly with leading researchers in machine learning while having access to rich, massive datasets and the computational resources to support groundbreaking innovation.
Your responsibilities include:
Technical Leadership & ML Engineering
Architect production ML pipelines that seamlessly integrate vision language models, OCR engines, and document extraction services into scalable, fault-tolerant systems
Drive technical decisions on complex distributed systems challenges involving data consistency, exactly-once processing semantics, and sub-500ms API response times
Collaborate closely with ML researchers to translate cutting-edge models in computer vision, NLP, and reinforcement learning into production-ready services
Set engineering standards for model serving, A/B testing, feature flags, and gradual rollouts that enable safe, data-driven experimentation at scale
Platform Development & Innovation
Build sophisticated retry mechanisms with exponential backoff, circuit breakers, dead-letter queues, and fallback strategies that ensure system resilience
Implement advanced event-driven patterns across Kafka topics (ingestion, processing, callbacks, DLQ) with precise consumer group management and lag-based autoscaling
Develop reusable frameworks and libraries for async processing, template-based document parsing, and callback orchestration that accelerate team productivity
Lead the evaluation and adoption of emerging AI technologies, ensuring alignment with enterprise security, compliance, and data governance requirements
Cross-Functional Collaboration
Partner with data scientists and ML researchers to understand model requirements, performance characteristics, and integration patterns for production deployment
Work with process engineers and business stakeholders to translate financial document processing needs into robust, scalable technical solutions
Foster strong relationships across platform, infrastructure, and security teams to deliver end-to-end capabilities that span multiple domains
Mentor engineers on distributed systems design, event-driven architecture, ML ops best practices, and cloud-native development patterns
Strategic Problem Solving
Navigate ambiguity in complex technical challenges, from Kafka partition strategies to LLM provider selection to autoscaling configurations
Identify and mitigate architectural risks before they impact production, using techniques like chaos engineering, load testing, and failure mode analysis
Provide clear, data-driven recommendations to engineering leadership on infrastructure investments, technology choices, and platform roadmap priorities
Drive continuous improvement in system performance, cost efficiency, and developer experience through metrics-driven iteration
You're our ideal candidate if you have:
5-8+ years of software engineering experience with 3+ years focused on ML systems, data platforms, or high-scale distributed systems
Deep expertise in Python and production-grade API frameworks (FastAPI, Flask, or similar) with strong software design principles
Proven track record operationalizing ML models in production—you've integrated LLMs, vision models, or similar AI services into scalable systems
Strong hands-on experience with event-driven architectures using Apache Kafka, RabbitMQ, or cloud-native messaging platforms
Production experience with both SQL (PostgreSQL) and NoSQL (MongoDB, DynamoDB) databases, understanding tradeoffs and optimization strategies
Expert-level knowledge of containerization (Docker) and Kubernetes/OpenShift orchestration, including custom resources, operators, and autoscaling
What's in it for you?
Become part of a team that thinks progressively and works collaboratively. We care about seeing each other reach full potential;
A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock options where applicable;
Leaders who support your development through coaching and managing opportunities;
Ability to make a difference and lasting impact from a local-to-global scale.
About RBC Borealis
RBC Borealis is the driving force behind Royal Bank of Canada’s AI and data innovation. As part of Canada’s largest financial institution, we bring together a team of architects, engineers, scientists, and product experts on a mission to revolutionize finance through world-class research, solutions, and a resilient data platform. With locations across Toronto, Waterloo, Montreal, Calgary, and Vancouver, we’re at the forefront of AI research and platform development. With a focus on cutting-edge research in areas like time series forecasting, causal machine learning, and responsible AI, we are seamlessly integrating AI research and data engineering to solve critical challenges in the financial industry. We are building intelligent, scalable, data-driven solutions that will help communities thrive and drive innovation for our customers across the bank.
Inclusion and Equal Opportunity Employment
RBC is an equal opportunity employer committed to diversity and inclusion. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, Aboriginal/Native American status or any other legally protected factors. Disability-related accommodations during the application process are available upon request.
Job Skills
AI Ops, Amazon SageMaker, Apache Kafka, Autoscaling, Big Data Management, CI/CD, Datadog, Data Mining, Data Science, Deep Learning, Dynatrace APM, Machine Learning (ML), Microsoft Azure, MLflow, ML Integration, MongoDB, Predictive Analytics, Programming Languages, Python (Programming Language), Red Hat OpenShift
Additional Job Details
Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above
Our Employment Opportunities
At RBC, we are guided by living shared values of Client First, Integrity, Collaboration, Respect and Excellence and winning together as One RBC. We believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.
Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at jobs.rbc.com.
RBC is presently inviting candidates to apply for this existing vacancy. Applying to this posting allows you to express your interest in this current career opportunity at RBC. Qualified applicants may be contacted to review their resume in more detail.