AI/ML Data Engineer (with AWS)

Rivago Infotech Inc
Markham, Ontario · On-site · 1 week ago

About this role

Role Overview & Key Responsibilities
• Data Pipeline Operations & On-Call: Own the on-call rotation for ingestion pipelines (Kafka, AWS Glue); triage and resolve pipeline failures, schema mismatches, and throughput degradation; author RCAs.
• Data Quality Monitoring: Implement and maintain data quality checks across the Bronze->Silver->Gold lakehouse layers (S3->Kafka->Snowflake/Redshift); alert on anomalies, missing data, or drift.
• ML Model Health & MLOps: Monitor deployed models for accuracy degradation, data drift, and concept drift; manage model redeployment workflows; maintain ML experiment tracking.
• AI Platform Reliability (Bedrock + LangChain): Monitor AWS Bedrock inference latency, token usage, error rates, and cost; operate LangChain agent pipelines; use Langfuse for AI evaluation and observability.
• DORA Metrics (Data & AI Lens): Track deployment and release health for data pipeline and model updates; measure lead time for data model changes; monitor pipeline reliability as a DORA proxy.
• Schema & Contract Management: Monitor the AWS Glue Schema Registry for schema evolution events; validate Avro contract compliance for new producer payloads; coordinate schema changes with module teams.
• Snowflake / Redshift Operations: Manage query performance, warehouse sizing, cost controls, and data retention policies; monitor Gold-layer data freshness and SLA compliance.
• Incident Escalation: Serve as first-line triage for all data and AI incidents; escalate to core data/ML engineers only when the root cause requires architectural changes or new feature work.
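
The DORA item asks for lead time on data model changes and release health for pipeline and model updates. As a rough illustration only, two of the standard DORA metrics (lead time for changes and change failure rate) could be computed from deployment records as below. The `Deployment` record shape, its field names, and the choice of the median are assumptions for this sketch; a real setup would pull these timestamps from CI/CD or orchestration logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List


@dataclass
class Deployment:
    """One production release of a pipeline or model (hypothetical record shape)."""
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # True if the release caused an incident or rollback


def lead_time_for_changes(deploys: List[Deployment]) -> timedelta:
    """Median commit-to-production time; raises StatisticsError if empty."""
    return median(d.deployed_at - d.committed_at for d in deploys)


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Fraction of deployments that caused a production failure."""
    if not deploys:
        return 0.0
    return sum(d.failed for d in deploys) / len(deploys)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    history = [
        Deployment(t0, t0 + timedelta(hours=4), failed=False),
        Deployment(t0, t0 + timedelta(hours=10), failed=True),
        Deployment(t0, t0 + timedelta(hours=6), failed=False),
        Deployment(t0, t0 + timedelta(hours=2), failed=False),
    ]
    print(lead_time_for_changes(history))  # 5:00:00
    print(change_failure_rate(history))    # 0.25
```

The median is used here because a single slow release would otherwise dominate a mean lead time.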

Required Skills & Experience

Data Engineering (Strong)
• 4+ years of data engineering experience with production-grade pipelines
• Proficient with Apache Kafka: consumer groups, topic management, lag monitoring, DLQ handling
• Experience with AWS Glue, AWS Glue Schema Registry, and Avro/Parquet data formats
• Hands-on with Snowflake or Redshift: query optimization, cost management, RBAC
• Familiarity with lakehouse patterns: Bronze/Silver/Gold (S3-based) data architecture

ML/AI Operations (Core Competency)
• Experience with MLOps practices: model versioning, drift detection, retraining pipelines
• Familiarity with AWS Bedrock, SageMaker, or equivalent managed ML inference platforms
• Working knowledge of LangChain or LlamaIndex for LLM application pipelines
• Experience with AI/LLM observability tools (Langfuse, LangSmith, or equivalent)
• Understanding of RAG (Retrieval-Augmented Generation) architectures and vector stores

Operational Excellence (Core Competency)
• Experience applying DORA metrics to data and ML delivery pipelines
• On-call experience for data infrastructure; structured incident management and RCA
• Data quality framework implementation: Great Expectations, dbt tests, or custom checks
• Experience with monitoring and alerting for streaming pipelines (Kafka lag, throughput)

Backend & AWS Exposure
• Python proficiency: scripting, pipeline development, data transformation
• AWS services: S3, Lambda, Glue, Bedrock, CloudWatch, SQS/SNS, IAM
• Familiarity with containerized workloads on Kubernetes (EKS)
• Experience with dbt or similar data transformation frameworks is a plus

Nice to Have
• Exposure to ontology or knowledge graph systems (RDF, OWL, or property graphs)
• Familiarity with Temporal for workflow orchestration of ML pipelines
• Experience with multi-tenant data platforms and row-level security patterns
• Understanding of GDPR-compliant data handling and encryption key management
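
Kafka lag monitoring, listed under both the Kafka and streaming-alerting skills, comes down to the gap between each partition's log-end offset and the consumer group's committed offset (the LAG column that `kafka-consumer-groups.sh --describe` reports). A minimal sketch, assuming the offsets have already been fetched into plain dictionaries keyed by partition number; the function names and the alert threshold are hypothetical.

```python
from typing import Dict, List, Optional


def consumer_lag(
    end_offsets: Dict[int, int],
    committed: Dict[int, Optional[int]],
) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset."""
    lag: Dict[int, int] = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition)
        # Simplifying assumption: a partition with no committed offset is
        # treated as unconsumed from offset 0 (ignores the log-start offset).
        # max(..., 0) guards against offsets sampled at slightly different times.
        lag[partition] = max(end - (done if done is not None else 0), 0)
    return lag


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Partitions whose lag exceeds the (hypothetical) alert threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


if __name__ == "__main__":
    end = {0: 1500, 1: 900, 2: 2000}
    committed = {0: 1490, 1: 900, 2: None}
    lag = consumer_lag(end, committed)
    print(lag)                           # {0: 10, 1: 0, 2: 2000}
    print(sum(lag.values()))             # 2010
    print(lagging_partitions(lag, 100))  # [2]
```

Summing per-partition lag gives a group-level figure, but alerting per partition as well catches a single stuck partition that a healthy total could hide.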