Senior Data Engineer - Power Factors

Opportunities with AppDirect's Advisor Partners (Recruitment as a Service)
Remote · 1 week ago

About the position

Power Factors is seeking a Senior Data Engineer to join the Innovation team and play a crucial role in the PF-LLM program. This initiative focuses on building a multivariate time-series foundation model for renewable energy assets. The engineer will be responsible for the data foundation, inference service, and platform integration, ensuring the success of state-of-the-art model training and deployment. This is a critical, day-one role with significant impact.

Responsibilities
• Design and build production ETL pipelines at fleet scale, from source systems to warehouse and feature store, covering thousands of wind and PV sites across multiple OEMs.
• Own the design of canonical signal schemas for wind and PV asset classes across OEMs.
• Implement automated data quality gates, including sparsity and missingness checks, flatline detection, outlier flagging, and freshness validation, with automated alerting.
• Implement dataset versioning to ensure reproducibility of trained models.
• Build and maintain backfill jobs with idempotency guarantees and retry logic to handle mid-run failures without data duplication.
• Govern storage and compute costs for the data warehouse from the outset.
• Build the batch and on-demand inference API with contract tests, sized for fleet-wide daily runs.
• Establish latency and throughput baselines, and own the cold-start and model-loading strategy.
• Instrument the inference service with structured logs and metrics.
• Integrate forecasts into the Power Factors product platform, including authentication, authorization, customer isolation, observability, and feature flags.
• Build and maintain the shadow validation pipeline to run live inference in parallel with existing forecast paths, log predictions and actuals, and generate validation reports.
• Support the pilot customer rollout by enabling the product for friendly customers and managing incoming data and integration tickets.
• Collaborate closely with ML Engineers on data quality requirements, feature store interfaces, and pipeline handoffs.
• Partner with the Tech Lead and Frontend Engineer on platform integration.
• Contribute to architectural decisions and document data flows, schemas, and pipeline runbooks.

Requirements
• 6+ years of back-end and data engineering experience with a proven track record of shipping production systems.
• Production-grade ETL/ELT pipeline design at scale, including idempotency, retry logic, backfill jobs, incremental loading, and cost-controlled warehouse compute.
• Schema design and data modeling across heterogeneous sources, with experience reconciling signals from disparate systems into a canonical, queryable format.
• Data quality engineering: automated quality gates (sparsity, flatline detection, outlier flagging, freshness checks), alerting pipelines, and dataset versioning for ML reproducibility.
• API design and development: RESTful inference services with contract testing, latency and throughput budgeting, and structured observability (logs, metrics, traces).
• Experience integrating ML model outputs into SaaS product surfaces: authentication and authorization, customer isolation, and feature flag management.
• Cloud infrastructure proficiency (AWS preferred), containerization (Docker, Kubernetes), and CI/CD pipeline ownership.
• Python and SQL as core tools.
• Hands-on experience with modern warehouse technologies (Snowflake, BigQuery, or Databricks).
• Experience with pipeline orchestration tools like Airflow, Prefect, or Dagster.
• Excellent written and verbal communication skills in English.

Nice-to-haves
• Experience with time-series, IoT, or industrial sensor data (SCADA systems, irregular sampling, high-missingness signals).
• Familiarity with streaming data platforms (Kafka, Kinesis, or Pub/Sub) for real-time or near-real-time ingestion.
• Experience designing and managing feature stores for ML training and serving.
• Renewable energy domain knowledge: understanding of wind and solar asset operations and the data they produce.
• Experience standing up shadow-mode or A/B comparison pipelines for ML systems.
• Multi-tenancy and platform integration experience in B2B SaaS products.
• Knowledge of data lake architectures and open table and file formats (Delta Lake, Iceberg, Parquet).
• Familiarity with MLOps practices: model registry conventions, retraining triggers, and drift monitoring.

Benefits
• Comprehensive benefits package including health, dental, and vision coverage.
• Dedicated wellness support.
• Generous paid vacation policy.
• Employer RRSP matching program.
• Work-from-abroad opportunities with manager approval.
• Exposure to a global team operating across multiple countries and time zones.