Description

Position Name – Advanced Data Engineer

Type of hiring – Full-time

Location – Remote (Canada)

Job Description:

Python & PySpark

  • Proficient in both Python and PySpark, with a strong understanding of software engineering best practices.

Data Exploration & Troubleshooting

  • Ability to investigate data quality issues, debug pipelines, and explore datasets beyond surface-level analysis.

CI/CD & GitHub

  • Experience with GitHub and GitHub Actions for version control and automation; familiarity with CI/CD practices for testing and deployment.

Azure & Databricks

  • Hands-on experience with Azure cloud services and Databricks, including Databricks Jobs, Clusters, and Unity Catalog.

Data Pipeline Development

  • Design, build, and maintain robust, scalable, and automated data pipelines for both batch and streaming data ingestion using Databricks Workflows.
  • Implement data quality checks, profiling, validation, and root cause analysis to ensure data accuracy and consistency.
  • Design and implement data models and architectures that align with business needs and support efficient processing, analysis, and reporting.

Orchestration & Monitoring

  • Use workflow orchestration tools (e.g., Airflow) to automate pipeline execution and manage dependencies.
  • Integrate monitoring and alerting mechanisms to track pipeline health and proactively address issues.

Agile & Collaboration

  • Work in Agile environments, participating in sprint planning, stand-ups, reviews, and retrospectives.
  • Collaborate across teams and adapt to rotating responsibilities and changing priorities.