Public summary
We are seeking a Senior Data Engineer to design, develop, and maintain scalable data pipelines and architectures that support analytics and data-driven decision-making. The role involves collaborating with cross-functional teams to transform raw data into reliable analytical assets using technologies such as Python, SQL, GCP services, and data orchestration tools. This position supports a mission-driven organization that applies cutting-edge AI and big data to innovate in the legal sector. The organization is committed to a diverse workforce, including affirmative hiring of people with disabilities.
Responsibilities
- Design and maintain large-scale data ingestion and transformation pipelines (batch and streaming).
- Ensure data quality, reliability, and integrity throughout the data lifecycle through testing, version control, and monitoring.
- Implement and evolve the data architecture on Google Cloud Platform, including BigQuery, Cloud Storage, Pub/Sub, and Composer.
- Collaborate with Analytics, Product, and Engineering teams to translate business requirements into technical solutions.
- Contribute to data engineering best practices, standards, naming conventions, and monitoring frameworks.
Qualifications
- Solid experience with Python and SQL; familiarity with Java or Scala preferred.
- Experience with Airflow, Airbyte, or similar orchestration and replication frameworks.
- Knowledge of data modeling and design best practices (Kimball, Data Vault, Medallion).
- Hands-on experience with GCP services such as BigQuery, Cloud Storage, Pub/Sub, Composer, and IAM.
- Familiarity with streaming technologies such as Kafka, Dataflow, or Beam is a plus.
- Experience with infrastructure as code (Terraform) and CI/CD pipelines.
- Product-oriented mindset and ability to collaborate across disciplines.
- Additional advantages: experience with Apache Flink, data observability, governance initiatives, and analytical table formats.