Data Platform at (COMPANY NAME):
The Data Platform team advances the state of data at (COMPANY NAME) and empowers users to intuitively derive insights from that data. To accomplish this goal, the team leverages existing open source technologies like Kafka, Hadoop, Hive, Presto, and Spark, along with in-house tools, to curate high-quality data sets. The team also builds data tooling and establishes company-wide best practices that empower users throughout the company to build high-quality datasets and data products.
What are examples of work that Data Platform Engineers have done at (COMPANY NAME)?
* Global Metrics Repo: a widely adopted framework that allows users to easily define metrics and dimensions, which can be leveraged for business reporting and for evaluating experiment performance. The framework automates common data engineering practices and optimizes data in Druid for sub-second query response.
* Real-time/Online Data Services: a framework that enables online data use cases in Product. Our current infrastructure leverages Spark Streaming and Mussel (our production-facing key/value store) to power numerous production-facing use cases, and is backed by a robust anomaly detection framework powered by Druid.
* Machine learning infrastructure: Many products at (COMPANY NAME) rely on machine learning (ML) to achieve their goals, and we've built a common infrastructure for ML that saves significant development time for the company.
* Logging Infrastructure: by clarifying and automating common testing procedures, we deliver improved data quality and faster iteration cycles for anyone working with data at (COMPANY NAME). To accomplish this, we are building tooling and test infrastructure that identifies problems before code is deployed to production.
* Pipeline Development/Testing Infrastructure: a framework that facilitates the data development lifecycle (mainly testing and deploying pipeline code). Currently under development, this tooling will support a variety of compute environments (e.g. Java Spark, Hive, SparkSQL) and integrate with SLA tracking, alerting, and anomaly detection frameworks.
The following experience is relevant to us:
* 4+ years of full-time, industry experience
* Working with data at the petabyte scale
* Design and operation of robust distributed systems
* Experience with Java / Scala (preferred)
* Strong scripting ability in Ruby / Python / Bash
* Working knowledge of relational databases and query authoring (SQL)
* Love to use and develop open source technologies like Kafka, Hadoop, Hive, Presto, and Spark
* Rigor in high code quality, automated testing, and other engineering best practices
* BS/MS/PhD in Computer Science or a related field (ideal)
What benefits do we offer?
* Competitive salaries
* Quarterly employee travel coupon
* Paid time off
* Medical, dental, & vision insurance
* Life insurance and disability benefits
* Fitness discounts
* Flexible Spending Accounts
* Apple equipment
* Commuter subsidies
* Community involvement (4 hours per month to give back to the community)
* Company sponsored tech talks and happy hours
* Much more