tl;dr: high-impact opportunity for a strong infrastructure engineer aspiring to start-up leadership to build high-performing infrastructure that serves a production-ready API for LLMs
The Job:
We are looking for a growth-oriented, positive-minded, talented and motivated lead infrastructure engineer to join our founding team. You will be based in New York and report directly to the company’s co-founder and CTO.
Joining us this early means you’re going to have huge influence over the technical and cultural evolution of the company. You are signing up for a fast-paced working environment with a high bar for quality. We are working on hard problems and we have high expectations for one another.
This is a great fit if you hope to start your own company one day.
Together, we’re going to face the hardest technical challenges this journey has to offer head-on. As the team grows, you’ll become the seasoned expert that everyone looks to for guidance :)
What you’re signing up for:
- Architect, build, and deploy the backend systems and services to reliably serve LLM inference at scale (this is a hands-on-keys role)
- Set up robust, efficient, configurable and high-performing infrastructure to serve LLMs (including GPU clusters, workload orchestration, and failure management)
- Work closely with customers to understand what they need, figure out how to make them successful, and own the solution end to end
- Continually research, test and implement cutting-edge infrastructure optimization frameworks as they come out
- Build a high-performing and customer-obsessed engineering team and culture
- Act as a thought partner to the founders on strategy and company-building
- Define and scope greenfield projects that move the company forward
- Support the founders in our go-to-market efforts (e.g., joining technical calls with prospects)
You are:
- An expert in Kubernetes and either Pulumi or Terraform (bonus points if you’re knowledgeable about Ray)
- An experienced software developer able to code in Python plus your languages of choice
- Experienced with at least one deep learning framework, such as PyTorch, TensorFlow, or JAX
- Experienced working with multiple cloud providers (e.g., AWS, Google Cloud, Azure)
- A hands-on engineering leader who can see the big picture and also dive into the code
- A total owner who delivers end-to-end solutions and cracks as-yet-unsolved problems
- A force of nature who sprints towards goals and gets the job done
- Able to see the 20% of effort that delivers 80% of the results / value
- Eager to get your hands dirty and to build, measure, learn and build again, on repeat
- Collaborative and able to communicate clearly, honestly and vulnerably
- A fast and eager learner with a growth mindset who sees feedback as a gift
- Comfortable in an environment of extreme uncertainty
- Fired up about building the future API of choice for LLMs
We estimate that someone with 3-5+ years of experience as a distributed systems engineer and 3+ years of engineering management experience will be able to contribute quickly to the challenges we are tackling.