At (COMPANY NAME) we face some of the most challenging but interesting problems in the IT industry. We work at a scale of speed, performance and complexity that few others in the industry can compete with. Our data is not big, it's absolutely HUGE. We have about 40 petabytes in our Hadoop storage (growing by more than 30 TB per day), we take less than 10ms to respond to an ad request, and we deliver billions of ads per day.
To help us solve these challenges, (COMPANY NAME) is looking for the best of the best in terms of engineering talent within our cool and geeky environment!
(COMPANY NAME) Research is pioneering innovations in online publishing and advertising. As the center of scientific excellence in the company, (COMPANY NAME) Research in Paris and Palo Alto delivers both fundamental and applied scientific leadership through published research, product innovations and new technologies powering the company's products.
We are looking for outstanding machine learning research scientists whose skills span the entire spectrum of scientific research, i.e. data gathering/cleaning, modeling, implementation, publication and presentation.
A Sampling of Research Topics
* Click prediction: How do you accurately predict, in less than a millisecond, whether the user will click on an ad? Thankfully, you have billions of data points to help you.
* Recommender systems: A standard SVD works well. But what happens when you have to choose the top products amongst hundreds of thousands for every user, 2 billion times per day, in less than 50ms?
* Auction theory: In a second-price auction, the theoretical optimum is to bid the expected value. But what happens when you run 15 billion auctions per day against the same competitors?
* Explore/exploit: It's easy: UCB and Thompson sampling have low regret. But what happens when new products come and go and when each ad displayed changes the reward of each arm?
* Game theory/Reinforcement learning: How do you find the optimal bidding strategy across multiple auctions? Can this be cast as a reinforcement learning problem in very high dimensions with unpredictable rewards?
* Offline testing/Metrics: You can always compute the classification error of a model predicting the probability of a click. But is this really related to the online performance of a new model? What is the right offline metric that predicts online performance?
* Optimization: Stochastic gradient descent is great when you have lots of data. But what do you do when not all data are equal and you must distribute the learning over several hundred nodes?
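To make the auction theory point concrete, here is a minimal sketch of why bidding your value is the textbook answer in a single sealed-bid second-price auction. Everything in it is an illustrative assumption: the function names, the value of 10.0, the competing bids and the tie-breaking rule are made up for the example, and it deliberately ignores the repeated-auction setting that makes the real problem hard.

```python
def second_price_utility(bid, value, competing_bids):
    """Utility of one bidder in a single sealed-bid second-price auction.

    The winner pays the highest competing bid, not their own bid.
    Assumption for this sketch: ties count as losses.
    """
    highest_other = max(competing_bids)
    if bid > highest_other:
        return value - highest_other
    return 0.0


def average_utility(bid, value, scenarios):
    """Average utility of a fixed bid over a list of competing-bid scenarios."""
    total = sum(second_price_utility(bid, value, s) for s in scenarios)
    return total / len(scenarios)


# Hypothetical numbers, chosen only for illustration.
value = 10.0
scenarios = [[4.0, 7.0], [9.0, 3.0], [11.0, 2.0], [12.5, 8.0], [6.0, 9.5]]
candidate_bids = [6.0, 8.0, 10.0, 12.0, 14.0]

results = {b: average_utility(b, value, scenarios) for b in candidate_bids}
best_bid = max(results, key=results.get)

for b in candidate_bids:
    print(f"bid={b:5.1f}  average utility={results[b]:.2f}")
print("best bid among candidates:", best_bid)
```

Underbidding forfeits auctions that would have been profitable, and overbidding wins auctions at a loss, so the truthful bid comes out on top in this toy setup. The interesting research questions start where these assumptions stop holding.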
Challenges of this role
* Gather and analyze data, identify key prediction/classification problems, devise solutions and build prototypes
* Research and investigate state-of-the-art data mining, machine learning and modeling techniques to apply to our specific business cases
* Initiate unique modeling projects, develop new and innovative algorithms and technologies and pursue patents where appropriate
* Stay current on published state-of-the-art algorithms and competing technologies
* Maintain academic credentials through publications, presentations and service to the research community
* Develop high-performance algorithms, test and implement the algorithms in scalable, product-ready code
Strong candidate qualifications
* PhD in Machine Learning or a related field.
* Strong hands-on skills in sourcing, cleaning, manipulating and analyzing large volumes of data.
* Strong implementation experience with languages such as Python, Perl, Ruby, Java, C#, Scala, etc.
* Familiarity with Linux/Unix/Shell environments.
* Knowledge of Hadoop programming environments (e.g. Pig, Hive).
* Good oral and written communication and presentation skills.
* Experience with end-to-end modeling projects emerging from research efforts.
* A track record of, and interest in, contributing to publications and service to the research community.
(COMPANY NAME) R&D Culture
* Empowerment - We believe in hiring the best engineers in the industry and then letting them get on with what they do best: designing, coding and releasing state-of-the-art software.
* Mobility - In our Voyager program our engineers get to pick which team they want to work on for 2-4 weeks, boosting collaboration, networking and maybe even leading to switching teams.
* Agility - We work in a fast-paced environment where we build and release stuff frequently to deliver value soon and adapt to changes quickly.
* Variety - We have many ways to get your code to production including our Hackathon, 10% projects, Voyager and more.
* Multicultural - We have engineers from all over the world for you to interact and exchange ideas with.
Our culture keeps evolving, and you will be expected to contribute actively with new ideas to complement and enhance the existing programs that include frictionless internal mobility, 10% time, mentoring, technical talks, hackathons, conferences, etc.
Are you up to the challenge?
About (COMPANY NAME)
(COMPANY NAME) (CRTO), the leader in commerce marketing, is building the highest-performing and most open commerce marketing ecosystem to drive profits and sales for retailers and brands. 2,700 (COMPANY NAME) team members partner with 16,000 customers and thousands of publishers across the globe to deliver performance at scale by connecting shoppers to the things they need and love. Designed for commerce, the (COMPANY NAME) Commerce Marketing Ecosystem sees over $550 billion in annual commerce sales data. For more information, please visit wxx.xxxxxx.xxx.
The 600+ engineers @(COMPANY NAME) are building the next-generation digital advertising technologies that allow us to manage billions of ad impressions every day. We work in a very fast-paced release cycle and add new capabilities weekly and even daily.
A few figures:
* 15 datacenters (9 with computing capacity + 6 dedicated to network connectivity) across US, EU, APAC
* More than 24K servers, running a mix of Linux and Windows
* One of the largest Hadoop clusters in Europe with close to 108PB of storage and 32,000 cores
* 150B HTTP requests and close to 4B unique banners displayed per day
* Close to 3M HTTP requests per second handled during peak times
* 130Gbps of bandwidth, half of it through peering exchanges
We recognize that engineering culture is key for building a world-class engineering organization. Our core values are getting stuff done, collaboration and respect, code quality, striving for excellence, and having fun at what we do.
Do you want to know more about life in R&D?
YouTube: R&D (COMPANY NAME) @ Europe
Our blog: http://wxx.xxxxxxxxxx.xxm