Job description:
Our company's mission is to map the internet's attention flows and create transparency about how society assigns credibility to information, people and institutions.
What we're working on
We are building an influence algorithm. In other words, we are trying to find ways to describe groups of people mathematically. Many have tried and failed before, but we think we can make it work.
Our core hypothesis is that influence can be quantified by tracking attention flows. In order to do that, we ingest data streams from multiple sources (we started with Twitter and are now indexing podcasts and soon more). We then cross-reference these datasets in an attempt to continuously improve the accuracy of our algorithm.
The accuracy of our work is verified by members of the groups that we aim to describe. We publish our results in real time, and thousands of people already use our scores. It is hard to verify when we are right, but it is very easy to tell when we are wrong. This short feedback loop puts us in a unique position to work on problems that might be much harder or impossible to solve somewhere else.
We are a small, VC-funded, remote-first startup. Most of the team is based in Europe (Berlin, London, Barcelona), with our main office in Berlin. The whole team meets in person and works together in Berlin for several days at least once every three months. Other than that, the company 'lives' in Slack, Notion and other tools that enable effective communication.
About this role
Your main responsibility will be the design, implementation and continuous development of our data architecture.
We work with heterogeneous data streams (Twitter API, RSS feeds, …) and therefore rely on non-relational databases as the central technology for data warehousing and processing. Deep familiarity with non-relational database systems (e.g. ArangoDB) in clustered architectures is what we look for most in a candidate.
As owner of our data architecture, you will develop a deep understanding of the problems we are trying to solve with data, as well as of our company's strategic direction, and make design and implementation decisions accordingly.
This role is crucial for our company and we will ensure that the successful applicant enjoys the full support of our experienced team of developers and algorithm architects.
Additionally, you will work closely with our developers and algorithm architects on peripheral tasks upstream and downstream of our database systems, such as:
* Designing, launching and maintaining crawlers to tap into new data streams from various APIs
* Specifying data requirements and pre-processing routines as well as generating features from raw data
* Relating and matching entities from different data streams
* Measuring and ensuring data quality
* Developing solutions for automatic labeling of data based on machine learning
Requirements
* Proficiency in *nix and Python
* Extensive experience with relational and non-relational databases (ideally ArangoDB) in clustered architectures
* Experience with the AWS ecosystem
* Experience using third-party APIs for web scraping
* Good communication & writing skills
Great to have
* Experience working with APIs & RSS feeds
* Interest in and familiarity with the latest developments in Deep Learning and general AI
Don't apply if
* You get defensive about your ideas
* You need somebody else to organize your work
* You'd rather not talk to people
Do apply if
* You use precise language and you insist that others do too
* You are happy to drop an idea if circumstances have changed and it's no longer the best solution
* You look for systemic flaws and you are proactive in preventing them
The job is full-time and permanent. Our Berlin office is located in Mitte in a modern, industrial-style coworking space. You can set your own hours, but everybody is expected to be online during office hours (CET).
How to Apply
Please use the apply button. In the confirmation email you will get a link to a Typeform with some task-related questions. We think that your answers tell us far more about you than a CV or cover letter ever could.
If you take your time to answer the questions, we will get back to you within a couple of days. If you don't take your time to answer the questions, we will not move forward with you.