Databases Meets Machine Learning: Scalable Machine Learning with Graphs and Search

Big Data and Machine Learning

The program committee has not yet made a decision on this talk.

Abstract

Many powerful Machine Learning algorithms are based on existing graph and text retrieval algorithms, e.g., PageRank (Pregel), Recommendation Engines (collaborative filtering), text summarization, and other NLP tasks.
There are even more applications once we consider data pre-processing and feature engineering, which are both vital tasks in Machine Learning Pipelines.

But how can we combine Databases with Machine Learning Systems such as TensorFlow or PyTorch? How do Databases fit into more complex Machine Learning Pipelines such as TensorFlow Extended (TFX)? How can we scale Graph-based Machine Learning to the data sizes typically involved in Machine Learning?

Using real-world examples, we show how Multi-Model Databases (supporting Graph and Search natively) and Machine Learning Systems form a very powerful combination. In particular, we will focus on graph-based Machine Learning models and on graph-based data pre-processing and feature engineering (which can, in turn, serve as input for a deep neural network).
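As a rough illustration of graph-based feature engineering, the sketch below turns a small directed edge list into per-node feature vectors (in-degree, out-degree, PageRank score) of the kind that could be fed to a downstream model. It is a minimal, dependency-free sketch and not the speakers' implementation: the function names `pagerank` and `node_features`, the toy edge list, and the damping factor of 0.85 are all assumptions made for this example, and in practice the graph would be queried from a database rather than hard-coded.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list (illustrative only)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def node_features(edges):
    """Return {node: [in_degree, out_degree, pagerank]} for every node."""
    rank = pagerank(edges)
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for src, dst in edges:
        out_degree[src] += 1
        in_degree[dst] += 1
    return {node: [in_degree[node], out_degree[node], rank[node]] for node in rank}


# Hypothetical toy graph; a real pipeline would load edges from a graph store.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
features = node_features(edges)
```

Each value in `features` is a fixed-length numeric vector, so the rows can be stacked into a matrix and combined with other attributes before training.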

In this talk you will learn:
* How graphs and text retrieval can help us to model complex Machine Learning tasks in practice.
* How to leverage Databases for graph-based Machine Learning Models.
* How to leverage graphs and search for data pre-processing and feature engineering.
* How Databases integrate into existing Machine Learning Pipelines such as TensorFlow Extended.

Andrey Abramov

ArangoDB

For the past 9 years he has been developing search and recommendation systems. He specializes in data analysis techniques based on various similarity measures, and in classification and clustering tasks.

Jörg Schad

ArangoDB

Head of Engineering and Machine Learning at ArangoDB. In a previous life, he worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.