Yandex.Cloud Data Proc Operators

Yandex.Cloud Data Proc is a service that helps you deploy Apache Hadoop® and Apache Spark™ clusters in the Yandex.Cloud infrastructure.

You can control the cluster size, node capacity, and set of Apache® services (Spark, HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, Tez, Zeppelin).
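As a rough illustration of those three controls, the sketch below collects them into a keyword-argument dictionary. The helper `cluster_kwargs` is hypothetical. The key names (`datanode_count`, `datanode_resource_preset`, `masternode_resource_preset`, `services`) are assumed to match parameters of `DataprocCreateClusterOperator`, and the preset `s2.small` is only an example value, so check both against the provider version you have installed.

```python
from typing import Any, Dict, Iterable


def cluster_kwargs(
    datanode_count: int,
    resource_preset: str,
    services: Iterable[str],
) -> Dict[str, Any]:
    """Hypothetical helper: gather cluster size, node capacity, and services."""
    if datanode_count < 0:
        raise ValueError("datanode_count must not be negative")
    return {
        # Cluster size: how many data nodes to start.
        "datanode_count": datanode_count,
        # Node capacity: the same preset is reused for master and data nodes here.
        "masternode_resource_preset": resource_preset,
        "datanode_resource_preset": resource_preset,
        # Set of services, normalised to upper case.
        "services": tuple(name.upper() for name in services),
    }


params = cluster_kwargs(2, "s2.small", ["hdfs", "yarn", "spark"])
print(params)
```

If the assumed names hold, the dictionary could be unpacked into the operator call with `**params`.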

Apache Hadoop is used for storing and analyzing structured and unstructured big data.

Apache Spark is a tool for fast data processing that can be integrated with Apache Hadoop as well as with other storage systems.

Using the operators

See the usage examples in the example DAGs.
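For orientation, a minimal sketch of such a DAG might look like the following: create a cluster, run one PySpark job, then delete the cluster. This is not the bundled example. The DAG id, bucket name, and script URI are hypothetical placeholders, the module path `airflow.providers.yandex.operators.yandexcloud_dataproc` is an assumption that may differ between provider releases, and the `schedule` argument assumes Airflow 2.4 or later. A real cluster will usually need further settings, such as a service account and SSH keys.

```python
from datetime import datetime

# Hypothetical placeholders; replace with values from your own cloud folder.
DAG_ID = "example_dataproc_pyspark"
S3_BUCKET = "my-dataproc-bucket"
MAIN_SCRIPT_URI = f"s3a://{S3_BUCKET}/jobs/main.py"


def build_dataproc_dag():
    """Create a cluster, run one PySpark job on it, then delete the cluster."""
    # Imports are deferred so this file can be loaded without Airflow installed.
    # Assumed module path; check the provider release you use.
    from airflow import DAG
    from airflow.providers.yandex.operators.yandexcloud_dataproc import (
        DataprocCreateClusterOperator,
        DataprocCreatePysparkJobOperator,
        DataprocDeleteClusterOperator,
    )
    from airflow.utils.trigger_rule import TriggerRule

    with DAG(
        dag_id=DAG_ID,
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        create_cluster = DataprocCreateClusterOperator(
            task_id="create_cluster",
            zone="ru-central1-b",
            s3_bucket=S3_BUCKET,
        )
        run_job = DataprocCreatePysparkJobOperator(
            task_id="run_pyspark_job",
            main_python_file_uri=MAIN_SCRIPT_URI,
        )
        delete_cluster = DataprocDeleteClusterOperator(
            task_id="delete_cluster",
            # Run the cleanup even if the job task fails.
            trigger_rule=TriggerRule.ALL_DONE,
        )
        create_cluster >> run_job >> delete_cluster
    return dag
```

In an actual DAG file you would place the imports at the top and assign `dag = build_dataproc_dag()` at module level so that the scheduler can discover it. The job and delete tasks are assumed here to pick up the cluster id from the create task, so verify that behaviour for your provider version.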