Yandex.Cloud Data Proc Operators
Yandex.Cloud Data Proc is a service that helps you deploy Apache Hadoop® and Apache Spark™ clusters in the Yandex.Cloud infrastructure.
You can control the cluster size, the node capacity, and the set of Apache® services (Spark, HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, Tez, Zeppelin).
Apache Hadoop is used for storing and analyzing structured and unstructured big data.
Apache Spark is a tool for fast data processing that can be integrated with Apache Hadoop as well as with other storage systems.
Using the operators
See the usage examples in the example DAGs.