Using OpenLineage integration
Usage
No changes to user DAG files are required to use OpenLineage. However, the integration needs to be configured. The primary and recommended way to configure the OpenLineage Airflow Provider is through Airflow configuration.
At minimum, the one thing that must be set up in every case is the Transport, which determines where your events end up (for example, Marquez). The transport field in the configuration is used for that purpose.
[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
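Because the transport value is a JSON document embedded in a config line, a stray quote or comma is easy to introduce by hand. The following is a minimal sketch, not part of the provider, that generates the value with Python's json module; build_http_transport is a hypothetical helper name. Assuming Airflow's general AIRFLOW__{SECTION}__{KEY} environment variable convention applies to this section, the same string could also be supplied through AIRFLOW__OPENLINEAGE__TRANSPORT.

```python
import json


def build_http_transport(url: str) -> str:
    """Return a JSON string describing an http transport for the given URL.

    Hypothetical helper for illustration; it only serializes the two keys
    shown in the configuration example above.
    """
    return json.dumps({"type": "http", "url": url})


# The URL is the placeholder used throughout this section.
transport_value = build_http_transport("http://example.com:5000")

# Round-trip the string to confirm it is valid JSON before pasting it
# into the [openlineage] section.
assert json.loads(transport_value)["type"] == "http"

print(transport_value)
```

Generating the string this way keeps the quoting consistent when the URL comes from another variable or a deployment template.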
If you want to look at OpenLineage events without sending them anywhere, you can set up ConsoleTransport. The events will then end up in the task logs.
[openlineage]
transport = '{"type": "console"}'
You can also configure the OpenLineage transport using an openlineage.yml file. That configuration method is described in detail in the OpenLineage Python docs. To use it, set the path to the file in the Airflow config, or point the OPENLINEAGE_CONFIG environment variable to it:
[openlineage]
config_path = '/path/to/openlineage.yml'
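As a rough illustration of the file-based method, the sketch below writes a small openlineage.yml and points OPENLINEAGE_CONFIG at it. The YAML content is an assumption: it mirrors the http transport used elsewhere in this section, and the OpenLineage Python docs remain the authority on the supported keys. The helper name write_openlineage_config and the temporary directory are hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Assumed minimal file content: an http transport equivalent to the
# JSON example used in this section.
OPENLINEAGE_YML = """\
transport:
  type: http
  url: http://example.com:5000
"""


def write_openlineage_config(directory: str) -> Path:
    """Write openlineage.yml into the given directory and return its path."""
    config_file = Path(directory) / "openlineage.yml"
    config_file.write_text(OPENLINEAGE_YML, encoding="utf-8")
    return config_file


# A temporary directory stands in for wherever the file would really live.
config_dir = tempfile.mkdtemp()
config_file = write_openlineage_config(config_dir)

# This only affects the current process and its children. In a real
# deployment the variable must be set in the environment of the Airflow
# processes themselves.
os.environ["OPENLINEAGE_CONFIG"] = str(config_file)

print(config_file)
```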
Lastly, you can set up the http transport using the OPENLINEAGE_URL environment variable, setting it to the URL of the OpenLineage consumer.
It is also very useful to set an OpenLineage namespace for this particular Airflow instance. If it is not set, the default namespace is used. With a dedicated namespace, events coming from multiple OpenLineage producers are logically separated.
[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
namespace = 'my-team-airflow-instance'
Additional Options
You can disable sending OpenLineage events without uninstalling the OpenLineage provider by setting disabled to true, or by setting the OPENLINEAGE_DISABLED environment variable to True.
[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
disabled = true
Several operators, for example Python and Bash, include their source code in their OpenLineage events by default. To prevent that, set disable_source_code to true.
[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
disable_source_code = true
If you used OpenLineage previously and rely on the Custom Extractors feature, you can also use your extractors with the OpenLineage provider. Register them using the extractors config option.
[openlineage]
transport = '{"type": "http", "url": "http://example.com:5000"}'
extractors = full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
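The example above separates the full class paths with semicolons. When the list of extractors is maintained elsewhere, for instance in a deployment script, the option value can be assembled rather than typed by hand. This is a small sketch under that assumption; build_extractors_option is a hypothetical helper, and the class paths are the placeholders from the example.

```python
def build_extractors_option(class_paths):
    """Join full class paths into a semicolon-separated option value.

    Surrounding whitespace is stripped and empty entries are skipped so
    that the result contains no stray separators.
    """
    cleaned = [path.strip() for path in class_paths if path.strip()]
    return ";".join(cleaned)


# Placeholder class paths taken from the configuration example above.
extractors_value = build_extractors_option(
    [
        "full.path.to.ExtractorClass",
        "full.path.to.AnotherExtractorClass",
    ]
)

print(extractors_value)
```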
Other
If you want to add OpenLineage coverage for a particular operator, take a look at Implementing OpenLineage in Operators.
For more explanation, visit the OpenLineage docs.