Execution Logs and CI/CD
aptdata is designed with automation and integration in mind.
- Structured logs: emits NDJSON (newline-delimited JSON) when run with --json.
- Observability: the output is straightforward to capture, parse, and ingest directly into data lakes and monitoring tools (e.g. Datadog, ELK).
- Automation: well suited to CI/CD pipelines.
Integration with GitHub Actions
The example below shows an automated workflow that runs on every push to main and on a daily cron schedule; the JSON log is archived as a build artifact at the end of the run.
.github/workflows/data-pipeline.yml
name: Run Data Pipeline
on:
  push:
    branches: [ "main" ]
  schedule:
    # Daily run at midnight (UTC)
    - cron: '0 0 * * *'
jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install aptdata
      - name: Run aptdata system
        run: |
          # Runs the "my_system" system and collects its logs as NDJSON
          aptdata run my_system --json > pipeline_run.json
      - name: Archive execution logs
        # Upload the log even when the run step fails
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: aptdata-execution-logs
          path: pipeline_run.json
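A CI job may also want to confirm that the archived log records a finished run. The following is a minimal sketch, assuming the log contains a `system.completed` event with a `system_name` field, as in the sample output on this page. The helper name `run_completed` is hypothetical and is not part of aptdata.

```python
import json
from typing import Iterable


def run_completed(lines: Iterable[str], system_name: str) -> bool:
    """Return True if any line is a system.completed event for system_name."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        if (
            record.get("event") == "system.completed"
            and record.get("system_name") == system_name
        ):
            return True
    return False


# Two lines shaped like the sample log (assumed event schema)
sample = [
    '{"event": "system.started", "system_name": "my_system"}',
    '{"event": "system.completed", "system_name": "my_system", "elapsed_seconds": 8.77}',
]
print(run_completed(sample, "my_system"))  # True
```

In a workflow step, the same function could read `pipeline_run.json` line by line and exit with a non-zero status when it returns False.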
Output Formats
With the --json flag, output is emitted as plain NDJSON: one JSON object per line, which suits automated parsers and logging systems.
pipeline_run.json
{"event": "system.started", "system_name": "my_system", "env": "prod", "dry_run": false, "timestamp": "2023-10-27T10:00:00.000000Z"}
{"event": "flow.started", "flow_id": "extract_flow", "timestamp": "2023-10-27T10:00:00.050000Z"}
{"event": "component.started", "component_id": "postgres_extractor", "timestamp": "2023-10-27T10:00:00.100000Z"}
{"event": "component.completed", "component_id": "postgres_extractor", "elapsed_seconds": 2.45, "timestamp": "2023-10-27T10:00:02.550000Z"}
{"event": "component.started", "component_id": "data_validator", "timestamp": "2023-10-27T10:00:02.600000Z"}
{"event": "component.completed", "component_id": "data_validator", "elapsed_seconds": 0.82, "timestamp": "2023-10-27T10:00:03.420000Z"}
{"event": "flow.completed", "flow_id": "extract_flow", "elapsed_seconds": 3.42, "timestamp": "2023-10-27T10:00:03.470000Z"}
{"event": "flow.started", "flow_id": "transform_flow", "timestamp": "2023-10-27T10:00:03.500000Z"}
{"event": "component.started", "component_id": "pandas_transformer", "timestamp": "2023-10-27T10:00:03.550000Z"}
{"event": "component.completed", "component_id": "pandas_transformer", "elapsed_seconds": 5.12, "timestamp": "2023-10-27T10:00:08.670000Z"}
{"event": "flow.completed", "flow_id": "transform_flow", "elapsed_seconds": 5.22, "timestamp": "2023-10-27T10:00:08.720000Z"}
{"event": "system.completed", "system_name": "my_system", "env": "prod", "dry_run": false, "elapsed_seconds": 8.77, "timestamp": "2023-10-27T10:00:08.770000Z"}
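Because each line is an independent JSON object, the log can be processed with a plain line-by-line loop. The sketch below collects the `elapsed_seconds` of every `component.completed` event and reports the slowest component. It assumes the field names shown in the sample above; `component_durations` is a hypothetical helper, not an aptdata API.

```python
import json
from typing import Dict, Iterable


def component_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Map each component_id to the elapsed_seconds of its completed event."""
    durations: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("event") == "component.completed":
            durations[record["component_id"]] = record["elapsed_seconds"]
    return durations


# A few lines copied from the sample log above
sample = [
    '{"event": "component.started", "component_id": "postgres_extractor", "timestamp": "2023-10-27T10:00:00.100000Z"}',
    '{"event": "component.completed", "component_id": "postgres_extractor", "elapsed_seconds": 2.45, "timestamp": "2023-10-27T10:00:02.550000Z"}',
    '{"event": "component.completed", "component_id": "data_validator", "elapsed_seconds": 0.82, "timestamp": "2023-10-27T10:00:03.420000Z"}',
    '{"event": "component.completed", "component_id": "pandas_transformer", "elapsed_seconds": 5.12, "timestamp": "2023-10-27T10:00:08.670000Z"}',
]

durations = component_durations(sample)
slowest = max(durations, key=durations.get)
print(slowest, durations[slowest])  # pandas_transformer 5.12
```

To process a real run, pass an open file handle for `pipeline_run.json` instead of the in-memory list; iterating a file yields one line at a time, so the whole log never has to be loaded at once.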
By default (without --json), the terminal output uses Rich to provide progress bars, syntax highlighting, and visual status indicators, which makes debugging easier.
$ aptdata run my_system
[10:00:00] INFO System 'my_system' started in env 'prod'.
[10:00:00] INFO Flow 'extract_flow' started.
[10:00:00] INFO Component 'postgres_extractor' running...
[10:00:02] INFO Component 'postgres_extractor' completed in 2.45s.
[10:00:02] INFO Component 'data_validator' running...
[10:00:03] INFO Component 'data_validator' completed in 0.82s.
[10:00:03] INFO Flow 'extract_flow' completed in 3.42s.
[10:00:03] INFO Flow 'transform_flow' started.
[10:00:03] INFO Component 'pandas_transformer' running...
[10:00:08] INFO Component 'pandas_transformer' completed in 5.12s.
[10:00:08] INFO Flow 'transform_flow' completed in 5.22s.
[10:00:08] INFO System 'my_system' completed successfully in 8.77s.