AutoML

This documentation section covers the creation and management of AutoML experiments and the models they produce.

The AutoML page in the Data Science Lab lets users create experiments on top of their datasets and view a list of all created experiments. Automated Machine Learning (AutoML) automates model selection and hyperparameter tuning to reduce the time and effort required to build accurate models. The AutoML feature manages the end-to-end workflow—from preparing raw datasets to producing a deployable machine learning model.

Note: AutoML accelerates model development but does not replace evaluation. Always validate performance and business fit before promotion to production.

What is an AutoML experiment?

An AutoML experiment applies multiple machine learning algorithms (with tuned hyperparameters) to a selected dataset, evaluates them against a chosen metric, and ranks candidates to identify the best-performing model(s).
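
A minimal sketch of that idea, using scikit-learn as a stand-in for the managed search (the candidate algorithms, metric, and synthetic dataset below are illustrative assumptions, not the platform's actual search space):

    # Illustrative only: evaluate a few candidate algorithms against one metric and rank them.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=42),
        "gradient_boosting": GradientBoostingClassifier(random_state=42),
    }

    # Score each candidate with cross-validated AUC (the chosen objective metric), then rank.
    leaderboard = sorted(
        ((name, cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean())
         for name, est in candidates.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    for rank, (name, score) in enumerate(leaderboard, start=1):
        print(f"{rank}. {name}: mean AUC = {score:.3f}")

A real AutoML run also tunes each candidate's hyperparameters; the Typical workflow section below walks through that end to end.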

Access the AutoML page

  1. Open Data Science Lab.

  2. Navigate to AutoML.

  3. The Experiments list displays all previously created AutoML experiments with key details (name, dataset, status, metric, last run).

Key capabilities

  • Automated modeling: Tries multiple algorithms with systematic hyperparameter tuning.

  • Metric-driven ranking: Scores candidates and ranks results by the selected objective metric.

  • Reproducible runs: Stores configurations and artifacts for auditability and reruns (see the configuration sketch after this list).

  • One-click progression: Promote top candidates to Models for registration, explainer generation, and pipeline use.
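
For illustration, a stored run configuration of the kind the Reproducible runs capability describes might look like the following; every field name and value here is a hypothetical example, not the platform's actual schema:

    # Hypothetical run configuration persisted alongside model artifacts for auditability and reruns.
    import hashlib
    import json

    import numpy as np

    # Stand-in for the selected dataset; in practice this is the dataset chosen for the experiment.
    rng = np.random.default_rng(42)
    data = rng.normal(size=(1000, 20))

    run_config = {
        "experiment_name": "churn_automl_v1",                          # illustrative name
        "dataset_sha256": hashlib.sha256(data.tobytes()).hexdigest(),  # detects upstream data changes
        "target": "churned",
        "problem_type": "classification",
        "metric": "roc_auc",
        "max_trials": 50,
        "time_budget_minutes": 30,
        "random_seed": 42,
    }

    # Writing the configuration next to the trained artifacts makes a run auditable and rerunnable.
    with open("experiment_config.json", "w") as f:
        json.dump(run_config, f, indent=2)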

Typical workflow

  1. Select dataset – Choose the prepared table/file to train on.

  2. Define target & problem type – Classification/regression (AutoML may infer from data, but confirm settings).

  3. Choose metric – e.g., AUC, accuracy, F1, RMSE, MAE.

  4. Configure limits – Time budget, max trials, and a train/validation split or cross-validation (CV).

  5. Run experiment – AutoML searches algorithms/parameters and logs candidates (see the workflow sketch after this list).

  6. Review leaderboard – Compare models by the objective metric and secondary diagnostics.

  7. Promote best model – Save to Models; optionally Register to Data Pipeline or Generate Explainer.
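
The steps above are handled through the UI; the sketch below is only a rough scikit-learn analogue of the same workflow, with an assumed dataset, metric (AUC), and trial budget, and with joblib.dump standing in for the Promote step:

    # Rough scikit-learn analogue of the workflow above; all choices here are illustrative.
    import joblib
    from scipy.stats import loguniform, randint
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import RandomizedSearchCV, train_test_split

    # Steps 1-2: select a dataset and define the target (synthetic binary classification here).
    X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
    X_train, X_holdout, y_train, y_holdout = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    # Steps 3-4: choose the metric (AUC) and configure limits (a small trial budget per candidate).
    search_spaces = {
        "logistic_regression": (
            LogisticRegression(max_iter=2000),
            {"C": loguniform(1e-3, 1e2)},
        ),
        "random_forest": (
            RandomForestClassifier(random_state=0),
            {"n_estimators": randint(50, 200), "max_depth": randint(3, 12)},
        ),
    }

    # Steps 5-6: run the search for each candidate and build a leaderboard ranked by validation AUC.
    leaderboard = []
    for name, (estimator, params) in search_spaces.items():
        search = RandomizedSearchCV(
            estimator, params, n_iter=8, scoring="roc_auc", cv=5, random_state=0
        )
        search.fit(X_train, y_train)
        leaderboard.append((search.best_score_, name, search.best_estimator_))

    leaderboard.sort(reverse=True, key=lambda row: row[0])
    for score, name, _ in leaderboard:
        print(f"{name}: cross-validated AUC = {score:.3f}")

    # Step 7: retest the top candidate on the unseen holdout, then persist it ("promote").
    best_score, best_name, best_model = leaderboard[0]
    holdout_auc = roc_auc_score(y_holdout, best_model.predict_proba(X_holdout)[:, 1])
    print(f"Best: {best_name}, CV AUC = {best_score:.3f}, holdout AUC = {holdout_auc:.3f}")
    joblib.dump(best_model, "best_model.joblib")  # stand-in for "Save to Models"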

Evaluate and promote

  • Inspect diagnostics: Confusion matrix, ROC/PR curves (classification), residual/error plots (regression); see the example after this list.

  • Fairness & drift checks: Validate on holdout or recent data; review feature importance and stability.

  • Promote: Save the chosen candidate to Models; proceed to Register (Data Pipeline) or Register as API if applicable.
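
As an example, the classification diagnostics listed above can be reproduced with scikit-learn; the model and synthetic data here are assumptions for illustration, not output from the platform:

    # Illustrative classification diagnostics: confusion matrix, ROC AUC, per-class precision/recall.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1500, n_features=20, weights=[0.8, 0.2], random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]

    print("Confusion matrix:\n", confusion_matrix(y_test, pred))
    print("ROC AUC:", round(roc_auc_score(y_test, proba), 3))
    print(classification_report(y_test, pred))  # precision, recall, F1 per class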

Best practices

  • Data readiness: Ensure clean target labels, consistent schema, and representative splits.

  • Metric alignment: Select metrics aligned with business costs (e.g., F1 for imbalance, RMSE for forecast error).

  • Avoid leakage: Exclude post-outcome features; use proper time-aware splits for temporal data (see the split example after this list).

  • Right-size search: Set sane time/trial budgets; prefer cross-validation for small datasets.

  • Human in the loop: Review explainability (feature importance, partial dependence) before deployment.
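
As a concrete illustration of the time-aware splits mentioned under Avoid leakage, scikit-learn's TimeSeriesSplit always validates on rows that come after the training window; the data below is a synthetic stand-in:

    # Time-aware cross-validation: each validation fold comes strictly after its training fold.
    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(20).reshape(-1, 1)  # stand-in for 20 chronologically ordered rows

    for fold, (train_idx, val_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(X), start=1):
        print(f"fold {fold}: train rows {train_idx.min()}-{train_idx.max()}, "
              f"validate rows {val_idx.min()}-{val_idx.max()}")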

Limitations & notes

  • AutoML may not capture complex domain constraints or custom loss functions without configuration (see the custom-metric sketch after this list).

  • Top-ranked models may overfit if validation is misconfigured—always retest on an unseen holdout set.

  • Resource/time budgets constrain search space; increase budgets for higher-quality candidates.
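
When the business objective is not covered by a built-in metric, one workaround outside (or alongside) AutoML is a custom scorer; the cost weights below are illustrative assumptions, not a recommended setting:

    # Illustrative custom metric: false negatives cost five times more than false positives.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, make_scorer
    from sklearn.model_selection import cross_val_score

    def negative_business_cost(y_true, y_pred):
        """Return the negated total misclassification cost, so that higher is better."""
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return -(5.0 * fn + 1.0 * fp)

    cost_scorer = make_scorer(negative_business_cost)

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring=cost_scorer)
    print("Mean negated cost per fold:", scores.mean())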

Reminder: AutoML speeds up exploration; final accountability for model quality and compliance remains with the data science team.