Profile Tab

The Profile Tab provides a data overview (patterns, values, occurrences) and auto-suggested transformations to quickly assess data quality and structure.

For each column, the Profile Tab shows value patterns, distinct values, and how often each occurs. Reviewing the quality, distribution, and structure of the data here, before you apply transformations, makes data preparation faster and more accurate.

Best Situations to Use

Use the Profile Tab when you want to:

  • Quickly analyze the structure and quality of a dataset.

  • Identify invalid, empty, or duplicate values in each column.

  • Examine patterns and distributions of column values.

  • Apply auto-suggested transforms for efficient data cleaning.

  • Visualize the value distribution of numeric, string, and date columns.

Not Recommended for:

  • Very large datasets where profiling may impact performance (use sampling first).

  • Final reporting (Profile Tab is primarily for preparation, not reporting).

Info: Values and Statistics

String Columns

When a column is of string type, the following statistics are displayed:

  • Count: Total number of rows

  • Valid: Count of valid values

  • Invalid: Count of invalid values

  • Empty: Count of empty cells

  • Duplicate: Number of duplicate entries

  • Distinct: Number of unique values
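To make the six statistics concrete, here is a minimal Python sketch that computes them for a plain list of cell values. It is an illustration only, not the product's implementation. The function name `profile_string_column`, the `is_valid` rule, and the exact definitions are assumptions: empty means `None` or an empty string, invalid means a non-empty value that fails the validity rule, and duplicate counts repeats beyond the first occurrence of each value.

```python
from collections import Counter


def profile_string_column(values, is_valid=None):
    """Compute profile-style counts for one column (hypothetical helper)."""
    if is_valid is None:
        # Assumed validity rule: any non-empty string is valid.
        is_valid = lambda v: isinstance(v, str)

    def is_empty(v):
        return v is None or v == ""

    non_empty = [v for v in values if not is_empty(v)]
    valid = [v for v in non_empty if is_valid(v)]
    occurrences = Counter(non_empty)

    return {
        "count": len(values),                      # total rows
        "valid": len(valid),
        "invalid": len(non_empty) - len(valid),
        "empty": len(values) - len(non_empty),
        # Assumed definition: repeats beyond the first occurrence of a value.
        "duplicate": sum(n - 1 for n in occurrences.values()),
        "distinct": len(occurrences),
    }


stats = profile_string_column(["a", "b", "a", "", None, 5])
print(stats)
```

The tool may define Valid, Invalid, and Duplicate differently, so treat these numbers as a way to reason about the panel rather than a guarantee of matching it.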

Numeric Columns

For numeric columns, in addition to the above, the following aggregations are displayed:

  • Minimum: Smallest value

  • Maximum: Largest value

  • Mean: Average value

  • Variance: Measure of data dispersion
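The four aggregations can be reproduced with Python's standard `statistics` module, as in the sketch below. The helper name `profile_numeric_column` is hypothetical, and the choice of population variance is an assumption: this page does not say whether the Profile Tab reports population or sample variance.

```python
import statistics


def profile_numeric_column(values):
    """Return min, max, mean, and variance for the numeric cells (hypothetical helper)."""
    # Keep only real numbers; booleans and non-numeric cells are skipped.
    numbers = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if not numbers:
        return None

    return {
        "minimum": min(numbers),
        "maximum": max(numbers),
        "mean": statistics.fmean(numbers),
        # Assumption: population variance. Swap in statistics.variance
        # to get the sample variance instead.
        "variance": statistics.pvariance(numbers),
    }


print(profile_numeric_column([2, 4, 4, 4, 5, 5, 7, 9]))
```

If the variance shown in the tool differs slightly from your own calculation, the population versus sample distinction is the first thing to check.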

Pattern

The Pattern section displays the occurrence of unique patterns in the column values, represented in a chart.

  • Note: A pattern represents the structure of a value (for example, its arrangement of digits and letters), not the value itself.
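One common way to derive such patterns is to replace every character with a symbol for its class and then count how often each resulting pattern occurs. The Python sketch below illustrates the idea. The notation (`9` for a digit, `A` and `a` for uppercase and lowercase letters) and the function names are assumptions, not the Profile Tab's actual pattern syntax.

```python
from collections import Counter


def value_pattern(value):
    """Map a value to a structural pattern (assumed notation)."""
    symbols = []
    for ch in str(value):
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        else:
            symbols.append(ch)  # punctuation and spaces are kept as-is
    return "".join(symbols)


def pattern_counts(values):
    """Count the occurrences of each pattern in a column."""
    return Counter(value_pattern(v) for v in values)


counts = pattern_counts(["AB-123", "CD-456", "ef-78"])
print(counts)
```

Here "AB-123" and "CD-456" are different values but share the pattern `AA-999`, which is why the chart shows patterns rather than actual values.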

Suggestions

The Suggestions section provides auto-generated recommended transformations for the selected column, helping users clean and standardize the dataset efficiently.

Accessing Suggestions

  1. Select a column from the dataset.

  2. Open the Profile Tab.

  3. Scroll down to the Suggestions section.

  4. Review the auto-generated suggestions displayed for the selected column.

Applying Suggestions

  1. Select the desired transform(s) using the provided checkboxes.

  2. Click Apply.

  3. The selected transform(s) are applied, and a new column is added with the transformed data.
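The effect of applying a suggestion can be pictured as follows: the source column is left untouched and the transformed values are written to a new column. This Python sketch uses a hypothetical table (a dict of column lists), a hypothetical helper `apply_transform`, and a whitespace-trimming transform chosen purely for illustration. The suggestions actually offered depend on the selected column.

```python
def apply_transform(table, column, transform, new_column):
    """Return a copy of the table with a new, transformed column (hypothetical helper)."""
    if new_column in table:
        raise ValueError(f"column {new_column!r} already exists")
    result = {name: list(cells) for name, cells in table.items()}
    # The original column is preserved; results go to the new column.
    result[new_column] = [transform(cell) for cell in table[column]]
    return result


table = {"city": ["  Paris", "Oslo  ", " Rome "]}
cleaned = apply_transform(table, "city", str.strip, "city_trimmed")
print(cleaned)
```

Because the original column is preserved, the before and after values can be compared side by side.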

Chart

The Profile Tab includes built-in charts for visualizing column data:

  • Column Chart: Used for numeric and date columns, displaying the distribution of values.

  • Bar Chart: Used for string columns, displaying occurrences of each category.

Sorting and Searching the Bar Chart

  • Charts can be sorted by group or count of occurrences.

  • Sorting can be done in ascending or descending order to highlight patterns.

  • Use the search bar to narrow the chart to specific values or patterns.

  • Example: Entering "M" in the search bar filters the chart to show only the categories containing "M".
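The sorting and search behavior described above can be sketched in Python as a filter followed by a sort over (group, count) pairs. The function `chart_rows` and its parameters are hypothetical, and the case-sensitive substring match is an assumption. The tool's search may well be case-insensitive.

```python
from collections import Counter


def chart_rows(values, search="", sort_by="count", descending=True):
    """Build bar-chart rows as (group, count) pairs (hypothetical helper)."""
    counts = Counter(values)
    # Assumption: the search is a case-sensitive substring match.
    rows = [(group, n) for group, n in counts.items() if search in str(group)]
    if sort_by == "group":
        key = lambda row: str(row[0])
    else:
        key = lambda row: row[1]
    return sorted(rows, key=key, reverse=descending)


values = ["Male", "Female", "Male", "Other", "Male", "Female"]
print(chart_rows(values))                                    # by count, descending
print(chart_rows(values, sort_by="group", descending=False)) # by group, ascending
print(chart_rows(values, search="M"))                        # only groups containing "M"
```

With a case-sensitive match, searching for "M" keeps "Male" but not "Female", whose "m" is lowercase.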

Notes:

  • The Profile Tab is ideal for initial data exploration and quality assessment.

  • Use the Suggestions feature to streamline repetitive cleaning tasks.

  • For large datasets, consider sampling to improve performance when generating patterns and charts.

  • Chart visualizations provide a quick overview of data distribution and help identify anomalies or outliers.
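For the sampling recommendation in these notes, a simple approach is to draw a fixed-size random sample of rows before profiling. The Python sketch below shows one way to do this outside the tool. The helper `sample_rows`, the default limit of 10,000 rows, and the fixed seed are assumptions for illustration.

```python
import random


def sample_rows(rows, max_rows=10_000, seed=0):
    """Return at most max_rows rows, chosen uniformly at random (hypothetical helper)."""
    if len(rows) <= max_rows:
        return list(rows)  # small datasets are profiled in full
    rng = random.Random(seed)  # fixed seed makes the sample repeatable
    return rng.sample(rows, max_rows)


rows = list(range(1_000_000))
subset = sample_rows(rows, max_rows=1_000)
print(len(subset))
```

A uniform sample preserves overall distributions reasonably well, but rare values and rare patterns can be missed, so confirm any anomalies against the full dataset.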
