Version: v1.0.0

All metrics

List of all available metrics and charts.

CSV summary

  • Number of variables
  • Number of observations
  • Number of missing values
  • Percentage of missing values
  • Number of duplicated rows
  • Percentage of duplicated rows
  • Number of numerical variables
  • Number of categorical variables
  • Number of datetime variables

A summary listing the name and type (float, int, string, datetime) of every variable.
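As a rough illustration of how these figures can be derived, the sketch below computes them for a small in-memory CSV using only the Python standard library. The names `csv_summary` and `infer_type` are hypothetical, and the type-inference rules, the treatment of empty cells as missing, and the choice to count a duplicated row as any repeat after its first occurrence are all assumptions, not a description of how the platform implements them.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess a column type from its non-missing string values (assumed rules)."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"
    parsers = (("int", int), ("float", float), ("datetime", datetime.fromisoformat))
    for name, parse in parsers:
        try:
            for v in present:
                parse(v)
            return name
        except ValueError:
            continue
    return "string"


def csv_summary(text):
    """Dataset-level summary; assumes a header row and at least one data row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    n_obs, n_vars = len(data), len(header)
    # Assumption: an empty cell is a missing value.
    missing = sum(1 for row in data for cell in row if cell == "")
    # Assumption: every repeat after the first occurrence counts as duplicated.
    duplicated = n_obs - len({tuple(row) for row in data})
    types = {name: infer_type([row[i] for row in data]) for i, name in enumerate(header)}
    kinds = list(types.values())
    return {
        "n_variables": n_vars,
        "n_observations": n_obs,
        "missing": missing,
        "missing_pct": 100 * missing / (n_obs * n_vars),
        "duplicated_rows": duplicated,
        "duplicated_rows_pct": 100 * duplicated / n_obs,
        "numerical": sum(k in ("int", "float") for k in kinds),
        "categorical": kinds.count("string"),
        "datetime": kinds.count("datetime"),
        "types": types,
    }


sample = """age,city,signup
31,Rome,2024-01-05
,Milan,2024-02-11
31,Rome,2024-01-05
45.5,,2024-03-02
"""
summary = csv_summary(sample)
print(summary)
```

In this sample, `age` is inferred as float because one value has a decimal part, `city` falls back to string (treated here as categorical), and `signup` parses as an ISO datetime.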

Data quality

  • Numerical variables
    • Average
    • Standard deviation
    • Minimum
    • Maximum
    • Percentile 25%
    • Median
    • Percentile 75%
    • Number of missing values
    • Histogram with 10 bins
  • Categorical variables
    • Number of missing values
    • Percentage of missing values
    • Number of distinct values
    • For each distinct value:
      • count of observations
      • percentage of observations
  • Ground truth
    • if categorical, i.e. for a classification model: bar plot (shown for both reference and current data, for easy comparison)
    • if numerical, i.e. for a regression model: histogram with 10 bins (shown for both reference and current data, for easy comparison)
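The per-variable statistics listed above can be sketched with the standard library as follows. The function names `numerical_profile` and `categorical_profile` are hypothetical, and several details are assumptions rather than documented behaviour: `None` marks a missing value, the standard deviation is the sample standard deviation, percentiles use linear interpolation, histogram bins are equal-width between the minimum and maximum, and per-value percentages are taken over non-missing observations.

```python
import statistics
from collections import Counter


def numerical_profile(values, bins=10):
    """Summary statistics and an equal-width histogram; needs >= 2 present values."""
    present = sorted(v for v in values if v is not None)
    # Quartiles with linear interpolation over the observed range (assumed method).
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    lo, hi = present[0], present[-1]
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in present:
        # The maximum sits on the last edge, so clamp it into the final bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return {
        "mean": statistics.fmean(present),
        "std": statistics.stdev(present),  # sample standard deviation (assumption)
        "min": lo,
        "max": hi,
        "p25": q1,
        "median": median,
        "p75": q3,
        "missing": len(values) - len(present),
        "histogram": counts,
    }


def categorical_profile(values):
    """Missing-value and per-category counts; assumes a non-empty column."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n_missing = len(values) - len(present)
    return {
        "missing": n_missing,
        "missing_pct": 100 * n_missing / len(values),
        "distinct": len(counts),
        # Assumption: percentages are relative to non-missing observations.
        "values": {
            k: {"count": c, "pct": 100 * c / len(present)} for k, c in counts.items()
        },
    }


num = numerical_profile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None])
cat = categorical_profile(["a", "b", "a", None])
print(num)
print(cat)
```

Running the same two functions on the reference and the current dataset gives the side-by-side figures needed for comparison.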

Model quality

  • Classification model
    • Number of classes
    • Accuracy (for both reference and current data, for easy comparison)
    • Line chart of accuracy over time
    • Confusion matrix
    • Log loss (currently for binary classification only)
    • Line chart of log loss over time (currently for binary classification only)
    • For each class (each metric reported for both reference and current data, for easy comparison):
      • Precision
      • Recall
      • F1 score
      • True Positive Rate
      • False Positive Rate
      • Support
  • Regression model
    • For both reference and current data, for easy comparison:
      • Mean squared error
      • Root mean squared error
      • Mean absolute error
      • Mean absolute percentage error
      • R-squared
      • Adjusted R-squared
      • Variance
    • Line charts for all of the above over time
    • Residual analysis:
      • Correlation between predictions and ground truth
      • Residuals plot, i.e. a scatter plot of standardised residuals against predictions
      • Scatter plot of predictions vs ground truth, with a linear regression line
      • Histogram of the residuals
      • Kolmogorov-Smirnov test of normality for residuals
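For readers who want to see how the listed model-quality metrics relate to each other, here is a minimal sketch in plain Python. The functions `per_class_metrics` and `regression_metrics` are hypothetical names, not part of the platform. The per-class figures use a one-vs-rest reading of the predictions, MAPE is expressed as a percentage, and the adjusted R-squared takes the number of model features as an explicit `n_features` argument; these are assumptions. The "Variance" metric is left out because the list does not say which variance is meant.

```python
import math


def per_class_metrics(y_true, y_pred):
    """One-vs-rest precision, recall (TPR), F1, FPR and support for each class."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        tn = len(pairs) - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[cls] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "tpr": recall,  # the true positive rate is the same quantity as recall
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
            "support": tp + fn,  # number of true observations of this class
        }
    return metrics


def regression_metrics(y_true, y_pred, n_features):
    """Error metrics; assumes no zero targets (MAPE) and n > n_features + 1."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }


clf = per_class_metrics(
    ["cat", "dog", "cat", "cat", "dog", "bird"],
    ["cat", "cat", "cat", "dog", "dog", "bird"],
)
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 8.0, 9.5], n_features=1)
print(clf)
print(reg)
```

Computing both dictionaries once on the reference data and once on the current data yields the paired values described in the list.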

Data Drift

Data drift is computed for every feature, using a different algorithm depending on the data type (float, int, or categorical). The following algorithms are currently used; others will be added in the future: