All metrics
List of all available Metrics and Charts.
CSV summary​
- Number of variables
- Number of observations
- Number of missing values
- Percentage of missing values
- Number of duplicated rows
- Percentage of duplicated rows
- Number of numerical variables
- Number of categorical variables
- Number of datetime variables
Summary with all variable name and type (float, int, string, datetime).
Data quality​
- Numerical variables
- Average
- Standard deviation
- Minimum
- Maximum
- Percentile 25%
- Median
- Percentile 75%
- Number of missing values
- Histogram with 10 bins
- Categorical variables
- Number of missing values
- Percentage of missing values
- Number of distinct values
- For each distinct value:
- count of observations
- percentage of observations
- Ground truth
- if categorical i.e. for a classification model: bar plot (for both reference and current for an easy comparison)
- if numerical, i.e. for a regression model: histogram with 10 bins (for both reference and current for an easy comparison)
Model quality​
- Classification model
- Number of classes
- Accuracy (for both reference and current for an easy comparison)
- Line chart of accuracy over time
- Confusion matrix
- Log loss, only for binary classification at the moment
- Line chart of log loss over time, only for binary classification at the moment
- For each class:
- Precision (for both reference and current for an easy comparison)
- Recall (for both reference and current for an easy comparison)
- F1 score (for both reference and current for an easy comparison)
- True Positive Rate (for both reference and current for an easy comparison)
- False Positive Rate (for both reference and current for an easy comparison)
- Support (for both reference and current for an easy comparison)
- Regression model
- Mean squared error (for both reference and current for an easy comparison)
- Root mean squared error (for both reference and current for an easy comparison)
- Mean absolute error (for both reference and current for an easy comparison)
- Mean absolute percentage error (for both reference and current for an easy comparison)
- R-squared (for both reference and current for an easy comparison)
- Adjusted R-squared (for both reference and current for an easy comparison)
- Variance (for both reference and current for an easy comparison)
- Line charts for all of the above over time
- Residual analysis:
- Correlation prediction/ground_truth
- Residuals plot, i.e, scatter plot for standardised residuals and predictions
- Scatter plot for predictions vs ground truth and linear regression line
- Histogram of the residuals
- Kolmogorov-Smirnov test of normality for residuals
Data Drift​
Data drift for all features using different algorithms depending on the data type: float, int, categorical. We use the following algorithms (but others will be added in the future):