
Explainable Artificial Intelligence (XAI) refers to a set of methods and practices aimed at making the behavior and decisions of artificial intelligence systems understandable to humans. As machine learning and deep learning models have grown in complexity, the need to demystify their inner workings has become increasingly important - especially in domains where decisions impact lives, such as healthcare, law, and public administration. XAI seeks to provide transparency, interpretability, and trust by enabling users to comprehend why a model made a certain prediction, how it weighs input data, and to what extent it can be held accountable for its outcomes.

Traditionally, many machine learning models - especially those involving deep neural networks - function as "black boxes," offering high predictive accuracy but little insight into how they arrive at specific decisions. Explainability challenges this paradigm by emphasizing models whose mechanisms can be logically traced, mathematically analyzed, and, ideally, intuitively understood by both developers and domain experts.

In this context, MicroPython offers a uniquely advantageous environment for fostering explainability. As a lightweight implementation of Python designed for microcontrollers, MicroPython encourages minimalism, clarity, and a hands-on approach to programming. When complex artificial intelligence architectures are rebuilt in MicroPython from the ground up - without relying on abstracted libraries or opaque function calls - the result is a codebase that closely mirrors the mathematical logic behind machine learning and neural network operations. This stripped-down setting compels the developer to engage directly with core principles such as matrix multiplication, activation functions, and gradient descent, making the implementation not just visible, but pedagogically powerful.

By removing the comfort of high-level libraries, MicroPython invites learners and researchers to reengage with the fundamentals. Each neuron, each layer, and each weight update must be explicitly defined and managed, thereby providing a unique opportunity to demystify how models process data, adjust parameters, and converge toward solutions. This low-level perspective is not only educational but also instrumental in aligning artificial intelligence development with the ideals of transparency and accountability central to XAI. Combining the discipline of MicroPython with the goals of XAI thus creates a framework that is as instructive as it is principled, offering a rare and valuable bridge between theoretical understanding and practical implementation. Therefore, this comprehensive codebook covers a well-chosen set of common statistical basics, as well as machine learning and deep learning algorithms.

1. Statistical Basics - I

We will start with some statistical basics: Mean, variance and standard deviation. As part of univariate statistics, they not only serve to describe individual variables, but are also important foundations for advanced statistical analyses.

1.1. Dataset

One variable of the trees dataset, provided by Atkinson, A. C. (1985): Plots, Transformations and Regression via Oxford University Press:

Code 1

1.2. Mean

The mean, also known as the arithmetic mean, is one of the most common measures of central tendency in statistics. It represents the average value of a dataset and provides a single value that summarizes the entire data distribution. To calculate the mean, you sum all values in your dataset and divide this total by the number of values:

Code 2
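A minimal MicroPython-compatible sketch of such a mean function (the name `mean` is an illustrative choice, not necessarily the one used in Code 2):

```python
def mean(values):
    # Arithmetic mean: sum of all values divided by their count.
    return sum(values) / len(values)
```

For example, `mean([8, 9, 13])` returns `10.0`.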

1.3. Variance

The sample variance is a measure of how spread out the values in a dataset are. It quantifies the average squared deviation from the mean, giving insight into the variability within the sample. Unlike population variance, it divides by n−1 to account for the degrees of freedom, making it an unbiased estimator when working with a sample:

Code 3
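A sketch of a sample variance function in the same minimalist style (the name `variance` is illustrative):

```python
def variance(values):
    # Sample variance: sum of squared deviations from the mean,
    # divided by n - 1 (degrees of freedom) for an unbiased estimate.
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)
```

For example, `variance([2, 4, 6])` returns `4.0`.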

1.4. Standard Deviation

The standard deviation is the square root of the variance and provides a measure of spread in the same units as the original data. It indicates how much the values in a dataset typically deviate from the mean, making it easier to interpret than variance in practical terms:

Code 4
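A corresponding sketch for the standard deviation, computed directly as the square root of the sample variance (function name again illustrative):

```python
def standard_deviation(values):
    # Square root of the sample variance, in the same units as the data.
    m = sum(values) / len(values)
    var = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return var ** 0.5
```

For example, `standard_deviation([2, 4, 6])` returns `2.0`.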

1.5. Application

These are application examples for mean, sample variance and standard deviation in MicroPython:

Code 5

2. Statistical Basics - II

After analyzing individual variables, the focus is now on the possible interactions between two variables. The statistical basis for this is called bivariate statistics. Common methods include covariance, correlation and simple linear regression.

2.1. Dataset

Two variables of the trees dataset, provided by Atkinson, A. C. (1985): Plots, Transformations and Regression via Oxford University Press:

Code 6

2.2. Covariance

The covariance measures the directional relationship between two variables. A positive covariance indicates that the variables tend to increase together, while a negative covariance suggests that as one increases, the other tends to decrease. It's a foundational concept in statistics for understanding how two variables vary together:

Code 7
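A minimal sketch of a sample covariance function (the name `covariance` and the paired-list signature are assumptions):

```python
def covariance(x, y):
    # Sample covariance: sum of products of paired deviations
    # from the respective means, divided by n - 1.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
```

For example, `covariance([1, 2, 3], [2, 4, 6])` returns `2.0`.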

2.3. Correlation

The correlation quantifies the strength and direction of the linear relationship between two variables. It standardizes the covariance by dividing it by the product of the standard deviations, resulting in a value between -1 and 1. A correlation close to 1 or -1 indicates a strong relationship, while a value near 0 suggests little to no linear association:

Code 8
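A sketch of a Pearson correlation function; note that the n-1 factors of covariance and standard deviations cancel, so raw deviation sums suffice (names are illustrative):

```python
def correlation(x, y):
    # Pearson correlation: covariance standardized by the product
    # of the standard deviations, yielding a value between -1 and 1.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)
```

A perfectly linear pair such as `correlation([1, 2, 3], [2, 4, 6])` yields a value of 1.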

2.4. Single Linear Regression

A single linear regression models the relationship between two variables by fitting a straight line to the data. It calculates the slope b and intercept a of the line y=a+bx, where b indicates how much y changes for each unit increase in x, and a is the predicted value of y when x=0:

Code 9
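A minimal least-squares sketch returning the intercept a and slope b (the return order `(a, b)` is an assumption):

```python
def linear_regression(x, y):
    # Least-squares estimates for y = a + b*x:
    # b = cov(x, y) / var(x), a = mean(y) - b * mean(x).
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b
```

For example, `linear_regression([1, 2, 3], [3, 5, 7])` returns `(1.0, 2.0)`.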

2.5. Predict Function

The predict function is required to determine the respective y values for the underlying x values via a and b:

Code 10
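Such a predict function can be sketched as a one-liner applying the fitted line to each x value (signature is an assumption):

```python
def predict(x, a, b):
    # Apply the fitted line y = a + b*x to every x value.
    return [a + b * xi for xi in x]
```

For example, `predict([1, 2, 3], 1.0, 2.0)` returns `[3.0, 5.0, 7.0]`.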

2.6. Residuals

Residuals represent the differences between the observed values and the predicted values from a linear regression model. They indicate how well the model fits the data: a residual close to 0 means a good fit, while larger residuals suggest that the model doesn't capture the data as accurately. The residuals can be used to assess the assumptions of linear regression and identify any outliers:

Code 11

2.7. Coefficient of Determination

The coefficient of determination measures the proportion of variance in the dependent variable that is explained by the independent variable in a regression model. It indicates the goodness of fit: a coefficient of determination close to 1 means that the model explains most of the variance, while a value near 0 suggests the model doesn't capture much of the variability:

Code 12
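A self-contained sketch computing the coefficient of determination from observed values and predictions (the name `r_squared` is illustrative):

```python
def r_squared(y, y_pred):
    # Coefficient of determination:
    # 1 - (residual sum of squares / total sum of squares).
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

Perfect predictions yield `1.0`; predicting the mean for every case yields `0.0`.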

2.8. Application

These are application examples for covariance, correlation, as well as the single linear regression with the corresponding predictions, residuals and the coefficient of determination in MicroPython:

Code 13

3. Machine Learning - I

Because dependent variables are generally not dependent on just one independent variable, it is advisable to broaden the perspective to include multivariate statistics, which can take several independent variables into account. Therefore, multiple linear regression is introduced as a first multivariate approach for regression tasks and in order to predict the outcome of the dependent variable.

3.1. Dataset

Three variables of the trees dataset, provided by Atkinson, A. C. (1985): Plots, Transformations and Regression via Oxford University Press:

Code 14

3.2. Matrix Inversion

Matrix inversion is essential in solving systems of linear equations, particularly in methods like multiple linear regression. The following code implements the Gaussian elimination method to invert a matrix, ensuring it is invertible by checking for non-zero pivots during the process:

Code 15
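A Gauss-Jordan elimination sketch along these lines, working on an augmented matrix [M | I] and checking for zero pivots (function name and the singularity threshold are assumptions):

```python
def matrix_inverse(m):
    # Gauss-Jordan elimination on the augmented matrix [M | I];
    # raises an error if a pivot is zero, i.e. M is singular.
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a usable (non-zero) pivot in this column.
        pivot_row = None
        for r in range(col, n):
            if abs(aug[r][col]) > 1e-12:
                pivot_row = r
                break
        if pivot_row is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        # Normalize the pivot row so the pivot becomes 1.
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]
```

For example, the inverse of `[[2.0, 0.0], [0.0, 4.0]]` is `[[0.5, 0.0], [0.0, 0.25]]`.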

3.3. Matrix Transposition

Matrix transposition involves flipping a matrix over its diagonal, converting rows into columns and vice versa. The resulting matrix is called the transpose of the original matrix. Transposition is commonly used in linear algebra, especially in operations like solving systems of equations or adjusting data representations:

Code 16

3.4. Matrix Multiplication

Matrix multiplication is a way of combining two matrices to create a new one. This operation is essential in many areas of linear algebra, including solving systems of linear equations and applying transformations. It is important for multiple linear regression because it allows you to calculate the coefficients of the regression model by multiplying the inverse of the design matrix with the target values:

Code 17
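A compact matrix multiplication sketch for nested lists, taking the dot product of each row of the first matrix with each column of the second (name is illustrative):

```python
def matmul(a, b):
    # (n x m) times (m x p) gives (n x p): each entry is the dot
    # product of a row of a with a column of b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.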

3.5. Multiple Linear Regression

With these mathematical basics, the multiple linear regression can be calculated as follows:

Code 18

3.6. Predict Function

A slightly modified predict function is required to determine the respective y values for the underlying x values:

Code 19

3.7. Residuals

Again, residuals represent the differences between the observed values and the predicted values:

Code 20

3.8. Coefficient of Determination

The coefficient of determination for a multiple linear regression model measures how well the model's predictions match the actual data. It indicates the proportion of the variance in the target variable that can be explained by the model. Its interpretation is therefore similar to the coefficient of determination of a single linear regression model and may vary between 0 and 1, where a value closer to 0 means the model doesn't explain much of the variance:

Code 21

3.9. Application

Finally, these are application examples for the multiple linear regression coefficients with corresponding predictions for one case, the residuals of the model as well as the coefficient of determination in MicroPython:

Code 22

4. Machine Learning - II

Another multivariate approach can be demonstrated via multiple logistic regression. This time, the dependent variable is nominally scaled and enables a distinction to be made between the classes 0 and 1. As a result, this multivariate statistics approach can be used for classification tasks.

4.1. Dataset

Three variables of the trees dataset, provided by Atkinson, A. C. (1985): Plots, Transformations and Regression via Oxford University Press. The dependent variable has been dichotomized, whereby a volume greater than 20 results in 1, else 0:

Code 23

4.2. Sigmoid Function

The sigmoid function in (multiple) logistic regression maps any input value to a range between 0 and 1, allowing us to interpret the result as a probability. It produces an S-shaped curve that is ideal for binary classification, for example with 0 = no and 1 = yes:

Code 24
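A minimal sigmoid sketch; `math.exp` is used here for brevity, although the exponential could also be coded manually in the spirit of this codebook:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real input into the interval (0, 1).
    return 1 / (1 + math.exp(-z))
```

For example, `sigmoid(0)` returns `0.5`, the decision boundary between the two classes.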

4.3. Log Function

The log function approximates the natural logarithm using a numerical method based on the limit definition, useful when built-in log functions are unavailable in MicroPython. The natural logarithm (ln) is the inverse of the exponential function and tells us the power to which e ≈ 2.71828 must be raised to obtain a given number:

Code 25
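One way to realize such an approximation follows the limit definition ln(x) = lim n→∞ of n · (x^(1/n) − 1), with a large fixed n. This sketch assumes a port with double-precision floats; the choice of n is an assumption:

```python
def ln(x, n=1000000):
    # Limit definition of the natural logarithm:
    # ln(x) = lim (n -> infinity) of n * (x**(1/n) - 1).
    # A large fixed n yields a numerical approximation; accuracy
    # depends on the float precision of the MicroPython port.
    return n * (x ** (1.0 / n) - 1)
```

For example, `ln(2.71828...)` approximates 1, and `ln(1.0)` is exactly 0.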

4.4. Prediction of Probabilities

As a result of these mathematical basics, a function for the prediction of probabilities is required for processing the values of the previous sigmoid function:

Code 26

4.5. Gradient Descent Training

This function trains a logistic regression model using gradient descent. It iteratively updates the weights and biases to minimize the error between predicted probabilities (from the sigmoid function) and actual labels. By adjusting the weights in the direction that reduces the loss, the model gradually learns to classify input data:

Code 27
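A compact sketch of such a training loop with per-sample gradient descent updates (the function signature, learning-rate default and epoch count are assumptions):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(x, y, learning_rate=0.1, epochs=1000):
    # x: list of feature vectors, y: list of 0/1 labels.
    weights = [0.0] * len(x[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            # Forward pass: predicted probability for this sample.
            z = sum(w * v for w, v in zip(weights, xi)) + bias
            p = sigmoid(z)
            # For the cross-entropy loss, the gradient w.r.t. z is (p - y).
            error = p - yi
            # Step weights and bias against the gradient.
            weights = [w - learning_rate * error * v
                       for w, v in zip(weights, xi)]
            bias -= learning_rate * error
    return weights, bias
```

On a small separable example, the learned weights push the predicted probabilities toward the correct labels on either side of the decision boundary.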

4.6. Predict Function

Again, a slightly modified predict function is required to determine the respective y values for the underlying x values:

Code 28

4.7. Application

The application examples for the multiple logistic regression focus on weights and bias of the model and return logits and probabilities as values for classification. A classification example highlights the functionality of multiple logistic regression models:

Code 29

5. Machine Learning - III

It is possible that not all cases in a dataset are equivalent. Accordingly, similar cases can be clustered to enable detailed analyses of the corresponding clusters. Many different clustering algorithms are available; k-means clustering will be demonstrated since it is particularly illustrative and commonly used.

5.1. Dataset

Two variables and 15 cases of the original trees dataset, provided by Atkinson, A. C. (1985): Plots, Transformations and Regression via Oxford University Press. The other 15 cases are simulated trees, based upon another type of tree. Therefore, the dependent variable is dichotomized, indicating black cherry trees from the original dataset by 0 and simulated trees by 1:

Code 30

5.2. Euclidean Distance

The euclidean distance measures the straight-line distance between two points in a multi-dimensional space, calculated as the square root of the sum of the squared differences between corresponding coordinates. It’s commonly used in clustering and classification tasks to determine similarity between data points:

Code 31
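A minimal sketch of such a distance function for points given as coordinate lists (name is illustrative):

```python
def euclidean_distance(p, q):
    # Straight-line distance: square root of the summed
    # squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

For example, `euclidean_distance([0, 0], [3, 4])` returns `5.0`.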

5.3. Centroids Function

The initializing centroids function sets the starting points for the cluster centers and influences the convergence of the algorithm and the quality of the final clusters, as it determines how the data is grouped during the iterative process:

Code 32

5.4. Assigning Clusters Function

The assigning clusters function groups data points into clusters based on their proximity to the centroids. For each point, it calculates the euclidean distance to each centroid and assigns the point to the closest centroid’s cluster, ensuring that each cluster contains the points nearest to its respective centroid:

Code 33
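A sketch of such an assignment step, returning one centroid index per point (the name `assign_clusters` and the label representation are assumptions):

```python
def assign_clusters(points, centroids):
    # For each point, compute the euclidean distance to every
    # centroid and assign the index of the nearest one.
    labels = []
    for p in points:
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
                     for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels
```

For example, with centroids at `[0, 0]` and `[10, 10]`, the points `[[0, 0], [9, 9], [1, 0]]` are labeled `[0, 1, 0]`.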

5.5. Computing Centroids Function

The computing centroids function calculates the new centroids by finding the mean of all points within each cluster. For each cluster, it averages the values of each feature across all points, updating the centroid to represent the center of that cluster.

Code 34

5.6. Within Cluster Sum of Squares

The within cluster sum of squares is defined as the total squared distance between each point and its assigned cluster centroid. It measures the compactness of the clusters, with smaller values indicating tighter clusters. The code computes this by summing the squared differences for all points in each cluster, relative to the centroid of that cluster:

Code 35

5.7. k-Means Algorithm

The k-means algorithm groups data points into k clusters. It iteratively assigns points to the closest centroids, recalculates the centroids, and computes the within cluster sum of squares until the centroids no longer change or the maximum number of iterations is reached, returning the final within cluster sum of squares value to assess the clustering quality:

Code 36

5.8. k-Means Indicator

The k-means indicator highlights the change of the within cluster sum of squares values when the number of centroids is increased. A decreasing value indicates a better allocation of the cases to the centroids:

Code 37

5.9. Application

The application examples indicate the within cluster sum of squares for each number of clusters. In addition, it indicates the number of clusters within the dataset, which in this case is supposed to be 2 and assigns the labels accordingly. The position of the centroids is highlighted as well:

Code 38

6. Machine Learning - IV

A factor analysis is a statistical method that reduces a large number of variables into a smaller, more manageable set of underlying factors. It helps identify hidden patterns and relationships within data, making it easier to understand complex structures.

6.1. Dataset

These are 10 variables, based upon 5 variables each for the two personality dimensions extraversion and neuroticism, from the bfi dataset by Revelle, W., Wilt, J. and A. Rosenthal (2010): Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link via Springer:

Code 39

6.2. Mean Center Function

Mean centering via a mean center function is often used in data preprocessing to make the dataset more suitable for machine learning algorithms by ensuring all features contribute equally to the model:

Code 40

6.3. Correlation Matrix

Again, the correlation matrix quantifies the strength and direction of the linear relationship between two variables. This code can be used to correlate several variables and summarize the corresponding values in one matrix:

Code 41

6.4. Power Iteration Function

The power iteration function is an algorithm used to compute the dominant eigenvalues and eigenvectors of a matrix. The process involves iteratively applying matrix-vector multiplication to a random initial vector and normalizing it to avoid overflow or underflow, which allows the vector to converge to the eigenvector corresponding to the largest eigenvalue:

Code 42
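A sketch of power iteration with a fixed start vector instead of a random one, for reproducibility; the eigenvalue is estimated via the Rayleigh quotient (function name, iteration count and start vector are assumptions, and the start vector must not be orthogonal to the dominant eigenvector):

```python
def power_iteration(matrix, iterations=100):
    # Repeated matrix-vector multiplication drives the vector toward
    # the eigenvector of the largest (dominant) eigenvalue.
    n = len(matrix)
    v = [1.0] * n  # fixed start vector; a random one works as well
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Normalize to unit length to avoid overflow or underflow.
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T (A v) estimates the dominant eigenvalue.
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vi * wi for vi, wi in zip(v, w))
    return eigenvalue, v
```

For the diagonal matrix `[[2, 0], [0, 1]]`, the iteration converges to the eigenvalue 2 with eigenvector (1, 0).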

6.5. Factor Loadings Function

The factor loadings function computes the factor loadings based on the correlation matrix, eigenvalues, and eigenvectors. In factor analysis, factor loadings represent the relationships between observed variables and the underlying latent factors.

Code 43

6.6. Application

This application example shows how to compute the correlation matrix, the eigenvalues as well as the corresponding factor loadings for identifying the underlying factors:

Code 44

7. Deep Learning - I

A neural network consists of neurons and layers that process data via activation functions. Weights and biases are necessary in order to activate a neuron and to reach out for other neurons in another layer. The first neural network will use pretrained weights and biases straight out of TensorFlow in order to identify and process non-linear patterns within datasets.

7.1. Dataset

These are five variables from the iris dataset by Fisher, R. (1936): The use of multiple measurements in taxonomic problems via John Wiley & Sons. The four independent variables are based upon length and width of the sepal leaf (x1 and x2) as well as the petal leaf (x3 and x4). All independent variables are standardized. Additionally, the dependent variable differs between versicolor (0) and virginica (1) as different species of iris flowers.

Code 45

7.2. Libraries

Normally, all functions in MicroPython can be coded manually. However, the math library is imported here to simplify the execution of the exponential function.

Code 46

7.3. Activation Functions

Neural networks are based upon neurons, and activation functions decide whether a neuron should be activated or not. This means that an activation function decides whether a neuron's input to the network is important for the prediction, using simple mathematical operations such as the Rectified Linear Unit (ReLU), the Leaky Rectified Linear Unit (Leaky ReLU), the Hyperbolic Tangent (Tanh) or the Logistic Function (Sigmoid).

Code 47
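The four activation functions named above can be sketched as follows (function names and the Leaky ReLU slope of 0.01 are common conventions, assumed here):

```python
import math

def relu(x):
    # ReLU: identity for positive inputs, zero otherwise.
    return x if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small slope alpha instead of zero for negatives.
    return x if x > 0 else alpha * x

def tanh(x):
    # Hyperbolic tangent: squashes inputs into (-1, 1).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def sigmoid(x):
    # Sigmoid (logistic function): squashes inputs into (0, 1).
    return 1 / (1 + math.exp(-x))
```

For example, `relu(-2.0)` returns `0.0`, while `sigmoid(0.0)` returns `0.5`.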

7.4. Single Neuron

A single neuron therefore accesses one of the previously defined activation functions and can be defined as follows in MicroPython:

Code 48
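A minimal sketch of such a neuron, with the activation function passed in as a parameter (this parameterization is an assumption):

```python
def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus bias, passed through the
    # chosen activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)
```

For example, with an identity activation, `neuron([1.0, 2.0], [0.5, 0.5], 0.5, lambda z: z)` returns `2.0`.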

7.5. Data Formats and Processing

In order for the data to be adequately processed by a neural network, a series of data formats such as vectors, matrices and the architecture of neural networks via layers must be defined. These mathematical basics of a neural network can be defined as follows:

Code 49

7.6. Weights and Biases

The architecture of a neural network can be reconstructed in MicroPython with the weights and biases from an already pretrained deep learning model. They can be transferred from TensorFlow (a deep learning library for Python) to MicroPython. The following structure indicates four independent variables (rows) for two neurons (columns) in the input layer with the corresponding weights w1. In addition, the input layer has two corresponding biases b1. The first hidden layer then consists of three neurons with w2 and b2, the second hidden layer consists of two neurons with w3 and b3, and the output layer is a single neuron with w4 and b4. As a result, this neural network consists of a total of eight neurons.

Code 50

7.7. Neural Network Architecture

According to the transferred weights and biases, the architecture of the neural network can be defined in MicroPython as follows. This specifies the number of neurons within each layer and the activation functions for activating the neurons.

Code 51

7.8. Confusion Matrix

A confusion matrix, also known as an error matrix, is a table that visualizes the performance of a classification model by comparing its predictions against the actual results. It's a two-dimensional matrix that displays the counts of true positives, true negatives, false positives, and false negatives, providing a detailed view of where a model's predictions are correct and where it's making errors.

Code 52
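A binary confusion matrix can be sketched as a simple counting loop; the layout [[TN, FP], [FN, TP]] chosen here is one common convention and an assumption:

```python
def confusion_matrix(actual, predicted):
    # Counts for a binary classifier, laid out as [[TN, FP], [FN, TP]].
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]
```

For example, `confusion_matrix([1, 0, 1, 0, 1], [1, 0, 0, 1, 1])` returns `[[1, 1], [1, 2]]`.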

7.9. Application

The performance of the pretrained neural network can be viewed via the following MicroPython code:

Code 53

8. Deep Learning - II

The second neural network will adjust weights and biases automatically. As a result, this neural network can identify and process non-linear patterns within datasets on its own.

8.1. Dataset

Again, the five variables from the iris dataset by Fisher, R. (1936): The use of multiple measurements in taxonomic problems via John Wiley & Sons will be used. The four independent variables are based upon length and width of the sepal leaf (x1 and x2) as well as the petal leaf (x3 and x4). All independent variables are standardized. Additionally, the dependent variable differs between versicolor (0) and virginica (1) as different species of iris flowers.

Code 54

8.2. Libraries

The random library and math library are imported to simplify the execution of some functions required for self-learning neural networks.

Code 55

8.3. Activation Functions and Derivatives

Self-learning neural networks not only require activation functions, but also their derivatives. The derivative of a function represents its instantaneous rate of change at a specific point. This allows the neural network to be trained.

Code 56

8.4. Function for Random Initialization

Since the neural network is supposed to learn the weights and biases by itself, the layers and neurons of the neural network will be initialized with some random values.

Code 57

8.5. Forward and Backward Data Processing

In neural networks, forward propagation is the process of passing input data through the network's layers to generate a prediction. Backward propagation, on the other hand, is the mechanism used to train the network by calculating the error between the prediction and the actual output, and then adjusting the network's weights to minimize that error. This is essential for the learning ability of a neural network.

Code 58

8.6. Loss Function

Furthermore, a loss function quantifies the difference between a deep learning model's prediction and the actual outcome, essentially acting as a measure of the model's error. Cross-entropy, a specific type of loss function, is commonly used for classification problems, especially when the model outputs probabilities.

Code 59

8.7. Neural Network Architecture

This time the architecture of the neural network consists of four independent variables which will be forwarded to three neurons in the input layer and one neuron in the output layer. This is a very simple neural network that consists of four neurons in two layers with corresponding weights (w1 and w2) and biases (b1 and b2).

Code 60

8.8. Specification of Learning Behavior

Finally, the number of epochs and the learning rate need to be specified in MicroPython. In neural networks, an epoch represents one complete pass of the entire training dataset through the model. Learning rate determines how much the model's weights are adjusted during each update step in the training process. Both are crucial hyperparameters that influence training and model performance.

Code 61

8.9. Predict Function

The outcome of the neural network can be predicted with the following code in MicroPython:

Code 62

8.10. Confusion Matrix

As in the pretrained neural network before, a confusion matrix can be used to evaluate the performance of the neural network.

Code 63

8.11. Application

Finally, the performance of the neural network can be inspected and validated with new data:

Code 64