Merwyn Carrillos
Enhancing ML Classifications with Human-In-The-Loop Feedback Using Slack and AWS

Introduction

In machine learning, classification is a method by which input data is categorized into distinct classes by trained classification algorithms.

Like many other machine learning models, classification involves two major phases:

  1. Model Training: This phase encompasses the process of creating training datasets and training a model.
  2. Model Inference: This phase involves hosting the trained model to make it available for classifying new data.

In this context, Human-In-The-Loop (HITL) is a strategy for improving model accuracy by involving humans in the labeling of data when building a model.

Not all outputs of a model's inference will be correct. By involving humans in correcting model inferences, we provide the model with high-quality, highly accurate data to train on.

Incremental training enhanced with HITL processes improves classification algorithms by continuously refining the model's accuracy and performance based on real-world feedback and corrections.

[Figure: High-level workflow]

In this article, I will explore how to enhance ML classifications with human-in-the-loop feedback using Slack and AWS.


Theory

This article follows an effort to classify time-series data. The data itself is a composite of sensor values, where the sensors report infrastructure and application metrics, logs, and traces.

While the data collected is enough to train a classification model using run-of-the-mill unsupervised learning algorithms, data labeling provides a much higher level of confidence in a model's classifications.

By definition, Subject Matter Experts (SMEs) are able to provide an accurate assessment of a model's output within their area of expertise. Should a predicted classification not match reality, an SME can label the data as erroneous.

This leads to higher quality data over time, and in turn more accurate ML models as they are incrementally trained.

In Practice

Say we own a service whose purpose is to monitor external services and classify their availability using a simple classification schema.

The classification of that data is based on a multitude of key performance indicators (KPIs) which are fed to a trained machine learning model.

During inference, this model is meant to deliver a classification of service availability.

For simplicity's sake, say we map availability classifications to a simple and recognizable red-yellow-green pattern (a small sketch follows the list).

  • Green == Available
  • Yellow == Partially Available
  • Red == Unavailable
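
As a minimal sketch, that mapping might be expressed as a small enum; the enum and its label strings are illustrative, not an actual schema:

```python
# A minimal sketch of the red-yellow-green availability mapping above.
# The enum and its label strings are illustrative.
from enum import Enum

class Availability(str, Enum):
    GREEN = "Available"
    YELLOW = "Partially Available"
    RED = "Unavailable"
```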

The results of this classification are available for consumption by end-users via a website dashboard.

A feature of this service is to track changes to a monitored service and notify pertinent parties of its change in state.

Should the state change, a notification enriched with context would be sent to a Slack channel.

Part of the message is the feedback loop mechanism: a simple interface with two buttons, yes or no, indicating whether the inferred classification was accurate.

An Example

For example, say I have written a synthetic webpage test (a canary) using Amazon CloudWatch Synthetics.

This service uses Selenium, an end-to-end browser automation tool popular amongst website QA teams, to visit and interact with websites.

The test is simple: navigate to a website, use the search functionality, and click a link.
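
As a sketch of such a test, assuming Selenium's Python bindings (CloudWatch Synthetics offers a Python/Selenium runtime); the URL, element locators, and link text are hypothetical:

```python
# A sketch of the synthetic test described above, using Selenium's Python
# bindings. The URL, element locators, and link text are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")                   # navigate to the site
    search = driver.find_element(By.NAME, "q")          # find the search box
    search.send_keys("availability", Keys.RETURN)       # run a search
    driver.find_element(By.PARTIAL_LINK_TEXT, "Docs").click()  # click a result
    assert "example.com" in driver.current_url          # success == all steps ran
finally:
    driver.quit()
```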

As this hypothetical site's administrator, I hope that every single test succeeds, meaning all intended functionality works. If not, I want to be alerted as to why.

The example is simple, and receiving an alert is possible with AWS's current tooling. However, once we also want to look at application performance, JavaScript rendering, and other facets that make the site truly available to an end user, we start to collect much more data.

It is at this point where one might consider running all that data through a machine learning classification algorithm, for the analysis of vast quantities of data is one area where machine learning thrives.

Human-in-the-Loop

My idea of a system that facilitates accurate data labeling revolves around a few rules:

  • Subject Matter Experts are responsible for data labeling
  • Data labelers have a simple, intuitive method to label data
  • New data labelers are easily onboarded

The practical solution takes the form of a simple yes/no question presented to the data labeler, who in this case is a Subject Matter Expert.

The press of the button triggers a slew of services to fulfill the HITL feedback loop. In a nutshell:

  1. A POST request to an API Gateway endpoint would trigger a Lambda function
  2. The Lambda function would update an Aurora table with the button selection

The end result is a dataset that is later joined with existing historical data used for classification. In other words, existing data is enriched with the result of the button selection.
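
For illustration, that join might look like the following in pandas; the frames, columns, and values are hypothetical stand-ins for the historical dataset and the alerts table:

```python
# A sketch of enriching historical data with HITL feedback, assuming pandas.
# Frames, columns, and values are hypothetical stand-ins.
import pandas as pd

history = pd.DataFrame({
    "event_guid": ["guid-a", "guid-b", "guid-c"],
    "kpi_latency_ms": [120, 950, 140],
    "inferred_state": ["Green", "Red", "Green"],
})
feedback = pd.DataFrame({
    "event_guid": ["guid-a", "guid-b"],
    "feedback_response": [True, False],   # yes/no button selections
})

# Left join: rows with no feedback yet keep NaN, as in the alerts table later on.
labeled = history.merge(feedback, on="event_guid", how="left")
```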

Architecture

  • Slack
  • AWS API Gateway
  • AWS Lambda
  • AWS Aurora (w/Postgres)

[Figure: High-level workflow]

We run inference on minute-by-minute key performance indicator data for critical infrastructure, enterprise, and business services.

The notification serves two purposes:

  1. Service owners are alerted to potential issues and are able to take action
  2. The classification accuracy of the model is improved

[Figure: The human in the loop writes back to AWS and updates the alerts table]

If a service is classified as changing state, a notification event is generated and passed to a notification service. That service is tasked with determining notification methods and destinations, and with composing the content of the message.

The event is captured in a table and the notification service is triggered.

The table below is an example alerts table.

Timestamp           | Service   | Old State | New State | Event GUID       | Feedback Response
--------------------|-----------|-----------|-----------|------------------|------------------
2025-02-01 04:10:11 | Service A | Green     | Yellow    | 1234567890abcdef | Yes
2025-02-01 11:12:20 | Service B | Green     | Red       | 1234567890abcdef | No
2025-02-01 12:18:43 | Service C | Red       | Green     | 1234567890abcdef | NaN
2025-02-01 14:20:04 | Service D | Green     | Yellow    | 1234567890abcdef | NaN

As services are determined to have changed state, our database grows by a row.

As data is labeled, the feedback is captured and is used to amend said row.
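
A possible shape for this table, assuming PostgreSQL on Aurora; the table and column names mirror the example above and are illustrative:

```python
# A possible shape for the alerts table, assuming PostgreSQL on Aurora.
# Table and column names mirror the example above and are illustrative.
ALERTS_DDL = """
CREATE TABLE IF NOT EXISTS alerts (
    ts                TIMESTAMPTZ NOT NULL,
    service           TEXT NOT NULL,
    old_state         TEXT NOT NULL,
    new_state         TEXT NOT NULL,
    event_guid        TEXT PRIMARY KEY,
    feedback_response BOOLEAN        -- NULL (NaN) until a human labels the row
);
"""
```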

This labeled data, like a surging river flowing with rich nutrients, makes its way back into the data lake from which the model's training algorithms feed.

Implementation

This article does not cover how the model is trained, how the dataset is created, or how the model is served for inference.

However, at a high level, the workflow looks like the following (a sketch of steps 1 through 4 appears after the list):

  1. A classification model delivers an inferred value for service state, which is logged in a database.
  2. The existing (old) service state is compared to the inferred value.
  3. If the state has not changed, the workflow stops here.
  4. If the state has changed, a system triggers a Slack notification enriched with contextual information.
    1. The message contains a feedback loop mechanism.
    2. Two buttons, yes or no, indicating whether the classification was accurate.
  5. The press of a button triggers a series of services in the backend.
  6. The final service enriches the object created for the anomaly/classification with the result of the button selection.
  7. The human in the loop is a member of that service's Subject Matter Experts group.
  8. That human labels the data.
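
A rough sketch of steps 1 through 4, where db and notifier are hypothetical helpers standing in for the real persistence and notification services:

```python
# A rough sketch of workflow steps 1-4. `db` and `notifier` are hypothetical
# helpers standing in for the real persistence and notification services.
def on_inference(service: str, new_state: str, db, notifier) -> None:
    old_state = db.latest_state(service)   # look up the existing (old) state
    db.log_inference(service, new_state)   # step 1: log the inferred value
    if new_state == old_state:             # steps 2-3: stop if nothing changed
        return
    # Step 4: trigger a Slack notification enriched with context.
    notifier.send_slack_alert(service, old_state, new_state)
```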

The payload for the alert contains the following:

  • The classification
  • The context
  • The feedback loop mechanism

The context itself contains identifying information:

  • A timestamp
  • A service name
  • An alert GUID

The alert GUID identifies the alert in the backend and links the feedback to it.

The dataset is then enriched with the feedback response, which is a binary value: yes or no.
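
Putting these pieces together, a minimal sketch of such a message in Slack's Block Kit format might look like the following; the action_id names and example values are assumptions, not actual identifiers:

```python
# A minimal Block Kit sketch of the alert message: context plus yes/no
# buttons. action_id names and example values are illustrative assumptions.
alert_blocks = [
    {
        "type": "section",
        "text": {
            "type": "mrkdwn",
            "text": (
                ":large_yellow_circle: *Service A* changed state "
                "Green -> Yellow at 2025-02-01 04:10:11. Was this accurate?"
            ),
        },
    },
    {
        "type": "actions",
        "block_id": "hitl_feedback",
        "elements": [
            {
                "type": "button",
                "style": "primary",
                "text": {"type": "plain_text", "text": "Yes"},
                "action_id": "feedback_yes",
                "value": "1234567890abcdef",  # alert GUID links feedback to the alert
            },
            {
                "type": "button",
                "style": "danger",
                "text": {"type": "plain_text", "text": "No"},
                "action_id": "feedback_no",
                "value": "1234567890abcdef",
            },
        ],
    },
]
```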

Services

Slack

To simplify the feedback loop mechanism, a Slack Application is created and added to target channels. It is implied that interested parties, such as service owners and administrators, subscribe to this Slack channel. It is this collection of roles that makes up the Subject Matter Experts for a service/dataset being labeled using HITL.

In its simplest form, the Slack Application is used as a permissions vehicle, one which allows our notifications microservice to write to a Slack Channel.
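
As a sketch, the microservice might post via the Slack Web API's chat.postMessage method using the app's bot token; the token and channel ID below are placeholders:

```python
# A sketch of the notifications microservice posting an alert to a channel.
# The bot token and channel ID are placeholders; the chat:write permission
# is granted through the Slack Application described above.
from slack_sdk import WebClient

client = WebClient(token="xoxb-...")      # bot token issued to the Slack app
client.chat_postMessage(
    channel="C0123456789",                # target channel the app was added to
    text="Service A changed state: Green -> Yellow",  # plain-text fallback
    blocks=alert_blocks,                  # the Block Kit payload sketched earlier
)
```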

AWS

The feedback loop mechanism is implemented using AWS services.

API Gateway

This is the entrypoint for the feedback loop mechanism. A POST request is made to an API Gateway endpoint and after authentication, the request is forwarded to a worker Lambda function.

Lambda

By definition, AWS Lambda is a service that lets you run code without provisioning or managing servers.

Using AWS Step Functions, we can sequence the Lambda functions that are triggered by the feedback loop mechanism.

After initial authentication validates the incoming Slack message:

  1. The first Lambda function, fronted by the API Gateway endpoint, is triggered by a POST request from Slack and parses the incoming payload (sketched below).
    1. This function also verifies that the incoming HITL selection was made by a member of that service's SME team.
  2. A second Lambda function, triggered by the first (if successful), updates the relevant AWS Aurora table with the HITL selection.
  3. The final Lambda function, triggered by the second, updates the original Slack message (where the HITL loop originated) to reflect the selection.
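
Here is a sketch of that first function's duties: it checks Slack's documented "v0" request signature, then parses the interactive payload. The is_sme and service_for helpers are hypothetical.

```python
# A sketch of the first Lambda: verify Slack's request signature (the
# documented "v0" HMAC-SHA256 scheme), then parse the button payload.
# `is_sme` and `service_for` are hypothetical helpers.
import hashlib, hmac, json, os, time
from urllib.parse import parse_qs

def handler(event, context):
    body = event["body"]                                    # raw request body
    headers = {k.lower(): v for k, v in event["headers"].items()}
    ts = headers["x-slack-request-timestamp"]
    if abs(time.time() - int(ts)) > 300:                    # reject stale requests
        return {"statusCode": 403}
    basestring = f"v0:{ts}:{body}".encode()
    expected = "v0=" + hmac.new(
        os.environ["SLACK_SIGNING_SECRET"].encode(), basestring, hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, headers["x-slack-signature"]):
        return {"statusCode": 403}                          # signature mismatch

    # Slack sends interactive payloads form-encoded under a "payload" key.
    payload = json.loads(parse_qs(body)["payload"][0])
    user_id = payload["user"]["id"]
    action = payload["actions"][0]                          # the pressed button
    if not is_sme(user_id, service_for(action["value"])):   # hypothetical SME check
        return {"statusCode": 403}
    return {"statusCode": 200}                              # hand off to step 2
```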

The last function is important as it provides consistency in the feedback loop mechanism and informs others that the HITL loop has been completed.

Alternatively, there is an opportunity to use other compute services such as AWS Fargate or AWS EC2.

Aurora

AWS Aurora provides a serverless, scalable, managed PostgreSQL-compatible service, which we can use to store our events. The same table stores the feedback loop responses.

The table is updated with the HITL's button selection.
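
For the write-back itself, here is a sketch using the RDS Data API, a natural fit for Aurora Serverless; the ARNs, database, and column names are placeholders:

```python
# A sketch of the second Lambda's write-back, assuming Aurora Serverless
# with the RDS Data API. ARNs, database, and column names are placeholders.
import boto3

rds = boto3.client("rds-data")

def record_feedback(event_guid: str, accurate: bool) -> None:
    rds.execute_statement(
        resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:alerts-db",
        secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:alerts-db",
        database="alerts",
        sql="UPDATE alerts SET feedback_response = :fb WHERE event_guid = :guid",
        parameters=[
            {"name": "fb", "value": {"booleanValue": accurate}},
            {"name": "guid", "value": {"stringValue": event_guid}},
        ],
    )
```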

Conclusion

Involving humans in the classification of data is a powerful tool for improving the accuracy of machine learning models. Iterative training with human-in-the-loop feedback can significantly enhance the performance of classification algorithms.

I believe the key to making the human-in-the-loop process efficient is providing a simple, intuitive interface for data labeling.

Slack provides an almost organic way to interact with data and is a great tool for implementing this type of feedback.

By combining Slack with AWS services like API Gateway, Lambda, and Aurora, we can create a robust system for enhancing machine learning classifications.