YABAC: Yet Another Bacteria and Archaea Classifier

1. Select your machine learning models

Select one machine learning model
The Multinomial Naïve Bayes (MNB) model works by analyzing the frequency of short nucleotide subsequences (k-mers) in your data. It assumes that each k-mer contributes independently to the overall classification, and it predicts the class label from the observed k-mer counts. This approach is effective for classifying DNA or RNA sequences, where certain k-mer patterns are associated with specific biological categories.

The Random Forest (RF) model classifies sequences with an ensemble of decision trees. Each input sequence (DNA or RNA) is represented by a set of features such as k-mer counts, sequence motifs, or other extracted characteristics. Each tree is trained on a random sample of the data, considers a random subset of the features, and contributes independently to the final prediction. Combining the predictions of many diverse trees reduces variance and overfitting compared with a single tree, which makes RF well suited to sequence-based classification tasks such as genomic sequence analysis.
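The k-mer counting and independence assumption behind MNB can be sketched in a few lines of plain Python. This is only an illustration, not YABAC's actual implementation: the function names (kmer_counts, train_mnb, predict_mnb), the choice of k = 3, the Laplace smoothing value, and the toy labels "classA" and "classB" are all assumptions made for the example.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers, skipping any k-mer with a non-ACGT base."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def train_mnb(labeled_seqs, k=3, alpha=1.0):
    """Fit log priors and Laplace-smoothed k-mer log-likelihoods per class."""
    class_docs = Counter()   # number of training sequences per class
    class_kmers = {}         # pooled k-mer counts per class
    vocab = set()            # every k-mer seen during training
    for seq, label in labeled_seqs:
        counts = kmer_counts(seq, k)
        class_docs[label] += 1
        class_kmers.setdefault(label, Counter()).update(counts)
        vocab.update(counts)
    total_docs = sum(class_docs.values())
    model = {"k": k, "log_prior": {}, "log_lik": {}}
    for label, counts in class_kmers.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(class_docs[label] / total_docs)
        model["log_lik"][label] = {
            m: math.log((counts[m] + alpha) / denom) for m in vocab
        }
    return model


def predict_mnb(model, seq):
    """Return the class with the highest log posterior score."""
    counts = kmer_counts(seq, model["k"])
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_lik = model["log_lik"][label]
        # Independence assumption: per-k-mer log-likelihoods simply add up.
        # K-mers never seen in training are ignored.
        scores[label] = log_prior + sum(
            n * log_lik[m] for m, n in counts.items() if m in log_lik
        )
    return max(scores, key=scores.get)


# Toy training data with hypothetical labels (not real lineages).
training = [
    ("ATATATATAT", "classA"),
    ("TATATATATA", "classA"),
    ("GCGCGCGCGC", "classB"),
    ("CGCGCGCGCG", "classB"),
]
model = train_mnb(training)
print(predict_mnb(model, "ATATAT"))
print(predict_mnb(model, "GCGCGC"))
```

The same k-mer count vectors could also serve as input features for a Random Forest; only the classifier built on top of them would differ.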

Multinomial Naïve Bayes (MNB)

Random Forest (RF)

2. Import data

Select the file containing the data you want to run predictions on
This file should be in FASTA format.
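For reference, a FASTA file is a series of records, each made of a header line starting with ">" followed by one or more lines of sequence. The minimal reader below is a sketch for checking a file before upload; the function name read_fasta and the example records are hypothetical, and the tool itself may parse files differently.

```python
import io


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example; a real file would be opened with open().
example = io.StringIO(">seq1 sample record\nACGTAC\nGTTA\n\n>seq2\nGGCC\n")
records = list(read_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```

Because read_fasta accepts any iterable of lines, an open file handle can be passed in the same way as the in-memory example.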

A validation FASTA file, with its metadata, is provided here: validation_sequences_with_lineage.zip. You can use this file to test the tool.


3. Run model

Run the selected model
This will start the prediction process. The time it takes depends on the size of your data and the selected model. You can cancel the task at any time.
