DYLATIS - Dynamic Latent Taxonomy Identification Space

Istvan Benedek
1 day ago

Updated: 4 hours ago

We Have Started Developing DYLATIS: A Dynamic Latent Taxonomy Identification Space for Evaluating Recognition Accuracy at Large Scale

We have recently begun development of DYLATIS, the Dynamic Latent Taxonomy Identification Space, a framework created to address a central question in the OFELIA project:

What recognition accuracy can we expect when more than 30 deep-learning feature-detector models—with precisely measured accuracies—work together inside a massive, noisy, high-dimensional species universe?

OFELIA’s early experimental phases revealed an uncomfortable truth. Methods that behave well at small scale, meaning hundreds of species (fewer than 1,000), break down completely once we move beyond roughly 10,000 species. Traditional “flat” classification, simple distance metrics, and textbook feature-matching pipelines remain usable for small species spaces. But for anyone who wants to recognize the entire kingdom of Fungi, the entire plant kingdom, or even the full animal world, these traditional approaches cannot scale.

The complexity does not grow linearly but explosively. Managing thousands of species requires a kingdom-independent architecture built from general, deeply structured components rather than ad hoc heuristics.

DYLATIS provides such a foundation. We construct an artificial species universe directly in a latent feature space, where each species is described not by images, but by observable feature vectors — traits that can be perceived either by human observers or by machine models.

These features are generated to match realistic marginal distributions, entropic structure, inter-feature correlations, cluster morphology, and the empirical species–species distance distribution. We enforce this through a set of custom loss terms: soft CDF matching for pairwise distances, confusion-aware metrics derived from the Confusion Distribution Matrices of multiple recognition models, a cluster-forming loss that induces latent taxonomic structure, and correlation alignment for biological realism. Together, these terms yield a synthetic yet coherent biological world containing tens of thousands of species.
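To make "soft CDF matching for pairwise distances" more concrete, here is a minimal sketch in plain Python. It is not the DYLATIS implementation: it assumes the soft CDF is a sigmoid-smoothed empirical CDF evaluated on a grid of thresholds, and that the loss is the mean squared gap between the synthetic and target curves. The function names, the `temperature` parameter, and the threshold grid are all hypothetical.

```python
import math
from itertools import combinations

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def pairwise_distances(points):
    """Euclidean distance between every unordered pair of feature vectors."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]

def soft_cdf(distances, thresholds, temperature=0.1):
    """Smoothed fraction of distances below each threshold.

    A hard step function has zero gradient almost everywhere; replacing it
    with a sigmoid is the assumed smoothing here, not a detail from the post.
    """
    return [
        sum(_sigmoid((t - d) / temperature) for d in distances) / len(distances)
        for t in thresholds
    ]

def soft_cdf_loss(synthetic_points, target_distances, thresholds, temperature=0.1):
    """Mean squared gap between the synthetic and target soft CDFs."""
    syn = soft_cdf(pairwise_distances(synthetic_points), thresholds, temperature)
    tgt = soft_cdf(target_distances, thresholds, temperature)
    return sum((s - t) ** 2 for s, t in zip(syn, tgt)) / len(thresholds)
```

In a real training loop the same computation would be written in an autodiff framework so the gradient can flow back to the synthetic feature vectors; the pure-Python version only shows the shape of the loss.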

The Confusion Distribution Matrices act as probabilistic channels: a “true” species feature is transformed into a noisy observation exactly as our deployed system would produce it. This allows DYLATIS to simulate OFELIA’s recognition behaviour long before the full real system exists.
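The channel idea can be sketched in a few lines. The example below assumes each detector is summarised by a row-stochastic matrix for one categorical feature, where row i gives the distribution of reported values when the true value is i. The matrix values and the function names are hypothetical and chosen only for illustration.

```python
import random

def observe(true_value, confusion, rng):
    """Sample a detector's reported value for one feature.

    confusion[i][j] is the assumed probability that the detector reports
    value j when the true value is i, so each row should sum to 1.
    """
    row = confusion[true_value]
    return rng.choices(range(len(row)), weights=row, k=1)[0]

def observe_species(true_features, confusions, rng):
    """Pass every true feature through its own detector's channel."""
    return [observe(v, c, rng) for v, c in zip(true_features, confusions)]

# Hypothetical three-valued feature whose detector is right 90% of the time.
CONFUSION = [
    [0.90, 0.07, 0.03],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
]

rng = random.Random(0)
noisy = observe_species([0, 2, 1], [CONFUSION] * 3, rng)
```

Because the noise is sampled from measured confusion behaviour rather than from a generic Gaussian, the simulated observations inherit each detector's characteristic mistakes.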

A key innovation is the discovery of a latent taxonomy. Instead of imposing genera or families externally, the species self-organize into clusters based purely on latent geometry.
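As a rough illustration of clusters that come purely from geometry, the sketch below groups points by connected components of a "closer than `radius`" graph (single-linkage clustering with a distance cutoff). This is a stand-in technique, not the cluster-forming loss used in DYLATIS; the function name and the `radius` parameter are hypothetical, and the quadratic pair loop is only suitable for small examples.

```python
import math

def latent_clusters(points, radius):
    """Group point indices into connected components of the radius graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Six hypothetical species in a 2-D latent space.
species = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(latent_clusters(species, radius=0.5))  # [[0, 1, 2], [3, 4], [5]]
```

No genus or family labels enter the computation; the groups are read off the distances alone, which is the sense in which such a taxonomy is latent.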

The purpose of all this is clear: to predict, with high fidelity, what recognition accuracy OFELIA can achieve when dozens of DL models with known reliability operate together across tens of thousands of species.

By running large-scale Monte Carlo simulations in this latent world, DYLATIS provides estimates for top-1, top-5, and cluster-level accuracy, across varying noise conditions, feature qualities, and species counts.
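A toy version of such a Monte Carlo run is sketched below. It makes several simplifying assumptions that are not from the post: species are random categorical feature vectors, every detector has the same accuracy and errs uniformly across the wrong values, and candidates are ranked by the number of matching features. All names and parameters are hypothetical.

```python
import random

def make_universe(n_species, n_features, n_values, rng):
    """Hypothetical universe: one random categorical vector per species."""
    return [[rng.randrange(n_values) for _ in range(n_features)]
            for _ in range(n_species)]

def noisy_copy(features, accuracy, n_values, rng):
    """Keep each feature with probability `accuracy`, else report a wrong value."""
    observed = []
    for v in features:
        if rng.random() < accuracy:
            observed.append(v)
        else:
            # Shift by 1..n_values-1 so the reported value always differs.
            observed.append((v + rng.randrange(1, n_values)) % n_values)
    return observed

def rank_species(observed, universe):
    """Order species indices by number of matching features, best first."""
    scores = [sum(o == f for o, f in zip(observed, feats)) for feats in universe]
    return sorted(range(len(universe)), key=lambda i: -scores[i])

def estimate_top_k(universe, accuracy, n_values, trials, ks, rng):
    """Fraction of trials in which the true species ranks within the top k."""
    hits = {k: 0 for k in ks}
    for _ in range(trials):
        truth = rng.randrange(len(universe))
        observed = noisy_copy(universe[truth], accuracy, n_values, rng)
        position = rank_species(observed, universe).index(truth)
        for k in ks:
            if position < k:
                hits[k] += 1
    return {k: hits[k] / trials for k in ks}

rng = random.Random(1)
universe = make_universe(n_species=200, n_features=12, n_values=4, rng=rng)
estimates = estimate_top_k(universe, accuracy=0.8, n_values=4,
                           trials=200, ks=(1, 5), rng=rng)
```

Sweeping `accuracy`, `n_features`, and `n_species` in a loop gives the kind of accuracy-versus-scale curves described above, although a realistic study would replace the uniform noise with measured confusion matrices and the random universe with the structured latent one.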

More importantly, DYLATIS shows the path forward for global-scale identification. If we want OFELIA—or any system—to recognize an entire biological kingdom, we must move beyond classical techniques.

DYLATIS is our experimental universe for discovering how such systems should behave—and how accurate they can ultimately become.
