Images and videos

Revelio is currently the model with the most effective architecture in many contexts. V01 and Brokenwand remain on the platform for coverage of specific scenarios (older generators that are no longer widely used).

Revelio:

  • Technology: Revelio is built on a technology for training a specialized model that combines the semantic content of the image with its intrinsic pixel-level characteristics. Unlike the V0X family, it does not analyze every pixel in detail, preferring a “high-level” analysis combined with the semantic value extracted from the image. The advantage is greater precision on certain types of images, at the expense of reliability on images not aligned with those in the training set.
  • Dataset: Multi-generator dataset featuring photorealistic AI-generated images. The oldest generators date back to 2023, and we maintain a continuous integration pipeline, updating Revelio with the latest architectural signatures from emerging generators.
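
The fusion of semantic and pixel-level signals described above can be sketched in simplified form. Everything here is illustrative, not IdentifAI's actual architecture: the "semantic" encoder is a toy regional pooling (in a real system this would be a pretrained vision model), and the logistic head stands in for a trained classifier.

```python
import numpy as np

def semantic_features(image: np.ndarray) -> np.ndarray:
    """Stand-in for a semantic encoder: pools coarse regional means
    as a placeholder embedding (a real system would use a deep model)."""
    h, w = image.shape
    return np.array([
        image[:h//2, :w//2].mean(), image[:h//2, w//2:].mean(),
        image[h//2:, :w//2].mean(), image[h//2:, w//2:].mean(),
    ])

def pixel_statistics(image: np.ndarray) -> np.ndarray:
    """High-level pixel-domain statistics rather than per-pixel detail:
    global variance and mean absolute gradient along each axis."""
    gy, gx = np.gradient(image.astype(float))
    return np.array([image.var(), np.abs(gx).mean(), np.abs(gy).mean()])

def fused_score(image: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Concatenate both feature groups and apply a logistic head,
    returning a synthetic-likelihood score in (0, 1)."""
    feats = np.concatenate([semantic_features(image), pixel_statistics(image)])
    z = feats @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))
```

The point of the sketch is the fusion step: neither feature group alone sees both what the image depicts and how its pixels behave statistically.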

Revelio-Onboarding:

  • Technology: Same technology as in Revelio. Compared to Morphing, it recognizes more modern faceswap techniques.
  • Dataset: Multi-vector facial dataset featuring high-fidelity synthetic or manipulated human faces. We maintain a continuous integration pipeline, updating Revelio Onboarding with the latest architectural signatures—from deep-swapping and latent-space editing to full-scale identity synthesis.

Ellen:

  • Technology: The model’s architecture is specifically designed to detect traces left by photo editing tools, mainly in the context of image compression. The model responds affirmatively only in the presence of photo editing; in all other cases its output should be disregarded (it is listed under “excluded models”).
  • Dataset: Dual-stream dataset featuring original and digitally altered (tampered) pairs. The model learns to detect localized anomalies resulting from copy-move operations and advanced splicing techniques.
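
To make the copy-move idea concrete, here is a deliberately naive sketch of the kind of localized-anomaly detection the dataset targets. It matches raw pixel blocks exactly; real detectors (including, presumably, Ellen) match in a robust feature space that survives compression and post-processing. The function name and block size are illustrative choices, not part of the product.

```python
import numpy as np
from collections import defaultdict

def copy_move_regions(image: np.ndarray, block: int = 4):
    """Toy copy-move detector: hash every block x block patch of a
    grayscale image and report groups of coordinates whose pixel
    content is identical (a signature of copy-paste forgery)."""
    h, w = image.shape
    seen = defaultdict(list)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = image[y:y+block, x:x+block]
            seen[patch.tobytes()].append((y, x))
    # Only content that appears in more than one place is suspicious.
    return [locs for locs in seen.values() if len(locs) > 1]
```

In practice the matching must be tolerant to recompression and slight rescaling, which is exactly why the dual-stream original/tampered pairs in the dataset matter.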

V01 (legacy):

  • Technology: An older model, along with Brokenwand. It is based on a technology that works on “general” aspects of an image, regardless of the type of image displayed. These “generalist” models can therefore analyze a wide range of image types, but usually with lower reliability compared to models like Revelio.
  • Dataset: A legacy reference corpus featuring photorealistic synthetics generated between Q1 and Q4 2023. This dataset captures the foundational architectural signatures of early-to-mid-stage Latent Diffusion and GAN-based synthesis.
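
One classic content-agnostic cue of the kind a generalist detector can use, shown here purely as an illustration (this is a textbook frequency-domain heuristic, not V01's actual method): early diffusion and GAN pipelines often leave characteristic energy patterns in the image spectrum, independent of what the image depicts.

```python
import numpy as np

def high_frequency_energy_ratio(image: np.ndarray) -> float:
    """Content-agnostic statistic: share of spectral energy that falls
    outside a central low-frequency band. Unusual values can hint at
    generator upsampling artifacts regardless of image subject."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image.astype(float)))) ** 2
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    r = min(h, w) // 4  # half-width of the low-frequency band
    low = spectrum[cy - r:cy + r, cx - r:cx + r].sum()
    total = spectrum.sum()
    return float((total - low) / total)
```

A single global statistic like this applies to any image type, which captures the "generalist" trade-off: broad coverage, but weaker per-image reliability than a specialized model.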

Morphing (legacy):

  • Technology: This technology differs from the previous ones: it first identifies faces, and then analyzes their features in search of distortions compared to the real image. When it correctly detects a face, it can determine whether some of its features have been transformed. It is less effective on completely generated faces, and when no face is present this model can be ignored.
  • Dataset: A legacy dataset of altered faces, generated using classical geometric warping and latent-space synthesis prevalent through 2023. This dataset serves as a forensic baseline for detecting synchronized landmark-based manipulations and early-stage generative morphing.
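
The face-first pipeline can be illustrated with one simple distortion cue. This sketch assumes landmarks have already been detected by an upstream face detector; the bilateral-symmetry measure below is a generic geometric heuristic chosen for illustration, not Morphing's actual feature analysis.

```python
import numpy as np

def landmark_asymmetry(landmarks: np.ndarray) -> float:
    """Toy distortion cue for a face-first pipeline. `landmarks` is an
    (N, 2) array of (x, y) points ordered so that landmarks[i] is the
    bilateral counterpart of landmarks[N-1-i]. We mirror the points
    around the face's vertical midline and measure the mean distance
    between each point and its mirrored counterpart; warped faces tend
    to score higher than natural ones."""
    midline = landmarks[:, 0].mean()
    mirrored = landmarks[::-1].copy()
    mirrored[:, 0] = 2 * midline - mirrored[:, 0]
    return float(np.linalg.norm(landmarks - mirrored, axis=1).mean())
```

This also shows why the model degrades on fully generated faces: a synthesized face can be perfectly self-consistent, so geometric distortion cues relative to a "real" face carry little signal.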

Audio (speech)

IdentifAI employs multiple models to detect deepfake speech.

Shared Dataset

Each model relies on a distinct technological approach, while all are trained on a shared multimodal dataset composed of high-fidelity authentic speech samples and their synthetic counterparts. The dataset spans the evolution of neural speech synthesis—from legacy concatenative methods to modern diffusion-based vocoders. We maintain a continuous integration pipeline to ensure that the dataset is regularly updated with the latest generation and voice manipulation techniques.

Underlying Technologies

  • Zebra II (legacy): the model architecture is designed to distinguish between authentic and artificial audio. It leverages a supplementary model to perform numerical feature transformations of the audio signal prior to classification.
  • Phantom: this model focuses on raw audio analysis. Phantom detects subtle artifacts that other methods may overlook, while its specialized training ensures robustness in unpredictable real-world conditions.
  • Quiet: this model features a complex architecture composed of a modular system of specialized detectors, governed by a dynamic router that assigns the most appropriate expert module to each task. This design provides critical adaptability against next-generation deepfakes.
Quiet is currently the most effective model in most contexts. Given the inherent variability and complexity of speech (e.g., voice types, speech patterns, frequency ranges, acoustic environments), performance should always be evaluated with these factors in consideration.
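
The router-plus-experts design behind Quiet can be sketched abstractly. All gating rules, thresholds, and expert names below are invented for illustration; IdentifAI's actual routing criteria are not public.

```python
def route_to_expert(features: dict) -> str:
    """Toy dynamic router: pick the specialized detector whose gating
    rule matches the input's coarse acoustic profile. The feature keys
    and thresholds here are hypothetical examples."""
    if features.get("is_telephone_band"):       # narrowband call audio
        return "narrowband_expert"
    if features.get("snr_db", 100.0) < 10.0:    # noisy real-world capture
        return "noisy_env_expert"
    return "general_speech_expert"

def ensemble_score(features: dict, experts: dict) -> float:
    """Run only the expert the router selects and return its score."""
    return experts[route_to_expert(features)](features)

# Hypothetical experts, each a scoring function over acoustic features.
experts = {
    "narrowband_expert":     lambda f: 0.9,
    "noisy_env_expert":      lambda f: 0.7,
    "general_speech_expert": lambda f: 0.2,
}
```

Routing lets each expert specialize in one slice of the variability listed above (voice types, frequency ranges, acoustic environments) instead of forcing a single model to cover all of them.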