#ai

TIL Nano Banana per la generazione di immagini AI non è un diffusion model ma autoregressive, a differenza delle generazioni precedenti di Imagen e a differenza di DALL-E 2 e 3. E Midjourney e Stable Diffusion.

Of note, gpt-image-1, the technical name of the underlying image generation model, is an autoregressive model. While most image generation models are diffusion-based to reduce the amount of compute needed to train and generate from such models, gpt-image-1 works by generating tokens in the same way that ChatGPT generates the next token, then decoding them into an image. It’s extremely slow at about 30 seconds to generate each image at the highest quality (the default in ChatGPT), but it’s hard for most people to argue with free.

In August 2025, a new mysterious text-to-image model appeared on LMArena: a model code-named “nano-banana”. This model was eventually publically released by Google as Gemini 2.5 Flash Image, an image generation model that works natively with their Gemini 2.5 Flash model. Unlike Imagen 4, it is indeed autoregressive, generating 1,290 tokens per image. After Nano Banana’s popularity pushed the Gemini app to the top of the mobile App Stores, Google eventually made Nano Banana the colloquial name for the model as it’s definitely more catchy than “Gemini 2.5 Flash Image”.

#154 /

15 novembre 2025

20:57

/ #ai #google #openai

"I was wrong"

Meglio di "You're absolutely right", probabilmente.

(Claude Code)

#141 /

11 novembre 2025

16:21

/ #ai #anthropic #claude

Magika 1.0

Scrivevo un anno e mezzo fa:

In uno dei suoi tremila blog ieri Google ha annunciato anche un nuovo interessante progetto open source chiamato Magika. Serve a identificare il tipo di un file in automatico e si basa su un modello deep learning molto piccolo e molto efficiente, con tempi di inferenza di pochi millisecondi anche su CPU.

Finora il riconoscimento del tipo di un file era basato sul suo nome (es. estensione .pdf) o sull'analisi dei "magic byte", delle sequenze binarie presenti all'inizio dei file che in molti casi ne permettono l'identificazione. Magika è però di gran lunga superiore rispetto a queste tecniche, con le metriche precision, recall e F1 che superano il 99% e per alcuni tipi di file raggiungono il 100%.

Magika si può usare facilmente con Python o JavaScript, infatti la demo ufficiale funziona nel browser: https://google.github.io/magika/

Ora Magika ha raggiunto la 1.0:

Today, we are happy to announce the release of Magika 1.0, a first stable version that introduces new features and a host of major improvements since last announcement. Here are the highlights:

Expanded file type support for more than 200 types (up from ~100). -A brand-new, high-performance engine rewritten from the ground up in Rust.

A native Rust command-line client for maximum speed and security.

Improved accuracy for challenging text-based formats like code and configuration files.

A revamped Magika Python and TypeScript module for even easier integrations.

Prestazioni notevoli:

Magika is able to identify hundreds of files per second on a single core and easily scale to thousands per second on modern multi-core CPUs thanks to the use of the high-performance ONNX Runtime for model inference and Tokio for asynchronous parallel processing, For example, as visible in the chart below, on a MacBook Pro (M4), Magika processes nearly 1,000 files per second.

#137 /

9 novembre 2025

20:26

/ #ai #google #open-source

Uno spot Coca Cola mostra i limiti dell'AI generativa nei video:

Palesemente non una buona idea, eppure.

#136 /

9 novembre 2025