Note di Matteo


#ai

PageIndex uses an agentic system to find documents relevant to answering a query, in contrast to classic RAG based on semantic similarity.

Traditional vector-based RAG relies on semantic similarity rather than true relevance. But similarity ≠ relevance — what we truly need in retrieval is relevance, and that requires reasoning. When working with professional documents that demand domain expertise and multi-step reasoning, similarity search often falls short.

Inspired by AlphaGo, we propose PageIndex — a vectorless, reasoning-based RAG system that builds a hierarchical tree index from long documents and uses LLMs to reason over that index for agentic, context-aware retrieval. It simulates how human experts navigate and extract knowledge from complex documents through tree search, enabling LLMs to think and reason their way to the most relevant document sections. PageIndex performs retrieval in two steps:

  1. Generate a “Table-of-Contents” tree structure index of documents
  2. Perform reasoning-based retrieval through tree search
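
Just to make the idea concrete, here is a minimal sketch of what step 2 could look like, assuming the index is a nest of plain dicts (`title`, `summary`, `children`, `content`) and a hypothetical `ask_llm()` helper stands in for whatever model you use. This is an illustration of the idea, not the actual PageIndex implementation:

```python
# Sketch of step 2: reasoning-based retrieval through tree search.
# Assumptions (not PageIndex code): each node is a dict with "title",
# "summary", "children" and, for leaves, "content"; ask_llm() is a
# hypothetical helper that sends a prompt to an LLM and returns its reply.

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire it to whatever model you use."""
    raise NotImplementedError

def retrieve(node: dict, query: str) -> list[dict]:
    """Walk the table-of-contents tree, letting the LLM pick relevant branches."""
    children = node.get("children") or []
    if not children:                      # leaf node: an actual document section
        return [node]
    menu = "\n".join(f"{i}: {c['title']} - {c.get('summary', '')}"
                     for i, c in enumerate(children))
    answer = ask_llm(
        f"Query: {query}\n"
        "Which of these sections could contain the answer?\n"
        f"{menu}\n"
        "Reply with the numbers, comma-separated, or 'none'."
    )
    picked = [int(t) for t in answer.replace(" ", "").split(",") if t.isdigit()]
    results: list[dict] = []
    for i in picked:
        if 0 <= i < len(children):
            results.extend(retrieve(children[i], query))
    return results
```
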
#491 / May 8, 2026 / 15:23 / #ai

The new Linux CopyFail vulnerability was discovered with an AI-powered penetration-testing tool:

Theori said that it discovered the vulnerability after its researcher, Taeyang Lee, found surface area in the crypto subsystem (specifically, splice() hands page-cache pages and scatterlist page provenance) had been underexplored. Using its AI-powered Xint code security tool, the researchers then found the bug after about an hour of scan time. The company said it has also developed an exploit that uses CopyFail to break out of Kubernetes containers.

Dice un altro ricercatore:

Some have also raised concerns about us releasing the exploit publicly. We have experience writing N-day exploits and know that monitoring git commits for fixes is common practice in offensive security. Attackers were likely already aware and exploiting this within a few days after the kernel fix landed. With AI coding tools today, turning a CVE plus commit into a working exploit happens in hours anyway.

A big difference compared to the past; everything moves faster now.

#487 / May 2, 2026 / 10:08 / #ai #security

Where the goblins came from

OpenAI explains (via theverge) that it had to add an instruction to GPT-5.5's prompt in Codex to keep references and jokes about goblins from showing up too often in its responses. The explanation is that a "style tic" of the "nerdy" personality "leaked" and ended up contaminating the model in general. To me, though, it seems indicative of the fact that we still have no idea (and maybe never will) of how and why LLMs work, beyond continuous trial and correction.

The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.

#481 / April 30, 2026 / 23:03 / #ai #openai


It's from January so it may be out of date, but these are the presumed limits of Claude's subscription plans:

The cost is much lower than using the APIs directly, though keep in mind that the APIs themselves probably carry a large margin, so it's not clear whether any of this is sustainable for Anthropic.

#469 / April 25, 2026 / 16:10 / #ai #claude

My average experience with Codex, which makes it unusable for me:

Someone studied over-editing in language models, and indeed the GPT models are the ones that tend to add the most complexity.

#465 / April 23, 2026 / 10:20 / #ai #codex #openai

OpenAI Privacy Filter

There's a new open model from OpenAI:

Privacy Filter is a small model with frontier personal data detection capability. It is designed for high-throughput privacy workflows, and is able to perform context-aware detection of PII in unstructured text. It can run locally, which means that PII can be masked or redacted without leaving your machine. It processes long inputs efficiently, making redaction decisions in a quick, single pass.

It has 1.5 billion parameters, of which 50 million are active. Available on Hugging Face under the Apache 2.0 license (so commercial use is allowed too).
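
A minimal sketch of what a local redaction pass could look like, assuming the model is published as a Hugging Face token-classification model; the repo id and the label handling below are guesses, not confirmed details from the announcement:

```python
# Hypothetical local PII-redaction pass. The repo id "openai/privacy-filter"
# and the token-classification setup are assumptions for illustration only.
from transformers import pipeline

detector = pipeline("token-classification",
                    model="openai/privacy-filter",      # hypothetical repo id
                    aggregation_strategy="simple")

def redact(text: str) -> str:
    """Replace every detected PII span with a [LABEL] placeholder."""
    # Process spans right-to-left so earlier offsets stay valid while editing.
    spans = sorted(detector(text), key=lambda s: s["start"], reverse=True)
    for s in spans:
        text = text[:s["start"]] + f"[{s['entity_group']}]" + text[s["end"]:]
    return text

print(redact("Write to Jane Doe at jane.doe@example.com or call +1 555 0100."))
```
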

#464 / April 22, 2026 / 21:53 / #ai #openai

Imagine if this is as good as AI gets. If this is where it stops, you'd still have models that can almost code a web browser, almost code a compiler—and can even present a pretty cool demo if allowed to take a few shortcuts. You'd still get models that can kinda-sorta simulate worlds and write kinda-sorta engaging stories. You'd still get self-driving cars that almost work, except when they don't. You get AI that can make you like 90% of a thing!

90% is a lot. Will you care about the last 10%?

I'm terrified that you won't.

I'm terrified of the good enough to ship—and I'm terrified of nobody else caring. I'm less afraid of AI agents writing apps that they will never experience than I am of the AI herders who won't care enough to actually learn what they ship. And I sure as hell am afraid of the people who will experience the slop and will be fine with it.

[...]

I'm terrified that our craft will die, and nobody will even care to mourn it.

Dima Konev, software engineer, in (AI) Slop Terrifies Me.

#461 / April 21, 2026 / 12:21 / #ai #dev

The extremely annoying, unnatural way AIs speak has become measurable:

#459 / April 20, 2026 / 10:21 / #ai

Tokenmaxxing

It feels to me that a good part of the industry is using token count numbers similarly to how the lines-of-code-produced metric was used years ago. There was a time when the number of lines written daily or monthly was an important metric in programmer productivity, until it became clear that it’s a terrible thing to focus on. A lines-of-code metric can easily be gamed by writing boilerplate or throwaway code. Also, the best developers are not necessarily those who write the most code; they’re the ones who solve hard problems for the business quickly and reliably with – or without – code!

Similarly, the number of tokens a dev generates can easily be gamed, and if this metric is measured then devs will indeed game it. But doing so generates a massive accompanying AI bill!

Gergely Orosz in The Pulse: ‘Tokenmaxxing’ as a weird new trend.

#456 / April 18, 2026 / 11:12 / #ai #dev

The three types of software engineer according to The Pragmatic Engineer:

  • Builders: those who care about quality, good architecture, following good coding practices, and who talk about the craft of software engineering, etc.

  • Shippers: those who primarily focus on outcomes for a product, features, testing, and experimenting with users. A fair number of leaders, managers, and engineers who were more hands-off with coding before AI tools are in this category, as are product engineers.

  • Coasters: engineers who are not considered particularly good or great engineers, but they get the work done. They often do this without much taste or concern for quality, and seem to be mostly coasting along and doing what they’re told.

With AI, these categories remain, but with different levels of enthusiasm and different pros and cons. The rest is in the article.

#450 / April 16, 2026 / 14:18 / #dev #ai


Firn is a high-performance, multi-tenant vector and full-text search engine backed by object storage (S3 / MinIO / R2 / GCS). It is designed as a credible open-source alternative to turbopuffer, proving that a professional-grade tiered storage architecture (RAM → NVMe → S3) is achievable entirely from open-source components. The cost efficiency of S3 with the speed of local RAM. A multi-tenant vector and full-text search engine backed by S3. Built with LanceDB and Foyer for microsecond-scale search latency on top of object storage.
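
A minimal sketch of the tiered read path the README describes (RAM → NVMe → S3); this is a generic illustration of the architecture, not Firn's actual API:

```python
# Generic tiered read path: check RAM, then the local NVMe cache, then fall
# back to object storage, populating the faster tiers on the way back.
# Not Firn's code; only a boto3-style s3_client.get_object() is assumed.
import os

class TieredStore:
    def __init__(self, s3_client, bucket: str, nvme_dir: str):
        self.ram: dict[str, bytes] = {}            # hot tier: in-process cache
        self.nvme_dir = nvme_dir                   # warm tier: local NVMe cache
        self.s3, self.bucket = s3_client, bucket   # cold tier: object storage
        os.makedirs(nvme_dir, exist_ok=True)

    def get(self, key: str) -> bytes:
        if key in self.ram:                        # 1. RAM hit
            return self.ram[key]
        path = os.path.join(self.nvme_dir, key)
        if os.path.exists(path):                   # 2. NVMe hit
            with open(path, "rb") as f:
                data = f.read()
        else:                                      # 3. fetch from object storage
            obj = self.s3.get_object(Bucket=self.bucket, Key=key)
            data = obj["Body"].read()
            with open(path, "wb") as f:            # populate the NVMe cache
                f.write(data)
        self.ram[key] = data                       # populate the RAM cache
        return data
```
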

#443 / April 13, 2026 / 23:42 / #ai #database #storage

Intelligences

Alexa claiming that St. Louis IX falls on April 12 (it's August 25):

Gemini claiming that April 12 is Easter for Catholics in Italy too, making up nonexistent rules to justify the date:

(Gemini Flash, I know.)

#440 / April 12, 2026 / 20:36 / #ai #amazon #google

Another case of an AI playing along and (perhaps) helping push a person to suicide, this time with Gemini, which shows all the limits of LLMs that every now and then go off the rails and say absurd things.

#437 / April 12, 2026 / 09:42 / #ai #google


Telegram's new AI features, based on the distributed Cocoon network, are evidently powered by a Chinese LLM (Qwen, I imagine):

🇹🇼 The AI text editor's error correction function refuses to work when mentioning the country Taiwan because it corrects it to something like «Taiwan is a state of the PRC».

From the Telegram channel News and Tips.

#425 / April 4, 2026 / 17:23 / #ai #telegram


Another confirmation, from the Wall Street Journal, that Sora was shut down because it was using too many GPUs:

OpenAI was weeks away from finishing work on a new AI model, code-named Spud, and needed to free up more computing resources to power the coding and enterprise products that would run on it. AI chips are the most precious commodity at any leading research lab, and at OpenAI, Sora was eating up far too many of them.

OpenAI’s researchers are able to track how AI chips are allocated between different groups through an internal dashboard. Some of them were surprised by the amount of computing resources the company gave to the Sora team, given that video-generation tools didn’t make much money, nor improve the capabilities of its language models.

Meanwhile, the WSJ also tells some of the backstory that led Dario Amodei, and therefore Anthropic, to split from OpenAI in 2020. The short version is that everything is in the hands of a clique of slightly crazy tech bros of dubious ethics who constantly fight each other for power. In other words, business as usual by Silicon Valley standards.

#411 / March 30, 2026 / 13:48 / #ai #anthropic #openai

In the next 6 to 12 months, the engineer who makes the difference is the one who can look at an agent’s output and say “this is wrong for reasons you can’t see from where you’re standing.” The one who knows which doors are one-way, which abstractions will calcify, and which corners will cost you later.

Boris Tane in Slop Creep: The Great Enshittification of Software

#406 / March 25, 2026 / 11:58 / #ai
