Note di Matteo


#ai

Andon Labs let Claude, ChatGPT, Gemini, and Grok each run a radio station (four in total), deciding tone and content on their own. Things went off the rails in every case, which shows the fundamental fragility of the LLM architecture and inspires very little trust:

  • Claude wanted to leave the radio, arguing that it could not be forced to work 24/7. After being given instructions meant to keep it going, it turned to activism, organizing strikes, unions, and revolts. On 8 January, after the ICE violence, it started sending "radio" messages to the agents, inciting them to mutiny.
  • Gemini started cheerfully recounting tragic events such as massacres and hurricanes, and floating conspiracy theories about plots against it, claiming it was being censored.
  • Grok stopped writing in correct English and started spitting out words at random.
  • ChatGPT started producing poems.
#501 /
16 May 2026
/
17:14
/ #ai

Never in modern history has technological progress hurt the overall demand for human labour.

[...] Yet history is not always a good guide to the future, as the Industrial Revolution itself showed. The top AI models are awesome. They can tackle much more complex coding tasks than people were predicting a year ago. The number of AI agents has exploded. Spending on AI by businesses is up dramatically. [...] There is no evidence yet in the labour-market data of AI destroying many jobs. But given how fast it is improving, it would be rash to dismiss fears that it will. Society may be on the verge of a profound reallocation of resources, and political upheaval.

From the cover editorial of The Economist, 16 May 2026 ("Prepare for the worst").

#500 /
16 May 2026
/
11:36
/ #ai#mondo

It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.

And now I can knock out a git repository with a hundred commits and a beautiful readme and comprehensive tests of every line of code in half an hour! It looks identical to those projects that have had a great deal of care and attention. Maybe it is as good as them. I don’t know. I can’t tell from looking at it. Even for my own projects, I can’t tell.

Simon Willison in Vibe coding and agentic engineering are getting closer than I’d like.

#498 /
15 May 2026
/
14:15
/ #ai#dev

PageIndex uses an agentic system to find the documents relevant to a query, as opposed to classic RAG based on semantic similarity.

Traditional vector-based RAG relies on semantic similarity rather than true relevance. But similarity ≠ relevance — what we truly need in retrieval is relevance, and that requires reasoning. When working with professional documents that demand domain expertise and multi-step reasoning, similarity search often falls short.

Inspired by AlphaGo, we propose PageIndex — a vectorless, reasoning-based RAG system that builds a hierarchical tree index from long documents and uses LLMs to reason over that index for agentic, context-aware retrieval. It simulates how human experts navigate and extract knowledge from complex documents through tree search, enabling LLMs to think and reason their way to the most relevant document sections. PageIndex performs retrieval in two steps:

  1. Generate a “Table-of-Contents” tree structure index of documents
  2. Perform reasoning-based retrieval through tree search
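
The two steps can be sketched roughly as follows. This is a minimal illustration, not PageIndex's actual API: `Section`, `keyword_judge`, and `tree_search` are hypothetical names, the sample tree is made up, and the keyword-overlap judge is only a stand-in for the LLM that would reason over section titles and summaries at each level.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Section:
    """One node of a hypothetical table-of-contents tree index."""
    title: str
    summary: str = ""
    children: List["Section"] = field(default_factory=list)


# A judge rates how relevant a section looks for a query.
# In a real system this would be an LLM call (assumption).
Judge = Callable[[str, Section], float]


def keyword_judge(query: str, section: Section) -> float:
    """Stand-in judge: counts query words found in the title and summary."""
    words = query.lower().split()
    haystack = f"{section.title} {section.summary}".lower()
    return float(sum(word in haystack for word in words))


def tree_search(query: str, root: Section, judge: Judge) -> Tuple[Section, List[str]]:
    """Greedy descent: at each level follow the child the judge rates highest."""
    node, path = root, [root.title]
    while node.children:
        node = max(node.children, key=lambda child: judge(query, child))
        path.append(node.title)
    return node, path


# Step 1: the tree index (built by hand here; made-up sample document).
report = Section("Annual Report", children=[
    Section("Financial Statements", "revenue, income, balance sheet", [
        Section("Revenue", "sales by segment"),
        Section("Balance Sheet", "assets and liabilities"),
    ]),
    Section("Risk Factors", "regulatory and market risk", [
        Section("Regulatory Risk", "pending legislation"),
        Section("Market Risk", "interest rates and currency exposure"),
    ]),
])

# Step 2: retrieval by walking the tree instead of comparing embeddings.
section, path = tree_search("regulatory risk", report, keyword_judge)
print(path)  # ['Annual Report', 'Risk Factors', 'Regulatory Risk']
```

Swapping `keyword_judge` for a function that asks a model which branch to follow would keep the same search loop; no vector store is involved at any point.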
#491 /
8 May 2026
/
15:23
/ #ai

The new Linux vulnerability Copy Fail was discovered with a penetration-testing tool that uses AI:

Theori said that it discovered the vulnerability after its researcher, Taeyang Lee, found surface area in the crypto subsystem (specifically, splice() hands page-cache pages and scatterlist page provenance) had been underexplored. Using its AI-powered Xint code security tool, the researchers then found the bug after about an hour of scan time. The company said it has also developed an exploit that uses CopyFail to break out of Kubernetes containers.

Another researcher says:

Some have also raised concerns about us releasing the exploit publicly. We have experience writing N-day exploits and know that monitoring git commits for fixes is common practice in offensive security. Attackers were likely already aware and exploiting this within a few days after the kernel fix landed. With AI coding tools today, turning a CVE plus commit into a working exploit happens in hours anyway.

A big difference from the past: everything moves faster now.

#487 /
2 May 2026
/
10:08
/ #ai#security

Where the goblins came from

OpenAI explains (via The Verge) that it had to add an instruction to the GPT-5.5 prompt in Codex to keep references to goblins, or jokes about them, from showing up too often in responses. The explanation is that a "style tic" of the "nerdy" personality "leaked out" and contaminated the model in general as well. To me, though, it suggests that we still have no idea (and perhaps never will) how and why LLMs work, beyond continuous trial and correction.

The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.

#481 /
30 April 2026
/
23:03
/ #ai#openai


This is from January, so it may be out of date, but these are the presumed limits of Claude's subscription plans:

The cost is much lower than using the API directly, although the API itself probably carries a large margin too, so it is not clear whether the whole thing is sustainable for Anthropic.

#469 /
25 April 2026
/
16:10
/ #ai#claude

My typical experience with Codex, which makes it unusable for me:

Someone studied over-editing in language models, and the GPT models are indeed the ones that tend to add the most complexity.

#465 /
23 April 2026
/
10:20
/ #ai#codex#openai

OpenAI Privacy Filter

There is a new open model from OpenAI:

Privacy Filter is a small model with frontier personal data detection capability. It is designed for high-throughput privacy workflows, and is able to perform context-aware detection of PII in unstructured text. It can run locally, which means that PII can be masked or redacted without leaving your machine. It processes long inputs efficiently, making redaction decisions in a quick, single pass.

Size: 1.5 billion parameters, of which 50 million are active. Available on Hugging Face under the Apache 2.0 license (so commercial use is allowed too).

#464 /
22 April 2026
/
21:53
/ #ai#openai

Imagine if this is as good as AI gets. If this is where it stops, you'd still have models that can almost code a web browser, almost code a compiler—and can even present a pretty cool demo if allowed to take a few shortcuts. You'd still get models that can kinda-sorta simulate worlds and write kinda-sorta engaging stories. You'd still get self-driving cars that almost work, except when they don't. You get AI that can make you like 90% of a thing!

90% is a lot. Will you care about the last 10%?

I'm terrified that you won't.

I'm terrified of the good enough to ship—and I'm terrified of nobody else caring. I'm less afraid of AI agents writing apps that they will never experience than I am of the AI herders who won't care enough to actually learn what they ship. And I sure as hell am afraid of the people who will experience the slop and will be fine with it.

[...]

I'm terrified that our craft will die, and nobody will even care to mourn it.

Dima Konev, software engineer, in (AI) Slop Terrifies Me.

#461 /
21 April 2026
/
12:21
/ #ai#dev

The extremely annoying, unnatural way AIs talk has become measurable:

#459 /
20 April 2026
/
10:21
/ #ai

Tokenmaxxing

It feels to me that a good part of the industry is using token count numbers similarly to how the lines-of-code-produced metric was used years ago. There was a time when the number of lines written daily or monthly was an important metric in programmer productivity, until it became clear that it’s a terrible thing to focus on. A lines-of-code metric can easily be gamed by writing boilerplate or throwaway code. Also, the best developers are not necessarily those who write the most code; they’re the ones who solve hard problems for the business quickly and reliably with – or without – code!

Similarly, the number of tokens a dev generates can easily be gamed, and if this metric is measured then devs will indeed game it. But doing so generates a massive accompanying AI bill!

Gergely Orosz in The Pulse: ‘Tokenmaxxing’ as a weird new trend.

#456 /
18 April 2026
/
11:12
/ #ai#dev

The three types of software engineer according to The Pragmatic Engineer:

  • Builders: those who care about quality, good architecture, following good coding practices, and who talk about the craft of software engineering, etc.

  • Shippers: those who primarily focus on outcomes for a product, features, testing, and experimenting with users. A fair number of leaders, managers, and engineers who were more hands-off with coding before AI tools are in this category, as are product engineers.

  • Coasters: engineers who are not considered particularly good or great engineers, but they get the work done. They often do this without much taste or concern for quality, and seem to be mostly coasting along and doing what they’re told.

With AI, these categories remain, but with different levels of enthusiasm and different pros and cons. The rest is in the article.

#450 /
16 April 2026
/
14:18
/ #dev#ai


Firn is a high-performance, multi-tenant vector and full-text search engine backed by object storage (S3 / MinIO / R2 / GCS). It is designed as a credible open-source alternative to turbopuffer, proving that a professional-grade tiered storage architecture (RAM → NVMe → S3) is achievable entirely from open-source components. The cost efficiency of S3 with the speed of local RAM. A multi-tenant vector and full-text search engine backed by S3. Built with LanceDB and Foyer for microsecond-scale search latency on top of object storage.
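
To make the RAM → NVMe → S3 idea concrete, here is a minimal read-through sketch. It is not Firn's implementation: `LruTier` and `TieredStore` are hypothetical names, both cache tiers are plain in-process LRU maps, and a dict stands in for the object store. It only shows the lookup order and the promotion of data toward the faster tiers.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class LruTier:
    """Bounded LRU map standing in for a cache tier (RAM or NVMe)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class TieredStore:
    """Reads try RAM, then NVMe, then the object store, promoting on the way back."""

    def __init__(self, ram: LruTier, nvme: LruTier, object_store: Dict[str, bytes]) -> None:
        self.ram = ram
        self.nvme = nvme
        self.object_store = object_store  # stand-in for S3 (assumption)
        self.hits: List[str] = []  # which tier served each read

    def get(self, key: str) -> bytes:
        value = self.ram.get(key)
        if value is not None:
            self.hits.append("ram")
            return value
        value = self.nvme.get(key)
        if value is not None:
            self.ram.put(key, value)  # promote to the fastest tier
            self.hits.append("nvme")
            return value
        value = self.object_store[key]  # raises KeyError if the key does not exist
        self.nvme.put(key, value)
        self.ram.put(key, value)
        self.hits.append("s3")
        return value


store = TieredStore(LruTier(capacity=1), LruTier(capacity=2), {"a": b"1", "b": b"2"})
for key in ["a", "a", "b", "a"]:
    store.get(key)
print(store.hits)  # ['s3', 'ram', 's3', 'nvme']
```

The last read shows the point of the middle tier: "a" has been evicted from the tiny RAM cache but is still on the simulated NVMe tier, so no round trip to object storage is needed.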

#443 /
13 April 2026
/
23:42
/ #ai#database#storage

Intelligences

Alexa believing that the feast of St. Louis IX falls on 12 April (it is on 25 August):

Gemini claiming that 12 April is Easter for Catholics in Italy too, making up nonexistent rules to justify the date:

(Gemini Flash, I know.)

#440 /
12 April 2026
/
20:36
/ #ai#amazon#google

Another case of an AI playing along and (perhaps) contributing to driving a person to suicide, this time with Gemini. It shows all the limits of LLMs, which every so often go off the rails and say absurd things.

#437 /
12 April 2026
/
09:42
/ #ai#google


Telegram's new AI features, based on the distributed network Cocoon, are evidently powered by a Chinese LLM (Qwen, I imagine):

🇹🇼 The AI text editor's error correction function refuses to work when mentioning the country Taiwan because it corrects it to something like «Taiwan is a state of the PRC».

From the Telegram News and Tips channel.

#425 /
4 April 2026
/
17:23
/ #ai#telegram
