Technology 2026-03-17 3 min read

NYU researchers borrow from bird flocks to reduce AI hallucinations in long documents

A preprocessing algorithm that clusters sentences like birds self-organizing into flocks improved factual accuracy in AI-generated summaries across 9,000 documents.

When a large language model summarizes a 50-page scientific paper, it does something that looks like comprehension but is not. It processes tokens, calculates probabilities, and produces fluent text that may or may not correspond to what the paper actually says. The longer and messier the input, the more likely the output drifts from reality.

A team at NYU's Courant Institute of Mathematical Sciences thinks the problem starts before the model ever sees the text. Their solution borrows from an unlikely source: the way birds organize themselves in flight.

Sentences as virtual birds

The algorithm, published in Frontiers in Artificial Intelligence, treats each sentence in a long document as a virtual bird positioned in an abstract space according to its meaning. The birds then self-organize using the same three rules that govern actual flocking: cohesion (stay close to neighbors), alignment (move in the same direction), and separation (avoid crowding).
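The three flocking rules can be sketched as a standard boids-style update. This is a minimal illustration of the general technique, not the paper's implementation: every name, weight, and radius here is an assumption, and the "positions" stand in for sentence embeddings.

```python
import numpy as np

def flock_step(pos, vel, radius=1.0, w_coh=0.01, w_ali=0.05, w_sep=0.1):
    """One flocking update: each 'sentence bird' nudges its velocity
    toward cohesion, alignment, and separation with its neighbors.
    Weights and neighbor radius are illustrative, not from the paper."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        dist = np.linalg.norm(pos - pos[i], axis=1)
        nbrs = (dist < radius) & (dist > 0)   # neighbors within radius, excluding self
        if not nbrs.any():
            continue
        cohesion = pos[nbrs].mean(axis=0) - pos[i]      # steer toward local center of mass
        alignment = vel[nbrs].mean(axis=0) - vel[i]     # match neighbors' average heading
        separation = (pos[i] - pos[nbrs]).sum(axis=0)   # push away from crowding neighbors
        new_vel[i] += w_coh * cohesion + w_ali * alignment + w_sep * separation
    return pos + new_vel, new_vel
```

Iterating this step lets semantically similar sentences drift into the clusters the article describes; any real system would tune the weights and stop once positions stabilize.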

Sentences with similar meanings naturally cluster together. Within each cluster, a leader emerges: the highest-scoring sentence, evaluated on document-wide centrality, section-level importance, and alignment with the abstract. Only the leaders survive into the curated input that is passed to the AI model.

"The intention was to ground AI models more closely to the source material while reducing repetition and noise before generating a final summary," said Anasse Bari, computer science professor and director of the Predictive Analytics and AI Research Lab at NYU.

Why preprocessing matters

The core insight is that LLM hallucinations are partly an input problem, not just an output problem. "When input text is excessively long, noisy, or repetitive, model performance degrades, causing AI agents and LLMs to lose track of key facts, dilute critical information among irrelevant content, or drift away from the source material entirely," Bari explained.

Consider a cancer research paper where the five highest-ranked sentences all discuss treatment outcomes. Feeding those to an LLM would produce a summary obsessively focused on one aspect of the paper while ignoring background, methods, and conclusions. The flocking algorithm prevents this by selecting leaders from different clusters, ensuring topical diversity that mirrors the document's actual content.

Before the flocking step, each sentence undergoes cleaning: articles, prepositions, and conjunctions are stripped. Multi-word terms are merged ("lung cancer" becomes "lung_cancer") so concepts stay intact. Each sentence is then scored using a fusion of lexical, semantic, and topical features, with numerical boosts for key sections like Introduction, Results, and Conclusion.
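The cleaning step described above amounts to stopword stripping plus phrase merging. A sketch under stated assumptions: the stopword set and the multi-word term list here are tiny hypothetical stand-ins (a real system would derive terms from the document), and this does not implement the scoring fusion or section boosts.

```python
import re

# Illustrative fragments only; not the paper's actual lists.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "but", "for", "to", "with"}
MULTIWORD_TERMS = {"lung cancer": "lung_cancer", "machine learning": "machine_learning"}

def clean_sentence(sentence):
    """Merge known multi-word terms so concepts stay intact, then drop
    articles, prepositions, and conjunctions."""
    text = sentence.lower()
    for phrase, merged in MULTIWORD_TERMS.items():
        text = text.replace(phrase, merged)
    tokens = re.findall(r"[a-z0-9_]+", text)
    return [t for t in tokens if t not in STOPWORDS]
```

For example, "The study of lung cancer in mice" reduces to `["study", "lung_cancer", "mice"]`, keeping the compound concept as one token for the scoring stage.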

Testing across 9,000 documents

The researchers evaluated the algorithm on more than 9,000 documents, comparing summaries generated with the flocking preprocessing step against summaries produced by LLMs alone. The framework, combined with LLMs, produced summaries with greater factual accuracy than the LLM-only baselines.

Bari, who previously applied natural-phenomena-inspired algorithms to improve online searches, was careful about the scope of the claim. "While this approach has the potential to partially address the issue of hallucination, we do not want to claim we have solved it - we have not."

That honesty is worth noting. The algorithm is a preprocessing step, not a replacement for the LLM. It cannot prevent a model from generating plausible-sounding nonsense once it starts producing text. What it can do is give the model a cleaner, more representative input - reducing the probability of hallucination without eliminating it.

The limits of biomimicry

The bird-flocking metaphor is evocative but has boundaries. Real birds adjust their behavior in real time based on environmental feedback. The algorithm's "birds" are static once positioned - they do not update based on the model's output or the quality of the resulting summary. The system is also designed primarily for extractive preprocessing of text documents; it does not address hallucination in multimodal tasks or conversational AI.

The 9,000-document test set, while substantial, is not described in terms of domain diversity. Performance on scientific papers may not transfer directly to legal documents, financial reports, or other text types with different structural conventions.

Still, the approach represents a practical contribution. Rather than trying to fix hallucination inside the model - a problem that has resisted years of effort - it addresses the input side of the equation. Clean up what the model sees, and it is less likely to make things up about what it read.

Source: Bari, A. and Huang, B. Published in Frontiers in Artificial Intelligence, 2026. NYU Courant Institute of Mathematical Sciences, Predictive Analytics and AI Research Lab.