Diffbot's AI model doesn't guess – it knows, thanks to a trillion-fact knowledge graph




Diffbot, a small Silicon Valley company best known for maintaining one of the world's largest indexes of web knowledge, announced today the release of a new AI model that promises to address one of the biggest challenges in the field: factual accuracy.

The new model, a fine-tuned version of Meta's Llama 3.3, is the first open-source implementation of a system known as graph retrieval-augmented generation, or GraphRAG.

Unlike conventional AI models, which rely entirely on large amounts of preloaded training data, Diffbot's LLM draws on real-time information from the company's Knowledge Graph, a regularly updated database containing more than a trillion interconnected facts.

“We have a thesis: that eventually the general logic will boil down to about 1 billion parameters,” said Mike Tung, founder and CEO of Diffbot, in an interview with VentureBeat. “You don't really want the knowledge in the model. You want the model to be good at just using tools so it can query knowledge outside.”

How it works

Diffbot's Knowledge Graph is a sprawling, automated database that has been crawling the public web since 2016. It classifies web pages into categories such as people, companies, products and articles, extracting structured information using a combination of computer vision and natural language processing.

Every four to five days, the Knowledge Graph is updated with millions of new facts, ensuring that it remains up to date. Diffbot's AI model uses this resource by querying the graph in real time to retrieve information, rather than relying on static knowledge encoded in its training data.

For example, when asked about a recent news event, the model can search the web for the latest updates, extract relevant facts, and cite the original sources. This process is designed to make the system more accurate and transparent than traditional LLMs.
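The retrieve-then-cite flow described above can be illustrated with a minimal sketch. It is not Diffbot's code or API: the `Fact` type, the `TOY_GRAPH` list, the `retrieve` and `build_prompt` helpers and the example.com URLs are all hypothetical stand-ins, and the in-memory list takes the place of a live knowledge graph query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One edge of a toy knowledge graph, plus where it came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical URL the fact was extracted from


# Hypothetical in-memory stand-in for a knowledge graph.
# A real system would query a remote, continuously updated graph instead.
TOY_GRAPH = [
    Fact("Diffbot", "founder", "Mike Tung", "https://example.com/diffbot-about"),
    Fact("Diffbot", "maintains", "Knowledge Graph", "https://example.com/diffbot-kg"),
    Fact("Llama 3.3", "developer", "Meta", "https://example.com/llama"),
]


def retrieve(question: str, graph: list[Fact]) -> list[Fact]:
    """Return facts whose subject is mentioned in the question (case-insensitive)."""
    q = question.lower()
    return [fact for fact in graph if fact.subject.lower() in q]


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Put the retrieved facts, numbered and sourced, in front of the question."""
    lines = [
        f"[{i}] {f.subject} {f.predicate} {f.obj} (source: {f.source})"
        for i, f in enumerate(facts, start=1)
    ]
    context = "\n".join(lines) if lines else "(no facts found)"
    return (
        "Answer using only the numbered facts and cite them by number.\n\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    question = "Who founded Diffbot?"
    print(build_prompt(question, retrieve(question, TOY_GRAPH)))
```

The point of the design is that the facts and their sources live outside the model, so they can be updated or audited without retraining; the model only has to read the numbered context and cite it.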

“Imagine asking an AI about the weather,” Tung said. “Instead of generating an answer based on outdated training data, our model queries a live weather service and provides an answer based on real-time information.”
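Tung's weather example is a case of tool use: the model emits a structured request, and the surrounding program runs it and hands back the result. A rough sketch of that dispatch step follows, assuming a JSON call format with `name` and `arguments` fields. The format, the `get_weather` function and its canned return values are all invented for illustration and do not reflect Diffbot's actual interface.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical stand-in for a live weather service; returns canned data."""
    return {"city": city, "temp_c": 18, "conditions": "cloudy"}


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_call(call_json: str) -> dict:
    """Parse a model-emitted tool call and execute the matching function."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    # In a real system this string would come from the model's output.
    model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
    print(run_tool_call(model_output))
```

The tool's result would then be fed back to the model, which writes its answer from that data rather than from what it memorized during training.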

How Diffbot's Knowledge Graph Beats Traditional AI in Finding Facts

In benchmark tests, Diffbot's approach seems to be paying off. The company reports that its model achieves an 81% accuracy score on FreshQA, a benchmark created by Google for testing real-time factual knowledge, surpassing both ChatGPT and Gemini. It also scored 70.36% on MMLU-Pro, a more difficult version of a standardized test of academic knowledge.

Perhaps most notably, Diffbot makes its model completely open, allowing companies to run it on their own hardware and customize it for their needs. This addresses growing concerns about data privacy and vendor lock-in with major AI providers.

“You can run it locally on your machine,” Tung said. “There's no way you can run Google Gemini without sending your data over to Google and putting it outside of your building.”

Open source AI could revolutionize how enterprises handle sensitive data

The release comes at a pivotal moment in AI development. In recent months, criticism has grown over the tendency of large language models to “hallucinate,” or generate false information, even as companies continue to increase model sizes. Diffbot's approach suggests another way forward, one focused on grounding AI systems in verifiable facts rather than trying to encode all human knowledge in neural networks.

“Not everyone is going after just bigger and bigger models,” Tung said. “You can have a model that has more capability than a big model with kind of a non-intuitive approach like ours.”

Industry experts note that Diffbot's Knowledge Graph-based approach could be particularly valuable for enterprise applications where accuracy and auditability are critical. The company already provides data services to major companies including Cisco, DuckDuckGo and Snapchat.

The model is immediately available through an open source release on GitHub and can be tested through a public demo at diffy.chat. For organizations looking to deploy it internally, Diffbot says the smaller 8-billion-parameter version can run on a single Nvidia A100 GPU, while the full 70-billion-parameter version requires two H100 GPUs.
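A back-of-the-envelope calculation shows why those hardware figures are plausible. The sketch below assumes 16-bit weights (2 bytes per parameter) and 80 GB of memory per GPU. Neither number comes from Diffbot's announcement, and the estimate ignores the KV cache and activation overhead, so real requirements are somewhat higher.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billions * bytes_per_param


def weights_fit(params_billions: float, num_gpus: int, gb_per_gpu: float = 80.0) -> bool:
    """True if the weights fit in the combined memory of the given GPUs.

    Assumes 16-bit weights and ignores KV cache and activation overhead.
    """
    return weight_memory_gb(params_billions) <= num_gpus * gb_per_gpu


if __name__ == "__main__":
    for size, gpus in [(8, 1), (70, 1), (70, 2)]:
        print(
            f"{size}B on {gpus} GPU(s): "
            f"{weight_memory_gb(size):.0f} GB of weights, fits={weights_fit(size, gpus)}"
        )
```

Under these assumptions the 8-billion-parameter model needs about 16 GB for weights, well within one GPU, while the 70-billion-parameter model needs about 140 GB, which exceeds a single 80 GB card but fits across two.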

Looking ahead, Tung believes that the future of AI does not lie in ever-bigger models, but in better ways to organize and access human knowledge: “Facts get stale. A lot of those facts will be moved out into explicit places where you can actually modify the knowledge and where you can have data provenance.”

As the AI industry grapples with challenges of factual accuracy and transparency, Diffbot's release offers a compelling alternative to the bigger-is-better paradigm. Whether it succeeds in changing the direction of the field remains to be seen, but it has certainly shown that when it comes to AI, size isn't everything.


