The era of AI reasoning is well underway.
After OpenAI kicked off yet another AI revolution with its o1 reasoning model, introduced back in September 2024 (it takes longer to answer questions, but with the payoff of higher performance, especially on complex, multi-step problems in math and science), the commercial AI field has been flooded with copycats and competitors.
Now DeepSeek offers R1, Google has Gemini 2.0 Flash Thinking, and just today, LlamaV-o1 arrived, all seeking to offer similar "reasoning" to OpenAI's new o1 and o3 model families. These models engage in "chain-of-thought" (CoT) prompting, or "self-prompting," reflecting on their own analysis midstream, doubling back, and checking their work, ultimately arriving at a better answer than simply firing one off from their embeddings as fast as possible, as other large language models (LLMs) do.
But the high cost of o1 and o1-mini ($15.00/1M input tokens versus $1.25/1M input tokens for GPT-4o on OpenAI's API) has made some balk at the supposed performance gains. Is it really worth paying 12X as much as a typical, state-of-the-art LLM?
As it turns out, there is a growing number of converts, but the key to unlocking the true value of reasoning models may lie in the user prompting them differently.
Shawn Wang (founder of the AI news service Latent Space) featured a guest post on his Substack over the weekend from Ben Hylak, a former Apple interface designer for visionOS (the software that powers the Vision Pro spatial computing headset). The post has gone viral because it lays out clearly how Hylak prompts OpenAI's o1 model to get (for him) incredibly valuable outputs.
In short, instead of the human user writing prompts for the o1 model, they should think about writing "briefs": longer, more detailed explanations that provide plenty of context up front about what the user wants the model to output, who the user is, and what format they want the model to output the information in.
As Hylak wrote in the post:
With most models, we've been trained to tell the model how we want it to answer us. e.g. 'You are an expert software engineer. Think slowly and carefully.'

This is the opposite of how I've found success with o1. I don't instruct it on the how, only the what. Then I let o1 take over, plan and resolve its own steps. This is what autonomous reasoning is for, and it can actually be much faster than manually reviewing and chatting as the "human in the loop."
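To make the idea concrete, here is a minimal sketch of what a "brief"-style request to o1 could look like over the API. It assumes the official OpenAI Python SDK and API access to the o1 model; the example brief and its subject matter are illustrative placeholders, not taken from Hylak's post.

```python
# Minimal sketch (assumes the official `openai` Python SDK and access to o1).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Instead of coaching the model on *how* to think ("You are an expert..., think step by step"),
# give it a brief: who you are, what you want, and the output format you expect.
brief = """
Background: I'm a hobbyist electronics tinkerer with basic soldering skills and a $100 budget.
Goal: I want a weekend project that teaches me the basics of microcontrollers.
Output: Give me 3 project ideas. For each, list the parts with rough prices, the main build steps,
and one likely failure mode to watch out for.
"""

response = client.chat.completions.create(
    model="o1",  # reasoning model; the brief alone carries the context, no role-play preamble
    messages=[{"role": "user", "content": brief}],
)

print(response.choices[0].message.content)
```

The point of the sketch is the shape of the message: all the relevant context lands in a single, front-loaded brief, and the model is left to plan its own steps from there.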
Hylak also includes a nicely annotated screenshot of an example prompt for o1 that produced useful results for a list of hikes:

This blog post was so helpful that OpenAI president and co-founder Greg Brockman himself reshared it on his X account with the message: "o1 is a different kind of model. Great performance requires using it in a new way relative to standard chat models."
I tried it myself on my recurring quest to learn to speak fluent Spanish, and the result is here, for those who are curious. Perhaps not as impressive as Hylak's well-constructed prompt and response, but it certainly shows strong potential.

Separately, even when it comes to non-reasoning LLMs such as Claude 3.5 Sonnet, there may be room for regular users to improve their prompting to get better, less constrained results.
As Louis Arge, former Teton.ai engineer and current creator of the neuromodulation device openFUS, wrote on X: "One trick I've found is that LLMs trust their own suggestions more than mine," and he gave an example of how he got Claude to be "less guarded" by first "inciting a fight" with it over its outputs.
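For illustration, here is a rough sketch of one way that kind of trick can be wired up through Anthropic's Messages API, assuming the official `anthropic` Python SDK; the model id, the prompts, and the "earlier reply" are hypothetical stand-ins rather than Arge's actual exchange. The idea is to feed the model's own prior, blunter output back to it as an assistant turn, so it treats that register as its own and keeps it instead of retreating to guarded boilerplate.

```python
# Minimal sketch (assumes the official `anthropic` Python SDK); all content is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A blunt critique the model produced in an earlier turn (hypothetical here). Replaying it
# as an assistant message frames the unguarded tone as the model's own prior position.
earlier_reply = (
    "Honestly, this landing page copy is vague and buries the one feature people care about."
)

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Critique this landing page copy: 'We empower synergy at scale.'"},
        {"role": "assistant", "content": earlier_reply},
        {"role": "user", "content": "Good. Now rewrite the copy in that same direct spirit."},
    ],
)

print(response.content[0].text)
```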
All of this shows that prompt engineering remains a valuable skill as the AI era progresses.