Google accused of using novices to test Gemini AI answers


Generative AI still has its fair share of reliability issues, but the hope is that its evaluations will at least be accurate. Last week, however, Google allegedly instructed contract workers evaluating Gemini not to skip any prompts, regardless of their expertise, TechCrunch reports based on internal guidance it reviewed. Google shared a preview of Gemini 2.0 earlier this month.

Google has reportedly instructed GlobalLogic, the outsourcing firm whose contractors evaluate AI-generated output, not to let reviewers skip prompts that fall outside their expertise. Previously, contractors could skip any request outside their scope of knowledge, such as asking a doctor about the law. The guidelines stated, "If you do not have critical expertise (e.g., coding, math) to rate this prompt, please skip this task."

Now, contractors have allegedly been instructed, "You should not skip prompts that require specialized domain knowledge," and told to "rate the parts of the prompt you understand," adding a note that the prompt falls outside their area of expertise. It appears that prompts can now only be skipped if a large chunk of information is missing or if they contain harmful content that requires special consent forms to evaluate.

One contractor aptly summed up the change: "I thought the point of skipping was to improve accuracy by giving it to someone better?"

Shortly after this article was first published, Google provided Engadget with the following statement: “Evaluators perform a wide range of tasks across a variety of Google products and platforms. They provide valuable feedback not only on the content of the answers, but also on style, format, and other factors. The scores they provide do not directly impact our algorithms, but collectively they are a useful source of data that will help us evaluate how well our systems are performing.”

A Google spokesperson also noted that the new language will not necessarily harm Gemini's accuracy, because raters are being asked to specifically rate the parts of the prompts they understand. This could include providing feedback on issues such as formatting, even when the rater lacks subject-matter expertise. The company also pointed to its release of the FACTS Grounding benchmark this week, which can check LLM responses to ensure "they are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries."

Update December 19, 2024 11:23 a.m. ET: This story has been updated with a statement from Google and more details about how its rating system works.


