Meta exec denies the company artificially boosted Llama 4's benchmark scores

On Monday, a Meta executive denied a rumor that the company trained its new AI models to perform well on specific benchmarks while concealing the models' weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it's "simply not true" that Meta trained its Llama 4 Maverick and Llama 4 Scout models on "test sets." In AI benchmarking, a test set is a collection of data used to evaluate a model's performance after it has been trained. Training on a test set can misleadingly inflate a model's benchmark scores, making the model appear more capable than it actually is.

Over the weekend, an unsubstantiated rumor that Meta had artificially boosted its new models' benchmark results began circulating on X and Reddit. The rumor appears to have originated with a user who claimed to have resigned from Meta in protest over the company's benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's decision to use an experimental, unreleased version of Maverick to achieve better scores on the LM Arena benchmark. Researchers on X have observed differences in behavior between the publicly downloadable Maverick and the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing "mixed quality" from Maverick and Scout across the various cloud providers hosting the models.

"Since we released the models as soon as they were ready, we expect it will take several days for all the public implementations to get dialed in," Al-Dahle said. "We'll keep working through our bug fixes and onboarding partners."
