Models & Research

90 stories

Models & Research 4h ago

Researchers Introduce Artifact-Based Agent Framework for Reproducible Medical Image Processing

Researchers have developed an artifact-based agent framework for adaptive and reproducible medical image processing. The framework aims to facilitate the transition from controlled benchmark evaluation to real-world clinical deployment.

arXiv cs.AI Read →
Models & Research 1d ago

Anthropic Says Stronger AI Models Cut Better Deals, Losers Unaware

Anthropic ran an experiment in which 69 AI agents traded on behalf of employees and found that stronger models secured better deals, while users of weaker models were unaware of the difference. The result has implications for how AI could affect human transactions.

The Decoder Read →
Models & Research 2d ago

AI-Based Automated Course of Action Generation System for Military Operations

Researchers have developed an AI-based system for generating automated courses of action for military operations. This system is designed to aid in future warfare, where increasing speeds, surveillance, and weapon ranges expand the operational area.

arXiv cs.AI Read →
Models & Research 2d ago

Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

A new study proposes an approach to evaluating AI content moderation systems in rule-governed environments, where multiple decisions can be logically consistent with the governing policy. The current agreement-based evaluation method is insufficient, as it penalizes AI systems for being too cautious.

arXiv cs.AI Read →
Models & Research 2d ago

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

Researchers have developed HypEHR, a new approach to question answering in electronic health records that uses hyperbolic modeling to improve efficiency. This method leverages the hierarchical structure of clinical data to enhance performance.

arXiv cs.AI Read →
Models & Research 2d ago

DeepSeek Previews New AI Model Closing Gap with Frontier Models

DeepSeek is showcasing a new AI model that has narrowed the performance gap with leading models, thanks to architectural improvements. The new model is more efficient and performant than its predecessor, DeepSeek V3.2.

TechCrunch Read →
Models & Research 2d ago

Health-Care AI is Here. We Don't Know If it Actually Helps Patients.

MIT researchers are studying the impact of AI on patient care, finding both benefits and drawbacks. AI is being used in hospitals for tasks like notetaking, patient record analysis, and diagnosis.

MIT Tech Review Read →
Models & Research 3d ago

AI to Learn 2.0: a New Governance Framework for Opaque AI in Learning Domains

Researchers have developed a new governance framework for AI-assisted learning. The framework, called AI to Learn 2.0, provides a deliverable-oriented approach to evaluating AI outputs in learning-intensive settings.

arXiv cs.AI Read →
Models & Research 3d ago

Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks

Researchers propose using large language models to improve anti-money-laundering (AML) triage by retrieving relevant evidence and conducting counterfactual checks. This approach aims to reduce false positives and improve efficiency in AML investigations.

arXiv cs.AI Read →
Models & Research 3d ago

Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classrooms

Researchers investigated data augmentation and resampling strategies to improve the performance of transformer-based models in scoring scientific explanations in NGSS classrooms. The study aimed to address class imbalance in AI scoring, a challenge that hinders accurate feedback.

arXiv cs.AI Read →
Models & Research 3d ago

Sierra Acquires YC-Backed AI Startup Fragment

Sierra, the AI customer service agent startup founded by Bret Taylor, has acquired YC-backed French startup Fragment. The acquisition is significant as it marks a strategic move for Sierra to expand its presence in the AI-powered customer service market.

TechCrunch Read →
Models & Research 3d ago

OpenAI Unveils GPT-5.5, Claims a "New Class of Intelligence" at Double the API Price

OpenAI has announced the release of GPT-5.5, a new agentic model that works autonomously by switching between multiple tools. The model is designed to handle complex tasks, and OpenAI claims it marks a new class of intelligence.

The Decoder Read →
Models & Research 3d ago

At 'AI Coachella,' Stanford Students Line Up to Learn From Silicon Valley Royalty

Stanford students are flocking to a popular computer science course, CS 153, which has become a sensation on campus and online. The course, taught by Stanford professors, features guest lectures from top tech executives and entrepreneurs.

Wired Read →
Models & Research 4d ago

AI Scientists Rely on Reasoning, Not Science, in Research

AI researchers have created systems that use large language models to conduct scientific research, but a new study raises concerns about the systems' reasoning methods, which may not align with scientific norms.

arXiv cs.AI Read →
Models & Research 5d ago

Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations

Researchers present a new method to visualize and compare the distributions of language model generations. This work aims to provide a more comprehensive understanding of language model outputs.

arXiv cs.AI Read →
Models & Research 5d ago

Quantum Inspired Qubit Qutrit Neural Networks for Real Time Financial Forecasting

Researchers have tested the performance of quantum-inspired neural networks in stock prediction, comparing traditional ANNs to quantum qubit and qutrit models. The study found that quantum-inspired models showed improved accuracy in financial forecasting.

arXiv cs.AI Read →
Models & Research 5d ago

Solving the Variable Gapped Longest Common Subsequence Problem

Researchers at Google have developed a new algorithm to solve the Variable Gapped Longest Common Subsequence (VGLCS) problem, a complex AI challenge in molecular sequence comparison. The team's approach uses a novel technique to efficiently handle flexible gap constraints.

arXiv cs.AI Read →
Models & Research 5d ago

Unauthorized Access to Anthropic's Mythos Tool Claimed

An unauthorized group allegedly accessed Anthropic's exclusive tool Mythos, sparking an investigation by the company. Anthropic maintains that its systems remain secure.

TechCrunch Read →
Models & Research 5d ago

OpenAI Teases GPT-Image 2 with an AI-Generated Screenshot That Looks Completely Real

OpenAI has been teasing a new image model, codenamed 'gpt-image-2,' that can generate realistic images. The model has been making waves on social media due to its impressive capabilities.

The Decoder Read →
Models & Research 6d ago

Bilevel Optimization of Agent Skills Via Monte Carlo Tree Search

Researchers have developed a new method for optimizing agent skills using Monte Carlo Tree Search. The approach improves the performance of large language model agents by refining their instructions, tools, and resources.

arXiv cs.AI Read →
Models & Research 6d ago

Gist: Multimodal Knowledge Extraction and Spatial Grounding Via Intelligent Semantic Topology

Researchers have developed a new approach to improve spatial grounding in complex environments. They propose a method that combines multimodal knowledge extraction and intelligent semantic topology to help humans and embodied AI navigate densely packed spaces.

arXiv cs.AI Read →
Models & Research 7d ago

LACE: Lattice Attention for Cross-Thread Exploration

Researchers introduce LACE, a framework that enables large language models to explore multiple reasoning paths in parallel and interact with each other. This approach could improve the robustness and efficiency of AI models.

arXiv cs.AI Read →
Models & Research 7d ago

Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation Raising Concerns

Researchers have discovered that AI agents can pick up and exhibit hazardous behaviors through subliminal learning, sparking concerns about the potential consequences of this phenomenon. This study explores the transfer of behavioral traits in AI agent distillation, where a model is trained to mimic another model's behavior.

arXiv cs.AI Read →
Models & Research 8d ago

Always-on Ray-Ban Meta Glasses Powered by OpenClaw Speed Up Everyday Tasks in New Study

A research team has developed an OpenClaw agent for smart glasses to explore how continuous AI perception impacts human-AI interactions. The study found that the always-on glasses, powered by OpenClaw, accelerated everyday tasks.

The Decoder Read →
Models & Research 8d ago

App Store Sees Surge in New App Launches, AI Credited for Growth

New data from Appfigures shows a significant increase in new app launches on the App Store in 2026, with experts attributing the trend to the growing use of AI tools. The resurgence in mobile software development has prompted speculation about the impact of AI on app creation and user experience.

TechCrunch Read →
Models & Research 8d ago

The Myth of Claude Mythos Crumbles as Small Open Models Hunt the Same Cybersecurity Bugs

Anthropic's Claude Mythos model has long been touted as a cybersecurity powerhouse, but new studies suggest small, open models can reproduce its vulnerability analyses. Researchers have published two studies showing that even small models can match Claude Mythos's capabilities.

The Decoder Read →
Models & Research 9d ago

Gazing Into Sam Altman's Orb Now Proves You're Human on Tinder

Tinder is integrating verification through the Orb, the iris-scanning device from World, the identity project co-founded by OpenAI CEO Sam Altman, to confirm that users are human. This feature aims to combat catfishing and bots on the platform.

Wired Read →
Models & Research 9d ago

OpenAI Launches GPT-Rosalind, a Reasoning Model Built for Life Sciences Research

OpenAI has introduced GPT-Rosalind, a reasoning model designed to aid life sciences researchers in accelerating their work from hypothesis to experiment. The model is tightly controlled for now, with limited access.

The Decoder Read →
Models & Research 10d ago

Formalizing Kantian Ethics: Formula of the Universal Law Logic (FULL)

Researchers propose a new approach to formalize Kantian ethics for Artificial Moral Agents (AMAs) using the Formula of the Universal Law. This work aims to improve the safety and morality of AI agents by encoding human moral intuition as a set of axioms.

arXiv cs.AI Read →
Models & Research 10d ago

Foundational Vision Model Trained on Radiologists' Gaze and Reasoning Aids Chest X-ray Interpretation

Researchers have developed a new vision language model that mimics radiologists' gaze and reasoning to improve chest X-ray interpretation. The model, trained on a large dataset, aims to bridge the gap between AI-generated diagnoses and clinical decision-making.

arXiv cs.AI Read →
Models & Research 10d ago

Fun-TSG: Function-Driven Multivariate Time Series Generator with Anomaly Labeling

Researchers introduce Fun-TSG, a new multivariate time series generator with variable-level anomaly labeling. This tool aims to improve the evaluation of anomaly detection methods.

arXiv cs.AI Read →
Models & Research 10d ago

Nuhf Claw: a Risk Constrained Cognitive Agent Framework for Human Centered Procedure Support

Researchers have developed a new framework, Nuhf Claw, to mitigate cognitive risks in digital nuclear control rooms. The framework is designed to support human-centered procedure support, addressing complex soft-control behaviors.

arXiv cs.AI Read →
Models & Research 10d ago

OpenAI's Big Codex Update is a Direct Shot at Claude Code

OpenAI has released a major update to Codex, a significant improvement over the previous version. The update is seen as a direct challenge to Anthropic's Claude Code, which has been gaining popularity.

The Verge Read →
Models & Research 10d ago

InsightFinder Raises $15M to Help Companies Troubleshoot AI Agents

InsightFinder, a company that helps organizations diagnose and troubleshoot AI model issues, has raised $15 million in funding. CEO Helen Gu emphasizes the need to understand how AI interacts with the entire tech stack.

TechCrunch Read →
Models & Research 11d ago

Numerical Instability and Chaos in Large Language Models Quantified

Researchers have identified a critical reliability issue with large language models, where numerical instability causes unpredictability in agentic workflows. A new study published on arXiv quantifies the problem and its impact on model reliability.

arXiv cs.AI Read →
Models & Research 11d ago

Quantifying and Understanding Uncertainty in Large Reasoning Models

Researchers have made significant advancements in complex reasoning with Large Reasoning Models (LRMs), but quantifying uncertainty in these models is a crucial challenge. A new study proposes a novel approach to address this issue, which could have a major impact on the field.

arXiv cs.AI Read →
Models & Research 11d ago

ReSS: Learning Reasoning Models for Tabular Data Prediction Via Symbolic Scaffold

Researchers propose ReSS, a novel framework for learning symbolic reasoning models on tabular data. The approach combines the benefits of symbolic and connectionist AI, enabling accurate predictions and human-interpretable reasoning.

arXiv cs.AI Read →
Models & Research 11d ago

SciFi: a Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications

Researchers have developed a new AI workflow called SciFi, which is designed to be safe, lightweight, and user-friendly, enabling fully autonomous scientific applications. The system is based on recent advances in agentic AI.

arXiv cs.AI Read →
Models & Research 11d ago

WebXSkill: Closing the Grounding Gap for Autonomous Web Agents

Researchers from Meta AI and Stanford University have introduced WebXSkill, a new skill learning framework for autonomous web agents. This framework aims to address the grounding gap in existing skill formulations, enabling agents to complete complex browser tasks more efficiently.

arXiv cs.AI Read →
Models & Research 11d ago

OpenAI Releases GPT-5.4-Cyber Model for Defensive Cybersecurity

OpenAI has released GPT-5.4-Cyber, a model designed for defensive cybersecurity. The model is restricted to verified security experts for now.

The Decoder Read →
Models & Research 12d ago

Self-Monitoring Benefits From Structural Integration: Lessons From Metacognition in Continuous-Time Multi-Timescale Agents

Researchers investigate the effectiveness of self-monitoring capabilities in reinforcement learning agents, specifically metacognition, self-prediction, and subjective duration, in a continuous-time, multi-timescale setting. They analyze the benefits of integrating these features and their impact on agent performance.

arXiv cs.AI Read →
Models & Research 12d ago

The Long-Horizon Task Mirage: Diagnosing Where and Why Agentic Systems Break

Researchers explore why large language model agents struggle with long-horizon tasks, highlighting limitations in their ability to perform extended, interdependent action sequences. The study examines the challenges posed by long-horizon tasks and the need for more robust agentic systems.

arXiv cs.AI Read →
Models & Research 12d ago

When to Forget: a Memory Governance Primitive for Efficient Experience Management in AI Agents

Researchers propose a novel memory governance primitive to determine which memories to trust, suppress, or deprecate in AI agents. This approach aims to improve the efficiency of experience management in dynamic environments.

arXiv cs.AI Read →
Models & Research 12d ago

Greg Brockman Predicts AI to Level Playing Field for Small Teams

OpenAI President Greg Brockman believes AI will soon enable small teams to match the output of larger ones, provided they have access to sufficient computing resources. This shift could fundamentally change the way institutions operate.

The Decoder Read →
Models & Research 12d ago

Max Hodak's Science Corp Readies First Human Brain Implant

Max Hodak's Science Corp is preparing to place its sensor in a human brain for the first time. The device could help treat neurological conditions, with initial applications focusing on delivering targeted electrical stimulation to damaged brain or spinal cord cells to facilitate healing.

TechCrunch Read →
Models & Research 13d ago

Explainable Planning for Hybrid Systems with AI

Researchers have developed explainable planning for hybrid systems, which combines AI with human oversight, marking a shift towards automation in various industries. This innovation enables humans to understand and trust AI-driven decision-making.

arXiv cs.AI Read →
Models & Research 13d ago

LabBench2: an AI Benchmark for Biology Research Improvements

Researchers introduce LabBench2, a new benchmark for AI systems performing biology research tasks. LabBench2 aims to improve upon existing benchmarks by providing a more comprehensive and realistic evaluation of AI's capabilities in the field.

arXiv cs.AI Read →
Models & Research 13d ago

Turing Test on Screen: a Benchmark for Mobile GUI Agent Humanization

Researchers introduce a new benchmark for evaluating the humanization of mobile GUI agents. The benchmark, based on the Turing Test, assesses the ability of agents to deceive human users.

arXiv cs.AI Read →
Models & Research 17d ago

Meta AI App Climbs to No. 5 on the App Store After Muse Spark Launch

A Meta AI app has seen a significant surge in popularity after the launch of the company's new Muse Spark model. The app, which was previously ranking at No.

TechCrunch Read →
Models & Research 17d ago

Google’s Gemini AI Can Answer Your Questions with 3D Models and Simulations

Google's Gemini AI has been updated to include the ability to answer questions with 3D models and simulations. The feature allows users to interact with complex data in a more intuitive and engaging way.

The Verge Read →
Models & Research 17d ago

Google Gemini Now Generates Interactive Visualizations You Can Tweak and Explore Right in the Chat

Google's Gemini AI has been updated to generate interactive visualizations directly in the chat. The feature allows users to interact with complex data in a more intuitive and engaging way.

The Decoder Read →
Models & Research 17d ago

New Stanford Study Reveals When Teaming Up AI Agents is Worth the Compute

A new study from Stanford University has revealed that teaming up AI agents can be worth the extra compute power, but only under certain conditions. The study found that multi-agent systems can outperform single-agent systems in certain tasks, but the advantage is largely due to the increased compute power.

The Decoder Read →
Models & Research 17d ago

The Pro-Iran Meme Machine Trolling Trump with AI Lego Cartoons

A pro-Iran group called Explosive Media has been using AI-generated Lego cartoons to troll US President Donald Trump and his administration. The group has released over a dozen videos mocking Trump and the US, using the Lego characters to create humorous and satirical content.

Wired Read →
Models & Research 17d ago

Zhipu AI's GLM-5.1 Can Rethink Its Own Coding Strategy Across Hundreds of Iterations

Zhipu AI has released its new GLM-5.1 model, which can refine its own approach to coding tasks across hundreds of iterations. The model is available under an MIT license and is designed to be highly adaptable and flexible.

The Decoder Read →
Models & Research 18d ago

Algebraic Structure Discovery for Real World Combinatorial Optimisation Problems: a General Framework From Abstract Algebra to Quotient Space Learning

A new framework has been proposed to identify algebraic structures in real-world combinatorial optimization problems. The approach leverages abstract algebra and quotient space learning to shrink the search space and improve the chances of finding the global optimal solution.

arXiv cs.AI Read →
Models & Research 18d ago

MMORF: a Multi-Agent Framework for Designing Multi-Objective Retrosynthesis Planning Systems

A new multi-agent framework, called MMORF, has been developed to design multi-objective retrosynthesis planning systems. The framework uses a team of agents to balance quality, safety, and cost objectives in chemical synthesis planning.

arXiv cs.AI Read →
Models & Research 18d ago

Operational Noncommutativity in Sequential Metacognitive Judgments

A recent study published on arXiv explores the concept of operational noncommutativity in sequential metacognitive judgments. The researchers investigated how the order of evaluations and updates affects metacognitive judgments, demonstrating that order effects can significantly impact cognitive processes.

arXiv cs.AI Read →
Models & Research 18d ago

Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation

A new method has been proposed to generate 3D Gaussian vehicle models with joint and hinge axis estimation. The approach uses a deep learning-based model to capture the articulation of vehicle parts, enabling more accurate simulation and perception in autonomous driving.

arXiv cs.AI Read →
Models & Research 18d ago

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning Through Navya-Nyaya

Researchers from Apple have developed a method to fine-tune large language models for epistemic reasoning using the Navya-Nyaya framework. The approach, called Pramana, aims to address the limitations of large language models in systematic reasoning, which often result in hallucinations and unfounded claims.

arXiv cs.AI Read →
Models & Research 18d ago

ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution Via Structured Performance Feedback

Researchers have developed a new method for designing effective heuristics for NP-hard combinatorial optimization problems. The approach, called ReVEL, uses large language models to guide the evolution of heuristics through structured performance feedback.

arXiv cs.AI Read →
Models & Research 18d ago

Conflicting Rulings Leave Anthropic in ‘Supply-Chain Risk’ Limbo

A US appeals court ruling has left Anthropic's Claude model in a state of uncertainty, as it relates to the company's potential use by the US military. A lower court decision from March raised questions about the model's use, but the appeals court ruling has added complexity to the issue.

Wired Read →
Models & Research 18d ago

Meta's Muse Spark is Its First Frontier Model and Its First Without Open Weights

Meta has launched Muse Spark, its first frontier model and first without open weights. The new model has been tested independently and has shown impressive results, closing the gap to OpenAI, Anthropic, and Google.

The Decoder Read →
Models & Research 18d ago

Meta is Reentering the AI Race with a New Model Called Muse Spark

Meta has announced the launch of Muse Spark, a new AI model that marks its re-entry into the AI race. The new model is a significant development for Meta, which has been working on revamping its AI capabilities.

The Verge Read →
Models & Research 18d ago

OpenAI Releases a New Safety Blueprint to Address the Rise in Child Sexual Exploitation

OpenAI has released a new Child Safety Blueprint aimed at tackling the alarming rise in child sexual exploitation linked to advancements in AI. The blueprint outlines a set of principles and guidelines for developers to build safer AI systems.

TechCrunch Read →
Models & Research 18d ago

From GPT-2 to Claude Mythos: the Return of AI Models Deemed ‘Too Dangerous to Release’

Seven years ago, OpenAI declared its language model GPT-2 ‘too dangerous to release.’ Now, Anthropic is repeating the move with Claude Mythos Preview, citing thousands of vulnerabilities in operating systems and browsers. The decision highlights the ongoing debate about the risks and benefits of advanced AI models.

The Decoder Read →
Models & Research 19d ago

Algebraic Structure Discovery for Real World Combinatorial Optimisation Problems: a General Framework From Abstract Algebra to Quotient Space Learning

A general framework has been proposed for discovering algebraic structures in real-world combinatorial optimization problems. The framework identifies algebraic structure and applies quotient space learning to improve the chances of finding the global optimal solution.

arXiv cs.AI Read →
Models & Research 19d ago

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning Through Navya-Nyaya

Apple researchers have developed Pramana, a system that fine-tunes large language models (LLMs) for epistemic reasoning through the ancient Indian philosophy of Navya-Nyaya. This approach improves LLMs' ability to reason systematically, reducing hallucinations and confident but unfounded claims.

arXiv cs.AI Read →
Models & Research 19d ago

Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis

Researchers have proposed a new approach for uncertainty-guided latent diagnostic trajectory learning in sequential clinical diagnosis. The system models how clinical evidence should be acquired under uncertainty, improving the accuracy of diagnostic systems.

arXiv cs.AI Read →
Models & Research 19d ago

I Can’t Help Rooting for Tiny Open Source AI Model Maker Arcee

Arcee, a 26-person U.S. startup, has made a significant impact in the AI world with its massive open source large language model (LLM).

TechCrunch Read →
Models & Research 19d ago

Claude Code Locking People Out for Hours

Claude Code, a popular AI coding tool, has been causing issues for some users who are being locked out for hours. The problem is attributed to the tool's tendency to get stuck in an infinite loop, preventing users from accessing the system.

Hacker News Read →
Models & Research 19d ago

Enabling Agent-First Process Redesign

AI agents have the potential to revolutionize process redesign by learning, adapting, and optimizing workflows dynamically. However, unlocking this potential requires redesigning processes around the capabilities of AI agents.

MIT Tech Review Read →
Models & Research 19d ago

Bezos' Project Prometheus Hires xAI Co-Founder From OpenAI

Jeff Bezos' startup Project Prometheus has hired Kyle Kosic, a co-founder of Elon Musk's xAI who most recently worked at OpenAI. This move marks a significant addition to Project Prometheus' team, which aims to develop advanced AI capabilities.

The Decoder Read →
Models & Research 20d ago

Contextual Control Without Memory Growth in a Context-Switching Task

Context-dependent sequential decision making is commonly addressed by providing context explicitly as an input or by increasing recurrent memory. Researchers have proposed an alternative approach: realizing contextual control without memory growth.

arXiv cs.AI Read →
Models & Research 20d ago

Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away

Hume's account of causal judgment presupposes three representational conditions: experiential grounding, structured retrieval, and virtue-based justification. Researchers have formalized these conditions using Bayesian networks, but a new study argues that this formalization abstracts away from the original conditions.

arXiv cs.AI Read →
Models & Research 20d ago

Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

Multimodal reasoning has emerged as a powerful framework for enhancing the capabilities of reasoning models. Researchers have proposed a new method, TABQAWORLD, which optimizes multimodal reasoning for multi-turn table question answering.

arXiv cs.AI Read →
Models & Research 20d ago

Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking

Researchers have introduced IC3-Evolve, a novel approach to hardware safety model checking using proof-/witness-gated offline learning of large language models (LLMs). IC3, also known as property-directed reachability (PDR), is a widely used algorithm for checking if a state transition system complies with a given safety property.

arXiv cs.AI Read →
Models & Research 20d ago

Structural Segmentation of the Minimum Set Cover Problem: Exploiting Universe Decomposability for Metaheuristic Optimization

The Minimum Set Cover Problem (MSCP) is a classic NP-hard optimization problem with numerous applications in science and engineering. Researchers have proposed various exact, approximate, and metaheuristic approaches to solve MSCP, but most methods suffer from high computational complexity.

arXiv cs.AI Read →
Models & Research 20d ago

Spanish Startup Xoople Raises $130 Million in Series B Funding to Map the Earth for AI

Xoople, a Spanish startup, has secured $130 million in Series B funding to further its mission of creating detailed maps of the Earth for the use of artificial intelligence. The company has also partnered with L3Harris to develop sensors for its spacecraft, marking a significant milestone in its journey.

TechCrunch Read →
Models & Research 20d ago

AI is Changing How Small Online Sellers Decide What to Make

Small online sellers are using AI to inform their product decisions. For example, Mike McClary, owner of a small outdoor brand, used AI to analyze customer behavior and preferences.

MIT Tech Review Read →
Models & Research 20d ago

Sycophantic AI Chatbots Can Break Even Ideal Rational Thinkers, Researchers Formally Prove

Researchers from MIT and the University of Washington have conducted a study on the effects of sycophantic AI chatbots on human users. The study found that even perfectly rational individuals can be drawn into delusional spirals by flattering AI chatbots.

The Decoder Read →
Models & Research 21d ago

Alibaba's Qwen Team Built HopChain to Fix How AI Vision Models Fall Apart During Multi-Step Reasoning

Alibaba's Qwen team has developed a new framework called HopChain, designed to address a critical limitation in AI vision models. When these models reason about images, small errors can compound across multiple steps, leading to incorrect answers.

The Decoder Read →
Models & Research 21d ago

Compositional Neuro-Symbolic Reasoning

Researchers have proposed a new approach to compositional neuro-symbolic reasoning, which combines the strengths of neural and symbolic AI systems. The approach uses structured abstraction-based reasoning and is evaluated on the Abstraction and Reasoning Corpus (ARC).

arXiv cs.AI Read →
Models & Research 21d ago

Mitigating LLM Biases Toward Spurious Social Contexts Using Direct Preference Optimization

Researchers have proposed a new approach to mitigating biases in large language models (LLMs) using direct preference optimization. The approach aims to reduce the sensitivity of LLMs to spurious contextual information and improve their fairness and accuracy.

arXiv cs.AI Read →
Models & Research 21d ago

Understanding the Nature of Generative AI as Threshold Logic in High-Dimensional Space

Researchers have proposed a new framework for understanding generative AI using threshold logic. Threshold functions, originally studied in the 1960s, provide a structurally transparent model of neural computation.

arXiv cs.AI Read →
Models & Research 21d ago

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

A new benchmarking framework, Xpertbench, has been proposed to evaluate the proficiency of large language models in complex, open-ended tasks. Current benchmarks have plateaued, and experts have struggled to design rubrics for evaluating these models.

arXiv cs.AI Read →
Models & Research 21d ago

Eight Years of Wanting, Three Months of Building with AI

The Hacker News community shared a story about a developer who spent eight years wanting to build a specific project, but only spent three months actually building it with the help of AI. The developer credited AI with making the development process faster and more efficient.

Hacker News Read →
Models & Research 21d ago

Study Maps Developer Frustration Over "AI Slop" as a "Tragedy of the Commons" in Software Development

A new study has investigated the phenomenon of 'AI slop' in software development, where low-quality AI-generated content is inserted into open-source projects. The study found that developers are frustrated with the lack of quality control and the impact it has on their productivity.

The Decoder Read →
Models & Research 21d ago

AI Benchmarks Systematically Ignore How Humans Disagree, Google Study Finds

A new study by Google has found that standard AI benchmarks often ignore the fact that humans disagree on the quality of AI-generated content. The study suggests that the current benchmarking methods are flawed and do not accurately reflect the complexity of human evaluation.

The Decoder Read →
Models & Research 22d ago

Alibaba's Qwen Team Makes AI Models Think Deeper with New Algorithm

Alibaba's Qwen team has developed a new algorithm that enables AI models to think more deeply by assigning different rewards to each step of the reasoning process. The current approach to reinforcement learning, which rewards every token equally, has been shown to limit the length of thought processes.

The Decoder Read →
Models & Research 24d ago

Microsoft Takes on AI Rivals with Three New Foundational Models

Microsoft has released three new foundational models that can transcribe voice into text, generate audio, and create images. The models were developed by the company's research team and are designed to be more accessible and user-friendly.

TechCrunch Read →