Models & Research Friday, 24 April 2026 | 1 min read

Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classrooms

A team of researchers from the University of California, Berkeley, examined how well data augmentation and resampling strategies address class imbalance in AI scoring of scientific explanations in Next Generation Science Standards (NGSS) classrooms. The study focused on transformer-based models, which are commonly used in natural language processing tasks. The researchers found that techniques such as oversampling the minority class and generating new samples with a generative model significantly improved model performance in scoring advanced reasoning in scientific explanations. The study also found that resampling strategies must be chosen carefully, because some methods can lead to overfitting and decreased performance.

The findings matter for AI-powered educational tools that aim to give students immediate and accurate feedback. As the use of AI in education grows, handling class imbalance in AI scoring will be crucial to ensuring that these tools assess student performance fairly and accurately.

Key Takeaways

→ Data augmentation and resampling strategies can improve the performance of transformer-based models in AI scoring of scientific explanations.
→ Oversampling the minority class and generating new samples with a generative model can both be effective ways to rebalance the training data.
→ Resampling strategies must be carefully selected to avoid overfitting and decreased performance.
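
To make the oversampling idea in the takeaways concrete, here is a minimal sketch of random oversampling in plain Python. It is not the researchers' implementation: the function name `random_oversample`, the toy explanations, and the labels "basic" and "advanced" are all hypothetical and chosen only for illustration. The sketch duplicates minority-class examples until every class is as large as the biggest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples until all classes match the largest class.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)

    # Group the examples by their label.
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    pairs = []
    for label, items in by_label.items():
        pairs.extend((text, label) for text in items)
        # Sample with replacement to fill the gap for smaller classes.
        for _ in range(target - len(items)):
            pairs.append((rng.choice(items), label))

    rng.shuffle(pairs)
    return [text for text, _ in pairs], [label for _, label in pairs]


# Toy, made-up training data: four "basic" explanations, one "advanced".
train_texts = [
    "The ice melts because it is warm.",
    "Plants need sun.",
    "The ball falls down.",
    "Water turns to steam.",
    "Energy transfers from the warmer air to the ice, raising particle motion.",
]
train_labels = ["basic", "basic", "basic", "basic", "advanced"]

balanced_texts, balanced_labels = random_oversample(train_texts, train_labels)
print(Counter(balanced_labels))
```

In a sketch like this, the resampling would normally be applied to the training split only. Duplicated examples that leak into validation or test data inflate the scores, and heavy duplication of a few minority examples is one way a model can end up overfitting, which is the risk the takeaways warn about.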

Original Sources

↗ arXiv cs.AI
