Data of a Different Kind Can Predict Basketball Games
Large language models, or LLMs, are the machine learning technique behind generative-AI chatbots like ChatGPT. In one of her recent research projects, Gina Sprint, Ph.D. (Computer Science), combined her interest in LLMs with Gonzaga’s general love for college basketball and March Madness.
The resulting paper was recently published in IEEE Access, an open-access journal that presents results of original research across a broad range of applied engineering interests and multidisciplinary fields. Sprint is a Senior Member of IEEE, the world's largest technical professional organization.
The project began during a new Data Science course in Spring 2023.
“I wanted to structure the course around a fun example project the students would be into,” Sprint said. “We did tutorials on how to scrape college basketball team Twitter handles from the web, how to stream tweets from the Twitter (now X) API, and how to store and analyze the tweets using Google Cloud Platform services.”
After the class ended, Sprint wanted to play around more with LLMs.
“I started using the college basketball Twitter data from class to learn prompt engineering techniques and how to use the OpenAI API. It wasn’t long before an idea for game-winner prediction took hold.”
Innovative Use of Social Media and LLMs
Sprint's research diverges from traditional game-winner prediction methods, which rely on numerical data like tournament seeds and season performance. Instead, her approach draws on language data: more than one million posts from official Division I college basketball team accounts.
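For readers curious what predicting a winner from posts might look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not Sprint's published method: the function names, the prompt wording, the team names, and the `ask_model` callable (a stand-in for whatever LLM API is used) are all hypothetical.

```python
from typing import Callable, Dict, List


def build_prompt(team_a: str, team_b: str,
                 posts: Dict[str, List[str]], max_posts: int = 3) -> str:
    """Assemble a plain-text prompt from each team's most recent posts.

    Hypothetical prompt format, for illustration only.
    """
    lines = [f"Two college basketball teams, {team_a} and {team_b}, are about to play."]
    for team in (team_a, team_b):
        lines.append(f"Recent posts from the official {team} account:")
        team_posts = posts.get(team, [])
        # Keep only the last `max_posts` posts (none if max_posts <= 0).
        recent = team_posts[-max_posts:] if max_posts > 0 else []
        for post in recent:
            lines.append(f"- {post}")
    lines.append(
        f"Based only on these posts, answer with one team name: {team_a} or {team_b}."
    )
    return "\n".join(lines)


def predict_winner(team_a: str, team_b: str, posts: Dict[str, List[str]],
                   ask_model: Callable[[str], str]) -> str:
    """Send the prompt to a model and map its reply back to a team name.

    `ask_model` is an assumed stand-in for a real LLM call; it takes the
    prompt text and returns the model's reply as a string.
    """
    reply = ask_model(build_prompt(team_a, team_b, posts)).strip().lower()
    names_a = team_a.lower() in reply
    names_b = team_b.lower() in reply
    if names_a and not names_b:
        return team_a
    if names_b and not names_a:
        return team_b
    # The reply named both teams or neither, so no prediction is made.
    return "unknown"
```

Passing the model in as a plain function keeps the sketch independent of any particular LLM service, so the prompt-building step can be tried out with a dummy model before any real API is wired in.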
Sprint admits that analyzing texts, reposts, quotes and mentions isn't going to replace the numbers game, and that wasn't the goal. However, for the 2023 tournament season, the LLMs correctly predicted 65% of the men's games and 70% of the women's games.
That's something the field of sports analytics could certainly take a shot at.