Crackerjack data: How MLB is reinventing the fan experience of today, tomorrow, even yesterday
Contributing Writer, Google Cloud
Major League Baseball is using cloud technology to bring games, stats, history, and augmented experiences to life for fans new and old.
In baseball, there are three true outcomes: a batter walks, strikes out, or hits a home run.
In each case, the ball never enters the field of play. Yet even if it seems like the other seven fielders are standing around waiting for something to happen, an at-bat that results in one of the three true outcomes is still chock-full of data.
Data is everything in baseball. Every movement on the field generates data. In the case of one of the three true outcomes, data is generated by the velocity of the pitch, how much it spins on the way to the plate, where the pitcher’s arm was when he released the ball, how much the ball breaks from side to side, the path of the batter’s swing, where the catcher caught the ball, or the angle and velocity at which the ball was hit out of the park. Baseball tracks and measures everything… and that was just one pitch.
“In terms of data, I think the main challenge is that it’s just never-ending,” said Truman Boyes, the senior vice president of technology infrastructure at Major League Baseball. “The content is always happening. There are 162 games a year, plus all of the other ancillary content that is produced.”
The modern era of baseball may have more sophisticated forms of data, with systems like Statcast and Hawk-Eye tracking everything on the field, but baseball is also the oldest of the major sports in the United States. Major League Baseball (MLB) has not only all the data generated in the era of ubiquitous sensors and cloud computing but also decades upon decades of historical archives that have the potential to be digitized, tagged with metadata, and made searchable for fans everywhere. According to Boyes, the league has over 60 petabytes of historical data within its archives.
What can the league do with all that data? Let’s explore three true outcomes of Major League Baseball’s treasure trove.
Outcome the first: Engage the modern day fan
Baseball is a game that can be enjoyed in a variety of ways. Some fans, like season ticket holders, primarily engage by attending live games at the stadium. For others, the game is a ubiquitous presence on their television (or smartphone) screens on summer afternoons and evenings. Still others engage by diving deep into the massive world of advanced statistics, watching analysts on the web and TV, or reading sports reporters’ daily stories.
“We’re engaging with a broader view of baseball fans that are not just the ones who go to a game or watch a three-hour game in your living room,” Boyes said. “They are looking for highlights, they are looking for custom reels and content that is personalized to them. We make it available, and I think we care more about the engagement than pushing emails. We’ve learned more about what their viewing habits are, what they care about.”
The pinnacle of MLB’s on-field data analytics operation is Statcast, which was introduced in 2015 and migrated to Google Cloud in 2020. From 2015 to 2019, the system consisted of cameras and radar installed at every ballpark. In 2020 the system was upgraded to Hawk-Eye, which includes 12 cameras recording every action on the field at 100 frames per second. According to MLB, “Five of those [cameras], which have higher frames-per-second rates, focus on pitch tracking. The other seven are dedicated to tracking players and batted balls. This more robust system has raised the percentage of batted balls that get tracked from roughly 89% to 99%.”
Statcast is the foundation for not just engaging the modern baseball fan but also providing front offices and players with the latest data to help improve the game. The Statcast era has brought terms like “spin rate” (how fast a pitch spins on its way to the plate) and “launch angle” (the vertical angle at which a batted ball leaves the bat) into the modern baseball lexicon and changed how the game is both watched and played.
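To make the two terms concrete, here is a minimal sketch of how such metrics could be derived from tracked measurements. It is not Statcast's actual method: the function names, the coordinate convention (z pointing up), and the inputs are assumptions made for illustration.

```python
import math


def launch_angle_deg(vx: float, vy: float, vz: float) -> float:
    """Vertical angle, in degrees, of a batted ball's velocity at contact.

    Assumed convention (illustrative only): x and y span the horizontal
    plane of the field and z points straight up.
    """
    horizontal_speed = math.hypot(vx, vy)
    return math.degrees(math.atan2(vz, horizontal_speed))


def spin_rate_rpm(revolutions: float, seconds: float) -> float:
    """Average spin rate in revolutions per minute over a measured interval."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return revolutions / seconds * 60.0


if __name__ == "__main__":
    # Hypothetical inputs, not real tracking data.
    print(launch_angle_deg(vx=30.0, vy=5.0, vz=18.0))
    print(spin_rate_rpm(revolutions=15.0, seconds=0.375))
```

A ball hit straight along the ground has a launch angle of zero in this sketch, and a ball whose upward speed equals its horizontal speed leaves at 45 degrees.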
MLB also uses the data it collects at its ballparks to understand its fans. What are their seating preferences? Is there a particular concession stand they like or beverage they prefer? Online, the league serves fans through customizable highlights and reels in its Film Room feature, play-by-play on its mobile app, the advanced stats gathered via Statcast, and more.
“It’s not just what we produce inside our editing facilities, but we open it up,” Boyes said. “What do you want to see? Do you want to see the top home runs of Aaron Judge? And then have those highlights and experiences curated, and then they can share it.”
Outcome the second: Bring history back to life
During the 2020 pandemic-shortened season (the league only played a 60-game regular season), teams were eager to engage their fans in the absence of regular baseball. The San Francisco Giants took the opportunity to go through their own massive catalog of archives, using automation and artificial intelligence to digitize decades of old game footage, newsreels, and player interviews. The biggest challenge was aggregating all the metadata so that the new digital archives were searchable and usable.
The Giants uncovered some gems from their digitization efforts, including an interview of the legendary Willie Mays giving batting tips.
With its 60 petabytes of historical data, MLB has the same opportunity to bring the rich and vibrant history of baseball to life and present it to the modern era of fans.
“We have over three million assets in our film room, which is hosted on Google Cloud,” Boyes said. “Most of these edits have been entirely automated in that we’re taking time code data, which has metadata associated with what’s actually happened in play, and those clips are automatically generated and then uploaded for anyone to view.”
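The automated workflow Boyes describes, in which time code data plus play metadata yields clips, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `PlayEvent` and `ClipSpec` types, their field names, and the padding values are hypothetical, and the code only computes clip boundaries and tags rather than cutting any video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayEvent:
    """One tagged play in a broadcast (hypothetical schema)."""
    timecode: str       # "HH:MM:SS" offset into the broadcast
    duration_s: float   # length of the play in seconds
    batter: str
    result: str


@dataclass(frozen=True)
class ClipSpec:
    """Boundaries and metadata for a clip to be cut downstream."""
    start_s: float
    end_s: float
    title: str
    tags: tuple


def timecode_to_seconds(timecode: str) -> int:
    """Convert an "HH:MM:SS" time code into whole seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_clip_specs(events, lead_in_s: float = 3.0, tail_s: float = 2.0):
    """Turn tagged plays into clip specs, padding each side of the play."""
    specs = []
    for event in events:
        start = timecode_to_seconds(event.timecode)
        specs.append(
            ClipSpec(
                start_s=max(0.0, start - lead_in_s),
                end_s=start + event.duration_s + tail_s,
                title=f"{event.batter}: {event.result}",
                tags=(event.batter.lower(), event.result.lower()),
            )
        )
    return specs


if __name__ == "__main__":
    # Made-up event for illustration, not real game data.
    events = [PlayEvent("01:02:03", 10.0, "Example Batter", "Home Run")]
    for spec in build_clip_specs(events):
        print(spec)
```

Carrying the play's metadata into the clip as tags is what would let the generated clips be found later without anyone labeling them by hand.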
The work has already started, with Film Room clips searchable back to the 1920s. Film Room allows fans and broadcasters to search for any kind of play from any player in the database. For instance, a search of “Ted Williams home runs” brings up three results from All-Star Games in the 1940s and ’50s. A more modern search might look for every home run hit by Angels star Shohei Ohtani on a curveball in 2022.
Ted Williams 1941 All-Star Game walk-off home run
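A search such as "home runs by one batter on a curveball in a given season" amounts to filtering clips on their metadata. The sketch below shows the idea only; it is not Film Room's implementation, and the `Clip` fields, the `search_clips` function, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clip:
    """A highlight clip with searchable metadata (hypothetical schema)."""
    clip_id: str
    batter: str
    result: str
    pitch_type: str
    season: int


def _text_matches(value: str, wanted: Optional[str]) -> bool:
    """Case-insensitive match; a filter of None matches everything."""
    return wanted is None or value.casefold() == wanted.casefold()


def search_clips(clips, *, batter=None, result=None, pitch_type=None, season=None):
    """Return the clips that satisfy every filter that was supplied."""
    return [
        clip
        for clip in clips
        if _text_matches(clip.batter, batter)
        and _text_matches(clip.result, result)
        and _text_matches(clip.pitch_type, pitch_type)
        and (season is None or clip.season == season)
    ]


# Made-up records for illustration, not real clips.
SAMPLE_CLIPS = [
    Clip("c1", "Batter A", "home run", "curveball", 2022),
    Clip("c2", "Batter A", "home run", "fastball", 2022),
    Clip("c3", "Batter B", "home run", "curveball", 2022),
    Clip("c4", "Batter A", "strikeout", "curveball", 2021),
]

if __name__ == "__main__":
    hits = search_clips(
        SAMPLE_CLIPS,
        batter="batter a",
        result="home run",
        pitch_type="curveball",
        season=2022,
    )
    print([clip.clip_id for clip in hits])
```

A catalog of millions of assets would need an indexed search service rather than a linear scan, but the principle is the same: the richer the metadata, the more specific the query a fan can ask.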
“These are historical archives that go back to the 1940s,” Boyes said. “Beautiful games that are largely left in dusty archive rooms. And we have the opportunity to take these archives and make them available to fans in new ways. Not just old 16-millimeter film which has been digitized, but actually highlight reels that have been generated that contain metadata so you can search across it and have this content newly presented to fans of today.”
Outcome the third: Develop the fan of the future
The data vault that MLB has amassed and organized over the years has given the league opportunities to improve its on-field product, deepen how it engages fans, and plan for future expansion of the game.
“For the last hundred plus years, this sport has been one of the most popular in the country,” Boyes said. “We are looking to see baseball grow internationally, and to see the presence of baseball throughout our country have a larger growth rate and engagement. Being able to engage even deeper with youth is really important as we continue to see this sport evolve.”
It's a critical evolution, too, as fans' tastes change, attention spans shift, and competition arises from sources as varied as international soccer and e-sports. The league continues to explore new and emerging technologies to engage fans, whether digital collectibles or Statcast data combined with augmented reality to give fans an immersive experience.
“Connecting technology to the game allows us to get that engagement,” Boyes said. “This relates to our media content, and it’s related to the experience at the parks. And it relates to how the game is played from the perspective of how long the game takes. We’re looking at ways to maintain a faster paced game that is more engaging that attracts people to tune in.”
A data grand slam
Major League Baseball exemplifies what can happen when an organization focuses on aggregating all of its data and making it universally accessible.
MLB’s customers, the millions of fans across the globe, benefit from an in-depth experience unlike any in the game’s history. And the league’s internal stakeholders, such as front offices and players, benefit from data that helps make the game better. Together, these benefits form a virtuous cycle that should serve the league’s evolution through the next century of games to come.
“That’s what we’re building and what we’re looking to take to the next level,” Boyes said. “The experience is something you pass down to your children. This is something that when you go and you as a family, it's memorable and so we want each one of those memories to really just be a great experience all over across the board.”
If you’d like to learn more about Major League Baseball's technology journey, we have two more exciting channels for you. You can watch a video conversation or listen to an extended podcast with Truman Boyes. In each, he goes even more in depth on MLB’s transformation with Anil Jain, managing director of Media & Entertainment Solutions at Google Cloud, and Chris Hood, a digital strategist at Google Cloud.
Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.