Death, taxes and Zack Greinke

I don’t need to tell you anything new about Zack Greinke.

He’s been doing his thing for almost two decades, piling up wins, accolades and 150+ inning seasons like clockwork for whatever team had the privilege, and the payroll, to have him in its starting rotation.

When he started out in Kansas City he was a young fireballer, capable of bringing 100 mph gas to the table, something that in 2004 was not as readily available as it is nowadays. Fast forward 17 years and Greinke, after 7 seasons in KC, a cup of coffee with Milwaukee and 3+ years each with the LA Dodgers and Arizona, is ready to start his third, or rather second and a half, season with the Astros. Oh, and by the way, he’ll be their Opening Day starter.

What Greinke does, and has been doing for at least 5 years, is compensate for the annual loss in fastball velocity with pure craft and artistry: armed with a “heater” that now hardly reaches 90 mph, he’ll come at you with a changeup at the same speed (…), a slider and a curveball/eephus pitch that is a relic of days gone by:

If that were all, well, Greinke would just be “another” great pitcher, one with a Hall of Fame caliber résumé that only needs to compile a pair of decent springs to close his case: Cy Young Award? Check. All Star? 6x. Why not, let’s add 6(!) Gold Gloves and a couple of Silver Sluggers (with particular ambitions). The WS title is still eluding him, yet he’s done enough to get a plaque in Cooperstown and a good lengthy speech…

But he’s Zack Greinke, and that adds a whole other meaning. As you know, Zack is not your run-of-the-mill MLB pitcher, as a player and even more so as a person. By his own admission he has had serious trouble with anxiety, which is not ideal when you are alone on a mound pitching in front of full (oh the memories) ballparks and cameras. He is a man of few words, to say the least, and he doesn’t like to be interviewed, although when he speaks, you’d better listen.

A book could be written from Greinke quotes alone, on pretty much everything: his antics with catchers and fellow pitchers, his thoughts about his own hitting ability, his dream of being a shortstop and even a little bit of economics on the rise of guac prices. I mean it, he’s also got a Reddit page to find all of them!

He’s one of a kind, a unicorn who embodies all the hardships we can’t even begin to fathom about being a world-class athlete instead of a “normal” person, something that tends to be forgotten because of the money that comes with it. And in the pandemic season here he is, sitting in the empty bleachers and just enjoying the game of baseball, comfortable in the loneliness and calm of such a warped and confused environment:

Zack Greinke’s field of dreams: cardboard cutouts

He’s one of the greats, and now that his time on the mound is ticking down to the end of an illustrious career, all we can do is enjoy his act and not disturb him.

Greinke’s consistency in the last few years, in terms of both results and way of pitching, makes him ideal to answer a key question someone could raise after the previous entries on his rookie rotation companions Valdez and Javier: does the clustering of allowed balls in play (BIP) hold year to year?

Dealing with this question in detail would require checking, for every qualified starter in the last two seasons, how the clusterings compare in terms of groups, labels and quantities. That is an amount of work I’d gladly do if I had the resources, namely a PC that is not bound to catch fire at the next RStudio session.

But, for the sake of fun and curiosity, we can consider the stand-alone case of Zack Greinke. I mean, if anyone can be a dead ringer for himself season after season, it must be him. Let’s see if this prophecy holds true in the end!

As in every clustering story, we need data: it’ll be the usual Savant-provided dataset of all of Greinke’s allowed BIP, from which we’ll consider the F4 variables (EV, LA, Distance and xwOBA). This time around, to follow Father Time as he goes by, we’ll start with 2019 batted ball data, and a couple of things need to be clarified.

First, we’ll have a lot more data points as, believe it or not, 2019 was a full season! That means higher figures and messier graphs, but don’t be scared, as tables will make things easier to see. Second, if you remember this great Greg Amsinger reaction, Greinke was dealt from Arizona to Houston in the last seconds of the 2019 trade deadline. His data therefore spans both stops and doesn’t account for possible changes in Greinke’s approach after getting to talk(…) with pitching God Brent Strom.

Ok, enough with the pleasantries and onto the action! Step 1 is checking the optimal number of clusters k for Zack’s 2019 allowed BIP:

Zack Greinke 2019 BIP, optimal k

Here it’s up to you, really! Anything from 7 to 10 is a solid choice, although I’ll go with k = 8 because, spoiler alert, it is the optimal number of groups for Greinke’s 2020 allowed BIP.
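For readers who want to see what "checking the optimal k" can look like, here is a minimal standard-library Python sketch. It is not the R workflow used for the graph above: the post does not say which criterion was used, so the elbow approach (total within-cluster sum of squares per k) is an assumption, and the data are synthetic (EV, LA) blobs, not Greinke's batted balls. The helper names `standardize`, `kmeans` and `best_wss` are made up for the illustration.

```python
import random

def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm; returns (labels, centroids, wss)."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append([sum(c) / len(members) for c in zip(*members)]
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    wss = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, wss

def best_wss(points, k, restarts=10, seed=0):
    """Lowest within-cluster sum of squares over several random starts."""
    rng = random.Random(seed)
    return min(kmeans(points, k, rng)[2] for _ in range(restarts))

# Synthetic stand-in data: three made-up (EV, LA) blobs, NOT real Savant data.
rng = random.Random(42)
centers = [(85.0, -10.0), (95.0, 15.0), (100.0, 30.0)]
raw = [[rng.gauss(ev, 3.0), rng.gauss(la, 4.0)]
       for ev, la in centers for _ in range(50)]
points = standardize(raw)

# The "elbow" is the k after which the drop in WSS flattens out.
wss_by_k = {k: best_wss(points, k) for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 1))
```

When the curve flattens gradually instead of bending sharply, several values of k are defensible, which is the "it’s up to you" situation described above.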

That said, time to cluster! A quick eclust clustering function for k = 8 returns our randomly labeled clustered dataset as follows:

Zack Greinke 2019 BIP, cluster related averages

If you remember Valdez and Javier’s labels, you can see how it all comes back as these results are rather similar to the previous labelings:

  • 1 is a clear case of grounders;
  • 2 is a worse 1, so called slow rollers;
  • 3 is a classic, hard liners;
  • 4 brings the danger, a barrel zone;
  • 5 is a surefire outcome, no doubters;
  • 6 has too high an avgLA and too low an avgEV, our got unders;
  • 7 is the dreaded no man’s land, bloops;
  • 8 is a dollar short in avgEV, warning tracks.

Zack Greinke 2019 BIP, labeled cluster averages and composition

As you can see, I added a % column showing each cluster’s share of Greinke’s allowed BIP. This will be the thing to look at when we consider 2020, particularly for the clusters where the damage is located, namely no doubters, barrel zone, bloops and liners.
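The % column is simply each cluster’s count divided by the total number of BIP. A small Python sketch of that computation follows; the function name `composition` and the toy label list are made up for illustration and are not Greinke’s actual counts.

```python
from collections import Counter

def composition(labels):
    """Map each cluster label to (count, share of all BIP in percent)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: (n, round(100 * n / total, 1))
            for name, n in counts.most_common()}

# Toy labels only, one entry per ball in play.
toy_labels = ["grounders"] * 5 + ["barrel zone"] * 3 + ["no doubters"] * 2
for name, (count, pct) in composition(toy_labels).items():
    print(f"{name}: {count} BIP, {pct}%")
```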

You may also have noted how the labels are equal to those of Javier’s case, but don’t forget the count column. Greinke is indeed a flyball pitcher, but not as extreme as Javier: taking the first 3 clusters together, he allowed a copious 231 BIP at negative avgLA, slightly less than half of the whole dataset.

Brace yourself for some colored graphic fracas:

Zack Greinke 2019 clustered BIP, fviz_cluster function

Yeah, dealing with 500+ balls in play will do this: there are 300+ outs in there, but also a lot of overlapping hits, and that makes for a carnivalesque hodge-podge of colors. On tables we rely, then:

Zack Greinke 2019 BIP, cluster to AB outcome table

and ggplot while we’re at it:

Zack Greinke 2019 clustered BIP, ggplot function

Well, graphs really don’t help much when your dataset is too big!

Now let’s take a single step back to the past and consider Greinke’s 2020 allowed BIP. This will be even faster, as we already know k = 8 has to be chosen so that we can compare the two seasons, and also because it actually is the advised number of clusters. We can jump straight into the eclust clustering function:

Zack Greinke 2020 BIP, cluster related averages

and label according to the previous 2019 cluster labels, if possible:

  • 1 are clearly got unders;
  • 2 is the bloops cluster;
  • 3 are hard grounders;
  • 4 are long flyouts, a new breed in our BIP environment;
  • 5 is the standard barrel zone;
  • 6 is the group for warning tracks;
  • 7 are the feared no doubters;
  • 8 are slow rollers.

What happened? Nothing to be worried about, actually. The previous liners cluster seems to have merged with the grounders one to form hard grounders (although the difference between grounders and hard grounders is a small bump in avgEV and nothing more), which in turn makes room for another FB-related cluster, long flyouts. That is not a problem, as both liners and bloops are known to fluctuate wildly from season to season, so we’d rather focus on the real danger zone: barrels and no doubters.
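The relabeling in this post was done by eye, reading the cluster averages. As a rough illustration of how the same idea could be automated, here is a Python sketch that assigns each new cluster to the nearest old centroid and flags it as new when nothing is close enough. Everything in it is an assumption: the centroid values are invented, the function `match_clusters` is hypothetical, and the scale factors and the `max_dist` cutoff are arbitrary choices.

```python
import math

def match_clusters(old, new, scales, max_dist=1.0):
    """For each new cluster, find the nearest old centroid.

    Distances are computed after dividing each feature by its scale,
    so that EV (mph) and LA (degrees) weigh comparably. A new cluster
    farther than max_dist from every old one gets the label None.
    """
    matches = {}
    for new_id, new_c in new.items():
        best_label, best_d = None, math.inf
        for label, old_c in old.items():
            d = math.sqrt(sum(((a - b) / s) ** 2
                              for a, b, s in zip(new_c, old_c, scales)))
            if d < best_d:
                best_label, best_d = label, d
        label = best_label if best_d <= max_dist else None
        matches[new_id] = (label, round(best_d, 2))
    return matches

# Invented (avgEV, avgLA) centroids, for illustration only.
old_centroids = {"grounders": (88, -12), "no doubters": (103, 28), "bloops": (75, 25)}
new_centroids = {1: (76, 24), 2: (94, -9), 3: (102, 29), 4: (92, 38)}

# Assumed scales: 10 mph of EV counts as much as 15 degrees of LA.
matches = match_clusters(old_centroids, new_centroids, scales=(10, 15))
for new_id, (label, dist) in matches.items():
    print(new_id, label or "new cluster", dist)
```

Note that this matching is deliberately not one-to-one: two old clusters merging into one, or a brand new one appearing, are exactly the cases we want to be able to see.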

Zack Greinke 2020 BIP, labeled cluster averages and composition

Let’s leave the comparison for later, but I bet you can see a lot that matches between Greinke’s 2019 and 2020 BIP cluster composition.

Quickly, onto the graph part:

Zack Greinke 2020 clustered BIP, fviz_cluster function

Note that new cluster of long flyouts right in between got unders and warning tracks, although the flyball tendencies of Greinke don’t seem to have changed a lot from 2019 considering the previous table’s count column.

At last, a comfy recap in table form:

Zack Greinke 2020 BIP, cluster to AB outcome table

and a closing ggplot:

Zack Greinke 2020 clustered BIP, ggplot function

Can we finally answer the prophecy? Can Greinke, the epitome of stoicism and consistency, hold his allowed damage in cluster form on a year-to-year basis?

Zack Greinke 2019 BIP, labeled cluster averages and composition

He does where it matters! Look at how that barrel zone stands around 16% in both cases, a beacon of solidity among all the ups and downs that usually happen when comparing different seasons.

Zack Greinke 2020 BIP, labeled cluster averages and composition

All things considered, these two versions of Greinke do not look too dissimilar. If anything, the 2020 edition was able to minimize damage more than the 2019 one, judging by a quick look at avgxwOBA.
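The comparison between the two tables boils down to checking how much each cluster’s share moved from one season to the next. A minimal Python sketch of that check is below; the function `share_shifts` is hypothetical, and the shares are placeholder numbers chosen for the example, not the values from the tables.

```python
def share_shifts(before, after):
    """Change in share (percentage points) for labels present in both seasons.

    Also returns the labels that appear in only one of the two seasons.
    """
    shared = sorted(set(before) & set(after))
    shifts = {name: round(after[name] - before[name], 1) for name in shared}
    only_before = sorted(set(before) - set(after))
    only_after = sorted(set(after) - set(before))
    return shifts, only_before, only_after

# Placeholder shares in percent, for illustration only.
season_a = {"barrel zone": 16.0, "no doubters": 5.0, "liners": 12.0}
season_b = {"barrel zone": 16.5, "no doubters": 4.0, "long flyouts": 9.0}

shifts, dropped, added = share_shifts(season_a, season_b)
print(shifts)   # small shifts mean the cluster held year to year
print(dropped)  # labels that disappeared
print(added)    # labels that are new
```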

Some things never change: baseball is always the reason to wait for spring to come; a good TV series, or kdrama for that matter, can cure all things; and the first half of the 2010s was arguably music at its finest. Really, no question.

Neither does Zack Greinke. While the world around him twists and turns, a pandemic drains everything of fun, emotion and life, and baseball, as is its nature, keeps changing its look, he is still there on the bump, firing fastballs and changeups, dropping the occasional El Duque eephus, checking in at 10+ wins and allowing the same ratio of damage as usual.

Ode to you Zack, unwavering beacon of truth!

copyright: USA Today

Italian baseball stathead. I’ll write about MLB, Nippon Professional Baseball and Korean dramas/shows. A lot of graphs, Astros related content and references.