Cristian Javier threads the needle

Alessandro Zilio
8 min read · Mar 13, 2021


In this day and age of baseball, I bet you felt somewhat nostalgic reading the previous Framber Valdez entry. Talking groundballs and sinkers in 2021 is a lot like marching to this absolute Girls’ Generation tune while the rest of the world is all BTS and BLACKPINK (although if you want to indulge in more old-school SNSD bangers, you’ll find me there).

Yet this is 2021: a pandemic is still haunting us, concerts are held online and baseball is no longer the land of Maddux and Glavine. The pitching trend is all about velocity and breaking balls on a north-south axis: pepper the top of the zone with heat (ideally 95+ mph), then spin the bender/slider/frisbee out of the zone and rack up Ks. Jomboy has you covered: here is Gerrit Cole (ColeTrain, still miss ya) firing a dart past Brandon Lowe:

and then making José Martinez (?) look foolish on a low and away slider:

Cristian Javier is not Gerrit Cole by any means, but the way he goes about his craft is similar: fill the upper part of the strike zone with 4-seamers and finish guys off with breaking stuff. Why isn’t he a Gerrit Cole Lite, then? Well, Coles don’t grow on trees, for starters, but to be fair it’s part location and part stuff.

Javier averaged 92.5 mph on his fastball in 2020, so he is not exactly a fireballer, although he located it like one:

Cristian Javier, 2020 FF location

Throwing a not-so-fast fastball up there 63% of the time seems like a bold move, Cotton, and it was one indeed. The pitch itself had good peripherals: Savant has it at a .210/.405/.309 expected slash line (xBA/xSLG/xwOBA) in 2020. Yet its Whiff% was merely decent at 19.9%, and its spin was only slightly above average (2360 rpm). So what is it about Javier’s 4-seamer? It comes down to his release point, a high 3/4 slot, and to its vertical drop, 15% above league average for comparable fastballs.

When Javier threw some cheese up top, batters had a hard time figuring out its trajectory. They either whiffed at it up high or got under it and flew out. The element of trickery that Javier seems to bring to the table was even more evident with his real weapon, a slider that took no prisoners in 2020:

Cristian Javier, 2020 slider location

That is certainly not Gerrit Cole’s pattern! Yes, sometimes Javier went low and away to righties, but the big red dot in the middle speaks for itself: CJav threw a “here it comes, try to hit this” challenge and batters lost… a lot!

Savant is in love with Javier’s slide piece. In 2020 it was one of the best among “qualified” starters, a dandy that gave hitters fits to the tune of a .103/.125/.153 expected slash line (xBA/xSLG/xwOBA) and a 32% Whiff rate, with a miserable 6.9% HardHit rate to seal the death sentence:

Whether you call it a slider or a curveball, breaking pitches made up 30% of Javier’s 2020 offerings, with only a sporadic changeup besides them and the fastball. He gave up no extra-base hits on that breaking ball, even though he kept filling the strike zone with a pitch that is usually located out of it to avoid damage.
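
The two slider rates above follow Baseball Savant’s definitions. Whiff% is swings and misses divided by total swings. HardHit% is batted balls with an exit velocity of 95 mph or more divided by all batted balls. Below is a minimal Python sketch of both. It assumes you already have swing and whiff counts and a list of exit velocities. The function names are hypothetical, not part of any Savant tool.

```python
from typing import Sequence

HARD_HIT_MPH = 95.0  # Statcast threshold for a hard-hit ball


def whiff_rate(swings: int, whiffs: int) -> float:
    """Whiff% = swings and misses / total swings."""
    if swings == 0:
        raise ValueError("no swings to divide by")
    return whiffs / swings


def hard_hit_rate(exit_velos: Sequence[float]) -> float:
    """HardHit% = batted balls at 95+ mph / all batted balls."""
    if not exit_velos:
        raise ValueError("no batted balls")
    hard = sum(1 for ev in exit_velos if ev >= HARD_HIT_MPH)
    return hard / len(exit_velos)
```

For example, `whiff_rate(100, 32)` gives 0.32. `hard_hit_rate([101.2, 88.0, 94.9, 70.3])` gives 0.25, since 94.9 mph falls just short of the threshold.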

It’s not movement, stuff or location, but rather pure mastery of deception. That mastery earned Javier a mention as one of the last “invisiball” artists in the sport, alongside old reliable Yusmeiro Petit, even before he threw a single pitch in the Majors.

Javier’s pitches are a pain for hitters to pick up, and when they do make contact, it usually means one thing: a flyball. Among 2020 starters with at least 40 IP, only Robbie Ray (52.4%) had a higher FB% than Javier’s 51.9%. Moreover, the young Houston gun followed the teachings of Marco Estrada and allowed a majestic 15.9% IFFB% (infield flies as a share of all fly balls).
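
The post does not name the source of these numbers, but they look like FanGraphs-style batted-ball rates; treat that as an assumption. Under that reading, FB% is fly balls over balls in play and IFFB% is infield flies over fly balls. A 15.9% IFFB% therefore means roughly one in six of Javier’s fly balls never left the infield. Below is a small Python sketch. The batted-ball labels ("GB", "LD", "FB", "IFFB") are hypothetical, and "FB" stands for outfield flies only.

```python
from collections import Counter
from typing import Dict, Iterable


def batted_ball_rates(bip_types: Iterable[str]) -> Dict[str, float]:
    """FB% (all fly balls / BIP) and IFFB% (infield flies / fly balls).

    Hypothetical labels: "GB", "LD", "FB" (outfield fly), "IFFB" (infield fly).
    Infield flies are fly balls too, so they count toward FB%.
    """
    counts = Counter(bip_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no balls in play")
    flies = counts["FB"] + counts["IFFB"]
    iffb_pct = counts["IFFB"] / flies if flies else 0.0
    return {"FB%": flies / total, "IFFB%": iffb_pct}
```

For example, five BIP made of two outfield flies, one infield fly, one grounder and one liner give an FB% of 0.6 and an IFFB% of one third.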

All of this makes Javier the perfect pitcher to run through the BIP clustering and then compare with the Framber Valdez results, given their opposite approaches to pitching.

By now you know the drill, so fewer words and more action! We start from all of Javier’s allowed BIP and the usual four variables: launch angle (LA), exit velocity (EV), Distance and xwOBA. First, we need some clues about the number of clusters k to set:

Cristian Javier 2020 BIP, optimal k

Well, that is an issue! While k = 1 is what the criterion tells us, it’s obviously not the road to follow. Here’s a small piece of advice: when in doubt, check another function. In this case I switched to clusGap in the cluster package and applied a different rule, called the firstSEmax criterion. It takes the smallest k whose gap statistic is within one standard error of the first local maximum of the gap statistic.

That returns a familiar result: k = 8! It’s nothing more than a coincidence, but it’s fun to see how completely opposite patterns of allowed BIP can be divided into the same number of groups.
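
The firstSEmax rule is the selection method implemented by `maxSE()` in R’s cluster package. Below is a minimal Python port of just that selection step. It assumes you already have the gap values and their standard errors for k = 1, 2, … (for instance from clusGap). It does not compute the gap statistic itself, and the function name is hypothetical.

```python
from typing import Sequence


def first_se_max(gap: Sequence[float], se: Sequence[float], se_factor: float = 1.0) -> int:
    """Return the chosen k (1-based) under the firstSEmax rule.

    gap[i] and se[i] are the gap statistic and its standard error for k = i + 1.
    """
    K = len(gap)
    if K == 0 or K != len(se):
        raise ValueError("gap and se must be non-empty and the same length")
    # First local maximum: the first k after which the gap stops increasing.
    nc = K
    for i in range(K - 1):
        if gap[i + 1] <= gap[i]:
            nc = i + 1  # convert 0-based index to 1-based k
            break
    # Smallest k to the left of nc whose gap is within se_factor * SE(nc) of that maximum.
    threshold = gap[nc - 1] - se_factor * se[nc - 1]
    for k in range(1, nc):
        if gap[k - 1] >= threshold:
            return k
    return nc
```

If the gap drops right after k = 1, the first local maximum is k = 1 and the rule returns 1. A monotonically increasing gap returns the largest k tried.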

Speaking of which, let’s get to the clustering! The quick eclust function splits Javier’s allowed BIP into 8 groups, labeled 1–8, as follows:

Cristian Javier 2020 BIP, cluster related averages
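
The post does not show the exact eclust settings. Assuming it ran with its default k-means, the Python sketch below shows the same idea: z-score the four variables so that degrees, mph, feet and xwOBA weigh comparably, then run plain Lloyd’s k-means. Scaling is itself an assumption, since the post does not say whether the variables were standardized. R’s `kmeans()` defaults to the Hartigan-Wong algorithm, and this sketch uses a single random start, so its labels can differ from the author’s. The column names are hypothetical.

```python
import random
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical column names mirroring the four variables used in the post.
FEATURES = ("LA", "EV", "Distance", "xwOBA")


def standardize(rows: Sequence[Dict[str, float]]) -> List[List[float]]:
    """Z-score each feature (sample SD, like R's scale()) so no unit dominates."""
    stats = {}
    for f in FEATURES:
        col = [r[f] for r in rows]
        stats[f] = (mean(col), stdev(col))
    return [
        [(r[f] - stats[f][0]) / stats[f][1] if stats[f][1] > 0 else 0.0 for f in FEATURES]
        for r in rows
    ]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, n_iter: int = 100,
           seed: int = 123) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means with one random start; returns (labels 0..k-1, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With your own list of per-BIP dicts, the call would be `kmeans(standardize(bip_rows), k=8)`.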

Same number of clusters as Framber, but a completely different composition if we look at the averages. Time to name your clusters!

  • 1 is something we already saw for Framber: avgLA near 0 and avgEV above 95 mph. These are hard liners;
  • 2 is also nothing new: the damage cluster known as the barrel zone;
  • 3 is a rarity for Javier: balls hit at a downward angle, therefore grounders;
  • 4 is almost a subset of the barrel zone, one that hurts even more in terms of avgxwOBA. These are what your broadcaster would call no doubters;
  • 5 turns out to be the weirdest cluster, the one full of bloops dropping into no-man’s land;
  • 6 is 3 with less punch and more overswings: our slow rollers;
  • 7 is what a flyball pitcher wants: absurdly high LAs and balls popping up harmlessly because batters just got under them, the got unders;
  • 8 is some EV short and an LA late: the warning tracks.

Cristian Javier 2020 BIP, labeled cluster averages
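
A per-cluster summary like the tables above is a group-by: the mean of each variable within each cluster, plus a count. Below is a small Python sketch of that step. It assumes one dict per BIP with hypothetical column names and one label per BIP, either a number 1–8 or a name such as "bloops".

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Sequence

FEATURES = ("LA", "EV", "Distance", "xwOBA")  # hypothetical column names


def cluster_averages(rows: Sequence[Dict[str, float]],
                     labels: Sequence[Hashable]) -> Dict[Hashable, Dict[str, float]]:
    """Mean of each feature per cluster, plus the number of BIP in that cluster."""
    groups: Dict[Hashable, List[Dict[str, float]]] = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    summary = {}
    for lab, members in groups.items():
        summary[lab] = {"avg" + f: mean(r[f] for r in members) for f in FEATURES}
        summary[lab]["count"] = len(members)
    return summary
```

Because the labels come from averages, two quite different BIP can share a row of this table. The next paragraphs come back to that caveat.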

Note that some labels overlap with the Framber case, notably barrel zone, bloops and liners. However, we lost two ground-related clusters, hard grounders and tappers. Two flyball-oriented groups replaced them: no doubters and got unders.

In other words, Valdez and Javier share the same number of clusters, yet their M.O. for carving up opposing lineups is not even close.

Let’s clear things up with Javier’s clustering graph:

Cristian Javier 2020 BIP, fviz_cluster function

As you can see, most of Javier’s allowed BIP sit on the far right, the high-LA zone. Only a few scattered points lie down to the left, in Framber’s grounder territory. That implies a more granular grouping of flyballs for Javier, and the reverse for Valdez, whose groundballs get the more granular grouping.

Here is the table version of the clustering above, with clusters against AB outcomes:

Cristian Javier 2020 BIP, cluster to AB outcome table

And the family-friendly ggplot:

Cristian Javier 2020 clustered BIP, ggplot function
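
The cluster-to-AB-outcome table above is a cross-tabulation, the kind of count table R’s `table()` produces. Below is a minimal Python sketch. It assumes one cluster label and one outcome string per BIP. Outcome names such as "single" or "field_out" mirror Statcast’s `events` values but are just examples here.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def crosstab(labels: Sequence[Hashable], outcomes: Sequence[str]) -> Dict[Hashable, Dict[str, int]]:
    """Count BIP for every (cluster, outcome) pair, filling missing pairs with 0."""
    if len(labels) != len(outcomes):
        raise ValueError("need exactly one outcome per BIP")
    counts = Counter(zip(labels, outcomes))
    cols = sorted(set(outcomes))
    # dict.fromkeys keeps clusters in order of first appearance.
    return {lab: {o: counts[(lab, o)] for o in cols} for lab in dict.fromkeys(labels)}
```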

What did we get out of this fun skit? Well, first of all, point dispersion is a key factor. For a given k, we are more likely to get further splits where observations are densely packed than where they are few and far between.

There’s also an important caveat about clusters: they are labeled according to averages, so two quite different balls in play can end up in the same group. It also means a grounder given up by Valdez could differ from one allowed by Javier. That’s why, when comparing the two, I only look at avgxwOBA, a measure of conceded production:

Framber Valdez 2020 BIP, labeled cluster averages

All the subjectivity of clustering is there to be seen: hard liners, bloops and warning tracks are slightly more dangerous against Valdez because they were hit harder and farther, on average.

Cristian Javier 2020 BIP, labeled cluster averages

Javier, on the other hand, seems to have had a lucky year judging by the barrel zone avgxwOBA. Don’t forget, though, that the Valdez clustering didn’t separate “normal” barrels from no doubters.

Lastly, watch out for the count column: while bloops hurt both pitchers, they were also rare (17 and 11 BIP). If anything, both pitchers seem to have succeeded with their own plans. Valdez allowed a grand total of 88 BIP in ground-related clusters (hard and normal grounders, slow rollers and tappers) for an avgxwOBA below .200. Javier did even better, with 59 BIP in the non-barrel FB clusters (got unders and warning tracks) for an avgxwOBA well below .100.
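
Pooling several clusters into one figure, as with the 88 ground-related BIP or the 59 non-barrel FB BIP, calls for weighting each cluster’s avgxwOBA by its count rather than averaging the averages. The post doesn’t spell out how its pooled figures were computed. The Python sketch below shows the count-weighted approach, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def pooled_average(groups: Sequence[Tuple[int, float]]) -> Tuple[int, float]:
    """Combine (count, cluster average) pairs into a total count and a count-weighted average."""
    total = sum(n for n, _ in groups)
    if total == 0:
        raise ValueError("no BIP in the selected clusters")
    return total, sum(n * avg for n, avg in groups) / total
```

With illustrative inputs, a cluster of 30 BIP at .100 and one of 10 BIP at .500 pool to 40 BIP at .200. A plain mean of the two averages would give .300 and overweight the smaller cluster.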

One stat stands out: Javier, the pitcher with the FB-oriented approach, avoided barrels better than Framber with his sinkerballing style. That is a data point to watch in Javier’s future seasons. If he can pitch his way out of barrels and into infield popouts and lazy flyouts year in and year out, he could be a perennial overachiever against all projection systems. Otherwise, regression is to be expected.

What is left to explore, and what will be the topic of my next entry on clustering allowed BIP, is whether these clusters hold their structure across consecutive seasons. That applies both to their averages, and therefore their labels, and to conceded production.

To check that, I’ll pick another Astros starter, this time not a rookie but… let’s say a seasoned veteran. I’ll cluster his 2019 and 2020 allowed BIP and compare the results to see if there’s common ground to be found. A hint:

He is not impressed

Until next time, beware of sliders in the zone!



Alessandro Zilio

Italian baseball stathead. I’ll write about MLB, Nippon Professional Baseball and Korean dramas/shows. A lot of graphs, Astros related content and references.