Archive for the ‘playing with stats’ Category

The Somethingiest Something of the Aughts: The Hitters

September 16, 2009

Funny thing about writing a daily blog with no remuneration and no one to hold you accountable: sometimes life gets in the way and you don’t really feel like writing anything. Sorry about my absence on Tuesday, but real-life Monday sucked like you wouldn’t believe. Things now are…not okay, but they’re not getting any worse, so here’s your new thing.

As he often does, Rob Neyer made me think of something today. He pointed out (via some link to somebody else) that there’s something of a race for the batting champion of the decade, with Ichiro! and Pujols running pretty much neck-and-neck. Which left me wondering who led in all the various other categories, and by how much. And as long as I was wondering, I thought I might as well write about it. Verducci, the “somebody else” in Neyer’s post, did much the same thing, but I don’t care about that, and I’m going to look at some different categories and in a different way. So away we go, stats through Monday night:

Home Runs: Alex Rodriguez, 430
No surprise here. A-Rod led his league in homers five times this decade, and this is the first year he’s likely to finish out of the top eight (and he’s only four out of the top ten, with at least two of the dudes in front of him out for the rest of the season). What’s a little surprising is by how much A-Rod leads: he’s up by 62 over Jim Thome’s 368, meaning he’s hit about 17% more homers than anybody else this decade. The 1990s’ leader was Mark McGwire, with 405. The 1980s? Mike Schmidt, with 313. Eight players have hit more than 313 homers from 2000 through 2009, and I suppose Andruw Jones or Lance Berkman could make it nine or ten with a couple hot weeks.

Runs Batted In: Rodriguez, 1227
That’s right, the unclutchiest choker ever leads the decade in the lazy man’s ultimate clutchy stat, by a comfy 125 over Pujols (approximately one season’s worth, which is appropriate since Pujols didn’t start playing until 2001). Your 1990s leader was Albert Belle (really?) with 1099, and 1980s was Eddie Murray with 996. Murray’s total would place 10th in the 2000s, right between Big Papi and Bobby Abreu.

Runs Scored: Rodriguez, 1181
That A-Rod guy? He’s a good player. And one who stayed pretty healthy for an entire numerological decade, which has at least as much to do with it. This is a closer contest than the ones above, with Johnny Damon close behind at 1110. Derek Jeter and Bobby Abreu mean that four of the top five have spent at least some of the decade as Yankees. 1990s: Barry Bonds, 1091. 1980s: Rickey Henderson, 1122. Hey, score one for the eighties, almost!

On-Base Percentage (min. 3000 PA): Barry Bonds, .517
What what what? Bonds OBP’ed over .500 for the whole decade? Somehow that shocks me. But I guess OBPing .559 in 2001-2004, four of his five full years in the decade, will do that. Todd Helton is a distant second with a Coors-aided .439, with only three other players within 100 points of Bonds. Frank Thomas led the nineties at .440 (Bonds just behind at a merely fantastic .434); 1980s, Wade Boggs at an equal but more dominant .440.

Slugging Percentage: Bonds, .724
Naturally, and well ahead of Pujols at .630 (though Pujols will end up with nearly 2000 more plate appearances in the decade). 1990s: McGwire, .615 (Bonds right behind again at .602); in the 1980s, Schmidt at .540. In the aughts, you’d have to go to #19 before you drop below .540; Schmidt slots between Teixeira at .542 and Bagwell at .534.

OPS+: Bonds, 221
Well, duh. Pujols second at 173, then Manny at 160. Theoretically, this should be pretty constant across the decades, and it almost works that way, but doesn’t. Bonds paces the nineties again at 179, Schmidt the 80’s at 153.

Stolen Bases: Juan Pierre, 455
That surprised me a little, but Pierre has played since 2000 and was a regular from 2001 until late 2008, while Carl Crawford (#2 but way behind at 359) didn’t play full time until 2003 and missed about a third of 2008. 1990s: Otis Nixon, 478; 1980s: Rickey Henderson, 838. Rickey led that decade by a whopping 255 (over Tim Raines) and missed leading the 1990s by 15, coming in second place. He was #105 in the 2000s.

Hit By Pitch: Jason Kendall, 155.
Up by 17 on Jason Giambi. I never thought of A-Rod or Jeter as guys who get plunked a lot, but they’re both in the top ten; lots of plate appearances -> lots of stray inside fastballs, I guess. Chase Utley has been hit 104 times despite not becoming a regular until 2005. Craig Biggio was hit 147 times in the 90s (and was fourth in the 2000s at 132). Don Baylor crushed everyone else in the eighties with 160, 52 more than Chet Lemon and more than three times as many as #8 Lloyd Moseby.

Sacrifice Flies: Mike Lowell, 76.
Now that’s a surprise. One leadoff triple by Denard Span could mean that Lowell gets tied by the even more surprising Orlando Cabrera, now at 75, and don’t count out the less surprising Carlos Lee (74). After that, you hit Abreu at 66, and I don’t think he’s getting ten sac flies in three weeks. Frank Thomas had 82 in the nineties, Andre Dawson 74 in the eighties.

Double Play Groundouts: Miguel Tejada, 222.
Again, the identity of the leader is surprising, but even more surprising is the margin; Miggy is crushing Paul Konerko and his 193. Belle led the 1990s at 172, and Jim Rice predictably dominated the 1980s with 224. Rice’s 224 trumps Tejada’s 222 by more than it looks like, considering that (a) Julio Franco was second in the eighties at 166, which would’ve been seventh in the aughts, and (b) Tejada took over a thousand more plate appearances than Rice did to arrive at his total.

Plate Appearances: Bobby Abreu, 6864
This one could very easily change hands before the end of the decade, as Derek Jeter is only six behind Abreu and is batting leadoff for the best offense in the majors. Next is Tejada, a hundred behind Jeter. Biggio had 6794 in the nineties and Dale Murphy had 6540 in the eighties.

Hits: Ichiro!, 2005
He’s 85 ahead of Jeter or anyone else for the decade, which is especially impressive when you consider that he was in Japan for the year 2000. Going down the rest of the list, Pujols is the next one you’ll see that did not play at least a little big-league ball in both 2000 and 2009 (he’s ninth at 1697), and to find the next such player, you’d have to go all the way down to #33 and Jeff Kent, who retired after last season and may end up 600 hits behind Ichiro for the decade.

Three Comparisons

September 10, 2009

One: Half-Season MVP Division

Through their first 42 games with their new, National League teams:
Manny Ramirez, 2008: .395/.478/.743 (1.222 OPS), 29 R, 14 HR, 43 RBI
Matt Holliday, 2009: .379/.437/.702 (1.139 OPS), 33 R, 12 HR, 41 RBI
(thanks to the StL P-D for that one.)

Two: I Told You So Division

Orlando Cabrera, since August 1: .256/.283/.353 (.636 OPS), -6.2 UZR (yes, -6.2 runs in 34 games. I mean, what?)
Nick Punto, season: .220/.320/.275 (.595 OPS), +1.4 UZR

Three: Obviously, They’re Just Being Cheap Again Division

Since June 3:
Nate McLouth: .264/.353/.439 (.792 OPS), -5.2 UZR
Andrew McCutchen: .278/.355/.470 (.826 OPS), +2.4 UZR

Prince Albert and the Crown

August 18, 2009

The other day, I opined in passing that, standing first in HR and RBI and (then) fourth in batting average, Albert Pujols had the best chance to win a Triple Crown that we’d seen in a good long while.

And, well, does he, really? I mean, it’s obviously still not likely (it never is), but what are the chances? You probably know by now that I’m not going to sit here and give you precise mathematical odds, but let’s look at the English major’s version of the question: can we envision it actually happening?

Albert went 1-for-4 on Monday, so this morning he’s batting .325. Leader Hanley Ramirez’s Fish didn’t play, and he’s been on fire lately and now stands at .356. Already not looking good. Pujols does have the HR lead by one over Mark Reynolds, though (39 to 38 after both hit one yesterday), and is just two behind Prince Fielder for the RBI lead, 105 to 107.

I’m going to commit a big no-no right off the bat and assume away HR and RBI. ZiPS calculated for the rest of the season thinks Albert ends up with 50 HR and 138 RBI, and that that will best Reynolds by two in the former and drop six behind Fielder in the latter. So even in the two categories he’s closest in, he’s only a favorite to hold one of them. But I’m going to assume he does get both; it just feels like the more likely result to me, and anyway, the bigger hurdle will obviously be the batting average. Also, if Pujols goes on the kind of hot streak he’ll need to in order to win the batting title, odds are he’ll be piling up the HR and RBI too. So in reality, I’m sure there’s not even a 50% chance that Pujols ends up leading in both HR and RBI, but let’s just say he does it.

Now. The Marlins have 44 games left, and Hanley has averaged 3.88 AB per game played. Say he starts every one of those 44 games; at that rate, he’s got 170 more AB. This season, he’s been BABIPing out of his head, with a .404 batting average on balls in play that’s unsustainable by anybody; his pre-’09 career BABIP was approximately .340. So say he reverts back to that, and maintains his current HR and strikeout rates. He strikes out in 18% of his ABs (31 times), homers in 4.2% (7), and gets a hit in 34% of the remaining 132 (45). That makes him 52 for 170, a .306 BA over the rest of the season (seems unrealistically low, doesn’t it? Wonder if I’m doing something wrong…oh well, pressing on). That still puts his overall 2009 batting average at a robust .340.

By the same AB/G * Games Remaining formula, Pujols ought to have 147 AB left in his season. He’ll need 57 hits in those 147 AB–a .388 batting average the rest of the way–to put him at 192/563 = .341 for the year.

Pujols has been a bit down on BABIP this year (.294), either because he’s been unlucky or because he’s hitting more flies and fewer liners. But let’s assume, again, that he gets back to his career BABIP (.321) and keeps the other rates the same. 11.4% Ks (16), 9.2% HRs (14), and 32.1% of the remaining 117 ABs are hits (38). That makes him 52 for 147 (.354), and puts him at just .332 for the year…but if just five more hits fall in (or leave the yard) somewhere in there, he’s right where he needs to be.
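For anyone who wants to check the back-of-the-envelope math, here is a minimal Python sketch of the same method: take the strikeouts and homers out of the remaining at-bats, apply BABIP to what's left, and add the homers back in as hits. The function name is made up for illustration, the strikeout and homer counts are the ones quoted in the text (not recomputed from the percentages), and sacrifice flies are ignored, which the usual BABIP definition would not do. Pujols' current 135-for-416 line is backed out from the 192/563 and 57/147 figures above, so treat it as an assumption.

```python
def project_hits(ab_left, strikeouts, homers, babip):
    """Hits over the remaining at-bats, using the post's shortcut.

    Balls in play = at-bats minus strikeouts minus homers (no sac flies).
    Homers count as hits; BABIP is applied only to balls in play.
    """
    in_play = ab_left - strikeouts - homers
    hits = homers + round(in_play * babip)
    return hits, hits / ab_left


# Hanley Ramirez: 170 AB left, 31 K, 7 HR, career .340 BABIP
hanley_hits, hanley_ba = project_hits(170, 31, 7, 0.340)

# Albert Pujols: 147 AB left, 16 K, 14 HR, career .321 BABIP
pujols_hits, pujols_ba = project_hits(147, 16, 14, 0.321)

# Assumed current line for Pujols, derived from the text: 135 for 416
pujols_season_ba = (135 + pujols_hits) / (416 + 147)

# Hits needed over the last 147 AB to finish at 192/563
extra_hits_needed = 57 - pujols_hits

print(hanley_hits, pujols_hits, round(pujols_season_ba, 3), extra_hits_needed)
```

Both players come out at 52 hits the rest of the way, which is why the whole thing hinges on a handful of extra balls falling in for Pujols.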

Doesn’t sound too bad, right? Not likely, sure, but with just five hits’ worth of better-than-average luck and with a slide back toward the mean by Hanley, it could happen! And just last year, from July 10 to August 31, Pujols played in 45 games and hit .392. So I’m not sure there’s anything Pujols can’t do, but if there is, hitting .388 in 43 games ain’t it.

So, sure, it can happen. If Hanley slips back to .340 or so (if he stays at .356, Albert has to put up a .450 average the rest of the way to catch him). And if the current #2 in average, Pablo Sandoval at .330, doesn’t finish just as strong as Pujols does. And if Pujols holds off Reynolds for the HR title and Prince for the RBI one.

So the odds of this actually happening are probably tiny. Not statistically insignificant, not one in a million, but small enough for most of us lay folk to write it off more or less completely. Still, though, it’s absolutely possible (certainly more likely than Mauer hitting .400, which we’re still hearing a lot about), and probably the “best” odds at this point in the season that anybody has had in many years. I think it’s something we should really keep an eye on for at least the next week or two (though if he goes 0-for-9 in the next two days or something, it’s basically all over).

Weird Wright

June 18, 2009

Hey, real baseball!

By any reasonable analysis you want to do, David Wright is having the best offensive year of his career. He has (through Tuesday) a career-high 161 OPS+, .430 wOBA, and already has 6 wins above replacement according to BP’s WARP3 (which is insane). He’s leading the NL with a .365 batting average (40 points over his career high) and a .458 OBP (42 points over his career high), while posting a .526 SLG that’s right in line with his career average of .532. He’s even stolen 18 bases, second in the NL (though he leads in CS with 8, already a career high in that category, so he’s barely breaking even when he runs and probably should go back to being more selective).

The amazing thing you probably already know is this: Wright, who has a career full-season low of 26 HR, is doing all this while having hit just four homers all year. He’s on pace to hit 11 all season, or three fewer than he hit in 283 PA as a 21-year-old rookie in 2004. He’s balancing some of that out with doubles, but he’s only on pace for 8 more of those than in ’08 (50 total, but he’s always hit a lot of doubles), so his Isolated Power is down 70 points from ’08; that SLG is being sustained mostly by that astronomical batting average.

Some have written that it’s too hard to hit HR in the Mets’ new park, so you might think that had something to do with it. Doesn’t look like it, though; while overall scoring at Citi is pretty low, it’s actually been the fifth most homer-happy park in the Majors so far, and in fact Wright has hit three of his four homers at home.

It gets weirder still. Look at these numbers (lifted straight from FanGraphs):
GB/FB: 0.95 (2008), 0.94 (2009)
LD%: 25.6% (2008), 25.9% (2009)
GB%: 36.2% (2008), 35.9% (2009)
FB%: 38.2% (2008), 38.2% (2009)

So Wright is hitting line drives, grounders and fly balls in almost exactly the same proportions as he did last year. Even fewer of those fly balls (4.6% this year, 7.6% last) are staying in the infield. We’d expect him to be hitting HR at more or less the same rate, even a tiny bit better…but, well, obviously, that ain’t happening. You have to assume he’s getting unlucky, homer-wise; he has to be hitting the ball pretty hard to maintain that BA, but the ones in the air just aren’t carrying quite far enough.

So, we should expect the homers to come around. He’s not likely to hit 30 again this year, but it’s not unreasonable to expect him to hit ’em at a 30-HR pace from here on out (which would give him a total of about 22 for the season).

But there’s a big, huge, flashing neon warning sign for Wright that has nothing to do with his HR power or batted ball types, and this is the incredible part to me: Wright is putting up that huge batting average not only while keeping the ball in the park when he does hit it, but while striking out once per game. He’s struck out between 113 and 118 times in each of his four full seasons, but now he’s already struck out 61 times in 61 games, which over a full season would top his career high strikeout total by 40+. His walk rate is up very marginally, while his strikeout rate is up by over a third. That’s bad.
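The "on pace for" numbers in the last few paragraphs are all the same simple scaling, so here is a rough Python sketch of it. The function names are invented for illustration, and the 100 games remaining is an assumption (a 162-game season minus roughly 62 team games played).

```python
SEASON_GAMES = 162


def full_season_pace(count, games_played, season_games=SEASON_GAMES):
    """Scale a counting stat to a full season at the current per-game rate."""
    return count / games_played * season_games


def projected_total(count_so_far, games_left, new_pace, season_games=SEASON_GAMES):
    """Season total if the player produces at `new_pace` (per full season) from here on."""
    return count_so_far + new_pace / season_games * games_left


hr_pace = full_season_pace(4, 61)          # homers at the current rate
k_pace = full_season_pace(61, 61)          # strikeouts at the current rate
hr_total = projected_total(4, 100, 30)     # 4 so far, then a 30-HR pace

print(round(hr_pace), round(k_pace), round(hr_total, 1))
print(round(k_pace) - 118)                 # gap over the previous career high of 118
```

This is only straight-line extrapolation; it says nothing about whether either rate is likely to hold.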

It’s been a while since I’ve talked about BABIP, so let me just remind you: that sort of thing (a strikeout per game + a .365 BA) just doesn’t happen. It varies a little based on the percentages of GB/LD/FB players hit, but when they don’t hit a homer or strike out, we expect everybody to have a 30% or so chance of getting a hit (that is, a .300 BABIP). Wright’s BABIP right now (well, through Tuesday) is .485. By comparison, Joe Mauer is hitting a ridiculous .429 right now, and his BABIP is “only” .443. Ichiro! is hitting .354, pretty close to Wright’s BA, but with a BABIP of .374; he’s done it by striking out about 1/3 as often as Wright.

A different perspective: Wright’s .485 BABIP leads the #2 (PA-qualified) guy in the majors in that category, Kevin Youkilis, by 76 points. There is no one within 76 points of Wright, and then there are 43 guys within 76 points after Youk. The 2008 leader BABIP’ed .396, 89 points below Wright’s ’09 number.

So you get the point by now: it’s not going to last. Something’s got to give–Wright has to start making better contact, or his batting average will start coming way, way down, and then if he doesn’t also start hitting home runs (and playing better defense, which is another weird thing I haven’t even touched on here), it’ll take a huge chunk of his value right down with it.

Wright has had an amazing first 62 games, and is an amazing player. There’s really no telling what this guy can do. But I’m pretty confident in this: whatever he does, he’ll look like a very, very different player over these last 100 games than he did over the first 62.

If it’s May 9 rather than June 9…

June 9, 2009

…and your team’s MVP candidate is hitting .228/.343/.447, do you worry?

Because that’s Ian Kinsler’s line since May 6 (the season started on April 6, so if this were a month earlier that would take us back to about game 1). Fortunately, back in the real world, he hit .321 and slugged .652 for the first five weeks or so. So since May 6 he’s lost 47 points of average, 14 points of OBP and 103 points of SLG, but he’s still a .905 OPS second baseman, not some .228-hitting disappointment. For now.

Another one: his season numbers are still awe-inspiring, because he hit .400 for the first month or so. But do you think Miguel Cabrera would be getting feature stories right now if the first baseman had put up an .839 OPS with 3 homers through May 9, rather than from May 6 to June 9?

On the other hand, how do you suppose the New York media would react if Mark Teixeira had waltzed into the city and hit .350/.417/.761 with 12 HR in his first month-plus, rather than his second?

Do you think there would be any doubt about his All-Star chances if Ichiro! had hit .400/.439/.538 in April-May rather than May-June? Would the media get off David Wright’s back a little bit if he had been hitting .388 with a .500 OBP on May 9?

One thing that drives me crazy is the way that, at least with regard to position players, each passing month is a little less important to us than the last, until you get to September (and that’s assuming you’re in a pennant race). If a guy hits .400 in April but then hits .200 in May, he’s still a good bet to make the All-Star team, while if he hits .200 in April and .400 in May, he’s probably still considered a disappointment come June (unless somebody noticed and gave him the Player of the Month Award or something). The April stats count for all the hype, and the October stats count for who’s “clutch” and who’s not, and all the stuff in the middle just kind of happens.

But if the Mets win by a game or two, Wright’s enormous early-May-to-early-June will have been as big a part of it as anything Delgado or Reyes or Beltran could possibly do in August or September. With that decimated lineup, being only three games out at this point is a miracle you can attribute almost exclusively to the wonders that are Wright and Santana. Yet if Wright slips a bit in September (or even if he’s his usual stellar self, but is perceived as being “not clutch”), he’ll be widely regarded as a failure again. These games (and these stats) count too, people…

In Defense of Compassionate Sabermetricism

May 23, 2009

If I’m going to have a horribly unhealthy, gut-busting, productivity-killing Friday lunch, I’m a big fan of Panda Express’ Orange Chicken. And there’s a decent copycat place a couple blocks from the office, but it was a nice day yesterday, and I was up for a walk, so I went for the real thing. To get that, you have to head to the James R. Thompson Center, a big gathering point for a lot of Chicago that, as I understand it, houses some government offices and whatnot. The Panda Express is really all I’m interested in.

So I get there, and there’s this big protest going on right outside the building. Up close, people are waving signs about the right to life and how gay marriage is destroying our families, milling about in the general neighborhood of someone who is speaking ineffectively into a megaphone, while across the street is another group of people doing their best to drown out this first group with shouts like “What do we want?” “Abortion rights.” “When do we want it?” “Now!” and “Fascists go home!” and I’m thinking to myself, what are these people (any of them) doing here, really? Do they expect to convince anyone by labeling the other side murderers or fascists, or by just being louder? Or do they just like to hear themselves talk? Is there just nothing better to do on a pleasant Friday leading into a holiday weekend?

That’s basically how tHeMARKsMiTh sees the world of baseball fans and writers: the internet-savvy sabermetric crowd against the talk-radio-and-newsprint traditional crowd, both sides trying to shout each other down, never getting anywhere. (Of course, that doesn’t even remotely do justice to his post. Read it yourself; I’ll still be here when you’re done. Ready now? Good.) A couple basic things to get out of the way:

  1. I agree with most of his main points. There’s a lot of shouting into the abyss that goes on on both sides, a lot of name-calling and making fun, and it’s hard to see how any of it does anything at all other than making people on the same side feel smug and superior at the other side’s expense. (Okay, I have to make an exception for these guys, who were just too funny. And JoePoz, who’s kind of a fence-straddler, anyway. But otherwise, I don’t see the point.)
  2. I don’t think traditional stats (or most of them, anyway; sorry, Holds and Fielding Percentage) are completely worthless. You’ve seen me use HR and RBI a bunch of times already. Stats like those give context; even if you believe that VORP or WAR or Win Shares are a perfect measure of player value, think of the traditional stats as the splash of color in the crystal-clear black-and-white picture. They tell the story: what kind of hitter he is, where he likely hit in the lineup, and so on. WAR will tell you that Mark Teixeira and Carlos Beltran were almost exactly as valuable as each other in 2008, but don’t you want to know a little more than that? That’s where I think runs, RBI, HR, SB, and so forth come in handy.
  3. Another main point of Mark’s is that neither side has it completely right. I agree with that, too: there’s not much “right” about picking an MVP based on who has the most HR or RBI or Saves, and sabermetric analysis is certainly far from perfect as well — all you need to do is look at how much the various metrics (WARP vs. WAR, plus/minus vs. UZR) disagree with each other.

But where I disagree with Mark is: I don’t see this as being like the abortion or gay marriage debate at all. In those debates, like in the “dialectic” Mark envisions, there are really only three plausible truths: (a) one side is correct; (b) the other side is correct; or (c) the answer is somewhere in the middle. If you have one side that believes that abortion should be legal in all circumstances and one side that believes it should be banned in all circumstances, that’s as far as it goes; it can’t be more legal than the first side wants it, and it can’t be more illegal than the second side wants it. So the one true “right answer” has to be either one of those extremes or something between them.

Not so here. Our advanced metrics are flawed, but the answer isn’t some compromise between them and the traditional stats; the answer is more research, and more metrics. The metrics we have have grown out of the more traditional statistics. Saying you prefer HR and RBI to VORP and WAR isn’t at all like saying you prefer “Choice” to “Life” or vice-versa; rather, it’s like saying you prefer Betamax to Blu-ray.

Here’s how Mark defends the traditional crowd:

Those who follow counting numbers have a point (among many). Baseball revolves around the run. It determines who wins and who loses. Therefore, should you not pay attention more to runs, RBI’s, and home runs? Home runs automatically score a run (making them slightly important) and bring in whoever is on base (making them more important). If the point of the game is to score [more] runs than the other team, home runs and RBI’s are awfully darn important, which gave Howard the edge [over Pujols for 2008 NL MVP].

But this ignores the critical weakness of run and RBI totals (and this isn’t a criticism of Mark, who I know understands this: it’s just that I don’t think there’s any way for anyone to successfully defend this position), which is that, in every instance in which you don’t hit a home run, your runs and RBI are totally dependent upon your teammates either getting on base for you or driving you in.

This doesn’t work well for the NL race, because Howard actually did do a phenomenal job of knocking runners in in 2008 (Pujols was still the clear MVP for other reasons), but take a look at this list (I hope). In 2008, Justin Morneau finished 2nd in the AL MVP voting, while his teammate Joe Mauer finished a distant 4th, based largely (or rather, entirely) on the fact that Morneau had 129 RBI and Mauer managed just 85. If that link went to the right place, though, you’ll see that when they batted with runners on base, Mauer and Morneau drove in those runners at almost exactly the same percentage: 19.0% to 18.6%. Morneau gets that huge edge in RBI because he batted with 151 more runners on base than did Mauer. Morneau actually batted with the most runners on base of anyone in the league. Part of that, of course, is because he’s not a catcher, and thus got to play every day. But a huge part of that is that he got to hit behind Joe Mauer, and his 2nd-in-the-AL OBP!
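To put a rough number on how much of that RBI gap is pure opportunity, here is a small Python sketch using only the figures quoted above. It assumes 18.6% is Morneau's rate and 19.0% is Mauer's (going by the order they're listed in), and that the percentages count only baserunners driven in, not the batter scoring himself on a homer. Neither assumption is confirmed by the source, and the function name is made up.

```python
def rbi_from_extra_runners(extra_runners, drive_in_rate):
    """RBI you'd expect just from seeing more runners at a given drive-in rate."""
    return extra_runners * drive_in_rate


rbi_gap = 129 - 85        # Morneau's RBI minus Mauer's
extra_runners = 151       # additional runners on base for Morneau

# Try both quoted rates; the answer barely moves.
for rate in (0.186, 0.190):
    extra_rbi = rbi_from_extra_runners(extra_runners, rate)
    share = extra_rbi / rbi_gap
    print(f"rate {rate:.3f}: about {extra_rbi:.0f} RBI, {share:.0%} of the gap")
```

Under those assumptions, something like 28 of the 44-RBI difference comes from nothing more than having extra runners to drive in.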

So the RBI stat tells you who was at the plate for the final event resulting in the creation of a run, but it can actually distort your sense of how that run was created. Mauer was, hands down, a better hitter than Morneau in ’08, and played a much bigger part in how the Twins’ runs were scored. When you add in defense and adjust for position scarcity, it’s not even close. They’re very nice complementary pieces, but Morneau is the Scottie Pippen to Mauer’s MJ.

So, yeah, runs are awfully important. On the team level, you could almost say they’re all-important (almost). But to look at the HR, runs or RBI a single player has as a way of judging that player’s value is never a good idea. Even with Howard: make him the MVP because he drove a bunch of guys in, and you’re ignoring Pujols’ 100+ points of OBP and 100+ points of SLG, amounting to 100+ fewer outs and many more runs for Pujols’ team, and Pujols’ vastly superior defense, all for the sake of (a) Howard’s good fortune of having 50 more runners on base during his PAs than Pujols had in his and (b) a 2% edge in his success at driving those runners in. It doesn’t add up, or even come close.

More to the point, every one of those traditional stats is totally encapsulated in some more advanced metric or other. Whatever skills you think RBI measures, that’s also measured, and better, in SLG; or, if you think hitting with runners on base or “in the clutch” is a skill that’s worth measuring, stats like WPA/LI do a better job with that. Batting average is a fun little stat for what it is, but OBP tells you the same thing and more. Fielding Percentage is totally encapsulated by all advanced fielding metrics, like UZR and Plus/Minus.

You might think that these things (well, save OBP) are less-than-perfectly accurate, but that’s not an argument in favor of going back to the old things; it’s an argument in favor of doing more research and finding better new things. UZR may not be perfectly accurate, but it’s always, in every possible instance, going to do a better job of telling you who is the better fielder than fielding percentage will. FIP may not be perfect, but it’s better than just comparing two players’ ERAs. There may be slightly different ways to measure OPS+, but it’s always going to be better than not adjusting for era or ballpark factors at all. And so on. We can argue about how good the new stuff really is, but it’s just plain better than the old stuff (the well-grounded stuff that gains some level of acceptance, that is, not just any old thing someone thinks up).

So that’s the point: I’m not going to use the term “flat-earthers” around here. I try to avoid mudslinging of all types. I have nothing against people who rely solely on traditional stats, and I think those stats have their place. But their place isn’t in player analysis, not anymore. If you’re going to argue something like that Howard was the 2008 NL MVP and base it on traditional stats, you’re going to be wrong — simply, objectively, obviously wrong. And I’m sincerely sorry to say that. But I’m not trading in my DVD player for a VCR, and I’m not giving up my numbers for a set that tells me the same stuff, but less of it, and with more static.

Luckiest and Unluckiest Pitchers So Far

May 15, 2009

One of the most interesting of many, many interesting things on FanGraphs is the pitching leaderboards’ E-F stat, which is simply the pitcher’s current ERA minus his FIP (Fielding Independent Pitching, which I’ve mentioned a few times–an attempt to measure what his ERA “should” be, with defense, park and luck taken out of the equation). A negative number means the pitcher has been lucky — the ERA is lower than it “should” be — while of course a positive number means the opposite. So here are your leaders on both ends of the spectrum so far:

AL’s Luckiest: Trevor Cahill, A’s.
Cahill has put up some awfully strong-looking numbers for a rookie on a terrible offensive team: 2-2 with a 3.69 ERA in seven starts. His FIP, though, is an astronomical 6.18. Why? Well, he’s not striking anybody out, at just 3.23 per nine innings, and yet he’s walking more than one batter for every two innings, which gives him an awful 0.70 K/BB ratio. He’s getting by right now on some combination of luck, defense, and forgiving ballparks (he’s made four of his seven starts at home in the pitcher-friendly McAfee Coliseum, and another one at Safeco), having held batters to a very lucky .256 BABIP.
Prognosis: the kid’s 21 years old and a solid prospect, with a minor league history of very solid K rates (one of the best in the minors in ’08), respectable walk rates and almost no homers allowed, which makes me think the current flyball rate is a little fluky. He’s probably not really a 3.69 sort of pitcher right at the moment, but I doubt he’s a 6.18 one either. He should be fine.

AL’s Unluckiest: Gavin Floyd, White Sox.
Funny enough, Floyd was one of the luckiest in 2008, with a FIP of 4.77, essentially identical to this year’s 4.63. But his ERA in 2008 was 3.84; in ’09 to date, it’s 7.32. What goes around, I guess. Floyd is having more control trouble this year (4.81 walks per 9 to 2008’s 3.05), but is balancing it so far by giving up fewer HR (0.92 to 1.31). The big difference, natch, is the BABIP: he got unbelievably lucky last year at .268, and is unbelievably unlucky so far this year at .380.
Prognosis: Problem is, I don’t think the Sox or their fans would have been happy with even just a 4.63 ERA this year after what he turned in last year. So if you were expecting that, you’ll be awfully disappointed. Also, the HR rate drop doesn’t seem real; he’s giving up about the same percentage of line drives and fly balls and has an almost identical GB/FB ratio to ’08, so the only difference is that fewer of those fly balls have gone over the fence so far. That’s likely to regress, so if Floyd can’t find the strike zone more often, he could be in for a very rough year indeed. Just not 7.32 rough.

NL’s Luckiest: Jair Jurrjens, Braves.
3-2 with a 2.06 ERA in 8 starts (48 innings), Jurrjens’ start has led at least one dude (the bald guy from Princess Bride again) to believe he’s quietly becoming one of the best pitchers around. But Rob Neyer always points out that it’s really, really tough to succeed while striking out less than five per nine, and Jair is at 4.5, with a very unsustainable .244 BABIP. Accordingly, his FIP is 4.09 — still very respectable, but more than two runs higher than his current ERA.
Prognosis: Well, his opponent BABIP in 2008 was a very typical .311, but his strikeout rate was a much more palatable 6.64, and so he still posted a 3.68 ERA with a FIP that essentially matched it. And he’s only 23, so there’s reason to believe he’ll improve on even those solid numbers. His pitch speed and selection are very similar to what they were in 2008. If he can get that strikeout rate back up and start getting grounders again when the ball is put into play (his GB/FB ratio is less than half what it was last year), and I don’t see any immediate reason to believe he can’t, he should be totally fine, even considerably better than the above-average pitcher his current 4.09 FIP suggests he is. He just hasn’t suddenly become Pedro Martinez or something.

NL’s Unluckiest: Ricky Nolasco, Marlins.
Strikeouts are good (7.5 per 9). Walk rate is up, but still very good (2.6 per 9). But his ERA is 7.78. FIP says it “should” be 4.34. Problem is, when a batter doesn’t strike out against him, he’s hitting almost .400.
Prognosis: That BABIP obviously can’t last, even with the Marlins’, um, unspectacular defense behind him. He is getting hit quite a bit harder than he was in ’08 — 26% of balls put in play off of him are line drives, compared to just 19% in both 2007 and 2008 — which is why that 4.34 FIP is up about six tenths from last year’s. He’ll be fine. I mean, he won’t win a bunch of games with the way the Fish are going right now, and he might not be the potential ace he looked like last year, but he’s at least an average pitcher, and is probably considerably better than that.
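
For anyone who wants the "luck" side by side, here is a quick sketch that lines up the ERA and FIP figures quoted above for the three pitchers in this stretch. It assumes "lucky" and "unlucky" can be read simply as ERA minus FIP, which is my shorthand here and not necessarily how the list was built; the `luck_gap` name is made up for illustration.

```python
# ERA and FIP figures as quoted in the post (2009 season to date).
# Each entry is (ERA, FIP).
PITCHERS = {
    "Gavin Floyd": (7.32, 4.63),
    "Jair Jurrjens": (2.06, 4.09),
    "Ricky Nolasco": (7.78, 4.34),
}


def luck_gap(era, fip):
    """ERA minus FIP, rounded to two places.

    ASSUMPTION: a positive gap is read as "unlucky" (results worse than the
    strikeout/walk/homer peripherals suggest), a negative gap as "lucky".
    """
    return round(era - fip, 2)


gaps = {name: luck_gap(era, fip) for name, (era, fip) in PITCHERS.items()}

# Print from luckiest (most negative gap) to unluckiest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    label = "lucky" if gap < 0 else "unlucky"
    print(f"{name}: {gap:+.2f} runs ({label})")
```

The sign is the whole story: Jurrjens is the only one of the three whose ERA sits below his FIP.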

There Goes the Only Reason to Pay Attention to the Nationals

May 14, 2009

A few words [on/tangentially related to/somehow inspired by] Ryan Zimmerman’s just-ended 30-game hitting streak:

  • Not naming names (or linking links) here, but I can’t stand it when my fellow sabermetrically-inclined folk say that they’re bored by, or otherwise downplay, events like hitting streaks and no-hitters. Look, they’re really just oddities, not statistically meaningful. I get all that, and I bet most non-statheads would too, on most levels. But if you can’t get at least a little excited about or intrigued by this sort of thing, you’re giving credence to the tired old refrain that we’re all just misplaced accountants who don’t really like to “watch the games.” To each her own and all that, but if you can’t bring yourself to appreciate the human interest angles of little stories like this, totally fine, but please do the rest of us a favor and shut the hell up about it. It’s not like there aren’t other things to talk about.
  • On the opposite end of the spectrum, David Pinto has been all over the streak these last few days, with pithy little tidbits like this and this (along with a bunch of other, more news-y updates). My favorite part is this, explaining why the league-wide “hit average” going up eight points has led to a hugely increased frequency of long hit streaks:

    So the probability of a player getting a hit in a four at bat game prior to 1996 was 0.646. In the later period, that’s up to 0.66. That doesn’t seem like much, but remember, we’re talking about long streaks here, so we’re multiplying. The chance of a player hitting in the next 29 games goes from .00000314 to .00000584, nearly double. Now, figure that over all possible players playing at least 29 batting games, and you can see how batting streaks would have increased.

  • I’d really like to be good with numbers.
  • There have been 199 hitting streaks of at least 20 games since 1980, by my count, which is probably six or seven times as many as I would’ve guessed. Zim’s is just the fifteenth in that span, however, to last as long as 30 games. Of those fifteen, Zim’s is the eighth to have ended at exactly 30 games. Kind of weird, right?
  • I just remembered that I was at one of those streak-snapping 31st games, Sandy Alomar’s at the Metrodome in July of 1997. That’s one of the least enjoyable notable games to be present for, since of course you’re really there hoping he does get a hit (even when he’s on the other team…especially when your own team sucks).
  • Of the fifteen thirty-plus-gamers, only three — Hal Morris, Vladimir Guerrero, and George Brett — had career batting averages of over .300 through the year of their streak, though four more of them were over .290. Zimmerman’s career average sits at .288 (though, interestingly, he’s never had a full season end that high). Anyway, they’re all over the map. Eric Davis had the lowest career average at the time of his streak, at .269.
  • A more common thread connecting the 30-game-streak club is that they’re all free swingers; you don’t get a hit a day by walking a whole lot. None of the fifteen had ever walked 80 times in a year as of the season in which he had his streak (Vlad, Brett, and Luis Gonzalez did it in seasons coming after their streaks…but all with the aid of more than 20 intentional passes), and for most of them, even 70 walks was a pipe dream. Benito Santiago, for instance, hit .300 with a .324 on-base percentage (16 walks) in his “streak year” of 1987. Rollins, Guerrero, Morris, Alomar Jr., and Nomar have very little to talk about with the likes of Jack Cust and Adam Dunn at hitters’ cocktail parties.
  • The best performance during a 30-game streak, predictably, was by the great George Brett; in the middle of his .390 season of 1980, Brett hit .467/.504/.746 (1.250 OPS) while hitting in 30 straight games from July 18 to August 18. Paul Molitor deserves a mention, too: he had the longest streak in this time frame, a 39-gamer in 1987, and posted a 1.178 OPS throughout.
  • The “worst” performance during a 30-game streak, also predictably, was turned in by Jerome Walton. He won the Rookie of the Year Award in 1989, his only decent year with the bat, and hit in 30 straight from July 21 to August 20, putting up an .801 OPS that wasn’t all that much better than his year-long .721 line. Dishonorable mention goes to Willy Taveras, he of the 74 career OPS+, who hit in 30 straight games while still managing only an .830 OPS (though that was a good sight better than his putrid year-long .672).
  • In one of his posts on the subject, Pinto wondered whether this year’s Nationals were the worst team ever to have a hitter with a streak this long, and the answer, since 1980, is…well, probably. Vlad’s 1999 Expos lost 94 games; at 11-21 entering today, the Nationals would have to play .438 ball the rest of the way (57-73) to lose only 94 games. Not a terribly lofty goal, but I don’t see it happening, do you? [Edit: Benito’s ’87 Pads lost 97. So the Nats will have some fairly stiff competition for that title, actually, but I still have faith in them.]
  • The stat report I set up to look at all these streaks, if you’re interested, is here.
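
Two bits of arithmetic from the bullets above are easy to check by hand, or with a few lines of Python. This is only a sketch: it leans on the same simplification as Pinto's quote (every game is exactly four at-bats, and every at-bat and game is independent), it assumes a 162-game season for the Nationals, and the function names are mine, not anything from Pinto's posts.

```python
def streak_probability(p_game, games):
    """Chance of getting a hit in each of the next `games` games,
    treating games as independent with the same per-game probability."""
    return p_game ** games


def implied_hit_average(p_game, at_bats=4):
    """Per-at-bat hit probability implied by a per-game hit probability,
    ASSUMING exactly `at_bats` independent at-bats per game."""
    return 1 - (1 - p_game) ** (1 / at_bats)


def record_needed(wins, losses, max_losses, season_games=162):
    """Record required the rest of the way to finish with `max_losses` losses.
    ASSUMPTION: the full `season_games` schedule gets played."""
    remaining = season_games - wins - losses
    losses_left = max_losses - losses
    wins_left = remaining - losses_left
    return wins_left, losses_left, wins_left / remaining


# Per-game hit probabilities quoted by Pinto: 0.646 before 1996, 0.66 after.
before = streak_probability(0.646, 29)
after = streak_probability(0.66, 29)
print(f"next 29 games: {before:.8f} -> {after:.8f} ({after / before:.2f}x)")

# The per-at-bat gap those two numbers imply.
gap = implied_hit_average(0.66) - implied_hit_average(0.646)
print(f"implied per-at-bat gap: {gap:.4f}")

# Nationals at 11-21, trying to lose no more than the 1999 Expos' 94.
wins_left, losses_left, pct = record_needed(11, 21, 94)
print(f"Nationals need {wins_left}-{losses_left} ({pct:.3f})")
```

The point of the first function is just that a small bump in the per-game number compounds over 29 straight games, which is why the streak odds nearly double.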

Re-Projecting Youkilis

May 5, 2009

Content is going to (continue to) be a little light over here for the next couple days. Real work beckons.

Here’s a fun little exercise. Everybody knows it’s early…but it’s not that early. Lots of guys are doing a lot better, or a lot worse, than anybody expected. What if we were (well, specifically in this case, PECOTA was) right about those guys all along…starting now? That is, from today forward, the hitter performs exactly as we expected. What does that end up looking like?

We’re going to start with the guy they used to call the Greek God of Walks.

Kevin Youkilis’ entry in Baseball Prospectus 2009 lauds Youk’s sudden transformation “from an above-average, patient hitter into a legitimate power threat,” but then hints pretty forcefully that it’s all a mirage. The book notes that a number of his homers just barely cleared the wall, and that he put up an awfully high .347 BABIP that we can expect to come back down. Faced with his impressive .312/.390/.569, 29 HR, 91 R, 116 RBI from 2008, PECOTA saw this line from him in ’09, which must’ve been awfully disappointing to The Nation:

.275/.366/.475, 21 HR, 81 R, 84 RBI

To date, though (through Sunday, actually), Youkilis has put up this line, leading the league in average, OBP and SLG and ranking in the top ten in just about everything else:

.407/.519/.714, 6 HR, 23 R, 20 RBI

If we start with that line and then give him another 491 PA/441 AB (PECOTA’s projected PA minus the ones he’s already had) at exactly the rates that PECOTA projected for him above (so, he hits .275/.366/.475 the rest of the way), then we get this final combined line:

.296/.393/.518, 24 HR, 89 R, 89 RBI

The runs and RBI still look a little low, and honestly, it’s hard to see anybody hitting in the middle of that Red Sox lineup and not ending up with 100 of both. Otherwise, though, that line is a pretty gigantic jump from what PECOTA had him pegged at. If PECOTA was exactly right about his true talent and he performs exactly to that talent the rest of the way, his hot start nonetheless lets him coast to near-superstar-level numbers. On the other hand, if, as is at least equally likely, PECOTA was wrong and 2008 was a lot closer to his true talent, this start could propel him to a runaway MVP season. Amazing what one little month can do.
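
The re-projection is really just a weighted average: what he has done so far, weighted by the at-bats he has already had, plus the PECOTA rate weighted by the at-bats still to come. A minimal sketch of that blend follows. Note the loud assumption: the number of at-bats Youkilis had through Sunday isn't given above, so `AB_SO_FAR` is a made-up placeholder, and the output won't necessarily match the combined line in the post. The `blend` helper is likewise just an illustrative name.

```python
def blend(rate_so_far, n_so_far, rate_rest, n_rest):
    """Weighted average of a rate already posted over `n_so_far` chances
    and a projected rate over the `n_rest` chances still to come."""
    total = n_so_far + n_rest
    return (rate_so_far * n_so_far + rate_rest * n_rest) / total


AB_SO_FAR = 90  # ASSUMPTION: placeholder, not a figure from the post
AB_REST = 441   # from the post: PECOTA's remaining at-bats

# Batting average and slugging both use at-bats as the denominator.
# (On-base percentage would need plate appearances instead.)
avg = blend(0.407, AB_SO_FAR, 0.275, AB_REST)
slg = blend(0.714, AB_SO_FAR, 0.475, AB_REST)

print(f"combined AVG {avg:.3f}, SLG {slg:.3f}")
```

However many at-bats you plug in, the combined rate has to land between the hot start and the projection, and the more at-bats already banked, the further it gets pulled toward the hot start.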

The Importance of Catching Strikes

April 29, 2009

We’re going Twins-related again (and graphics-free today), and then yet a third Twins post tomorrow, probably, and back to regularly scheduled programming with a non-Twins gameblog on Friday morn.

If you have Extra Innings, or MLB.TV, or live in Minnesota or central Florida, try to take some time out to catch an inning or two of the Twins-Rays game tonight. Not because I expect it to be a great game, really; they’re two pretty interesting teams, I think, and Kazmir is on the hill, but I don’t expect it’ll be making Lar’s Most Interesting this morning or anything.

But, see: Mauer is set to be back for Friday’s game, and the Twins are off tomorrow, so this should be the last chance you get for quite a while to watch Jose Morales catch.

After a rough start, I’ve come around on Morales. He’s a switch-hitting catcher, which is rare enough in itself (there’s a chance he might move into a tie for 48th place tonight on the all-time-plate-appearances-by-a-switch-hitting-catcher list, with 50), and he can hit a little. But that’s not why I want you to watch.

He might be the worst defensive catcher since Matt LeCroy, and that’s kind of entertaining — his throws to second seem to stop for cheese and crackers somewhere above the mound, and he’s lost a couple of very routine foul pops — but that’s not it, either, not really.

No, I’d like you to watch part of this game because I’d like you to notice how Morales catches each pitch. That’s it! See, as I’m sure you know, most professional (and college, and a lot of high school) catchers practice a technique called framing the pitch, whereby you subtly nudge your glove back toward the strike zone as a close pitch comes in, hoping to get your pitchers a few extra called strikes over the course of the game. (Little white lies make up about 40% of baseball, if you haven’t noticed.)

Morales, I’ve convinced myself, does exactly the opposite, stabbing at pitches that should be strikes and effectively driving them well out of the umpire’s idea of the strike zone. I’ve seen pitches that defined the very concept of “down the middle” called balls because Morales almost falls on the pitch, pushing it down toward the batter’s ankles as he catches it. Just watch and see if you see what I see, I guess, because I can’t believe I haven’t heard anyone comment on it.

Like I said, I like Morales. But he’s very likely going to be getting an all-expenses-paid trip to Rochester tomorrow, and this is something he’s going to have to work on. Not only is it frustrating to watch, but an extra ball here and there can make a much bigger difference than most people realize.

Say you have an average AL hitter on an 0-1 count. If the next pitch is a strike (and called such), you have the hitter at a huge disadvantage; the American League as a whole hit .172 with a .245 SLG on PAs with the last pitch coming on 0-2 in 2008, and just .185 with a .274 SLG in PAs in which the count was 0-2 at any point in the at-bat! Meanwhile, the league hit a shocking .330 BA/.519 SLG swinging on 1-1 counts.

Look at those numbers again…I think everybody knows that the count is important, but that important? An average hitter becomes an average-hitting pitcher on an 0-2 count, and the same hitter becomes an MVP candidate when he swings on a 1-1 count. So if Morales stabs at an 0-1 pitch and turns what should have been a strike into a ball, he’s essentially transformed the hitter from Roy Oswalt into Lance Berkman (if the hitter swings at that pitch, that is — the stats after a 1-1 count are much closer to the overall league average, because the possibility of a strikeout comes back into play — but still: would you rather face a league-average hitter or Oswalt?).

I don’t really believe in the surpassing importance of catcher defense; I don’t think having a guy with a cannon arm or superior wild-pitch-avoiding ability is going to win that many games for you. Matt LeCroy could have caught for my team just about any time, back when he could hit. But from watching Morales and looking at those stats, I’m starting to believe that whatever else he can or can’t do, a catcher who doesn’t know how to frame a pitch can lose his share of ballgames for you.

Do any other catchers do this? I feel like framing is such an ingrained practice that every single professional catcher does it without drawing attention, but maybe this sort of shortchanging one’s own pitcher is more common than I think and I just haven’t been paying attention? I’m sure there’s a study to be done there (adjusted called strike percentage for catchers against average, or something)…