Fun with RBI Opportunities

September 1, 2009

I’m a big fan of the RBI Opportunities Report over at Baseball Prospectus. Sort by OBI% (and enter a minimum number of PA), and it shows you the percentage of runners on base during each hitter’s PA that that hitter has driven in. If you want to impress on somebody how completely context-dependent players’ RBI totals are, it’s really nice to be able to head over there and point out how David DeJesus is doing a considerably better job driving in the runners that have gotten on base for him than Mark Teixeira is, or that Yunel Escobar comes out ahead of Ryan Howard.

But that’s not totally fair to the sluggers, is it? I mean, really, a player has one RBI opportunity in each plate appearance that’s not accounted for here, because he could always drive himself in, and of course Tex does a much better job of that than DeJesus does. So to get the true percentage of potential RsBI converted, counting the hitter himself as one such potential RBI, we’d have to do this (where “ROB” is the total number of runners on base for the hitter’s times at bat, according to the BP report linked above):

RBI / (PA + ROB)

Right? I’m sure this isn’t by any means a new idea, but I haven’t seen it. I don’t have the kind of database I’d need to really do this for everybody (I’m pretty sure I could do something like that, but haven’t got around to figuring it out yet), so let’s take a look at just a few.

First, the leaders in RBI total in each league (all stats through Sunday):

AL
Player      PA   ROB  RBI  RBI/(PA+ROB)
Teixeira    579  410  101  .1021
Morneau     541  382   96  .1040
Morales     498  329   94  .1137
Bay         524  356   92  .1045
Pena        539  372   92  .1010
Longoria    536  371   90  .0992
Markakis    585  384   89  .0918
Martinez    557  293   86  .1018
Abreu       539  357   86  .0960
Hill        591  369   85  .0802

NL
Player      PA   ROB  RBI  RBI/(PA+ROB)
Fielder     575  399  119  .1222
Howard      562  412  111  .1137
Pujols      566  373  110  .1170
Braun       565  397   95  .0988
Dunn        549  400   91  .0959
Reynolds    535  350   90  .1017
Zimmerman   569  404   90  .0925
D.Lee       501  362   87  .1008
Ethier      561  399   87  .0906
H.Ramirez   530  338   85  .0979
Kemp        539  366   85  .0939
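If you want to run the numbers yourself, here is a minimal Python sketch of the RBI / (PA + ROB) calculation. The function name is my own invention, and the three sample lines are just copied from the tables above, so treat it as an illustration rather than any kind of official stat.

```python
def rbi_per_opportunity(rbi: int, pa: int, rob: int) -> float:
    """RBI divided by total RBI chances.

    Each plate appearance counts as one chance (the hitter can drive
    himself in), plus one chance for every runner on base (ROB).
    """
    return rbi / (pa + rob)


# (PA, ROB, RBI) for a few hitters, copied from the tables above.
hitters = {
    "Teixeira": (579, 410, 101),
    "Fielder": (575, 399, 119),
    "Braun": (565, 397, 95),
}

for name, (pa, rob, rbi) in hitters.items():
    print(f"{name:<10}{rbi_per_opportunity(rbi, pa, rob):.4f}")
```

Swap in any other hitter's PA, ROB and RBI to extend the list.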

First of all, note that this is a terrible way to measure the value of anything. It gives a huge advantage to guys who get the most opportunities with runners on base (much less of an advantage than just counting up RBI totals gives them, but still) and punishes guys who get lots of PA without runners on, since your odds of hitting a solo HR are a lot worse than your odds of getting a runner home from third. But anyway, it was just for fun. And as I said, the Baseball Prospectus OBI% method has the opposite problem, so I thought I’d give the HR hitters a bit of a bump.

So. Teixeira is not the best in the American League at driving in runs on a per-opportunity basis. Fielder may be the best in the NL; he’s at least the best out of the top 11 in total RBI (note that he also leads the NL in plate appearances, so his dominance of this made-up statistic is especially impressive).

But you know, all those big guys at the top still do a pretty good job by this method. Still, though, to lead the league in RBI seems to take a healthy amount of ability and luck: of the four guys with over 100 RBI as of Sunday, three of them have seen the first (Tex), second (Howard) and sixth (Prince) most runners on base during their PAs. The fourth, of course, is Pujols, who is just 15th in ROB.

Your luckiest, most opportunity-dependent RBI leaders are the guys you’d actually expect…especially Aaron Hill. A lot has been written lately about how Aaron Hill is suddenly this big surprise run producer, and yeah, it’s pretty shocking that he’s cracked out 31 homers. But with a .322 OBP, he’s just a slightly above average hitter (.350 wOBA). He’s acquired all those RBI (and to some extent all those HR) by having the most plate appearances in the majors. He’s also 8th in the AL in ROB.

Of the 10 RBI leaders in the AL, 6 are in the top 10 in the league in ROB. 8 are in the top 14. Morales and V-Mart are the oddballs (-slash-impressive run producers) at #24 and 48, respectively.

Of the 11 RBI leaders in the NL, 7 are in the top 10 in ROB, and 10 are in the top 17. Hanley Ramirez is your real Mr. RBI (well, after Fielder), way down at #26.

So, yeah. Next time someone starts talking about a Good RBI Guy, you can correct them–about 90% of the time, he’s more of a Lots of RBI Opportunities Guy.

A few more, just for fun:
Mauer       464  268   79  .1079 (better than all of the above AL guys except Morales).
Jeter       582  296   60  .0683 (all those leadoff PA make this a useless stat for him).
Utley       556  338   84  .0939 (if only he could be hitting behind Utley every day).

Okay, that was pointless but fun. Next time, something poignant but mind-explodingly dull.

MVPs and RsBI

July 21, 2009

Sorry for a third consecutive Twins-related post (this one doesn’t have much to do with the Twins at all, it just starts out that way), but DicknBert really ticked me off the other day.

This was Friday night, the same night that Alexi Casilla made me (and apparently Billy Smith) wish the second base position had never been invented. It was the second game back after the All-Star break, and the “question of the day,” or some similarly silly promotion, was: who are Dick and Bert’s picks for MVP through the first half of the season?

Both picked Albert Pujols for the National League MVP, which is the only pick a thinking person can make. But Bert goes first, and his AL MVP pick (stats up to the last date he could’ve made the pick, through July 16) is:

Jason Bay. .260/.380/.527, 20 HR, 72 RBI, 56 R, 125 OPS+. He’s a left fielder, and probably the worst one in the league; UZR says he’s already cost the Red Sox 8.1 runs, or essentially 1 win, with his defense alone.

Then it’s Dick’s turn, and he starts out by indicating he agrees with Bert on his NL pick, but disagrees on the AL. Thank God, I think. Dick’s pick:

Torii Hunter: .305/.380/.558, 17 HR, 65 RBI, 56 R, 140 OPS+. He’s a center fielder, and has always had a sterling defensive reputation, but the stats have never agreed, and this year UZR has him at -2.1 runs.

So kudos to Dick Bremer, I guess, for picking a much, much more valuable player as his Most Valuable Player than Bert did; Torii is the better hitter, plays the more important position, and has been the much less damaging defender. But it should go without saying that neither of these guys is anywhere near the actual most valuable player in the American League.

And then I started thinking: what do these guys have in common? And then Blyleven listed off all the other guys he could have picked: Miguel Cabrera, Justin Morneau, Mark Teixeira, Evan Longoria…and that’s about when it dawned on me.

1. None of these guys is Joe Mauer.
2. On a related note, each of these guys is near the top of the league in runs batted in.

Now, let’s be clear about this. He had a huge slump over the weekend that has muddied the waters a bit, but as of July 16, there was only one remotely reasonable selection for AL MVP, and that was Joe Mauer. There’s just no debating that. You could’ve made an argument for somebody else, but you would’ve been indefensibly wrong. Check it:

.373/.477/.622, 15 HR, 49 RBI, 49 R, approx. 182 OPS+. He wasn’t just leading the league in batting average, or on-base percentage, or slugging percentage, or OPS, or OPS+; he was leading the league in all of those things. And he’s a catcher, and one of the best in the business; consider that while the average AL LFer (Jason Bay’s position) has a .771 OPS and the average AL CFer (Torii’s) has a .743, the average AL catcher has just a .712 OPS…and that number is significantly buoyed by Mauer himself. Aside from Pujols, there is nothing in all of baseball right now that even has a case for being anywhere near as valuable as a great defensive catcher with a 1.100 OPS. And, yeah, he missed a month, but he was still leading the league in almost every cumulative stat that attempts to measure player value, too; that’s just how much better he was than everybody else.

So there are DicknBert, Mauer’s own home team announcers, and not only do they not pick him, they don’t even mention him as being in the conversation. Morneau, sure, but not the runaway best player in the league hitting right in front of him (incidentally, the only other player even arguably in the conversation is Ben Zobrist, who also went unmentioned).

So it’s really clear to me that all they did was look at the RBI leaders and pick the one they think is having the best year (Bert didn’t even do that, he just picked the #1 RBI guy, despite the fact that he’s hitting roughly as well as you’d expect a LF to hit, and much worse than you’d expect a terrible defensive LF to hit). That would be fine and all, since it’s just two guys on a small-market local broadcast filling air space, except I’m pretty sure that that’s what the writers do, too. Here’s an ordered list of how the leader in RsBI has fared in the MVP voting the last five seasons (so 10 total contests, AL & NL):

1, 1, 2, 2, 2, 2, 3*, 5, 7*, 23

The average of these numbers is 4.8; the median is 2. In the two races with asterisks, there was a very close second-place finisher in the RBI race who finished first or second in MVP voting. The 23 throwing the whole thing off is Vinny Castilla, who had about an average offensive year in the middle of the lineup for the 2004 Rockies…if the Rox had won 94 rather than losing 94, Vinny might have wound up as the worst MVP pick in modern history.
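For anyone who wants to check the arithmetic, here is a quick sketch using Python's standard statistics module. The list is the ten finishes above with the asterisks dropped, and the variable names are mine.

```python
from statistics import mean, median

# MVP-vote finish of each league's RBI leader, last five seasons (AL and NL).
finishes = [1, 1, 2, 2, 2, 2, 3, 5, 7, 23]

average_finish = mean(finishes)
median_finish = median(finishes)

# The same average with the single 23rd-place outlier left out.
average_without_outlier = mean(finishes[:-1])

print(f"mean: {average_finish}, median: {median_finish}")
print(f"mean without the 23: {average_without_outlier:.2f}")
```

The gap between the mean and the median is all that one 23rd-place finish; take it out and the average drops below 3.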

Writers (and most everybody else) have seemingly always been in love with the RBI; I stopped with 2004 because before that, Barry Bonds stepped up, was intentionally walked approximately 800 times a year, and forced them to get away from RBI for a couple years. And of course every now and then they’ll pick a middle infielder–like Rollins in 2007 or Pedroia in 2008–but they almost never end up with the right middle infielder. The only way they end up on a non-RBI guy is: when the RBI champ is playing for a bad team (and where your team finishes in the standings should have nothing to do with how valuable you are, but that’s a discussion for another day); when other big RBI guys all have something go wrong; and when some little middle infielder is bestowed with the tag of “heart and soul” or “team leader” of some first-place team. In 2008, Morneau was the big RBI guy for the contending team, but he fell flat on his face in September, so he finished “only” second to sparky ‘n’ scrappy little Dustin Pedroia, whereas in 2006 Morneau did well down the stretch, so he won it. In both years, Joe Mauer was far and away the Twins’ MVP, and you could’ve made a case for him for league MVP too (though Derek Jeter was in the discussion in ’06 and Pedroia actually had a decent case in ’08).

The thing about it is–and I’m probably preaching to the choir here, but whatever–raw RBI total has almost nothing at all to do with a player’s value. It’s remarkably easy for a decent hitter with some power who spends 160 games hitting 4th or 5th in a high-scoring lineup to wind up in the top two or three of the league in RBI, and to be an average or worse overall player (see Ryan Howard ca. 2008 and 2009). The work Mauer did in getting on base in front of all those Morneau RBI, and in playing impeccable defense at catcher, was just much, much more valuable to the Twins, in ’06 and ’08 and again in ’09, than the RBI themselves are. And I think people are starting to recognize that, or at least the writers who refuse to recognize it are retiring or dying off and being replaced by the Rob Neyers, Keith Laws and Christina Kahrls of the world. But when the two guys whose entire livelihood is made by watching Mauer do his thing and relaying the wonder of it all to the masses can’t get this down, it really makes you realize how far we still have to go.

More About RBI and Such

May 24, 2009

Just a few follow-up things on yesterday’s super monster baseball nerd post:

tHeMARKsMiTh almost immediately had a response for me back at his own site. It’s excellent and thoughtful, again, and there’s an interesting discussion that goes on in the comments, under which I put my own reaction to it. Go there and read it yourself, but in a nutshell, Mark wonders whether there are things that we still need to consider that are more difficult or perhaps impossible to quantify: leadership, intimidation, distracting a pitcher with the threat to steal, and so on. I think those things, or many of them, absolutely do exist and have an effect on the game, but that while sabermetricians don’t have any way of measuring those things, it’s important to note that batting average, RBI and fielding percentage don’t measure these things either. So I don’t see how these things bolster the anti-saber crowd’s arguments…except that Joe Morgan and Steve Phillips keep telling us that they do.


In the comments to yesterday’s post, Ron from Baseball Over Here pointed me toward something he wrote about six months ago in defense of the RBI, and again, I think this is excellent and definitely worth a read.

Ron starts by showing that the list of career leaders in RBI is populated almost exclusively by great players. The conclusion seems to be (correct me if I’m wrong, Ron) that RBI must be measuring something useful, if only great players are getting a lot of them.

A potential problem with that is that you need to pretty much be a great player to be among the all-time leaders in any stat. The top 14 in At-Bats, which doesn’t measure anything but one’s propensity for being penciled into the lineup and not walking or sacrificing, are all in or surely headed to the Hall [edit: or are Pete Rose]; you can draw your own conclusions about Raffy at #15, but then the next ten after him are already in too (though some you could argue about — Maranville, Aparicio). The top three batters in career strikeouts are Reggie, Slammin’ Sammy and Thome; the top six pitchers in career losses are all in the Hall. So by itself, I’m not sure that that line of thinking gets us very far.

I’ll definitely accept the general premise, though. You can even just limit it to a single season. There are definitely a few exceptions (Jose Guillen and Emil Brown were mentioned in the comments to Ron’s piece), but really, if you’ve got 100 or even 80 or 90 RBI, the odds are very good that you were an excellent hitter (for power, at least) within that season. The problem is that that extremely high-level thing is pretty much all it does; if dude A has 100 RBI and dude B has 120, we have no better idea who was the better player with that info than we had without it.

Ron is all over that. He recognizes that RBI totals are a poor way to decide the MVP race, for instance. But he concludes that RBI is a good stat anyway, because it tells us something important–how many times the guy did something that brought a run home. He has a lot of interesting analysis about how many different ways a run scores, and basically shows that, you know, RBI are pretty necessary to scoring runs most of the time.

But I’m not sure I understand why that makes the RBI stat itself important. We already know what a player can do that tends to lead to a run scoring (in rapidly descending order: (1) get on base; (2) hit with power; (3) run the bases well). We can track how well he does these things and get a pretty good sense of how good he is, all else being equal, at producing runs. If we already know these things, what does it add to consider RBI themselves, knowing as we do that so much of what we’re really measuring is the opportunities that that player’s teammates created for him?

I’m really asking. From where I’m sitting, it looks like RBI themselves are superfluous when you already have all the other, non-context-dependent stats that make good RBI guys good RBI guys. I’m certainly open to discussion and new ideas on this….I’m just not seeing it right now.

In Defense of Compassionate Sabermetricism

May 23, 2009

If I’m going to have a horribly unhealthy, gut-busting, productivity-killing Friday lunch, I’m a big fan of Panda Express’ Orange Chicken. And there’s a decent copycat place a couple blocks from the office, but it was a nice day yesterday, and I was up for a walk, so I went for the real thing. To get that, you have to head to the James R. Thompson Center, a big gathering point for a lot of Chicago that, as I understand it, houses some government offices and whatnot. The Panda Express is really all I’m interested in.

So I get there, and there’s this big protest going on right outside the building. Up close, people are waving signs about the right to life and how gay marriage is destroying our families, milling about in the general neighborhood of someone who is speaking ineffectively into a megaphone, while across the street is another group of people doing their best to drown out this first group with shouts like “What do we want?” “Abortion rights.” “When do we want it?” “Now!” and “Fascists go home!” and I’m thinking to myself, what are these people (any of them) doing here, really? Do they expect to convince anyone by labeling the other side murderers or fascists, or by just being louder? Or do they just like to hear themselves talk? Is there just nothing better to do on a pleasant Friday leading into a holiday weekend?

That’s basically how tHeMARKsMiTh sees the world of baseball fans and writers: the internet-savvy sabermetric crowd against the talk-radio-and-newsprint traditional crowd, both sides trying to shout each other down, never getting anywhere. (Of course, that doesn’t even remotely do justice to his post. Read it yourself; I’ll still be here when you’re done. Ready now? Good.) A couple basic things to get out of the way:

  1. I agree with most of his main points. There’s a lot of shouting into the abyss that goes on on both sides, a lot of name-calling and making fun, and it’s hard to see how any of it does anything at all other than making people on the same side feel smug and superior at the other side’s expense. (Okay, I have to make an exception for these guys, who were just too funny. And JoePoz, who’s kind of a fence-straddler, anyway. But otherwise, I don’t see the point.)
  2. I don’t think traditional stats (or most of them, anyway; sorry, Holds and Fielding Percentage) are completely worthless. You’ve seen me use HR and RBI a bunch of times already. Stats like those give context; even if you believe that VORP or WAR or Win Shares are a perfect measure of player value, think of the traditional stats as the splash of color in the crystal-clear black-and-white picture. They tell the story: what kind of hitter he is, where he likely hit in the lineup, and so on. WAR will tell you that Mark Teixeira and Carlos Beltran were almost exactly as valuable as each other in 2008, but don’t you want to know a little more than that? That’s where I think runs, RBI, HR, SB, and so forth come in handy.
  3. Another main point of Mark’s is that neither side has it completely right. I agree with that, too: there’s not much “right” about picking an MVP based on who has the most HR or RBI or Saves, and sabermetric analysis is certainly far from perfect as well — all you need to do is look at how much the various metrics (WARP vs. WAR, plus/minus vs. UZR) disagree with each other.

But where I disagree with Mark is: I don’t see this as being like the abortion or gay marriage debate at all. In those debates, like in the “dialectic” Mark envisions, there are really only three plausible truths: (a) one side is correct; (b) the other side is correct; or (c) the answer is somewhere in the middle. If you have one side that believes that abortion should be legal in all circumstances and one side that believes it should be banned in all circumstances, that’s as far as it goes; it can’t be more legal than the first side wants it, and it can’t be more illegal than the second side wants it. So the one true “right answer” has to be either one of those extremes or something between them.

Not so here. Our advanced metrics are flawed, but the answer isn’t some compromise between them and the traditional stats; the answer is more research, and more metrics. The metrics we have have grown out of the more traditional statistics. Saying you prefer HR and RBI to VORP and WAR isn’t at all like saying you prefer “Choice” to “Life” or vice-versa; rather, it’s like saying you prefer Betamax to Blu-ray.

Here’s how Mark defends the traditional crowd:

Those who follow counting numbers have a point (among many). Baseball revolves around the run. It determines who wins and who loses. Therefore, should you not pay attention more to runs, RBI’s, and home runs? Home runs automatically score a run (making them slightly important) and bring in whoever is on base (making them more important). If the point of the game is to score [more] runs than the other team, home runs and RBI’s are awfully darn important, which gave Howard the edge [over Pujols for 2008 NL MVP].

But this ignores the critical weakness of run and RBI totals (and this isn’t a criticism of Mark, who I know understands this: it’s just that I don’t think there’s any way for anyone to successfully defend this position), which is that, in every instance in which you don’t hit a home run, your runs and RBI are totally dependent upon your teammates either getting on base for you or driving you in.

This doesn’t work well for the NL race, because Howard actually did do a phenomenal job of knocking runners in in 2008 (Pujols was still the clear MVP for other reasons), but take a look at this list (I hope). In 2008, Justin Morneau finished 2nd in the AL MVP voting, while his teammate Joe Mauer finished a distant 4th, based largely (or rather, entirely) on the fact that Morneau had 129 RBI and Mauer managed just 85. If that link went to the right place, though, you’ll see that when they batted with runners on base, Mauer and Morneau drove in those runners at almost exactly the same percentage: 19.0% to 18.6%. Morneau gets that huge edge in RBI because he batted with 151 more runners on base than did Mauer. Morneau actually batted with the most runners on base of anyone in the league. Part of that, of course, is because he’s not a catcher, and thus got to play every day. But a huge part of that is that he got to hit behind Joe Mauer, and his 2nd-in-the-AL OBP!
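To see how much of an RBI gap can come from opportunities alone, here is a rough Python sketch. The only figures taken from the text are the roughly 19% conversion rate and the 151 extra runners; the 400-runner baseline is a made-up number for illustration (it cancels out of the gap anyway), and the function name is mine. Like OBI%, this counts only runners driven in, not the hitter driving himself in.

```python
def runners_driven_in(conversion_rate: float, runners_on_base: int) -> float:
    """Expected runners driven in, given a conversion rate and a ROB total."""
    return conversion_rate * runners_on_base


rate = 0.19                  # roughly the rate both hitters posted
baseline_rob = 400           # hypothetical ROB for the hitter with fewer chances
extra_runners = 151          # the ROB gap quoted above

fewer_chances = runners_driven_in(rate, baseline_rob)
more_chances = runners_driven_in(rate, baseline_rob + extra_runners)

# Same skill at driving runners in, different opportunities.
gap = more_chances - fewer_chances
print(f"runners driven in from the extra chances alone: {gap:.1f}")
```

Under those assumptions, the extra chances alone are worth somewhere around 29 runners driven in, with no difference at all in how well the two hitters convert.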

So the RBI stat tells you who was at the plate for the final event resulting in the creation of a run, but it can actually distort your sense of how that run was created. Mauer was, hands down, a better hitter than Morneau in ’08, and played a much bigger part in how the Twins’ runs were scored. When you add in defense and adjust for position scarcity, it’s not even close. They’re very nice complementary pieces, but Morneau is the Scottie Pippen to Mauer’s MJ.

So, yeah, runs are awfully important. On the team level, you could almost say they’re all-important (almost). But to look at the HR, runs or RBI a single player has as a way of judging that player’s value is never a good idea. Even with Howard: make him the MVP because he drove a bunch of guys in, and you’re ignoring Pujols’ 100+ points of OBP and 100+ points of SLG, amounting to 100+ fewer outs and many more runs for Pujols’ team, and Pujols’ vastly superior defense, all for the sake of (a) Howard’s good fortune of having 50 more runners on base during his PAs than Pujols had in his and (b) a 2% edge in his success at driving those runners in. It doesn’t add up, or even come close.

More to the point, every one of those traditional stats is totally encapsulated in some more advanced metric or other. Whatever skills you think RBI measures, that’s also measured, and better, in SLG; or, if you think hitting with runners on base or “in the clutch” is a skill that’s worth measuring, stats like WPA/LI do a better job with that. Batting average is a fun little stat for what it is, but OBP tells you the same thing and more. Fielding Percentage is totally encapsulated by all advanced fielding metrics, like UZR and Plus/Minus.

You might think that these things (well, save OBP) are less-than-perfectly accurate, but that’s not an argument in favor of going back to the old things; it’s an argument in favor of doing more research and finding better new things. UZR may not be perfectly accurate, but it’s always, in every possible instance, going to do a better job of telling you who is the better fielder than fielding percentage will. FIP may not be perfect, but it’s better than just comparing two players’ ERAs. There may be slightly different ways to measure OPS+, but it’s always going to be better than not adjusting for era or ballpark factors at all. And so on. We can argue about how good the new stuff really is, but it’s just plain better than the old stuff (the well-grounded stuff that gains some level of acceptance, that is, not just any old thing someone thinks up).

So that’s the point: I’m not going to use the term “flat-earthers” around here. I try to avoid mudslinging of all types. I have nothing against people who rely solely on traditional stats, and I think those stats have their place. But their place isn’t in player analysis, not anymore. If you’re going to argue something like that Howard was the 2008 NL MVP and base it on traditional stats, you’re going to be wrong — simply, objectively, obviously wrong. And I’m sincerely sorry to say that. But I’m not trading in my DVD player for a VCR, and I’m not giving up my numbers for a set that tells me the same stuff, but less of it, and with more static.

Fun with Small Sample Sizes

April 22, 2009

  1. The Yankees sit at 8-6, but are on pace to score 810 runs and allow 972. This would make their expected (Pythagorean) record about 66-96.
  2. Then again, if Chien-Ming Wang were allowed to make 30 starts at his current pace, he’d give up 230 runs (in just 60 innings). This would be a record since 1901, narrowly edging out Snake Wiltse’s 1902 effort (in 300 innings). The record since 1950 is Phil Niekro’s 166 in 1977 (in 330 innings).
  3. Miguel Cabrera (through Monday, prorated): .489/.538/.787, 635 AB, 149 R, 310 H, 54 HR, 162 RBI
  4. Carlos Quentin: 87 HR, 162 RBI, 150 R…12 2B, 0 3B
  5. Brian Giles is hitting .151/.211/.189 (through Monday) and is on pace for twelve runs scored, zero homers…and 87 RBI. That’s how you know RBI is an awesome and totally not at all context-dependent stat.
  6. Washington Nationals (through Monday): 27-135 (.167), 770 RS, 1040 RA, Pythagorean W/L: 57-105.
  7. Raul Ibanez: .383/.442/.830, 176 R, 68 HR, 149 RBI, 14SB/0CS, about four defensive runs saved. Which totally makes sense considering the following hilarious evidence (from Lookout Landing): 1, 2, 3, 4, etc. So, yeah…it’s a long season.
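For the curious, the Pythagorean records in items 1 and 6 can be reproduced with a few lines of Python. This is a sketch that assumes the classic version of the formula with an exponent of 2 and a 162-game season; the post doesn't say which exponent was used, but 2 gives the same records quoted above. The function name is my own.

```python
def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from runs scored and allowed (exponent 2 is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Prorated runs scored and allowed from the list above.
teams = [("Yankees", 810, 972), ("Nationals", 770, 1040)]

for team, scored, allowed in teams:
    wins = round(pythagorean_wins(scored, allowed))
    print(f"{team}: {wins}-{162 - wins}")
```

Other versions of the formula use a slightly smaller exponent, which would nudge both teams a little closer to .500.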