Ken Jennings

Message Boards

IBM's DeepQA and the Jeopardy! Challenge

Postby christo » Mon Feb 21, 2011 4:54 pm

bradleyk wrote:Did Watson believe that that number was the wager that gave him the absolute best chance of winning, or did he "think" "I need something between 2,000 and 4,000, I'll bet 3217"?


Watson doesn't do "fuzzy math" the way people do. The bet is made based on a formula no person could solve in that amount of time, and the result is a number. For a machine, there is no particular advantage to having a number ending in zeros, so it bets the result it gets - why shouldn't it?

In the early matches everyone laughed at the "crazy bets", so we started rounding up to the nearest hundred or thousand (depending on the size of the bet). But in the first game we did that, a number of people complained, "Hey, what happened to the funny bets??? That was so endearing!" So we put it back. During the match, it was a constant source of entertainment.
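
The rounding described above can be sketched in a few lines of Python. This is only an illustration: the function name `round_bet_up` and the 5,000-dollar cutoff between rounding to hundreds and rounding to thousands are assumptions, since the post says only that the step depended on the size of the bet.

```python
def round_bet_up(bet: int, cutoff: int = 5000) -> int:
    """Round a wager up to the nearest hundred or thousand.

    The cutoff is a hypothetical value chosen for this sketch,
    not a figure given by the Watson team.
    """
    step = 1000 if bet >= cutoff else 100
    # Integer ceiling division avoids floating-point surprises.
    return -(-bet // step) * step


if __name__ == "__main__":
    for raw in (3217, 17973):
        print(raw, "->", round_bet_up(raw))
```

A real system would presumably also have to cap the rounded bet at the player's current score, which this sketch leaves out.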

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby econgator » Mon Feb 21, 2011 5:01 pm

christo wrote: No one on our team was allowed to have a computer, and the control system for Watson was monitored at all times by one of the auditing team.

-Chris


Maybe this was mentioned elsewhere and I missed it:

How was Watson controlled? I assume that there was someone at a computer telling Watson when it was time to pick, when an incorrect response came up, etc.?
econgator
 
Posts: 3618
Joined: Mon Sep 18, 2006 6:11 pm

Postby gameshowcongress » Mon Feb 21, 2011 6:30 pm

TheConfessor wrote:For those who may be interested, here's a pre-broadcast interview I did about Watson, within the constraints of the non-disclosure agreement I signed with IBM.
http://gameshows.about.com/od/interview ... llenge.htm

I'm pleased to add that I now have a Watson t-shirt, which I proudly wore to my pub quiz last Wednesday. About half the teams paid some sort of tribute to Watson in their team names that night. I don't own any Jennings or Rutter t-shirts, but I was hoping they both would perform well against Watson and I hated to see any of the three contestants lose. At least for the purposes of this competition, I was a loyal IBMer and 100% committed to helping their side. In general, I love to get involved and support anything that helps promote game shows, and this was the biggest game show event in a long time.


I am sure there were lots of good pub quiz team names this week but one from www.geekshodrink.com was

Elementary my dear Watson, Skynet is live!
gameshowcongress
 
Posts: 648
Joined: Fri Jun 30, 2006 1:17 pm
Location: Boulder, Colorado USA

Postby geniusonwheels » Mon Feb 21, 2011 8:14 pm

And for you playing along at home, here is what Watson missed.

http://www.sporcle.com/games/smreidy113 ... dearwatson
Fun Trivia A great place for trivia.
geniusonwheels
 
Posts: 537
Joined: Wed Nov 14, 2007 4:47 pm
Location: South Carolina

some stats

Postby christo » Mon Feb 21, 2011 8:17 pm

Ken Jennings (Blog) wrote:Much of the conversation about last week’s supercomputer smackdown on Jeopardy! has revolved around Watson’s prodigious buzzer advantage. In the second game, for example, Watson answered 23 of the 30 clues on the board. I’m not really sure if I ever had a 23-answer round while I was on Jeopardy!, but if I did, it sure as hell wasn’t against Brad Rutter and Ken Jennings. Watson was very, very fast.


I'm certainly aware no one usually wants a scientist in the room because we have a tendency to end interesting bar discussions by giving precise numbers, but since Ken doesn't drink anyway, I thought I'd share some numbers.

By my count (this was by hand, I could have been off a bit), Watson won the buzz in the four rounds: 16, 21, 11, and 17 times (this is just a count of who won the buzz, not whether it was correct). Ken won 7, 3, 8, 7. Brad won 7, 2, 10, 4. Of course we don't know how often Ken & Brad were also trying to buzz when Watson won it, no doubt most of them, but we do know exactly how many of the times Ken & Brad won the buzz when Watson also tried, since the answer ticker shows when Watson's answer is above the confidence threshold. Ken beat Watson 4, 1, 2, and 4 times, Brad beat Watson 3, 2, 4, and 2 times.

For Watson that's: 55%, 71%, 38%, and 61% buzzer wins in each round, for a four round avg. of 54%. Ken, in your average game during the streak you took 65% of the board. In your best game you took over 80%. A third of your games were over 70%. Your worst game (in the streak) was about 45%. Most of the players who faced you proposed rules changes to the producers to even out your buzzing advantage. ;)

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Re: some stats

Postby TheConfessor » Mon Feb 21, 2011 9:13 pm

christo wrote:I'm certainly aware no one usually wants a scientist in the room because we have a tendency to end interesting bar discussions by giving precise numbers, but since Ken doesn't drink anyway, I thought I'd share some numbers.

Chris, you will never clear out a room of serious Jeopardy fans by quoting interesting statistics. Most of us live for that stuff and can't get enough of it. My obsession is quite mild compared to some of the regulars on the official Jeopardy board.

christo wrote:By my count (this was by hand, I could have been off a bit), Watson won the buzz in the four rounds: 16, 21, 11, and 17 times (this is just a count of who won the buzz, not whether it was correct). Ken won 7, 3, 8, 7. Brad won 7, 2, 10, 4. Of course we don't know how often Ken & Brad were also trying to buzz when Watson won it, no doubt most of them, but we do know exactly how many of the times Ken & Brad won the buzz when Watson also tried, since the answer ticker shows when Watson's answer is above the confidence threshold. Ken beat Watson 4, 1, 2, and 4 times, Brad beat Watson 3, 2, 4, and 2 times.

Please correct me if I'm wrong, but I don't think your numbers truly reflect the instances when both Watson and at least one human were in an actual buzzer race, trying to buzz at the earliest possible moment. Your numbers reflect the times when Watson was eventually confident enough to buzz, but not necessarily confident at the first possible moment. It includes several instances when Watson achieved sufficient confidence only after some additional thinking time past the point of buzzer enablement. This was especially evident in the category "Actors Who Direct." On all five clues in that category, Watson was in the 80s or 90s in confidence, but he lost all five buzzer races to Brad or Ken. That's because there was no true buzzer race on those clues. The clues were very brief, with Alex reading only a short movie title before the buzzers were turned on. Ken and Brad were ready to buzz when the "go" lights came on, but Watson was not. Watson became confident enough to buzz only after Ken or Brad had already buzzed in and beat him to the clue. So you can remove those five clues and probably some others from your count of the number of times when humans beat Watson in an actual buzzer race.

christo wrote:For Watson that's: 55%, 71%, 38%, and 61% buzzer wins in each round, for a four round avg. of 54%. Ken, in your average game during the streak you took 65% of the board. In your best game you took over 80%. A third of your games were over 70%. Your worst game (in the streak) was about 45%. Most of the players who faced you proposed rules changes to the producers to even out your buzzing advantage. ;)

It's unlikely that Ken's earlier opponents were trying to buzz as often as Watson's opponents were. :wink:
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby Ken Jennings » Mon Feb 21, 2011 11:13 pm

Yeah, at the risk of sounding immodest, this is apples and oranges. It's impossible to definitively compare the number of three-way buzzer races in a Watson-Jennings-Rutter game to the number in an average regular season game, but my gut feeling is that there could have been twice as many.
Ken Jennings
Site Admin
 
Posts: 4436
Joined: Wed Jun 14, 2006 10:43 am

Re: some stats

Postby christo » Tue Feb 22, 2011 11:15 am

TheConfessor wrote:
christo wrote:By my count (this was by hand, I could have been off a bit), Watson won the buzz in the four rounds: 16, 21, 11, and 17 times (this is just a count of who won the buzz, not whether it was correct). Ken won 7, 3, 8, 7. Brad won 7, 2, 10, 4. Of course we don't know how often Ken & Brad were also trying to buzz when Watson won it, no doubt most of them, but we do know exactly how many of the times Ken & Brad won the buzz when Watson also tried, since the answer ticker shows when Watson's answer is above the confidence threshold. Ken beat Watson 4, 1, 2, and 4 times, Brad beat Watson 3, 2, 4, and 2 times.

Please correct me if I'm wrong, but I don't think your numbers truly reflect the instances when both Watson and at least one human were in an actual buzzer race, trying to buzz at the earliest possible moment. Your numbers reflect the times when Watson was eventually confident enough to buzz, but not necessarily confident at the first possible moment. It includes several instances when Watson achieved sufficient confidence only after some additional thinking time past the point of buzzer enablement. This was especially evident in the category "Actors Who Direct." On all five clues in that category, Watson was in the 80s or 90s in confidence, but he lost all five buzzer races to Brad or Ken. That's because there was no true buzzer race on those clues. The clues were very brief, with Alex reading only a short movie title before the buzzers were turned on. Ken and Brad were ready to buzz when the "go" lights came on, but Watson was not. Watson became confident enough to buzz only after Ken or Brad had already buzzed in and beat him to the clue. So you can remove those five clues and probably some others from your count of the number of times when humans beat Watson in an actual buzzer race.


I see the distinction you are making but from our perspective it is the same: these are clues where Watson knew the right answer above its buzzing threshold but was beaten to the buzzer. So yes, I included the "actors who direct" category where most of Watson's answers were not available when the buzzer enabled. We actually don't have the information to count the different events.

KenJennings wrote:Yeah, at the risk of sounding immodest, this is apples and oranges. It's impossible to definitively compare the number of three-way buzzer races in a Watson-Jennings-Rutter game to the number in an average regular season game, but my gut feeling is that there could have been twice as many.


I didn't mean to sound like I was comparing. I was just throwing the numbers out there because you said you weren't sure you'd ever answered 23 questions in a round: you have, and in fact it was not uncommon in your streak.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby Bill » Thu Feb 24, 2011 4:03 am

Ken Jennings wrote:Yeah, at the risk of sounding immodest, this is apples and oranges. It's impossible to definitively compare the number of three-way buzzer races in a Watson-Jennings-Rutter game to the number in an average regular season game, but my gut feeling is that there could have been twice as many.


I agree, completely. Everything about this tournament was apples and oranges, which of course, was the point.

I posted some thoughts about the exhibition here. In a nutshell, I think this is a significant milestone in the history of AI, but in no way does it diminish humanity's pride of place.
Bill
 
Posts: 1551
Joined: Sat Jun 16, 2007 2:32 am
Location: New York City

Postby TheConfessor » Thu Feb 24, 2011 11:29 am

Bill wrote:I posted some thoughts about the exhibition here. In a nutshell, I think this is a significant milestone in the history of AI, but in no way does it diminish humanity's pride of place.

Bill, thanks for your article. I enjoyed reading it and have a few comments about it.

Last week, an IBM computer named Watson beat Ken Jennings and Brad Rutter, the two greatest Jeopardy! players of all time, in a nationally televised event.

Brad and Ken are undisputedly the two biggest money winners on Jeopardy, but it's pretty subjective to say they are the two Greatest Of All Time. It's kind of like declaring the two greatest Shakespeare plays of all time.

Watson was named for Thomas J. Watson, IBM’s first president. But he could just as easily have been named after John B. Watson, the American psychologist who is considered to be the father of behaviorism.

He could also have been named after Bunny Watson, as played by Katharine Hepburn in the prescient humans-versus-machine film, Desk Set. In that 1957 comedy, the humans who worked in the research department of a TV network lost their jobs to an IBM computer that was crammed with the same kind of facts and trivia that they had been paid to look up all day. The action led to a climactic showdown between the human workers and the computer. Watson won that contest, but in this case, it was the human Watson who triumphed over the IBM computer. I think IBM has been plotting its revenge ever since.

Some complained that the computer’s superior buzzer speed gave it the advantage, but buzzer speed is the whole point.

It is for Jeopardy players, but it certainly wasn't the whole point for IBM's researchers. IBM hoped to impress the world with Watson's ability to answer questions, not his ability to press a button. Excellence in both parts of the game is required in order to win, and IBM has clearly made great advances in the ability to answer unstructured questions. It must be very frustrating to the Watson team that so much discussion has been focused on the buzzer and the competitive advantage that was provided by 100-year-old solenoid technology.

Watson can’t hear the other players, which means he can’t eliminate their incorrect responses when he buzzes in second. It also means that he doesn’t learn the correct answer unless he gives it, which makes it difficult for him to catch on to category themes.

Not true. Watson is sent the correct answer at the end of every clue, after it is out of play. This is what helps Watson learn within each category, regardless of whether or not anyone provides a correct response.

This wasn’t a Turing test. Watson was trying to beat the humans, not emulate them. And he did.

I totally agree. Watson was built and trained to produce a specific result under a set of rules, and IBM tried to identify and optimize all the steps needed to reach the goal of winning a Jeopardy match. I've seen a lot of suggestions that Watson's accomplishments are somehow less valid because he achieves these results by using different thinking processes than humans use. I think that's missing the whole point about why this technology is potentially valuable. We already have lots of human doctors who think the way humans think, as imperfect as that is. With help from Watson, future doctors will be able to think more effectively and find solutions that they never would have thought of on their own.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby Bill » Thu Feb 24, 2011 12:04 pm

Ed, thanks for reading and commenting. Quoting you quoting me would be cumbersome, so I'll just use subheadings.

Re: Greatest - Absolutely subjective, and purely my own opinion. I wasn't even using the money standard.

Re: Bunny Watson - That is magnificent. I wish I'd known to include that.

Re: Buzzer speed - In the paragraph after the quote, I clarify what I mean by buzzer speed being the whole point. The machine has certain things it needs to do before it can buzz, and the challenge is for the computer to do those things in time to beat the humans to the buzzer. Reaction time is part of the game, but not sufficient to win. So I think we agree on this point.

Re: Watson knowing correct responses - Okay, I didn't know that. I should update the post. I had been going crazy wondering where "Delete Key" came from, and now it makes a lot more sense. Thanks for the info!

Re: Not a Turing test - Well put. The goal was to win a game of Jeopardy! and not anything else. Therefore, nothing beyond that should be read into the accomplishment, though the accomplishment is impressive in itself.

Thanks again!
Bill
 
Posts: 1551
Joined: Sat Jun 16, 2007 2:32 am
Location: New York City

Postby Ken Jennings » Thu Feb 24, 2011 1:24 pm

TheConfessor wrote:It must be very frustrating to the Watson team that so much discussion has been focused on the buzzer and the competitive advantage that was provided by 100-year-old solenoid technology.


I don't really know what else they can expect, given that they built a machine that still underperforms good humans at the interesting part of its task (answering Jeopardy clues) and only vastly out-performs them on the trivial part of its task (precise triggering of a solenoid).

I'm on the record as not feeling like the contest was unfair, and I'm as impressed by Watson's progress as anyone, but I can't bring myself to shed too many tears for the poor IBM developers being forced to defend Watson's buzzing. If they didn't want gameplay-related discussion, no one was forcing them to roll out their new technology on a quiz show set. I'm actually gratified that TV audiences seem to have figured out that the buzzer provided the margin of victory; I assumed that angle would be under-reported.
Ken Jennings
Site Admin
 
Posts: 4436
Joined: Wed Jun 14, 2006 10:43 am

Watson At SXSW Austin March 16

Postby TheConfessor » Fri Feb 25, 2011 12:55 am

Chris, when I was at the IBM viewing parties in Austin last week, the local communications people told me that David Ferrucci will be doing a Watson demonstration at the big annual South By Southwest Festival next month and that I'm invited. I just saw this confirmation of the event on March 16th. Do you know anything about that? Will other Watson team members or Todd Crain be attending? If there's anything I can do to help, please let me know. SXSW is always the craziest week of the year to be in Austin, which I mean mostly in a good way, though it can get a little overwhelming if you don't enjoy perfect weather, attending parties around the clock and hearing a few thousand bands all desperately trying to get discovered.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby SVCS08 » Fri Feb 25, 2011 2:46 pm

Ken Jennings wrote:
TheConfessor wrote:It must be very frustrating to the Watson team that so much discussion has been focused on the buzzer and the competitive advantage that was provided by 100-year-old solenoid technology.


I don't really know what else they can expect, given that they built a machine that still underperforms good humans at the interesting part of its task (answering Jeopardy clues) and only vastly out-performs them on the trivial part of its task (precise triggering of a solenoid).

I'm on the record as not feeling like the contest was unfair, and I'm as impressed by Watson's progress as anyone, but I can't bring myself to shed too many tears for the poor IBM developers being forced to defend Watson's buzzing. If they didn't want gameplay-related discussion, no one was forcing them to roll out their new technology on a quiz show set. I'm actually gratified that TV audiences seem to have figured out that the buzzer provided the margin of victory; I assumed that angle would be under-reported.


By that standard, we never can say if a champion won by knowledge or speed. But it doesn't matter. Jeopardy requires both. Ken and Brad became great champions with a combination of knowledge and speed and Watson won this one match the same way.

It's not really fair to say that Watson underperforms good humans at the interesting point of its task. This was only one match which makes it very difficult to draw that general conclusion. In another match, Watson may have done significantly better -- or significantly worse. The match was not statistically significant scientifically. But Watson won. I say let's accept it for what it is. One match. Watson won.

Besides, who wants a slower computer?
SVCS08
 
Posts: 2
Joined: Fri Feb 25, 2011 2:29 pm

Postby Ken Jennings » Fri Feb 25, 2011 4:12 pm

SVCS08 wrote:It's not really fair to say that Watson underperforms good humans at the interesting point of its task.


Sure it is. (Assuming that you find answering Jeopardy clues interesting.) We don't need to draw conclusions from a single match, because Watson played dozens of matches this summer against top players. Good human players average 50+ correct answers per Jeopardy round (playing along at home, that is; they obviously can't buzz in that often). If and when Watson's career stats are published, you're going to see numbers in the 40s. (Maybe Chris actually has a number here; obviously, I didn't get to see most of those other matches.)

Even buzzing in at will, it lost 29% of its games against top-level Jeopardy players. That's high enough to suggest to me there's a substantial differential in clue-answering ability.
Ken Jennings
Site Admin
 
Posts: 4436
Joined: Wed Jun 14, 2006 10:43 am

Postby SVCS08 » Fri Feb 25, 2011 5:47 pm

Ken Jennings wrote:
SVCS08 wrote:It's not really fair to say that Watson underperforms good humans at the interesting point of its task.


Sure it is. (Assuming that you find answering Jeopardy clues interesting.) We don't need to draw conclusions from a single match, because Watson played dozens of matches this summer against top players. Good human players average 50+ correct answers per Jeopardy round (playing along at home, that is; they obviously can't buzz in that often). If and when Watson's career stats are published, you're going to see numbers in the 40s. (Maybe Chris actually has a number here; obviously, I didn't get to see most of those other matches.)

Even buzzing in at will, it lost 29% of its games against top-level Jeopardy players. That's high enough to suggest to me there's a substantial differential in clue-answering ability.


I see your point, but I think the more telling stat for champion level players is the percentage you get right on the ones on which you're confident enough to attempt to buzz, whether you won the buzz or not. Sitting on my couch, I'm brilliant. However, playing the game I might only be confident enough to buzz in on 30 percent of the 50 clues I think I can answer from my couch. Hard to collect that stat for humans, but for Watson, it should be easy. Of the questions on which its confidence exceeds the buzzer threshold, on what percentage does it have the correct response, whether it won the buzz or not? If it's 60 percent, I'll agree with you. But if it's confident enough to buzz in 70 percent of the time and answers those with >90 percent accuracy, then I think it's fair to say that Watson performs at least as well as good humans at this part of the task (Maybe not Ken or Brad good, but certainly as well as "good humans").

I hope Christo can shed some light on this.
SVCS08
 
Posts: 2
Joined: Fri Feb 25, 2011 2:29 pm

Watson's performance

Postby christo » Fri Feb 25, 2011 7:19 pm

The following graph shows Watson's question answering performance (on 5000 J! clues it had never seen before) compared to the "Winner's Cloud".

The winners cloud plots performance of Jeopardy game winners. Each point is a game winner - there is some random jitter to spread some of the points out that are actually the same, still there are points on top of each other. The x-axis is the % of the board taken by the winner of the game, and the y-axis is the % of those clues the winner got right. For the most part the x-axis shows the buzzer wins of the game winner, although rebounds, DDs and FJ are included. The red dots are Ken's performance on games he won.

The blueish curve is Watson's performance on 5000 clues, and shows the "confidence curve". The idea is this: at 100% Watson is a little above 70% correct. If you let Watson remove 10% of the questions that it is least confident in, it gets about 78% of those correct (so it is 78% precision at 90% answered). If you let Watson remove 20% of the questions it is least confident in, it gets about 82% correct. What this shows is that Watson's confidence estimation is pretty accurate, the questions it drops as you move from right to left on the curve tend to be wrong ones, so its precision for the remaining ones goes up. Watson is 90% precision at 60% answered, and about 86% at 70% answered.
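
For readers who want to see how such a curve is built, here is a minimal Python sketch. It assumes each question comes with a confidence score and a flag for whether the top answer was right. The function name `precision_at_answered` and the ten toy data points are invented for illustration and are not Watson's data.

```python
from typing import List, Tuple


def precision_at_answered(results: List[Tuple[float, bool]], fraction: float) -> float:
    """Precision over the most-confident `fraction` of the questions."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank by confidence, highest first, then keep only the top slice.
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    kept = ranked[:keep]
    return sum(1 for _, correct in kept if correct) / len(kept)


# Toy (confidence, was_correct) pairs, made up for this sketch.
toy = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, True), (0.70, True),
    (0.60, False), (0.55, True), (0.40, True), (0.30, False), (0.10, False),
]

if __name__ == "__main__":
    for frac in (1.0, 0.8, 0.5):
        print(f"{frac:.0%} answered -> {precision_at_answered(toy, frac):.1%} precision")
```

When the confidence scores are well calibrated, dropping the least confident questions removes mostly wrong answers, so precision rises as the answered fraction shrinks. That is the shape the post describes.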

The average Jeopardy! game winner answers (ie wins the buzz) 45% of the questions at about 90% precision. Ken averaged I think around 62% answered at 90% precision.

The reason Jeopardy! was so interesting to us is not what Ken finds interesting, i.e. the total number of correct answers. Obviously, that isn't what interests the Jeopardy! producers either - the game is designed to test knowledge, confidence, and speed. Really, we could not have designed a better way to test what we were interested in than the way Jeopardy! has been played for the past 25+ years.

[Image: graph of Watson's confidence curve plotted against the Winner's Cloud]

-Christo
Last edited by christo on Sat Feb 26, 2011 8:36 am, edited 1 time in total.
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Re: Watson At SXSW Austin March 16

Postby christo » Fri Feb 25, 2011 7:46 pm

TheConfessor wrote:Chris, when I was at the IBM viewing parties in Austin last week, the local communications people told me that David Ferrucci will be doing a Watson demonstration at the big annual South By Southwest Festival next month and that I'm invited. I just saw this confirmation of the event on March 16th. Do you know anything about that? Will other Watson team members or Todd Crain be attending? If there's anything I can do to help, please let me know. SXSW is always the craziest week of the year to be in Austin, which I mean mostly in a good way, though it can get a little overwhelming if you don't enjoy perfect weather, attending parties around the clock and hearing a few thousand bands all desperately trying to get discovered.


Ed, I don't really know anything about it, but send me an internal email and I'll hook you up with someone who does.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Controlling Watson

Postby christo » Fri Feb 25, 2011 8:05 pm

econgator wrote:Maybe this was mentioned elsewhere and I missed it:

How was Watson controlled? I assume that there was someone at a computer telling Watson when it was time to pick, when an incorrect response came up, etc.?


Watson was controlled by the game system itself, and a Jeopardy! staffer - I believe the same "AH" (the anonymous human Ken referred to) who enables the buzzer. Watson gets categories and clues electronically from the same system that displays them on the game board. When the buzzer was enabled, Watson got the same signal that turned on the buzzing light. If Watson won the buzz, after Alex said "Watson", the AH would press a button telling Watson to answer. There was a button for "correct" and another for "incorrect", a button for "repeat the answer" and a button for "be more specific". When the correct answer was revealed (if not by Watson), there was a button to send it to Watson. There was a DD wager button, and controls for FJ. I actually ran it once for a sparring game and it was tricky to get right. It also had a taunting button but it didn't get used.... ;)
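
As a rough summary of the controls listed above, the signals could be modeled like this in Python. Every identifier here is hypothetical. The post describes buttons and signals but gives no names or wire format, and the taunting button is left out.

```python
from enum import Enum, auto


class ControlSignal(Enum):
    """Signals described in the post; all names are invented for this sketch."""
    BUZZER_ENABLED = auto()       # same signal that turns on the buzzing light
    ANSWER_NOW = auto()           # pressed after Alex says "Watson"
    CORRECT = auto()
    INCORRECT = auto()
    REPEAT_ANSWER = auto()
    BE_MORE_SPECIFIC = auto()
    SEND_CORRECT_ANSWER = auto()  # used when the answer was revealed by someone else
    DAILY_DOUBLE_WAGER = auto()
    FINAL_JEOPARDY = auto()       # the post mentions several FJ controls; collapsed to one here


# Per the post, the buzzer signal came from the game system itself;
# the rest were buttons pressed by the human operator.
AUTOMATIC = {ControlSignal.BUZZER_ENABLED}
OPERATOR_BUTTONS = set(ControlSignal) - AUTOMATIC


def sent_by_operator(signal: ControlSignal) -> bool:
    """True if the signal is one of the operator's buttons."""
    return signal in OPERATOR_BUTTONS
```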

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Re: Watson's performance

Postby TheConfessor » Sat Feb 26, 2011 1:08 am

christo wrote:The blueish curve is Watson's performance on 5000 clues, and shows the "confidence curve". The idea is this: at 100% Watson is a little above 70% correct. If you let Watson remove 10% of the questions that it is least confident in, it gets about 78% of those correct (so it is 78% precision at 90% answered). If you let Watson remove 20% of the questions it is least confident in, it gets about 82% correct. What this shows is that Watson's confidence estimation is pretty accurate, the questions it drops as you move from right to left on the curve tend to be wrong ones, so its precision for the remaining ones goes up. Watson is 90% precision at 60% answered, and about 86% at 70% answered.

Why does Watson's precision drop off at the left side of the curve? He actually has a higher success rate when answering 20% of the clues than when cherry picking the 1% of the clues where he has his highest confidence.

Like Ken, I thought I could beat Watson in a straight up test of knowledge with no buzzer, but I think you're inferring from the graph that it would be a close contest between Watson and any top player. I'm not sure that's the correct interpretation. The blue line shows how Watson would do when given a set of 5000 questions, any of which he could choose to answer or not. On the other hand, the red dots represent Ken's actual performance when playing against two other contestants. If Ken were playing solo, under the same conditions as Watson's blue line results, Ken's red dots would be clustered farther to the right because he would have a chance to answer all of the clues. In most cases, each increment to the left represents a question where Ken's opponent beat him on the buzzer. That never happens for Watson's blue line on the graph.

The blue line is labeled Watson v 1.0. When was that? How many versions were there, and was the final version significantly better than what's shown on the graph?
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Re: Watson's performance

Postby TheConfessor » Sat Feb 26, 2011 1:26 am

I've deleted most of this duplicate post because I meant to edit my post above, but instead I quoted it.

I intended to add that without an opponent, Ken's precision results on the cloud graph would also be a bit higher, since in actual games, his lost buzzer races would have occurred most often on questions where all three players knew the correct answer.
Last edited by TheConfessor on Sat Feb 26, 2011 11:06 am, edited 1 time in total.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby Ken Jennings » Sat Feb 26, 2011 9:02 am

SVCS08, my metric of 50+ questions per game is not "clues for which I happened to guess the correct response." That's "clues at which I was confident enough to buzz with the correct response." When I've trained for Jeopardy, I've tracked those stats pretty closely every night watching at home, and against games from the J-Archive. Contestants with scores in the 40s win on the show all the time, but for me personally, 50 is a bad game, and 55+ is a good game.

I'm sure there are other players who could beat that. Jerome Vered, for example, always beats me on written quiz tests of the type popular in Europe, so I imagine his buzzer-less Jeopardy prowess would be even more impressive than he is with a buzzer in his hand.

I don't really know how else to convince you here...if you're not going to believe the 70-whatever-game Jeopardy winner about this kind of stuff, I'm not really sure who to refer you to.
Last edited by Ken Jennings on Sat Feb 26, 2011 9:10 am, edited 1 time in total.
Ken Jennings
Site Admin
 
Posts: 4436
Joined: Wed Jun 14, 2006 10:43 am

Postby Ken Jennings » Sat Feb 26, 2011 9:08 am

As another data point: David Gondek says in Stephen Baker's book that Watson's Final Jeopardy conversion is "below 50%". Now, that's on FJ clues, which are usually harder for computers to parse than regular game clues, but it's the closest thing to a "written test" that Jeopardy offers. If you don't believe there's a skill differential, Baker points out my 68% career Final Jeopardy record as if that's good, and I don't feel like a particularly strong Final Jeopardy player. Brad's lifetime rate has got to be near 100%, right? Has he ever missed one?
Ken Jennings
Site Admin
 
Posts: 4436
Joined: Wed Jun 14, 2006 10:43 am

Re: Watson's performance

Postby christo » Sat Feb 26, 2011 9:45 am

TheConfessor wrote:Why does Watson's precision drop off at the left side of the curve? He actually has a higher success rate when answering 20% of the clues than when cherry picking the 1% of the clues where he has his highest confidence.


Statistics are not reliable when the sample is small, so that drop is just an artifact of having very few data points at the left end of the curve.
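
To illustrate the small-sample point, here is a rough sketch rather than anything from IBM's actual analysis. It assumes the 5000-question set mentioned in this thread, so the top 1% is about 50 clues and the top 20% about 1000, and it uses made-up counts of correct answers. The function name `wilson_interval` and the counts are hypothetical; the formula is the standard Wilson score interval for a proportion.

```python
import math

def wilson_interval(correct, attempted, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (precision)."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    p = correct / attempted
    denom = 1 + z**2 / attempted
    center = (p + z**2 / (2 * attempted)) / denom
    half = z * math.sqrt(p * (1 - p) / attempted
                         + z**2 / (4 * attempted**2)) / denom
    return center - half, center + half

# Hypothetical counts: the same 94% observed precision at two sample sizes.
low_1, high_1 = wilson_interval(47, 50)        # top 1% of 5000 clues
low_20, high_20 = wilson_interval(940, 1000)   # top 20% of 5000 clues

width_1 = high_1 - low_1
width_20 = high_20 - low_20
print(f"top 1%:  {low_1:.3f} to {high_1:.3f}")
print(f"top 20%: {low_20:.3f} to {high_20:.3f}")
```

With the same observed precision, the interval for 50 clues is several times wider than the one for 1000, so a dip at the far left of the curve can easily be noise.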

TheConfessor wrote:Like Ken, I thought I could beat Watson in a straight up test of knowledge with no buzzer, but I think you're inferring from the graph that it would be a close contest between Watson and any top player. I'm not sure that's the correct interpretation. The blue line shows how Watson would do when given a set of 5000 questions, any of which he could choose to answer or not. On the other hand, the red dots represent Ken's actual performance when playing against two other contestants. If Ken were playing solo, under the same conditions as Watson's blue line results, Ken's red dots would be clustered farther to the right because he would have a chance to answer all of the clues. In most cases, each increment to the left represents a question where Ken's opponent beat him on the buzzer. That never happens for Watson's blue line on the graph.


Your analysis is correct. This curve is the best we can do in estimating Watson's performance in a statistically meaningful way. It shows that Watson answers a significant number of questions correctly and is good at knowing when it's wrong. But it doesn't factor in speed or the competition for the buzzer. Among machines today, this ability is rarer than Ken is among humans. I believe the curve is comparable to the performance of a very good human trivia game player, and I'm certain it is well above average human performance (though we have these measurements for neither). The really cool thing is that where Ken's abilities are rare among humans, I hope Watson's abilities will be available on your desktop in a few years. Even more, I'm not going to trust Ken's advice in the emergency room, nor in Chinese. It should be possible to turn Watson's abilities on less...ummmm....trivial areas of human knowledge.

Regarding Watson's overall win/loss record, I am trying to get that. It would be nice to see it plotted in the winners cloud. However, it's hard to conclude anything from the losses, as most (but not all) were due to DD selection, not to answering incorrectly. For the first game against Ken & Brad, Watson answered 68% of the clues at 93% precision; for the second it answered 50% of the clues at 93% precision.
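
As a back-of-the-envelope reading of those two figures, the sketch below converts "percent answered" and "precision" into rough clue counts. It assumes a full 60-clue board (two rounds of 30, Final Jeopardy excluded), which may not match how many clues were actually revealed in either game. The function name `expected_counts` is hypothetical.

```python
def expected_counts(clues, answered_fraction, precision):
    """Convert percent-answered and precision into rough clue counts."""
    attempted = clues * answered_fraction   # clues Watson answered
    correct = attempted * precision         # of those, answered correctly
    wrong = attempted - correct
    return attempted, correct, wrong

BOARD_CLUES = 60  # assumption: every clue on the board was played

game_1 = expected_counts(BOARD_CLUES, 0.68, 0.93)
game_2 = expected_counts(BOARD_CLUES, 0.50, 0.93)

for label, (attempted, correct, wrong) in (("game 1", game_1), ("game 2", game_2)):
    print(f"{label}: ~{attempted:.0f} answered, ~{correct:.0f} right, ~{wrong:.0f} wrong")
```

Under that assumption, the first game works out to roughly 41 clues answered with about 38 right, and the second to 30 answered with about 28 right.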

TheConfessor wrote:The blue line is labeled Watson v 1.0. When was that? How many versions were there, and was the final version significantly better than what's shown on the graph?


Watson v1.0 is the version Ken and Brad played - they are the only ones to have played it. There has been an internal release of Watson every quarter since 2007, but only the three most recent versions of Watson participated in matches against human players. Watson v1.0 had some improvements and bug fixes over the version that played in the sparring games against Tournament of Champions players. The improvements were mainly to FJ (I don't think we would have gotten the second FJ right without that) and to confidence processing. Since Watson's buzzing speed is a function of its confidence, this improvement may have made it faster at buzzing on more questions than in sparring.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby Robert K S » Sat Feb 26, 2011 10:59 am

Incidentally, after getting 2 rights in the Watson games, Brad is tied with your 68% FJ! get rate.
Robert K S
 
Posts: 159
Joined: Tue Jun 20, 2006 11:51 am
Location: Cleveland, Ohio
