Ken Jennings

Message Boards

IBM's DeepQA and the Jeopardy! Challenge

The place to talk. "On topic"? "Off topic"? We make no such petty distinctions here.

Postby mathochist » Sun Feb 20, 2011 7:50 am

Ken Jennings wrote:
I can think of a couple less egregious ways to level the playing field without forcing Watson to play with one timing chip behind its back, so to speak. I'll get around to posting those next week.


I agree. I was thinking Watson should have to listen to the clues just like you did, instead of being fed its own special versions.
mathochist
 
Posts: 2
Joined: Sat Feb 19, 2011 3:52 pm

Postby mathochist » Sun Feb 20, 2011 7:58 am

mathochist wrote:
Ken Jennings wrote:
I can think of a couple less egregious ways to level the playing field without forcing Watson to play with one timing chip behind its back, so to speak. I'll get around to posting those next week.


I agree. I was thinking Watson should have to listen to the clues just like you did, instead of being fed its own special versions.


Drat, it just occurred to me that it would also have to be allowed to read the clues visually the way humans are allowed to. I wonder if the amount of time it would take the OCR software to work would tip the scales much.
Hi. I'm Rob.
mathochist
 
Posts: 2
Joined: Sat Feb 19, 2011 3:52 pm

World Memory Championship -- Watson's Next Challenge?

Postby TheConfessor » Sun Feb 20, 2011 11:13 am

Here's an article about a different form of competition among "mental athletes." I bet Watson would be pretty strong at this, and there wouldn't be any controversy about his buzzer speed.
http://www.nytimes.com/interactive/2011 ... crets.html

For viewers watching at home on TV, there would be ample opportunity to play along, pick up Lach Trash from triple stumpers and compare Coryat scores. There could be Daily Doubles and Final Memory wagers. Hmm, the more I ponder this, the more I think it might be a better format with a buzzer. Spaces on the memory grid could be revealed at random, and the first person to buzz in with the correct word or number earns the points. Or maybe then they would control the board and select which space must be recalled next. For primetime, each space could be held in a briefcase by a different fashion model.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby bwouns » Sun Feb 20, 2011 2:53 pm

What's lach trash?
bwouns
 
Posts: 1521
Joined: Wed Aug 16, 2006 4:31 am
Location: Eugene, OR

Postby TheConfessor » Sun Feb 20, 2011 2:55 pm

bwouns wrote:What's lach trash?

http://www.j-archive.com/help.php#lachtrash
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Question for lurking IBMers

Postby rick1013 » Sun Feb 20, 2011 8:26 pm

I was most struck by Watson's use of the Forrest Bounce strategy as well as starting some categories in the middle. Was this strictly Daily Double-seeking behavior? I can't imagine that Ken or Brad would have been affected by the bounce much as compared to the average player.
I chose him out of thousands. I didn't like the others, they were all too flat.
rick1013
 
Posts: 70
Joined: Fri Feb 09, 2007 10:33 am
Location: Carrollton, TX

Postby bradleyk » Sun Feb 20, 2011 9:52 pm

Haha. That first daily double was ridiculous. I think it may have been daily double hunting honestly.
bradleyk
 
Posts: 89
Joined: Sat Jan 29, 2011 1:42 pm

Postby TheConfessor » Sun Feb 20, 2011 10:56 pm

It's pretty obvious that each selection Ken, Brad, or Watson made was hunting for a Daily Double, as long as a DD remained on the board. Almost without exception, after the DD's were gone, Watson would select from the top of the board and Ken and Brad would select one of the highest value clues that remained on the board. This may have given the appearance of a Forrest Bounce, but I don't think any of the three players were trying to use that strategy. Selecting high value clues makes sense for humans playing against a computer. Selecting low value clues makes sense for a computer playing against humans, as long as there is a guarantee that all clues on the board will be revealed.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Re: The Amazing "Christo"?

Postby christo » Mon Feb 21, 2011 6:34 am

TheConfessor wrote:Here's another clue, left over from the Watson games:

TRUE IDENTITY OF "CHRISTO" WHO POSTS HELPFUL EXPLANATIONS ON THE KEN JENNINGS MESSAGE BOARD

============= Chris Welty============================= 83%

==== Christo the fabric wrapping artist ===48%

== Bobcat Goldthwait* ===30%

* see video: http://www.youtube.com/watch?v=Nz9Hm9jY5AI

Hey Chris Welty, is that you? If so, thanks for taking the time to answer some questions here. I'm Ed Toutant, whom you may remember from seeing me play some practice matches early last month. I'm sorry we didn't get to talk more during my three visits to the lab. I was impressed by everyone I met there. Your name came up several times this week in Austin at the viewing parties at IBM and the University of Texas, both of which were very successful. James Fan spoke at the Monday event at UT and I got to do some color commentary on Wednesday for about a thousand of my former colleagues at IBM.

I hope you'll stick around and answer more questions from this board. As you know, there's a tremendous amount of interest. Here's Chris's group on the Watson team:
http://www-943.ibm.com/innovation/us/wa ... ithms.html

A friend in Canada wondered if the Toronto answer could have been influenced by the World War II Battle of Taranto. It seems plausible, if Watson assigns any weight to slight misspellings or alternate spellings of evidence that seems potentially related. If Watson found a statement saying that "George Wasington's vice-president was John Adams," would that be considered or discarded?


Thanks Ed. Yes, it's me. Ken asked after the show if I'd mind a post or two here and I had to clear it with IBM M&C.

No, Taranto did not factor in. It might make sense if Watson understood that part of the question, but it didn't, so the WWII battle part didn't really affect the answer. Again, with such a low confidence, it's important to realize this was the machine equivalent of a "wild guess". Watson didn't think the answer was Toronto; it thought Toronto was a better answer than Chicago.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby jzerocsk » Mon Feb 21, 2011 6:41 am

Re: #1, I agree with you in that the amount of knowledge a Jeopardy champ has stored right in his head is quite amazing.

polarea wrote:I'm not entirely sure Watson's grammar has been conclusively demonstrated to be up to snuff on things that are beyond Jeopardy level questions. Presumably the answer to each jeopardy question is found in hundreds of different sources in the Watson data bank, and using redundancy of information can substitute for a true grammatical understanding of the question in deciding certitude. I even think I read somewhere that Watson was allowed to practise on old jeopardy sets. Finding the right answer to a question of much higher obscurity I maintain would be much more difficult, given what I have seen of Watson's problem solving algorithms.


Really? Given the weirdness of Jeopardy clues and its relative success, it seems pretty apparent that if you gave Watson a straightforward "What is _________?" question, no matter how obscure the answer is, if the correct answer is in the database, Watson will find it. How could it not?

Of course if you give it a ridiculously obscure question and use obtuse Jeopardy wording you may present more of a challenge, but all that goes back to the basic intent of the machine...the only real-world application of this tech that involves answering Jeopardy questions is....answering Jeopardy questions. Maybe it wouldn't be able to hack those....but it doesn't really matter - the user is not going to intentionally phrase queries to try and confuse the system. "Real world" Watson applications would also presumably be networked and have access to data sets larger but also more appropriately constrained (e.g. the canonical example Watson app that helps doctors with diagnoses isn't going to have baseball trivia in its library). Finally, in a lot of real-world applications, one single correct answer is probably neither necessary nor helpful. Showing a list of possible answers with a confidence rating is preferable.


I really can't see it not being effective.
Last edited by jzerocsk on Mon Feb 21, 2011 6:43 am, edited 1 time in total.
jzerocsk
 
Posts: 766
Joined: Wed Jul 26, 2006 8:04 am

Re: Why Toronto

Postby christo » Mon Feb 21, 2011 6:41 am

naurae29 wrote:
christo wrote:Question 1: Its largest airport is named after a WWII hero.
Question 2: Its second largest after a WWII battle.

For question 1, Watson gets Toronto, Chicago, New York, Omaha.
For question 2, unfortunately, Watson is unable to make the question meaningful. We speak English and easily recognize that the "second largest" refers to airport, but Watson does not. So it gets no answers to question 2 that are cities, and all are extremely low confidence.


hi, chris,

i've been puzzling over why - even if he was ignoring the category - watson would think toronto qualified at all. billy bishop (namesake of the smaller toronto airport) was a ww i hero, and lester b. pearson (namesake of the larger airport), as well as being a nobel prize winner and our prime minister, was an airman in wwi (though probably doesn't qualify as a "hero," per se) who didn't serve in ww ii at all.

your post explains why final jeopardy stumped watson, but given that toronto is the wrong answer for both questions 1 and 2, i'm still confused as to why he picked *this* wrong answer. can you shed any light on that?


Pearson did serve in WWII, though not in combat. However, that's not what gave more support for Toronto. You are assuming Watson understood the question the way we do. Watson goes and finds evidence for different parts of the question matching a particular answer; it was more the general association of WW II and other words from the clue that very weakly favored Toronto. It wasn't enough evidence for Watson to think it had the right answer, but it was more than it found for Chicago (again, as I responded to Ed, Watson failed to understand the second part of the question at all).
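
To make the "better answer than Chicago" point concrete, here is a toy sketch of summing weak evidence per candidate and being forced to pick the top one. Everything in it is an assumption for illustration: the evidence weights, the clue fragments, and the `BUZZ_THRESHOLD` value are invented, and this is not a description of how Watson's actual scoring works.

```python
# Toy illustration only: the evidence weights below are invented, not Watson's.
from collections import defaultdict

# Each tuple: (candidate answer, clue fragment it matched, evidence weight)
evidence = [
    ("Toronto", "airport", 0.05),
    ("Toronto", "WWII", 0.06),
    ("Toronto", "hero", 0.03),
    ("Chicago", "airport", 0.06),
    ("Chicago", "WWII", 0.03),
    ("Chicago", "hero", 0.02),
]

BUZZ_THRESHOLD = 0.50  # hypothetical confidence needed to answer voluntarily


def rank_candidates(evidence):
    """Sum the weak evidence per candidate and sort best-first."""
    totals = defaultdict(float)
    for candidate, _fragment, weight in evidence:
        totals[candidate] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = rank_candidates(evidence)
best, confidence = ranked[0]
# In Final Jeopardy an answer is required, so the top candidate is given
# even though its confidence is far below the voluntary threshold.
would_buzz = confidence >= BUZZ_THRESHOLD
print(ranked)
print(f"forced answer: {best}, would buzz voluntarily: {would_buzz}")
```

In this made-up data no candidate comes anywhere near the threshold, so the top-ranked answer is only "least bad", which is the sense of a wild guess described above.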

This general lack of actually understanding questions is the big advantage humans have over Watson in this game.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby Bill » Mon Feb 21, 2011 6:50 am

econgator wrote:
skullturfq wrote:Is there an easy way to legally watch episode 2 on the internet tomorrow after it airs, or should I try to remember how to program a VCR for probably the first time this millennium? I teach a class on Tuesday nights, and don't have TiVo or whatever the kids call it.


Tomorrow, no. Thursday, yes.


Apologies if this has been posted already, but the board's been pretty active lately.

Where can you watch the episodes online?
Bill
 
Posts: 1551
Joined: Sat Jun 16, 2007 2:32 am
Location: New York City

Absolute answer test?

Postby christo » Mon Feb 21, 2011 7:07 am

Ken Jennings wrote:After seeing Watson in a variety of games, through good play and bad, I feel pretty confident that in a written-answer one-on-one contest, Watson would lose 9 out of 10 to me or Brad if there were Daily Doubles and Final Jeopardy and the like involved, and something like 99 out of 100 without them.


Perhaps, but this would not have been a contest we'd be interested in. The confidence aspect - the knowing when you're right and deciding to buzz in - was an important part of the technology demonstration. Since machines "think" differently than people, the mistakes they make are very different from the mistakes people make. But with the confidence dimension, you can build trust in tools like this, Toronto or not.

As a few of you saw from the "confidence chart", Watson's raw performance (when forced to answer all questions) is in the mid-70s%. But its performance at the buzzing threshold is right there with Ken - up in the 90%s. This is a really important point: machines will never be perfect (and neither will people), but being able to know when it knows the answer was a major innovation that we are pretty proud of.

For the much-debated buzzing speed to be a factor in the game, any contestant must be performing at this level. So the game isn't just about the buzzer; it's about knowing enough, and knowing you know it well enough, for it to come down to the buzzer on almost every question.
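
The gap between "forced to answer everything" and "answers only above the buzzing threshold" can be shown with a few lines of arithmetic. This is a minimal sketch on invented data: the `(confidence, was_correct)` pairs and the 0.70 threshold are assumptions chosen for illustration, not numbers from the confidence chart.

```python
# Invented sample data: (confidence, was_correct) pairs, not Watson's results.
results = [
    (0.95, True), (0.91, True), (0.88, True), (0.82, True),
    (0.77, True), (0.64, False), (0.58, True), (0.41, False),
    (0.33, True), (0.12, False),
]


def accuracy(pairs):
    """Fraction of correct answers in a list of (confidence, correct) pairs."""
    if not pairs:
        return 0.0
    return sum(correct for _conf, correct in pairs) / len(pairs)


def accuracy_above(pairs, threshold):
    """Accuracy restricted to answers whose confidence meets the threshold."""
    attempted = [p for p in pairs if p[0] >= threshold]
    return accuracy(attempted), len(attempted)


raw = accuracy(results)
buzzed, attempts = accuracy_above(results, threshold=0.70)
print(f"answering everything: {raw:.0%}")
print(f"answering only above threshold: {buzzed:.0%} on {attempts} clues")
```

Raising the threshold trades attempts for accuracy: fewer clues are answered, but a larger share of the answered ones are right.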

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Re: Question for lurking IBMers

Postby christo » Mon Feb 21, 2011 7:25 am

rick1013 wrote:I was most struck by Watson's use of the Forrest Bounce strategy as well as starting some categories in the middle. Was this strictly Daily Double-seeking behavior? I can't imagine that Ken or Brad would have been affected by the bounce much as compared to the average player.


Watson's strategy for "clue selection" included finding daily doubles, and also learning about what each category is by exposing the lower valued clues. The Bounce strategy is not part of its strategy specifically, but hunting and category learning may appear "bouncy".

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Strategy Question

Postby TheConfessor » Mon Feb 21, 2011 7:30 am

Chris, I don't know if you're the right person to answer this, but I've read a lot of recent articles about Watson, only a few of which seem to show much understanding of Jeopardy strategy. One thing that's been bugging me is the people who think they could beat Watson just by selecting the categories that they expect Watson to be weaker in. For example, Stephen Baker wrote this article in which he said that his strategy was to "feast on" the shorter and more humorous clues.

But assuming that all clues on the board will be played, as was the case in all of Watson's matches, does it really matter whether the human-friendly clues are found early or late in the game, or who selects them? It seems like Baker's strategy would at best allow him to be more competitive early in the game before inevitably being trounced later in the game on the computer-friendly clues. I guess there's some bragging value in saying "I had Watson worried for a while" instead of "I was never close, but I held my own toward the end."

My question applies mainly after the Daily Doubles have been found. Since all three players made finding DDs their top priority, they tended to be found early. And it's not obvious to me that the humans should necessarily hunt for DDs in the human-friendly categories. Depending on the relative scores at the time, it might make more sense for humans to seek DDs in Watson's favorite categories, to prevent him from using them to jump to an insurmountable lead.

Of course, Daily Doubles are most valuable when you have a substantial amount available to wager, so there is a case to be made for trying to build up one's score before seeking DDs. But that approach carries a much higher risk of allowing your opponent to find the DD before you do.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby rick1013 » Mon Feb 21, 2011 7:34 am

christo wrote:Watson's strategy for "clue selection" included finding daily doubles, and also learning about what each category is by exposing the lower valued clues. The Bounce strategy is not part of its strategy specifically, but hunting and category learning may appear "bouncy".

-Chris


Chris, thanks for the response. This begs my next, "real" question: given the relatively low DD wagers Watson placed, was the reason for DD hunting really more to prevent the carbon-based competitors from finding and betting the moon than to make big wagers it/himself?

If HP and Apple created Hal and Steve to compete with Watson, all presumably comparable (but clearly inferior, heh, heh) abilities, would his programming be altered to be more risk-taking? Or would Watson be able to learn to do this over multiple games with these siliconesque pugilists?
Last edited by rick1013 on Mon Feb 21, 2011 12:44 pm, edited 2 times in total.
I chose him out of thousands. I didn't like the others, they were all too flat.
rick1013
 
Posts: 70
Joined: Fri Feb 09, 2007 10:33 am
Location: Carrollton, TX

Postby polarea » Mon Feb 21, 2011 8:01 am

But questions in the format "What is_____?" are very uninteresting to answer, given that a dictionary, let alone a search engine, will be able to answer them. Watson's applications only become interesting once grammatically challenging questions that involve combining disparate information are able to be understood and answered specifically, and I still think it would be much more difficult to do that if the topic matter doesn't fall under the category of "jeopardy appropriate" subject matter. Anyway, I sort of see your point, but I will be interested to see whether Watson can do anything in fields where the answer hasn't been written about again and again in various forms.

jzerocsk wrote:Re: #1, I agree with you in that the amount of knowledge a Jeopardy champ has stored right in his head is quite amazing.

polarea wrote:I'm not entirely sure Watson's grammar has been conclusively demonstrated to be up to snuff on things that are beyond Jeopardy level questions. Presumably the answer to each jeopardy question is found in hundreds of different sources in the Watson data bank, and using redundancy of information can substitute for a true grammatical understanding of the question in deciding certitude. I even think I read somewhere that Watson was allowed to practise on old jeopardy sets. Finding the right answer to a question of much higher obscurity I maintain would be much more difficult, given what I have seen of Watson's problem solving algorithms.


Really? Given the weirdness of Jeopardy clues and its relative success, it seems pretty apparent that if you gave Watson a straightforward "What is _________?" question, no matter how obscure the answer is, if the correct answer is in the database, Watson will find it. How could it not?

Of course if you give it a ridiculously obscure question and use obtuse Jeopardy wording you may present more of a challenge, but all that goes back to the basic intent of the machine...the only real-world application of this tech that involves answering Jeopardy questions is....answering Jeopardy questions. Maybe it wouldn't be able to hack those....but it doesn't really matter - the user is not going to intentionally phrase queries to try and confuse the system. "Real world" Watson applications would also presumably be networked and have access to data sets larger but also more appropriately constrained (e.g. the canonical example Watson app that helps doctors with diagnoses isn't going to have baseball trivia in its library). Finally, in a lot of real-world applications, one single correct answer is probably neither necessary nor helpful. Showing a list of possible answers with a confidence rating is preferable.


I really can't see it not being effective.
polarea
 
Posts: 719
Joined: Wed Jul 05, 2006 7:50 pm

Postby marpocky » Mon Feb 21, 2011 8:39 am

Bill wrote:
econgator wrote:
skullturfq wrote:Is there an easy way to legally watch episode 2 on the internet tomorrow after it airs, or should I try to remember how to program a VCR for probably the first time this millennium? I teach a class on Tuesday nights, and don't have TiVo or whatever the kids call it.


Tomorrow, no. Thursday, yes.


Apologies if this has been posted already, but the board's been pretty active lately.

Where can you watch the episodes online?


I swear I read as they were airing that they were going to be archived somewhere specific, but I can't remember at all anymore.

Here is a page compiling Youtube videos of all 3 games. The fact that they're still up makes me think they're IBM- and Sony-approved, but if not I apologize. Maybe someone else has a more "official" link.
marpocky
 
Posts: 1523
Joined: Mon Apr 14, 2008 6:39 pm
Location: Bozeman, MT

Postby TheConfessor » Mon Feb 21, 2011 11:49 am

For those who may be interested, here's a pre-broadcast interview I did about Watson, within the constraints of the non-disclosure agreement I signed with IBM.
http://gameshows.about.com/od/interview ... llenge.htm

I'm pleased to add that I now have a Watson t-shirt, which I proudly wore to my pub quiz last Wednesday. About half the teams paid some sort of tribute to Watson in their team names that night. I don't own any Jennings or Rutter t-shirts, but I was hoping they both would perform well against Watson and I hated to see any of the three contestants lose. At least for the purposes of this competition, I was a loyal IBMer and 100% committed to helping their side. In general, I love to get involved and support anything that helps promote game shows, and this was the biggest game show event in a long time.
TheConfessor
 
Posts: 1467
Joined: Fri Jun 16, 2006 3:11 pm
Location: Austin, TX

Postby Bill » Mon Feb 21, 2011 12:33 pm

marpocky wrote:
Here is a page compiling Youtube videos of all 3 games. The fact that they're still up makes me think they're IBM- and Sony-approved, but if not I apologize. Maybe someone else has a more "official" link.


Awesome. Thanks!
Bill
 
Posts: 1551
Joined: Sat Jun 16, 2007 2:32 am
Location: New York City

Postby geniusonwheels » Mon Feb 21, 2011 1:06 pm

Just making sure, but Watson did have a FJ wager locked in before the clue was shown, like traditional Jeopardy players, right?
geniusonwheels
 
Posts: 537
Joined: Wed Nov 14, 2007 4:47 pm
Location: South Carolina

Re: Strategy Question

Postby christo » Mon Feb 21, 2011 4:20 pm

TheConfessor wrote:Chris, I don't know if you're the right person to answer this, but I've read a lot of recent articles about Watson, only a few of which seem to show much understanding of Jeopardy strategy. One thing that's been bugging me is the people who think they could beat Watson just by selecting the categories that they expect Watson to be weaker in. For example, Stephen Baker wrote this article in which he said that his strategy was to "feast on" the shorter and more humorous clues.

But assuming that all clues on the board will be played, as was the case in all of Watson's matches, does it really matter whether the human-friendly clues are found early or late in the game, or who selects them? It seems like Baker's strategy would at best allow him to be more competitive early in the game before inevitably being trounced later in the game on the computer-friendly clues. I guess there's some bragging value in saying "I had Watson worried for a while" instead of "I was never close, but I held my own toward the end."

My question applies mainly after the Daily Doubles have been found. Since all three players made finding DDs their top priority, they tended to be found early. And it's not obvious to me that the humans should necessarily hunt for DDs in the human-friendly categories. Depending on the relative scores at the time, it might make more sense for humans to seek DDs in Watson's favorite categories, to prevent him from using them to jump to an insurmountable lead.

Of course, Daily Doubles are most valuable when you have a substantial amount available to wager, so there is a case to be made for trying to build up one's score before seeking DDs. But that approach carries a much higher risk of allowing your opponent to find the DD before you do.


Yes, I agree, I don't see that strategy being better than any other. All the clues are revealed in the end, so there doesn't appear to be an advantage to doing anything other than DD hunting, unless you want to take this big risk on building your score so you can bet more if you get the DD. I'll tell you honestly I've never seen a category that I knew for sure Watson would fail on every clue. It has happened, but it's hard to predict. Not all of Watson's failures are due to the question; often it is the sources that make the question hard.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby christo » Mon Feb 21, 2011 4:27 pm

rick1013 wrote:Chris, thanks for the response. This begs my next, "real" question: given the relatively low DD wagers Watson placed, was the reason for DD hunting really more to prevent the carbon-based competitors from finding and betting the moon than to make big wagers it/himself?


Watson's betting strategy considers a lot of different factors. It is not taking a "scorched earth" policy towards them, per se, but clearly finding them also prevents your opponents from potentially doubling up. If it was behind when finding a DD, Watson would probably bet more. The calculation takes into account the scores of all the players, the stage of the game, any confidence in the category learned by revealed questions, how many clues are left, etc. It is difficult to say for sure why it bet what it did without looking at all that. Though I agree it did seem to be betting conservatively overall, this may be because it was ahead most of the time.
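
As a rough illustration of why a wager that weighs the scores and the money left on the board can come out small when ahead and large when behind, here is a toy sketch. It is not IBM's method: `win_estimate`, `choose_wager`, the logistic curve, the constant 4.0, and the simplified wager cap are all hypothetical stand-ins made up for this example.

```python
import math


def win_estimate(my_score, best_opponent, money_left):
    """Crude stand-in for a win probability (an assumption, not Watson's model):
    a logistic curve on the lead, scaled by how much money is still in play."""
    scale = max(money_left, 1000)
    return 1.0 / (1.0 + math.exp(-4.0 * (my_score - best_opponent) / scale))


def choose_wager(my_score, opponents, money_left, p_correct, step=100):
    """Pick the wager that maximizes the expected win estimate."""
    best_opp = max(opponents)
    max_wager = max(my_score, 1000)  # simplified stand-in for the show's wager cap
    best_wager, best_value = 0, -1.0
    for wager in range(0, max_wager + 1, step):
        value = (
            p_correct * win_estimate(my_score + wager, best_opp, money_left)
            + (1 - p_correct) * win_estimate(my_score - wager, best_opp, money_left)
        )
        if value > best_value:
            best_wager, best_value = wager, value
    return best_wager


# Same confidence in the category, two different game situations.
ahead = choose_wager(20000, [8000, 5000], money_left=10000, p_correct=0.7)
behind = choose_wager(8000, [20000, 5000], money_left=10000, p_correct=0.7)
print(f"wager when ahead: {ahead}, wager when behind: {behind}")
```

With a comfortable lead, a big miss costs more win probability than a big hit gains, so the toy model bets little; when trailing, the opposite holds and it bets everything it can.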

rick1013 wrote:If HP and Apple created Hal and Steve to compete with Watson, all presumably comparable (but clearly inferior, heh, heh) abilities, would his programming be altered to be more risk-taking? Or would Watson be able to learn to do this over multiple games with these siliconesque pugilists?


I don't think we would change it much, no. The betting component was pretty sturdy from a mathematical and game-theoretic point of view. Honestly, though, the betting component is not really the thing we wanted people to notice; our primary focus and most of the computation was on the natural language processing.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am

Postby bradleyk » Mon Feb 21, 2011 4:34 pm

Did Watson believe that that number was the wager that gave him the absolute best chance of winning, or did he "think" "I need something between 2,000 and 4,000, I'll bet 3217"?
bradleyk
 
Posts: 89
Joined: Sat Jan 29, 2011 1:42 pm

Postby christo » Mon Feb 21, 2011 4:48 pm

geniusonwheels wrote:Just making sure, but Watson did have a FJ wager locked in before the clue was shown, like traditional Jeopardy players, right?


Yes. Several of the Jeopardy! production staff commented on the added degree of scrutiny this match received from the auditors (all game shows are independently audited in the US since the scandals of the 60s). For example, we on the Watson team could not come near Ken & Brad until after the show (which is a shame, because I was carrying a glove in my back pocket the whole time to throw in front of them) - to the extent they would make sure the bathroom was clear before they went in, and stand there not letting anyone else in. No one on our team was allowed to have a computer, and the control system for Watson was monitored at all times by one of the auditing team.

I found it quite annoying - my plans to cripple Ken's buzzing hand were thwarted, but it turns out he's ambibuzzterous anyway :D - and in the end I'm glad we can point to their rigor in ensuring everything was done the way it should.

-Chris
christo
 
Posts: 27
Joined: Sat Feb 12, 2011 8:41 am
