Quote from Caesum, Mar 06, 2011 - 11:59:46
Perhaps there should be another page where some of these suggestions can be tested? I still think SPOJ scoring is a problem as well: half an hour to an hour's work to solve a problem usually gives 0 points, whereas on other sites two minutes' work can give a lot of points.
Yes, SPOJ scoring is a serious problem. I've looked through some of the challenges at SPOJ and, from what I can tell, they are easily equivalent in difficulty to what you find at Rosecode or CSTutoring, but at either of those sites you get around 20 points per challenge at the low end, when you are just starting to work the site, and much more per challenge later on. Caesum, assuming this is the same Caesum we hear about in hushed whispers in the dark, has essentially solved close to the whole of CSTutoring, or solved Rosecode three times over, for 1788 points (4.67%). That is seriously broken.
My thoughts towards a solution:
I've been thinking about WeChall's scoring mechanism, mostly because of this thread and this one (starting on page 3), and I think the scoring issue is twofold. First, WeChall's scoring tries to 'normalize' points such that 100% is worth about the same number of points no matter how many challenges are on the site. That shows up here:
Quote from Gizmore
site_score = site_basescore + (avg*site_basescore) + (challcount*basescore_per_chall)
...
userscore = site_score * pow(percent_solved, 1+100/site_challcount);
From http://www.wechall.net/forum/show/thread/465/New_Scoring/page-1#post2675
Of course, several other factors change the final site score, but 'normalizing to challenge count' is a significant one. Until very recently I thought this was a good idea; now I believe otherwise. It badly breaks scoring on sites like SPOJ, and more generally it makes equivalently difficult challenges score differently purely because the sites' challenge counts differ. If you pasted a challenge from CSTutoring into Rosecode, the same challenge would score differently simply because the two sites have different totals. That seems wrong to me, and I think it is the primary problem. I've come to think that total challenge count should not be a factor in WeChall scoring at all. I know that is a radical break, but I've tried to walk through the reasoning.
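To make the normalization concrete, here is a minimal Python sketch of the quoted formula. All the parameter values are invented for illustration (I don't know WeChall's real site_basescore, avg, or basescore_per_chall), and I'm assuming percent_solved means challenges solved divided by challenge count:

def userscore(challcount, solved, site_basescore=1000, avg=0.5, basescore_per_chall=10):
    # site_score grows linearly with the raw challenge count
    site_score = site_basescore + (avg * site_basescore) + (challcount * basescore_per_chall)
    percent_solved = solved / challcount
    # percent-solved weighting, with a challenge-count-dependent exponent
    return site_score * percent_solved ** (1 + 100 / challcount)

print(userscore(30, 30))    # 1800.0 -- 60 points per solve on a 30-challenge site
print(userscore(1000, 30))  # ~243   -- about 8 points per solve on a 1000-challenge site

Same thirty solves, same assumed difficulty, yet they are worth roughly seven times more on the small site, purely because the challenge counts differ.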
Second, I am beginning to wonder whether weighting scores by percentage solved is a good idea at all. This, too, is a radical break, so I'll try to make a case. Again, this breaks badly on sites like SPOJ, where you'd need to solve more than a thousand challenges to get 50%, and even that number may be badly off, since Caesum's 240 solved at SPOJ is about 11.5% there but translates to 4.67% here. The idea behind weighting by percentage solved seems to be to prevent people from solving a handful of easy challenges on many sites and thereby racking up a lot of points for easy work. That makes sense, and until very recently I also thought it was a good idea. Now I think it only makes sense for some sites, namely those with 1) challenges of varying difficulty that are 2) internally scored evenly.
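To put numbers on the SPOJ case: assuming SPOJ has roughly 2087 challenges (inferred from 240 solves being about 11.5%), the percent-solved weighting alone looks like this in Python:

challcount = 2087                # rough SPOJ challenge count, inferred from 240 ~= 11.5%
exponent = 1 + 100 / challcount  # ~1.048

# weight applied to the site score for Caesum's 240 solves
print((240 / challcount) ** exponent)      # ~0.104

# solves needed to earn half the site score
print(0.5 ** (1 / exponent) * challcount)  # ~1077

So 240 solves earn about a tenth of the site score, and it takes roughly 1,077 solves to reach half of it, which is where the 'more than a thousand' figure comes from.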
This is a critical consideration, so I'll elaborate. Imagine a site where all of the challenges are roughly equivalent in difficulty. I'm not sure such a site actually exists, though Ma's or Electrica might qualify; I haven't played through enough of either to say. CSTutoring's programming section mostly fits the description, and I stress 'mostly': it isn't as good an example as I'd like, so pretend it is. As far as I can tell, CSTutoring internally scores all challenges evenly. There are no challenges worth ten points versus challenges worth 30 or 100; they are all scored the same. Granting the somewhat fictional idea that all of the challenges are equally difficult, it makes no sense to penalize a player for solving only one or two of them, because there are no 'easy' ones to cherry-pick.
Now, not many sites score challenges evenly. I've been playing Rosecode, so I'll use that as an example. Rosecode has an internal difficulty rating: challenges score 2 to 25 points or so, depending on how hard the challenge is. That internal rating is reflected in WeChall's scoring; in other words, a site's own difficulty calculations are already taken into account by the time the score is sent to WeChall. When WeChall further weights by percentage solved, it is compensating, or punishing, for difficulty a second time, which doesn't make sense to me.
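Here is a contrived sketch of that double penalty, again assuming percent_solved counts challenges rather than points. The numbers are invented: a 50-challenge site whose 10 hardest challenges are worth 25 points each internally and whose 40 easiest are worth 5, and a player who solves only the 10 hardest:

score    = 10 * 25           # 250 points -- the site already rewarded the difficulty
maxscore = 10 * 25 + 40 * 5  # 450 points available in total

raw_share      = score / maxscore  # ~0.56 of the site's own points
percent_solved = 10 / 50           # but only 0.20 of the challenge count
weight         = percent_solved ** (1 + 100 / 50)
print(raw_share, weight)           # 0.555..., 0.008

The site's own scoring already credits the player with over half the available points for doing the hard work; the count-based weighting then cuts that to under one percent of the site score.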
The only time it does make sense to weight by percentage solved is when, as I said, a site internally scores at a flat rate for challenges of varying difficulty, but I'm not sure how many of those sites there are. They are certainly pretty rare.
Something like this makes sense to me at this point:
site_score = (site_difficulty*site_basescore);
userscore = (site_score + (usercount/rank) + score) * (score/maxscore)
site_difficulty is pulled from the 'Dif' column on the 'Sites' table; this value is generated by WeChall user votes. The rest comes from the values returned by linked sites-- username:rank:score:maxscore:challssolved:challcount:usercount. I have no idea how this would affect scoring. It may be terrible, and I'm sure it is far from ideal. But it does keep raw challenge count out of the equation, which I think is the biggest issue, and it keeps scoring relative to the user's rank and the site's user count, both of which should be some kind of indication of difficulty in a way that raw challenge count is not. At any rate, my primary point was not to suggest a new algorithm but to point out what seem to me to be errors in the logic behind the existing one. There may be much better ways to correct those errors than mine.
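For what it's worth, here is the proposed formula with sample numbers plugged in; every value below is invented for illustration (a site voted 6.5 in the 'Dif' column, a hypothetical site_basescore of 100, and a player ranked 50th of 5000 users with 300 of 1200 possible points):

site_difficulty = 6.5   # sample 'Dif' vote from the Sites table
site_basescore  = 100   # sample value
usercount, rank = 5000, 50
score, maxscore = 300, 1200

site_score = site_difficulty * site_basescore  # 650
userscore  = (site_score + (usercount / rank) + score) * (score / maxscore)
print(userscore)                               # 262.5

Note that raw challenge count never enters: the rank term and the score ratio stand in for it.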
I second Caesum's suggestion that we have a 'test' page of some sort-- maybe something like the 'Rankings' table but with a number of algorithms side by side.
Anyway... flame away