Last login data gets overwritten on every login, so you'd have to record it in real time. I guess you could also reconstruct past logins by searching the logs, but that would be very slow.
I agree. The biggest skewing factor is that you'd be looking at accounts and not people. One person could be accessing dozens of accounts, some of which haven't been accessed in years, others that were online just the day before. One person might be online for 10 hours straight, another might relog 20 times in that period. It might give you some rough idea of how frequently people are accessing their alts or how often they relog but how useful is that really...
I don't see how eliminating data based on arbitrary measures like time spent AFK will do anything to help solve this problem. If anything it makes the problem worse by skewing the data even further.
Pvm porn wrote: Mon Feb 17, 2020 9:25 pm
There is a thought I'm having you could add variables to reduce skew on the data. I.e if you condition the distribution on people that spent less than 50% of their time in an afk chair, and condition further on people who spent over an hour logged in, to reduce outliers or unrelated data coming from vets. On the opposite end you could condition the count of new accounts (probably the group corresponding to a last login of 0 or NULL) on accounts which were created on unique IPs. I know some other network data is actually collected on login however the uniqueness of that data is debatable but I wouldn't consider it likely, maybe a combination of that and recognized IP could be used to determine new players to the most accurate degree possible. Obviously people trying to look new to an analysis algorithm will probably manage to do that but that's fairly rare I think. However, again, this works under the major assumption that data for last login, time spent AFK, and time logged in are easy to collect for Rapsey and that the initial predictor variable I chose (last login date) is actually a good one[/spoiler]
It doesn't change the underlying problem. You're looking at accounts but what you want to know is people. Every account you see could be someone's 100th alt or it could be that rare new player you're trying to detect. Trying to distinguish the two based on how long they were logged in or how long they were AFK is pure guesswork.
In this case you can't say "let's just look at accounts and then the number of new people will be proportional to that". New players are so rare compared to the new accounts that get created constantly that there's no discernible correlation between the two.
It's kinda like trying to determine if you have a leaky faucet by looking at the total water consumption per day. You're never gonna be able to spot the impact of a few drips when the normal usage fluctuates by several gallons from day to day.
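To put rough numbers on the faucet analogy, here's a quick Python sketch. Everything in it is made up for illustration: the function names, the roughly 200 alt accounts created per day, the day-to-day swing of 40, and the 5% daily chance of a real new player are all assumptions, not actual server data. It just simulates a year of daily account creations and checks how well the account count tracks the number of genuinely new players.

```python
import random


def simulate_signups(days=365, alt_mean=200.0, alt_sd=40.0,
                     new_player_chance=0.05, seed=1):
    """Return (new_players, new_accounts) per day. All parameters are hypothetical."""
    rng = random.Random(seed)
    new_players, new_accounts = [], []
    for _ in range(days):
        # The "drip": on most days nobody genuinely new shows up.
        players = 1 if rng.random() < new_player_chance else 0
        # The "normal usage": alts created by existing players, which swings a lot.
        alts = max(0, round(rng.gauss(alt_mean, alt_sd)))
        new_players.append(players)
        new_accounts.append(alts + players)
    return new_players, new_accounts


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    players, accounts = simulate_signups()
    print("real new players in a year:", sum(players))
    print("accounts created in a year:", sum(accounts))
    print("correlation between the two:", round(pearson(players, accounts), 3))
```

With numbers anywhere in that ballpark the correlation should come out close to zero, because one extra account on a given day is tiny next to a daily swing of dozens. Different assumptions will give different figures, but the point stands as long as alts vastly outnumber new people.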
Just because you can doesn't mean you should. That is as true about facial recognition as it is about appointing you the new sysadmin.
EDIT: Well, it's official: 201 players on a Monday evening. Good job everyone!