FORUMula1.com - F1 Forum

Discuss the sport you love with other motorsport fans

Formula One related discussion.
#337925
There's 5 different types of overtakes counted.
Give me a stat you think is relevant then.

I think the stat that is relevant is how many overtakes each driver has performed on the track this year, period.

Well, is that stat not present?
These are the on-track overtakes.


It is present. What I'm criticising is not stopping there with just that one stat. Inventing all these other stats as well is very bad statistics.
#337928
There's 5 different types of overtakes counted.
Give me a stat you think is relevant then.

I think the stat that is relevant is how many overtakes each driver has performed on the track this year, period.

Well, is that stat not present?
These are the on-track overtakes.


It is present. What I'm criticising is not stopping there with just that one stat. Inventing all these other stats as well is very bad statistics.

So if Vettel overtook 74 HRTs, and Alonso overtook 45 McLarens, you would be satisfied with just reading?

Sebastian Vettel: 74
Mark Webber: 74
Felipe Massa: 65
Lewis Hamilton: 55
Romain Grosjean: 55
Nico Rosberg: 53
Michael Schumacher: 52
Kimi Räikkönen: 48
Fernando Alonso: 45
Jenson Button: 45


I know I wouldn't, and neither would many others.
Context is what it's called.
#337929
There's 5 different types of overtakes counted.
Give me a stat you think is relevant then.

I think the stat that is relevant is how many overtakes each driver has performed on the track this year, period.

Well, is that stat not present?
These are the on-track overtakes.


It is present. What I'm criticising is not stopping there with just that one stat. Inventing all these other stats as well is very bad statistics.

So if Vettel overtook 74 HRTs, and Alonso overtook 45 McLarens, you would be satisfied with just reading?

Sebastian Vettel: 74
Mark Webber: 74
Felipe Massa: 65
Lewis Hamilton: 55
Romain Grosjean: 55
Nico Rosberg: 53
Michael Schumacher: 52
Kimi Räikkönen: 48
Fernando Alonso: 45
Jenson Button: 45


I know I wouldn't, and neither would many others.
Context is what it's called.

If you wanted to go all out you could create an index based on a driver's overtakes, factoring in the final championship standing of each of his overtakees. Lots of work, but that would presumably weight the overtakes according to how difficult they were.
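To make the idea concrete, here is a minimal sketch of such an index. Everything in it is an assumption for illustration: the function name `weighted_overtake_index`, the linear weighting, and the drivers A to D with their standings are all made up, not real season data. Each pass is worth more the higher the overtaken driver finished in the championship.

```python
from collections import defaultdict


def weighted_overtake_index(overtakes, final_standing):
    """Sum a weight per overtake; passing a higher-placed driver counts more.

    overtakes: list of (overtaker, overtakee) pairs, one per on-track pass.
    final_standing: dict mapping driver -> final championship position (1 = champion).
    The linear weight (n - position + 1) / n is an arbitrary illustrative choice.
    """
    n = len(final_standing)
    index = defaultdict(float)
    for overtaker, overtakee in overtakes:
        position = final_standing[overtakee]
        index[overtaker] += (n - position + 1) / n
    return dict(index)


# Hypothetical four-driver season, not real data.
final_standing = {"A": 1, "B": 2, "C": 3, "D": 4}
overtakes = [("D", "A"), ("D", "C"), ("B", "D"), ("B", "D"), ("B", "D")]

scores = weighted_overtake_index(overtakes, final_standing)
print(scores)
```

In this made-up example driver B has three passes and driver D only two, yet D ends up with the higher index because D's passes were on better-placed drivers. A different weighting could just as easily change the order, which is the catch with any index like this.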
#337930
If you wanted to go all out you could create an index based on a driver's overtakes, factoring in the final championship standing of each of his overtakees. Lots of work, but that would presumably weight the overtakes according to how difficult they were.

You know there would still be people unwilling to accept a view different from theirs.
I have now taken the top 5 cars from the season, and 9 of the top 10 drivers.
Image
#337932
So if Vettel overtook 74 HRTs, and Alonso overtook 45 McLarens, you would be satisfied with just reading?


Well that's completely irrelevant, as nothing like that happened.

A statistic should match the situation from which the data is obtained. The situation we are in is nothing like the one you described.

You don't obtain the data, and then decide what statistic to use. That is one of the cardinal rules of statistics.

People criticise statistics a lot, but it's mainly because they don't understand statistics and because statistics are very often misused, knowingly or unknowingly. As is happening in this thread.

Have you looked up what a statistical fishing expedition is?
#337933
If you wanted to go all out you could create an index based on a driver's overtakes, factoring in the final championship standing of each of his overtakees. Lots of work, but that would presumably weight the overtakes according to how difficult they were.

You know there would still be people unwilling to accept a view different from theirs.
I have now taken the top 5 cars from the season, and 9 of the top 10 drivers.
Image


Very, very, not close enough.

If you actually want to do something vaguely resembling statistics, then you need to choose your statistic before you obtain the data.
#337934
I never mentioned bias, just that people should stop talking big based on something that even the author admits isn't a complete picture. I'm interested in the most realistic and reliable data, which this sure as hell isn't - so my perception isn't actually being shattered or anything like that here, do not worry. :)

So, tell me then, what is "the most realistic and reliable data"?


Obviously not the example given in this thread, when the author themself states, and I repeat: "Gaps in the available data, such as moves missed by TV cameras or obscured on lap charts by pit stops or retirements, mean that the data do not lend themselves to detailed analysis".

Personally I think it'd be good to see something from an official source that perhaps utilises actual data from the cars' computer systems rather than, as appears to be the case here, someone watching on TV and on lap charts and taking a tally of what they see. Even that isn't perfect for making analysis or any sort of real judgement though.

The data provider has also ignored passes on certain cars and in certain circumstances, for example, like I said, the first lap. Why should this person be the judge of what matters and what doesn't? And more to the point, why should any of us just sit and accept this set of statistics given that? Someone could provide an alternate set of data ignoring, say, DRS- or KERS-assisted passes and the results may end up completely different. Or they could include every single pass, which would give pure data by definition, but with zero circumstantial factors, something that must always be considered for accurate analysis.

This data gives quite simply an undeniably incomplete and utterly misrepresentative picture no matter which way you look at or try to spin it.
#337937
I never mentioned bias, just that people should stop talking big based on something that even the author admits isn't a complete picture. I'm interested in the most realistic and reliable data, which this sure as hell isn't - so my perception isn't actually being shattered or anything like that here, do not worry. :)

So, tell me then, what is "the most realistic and reliable data"?


Obviously not the example given in this thread, when the author themself states, and I repeat: "Gaps in the available data, such as moves missed by TV cameras or obscured on lap charts by pit stops or retirements, mean that the data do not lend themselves to detailed analysis".

Personally I think it'd be good to see something from an official source that perhaps utilises actual data from the cars' computer systems rather than, as appears to be the case here, someone watching on TV and on lap charts and taking a tally of what they see. Even that isn't perfect for making analysis or any sort of real judgement though.

The data provider has also ignored passes on certain cars and in certain circumstances, for example, like I said, the first lap. Why should this person be the judge of what matters and what doesn't? And more to the point, why should any of us just sit and accept this set of statistics given that? Someone could provide an alternate set of data ignoring, say, DRS- or KERS-assisted passes and the results may end up completely different. Or they could include every single pass, which would give pure data by definition, but with zero circumstantial factors, something that must always be considered for accurate analysis.

This data gives quite simply an undeniably incomplete and utterly misrepresentative picture no matter which way you look at or try to spin it.

If you perceive this as an in-depth analysis rather than an analysis of a trend, I'm sorry, but that's not my doing.

Pirelli uses the same context, and counted 3 more on-track overtakes in the entire year prior to Brazil.
It's pretty accurate :wink:
#337938
You don't obtain the data, and then decide what statistic to use. That is one of the cardinal rules of statistics.

I didn't see you watching over my shoulder as I wrote this article.
How did you do that? :yikes:

The statistics are chosen based upon the criteria many people pick when judging overtaking.
Passing an HRT 23 times (in Vettel's case) is pretty irrelevant and uninspiring, so I factor in what's more telling.
Again, if you think it unreasonable, please provide me with what you consider better criteria.

That's why it cuts down more and more to which level of car was overtaken in which circumstances.
For instance, I value Hamilton's overtake on Hülkenberg on pretty equal tires in Spain for 11th more than Button's overtake on Räikkönen for third, with the Lotus's tires having fallen off the cliff.
#337943
Pirelli uses the same context, and counted 3 more on-track overtakes in the entire year prior to Brazil.
It's pretty accurate :wink:


Accurate based on those criteria, yeah, but in reality, not so much (like I already explained, haha).
#337947
You don't obtain the data, and then decide what statistic to use. That is one of the cardinal rules of statistics.

I didn't see you watching over my shoulder as I wrote this article.
How did you do that? :yikes:

The statistics are chosen based upon the criteria many people pick when judging overtaking.
Passing an HRT 23 times (in Vettel's case) is pretty irrelevant and uninspiring, so I factor in what's more telling.
Again, if you think it unreasonable, please provide me with what you consider better criteria.

That's why it cuts down more and more to which level of car was overtaken in which circumstances.
For instance, I value Hamilton's overtake on Hülkenberg on pretty equal tires in Spain for 11th more than Button's overtake on Räikkönen for third, with the Lotus's tires having fallen off the cliff.


You're posting your discussion in the thread as you do it. I don't need to be looking over your shoulder. Your discussion shows a classic case of someone who doesn't understand how to handle statistics.

If you collect your data, and then say "what if I calculated the statistics this way, or that way, would it be more accurate?", you're doing it in the knowledge of what the data is. Hence there's no way you can tell for yourself whether you're looking for methods of calculating the statistic that are "better", or simply finding statistics that meet your preconceived notions of who is the better overtaker or what the order is. There is plenty of research that shows that people do this, and it is often subconscious, so there's no way you can say whether you are doing this or not.

If you think that just counting the overtakes isn't the most accurate way of calculating a statistic that represents how good a driver is at overtaking, then work out how it should be calculated, make sure that it can be calculated with no subjective interpretation necessary, publish it openly, then use it on the data collected from the races in 2013. That's the way to do things. That way, there is no possibility that the most "accurate" statistic was picked out of many because it matches your (or various groups of people's) preconceptions.
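One way to make "publish it openly, then use it" checkable is to write the definition down in a fixed form and publish a fingerprint of it before the season starts. This is only a sketch under assumptions: the `fingerprint` helper and the fields in `spec` are hypothetical, chosen for illustration, and are not anything the article's author actually uses.

```python
import hashlib
import json


def fingerprint(spec):
    """Return a SHA-256 hex digest of a metric definition.

    The spec is serialised with sorted keys so the same definition
    always yields the same digest regardless of key order.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical definition, fixed before any 2013 race data exists.
spec = {
    "metric": "on_track_overtakes",
    "exclude_first_lap": True,
    "season": 2013,
}
published = fingerprint(spec)

# After the season: anyone can confirm the definition was not changed.
unchanged = fingerprint(spec) == published

# Any tweak made with knowledge of the data gives a different digest.
tweaked = dict(spec, exclude_first_lap=False)
tampered = fingerprint(tweaked) != published
print(unchanged, tampered)
```

The digest proves nothing about whether the definition is a good one. It only shows that the definition was settled before the data came in.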
#337956
You don't obtain the data, and then decide what statistic to use. That is one of the cardinal rules of statistics.

I didn't see you watching over my shoulder as I wrote this article.
How did you do that? :yikes:

The statistics are chosen based upon the criteria many people pick when judging overtaking.
Passing an HRT 23 times (in Vettel's case) is pretty irrelevant and uninspiring, so I factor in what's more telling.
Again, if you think it unreasonable, please provide me with what you consider better criteria.

That's why it cuts down more and more to which level of car was overtaken in which circumstances.
For instance, I value Hamilton's overtake on Hülkenberg on pretty equal tires in Spain for 11th more than Button's overtake on Räikkönen for third, with the Lotus's tires having fallen off the cliff.


You're posting your discussion in the thread as you do it. I don't need to be looking over your shoulder. Your discussion shows a classic case of someone who doesn't understand how to handle statistics.

If you collect your data, and then say "what if I calculated the statistics this way, or that way, would it be more accurate?", you're doing it in the knowledge of what the data is. Hence there's no way you can tell for yourself whether you're looking for methods of calculating the statistic that are "better", or simply finding statistics that meet your preconceived notions of who is the better overtaker or what the order is. There is plenty of research that shows that people do this, and it is often subconscious, so there's no way you can say whether you are doing this or not.

If you think that just counting the overtakes isn't the most accurate way of calculating a statistic that represents how good a driver is at overtaking, then work out how it should be calculated, make sure that it can be calculated with no subjective interpretation necessary, publish it openly, then use it on the data collected from the races in 2013. That's the way to do things. That way, there is no possibility that the most "accurate" statistic was picked out of many because it matches your (or various groups of people's) preconceptions.


Exactly, there are so many problems with trying to turn the true, factual, inarguable statistics into a non-statistical manipulation of numbers. Too many factors have an input that cannot be ignored, such as:

- Certain tracks lend themselves to overtaking, others do not. An example of this would be that it may be relatively simple for, say, a McLaren to overtake a Toro Rosso at Austin, but not at Monaco. So which is of more value in indicating a better overtaker? 4 passes on track at Monaco or 6 passes on track at Austin?

- Different track conditions / temperatures etc. suit certain cars, whose performance can vary considerably depending on these factors. So an overtake on a Lotus at one track may be easier or harder than at the next track depending on those conditions.

- Some tracks favour engine power whilst others favour aero design. See above for example.

- Different levels of car evolution throughout the season.

- Two or three uncharacteristically poor qualifying positions in a season can make a lot of overtakes possible in a few races, skewing the statistics depending on the data gathered.


These are fun to have a look at and discuss a bit, but to repeat, I am not comfortable with these different sets of data being described as 'statistics', because they're not really.

As I've always said, a true quantitative statistic will tell you that a phenomenon exists, whereas qualitative analysis is what will tell you the reason that phenomenon exists. Quantitative and qualitative analysis are useless in isolation, as far as F1 is concerned anyway. You need both to tell the full story, as you do in any non-spec series (and even in a spec series you have set-up deviation, so quantitative analysis on its own is still only of limited use).