@Texas-Hawk-10
(Note: I rarely go into any nuts-and-bolts discussion of QA methods, because it bores those not interested in QA and creates cognitive dissonance for the many who think they understand QA but have only a superficial working knowledge. But when a board rat advises me on QA, as you just did, I reckon you want to know, or should want to know, about it. It is an odd thing in our "information driven, high tech age" that QA would continue to be one of the most poorly taught, and so most poorly understood, subjects in our culture. Our current circumstance is kind of frightening on some levels. Students are increasingly either not taking QA, or taking it and being taught it poorly. Thus, we have large numbers of QA illiterates combined with an increasing class of persons who believe they are QA literate but have been taught so poorly that they really cannot think at all well with the concepts and algorithms they think they understand, but don't. I am increasingly a freak simply because I was once required to know the connection between the underlying logic of, say, parametric and non-parametric statistical inference and the models used. It is one of my most enduringly grotesque memories to recall meeting a college professor of statistics recently and learning that he could talk with impressive fluency about the manifold routines a particular statistical software package possessed, and pontificate about what each was intended to be best used for, but then grow glassy-eyed and inarticulate when asked to talk about the logics of deduction and induction underpinning statistics. He could not see that induction was based on certain deductive principles. He really could not see the logical disconnections in algorithms combining deduction and induction. He was a technician of QA, not a thinker.
He was little different from an auto mechanic trained to run digital diagnostics on a car in order to fix it without really being able to think through how the car operated and why design and materials science underlay the phenomena he was plugging into to diagnose. It is okay that auto mechanics are trained this way. It is pointless in many circumstances to train auto mechanics to understand much more than what the digital diagnostics tell them, because the cars themselves are designed to be fixed that way in the first place today. But undesigned phenomena, or haphazardly designed phenomena subject to emerging complexity outside a design program, require some dexterity of QA thinking. One has to define what one is even asking before one can organize a means to even a rudimentary answer. In short, one has to think a little. Not surprisingly, one can benefit from having some common sense about a subject before one tries to use QA to see through the biases in that common sense. One has to be logical, or want to try to be logical, about how one thinks about phenomena. Even an esoteric realm of QA (I consider everything talking about quantities, or potentialities of quantities, in terms of probabilities to be QA), as counter-intuitive as quantum mechanical description and explanation of certain phenomena really is, requires common sense about the counter-intuitive to be done well. Contrary to cliche, common sense is never the enemy. All good thinking evidences common sense. But stubborn adherence to common assumptions underpinning common sense often is an enemy to good thinking.)
Exactly, that's why I noted that distinction.
Try to characterize the distribution and topology of the data to whatever extent is feasible given time and resources.
The mistake you are making is to want to be unnecessarily reductive and ignore the old data.
This is a mistake many make.
The mistake is made for many reasons, but most often it is an innocent one. Often analysts just overlook the worth of knowing something that can be wrung from old and new data, because they are so focused on getting to the most reliable point estimate about a specific phenomenon. The quest for reliability in inferences is a virtue, but it is a vice when it blinds us to other important insights, especially when we can have both, simply by being awake to looking for both.
I have few steadfast rules in QA; this is as close as I come: never, never, never, never, never, never, EVER ignore data, old or new, related to the questions one seeks answers to.
I want to INCLUDE the data, all the data that might be relevant to the question, and then wring insight from it.
The only kind of data you absolutely want to exclude is corrupted data--data that is a false indicator of the evidentiary event it represents. Root out the data that is made up. Root out the data that has huge measurement error. And so on. But ridding data sets of corrupt data is quite different from ridding data sets of old data on the grounds that times have changed. Times are always changing. All data is obsolete for inferences about the present when viewed as naively as you are apparently viewing the issue.
Too many analysts play god with data. They parse it from their own assumptions, rather than their own logic, when the whole point of working with data is to find out what the available data can logically tell us, not what the available data can be made to tell us.
Data exclusion often betrays an analyst struggling for relief from complexity, or sometimes seeking expediently to rationalize their pre-established POV by intervening to alter the distribution of the data and so the potential inferences from it.
Let me clarify what I mean by pre-established POV. It comes in two flavors: hypothesis and ideological assertion.
A hypothesis is a pre-established POV for sure, but it is one posited precisely to find out what is true, not as an end in itself.
An ideological assertion is one based on assumptions that the individual has already decided are necessary and so must be adhered to no matter what.
Ideological assertions are most often associated with political and moral issues these days, but they crop up like crabgrass in everything human beings think about. Your flat assertion that we should exclude multi-ring winners before a certain year is really an ideological assertion masquerading as data parsing in pursuit of comparing apples with apples, not a logical one. You incorrectly assume there is nothing to be learned from including the old coaches in the data set, because you incorrectly assume they can shed no light on the topic. I have shown above that they can reveal something very useful to know in our search to understand the issue, and I will shortly call attention to it.
For now, let me just say: include all the available data at hand that resources and time permit the collection of regarding the question one wants to answer. Spend your time and effort figuring out how the data relates to your problem rather than assuming it doesn't matter.
In this case, some are asking whether pursuing conference titles impedes Self from winning multiple rings. Thus we want to look at the data set of multiple ring winners in relation to conference titles, at least initially, to learn what we can about the correlation of multiple ring winners and conference titles. Start with correlation, then proceed to causation if ever possible. And if you're a real stickler, forget causation and just try to get to probabilities with confidence levels, which is also frequently not feasible. Fortunately, often, correlation is all we need, or, less fortunately, all that is feasible. In any case, starting with correlation allows us to include all the data. In the increasingly baroque, bordering on tyrannical, age of the algorithm, many analysts lose sight of the common sense logic that underlies all QA. Rough cut before fine cut. Broad before narrow. Include before exclude. And never criticize inclusion for its vagaries, because the vagaries are simply the price in accuracy we pay for getting the blinders off in order to get to the right answer, so that we can then, hopefully, parse the data insightfully and enable greater accuracy. I have read huge books about QA, but it all distills to what I just wrote. And without what I just wrote, all the huge books are worthless.
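For the quantitatively inclined board rats, the "include everything, correlate first" approach is simple enough to sketch in a few lines of code. Note that the records below are made-up placeholders (the coach labels and title flags are hypothetical illustrations), not real tournament data:

```python
# Minimal sketch of the include-first, correlate-first approach.
# HYPOTHETICAL placeholder records -- not real NCAA data.

champions = [
    # (coach, year, won_conference_title_that_season)
    ("Coach A", 1962, True),
    ("Coach B", 1971, True),
    ("Coach C", 1988, False),
    ("Coach D", 1995, True),
    ("Coach E", 2004, True),
    ("Coach F", 2011, False),
]

def conference_title_rate(records):
    """Fraction of champions in the data set who also won their conference title."""
    return sum(1 for _, _, won in records if won) / len(records)

# Rough cut first: one number over the WHOLE data set, nothing excluded.
rate = conference_title_rate(champions)
print(f"Share of champions who were conference champs: {rate:.2f}")
```

The point of the sketch is the order of operations: compute the broad correlation over everything first, and only then consider parsing the data further.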
Given the question being asked by fans here, the reason to exclude the one ring winners from analysis in a non-ideological way is (to reiterate) that Self has already won one ring and so we know that either way works. Of course, if fans wanted to know which was the most likely way to win one ring, then I would probably include both the one ring winners and the multi ring winners because multi ring winners had to win one ring first before they won multiple rings; i.e., their inclusion in the data set would offer information to be wrung from them about winning one ring, even though it may be somewhat obscured by being bundled in multiple ring wins.
The reason to include the pre-75 guys AND distinguish between the conditions that prevailed under them and the differing conditions that prevailed afterwards is to establish a legacy context from which to wring insight about the actual impact of liberalization of access to the NCAA tourney. We need to gain this insight, because if we want to understand what enables multiple ring winners, we want to establish to some extent what, if any, effect the current liberalization of tournament access has on the tendency of multi-ring winners to win more without being conference champions. Our logic and knowledge of probabilities tell us that liberalizing tournament access at least creates a possibility of winning a ring without winning a conference title. But one of the things we want to learn from the data is whether that possibility is a major factor, or a minor factor, in winning multiple rings. And one imperfect way (and all inferences are imperfect) to gain that inference is a before-liberalization versus after-liberalization comparison. Again, we would like to know if the probability of winning rings without winning titles is so much greater after liberalization that one would rationally expect coaches pursuing multiple rings to restructure their seasonal objectives on the way to winning rings and so de-emphasize the pursuit of conference titles.
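The before-and-after comparison itself is mechanical once the data is in hand: split the same data set at the liberalization year and compare the conference-title rate in each era. Again, the records below are illustrative placeholders (the years and flags are hypothetical), and the 1975 cutoff is just the pre-75 boundary used in this discussion:

```python
# Sketch of a before/after liberalization comparison on one inclusive
# data set. HYPOTHETICAL placeholder records -- not real NCAA data.

champions = [
    # (year, won_conference_title_that_season) -- made-up records
    (1955, True), (1962, True), (1968, True), (1974, True),
    (1985, True), (1990, False), (1999, True), (2011, False),
]

def era_rates(records, cutoff=1975):
    """Conference-title rate among champions before and after the cutoff year."""
    pre = [won for year, won in records if year < cutoff]
    post = [won for year, won in records if year >= cutoff]
    return sum(pre) / len(pre), sum(post) / len(post)

pre_rate, post_rate = era_rates(champions)
print(f"pre-liberalization rate:  {pre_rate:.2f}")
print(f"post-liberalization rate: {post_rate:.2f}")
```

Nothing is excluded here; the same inclusive data set is simply partitioned by era so the two conditions can be compared.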
And what the data shows is that even after liberalizing to 64 teams and allowing lots of teams with crappy overall records and less than first place conference finishes into the tournament, teams that finish first in their conferences STILL predominate as the winners of the tournament. Thus I infer that coaches still try to win conference titles in their pursuit of national titles for a variety of reasons, even after liberalization, and that when they don't win a conference title, they keep trying to win a ring but rarely do so.
This is so obvious that I have never understood why others have gone on this counter-intuitive expedition into the possibility that coaches are trying not to win conference titles in order to increase their probabilities of winning rings.
And knowing what I know about the varying conditions of tournament access based on the entire data set of multiple ring winners, and knowing that liberalization has so far had a small impact on the correlation of conference titles being preconditions for rings, I can then zero in on the portion of the data set that is post liberalization and look for further trends there, knowing that there is really only one anomalous data point--Jim Calhoun--indicating even the possibility of liberalization having a longer term effect than what we have so far observed.
All that being clarified, what could be most interesting to track in coming years is the potential effect of rising asymmetry in talent distributions hypothetically triggered by the PetroShoeCo-Agent Complex on how coaches become multiple ring winners.
Rock Chalk!