Free State Project Forum

Please login or register.

Login with username, password and session length
Pages: [1]   Go Down

Author Topic: A better spreadsheet  (Read 9275 times)

TedApelt

  • FSP Participant
  • ***
  • Offline Offline
  • Posts: 117
  • Free 50 states - one at a time.
A better spreadsheet
« on: December 20, 2002, 03:16:35 pm »

I have made some improvements to the original spreadsheet:

1.  The "interpolation" method used in the original simply doesn't make any sense.  I am using a simple ratio instead, always setting the best state to 10

2.  I added climate (numbers on that are my own personal opinion) and a few other things.  Two of them (medical marijuana and the number of people who live within an hour's drive of the state) will take some doing to get on there.  I am contacting NORML to see what they say about how marijuana friendly (or at least less hostile to it) various states are, and I am also encouraging them to work with the FSP.

3.  I adjusted "jobs" for population.

Question:  Where do the "homeschooling" numbers come from?

Anyway, here are the results:



Category   Variable   WEIGHT   Wyoming   Alaska   N Dakota   Vermont   S Dakota   Delaware   Montana   Idaho     N.H.   Maine                                 
SIZE   Voters   7   10   7.72   7.41   7.29   6.76   6.53   5.2   4.32   3.77   3.28                              
QUALITY   Climate   10   7   0   3   5   5   10   5   7   5   4                              
SIZE   Finance   8   8.94   6.88   10   9.77   2.23   4.83   3.85   5.45   2.14   3.56                              
QUALITY   Livability   1   9.34   7.82   8.87   8.69   9.12   8.15   8.62   9.85   10   8.45                              
QUALITY   Crime   1   6.6   5.48   9.6   8.39   9.96   5.73   6.29   7.41   10   8.64                              
CULTURE   Spending   5   5.75   4.03   5.34   5.92   6.06   8.37   4.67   5.88   10   5.46                              
QUALITY   Income   4   7.34   10   6.71   7.93   6.84   9.19   6.4   7.29   9.59   7.21                              
VIAB   BLM   3   0.22   0.15   2.56   0.35   1.58   5   3.47   0.16   0.781   10                              
CULTURE   Taxes   2   6.43   10   6.18   5.73   6.92   6.18   6.3   6   7.33   4.92

QUALITY   Jobs   5   4.62   6.27   4.5   4.67   6.78   6.45   8.52   10   6.99   3.65

CULTURE   Prez   1   7.39   10   2.5   2.71   5.76   2.61   4.56   7.71   5.21   5

CULTURE   Guns   2   9.24   8.48   8.29   10   7.14   6.67   8.29   9.24   9.52   8.76

CULTURE   Homesch   5   7   10   0   0   3   10   7   10   3   0

SIZE   Area   10   0.2   0.03   0.28   2.11   0.25   10   0.23   0.24   2.17   0.63

CULTURE   R/D ratio   5   4.22   4.86   5.6   8.31   6.4   7.86   5.89   4.24   10   8.88
QUALITY   Nearby pop   3                              
CULTURE   Med mar.   5                              
QUALITY   Distance   10   2.36   0.1   2.45   5.58   2.13   10   1.68   1.78   5.54   3.42
   TOTAL      429.76   336.89   350.8   434.61   335.22   641.7   351.36   399.73   420.173   330.18
                              
Logged
How much political experience do you have?  Probably not enough.  Get some!  DO THIS NOW!!!

Zxcv

  • ****
  • Offline Offline
  • Posts: 1229
Re:A better spreadsheet
« Reply #1 on: December 20, 2002, 08:32:21 pm »

What spreadsheet is this, Ted?
.....

Ohhhh, that spreadsheet!   :P

I finally gave up on the java version and downloaded the xls file. That worked for me. However I don't understand what you are talking about, the improvements you made? The 0-10 interpolation is still there. I don't see your improvements.

I assume you are talking about your local copy. If I understand correctly, I agree with you about the ratio change.

What this spreadsheet really needs, is a lot more rows, or criteria to work with. Is anyone working on that, I wonder?

BTW, when you post columnar data like this, try the "code" tag. It preserves the columns...
« Last Edit: December 20, 2002, 10:25:32 pm by Zxcv »
Logged

TedApelt

  • FSP Participant
  • ***
  • Offline Offline
  • Posts: 117
  • Free 50 states - one at a time.
Re:A better spreadsheet
« Reply #2 on: December 21, 2002, 01:26:01 am »

Take a look at the numbers on my spreadsheet.  They (with the exception of homeschooling) are different.  I calculated each one by hand.

How do I use the "code" tag?  The lowest check box?
Logged
How much political experience do you have?  Probably not enough.  Get some!  DO THIS NOW!!!

JasonPSorens

  • Administrator
  • *****
  • Offline Offline
  • Posts: 5706
  • Neohantonum liberissimum erit.
    • My Homepage
Re:A better spreadsheet
« Reply #3 on: December 21, 2002, 10:03:30 am »

How exactly did you calculate your numbers, Ted?  The spreadsheet on the website gives the worst state "0", the best state "10," and arranges the others in between.  I notice yours doesn't give the worst state "0."  That makes the weightings meaningless, actually.  Making the ratings range from 0-10 "normalizes" the data so that each variable can be compared with the others.
Logged
"Educate your children, educate yourselves, in the love for the freedom of others, for only in this way will your own freedom not be a gratuitous gift from fate. You will be aware of its worth and will have the courage to defend it." --Joaquim Nabuco (1883), Abolitionism

Solitar

  • Guest
Re:A better spreadsheet
« Reply #4 on: December 21, 2002, 11:12:07 am »

Jason,
Though the downloaded spreadsheet does zero the worst state, the charts on the web page do not.
http://www.freestateproject.org/state.htm

A further point is that a zero to ten scale does not reflect the worth of each state since, if all ten states really are pretty good on a particular measure, giving a zero to one artificially spreads out the range and hurts the "lower" states. This is like a test where everyone pretty well masters the material and aces the test but, to fit the results to a curve, some must flunk.
The reverse also is evident. If all our states are pretty bad on a particular category and we are only picking the least bad of the worst, then it artifically inflates the better scores. This is like a tests in our public schools where everyone is grossly ignorant of geography, for instance, but, to fit the results to a curve, some must be graded as "proficient".

Also, to anyone who uses weights restricted to the categories on the state data charts or spreadsheet, there have been many other criteria posted on this forum by several contributors which may be important to consider. Other criteria that is in those sheets may not reflect other data that has been dug up. Thus not only the weights may be varied, but the individual scores as well.
Logged

JasonPSorens

  • Administrator
  • *****
  • Offline Offline
  • Posts: 5706
  • Neohantonum liberissimum erit.
    • My Homepage
Re:A better spreadsheet
« Reply #5 on: December 21, 2002, 11:44:35 am »

Your point about grading states on a curve is a fair one, Joe.  However, it's about the only way to do it.  If I understand how Ted has done it, his ratings are arbitrary to the way the data are counted.  For a quick example, think of it this way: what if we used # of voters above 215,000 instead of number of voters?  Then the ratings on Ted's spreadsheet would change, even though the underlying variable is no different.

If all states are basically bad on a variable, but the normalization "spreads them out" 0-10, the appropriate solution is not to weight that variable very heavily at all, since it doesn't really measure important differences among the states.
Logged
"Educate your children, educate yourselves, in the love for the freedom of others, for only in this way will your own freedom not be a gratuitous gift from fate. You will be aware of its worth and will have the courage to defend it." --Joaquim Nabuco (1883), Abolitionism

Zxcv

  • ****
  • Offline Offline
  • Posts: 1229
Re:A better spreadsheet
« Reply #6 on: December 21, 2002, 12:29:51 pm »

No, Jason, I think Ted has a good solution. I'm guessing but I think what I have below illustrates it.

Let's think of a simple example to illustrate. Imagine we are comparing only two states, A and B. And we are using a criterion for which more is better, such as percentage of land not owned by government.

If State A has 50% not owned by government, and State B has 49%, then with our current method State A would be given a "10" and State B a "0". That clearly does not make sense; the two are essentially the same.

I *think* what Ted is doing is making State A's (the highest)value 10, then calculating the rest as, in this example, (49/50*10), or 9.8. This better captures the real difference between the two.

When using a criterion for which less is better, you'd just give the lowest a 10, then flip the ratio, e.g. population in thousands would be something like 500/600*10. For a state with double the population it would be 500/1000*10, giving a 5.

Ted, I tried the code tag but it didn't seem to work like it did for me on another BBS software.
« Last Edit: December 21, 2002, 12:41:53 pm by Zxcv »
Logged

Zxcv

  • ****
  • Offline Offline
  • Posts: 1229
Re:A better spreadsheet
« Reply #7 on: December 21, 2002, 12:56:17 pm »

I was thinking about this some more last night anyway. Rather than normalizing the data this way, we should be entering the raw data into the spreadsheet and using the spreadsheet formulas to normalize it. After all, that's what spreadsheets are for!  ;)

There are probably not that many ways to normalize. We can make a few template rows with the different possible formulas then cut and past the appropriate template into the sheet for each new row of data we enter.

At least that way, it will be clear what is being done with the raw data.

I'd also like to see a column giving the URL for the source of the data.

I don't know who owns the spreadsheet, but if that person is busy I might take a whack at revising it this way, and also making a shot at harvesting the new rows of data out of all the discussions people have been using to bring new criteria up with. The latter is probably a hopeless task, but the spreadsheet sure needs more rows than it has now.

There's got to be a better way to add these criteria people are always coming up with, into the spreadsheet, but I sure can't think of one at the moment.
« Last Edit: December 21, 2002, 12:58:35 pm by Zxcv »
Logged

TedApelt

  • FSP Participant
  • ***
  • Offline Offline
  • Posts: 117
  • Free 50 states - one at a time.
Re:A better spreadsheet
« Reply #8 on: December 21, 2002, 03:21:14 pm »

>  I *think* what Ted is doing is making State A's (the highest)value 10, then calculating the rest as, in this example, (49/50*10), or 9.8. This better captures the real difference between the two.

Yes.  This is exactly what I did.  The idea was not to overmagnify small differences between the states.

Why always make the best value 10?  So that all the weights have the same meaning.

One more thing - stats for "Voters", "Prez", and "R/D ratio" were all from the 2000 Presidential election:

Voters - the number of people who actually voted.  Why worry about people who don't vote?

Prez - Harry Browne vote % - AK was the highest at 0.92%

"R/D ratio" - ratio of Republican to Democrat votes in that election - as a fraction with the lowest divided by the highest.  Values closest to 1 (R votes equallling D votes) were considered to be the best.  This is because I think that states that are overwhelmingly lopsided by one party over another will be harder for us.

Logged
How much political experience do you have?  Probably not enough.  Get some!  DO THIS NOW!!!

Zxcv

  • ****
  • Offline Offline
  • Posts: 1229
Re:A better spreadsheet
« Reply #9 on: December 21, 2002, 03:38:20 pm »

Yes, Ted, I'm suggesting retaining the normalization to a given value (in this case 10), but doing that entirely within the spreadsheet, rather than doing it manually outside the spreadsheet and entering the already-normalized numbers.
Logged

JasonPSorens

  • Administrator
  • *****
  • Offline Offline
  • Posts: 5706
  • Neohantonum liberissimum erit.
    • My Homepage
Re:A better spreadsheet
« Reply #10 on: December 29, 2002, 02:24:57 pm »


If State A has 50% not owned by government, and State B has 49%, then with our current method State A would be given a "10" and State B a "0". That clearly does not make sense; the two are essentially the same.

Right, but what is the purpose of the spreadsheet?  It's aggregating a bunch of variables to get a final, composite ranking.  So the actual, cardinal numbers used in the variables are arbitrary.  What we need are to transform the variables so that they are comparable to each other.  That's what normalization does.  If the substantive differences among the states are not important, as in the example you give, the solution is to give the normalized variable a very small weighting so that it does not much affect the final result.  You can mathematically demonstrate that this works the exact same way as Ted's solution, except that Ted's solution can create paradoxes, as below.

Quote
I *think* what Ted is doing is making State A's (the highest)value 10, then calculating the rest as, in this example, (49/50*10), or 9.8. This better captures the real difference between the two.

IMO, it doesn't capture the real difference any better, because the difference between 9.8 and 10 is huge if you weight the variable heavily, but small if you weight the variable by just a little.  So either way, it's the weighting that matters.  But Ted's solution can create paradoxes if you're not very sophisticated about the way in which you do the ranking.  For example, imagine the ranking is "government ownership above 48%."  Then state A scores 2% and state B scores 1% even though the fundamental, underlying concept is the same.  Ted's solution would give B a 5 and A a 10.  But any good normalization should not change if the scale of the fundamental variable changes.
Logged
"Educate your children, educate yourselves, in the love for the freedom of others, for only in this way will your own freedom not be a gratuitous gift from fate. You will be aware of its worth and will have the courage to defend it." --Joaquim Nabuco (1883), Abolitionism

Zxcv

  • ****
  • Offline Offline
  • Posts: 1229
Re:A better spreadsheet
« Reply #11 on: December 29, 2002, 04:01:09 pm »

Quote
Right, but what is the purpose of the spreadsheet?  It's aggregating a bunch of variables to get a final, composite ranking.  So the actual, cardinal numbers used in the variables are arbitrary.  What we need are to transform the variables so that they are comparable to each other.  That's what normalization does.  If the substantive differences among the states are not important, as in the example you give, the solution is to give the normalized variable a very small weighting so that it does not much affect the final result.

I see your point about weighing, but I don't agree with it.

The fundamental function of assigning weights, is to decide for yourself how important that criterion should be in contributing to what state we choose. One should be able to assign it without having to look at the data itself. You shouldn't need to fudge your weights in order to account for poor choices in the normalization algorithm that cause non-real-world representations of the data (such as that 49% is very different from 50%).

It's hard enough assigning weights, only taking into account the importance of criteria. We don't need to pile this other function on to it.

Quote
But Ted's solution can create paradoxes if you're not very sophisticated about the way in which you do the ranking.  For example, imagine the ranking is "government ownership above 48%."  Then state A scores 2% and state B scores 1% even though the fundamental, underlying concept is the same.  Ted's solution would give B a 5 and A a 10.

Well, I'm not suggesting you can't pick a criterion in such a way as to break this simple nomalization algorithm. But you have to admit, you had to go out of your way to do it!

On the other hand, we did not go out of our way at all to break the current normalization algorithm (in such a way as to require fudging weights to compensate); in fact we used criteria that are already in the spread sheet!

Generally I think you have to look at each criterion, then pick one of several nomalization algorithms that would best display the data in real-world terms, and that doesn't give nonsense answers when you apply different data to the criterion. (Obviously the criterion having to do with borders and coasts has to have a different normalization algorithm than this one.)

This is similar to what I picked up long ago when I was a Physics major at the University. One way we used to verify that an equation matched the system under study, was to take each variable in the equation and set it to its maximum possible or minimum possible value, then verify that the results the equation predicts still makes sense. If it does, for every variable, then we can be pretty sure the equation is right.
Logged

JasonPSorens

  • Administrator
  • *****
  • Offline Offline
  • Posts: 5706
  • Neohantonum liberissimum erit.
    • My Homepage
Re:A better spreadsheet
« Reply #12 on: December 29, 2002, 04:31:57 pm »


The fundamental function of assigning weights, is to decide for yourself how important that criterion should be in contributing to what state we choose. One should be able to assign it without having to look at the data itself. You shouldn't need to fudge your weights in order to account for poor choices in the normalization algorithm that cause non-real-world representations of the data (such as that 49% is very different from 50%).

Naturally, I disagree. ;)  I don't think it's possible to consider how to weight variables without some knowledge of the base numbers & what they mean.  But even if you think you should just consider variables abstractly to determine their weightings, the current method of interpolation is better than Ted's suggested method, which requires not just a consideration of the variables' importance "abstractly" or "in themselves" but also how the scale of the base variable is affecting the transformation of the data.

Quote
Well, I'm not suggesting you can't pick a criterion in such a way as to break this simple nomalization algorithm. But you have to admit, you had to go out of your way to do it!

Well, the point in doing so was to show that there's nothing magical about the number zero as a base point in the fundamental variable.  As a consequence, any minimum point on the new scale is going to be completely arbitrary!  It's "going out of your way" no matter how you slice it.  Hope that makes sense. ;)

Quote
Generally I think you have to look at each criterion, then pick one of several nomalization algorithms that would best display the data in real-world terms, and that doesn't give nonsense answers when you apply different data to the criterion. (Obviously the criterion having to do with borders and coasts has to have a different normalization algorithm than this one.)

Generally you want to have the same normalization algorithm across all variables: since they are all being added up together, they need to be on the same scale.  For geography, of course, you can't do that, so what the spreadsheet does is to create a "pseudo-normalization" that is conceptually as close as possible as what is being done to the other variables.
Logged
"Educate your children, educate yourselves, in the love for the freedom of others, for only in this way will your own freedom not be a gratuitous gift from fate. You will be aware of its worth and will have the courage to defend it." --Joaquim Nabuco (1883), Abolitionism

Zxcv

  • ****
  • Offline Offline
  • Posts: 1229
Re:A better spreadsheet
« Reply #13 on: December 29, 2002, 05:05:41 pm »

Well, you can't have the same algorithm for all. Sometimes for one criterion, more is better. For another, less is better. It's not too hard to imagine one for which a medium value is better (like population density). And you have more discrete measures like borders/coastlines; it is in fact a different algorithm (I'm using the word algorithm in the strict sense).

Maybe what we really need is a spreadsheet that allows the individual to choose not only the weights, but the normalization algorithm.  ;D

This is all academic anyway, because most people will pick their state based on their Aunt Millie's impressions when she drove through there 60 years ago.  ;)

Well, since I can't talk you into modifying the algorithms, what do you think about providing a link to an alternative spreadsheet that guys like Ted and me can be comfortable with?

And what do you think about putting the algorithm calculation itself in the spreadsheet so it doesn't have to be done manually, and so raw data can go into the sheet?

And what about adding new rows? I'm very disturbed that when I run the sheet as it is, I come up with Delaware on top!  :o
Logged

JasonPSorens

  • Administrator
  • *****
  • Offline Offline
  • Posts: 5706
  • Neohantonum liberissimum erit.
    • My Homepage
Re:A better spreadsheet
« Reply #14 on: December 29, 2002, 06:18:35 pm »

Well, the thing is, if you know what you're doing, Ted's spreadsheet and mine should yield exactly the same result.  If the spreadsheets yield different results, it means you've created inconsistent weightings.  I'm loath to provide a link to a spreadsheet I consider statistically unsound and potentially misleading, though. ;)  Maybe with appropriate caveats to the casual browser.

Putting the raw data into the spreadsheet shouldn't be tough.  It will take some time though that I don't have right now. :P

Regarding new rows: let me know new ones, and I'll put them in!  I try to put in everything that has been adequately defended as a useful quantitative measure for state culture, viability, ease of victory, or livability.
Logged
"Educate your children, educate yourselves, in the love for the freedom of others, for only in this way will your own freedom not be a gratuitous gift from fate. You will be aware of its worth and will have the courage to defend it." --Joaquim Nabuco (1883), Abolitionism
Pages: [1]   Go Up