
Thread: Statistical software recommendations?

  1. #1
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293

    Statistical software recommendations?

    I'm looking for a cheap, basic statistical software package that can load data in some format (sxc, ideally, but csv if necessary) and perform basic analysis (median, stddev, etc.).

    OO Calc has stats functions, but I'm not quite sure how useful a function like MEDIAN() is when you have to pass it individual parameters instead of just giving it a column or row to extract the median from.

  2. #2
    New Romantic
    Join Date
    Dec 2003
    Location
    St. Louis - Gamertag: hjmadigan
    Posts
    5,326
    You can do all that in Excel, but SPSS also offers a time-limited demo version for free. It may be overkill for what you need, though, plus the full version is many thousands of dollars.

    There are other packages, though. I seem to even remember an open source one that a lot of academics are gravitating towards. I'll see if I can dig them up.

  3. #3
    New Romantic
    Join Date
    Dec 2003
    Location
    St. Louis - Gamertag: hjmadigan
    Posts
    5,326
    Ah, here we go...

    STATA:
    http://www.stata.com/
    Haven't used it, but my brother-in-law says it's great.

    S-PLUS:
    http://www.insightful.com/
    Ditto.

    S-Plus/R:
    http://statcomp.ats.ucla.edu/splus/
    http://www.r-project.org/
    This is the open-source version of S-PLUS. Probably cheaper, but as with a lot of open source stuff, its documentation and presentation aren't the slickest.

    Personally I use SAS (http://www.sas.com/), which is really powerful once you learn the programming language. It also handles and modifies very large data sets better than a lot of other programs, including (IMO) SPSS.

    But again, if you really just want to do the stuff you described in the OP you can do it all in Microsoft Excel pretty easily. Probably any other spreadsheet program, too.

    Isn't it amazing the things people at QT3 know about? :)

  4. #4
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    I'm looking at R right now, but it's more of a programming language/system than an application.

    Are you sure Excel will give you a median of a row of data? I would imagine that almost anything Excel can do, Gnumeric or OO Calc can do too, and I verified that the latter two have MEDIAN() in the form:

    MEDIAN(a,b,c,d)

    And it will return the median of those values, whereas I want:

    MEDIAN(E0:E50)

    Or something like that.

  5. #5
    New Romantic
    Join Date
    Dec 2003
    Location
    St. Louis - Gamertag: hjmadigan
    Posts
    5,326
    Quote Originally Posted by BaconTastesGood
    Are you sure Excel will give you a median of a row of data?
    Yep, just verified it now. I have a row of data and this formula: =MEDIAN(A1:P1)

    It gives me the median of that row.

  6. #6
    Neo Acoustic
    Join Date
    Sep 2003
    Posts
    1,631
    From Excel 2000's help:

    MEDIAN

    Returns the median of the given numbers. The median is the number in the middle of a set of numbers; that is, half the numbers have values that are greater than the median, and half have values that are less.

    Syntax

    MEDIAN(number1,number2, ...)

    Number1, number2,... are 1 to 30 numbers for which you want the median.

    The arguments should be either numbers or names, arrays, or references that contain numbers. Microsoft Excel examines all the numbers in each reference or array argument.

    If an array or reference argument contains text, logical values, or empty cells, those values are ignored; however, cells with the value zero are included.

    Remarks

    If there is an even number of numbers in the set, then MEDIAN calculates the average of the two numbers in the middle. See the second example following.

    Examples

    MEDIAN(1, 2, 3, 4, 5) equals 3

    MEDIAN(1, 2, 3, 4, 5, 6) equals 3.5, the average of 3 and 4
    Emphasis mine.
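
    For comparison, here is a minimal Python sketch of the same odd/even rule. Python is not part of the help text quoted above; it is used purely as an illustration, relying on the standard library's statistics.median, which takes a whole sequence in one argument much like a cell range.

```python
from statistics import median

# Odd number of values: the middle value is returned.
odd_result = median([1, 2, 3, 4, 5])

# Even number of values: the two middle values (3 and 4) are averaged.
even_result = median([1, 2, 3, 4, 5, 6])

print(odd_result, even_result)
```

    The two lists are the same ones used in the Excel help examples.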

  7. #7
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    Well, crap, don't I look like a screaming dork. Well, I'd rather be a dork than muddle through R -- good lord, that's an obtuse program.

  8. #8
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    You know, I love OpenOffice, but at times it's just a piece of shit.

    All I want to do is take a set of data, filter it, and then run some statistical analysis over it. For example, in SQL maybe I'd do something like:

    SELECT duration FROM games WHERE players > 1 and players < 4;

    then, given that output, do an average of the duration of games where there are 2-3 players. Very simple conceptually.
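
    One way to sketch this filter-then-summarize workflow outside a spreadsheet is Python's built-in sqlite3 and statistics modules. The table layout and the sample rows below are assumptions made up for illustration; only the query itself comes from the post.

```python
import sqlite3
from statistics import mean, median

# Hypothetical sample data: (players, duration in minutes).
rows = [(1, 30), (2, 45), (2, 60), (3, 90), (3, 75), (4, 120), (5, 150)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (players INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)", rows)

# Same filter as the query above: games with 2 or 3 players.
cursor = conn.execute(
    "SELECT duration FROM games WHERE players > 1 AND players < 4"
)
durations = [duration for (duration,) in cursor]
conn.close()

# The statistics run only on the rows the query returned.
avg_duration = mean(durations)
median_duration = median(durations)
print(avg_duration, median_duration)
```

    Because the statistics are computed on the query result rather than on the raw table, filtered-out rows cannot leak into the answer.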

    In OOCalc, I have the raw data. The median, as we've just learned, is easy enough to compute on the raw data. So I filter the data using its data filter features, and sure enough, it limits the displayed data...but the fucking MEDIAN function still processes even the filtered-out data.

    Um, hello?!

    Ugh. Nothing more frustrating than knowing exactly what you want to do and not being able to do it.

    (Okay, the only thing more frustrating is someone bitching in the forums only to have people post and show he's an idiot, but we've done that once already in this thread)

  9. #9
    Social Worker
    Join Date
    Dec 2003
    Posts
    2,310
    I'd recommend Excel as well; it even allows you to do intermediate descriptive statistics such as correlating two columns of data & determining beta weights from simple linear regressions. The graphical interface makes it fairly intuitive for me to use, as well.

    Now for the quick derail into stats-geekery:
    Quote Originally Posted by Thrrrpptt!
    Personally I use SAS (http://www.sas.com/), which is really powerful once you learn the programming language. It also handles and modifies very large data sets better than a lot of other programs, including (IMO) SPSS.
    What specific advantages does SAS have over SPSS? Does it just run faster on large data sets? I've only used SPSS (besides some specialized programs for doing structural modeling or hierarchical linear regression); while I've heard about SAS, I've never bothered to do an in-depth comparison of the two systems.

  10. #10
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    Quote Originally Posted by Sidd_Budd
    I'd recommend Excel as well
    Any opinions on Excel vs. OOCalc's respective statistical functions? I was under the impression that OO had most, if not all, of Excel's features.

    Do these spreadsheet programs allow statistical analysis on a subset of rows, e.g. "MEDIAN(R1:R100) WHERE S[y] > 5" or something?
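
    Outside a spreadsheet, the kind of conditional median asked about here can be sketched in a few lines of Python. The column names R and S mirror the pseudo-formula above; the values are invented for illustration.

```python
from statistics import median

# Hypothetical columns: R holds the values of interest, S is the filter field.
R = [10, 20, 30, 40, 50, 60]
S = [3, 7, 5, 9, 6, 1]

# Keep only the R values from rows where S > 5, then take their median.
selected = [r for r, s in zip(R, S) if s > 5]
conditional_median = median(selected)
print(selected, conditional_median)
```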

  11. #11
    Social Worker
    Join Date
    Dec 2003
    Posts
    2,310
    I've got no opinion on OOCalc -- never used it, and didn't even know what it was until this thread.

    There are some database functions in Excel, although I find them kind of clunky; I don't see median as a specific function, but you can compute the mean & standard deviation of a subset of a large dataset that meets criteria that you input. I also don't have any modern (object oriented) programming experience, but Excel allows you to create macros in Visual Basic or Microsoft Script Editor that might allow you to find the specific section of data you are interested in.

    I use SPSS for my major stats work, since I learned most of that language in grad school, and do cut & pastes from the graphical interface into a syntax file for any procedures with which I'm unfamiliar. When I'm just doing low-level stuff in Excel, I would just find the specific data I'd need using primitive search techniques. For example:

    1) Use Excel to sort my data in ascending order on the field S[y]
    2) Type "=MEDIAN(" into a convenient empty cell above my data set
    3) Manually scan the S[y] field until I find the first record with a value greater than 5
    4) Click & drag on the R-column starting with the row I identified in step 3 until the end of the dataset -- this is automatically entered into the formula I started in step 2
    5) Type ")" to end the formula and hit ENTER. Not elegant, but you've got your median.
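
    The manual steps above can be sketched in Python, with the visual scan in step 3 replaced by a binary search over the sorted field. This is only an illustration of the same sort-then-slice idea; the records are hypothetical (S, R) pairs.

```python
from bisect import bisect_right
from statistics import median

# Hypothetical records as (S, R) pairs.
records = [(3, 10), (7, 20), (5, 30), (9, 40), (6, 50), (1, 60)]

# Step 1: sort ascending on the S field.
records.sort(key=lambda rec: rec[0])

# Step 3: locate the first record whose S value is greater than 5.
s_values = [s for s, _ in records]
start = bisect_right(s_values, 5)

# Steps 4-5: take the R values from that row to the end and compute the median.
r_subset = [r for _, r in records[start:]]
result = median(r_subset)
print(start, r_subset, result)
```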

    I have an Excel spreadsheet of about 3000 rows that I frequently do this stuff with. However, your dataset may contain features that make this visual inspection untenable.

  12. #12
    New Romantic
    Join Date
    Dec 2003
    Location
    St. Louis - Gamertag: hjmadigan
    Posts
    5,326
    Quote Originally Posted by Sidd_Budd
    What specific advantages does SAS have over SPSS? Does it just run faster on large data sets? I've only used SPSS (besides some specialized programs for doing structural modeling or hierarchical linear regression); while I've heard about SAS, I've never bothered to do an in-depth comparison of the two systems.
    I haven't done a lot of in-depth comparisons either, but SAS seems to be better at handling data sets, especially large ones (as in many variables and many observations) that you want to manipulate. Think rotating data, merging it with other data sets, keeping or dropping variables, outputting results of analyses to new data sets, etc.

  13. #13
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    However, your dataset may contain features that make this visual inspection untenable.
    It's not the visual inspection, it's the sheer number of variables. For example, let's say I want to analyze baseball pitching stats. Sure, there's the typical ERA, etc. but what if I want to compute "ERA when pitching against left-handers in a dome after a loss"? I don't know jack about stats software (which is why I started this thread) but I see elements of DB manipulation there. For example, I know I could mostly hack something like that together with MySQL, just not the actual number crunching (beyond "SUM").
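
    As a rough sketch of what such a multi-criteria split looks like in code, the Python below filters a list of records on three conditions and then computes ERA as 9 * earned runs / innings pitched. The record layout, field names, and numbers are all hypothetical, chosen only to show the filter-then-compute pattern.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    batter_hand: str   # "L" or "R" (hypothetical field)
    in_dome: bool      # True if the game was played in a dome
    prev_result: str   # "W" or "L" for the previous game
    earned_runs: int
    innings: float


# Hypothetical data for one pitcher.
appearances = [
    Appearance("L", True, "L", 2, 6.0),
    Appearance("L", True, "L", 1, 3.0),
    Appearance("R", True, "L", 4, 5.0),
    Appearance("L", False, "L", 0, 7.0),
    Appearance("L", True, "W", 3, 4.0),
]


def era(rows):
    """Earned run average: 9 * earned runs / innings pitched."""
    innings = sum(a.innings for a in rows)
    if innings == 0:
        return None  # ERA is undefined with no innings pitched
    return 9 * sum(a.earned_runs for a in rows) / innings


# "Against left-handers, in a dome, after a loss."
subset = [
    a for a in appearances
    if a.batter_hand == "L" and a.in_dome and a.prev_result == "L"
]
split_era = era(subset)
print(len(subset), split_era)
```

    Each extra criterion is just one more condition in the filter, which is the database-style part; the number crunching is the small era function applied to whatever rows survive.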

  14. #14
    New Romantic
    Join Date
    Aug 2003
    Posts
    9,203
    Quote Originally Posted by BaconTastesGood
    However, your dataset may contain features that make this visual inspection untenable.
    It's not the visual inspection, it's the sheer number of variables. For example, let's say I want to analyze baseball pitching stats. Sure, there's the typical ERA, etc. but what if I want to compute "ERA when pitching against left-handers in a dome after a loss"? I don't know jack about stats software (which is why I started this thread) but I see elements of DB manipulation there. For example, I know I could mostly hack something like that together with MySQL, just not the actual number crunching (beyond "SUM").
    Excel has simple boolean logic functions. It may be a bit more clunky to write the formulas, but you should be able to do the equivalent of what you want. I used to do some significantly more advanced Excel formulas, but then I found my graphing program had all the functionality I needed, so I learned to use it instead. (Mind you, my graphing program is SigmaPlot, which means it shares some of its inner workings with the SPSS line anyway.)

  15. #15
    Social Worker
    Join Date
    May 2003
    Location
    in the land of the ice and snow
    Posts
    2,091
    If what you want to do is test hypotheses on your desktop, JMP from SAS is handy, quick, and easy.

    If what you want to do is compute some medians and means, Excel will do.

    If you want to seriously mine Sean Lahman's free database of baseball stats, you'll probably be best off learning to use the query function in Access.

  16. #16
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    Quote Originally Posted by dfs
    If what you want to do is test hypotheses on your desktop, JMP from SAS is handy, quick, and easy.

    If what you want to do is compute some medians and means, Excel will do.

    If you want to seriously mine Sean Lahman's free database of baseball stats, you'll probably be best off learning to use the query function in Access.
    The problem is that I want to kind of do all of the above (although the baseball example was just an example, I don't know who Sean Lahman is). Basically I have all kinds of data in various forms that I'm trying to do some analysis on, looking for correlations based on criteria I specify.

  17. #17
    Social Worker
    Join Date
    Dec 2003
    Posts
    2,310
    Quote Originally Posted by BaconTastesGood
    It's not the visual inspection, it's the sheer number of variables...I don't know jack about stats software (which is why I started this thread) but I see elements of DB manipulation there.
    Thanks for the clarification. Sadly, I have little to offer; the work I do doesn't require that level of database manipulation; in SPSS, I may need to compare male respondents to female, but that's the extent of it. I know little about Access, but you may be better served by a database program. It sounds like the actual statistical calculations you are interested in are low-level (means, counts, medians), so I'd assume most database software would include them.

    As mouselock indicates, you probably will be able to do all this in Excel as well, once you learn the setup for queries. I've never needed to learn any sophisticated criteria for Excel searching, because I can always find my data of interest fairly easily. To use your "ERA when pitching against left-handers in a dome after a loss" example, I'd just do a sort on multiple criteria (Pitcher, Location, Prev W/L) to zero in on the ERAs of interest, & run a simple average function.

  18. #18
    Social Worker
    Join Date
    May 2003
    Location
    in the land of the ice and snow
    Posts
    2,091
    The problem is that I want to kind of do all of the above (although the baseball example was just an example, I don't know who Sean Lahman is). Basically I have all kinds of data in various forms that I'm trying to do some analysis on, looking for correlations based on criteria I specify.
    Then Access is probably the easiest/cheapest way to go.

    Of course, with all large data sets, you should probably spend more time structuring your data set and choosing your tools than actually searching the set and testing hypotheses. That's the nature of the beast.

  19. #19
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    Quote Originally Posted by Sidd_Budd
    It sounds like the actual statistical calculations you are interested in are low-level (means, counts, medians), so I'd assume most database software would include them.
    Means, counts, medians, modes, linear correlations, graphical output (histograms), and that's about it. Pretty basic stuff so that I can at least answer "How much of an effect does this variable have on this action?"
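
    For a sense of scale, the whole list above fits in a short Python sketch using only the standard library. The paired data are invented, the Pearson correlation is written out by hand as an illustration rather than taken from any package mentioned in this thread, and the histogram is a plain text one.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, mode

# Hypothetical paired observations.
x = [1, 2, 2, 3, 4]
y = [2, 4, 4, 6, 8]


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


summary = {
    "count": len(x),
    "mean": mean(x),
    "median": median(x),
    "mode": mode(x),
}
r = pearson_r(x, y)
print(summary, r)

# Text histogram: one '#' per occurrence of each value.
histogram = Counter(x)
for value in sorted(histogram):
    print(f"{value}: {'#' * histogram[value]}")
```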

    As mouselock indicates, you probably will be able to do all this in Excel as well, once you learn the setup for queries.
    Yeah, I'm guessing it's doable, but it's probably as difficult to learn as R, so I may as well do it right and learn R.

    thanks again!

  20. #20
    Social Worker
    Join Date
    Feb 2005
    Location
    Seattle, WA
    Posts
    2,657
    Quote Originally Posted by BaconTastesGood
    Yeah, I'm guessing it's doable, but it's probably as difficult to learn as R, so I may as well do it right and learn R.

    thanks again!
    R is..kind of cumbersome at first.
    There are some good books out there for S-Plus though, and code written in one should work in the other. I'd suggest "An Introduction to S and S-Plus" by Phil Spector.

    Wow...this thread has really served as a reminder of how much I've forgotten since I was in grad school. I was using R for my dissertation research, but STATA was my favorite program.

  21. #21
    New Romantic
    Join Date
    Jun 2002
    Location
    First Terrace of Purgatory
    Posts
    6,218
    I used SPSS for my graduate work and found it a real breeze. I could do almost anything with the numbers with a few mouse clicks.

    The trick, of course, was figuring out what it all meant.

    Troy

  22. #22
    New Romantic
    Join Date
    Nov 2003
    Location
    Seattle and Charlotte
    Posts
    6,293
    Quote Originally Posted by Misguided
    There are some good books out there for S-Plus though, and code written in one should work in the other. I'd suggest "An Introduction to S and S-Plus" by Phil Spector.
    A-ha, I hadn't thought of that -- I figured this stuff was so niche (unlike SAS or SPSS) that I'd end up having to learn from tutorials on the Web. Turns out there are probably a half-dozen books on R and intro stats with R, which is surprising.

    Data Analysis & Graphics Using R
    Using R for Introductory Statistics

    I've gone ahead and ordered both.

  23. #23
    Social Worker
    Join Date
    Dec 2003
    Posts
    2,310
    Quote Originally Posted by TSG
    I used SPSS for my graduate work and found it a real breeze. I could do almost anything with the numbers with a few mouse clicks.

    The trick, of course, was figuring out what it all meant.
    Man, you must have gone to grad school in the Stone Ages.

    Nowadays, we start with an unshakeable opinion of what we expect the answers to be, and *then* run the data a million different ways until we get the results we want.

    (/sarcasm off, in case any of my committee members are closet lurkers)

  24. #24
    Social Worker
    Join Date
    Feb 2005
    Location
    Seattle, WA
    Posts
    2,657
    Quote Originally Posted by Sidd_Budd
    Nowadays, we start with an unshakeable opinion of what we expect the answers to be, and *then* run the data a million different ways until we get the results we want.
    Error correction? What error correction?

  25. #25
    New Romantic
    Join Date
    Dec 2003
    Location
    St. Louis - Gamertag: hjmadigan
    Posts
    5,326
    Quote Originally Posted by Misguided
    Error correction? What error correction?
    Yes, error correction.

  26. #26
    New Romantic
    Join Date
    Jul 2003
    Location
    Toronto, Canada XBL Gamertag: tromik
    Posts
    8,759
    Sometimes you can find SPSS for free.

  27. #27
    Social Worker
    Join Date
    Feb 2005
    Location
    Seattle, WA
    Posts
    2,657
    Quote Originally Posted by Thrrrpptt!
    Quote Originally Posted by Misguided
    Error correction? What error correction?
    Yes, error correction.
    At times like this I wonder why I didn't leave graduate school sooner :D

  28. #28
    Account closed Social Worker
    Join Date
    Jul 2002
    Location
    Between an Escape from the Outback and a Void
    Posts
    3,016
    Quote Originally Posted by Sidd_Budd
    Nowadays, we start with an unshakeable opinion of what we expect the answers to be, and *then* run the data a million different ways until we get the results we want.
    Are you serious? If so, then can you expand upon this?

  29. #29
    New Romantic
    Join Date
    Jun 2002
    Location
    First Terrace of Purgatory
    Posts
    6,218
    Quote Originally Posted by Brian Koontz
    Quote Originally Posted by Sidd_Budd
    Nowadays, we start with an unshakeable opinion of what we expect the answers to be, and *then* run the data a million different ways until we get the results we want.
    Are you serious? If so, then can you expand upon this?
    He's sort of joking. Hence the sarcasm tag.

    It's a common issue in grad schools, though. People start with the argument they want to make and then find stuff to support it. If you have a poor grasp of statistical methods but lots of "data", it is really not that hard to massage the numbers and only publish the correlations and regressions that fit your argument.

    Which is why my thesis committee stayed on top of me. (Not that they thought I would cheat, but they did want me to pass.) I had to create my own data set, run reliability tests on my coding, pass tests of internal and face legitimacy, have tons of charts and make sure that I could defend my results in the oral defense stage.

    A lot of bad science, though, is based on just looking at the first friendly result that comes along. Or assuming that correlation equals causation. Or ignoring multiple seemingly contradictory correlations.

    As my profs explained to me, statistics isn't math; it's a language. And the results have to have real meaning.

    Troy

  30. #30
    Social Worker
    Join Date
    Feb 2005
    Location
    Seattle, WA
    Posts
    2,657
    Quote Originally Posted by TSG
    He's sort of joking. Hence the sarcasm tag.
    Right, and I was playing along with the error correction line.
    You see, with every statistical test there is a chance that we will decide a result is statistically meaningful even though it really isn't. This is an accepted part of science and one of the reasons it is important that findings are replicated in independent data sets. But if you take a bunch of data and run a bunch of different comparisons, sooner or later you're going to get one that supports your claim by sheer chance, even if there really is no meaningful association.

    When you do lots of statistical tests, there are proper ways to adjust for the number of tests that you've done, but there is a lot of published research where they never make any mention of this. To put it bluntly, a little knowledge of statistics is a dangerous thing.
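
    To make the point concrete, here is a small Python sketch. It assumes the tests are independent, so the chance of at least one false positive across m tests at level alpha is 1 - (1 - alpha)^m, and it uses the Bonferroni correction (compare each p-value to alpha / m) as one common adjustment. The post does not name a specific method, so Bonferroni is my choice for the example, and the p-values are made up.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


alpha = 0.05
for m in (1, 10, 20):
    print(m, round(familywise_error_rate(alpha, m), 3),
          bonferroni_threshold(alpha, m))

# Hypothetical p-values from four comparisons on the same data.
p_values = [0.001, 0.02, 0.04, 0.3]
threshold = bonferroni_threshold(alpha, len(p_values))
significant = [p for p in p_values if p <= threshold]
print(threshold, significant)
```

    With 20 uncorrected tests at the 0.05 level, the chance of at least one spurious "significant" result is already well over half, which is why an unreported pile of comparisons is a problem.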
