Thursday, January 16, 2014

A very basic understanding of doing Data Analysis


It has indeed been quite a while since I've had these "monkey banging on a typewriter" moments.

In trying to refresh myself on the importance and use of data analysis, I came across two online sources that I felt best summarized how this process is done and how it is often managed. One is an article by Joel H. Levine entitled “Introduction to Data Analysis: The Rules of Evidence” (1997); the other is an online article on the Responsible Conduct in Data Management website entitled “Data Analysis” (accessed 2014).
Both works describe data analysis as giving the researcher the methods and techniques needed to make sense of raw data. In particular, it helps the researcher define facts, detect patterns, develop explanations, and test hypotheses (Levine, 1997). Choosing the proper method of analysis depends on how the data were gathered and on the form the data take.

Researchers must exercise caution when analyzing data, though, as improper analysis can skew the findings and affect how a reader takes in the research as a whole. When this happens, the integrity of both the data and the research is put in peril.
On that note, while both sources agree that there are no “hard and fast rules” (Levine, 1997) for doing data analysis, the works I found point to several things that are important to bear in mind both before and while conducting it:

Skill is important in that it makes the analysis easier to do. A researcher with sufficient skill in analyzing a certain type of data will give more credible findings than a researcher who applies a skill that does not fit the data and expects the same results. Researchers must therefore continually develop the skill sets they need to analyze effectively, which makes training, re-training, and monitoring vital. Staying up to date on techniques and methods is valuable as well, since skills learned earlier may be outmoded by the time new data comes along.

Establishing good research skills also builds up both the credibility and the confidence of a researcher, as familiarity with the data makes the task at hand seem less daunting. Though different researchers will have different proficiencies, it is always more practical to pitch the proper data to the proper “experts” (Data analysis, 2014).

Doing analysis concurrently as data comes in is recommended because “Statistical advice should be obtained at the stage of initial planning of an investigation so that, for example, the method of sampling and design of questionnaire are appropriate” (Smeeton & Goda, 2003). This means that as data comes in, analysis can help tweak the data-gathering apparatus to be more precise or appropriate.

Unbiased inference means accepting the findings of the data, regardless of the researcher’s opinion on the issue or problem statement. Fostering a bias early in the research process leads, more often than not, to a research paper of poor quality. Biased inference also leads to another issue called “subgroup analysis”, in which the entire research orbits around select tests whose results are significant to a hypothesis the researcher favors, while the remaining tests that produced contrary findings are ignored. In short, the researcher becomes prone to cherry-picking their findings.
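
To see why picking out only the tests that "worked" is risky, here is a minimal sketch. It assumes the tests are independent and that each uses the conventional 0.05 threshold; both are my assumptions for illustration, not something the cited sources specify, and the function name is made up.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests looks
    'significant' purely by chance when no real effect exists.

    alpha is the per-test significance threshold (assumed, not from the sources).
    """
    return 1 - (1 - alpha) ** n_tests


# The more tests a researcher runs, the likelier it is that some
# "significant" result turns up by luck alone.
for n in (1, 5, 20):
    print(f"{n:>2} tests: {chance_of_false_positive(n):.3f}")
```

Under these assumptions, running 20 tests gives roughly a 64% chance of at least one chance "finding", which is why reporting only the favorable tests can mislead.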

Following proper procedure is important because every field of study has certain standards to ensure the quality of its output. Quite often two factors come into play (Resnick, Data Analysis): the nature of the variables used, and assumptions about the population from which the data are drawn. If the researcher feels that the standard protocol is inappropriate, any unconventional methods they intend to use must first be mentioned in the research proper, or be verified as able to produce accurate and verifiable data.

Determining significance in the data helps the researcher “filter out” the noise in the raw data and focus on pertinent findings that bear on the hypotheses they set out to test. Significance can be categorized as either statistical or clinical.

Statistical significance describes “A result that is not likely to occur randomly, but rather is likely to be attributable to a specific cause. Statistical significance can be strong or weak, and is important to research in many math- and science-related fields, including medicine, sociology, psychology and biology” (Investopedia). In other words, the result is unlikely to be a product of chance alone.
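
One way to make "not likely to occur randomly" concrete is a permutation test: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below is only an illustration; the scores are made up, and the function name and parameters are my own, not taken from any of the sources.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when group labels are shuffled at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Made-up scores, for illustration only.
treated = [12.1, 13.4, 11.8, 14.0, 13.1, 12.9]
control = [10.2, 11.0, 10.8, 11.5, 10.1, 10.9]
p = permutation_p_value(treated, control)
print(f"estimated p-value: {p:.4f}")
```

A small estimated p-value says that random relabeling rarely reproduces the observed gap. It does not by itself say the gap matters in practice, which is where clinical significance comes in.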

Clinical significance, meanwhile, is “the potential for research findings to make a real and important difference to clients or clinical practice, to health status or to any other problem identified as a relevant priority for the discipline” (Jeans, 1992). The findings are judged to be significant on the basis of how greatly they affect the hypotheses.

Clearly defining your objectives and your means of measuring outcomes creates a more precise environment for accurate results. Quality again hinges on how well the researcher has stated their objectives and methods. The reader, in turn, is clearly oriented and will have the proper mindset and expectations while reading the research.

Honest and accurate analysis is a must. Preventing statistical errors reduces the potential for an erroneous research output. To preserve the integrity of the findings to be presented, the data must not be changed, altered, mined, or manipulated.

Manner of presentation matters. How data is presented can influence how it is analyzed. Raw and derived data call for different approaches to analysis, and in presenting findings it is important to keep in mind who you want to present your work to, for what aims, and how.

Environmental and contextual issues are important to researchers, as they must be aware of the where, how, and why of the data they gathered. This influences their techniques and approaches both before and during analysis. The researcher can be thought of as a matchmaker between the data and the method, looking at context in order to create a more suitable pairing.

Partitioning the text ensures a consistent analysis. Lone researchers will have no problem with this, as they partition the text the same way throughout the analysis. Working in teams, however, needs a more coordinated approach akin to an assembly line. How a researcher partitions the text for analysis varies from person to person. Thus the researchers must reach a consensus so that they all use the same methods or strategies, for the sake of a consistent analysis.

To determine the reliability or validity of findings, one can refer to three factors: stability, reproducibility, and accuracy (Gottschalk, 1995). Validity lies in how the findings correspond to theory and in how the conclusions can be considered “‘sensible’ and true statements about the world” (Levine, 1997).
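
As a rough illustration of reproducibility, suppose two researchers independently code the same passages. The sketch below assumes reproducibility can be read as agreement between coders and uses Cohen's kappa, a common agreement statistic that corrects for chance. That choice is mine, not something the cited sources prescribe, and the labels are invented.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items that both coders labeled identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Agreement expected if each coder assigned labels at their own base rates.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Invented labels for eight passages, for illustration only.
coder_1 = ["pos", "neg", "pos", "pos", "neg", "neu", "pos", "neg"]
coder_2 = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "pos"]
agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"agreement: {agreement:.2f}, kappa: {kappa:.2f}")
```

Kappa comes out lower than raw agreement because some matches would happen by chance even if the coders were guessing.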

Extent of analysis means that the researcher must consciously decide how much of the data needs analysis, whether by comparative, qualitative, or quantitative means, and must fit the strategies to the text or data accordingly. For example, counting how often a word appears in a text conveys something different from examining how the word was used in that text.
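
The difference between counting a word and looking at how it is used can be sketched in a few lines. The sample text and helper names here are hypothetical and only meant to illustrate the contrast.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(text, word):
    """Quantitative view: how many times does the word occur?"""
    return Counter(tokenize(text))[word.lower()]


def keyword_in_context(text, word, window=3):
    """Qualitative view: show the words surrounding each occurrence."""
    tokens = tokenize(text)
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [tok.upper()] + right))
    return lines


# Made-up sample text, for illustration only.
sample = ("The bank raised its rates. We walked along the river bank. "
          "The bank closed early.")
count = word_frequency(sample, "bank")
contexts = keyword_in_context(sample, "bank")
print(count)
for line in contexts:
    print(line)
```

The count alone treats every "bank" as the same thing, while the context lines show that one occurrence refers to a riverside and not to a financial institution.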

I think these few items constitute a necessary mindset for doing data analysis. While there may be more factors that one may wish to add, those stated in this brief paper are more than enough to get the researcher started on the right foot. As Levine (1997) would put it, the most important part of doing the analysis, or of doing research in general, is to “think”. Think about the data, and think about what it’s trying to tell you, because if you put more thought into what you’re about to undertake, most of the time you can’t go wrong.


Works Cited
• "Data Analysis." Data Analysis. Responsible Conduct in Data Management, n.d. Web. 11 Jan. 2014.
• Levine, Joel H. "Introduction to Data Analysis: The Rules of Evidence." Macintosh HD: DA: DA XI I.006 (1997): 1-11. Dartmouth.edu. Dartmouth College. Web. 14 Jan. 2014.
• "Statistical Significance." Investopedia. N.p., n.d. Web. 14 Jan. 2014.
