# Early Attempts at Analysis with Excel

okay, here’s what I’ve got:
1) I did a line by line index of all my interviews, archival data, observations, and reflections
2) I have that index in Excel (it’s 4 columns by 16425 rows)
3) I also have six pattern groups that reflect, for lack of a better term, the spheres of context that I think are at play

what I want to do:
1) color code each index item according to the pattern groups
2) sort the color-coded index so that I can see how often two or more pattern groups occur together

what I think I have to do:
1) after I color code each index item, I need to read through, line by line, and, in a separate column, mark the colors alphabetically (such as “bgr” for blue green red)
2) I will then ask Excel to sort that column of color codes alphabetically.  I think I have to do this because, for the life of me, I cannot get Excel to sort the columns by color
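
A minimal sketch of the plan above, written in Python rather than Excel, in case it helps to see the logic in one place. The three pattern groups, their letters, and the sample index lines are all made up for illustration (the real index has six groups that are not listed here), and the matching is a simple whole-word lookup, which is an assumption about how items get assigned to groups.

```python
from collections import Counter

# Hypothetical pattern groups: one colour letter each, with invented terms.
PATTERN_GROUPS = {
    "b": {"horse", "barn"},     # blue
    "g": {"mother", "family"},  # green
    "r": {"fear", "calm"},      # red
}

def color_key(text):
    """Return the alphabetised colour letters for one index line, e.g. 'br'."""
    words = set(text.lower().split())
    letters = [code for code, terms in PATTERN_GROUPS.items() if words & terms]
    return "".join(sorted(letters))

# Invented index lines standing in for rows of the spreadsheet.
rows = [
    "horse felt calm in the barn",
    "mother talked about fear",
    "family visit",
    "calm horse",
]

keys = [color_key(row) for row in rows]

# Count how often each combination of groups occurs together.
combo_counts = Counter(keys)

# Sort the rows by their colour key, like sorting the extra column in Excel.
for row, key in sorted(zip(rows, keys), key=lambda pair: pair[1]):
    print(key or "-", "|", row)
print(combo_counts)
```

The extra column of letters is the part that carries over to a spreadsheet: once each row has a key such as "br", an ordinary alphabetical sort groups identical combinations together.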

mistakes so far:
I have already color-coded one index using a more complex color scheme (in each pattern group, I sorted the words by hues of the same color).  I think I need to re-color code the index using one version of the color, as I started to re-read the index and was having a hard time differentiating the lighter hues.  Plus, I think it is too much information for Excel to hold at once…things started to go wrong, words were the wrong color for no reason.

One critical thing I did learn was that, if I colored “beat” before “eat” (colored blue and red, respectively), then both “beat” and “eat” would be red.

Note: I am using the conditional formatting function in Excel to color code the separate indices.
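
One likely explanation for the “beat”/“eat” problem is that the rule for “eat” matches any cell whose text contains those letters, and “beat” contains “eat.” That is an assumption about the formatting rules rather than something confirmed here. The Python sketch below only illustrates the difference between substring matching and whole-word matching; the helper name `contains_word` is made up for this example.

```python
import re

def contains_word(text, word):
    """True only if `word` appears as a whole word in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Substring matching: "eat" is found inside "beat".
print("eat" in "beat")                            # True

# Whole-word matching: "eat" is not found in a line that only has "beat".
print(contains_word("we beat the drum", "eat"))   # False
print(contains_word("time to eat", "eat"))        # True
```

With whole-word matching, the order in which the terms are colored should no longer matter, because “eat” can never claim a cell that only contains “beat.”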

# Data Analysis: Initial Progress

Okay, so just to catch up, here’s what I have done so far:
1) used an Excel sheet to organize my data.
2) researched qualitative data analysis software/program options (see previous post).

The major realization I have had: the organization and indexing (aka coding) I have already done in Excel, I should have done within the system of a program such as Dedoose.  And I should have done it from the beginning, IF I wanted to use Dedoose (or some similar program).  They break it down neatly on the website for Ethnograph, suggesting that the data analysis process should look something like this (I’ve added some steps, but here’s the link to the original, http://www.qualisresearch.com):

1. Create a project
2. Add a data file (an interview, a piece of archival data, a set of observations, a short video)
3. Code the data file
4. Edit the code book
5. Code the data file
6. Edit the code book
7. Repeat for each data file

As far as I can tell, all of the data analysis software programs assume this is the process for data analysis.

I struggle with this situation for two reasons:

1. It seems to me that you have to create the code book before you have all of your data.  You “can’t code the data file” if you do not have a code book and you can’t have a code book until you read through all your data.  However, by the end of a year of ethnographic research, I have hundreds upon hundreds of pages of data, plus hours upon hours of videotape.  In order to 1) ask questions in interviews that better reflected the context in which I worked and 2) prevent having to spend months after my research phase, simply reading my fieldnotes, I started to do preliminary indexing (aka coding) a few months before my fieldwork ended.  I had to start to get ahead of it, otherwise I was going to be crushed by a tsunami of information that I worried I would feel too overwhelmed to process.  So, I started indexing (aka coding) months ago.  I thought I was doing the right thing – I was creating a code book so that I could code.
    1. Little did I know, when I started coding was when I should have started putting my data into one of these programs, if I wanted to use the program.  My Excel file cannot be uploaded into Dedoose, and I either don’t have the money to access any of the other options, or don’t have the correct operating system or the means to get it.  These technicalities do not change or solve the need for a code book before I had read through all my data.  In other words, even if I had started using Dedoose months ago, I would still have needed the code book that I only have now, after organizing my data in Excel and reading through it all.
    2. Now that I have read through all my data and have developed a system of codes, if I did decide to go with Dedoose, I would have to re-read all of my data and re-index (aka code) all of my data.
    3. Looking at these software programs makes me think that the developers have very little idea of how much data ethnographers collect, and even less idea about the diversity of the data.  The examples shown in the demos always pared down the amount of data, as if, to make the software workable, you had to cull a lot of data.
2. Which brings me to my second concern: control.  Once my data is in the program, all of the analytical work gets done by the program.  My analytical control begins and ends with the code book.  It feels a little bit like doing a complex math problem and only writing down the equation and then the answer.  My calculus teacher always used to say, “show your work,” and it has stuck with me.  I want help with data analysis, I do not want someone else to do it for me.
    1. A code (which is, in truth, only a word) changes meaning and significance in different contexts.  Dedoose attempts to be responsible to this by offering you the option of “weighting” different codes, or attaching a numerical value that could indicate anything from amorphous significance to an indication of how many times something happens (an example they give is a code “reading with mother,” which they weight 1 to 7 depending on how many times in one week “reading with mother” happens in a given data file).

What I want to find in my data is correlations.  Dedoose, as far as I can tell, is good at this.  Check out one of their video demonstrations: click on the video labeled “Analysis” here: http://www.dedoose.com/LearnMore/VideoTour.aspx.  I want to create what Dedoose calls “Code Co-Occurrence,” a great chart that looks like one of those old multiplication tables, where the frequency of co-occurrence of pairs of codes is reflected numerically in the chart.  This is the only thing Dedoose does that I want.  And even then, the chart only shows the co-occurrence of pairs of codes – what about three codes, or four codes, co-occurring?  Dedoose also has a great feature, where it is possible to connect excerpts from data files to codes, and relatedly, excerpts from data files to co-occurrences of codes.
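
For what it is worth, the counting behind a co-occurrence chart is simple enough to sketch by hand, and the same counting extends from pairs to three or four codes. The Python below is only an illustration under assumptions: the excerpts and their codes are invented, and the function name `cooccurrence` is made up here. It is not how Dedoose computes its chart.

```python
from collections import Counter
from itertools import combinations

# Hypothetical excerpts, each tagged with a set of codes.
excerpts = [
    {"empathy", "gender"},
    {"empathy", "gender", "space"},
    {"space"},
    {"empathy", "space"},
]

def cooccurrence(excerpts, size):
    """Count how many excerpts contain each combination of `size` codes."""
    counts = Counter()
    for codes in excerpts:
        # Sorting keeps each combination in one canonical order.
        for combo in combinations(sorted(codes), size):
            counts[combo] += 1
    return counts

pairs = cooccurrence(excerpts, 2)    # the "multiplication table" numbers
triples = cooccurrence(excerpts, 3)  # three codes occurring together

for combo, n in sorted(pairs.items()):
    print(combo, n)
for combo, n in sorted(triples.items()):
    print(combo, n)
```

Each pair count is one cell of the chart. Changing `size` to 3 or 4 answers the question about three or four codes co-occurring, though the result no longer fits in a flat two-way table.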

Because I have already read through all of my written and audio data (I have not ventured too far into my video data), and already have “excerpts” pieced out in Excel, as well as indices (aka codes), I am going to try doing my data analysis longhand, with Excel.  I will keep control and “show my work.”  *sigh*  Hopefully, I’m not going terribly wrong.

# More Internet resources for data analysis

Unlike my earlier post about the Internet as a resource, where some of the website resources were more appropriate, at least initially, for small businesses, the websites listed here are geared towards qualitative data analysis of the sort undertaken by social scientists.

Since I last posted this entry, I have done some more research about these qualitative data analysis options.  I have included my findings below.  I should also mention that there is another great set of posts on another blog, Chaos and Noise, about the vagaries of wrangling qualitative data analysis, specifically with a Mac.  Here are my two favorite posts: Qualitative Analysis software for Mac – a brief look, and Another look at Qualitative data analysis for Mac users: Dedoose.  Okay, now for what I’ve found out so far:

• Atlas.ti: http://www.atlasti.com/index.html
    • I use a Mac, so to be able to use Atlas.ti, I would need Bootcamp or something similar, as Atlas.ti only runs in a Microsoft Windows operating system.  According to the Atlas.ti website, Atlas.ti will also work on the Mac if you run something called Parallels or VMWare Fusion.  I have heard of Bootcamp, but not the other two options (I am fairly computer illiterate, though).
    • Bootcamp: http://www.apple.com/support/bootcamp/
    • Parallels: http://www.parallels.com/products/desktop/
    • VMWare Fusion: http://www.vmware.com/products/fusion/overview.html
• Ethnograph 6.0: http://www.qualisresearch.com
    • This website is the least user-friendly.  You can download the demo, but it is not clear what the system requirements are, nor is it clear how much it costs (is it internet-based?  is it software that you install?)
    • The system looks similar to Dedoose, but far less flexible and maybe less powerful?  I’m not sure it’s worth finding out.
• Dedoose: http://www.dedoose.com
    • Dedoose is internet-based, though you can download a desktop app.
    • You can use Dedoose for 1 month free, on a trial basis.  After that, it costs \$12.95/month for one user, or \$10.95/month for two or more users.
    • The website includes some really helpful how-to videos: https://www.dedoose.com/LearnMore/VideoTour.aspx
    • Overall, this seems like a good option.  It is not expensive.  However, it is geared very much towards mixed-methods data (qualitative + quantitative) and is very good at correlating and highlighting relationships among these different kinds of data.
    • Careful, though, you’ll need to have your data in .docx or .txt format.  The system will upload some .xlsx data, but only in a specific context (this makes more sense once you get into the system).
• HyperRESEARCH: http://www.researchware.com/products/hyperresearch.html
    • Good news!  This works on Mac and Windows.  It looks like the developers might also be interested in compatibility with Linux.
    • You can download a trial version with no time limit.  However, you can only input 75 codes and 7 cases (I’m not sure what they mean by “case.”)
    • The website is the most helpful and upfront of all the ones I have looked at – they have a whole tab dedicated to the file extensions (.txt, .doc, etc) that are compatible with HyperRESEARCH.
    • It is expensive, however, to purchase, at \$199.00.  The upgrade is another \$99.00 (I’m not sure if this is necessary).
• MAXQDA: The Art of Data Analysis: http://www.maxqda.com
    • The price of the student license is \$99.00, which is a little pricey for me.
    • Also, like Atlas.ti and NVivo, this software needs a Windows platform.
• NVivo: http://www.qsrinternational.com/products_nvivo.aspx
    • Again, NVivo, like Atlas.ti and MAXQDA, will only run on Microsoft Windows platforms.
    • The price of a student license is \$215.00.  The price for a semester license (for full-time students) is \$145.00.  Way too pricey for me.
• TAMS Analyzer for Mac OS X: http://tamsys.sourceforge.net / On Facebook: https://www.facebook.com/pages/TAMS-Analyzer/172172999506418?fref=ts
    • I tried to download this software, but ran into problems right away, as my computer will only run programs downloaded from recognized developers (I don’t really understand what this means, so I need some more time to figure out how to change my security settings – I think – again, I refer back to my computer illiteracy 0.o)

# Continuing thoughts on data retention

Sorry for the long absence.  I fell into the final moments of my fieldwork at the same time I started re-reading all of my fieldnotes from two and a half years of chunks of fieldwork.  Needless to say, I went looking for the forest and face-planted on a few trees in the meantime.

First, I would like to respond to Scott’s post.  As with most reading, I have to come back multiple times to a text before I start to really appreciate what is being said.  I think you definitely captured my intention with my Excel spreadsheet when you wrote about how this method of data retention allows me to “visualize the qualitative aspects of my data;” yes, definitely, that is what I am after.

I returned to New York over the winter break and met with some of the professors on my committee.  I offered up my Excel spreadsheet for criticism and feedback, while I tried to verbalize what my intentions in creating the spreadsheet included.  I want to visualize my data, initially, as a huge spider web, with the words that appear most frequently as condensed centers from which branch off words that appear less frequently.  I think I can connect these big centers to specific avenues of thought in anthropology (in my case, something around empathy, gender, and space).  Anthropologists working in other contexts are also talking about, for example, empathy, gender, and space, and my hope is to offer comparative data from the particular context in which I am working.  So I may contribute to specific conversations about empathy, gender, and space, as well as conversations about the intersections between empathy, gender, and space, all from the novel perspective afforded by my work among people who take animals seriously in the context of practices designed to ameliorate emotional suffering.  Anthropology is supposed to be comparative and holistic; I am doing my best to remain responsible to that initial and basic impulse.
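
A rough sketch of how the spider web could be built from a line-by-line index, assuming each index line can be treated as a small bag of terms. Everything here is hypothetical: the three sample lines are invented, and picking the two most frequent terms as “centers” is an arbitrary choice made only to keep the example short.

```python
from collections import Counter, defaultdict

# Invented index lines; in practice these would come from the spreadsheet.
lines = [
    "horse calm empathy",
    "empathy space",
    "empathy gender horse",
]

# Count how often each term appears across all lines.
freq = Counter(word for line in lines for word in line.split())

# Treat the most frequent terms as the condensed centers of the web.
hubs = [word for word, _ in freq.most_common(2)]

# Link each center to every other term that shares a line with it.
web = defaultdict(set)
for line in lines:
    words = set(line.split())
    for hub in hubs:
        if hub in words:
            web[hub] |= words - {hub}

for hub in hubs:
    print(hub, freq[hub], sorted(web[hub]))
```

The `web` dictionary is the skeleton of the picture: each center maps to the less frequent terms branching off it, and a drawing tool could lay it out from there.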

In one revealing conversation, one professor said that what I am doing with my Excel spreadsheet is not coding, but indexing.  What I think I understand is that while my index of terms and phrases will populate and create the spider web, the condensed centers in the web will be the codes.

Given my understanding of this important difference between indices and codes, I better understand your comments about the crucial importance of maintaining the context and the “overall sentiment of conversations,” insofar as “adding a few lines of summary/observation of the interaction helps facilitate other forms of coding.”  I hear you.  Can you say more?  What other forms of coding are you thinking about?

I am maintaining two separate documents, in addition to the spreadsheet.  One document includes fieldnotes that are primarily reflective.  Still data, but much more emotional, much more obviously laden with my own self.  The other document is a bunch of short notes, brief thoughts that feel like inspiration, where somehow my brain threads through a connection between ideas, thoughts, words, phrases, pictures, and sensations that had previously appeared shapeless.  I say this in an attempt to nod to your process of maintaining “two levels of dialogue- what actually happens, and my little inner monologue.”  I think that is brilliant and absolutely necessary and helpful.

# Thinking through data retention…

It’s a great idea. I think that in certain scenarios, depending on the kind of analysis you want to do, this sort of Excel-type strategy can be particularly helpful. In fact, last year my wife said to me, “listen, you can’t just go through long form like that, make a freaking excel sheet.” Clearly, she’s the smarter social scientist of the two of us.

So that’s what I did. I made an Excel sheet to keep track of the quasi-quantitative interviews that I had done. The format made sense because I had a limited number of questions that I was asking over and over again. I could have used some quantitative software after transferring into Excel, but I didn’t need to do anything fancy; means and modes were the most helpful.

And because it wasn’t just quantitative, I was able to visualize the qualitative aspects as well, which is what I see your method as doing. It’s helpful in a sense to focus in on the important pieces of data. I see that as being helpful for getting through some of the types of data you are collecting- largely many conversations that could be long and meandering, yet where you need to really focus on a couple of points.

My own issue here is that frankly I don’t have a good memory. To me, the positive issue of writing out the long form of field notes is to more or less provide a story for these little sound bites that are recorded. That’s something that I need a lot: to remember, more or less, the narrative of what happened that day. If I don’t have that, I’d be afraid that I’d take something out of context (and I think that they revoke your “anthropologist” title for that).

Therefore, what I could see adding is the following. At the top of each Excel sheet, insert maybe three or four sentences: (for example) “Charlotte and I are having a conversation and she tells me about a book. After a while she starts crying. Then she collected herself and we continued the conversation. It was kind of weird because…”

In my notes I’ve got two levels of dialogue: what actually happens, and my little inner monologue. I keep my inner monologue in brackets to designate my reflections and feelings about what was going on. I think that inner monologue has been highlighted because of the postmodern reflective moment in anthropology, and regardless of how we all feel about it, I think it’s important to record. I also find the separation of texts helpful. (This strategy more or less comes out of Bernard (2006)).

I think that the way that data is being entered in your methodology prefigures it to textual analysis; specific words are highlighted that may come into play later. But what about the overall sentiment of the conversation? And what about the possibility of coding this entire conversation? Does this method of entering data preclude that by demanding codes for each line of data? Does adding a few lines of summary/observation of the interaction in general help facilitate other forms of coding?

# It starts with the field notes (I think)

As I was preparing the previous post on various online resources related to qualitative data analysis, specifically textual analysis, it occurred to me that in order to utilize those resources, I needed to organize my field notes differently.  I have been transferring notes from my jottings book into long form writing, or reports; however, in reading about all those online qualitative data analysis resources, I found myself wondering how to marry those tools with my long form field notes.

This uncomfortable realization also reminded me of other reasons why I am frustrated with long form field notes:

1. Once I transform my jottings into long form writing, with proper transitions to increase the readability and sense of the entry, I find it difficult, if not impossible, to convince myself to re-read a majority of my long form entries.
2. Not only is it emotionally taxing to re-live each day as I write and then re-read my writing, but I am just not that good of a writer, on the fly, every night, and sometimes the transitions I engineer, to move from one conversation to another, are downright obfuscating on a second or third read.
3. I also worry that I lose the flow of conversations – who said what?  when?  what did I say?  All of that becomes muddled when I’m trying to piece my jottings together into some kind of sensible long form whole.
4. Once I have pages upon hundreds of pages of these long form pieces of writing, I have to then go back and break the writing back apart so I can code chunks that programs like NVivo or Atlas.ti can chew on and analyze.  So, I had pieces, then I, somewhat artificially, made a series of “wholes,” and now I need pieces again.  What was the point of the whole, of the long form writing?
5. Moreover, how do I code those transitions that I engineered?  Sure, I needed them in order to make the whole piece make sense, but what do I do with them when I start to code?  Are they data?  Do I still need them?  Do I use them to then link chunks of coded text?

A lot of these questions and concerns connect back, for me, to my original post, “What is data?”  I would like to think that data (good data?  my favorite data? the data I feel most comfortable with?) comes directly from the people I am working with: things they said and things they did, or the things they didn’t say, or didn’t do.  So the transitions that I created to make my writing make sense don’t really interest me, at least at first.

In my constant daily worry over generating long form writing that is sensible, I think I lose the ability to be critical, make connections, and find holes.  So, I am in the process of trying a new data retention method where I transfer every one of the bits I record in my jottings notebook into a separate line on a Microsoft Excel workbook (click: FieldnotesInExcel for an example of what I’m doing – I have changed the names to protect confidentiality).

I fear my professors may come charging toward me, hair on fire, over the choices I have made recently regarding my field notes and how I transfer them from my jottings book to my computer each night; but for me, using Microsoft Excel to compartmentalize unique data points makes me feel like I have a chance at doing some hard core qualitative data analysis eventually.

At the same time, and this is super important, I am still keeping a Microsoft Word file open where I occasionally write long form pieces and let myself try to generate those transitions, which are important, in so far as they challenge me to ask different questions in the field, and ask different questions of my data.

I agree, it’s all data, but some data has to do different things, or work differently, for me, than other data. (I think)

# Textbook Resources

Okay, so, for me, one of the major takeaways from my last post was how, in order to maximize some of the resources available on the internet, in terms of text mining, I would need to learn a programming language and write a program to analyze my data.  Though this sounds like all kinds of awesome, I am slightly pessimistic about my potential programming abilities.  So, I took to Amazon and started searching for textbooks on qualitative data analysis.  I should note, I have not abandoned the resources I came across on the Internet, and I am curious about intersections between the contemporary discussions of design (which seem to grow beauty out of elaborate computer programming) and anthropology and how those intersections might manifest in new ways to analyze the qualitative data that anthropologists generate.  In the meantime, here are some textbooks that I found (I plan to purchase one or two of them, maybe hound some local libraries, and see where they take me):

• Analyzing and Interpreting Ethnographic Data (Ethnographer’s Toolkit)
by Margaret D. LeCompte and Stephen L. Schensul
• Analysing Qualitative Data (The SAGE Qualitative Research Kit)
by Graham Gibbs
• Analyzing Qualitative Data: Systematic Approaches
by H. Russell Bernard and Gery Ryan
• The Coding Manual for Qualitative Researchers
by Johnny Saldaña
• Qualitative Data Analysis: An Expanded Sourcebook
by Matthew B. Miles and Michael Huberman

# The Internet as a Resource

A Google search yielded some potentially helpful blogs about data analysis, data analysis tools, and data visualization.  Many of the blogs are geared towards small businesses who need to manipulate data in order to make decisions about how to attract and keep customers.  I am curious about how I might be able to re-purpose these motivations in analyzing my fieldwork data.  I am also toying with the idea of trying to make Microsoft Excel and Google Spreadsheets work for me and help me aggregate, organize, and compare portions of my qualitative data.  We’ll see how it goes – websites on how to use Excel and Google Spreadsheets are also included below.

As a note, I especially need help with video analytics.  One of the major obstacles that I am encountering in my research is my need to use videotape and my total inability to do anything with the videotape.  In human-horse interactions, movement, literally the movement of bodies, is profoundly important.  Current work by Goodwin and others in Embodied Interaction is helpful right up to the point of conveying and analyzing movement.  The movement of a hand in a circular motion, for example, is reduced to an arrow and a spiraling line.  I need more, but I am at a loss for how to get there.  Hopefully, some of these websites will help steer me in a helpful direction.

Data Analysis Blogs & Websites:

Data Visualization Websites:
I wonder, would these kinds of visualizations be helpful to include in a dissertation?  As I remember back to some of the charts, graphs, and diagrams that Levi-Strauss included in Structural Anthropology, for example, I wonder…could I build a data visualization model based on his early diagrams?

• DataVisualization.ch: http://datavisualization.ch
This site seems particularly fascinated by maps.  Also, there seems to be a rich library of tools.
• The Dashboard Spy: http://dashboardspy.wordpress.com
This website looks super helpful, though the site is geared specifically to “dashboards.”  I am still very ignorant of all of this and am not sure what a “dashboard” is…it appears to be like a home page or a page that directs and manages traffic to more specific information.  I need to learn more.
• EagerEyes: http://eagereyes.org
Run by an Associate Professor of Computer Science at UNC Charlotte, this blog is stunning and rich.  It is going to take me months to sort through all these blogs and find out how I can use some of this information in my work.
• Flowing Data: http://flowingdata.com
This website has a lot of examples on it, as well as some tutorials.  They seem to map all kinds of data.  I am particularly interested in this idea of mapping and maps.  Maps are a great metaphor.  Maybe leaning on maps and cartography can be one way to develop a theoretical context in which to talk about my data.
• Information Aesthetics: http://www.infosthetics.com
The most recent entry is on “Microsonic Landscapes: Visualizing Music in Physical.”  ‘Nuff said.  Read it and weep, literally.
• Information & Visualization: http://informationandvisualization.de
Super straightforward…not all that inspiring, but potentially super informative.  A good place to start.
• Information is Beautiful: http://www.informationisbeautiful.net
Wow, again.  New words: infographic design, interactive visualization, data journalism, and motion infographic.
• Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
A great website where you can upload your own data and create some visualizations.  Note:  all the uploaded data is public.
• Presentation Zen: http://presentationzen.blogs.com/presentationzen/
This is a beautiful blog and the most recent posts on creativity, education, and story look tremendously promising.

# What is data?

in the words of Paul Byers (via Hervé Varenne)

Source: http://varenne.tc.columbia.edu/byers/dissertation_talk.html

“….”Data” is anything that you can show to be relevant to your research. Grey Gundaker’s dissertation (an examination of “creativity” and its cognates) included (as data) pages of the NY telephone book, conversations overheard on a train, in hallways, in museums, anthropological research on “sorting,” psychological research into “creativity,” personal experiences as an art teacher, etc. etc.). Data can be what people don’t say, what people lie about, or even what never happens. And, apart from “standard” corpi (that’s the plural of “corpus”), one of the most important ways to recognize “data” is to pay attention to things that surprise, startle, disgust, or elate you (or anyone else) unexpectedly. Your reaction is NOT data but it alerts you to something that may be “data.”

…..”Objectivity” is not disassociating yourself from your “data” but (if we need the word at all) recognizing that it was your “subjectivity” that recognized the data. Indeed, where else does curiosity come from? And what is science but our effort to satisfy that curiosity. What distinguishes “science” as “precise” is our commitment to recognizing (as best we can) the relationship between ourselves and our “data.” To suppose that we can separate ourselves from our observations is the folly of a foolish “science.”

…..This leads to the illusory construct called “method.” Margaret Mead once said (to me) that “methods were invented so that not-very-bright people could participate in science.” It often amuses me that many so-called “qualitative” dissertations cite “grounded theory” as their “methodological” foundation. I have often wondered if Glaser and Straus had their tongues in their cheeks since “grounded theory” is, at bottom, essentially a formal description of the way anyone (at least before he/she goes to school) learns anything.

…..In my own dissertation I described something of “systems theory” as a point of view or way of looking (pretending that “systems theory” is a “theory,” which it decidedly is NOT) and finished the chapter with the words “one obvious point emerges without further justification: the ultimate test of any method is that it works.”

…..There is really no need for an explicit discussion of “method.” But there is the necessity to show the reader your “stuff” (i.e. data) and show him/her exactly what you did to or with it. Then you can say what YOU see or think about it and the reader can agree, disagree, replicate, etc. That [is] ALL you can do.

…..Method, then, is no more than finding ways to juxtapose recognizable things in relationships that show us what we hadn’t recognized before. And, if we’re clever enough, that recognition will inform us about other pieces of the world.

…..In the end the dissertation-writer’s obligation is to write so that the reader can acquire the insights of the researcher without having to do the work the researcher did. And it should be seen as a sort of story that the reader will find interesting–even if it tells him more than he really wants to know.”