Data, data and more data

Sali A. Tagliamonte

doi:10.1017/CBO9780511801624.006

What do you do with your data once you have collected it? This chapter will elucidate the procedures for handling a large body of natural speech.

Chapters 1 to 3 have focused on methods for collecting optimal data for analysis. Now it is time to learn what to do with data once you have it. This chapter focuses on data handling and, in particular, techniques for representing speech data in writing.

When faced with a collection of dozens upon dozens of audio-tapes, minidisks or sound files, what do you do next? How can you make the invaluable data contained within maximally accessible and useful?

In this chapter, I focus on tried-and-true procedures from my own experience. I build on the foundations of earlier corpus-building projects (Poplack 1989, Poplack and Tagliamonte 1991), and I also draw on data arising from fieldwork conducted in the British Isles between 1995 and 2001 (e.g. Tagliamonte 1998, Tagliamonte et al. 2005).

THE CORPUS

The components of a corpus, at least in my own research, are listed in (1):

(1) Components of a corpus

a. recording media: audio-tapes (analogue, digital) or other
b. interview reports (hard copies) and signed consent forms
c. transcription files (ASCII, Word, txt)
d. a transcription protocol (hard copy and soft)
e. a database of information (FileMaker, Excel, etc.)
f. analysis files (Goldvarb files, token, cel, cnd and res)
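For readers who keep their corpus on disk, the components in (1) can be tracked per interview with a small record. The following is only an illustrative sketch, not a procedure from this chapter: the class name `InterviewRecord`, its field names and the file names in the usage lines are all hypothetical, and it assumes one recording and one transcription file per speaker. The transcription protocol (1d) and the database (1e) are left out because they belong to the corpus as a whole rather than to a single interview.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class InterviewRecord:
    """Per-interview corpus components (hypothetical layout, not from the chapter)."""

    speaker_id: str
    recording: Optional[Path] = None         # (1a) recording media
    interview_report: Optional[Path] = None  # (1b) interview report
    consent_form: Optional[Path] = None      # (1b) signed consent form
    transcription: Optional[Path] = None     # (1c) transcription file
    analysis_files: List[Path] = field(default_factory=list)  # (1f) analysis files

    def missing_components(self) -> List[str]:
        """Return the names of the single-file components not yet recorded."""
        required = {
            "recording": self.recording,
            "interview_report": self.interview_report,
            "consent_form": self.consent_form,
            "transcription": self.transcription,
        }
        return [name for name, value in required.items() if value is None]


# Usage: a speaker with a recording and a transcription, but no paperwork filed yet.
record = InterviewRecord(
    speaker_id="spk001",
    recording=Path("spk001.wav"),
    transcription=Path("spk001.txt"),
)
missing = record.missing_components()
```

A record of this kind makes it easy to see at a glance which interviews still lack a consent form or a transcription before analysis begins.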

The basic substance of a language corpus is the data. Most of my corpora have been collected on audio-tapes and represent one to two hours of conversation between a single interviewer and an informant.
