«Abstract This paper revisits the debate on the units of Content Analysis (CA) for the purposes of Corporate Social Reporting (CSR) research and also ...»
11 Wiseman (1982) and Patten (2002b) also counted lines, as a complement to an ‘index’ CA. Lines have also been employed by, e.g., Bowman and Haire (1975; 1976) and Trotman and Bradley (1981), but in order to estimate the proportion of the total discussion on all issues. Davey (1982, cited in Guthrie and Mathews, 1985, pp. 258-259) interestingly determined the volume of disclosures by treating each word as five characters plus a one-character space (six characters in total), in essence a character-based quantification similar to the one adopted by Tinker and Neimark (1987).
Although it should be acknowledged that characters in particular could bring extra precision to measurements, this, as Milne and Adler (1999) note for words, “seems unlikely to add to understanding” (p. 243). It is further assumed that the arguments behind the potential use of these units are subsumed in the discussion of, e.g., words or sentences. Further, Burritt and Welch (1997) counted passages/thematic units; this approach to measurement, however, is highly contested, given that issues discussed in one sentence were granted equal weight to issues discussed in whole paragraphs and that reliability is very difficult to attain (Holsti, 1969).
Words, sentences and proportion of pages
As illustrated in Table 2, a number of CSR studies have employed words or sentences as the recording unit. As further illustrated in Table 3, these two approaches share a number of benefits and limitations, and their inter-relation was empirically validated as early as 1947 (Dollard and Mower, 1947). Neither approach accounts for differences in typeface within the document (Hackston and Milne, 1996) or for repetitions in the information (Patten, 2002a); however, neither is affected by variations in the general font size of different documents (Tilt and Symes, 1999), by the presence of margins or blank pages (Gray et al., 1995b), or by whether the sources are in electronic (particularly internet or .pdf files) or microfiche form (Campbell, 2004), and both generally seem to “lend themselves to a more controllable analysis” (Gao et al., 2005).
Compared to sentences, words seem to have the advantage of being “the smallest unit of measurement for analysis and can be expected to provide the maximum robustness in assessing the quantity of disclosure” (Wilmshurst and Frost, 2000, p. 16). As Krippendorff (2004) similarly argues, “To ensure agreement among different analysts in describing the coding/recording units of a content analysis, it is desirable to define these units of description as the smallest units that bear all the information needed in the analysis, words being perhaps the smallest meaningful units of text… and the safest recording unit for written documents” (pp. 100, 104). Further, words as the recording unit may also assist by allowing the inclusion of tables in the analysis (but see Hackston and Milne’s (1996) approximation of one table line to one sentence, which allows tables also to be captured when using sentences as the recording unit).
A number of studies, though, have questioned the usefulness of the additional detail in measurements from employing words rather than sentences. Researchers note that the “tedious exactitude” (Patterson and Woodward, 2006, pp. 21-22) of words “seems unlikely to add to understanding” (Milne and Adler, 1999, p. 243) and put forward arguments for the use of sentences: these are also “easily recognizable syntactically defined units of text” (Krippendorff, 2004, p. 105); they may be quantified with greater measurement accuracy (Unerman, 2000); they are thus subject to less inter-coder variation (Ingram and Frazier, 1980; Deegan et al., 2002); and overall they seem to be able “to provide complete, reliable and meaningful data for further analysis” (Milne and Adler, 1999, p. 243).
A strong argument, however, against employing either words or sentences as recording units “is that this will result in any non-narrative CSR disclosures (such as photographs or charts) being ignored” (Unerman, 2000, p. 675). As Beattie and Jones (1997) have argued, particularly with regard to graphs, approximately 80% of leading US and UK companies use them in their Annual Reports; they are more user-friendly than tables; graphs, especially in colour, attract the reader’s attention; and, additionally, “the reader’s ability to remember visual information is normally superior to that for remembering numerical or textual information” (Beattie and Jones, 1997, p. 34, a justification supported by Leivian, 1980). Photographs have also been used to present and highlight what companies wish to portray (Preston et al., 1996), and it seems that the role of graphic representation in corporate external financial reporting is increasingly being recognised by a number of national regulatory bodies, such as the Canadian Institute of Chartered Accountants (Beattie and Jones, 1997). It is thus evident that this information should not be excluded from CA studies (see also similar arguments by Berelson, 1952; Stone et al., 1966).
In an attempt to capture this valuable non-narrative information, a number of researchers employ proportions of a page as the recording unit. Researchers frequently lay an A4 grid with twenty-five rows of equal height and four columns of equal width (but see, e.g., the different A4 grids of Guthrie and Parker, 1989; Hackston and Milne, 1996; and Newson and Deegan, 2002) across each CSR disclosure, “with volume being counted as the number of cells on the grid taken up by a disclosure” (Unerman, 2000, p. 676). The main benefit of this approach, other than capturing the information provided in a pictorial, tabular, graphic or large typeface form, is that it generates detailed measurements and comparable findings across reports of the same and different companies.
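The grid count described above can be illustrated with a minimal sketch. It assumes the 25 × 4 grid mentioned in the text (100 cells, so each cell corresponds to 1% of an A4 page); the function name `disclosure_volume` and the representation of a disclosure as a set of (row, column) cells are hypothetical, chosen only for illustration, and are not taken from any of the cited studies.

```python
# Grid dimensions as described in the text: 25 rows x 4 columns on an A4 page.
ROWS, COLS = 25, 4


def disclosure_volume(covered_cells):
    """Return the volume of one disclosure as a fraction of an A4 page.

    covered_cells: iterable of (row, col) pairs, zero-based, naming the grid
    cells the coder judged to be taken up by the disclosure (hypothetical
    input format).
    """
    cells = set(covered_cells)  # a cell counts once, however often it is listed
    for row, col in cells:
        if not (0 <= row < ROWS and 0 <= col < COLS):
            raise ValueError(f"cell ({row}, {col}) lies outside the grid")
    return len(cells) / (ROWS * COLS)


# Hypothetical disclosure covering the top five rows of the first two columns.
example_cells = {(row, col) for row in range(5) for col in range(2)}
example_volume = disclosure_volume(example_cells)  # 10 of 100 cells
```

Under these assumptions a photograph or table is measured in exactly the same way as a block of text, which is the property that motivates the approach; a study using a different grid would only change the two constants.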
The proportions of a page approach, however, has also been criticised for a number of reasons, mainly because it takes as data only dictated attributes of the content (statements about the content made by the analyst or participants in the research) and not the content itself, which results in arbitrarily created visual recording units (Ekman et al., 1969; Paisley, 1969). Given further that “it is difficult to place an objective measure on pictures and diagrams” (Deegan et al., 2000, p. 118), researchers often find it difficult to decide whether this information is CSR or not, and then how to classify it. As Newson and Deegan (2002) note, “in a number of cases, the expert coder regarded photographic evidence referring to employees in a work environment as less relevant than photographic evidence highlighting employees celebrating acts of achievement” (p. 193). In addition, as Wilmshurst and Frost (2000) note, although arguably pictures may be worth a thousand words (but which words?), “to include them in a measure based upon an unweighted word count is highly subjective” (p. 17).
Further limitations of the proportions of a page approach are that it is affected by different font and page sizes (Tilt and Symes, 1999; Paterson and Woodward, 2006); that an additional area of subjectivity is introduced with regard to the treatment of margins and blank pages (Gray et al., 1995b; Unerman, 2000); that, like words and sentences, it is affected by differences in grammar and repetition12 (Patten, 2002a); that the additional thoroughness and effort required in the use of the grid for recording increase the possibility of measurement errors (Milne and Adler, 1999); and that it is impossible to record data directly from an electronic (e.g. internet or pdf) or microfiche form (Campbell, 2004). It should be noted that, even if one attempts to print (as Patten and Crampton, 2004, did) or type (as Ingram and Frazier, 1980, did) data held in one of these forms, this would still result in some distortion to the size and context (e.g. margins, font and page sizes are often affected). Thus, the inherent limitations of this approach lend support to researchers who reject it and who, while acknowledging the losses from the exclusion of pictorial or graphical evidence, adopt words or sentences as recording units.
12 Although Unerman (2000) points out the possible limitations arising from differences in the use of grammar when he discusses sentences, it should be noted that grammar and repetition are context issues and thus affect all recording methods.
In an attempt to employ an alternative and more valid method for CSR measurements, a number of studies adopt, implicitly or explicitly, a page size approach (where “the written and pictorial part of a page… [is] considered to be the page itself”, Gray et al., 1995b, fn16, p. 90). The basic tenet of this analysis is that, when measuring the extent of disclosures, the collected data should be considered in conjunction with the physical source from which they are extracted. Most frequently researchers attempt to estimate CSD as a proportion of the whole discussion in the report, often on a line-by-line (Bowman and Haire, 1975; 1976; Trotman and Bradley, 1981) or sentence-by-sentence (Salama, 2003; Hasseldine et al., 2005) basis. However, even when authors such as Dierkes (1979) report that they measure the extent of CSD in number of pages of each report (or in quarters of a page of each report, as in Gibson and Guthrie, 1995), this implies that a page-size rather than a standardised proportions of an A4 page approach was adopted. Major limitations of this approach are that researchers do not seem to include pictures when employing it and, even more importantly, that it would be meaningless to adopt it in research examining standalone reports, where it has been suggested (Buhr, 1994) that coding should commence on the assumption that everything should be considered CSD apart from some pre-specified information.
Hackston and Milne’s (1996) study is a potentially more credible adaptation of a page size approach. The authors originally sentence-coded and measured their data sets and then constructed an approximation to page measurement from these data: firstly, “the average number of sentences per page of the chairman’s report for each annual report was calculated. The average for each report was then divided into the total number of social disclosure sentences for that report to produce a derived page measurement for each company” (p. 86). Despite the acknowledged crudeness of this measure, when some refinements are made to it (more sentences from pages containing non-pictorial information are included in the estimation of the average, and a detailed page-adjusted grid is also employed to account for the space of non-narrative disclosure)13 it may provide more valid results than the proportion of a page approach.
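The derived page measurement quoted above amounts to a single division, sketched below. The function name, the parameter names and the figures in the example are hypothetical and are not taken from Hackston and Milne (1996); the sketch covers only the basic measure, not the refinements discussed in the text.

```python
def derived_page_measure(csd_sentences, chairman_sentences, chairman_pages):
    """Approximate the pages of social disclosure in one annual report.

    csd_sentences: total sentences coded as social disclosure in the report.
    chairman_sentences, chairman_pages: sentence and page counts of the
    chairman's report, used to estimate the average sentences per page.
    """
    if chairman_sentences <= 0 or chairman_pages <= 0:
        raise ValueError("chairman's report counts must be positive")
    average_sentences_per_page = chairman_sentences / chairman_pages
    return csd_sentences / average_sentences_per_page


# Hypothetical report: a 2-page chairman's report with 60 sentences gives an
# average of 30 sentences per page, so 45 disclosure sentences derive 1.5 pages.
example_pages = derived_page_measure(45, 60, 2)
```

Because the divisor is estimated separately for each report, the result is expressed in that report's own page size rather than in standardised A4 pages.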
As illustrated in Table 3, this approach is not affected by report or font sizes, nor by margins and blank pages. Although it provides less detailed measurements, it is easier to apply and is not affected by a possible pdf/microfiche form of the text (even if printing affects the documents’ size, it still has no effect on page size measurements). It thus seems to capture the information identified as CSR in a more valid way than the proportion of a page approach, particularly given the triangulation benefits from the additional use of sentences, and to generate more reliable results. However, because the generated data are in a page size and not in a standardised A4 page form, the derived measure (similarly to the proportions of the report one) could provide meaningful average CSD approximations but only crude aggregate figures, unless the average sentences per report for all the derived measures being added are known and adjustments are made. Thus, for the purposes of a database such as that of the Centre for Social and Environmental Accounting Research (CSEAR), the employment of standardised proportions of A4 pages is deemed more suitable, to avoid tedious adjustments.
The BA study utilises a refinement to Hackston and Milne’s (1996) page size approach. To further increase the validity of the instrument, all pages containing solely narrative information per report were considered for the calculation of the average number of sentences per page. Further, all pages in microfiche form containing non-narrative information were first printed in an A4 format; then, as illustrated in Figure 2, a clear plastic A4 acetate was employed to draw a page-size grid with six rows of equal height and eight columns of equal width, and the proportions were estimated to the nearest 1% of a page (for the documents in electronic form a specialised pdf reader software, incorporating a detailed 25x35 cell ‘grid’ view, was employed)14. Despite the fact that the page grid employed for the data in microfiche form was not as detailed as the ones employed by Gray et al. (1995b) or Unerman (2000) (but was still significantly more detailed than quantifications on the basis of the nearest tenth or even quarter of a page [e.g. Guthrie and Parker, 1989] that seem to have been employed in a number of earlier studies [Hackston and Milne, 1996]), since only tables, images and narratives in large typeface were recorded in this manner, it was considered that the use of a more detailed grid would add more to the possibility of measurement errors than to the validity of the findings.
13 Tinker and Neimark (1987), in an attempt to develop an aggregate textual and pictorial character measure of CSD “by counting the number of textual characters that would fit into the photographs that address the subject” (p. 80), seem to employ a similar measurement approach.
14 Note that a number of practicalities may arise when it is attempted to estimate the ‘average’ page size, both in terms of sentences and in terms of written and pictorial space of the report: e.g. in BA (2000) there seem to be 2 main types of pages, including on average either 24 or 44 sentences, in which
Take in Figure 2