We used a literature review, survey methods, and statistical analysis to develop and test this health measurement tool. The study was designed based on Streiner and Norman’s book, Health Measurement Scales: A Practical Guide to their Development and Use [5]. In particular, we used their definitions and directions regarding reliability and validity, since our primary intent was to check for consistency and accuracy. The work was completed in three phases: 1) development of the tool; 2) assessment of content validity; and 3) testing for inter-rater reliability.
Phase I
Initial drafts of the tool were created based on a methodical review of the literature to find definitions used for infant feeding categories and on the authors’ clinical and research experience. The tool was designed for use by researchers or research assistants, who ask study participants questions and then complete the chart of feeding categories and scores. We attempted to provide flexibility with respect to timing of data collection and to account for quantity of breast milk and mode of feeding in the first six months following birth (i.e., during the typical period of exclusive breastfeeding). To capture breastfeeding patterns over time, we added a scoring system so that scores from multiple time points could be averaged into a single score. The tool does not collect data about the introduction of complementary foods, that is, the soft, semi-solid, and solid foods introduced around the middle of the first year.
Phase II
To assess the face and content validity of the initial tool, we consulted clinical and research experts in the field of breastfeeding and lactation to review the draft (see Additional file 1). We asked for feedback about the tool’s content, specifically, whether they believed we captured descriptions of breastfeeding patterns that would be useful for research projects. The revised tool (see Additional file 2) was then pilot tested with research assistants to determine readability, usability, and burden for potential users.
Phase III
For the third phase, breastfeeding women were recruited and then telephoned by research assistants who worked in pairs. Mothers were recruited following the birth of their baby and before discharge from the postpartum unit of a large Canadian hospital that averages 6,200 births per year [6]. Women were eligible if they had given birth to a single, healthy newborn; planned to breastfeed and were able to breastfeed freely; were able to read and write in English or French; and were willing and able to maintain a weekly feeding diary for 6 weeks and to answer 6 English telephone questionnaires (twice within 24 hrs × 3 times over 6 months). Nurses asked patients who met the criteria whether they would be willing to learn about the study; the researcher or a research assistant then explained the study to interested mothers and obtained signed consent and a completed demographic questionnaire.
Research assistants (RAs) collected data at 1 month, 3 months, and 5 months following recruitment. One RA was a registered nurse and certified lactation consultant, one was a fourth-year nursing student who had completed a maternity course, and one was a third-year nursing student who had not completed a maternity course. RAs with these differing backgrounds were selected to avoid bias due to expertise in the field of maternity and breastfeeding care. The three RAs were randomized into pairs to make the telephone calls, and the two RAs in a pair then called each study participant within 48 hours of each other to administer the tool. After completing a first call, the first caller would text her partner to inform her that the second call could be made.
To further test validity (i.e., that we were measuring what we intended to measure), we compared mothers’ diaries with the categories RAs recorded. Participants were given a feeding diary with eight boxes to tick each week for six weeks (see Additional file 3). The fourth week of the diary was compared to the 1-month telephone call made by the first caller. Without knowing the category recorded by the RA, the researcher determined a feeding category based on the boxes ticked by the mother. The two categories were then compared for level of agreement.
Data analysis
Characteristics of mothers and newborns were summarized using descriptive statistics (mean and standard deviation for continuous variables and frequencies and proportions for categorical variables). Agreement between the two callers and agreement between the diaries’ and RAs’ classifications of feeding categories were assessed using an intraclass correlation coefficient (ICC). For the ICC, we used the two-way mixed model and absolute agreement options.
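As a rough illustration of this kind of agreement statistic, the following Python sketch computes a two-way, absolute-agreement ICC for single measures from the ANOVA mean squares (the form often labelled ICC(A,1)). It is not the SPSS procedure used in the study: the function name and the example ratings are hypothetical, and the choice of single rather than average measures is an assumption made here for illustration.

```python
def icc_absolute_agreement(ratings):
    """Two-way, absolute-agreement, single-measures ICC.

    `ratings` is a list of rows, one per participant, each row holding
    the score given by every rater (hypothetical layout for this sketch).
    """
    n = len(ratings)        # number of participants
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for participants (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Systematic rater differences (msc) lower absolute agreement.
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


# Made-up scores: the second caller always rates one point higher.
example = [[1, 2], [2, 3], [3, 4]]
print(round(icc_absolute_agreement(example), 3))  # 0.667
```

In the made-up example the two callers rank participants identically, yet the coefficient is well below 1 because the absolute-agreement form penalizes the constant one-point difference between callers.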
Although the score produced by this tool is, strictly speaking, ordinal (categorical), it has been suggested that kappa does not function as well beyond the 2 by 2 table and does not take into account the distance between scores on an ordinal scale [7, 8]. Weighted kappas can be used for data of this type, but a number of authors have identified problems inherent in their use, and Maclure and Willett concluded that “a logical choice of standard weights makes weighted kappa equivalent to the intraclass correlation coefficient” [7, 8]. For these reasons, the data were treated as ordinal-continuous, and the intraclass correlation coefficient (ICC) in SPSS 20 was used to calculate inter-rater reliability as well as agreement between the RA’s rating and the mother’s diary.
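To make the notion of distance-sensitive weighting concrete, the sketch below computes a weighted kappa with quadratic disagreement weights for two raters. Quadratic weights are one commonly used standard choice; that this is the weighting behind the quoted equivalence is an assumption here, and the function name and ratings are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b):
    """Weighted kappa with squared-distance disagreement weights.

    Assumes equally spaced numeric category scores and at least some
    variation in the ratings (otherwise the denominator is zero).
    """
    n = len(rater_a)
    # Mean squared disagreement actually observed on paired ratings.
    observed = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b)) / n
    # Mean squared disagreement expected if the two raters' scores were
    # paired at random (product of the two marginal distributions).
    expected = sum((a - b) ** 2 for a in rater_a for b in rater_b) / (n * n)
    return 1 - observed / expected


# Made-up scores: the second caller always rates one point higher.
print(round(quadratic_weighted_kappa([1, 2, 3], [2, 3, 4]), 3))  # 0.571
```

Because the penalty grows with the squared distance between categories, a disagreement of two categories costs four times as much as a disagreement of one, which is the property an unweighted kappa lacks.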
Sample size calculation for Phase III
While some researchers have based sample size calculations for ICCs on tests of hypotheses, others have argued that it makes more sense to base sample size calculations on attaining a specified level of precision around the ICC [9]. We anticipated there would be high levels of agreement (0.8 or more) between raters and between the tool and criterion (diary). Based on this supposition, assuming an alpha of 0.05 and a desired confidence interval width of 0.2, we estimated that 55 subjects would be required [10]. To account for 30% attrition due to early weaning and loss to follow-up over the 6-month data collection period, 75 subjects were recruited.
Research ethics
The University of Ottawa Research Ethics Board [File Number: H11-11-02] and the Ottawa Hospital Research Ethics Board [Protocol # 20120200-01H] approved the study. The study was conducted in English but all recruitment material was also available in French.