No empty cells
Fill in all cells. Use some common code for missing data.
Not everyone agrees with me (for example, White et al. (2013) state a preference for leaving cells blank), but I’d prefer to have “NA” or even a hyphen in the cells with missing data, to make sure it’s clear that the data are known to be missing rather than unintentionally left blank.
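One way to check a file against this rule is to scan it for blank cells before sharing it. The following is a minimal sketch using Python's standard `csv` module; the column names and values are made up for illustration, and `find_empty_cells` is a hypothetical helper, not part of any library.

```python
import csv
import io

# Made-up example data; two cells are deliberately left blank.
raw = """id,date,glucose
101,2015-06-14,149.3
102,,95.3
103,2015-06-18,
"""

def find_empty_cells(text):
    """Return (row_number, column_name) for every blank cell.

    row_number counts data rows starting at 1; the header row is not counted.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    empty = []
    for row_number, row in enumerate(data, start=1):
        for column, value in zip(header, row):
            if value.strip() == "":
                empty.append((row_number, column))
    return empty

print(find_empty_cells(raw))
```

Each reported cell can then be reviewed by hand and given “NA” if the value really is missing.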
I also often see cells left blank when a single value is meant to be repeated multiple times. For example, one might put the date in only a few cells, like this:
Don’t do that! If the rows were to be sorted at some point, that date column would be completely mangled.
It’s much better to fill them all in, like this:
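If a file already has gaps of this kind, the repeated value can be carried down programmatically. The sketch below assumes that a blank cell in the chosen column always means "same as the row above", which is only safe if you know that is what was intended; the data and the `fill_down` helper are hypothetical.

```python
def fill_down(rows, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    filled = []
    last_seen = None
    for row in rows:
        row = dict(row)  # copy, so the input rows are left unchanged
        if row[column].strip() == "":
            if last_seen is None:
                raise ValueError(f"no earlier value to repeat in column {column!r}")
            row[column] = last_seen
        else:
            last_seen = row[column]
        filled.append(row)
    return filled

# Made-up rows in which the date is written only once per group.
rows = [
    {"id": "101", "date": "2015-06-14", "glucose": "149.3"},
    {"id": "102", "date": "", "glucose": "95.3"},
    {"id": "103", "date": "2015-06-18", "glucose": "97.5"},
    {"id": "104", "date": "", "glucose": "117.0"},
]

filled = fill_down(rows, "date")
for row in filled:
    print(row["id"], row["date"], row["glucose"])
```

This has to be done before any sorting, since the fill depends on the original row order.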
I also see this sort of thing for information about different treatments. For example, I recently saw a file like the following:
We’ll talk more about layout shortly, but while it’s sort of clear, to a human, what’s intended above, the computer will have a hard time with it.
You could fill in some of those cells, to make it more clear, but even better would be to make a “tidy” version of the data (more on what is meant by “tidy” later, as part of the discussion of layout), with each row being one replicate, as follows:
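As a rough illustration of that kind of reshaping, the sketch below turns a made-up layout with one row per strain and treatment and one column per replicate into one row per replicate. The layout, column names, and the `to_tidy` helper are assumptions for the example, not the file described above.

```python
def to_tidy(rows, id_columns, value_name="response"):
    """Turn one-column-per-replicate rows into one row per replicate."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            # Repeat the identifying columns on every output row.
            record = {name: row[name] for name in id_columns}
            record["replicate"] = column
            record[value_name] = value
            tidy.append(record)
    return tidy

# Made-up wide layout: replicates spread across columns.
wide = [
    {"strain": "A", "treatment": "control", "rep1": "0.21", "rep2": "0.35"},
    {"strain": "A", "treatment": "ethanol", "rep1": "0.58", "rep2": "0.61"},
]

tidy = to_tidy(wide, id_columns=("strain", "treatment"))
for record in tidy:
    print(record)
```

Every output row repeats the strain and treatment, so no cell is left empty and the rows can be sorted or filtered freely.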
No empty cells!
Next up: Put just one thing in a cell.
(Previous: Write dates as YYYY-MM-DD.)