Don't use font color or highlighting as data
You might be tempted to highlight particular cells with suspicious
data, or rows that should be ignored. Instead, add another column
with an indicator variable (for example, "trusted", with values TRUE or FALSE).
Here’s an example in which a cell with a suspicious entry is highlighted.
[Figure: spreadsheet in which a cell with a suspicious entry is highlighted.]
It would be better to include an additional column that indicates the outliers. The highlighting is nice visually, but most analysis software ignores cell formatting when it reads a spreadsheet, so that information is hard to extract for use in later analysis.
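To make the benefit concrete, here is a minimal Python sketch of how an indicator column can be used once the data is exported to CSV. The column names (`id`, `glucose`, `trusted`) and the values are made up for illustration and are not taken from the example above. The point is only that a flag stored as data survives the export and can be filtered on directly.

```python
import csv
import io

# Hypothetical data: the "trusted" column replaces cell highlighting.
CSV_TEXT = """id,glucose,trusted
101,134.1,TRUE
102,120.0,TRUE
103,1456.0,FALSE
104,118.3,TRUE
"""

def load_trusted(text):
    """Return only the rows whose 'trusted' flag is TRUE."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["trusted"] == "TRUE"]

trusted_rows = load_trusted(CSV_TEXT)
trusted_ids = [row["id"] for row in trusted_rows]
print(trusted_ids)  # ['101', '102', '104']
```

The suspicious row stays in the file, so nothing is lost; the analysis simply chooses whether to include it.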
Here’s an example with males highlighted in blue and females in
pink. Rather than use highlighting to indicate sex, it’s better to
include a sex column, with values Male or Female.
[Figure: spreadsheet with males highlighted in blue and females in pink.]
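With sex stored in its own column, grouping becomes a one-step operation. The following is a small sketch in Python using only the standard library. The `weight` column, the values, and the helper name `mean_by_group` are hypothetical and chosen purely for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data: sex is recorded as a column rather than a cell color.
CSV_TEXT = """id,sex,weight
1,Male,71.5
2,Female,60.5
3,Female,58.5
4,Male,80.5
"""

def mean_by_group(text, group_col, value_col):
    """Average value_col within each distinct value of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}

mean_weight = mean_by_group(CSV_TEXT, "sex", "weight")
print(mean_weight)  # {'Male': 76.0, 'Female': 59.5}
```

Had sex been encoded only as blue or pink cells, none of this would be possible from the exported file, because the colors are not part of the data.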
Next up: Choose good names for things.
(Previous: No calculations in the raw data files.)