You might be tempted to highlight particular cells with suspicious data, or rows that should be ignored. Instead, add another column with an indicator variable (for example, "trusted", with values TRUE or FALSE).

Here’s an example in which a cell with a suspicious entry is highlighted.

| id  | date       | glucose |
|-----|------------|---------|
| 101 | 2015-06-14 | 149.3   |
| 102 | 2015-06-14 | 95.3    |
| 103 | 2015-06-18 | 97.5    |
| 104 | 2015-06-18 | 1.1     |
| 105 | 2015-06-18 | 108.0   |
| 106 | 2015-06-20 | 149.0   |
| 107 | 2015-06-20 | 169.4   |

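It would be better to include an additional column that indicates the outliers. The highlighting is nice visually, but it's hard to extract that information for later analysis: formatting is lost when the data are exported to a plain-text format such as CSV, and most analysis software ignores it.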

| id  | date       | glucose | outlier |
|-----|------------|---------|---------|
| 101 | 2015-06-14 | 149.3   | FALSE   |
| 102 | 2015-06-14 | 95.3    | FALSE   |
| 103 | 2015-06-18 | 97.5    | FALSE   |
| 104 | 2015-06-18 | 1.1     | TRUE    |
| 105 | 2015-06-18 | 108.0   | FALSE   |
| 106 | 2015-06-20 | 149.0   | FALSE   |
| 107 | 2015-06-20 | 169.4   | FALSE   |
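Because the flag is just another column, analysis code can use it directly. Here is a minimal sketch in Python using only the standard library `csv` module, assuming the table with the `outlier` column has been saved as CSV with the header `id,date,glucose,outlier`. The function name `trusted_glucose` is hypothetical, chosen for illustration.

```python
import csv
import io

# The outlier-flagged table, saved as CSV; "outlier" holds TRUE/FALSE flags.
CSV_TEXT = """id,date,glucose,outlier
101,2015-06-14,149.3,FALSE
102,2015-06-14,95.3,FALSE
103,2015-06-18,97.5,FALSE
104,2015-06-18,1.1,TRUE
105,2015-06-18,108.0,FALSE
106,2015-06-20,149.0,FALSE
107,2015-06-20,169.4,FALSE
"""


def trusted_glucose(csv_text):
    """Return glucose values from rows not flagged as outliers (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The flag is ordinary data, so it survives export and can be filtered on.
    return [
        float(row["glucose"])
        for row in reader
        if row["outlier"].strip().upper() == "FALSE"
    ]


values = trusted_glucose(CSV_TEXT)
mean = sum(values) / len(values)
print(f"{len(values)} trusted values, mean glucose {mean:.1f}")
# prints: 6 trusted values, mean glucose 128.1
```

A highlighted cell, by contrast, leaves nothing in the CSV for this filter to act on.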

Here’s an example with males highlighted in blue and females in pink. Rather than use highlighting to indicate sex, it’s better to include a sex column, with values Male or Female.

| id  | date       | glucose |
|-----|------------|---------|
| 101 | 2015-06-14 | 149.3   |
| 102 | 2015-06-14 | 95.3    |
| 103 | 2015-06-18 | 97.5    |
| 104 | 2015-06-18 | 117.0   |
| 105 | 2015-06-18 | 108.0   |
| 106 | 2015-06-20 | 149.0   |
| 107 | 2015-06-20 | 169.4   |
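With sex stored as a column, grouping by it is a routine operation. The Python sketch below is a minimal illustration using only the standard library; the helper `mean_by_group` is hypothetical, and the `sex` values in it are made up for illustration only. They do not correspond to the colored rows in the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data: the "sex" values are invented for illustration only.
CSV_TEXT = """id,date,glucose,sex
101,2015-06-14,149.3,Male
102,2015-06-14,95.3,Female
103,2015-06-18,97.5,Female
104,2015-06-18,117.0,Male
"""


def mean_by_group(csv_text, group_col, value_col):
    """Average value_col within each distinct value of group_col (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[group_col]
        totals[key] += float(row[value_col])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


means = mean_by_group(CSV_TEXT, "sex", "glucose")
print(means)
```

The same helper works for any other grouping column, which is not possible when the grouping lives only in cell colors.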
