It’s important to pick good names for things. This can be hard, and so it’s worth putting some time and thought into it.

As a general rule, don’t use spaces, either in variable names (that is, the names of the columns in your data) or in file names. They make programming harder: the analyst will need to surround everything in double quotes, like "glucose 6 weeks" rather than just writing glucose_6_weeks. Where you might use spaces, use underscores or perhaps hyphens. But don’t use a mixture of underscores and hyphens; pick one and be consistent.
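As a small illustration of the point about spaces, here is a minimal Python sketch that turns each run of spaces in a name into a single underscore. The helper name `replace_spaces` and the example column names other than "glucose 6 weeks" are made up for this example, and the underscore is just one of the two separators suggested above.

```python
import re

def replace_spaces(name, sep="_"):
    """Trim the name, then replace each run of whitespace with one separator."""
    return re.sub(r"\s+", sep, name.strip())

# Hypothetical column names, as they might come out of a spreadsheet.
columns = ["glucose 6 weeks", "insulin  6 weeks", "sex"]
cleaned = [replace_spaces(c) for c in columns]
print(cleaned)  # ['glucose_6_weeks', 'insulin_6_weeks', 'sex']
```

Names that contain no spaces pass through unchanged, so the helper is safe to apply to every column.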

Be careful about extraneous spaces (say, at the beginning or end of a variable name). “glucose” is different from “glucose ” (with an extra space at the end). Extraneous spaces can cause headaches later.
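To make the problem concrete, the following Python sketch shows that the two names really do compare as different, and lists any names that begin or end with whitespace. The function name `names_with_stray_spaces` and the sample column list are assumptions for illustration only.

```python
def names_with_stray_spaces(names):
    """Return the names that start or end with whitespace."""
    return [n for n in names if n != n.strip()]

# Hypothetical column names; two of them carry an extraneous space.
columns = ["glucose", "glucose ", " weight", "sex"]

print("glucose" == "glucose ")           # False: the trailing space matters
print(names_with_stray_spaces(columns))  # ['glucose ', ' weight']
```

Running a check like this when a file is first read in is one way to catch the problem before it causes a failed lookup or merge.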

Avoid special characters, too. (Except for underscores and hyphens; they’re okay.) Other symbols ($, @, %, #, &, *, (, ), !, etc.) often have special meaning in programming languages, and so they can be harder to handle. They’re also a bit harder to type.
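One way to check variable names against this advice is sketched below in Python. It assumes that "okay" means letters, digits, underscores, and hyphens only; the function name `has_only_safe_characters` is hypothetical, and a check for file names would also need to allow the dot before the extension.

```python
import re

# Assumption: a safe name uses only ASCII letters, digits, underscores, and hyphens.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")

def has_only_safe_characters(name):
    """Return True if the whole name consists of safe characters."""
    return SAFE_NAME.fullmatch(name) is not None

for name in ["max_temp", "cell-type", "Maximum Temp (°C)", "M/F"]:
    print(name, has_only_safe_characters(name))
```

The first two names pass the check; the last two fail because of the spaces, parentheses, degree sign, and slash.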

The main principle in choosing names, whether for variables or for files, is to keep them short but meaningful. So not so short that the meaning is lost.

The Data Carpentry lesson on using spreadsheets has a nice table with good and bad example variable names:

| good name | good alternative | avoid |
|---|---|---|
| Max_temp | MaxTemp1 | Maximum Temp (°C) |
| Precipitation | Precipitation_mm | precmm |
| Mean_year_growth | MeanYearGrowth | Mean growth/year |
| sex | sex | M/F |
| weight | weight | w. |
| cell_type | CellType | Cell type |
| first_observation | Observation_01 | 1st Obs. |

I agree with all of this, though I’d cut down on some of the capitalization: say, max_temp, precipitation, and mean_year_growth.

Finally, never include “final” in a file name. You’ll invariably end up with “final_ver2”. I can’t say that without referring to this PHD comic:

[Figure: PHD Comic on Final.doc, from phdcomics.com]

