Choose good names for things
It’s important to pick good names for things. This can be hard, and so it’s worth putting some time and thought into it.
As a general rule, don’t use spaces, either in variable names (that
is, the names of the columns in your data) or in
file names. They make programming harder: the analyst will need to
surround everything in double quotes, like `"glucose 6 weeks"`, rather
than just writing `glucose_6_weeks`. Where you might use spaces, use
underscores or perhaps hyphens. But don’t use a mixture of underscores
and hyphens; pick one and be consistent.
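As a small illustration of why spaces force quoting, the sketch below uses Python, where `str.isidentifier()` reports whether a string can be written as a bare name in code. The two column names are hypothetical, and other tools have their own rules, so treat this as one example rather than a general test.

```python
# Hypothetical column names, used only for illustration.
with_spaces = "glucose 6 weeks"
with_underscores = "glucose_6_weeks"

# str.isidentifier() reports whether a string can be used as a bare name
# in Python code, that is, without wrapping it in quotes.
print(with_spaces.isidentifier())       # False
print(with_underscores.isidentifier())  # True
```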
Be careful about extraneous spaces (say, at the beginning or end of a
variable name). “`glucose`” is different from “`glucose `” (with an
extra space at the end). Extraneous spaces can cause headaches later.
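The point about extraneous spaces can be shown in a few lines of Python. This is a minimal sketch: the list of names is made up for illustration, and it assumes the names have already been read in as plain strings.

```python
# Hypothetical column names as they might arrive from a spreadsheet export.
names = ["glucose", "glucose ", " insulin"]

# The trailing space makes the second name a different string from the first.
print(names[0] == names[1])  # False

# str.strip() removes whitespace from the beginning and end of each name.
cleaned = [name.strip() for name in names]
print(cleaned)  # ['glucose', 'glucose', 'insulin']
```

After stripping, the first two names are identical, which is usually what the person entering the data intended.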
Avoid special characters, too. (Except for underscores and hyphens;
they’re okay.) Other symbols (`$`, `@`, `%`, `#`, `&`, `*`, `(`, `)`,
`!`, etc.) often have special meaning in programming languages, and so
they can be harder to handle. They’re also a bit harder to type.
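If names that break these rules already exist, one possible way to tidy them is sketched below in Python. The function `clean_name` is a hypothetical helper, not part of any library, and it makes one assumption beyond the advice above: it settles on underscores as the single separator, so hyphens are converted too.

```python
import re


def clean_name(name: str) -> str:
    """Return a name containing only letters, digits, and underscores.

    Hypothetical helper for illustration; it uses underscores as the
    one consistent separator.
    """
    # Remove whitespace at the beginning and end.
    name = name.strip()
    # Replace each run of other characters (spaces, symbols, hyphens)
    # with a single underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", name)
    # Drop any underscores left over at either end.
    return name.strip("_")


print(clean_name("glucose 6 weeks"))  # glucose_6_weeks
print(clean_name(" weight (kg)! "))   # weight_kg
```

A cleanup like this is a fallback; choosing good names when the data are first entered avoids the need for it.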
The main principle in choosing names, whether for variables or for file names, is to keep them short but meaningful. So not too short.
The Data Carpentry lesson on using spreadsheets has a nice table with good and bad example variable names:
good name | good alternative | avoid |
---|---|---|
Max_temp | MaxTemp1 | Maximum Temp (°C) |
Precipitation | Precipitation_mm | precmm |
Mean_year_growth | MeanYearGrowth | Mean growth/year |
sex | sex | M/F |
weight | weight | w. |
cell_type | CellType | Cell type |
first_observation | Observation_01 | 1st Obs. |
I agree with all of this. I’d maybe cut down on some of the
capitalization. So maybe `max_temp`, `precipitation`, and
`mean_year_growth`.
Finally, never include “`final`” in a file name. You’ll invariably end up
with “`final_ver2`”. I can’t say that without referring to this
PHD comic: