Markdown
My preferred way to construct an informal report describing a data analysis project is as a web page. A great advantage is that I don’t need to worry about page breaks and the placement of figures.
Web pages are written in html. But html is cumbersome to write directly, and so for analysis reports, I’ll generally use either Markdown or AsciiDoc. These are two systems for writing simple, readable text, with the sort of marks that you’d use in an email message (for example, **bold** for bold or _italics_ for italics), that can be easily converted to html.
Here, I’ll discuss Markdown. This is a prerequisite for what comes next: R Markdown with knitr.
HTML
It’s helpful to know a bit of html, which is the markup language that web pages are written in. html really isn’t that hard; it’s just cumbersome.
An html document contains pairs of tags to indicate content, like <h1> and </h1> to indicate that the enclosed text is a “level one header”, or <em> and </em> to indicate emphasis (generally italics). A web browser will parse the html tags and render the web page, often using a cascading style sheet (CSS) to define the precise style of the different elements.
But we won’t get into all of that; html is great, but the code is cumbersome to create directly, as it looks something like this:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<h1>Markdown example</h1>
<p>This is a simple example of a Markdown document.</p>
<p>Use a blank line between paragraphs.
You can use a bit of <strong>bold</strong> or <em>italics</em>. Use backticks to indicate
<code>code</code> that will be rendered in monospace.</p>
<p>Here's a list:</p>
<ul>
<li>an item in the list</li>
<li>another item</li>
<li>yet another item</li>
</ul>
<p>You can include blocks of code using three backticks:</p>
<p><code>
x <- rnorm(100)
y <- 2*x + rnorm(100)
</code></p>
<p>Or you could indent four spaces:</p>
<pre><code>mean(x)
sd(x)
</code></pre>
<p>It'll figure out numbered lists, too:</p>
<ol>
<li>First item</li>
<li>Second item</li>
</ol>
<p>And it's easy to create links, like to
the <a href="https://daringfireball.net/projects/markdown/">Markdown</a>
page.</p>
</body>
</html>
Pretty ugly. That’s probably more than you really needed to see. But knowing about html gives you a greater appreciation of Markdown.
Note that there are six levels of headers, with tags <h1>, <h2>, <h3>, …, <h6>. Think of these as the title, section, subsection, sub-subsection, …
A key design principle for creating good html documents (as well as Markdown, AsciiDoc, and LaTeX documents) is that you want to focus on the semantics (i.e., the meaning of elements) rather than the style in which the material is to be presented. So focus on things like “section” or “heading” rather than “large and bold”. There are two reasons for this: you’re giving the web browser more information about the material, and you can revise the style externally, with CSS, without having to go in and revise the html code.
Markdown
As I mentioned above, Markdown is a system for writing simple, readable text that is easily converted into html. The reason it’s useful to know a bit of html is that then you have a better idea how the final product will look. (Plus, if you want to get fancy, you can just insert a bit of html within the Markdown document.)
A Markdown document looks like this:
# Markdown example

This is a simple example of a Markdown document.

Use a blank line between paragraphs.
You can use a bit of **bold** or _italics_. Use backticks to indicate
`code` that will be rendered in monospace.

Here's a list:

- an item in the list
- another item
- yet another item

You can include blocks of code using three backticks:

```
x <- rnorm(100)
y <- 2*x + rnorm(100)
```

Or you could indent four spaces:

    mean(x)
    sd(x)

It'll figure out numbered lists, too:

1. First item
2. Second item

And it's easy to create links, like to
the [Markdown](https://daringfireball.net/projects/markdown/)
page.
That bit of Markdown text gets converted to the html code in the previous section. (Here is the source file and the derived html file.)
I hope the markup is reasonably self-explanatory. Markdown is just a system of marks that will get searched-and-replaced to create an html document. A big advantage of the Markdown marks is that the source document is much like what you might write in an email, and so it’s much more human-readable.
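To make the search-and-replace idea concrete, here is a toy sketch in R that handles only the bold and italics marks in a single line of text. It is just an illustration of the principle: real Markdown converters parse the whole document structure (lists, code blocks, links), and the regular expressions below are my own simplification, not what any actual converter uses.

```r
# Toy illustration only: replace **bold** and _italics_ marks with html tags.
# A real converter does much more than this.
md <- "You can use a bit of **bold** or _italics_."

# **text** becomes <strong>text</strong> (non-greedy match, so perl = TRUE)
html <- gsub("\\*\\*(.+?)\\*\\*", "<strong>\\1</strong>", md, perl = TRUE)

# _text_ becomes <em>text</em>
html <- gsub("_(.+?)_", "<em>\\1</em>", html, perl = TRUE)

# wrap the line in paragraph tags
html <- paste0("<p>", html, "</p>")
cat(html, "\n")
```

The result has the same tags as the corresponding line of the html example shown earlier.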
Take a look at the Markdown basics page, and the more complete Markdown syntax, or just the Markdown cheatsheet.
Converting Markdown to html
You can skip this section and move on to knitr with R Markdown, but for completeness let me explain how to convert a Markdown document to html.
Via RStudio
If you use RStudio, the simplest way to convert a Markdown document to html is to open the document within RStudio. You’ll see a “Preview HTML” button just above the document. Click that, and another window will open, with a preview of the result. (The resulting .html file will be placed in the same directory as your .md file.) You can click “Open in browser” to open the document in your web browser, or “Publish” to publish the document to the web (where it will be viewable by anyone).
Another nice feature in RStudio: when you open a Markdown document, you’ll see a little button with a question mark. Click that, and then “Markdown Quick Reference,” and you’ll get a cheat-sheet on the Markdown syntax. Like @StrictlyStat, I seem to visit the Markdown site almost every time I’m writing a Markdown document. If I used RStudio, I’d have easier access to this information.
Via the command line
Markdown is a formatting syntax, but it’s also a software tool; in particular, it’s a Perl script. So one approach to converting a Markdown document to html is to download and use that Perl script.
But I prefer to use the markdown package for R.
Within R, you can install the package with install.packages("markdown"). Then load it with library(markdown). And then convert a Markdown document to html with
markdownToHTML('markdown_example.md', 'markdown_example.html')
In practice, I do this on the command line, like so:
R -e "markdown::markdownToHTML('markdown_example.md', 'markdown_example.html')"
(Note that in Windows, it’s important to use double-quotes on the outside and single-quotes inside, rather than the other way around.)
Rather than actually type that line, I include it within a GNU make file, like this one. (Also see my minimal make tutorial.)
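If you’d rather stay within R than use make, something similar could be sketched as a small function that converts every .md file in a directory, skipping any whose html file is already up to date. This assumes the markdown package is installed; convert_all() is a hypothetical helper name of my own, not part of the package, and it is only a rough stand-in for what make does.

```r
# Sketch: convert each .md file in a directory to html, but only if the
# html file is missing or older than the Markdown source.
# convert_all() is a hypothetical helper, not part of the markdown package.
convert_all <- function(dir = ".") {
  md_files <- list.files(dir, pattern = "\\.md$", full.names = TRUE)
  converted <- character(0)
  for (md in md_files) {
    html <- sub("\\.md$", ".html", md)
    # up to date if the html file exists and is at least as new as the source
    up_to_date <- file.exists(html) && file.mtime(html) >= file.mtime(md)
    if (!up_to_date) {
      markdown::markdownToHTML(md, html)
      converted <- c(converted, html)
    }
  }
  # return (invisibly) the html files that were regenerated
  invisible(converted)
}
```

Calling convert_all() in the directory containing markdown_example.md would then regenerate markdown_example.html only when the source has changed.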
RStudio uses the rmarkdown package to convert from Markdown to html. This uses pandoc for the actual conversion. The RStudio Desktop software includes pandoc, so if you install RStudio, you won’t need to install pandoc separately; you just need to include it within your PATH. On a Mac, you’d use:
export PATH=$PATH:/Applications/RStudio.app/Contents/MacOS/pandoc
In Windows, you’d include "c:\Program Files\RStudio\bin\pandoc" in your Path system environment variable. (For example, see this page, though it’s a bit ad-heavy.)
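Before running the conversion, it can help to confirm from within R that pandoc is actually visible. The following is a minimal sketch using base R’s Sys.which(); the second part assumes the rmarkdown package is installed. Note that rmarkdown may also find pandoc by other means than your PATH, so the two checks won’t necessarily agree.

```r
# Sketch: check whether pandoc can be found on the PATH.
pandoc_path <- Sys.which("pandoc")   # "" if pandoc is not on the PATH
if (nzchar(pandoc_path)) {
  message("pandoc found at: ", pandoc_path)
} else {
  message("pandoc not found; check your PATH setting")
}

# If the rmarkdown package is installed, ask it whether it can find pandoc.
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  print(rmarkdown::pandoc_available())
}
```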
To convert your Markdown document to HTML, you’d then use
R -e "rmarkdown::render('markdown_example.md')"
(I still sort of prefer the markdown package to the use of the rmarkdown package and pandoc; the output file is a lot larger with the latter. But it’s best to follow the RStudio folks on this.)
Up next
Now go to the heart of this tutorial, knitr with R Markdown.