We share, receive and see URLs all the time, so it’s nice to have human-readable URLs like example.tld/yeah instead of example.tld/yeah.html.
On this blog, which uses Vitepress in an Nginx environment, it works rather well (with a minor trade-off).
Vitepress cleanUrls option on an Nginx server
Setting Vitepress cleanUrls to true removes the trailing .html from URLs. The internal links of your Vitepress app change from <a href="/something.html"> to <a href="/something">.
However, cleanUrls does not change the file names or the folder structure of the generated HTML files, which means your server must now be able to serve the /something.html file when the /something URL is requested.
Here’s how I solved it, assuming the app is hosted on its own domain or subdomain (example.tld or subdomain.example.tld; I did not test example.tld/path/to/app).
At the time of writing, I’m using Vitepress 1.0.0-rc.24 and Nginx 1.25.
Full configuration
If you don’t want to read the step-by-step section, here are the important parts of the configuration. Adapt them to your needs and make sure to understand the trade-off for index pages.
server {
    index index.html;
    rewrite ^(.+)/$ $1 permanent;
    if ($request_uri ~ ^/(.*)index\.html(\?|$)) {
        return 301 /$1;
    }
    if ($request_uri ~ ^/(.*)\.html(\?|$)) {
        return 301 /$1;
    }
    location / {
        error_page 404 /404.html;
        try_files $uri $uri.html $uri/ =404;
    }
}
The same, with code comments:
server {
    index index.html;
    # and other things…
    # Remove the trailing slash (permanent 301 redirect).
    rewrite ^(.+)/$ $1 permanent;
    # Remove the trailing `index.html`.
    if ($request_uri ~ ^/(.*)index\.html(\?|$)) {
        return 301 /$1;
    }
    # Remove the trailing `.html`.
    if ($request_uri ~ ^/(.*)\.html(\?|$)) {
        return 301 /$1;
    }
    location / {
        # When the HTTP status code is 404, answer with the `/404.html` file.
        error_page 404 /404.html;
        # When `foo/bar` (which is `$uri`) is requested,
        # try to serve the first existing file among the list:
        # `foo/bar`, `foo/bar.html` or `foo/bar/index.html`.
        # Otherwise answer with a 404 code.
        try_files $uri $uri.html $uri/ =404;
    }
}
Step-by-step
The first step is enough to get clean URLs; the following steps add refinements and safeguards.
Step 1: the gist
Make sure your Nginx location block has this try_files rule:
server {
    index index.html;
    # and other things…
    location / {
        # When `foo/bar` (which is `$uri`) is requested,
        # try to serve the first existing file among the list:
        # `foo/bar`, `foo/bar.html` or `foo/bar/index.html`.
        # Otherwise answer with a 404 code.
        try_files $uri $uri.html $uri/ =404;
    }
}
The try_files list in detail:
- $uri is for exact matches like assets: when the browser requests logo.svg, we search for a logo.svg file.
- $uri.html adds a missing .html (foo/bar becomes foo/bar.html), which is the reverse operation of Vitepress cleanUrls.
- $uri/ adds a trailing / (foo/bar/) to instruct Nginx to “look into the foo/bar directory”, so Nginx ends up searching for foo/bar/index.html, as defined by the index directive. This one is needed for the root of the website (example.tld must serve example.tld/index.html).
- =404 throws a 404 status if none of these files can be found.
At this step, most paths are working with and without .html:
| Markdown | Valid paths | Invalid paths |
|---|---|---|
| /index.md | /, /index.html | - |
| /notes.md | /notes, /notes.html | /notes/ (403) |
| /notes/my-note.md | /notes/my-note, /notes/my-note/, /notes/my-note.html | - |
| - | anything not found | Nginx 404 |
We need 3 improvements:
- Vitepress 404 page (see step 2) instead of Nginx 404 message;
- a fix for the 403 when there’s a trailing slash (step 3);
- no more paths with .html (step 4).
Step 2: the 404 page
By default, Vitepress comes with a 404.html page. Let’s serve it using the error_page directive:
server {
    index index.html;
    # and other things…
    location / {
        # When the HTTP status code is 404, answer with the `/404.html` file.
        error_page 404 /404.html;
        try_files $uri $uri.html $uri/ =404;
    }
}
Step 3: avoiding the 403 errors
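To avoid the 403 on paths with a trailing slash (e.g. my-blog.com/notes/), we can redirect them to their version without the trailing slash using the rewrite directive in the server block. The root path / is left untouched, because the pattern requires at least one character before the final slash.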
server {
    index index.html;
    # and other things…
    # Remove the trailing slash (permanent 301 redirect).
    rewrite ^(.+)/$ $1 permanent;
    location / {
        error_page 404 /404.html;
        try_files $uri $uri.html $uri/ =404;
    }
}
WARNING
Note that the permanent keyword does a 301 redirect, which is permanent and might be cached for a very long time by browsers and search engines. If you are not sure about this rewrite rule, consider the redirect keyword to get a 302 (temporary) redirect instead of a 301.
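For reference, here is what the temporary variant could look like. This is only a sketch that assumes the same server block as in this step, with permanent swapped for redirect:

```nginx
server {
    index index.html;
    # and other things…

    # Remove the trailing slash with a temporary (302) redirect.
    # Swap `redirect` for `permanent` once the behaviour is confirmed.
    rewrite ^(.+)/$ $1 redirect;

    location / {
        error_page 404 /404.html;
        try_files $uri $uri.html $uri/ =404;
    }
}
```

Starting with the 302 makes it easier to change your mind, since browsers are much less likely to hold on to a temporary redirect.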
Step 4: redirect /slug.html to /slug
For now, /slug and /slug.html are both working, so let’s redirect the one with the trailing .html.
https://stackoverflow.com/questions/38228393/nginx-remove-html-extension
server {
    # other things…
    rewrite ^(.+)/$ $1 permanent;
    # Remove the trailing `index.html`.
    if ($request_uri ~ ^/(.*)index\.html(\?|$)) {
        return 301 /$1;
    }
    # Remove the trailing `.html`.
    if ($request_uri ~ ^/(.*)\.html(\?|$)) {
        return 301 /$1;
    }
    location / {
        error_page 404 /404.html;
        try_files $uri $uri.html $uri/ =404;
    }
}
WARNING
As explained in the warning of the previous step, you might prefer return 302 over return 301.
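A sketch of the temporary variant of the two rules from this step, assuming they stay in the server block exactly where the 301 versions were:

```nginx
# Inside the `server` block, in place of the 301 rules.

# Remove the trailing `index.html` with a temporary (302) redirect.
if ($request_uri ~ ^/(.*)index\.html(\?|$)) {
    return 302 /$1;
}

# Remove the trailing `.html` with a temporary (302) redirect.
if ($request_uri ~ ^/(.*)\.html(\?|$)) {
    return 302 /$1;
}
```

Only the status code changes; the patterns and the redirect targets are the same as above.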
The trade-off for index pages
If you have index pages like notes/index.md, you have to move them to notes.md, otherwise you’ll experience an infinite redirection loop where Nginx wants to remove the slash, but Vitepress wants to add it back.
In other words, do this:
.
├─ index.md
├─ notes
│ ├─ index.md // [!code --]
│ ├─ post-1.md
│ └─ post-2.md
├─ about.md
└─ notes.md // [!code ++]
Readings
Aside from the Nginx documentation for the HTTP core module, I was helped by:
- NGINX remove .html extension (StackOverflow)
- Routing access failure after server Nginx deployment (GitHub issue on Vitepress repository): this was where I started before deciding to write this post.