htDig Search Engine Installation on Debian 12


by Ramses Soto-Navarro ramses@sotosystems.com, 6/12/2025


Overview

htDig is a lightweight web search engine written in C++. I have been using it for almost 20 years. It makes searching large amounts of text easy, which is great for IT administrators who need to search their own technical notes. htDig can also parse PDF, DOC, and similar file types. It runs as a CGI program. It is an old and simple web search engine, but it still works well and is easy to set up once you know how. Here I have put together the steps for a quick setup.


Debian 12 Apache Setup

Install Debian 12.11 x86_64. The hostname is deb1.example.com.

Install Apache and enable/verify CGI:

# apt install -y apache2 apache2-doc w3m rsync
# a2enmod cgi
# a2enmod autoindex
# a2enmod rewrite
# systemctl enable --now apache2
# systemctl restart apache2
# apachectl -M | grep -i -E "cgi|rewrite|index"
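
The grep above is checked by eye. As a small sketch, the same check can be scripted so that it fails loudly when a module is missing. The helper name `check_modules` is made up for this example; it reads `apachectl -M` style output on stdin and takes module names as extended regular expressions. On a threaded MPM, Debian's `a2enmod cgi` may enable `cgid` instead of `cgi`, hence the `cgid?` pattern in the usage line.

```shell
# check_modules: hypothetical helper, not part of Apache.
# Reads "apachectl -M" style output on stdin and reports every module
# pattern (extended regex) given as an argument that is not loaded.
check_modules() {
    loaded=$(cat)
    status=0
    for mod in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -E -q "^[[:space:]]*(${mod})_module"; then
            echo "missing: ${mod}"
            status=1
        fi
    done
    return "$status"
}
```

Usage, as root:

# apachectl -M 2>/dev/null | check_modules 'cgid?' rewrite autoindex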

Set up your directory of technical text notes and your web directory. Here, as a test, we'll fill the notes directory with HTML files from the Apache manual.

# mkdir -p /u1/howto
# mkdir -p /u1/www/deb1
# rsync -a /usr/share/doc/apache2-doc/manual/en/ /u1/howto/
# echo '<p>Welcome to deb1.example.com</p> <p>Search <a href="http://deb1.example.com/howto/search.html">here</a></p>' > /u1/www/deb1/index.html

Verify that your DNS or /etc/hosts is set up for name resolution.

# ping deb1.example.com
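
A ping only shows that the name resolves somewhere and the host answers. If you rely on /etc/hosts, the sketch below prints the address a hosts-format file assigns to a name. The helper name `hosts_entry` is an assumption for this example, and the file argument defaults to /etc/hosts.

```shell
# hosts_entry: hypothetical helper.
# Prints the first address that a hosts-format file maps to the given name
# and returns non-zero when the name is not listed.
hosts_entry() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        {
            for (i = 2; i <= NF; i++) {
                if ($i == n) { print $1; found = 1; exit }
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

Usage:

# hosts_entry deb1.example.com

To check the full resolver path (hosts file and DNS), `getent hosts deb1.example.com` is the usual tool.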

Apache Virtual Host Setup

Create and test the Apache virtual host. Configure it to show an index listing at http://deb1.example.com/howto/

# cat > /etc/apache2/sites-available/deb1.example.com.conf <<'EOF'
RewriteEngine on
<VirtualHost *:80>
	ServerAdmin admin@example.com
	DocumentRoot /u1/www/deb1
	ServerName deb1.example.com
	#  Redirect "/" "https://deb1.example.com"

	<Directory "/u1/www/deb1">
		Options Indexes FollowSymLinks
		AllowOverride All
		Require all granted
	</Directory>

	Alias /howto/ "/u1/howto/"

	<Directory "/u1/howto">
		Options Indexes FollowSymLinks
		AllowOverride All
		Require all granted
	</Directory>

	ErrorLog ${APACHE_LOG_DIR}/deb1.example.com.error.log
	CustomLog ${APACHE_LOG_DIR}/deb1.example.com.access.log combined

	Include conf-available/serve-cgi-bin.conf
</VirtualHost>
EOF
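
One detail of the heredoc deserves care. With an unquoted delimiter (`<<EOF`) the shell expands `${APACHE_LOG_DIR}` before the text reaches the file, and in an interactive root shell that variable is normally unset, so the log paths lose their directory. Quoting the delimiter (`<<'EOF'`) writes the text literally, so Apache can expand the variable itself from its own environment. A minimal demonstration with a shortened directive:

```shell
# Simulate a root shell in which APACHE_LOG_DIR is not defined.
unset APACHE_LOG_DIR

# Unquoted delimiter: the shell substitutes the (empty) variable.
unquoted=$(cat <<EOF
ErrorLog ${APACHE_LOG_DIR}/error.log
EOF
)

# Quoted delimiter: the text is written exactly as typed.
quoted=$(cat <<'EOF'
ErrorLog ${APACHE_LOG_DIR}/error.log
EOF
)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

After writing the file, `grep Log /etc/apache2/sites-available/deb1.example.com.conf` shows which form ended up on disk.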

NOTE: `Alias /howto "/u1/howto"` and `Alias /howto/ "/u1/howto/"` map the same directory, but notice the trailing slashes. With the trailing slashes, the alias only matches URLs that include the slash, so you'll have to open http://deb1.example.com/howto/ instead of http://deb1.example.com/howto. I have had to troubleshoot this more than once in the past.

Enable the site, disable the default site, remove any previous index files, restart Apache, and test the virtual host:

# ln -s /etc/apache2/sites-available/deb1.example.com.conf /etc/apache2/sites-enabled/deb1.example.com.conf
# unlink /etc/apache2/sites-enabled/000-default.conf
# rm -f /u1/howto/{index.html,.htaccess}

# systemctl restart apache2
# systemctl status apache2
$ w3m -dump http://deb1.example.com
$ w3m -dump http://deb1.example.com/howto/

htDig Setup

Install htdig; add the htdig and cgi-bin links:

# apt install -y htdig
# ln -s /var/lib/htdig/www /u1/www/deb1/htdig
# ln -s /usr/lib/cgi-bin /u1/www/deb1/cgi-bin

Configure htdig and build the index:

# vi /etc/htdig/htdig.conf
start_url: http://deb1.example.com/howto/

# rundig -v -s -c /etc/htdig/htdig.conf

# htstat
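
If you prefer not to edit the file by hand, the `start_url` change can be scripted. This is only a sketch: the helper name `set_start_url` is made up, and it assumes `start_url` sits on a single line (no backslash continuation) and that the URL contains no `|` or `&` characters.

```shell
# set_start_url: hypothetical helper.
# Replaces the start_url line in an htdig configuration file, or appends
# one if the attribute is not set yet. File permissions are preserved
# because the original file is overwritten in place, not replaced.
set_start_url() {
    conf=$1
    url=$2
    if grep -q '^start_url:' "$conf"; then
        tmp=$(mktemp) || return 1
        sed "s|^start_url:.*|start_url: $url|" "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        printf 'start_url: %s\n' "$url" >> "$conf"
    fi
}
```

Usage, followed by the same `rundig` call as above:

# set_start_url /etc/htdig/htdig.conf http://deb1.example.com/howto/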

Copy the search.html web page. Test searching for the word “cgi”. It should return about 80 matches. Click on a match to verify that it works:

# cp /usr/share/doc/htdig/examples/search.html /u1/howto/
# w3m http://deb1.example.com/howto/search.html

Scroll to the bottom and verify that the scroll icons work.


htaccess Authentication Setup

Secure your howto documentation directory by password-protecting /howto/. As a last step, enable SSL encryption.

Create a user and password:

# htpasswd -c /etc/apache2/.htpasswd user1
# chmod 0644 /etc/apache2/.htpasswd

Add an .htaccess file to the directory to prompt for authentication, then restart Apache:

# cat >> /u1/howto/.htaccess <<EOF
AuthType Basic
AuthName "Authentication Required"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
EOF

# systemctl restart apache2

Open a new browser session and test your search. It should prompt you for the username and password the first time. You should be able to search for keywords such as ‘rewrite’ or ‘cgi’.

$ w3m http://deb1.example.com/howto/search.html
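
Once /howto/ requires a login, the indexer has to authenticate too, otherwise later `rundig` runs will likely receive 401 responses and index nothing. To my knowledge ht://Dig has an `authorization` attribute that takes `username:password` for Basic authentication; verify this against the attribute reference for your version before relying on it. The sketch below sets it with a hypothetical helper, `set_htdig_auth`. Keep in mind that the password ends up in clear text in the configuration file.

```shell
# set_htdig_auth: hypothetical helper.
# Sets the "authorization" attribute (assumed format: username:password)
# in an htdig configuration file, replacing any earlier value.
set_htdig_auth() {
    conf=$1
    creds=$2
    [ -f "$conf" ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except a previous authorization line.
    grep -v '^authorization:' "$conf" > "$tmp" || true
    printf 'authorization: %s\n' "$creds" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Usage, with PASSWORD standing in for the real password of user1:

# set_htdig_auth /etc/htdig/htdig.conf 'user1:PASSWORD'
# rundig -v -s -c /etc/htdig/htdig.conf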

Cron Setup for htDig

Run `rundig` periodically from a cron job to keep the index current, here every hour:

# crontab -e
1 */1 * * * rundig -s -c /etc/htdig/htdig.conf
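
An hourly schedule can overlap with the previous run if indexing a large notes directory takes longer than an hour. One hedged way to guard against that, if your `rundig` does not already do so, is `flock` from util-linux. This is not part of the original setup; the helper name `run_locked` and the lock file path are assumptions for illustration.

```shell
# run_locked: hypothetical wrapper.
# Runs a command only if no other process holds the lock file.
# flock -n gives up immediately (non-zero exit) instead of waiting.
run_locked() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}
```

The same idea written directly as a crontab entry:

1 * * * * flock -n /run/lock/rundig.lock rundig -s -c /etc/htdig/htdig.conf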

Troubleshoot

Adding `Redirect "/" "https://deb1.example.com"` to the virtual host breaks rundig indexing, and your htstat will report 0 indexed files. So, do not redirect your main page. Many modern browsers try https first anyway.


Summary

You should now be able to quickly search through your own documentation. htDig is a great tool for local, work-related searching of your own technical notes. If you are hosting online, be sure to configure Apache properly and run the search encrypted under an SSL session.


The End.