htDig Search Engine Installation on Debian 12
by Ramses Soto-Navarro ramses@sotosystems.com, 6/12/2025
Overview
htDig is a lightweight web search engine written in C++. I’ve been using it for almost 20 years. It makes it easy to search large amounts of text, which is great for IT administrators who need to search their own technical notes. htDig can also parse PDF, DOC, and other file types. It uses CGI. It is an old and simple web search engine, but it still works well and is simple to set up once you know how. Here I put together some steps for a quick setup.
Debian 12 Apache Setup
Install Debian 12.11 x86_64. The hostname is deb1.example.com.
Install Apache and enable/verify CGI:
# apt install -y apache2 apache2-doc w3m rsync
# a2enmod cgi
# a2enmod autoindex
# a2enmod rewrite
# systemctl enable --now apache2
# systemctl restart apache2
# apachectl -M | grep -i -E "cgi|rewrite|index"
Set up your directory of technical text notes and your web directory. Here, as a test, we’ll fill the notes directory with the HTML files from the Apache manual.
# mkdir -p /u1/howto
# mkdir -p /u1/www/deb1
# rsync -a /usr/share/doc/apache2-doc/manual/en/ /u1/howto/
# echo "<p>Welcome to deb1.example.com</p> <p>Search <a href=http://deb1.example.com/howto/search.html>here</a>" > /u1/www/deb1/index.html
Verify that your DNS or /etc/hosts is set up for name resolution.
# ping deb1.example.com
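If the ping fails, one thing worth checking is whether /etc/hosts actually carries the name. The sketch below is only an illustration: `has_host_entry` is a hypothetical helper name, and it inspects a hosts-format file only, so it says nothing about DNS (for that, `getent hosts deb1.example.com` is a common quick test).

```shell
#!/bin/sh
# has_host_entry FILE NAME
# Succeeds if NAME appears as a hostname or alias on a non-comment
# line of a hosts-format FILE. (Hypothetical helper, not part of Debian.)
has_host_entry() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments, including whole-line ones
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

A possible use is `has_host_entry /etc/hosts deb1.example.com || echo "deb1.example.com is missing from /etc/hosts"`.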
Apache Virtual Host Setup
Create and test the Apache virtual host. Configure it to show an index listing at http://deb1.example.com/howto/
# cat >> /etc/apache2/sites-available/deb1.example.com.conf <<'EOF'
RewriteEngine on
<VirtualHost *:80>
    ServerAdmin admin@example.com
    DocumentRoot /u1/www/deb1
    ServerName deb1.example.com
    # Redirect "/" "https://deb1.example.com"
    <Directory "/u1/www/deb1">
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
    Alias /howto/ "/u1/howto/"
    <Directory "/u1/howto">
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/deb1.example.com.error.log
    CustomLog ${APACHE_LOG_DIR}/deb1.example.com.access.log combined
    Include conf-available/serve-cgi-bin.conf
</VirtualHost>
EOF
NOTE: `Alias /howto "/u1/howto"` maps the same directory as `Alias /howto/ "/u1/howto/"`, but notice the trailing slashes. With the trailing-slash form you have to open http://deb1.example.com/howto/ instead of http://deb1.example.com/howto. I have lost time troubleshooting because of that in the past.
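To make the trailing-slash behavior concrete, here is a simplified model of the prefix rule written as a shell function. It is an illustration only, not Apache's implementation, and `alias_matches` is a made-up name: an alias ending in a slash only matches request paths that include that slash, while an alias without one matches the bare path and anything below it.

```shell
#!/bin/sh
# alias_matches ALIAS_PATH REQUEST_PATH
# Simplified model of how an Alias URL-path is compared with a request path.
# (Illustrative sketch; the real matching is done inside Apache's mod_alias.)
alias_matches() {
    alias_path=$1 request=$2
    case $alias_path in
        */) # Alias written with a trailing slash: the request must contain it
            case $request in "$alias_path"*) return 0 ;; esac ;;
        *)  # Alias without a trailing slash: exact path or anything under it
            case $request in "$alias_path"|"$alias_path"/*) return 0 ;; esac ;;
    esac
    return 1
}
```

In this model `alias_matches /howto/ /howto` fails, which corresponds to the case where opening the URL without the trailing slash does not reach the notes directory.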
Enable the site, disable the default site, remove any previous index files, restart Apache, and test the virtual host:
# ln -s /etc/apache2/sites-available/deb1.example.com.conf /etc/apache2/sites-enabled/deb1.example.com.conf
# unlink /etc/apache2/sites-enabled/000-default.conf
# rm -f /u1/howto/{index.html,.htaccess}
# systemctl restart apache2
# systemctl status apache2
$ w3m -dump http://deb1.example.com
$ w3m -dump http://deb1.example.com/howto/
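A typo in the virtual host file can leave Apache down after a restart, so it may be worth running a syntax check first. The wrapper below is a minimal sketch; `restart_apache_if_valid` is a hypothetical name, while `apachectl configtest` and `systemctl restart apache2` are the stock commands.

```shell
#!/bin/sh
# restart_apache_if_valid
# Restart Apache only when the configuration passes the syntax check.
# (Hypothetical wrapper; run as root.)
restart_apache_if_valid() {
    if apachectl configtest; then
        systemctl restart apache2
    else
        echo "Apache configuration test failed; not restarting" >&2
        return 1
    fi
}
```

On Debian, `a2ensite deb1.example.com` and `a2dissite 000-default` should be equivalent to the manual `ln -s` and `unlink` steps above.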
htDig Setup
Install htdig; add the htdig and cgi-bin links:
# apt install -y htdig
# ln -s /var/lib/htdig/www /u1/www/deb1/htdig
# ln -s /usr/lib/cgi-bin /u1/www/deb1/cgi-bin
Configure and index htdig:
# vi /etc/htdig/htdig.conf
start_url: http://deb1.example.com/howto/
# rundig -v -s -c /etc/htdig/htdig.conf
# htstat
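If you prefer not to edit the file by hand, the `start_url` line can be set from a script. This is a sketch under a few assumptions: `set_start_url` is a hypothetical helper, `start_url` is assumed to sit on a single line in your htdig.conf, GNU sed (the Debian default) is assumed for `-i`, and the URL must not contain `|` or `&`.

```shell
#!/bin/sh
# set_start_url CONF URL
# Replace the start_url line in an htdig configuration file, or append
# one if the file has none. (Hypothetical helper; back up CONF first.)
set_start_url() {
    conf=$1 url=$2
    if grep -q '^start_url:' "$conf"; then
        # '|' is the sed delimiter so the slashes in the URL need no escaping
        sed -i "s|^start_url:.*|start_url: $url|" "$conf"
    else
        printf 'start_url: %s\n' "$url" >> "$conf"
    fi
}
```

For this setup the call would be `set_start_url /etc/htdig/htdig.conf http://deb1.example.com/howto/`, followed by the `rundig` command above.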
Copy the search.html web page. Test searching for the word “cgi”. It should return about 80 matches. Click on a match to verify that it works:
# cp /usr/share/doc/htdig/examples/search.html /u1/howto/
# w3m http://deb1.example.com/howto/search.html
Scroll to the bottom and verify that the scroll icons work.
htaccess Authentication Setup
Secure your howto documentation directory: password-protect the /howto/ directory, then enable SSL encryption.
Create a user and password:
# htpasswd -c /etc/apache2/.htpasswd user1
# chmod 0644 /etc/apache2/.htpasswd
Add .htaccess to the directory to prompt for authentication. Restart Apache:
# cat >> /u1/howto/.htaccess <<EOF
AuthType Basic
AuthName "Authentication Required"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
EOF
# systemctl restart apache2
Open a new browser session and test your search. It should prompt you the first time for the username and password. You should be able to search for keywords such as ‘rewrite’ or ‘cgi’.
$ w3m http://deb1.example.com/howto/search.html
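The password prompt can also be checked without a browser by looking at HTTP status codes. This sketch assumes curl is installed (it is not part of the packages installed earlier; `apt install curl` would add it), and `http_status` is a hypothetical helper name.

```shell
#!/bin/sh
# http_status URL [USER:PASSWORD]
# Print only the HTTP status code returned for URL, optionally sending
# Basic authentication credentials. (Hypothetical helper; assumes curl.)
http_status() {
    if [ -n "${2:-}" ]; then
        curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
    else
        curl -s -o /dev/null -w '%{http_code}' "$1"
    fi
}
```

With the .htaccess file in place, `http_status http://deb1.example.com/howto/search.html` should report 401, and the same call with `user1:yourpassword` as the second argument (a placeholder for the password you set) should report 200.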
Cron Setup for htDig
Run `rundig` periodically from a cron job to keep the index up to date; here it runs every hour:
# crontab -e
1 */1 * * * rundig -s -c /etc/htdig/htdig.conf
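On a large notes directory an hourly index run could, in principle, still be going when the next one starts. One way to guard against that is a simple lock around the command. The sketch below uses a lock directory because `mkdir` is atomic; `run_locked` and the lock path in the usage note are hypothetical, and a run killed halfway would leave the lock behind until it is removed by hand.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if LOCKDIR can be created; skip quietly otherwise.
# (Hypothetical helper; a stale LOCKDIR must be removed manually.)
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir; skipping" >&2
        return 0
    fi
    status=0
    "$@" || status=$?      # remember the command's exit status
    rmdir "$lockdir"       # release the lock whether or not it succeeded
    return "$status"
}
```

Saved in a small script that ends with `run_locked /run/lock/rundig.lock rundig -s -c /etc/htdig/htdig.conf`, it could replace the bare `rundig` call in the crontab entry.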
Troubleshoot
Using `Redirect "/" "https://deb1.example.com"` in the virtual host disables the htrun feature, and your htstat will report 0 indexed files. So do not redirect your main page. Many modern browsers try https first anyway.
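When htstat reports 0 indexed files, a quick first check is whether a redirect directive is active in the virtual host file. The function below is a rough sketch: `has_active_redirect` is a hypothetical name, and it only looks for uncommented `Redirect`, `RedirectMatch`, `RedirectPermanent`, or `RedirectTemp` lines in one file, so rewrite rules or redirects in other files are not covered.

```shell
#!/bin/sh
# has_active_redirect CONF
# Succeeds if CONF contains an uncommented Redirect-style directive.
# (Hypothetical helper; checks a single file only.)
has_active_redirect() {
    grep -Eiq '^[[:space:]]*Redirect(Match|Permanent|Temp)?[[:space:]]' "$1"
}
```

For this setup the check would be `has_active_redirect /etc/apache2/sites-available/deb1.example.com.conf && echo "redirect is active"`.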
Summary
You should now be able to quickly search through your own documentation. htDig is a great tool for local, work-related searching of your own technical notes. If you are hosting online, be sure to configure Apache properly and run the search encrypted under an SSL session.
The End.