Introduction
If you ever take the time to look at the address in the URL bar of your web browser, you will often see a string of characters separated by “/” (the forward slash). In a manner similar to the folders used on your computer, a web server uses folders, although we tend to use the term directories because that is the more Unix-like term and most web servers run on a Unix-like OS. When your browser goes to the address www.someaddress.com/dir1/dir2/ you are simply asking the web server to look at what is in the directory called dir2 (which is a subdirectory of dir1) and serve it back to you. A web server will serve back the file index.html in dir2 if it is found; if it is not, it will show the contents of the directory. The latter is a bit of a security risk, so most web server admins (including myself) turn this off.
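As a sketch of what “turning it off” looks like – assuming a fairly standard Apache setup rather than any particular server I run – directory listings are usually disabled with a one-line directive in the site configuration or an .htaccess file:
Options -Indexes
With that in place, a request for a directory that has no index.html returns a “403 Forbidden” instead of a browsable listing.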
There are, however, a number of web servers dotted across the Internet that leave this function on, either deliberately or accidentally. ‘Open directories’ refers to such directories where the contents are visible and even downloadable. Of course the contents can vary from random images through to ebooks or music. There are groups around that like to find and catalogue these open directories, one such example being the r/opendirectories subreddit. Actually, nowadays, these open directories tend to encompass more than just the directories not locked down by Apache web servers and include folders shared by cloud storage services, e.g. Google Drive or OneDrive – anything where files can be viewed/downloaded without authentication.
Concerns
There are of course implications for copyright material being distributed illegally (depending on local laws), but legality and/or morality is beyond the scope of this article, where I will focus on the technical aspects and tools. I will however state that you have to be extremely careful when viewing/downloading unknown materials from these directories, as the files may not be labelled in a way that actually indicates what the contents are. Hence you run the risk of downloading malware or, even worse, illegal/offensive materials (think extreme NSFW). Fortunately I have not been subjected to this myself, but I did read a horror story where somebody came across some disturbing contents that had a lasting effect on them. My advice: if in doubt, move on; and hopefully you can see why I will refrain from linking to any such open directories (whose contents may also change after I have linked to them).
Finding Files and Open Directories
The previously mentioned r/opendirectories subreddit has user submissions of open directories.
Both filepursuit.com and odcrawler.xyz are indexes that allow you to search for specific files in open directories. Anna’s Archive is similar but focussed on eBooks, magazines, comics etc. and even academic papers.
To download files from an open directory individually, you can often just click on the file in a web browser and download it, but downloading multiple files efficiently will require tools.
Downloading Multiple Files
I’ve yet to find a GUI (graphical user interface) application that will simply take a directory URL as input and download all the files within, so I use two applications, both of which are open source.
Listing the Files
First, a list of the URLs of the individual files needs to be obtained, and OpenDirectoryDownloader (by KoalaBear84) will do this. It’s an open source CLI (command line interface) application and you just copy-paste the URL into it.
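For illustration – the exact flags depend on the release you download, so treat this invocation as an assumption and check the project’s README – a scan can be started directly from the command line rather than pasting at the prompt:
OpenDirectoryDownloader --url "http://www.someaddress.com/dir1/dir2/"
Either way, it crawls the directory tree and writes out the list of file URLs it finds.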
It then spits out a text file of URLs which can be used by either of the next two applications.
wget is the first program that many people recommend – it has a CLI as opposed to a GUI, which will put off most people. wget is included with most Linux distributions, but note that Windows PowerShell uses wget as an alias for its own Invoke-WebRequest cmdlet, so on Windows you need to install the real thing separately.
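To sketch how wget would use the list produced above (the filename here is just a placeholder), it can read URLs from a file and recreate the remote directory structure locally:
wget -c -x -i theurlsfile.txt
Here -i reads the URLs from the text file, -x keeps the remote directory layout on disk and -c lets an interrupted download be resumed.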
However, I use aria2 for its simplicity and its support for the HTTP(S), FTP, SFTP, BitTorrent and Metalink protocols. It’s an open-source CLI application and is available for various operating systems. I just stick a copy of the aria2 executable into the same folder as the text file generated by ODD (above) and execute the command:
aria2c -i theurlsfile.txt
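If the server copes with it, the transfer can be sped up by raising the concurrency – the numbers below are just ones I might pick, not recommendations from the aria2 documentation:
aria2c -c -j 8 -x 4 -i theurlsfile.txt
where -j sets the number of parallel downloads, -x the number of connections per server and -c resumes any partially downloaded files.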