{"id":2537,"date":"2024-03-02T17:14:51","date_gmt":"2024-03-02T17:14:51","guid":{"rendered":"https:\/\/www.jameshatton.co.uk\/blog\/?page_id=2537"},"modified":"2024-03-02T17:14:51","modified_gmt":"2024-03-02T17:14:51","slug":"open-directories","status":"publish","type":"page","link":"https:\/\/www.jameshatton.co.uk\/blog\/open-directories\/","title":{"rendered":"Open Directories"},"content":{"rendered":"\n<div class=\"wp-block-ideabox-toc ib-block-toc\" data-anchors='h2,h3,h4,h5,h6' data-collapsable='true' ><div class=\"ib-toc-container ib-toc-list-style-numbers ib-toc-hierarchical ib-toc-expanded\"><div class=\"ib-toc-header\"><div class=\"ib-toc-header-title\">Table of Contents<\/div><div class=\"ib-toc-header-right\"><span class=\"ib-toc-icon-collapse\"><span class=\"dashicon dashicons dashicons-minus\"><\/span><\/span><span class=\"ib-toc-icon-expand\"><span class=\"dashicon dashicons dashicons-plus\"><\/span><\/span><\/div><\/div><div class=\"ib-toc-separator\" style=\"height:2px\"><\/div><div class=\"ib-toc-body\"><ol class=\"ib-toc-anchors\"><\/ol><\/div><\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>If you ever take the time to look at the address in the URL bar of your web browser, you will often see a string of characters separated by &#8220;\/&#8221; (the forward slash). Much like the folders on your computer, a web server uses folders, but we tend to call them directories because that&#8217;s the more Unix-y term and most web servers run on a Unix-like OS. When your browser goes to the address www.someaddress.com\/dir1\/dir2\/, you are simply asking the web server to look at what is in the directory called dir2 (which is a subdirectory of dir1) and serve it back to you. A web server will typically serve back the file index.html in dir2 if it is found; if it is not, and directory listing is enabled, it will show the contents of the directory instead.  
The latter is a bit of a security risk, so most web server admins (including myself) turn this off.<\/p>\n\n\n\n<p>There are, however, a number of web servers dotted across the Internet that leave this function on, either deliberately or accidentally. &#8216;Open directories&#8217; refers to such directories, where the contents are visible and even downloadable. Of course, the contents can vary from random images through to ebooks or music. There are groups that like to find and catalogue these open directories; one example is the <a href=\"https:\/\/www.reddit.com\/r\/opendirectories\">r\/opendirectories<\/a> subreddit. Actually, nowadays, the term tends to encompass more than just the directories not locked down on Apache web servers and includes folders shared by cloud storage services, e.g. Google Drive or OneDrive &#8211; anything where files can be viewed\/downloaded without authentication.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Concerns<\/h2>\n\n\n\n<p>There are, of course, implications for copyrighted material being distributed illegally (depending on local laws), but legality and\/or morality is beyond the scope of this article, where I will focus on the technical aspects and tools. I will, however, state that you have to be extremely careful when viewing\/downloading unknown materials from these directories, as the files may not be labelled in a way that actually indicates what the contents are. Hence you run the risk of downloading malware or, even worse, illegal\/offensive materials (think extreme NSFW). Fortunately I&#8217;ve not been subjected to this myself, but I read a horror story where somebody came across some disturbing contents that had a lasting effect on them. 
My advice: if in doubt, move on. Hopefully you can see why I will refrain from linking to any such open directories (whose contents may also change after I have linked them).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Finding Files and Open Directories<\/h2>\n\n\n\n<p>The previously mentioned <a href=\"https:\/\/www.reddit.com\/r\/opendirectories\">r\/opendirectories<\/a> subreddit has user submissions of open directories.<\/p>\n\n\n\n<p>Both <a href=\"https:\/\/filepursuit.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">filepursuit.com<\/a> and <a href=\"https:\/\/odcrawler.xyz\" target=\"_blank\" rel=\"noreferrer noopener\">odcrawler.xyz<\/a> are indexes that allow you to search for specific files in open directories. <a href=\"https:\/\/annas-archive.org\" target=\"_blank\" rel=\"noreferrer noopener\">Anna&#8217;s Archive<\/a> is similar but focussed on eBooks, magazines, comics etc. and even academic papers.<\/p>\n\n\n\n<p>To download files from an open directory individually, you can often just click on the file in a web browser and download it, but downloading multiple files efficiently requires tools.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Downloading Multiple Files<\/h2>\n\n\n\n<p>I&#8217;ve yet to find a GUI (graphical user interface) application that will simply take a directory as input (e.g. a URL) and download all the files within, so I use two applications, both of which are open source.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Listing the Files<\/h2>\n\n\n\n<p>First, a list of the URIs of the individual files needs to be obtained; <a href=\"https:\/\/github.com\/KoalaBear84\/OpenDirectoryDownloader\/releases\" target=\"_blank\" rel=\"noreferrer noopener\">OpenDirectoryDownloader<\/a> (by KoalaBear84) will do this. 
It&#8217;s an open source CLI application and you just copy-paste the URL into it.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"802\" height=\"193\" data-attachment-id=\"2538\" data-permalink=\"https:\/\/www.jameshatton.co.uk\/blog\/open-directories\/image-6\/\" data-orig-file=\"https:\/\/i0.wp.com\/www.jameshatton.co.uk\/blog\/wp-content\/uploads\/2024\/03\/image.png?fit=802%2C193&amp;ssl=1\" data-orig-size=\"802,193\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/www.jameshatton.co.uk\/blog\/wp-content\/uploads\/2024\/03\/image.png?fit=802%2C193&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/www.jameshatton.co.uk\/blog\/wp-content\/uploads\/2024\/03\/image.png?resize=802%2C193&#038;ssl=1\" alt=\"\" class=\"wp-image-2538\" srcset=\"https:\/\/i0.wp.com\/www.jameshatton.co.uk\/blog\/wp-content\/uploads\/2024\/03\/image.png?w=802&amp;ssl=1 802w, https:\/\/i0.wp.com\/www.jameshatton.co.uk\/blog\/wp-content\/uploads\/2024\/03\/image.png?resize=300%2C72&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.jameshatton.co.uk\/blog\/wp-content\/uploads\/2024\/03\/image.png?resize=768%2C185&amp;ssl=1 768w\" sizes=\"auto, (max-width: 802px) 100vw, 802px\" \/><\/figure>\n<\/div>\n\n\n<p>It then spits out a text file which can be used by either of the next two applications.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.gnu.org\/software\/wget\/\" target=\"_blank\" rel=\"noreferrer 
noopener\">wget<\/a> is the first main program that many people recommend &#8211; it has a CLI (command line interface) as opposed to a GUI (graphical user interface), which will put off most people. wget is an application included on most Linux distributions, but <a href=\"https:\/\/www.educba.com\/powershell-wget\/\" target=\"_blank\" rel=\"noreferrer noopener\">Windows PowerShell uses wget<\/a> as an alias for its <a href=\"https:\/\/learn.microsoft.com\/en-us\/powershell\/module\/microsoft.powershell.utility\/invoke-webrequest?view=powershell-7.4\" target=\"_blank\" rel=\"noreferrer noopener\">Invoke-WebRequest<\/a> cmdlet, which is a different tool from GNU wget.<\/p>\n\n\n\n<p>However, I use <a href=\"https:\/\/github.com\/aria2\/aria2\/releases\" target=\"_blank\" rel=\"noreferrer noopener\">aria2<\/a> for its <a href=\"https:\/\/aria2.github.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">simplicity<\/a> and support for the HTTP(S), FTP, SFTP, BitTorrent, and Metalink protocols. It&#8217;s an open-source CLI application and is available for various operating systems. I just stick a copy of the aria2 executable (aria2c) into the same folder as the text file generated by OpenDirectoryDownloader (above) and execute the command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>aria2c -i theurlsfile.txt<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Introduction If you ever take the time to look at the address in the URL bar of your web browser, you will often see a string of characters separated by &#8220;\/&#8221; (the forward slash). Much like the folders on your computer, a web server uses folders, but we tend to call them directories because that&#8217;s the more Unix-y term and most web servers run on a Unix-like OS. When your browser goes to the address www.someaddress.com\/dir1\/dir2\/, you are simply asking the web server to look at what is in the directory called dir2 (which is a subdirectory of dir1) and serve it back to you. 
A web server will typically serve back the file index.html in dir2 if it is found; if it is not, and directory listing is enabled, it will show the contents of the directory instead. The latter is a bit of a security risk, so most web server admins (including myself) turn this off. There are, however, a number of web servers dotted across the Internet that leave this function on, either deliberately or accidentally. &#8216;Open directories&#8217; refers to such directories, where the contents are visible and even downloadable. Of course, the contents can vary from random[&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"ngg_post_thumbnail":0,"footnotes":""},"class_list":["post-2537","page","type-page","status-publish","hentry"],"featured_image_src":null,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P89zH1-EV","jetpack-related-posts":[],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/pages\/2537","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=2537"}],"version-history":[{"count":1,"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/pages\/2537\/revisions"}],"predecessor-version":[{"id":2539,"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/pages\/2537\/revisions\/2539"}],"wp:attachment":[{"href":"https:\/\/www.jameshatton.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=2537"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}