Here's an interesting equation: Most bands and labels are posting free mp3s of their latest music on their sites. Add to that an army of fans scouring these sites daily, then blogging what they find. The result is a constant stream of new music being discovered, sorted, commented on, and publicized. But how to keep up? For a while, I just visited a couple of interesting and well-written mp3 blogs, but then they'd link to a couple more, and I'd start reading those. And then that happened a few dozen more times. My desire to stay in touch was in conflict with my increasingly limited free time.

Wget to the rescue. It's a utility for unix/linux/etc. that goes and gets stuff from Web and FTP servers -- kind of like a browser, but without actually displaying what it downloads. And since it's one of those awesomely configurable command line programs, there is very little it can't do. So I run wget, give it the URLs to those mp3 blogs, and let it scrape all the new audio files it finds. Then I have it keep doing that on a daily basis, save everything into a big directory, and have a virtual radio station of hand-filtered new music. Neat. Here's how I do it:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i ~/mp3blogs.txt

And here's what this all means:

-r -H -l1 -np
These options tell wget to download recursively. That means it goes to a URL, downloads the page there, then follows every link it finds. The -H tells the app to span domains, meaning it should follow links that point away from the blog. And the -l1 (a lowercase L with a numeral one) means to only go one level deep; that is, don't follow links on the linked site. In other words, these switches work together to ensure that you don't send wget off to download the entire Web -- or at least as much as will fit on your hard drive. Rather, it will take each link from your list of blogs and download it. The -np switch stands for "no parent", which instructs wget to never follow a link up to a parent directory.
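Since the command reads its targets from ~/mp3blogs.txt, a minimal sketch of getting started looks like this. The two URLs are just examples, and the wget line is commented out here since it hits the network:

```shell
# Build the input list: one blog URL per line.
printf '%s\n' \
  'http://music.for-robots.com/' \
  'http://tofuhut.blogspot.com/' > mp3blogs.txt

# The scrape itself (commented out here, since it goes out to the network):
# wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i mp3blogs.txt

cat mp3blogs.txt
```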
We don't, however, want all the links -- just those that point to audio files we haven't yet seen. Including -A.mp3 tells wget to only download files that end with the .mp3 extension. And -N turns on timestamping, which means wget won't download something with the same name unless it's newer. To keep things clean, we'll add -nd, which makes the app save everything it finds in one directory, rather than mirroring the directory structure of the linked sites. And -erobots=off tells wget to ignore the standard robots.txt files. Normally, this would be a terrible idea, since we'd want to honor the wishes of the site owner; however, since we're only grabbing a page or two from each site, we can safely skip them. Also, along the lines of good net citizenship, we'll add the -w5 to wait 5 seconds between each request so as not to pound the poor blogs.

Finally, -i ~/mp3blogs.txt is a little shortcut. Typically, I'd just add a URL to the command line with wget and start the downloading. But since I wanted to visit multiple mp3 blogs, I listed their addresses in a text file (one per line) and told wget to use that as the input.

I put this in a cron job, run it every day, and save everything to a local directory. And since it timestamps, the app only downloads new stuff. I should probably figure out a way to import into iTunes automatically with a script and generate a smart playlist, so I can walk in, hit play, and have the music just go.

The following are a couple of lists of mp3 blogs that you can use to find authors that match your musical tastes. Put their URLs in your text file and off you go.

mp3 blogs: defining fair use
Close Your Eyes: mp3 blogs

How do you find new music?

25 comments so far

1 On 07 Jul 2004, Keith said...
Sounds very interesting -- thanks for the write-up. I'll check it out. So, how do I find new music? Well, a few ways:
1 -- KEXP. You can listen to them online at kexp.org.
2 -- PitchforkMedia.com. There are lots of great reviews in there.
3 -- Live shows. I tend to catch a lot of shows, and many times I get introduced to a new band there.
4 -- Newsgroups. I check various lists on Easynews.com and dip in and sample anything that looks interesting. This is how I discovered Snow Patrol last year.
5 -- Word of mouth. Blogs, such as music (for robots), and then friends. My brothers seem to dig up good recommendations quite a bit.
Of course I then try and share these finds on my own sites. Finding and sharing good music is one of my favorite pleasures in life.

2 On 07 Jul 2004, Adrian Holovaty said...
Jeff, you've invaded the turf of the wget and curl weblog! :-) http://www.superdeluxo.com/wget_curl/index.php

3 On 07 Jul 2004, Patrick said...
Question: Where do the files get saved? You refer to a single directory... but not where it is created. Or defined?

4 On 07 Jul 2004, pb said...
garageband.com is trying to pick up where mp3.com and cnet left off.

5 On 07 Jul 2004, veen said...
The files get saved in the directory from which you issue the command. You can change that by adding a greater-than sign and specifying a directory. For example: wget [all the switches] > ~/jeff/Music/

6 On 07 Jul 2004, JP said...
I didn't have wget with my default install of OS X.3 (Panther). This article, Building wget 1.9 on OS X.2.8 [http://wincent.org/article/articleview/173/1/8], was a great help -- it worked fine with Panther. Thanks a lot for the script, Jeff!

7 On 07 Jul 2004, Ben said...
If you don't want to save all these files, you could just use a Webjay bookmarklet to scrape any page and make a playlist out of it. This assumes, of course, that you are on a live net connection...

8 On 07 Jul 2004, anders said...
you should use "-A.mp3,.ogg" to catch the oggs too.

9 On 07 Jul 2004, JP said...
Regarding Applescripts & playlists, it looks like this might be the start of a useful script: http://www.malcolmadams.com/itunes/scripts/scripts06.php?page=1#droptoaddnmake It'd be great if someone could alter the script to automatically run in tandem with the above shell script so that one could just press play, as Jeff suggested above. (hint, hint)

10 On 07 Jul 2004, sean said...
"wget [all the switches] > ~/jeff/Music/"
Hmm, I use '-P': wget [all the switches] -P ~/Music
Jeff, nice one-liner. Extremely useful.

11 On 07 Jul 2004, Scott said...
This should be very do-able, at least with perl... I don't have osx (dirty windows user am I) but this looks promising: http://www.macdevcenter.com/pub/a/mac/2002/11/22/itunes_perl.html I've taken things a bit further, myself, by making that text file in the form of:
FolderName=URL
Then I parse the file and run wget with a different -P parameter for each blog that I'm scraping. I can sort out better which blogs are more to my liking that way. I could post the (extremely simple) perl script I use to do this if there's interest, or you can email me.

12 On 07 Jul 2004, John Y. said...
I cruise by www.3hive.com once a week or so; all they do is post free, legal mp3 downloads with brief descriptions. Also, they categorize them.

13 On 07 Jul 2004, Tim said...
www.last.fm plays everything from obscure to major label stuff. You tell it what you like. It's kind of like creating your own radio station. Definitely still in beta.

14 On 08 Jul 2004, Richard Earney said...
I think curl is the new wget.

15 On 08 Jul 2004, dekay said...
I think this is about the first application I've seen that screams "Folder actions"!!! But then: how do you do it? Oh, btw: Fusker could help, too :)

16 On 08 Jul 2004, Steve K. said...
I like PureVolume myself (http://www.purevolume.com/).

17 On 08 Jul 2004, paolo said...
Yep, use Webjay.org! You can create playlists of mp3s available on the web, or simply listen to the playlists created by other people.
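Scott's FolderName=URL idea in comment 11 can be sketched in plain shell rather than perl. The file name (blogs.conf) and the dry-run echo are my assumptions, not his actual script:

```shell
# blogs.conf holds one DirName=URL pair per line (the format from comment 11).
cat > blogs.conf <<'EOF'
forrobots=http://music.for-robots.com/
tofuhut=http://tofuhut.blogspot.com/
EOF

# Split each line on '=', make a directory per blog, and scrape into it
# with -P. The 'echo' makes this a dry run; drop it to fetch for real.
while IFS='=' read -r dir url; do
  mkdir -p "$dir"
  echo wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -P "$dir" "$url"
done < blogs.conf
```

Sorting each blog into its own directory also makes it easy to drop a blog later by deleting one line and one folder.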
18 On 08 Jul 2004, Lucas said...
Given an mp3 blog at http://www.redfishaudio.com/samples.html, pick up http://gonze.com/m3udo, then do:
GET 'http://webjay.org/playthispage?x-fmt=m3u&url=http://www.redfishaudio.com/samples.html' | tee redfishaudio.m3u | m3udo wget -
You want to use the playthispage utility for the scraping because figuring out what's a bona fide audio file is very picky and generally intensely buggy. Otherwise you will end up with HTML, troff, pdf, and a bunch of other junk in your playlist, and that will make mp3 players throw an error and stop. Also, this way you get the playlist in most-recent order with duplicates stripped out. :)

19 On 08 Jul 2004, Jeff said...
You can get Win32 ports of wget and other *nix utils here: http://unxutils.sourceforge.net/

20 On 08 Jul 2004, bogg said...
You can't put wget in front of the public audience, it is too good!!! Nice popscraper -- could you do one for ringtones too? http://www.mobile-phone-directory.org/

21 On 08 Jul 2004, Philip Dorrell said...
"Weak Subscriptions", which is part of the Womcat Bookmarks application, does something similar, but includes a popularity ranking.

22 On 09 Jul 2004, Jean Jordaan said...
What about mp3s you've heard and decided once is enough? If you delete the file, wget will download it again. If you leave it, it'll clog up your playlist and disk. Maybe 'echo "" > badfile.mp3' to zero it, and tell wget --no-clobber? (Your mp3 player will probably still list it, though :( )

23 On 09 Jul 2004, sfb said...
And what's in your mp3blogs.txt? It should only be links to blogs, so you should be able to post it, and I would be interested to have a starting point of mp3 blogs.

24 On 09 Jul 2004, veen said...
my mp3blogs.txt:
http://teachingtheindiekidstodanceagain.blogspot.com/
http://www.fatplanet.com.au/
http://music.for-robots.com/
http://tofuhut.blogspot.com/
http://www.scenestars.blogspot.com/
http://blog.largeheartedboy.com/
http://www.livejournal.com/community/talkiewalkie/
http://www.kingblind.com/
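To close the loop on the daily cron job mentioned in the post, the crontab entry might look something like this. The schedule, the directories, and the log file are all assumptions, not Jeff's actual setup:

```shell
# crontab fragment: run the scrape every morning at 6:30,
# saving into ~/Music/mp3blogs and logging to a file (paths are assumptions).
30 6 * * * cd $HOME/Music/mp3blogs && wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -w5 -i $HOME/mp3blogs.txt >> $HOME/mp3blog-scrape.log 2>&1
```

Because -N timestamps the downloads, re-running this every day only pulls in files that are new since the last run.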