The Making of ReoCities, 600K accounts recovered already and counting (reocities.com)
90 points by jacquesm on Oct 28, 2009 | hide | past | favorite | 51 comments


A few random clicks brought me this:

http://www.reocities.com/Area51/1997/rants/oj.htm

How could I have known OJ was involved with covering up the moon landing hoax without GeoCities? I forgot how much fun random readings of GeoCities could be.

I'm partying like it's 1999. Thanks so much for this effort! The epic struggle to save the cities was fascinating. Geek-heroism at its finest.


My favorite so far was this:

Jason Kottke blogged about geocities going down:

http://kottke.org/09/10/walter-millers-home-page

Then someone told him the page was up at reocities and he linked to it, but lamented that one of his favorite pages wasn't up, 'cartoon girls I wanna nail'.

So I got an email from one of his 'fans' asking if I could restore that page pronto, so I did and mailed him :)

http://www.reocities.com/TelevisionCity/1356/cartoon2.htm


> How could I have known OJ was involved with covering up the moon landing hoax without GeoCities? I forgot how much fun random readings of GeoCities could be.

OJ was in Capricorn One, which many of the moon hoax nuts think is a documentary about how NASA did it.

So it makes sense to think that they'd think he'd be involved in the coverup.


Sell Reocities to AOL for $6 billion!


You never know! This may be a good opportunity to launch a startup. Some users may be interested in maintaining their sites, so maybe tools can be offered to help them in that regard. The work done so far is commendable and there is more work yet to be done. It would be a shame if all the work goes unrewarded!


If someone could guarantee say 100 years of webpage hosting and uptime, for a cheap fixed one-time cost, I think they could make a lot of money AND stop the massive page-drain of the Web today.


It's interesting: hosting/bandwidth costs should decrease more or less exponentially over time (after correcting for inflation). That fixed fee should therefore be substantially less than 100 * (hosting for a year).


It would be anyway because of the time value of money.
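
To put rough numbers on the two points above, here is a back-of-the-envelope sketch in Python. The $10/year starting cost, the 20% yearly cost decline, and the 3% real discount rate are made-up assumptions for illustration only, not figures from this thread, and `upfront_fee` is a hypothetical helper name.

```python
def upfront_fee(annual_cost, years, cost_decline, discount_rate):
    """One-time fee covering `years` of hosting, paid at the start of year 0.

    annual_cost   -- hosting cost in the first year (assumed figure)
    cost_decline  -- fraction by which the real hosting cost drops each year
    discount_rate -- real yearly return on money set aside (time value of money)
    """
    total = 0.0
    for t in range(years):
        # Cost of hosting in year t after t years of exponential decline.
        cost_in_year_t = annual_cost * (1 - cost_decline) ** t
        # Money needed today to cover that cost t years from now.
        total += cost_in_year_t / (1 + discount_rate) ** t
    return total


# Illustrative assumptions only: $10 for the first year, costs fall 20% a year,
# and money set aside earns 3% a year after inflation.
naive = 100 * 10.0
fee = upfront_fee(10.0, 100, 0.20, 0.03)
print(f"naive 100-year total: ${naive:.2f}")
print(f"one-time fee:         ${fee:.2f}")
```

With any positive decline or discount rate the sum is a geometric series with a ratio below 1, so the fee stays bounded no matter how many years are promised. With both rates set to zero it falls back to the naive 100 * (hosting for a year).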


That sounds like a challenge worth taking.


It certainly wouldn't be the first time that someone made something profitable out of someone else's "trash". Good on ya!


Could somebody explain how it makes sense for a few nerds to do this, but it does not for yahoo?

If geocities was eating that much of their bandwidth (which I doubt), couldn't they advertise on it?

I just don't understand how a few geeks can like this enough to buy (or scavenge from their closet) the necessary equipment to do a project like this, but a multi-billion-dollar technology powerhouse like Yahoo! cannot.


It makes sense to me; there is little to no profit expectation for Yahoo with regards to GeoCities at this point. Since they are a for-profit company, that's at least a little bit important.

I'm not saying GeoCities couldn't make money, but you have to remember that there is a bureaucracy charge for every big company that wants to maintain software. If there is a developer on the project, there has to be a manager. If there's a manager on the project, there will be reporting requirements. If there are reporting requirements, there must be an analyst, etc.

It's easy for me to see Yahoo considering this not worth their time; I don't think there were any nefarious intentions at all.


My own guess is that they wanted to push the millions of GeoCities users to become customers for Y!'s paid hosting service.

But to me that smacks of bait-and-switch: if you buy up an advertising-supported hosting service, you should not change the basic business model that made it a success.


I agree, but markets do change. What was a successful business model in 2000 may not be viable in 2009.


Bandwidth costs are actually a small fraction of what they were back then, so it should be easier today.


A bit of a sidenote.

Way (way) back in the day, when GeoCities and Angelfire were the only free ways to have a website, they kept putting ads on people's pages.

This led to a bunch of different scripts and hacks being created that would hide the ads. There then seemed to be an ongoing battle between the hackers and the site devs over how to insert HTML code that couldn't be hidden via JS/CSS/HTML.


At a guess: liability.

For Yahoo to state that they'd be keeping an archive of geocities up, there would be legal issues about what kind of guarantees they're giving, legal state of the sign-up 'contract' that people entered into when they started their geocities page, legal requirements for proving someone is who they say they are when they suddenly say 'hey, delete my page I created in 1997, btw, I moved 8 times and changed my name 3 times', how that all fits in with EU data protection laws, etc etc.

Geeks doing it makes it a 'fun project', with no expectation of those legal issues (not that some of them don't exist still). If it vanishes overnight, it's a loss of something that wasn't guaranteed.

For an individual to do something is a different endeavour from a large corporation doing something, and in this case, the financial burden of keeping a read-only archive of GeoCities might have been too much for Yahoo.

Or perhaps they never even contemplated it :)



Neat project, but some things I am wondering... What do you mean by recovering an account? I assume you don't have the password? Will people have control over their old content? Is this legal--don't the authors own the copyright of their pages?


No, I don't have the password.

The plan is to give people control again, but that's not a very easy problem. A number of solutions have already been suggested though, and I think that's something that can be dealt with.

Obviously, if someone does not want their content on there I'll remove it immediately, as far as I'm concerned I'm just hosting it.

And yes, of course the authors have the copyright to their pages, that's an inalienable right.

So far though, the only things I've received to that effect are people that are actively contributing their content, nobody has asked to have it removed. (And the number of 'contributors' is quite significant already (50+)).


Have you considered the ramifications of becoming a replacement for GeoCities, as opposed to just an archive of it?

(Saying "no" to this question isn't a bad thing! Sometimes the most interesting projects are a result of doing something because it's possible, and thinking about what that means later.)


Yes, I did. Not rigorously but sort of back-of-the-envelope. But I'll deal with that when the requests come in, the first stop for me now is to get complete coverage.

Then to couple the content to user accounts and to give a modicum of control back to the real owners of the content.

After that there are two possible avenues, the first one is a site where no new accounts are accepted and you can only remove or update files that are already there (mostly to save people from embarrassment).

The second option would be to open up new account creation, but that's a different kettle of fish. It would require some major development in terms of spam and abuse control and dealing with that. We have some of that for ww.com (nsfw, most of the times it is, but you never know if some 'jerk' is being an ass), so we can use that or expand it.

My first thought would be to identify a number of users as neighborhood cops.

I'm sure that will be a workable solution, but we're still a ways away from that.

The next big milestone will be complete coverage, after that we will move to more functionality.

Webcounters, webrings and so on will all be restored as far as possible.


A suggestion: if people request removal, of course remove it from the public-facing archive, but don't completely delete their pages. Keep everything around as a private archive until it's appropriate to release publicly.

Consider letters from the Civil War. Publishing them a few years after the war would have been a gross violation of someone's privacy. But now they are an invaluable primary source historians have for understanding the era.

GeoCities was the web in its infancy. We know what that was like because we remember it, even took part in it. People in 100 years will want insight into that time period as well.


Like a 20 year embargo or so, that's an excellent idea!

I'm something of a data packrat anyway, so I probably would just move them out of sight, but a mechanism to restore the pages in time would be good.

Thank you!


20 is probably not enough. It would need to be closer to 100 years both because of copyright law, and to make sure the owners are no longer alive.

Put this site in your will for your descendants :) (You have any kids?)


Yes, one


Depending on Yahoo/Geocities' TOS, it may be Yahoo that has copyright.


You should link the DMCA policy from your front page. That's the safest way to avoid legal troubles.

Also remember you can make your DMCA requirements as hard as possible.

   - Your name, address, phone # and email address;
   - The location of your content on our site;
   - Your electronic or physical signature, or the electronic or physical signature of the person authorized to act on your behalf;
   - A statement made by you, under penalty of perjury, that the information in your notice is accurate and that you are the copyright owner or authorized to act on the copyright owner’s behalf.


The DMCA does not apply to this site.


Can you tell me if there are any in the SiliconValley neighborhood with the address of 7771? I don't know if I ever deleted it, but I don't remember the exact subneighborhood.


It's not in the set that has been restored so far, but I haven't even scratched the surface yet. I've made a note and if it comes up I'll drop you an email.


Awesome. Thanks!


Makes me feel nostalgic for this Angelfire Sonic fansite I had when I was younger. I didn't find my page (probably dead), but here's a delicious piece of retro I did see:

http://www.angelfire.com/sk/sonicknuckles/


This is wonderful. My friends and I made a few sites on there, but I can't remember where. Mine and another friend's were in SiliconValley, I remember that much. Thank you, thank you!


Has anyone from Yahoo or Geocities contacted you, officially or unofficially? I'd imagine there must be some worker bees within the organization who were glad to see you doing this.


Excellent !!


@jacquesm have you thought about eventually sharing the recovered data with Archive.org?


Yes, absolutely. We've been going back and forth during the last days of the crawl and sharing seed lists; they had plenty I didn't have and vice versa.

Now we're going to merge both sets, for technical reasons (they need their data in a specific 'full headers' format) I'll be on the receiving end, then later archive.org will recrawl all of reocities.

Then there is 'textfiles', they also did a crawl and we will share data as well.

In the end we should be able to recover pretty much all of it, including the international sites.


s/icon/iconic/


Fixed, thanks! There isn't a day that I don't learn something about English... sigh...


Your welcome. This is one of my favorite parts of HN: built in error correction.


hehe, You're welcome.

Let me return the favor there ;)


HAH! Stellar.


I'm curious as to how you are dealing with potential copyright issues. (Or, for that matter, how archive.org and the like approach these problems.)


The same as everybody else, simply ask to have your stuff removed and it will be done, easy as that.

I don't claim copyright over anything but the http://reocities.com/ and http://reocities.com/newhome/makingof.html pages at this point, all the rest is owned by the respective users.

The funny thing is though, that was my question before doing this, and I figured it's better to ask for forgiveness than to ask for permission, first let's get it done.

Much to my surprise the only mail I've got so far is people that are literally ecstatic that their pages weren't lost and people that send me their backups for inclusion.

Not a single removal request. Though I'm sure that in time that will happen.


> Much to my surprise the only mail I've got so far is people that are literally ecstatic that their pages weren't lost and people that send me their backups for inclusion.

I seem to recall a recent post on another site about this - reddit maybe - about parallel efforts to archive GeoCities, and whether everyone was working together for maximum benefit.

Did you make that post? If you could respond and link to it, I'd appreciate it.

EDIT: Gotta be honest that I don't recall what exactly I'm talking about. Here are a couple reddit links for anyone curious:

The making of Reocities.com - one man's attempt to archive the entire contents of Geocities.com http://www.reddit.com/r/reddit.com/comments/9yne3/the_saving...

The Saving of Geocities, a tale of bandwidth and stress, most if not all Geocities content saved! http://www.reddit.com/r/reddit.com/comments/9yne3/the_saving...


hey Walter,

Yes we're pooling resources, good chance that we'll get almost everything.


Excellent. I wish you well, of course. I'm not a guy who personally believes in intellectual property, so I wasn't trying to cast any aspersions on your project.

I didn't figure that you were claiming copyright over anyone else's material. I just wasn't sure that removal after the fact was sufficient to avoid legal troubles for "reproducing" their works without permission.

I expect that the vast majority of the content producers will be pleased to see their works live on. I admit to not having checked out the site itself: are you going to provide them means to keep the pages updated, as well? Under similar terms to their Geocities pages.. whatever those were?


It depends on which country you reside in. In the USA, UK, Israel, etc., there is a fair use clause and you might be allowed to do it.

In other countries, the mere fact that you are storing the content (without publishing it) is blatant theft.


It's highly unlikely that fair use would apply here, since the amount of copied work is 100% of the original. This is not using a small excerpt for educational purposes...

That being said, a DMCA notice on the site would protect the content provider. The way it works is that if you have a copyright claim against the site, they have a duty to take it down when you inform them. As long as they respect that, they are pretty much covered.


That's definitely an option, see below.



