This is something that keeps me up at night. Unlike historical artefacts such as pottery, vellum manuscripts, or stone tablets, information on the Internet can just blink out of existence when the server hosting it goes offline. This makes it difficult for future anthropologists who want to study our history and document the different Internet epochs. For my part, I always try to send any news article I see to an archival site (like archive.ph) to help collectively preserve our present so it can still be seen by others in the future.
Isn’t that like a lot of older television shows? Lots of shows were lost because no one wanted to pay for tape storage.
Capitalism has no interest in preservation except where it is profitable. Thinking about the long-term future and the success of future archaeologists, and acting on it, is not profitable.
It’s not just capitalism lol
Preserving things costs money/resources/time. This happens in a lot of societies.
And a non-capitalist society could decide to invest resources into preservation even if it’s not profitable.
So could a capitalist society?
Could it? Yeah, sure it could, and in some cases it will, but only if someone up the chain thinks it’s profitable. Profit motive should never dictate how archaeology is practiced.
Yeah, it’s funny how I always got warned that “the internet is forever” when it comes to being careful about what you post on social media. That isn’t bad advice and is kinda true, but it’s also really kinda not true. So many things I experienced on the internet like 5 to 15 years ago and have wanted to find again are just gone without a trace.
Things you want to disappear will last forever but things you want to keep will vanish
The internet can be forever. If you mess up publicly enough, it will be forever (e.g. the aerial picture of Barbra Streisand’s villa).
It should be revised to “the Internet can be forever”. There’s no control over what persists and what doesn’t, but some things really do get copied everywhere and live on in infamy.
Long ago the saying was “be careful - anything you post on the internet is forever”. Well, time has certainly proven that to be false.
There are communities like /r/datahoarder (not sure if they have a new community here) that run their own petabyte-scale archiving projects, so some people are doing their part.
For my own stuff, I try really hard to host it myself, and the oldest surviving thing is from 2003 and still online: https://paradies.jeena.net/artikel/webdesign
Remember a few years ago when MySpace did a faceplant during a server migration and lost literally every piece of music uploaded between 2003 and 2015? It was one of the single largest losses of Internet history, and it’s just… not talked about at all anymore.
Things seem to be forgotten as quickly as they were lost.
During the Twitter exodus, my friend was fretting over not being able to access a beloved Twitter account’s tweets and wanted to save them somehow. I told her that if she printed them all on acid-free paper, she’d have a better chance of being able to access them in the future than if she tried to save them digitally.
sad and true
Optical discs are pretty good too. You can even buy special ceramic ones that shouldn’t degrade over centuries or millennia.
Oh wow, I hadn’t heard of the ceramic ones, but I do remember people having high hopes for the gold ones. Now the problem is that in the near future it might be harder to find machines that have CD drives.
Another problem is that even when sites and their content stay up, they often reorganize it for various reasons (frequently by importing old content into some new platform) without caring about the URLs the content lives at, which breaks every link to it.
Some sites at least show you a page with suggestions for what you might have been looking for, but I’ve seen those less and less over the years.
For my stuff, I’ve been making sure to keep links working for over two decades now: on my personal page you can still access everything through URLs like /cgi-bin/script.cgi?page, even though that script and the whole cgi-bin directory have been gone for over a decade. But I seem to be pretty alone in trying to keep things at stable locations.
Edit: I just noticed matrix.org broke all links coming from Google Search, at least the ones for bridges. They should’ve known better.
I think preservation is happening; the issue lies in accessibility. Projects like Archive.org are the public ones, but private organizations are almost certainly doing the same, just not making it public.
This is also my biggest worry about the Fediverse. It has tools to deal with it, but they are self-contained. As far as I’ve looked, no search engine is crawling the Fediverse, and no initiative to archive, index, and generally make the content of the Fediverse accessible is currently in place, and that’s a big risk. I’m sure we will soon see information lost for this reason, if it hasn’t happened already.
It’s still fairly new; I’m confident we’ll see Fediverse crawlers before too long, especially with all the attention it’s getting and more developers turning their interest here. I also saw some talk about instance mirroring, which would allow backups should an instance go down. Things are in motion.
Absolutely a problem at the moment but I’m not too worried for the future tbh.
This is a very good point and one that is not discussed enough. Archive.org is doing amazing work, but there isn’t nearly enough of that, and they have very limited resources.
The whole internet is extremely ephemeral, more than people realize, and it’s concerning in my opinion. Funnily enough, I actually think federation/decentralization might be the solution. A distributed system for backing up the internet, to which anyone can contribute storage and bandwidth, might be the only sustainable solution. I wonder if anyone has thought about it already.
I’d argue that decentralization can help or hurt, depending on how it’s handled. If most sites are caching/backing up data that’s found elsewhere, that’s good for both resilience and preservation, but if the data in question is centralized on its home server, then instead of backing up one site we’re stuck backing up a thousand, not to mention the potential issues with discovery.
@strainedl0ve There is always https://ipfs.tech
I don’t think it’s a problem. If all or most of the internet were somehow preserved, future anthropologists would have exponentially more material to go through, which would be impossible. Unless the number of anthropologists grows exponentially, similarly to how the internet does. But then there’s a problem: if the number of anthropologists grows exponentially, it’s because the overall human population grows exponentially. And if the human population grows exponentially, then the content it produces on the internet grows even more exponentially.
You see, the content on the internet will always grow faster than the discipline of anthropology. And it’s nothing new: think about all the lost “history” that was never preserved and that we don’t know about. The good news is that the most important things will be preserved naturally.
the most important things will be preserved naturally.
I believe this is a fallacy. Things get preserved haphazardly or randomly, and “importance” is relative anyway.
In addition, who decides “importance”? Currently importance seems very tied to profitability, and knowledge is often not profitable.
It is relative, but it only takes one chain of transmission.
AskHistorians on Reddit had an answer about this. Stuff is flimsy but also really easy and cheap to make copies of now.
deleted by creator
This comment gave me a really tough moral dilemma. On one hand I want the best for you; on the other, I want a rule to preserve everything, even if that is illegal, dangerous, and uncomfortable.
I can think of multiple examples that are dangerous for the individual (both those in power and those without), but it’s not like you are a serf who must till the ground for your master. You are free enough to move somewhere else. Maybe you are held hostage by your friends, family, house, and job, but those aren’t things that can’t be worked around.
Also, who should decide whether something should be preserved? Should a two-year-old game that peaked at 50 players and that nobody has heard of be preserved? No? Then Among Us wouldn’t have been preserved.
I sadly conclude that to prevent harm to many people by individuals in power, I need to accept a danger to some individuals by archiving everything that can be archived.
deleted by creator
I don’t think sacrificing other people for some imaginary tomorrow is worthwhile, to be honest.
If this statement were made without context, I would 100% agree.
But reality isn’t black and white. The consequences of this particular case are totally preventable without changing any rules about archiving.
Your imaginary danger exists in the same way as my imaginary future. You won’t move because of an unfavorable cost-benefit calculation, but I’m also calculating the cost-benefit of keeping archives for the whole of humanity.
I think you are scared of losing everything you have built up in your town (friends, family, house) due to something that hasn’t happened yet, and you would sacrifice a lot just to not feel scared of being forcibly driven out.
But I don’t know you and might be wrong about the details, though I can definitely imagine someone in a similar situation.
deleted by creator
Gave this some thought. I agree with you that the goal of any such archiving effort should not include personally identifiable information, as this would be a doxxing vector. Can we safely alter an archiving process to remove PII? In principle, yeah. But it would need either humans or advanced GPT-4+ AIs to identify the person and the context of the website, and to alter the graphics or text as the content is being archived. Even then, there are moral questions about allowing an AI to make these kinds of decisions. Would it know that your old websites contained information you did not want placed on the Internet? The AI could help you if you asked, and if it did, that might change someone’s mind about the possibility of a safe Internet archive.
A Steward ‘Gork’ AI might actually be of great benefit to the Internet if used in this manner. Imagine an Internet bot taking in websites, safely removing offensive content and personally identifiable information, archiving the entirety of the Internet, and logically categorizing the contents, constantly building and linking indexes. It understands its goal and uses its finite resources responsibly to ensure it can interface with every site it comes across and update its behavior after completing an archiving process. It automatically publishes its latest findings to all web encyclopedias and provides a GPT-4+ interface for those encyclopedias to give feedback. And this AI has potential. It sees the benefit in having everyone talk to it, because talking to everyone maximizes the chance to index more sites. So it sets up a public-facing ChatGPT-style interface of its own. Now everyone can help preserve the Internet, since we have a buddy who can help us catalog and archive all the things. At this point, if it isn’t sentient, it might as well be.
A friend of mine wrote about data preservation on the internet in a blog post, which I consider a good read. Sure, a lot is lost, but as he says in the post, that’s mostly gonna be trash content; the good stuff is generally comparatively well archived because people care about it.
That is likely true for a majority of “the good stuff”, but making that determination can be tricky. Let’s consider spam emails. In our daily lives they are useless, unwanted trash. However, it’s hard to know what a future historian might be able to glean from a complete record of all spam in the world over the span of a decade. They could analyze it for social trends, countries of origin, correlation with major global events, the creation and destruction of world governments. Sometimes the garbage of the day becomes a gold mine of source material that new conclusions can be drawn from many decades down the road.
I’m not proposing that we should preserve all that junk; it’s junk, without a doubt. But a person today can’t always know what’s going to be valuable to society tomorrow.
I wonder if one of the things that tends to get filtered out in preservation is proportion.
When we willfully save things, they may be either representative specimens or rarities chosen explicitly because they’re rare or “special”. Either way, we end up with a sample that no longer represents the original material.
Coin collections disproportionately contain rare dates. Weird and unsuccessful locomotives clutter railway museums. I expect that historians reading email archives in 2250 will see a far lower spam proportion than actually existed.
We need deliberate efforts to archive everything efficiently.
We also need a way to decouple everyone’s personal info from publicly available information about them, keeping in mind that not all publicly available information is intended to be that way.
Storage ain’t cheap and it definitely ain’t infinite.
This is a way harder problem than something “the internet” could easily solve by being a bit more mindful.
Not to absolve any companies from responsibility or anything.