Rose here. Also @umbraroze for non-kbin stuff.

  • 4 Posts
  • 47 Comments
Joined 1 year ago
Cake day: June 14th, 2023

  • Reddit has a user data checkout feature (IIRC, look in the user settings, or maybe the Reddit help pages, to find it).

    It’s a bit crap though.

    It takes a long time to process, especially if you happened to post in the era when the Reddit data infrastructure was horribly terrible instead of merely ordinarily terrible, and in the worst cases it apparently involves some manual work on the staff’s part.

    Some data may be missing or truncated. It doesn’t give you data from private/banned subreddits (which was a fun thing to discover, because the last time I tried this the blackouts were on), and even for legit stuff, long comments/posts may be truncated. Beyond that, I’m pretty sure the dumps just straight up didn’t have all of my posts from several years ago, even ones on public subreddits. So you need to make sure the exported data is sensible.

    In conjunction with the official dumps, I recommend a few other tools, especially since the dumps aren’t exactly magnificently usable on their own. One tool that I found personally invaluable is reddit-user-to-sqlite, which imports Reddit data dumps and whatever live user data is available into an SQLite database. (I think it gets the live data by scraping or something; I’m sure it worked despite the API being shut down.) Datasette is a nice frontend for browsing the posts.

    As for scrubbing, there are tools for that which are supposed to work. I think.
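If you just want a quick sanity check on a dump without extra tools, here’s a minimal sketch using only Python’s standard library. It is not reddit-user-to-sqlite and doesn’t try to mimic it; it loads a comments CSV from the export into SQLite and counts comments per year, so gaps in your history stand out. The helper names (load_comments, comments_per_year), the file name comments.csv, and the column names id, date, subreddit and body are all assumptions; check the header of your own dump, since Reddit may name or order them differently.

```python
import csv
import io
import sqlite3
from collections import Counter

# Assumption: the export has a comments CSV with at least these columns.
# Check the header of your own dump; the real names/order may differ.
EXPECTED_COLUMNS = ("id", "date", "subreddit", "body")


def load_comments(csv_file, conn):
    """Hypothetical helper: copy rows from an export-style CSV into a `comments` table."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments "
        "(id TEXT PRIMARY KEY, date TEXT, subreddit TEXT, body TEXT)"
    )
    rows = [tuple(row[c] for c in EXPECTED_COLUMNS) for row in reader]
    # INSERT OR IGNORE skips duplicate ids if you import overlapping dumps.
    conn.executemany("INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def comments_per_year(conn):
    """Hypothetical helper: count stored comments per year (assumes dates start with YYYY)."""
    counts = Counter()
    for (date,) in conn.execute("SELECT date FROM comments"):
        counts[date[:4]] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # In real use, open your comments CSV instead of this made-up sample.
    sample = io.StringIO(
        "id,date,subreddit,body\n"
        "a1,2012-03-01 10:00:00 UTC,linux,First!\n"
        "b2,2019-07-15 08:30:00 UTC,games,Another one\n"
    )
    conn = sqlite3.connect(":memory:")
    print(load_comments(sample, conn))  # 2
    print(comments_per_year(conn))      # {'2012': 1, '2019': 1}
```

A year with far fewer comments than you remember writing is a hint that the dump skipped something. Per-year counts won’t catch truncated comment bodies, though; for that you still have to eyeball the long ones.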


  • Yup. The robots.txt file is not only meant to keep robots out of the site; it’s also meant to keep bots away from resources that aren’t interesting to human readers, even indirectly (say, through search results).

    For example, MediaWiki installations are pretty clever here: in the usual setup, /w/ is blocked and /wiki/ is encouraged. Nobody wants technical pages and page histories in search results; they only want the current versions of the pages.

    Fun tidbit: in the late 1990s, there was a real epidemic of spammers scraping web pages for email addresses. Some people developed wpoison.cgi, a script whose sole purpose was to generate garbage web pages full of bogus email addresses. Real search engines ignored these pages, thanks to robots.txt. Guess what the spam bots did?

    Do the AI bros really want to go there? Are they asking for model collapse?
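To make the /w/ vs /wiki/ split concrete, here’s a minimal Python sketch that uses the standard library’s urllib.robotparser to ask what a well-behaved crawler would do. The robots.txt content, the wiki.example.org host, and the crawl_allowed helper are hypothetical, written in the spirit of a typical MediaWiki setup rather than copied from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the style often used with MediaWiki short URLs:
# script paths (/w/index.php?...&action=history and friends) are off limits,
# while article paths under /wiki/ stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
Allow: /wiki/
"""


def crawl_allowed(url, agent="ExampleBot"):
    """Hypothetical helper: True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://wiki.example.org/wiki/Main_Page",
        "https://wiki.example.org/w/index.php?title=Main_Page&action=history",
    ):
        print(url, "->", crawl_allowed(url))
```

Nothing forces a crawler to run this check at all. robots.txt is purely advisory, which is exactly why the wpoison trap caught spam bots and not real search engines.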


  • I’m, like, OK, nuclear power isn’t necessarily a bad thing.
    But power plants like that should probably serve wider municipal needs.

    Building a private nuclear power plant just to power a data center? Well, that’s clearly stupid.
    Building a private nuclear power plant just to power a data center focused on a niche application? Well, you know how that goes.

    Also, look up SL-1. Disturbingly few Americans I’ve talked to have heard about it. It’s generally a good argument for why not every single thing should be powered by a tiny dedicated nuclear reactor.



  • umbraroze@kbin.social to Linux@lemmy.ml: Linux Boomers
    21 upvotes, 1 downvote · 9 months ago

    So yeah, Xfce looks the same as it did 10 years ago.

    And?

    A desktop environment is meant to launch apps, give me windows, and maybe provide a file manager. Xfce does that. It’s a desktop environment.

    Hey, “modern” desktop environment enthusiasts, if you bring Compiz back from the dead, give us luddites a call, will you? Ohhhh, you kids should have seen it back in the day. Windows and Mac users saw Compiz in action and were, like, “wat.” You don’t get them to react that way to modern Linux desktops, no. And all that is lost now. Thanks, Wayland.


  • umbraroze@kbin.social to Linux@lemmy.ml: *Permanently Deleted*
    2 upvotes, 1 downvote · 9 months ago

    Yeah, there’s an important distinction. Just because you could use Linux doesn’t mean you can at any particular moment.

    I don’t really do music production; I’m more into writing and visual arts and photography. I could do all of those things on Linux and be perfectly productive. But there’s a difference between being productive and being optimal. My current process happens to be based on software that runs on Windows. (Heck, a lot of the software I use already runs on both Windows and Linux anyway.)

    The key point here is that you shouldn’t lock yourself too tightly into one tool and one approach, and that actually cuts both ways.


  • I literally just looked at Reddit for the first time in ages.

    What the fuck.

    Here’s the thing: Reddit’s UI design has always been shitty. Old Reddit was fucking garbage, so admins cheerfully asked the RES (Reddit Enhancement Suite) folks to fix their shit. (Instead of, you know, hiring them.) New Reddit? Always been shit, and nobody’s going to fix it.

    This Newer New Reddit? I… I don’t think they even know at this point. What. What’s going on.

    If they ask the community for critique, some AI bot will AI-pat the admin’s arse and AI-splain to the remaining AI users that things will be just fine. (Now, “things actually getting better” has literally never happened where Reddit or its user interface is concerned, as you should know well if you’ve ever been a human Reddit user.)


  • This is literally the old EA stratagem. Give the “independent” developer a basically impossible goal and then go “well, you failed to meet the goal, looks like you need a little bit of help from us, and by little help, we mean from now on, you do exactly what we tell you, or else”. EA pulled this off with Origin Systems and (to a different extent) BioWare, to name just a couple of examples. It ended in complete sadness.

    To EA’s credit, that charade usually took a long time to play out. Sony is trying to pull this so soon after acquiring Bungie.


    My theoretical answer is this: in an ideal world, there would be no copyright at all. It’s an artificial contrivance that was dreamed up to serve a physical-copy economy, and the digital age rendered it obsolete. Shit would be so much easier if we got rid of it and everyone could share everything by default without any profit motive. (Caveat: This will not work unless literally every jurisdiction on the planet gets rid of copyright laws all at once; otherwise it’s way too exploitable due to power imbalances. So I don’t think this is a practical proposition. *cough* unless we all decide Anarchism is a good idea after all *cough*)

    My practical answer is this: Welllllll, we’re kinda damned if we do and damned if we don’t. My personal feeling is that AI creations aren’t really copyrightable, and even suggesting they are opens a huge can of worms about what exactly counts as “creativity” in the first place. The best we can do under the current copyright regime is to regulate how AI datasets are curated, because goodness knows the current datasets weren’t exactly ethically obtained.


  • I was a Slashdot user.

    People kept hyping Digg as a Slashdot replacement, but submitting posts there was even more futile in practice than submitting articles to the Slashdot editors. Much bigger hivemind, too. Boring, unfunny comment section.

    When I first joined Reddit, it seemed like it was mostly populated by Slashdot refugees. Just people posting awesome shit. Great riveting discussions, even before anyone actually read the articles. That sort of stuff.


  • Well, if American McGee wants to rebuild the franchise from scratch, then he faces the exact same problem, doesn’t he?

    If EA wants to remake the franchise, they’re basically saying “Look, we filed the serial numbers off, here’s a new Dark Alice in Wonderland IP”, and they know nobody will buy it.

    If American McGee wants to remake the franchise, it’s basically “Look EA, we can’t actually remake the Dark Alice in Wonderland IP, but here’s the Wark Dalice in Anderland IP”, and none of EA’s lawyers will buy it, and he gets sued into oblivion by EA.

    It’s an extremely uncomfortable stalemate, regardless of the fact that the original stories are in the public domain.

    Sure, American McGee can go “well fuck it, here’s a super fucking cute and lore-friendly happy trippy Alice in Wonderland remake that totally goes in a whole different direction this time, HEY BACK OFF DISNEY LAWYERS, I said totally different direction”, but that’s no longer American McGee’s Alice, now is it?