• 2 Posts
  • 6 Comments
Joined 1 day ago
Cake day: October 7th, 2025


  • I was able to get the comments uploaded after splitting them in 7zip. After combining with 7zip, hashes still match and first couple of items looked good to go.

    Also threw a Python module and an example loop in there, plus an example comment item and submission item in the readme.

    I was thinking of making one class for threads and another for comments. Each submission would be an instance of the thread class, and its comments would be instances stored in a list on their parent thread, so we can track comment author and UTC timestamp for chronology.

    I can build out the classes and assign comments to threads. I imagine you’re going to be much faster at all of this than I am, so whenever you’ve got the image scrape ready, let me know and I’ll start grabbing my share of the links.
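A rough sketch of the class layout described above, not the module in the repo. The field names (`link_id`, `created_utc`, `body`, and so on) are assumptions based on how Reddit-style dump items usually look, and `attach_comments` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Comment:
    id: str
    link_id: str      # assumed form "t3_<submission id>", as in Reddit-style dumps
    author: str
    created_utc: int  # Unix timestamp, used for ordering
    body: str


@dataclass
class Thread:
    id: str
    title: str
    author: str
    created_utc: int
    url: str = ""
    comments: List[Comment] = field(default_factory=list)

    def add_comment(self, comment: Comment) -> None:
        self.comments.append(comment)

    def chronological(self) -> List[Comment]:
        """Return this thread's comments, oldest first."""
        return sorted(self.comments, key=lambda c: c.created_utc)


def attach_comments(threads: Dict[str, Thread],
                    comments: Iterable[Comment]) -> List[Comment]:
    """Attach each comment to its parent thread; return comments with no match."""
    orphans = []
    for c in comments:
        # Strip the assumed "t3_" type prefix to get the bare submission id.
        thread_id = c.link_id.split("_", 1)[-1]
        thread = threads.get(thread_id)
        if thread is None:
            orphans.append(c)
        else:
            thread.add_comment(c)
    return orphans
```

Keeping the threads in a dict keyed by submission id means each comment is matched in one lookup, and the returned orphans show which comments point at submissions missing from the archive.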



  • Awesome! https://github.com/hoglegcc/rebuildtheark

    I was able to add the submissions zst, but I’m running into a file size limit when adding the comments archive. Compressed, the comments are around 35MB (about 480MB uncompressed) and GitHub has a limit of 25MB. Worst case scenario I can get it to you another way.

    I’m technically supposed to be doing something else right now, but I should be able to get the comments issue squared away in the next couple of hours. Also, dunno if you want a private repo for the actual bot; I don’t know if you have keys you’re worried about or anything like that. I can set up a private repo, or if you want to make one and add me, either way works. Or if a public repo works for you, that works for me, too.

    Thanks!
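One way around a per-file size limit is to cut the archive into fixed-size parts, rejoin them on the other end, and compare hashes. The sketch below does that with the standard library only. It is not what 7-Zip does internally, and the function names and the `.001`, `.002` part naming are just choices for illustration.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def split_file(path: Path, part_size: int) -> List[Path]:
    """Write consecutive part_size-byte slices next to the source file."""
    parts = []
    with open(path, "rb") as src:
        index = 1
        while True:
            data = src.read(part_size)
            if not data:
                break
            part = path.with_name(f"{path.name}.{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: List[Path], dest: Path) -> None:
    """Concatenate the parts, in the order given, into dest."""
    with open(dest, "wb") as out:
        for part in parts:
            out.write(part.read_bytes())
```

With a 35MB archive, a `part_size` of 20MB (`20 * 1024 * 1024`) gives two parts that each fit under a 25MB limit. After `join_files`, `sha256_of` on the rejoined file should equal the hash of the original.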



  • Oh, that’s awesome. If you’ve got a bot that can already parse and push to fosscad.io, we should definitely be able to tweak that. I’m not active on Discord (or any social media, really), but I imagine that’s the place to organize an effort like this. If there is an alternative that others prefer (I’ve heard about Matrix and Element), I’m open to suggestions.

    I’ve been fighting the flu the last couple of days, but I’m on the upswing now. Dunno how much I’ll be able to dig in today, but I’ll get a GitHub repo set up; I can dump the zst files there along with some pseudocode and notes. If I can see what your bot is ingesting, I can try to match the output from the zst files to it.

    I think there will be a bit of work marrying comments to submissions, since they’re split into two separate archives. Because the pictures are (potentially) time sensitive, maybe the move is to focus first on looping through the submissions and grabbing the pics from their URLs, then rebuilding the threads after the fact.

    I’ve got plenty of local storage for pics, or if we can dump straight to Lemmy, that would be great. I’m completely ignorant of this platform as far as rate limiting, storage, and any of that fun stuff. I don’t know how big a whole subreddit will end up being, but I imagine it’s not inconsequential.
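The pictures-first pass could look something like the sketch below: walk the submission items and collect the ones whose URL points straight at an image, leaving the comment matching for later. It assumes the decompressed submissions archive is one JSON object per line with `id` and `url` fields, which should be checked against the example item in the readme. The function names and the extension list are made up for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple
from urllib.parse import urlparse

# Assumed set of direct-image extensions; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def iter_items(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, skipping lines that fail to parse."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(item, dict):
            yield item


def image_links(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (submission id, url) for submissions that link directly to an image."""
    for item in iter_items(lines):
        url = item.get("url") or ""
        # Compare against the URL path only, so query strings do not hide the extension.
        path = urlparse(url).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            yield item.get("id", ""), url
```

`lines` can be any iterable of text lines, such as an open text file or a stream from whatever decompresses the zst (the third-party `zstandard` package is one option). The resulting list of ids and URLs is also easy to split between people, and any download loop built on it should pause between requests until the hosts' rate limits are known.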