I’d be down for a mass import as long as it’s done well.

I see two options. One is to do it with Selenium, simulating a normal user on the website; in that case every post would have my user as the creator, and it would generate a lot of spam on the front page. The other is to write directly to the database, but that would create bogus users whose names nobody else could use. I think the latter is better, but with a single reserved user created only for that purpose and given a descriptive name like “bot”. Let me know your ideas.

@thann (mod, admin), 4M

I think it makes sense to have a bot account specifically for this.

Selenium might be a bit overkill: SO serves server-rendered pages, so curl can scrape the content just fine =]

Also the bot should leave a link to the answer on SO 🤔

@ajr@lemmy.ml (creator), 14M

I meant Selenium for uploading to HeapOverflow.

Let’s write some high-level pseudocode to make sure we are on the same page before I begin:

  1. Download all question pages from StackOverflow.
  2. Use Beautiful Soup to extract each question, the link to it, its answers, and their dates.
  3. Add the link to the original StackOverflow question.
  4. Add the license to each question and answer, depending on its date.
  5. Use Selenium to create a post in HeapOverflow with the question.
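
A rough sketch of steps 3 and 4, just to pin down what "license depending on the date" would mean. The cutoff dates are my reading of StackOverflow's licensing page and should be double-checked before a real import; `license_for` and `attribution_footer` are made-up names, not part of any library.

```python
from datetime import date

# Cutoffs as I understand them from StackOverflow's licensing page (UTC dates).
# Treat these as an assumption and verify them before importing anything.
_LICENSES = [
    (date(2018, 5, 2), "CC BY-SA 4.0"),
    (date(2011, 4, 8), "CC BY-SA 3.0"),
]
_OLDEST = "CC BY-SA 2.5"


def license_for(posted: date) -> str:
    """Return the license name for content posted on the given date."""
    for cutoff, name in _LICENSES:
        if posted >= cutoff:
            return name
    return _OLDEST


def attribution_footer(url: str, posted: date) -> str:
    """Build the text the bot appends to an imported question or answer."""
    return (
        f"Originally posted on StackOverflow: {url}\n"
        f"License: {license_for(posted)}"
    )
```

The footer would be appended to the body of each imported question and answer before posting, so the source link and the license travel with the content.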

@thann (mod, admin), 14M

Ahh, I see. I figured there was a suitable API for Lemmy, but IDK.

Some important considerations:

  • determining the proper sub to post in
  • including a link to the SO user profiles
  • probably having a score threshold
  • handling rate limiting from both Lemmy and SO
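
The score threshold and the rate limiting could look something like this. It is only a sketch: the `score` field name, the default threshold of 10, and the `Throttle` class are placeholders I made up, and the real limits would have to come from whatever Lemmy and SO actually enforce.

```python
import time


def worth_importing(questions, min_score=10):
    """Keep only questions whose score meets the (hypothetical) threshold."""
    return [q for q in questions if q["score"] >= min_score]


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The bot would call `wait()` on one `Throttle` before each request to SO and on a second one before each post to Lemmy, so the two sites can have different intervals.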
