I’d be down for a mass import as long as it’s done well.
My idea is to do it either with Selenium, simulating a normal user on the website, or directly against the database. With Selenium, every post would have my user as the creator and would generate a lot of spam on the front page. Going directly to the database would create bogus users that nobody else could then use. I think the latter is better, but with a reserved user created only for that purpose and given a descriptive name like “bot”. Let me know your ideas.
I think it makes sense to have a bot account specifically for this.
Selenium might be a bit overkill, as SO uses server-rendered pages, so curl can scrape the content just fine =]
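Because the pages are server-rendered, a plain HTTP GET is enough on the scraping side. Here is a minimal sketch using only Python's standard library (curl would do the same job). The function names and the User-Agent string are hypothetical, it assumes the un-slugged `/questions/<id>` URL redirects to the full question page, and it ignores rate limiting and SO's terms of use.

```python
import urllib.request

SO_BASE = "https://stackoverflow.com"

def question_url(question_id: int) -> str:
    """Short question URL; assumed to redirect to the full, slugged page."""
    return f"{SO_BASE}/questions/{question_id}"

def fetch_question_html(question_id: int, timeout: float = 30.0) -> str:
    """Fetch the server-rendered HTML of one question with a plain GET (no browser)."""
    req = urllib.request.Request(
        question_url(question_id),
        # Hypothetical bot identifier: pick something descriptive and contactable.
        headers={"User-Agent": "heapoverflow-import-bot (hypothetical)"},
    )
    # urlopen follows HTTP redirects by default.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The returned HTML string is what would then be handed to Beautiful Soup for parsing.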
Also the bot should leave a link to the answer on SO 🤔
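The link back could be a footer appended to each imported answer. Here is a small sketch in which the footer wording and function names are made up for illustration. It uses Stack Overflow's `/a/<answer_id>` short-link form for answers.

```python
def answer_permalink(answer_id: int) -> str:
    # Short link form for a single Stack Overflow answer: /a/<answer_id>.
    return f"https://stackoverflow.com/a/{answer_id}"

def with_source_footer(body: str, answer_id: int) -> str:
    """Append a hypothetical attribution line pointing back to the original answer."""
    return f"{body.rstrip()}\n\nOriginal answer on Stack Overflow: {answer_permalink(answer_id)}"
```

For example, `with_source_footer("Use a dict.\n", 123)` yields the body, a blank line, and then `Original answer on Stack Overflow: https://stackoverflow.com/a/123`.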
I meant Selenium for uploading to HeapOverflow.
Let’s write some high-level pseudocode to make sure we are on the same page before I begin:
Download all question pages from StackOverflow. Use Beautiful Soup to extract each question, a link to it, its answers, and their dates. Add the link to the original question. Add the appropriate license, depending on the date, to each question and answer. Use Selenium to create a post on HeapOverflow with the question.
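The "license depending on the date" step can be a small lookup. This sketch assumes the cut-over dates from Stack Overflow's published licensing notice (CC BY-SA 2.5 before 2011-04-08 UTC, CC BY-SA 3.0 until 2018-05-02 UTC, CC BY-SA 4.0 from then on). It keys on the creation date only; whether later edits should use their revision date instead is left open.

```python
from datetime import datetime, timezone

# Cut-over dates (UTC) as assumed from Stack Overflow's licensing notice.
CC_BY_SA_3_START = datetime(2011, 4, 8, tzinfo=timezone.utc)
CC_BY_SA_4_START = datetime(2018, 5, 2, tzinfo=timezone.utc)

def license_for(created: datetime) -> str:
    """Return the CC BY-SA version covering a post created at `created`.

    `created` must be timezone-aware; comparing a naive datetime raises TypeError.
    """
    if created < CC_BY_SA_3_START:
        return "CC BY-SA 2.5"
    if created < CC_BY_SA_4_START:
        return "CC BY-SA 3.0"
    return "CC BY-SA 4.0"
```

Each question and answer would be tagged separately, since an answer can fall under a newer license than the question it belongs to.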
Ahh, I see. I figured there was a suitable API for Lemmy, but IDK.
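If the Lemmy HTTP API is usable, the bot could post without Selenium. This sketch only builds the request and does not send it. It assumes a Lemmy 0.19-style v3 API (`POST /api/v3/post` with `name`, `community_id`, `body`, `url`, and the bot's login token sent as a Bearer header). The instance URL and function name are hypothetical, and the exact fields should be checked against the version HeapOverflow actually runs.

```python
import json
import urllib.request

def build_create_post_request(instance: str, jwt: str, community_id: int,
                              title: str, body: str, source_url: str) -> urllib.request.Request:
    """Build (but do not send) a create-post request for the dedicated bot account.

    Assumes a Lemmy 0.19-style v3 API; older versions passed the token differently.
    """
    payload = {
        "name": title,                 # post title
        "community_id": community_id,  # target community on the instance
        "body": body,                  # markdown body: question text, license, SO link
        "url": source_url,             # link to the original Stack Overflow question
    }
    return urllib.request.Request(
        f"{instance.rstrip('/')}/api/v3/post",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {jwt}"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`. The token would come from logging in as the reserved bot account, so every imported post is attributed to that one user instead of a personal account or bogus per-author users.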
Some important considerations: