Fantastic video!
I really like the music but am not familiar with it.
Please provide source.
Never mind. Just reread the article.
Getting a Ph.D. lowered my reading skills.
Wow! That's a lot of work!
> write a Python script to use Selenium in order to webscrape the titles of the papers from the links.
OMG. Programmer here. Isn't there any metadata in some webpages to get the title of the study, or some other data about the study?
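For reference, many journal pages do embed standard bibliographic meta tags (Highwire/Google Scholar's `citation_title`, Dublin Core's `dc.title`, OpenGraph's `og:title`). A minimal sketch of checking those before falling back to the page `<title>` - in Python, to match the bot; assumes `requests` and `beautifulsoup4` are installed, and bear in mind not every site exposes these tags:

```python
# Minimal sketch: try standard bibliographic <meta> tags before
# resorting to per-site scraping rules. Not every journal has them.
import requests
from bs4 import BeautifulSoup

def get_meta_title(url):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Highwire/Google Scholar tag first, then Dublin Core variants
    for name in ("citation_title", "dc.title", "DC.Title"):
        tag = soup.find("meta", attrs={"name": name})
        if tag and tag.get("content"):
            return tag["content"].strip()
    # OpenGraph as a fallback
    og = soup.find("meta", attrs={"property": "og:title"})
    if og and og.get("content"):
        return og["content"].strip()
    # Last resort: the page <title>, which is usually noisy
    return soup.title.string.strip() if soup.title and soup.title.string else None
```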
I'd be willing to help you with programming tasks; I just use Perl. It's great at text processing. For example, I can deduplicate studies, as I've done deduplication of mailing addresses before. What a task that was!
Can't you deduplicate by the DOI number or DOI link? Every study has one, I believe, and the DOI is unique to a study. More info at https://doi.org
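To illustrate: DOIs follow a fairly regular pattern (a `10.` prefix plus a registrant code and suffix), so a rough dedup pass could key on the extracted DOI. The regex below is an approximation of common DOIs, not the full Crossref grammar:

```python
# Rough sketch: deduplicate entries by DOI where one can be found.
# The regex approximates common DOIs; it is not the full spec.
import re

DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)

def dedupe_by_doi(entries):
    seen, unique = set(), []
    for entry in entries:
        match = DOI_RE.search(entry)
        # Fall back to the raw entry when no DOI is present
        key = match.group(0).lower().rstrip(".,;") if match else entry
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```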
What about having several people build a growing spreadsheet of these articles, with 1 row per study and different fields for data, such as: study name, 3 main researchers (sometimes there are 10 authors), DOI number, DOI link, publish date, link to full study, link to abstract only (not all full studies are free), and 3-5 topics for each study. That's just off the top of my head, without doing any analysis on the study data.
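To make the idea concrete, here's one way that sheet could start life as a CSV (Python again, to match the bot; the field names are just my guesses at a workable schema, not anything agreed):

```python
# Hypothetical sketch of the proposed spreadsheet schema as a CSV.
# Field names are illustrative guesses, not a settled convention.
import csv

FIELDS = [
    "study_name", "researcher_1", "researcher_2", "researcher_3",
    "doi", "doi_link", "publish_date", "full_study_link",
    "abstract_link", "topic_1", "topic_2", "topic_3",
]

with open("studies.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    # Contributors would then append one row per study, e.g.:
    # writer.writerow({"study_name": "...", "doi": "10.xxxx/...", ...})
```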
I'm going to be honest: I'm approaching this for the first time, and wasn't aware the DOI was a unique identifier. That would probably have been vastly more useful to obtain had I known about it.
In terms of data, webpages are optimised for viewing by people, not datamining by robots, so each one has a nightmarishly different format. I have over 20 distinct rules (one set of rules per website) for obtaining the titles, but that is titles alone. Authorship formatting, abstracts, dates etc. vary wildly (some have no abstract), and my original goal was to make squashing duplicates easier, rather than to create a comprehensive database.
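To give a flavour of what those per-site rules look like, here's a minimal sketch of the approach - a domain-to-CSS-selector table. The selectors shown are illustrative examples, not the bot's actual rules:

```python
# Minimal sketch of per-site title rules: map each domain to a CSS
# selector. These selectors are illustrative, NOT the bot's real rules.
from urllib.parse import urlparse

TITLE_RULES = {
    "pubmed.ncbi.nlm.nih.gov": "h1.heading-title",  # example selector
    "www.medrxiv.org": "h1#page-title",             # example selector
}

def title_selector_for(url):
    host = urlparse(url).netloc
    # Fall back to a generic guess when no site-specific rule exists
    return TITLE_RULES.get(host, "h1")
```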
My hope was that flattening it into a high-quality, no-duplicates-allowed list would make it much easier for other people to datamine, as well as serve as rebuttal evidence to health agencies. If we're having trouble reading it all, we know they sure as hell haven't read any of this.
An updateable database would require a few things:
A) Money for custom hosting (which I don't have: this Substack is not financially viable and won't be without 200 paying subscribers; I'm at 9)
B) A decent, *censorship-free* webhost that will allow for a combination of PHP + SQL/SQLite
C) A team of competent moderators to filter incoming submissions, apply corrections, updates etc. Automation can only do so much.
I'd need to be at Steve Kirsch's level of resources before I could tackle such a project. I do think such an idea is extremely important, but I'm struggling to tackle my own projects + real life.
Dude, reach out to Steve Kirsch. He might fund it?
Whilst he 'replied' to my EMA leak request via email, he did not hold a dialogue. He did offer his phone number, but given the legal risks involved in other projects (and my lack of money to legally defend myself with), I declined to call, given it'd not only reveal my number but also my location (it's not Kirsch I'm worried about, but America has mass surveillance up to the eyeballs). I did offer a VoIP session; he didn't respond, so we don't have any active lines of communication.
I'd prefer not to hound him again. I'd like to get the abstracts extraction working, then classify the titles based on title content for easier grouping/searching, and generally improve the current dataset. There's nothing stopping other people from approaching Kirsch, taking the dataset I've published so far, and proposing the database, and the more stuff people can take off my plate, the better.
Best way to get his attention is to lurk (usually over a 12-24 hour period) on his Substack, wait for a Substack article to be published (I'd advise eyeballing the sort of times he publishes, and then refreshing every 15 minutes or so within that timeframe), and lay down a comment with a reference/explanation within the first few minutes so it gets seen as soon as it goes live.
Anything older than an hour will be buried with comments, and you'll want to post within the first 15 minutes for the 'golden hour'.
You're more than welcome to reference this article, the dataset etc. - anything published publicly on my Gitlab is free to use. I don't mind collaborating with him either, but an active comms channel (DM/PM/VoIP) would need to be established for that to realistically happen.
Kirsch may not be the only option out there either, but I don't have time to check possible backers of such an idea. The main hurdle will be censorship-free hosting, as many website hosts and service providers censor. Hosting in Iceland is usually censorship-free, though.
Brilliant work! Exceptional compilation. We are all underdogs these days. Thank you.
I've emailed June Raine and the Pharmacovigilance Team at MHRA (and let my MP know), asking for their definition of 'safe and effective'.
I've included reference to Steve Kirsch, and obviously to your video on Brighteon showing 750 peer-reviewed articles on vaxx harm, Dr Aseem Malhotra's YT video etc.
I assume they must have published their definition of 'safe and effective' somewhere already, so hopefully it won't take them long to reply.
I'll let you know when the honourable Dr Raine urgently addresses this question and sets our minds at rest.
Just to let you know, I'm still working on the abstracts, but I'm slotting the work in between a slew of other tasks. Some of the NIH abstracts have been obtained, but there are still a number of studies left in the list, and calibrating for each site takes time. In the interim I've also found a rather suspicious issue with Substack itself, although I'm awaiting a response from their support team.
I've commented to Dr Alexander to ask for details on how he assembled the original list, and to ask about a possible update.
It is unlikely anyone would have an accurate update to either his or my list. His list was not in a parseable format, and mine - which removes a lot of duplicates and fixes a number of issues - was literally only published late last night.
I'm not sure he is the creator of the list; I was just reporting where I originally found the dataset.
Yes, it actually looked *extremely* jumbled in his original article. You did a terrific job making sense of it and straightening it out!
Okay, I see Paul Alexander published the original list on 30th April. That means there's likely to be more articles published since then. I'll have a good look at how Paul originally collected the 1000 articles.
Huge thank you for this resource!
Have you tried ginger tea or chewing a small piece of raw ginger root for your acid reflux? It might help.
I would like a PDF with all 755 journal article titles listed, with authors, date etc. Is that possible, or would it require someone to manually copy and paste it? A bibliography app should be able to do it? The format you have at the moment, with just the URLs showing, means I can't search the articles, categorise them into adverse-effect types, sort according to Pfizer, AZ etc., or get a sense of how many people are in each case study (usually one, I expect, but that's a guess).
I don't know how to slow down the video speed on Brighteon, like I can on YT, so I can read the titles.
I want to present all these data to the MHRA and to June Raine personally and to every public figure in England telling us these jabs are safe and effective. I want them to define 'safe' in the context of these data.
If you can send me the list of abstracts in Word or something, I can get some more analysis done on them now you've done all the heavy lifting. You said 5 are preprints - do we know the cutoff date for collection of these articles? When did the Paul chap make the original 1000 article list?
Cheers!
Phew, a lot of questions there. Forgive me for not responding sooner; I've been tying up loose ends.
"I would like a pdf with all the 755 journal article titles listed, with authors, date etc. Is that possible, or would it require someone to manually copy and paste it?"
It might be possible with a modification to the bot, but no guarantees, as automation at scale with differing datasets is difficult to do. I'll look into it, and maybe write an article covering it as well. Don't be surprised if I take a few days to get back; it is a lot of data to go over.
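For anyone wanting to try this themselves in the meantime, one possible route is rendering rows from the CSV into a PDF with `reportlab`. A minimal sketch, assuming the CSV gains `title`, `authors` and `date` columns - those names are placeholders, so treat this as a template only:

```python
# Minimal sketch: turn (title, authors, date) rows from a CSV into a
# simple PDF. Assumes `reportlab` is installed and that the CSV has
# these columns, which is an assumption, not a given.
import csv
from xml.sax.saxutils import escape
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

styles = getSampleStyleSheet()
story = []
with open("studies.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # escape() because Paragraph treats its text as mini-HTML
        story.append(Paragraph(escape(row.get("title", "")), styles["Heading4"]))
        meta = f'{row.get("authors", "")} ({row.get("date", "")})'
        story.append(Paragraph(escape(meta), styles["Normal"]))
        story.append(Spacer(1, 8))

SimpleDocTemplate("studies.pdf", pagesize=A4).build(story)
```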
"The format you have at the moment, with just the urls showing, means I can't search the articles, categorise them into adverse effect types, sort according to Pfizer, AZ etc, get a sense of how many people in each case study"
That's unfortunately a limit of the webpages. Some might show abstracts. Some might not. The full text formatting varies wildly, and the bot had great difficulty reading PDFs. I do plan to classify the titles so it is possible for people to sort, but I wanted to publish the titles list straight away as there are still people dying from this.
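As a rough illustration of that classification idea, a keyword-to-category table over the titles could do a first pass; the categories and keywords below are placeholders, not a validated scheme:

```python
# Rough first-pass classifier: bucket titles by keyword. Categories
# and keywords are placeholders, not a final or validated scheme.
CATEGORIES = {
    "cardiac": ("myocarditis", "pericarditis", "cardiac"),
    "clotting": ("thrombosis", "thrombocytopenia", "clot"),
    "neurological": ("guillain", "neurological", "bell's palsy"),
}

def classify_title(title):
    lowered = title.lower()
    tags = [cat for cat, words in CATEGORIES.items()
            if any(w in lowered for w in words)]
    return tags or ["unclassified"]
```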
"I don't know how to slow down the video speed on Brighteon, like I can on YT, so I can read the titles."
The video is mainly for show. That said, if you look in the mid-left of the Brighteon video, you'll see an "x1" - if you click that, you can slow it down to 0.25x. My recommendation is to read the CSV list directly, as not only does it contain all the shown studies, it also lists ones I wasn't able to show in the video:
https://gitlab.com/TheUnderdog/general-research/-/blob/main/COVID-19-Shot-Questions/Part2/755Studies.csv
You can download the CSV directly from here:
https://gitlab.com/TheUnderdog/general-research/-/raw/main/COVID-19-Shot-Questions/Part2/755Studies.csv?inline=false
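If you'd rather work with the list programmatically than in a spreadsheet, here's a small sketch for loading and filtering it; the column name used for the URL is an assumption on my part, so inspect the real header first:

```python
# Small sketch: load the published CSV and filter it. The 'URL'
# column name is an assumption; check the actual header and adjust.
import csv

with open("755Studies.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)  # inspect the real column names first
    rows = list(reader)

# Example: find the medRxiv preprints by URL
preprints = [r for r in rows if "medrxiv.org" in (r.get("URL") or "")]
print(len(preprints), "medRxiv entries")
```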
"If you can send me the list of abstracts in Word or something, I can get some more analysis done on them now you've done all the heavy lifting."
I only extracted the titles. It would have required even more effort to get the abstracts, given the uh... 'creative designs' of the various journals' websites. I will modify the bot and do my best to get you that additional data, although I cannot guarantee quality, given automation is prone to making mistakes.
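That said, where sites do expose bibliographic meta tags, abstracts sometimes come along for free. A sketch in the same spirit as the title extraction above; `citation_abstract` and `dc.description` appear on some sites only, so this is a best-effort fallback, not a guarantee:

```python
# Sketch: try common abstract meta tags before per-site scraping.
# `citation_abstract` / `dc.description` exist on some sites only.
from bs4 import BeautifulSoup

def get_meta_abstract(html):
    soup = BeautifulSoup(html, "html.parser")
    for name in ("citation_abstract", "dc.description", "description"):
        tag = soup.find("meta", attrs={"name": name})
        if tag and tag.get("content"):
            return tag["content"].strip()
    return None  # fall back to the per-site rules
```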
"You said 5 are preprints - do we know the cutoff date for collection of these articles?"
Unfortunately I do not. 4 are on medRxiv (which you can tell by the URL), and 1 is an NIH article, although I don't know which one that is specifically off the top of my head; it is shown at one point in the latter half of the video (the URL for each study is displayed at the top of the video).
"When did the Paul chap make the original 1000 article list? "
He published it April 30th this year, although I do not think he is the originator of the list; I had encountered other variations with fewer studies.
Ironically, even with 755 studies, I don't think the list represents all the ones out there, as some I've seen before were not shown in the viewing (although with so many, I could have overlooked them). The list is roughly 6 months old, and I have had little time to update it.
Sorry, didn't mean to overwhelm you!
I'm very excited by what you've achieved, and frustrated that I don't currently have an institutional affiliation so it's hard for me to access journal databases.
I'm trying to think through the objections that Raine et al. would put to this list, e.g. it mixes up the different types of jab; many papers are probably case studies of one individual; given that there have been approx. 12 billion jabs given out, 750 articles is not that many; etc. I want to preempt such objections.
I really want to use this work you've done. I want to amass evidence to get Raine charged with safeguarding issues or something similar. As you discussed in your earlier article, we HAVE to step up and get things changed. Cheers!
Incredible, thank you
That is herculean! What an effort. Thanks, I will forward the link.