Data Explained

Original Data

The original released data (which I have called “raw”), as downloaded from Steve Kirsch’s repository, is a flat table in CSV format with about 4M rows. Each row is the record of a single vaccine dose administered and contains the following fields:

Technical Detail

CSV (Comma Separated Values) is a standard format for exchanging tabular data. Quote characters (") are sometimes used to delimit fields; in this case, however, the data fields themselves contain no commas, so quote delimiters are not required. The total size of the file is 278,248 bytes.

  • mrn – Probably stands for Medical Record Number and identifies a specific person. Since most people have multiple records (one per dose), the data covers a little over 2M distinct people.
  • batch_id – Identifies a specific vaccine batch, but it is a generated number and cannot be independently matched to an official manufacturer’s batch number.
  • dose_number – The dose number for this record, where 1=first primary dose, 2=second primary dose, 3=first booster, etc.
  • date_time_of_service – Date when the dose was administered.
  • date_of_death – If the person died before the end of the sampled period (somewhere in the last half of 2023), the date of death is specified here; otherwise the field is blank.
  • vaccine_name – The specific vaccine administered, with manufacturer and type specified. There are NINE different vaccines, although one of them has only 1 record and another just 95. Most of the records (3.4M) are “Pfizer BioNTech COVID-19”.
  • date_of_birth – The approximate date of birth of the person.
  • age – The approximate age of the person. For most of the records this appears to be the age at the end of the sampling period (2023); however, there are some discrepancies (see later section on this page).
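As a rough illustration of how records relate to people, the sketch below reads the raw CSV with Python's standard `csv.DictReader`. It counts records, distinct `mrn` values, people with a death date, and records per `vaccine_name`. It assumes the header row uses exactly the field names listed above. The function name `summarise_raw` is hypothetical, and the sample rows are made up, not taken from the real data.

```python
import csv
import io
from collections import Counter


def summarise_raw(csv_file):
    """Summarise the raw flat table (one row per vaccine dose administered).

    Assumes the header uses the field names listed above (mrn, batch_id,
    dose_number, date_time_of_service, date_of_death, vaccine_name,
    date_of_birth, age).
    """
    records = 0
    persons = set()       # distinct mrn values, i.e. distinct people
    dead = set()          # mrn values that have a non-blank date_of_death
    vaccines = Counter()  # number of records per vaccine_name
    for row in csv.DictReader(csv_file):
        records += 1
        persons.add(row["mrn"])
        vaccines[row["vaccine_name"]] += 1
        if row["date_of_death"].strip():  # a blank field means no recorded death
            dead.add(row["mrn"])
    return {
        "records": records,
        "persons": len(persons),
        "persons_died": len(dead),
        "vaccines": vaccines,
    }


# Tiny made-up sample in the same layout (not real data).
sample = io.StringIO(
    "mrn,batch_id,dose_number,date_time_of_service,date_of_death,vaccine_name,date_of_birth,age\n"
    "1001,17,1,2021-04-12,,Pfizer BioNTech COVID-19,1950-03-02,73\n"
    "1001,17,2,2021-05-03,,Pfizer BioNTech COVID-19,1950-03-02,73\n"
    "1002,42,1,2021-04-20,2022-01-15,Pfizer BioNTech COVID-19,1940-01-10,83\n"
)
summary = summarise_raw(sample)
print(summary["records"], summary["persons"], summary["persons_died"])  # 3 2 1
```

On the real file you would open it with `newline=""` (the file name is whatever you saved it as). Per the description above, you would then expect about 4M records and a little over 2M persons.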

More about the dates

In order to protect the anonymity of the people in the records, the dates were slightly altered before the data was published. This prevents a person from being identified by matching exact birth and death dates, which might otherwise have been feasible: death notices in newspapers and funeral notices, which are in the public domain, often specify these dates. See the later section on this page describing a further obfuscation of this data, which I believe is possible without losing the level of accuracy necessary for a high-quality safety analysis.

My Changes

This is a summary of the changes I have made to the data:

  • Normalised the data into 3 tables: persons, doses, jabs. This significantly reduces the size of the data when exporting, as there is a lot of duplication in the original flat table, particularly in the vaccine description.
  • Converted the birth date to just a year (there is little value in specifying the day of the year a person was born, even if it is altered a bit).
  • Converted the recent dates (dose and death dates) to a week number starting at 2021-04-05 (5th April 2021). Again, this makes it even harder to identify individuals without losing any significant detail for analysis.
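Here is a minimal Python sketch of the three changes above. It makes several assumptions, so treat it as illustrative only. The table layouts are hypothetical: `persons` is keyed by `mrn`, and `jabs` maps `batch_id` to `vaccine_name`, which assumes each batch belongs to a single vaccine. `doses` holds one entry per dose. Dates are assumed to begin with an ISO `YYYY-MM-DD` prefix, and the week starting 2021-04-05 is assumed to be week 0. The function names `week_number` and `normalise` are illustrative.

```python
from datetime import date

WEEK_ZERO = date(2021, 4, 5)  # 5th April 2021 (a Monday); assumed to be week 0


def week_number(text):
    """Whole weeks elapsed since 2021-04-05.

    Assumes the value starts with an ISO 'YYYY-MM-DD' date; any time part is ignored.
    """
    d = date.fromisoformat(text[:10])
    return (d - WEEK_ZERO).days // 7


def normalise(rows):
    """Split flat dose records into persons, jabs and doses (hypothetical layout)."""
    persons = {}  # mrn -> birth year and death week (None if no death recorded)
    jabs = {}     # batch_id -> vaccine_name, so the long description is stored once
    doses = []    # one entry per dose administered
    for row in rows:
        mrn = row["mrn"]
        if mrn not in persons:
            death = row["date_of_death"].strip()
            persons[mrn] = {
                "birth_year": int(row["date_of_birth"][:4]),  # keep the year only
                "death_week": week_number(death) if death else None,
            }
        jabs.setdefault(row["batch_id"], row["vaccine_name"])
        doses.append({
            "mrn": mrn,
            "batch_id": row["batch_id"],
            "dose_number": int(row["dose_number"]),
            "dose_week": week_number(row["date_time_of_service"]),
        })
    return persons, jabs, doses


# Made-up rows in the raw layout (only the fields used here).
rows = [
    {"mrn": "1001", "batch_id": "17", "dose_number": "1", "date_time_of_service": "2021-04-12",
     "date_of_death": "", "vaccine_name": "Pfizer BioNTech COVID-19", "date_of_birth": "1950-03-02"},
    {"mrn": "1001", "batch_id": "17", "dose_number": "2", "date_time_of_service": "2021-05-03",
     "date_of_death": "", "vaccine_name": "Pfizer BioNTech COVID-19", "date_of_birth": "1950-03-02"},
    {"mrn": "1002", "batch_id": "42", "dose_number": "1", "date_time_of_service": "2021-04-20",
     "date_of_death": "2022-01-15", "vaccine_name": "Pfizer BioNTech COVID-19", "date_of_birth": "1940-01-10"},
]
persons, jabs, doses = normalise(rows)
```

The vaccine description now appears once per batch rather than once per dose, which is where most of the size saving comes from.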

Samples

The Samples table allows you to generate a time cohort snapshot of deaths from the data and display the results in a graph. You can select the period of time as well as a number of other filters, such as the period in which a specific vaccine dose was administered and which vaccine batch was used. The concept here is to examine the SHAPE of the graph and compare different filter settings. Doing it this way should largely eliminate factors like seasonal variation and age bias. The first two entries demonstrate this concept – click on the hyperlinked SampleID number to display the graph in a separate tab.
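To make the idea of a time cohort snapshot concrete, here is a minimal Python sketch of the counting step. It assumes each person's death has already been reduced to a week number. The function `weekly_deaths`, its parameters and the example data are all hypothetical. The filter shows a criterion of the kind described above, "received dose 3 in the first 4 weeks of the period".

```python
def weekly_deaths(death_week, start, end, keep=lambda mrn: True):
    """Count deaths per week for weeks start..end inclusive.

    death_week: dict mapping mrn -> week of death (people who did not die are omitted).
    keep: optional filter on mrn, e.g. "received dose 3 in weeks 44-47".
    """
    counts = [0] * (end - start + 1)
    for mrn, week in death_week.items():
        if start <= week <= end and keep(mrn):
            counts[week - start] += 1
    return counts


# Made-up example data.
death_week = {"a": 48, "b": 44, "c": 50, "d": 95, "e": 120}
dose3_week = {"a": 45, "c": 30, "d": 46}  # week in which each person received dose 3

all_deaths = weekly_deaths(death_week, 44, 95)
booster_early = weekly_deaths(
    death_week, 44, 95,
    keep=lambda mrn: 44 <= dose3_week.get(mrn, -1) <= 47,
)
```

The filtered list is always a subset of the unfiltered one. This is what makes comparing the two SHAPES meaningful.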

Technical Details on Curve Fit

Since the weekly death numbers are inherently “noisy”, I decided that a polynomial regression curve fit would give a simple and clear indication of the underlying trend. For the curve fit I have used a PHP library by Andrew Que – see http://polynomialregression.drque.net/index.php
I am using a polynomial of degree FIVE.
The graph is rendered in JavaScript using Google Charts.
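The site itself uses Andrew Que's PHP library. To show in a language-neutral way what a degree-FIVE polynomial regression does, here is a substitute least-squares fit in plain Python, using the normal equations solved by Gaussian elimination. It is not a port of that library. Rescaling the week numbers to [-1, 1] is an assumption of this sketch, made to keep the arithmetic numerically stable. The fit needs at least six distinct weeks, and the function name `polyfit` is hypothetical.

```python
def polyfit(xs, ys, degree=5):
    """Least-squares polynomial fit via the normal equations; returns p(x).

    x is rescaled to [-1, 1] first so powers of week numbers such as 95**10
    do not swamp the calculation.
    """
    lo, hi = min(xs), max(xs)
    mid, half = (lo + hi) / 2, ((hi - lo) / 2) or 1.0
    ts = [(x - mid) / half for x in xs]
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum t^(i+j) and b[i] = sum y * t^i.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution gives the coefficients in the rescaled variable t.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / a[i][i]

    def p(x):
        t = (x - mid) / half
        return sum(c * t ** k for k, c in enumerate(coeffs))

    return p


weeks = list(range(44, 96))
# Made-up values lying exactly on a cubic, so a degree-5 fit should reproduce them.
ys = [0.001 * (w - 70) ** 3 + 2 * w + 5 for w in weeks]
p = polyfit(weeks, ys)
smooth = [p(w) for w in weeks]
```

With real, noisy weekly counts the fitted curve will not pass through every point. That smoothing is the reason for using it.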

The graph shows weeks along the x axis (horizontal) and deaths on the y axis (vertical). The death values have been normalised so that the minimum in the range shows as ZERO and the maximum in the range shows as ONE.

Provided the samples are reasonably large (I would suggest a minimum of 3000, given around 50 weeks in the horizontal direction, though no doubt this is open to discussion), IF we take a sample (based on some criterion) that is a subset of another, then we should reasonably expect the SHAPE of its graph to be similar, as long as the criterion has no impact on death. For example, let’s assume we graphed all those who died in the year between weeks 44 and 95 (see SampleID=1). If we could take samples based on the orientation of people’s houses, selecting those whose house is oriented N-S vs E-W, we would expect each sample graph to look very similar to the full graph, because most people would assume that the orientation of one’s house has NO effect on when one dies.

The SampleID=2 graph is a subset of SampleID=1, the only difference being that these people received their first booster (dose equal to 3) in the first 4 weeks of that period. Make up your own mind whether the difference in graphs is significant!
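The ZERO-to-ONE scaling is a plain min-max rescale of a list of weekly values. The sketch below uses a hypothetical function name and made-up numbers. It shows why this makes samples of very different sizes comparable: a subset with exactly one tenth of the deaths in every week produces an identical normalised curve. Real subsets will only ever be approximately proportional.

```python
def normalise_01(values):
    """Rescale so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a flat series has no shape to show
    return [(v - lo) / (hi - lo) for v in values]


full = [120, 150, 90, 60, 80]  # made-up weekly deaths for everyone
subset = [12, 15, 9, 6, 8]     # made-up subset with exactly 1/10 of each week's deaths
print(normalise_01(full) == normalise_01(subset))  # True
```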

Please note:

  • I have done very little sampling myself to look at different scenarios – the main goal is to make a research tool readily available
  • This technique uses only the data itself (all vaccinated people) to suggest causality.
  • The query used, the fitted curve details, and the result data are all provided through this interface. Use View Sample (the magnifying glass icon) to see the query and the curve details. Click on the coloured hamburger icon with a number beside it to show the weekly record data; this can be exported by clicking on the “Weekly Summaries” link and then on the arrow beside the gear icon (top right).
  • The “Random subset” sampling method can be used to show how a random subset of the total number of deaths in a period SHOULD produce a similar-shaped graph to the sample of ALL deaths in the period. There will of course be some variation, because in theory the “random” sample could even select exactly the records that are included in a sample filtered by vaccine parameters.
  • Click on the copy icon to quickly duplicate and modify/tweak an existing sample configuration.
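Here is a minimal sketch of what the “Random subset” method presumably does. It assumes deaths are drawn uniformly at random, without replacement, from all deaths in the period. It also assumes the subset size is chosen to match a filtered sample. The function `random_subset_counts` and its parameters are hypothetical; passing a fixed `seed` makes a draw repeatable.

```python
import random


def random_subset_counts(death_weeks, k, start, end, seed=None):
    """Weekly counts for k deaths drawn uniformly at random without replacement.

    death_weeks: week of death for every person who died in weeks start..end.
    """
    rng = random.Random(seed)
    chosen = rng.sample(death_weeks, k)
    counts = [0] * (end - start + 1)
    for week in chosen:
        counts[week - start] += 1
    return counts


# Made-up weeks of death for everyone who died in weeks 44..47.
all_weeks = [44] * 30 + [45] * 20 + [46] * 25 + [47] * 25
counts = random_subset_counts(all_weeks, k=40, start=44, end=47, seed=1)
```

Repeating the draw with different seeds gives a feel for how much the SHAPE varies by chance alone for a sample of that size.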

Anyone can create, edit, and delete sample table entries. Pinned entries (yellow highlight/bold) cannot be edited or deleted. Feel free to add your own notes to explain the background of your sample. Please respect other people’s contributions.
