Changelog1.5
- Status
- Changelog
- What to expect
On TSD p229/mobagenetics now contains raw-data for everything that the NORMENT project has genotyped since 2019 (all of it by deCODE). See individual datasets for more details.
6.12.23 See updated documentation on withdrawal here - Note that from now on, the latest deletion file (snpDeletions231204.csv) also contains the dates on which previous samples were deleted, so you only need to check the newest one. 118 samples were deleted.
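Since only the newest deletion file needs checking, a small helper can pick it out by the date in its name. This is a minimal sketch, assuming the files follow the pattern snpDeletions<YYMMDD>.csv; that pattern is inferred from the single file name above and is not documented here, and the function name is hypothetical.

```python
import re
from datetime import date, datetime

# Assumed naming pattern: snpDeletions<YYMMDD>.csv (inferred from one file name only).
_PATTERN = re.compile(r"^snpDeletions(\d{6})\.csv$")


def newest_deletion_file(filenames):
    """Return (name, date) for the most recent deletion file, or None if none match."""
    dated = []
    for name in filenames:
        match = _PATTERN.match(name)
        if match:
            # Parse the six digits as a two-digit year, month and day.
            stamp = datetime.strptime(match.group(1), "%y%m%d").date()
            dated.append((stamp, name))
    if not dated:
        return None
    stamp, name = max(dated)
    return name, stamp
```

Passing the result of `os.listdir()` on the directory holding the deletion files would be one way to use it; names that do not match the assumed pattern are ignored.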
23.2.23 See updated documentation on withdrawal here - The documentation also tells you how to identify the sets and which samples have been deleted - (104 samples were deleted this time)
29.7.22 As described on the withdrawal page, we have now deleted the duplicate raw-data files that were found in both version 1.0 and 1.5. We did this so that we don't have to maintain different versions of the datasets, a choice made by MoBa.
The legacy directory MoBa_harvest has been disabled for users for now. It contains data that has been incorporated in MoBaGenetics 1.0 and 1.5 and should not be in use. Let us know if you rely on it - after a grace period we will delete it completely.
22.7.22 See updated documentation on withdrawal here - The documentation also tells you how to identify the sets and which samples have been deleted (642 samples were deleted; 637 of these exist in bedsets)
5.7.22 The standard structure has been used for snp019. We also discovered and are investigating 28 samples that seem to be duplicates of the same person.
Note that, as before, this set is special because:
- It most probably only contains duplicates of samples found in other sets
- There are two cluster-files but no bedset (plink). This is due to the chip being customized
24.6.22 The standard structure has been used for snp017. Furthermore, the set has been split into subsets a-f. This was in fact already the case, judging by the (at the time non-standardized) names of the plink-files.
Directories of idat-files are now as one should expect, and so is the naming of the bedsets.
Samplesheets have also been cleaned up so they only contain samples that actually have corresponding idat-files. This means that the original number of samples has changed slightly in the documentation.
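The clean-up described above can be illustrated with a short filter. This is only a sketch under stated assumptions, not the procedure actually used: it assumes the samplesheet is a plain CSV with a single header row, that it has the columns Sentrix_ID and Sentrix_Position, and that idat-files follow the usual Illumina naming `<barcode>_<position>_Grn.idat` and `<barcode>_<position>_Red.idat`. The function name is hypothetical.

```python
import csv
from pathlib import Path


def rows_with_idats(samplesheet_path, idat_dir,
                    barcode_col="Sentrix_ID", position_col="Sentrix_Position"):
    """Return the samplesheet rows whose Grn and Red idat-files both exist."""
    # Collect every idat file name found anywhere below idat_dir.
    present = {p.name for p in Path(idat_dir).rglob("*.idat")}
    kept = []
    with open(samplesheet_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed file stem: <barcode>_<position>
            stem = f"{row[barcode_col]}_{row[position_col]}"
            if f"{stem}_Grn.idat" in present and f"{stem}_Red.idat" in present:
                kept.append(row)
    return kept
```

Searching recursively means the check does not depend on whether the idat-files sit directly under the idats directory or in sub-directories.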
22.6.22 The samplesheets for snp001, snp002, snp003 and snp007 had different headers than the rest, due to changes in standardization. This has now been corrected, and corresponding gsheader.csv files (for use with Genome Studio) have been added to the sets.
The SnpCommon/Manifest directory has been updated with new manifest-files.
22.6.22 The standard structure has been used for snp016. Furthermore, the set has been split into subsets a-b. This was in fact already the case, judging by the (at the time non-standardized) names of the plink-files.
Directories of idat-files are now as one should expect, and so is the naming of the bedsets.
Samplesheets have also been cleaned up so they only contain samples that actually have corresponding idat-files. This means that the original number of samples has changed slightly in the documentation.
Also, the scan dates in the documentation were bad and have been replaced with call-dates.
21.06.22 (Technically most of this does not belong to the 1.5 changelog but we have no changelog for 1.0)
Part of the raw data-set (plink and idats) found in MOBAGENETICS-V1/NORMENT1/data/raw-data has been deleted. This applies to the plink-files and the idat-files in the subdirectories called may16.
Most of the data is available in the 1.5 set, under the name snp009.
The reason for this is that some of the samples had to be deleted in 1.5, and we don't want them visible in the 1.0 raw-dataset. Soon, all the 1.0 raw-data files will be removed as they are duplicated in 1.5. However, in version 1.5 we will remove MoBa participants who have withdrawn their consent, and doing the same in 1.0 would cause extra work.
The section Important warning MoBaGenetics 1.5 has been updated as well: We are now pretty sure the data-structure is fixed, and we have described version handling (or rather the lack thereof) better.
17.6.22 The standard (hopefully final!) structure has been used for snp018. Furthermore, the set has been split into subsets a-e. This was in fact already the case, judging by the (at the time non-standardized) names of the plink-files.
We have done this to minimize the chance of batch effects - not wanting to merge plink-files.
Directories of idat-files are now as one should expect, and so is the naming of the bedsets.
Note that the old directory (snp018 that contained a mix of everything in a somewhat unstructured way) has been deleted.
14.6.22 idat-files are now in a structure matching standard MoBaGenetics. See snp015.
The problems with snp015b lacking 14 duplicate samples have been 'silenced' - these have been removed from the samplesheet.
Sex was wrongly coded in the bedset (fam-files), so their values have (for now) been replaced by 'unknown' (0).
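For readers who want to see what such a replacement looks like, here is a minimal sketch. It relies on the standard PLINK .fam layout (six whitespace-separated fields, with sex in the fifth: 1 = male, 2 = female, 0 = unknown). The function name is hypothetical, and this is not necessarily how the files were actually edited.

```python
def blank_sex_in_fam(lines):
    """Return .fam lines with the sex column (5th field) set to 0 (unknown)."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(
                f"expected 6 fields in .fam line, got {len(fields)}: {line!r}"
            )
        fields[4] = "0"  # 0 means unknown sex in PLINK
        out.append(" ".join(fields))
    return out
```

Note that the output joins fields with single spaces, so any original tab separation is not preserved.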
8.6.22 The raw-data for snp014 (aka Rotterdam2) is now available. A GenomeStudio suitable manifest-file (GSA-24v1-0_C1.bpm) has been uploaded to SnpCommon/Manifest (downloaded from the Illumina-site).
All snpArray raw-data returned to FHI should now be published under MoBaGenetics 1.5.
7.6.22 raw-data for snp013 has been refreshed. The data was just dumped on TSD before the structure of the data was decided. Changes:
- idat-files are now in a standard format (they had lower case filenames and were in a strange directory-structure).
- GS header exists, documentation updated with respect to cluster/manifest files.
- sampleSheet_snp013.csv is cleansed of 76 rows due to biobank-samples having too little DNA to create idat-files
2.6.22 The raw-data for snp012 (aka Rotterdam1) is now available.
2.6.22 A subdir SnpCommon now exists with a sub-directory Manifest. Just one manifest-file there for now ... more to come!
25.5.22 Unfortunately the familyIDs and sampleIDs were not according to MoBa standard. It has been removed and will be replaced real soon.
25.5.22 The raw-data for snp009 (aka Norment May16) is now available.
Please follow the snp009 link to read about discrepancies between MoBaGenetics 1.0 and 1.5 data.
20.5.22 The raw-data for snp011 (aka TED) is now available.
18.5.22 The raw-data for snp010 (aka Norment Feb18) is now available.
Extremely observant nerds will see that the names of the idat-files have changed to a more standard form compared to the raw-data published under MoBaGenetics 1.0. This should have no practical consequences, but we strive to streamline formats, which makes it much easier to create efficient, reusable pipelines during the QC.
The same nerds will also notice that snp009 is not published yet, as we investigate its completeness.
26.04.22 The sample sheet changes (pilot from 13.04.22) have been undone. Headers in the sampleSheet files are similar to before, and a new file (gsHeader.csv) is (sometimes) made available for users of GenomeStudio. It contains examples of how the header should look.
We will gradually stop indicating the chip in the sampleSheet files: By definition of a set, the chip is the same for all samples and belongs elsewhere.
13.04.22 This set was previously known as Norment Jun15.
The bad (?) news is that the earlier warning about the changes in file-structure/format for 1.5 might be about to kick in...
For snp008 we propose/test out a new format for the samplesheet. This is in order to make it easier to use GenomeStudio - the previous samplesheets needed some tweaking.
A couple of things have changed within the sampleSheet:
- Sample_name has been renamed to Sample_Id
- Sentrix_ID and Sentrix_Position have been renamed to SentrixBarcode_A,SentrixPosition_A
- The order of the columns has changed.
In addition, a new file, gs_Samplesheet_snp008.csv, is a suitable sampleSheet for Genome Studio if you want to process the idat-files, provided that you have the corresponding manifest (it is currently not available, but is named A,HumanOmniExpress-24v1). It is possible that the GenomeStudio version will later have different data-column headers than the plain version, but we will try to avoid that.
If this works out, we intend to (later) make sure that the directory structure for idat-files suits Genome Studio as well.
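The column renames listed in this entry can be sketched as a simple header rewrite. This is an illustration only: it assumes a plain CSV whose first row holds the column names, it does not reorder columns (the new order is not specified here), and the function name is hypothetical. Keep in mind that the 26.04.22 entry reports that these pilot changes were undone.

```python
import csv
import io

# Renames taken from the list in this entry.
RENAMES = {
    "Sample_name": "Sample_Id",
    "Sentrix_ID": "SentrixBarcode_A",
    "Sentrix_Position": "SentrixPosition_A",
}


def rename_samplesheet_headers(text):
    """Return CSV text with the first-row column names renamed per RENAMES."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    # Only the header row is touched; unknown column names pass through unchanged.
    rows[0] = [RENAMES.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Swapping the keys and values of `RENAMES` would map a pilot-style samplesheet back to the older column names.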
05.04.22 Originally we wanted manifest files stored under the datasets, but since they actually are common they might be moved to a newly documented snpArray/Common/Manifest directory.
Also, it turned out that the Windows client on TSD does not understand Unix softlinks (it just won't show them). So yesterday's Norment Jan15 structure was remade based on snp006.c
04.04.22 The Norment Jan15 bedset (plinkfiles) contained 2983 samples, while the sample sheet and idat-directories contained 3414. This has now been corrected. A side-effect is that the samplesheet has been renamed (since it had 3414 in its name).
(Sorry for introducing 1.0 changes in the 1.5 changelog) ...
The raw-data for snp007 (aka Norment Jan15) is now available.
29.03.2022 Yes, this is not 1.5, but we had requests for these files, so they have been added to version 1.0. These are mainly flaglists that show in which part of the QC samples/markers were removed.
The MoBaGenetics1.0 has been updated accordingly. The MERGEd set contains some extra pedigree info on the "ethnic core", PCA plots and markerlists.
11.03.2022 Until now, MoBaGenetics 1.0 files were not found on the MoBaGenetics 1.5 directory. Gradually, we are including these - and rather recently snp001, snp002 and snp003 have been added. This will continue until the totality of all raw-data sets have been published.
There still might be some changes in the directory structure - some of these sets contain more than 50000 samples - so for now many idats-directories contain subdirectories but not always following the same pattern.
Also note that while MoBaGenetics 1.0 only consisted of datasets that made it to the merged set, MoBaGenetics 1.5 will have snp* sets that will not be part of the, for now almost mythical, to-be-QC'ed set.
10.08.21 snp013 is uploaded. This is a rather small set (2426 samples) from 2018. There has been a bunch of problems getting this in place, and we have given up getting details like sample-well positions.
14.06.21 Even more updating of snp015b documentation. This time it is about documentation of missing idat-files due to duplicate tests on the same sample.
10.06.21 We had previously not received sample tray/well positions for snp015. Samplesheet files are now updated, and also contain more information, including scan dates. The documentation of the set is also slightly updated.
26.05.21 A small dataset snp019 (1164 samples) has been published on tsd.
25.5.21 snp015, snp015b, snp016, snp018 published on TSD. snp017 has had its idat-directories (not files) renamed. This is data from the Norment project that was genotyped after the preliminary MoBaGenetics 1.0 was published.
26.5 The same data, but with the old namestyle/samplesheets (called Norment2 on TSD) has been protected. Will later be removed.
On TSD, the directory Norment2 was named to match the way MoBaGenetics 1.0 datasets were named. The data have been republished and documented on individual datasets for 1.5 as snp015-snp018.
Norment2 is for now read-protected to make sure nobody is using it, and will be removed completely, probably in early June 2021.
See MoBaGenetics 1.5 for the current data-structure. It is most probable that some directories/files in the snp-sets will be moved and maybe even renamed.
For idat-files, directories containing idat-files might move so they reside directly under the idats directory. Currently some live in sub-directories.
plink/bedset-files might be moved so they are found directly under the bedset directory
There is currently no overlap between version 1.0 and 1.5. Gradually, all 1.0 sets will become part of 1.5. The names of the sets will be snp001 ... snp014.
Empty for now