### General information

Author: Olga Krali, Josefine Palle, Jessica Nordlund
Contact e-mail: jessica.nordlund@medsci.uu.se
DOI: 10.17044/scilifelab.14666127
License: Restricted Access
This readme file was last updated: 14-06-2021

### Dataset description

This dataset contains genome-wide DNA methylation data generated from 142 pediatric acute myeloid leukemia (AML) samples originating from bone marrow or peripheral blood samples taken at AML diagnosis (N=123) or relapse (N=19). Further details regarding the samples are available in Supplementary Table S1 from Krali and Palle et al., 2021 (https://doi.org/10.3390/genes12060895).

Genome-wide DNA methylation was analyzed at the SNP&SEQ Technology Platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. 200 ng of bisulfite-converted DNA was amplified, fragmented and hybridised to the Illumina Infinium HumanMethylation450 BeadChip using the standard protocol from Illumina (iScan SQ instrument).

This metadata record contains information about the raw idat files generated from the Infinium DNA methylation arrays. The raw idat files were processed with Methylation Module (1.8.5) software in Genome Studio (V2010.3). The Methylprep Python library was used to generate and normalize the beta-value matrix (https://pypi.org/project/methylprep/1.3.3/).

The raw idat files, along with a samplesheet, the processed beta-value matrix, an annotation file for CpG annotation, and the signal intensities matrix, will be made available upon request. Limited phenotype information is available in Supplemental Table 1 of the manuscript.

All scripts that give a walk-through from preprocessing of the raw idat files to the machine learning modelling can be found in the following GitHub repository: https://github.com/Molmed/Krali-Palle_2021.
### Terms for access

The DNA methylation dataset is only to be used for research that seeks to advance the understanding of the influence of epigenetic factors on leukemia etiology and biology. The data should not be used for other purposes, such as investigating epigenetic signatures that may lead to the identification of a person. To retrieve the data used for this publication, please contact datacentre@scilifelab.se.

Each entry contains the following variables (order of appearance in the xlsx file):

- 'Sample name': public id of each sample. r denotes samples taken at relapse
- 'Organism': the species of origin, human (Homo sapiens)
- 'idat file 1': name of the green channel (methylated signal) idat file with ID as the sample's sentrix ID
- 'idat file 2': name of the red channel (unmethylated signal) idat file with ID as the sample's sentrix ID
- 'Gender': gender of the patient for each sample
- 'Sample type': whether the sample was taken at diagnosis (diagnostic) or relapse
- 'Cytogenetic Subtype': the AML subtype that each sample belongs to; empty if not applicable
- 'Molecule': Genomic DNA
- 'Description': Primary AML sample taken at diagnosis or relapse
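
As an illustration of the variable list above, the following minimal Python sketch checks that a metadata table carries the listed columns and counts samples per 'Sample type'. It is not part of the published scripts. It assumes the xlsx sheet has been exported to CSV with the column names exactly as written in this readme; the two example rows, their sample names, idat file names and gender codes are hypothetical placeholders, not real entries from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in this readme (order of appearance in the xlsx file).
EXPECTED_COLUMNS = [
    "Sample name", "Organism", "idat file 1", "idat file 2", "Gender",
    "Sample type", "Cytogenetic Subtype", "Molecule", "Description",
]


def summarize(rows):
    """Check that every row has the expected columns, then count rows per 'Sample type'."""
    rows = list(rows)
    for row in rows:
        missing = [col for col in EXPECTED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
    return Counter(row["Sample type"] for row in rows)


# Hypothetical two-row CSV export of the metadata sheet (all values are placeholders).
EXAMPLE_CSV = (
    "Sample name,Organism,idat file 1,idat file 2,Gender,Sample type,"
    "Cytogenetic Subtype,Molecule,Description\n"
    "AML_example_1,Homo sapiens,0000000001_R01C01_Grn.idat,"
    "0000000001_R01C01_Red.idat,F,diagnostic,,Genomic DNA,"
    "Primary AML sample taken at diagnosis\n"
    "AML_example_2,Homo sapiens,0000000002_R01C01_Grn.idat,"
    "0000000002_R01C01_Red.idat,M,relapse,,Genomic DNA,"
    "Primary AML sample taken at relapse\n"
)

counts = summarize(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print(dict(counts))
```

Applied to the full metadata table, the same count would be expected to match the numbers given in the dataset description (123 diagnostic and 19 relapse samples), provided the 'Sample type' values are spelled as described above.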