pandas_genomics.io.from_plink
pandas_genomics.io.from_plink(input: Union[str, pathlib.Path], swap_alleles: bool = False, max_variants: Optional[int] = None, categorical_phenotype: bool = True)
Load genetic data from PLINK v1 files (.bed, .bim, and .fam) into a DataFrame.
- Parameters
- input: str or Path
Path to the PLINK fileset, given without an extension: .bed, .bim, and .fam files with this name must exist in the same location.
- swap_alleles: bool
False by default, in which case “allele2” (usually major) in the bim file is considered the “reference” allele. If True, “allele1” (usually minor) is considered the “reference” allele.
- max_variants: Optional[int]
If provided, load only this number of variants.
- categorical_phenotype: bool, True by default
If True, the phenotype is encoded as a categorical when loaded (1 = “Control”, 2 = “Case”, otherwise missing). If False, the values are loaded directly.
- Returns
- DataFrame
Columns correspond to variants (named as {variant_number}_{variant ID}). Rows correspond to samples and index columns include sample information.
Notes
PLINK v1 files encode all variants as diploid (2n) and use “missing” alleles if the variant is actually haploid.
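Examples
The phenotype mapping described for categorical_phenotype can be illustrated with a minimal standard-library sketch. It is not the pandas_genomics implementation: the helper read_fam_phenotypes and the sample data are hypothetical, and it assumes the usual six-column, whitespace-delimited .fam layout (family ID, sample ID, father ID, mother ID, sex, phenotype).

```python
import io

# Labels used when the phenotype is treated as categorical:
# 1 = "Control", 2 = "Case", anything else is missing.
PHENOTYPE_LABELS = {"1": "Control", "2": "Case"}


def read_fam_phenotypes(handle):
    """Hypothetical helper (not part of pandas_genomics).

    Returns a dict mapping (family ID, sample ID) to a phenotype label,
    with None marking a missing phenotype.
    """
    phenotypes = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        fid, iid, _father, _mother, _sex, pheno = fields
        phenotypes[(fid, iid)] = PHENOTYPE_LABELS.get(pheno)
    return phenotypes


# Made-up .fam content for illustration only.
example_fam = io.StringIO(
    "FAM1 S1 0 0 1 1\n"
    "FAM1 S2 0 0 2 2\n"
    "FAM2 S3 0 0 1 -9\n"
)
result = read_fam_phenotypes(example_fam)
print(result)
```

With categorical_phenotype=False, the equivalent step would keep the raw values from the phenotype column instead of mapping them to labels.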