pandas_genomics.io.from_plink

pandas_genomics.io.from_plink(input: Union[str, pathlib.Path], swap_alleles: bool = False, max_variants: Optional[int] = None, categorical_phenotype: bool = True)

Load genetic data from PLINK v1 files (.bed, .bim, and .fam) into a DataFrame.
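
Because all three files must be present, it can help to check for them before loading. The following is a minimal sketch, not part of pandas-genomics: the helper names `missing_plink_files` and `load_plink` are hypothetical, and only `from_plink` itself comes from the library.

```python
from pathlib import Path

PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_plink_files(prefix):
    """Return the PLINK v1 files that do not exist for the given prefix."""
    # Append each extension to the full prefix string. Path.with_suffix would
    # replace part of a prefix that already contains a dot (e.g. "cohort.v2").
    candidates = [Path(f"{prefix}{ext}") for ext in PLINK_EXTENSIONS]
    return [path for path in candidates if not path.is_file()]


def load_plink(prefix, **kwargs):
    """Hypothetical wrapper: load the dataset only if all three files exist."""
    missing = missing_plink_files(prefix)
    if missing:
        names = ", ".join(str(path) for path in missing)
        raise FileNotFoundError(f"Missing PLINK files: {names}")
    # Imported lazily so the file check works without the library installed.
    from pandas_genomics import io

    return io.from_plink(prefix, **kwargs)
```

A call such as `load_plink("data/cohort", max_variants=100)` would then forward `max_variants` to `from_plink`; the path `data/cohort` is only a placeholder.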

Parameters
input: str or Path

Path to the PLINK fileset, without an extension. The .bed, .bim, and .fam files must all exist in the same location and share this name.

swap_alleles: bool

False by default, in which case “allele2” (usually major) in the bim file is considered the “reference” allele. If True, “allele1” (usually minor) is considered the “reference” allele.

max_variants: Optional[int]

If provided, load only this number of variants.

categorical_phenotype: bool, True by default

If True, the phenotype is encoded as a categorical when loaded (1 = “Control”, 2 = “Case”, otherwise missing). If False, the values are loaded directly.
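
The allele and phenotype rules described for `swap_alleles` and `categorical_phenotype` can be sketched in plain Python. This is an illustration of the rules only, not the library's implementation: the function names are hypothetical, and the sketch assumes the standard six-column .bim layout and that phenotype values arrive as strings read from the .fam file.

```python
PHENOTYPE_LABELS = {"1": "Control", "2": "Case"}


def reference_and_alternate(bim_line, swap_alleles=False):
    """Return (reference, alternate) alleles for one line of a .bim file."""
    # A .bim line has six whitespace-separated fields: chromosome, variant ID,
    # genetic position, base-pair position, allele1, allele2.
    fields = bim_line.split()
    if len(fields) != 6:
        raise ValueError(f"Expected 6 fields, found {len(fields)}")
    allele1, allele2 = fields[4], fields[5]
    if swap_alleles:
        return allele1, allele2  # allele1 is treated as the reference
    return allele2, allele1  # default: allele2 is treated as the reference


def encode_phenotypes(raw_values, categorical_phenotype=True):
    """Map raw phenotype strings to labels, or pass them through unchanged."""
    if not categorical_phenotype:
        return list(raw_values)
    # Anything other than "1" or "2" is treated as missing (None).
    return [PHENOTYPE_LABELS.get(value.strip()) for value in raw_values]


line = "1 rs123 0 752566 G A"
print(reference_and_alternate(line))                     # ('A', 'G')
print(reference_and_alternate(line, swap_alleles=True))  # ('G', 'A')
print(encode_phenotypes(["1", "2", "-9"]))               # ['Control', 'Case', None]
```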

Returns
DataFrame

Columns correspond to variants (named as {variant_number}_{variant ID}). Rows correspond to samples and index columns include sample information.
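
Given the `{variant_number}_{variant ID}` naming, a column name can be split back into its two parts. The helper below is a hypothetical sketch, not a library function. It splits on the first underscore only, on the assumption that the variant number never contains an underscore while the variant ID may.

```python
def split_variant_column(column_name):
    """Split a '{variant_number}_{variant ID}' column name into its parts."""
    # maxsplit=1 keeps underscores that belong to the variant ID itself.
    number, variant_id = column_name.split("_", 1)
    return int(number), variant_id


print(split_variant_column("12_rs123"))          # (12, 'rs123')
print(split_variant_column("3_chr1_12345_A_G"))  # (3, 'chr1_12345_A_G')
```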

Notes

PLINK v1 files encode all variants as diploid (2n) and use “missing” alleles if the variant is actually haploid.
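
To illustrate why every genotype has two allele slots, the sketch below decodes one data byte of a PLINK v1 .bed file using the standard two-bit genotype codes. It is not the library's reader: the names are hypothetical, alleles are reported as 1 (allele1) or 2 (allele2) purely for illustration, and how `from_plink` represents haploid calls is not shown.

```python
# Two-bit genotype codes used in the data bytes of a PLINK v1 .bed file.
GENOTYPE_CODES = {
    0b00: (1, 1),        # homozygous for allele1
    0b01: (None, None),  # missing genotype
    0b10: (1, 2),        # heterozygous
    0b11: (2, 2),        # homozygous for allele2
}


def decode_bed_byte(byte):
    """Decode one .bed data byte into four diploid genotypes."""
    # Each byte packs four samples; the first sample uses the two lowest bits.
    return [GENOTYPE_CODES[(byte >> shift) & 0b11] for shift in (0, 2, 4, 6)]


print(decode_bed_byte(0b11100100))
# [(1, 1), (None, None), (1, 2), (2, 2)]
```

Every decoded genotype is a pair, so the format has no dedicated code for a single-allele call.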

© Copyright 2021, John McGuigan. Revision a8481b19.