pandas_genomics.arrays.GenotypeArray

class pandas_genomics.arrays.GenotypeArray(values: Union[List[pandas_genomics.scalars.Genotype], pandas_genomics.arrays.genotype_array.GenotypeArray, numpy.ndarray], dtype: Optional[pandas_genomics.arrays.genotype_array.GenotypeDtype] = None, copy: bool = False)

Holder for genotypes.

Variant information is stored as part of the dtype, and the genotypes are stored as integer arrays of allele indices.

Parameters
values: list-like

The values of the genotypes.

dtype: GenotypeDtype

The specific parameterized type. Optional if it can be inferred from values.

Attributes
dtype: GenotypeDtype

The specific parameterized type

data: np.dtype("u8") with shape (<genotypes>, <ploidy>)

The genotype values encoded as indices into the allele list of the dtype
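The storage scheme described above can be illustrated with a minimal sketch. It uses plain Python tuples rather than the NumPy array held by the class, and every name in it (`alleles`, `encode`, `decode`, the `.` marker and the `None` placeholder for a missing allele) is an assumption made for illustration, not part of the pandas_genomics API.

```python
# Hypothetical illustration only: plain Python stand-ins, not the pandas_genomics API.
from typing import List, Optional, Tuple

# The allele list would live on the dtype and be shared by every genotype in the array.
alleles: List[str] = ["A", "C"]  # index 0 is treated as the reference allele here

MISSING = None  # placeholder for a missing allele (an assumption of this sketch)


def encode(genotype: str) -> Tuple[Optional[int], ...]:
    """Turn a string such as 'A/C' into a tuple of allele indices."""
    return tuple(MISSING if a == "." else alleles.index(a) for a in genotype.split("/"))


def decode(indices: Tuple[Optional[int], ...]) -> str:
    """Turn allele indices back into a readable genotype string."""
    return "/".join("." if i is MISSING else alleles[i] for i in indices)


# One row per genotype, one column per chromosome copy: shape (<genotypes>, <ploidy>).
data = [encode(g) for g in ["A/A", "A/C", "C/C", "./."]]
```

Because each genotype holds only small integers, the allele strings are stored once, and changing which allele counts as the reference only needs the indices to be remapped.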

__init__(values: Union[List[pandas_genomics.scalars.Genotype], pandas_genomics.arrays.genotype_array.GenotypeArray, numpy.ndarray], dtype: Optional[pandas_genomics.arrays.genotype_array.GenotypeDtype] = None, copy: bool = False)

Initialize assuming values is a GenotypeArray or a numpy array with the correct underlying shape

Methods

__init__(values[, dtype, copy])

Initialize assuming values is a GenotypeArray or a numpy array with the correct underlying shape

argmax([skipna])

Return the index of maximum value.

argmin([skipna])

Return the index of minimum value.

argsort([ascending, kind, na_position])

Return the indices that would sort this array.

astype(dtype[, copy])

Cast to a NumPy array with ‘dtype’.

copy()

Return a copy of the array.

delete(loc)

dropna()

Return ExtensionArray without NA values.

encode_additive()

Additive Encoding

encode_codominant()

This encodes the genotype into three categories.

encode_dominant()

Dominant Encoding

encode_edge(alpha_value, ref_allele, …)

Perform EDGE (weighted) encoding.

encode_recessive()

Recessive Encoding

equals(other)

Return whether another array is equivalent to this array.

factorize([na_sentinel])

Return an array of ints indexing unique values

fillna([value, method, limit])

Fill NA/NaN values using the specified method.

is_genotype_array(other)

isin(values)

Pointwise comparison for set containment in the given values.

isna()

A 1-D array indicating if each value is missing

ravel([order])

Return a flattened view on this array.

repeat(repeats[, axis])

Repeat elements of an ExtensionArray.

searchsorted(value[, side, sorter])

Find indices where elements should be inserted to maintain order.

set_reference(allele)

Change the reference allele (in-place) by specifying an allele index value or an allele string

shift([periods, fill_value])

Shift values by desired number.

take(indexer[, allow_fill, fill_value])

Take elements from an array.

to_numpy([dtype, copy, na_value])

Convert to a NumPy ndarray.

transpose(*axes)

Return a transposed view on this array.

unique()

Return a GenotypeArray of unique values

value_counts([dropna])

Return a Series of unique counts with a GenotypeArray index

view([dtype])

Return a view on the array.
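The additive, dominant, recessive and codominant encodings listed above follow the usual genetic-model definitions, which can be sketched on allele-index pairs. This is a sketch under stated assumptions, not the library's implementation: it assumes diploid, biallelic genotypes with index 0 as the reference allele, returns `None` when any allele is missing, and uses the hypothetical labels "Ref", "Het" and "Hom" for the three codominant categories. EDGE encoding is left out because its weights depend on parameters not described here.

```python
# Hypothetical sketch of the standard genetic encodings; not the pandas_genomics code.
from typing import Callable, List, Optional, Tuple

Genotype = Tuple[Optional[int], Optional[int]]  # diploid allele indices; None = missing


def alt_count(gt: Genotype) -> Optional[int]:
    """Number of non-reference alleles, or None if any allele is missing."""
    if any(a is None for a in gt):
        return None
    return sum(1 for a in gt if a != 0)


def _apply(gts: List[Genotype], fn: Callable[[int], object]) -> list:
    """Map the alternate-allele count of each genotype through fn, keeping None."""
    out = []
    for gt in gts:
        n = alt_count(gt)
        out.append(None if n is None else fn(n))
    return out


def additive(gts: List[Genotype]) -> list:
    # 0, 1 or 2 copies of the alternate allele
    return _apply(gts, lambda n: n)


def dominant(gts: List[Genotype]) -> list:
    # 1 if at least one alternate allele is present
    return _apply(gts, lambda n: int(n > 0))


def recessive(gts: List[Genotype]) -> list:
    # 1 only if both alleles are the alternate allele
    return _apply(gts, lambda n: int(n == 2))


def codominant(gts: List[Genotype]) -> list:
    # three unordered categories, one per alternate-allele count
    return _apply(gts, lambda n: ("Ref", "Het", "Hom")[n])


example = [(0, 0), (0, 1), (1, 1), (None, 0)]
```

For `example`, the additive encoding distinguishes all three complete genotypes, while the dominant and recessive encodings each merge the heterozygote with one of the homozygotes.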

Attributes

T

allele_idxs

Return the allele indices for each genotype

dtype

The specific parameterized type

gt_scores

Return the genotype score for each genotype (as a float)

hwe_pval

Calculate the probability that the samples are in Hardy-Weinberg equilibrium (HWE) for diploid variants

is_heterozygous

Boolean array: True if the sample is heterozygous for any alleles

is_homozygous

Boolean array: True if the sample is homozygous for any allele

is_homozygous_alt

Boolean array: True if the sample is homozygous for any non-reference allele

is_homozygous_ref

Boolean array: True if the sample is homozygous for the reference allele

is_missing

Boolean array: True if the sample is missing all alleles

maf

Calculate the Minor Allele Frequency (MAF) for the most-frequent alternate allele.

nbytes

The number of bytes needed to store this object in memory

ndim

Extension Arrays are only allowed to be 1-dimensional.

shape

Return a tuple of the array dimensions.

size

The number of elements in the array.

variant

Return the variant identifier
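The zygosity flags, `maf` and `hwe_pval` attributes listed above can be illustrated with a short sketch on allele-index pairs. All function names below are hypothetical, genotypes are assumed to be diploid tuples with `None` marking a missing allele, and the HWE test shown is a plain chi-square goodness-of-fit test for a biallelic variant; the library may well use a different test, so treat this as one possible reading rather than its actual method.

```python
# Hypothetical sketch; the names and the chi-square test are assumptions, not library code.
import math
from collections import Counter
from typing import List, Optional, Tuple

Genotype = Tuple[Optional[int], Optional[int]]  # diploid allele indices; None = missing


def is_missing(gt: Genotype) -> bool:
    return all(a is None for a in gt)


def is_homozygous(gt: Genotype) -> bool:
    return None not in gt and gt[0] == gt[1]


def is_heterozygous(gt: Genotype) -> bool:
    return None not in gt and gt[0] != gt[1]


def maf(gts: List[Genotype]) -> float:
    """Frequency of the most common alternate allele among non-missing alleles."""
    counts = Counter(a for gt in gts for a in gt if a is not None)
    total = sum(counts.values())
    alt_counts = [n for allele, n in counts.items() if allele != 0]
    if total == 0 or not alt_counts:
        return 0.0
    return max(alt_counts) / total


def hwe_pval_chi2(gts: List[Genotype]) -> float:
    """Chi-square (1 degree of freedom) HWE p-value for a biallelic variant."""
    complete = [gt for gt in gts if None not in gt]
    n = len(complete)
    if n == 0:
        return float("nan")
    hom_ref = sum(1 for gt in complete if gt == (0, 0))
    het = sum(1 for gt in complete if gt[0] != gt[1])
    hom_alt = n - hom_ref - het
    p = (2 * hom_ref + het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p in (0.0, 1.0):
        return float("nan")  # monomorphic: the test is undefined
    observed = (hom_ref, het, hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


balanced = [(0, 0)] * 25 + [(0, 1)] * 50 + [(1, 1)] * 25
```

The `balanced` sample matches the expected Hardy-Weinberg proportions exactly, so the chi-square statistic is zero and the sketch returns a p-value of 1.0.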