expyriment.misc.data_preprocessing

Data Preprocessing Module.

This module contains several classes and functions that help to handle, preprocess, and aggregate Expyriment data files.

class expyriment.misc.data_preprocessing.Aggregator(data_folder, file_name, suffix='.xpd')[source]

Bases: object

A class implementing a tool to aggregate Expyriment data.

This class handles the multiple data files of an experiment and processes (i.e., aggregates) the data for further analysis.

Examples

This tool helps, for instance, to aggregate your data for certain combinations of independent variables, e.g., the data of a numerical magnitude judgement experiment. The code below writes one file with the mean and median RTs and a second file with the errors and the number of trials:

from expyriment.misc import data_preprocessing

agg = data_preprocessing.Aggregator(data_folder="./mydata/",
                                    file_name="MagnitudeJudgements")
agg.set_computed_variables(["parity = target_number % 2",
                            "size = target_number > 65"])
agg.set_independent_variables(["hand", "size", "parity"])

agg.set_exclusions(["trial_counter < 0",
                    "error != 0",
                    "RT < 2*std",   # std rules are applied separately for each
                    "RT > 2*std"])  # subject and IV factor combination
agg.set_dependent_variables(["mean(RT)", "median(RT)"])
agg.aggregate(output_file="rts.csv")

agg.set_exclusions(["trial_counter < 0"])
agg.set_dependent_variables(["sum(error)", "n_trials"])
agg.aggregate(output_file="errors.csv")

Methods

add_variables(variable_names, data_columns)[source]

Add one or more new variables to the data.

Parameters :

variable_names : str

name of the new variable(s)

data_columns : numpy.array

the new data columns as numpy array

Notes

The number of variable names and the number of added columns must match. The added data must also match the number of rows of the existing data. Note that manually added variables are lost if cases are excluded afterwards by calling set_exclusions.

added_variables[source]

Getter for added variables.

aggregate(output_file=None, column_subject_id=0)[source]

Aggregate the data as defined by the design.

The design will be printed and the resulting data will be returned as a numpy.array together with the variable names.

Parameters :

output_file : str, optional

name of the data output file. If output_file is defined, the function writes the results to a csv data file

column_subject_id : int, optional

data column containing the subject id (default=0)

Returns :

result : numpy.array

new_variable_names : list of strings

computed_variables[source]

Getter for computed variables.

concatenated_data[source]

Getter for concatenated_data.

Returns :

data : numpy.array

variables : list of str

Notes

Returns all data of all subjects as a numpy.array and all variable names (including added variables). According to the defined design, the result contains the newly computed variables and the subject variables from the headers of the Expyriment data files.

If data have been loaded and no new variable or exclusion has been defined, concatenated_data merely returns the previous data without re-processing.

data_files[source]

Getter for data_files.

The list of the data files considered for the analysis.

data_folder[source]

Getter for data_folder.

dependent_variables[source]

Getter for dependent variables.

exclusions[source]

Getter for exclusions.

file_name[source]

Getter for file_name.

get_data(filename, recode_variables=True, compute_new_variables=True, exclude_trials=True)[source]

Read data from a single Expyriment data file.

Parameters :

filename : str

name of the Expyriment data file

recode_variables : bool, optional

set to False if defined variable recodings should not be applied (default=True)

compute_new_variables : bool, optional

set to False if new defined variables should not be computed (default=True)

exclude_trials : bool, optional

set to False if exclusion rules should not be applied (default=True)

Returns :

data : numpy.array

var_names : list

list of variable names

info : str

subject info

comment : str

comments in data

Notes

The function can only be applied to files listed in aggregator.data_files, that is, to the files in the defined data folder that start with the experiment name. According to the defined design, the result contains the recoded data together with the newly computed variables and the subject variables from the headers of the Expyriment data files.

get_variable_data(variables)[source]

Returns the data columns of the given variables as a numpy array.

Parameters :

variables : list of str

names of the variables to be extracted

Returns :

data : numpy.array

independent_variables[source]

Getter for independent_variables.

print_n_trials(variables)[source]

Print the number of trials in the combinations of the independent variables.

Parameters :

variables : str or list

A string or a list of strings that represent the names of one or more data variables (aggregator.variables)

Notes

The function is useful, for instance, to quickly check the experimental design.

reset(data_folder, file_name, suffix='.xpd')[source]

Reset the aggregator class and clear design.

Parameters :

data_folder : str

folder that contains the data of the subjects

file_name : str

name of the files. All files that start with this name will be considered for the analysis (cf. aggregator.data_files)

suffix : str, optional

if specified, only files that end with this particular suffix will be considered (default='.xpd')

set_computed_variables(compute_syntax)[source]

Set syntax to compute new variables.

The method defines the variables that will be computed. It cannot be applied to variables that have been added manually via add_variables. The method requires re-reading the data files and might therefore be time-consuming.

Parameters :

compute_syntax : str or list

A string or a list of strings that represent the syntax to compute the new variables

Notes

Compute Syntax:

{new-variable} = {variable} {relation/operation} {variable/value}
    {new-variable} -- a new not yet defined variable name
    {variable}     -- a defined data variable
    {relation}     --  ==, !=, >, <, >=, <=, => or <=
    {operation}    -- +, -, *, / or %
    {value}        -- string or numeric
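The compute syntax above can be illustrated with a small stand-alone sketch. It is not Expyriment's parser: the helper names (`parse_value`, `compute_variable`) and the dictionary-based row are assumptions made for illustration only, and the way Expyriment stores the result of a relation (e.g. as `True` or `1`) may differ.

```python
import operator
import re

# Relations and operations listed in the compute syntax.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "=>": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "%": operator.mod,
}

# Try two-character symbols first so that ">=" is not read as ">".
_SYMBOLS = sorted(OPERATORS, key=len, reverse=True)
_PATTERN = re.compile(
    r"^\s*(\w+)\s*=\s*(\w+)\s*("
    + "|".join(re.escape(s) for s in _SYMBOLS)
    + r")\s*(.+?)\s*$"
)


def parse_value(token, row):
    """Hypothetical helper: resolve a variable name, a number or a string."""
    if token in row:
        return row[token]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token.strip("'\"")


def compute_variable(syntax, row):
    """Hypothetical helper: evaluate one compute rule for one data row."""
    match = _PATTERN.match(syntax)
    if match is None:
        raise ValueError("cannot parse: " + syntax)
    new_name, variable, symbol, value = match.groups()
    return new_name, OPERATORS[symbol](row[variable], parse_value(value, row))


row = {"target_number": 72, "hand": "left"}
computed = dict(
    compute_variable(rule, row)
    for rule in ["parity = target_number % 2", "size = target_number > 65"]
)
print(computed)
```

Each rule yields exactly one new variable per data row, which is why the computed variables can afterwards be used as independent variables.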

set_dependent_variables(dv_syntax)[source]

Set dependent variables.

Parameters :

dv_syntax : str or list

syntax describing the dependent variable by a function and variable, e.g. mean(RT)

Notes

Syntax:

{function}({variable})
    {function} -- mean, median, sum, std or n_trials
                  Note: n_trials counts the number of trials
                  and does not require a variable as argument
    {variable} -- a defined data variable
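As a rough illustration of what the dependent-variable syntax expresses, the following sketch aggregates a list of trial dictionaries per combination of independent variables. The function `aggregate`, the row layout and the use of the sample standard deviation for `std` are assumptions for this example. It does not reproduce Expyriment's implementation, which also aggregates separately for each subject.

```python
import re
import statistics
from collections import defaultdict

FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
    "std": statistics.stdev,  # assumption: sample standard deviation
}


def aggregate(rows, independent_variables, dv_syntax):
    """Hypothetical helper: apply each DV rule to every factor combination."""
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[iv] for iv in independent_variables)].append(row)

    result = {}
    for key, cell_rows in sorted(cells.items()):
        values = []
        for syntax in dv_syntax:
            if syntax.strip() == "n_trials":
                # n_trials takes no variable: it counts the rows in the cell
                values.append(len(cell_rows))
                continue
            match = re.fullmatch(r"\s*(\w+)\((\w+)\)\s*", syntax)
            if match is None or match.group(1) not in FUNCTIONS:
                raise ValueError("cannot parse: " + syntax)
            function, variable = match.groups()
            values.append(FUNCTIONS[function]([r[variable] for r in cell_rows]))
        result[key] = values
    return result


rows = [
    {"hand": "left", "RT": 400, "error": 0},
    {"hand": "left", "RT": 500, "error": 1},
    {"hand": "left", "RT": 900, "error": 0},
    {"hand": "right", "RT": 450, "error": 0},
    {"hand": "right", "RT": 550, "error": 0},
]
table = aggregate(rows, ["hand"],
                  ["mean(RT)", "median(RT)", "sum(error)", "n_trials"])
for cell, values in table.items():
    print(cell, values)
```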

set_exclusions(rule_syntax)[source]

Set rules to exclude trials from the analysis.

The method defines the rows that are ignored while reading the data files. It can therefore not be applied to variables that have been added later via add_variables, and it results in a loss of all manually added variables. Setting exclusions requires re-reading the data files and might therefore be time-consuming. Thus, always call this method at the beginning of your analysis script.

Parameters :

rule_syntax : str or list

A string or a list of strings that represent the rules to exclude trials

Notes

Rule syntax:

{variable} {relation} {variable/value}
    {variable}  -- a defined data variable
    {relation}  --  ==, !=, >, <, >=, <=, => or <=
    {value}     -- string or numeric

    If value is "{numeric} * std", trials are excluded in which
    the variable is below or above {numeric} standard deviations
    from the mean. The relations "==" and "!=" are not allowed in
    this case. The exclusion criterion is applied for each subject
    and factor combination separately.
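The standard-deviation criterion can be sketched as follows. This is an illustration of the idea only, not Expyriment's code: the function `exclude_outliers` and the row layout are hypothetical, the sketch applies the lower and the upper bound at once (corresponding to using both "RT < 2*std" and "RT > 2*std"), and it assumes the sample standard deviation, which may differ from what the library computes.

```python
import statistics
from collections import defaultdict


def exclude_outliers(rows, variable, n_std, group_by):
    """Hypothetical helper: drop trials further than n_std standard
    deviations from the mean of their own group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[name] for name in group_by)].append(row)

    kept = []
    for group_rows in groups.values():
        values = [r[variable] for r in group_rows]
        if len(values) < 2:
            # a standard deviation needs at least two trials
            kept.extend(group_rows)
            continue
        mean = statistics.mean(values)
        limit = n_std * statistics.stdev(values)  # assumption: sample std
        kept.extend(r for r in group_rows if abs(r[variable] - mean) <= limit)
    return kept


rows = (
    [{"subject_id": 1, "hand": "left", "RT": rt} for rt in [500] * 9 + [2000]]
    + [{"subject_id": 2, "hand": "left", "RT": rt} for rt in [480, 520, 500, 510]]
)
kept = exclude_outliers(rows, "RT", 2, group_by=["subject_id", "hand"])
print(len(rows), "trials before,", len(kept), "trials after exclusion")
```

Because the mean and the standard deviation are computed within each group, the slow trial of subject 1 is judged only against that subject's own trials in the same condition.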

set_independent_variables(variables)[source]

Set the independent variables.

Parameters :

variables : str or list

the name(s) of one or more data variables (aggregator.variables)

set_subject_variables(variables)[source]

Set subject variables to be considered for the analysis.

The method sets the subject variables. Subject variables are between-subject factors or other variables defined in the subject information section (#s) of the Expyriment data file. The method requires re-reading the data files and might therefore be time-consuming.

Parameters :

variables : str or list

A string or a list of strings that represent the subject variables

set_variable_recoding(recoding_syntax)[source]

Set syntax to recode variables.

The method defines the variables that will be recoded. It cannot be applied to variables that have been added later via add_variables. Recoding variables requires re-reading the data files and might therefore be time-consuming.

Parameters :

recoding_syntax : str or list

A string or a list of strings that represent the variable recoding syntax

Notes

Recoding syntax:

{variable}: {old_value1} = {new_value1}, {old_value2} = {new_value2},...
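To make the recoding syntax concrete, here is a minimal sketch that parses one recoding rule and applies it to a list of row dictionaries. The helpers `parse_recoding` and `recode` are hypothetical and are not part of Expyriment. Values are compared as strings here, which is an assumption of this example.

```python
def parse_recoding(syntax):
    """Hypothetical helper: split '{variable}: old = new, ...' into parts."""
    variable, _, rules = syntax.partition(":")
    mapping = {}
    for rule in rules.split(","):
        if not rule.strip():
            continue  # tolerate a trailing comma
        old, _, new = rule.partition("=")
        mapping[old.strip()] = new.strip()
    return variable.strip(), mapping


def recode(rows, syntax):
    """Hypothetical helper: return copies of the rows with recoded values.
    Values without a matching rule are left unchanged."""
    variable, mapping = parse_recoding(syntax)
    return [
        dict(row, **{variable: mapping.get(str(row[variable]), row[variable])})
        for row in rows
    ]


rows = [{"hand": "1"}, {"hand": "2"}, {"hand": "3"}]
recoded = recode(rows, "hand: 1 = left, 2 = right")
print(recoded)
```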

subject_variables[source]

Getter for subject variables.

variable_recodings[source]

Getter for variable recodings.

variables[source]

Getter for variables.

The specified variables, including the newly computed variables, the between-subject variables, and the added variables.

write_concatenated_data(output_file=None, delimiter=', ')[source]

Concatenate data and write it to a csv file.

Parameters :

output_file : str, optional

name of the data output file. If not specified, the data will be saved to {file_name}.csv

delimiter : str, optional

delimiter character (default=", ")

Functions

expyriment.misc.data_preprocessing.read_datafile(filename, only_header_and_variable_names=False, encoding=None)[source]

Read an Expyriment data file.

Returns the data, the variable names, the subject info, and the comments.

Parameters :

filename : str

name (fullpath) of the Expyriment data file

only_header_and_variable_names : bool, optional

if True, the function reads only the header and variable names (default=False)

encoding : str, optional

the encoding with which the contents of the file will be read

Returns :

data : list of list

data array

variables : list of str

variable names list

subject_info : dict

dictionary with subject information (incl. date and between-subject factors)

comments : str

string with remaining comments

expyriment.misc.data_preprocessing.write_concatenated_data(data_folder, file_name, output_file=None, delimiter=', ')[source]

Concatenate data and write it to a csv file.

All files in data_folder that start with file_name will be considered for the analysis (cf. aggregator.data_files).

Parameters :

data_folder : str

folder that contains the data of the subjects

file_name : str

name of the files

output_file : str, optional

name of the data output file. If not specified, the data will be saved to {file_name}.csv

delimiter : str, optional

delimiter character (default=", ")

Notes

The function is useful to combine the experimental data and prepare for further processing with other software. It basically wraps Aggregator.write_concatenated_data.

expyriment.misc.data_preprocessing.write_csv_file(filename, data, varnames=None, delimiter=', ')[source]

Write 2D data array to csv file.

Parameters :

filename : str

name (fullpath) of the data file

data : list of list

2D array with data

varnames : list of str, optional

array of strings representing variable names

delimiter : str, optional

delimiter character (default=", ")
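The expected shape of the arguments (a 2D list of rows plus an optional list of variable names) can be illustrated with a stdlib-only sketch. The function `write_rows` is hypothetical and writes to any text stream. It only mimics the documented parameters and does not claim to reproduce the exact output format of write_csv_file (for example, quoting or number formatting).

```python
import io


def write_rows(stream, data, varnames=None, delimiter=", "):
    """Hypothetical helper: write an optional header line and one line
    per data row, joining the cells with the delimiter."""
    if varnames is not None:
        stream.write(delimiter.join(varnames) + "\n")
    for row in data:
        stream.write(delimiter.join(str(cell) for cell in row) + "\n")


buffer = io.StringIO()
write_rows(buffer,
           data=[[1, "left", 512], [1, "right", 498]],
           varnames=["subject_id", "hand", "RT"])
csv_text = buffer.getvalue()
print(csv_text)
```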
