Data Preprocessing Module.
This module contains several classes and functions that help to handle, preprocess and aggregate Expyriment data files.
Bases: object
A class implementing a tool to aggregate Expyriment data.
This class is used to handle the multiple data files of an experiment and to process (i.e., aggregate) the data for further analysis.
Examples
This tool helps, for instance, to aggregate your data for certain combinations of independent variables, e.g., the data of a numerical magnitude judgement experiment. The code below creates one file with the mean and median RTs and a second file with the errors and the number of trials:
from expyriment.misc import data_preprocessing

agg = data_preprocessing.Aggregator(data_folder="./mydata/",
                                    file_name="MagnitudeJudgements")
agg.set_computed_variables(["parity = target_number % 2",
                            "size = target_number > 65"])
agg.set_independent_variables(["hand", "size", "parity"])
agg.set_exclusions(["trial_counter < 0",
                    "error != 0",
                    # remove trials deviating by more than 2 SD; the SD is
                    # computed per subject and IV factor combination
                    "RT < 2*std",
                    "RT > 2*std"])
agg.set_dependent_variables(["mean(RT)", "median(RT)"])
agg.aggregate(output_file="rts.csv")

agg.set_exclusions(["trial_counter < 0"])
agg.set_dependent_variables(["sum(error)", "n_trials"])
agg.aggregate(output_file="errors.csv")
Methods
Adds a new variable to the data.
Parameters
variable_names : str
data_columns : numpy.array
Notes
The number of variable names must match the number of added columns, and each added column must have as many rows as the existing data. Note that manually added variables are lost if trials are excluded afterwards via a call of set_exclusions.
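The two constraints above (one name per added column, one value per existing row) can be illustrated with a small stand-alone sketch on plain Python lists. add_columns is a hypothetical helper, not part of expyriment, and it works on lists of rows rather than on numpy.array.

```python
def add_columns(data, variables, new_names, new_columns):
    """Append new columns to row-major data (hypothetical helper).

    data: list of rows; variables: existing variable names;
    new_names: names of the added variables; new_columns: one list per
    added variable, holding one value per row.
    """
    if len(new_names) != len(new_columns):
        raise ValueError("number of variable names and columns differ")
    for column in new_columns:
        if len(column) != len(data):
            raise ValueError("column length does not match number of rows")
    # Build new rows so that the input data are not modified in place
    new_data = [list(row) + [column[i] for column in new_columns]
                for i, row in enumerate(data)]
    return new_data, list(variables) + list(new_names)
```

For example, add_columns([[1, 450], [2, 520]], ["trial", "RT"], ["RT_s"], [[0.45, 0.52]]) appends an RT column in seconds, while passing two names for a single column raises a ValueError.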
Aggregate the data as defined by the design.
The design is printed, and the resulting data are returned as numpy.array together with the variable names.
Parameters
output_file : str, optional
column_subject_id : int, optional
Returns
result : numpy.array
new_variable_names : list of str
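Conceptually, aggregating means grouping the trials by subject and by combination of the independent variables, then applying the dependent-variable functions to each group. The stand-alone sketch below shows that idea for one dependent variable. aggregate_rows is hypothetical: it returns a long-format dict instead of the numpy.array produced by aggregate, and the real output layout may differ.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_rows(rows, subject, ivs, dv):
    """Mean and median of dv for each subject and IV combination (sketch).

    rows are dicts mapping variable names to values; the result maps
    (subject_id, iv_value_1, iv_value_2, ...) to (mean, median).
    """
    cells = defaultdict(list)
    for row in rows:
        # One cell per subject and combination of IV levels
        key = (row[subject],) + tuple(row[iv] for iv in ivs)
        cells[key].append(row[dv])
    return {key: (mean(values), median(values))
            for key, values in sorted(cells.items())}
```

With trials of subject 1 at RT 400 and 500 for hand "left", the cell (1, "left") holds a mean and a median of 450.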
Getter for concatenated_data.
Returns
data : numpy.array
variables : list of str
Notes
Returns all data of all subjects as numpy.array and all variable names (including added variables). According to the defined design, the result contains the newly computed variables and the subject variables from the headers of the Expyriment data files.
If data have been loaded and no new variable or exclusion has been defined since, concatenated_data merely returns the previously processed data without re-processing.
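The caching behaviour described above can be pictured with a small stand-alone class. This is a sketch under assumptions: CachedConcatenation and its load callable are hypothetical and only mimic the documented behaviour; they do not show how the Aggregator is actually implemented.

```python
class CachedConcatenation:
    """Hypothetical stand-in illustrating lazy re-processing of data files."""

    def __init__(self, load_function):
        # load_function(rules) is assumed to read and process all data files
        self._load = load_function
        self._rules = []
        self._cache = None
        self._design_changed = True

    def set_exclusions(self, rules):
        # Changing the design invalidates the cached result
        self._rules = list(rules)
        self._design_changed = True

    @property
    def concatenated_data(self):
        # Re-process only if nothing is cached or the design has changed
        if self._design_changed or self._cache is None:
            self._cache = self._load(self._rules)
            self._design_changed = False
        return self._cache
```

Reading concatenated_data twice calls the loader once; a call to set_exclusions in between forces a second load.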
Read data from a single Expyriment data file.
Parameters
filename : str
recode_variables : bool, optional
compute_new_variables : bool, optional
exclude_trials : bool, optional
Returns
data : numpy.array
var_names : list
info : str
comment : str
Notes
The function can only be applied to files in aggregator.data_files, that is, to the files in the defined data folder whose names start with the experiment name. According to the defined design, the result contains the recoded data together with the newly computed variables and the subject variables from the headers of the Expyriment data files.
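The selection rule for data_files (files in the data folder whose names start with the experiment name) can be sketched with the standard library. find_data_files is a hypothetical helper, and filtering on the default suffix ".xpd" (the extension of Expyriment data files) is an assumption about how the folder is scanned.

```python
import os


def find_data_files(data_folder, file_name, suffix=".xpd"):
    """Paths of files in data_folder that start with file_name (sketch)."""
    return sorted(
        os.path.join(data_folder, name)
        for name in os.listdir(data_folder)
        # keep only matching experiment files with the assumed suffix
        if name.startswith(file_name) and name.endswith(suffix)
    )
```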
Returns the data columns of the given variables as numpy.array.
Parameters
variables : list of str
Returns
data : numpy.array
Print the number of trials in the combinations of the independent variables.
Parameters
variables : str or list
Notes
This function is useful, for instance, to quickly check the experimental design.
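Counting trials per combination of independent variables boils down to a frequency table. The sketch below uses collections.Counter on trial dicts; count_trials is a hypothetical helper, not the library's print_n_trials.

```python
from collections import Counter


def count_trials(rows, ivs):
    """Number of trials per combination of the independent variables."""
    # Each trial contributes one count to its tuple of IV levels
    return Counter(tuple(row[iv] for iv in ivs) for row in rows)
```

Printing the result with `for cell, n in sorted(count_trials(rows, ["hand", "parity"]).items()): print(cell, n)` quickly reveals missing or unbalanced cells; combinations that never occur are simply absent from the Counter.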
Reset the aggregator class and clear the design.
Parameters
data_folder : str
file_name : str
suffix : str, optional
Set syntax to compute new variables.
This method defines the variables that will be computed. It cannot be applied to variables that have been added manually via add_variables. The method requires re-reading the data files and might therefore be time-consuming.
Parameters
compute_syntax : str or list
Notes
Compute Syntax:
{new-variable} = {variable} {relation/operation} {variable/value}
{new-variable} -- a new, not yet defined variable name
{variable} -- a defined data variable
{relation} -- ==, !=, >, <, >=, <= or =>
{operation} -- +, -, *, / or %
{value} -- string or numeric
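To make the compute syntax concrete, here is a minimal parser that evaluates one expression on a single trial stored as a dict. It is a sketch under assumptions: compute_variable, the regular expression and the operand parsing are hypothetical and only follow the grammar above, not the library's own parser.

```python
import operator
import re

# Supported operators; regex alternatives are ordered so that ">=" wins over ">"
_OPERATORS = {
    "==": operator.eq, "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
    "=>": operator.ge, ">": operator.gt, "<": operator.lt,
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "%": operator.mod,
}
_SYNTAX = re.compile(
    r"^\s*(\w+)\s*=\s*(\w+)\s*(==|!=|>=|<=|=>|>|<|\+|-|\*|/|%)\s*(\S+)\s*$")


def _parse_value(token, row):
    """Resolve the right-hand operand: a data variable, a number or a string."""
    if token in row:
        return row[token]
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token.strip("\"'")


def compute_variable(syntax, row):
    """Add one computed variable to a trial dict, e.g. 'parity = n % 2'."""
    match = _SYNTAX.match(syntax)
    if match is None:
        raise ValueError("invalid compute syntax: " + syntax)
    new_name, variable, op, value = match.groups()
    if new_name in row:
        raise ValueError(new_name + " is already defined")
    row[new_name] = _OPERATORS[op](row[variable], _parse_value(value, row))
    return row
```

Applied to the example from the class description, "parity = target_number % 2" and "size = target_number > 65" turn a trial with target_number 7 into parity 1 and size False.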
Set dependent variables.
Parameters
dv_syntax : str or list
Notes
Syntax:
{function}({variable})
{function} -- mean, median, sum, std or n_trials
Note: n_trials counts the number of trials
and does not require a variable as argument
{variable} -- a defined data variable
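The dependent-variable syntax maps a function name to a summary over the trials of one cell. The sketch below evaluates such expressions on a list of trial dicts; compute_dv is hypothetical, and using the population standard deviation for "std" is an assumption (the library may use a different estimator).

```python
import re
from statistics import mean, median, pstdev

# Map the documented function names to Python implementations.
# Using the population SD (pstdev) for "std" is an assumption.
_FUNCTIONS = {"mean": mean, "median": median, "sum": sum, "std": pstdev}
_DV_SYNTAX = re.compile(r"^\s*(mean|median|sum|std)\((\w+)\)\s*$")


def compute_dv(syntax, rows):
    """Evaluate one dependent-variable expression on a list of trial dicts."""
    if syntax.strip() == "n_trials":
        return len(rows)  # counts trials, needs no variable argument
    match = _DV_SYNTAX.match(syntax)
    if match is None:
        raise ValueError("invalid dependent variable syntax: " + syntax)
    function, variable = match.groups()
    return _FUNCTIONS[function]([row[variable] for row in rows])
```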
Set rules to exclude trials from the analysis.
This method defines the rows that are ignored while reading the data files. It therefore cannot be applied to variables that have been added later via add_variables, and calling it discards all manually added variables. Setting exclusions requires re-reading the data files and might therefore be time-consuming. Thus, always call this method at the beginning of your analysis script.
Parameters
rule_syntax : str or list
Notes
Rule syntax:
{variable} {relation} {variable/value}
{variable} -- a defined data variable
{relation} -- ==, !=, >, <, >=, <= or =>
{value} -- string or numeric
If value is "{numeric} * std", trials are excluded in which
the variable is below or above {numeric} standard deviations
from the mean. The relations "==" and "!=" are not allowed in
this case. The exclusion criterion is applied for each subject
and factor combination separately.
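The "{numeric} * std" rule amounts to a per-group outlier filter. The sketch below applies the lower and upper bound in one pass (the documented rules express them as separate "<" and ">" rules) and uses the population standard deviation. Both the helper exclude_by_std and the choice of SD estimator are assumptions, not the library's implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def exclude_by_std(rows, variable, k, group_keys):
    """Drop trials more than k SDs below or above their group mean (sketch)."""
    groups = defaultdict(list)
    for row in rows:
        # Group trials, e.g. by subject and factor combination
        groups[tuple(row[g] for g in group_keys)].append(row)
    kept = []
    for group in groups.values():
        values = [row[variable] for row in group]
        m, sd = mean(values), pstdev(values)
        kept.extend(row for row in group
                    if m - k * sd <= row[variable] <= m + k * sd)
    return kept
```

With nine trials at 500 ms and one at 1000 ms in the same group, the mean is 550 and the SD 150, so with k = 2 only the 1000 ms trial (above 850) is removed.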
Set the independent variables.
Parameters
variables : str or list
Set subject variables to be considered for the analysis.
This method sets the subject variables. Subject variables are between-subject factors or other variables defined in the subject information section (#s) of the Expyriment data file. The method requires re-reading the data files and might therefore be time-consuming.
Parameters
variables : str or list
Set syntax to recode variables.
This method defines the variables that will be recoded. It cannot be applied to variables that have been added later via add_variables. Recoding variables requires re-reading the data files and might therefore be time-consuming.
Parameters
rule_syntax : str or list
Notes
Recoding syntax:
{variable}: {old_value1} = {new_value1}, {old_value2} = {new_value2},...
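The recoding syntax is a variable name followed by a list of old = new pairs. The sketch below parses such a rule and applies it to trial dicts. parse_recoding and recode are hypothetical helpers; comparing values as strings is an assumption motivated by the data files being plain text.

```python
def parse_recoding(syntax):
    """Parse 'var: old1 = new1, old2 = new2' into (var, mapping) (sketch)."""
    variable, _, pairs = syntax.partition(":")
    mapping = {}
    for pair in pairs.split(","):
        if not pair.strip():
            continue  # tolerate a trailing comma
        old, _, new = pair.partition("=")
        mapping[old.strip()] = new.strip()
    return variable.strip(), mapping


def recode(rows, syntax):
    """Recode one variable in place; values without a rule stay unchanged."""
    variable, mapping = parse_recoding(syntax)
    for row in rows:
        row[variable] = mapping.get(str(row[variable]), row[variable])
    return rows
```

For instance, "hand: 1 = left, 2 = right" turns hand codes 1 and 2 into "left" and "right" and leaves any other code untouched.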
Read an Expyriment data file.
Returns the data, the variable names, the subject info and the comments.
Parameters
filename : str
only_header_and_variable_names : bool, optional
encoding : str, optional
Returns
data : list of list
variables : list of str
subject_info : dict
comments : str
Concatenate data and write it to a csv file.
All files in data_folder whose names start with file_name will be considered for the analysis (cf. aggregator.data_files).
Parameters
data_folder : str
file_name : str
output_file : str, optional
delimiter : str, optional
Notes
The function is useful to combine the experimental data and prepare them for further processing with other software. It essentially wraps Aggregator.write_concatenated_data.
Write a 2D data array to a csv file.
Parameters
filename : str
data : list of list
varnames : list of str, optional
delimiter : str, optional
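Writing a 2D array with an optional header row maps directly onto Python's csv module. The sketch below writes to any open text file object; write_rows is a hypothetical helper, and the default delimiter "," is an assumption rather than the documented default of the module's function.

```python
import csv


def write_rows(file_obj, data, varnames=None, delimiter=","):
    """Write a 2D list to CSV, optionally preceded by a header row (sketch)."""
    writer = csv.writer(file_obj, delimiter=delimiter)
    if varnames is not None:
        writer.writerow(varnames)  # header with the variable names
    writer.writerows(data)
```

When writing to disk, open the file with newline="" (e.g. `with open("out.csv", "w", newline="") as f: write_rows(f, data, ["subject_id", "RT"])`) so that the csv module controls the line endings.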