expyriment.misc.data_preprocessing.Aggregator
-
class expyriment.misc.data_preprocessing.Aggregator(data_folder, file_name, suffix='.xpd', read_variables=None, names_comprise_glob_pattern=False)
A class implementing a tool to aggregate Expyriment data.
This class handles the multiple data files of an experiment and processes (i.e., aggregates) the data for further analysis.
Examples
This tool helps, for instance, to aggregate your data for certain combinations of independent variables, e.g., the data of a numerical magnitude judgement experiment. The code below creates one file with the mean and median RTs and a second file with the errors and the number of trials.
>>> from expyriment.misc import data_preprocessing
>>> agg = data_preprocessing.Aggregator(data_folder="./mydata/",
...                                     file_name="MagnitudeJudgements")
>>> agg.set_computed_variables(["parity = target_number % 2",
...                             "size = target_number > 65"])
>>> agg.set_independent_variables(["hand", "size", "parity"])
>>>
>>> agg.set_exclusions(["trial_counter < 0",
...                     "error != 0",
...                     "RT < 2*std",
...                     "RT > 2*std"  # exclusion by std is applied per subject
...                                   # and IV factor combination
...                     ])
>>> agg.set_dependent_variables(["mean(RT)", "median(RT)"])
>>> agg.aggregate(output_file="rts.csv")
>>>
>>> agg.set_exclusions(["trial_counter < 0"])
>>> agg.set_dependent_variables(["sum(error)", "n_trials"])
>>> agg.aggregate(output_file="errors.csv")
-
__init__(data_folder, file_name, suffix='.xpd', read_variables=None, names_comprise_glob_pattern=False)
Create an aggregator.
Parameters:
- data_folder : str
folder which contains the data
- file_name : str
name of the files; all files that start with this name will be considered for the analysis (cf. aggregator.data_files)
- suffix : str, optional
if specified, only files that end with this particular suffix will be considered (default='.xpd')
- read_variables : array of str, optional
array of variable names; only the specified variables are read
- names_comprise_glob_pattern : bool, optional
if True, data_folder and file_name are processed as glob patterns with wildcards such as "*" or "?"; the suffix parameter will be ignored
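The two file selection modes described above can be illustrated with a minimal sketch. The helper find_data_files is hypothetical and only mimics the documented behaviour (prefix plus suffix matching versus glob matching) using the standard library; it is not the code Expyriment runs internally.

```python
import glob
import os
import tempfile


def find_data_files(data_folder, file_name, suffix=".xpd",
                    names_comprise_glob_pattern=False):
    """Hypothetical helper that mimics the documented file selection."""
    if names_comprise_glob_pattern:
        # Folder and name form a glob pattern; the suffix is ignored.
        pattern = os.path.join(data_folder, file_name)
        return sorted(glob.glob(pattern))
    # Plain mode: the name must match as a prefix, the suffix as an ending.
    return sorted(
        os.path.join(data_folder, name)
        for name in os.listdir(data_folder)
        if name.startswith(file_name) and name.endswith(suffix)
    )


with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files to select from.
    for name in ("Magnitude_01.xpd", "Magnitude_02.xpd",
                 "Magnitude_02.txt", "Other_01.xpd"):
        open(os.path.join(folder, name), "w").close()

    plain = [os.path.basename(path)
             for path in find_data_files(folder, "Magnitude")]
    globbed = [os.path.basename(path)
               for path in find_data_files(folder, "Magnitude_0?.*",
                                           names_comprise_glob_pattern=True)]

print(plain)
print(globbed)
```

In plain mode only the two .xpd files starting with "Magnitude" are selected; in glob mode the .txt file matches as well, because the suffix no longer acts as a filter.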
-
add_variables(variable_names, data_columns)
Adds new variables to the data.
Parameters:
- variable_names : str
name of the new variable(s)
- data_columns : numpy.array
the new data columns as numpy array
Notes
The number of variables and the number of added columns must match. The added data must also match the number of rows. Manually added variables will be lost if cases are excluded afterwards via a call to set_exclusions.
-
property added_variables
Getter for added variables.
-
aggregate(output_file=None, column_subject_id=0)
Aggregate the data as defined by the design.
The design will be printed and the resulting data will be returned as numpy.array together with the variable names.
Parameters:
- output_file : str, optional
name of the data output file. If output_file is defined, the function writes the results to a csv data file
- column_subject_id : int, optional
data column containing the subject id (default=0)
Returns:
- result : numpy.array
- new_variable_names : list of strings
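As a rough illustration of what aggregating "as defined by the design" means, the sketch below groups trials by subject and by one independent variable and computes two dependent variables per cell. The data, the function aggregate_rows and the row-per-cell output layout are assumptions made for this example; the layout of the array returned by the real method may differ.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical trial records: (subject_id, hand, RT)
trials = [
    (1, "left", 500), (1, "left", 540), (1, "right", 480),
    (2, "left", 610), (2, "right", 590), (2, "right", 630),
]


def aggregate_rows(rows, column_subject_id=0, iv_columns=(1,), dv_column=2):
    """Group rows by subject and IV combination, then summarise the DV."""
    cells = defaultdict(list)
    for row in rows:
        key = (row[column_subject_id],) + tuple(row[c] for c in iv_columns)
        cells[key].append(row[dv_column])
    names = ["subject_id", "hand", "mean(RT)", "median(RT)"]
    result = [key + (mean(values), median(values))
              for key, values in sorted(cells.items())]
    return result, names


result, new_variable_names = aggregate_rows(trials)
print(new_variable_names)
for row in result:
    print(row)
```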
-
property computed_variables
Getter for computed variables.
-
property concatenated_data
Getter for concatenated_data.
Returns:
- data : numpy.array
- variables : list of str
Notes
Returns all data of all subjects as numpy.array and all variable names (including added variables). According to the defined design, the result contains the newly computed variables and the subject variables from the headers of the Expyriment data files.
If data have been loaded and no new variable or exclusion has been defined, concatenated_data will merely return the previous data without re-processing.
-
property data_files
Getter for data_files.
The list of the data files considered for the analysis.
-
property dependent_variables
Getter for dependent variables.
-
property exclusions
Getter for exclusions.
-
get_data(filename, recode_variables=True, compute_new_variables=True, exclude_trials=True)
Read data from a single Expyriment data file.
Parameters:
- filename : str
name of the Expyriment data file
- recode_variables : bool, optional
set to False if defined variable recodings should not be applied (default=True)
- compute_new_variables : bool, optional
set to False if newly defined variables should not be computed (default=True)
- exclude_trials : bool, optional
set to False if exclusion rules should not be applied (default=True)
Returns:
- data : numpy.array
- var_names : list
list of variable names
- info : str
subject info
- comment : str
comments in data
Notes
The function can only be applied to files listed in aggregator.data_files, that is, to the files in the defined data folder that start with the experiment name. According to the defined design, the result contains the recoded data together with the newly computed variables and the subject variables from the headers of the Expyriment data files.
-
get_variable_data(variables)
Returns the data column(s) of the given variables as a numpy array.
Parameters:
- variables : list of str
names of the variables to be extracted
Returns:
- data : numpy.array
-
property independent_variables
Getter for independent_variables.
-
print_n_trials(variables)
Print the number of trials in the combinations of the independent variables.
Parameters:
- variables : str or list
A string or a list of strings that represent the names of one or more data variables (aggregator.variables)
Notes
This function is useful, for instance, to quickly check the experimental design.
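A design check of this kind boils down to counting trials per combination of variable values. The following is a minimal sketch with made-up data and a hypothetical helper count_trials; it does not reproduce the output format of the actual method.

```python
from collections import Counter

# Hypothetical trials, one dict per trial
trials = [
    {"hand": "left", "parity": 0},
    {"hand": "left", "parity": 0},
    {"hand": "left", "parity": 1},
    {"hand": "right", "parity": 0},
    {"hand": "right", "parity": 1},
    {"hand": "right", "parity": 1},
]


def count_trials(rows, variables):
    """Count the trials in each combination of the given variables."""
    if isinstance(variables, str):  # accept a single name as well as a list
        variables = [variables]
    return Counter(tuple(row[name] for name in variables) for row in rows)


counts = count_trials(trials, ["hand", "parity"])
for combination, n_trials in sorted(counts.items()):
    print(combination, n_trials)
```

An unbalanced or empty cell shows up immediately as an unexpected count or a missing combination.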
-
reset(data_folder, file_name, suffix='.xpd', variables=None, names_comprise_glob_pattern=False)
Reset the aggregator class and clear the design.
Parameters:
- data_folder : str
folder which contains the data
- file_name : str
name of the files; all files that start with this name will be considered for the analysis (cf. aggregator.data_files)
- suffix : str, optional
if specified, only files that end with this particular suffix will be considered (default='.xpd')
- variables : array of str, optional
array of variable names; only the specified variables are processed
- names_comprise_glob_pattern : bool, optional
if True, data_folder and file_name are processed as glob patterns with wildcards such as "*" or "?"; the suffix parameter will be ignored
-
set_computed_variables(compute_syntax)
Set syntax to compute new variables.
The method defines the variables that will be computed. It cannot be applied to variables that have been added manually via add_variables. The method requires re-reading the data files and might therefore be time consuming.
Parameters:
- compute_syntax : str or list
A string or a list of strings that represent the syntax to compute the new variables
Notes
Compute syntax:
{new-variable} = {variable} {relation/operation} {variable/value}
{new-variable} -- a new, not yet defined variable name
{variable} -- a defined data variable
{relation} -- ==, !=, >, <, >=, <=, => or <=
{operation} -- +, -, *, / or %
{value} -- string or numeric
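To make the compute syntax concrete, here is a small stand-alone interpreter for single expressions. It is only a sketch: the names PATTERN, OPERATIONS and compute_variable are invented for this example, only the unambiguous relations and operations are supported, and Expyriment's own parser may behave differently (for example in how it stores the results of relations).

```python
import operator
import re

OPERATIONS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "%": operator.mod,
}

# Two-character symbols come first so that ">=" is not read as ">".
PATTERN = re.compile(
    r"^\s*(\w+)\s*=\s*(\w+)\s*(==|!=|>=|<=|>|<|\+|-|\*|/|%)\s*(.+?)\s*$"
)


def compute_variable(syntax, row):
    """Return a copy of row with the new variable described by syntax."""
    match = PATTERN.match(syntax)
    if match is None:
        raise ValueError("cannot parse: " + syntax)
    new_name, left, symbol, right = match.groups()
    if right in row:
        right_value = row[right]          # right-hand side is a variable
    else:
        try:
            right_value = float(right)    # numeric value
        except ValueError:
            right_value = right.strip("'\"")  # string value
    new_row = dict(row)
    new_row[new_name] = OPERATIONS[symbol](row[left], right_value)
    return new_row


trial = {"target_number": 73, "RT": 512}
trial = compute_variable("parity = target_number % 2", trial)
trial = compute_variable("size = target_number > 65", trial)
print(trial)
```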
-
set_dependent_variables(dv_syntax)
Set dependent variables.
Parameters:
- dv_syntax : str or list
syntax describing the dependent variable by a function and a variable, e.g. mean(RT)
Notes
Syntax:
{function}({variable})
{function} -- mean, median, sum, std or n_trials
Note: n_trials counts the number of trials and does not require a variable as argument
{variable} -- a defined data variable
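The dependent variable syntax can be read as a function name applied to one data column within each cell of the design. The sketch below parses such strings and evaluates them on a handful of made-up trials; parse_dv and evaluate_dv are hypothetical names, and mapping std to the population standard deviation is an assumption, since the text above does not say which variant the library uses.

```python
import re
from statistics import mean, median, pstdev

FUNCTIONS = {
    "mean": mean,
    "median": median,
    "sum": sum,
    "std": pstdev,  # assumption: population standard deviation
}


def parse_dv(dv_syntax):
    """Split a string such as 'mean(RT)' into (function, variable)."""
    text = dv_syntax.strip()
    if text == "n_trials":  # needs no variable as argument
        return "n_trials", None
    match = re.fullmatch(r"(\w+)\((\w+)\)", text)
    if match is None or match.group(1) not in FUNCTIONS:
        raise ValueError("cannot parse: " + dv_syntax)
    return match.group(1), match.group(2)


def evaluate_dv(dv_syntax, rows):
    """Evaluate one dependent variable on the trials of a single cell."""
    function, variable = parse_dv(dv_syntax)
    if function == "n_trials":
        return len(rows)
    return FUNCTIONS[function]([row[variable] for row in rows])


cell = [
    {"RT": 500, "error": 0},
    {"RT": 540, "error": 1},
    {"RT": 610, "error": 0},
]
values = [evaluate_dv(syntax, cell)
          for syntax in ("mean(RT)", "median(RT)", "sum(error)", "n_trials")]
print(values)
```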
-
set_exclusions(rule_syntax)
Set rules to exclude trials from the analysis.
The method indicates the rows that are ignored while reading the data files. It can therefore not be applied to variables that have been added later via add_variables, and it results in a loss of all manually added variables. Setting exclusions requires re-reading the data files and might therefore be time consuming. Thus, always call this method at the beginning of your analysis script.
Parameters:
- rule_syntax : str or list
A string or a list of strings that represent the rules to exclude trials
Notes
Rule syntax:
{variable} {relation} {variable/value}
{variable} -- a defined data variable
{relation} -- ==, !=, >, <, >=, <=, => or <=
{value} -- string or numeric
If value is "{numeric} * std", trials are excluded in which the variable is below or above {numeric} standard deviations from the mean. The relations "==" and "!=" are not allowed in this case. The exclusion criterion is applied to each subject and factor combination separately.
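The "{numeric} * std" criterion can be sketched as follows: trials are grouped by subject and factor combination, and within each group every trial further than the given number of standard deviations from the group mean is dropped. The function exclude_by_std and the data are invented for illustration, and the use of the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def exclude_by_std(rows, variable, factor, group_keys):
    """Drop rows lying more than factor standard deviations from the cell mean."""
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[key] for key in group_keys)].append(row)
    kept = []
    for cell_rows in cells.values():
        values = [row[variable] for row in cell_rows]
        centre = mean(values)
        spread = pstdev(values)  # assumption: population standard deviation
        low, high = centre - factor * spread, centre + factor * spread
        kept.extend(row for row in cell_rows if low <= row[variable] <= high)
    return kept


# Hypothetical data: subject 1 has one extreme RT, subject 2 has none.
trials = (
    [{"subject": 1, "hand": "left", "RT": rt}
     for rt in (500, 510, 490, 505, 495, 1500)]
    + [{"subject": 2, "hand": "left", "RT": rt} for rt in (600, 620, 610)]
)

kept = exclude_by_std(trials, "RT", 2, group_keys=("subject", "hand"))
print(len(trials), "trials before,", len(kept), "after exclusion")
```

Because the criterion is evaluated per cell, the slow trial of subject 1 is removed relative to that subject's own distribution, while the generally slower subject 2 loses no trials.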
-
set_independent_variables(variables)
Set the independent variables.
Parameters:
- variables : str or list
the name(s) of one or more data variables (aggregator.variables)
-
set_subject_variables(variables)
Set subject variables to be considered for the analysis.
The method sets the subject variables. Subject variables are between-subject factors or other variables defined in the subject information section (#s) of the Expyriment data file. The method requires re-reading the data files and might therefore be time consuming.
Parameters:
- variables : str or list
A string or a list of strings that represent the subject variables
-
set_variable_recoding(recoding_syntax)
Set syntax to recode variables.
The method defines the variables that will be recoded. It cannot be applied to variables that have been added later via add_variables. Recoding variables requires re-reading the data files and might therefore be time consuming.
Parameters:
- recoding_syntax : str or list
A string or a list of strings that represent the variable recoding syntax
Notes
Recoding syntax:
{variable}: {old_value1} = {new_value1}, {old_value2} = {new_value2},...
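A recoding string in this syntax names one variable and a list of old/new value pairs. The sketch below turns such a string into a lookup table and applies it to a few rows. Both helpers are hypothetical and treat all values as strings, which may not match how Expyriment handles numeric values.

```python
def parse_recoding(recoding_syntax):
    """Split '{variable}: old = new, ...' into (variable, mapping)."""
    variable, _, rules = recoding_syntax.partition(":")
    mapping = {}
    for rule in rules.split(","):
        if not rule.strip():
            continue  # tolerate a trailing comma
        old_value, _, new_value = rule.partition("=")
        mapping[old_value.strip()] = new_value.strip()
    return variable.strip(), mapping


def recode(rows, recoding_syntax):
    """Return copies of rows with the named variable recoded."""
    variable, mapping = parse_recoding(recoding_syntax)
    recoded_rows = []
    for row in rows:
        new_row = dict(row)
        # Values without a recoding rule are left unchanged.
        new_row[variable] = mapping.get(str(row[variable]), row[variable])
        recoded_rows.append(new_row)
    return recoded_rows


rows = [{"hand": "l"}, {"hand": "r"}, {"hand": "both"}]
recoded = recode(rows, "hand: l = left, r = right")
print(recoded)
```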
-
property subject_variables
Getter for subject variables.
-
property variable_recodings
Getter for variable recodings.
-
property variables
Getter for variables.
The specified variables, including the newly computed variables, the between-subject variables and the added variables.
-
write_concatenated_data(output_file, delimiter=', ')
Concatenates data and writes it to a csv file.
Parameters:
- output_file : str
name of data output file
- delimiter : str
delimiter character (default=",")
-