Class | Ai4r::Data::DataSet
In | lib/ai4r/data/data_set.rb
Parent | Object
A data set is a collection of N data items. Each data item is described by a set of attributes, represented as an array. Optionally, you can assign a label to the attributes, using the data_labels property.
data_items | [R]
data_labels | [R]
Creates a new DataSet, empty by default. Optionally, you can provide the initial data items and data labels.
e.g. DataSet.new(:data_items => data_items, :data_labels => labels)
If you provide data items but no data labels, the data set will use the default data label values (see set_data_labels).
Returns a Set instance containing all possible values for an attribute. The parameter can be an attribute label or an index (0-based).
build_domain("city")
build_domain("age")
build_domain(2) # In this example, the third attribute is gender
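The lookup described above can be sketched in plain Ruby. This is a minimal illustration, not the library's implementation: `sketch_build_domain` is a hypothetical stand-in that takes the items and labels as explicit arguments, and the three sample rows are made up to match the label layout used elsewhere on this page.

```ruby
require 'set'

# Hypothetical stand-in for build_domain, working on plain arrays.
# attr may be an attribute label or a 0-based index.
def sketch_build_domain(data_items, data_labels, attr)
  index = attr.is_a?(Integer) ? attr : data_labels.index(attr)
  raise ArgumentError, "unknown attribute: #{attr.inspect}" if index.nil?

  # Collect every distinct value seen in that column.
  Set.new(data_items.map { |item| item[index] })
end

labels = ['city', 'age_range', 'gender', 'marketing_target']
items = [
  ['New York', '<30', 'M', 'Y'],
  ['Chicago', '[30-50)', 'F', 'Y'],
  ['New York', '[50-80]', 'F', 'N']
]

city_domain = sketch_build_domain(items, labels, 'city')
gender_domain = sketch_build_domain(items, labels, 2) # third attribute
```

With these rows, `city_domain` holds "New York" and "Chicago", and `gender_domain` holds "M" and "F".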
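Returns an array with the domain of each attribute: a Set instance containing all possible values for nominal attributes, and a [min, max] array for numeric attributes.
Return example:
[#<Set: {"<30", "[30-50)", "[50-80]", ">80"}>, #<Set: {"M", "F"}>, [5, 85], #<Set: {"Y", "N"}>]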
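A minimal plain-Ruby sketch of how such a list of domains could be computed. It assumes numeric attributes are summarized as a [min, max] pair (as the [5, 85] entry in the return example suggests) and every other attribute as a Set. The helper name `sketch_build_domains` and the sample rows are hypothetical.

```ruby
require 'set'

# Hypothetical stand-in: one domain per column of data_items.
def sketch_build_domains(data_items)
  data_items.transpose.map do |values|
    if values.first.is_a?(Numeric)
      [values.min, values.max] # numeric column: range as [min, max]
    else
      Set.new(values)          # nominal column: distinct values
    end
  end
end

items = [
  ['<30', 'M', 5, 'Y'],
  ['[30-50)', 'F', 85, 'N'],
  ['>80', 'M', 40, 'Y']
]

domains = sketch_build_domains(items)
```

Here the column type is decided from the first row only, which is an assumption of this sketch; mixed-type columns would need extra checks.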
Returns the index of a given attribute (0-based). For example, if "gender" is the third attribute, then:
get_index("gender") => 2
Returns an array with the mean value of numeric attributes and the most frequent value (mode) of non-numeric attributes.
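The mean-or-mode rule can be illustrated with a short plain-Ruby sketch. The helper `sketch_mean_or_mode` and the sample rows are hypothetical, and tie-breaking for the mode (first value encountered wins) is an assumption of this sketch rather than documented behavior.

```ruby
# Hypothetical stand-in: mean for numeric columns, mode for the rest.
def sketch_mean_or_mode(data_items)
  data_items.transpose.map do |values|
    if values.first.is_a?(Numeric)
      values.inject(0.0) { |sum, v| sum + v } / values.length
    else
      # Group equal values together and pick the largest group.
      values.group_by { |v| v }.max_by { |_, group| group.length }.first
    end
  end
end

items = [
  ['New York', 25, 'M'],
  ['Chicago', 35, 'F'],
  ['New York', 60, 'F']
]

summary = sketch_mean_or_mode(items)
```

For these rows, `summary` is `["New York", 40.0, "F"]`.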
Opens a CSV file and reads it line by line. For each line, a block is called and the row is passed to the block. Safe for Ruby 1.8 and 1.9.
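The row-by-row pattern can be sketched with Ruby's standard `csv` library. This is an illustration of the idea only: `sketch_each_csv_row` is a hypothetical helper, not the method documented here, and the temporary file exists just to make the example self-contained.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: yield each parsed row of the file to the block.
def sketch_each_csv_row(path)
  CSV.foreach(path) { |row| yield row }
end

# Write a small sample file so the example can run on its own.
file = Tempfile.new(['items', '.csv'])
file.write("New York,<30,M,Y\nChicago,[30-50),F,Y\n")
file.close

rows = []
sketch_each_csv_row(file.path) { |row| rows << row }
file.unlink
```

Each row arrives as an array of strings, so numeric columns would still need converting before use as numeric attributes.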
Set the data items. M data items with N attributes must have the following format:
[ [ATT1_VAL1, ATT2_VAL1, ATT3_VAL1, ... , ATTN_VAL1, CLASS_VAL1],
  [ATT1_VAL2, ATT2_VAL2, ATT3_VAL2, ... , ATTN_VAL2, CLASS_VAL2],
  ...
  [ATT1_VALM, ATT2_VALM, ATT3_VALM, ... , ATTN_VALM, CLASS_VALM] ]
e.g.
[ ['New York', '<30', 'M', 'Y'],
  ['Chicago', '<30', 'M', 'Y'],
  ['Chicago', '<30', 'F', 'Y'],
  ['New York', '<30', 'M', 'Y'],
  ['New York', '<30', 'M', 'Y'],
  ['Chicago', '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago', '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago', '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['New York', '[50-80]', 'M', 'N'],
  ['Chicago', '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['Chicago', '>80', 'F', 'Y'] ]
This method returns the data set (self), allowing method chaining.
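Returning self is what makes chained calls possible. The sketch below shows the pattern on a hypothetical stand-in class, `SketchDataSet`; it is not the real class, and the assumption that the label setter also returns self is made only for the sake of the example.

```ruby
# Hypothetical stand-in class illustrating setters that return self.
class SketchDataSet
  attr_reader :data_items, :data_labels

  def set_data_items(items)
    @data_items = items
    self # returning self allows the next call to be chained
  end

  def set_data_labels(labels)
    @data_labels = labels
    self
  end
end

data_set = SketchDataSet.new
  .set_data_items([['New York', '<30', 'M', 'Y']])
  .set_data_labels(['city', 'age_range', 'gender', 'marketing_target'])
```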
Set data labels. Data labels must have the following format:
[ 'city', 'age_range', 'gender', 'marketing_target' ]
If you do not provide labels for your data, the following labels will be created by default:
[ 'attribute_1', 'attribute_2', 'attribute_3', 'class_value' ]
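The default naming scheme can be reproduced with a few lines of plain Ruby. This is a sketch under the assumption that the last column is always treated as the class value and every other column is numbered from 1; `sketch_default_labels` is a hypothetical helper name.

```ruby
# Hypothetical helper: build default labels from the first data item.
def sketch_default_labels(data_items)
  attribute_count = data_items.first.length - 1 # last column is the class
  labels = (1..attribute_count).map { |i| "attribute_#{i}" }
  labels << 'class_value'
end

default_labels = sketch_default_labels([['New York', '<30', 'M', 'Y']])
```

For a four-column item, `default_labels` is `["attribute_1", "attribute_2", "attribute_3", "class_value"]`.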