Hopdata uses a simple CSV file to train an algorithm. The CSV file can have any number of columns, but the first column must be the label that you wish to predict.
For example, look at the following table:
|<=50K||39||State gov||Bachelors||13||Never married||Adm clerical||Not in family||White||Male||United States|
|<=50K||50||Self emp not inc||Bachelors||13||Married civ spouse||Exec managerial||Husband||White||Male||United States|
Here the first column is the person's salary bracket, which is the label to predict. The other columns are the attributes related to the salary.
The CSV file will look like:
<=50K,39, State-gov, Bachelors,13, Never-married, Adm-clerical, Not-in-family, White, Male,United-States
<=50K,50, Self-emp-not-inc, Bachelors,13, Married-civ-spouse, Exec-managerial, Husband, White, Male,United-States
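If your data stores the label somewhere other than the first column, it can be moved before upload. The following is a minimal sketch using only Python's standard `csv` module. The helper names `label_first` and `to_csv_text` and the sample rows are hypothetical and are not part of Hopdata.

```python
import csv
import io


def label_first(rows, label_index):
    """Return a copy of rows with the column at label_index moved to position 0."""
    reordered = []
    for row in rows:
        row = list(row)               # copy so the caller's data is not modified
        label = row.pop(label_index)  # take the label out of its current position
        reordered.append([label] + row)
    return reordered


def to_csv_text(rows):
    """Serialize rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical input where the label (salary bracket) is the last column.
rows = [
    ["39", "State-gov", "Bachelors", "<=50K"],
    ["50", "Self-emp-not-inc", "Bachelors", "<=50K"],
]
csv_text = to_csv_text(label_first(rows, label_index=3))
print(csv_text)
```

The resulting text has the label in the first position of every row, which is the layout described above.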
The label can be one of these:
- Numerical (Integer)
- Categorical (String)
Hopdata will read all the columns of your file and detect the type of each column automatically.
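To get a rough idea of how a column might be classified before uploading, you can run a simple check locally. The sketch below is only an illustration and is not Hopdata's actual detection logic. It assumes a column counts as numerical when every non-empty value parses as an integer, and as categorical otherwise. The function name `infer_column_type` is hypothetical.

```python
def infer_column_type(values):
    """Guess 'numerical' if every non-empty value parses as an integer, else 'categorical'.

    This is an assumed rule for illustration, not Hopdata's own algorithm.
    """
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return "categorical"  # nothing to parse, so treat the column as strings
    try:
        for value in cleaned:
            int(value)
    except ValueError:
        return "categorical"
    return "numerical"


age_column = ["39", "50"]
workclass_column = [" State-gov", " Self-emp-not-inc"]
print(infer_column_type(age_column))
print(infer_column_type(workclass_column))
```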
Here are some sample files:
The CSV file should be UTF-8 encoded; Latin-1 should also work if that is a good fit for your data. We do our best to auto-detect other encodings, but this process might fail, so we strongly recommend always using UTF-8 if possible.
If you are exporting the CSV from an XLS file, we do not recommend using Excel to save it as CSV, since Excel is known to have issues with some Unicode characters. Use an alternative tool instead: Google Docs Spreadsheets or LibreOffice are good alternatives for this task.
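If you already know the encoding of an existing file, you can convert it to UTF-8 yourself before uploading. The following is a minimal sketch that assumes the source data is Latin-1. The helper names `reencode_to_utf8` and `is_valid_utf8` and the sample bytes are hypothetical.

```python
def reencode_to_utf8(raw_bytes, source_encoding="latin-1"):
    """Decode bytes from source_encoding and return the same text encoded as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


def is_valid_utf8(raw_bytes):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical CSV line containing a non-ASCII character, stored as Latin-1.
latin1_bytes = "café,1\n".encode("latin-1")
utf8_bytes = reencode_to_utf8(latin1_bytes)
print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```

For a file on disk, the same idea applies: read the bytes in binary mode, convert them, and write the result back in binary mode.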
If you find yourself having issues with your data encoding when importing a CSV, contact our support team at [email protected]
You must keep your file size under 10 MB, or the upload will fail.
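You can check the size of a file locally before uploading it. The sketch below assumes that 10 MB means 10,000,000 bytes, which is the more conservative reading; the exact byte limit enforced by Hopdata is not stated here. The names `MAX_UPLOAD_BYTES`, `fits_upload_limit`, and `file_fits_upload_limit` are hypothetical.

```python
import os

# Assumption: 10 MB is read as 10,000,000 bytes (the stricter interpretation).
MAX_UPLOAD_BYTES = 10 * 1000 * 1000


def fits_upload_limit(size_in_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True if the size is strictly under the limit."""
    return size_in_bytes < limit


def file_fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at path is strictly under the limit."""
    return fits_upload_limit(os.path.getsize(path), limit)


print(fits_upload_limit(2_500_000))
print(fits_upload_limit(12_000_000))
```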