
CSV File Format

Hopdata uses a simple CSV file to train an algorithm. The CSV file can have any number of columns, but the first column must be the label that you wish to predict.

For example, consider the following table:

Input File

Salary | Age | Job | Education | Experience | Married | Position | Family | Race | Sex | Country
<=50K | 39 | State gov | Bachelors | 13 | Never married | Adm clerical | Not in family | White | Male | United States
<=50K | 50 | Self emp not inc | Bachelors | 13 | Married civ spouse | Exec managerial | Husband | White | Male | United States


Here the first column is the person's salary, which is the label to predict. The other columns are the attributes related to the salary.


The corresponding CSV file will look like:

<=50K,39, State-gov, Bachelors,13, Never-married, Adm-clerical, Not-in-family, White, Male,United-States
<=50K,50, Self-emp-not-inc, Bachelors,13, Married-civ-spouse, Exec-managerial, Husband, White, Male,United-States
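If your source data keeps the label somewhere other than the first column, it has to be moved before upload. The following is a minimal sketch using Python's standard `csv` module. The helper names `move_label_first` and `rows_to_csv` are hypothetical and not part of Hopdata, and the output omits a header row only because the example above does not show one.

```python
import csv
import io


def move_label_first(rows, label_index):
    """Return rows with the column at label_index (non-negative) moved to position 0."""
    reordered = []
    for row in rows:
        label = row[label_index]
        rest = row[:label_index] + row[label_index + 1:]
        reordered.append([label] + rest)
    return reordered


def rows_to_csv(rows):
    """Serialize rows to CSV text, one line per row, with no header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Example: the label (salary) is the last column in the source data.
source_rows = [
    ["39", "State-gov", "Bachelors", "<=50K"],
    ["50", "Self-emp-not-inc", "Bachelors", "<=50K"],
]
csv_text = rows_to_csv(move_label_first(source_rows, label_index=3))
print(csv_text)
```

To write the result to disk, open the target file with `encoding="utf-8"` and `newline=""` and pass it to `csv.writer` in place of the in-memory buffer.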

The label can be one of these:

  • Numerical (Integer)
  • Categorical (String)


Hopdata will read all the columns of your file and detect the type of each column automatically.
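Hopdata's actual detection logic is not described here. As a rough illustration only, the sketch below shows one way to check in advance whether a column would plausibly be read as numerical (integer) or categorical (string). The function names are hypothetical, and the rule "every value parses as an integer" is an assumption, not Hopdata's documented behavior.

```python
def infer_column_type(values):
    """Return 'numerical' if every value parses as an integer, else 'categorical'."""
    for value in values:
        try:
            int(value.strip())  # surrounding spaces are ignored before parsing
        except ValueError:
            return "categorical"
    return "numerical"


def infer_types(rows):
    """Infer a type for each column of a list of equally long rows."""
    columns = zip(*rows)  # transpose rows into columns
    return [infer_column_type(column) for column in columns]


rows = [
    ["<=50K", "39", " State-gov", "13"],
    ["<=50K", "50", " Self-emp-not-inc", "13"],
]
print(infer_types(rows))
```

Running a check like this before upload can reveal columns that you expected to be numeric but that contain stray text.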

Here are some sample files:


Encoding requirements

The CSV file should be UTF-8 encoded. Latin-1 should also work if it suits your data. We do our best to auto-detect other encodings, but detection can fail, so we strongly recommend using UTF-8 whenever possible.

If you are exporting the CSV from an XLS file, we do not recommend using Excel to save it as CSV, since Excel is known to have issues with some Unicode characters. Google Docs Spreadsheets and LibreOffice are good alternatives for this task.
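If a file is in another encoding, it can be converted to UTF-8 before upload. The sketch below is one possible approach in Python. The names `to_utf8` and `convert_file` are hypothetical, and the fallback to Latin-1 is an assumption that you should replace with the encoding your data actually uses.

```python
def to_utf8(raw_bytes, source_encoding="latin-1"):
    """Return raw_bytes unchanged if already valid UTF-8, else re-encode from source_encoding."""
    try:
        raw_bytes.decode("utf-8")
        return raw_bytes
    except UnicodeDecodeError:
        # Assumption: the file really is in source_encoding.
        return raw_bytes.decode(source_encoding).encode("utf-8")


def convert_file(source_path, target_path, source_encoding="latin-1"):
    """Read source_path as bytes and write a UTF-8 copy to target_path."""
    with open(source_path, "rb") as source:
        data = source.read()
    with open(target_path, "wb") as target:
        target.write(to_utf8(data, source_encoding))
```

Note that Latin-1 can decode any byte sequence, so a wrong guess will not raise an error. Inspect a few non-ASCII values in the converted file to confirm they look right.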

If you find yourself having issues with your data encoding when importing a CSV, contact our support team at [email protected]

Upload limitations

You must keep your file size under 10 MB, or the upload will fail.
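A quick local check can catch an oversized file before you try to upload it. This is a minimal sketch. The function name `fits_upload_limit` is hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption, so use a lower limit if you want a safety margin.

```python
import os

# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at path is strictly smaller than limit bytes."""
    return os.path.getsize(path) < limit
```

For example, `fits_upload_limit("training.csv")` returns `False` for a file that is too large, in which case you could reduce the number of rows or drop unused columns.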
