The Adult dataset contains features for binary prediction of an adult's income.
The data has 48842 records with 14 features and one binary target column indicating whether the person’s income is greater than 50K.
This is a copy of the UCI ML Adult dataset: https://archive.ics.uci.edu/ml/datasets/adult
Ron Kohavi, “Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid”, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996
The typical ML task in this dataset is to build a model that determines whether a person makes over 50K a year.
- Dataset Shape:
Samples per class: ‘>50K’ - 23.93%, ‘<=50K’ - 76.07%
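The class shares above imply a noticeable imbalance. As a minimal sketch (the function names are illustrative and not part of any library), the approximate per-class counts and the accuracy of a trivial "always predict the majority class" baseline can be derived from the stated record count and percentages. Because the percentages are rounded, the counts are approximate.

```python
# Figures taken from the description above.
N_RECORDS = 48842
CLASS_SHARES = {">50K": 0.2393, "<=50K": 0.7607}


def approximate_class_counts(n_records, shares):
    """Estimate per-class record counts from rounded class shares."""
    return {label: round(n_records * share) for label, share in shares.items()}


def majority_baseline_accuracy(shares):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(shares.values())


counts = approximate_class_counts(N_RECORDS, CLASS_SHARES)
baseline = majority_baseline_accuracy(CLASS_SHARES)
print(counts)
print(f"majority-class baseline accuracy: {baseline:.2%}")
```

Any model trained on this data should be compared against that baseline rather than against 50%.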
The age of the person.
[Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked]
[Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool]
The number of years of education.
- [Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent,
[Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces]
[Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried]
[White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black]
The capital gain of the person.
The capital loss of the person.
The number of hours worked per week.
[United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands]
The target variable, whether the person makes over 50K a year.
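Most classifiers expect a numeric target. The following is a minimal sketch of one way to encode it, assuming the target values are exactly the strings `>50K` and `<=50K` as in the class distribution given earlier. The function name `encode_income` is hypothetical, and the copy of the data you load may use a different encoding.

```python
# Assumed label strings, taken from the class distribution in this description.
POSITIVE_LABEL = ">50K"
NEGATIVE_LABEL = "<=50K"


def encode_income(label):
    """Map an income label to 1 (over 50K) or 0 (50K or less)."""
    cleaned = label.strip()  # tolerate surrounding whitespace
    if cleaned == POSITIVE_LABEL:
        return 1
    if cleaned == NEGATIVE_LABEL:
        return 0
    # Fail loudly instead of silently mislabeling unknown values.
    raise ValueError(f"unexpected income label: {label!r}")


encoded = [encode_income(value) for value in ["<=50K", " >50K", "<=50K"]]
print(encoded)
```

Raising on unknown labels is a deliberate choice here: a silent default would hide encoding differences between copies of the dataset.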
Load and return the Adult dataset (classification).
Load and return a fitted classification model.
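The loader functions described above are not named in this text, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, what parsing Adult-style records might look like using the Python standard library. The column names, the `parse_rows` helper, and the two sample rows are all made up for illustration; the real file's header and contents may differ.

```python
import csv
import io

# Hypothetical names for the numeric columns described above.
NUMERIC_COLUMNS = ("age", "education-num", "capital-gain",
                   "capital-loss", "hours-per-week")

# Made-up sample rows in a reduced column layout, for illustration only.
SAMPLE_CSV = """age,workclass,education-num,hours-per-week,income
39, State-gov, 13, 40, <=50K
52, Self-emp-inc, 9, 45, >50K
"""


def parse_rows(csv_text):
    """Parse CSV text into dicts, casting known numeric columns to int."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for row in reader:
        for name in NUMERIC_COLUMNS:
            if name in row:
                row[name] = int(row[name])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_CSV)
for row in rows:
    print(row)
```

Categorical columns are left as strings so that they can be checked against the value lists given above before any encoding step.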