SEER Database Analysis Tools

Overview

SEER database analysis toolkit written in Python 3. The tool is driven from the command line, with configuration information provided through arguments and config files. It has been tested primarily on *nix systems, with data stored in MongoDB.

Usage

$ ./main.py <arguments>

or

$ python3 main.py <arguments>

JSON Config File

The following is an example JSON configuration file for tuning a decision tree. More example JSON files are in the /example folder of this git repo.

{
  "dataSource": {
    "mongoDb": {
      "ip": "localhost",
      "port": 27017
    },
    "targetName" : "IdValue",
    "data" : {
      "collectionName1" : [
        {"fieldA" : "Values"},
        {"fieldB" : "Values, Values"}
      ],
      "collectionName2" : [
        {"fieldA" : "values"}
      ]
    }
  },
  "decisionTree": {
    "maxTreeDepth": 3,
    "maxFeatures": 2
  },
  "output": {
    "saveJson": 1
  }
}
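As a rough illustration of how a file like this is consumed, the following minimal Python sketch parses a configuration string with the standard json module and checks for the dataSource section. The helper name load_config and the rule that only dataSource is mandatory are assumptions for illustration; they are not taken from the tool's own source.

```python
import json


# Hypothetical helper; the tool's real parsing code may differ.
def load_config(text: str) -> dict:
    """Parse a JSON config string and check that it names a data source."""
    config = json.loads(text)
    # Assumption: dataSource is the only section every config must contain.
    if "dataSource" not in config:
        raise ValueError("config is missing the 'dataSource' section")
    return config


# A trimmed copy of the example configuration above.
example = """
{
  "dataSource": {
    "mongoDb": {"ip": "localhost", "port": 27017},
    "targetName": "IdValue",
    "data": {"collectionName1": [{"fieldA": "Values"}]}
  },
  "decisionTree": {"maxTreeDepth": 3, "maxFeatures": 2},
  "output": {"saveJson": 1}
}
"""

config = load_config(example)
print(config["dataSource"]["targetName"])      # IdValue
print(config["decisionTree"]["maxTreeDepth"])  # 3
```

In practice the text would come from the file passed on the command line, e.g. via json.load on an open file handle.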

dataSource

Contains one of the sub-options listed below, which selects where the data is loaded from.

Name	Value	Description
targetName	str	Name of the target feature in the loaded data
<source>	json Element	Element with the configuration for loading data from that source (e.g. 'mongoDb', 'csvFile', etc.)
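Because dataSource holds exactly one source element alongside options such as targetName, a loader has to pick out which source was configured. The sketch below shows one way to do that in Python; the function name select_source and the error on zero or several sources are illustrative assumptions, not documented behavior of the tool.

```python
# Source keys named in this README; the tool may support others.
KNOWN_SOURCES = ("mongoDb", "csvFile")


def select_source(data_source: dict) -> tuple:
    """Return (source_name, source_settings) for the single configured source."""
    found = [name for name in KNOWN_SOURCES if name in data_source]
    # Assumption: exactly one source element must be present.
    if len(found) != 1:
        raise ValueError(f"expected exactly one of {KNOWN_SOURCES}, found {found}")
    name = found[0]
    return name, data_source[name]


name, settings = select_source(
    {"mongoDb": {"ip": "localhost", "port": 27017}, "targetName": "IdValue"}
)
print(name, settings["port"])  # mongoDb 27017
```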

mongoDb

Pull data from MongoDB.

Name	Value	Description
ip	str	IP address or hostname of the machine running MongoDB
port	int	Port number of the MongoDB server
database	str	Name of the MongoDB database
data	json Element	Element listing the collections, and the fields to load from each
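The ip, port, and data options map naturally onto a MongoDB connection string and per-collection field projections. The Python sketch below builds both from the settings, assuming (as in the example config) that each list entry is a one-key object whose key is the field name; the entry's value is ignored here because its meaning is not documented. The helper names mongo_uri and projections are hypothetical.

```python
def mongo_uri(ip: str, port) -> str:
    """Build a standard MongoDB connection string from the ip/port options."""
    # int() accepts the port as either a number or a numeric string.
    return f"mongodb://{ip}:{int(port)}/"


def projections(data: dict) -> dict:
    """Map each collection name to a MongoDB projection of its listed fields."""
    result = {}
    for collection, entries in data.items():
        # Assumption: each entry is {fieldName: <value>}; only the key is used.
        result[collection] = {field: 1 for entry in entries for field in entry}
    return result


data = {
    "collectionName1": [{"fieldA": "Values"}, {"fieldB": "Values, Values"}],
    "collectionName2": [{"fieldA": "values"}],
}
print(mongo_uri("localhost", 27017))  # mongodb://localhost:27017/
print(projections(data))
```

With the pymongo driver, the URI could be passed to MongoClient, and each projection used as the second argument of Collection.find on the collection inside the configured database.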

csvFile

Pull data from a CSV file.

Name	Value	Description
filePath	str	Path to CSV file
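A CSV source only needs a file path, with the column named by targetName acting as the label. The sketch below uses Python's csv.DictReader to separate feature rows from target values; the helper name load_csv and the error raised for a missing target column are illustrative assumptions. Note that every value is read as a string, so numeric conversion would happen later.

```python
import csv
import io


def load_csv(handle, target_name: str):
    """Read CSV rows and split them into feature dicts and target values."""
    reader = csv.DictReader(handle)
    # fieldnames is None for an empty file; otherwise it is the header row.
    if reader.fieldnames is None or target_name not in reader.fieldnames:
        raise ValueError(f"target column {target_name!r} not found")
    features, targets = [], []
    for row in reader:
        targets.append(row.pop(target_name))  # remove the label from the features
        features.append(row)
    return features, targets


# In practice: with open(filePath, newline="") as handle: ...
sample = io.StringIO("height,IdValue\n1.8,a\n1.6,b\n")
features, targets = load_csv(sample, "IdValue")
print(features)  # [{'height': '1.8'}, {'height': '1.6'}]
print(targets)   # ['a', 'b']
```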

Data Testing

training

Name	Value	Description
split	int	Percentage of the data to hold out for testing
randomSeed	int	Fixed seed value for deterministic results
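The split and randomSeed options describe a seeded train/test split. The stdlib-only Python sketch below shows the idea; the helper name split_rows and the rounding rule are assumptions, not the tool's implementation. If the tool relies on scikit-learn, its train_test_split function provides the same behavior through the test_size and random_state parameters.

```python
import random


def split_rows(rows, split: int, random_seed=None):
    """Shuffle rows and hold out `split` percent of them for testing."""
    if not 0 <= split <= 100:
        raise ValueError("split must be a percentage between 0 and 100")
    shuffled = list(rows)
    # A dedicated Random instance makes the shuffle repeatable for a fixed seed.
    random.Random(random_seed).shuffle(shuffled)
    n_test = round(len(shuffled) * split / 100)  # nearest whole row
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train, test = split_rows(range(10), split=20, random_seed=42)
print(len(train), len(test))  # 8 2
```

Re-running with the same randomSeed yields the same partition, which is what makes results deterministic.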

Computation Method To Run

The next element selects the computation to run on the data loaded from the dataSource.

Decision Tree

The arguments depend on which algorithm is used. When gridSearch is enabled, every decision tree argument that accepts an array (maxTreeDepth, maxFeatures, minSplitNum) must be given as an array, even if it contains a single value.

Name	Value	Description
maxTreeDepth	int / array	Maximum tree depth
maxFeatures	int / array	Maximum number of features to use
minSplitNum	int / array	Minimum split value
randomSeed	int	Fixed seed value for deterministic results
gridSearch	int / bool	When set to 1 or True, enables grid search over the provided array values
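Grid search over array-valued arguments amounts to trying every combination of the listed values. The Python sketch below expands maxTreeDepth, maxFeatures, and minSplitNum with itertools.product and rejects scalar values, mirroring the rule that these arguments must be arrays when gridSearch is on. The helper names is_enabled and parameter_grid are hypothetical; randomSeed is left out of the grid because it is documented as a plain int.

```python
from itertools import product

# Decision tree arguments documented as accepting arrays.
GRID_KEYS = ("maxTreeDepth", "maxFeatures", "minSplitNum")


def is_enabled(flag) -> bool:
    """Accept 1 or True as 'on' (True == 1 in Python, so both match)."""
    return flag in (1, True)


def parameter_grid(tree_config: dict) -> list:
    """Expand array-valued decision tree arguments into every combination."""
    present = [key for key in GRID_KEYS if key in tree_config]
    for key in present:
        if not isinstance(tree_config[key], list):
            raise ValueError(f"{key} must be an array when gridSearch is enabled")
    values = [tree_config[key] for key in present]
    return [dict(zip(present, combo)) for combo in product(*values)]


tree = {"maxTreeDepth": [2, 3], "maxFeatures": [1, 2], "minSplitNum": [2], "gridSearch": 1}
if is_enabled(tree["gridSearch"]):
    grid = parameter_grid(tree)
    print(len(grid))  # 4
    print(grid[0])    # {'maxTreeDepth': 2, 'maxFeatures': 1, 'minSplitNum': 2}
```

Each dictionary in the grid would then be used to train and score one candidate tree.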

Logistic Regression

Name	Value	Description
predictors	array	Predictors to use; see below

predictors

This is a JSON array of single-key objects. Each key is the name of a predictor and must match a field name in the data loaded from the data source (i.e. the column name if the data source is a CSV file). Each value gives the predictor's data type: 'l' for a linear (numeric) value, 'c' for a categorical value.

"predictors" : [
  {"height" : "l"},
  {"isHuman" : "c"}
]
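The type codes determine how each predictor enters the model. The Python sketch below splits the predictors list into linear and categorical names and then one-hot encodes a categorical column, which is one common way to feed categorical predictors to logistic regression. One-hot encoding is an assumed technique here; the README does not say how the tool encodes 'c' predictors, and the helper names are hypothetical.

```python
def split_predictors(predictors: list):
    """Separate predictor names into linear ('l') and categorical ('c') lists."""
    linear, categorical = [], []
    for entry in predictors:
        for name, kind in entry.items():
            if kind == "l":
                linear.append(name)
            elif kind == "c":
                categorical.append(name)
            else:
                raise ValueError(f"unknown predictor type {kind!r} for {name!r}")
    return linear, categorical


def one_hot(rows: list, column: str) -> list:
    """Expand a categorical column into 0/1 indicator columns, one per level."""
    levels = sorted({row[column] for row in rows})
    return [{f"{column}={level}": int(row[column] == level) for level in levels}
            for row in rows]


linear, categorical = split_predictors([{"height": "l"}, {"isHuman": "c"}])
print(linear, categorical)  # ['height'] ['isHuman']
print(one_hot([{"isHuman": "yes"}, {"isHuman": "no"}], "isHuman"))
```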

Output

Sets the location of, and options for, the resulting output files.

Name	Value	Description
saveJson	0,1	Save the JSON config file used alongside the output data
directory	str	Directory in which to save the output data
timestamp	0,1	Append a timestamp to the output directory name
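The directory and timestamp options combine into the final output path. The Python sketch below appends a timestamp when requested; the underscore separator and the %Y%m%d_%H%M%S format are assumptions, since the README does not specify the tool's format. The helper name output_directory is hypothetical, and the optional now argument exists only to make the result reproducible.

```python
import os
from datetime import datetime


def output_directory(directory: str, timestamp: int = 0, now=None) -> str:
    """Return the output directory path, optionally suffixed with a timestamp."""
    directory = os.path.normpath(directory)  # drops a trailing separator
    if timestamp:
        now = now or datetime.now()
        # Assumed format; the tool's own naming may differ.
        directory = f"{directory}_{now.strftime('%Y%m%d_%H%M%S')}"
    return directory


print(output_directory("results/", 1, datetime(2024, 1, 2, 3, 4, 5)))
# results_20240102_030405
```

The directory could then be created with os.makedirs(path, exist_ok=True) before results, and the config when saveJson is 1, are written into it.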

A-Ruggeri/sdat

Folders and files

Latest commit

History

Repository files navigation

SEER Database Analysis Tools

Overview

Usage

JSON Config File

dataSource

mongoDb

csvFile

Data Testing

training

Computation Method To Run

Decision Tree

Logistic Regression

predictors

Output

Dependencies

Comments

Other

Logging Verbosity

Future Changes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages