Performs a very naive guess of the CsvFormat. This uses weighted frequencies of occurrences of common separators, row delimiters, quotes, quote escapes, etc., and simply selects the highest-weighted candidate for each. For empty values, it uses the frequency of the possible empty values within the cells.
This supports:
* \r\n and \n as row delimiters,
* ',', '\t', ';', and '|' as field delimiters,
* '"' and ''' (double and single quote) as quote delimiters,
* the quote delimiter or \ for quote escapes,
* '' (the empty string), '?', '-', 'N/A', and 'NA' as empty values, and
* 'N/M' and 'NM' as invalid values.
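The frequency-based selection can be illustrated with a minimal Python sketch. It is not the library's implementation: the function name `guess_by_frequency` is hypothetical, and it uses plain occurrence counts, whereas the real guesser applies weights that are not described here.

```python
# Candidate sets copied from the supported-values list.
FIELD_DELIMITERS = [",", "\t", ";", "|"]
ROW_DELIMITERS = ["\r\n", "\n"]


def guess_by_frequency(sample, candidates):
    """Return the candidate that occurs most often in `sample`.

    Hypothetical helper: unweighted counts only. Ties resolve to the
    earliest candidate, so "\r\n" wins over "\n" when every newline in
    the sample is part of a "\r\n" pair.
    """
    counts = {c: sample.count(c) for c in candidates}
    # max() returns the first maximal element, which gives the tie rule above.
    return max(candidates, key=lambda c: counts[c])
```

For example, `guess_by_frequency("a;b;c\n1;2;3\n", FIELD_DELIMITERS)` picks `";"` because it is the only candidate that occurs in the sample.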
Headers are guessed by using the cosine similarity of the frequency of characters (except quotes/field delimiters) between the first row and all subsequent rows. A similarity below 0.5 will result in a header being inferred.
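The header heuristic can be sketched as follows. This is an illustrative Python approximation, not the library's code: the names `char_frequencies`, `cosine_similarity`, and `looks_like_header` are hypothetical, and the default ignored characters (`"` and `,`) are an assumption standing in for whichever quote and field delimiter were guessed.

```python
import math
from collections import Counter


def char_frequencies(text, ignored):
    """Count characters in `text`, skipping quote and field-delimiter characters."""
    return Counter(ch for ch in text if ch not in ignored)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[ch] for ch, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_like_header(rows, ignored=frozenset('",'), threshold=0.5):
    """Infer a header when the first row's character profile differs from the rest."""
    if len(rows) < 2:
        return False  # nothing to compare the first row against
    first = char_frequencies(rows[0], ignored)
    rest = char_frequencies("".join(rows[1:]), ignored)
    return cosine_similarity(first, rest) < threshold
```

A first row made of letters followed by purely numeric rows shares no characters with them, so the similarity is 0 and a header is inferred.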
Makes a guess at the format of the CSV accessed by reader. This returns the format, as well as a new pushback reader to be used in place of reader. The original reader will have some data read out of it. The returned reader will contain all the original reader's data.
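The read-then-restore pattern behind the returned reader can be sketched in Python. This is only an analogy under stated assumptions: the documented method returns a pushback reader, while `ReplayReader` and `peek_and_restore` below are hypothetical names, and the 4096-character sample size is an arbitrary choice.

```python
import io


class ReplayReader:
    """Replays an already-consumed prefix, then continues with the source."""

    def __init__(self, prefix, rest):
        self._prefix = prefix  # data already read out of the original reader
        self._rest = rest      # the original reader, positioned after the prefix

    def read(self, size=-1):
        if size is None or size < 0:
            data = self._prefix + self._rest.read()
            self._prefix = ""
            return data
        head, self._prefix = self._prefix[:size], self._prefix[size:]
        if len(head) < size:
            # Prefix exhausted: take the remainder from the original reader.
            head += self._rest.read(size - len(head))
        return head


def peek_and_restore(reader, size=4096):
    """Read a sample for format guessing; return (sample, replacement reader)."""
    sample = reader.read(size)
    return sample, ReplayReader(sample, reader)
```

After the call, the original stream has been advanced past the sample, so callers should continue with the replacement reader, which yields the complete data from the beginning.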