Testing on Pandas
`pd.DataFrame.__doc__`
DataFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters
| NAME | TYPE | DESCRIPTION |
|---|---|---|
| data | ndarray (structured or homogeneous), Iterable, dict, or DataFrame | Dict can contain Series, arrays, constants, or list-like objects. |
| index | Index or array-like | Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided. |
| columns | Index or array-like | Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, ..., n) if no column labels are provided. |
| dtype | dtype, default None | Data type to force. Only a single dtype is allowed. If None, infer. |
| copy | bool, default False | Copy data from inputs. Only affects DataFrame / 2d ndarray input. |
See Also
- DataFrame.from_records : Constructor from tuples, also record arrays.
- DataFrame.from_dict : From dicts of Series, arrays, or dicts.
- read_csv : Read a comma-separated values (csv) file into DataFrame.
- read_table : Read general delimited file into DataFrame.
- read_clipboard : Read text from clipboard into DataFrame.
Examples
Constructing DataFrame from a dictionary.
Notice that the inferred dtype is int64.
To enforce a single dtype:
Constructing DataFrame from numpy ndarray:
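The example code from this docstring did not survive extraction. A minimal sketch of the three constructions described above (sample values are illustrative, not the docstring's originals):

```python
import numpy as np
import pandas as pd

# Construct a DataFrame from a dictionary of columns.
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
print(df.dtypes)          # both columns inferred as int64

# Enforce a single dtype for all columns.
df_int8 = pd.DataFrame(data=d, dtype=np.int8)
print(df_int8.dtypes)     # both columns forced to int8

# Construct a DataFrame from a numpy ndarray.
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
print(df2)
```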
`pd.concat.__doc__`
concat
Concatenate pandas objects along a particular axis with optional set logic along the other axes.
Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.
Parameters
| NAME | TYPE | DESCRIPTION |
|---|---|---|
| objs | a sequence or mapping of Series or DataFrame objects | If a mapping is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised. |
| axis | {0/'index', 1/'columns'}, default 0 | The axis to concatenate along. |
| join | {'inner', 'outer'}, default 'outer' | How to handle indexes on other axis (or axes). |
| ignore_index | bool, default False | If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join. |
| keys | sequence, default None | If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level. |
| levels | list of sequences, default None | Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys. |
| names | list, default None | Names for the levels in the resulting hierarchical index. |
| verify_integrity | bool, default False | Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation. |
| sort | bool, default False | Sort non-concatenation axis if it is not already aligned when join is 'outer'. This has no effect when join='inner', which already preserves the order of the non-concatenation axis. |
| copy | bool, default True | If False, do not copy data unnecessarily. |
Returns
| TYPE | DESCRIPTION |
|---|---|
| object, type of objs | When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned. |
See Also
- Series.append : Concatenate Series.
- DataFrame.append : Concatenate DataFrames.
- DataFrame.join : Join DataFrames using indexes.
- DataFrame.merge : Merge DataFrames by indexes or columns.
Examples
Combine two Series.
Clear the existing index and reset it in the result
by setting the ignore_index option to True.
Add a hierarchical index at the outermost level of
the data with the keys option.
Label the index keys you create with the names option.
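The Series examples above lost their code in extraction; a minimal sketch of those four steps (sample values are illustrative):

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# Plain concatenation keeps the original index values (0, 1, 0, 1).
combined = pd.concat([s1, s2])

# ignore_index=True clears the existing index and relabels 0..n-1.
reset = pd.concat([s1, s2], ignore_index=True)

# keys adds a hierarchical index at the outermost level ...
keyed = pd.concat([s1, s2], keys=['s1', 's2'])

# ... and names labels the levels of that index.
named = pd.concat([s1, s2], keys=['s1', 's2'],
                  names=['Series name', 'Row ID'])
print(named)
```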
Combine two DataFrame objects with identical columns.
Combine DataFrame objects with overlapping columns
and return everything. Columns outside the intersection will
be filled with NaN values.
Combine DataFrame objects with overlapping columns
and return only those that are shared by passing inner to
the join keyword argument.
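A sketch of the DataFrame-column cases just described, reconstructing the lost example code (sample values are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
df2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])

# Identical columns: rows are simply stacked.
stacked = pd.concat([df1, df2])

# Overlapping columns with the default join='outer': columns outside
# the intersection are filled with NaN.
df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                   columns=['letter', 'number', 'animal'])
outer = pd.concat([df1, df3])

# join='inner' keeps only the columns shared by all inputs.
inner = pd.concat([df1, df3], join='inner')
print(inner)
```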
Combine DataFrame objects horizontally along the x axis by
passing in axis=1.
Prevent the result from including duplicate index values with the
verify_integrity option.
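The last two captions can be sketched as follows (sample values are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']],
                   columns=['animal', 'name'])

# axis=1 concatenates horizontally, aligning on the index.
wide = pd.concat([df1, df4], axis=1)
print(wide)

# verify_integrity=True raises ValueError if the new concatenation
# axis contains duplicate index values.
df5 = pd.DataFrame([1], index=['a'])
df6 = pd.DataFrame([2], index=['a'])
try:
    pd.concat([df5, df6], verify_integrity=True)
    duplicates_allowed = True
except ValueError as err:
    duplicates_allowed = False
    print('Raised:', err)
```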
`pd.melt.__doc__`
melt
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.
Parameters
| NAME | TYPE | DESCRIPTION |
|---|---|---|
| id_vars | tuple, list, or ndarray, optional | Column(s) to use as identifier variables. |
| value_vars | tuple, list, or ndarray, optional | Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars. |
| var_name | scalar | Name to use for the 'variable' column. If None it uses frame.columns.name or 'variable'. |
| value_name | scalar, default 'value' | Name to use for the 'value' column. |
| col_level | int or str, optional | If columns are a MultiIndex then use this level to melt. |
| ignore_index | bool, default True | If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary. |
Returns
| TYPE | DESCRIPTION |
|---|---|
| DataFrame | Unpivoted DataFrame. |
See Also
- DataFrame.melt : Identical method.
- pivot_table : Create a spreadsheet-style pivot table as a DataFrame.
- DataFrame.pivot : Return reshaped DataFrame organized by given index / column values.
- DataFrame.explode : Explode a DataFrame from list-like columns to long format.
Examples
The names of 'variable' and 'value' columns can be customized:
Original index values can be kept around:
If you have multi-index columns:
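The melt examples lost their code in extraction; a minimal sketch of the cases above (sample values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [1, 3, 5],
                   'C': [2, 4, 6]})

# Basic unpivot: 'A' stays as the identifier column.
long = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

# Customize the names of the 'variable' and 'value' columns.
custom = pd.melt(df, id_vars=['A'], value_vars=['B'],
                 var_name='myVarname', value_name='myValname')

# Keep the original index values, repeated as necessary.
kept = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'],
               ignore_index=False)

# With MultiIndex columns, col_level selects the level to melt.
df.columns = [list('ABC'), list('DEF')]
mi = pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
print(mi)
```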
`pd.isna.__doc__`
isna
Detect missing values for an array-like object.
This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).
Parameters
| NAME | TYPE | DESCRIPTION |
|---|---|---|
| obj | scalar or array-like | Object to check for null or missing values. |
Returns
| TYPE | DESCRIPTION |
|---|---|
| bool or array-like of bool | For scalar input, returns a scalar boolean. For array input, returns an array of boolean indicating whether each corresponding element is missing. |
See Also
- notna : Boolean inverse of pandas.isna.
- Series.isna : Detect missing values in a Series.
- DataFrame.isna : Detect missing values in a DataFrame.
- Index.isna : Detect missing values in an Index.
Examples
Scalar arguments (including strings) result in a scalar boolean.
ndarrays result in an ndarray of booleans.
For indexes, an ndarray of booleans is returned.
For Series and DataFrame, the same type is returned, containing booleans.
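The four cases above, as a minimal sketch reconstructing the lost example code (sample values are illustrative):

```python
import numpy as np
import pandas as pd

# Scalars (including strings) give a scalar boolean.
print(pd.isna('dog'))        # False
print(pd.isna(np.nan))       # True

# ndarrays give an ndarray of booleans.
arr = np.array([[1, np.nan, 3], [4, 5, np.nan]])
print(pd.isna(arr))

# Indexes also return an ndarray of booleans.
idx = pd.DatetimeIndex(['2017-07-05', '2017-07-06', None, '2017-07-08'])
print(pd.isna(idx))

# Series and DataFrame return the same type, holding booleans.
df = pd.DataFrame([['ant', 'bee', 'cat'], ['dog', None, 'fly']])
print(pd.isna(df))
```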
`pd.to_datetime.__doc__`
to_datetime
Convert argument to datetime.
Parameters
| NAME | TYPE | DESCRIPTION |
|---|---|---|
| arg | int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like | The object to convert to a datetime. |
| errors | {'ignore', 'raise', 'coerce'}, default 'raise' | - If 'raise', then invalid parsing will raise an exception. - If 'coerce', then invalid parsing will be set as NaT. - If 'ignore', then invalid parsing will return the input. |
| dayfirst | bool, default False | Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, e.g. 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior). |
| yearfirst | bool, default False | Specify a date parse order if arg is str or its list-likes. If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil). |
| utc | bool, default None | Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well). |
| format | str, default None | The strftime to parse time, e.g. "%d/%m/%Y"; note that "%f" will parse all the way up to nanoseconds. See the strftime documentation for more information on choices: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior. |
| exact | bool, default True | If True, require an exact format match. If False, allow the format to match anywhere in the target string. |
| unit | str, default 'ns' | The unit of the arg (D, s, ms, us, ns); the arg is interpreted as an integer or float number of these units, counted from origin. For example, with unit='ms' and origin='unix' (the default), this would calculate the number of milliseconds to the unix epoch start. |
| infer_datetime_format | bool, default False | If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x. |
| origin | scalar, default 'unix' | Define the reference date. The numeric values would be parsed as number of units (defined by unit) since this reference date. If Timestamp convertible, origin is set to the Timestamp identified by origin. |
| cache | bool, default True | If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing. |
Returns
| TYPE | DESCRIPTION |
|---|---|
| datetime | If parsing succeeded. Return type depends on input: - list-like: DatetimeIndex - Series: Series of datetime64 dtype - scalar: Timestamp In case when it is not possible to return designated types (e.g. when any element of input is before Timestamp.min or after Timestamp.max) return will have datetime.datetime type (or corresponding array/Series). |
See Also
- DataFrame.astype : Cast argument to a specified dtype.
- to_timedelta : Convert argument to timedelta.
- convert_dtypes : Convert dtypes.
Examples
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
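A minimal sketch of this assembly (sample values are illustrative):

```python
import pandas as pd

# Each column supplies one datetime component.
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
ts = pd.to_datetime(df)
print(ts)
```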
If a date does not meet the [timestamp limitations](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timestamp-limits), passing errors='ignore' will return the original input instead of raising any exception.
Passing errors='coerce' will force an out-of-bounds date to NaT, in addition to forcing non-dates (or non-parseable dates) to NaT.
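A sketch of the errors='coerce' behavior (errors='ignore', deprecated in recent pandas, would instead return the input unchanged; sample values are illustrative):

```python
import pandas as pd

# '13000101' lies outside the Timestamp limits; with errors='coerce'
# it becomes NaT instead of raising.
out_of_bounds = pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
print(out_of_bounds)

# Non-parseable strings are also coerced to NaT.
mixed = pd.to_datetime(['2021-01-01', 'not a date'], errors='coerce')
print(mixed)
```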
Passing infer_datetime_format=True can often speed up parsing when the strings are not exactly in ISO 8601 format but do follow a regular format.
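Note that infer_datetime_format is deprecated in pandas 2.x, where a format is inferred from the first element by default; the equivalent speed-up today is to supply an explicit format, sketched below (sample values are illustrative):

```python
import pandas as pd

dates = ['3/11/2000', '3/12/2000', '3/13/2000'] * 1000

# An explicit format (or an inferred one, on pandas 2.x) avoids
# per-element format detection and parses much faster.
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed[:3])
```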
Using a unix epoch time
Warning: for a float arg, precision rounding might happen. To prevent unexpected behavior use a fixed-width exact type.
Using a non-unix epoch origin
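The two origin examples can be sketched as follows (values taken from the standard pandas docstring examples):

```python
import pandas as pd

# Unix epoch times: integers interpreted as seconds since 1970-01-01.
epoch = pd.to_datetime([1490195805, 1490195805], unit='s')
print(epoch)

# A non-unix origin: count days from 1960-01-01 instead.
shifted = pd.to_datetime([1, 2, 3], unit='D',
                         origin=pd.Timestamp('1960-01-01'))
print(shifted)
```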