Propensity Score Matching Generator

The PSM calculator generates matches between treated subjects and the control group.

Enter data in columns
Enter data from Excel
Header: you may rename 'Name-1', 'Name-2', etc.
Data: use Enter as delimiter; you may change the delimiters on 'More options'.

Select the variables:

  • Outcome - the dependent variable.
  • Treatment - the treatment indicator (1 or 0).
  • Independent variables - the covariates.

When to use the PSM?

You may use propensity score analysis when you cannot randomize the treatment. Propensity score matching helps to reduce the effect of the confounding variables (covariates) by matching similar subjects between the treatment group and the control group.
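
The matching idea can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual algorithm: it assumes greedy 1:1 nearest-neighbor matching without replacement on an already computed propensity score, and the function name match_nearest is hypothetical. The keys of each result row follow the matching report columns described under 'Options'.

```python
def match_nearest(treated, controls):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    treated, controls: lists of (id, score) pairs.
    Each control is used at most once (an assumption of this sketch).
    """
    available = dict(controls)  # control id -> score, shrinks as controls are used
    report = []
    for t_id, t_score in treated:
        if not available:
            break  # no controls left to match
        # Pick the unused control whose score is closest to the treated score.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        c_score = available.pop(c_id)
        report.append({
            "Treatment ID": t_id,
            "Treatment score": t_score,
            "Control ID": c_id,
            "Control score": c_score,
            "Distance": abs(c_score - t_score),
        })
    return report


treated = [("t1", 0.62), ("t2", 0.40)]
controls = [("c1", 0.35), ("c2", 0.60), ("c3", 0.90)]
report = match_nearest(treated, controls)
```

Because the matching is greedy, the order of the treated subjects can change the result; other strategies (optimal matching, matching with replacement) trade this simplicity for better overall balance.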

Input data structure

  1. ID - The first column is the unique ID.
  2. Covariates - The next columns are covariates, which can be either numerical or categorical variables.
  3. Treatment - The second-to-last column is the treatment, and it should contain only 1 or 0 values.
  4. Outcome - The last column is the outcome, and it should contain numerical data.
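
As an illustration of this column layout, the following Python sketch splits each row by position. The function name split_columns and the sample column names are hypothetical and only serve the example.

```python
def split_columns(header, rows):
    """Split each row into ID, covariates, treatment, and outcome by position."""
    if len(header) < 4:
        raise ValueError("need at least ID, one covariate, treatment, and outcome")
    subjects = []
    for row in rows:
        subjects.append({
            "id": row[0],                                     # first column
            "covariates": dict(zip(header[1:-2], row[1:-2])), # middle columns
            "treatment": int(row[-2]),                        # second-to-last column
            "outcome": float(row[-1]),                        # last column
        })
    return subjects


header = ["ID", "age", "sex", "treated", "score"]
rows = [
    ["s1", 34, "F", 1, 7.5],
    ["s2", 36, "F", 0, 6.0],
]
subjects = split_columns(header, rows)
```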

How to use the statistics calculator

Statistics calculator with optional Excel input. Computes statistics such as minimum, maximum, sum, count, standard deviation, quartiles, IQR, and skewness. Handles both numerical and categorical data. Insert one or more columns, and the statistics calculator processes each separately.

How to enter data?

  • Enter raw data directly - usually you have the raw data.
    a. Enter the name of the group.
    b. Enter the raw data separated by 'comma', 'space', or 'enter'. (*you may copy only the data from Excel).
  • Enter raw data from Excel

    Enter the header on the first row.

    1. Copy Paste
      a. Copy the raw data with the header from Excel, Google Sheets, or any tool that separates data with tabs and line feeds. Copy the entire block, including the header.
      b. Paste the data in the input field.
    2. Import data from an Excel or CSV file.
      When you select an Excel file, the calculator will automatically load the first sheet and display it in the input field. You can choose either an Excel file (.xlsx or .xls) or a CSV file (.csv).
      To upload your file, use one of the following methods:
      1. Browse and select – Click the 'Browse' button and choose the file from your computer.
      2. Drag and drop – Drag your file and drop it into the 'Drop your .xlsx, .xls, or .csv file here!' area.
      Once the file is uploaded, the statistics calculator will display the data from the first sheet in the input field.
      Now, the 'Select sheet' dropdown will be populated with the names of your sheets, and you can choose any sheet.
    3. Filter Data
      When using the 'Enter data from Excel' option, you can filter the data by clicking the filter icon above the header.
      You may select one or more values from the dropdown. Please note that the filter will include any value that contains the values you choose.
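
The 'contains' behavior of the filter can be pictured with the short Python sketch below. It assumes a case-sensitive substring match, which the description above does not specify, and the function name filter_rows is hypothetical.

```python
def filter_rows(rows, column, chosen):
    """Keep rows whose value in `column` contains at least one chosen value."""
    return [
        row for row in rows
        if any(value in str(row[column]) for value in chosen)
    ]


rows = [{"city": "New York"}, {"city": "York"}, {"city": "Boston"}]
kept = filter_rows(rows, "city", ["York"])
```

Here choosing 'York' keeps both 'York' and 'New York', because both values contain the chosen text.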

Balance

  1. Standardized mean difference - between -0.1 and 0.1
  2. Variance ratios balance - between 0.8 and
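
Both balance measures can be computed per covariate as in the Python sketch below. It assumes the common definitions: the standardized mean difference divides the difference in means by the square root of the average of the two group variances, and the variance ratio is the treated variance over the control variance. The calculator may use a different denominator; the function name balance and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def balance(treated, control):
    """Return (standardized mean difference, variance ratio) for one covariate."""
    var_t, var_c = variance(treated), variance(control)  # sample variances
    smd = (mean(treated) - mean(control)) / sqrt((var_t + var_c) / 2)
    return smd, var_t / var_c


treated_age = [30.0, 34.0, 38.0, 42.0]
control_age = [31.0, 35.0, 37.0, 42.0]
smd, ratio = balance(treated_age, control_age)
smd_ok = -0.1 <= smd <= 0.1  # the SMD range listed above
```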

Assumptions

  1. No general equilibrium effect - the control subjects don't get the treatment indirectly.
  2. Conditional independence - given the covariates, the potential outcomes (y) are independent of the treatment assignment.

Numerical data

Quantitative data, continuous variable or ordinal variable

Categorical data

Qualitative data, categorical variable

Logistic regression parameters

  1. Learning rate (α): The step size of the gradient descent; common values range from 0.001 to 0.1. It controls how much the coefficients are adjusted in each iteration of the optimization process. A smaller alpha means smaller steps, which can lead to more precise convergence but may require more iterations.
  2. Penalty (λ): This parameter controls the amount of regularization applied to the model. It is a shrinkage parameter that penalizes large coefficients to prevent overfitting. When lambda is set to zero, no regularization is applied, and the model is an ordinary (unpenalized) logistic regression. As lambda increases, more penalty is applied, shrinking the coefficients towards zero. This reduces model complexity and can improve generalization on unseen data.
  3. Iterations: On each iteration, the algorithm changes the coefficients in a direction that increases the log-likelihood. More iterations lead to better results until the algorithm reaches the maximum log-likelihood; after that point, additional iterations do not improve the result.
  4. Epsilon: Every 100 iterations, the algorithm calculates the maximum absolute change in the coefficients. If this maximum is less than epsilon, the algorithm stops. When epsilon equals 0, the algorithm runs all the iterations.
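
How the four parameters interact can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the calculator's implementation: it assumes an L2 (ridge) penalty that skips the intercept, a gradient averaged over the subjects, and a stopping check that compares the coefficients with their values 100 iterations earlier. The names fit_logistic and propensity are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, t, alpha=0.1, lam=0.0, iterations=5000, epsilon=1e-6):
    """Gradient ascent on the penalized mean log-likelihood.

    X: list of covariate rows, t: list of 0/1 treatment flags.
    Returns [intercept, b1, b2, ...].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    checkpoint = beta[:]
    for it in range(1, iterations + 1):
        grad = [0.0] * (p + 1)
        for row, ti in zip(X, t):
            xs = [1.0] + list(row)
            err = ti - sigmoid(sum(b * x for b, x in zip(beta, xs)))
            for j, x in enumerate(xs):
                grad[j] += err * x
        for j in range(p + 1):
            penalty = lam * beta[j] if j > 0 else 0.0  # assumption: intercept not penalized
            beta[j] += alpha * (grad[j] / n - penalty)  # step size set by alpha
        if it % 100 == 0:
            change = max(abs(b - c) for b, c in zip(beta, checkpoint))
            if epsilon > 0 and change < epsilon:
                break  # coefficients barely moved over the last 100 iterations
            checkpoint = beta[:]
    return beta


def propensity(beta, row):
    """Estimated probability of treatment for one subject."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
t = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(X, t)
```

With epsilon set to 0 the loop never stops early, which matches the description above. A larger lam pulls the slope coefficients towards zero.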

Options

  1. Logistic regression - Displays the names of the coefficients or only x1, x2, etc.
  2. Matching report
    Show only ID and Score - Displays 'Treatment ID', 'Treatment score', 'Control ID', 'Control score', and 'Distance'.
    Show all columns - Also includes covariates and any other variable not included in the PSM process.
  3. Clean - clean the data automatically before running the PSM process.
  4. Missing data values - define the data that will be counted as missing data, such as NA, "", or N/A.
  5. Clean variables
    Numerical - remove subjects only if missing values are found in numerical variables.
    All - remove subjects if missing values are found in categorical variables or numerical variables.
  6. Rounding - how to round the results?
    When a resulting value is larger than one, the tool rounds it, but when a resulting value is less than one the tool displays the significant figures.
  7. Excel Pagination Display - the number of rows displayed per tab in the 'Enter data from Excel' option. This option displays the spreadsheet across multiple tabs.
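
The rounding rule in item 6 can be sketched as follows. The number of digits is a hypothetical parameter here, since the options above do not state it, and the function name display_round is only for illustration.

```python
from math import floor, log10


def display_round(value, digits=4):
    """Round to `digits` decimals when |value| >= 1,
    otherwise keep `digits` significant figures."""
    if value == 0:
        return 0.0
    if abs(value) >= 1:
        return round(value, digits)
    exponent = floor(log10(abs(value)))       # e.g. -4 for 0.000123
    return round(value, digits - 1 - exponent)


large = display_round(123.456789)    # rounded to 4 decimals
small = display_round(0.000123456)   # 4 significant figures are kept
```

Keeping significant figures for small values avoids displaying results such as 0.0000 when the value is close to zero but not zero.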

Clean data

Data cleaning occurs automatically before the PSM process runs.
For duplicate IDs, you will receive a warning, but the PSM process will continue.

The tool removes the following subjects:

  1. Subjects with values that you defined as missing data (such as "NA" or "").
  2. Subjects whose treatment value is not 0 or 1.
  3. Subjects whose outcome is not numerical.
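
These cleaning rules can be sketched in Python as below. The sketch assumes the 'All' setting of 'Clean variables' (a missing value in any variable removes the subject) and a hypothetical record layout with the keys "id", "covariates", "treatment", and "outcome"; the function name clean is also hypothetical.

```python
def clean(subjects, missing=("NA", "", "N/A")):
    """Drop subjects with missing values, a treatment other than 0/1,
    or a non-numerical outcome. Returns (kept_subjects, duplicate_ids)."""

    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    kept, seen, duplicates = [], set(), []
    for s in subjects:
        values = list(s["covariates"].values()) + [s["treatment"], s["outcome"]]
        if any(str(v).strip() in missing for v in values):
            continue  # rule 1: value defined as missing data
        if str(s["treatment"]).strip() not in ("0", "1"):
            continue  # rule 2: treatment must be 0 or 1
        if not is_number(s["outcome"]):
            continue  # rule 3: outcome must be numerical
        if s["id"] in seen:
            duplicates.append(s["id"])  # warning only, the subject is kept
        seen.add(s["id"])
        kept.append(s)
    return kept, duplicates


subjects = [
    {"id": "a", "covariates": {"age": 30}, "treatment": 1, "outcome": 5.0},
    {"id": "b", "covariates": {"age": "NA"}, "treatment": 0, "outcome": 4.0},
    {"id": "c", "covariates": {"age": 41}, "treatment": 2, "outcome": 3.0},
    {"id": "d", "covariates": {"age": 50}, "treatment": 0, "outcome": "high"},
    {"id": "a", "covariates": {"age": 33}, "treatment": 0, "outcome": 6.0},
]
kept, duplicates = clean(subjects)
```

In this example subjects b, c, and d are removed, and the repeated ID 'a' is reported as a duplicate while both rows are kept.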