Upload Your Dataset

📁 Supported formats: Stata (.dta), SPSS (.sav), R (.rds), CSV (.csv), Excel (.xlsx, .xls), Tab-separated (.tsv), Text (.txt)
📊 Maximum file size: 100MB
⚠️ For very large datasets, use the variable filtering options below.
📋 File Format Detected:
CSV/Text Import Options:
Excel Import Options:
💡 Tip: If you encounter issues, try converting your file to CSV first, or use the import options above.

Data Preview:

Dataset Information


Import Summary:

                    

Variable Type Analysis

Variable Classification:

Variable Details



Variable Information:

                      
Categorical Variable Analysis:

Data Filtering Options

Keep Specific Variables:
Select variables to keep in your working dataset (leave empty to keep all)


Filter by Conditions:
Apply filters to keep only certain observations


Current Filters:

                      

Variable Operations

Choose Operation Type

Variable Recoding
Syntax: 1=0, <=0=NA, >=999=NA or 1->0, <=0->NA

                      
Create Computed Variables
Convert Categorical → Continuous
💡 Assign numeric values to each category
📋 Creates 0/1 variables for each category
📋 Reference = -1, Target = 1, Others = 0 (for regression analysis)
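
The two coding schemes described above can be sketched in a few lines. This is a minimal Python illustration, not the app's own implementation; the function names `dummy_code` and `contrast_code` and the sample `party` values are hypothetical.

```python
def dummy_code(values, category):
    """Return a 0/1 indicator: 1 where the value equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]


def contrast_code(values, reference, target):
    """Return -1 for the reference category, 1 for the target, 0 for all others."""
    codes = {reference: -1, target: 1}
    return [codes.get(v, 0) for v in values]


party = ["Dem", "Rep", "Ind", "Dem"]  # made-up example data
print(dummy_code(party, "Dem"))             # [1, 0, 0, 1]
print(contrast_code(party, "Dem", "Rep"))   # [-1, 1, 0, -1]
```

Calling `dummy_code` once per category gives one 0/1 variable for each. The sketch does not treat missing values specially; they simply fall into the 0 group.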
Convert Continuous → Categorical
Rename Variables
Missing Value Treatment



Download Dataset

Variable Information

Generated R Code:

                            
Variable Details:

                          
Created Variables Log:

Data Preview

📊 Green = newly created variables, Yellow = source variables

Variable Selection



R Code for descriptives:

                      

Descriptive Statistics


📊 Interpretation Guide:

Categorical Variable Frequencies

Distribution Plots

Correlation Analysis




Export Options:
Download Results

R Code for correlation:

                      

Correlation Visualization


Network Plot Legend:

• Blue edges: Positive correlations

• Red edges: Negative correlations

• Edge thickness: Correlation strength

Correlation Results

Statistical Significance of Correlations

A p-value is the probability of seeing a correlation at least this strong if the true correlation were zero (the null hypothesis):

  • p < 0.001: Very strong evidence against the null hypothesis (marked with ***)
  • p < 0.01: Strong evidence (marked with **)
  • p < 0.05: Moderate evidence (marked with *)
  • p ≥ 0.05: Insufficient evidence (not significant)

Note: Colors indicate significance levels: green (p < 0.01), yellow (p < 0.05), red (p ≥ 0.05)
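
The thresholds above amount to a simple lookup. The following Python sketch shows one way to express them; the function names are hypothetical and this is not the app's own code.

```python
def significance_stars(p):
    """Map a p-value to the star notation used in the list above."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant


def significance_color(p):
    """Map a p-value to the table color described in the note above."""
    if p < 0.01:
        return "green"
    if p < 0.05:
        return "yellow"
    return "red"


for p in (0.0004, 0.03, 0.2):
    print(p, significance_stars(p), significance_color(p))
```

Note that the color scale has three levels while the star scale has four, so both `***` and `**` results appear green.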


About Significance Tests

Significance tests help determine if correlations are statistically meaningful or could have occurred by random chance.

To view significance tests:

  1. Check the 'Show P-values' option in the analysis panel
  2. Generate your correlation analysis
  3. Return to this tab to view detailed p-values

What you'll see:
• P-values for each correlation coefficient
• Color-coded significance levels
• Statistical interpretation guidance

Enable 'Show Confidence Intervals' option to see confidence intervals.

Correlation Interpretation

R Log Output


                    

OLS Regression Setup

Multiple Comparisons Correction:

Note: Confidence intervals will be adjusted based on the correction method and alpha level.
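
To show how a correction widens an interval, here is a minimal Python sketch. It assumes the Bonferroni method (alpha divided by the number of tests) and a normal approximation for the critical value; which methods the app offers, and whether it uses the t distribution, is not stated here. The function name `bonferroni_ci` is hypothetical.

```python
from statistics import NormalDist


def bonferroni_ci(estimate, std_error, n_tests, alpha=0.05):
    """Confidence interval with a Bonferroni-adjusted alpha (normal approximation)."""
    adjusted_alpha = alpha / n_tests                  # stricter level per coefficient
    z = NormalDist().inv_cdf(1 - adjusted_alpha / 2)  # two-sided critical value
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical coefficient 0.80 with standard error 0.25
print(bonferroni_ci(0.80, 0.25, n_tests=1))  # uncorrected 95% interval
print(bonferroni_ci(0.80, 0.25, n_tests=5))  # wider interval after correcting for 5 tests
```

With more tests, the adjusted alpha shrinks and the interval grows, so a coefficient that looked significant on its own may no longer exclude zero.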



R Code for OLS:

                      

Regression Results


                    

Model Fit Statistics:

Coefficient Plot

Corrected Confidence Intervals

Confidence intervals adjusted for multiple comparisons.

Enable multiple comparisons correction to view corrected confidence intervals.

OLS Interpretation

Diagnostic Plots


Diagnostic Plot Interpretation:
Residuals vs Fitted: Check for linearity and homoscedasticity. Points should be randomly scattered around the horizontal line at 0.
Q-Q Plot: Check for normality of residuals. Points should follow the diagonal line closely.
Scale-Location: Check for homoscedasticity. The red line should be roughly horizontal.
Residuals vs Leverage: Identify influential observations. Look for points outside Cook's distance lines.

Logistic Regression Setup

Note: Variable must contain only 0 and 1 values.
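
A quick way to check this requirement before fitting the model is sketched below in Python. The helper name `is_binary_01` is hypothetical, and ignoring missing values (`None`) is an assumption, not documented app behavior.

```python
def is_binary_01(values):
    """True if there is at least one non-missing value and all of them are 0 or 1."""
    observed = [v for v in values if v is not None]
    return bool(observed) and all(v in (0, 1) for v in observed)


print(is_binary_01([0, 1, 1, None, 0]))  # True
print(is_binary_01([1, 2, 1, 2]))        # False: recode 1/2 to 0/1 first
```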

Multiple Comparisons Correction:

Note: Confidence intervals will be adjusted based on the correction method and alpha level.



R Code for Logistic Regression:

                      

Logistic Regression Results


                    

Model Fit Statistics:

Coefficient Plot (Log-Odds)

Odds Ratios

Corrected Confidence Intervals

Confidence intervals adjusted for multiple comparisons (log-odds scale).

Enable multiple comparisons correction to view corrected confidence intervals.

Predicted Probabilities

Logistic Regression Interpretation

Plot Configuration

Select Variables for Correlation:
Select at least 2 numeric variables. Hold Ctrl (Windows) or Cmd (Mac) to select multiple variables.
Variable Type Classification:
Plot Customization:



R Code for this plot:

                        

Visualization



R Code Log - Learn the Syntax!

This tab shows all the R code equivalent to your point-and-click actions. Copy and paste to learn R syntax!

Complete R Script:


                    



Session History:

How to Use This App

📚 Step-by-Step Guide

  1. Upload Data: Start by uploading a dataset in a supported format (for example Stata .dta or CSV) in the 'Data Upload' tab.
  2. Create Variables: Use the 'Variable Creation' tab to recode or create new measures.
  3. Explore Descriptives: Examine variable distributions and summary statistics.
  4. Enhanced Visualization: Create customized plots with proper variable type classification.
  5. Check Correlations: Look for relationships between variables.
  6. Run Regressions: Analyze relationships using OLS or logistic regression.
  7. Learn R Code: Check the 'R Code Log' tab to see the R syntax for your actions.
  8. Interpret Results: Use the interpretation boxes to understand your findings.

🔧 Variable Creation Guide

Numeric Recoding:

  • Format: old_value=new_value OR condition=new_value
  • Examples: 1=0, 2=1, 99= (empty = NA)
  • Conditions: <=0=, >=999=, <18=0, >100=1
  • Operators: <=, >=, <, >, ==, !=
  • Use empty value after = to set as missing
  • IMPORTANT: For negative values like -9, -4, use: <0= to set all negative values as NA
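
To make the rule format concrete, the Python sketch below parses and applies a rule string. It is an illustration only, not the app's parser: the names `parse_rule` and `recode` are hypothetical, and it assumes that the first matching rule wins and that values matching no rule are left unchanged.

```python
import operator

# Two-character operators come first so "<=" is not mistaken for "<".
OPERATORS = [("<=", operator.le), (">=", operator.ge), ("==", operator.eq),
             ("!=", operator.ne), ("<", operator.lt), (">", operator.gt)]


def parse_rule(rule):
    """Turn 'old=new' or '<op>value=new' into (compare, threshold, new_value)."""
    rule = rule.strip()
    compare = operator.eq                  # plain 'old=new' means equality
    for symbol, func in OPERATORS:
        if rule.startswith(symbol):
            compare, rule = func, rule[len(symbol):]
            break
    old, _, new = rule.rpartition("=")     # split on the last '='
    new = new.strip()
    new_value = None if new in ("", "NA") else float(new)  # empty or NA -> missing
    return compare, float(old), new_value


def recode(values, spec):
    """Apply comma-separated rules; the first matching rule wins."""
    rules = [parse_rule(r) for r in spec.split(",")]
    result = []
    for v in values:
        for compare, threshold, new_value in rules:
            if v is not None and compare(v, threshold):
                v = new_value
                break
        result.append(v)
    return result


print(recode([1, 2, 99, -4, 50], "1=0, 2=1, 99=, <0="))
# [0.0, 1.0, None, None, 50]
```

Here `None` stands in for NA. The single `<0=` rule turns every negative code (such as -9 or -4) into a missing value, as the last bullet recommends.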

Missing Value Handling:

  • Enter values to treat as missing: -99, -98, 999
  • Choose to set as NA or replace with specific value
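
Both options can be sketched with one small helper. This is a Python illustration with a hypothetical function name, `treat_missing`, where `None` plays the role of NA.

```python
def treat_missing(values, missing_codes, replacement=None):
    """Replace every value listed in `missing_codes` with `replacement` (None = NA)."""
    codes = set(missing_codes)
    return [replacement if v in codes else v for v in values]


income = [42, -99, 57, 999, -98]  # made-up example data
print(treat_missing(income, [-99, -98, 999]))                 # set as NA
print(treat_missing(income, [-99, -98, 999], replacement=0))  # replace with a value
```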

Variable Types:

  • Continuous: Numeric with many unique values (age, income)
  • Ordinal: Ordered categories (education levels, agreement scales)
  • Categorical: Unordered categories (gender, party affiliation)

📖 Statistical Interpretation Guide

Descriptive Statistics:

  • Mean: Average value of the variable
  • Median: Middle value when data is ordered
  • Standard Deviation: Measure of variability around the mean
  • Skewness: Measure of asymmetry in the distribution
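
The four statistics above can be computed with Python's standard library, as in the sketch below. The `describe` helper is hypothetical, and the skewness formula is the simple moment-based one; the estimator the app reports may differ slightly.

```python
import statistics


def describe(values):
    """Mean, median, sample SD and moment-based skewness, ignoring missing values."""
    data = [v for v in values if v is not None]
    n = len(data)
    mean = statistics.mean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return {
        "mean": mean,
        "median": statistics.median(data),
        "sd": statistics.stdev(data),            # sample SD (divides by n - 1)
        "skewness": m3 / m2 ** 1.5,
    }


print(describe([1, 2, 3, 4, 10]))  # mean above median, positive skewness
```

A mean well above the median, as in this example, is a quick sign of a right-skewed distribution.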

Correlation Coefficients:

  • |r| = 0.1 to 0.3: Weak relationship
  • |r| = 0.3 to 0.5: Moderate relationship
  • |r| = 0.5 to 1.0: Strong relationship
  • Negative values: Inverse relationship
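
As an illustration of these bands, the Python sketch below computes Pearson's r and labels it. The function names are hypothetical. Two details are assumptions not taken from the list above: values below 0.1 are labeled "negligible", and a value exactly on a boundary goes to the stronger band.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Label a correlation by its absolute size and its sign."""
    size = abs(r)
    if size >= 0.5:
        label = "strong"
    elif size >= 0.3:
        label = "moderate"
    elif size >= 0.1:
        label = "weak"
    else:
        label = "negligible"
    direction = "positive" if r > 0 else "negative" if r < 0 else "zero"
    return f"{label} {direction}"


r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # made-up example data
print(round(r, 2), describe_strength(r))
```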

Regression Coefficients:

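  • Coefficient: Expected change in Y for a 1-unit change in X, holding the other predictors constant
  • p-value < 0.05: Statistically significant relationship at the 5% level
  • R-squared: Proportion of variance in Y explained by the model
  • Odds Ratio > 1: Higher odds of the outcome for a 1-unit increase in X; a value below 1 means lower odds (logistic)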
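
Logistic regression coefficients are on the log-odds scale, and exponentiating them gives odds ratios. The Python sketch below shows the conversion for a made-up coefficient; the function name `odds_ratio` is hypothetical and the 1.96 critical value assumes an uncorrected 95% interval.

```python
import math


def odds_ratio(log_odds, std_error, z=1.96):
    """Convert a log-odds coefficient and its standard error to an odds ratio with CI."""
    estimate = math.exp(log_odds)
    lower = math.exp(log_odds - z * std_error)  # exponentiate the log-odds interval
    upper = math.exp(log_odds + z * std_error)
    return estimate, lower, upper


# Hypothetical coefficient: 0.405 log-odds with standard error 0.15
estimate, lower, upper = odds_ratio(0.405, 0.15)
print(f"OR = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An odds ratio of about 1.5 means the odds of the outcome are roughly 50% higher for each 1-unit increase in the predictor. If the interval includes 1, the direction of the effect is uncertain.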

🎯 Tips for Large Datasets:

  • Use sampling for datasets > 10,000 observations
  • Start with descriptive statistics before running complex models
  • Check correlation matrices to identify multicollinearity
  • Examine diagnostic plots for assumption violations
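
The sampling tip can be sketched as follows. This Python example draws a reproducible simple random sample; the function name `sample_rows`, the seed, and the use of 10,000 as the cap are illustrative choices, not the app's behavior.

```python
import random


def sample_rows(rows, max_rows=10_000, seed=42):
    """Return all rows if the data are small, else a reproducible random sample."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)           # fixed seed so the sample can be reproduced
    return rng.sample(rows, max_rows)   # sampling without replacement


rows = list(range(25_000))              # stand-in for 25,000 observations
subset = sample_rows(rows)
print(len(subset))                      # 10000
```

Fixing the seed means that exploratory results obtained on the sample can be reproduced before the final models are run on the full dataset.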

⚠️ Troubleshooting Common Issues:

Data Loading Problems:

  • Ensure your file is valid and in a supported format (for example .dta)
  • Check file size (must be under 100MB)
  • Try converting to .csv if .dta loading fails
  • For ANES data: consider using a subset of variables

Variable Creation Issues:

  • Check that source variable exists in your dataset
  • Verify recoding syntax (old_value=new_value)
  • Use conditions carefully (<=, >=, <, >, ==, !=)
  • Remember: empty value after = sets to NA

Analysis Errors:

  • Ensure you have enough observations for analysis
  • Check for missing data in key variables
  • For regression: make sure dependent variable is appropriate type
  • For logistic regression: dependent variable should be binary (0/1)