Fatih Erol

Upload Your Dataset

Choose Data File


📁 Supported formats: Stata (.dta), SPSS (.sav), R (.rds), CSV (.csv), Excel (.xlsx, .xls), Tab-separated (.tsv), Text (.txt)
📊 Maximum file size: 100MB
⚠️ For very large datasets, use the variable filtering options below.

📋 File Format Detected:

CSV/Text Import Options:

Separator:

Decimal:

Header row

Quote character:

Skip lines:
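The CSV/text options above correspond to the usual parameters of a delimited-file reader. The following Python sketch shows how they interact. It is only an illustration: the app itself generates R code, and `read_delimited` is a hypothetical helper that is not part of the app. It assumes that unnamed columns get R-style default names (V1, V2, ...) when there is no header row.

```python
import csv

def read_delimited(text, sep=",", dec=".", header=True, quote='"', skip=0):
    """Parse delimited text into a list of row dicts (hypothetical helper).

    Splitting into lines first means quoted fields that contain
    newlines are not supported in this sketch.
    """
    lines = text.splitlines()[skip:]  # drop the first `skip` lines
    rows = list(csv.reader(lines, delimiter=sep, quotechar=quote))
    if header:
        names, body = rows[0], rows[1:]
    else:
        # Without a header row, fall back to generic names V1, V2, ...
        names, body = [f"V{i + 1}" for i in range(len(rows[0]))], rows

    def convert(cell):
        # Treat `dec` as the decimal mark; keep non-numeric cells as text.
        try:
            return float(cell.replace(dec, "."))
        except ValueError:
            return cell

    return [dict(zip(names, map(convert, row))) for row in body]


# European-style file: ';' separator, ',' decimal mark, one junk line on top.
sample = 'exported by survey tool\nname;score\n"Doe, J";3,5\nRoe;4\n'
rows = read_delimited(sample, sep=";", dec=",", skip=1)
```

The quote character matters here. Because "Doe, J" is quoted, its comma is kept as text and not treated as a decimal mark or separator.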

Excel Import Options:

Sheet name/number:

Skip rows:

Cell range (optional):

Column names

💡 Tip: If you encounter issues, try converting your file to CSV first, or use the import options above.

Data Preview:

Dataset Information

Import Summary:

Variable Type Analysis

Variable Classification:

Variable Details

Inspect Variable:

Variable Information:

Categorical Variable Analysis:

Data Filtering Options

Keep Specific Variables:

Select variables to keep in your working dataset (leave empty to keep all)

Variables to Keep:

Filter by Conditions:

Apply filters to keep only certain observations

Filter Variable:

Operator:

Filter Value:

Current Filters:
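Combined, "Keep Specific Variables" and "Filter by Conditions" act as a row filter followed by a column selection. This is a minimal Python sketch of that logic under some assumptions: the operator set is assumed (the app's Operator menu may offer different choices), all filters are combined with AND, rows with a missing value in a filtered variable are dropped, and `apply_filters` is a hypothetical name.

```python
import operator

# Assumed operator set; the app's actual Operator menu may differ.
OPERATORS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
             "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def apply_filters(rows, filters, keep=None):
    """Keep rows meeting every (variable, op, value) filter, then select columns.

    rows: list of dicts; keep: variables to retain (None or empty = keep all).
    Rows where a filtered variable is missing (None) are dropped.
    """
    result = []
    for row in rows:
        ok = all(row.get(var) is not None and OPERATORS[op](row[var], value)
                 for var, op, value in filters)
        if ok:
            result.append({k: row[k] for k in (keep or row)})
    return result


survey = [{"id": 1, "age": 17, "vote": 1}, {"id": 2, "age": 34, "vote": 0},
          {"id": 3, "age": None, "vote": 1}, {"id": 4, "age": 52, "vote": 1}]
adult_voters = apply_filters(survey, [("age", ">=", 18), ("vote", "==", 1)], keep=["id"])
```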

Variable Operations

Choose Operation Type

Operation Type:

Variable Recoding

Select Source Variable:

New Variable Name:

Syntax: 1=0, <=0=NA, >=999=NA or 1->0, <=0->NA

Recoding Rules:

Create Computed Variables

New Variable Name:

Computation Type:

Select Variables:

Weights (comma-separated):

Min value per variable:

Max value per variable:

Custom Formula:

Require all variables non-missing
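The computed-variable options (weights, per-variable min/max, "require all non-missing") suggest an index built as a weighted average. The Python sketch below is one plausible reading, not the app's documented method. It assumes that min/max are used to rescale each variable to 0-1 before weighting. When missing values are allowed, the remaining weights are renormalized. `weighted_index` is a hypothetical helper.

```python
import math

def _missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))

def weighted_index(rows, weights=None, mins=None, maxs=None, require_all=True):
    """Combine several variables into one score per observation (hypothetical).

    rows: one list of values per observation, one entry per selected variable.
    weights: per-variable weights (default: equal weights).
    mins/maxs: if given, each variable is rescaled to 0-1 before weighting.
    require_all: if True, any missing input makes the result missing (nan).
    """
    k = len(rows[0])
    weights = weights or [1.0] * k
    scores = []
    for row in rows:
        terms = []
        for j, v in enumerate(row):
            if _missing(v):
                continue
            if mins is not None and maxs is not None:
                v = (v - mins[j]) / (maxs[j] - mins[j])  # min-max rescale
            terms.append((weights[j], v))
        if not terms or (require_all and len(terms) < k):
            scores.append(math.nan)
        else:
            total = sum(w for w, _ in terms)  # renormalize over available items
            scores.append(sum(w * v for w, v in terms) / total)
    return scores
```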

Convert Categorical → Continuous

Select Categorical Variable:

New Variable Name:

Conversion Method:

💡 Assign numeric values to each category

📋 Creates 0/1 variables for each category

Include all categories (no reference)

📋 Reference = -1, Target = 1, Others = 0 (for regression analysis)
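The two coding schemes described above can be sketched in a few lines of Python. Dummy coding creates one 0/1 column per category and drops the reference category unless all categories are requested. Effect (deviation) coding sets the target category to 1, the reference to -1, and all others to 0. The function names are hypothetical, and the sketch does not reproduce the app's R code.

```python
def dummy_code(values, reference=None):
    """One 0/1 column per category; the reference category (if any) is dropped."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values]
            for c in categories if c != reference}

def effect_code(values, reference):
    """Effect coding: target = 1, reference = -1, all other categories = 0."""
    categories = sorted(set(values))
    return {c: [1 if v == c else (-1 if v == reference else 0) for v in values]
            for c in categories if c != reference}


party = ["a", "b", "c", "a"]
```

In regression, dummy-coded coefficients compare each category with the reference. Effect-coded coefficients compare each category with the unweighted mean of all category means.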

Convert Continuous → Categorical

Select Continuous Variable:

New Variable Name:

Conversion Method:

Cutpoints:

Labels (optional):

Number of bins:

Number of quantiles:

Method:
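The three binning approaches above (explicit cutpoints, equal-width bins, quantiles) all reduce to assigning each value to an interval. This Python sketch assumes right-closed intervals (a, b], which mirrors the default of R's `cut()`, and returns integer bin codes starting at 0 unless labels are given. The helper names are hypothetical.

```python
import bisect
import statistics

def cut_at(values, cutpoints, labels=None):
    """Assign each value to a right-closed interval (a, b] defined by cutpoints."""
    cuts = sorted(cutpoints)
    codes = [bisect.bisect_left(cuts, v) for v in values]  # 0 .. len(cuts)
    return [labels[c] for c in codes] if labels else codes

def equal_width_bins(values, n_bins):
    """Split the observed range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return cut_at(values, [lo + i * width for i in range(1, n_bins)])

def quantile_bins(values, n_quantiles):
    """Bins holding roughly equal numbers of observations."""
    cuts = statistics.quantiles(values, n=n_quantiles, method="inclusive")
    return cut_at(values, cuts)
```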

Rename Variables

Variable to Rename:

New Name:

Missing Value Treatment

Select Variable:

New Variable Name:

Missing Values (comma-separated):

Action:

Replacement Value:

Download Dataset

Variable Information

Generated R Code:

Variable Details:

Created Variables Log:

Data Preview

📊 Green = newly created variables, Yellow = source variables

Variable Selection

Select Variables for Descriptive Statistics:

Include Categorical Variables

R Code for descriptives:

Descriptive Statistics

📊 Interpretation Guide:

Categorical Variable Frequencies

Distribution Plots

Correlation Analysis

Select Variables for Correlation:

Correlation Method:

Use Partial Correlation

Control Variables:

Partial Correlation Method:
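A partial correlation measures the association between two variables after the linear influence of a control variable has been removed. For a single control variable z, the standard first-order formula is r_xy.z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The Python sketch below implements this formula. It handles only one control variable, whereas the app allows several. The function names are hypothetical.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr(x, y, z):
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# x and y both equal z plus noise; the two noise terms are uncorrelated.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]   # z + [1, -1, -1, 1]
y = [2, -1, 6, 3]  # z + [1, -3, 3, -1]
```

In this example x and y are correlated only because both follow z. Once z is controlled for, the partial correlation is 0.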

Show P-values

Show Confidence Intervals

Confidence Level:

Plot Type:

Export Options:

Download Results

R Code for correlation:

Correlation Visualization

Network Plot Legend:

• Blue edges: Positive correlations

• Red edges: Negative correlations

• Edge thickness: Correlation strength

Correlation Results

Statistical Significance of Correlations

A p-value is the probability of observing a correlation at least as strong as the sample correlation if the true (population) correlation were zero:

p < 0.001: Very strong evidence against the null hypothesis (marked with ***)
p < 0.01: Strong evidence (marked with **)
p < 0.05: Moderate evidence (marked with *)
p ≥ 0.05: Insufficient evidence (not significant)

Note: Colors indicate significance levels: green (p < 0.01), yellow (0.01 ≤ p < 0.05), red (p ≥ 0.05)
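This Python sketch shows the star and color bands listed above, read as non-overlapping intervals. It also includes the t statistic that a Pearson correlation test is based on: t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom, as in R's `cor.test`. Whether the app computes its p-values exactly this way is an assumption, and the helper names are hypothetical.

```python
import math

def significance_stars(p):
    """*** p < 0.001, ** p < 0.01, * p < 0.05, otherwise no stars."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

def significance_color(p):
    """green p < 0.01, yellow 0.01 <= p < 0.05, red p >= 0.05."""
    if p < 0.01:
        return "green"
    if p < 0.05:
        return "yellow"
    return "red"

def t_statistic(r, n):
    """Test statistic for H0: rho = 0; compare with a t distribution on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```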

About Significance Tests

Significance tests assess whether an observed correlation is distinguishable from zero or could plausibly have arisen from random sampling variation alone.

To view significance tests:

Check the 'Show P-values' option in the analysis panel
Generate your correlation analysis
Return to this tab to view detailed p-values

What you'll see:
• P-values for each correlation coefficient
• Color-coded significance levels
• Statistical interpretation guidance

Enable the 'Show Confidence Intervals' option to see confidence intervals.
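A common way to build a confidence interval for a Pearson correlation is Fisher's z-transformation, which is also the method R's `cor.test` uses for Pearson correlations. The steps are: transform r to z = atanh(r), add and subtract a normal critical value times 1/√(n − 3), then transform back with tanh. The app may compute its intervals differently. `fisher_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    z = math.atanh(r)                  # variance-stabilizing transform
    se = 1 / math.sqrt(n - 3)          # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = fisher_ci(0.5, 103)
```

The interval is asymmetric around r. This reflects the fact that correlations are bounded by -1 and 1.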

Correlation Interpretation

R Log Output

OLS Regression Setup

Dependent Variable:

Independent Variables:

Robust Standard Errors

Multiple Comparisons Correction:

Apply Multiple Comparisons Correction

Correction Method:

Alpha Level:

Note: Confidence intervals will be adjusted based on the correction method and alpha level.
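The list of correction methods is not given here. As an illustration, this Python sketch implements two common ones, Bonferroni and Holm, following the definitions used by R's `p.adjust`. For Bonferroni, the matching confidence intervals are built at level 1 − α/m for each of the m coefficients. The function names are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment: less conservative than Bonferroni."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...,
        # and adjusted values are kept non-decreasing in that order.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def bonferroni_ci_level(alpha, m):
    """Per-coefficient confidence level keeping the family-wise error <= alpha."""
    return 1 - alpha / m
```

For example, with α = 0.05 and five coefficients, each Bonferroni-corrected interval is a 99% interval.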

R Code for OLS:

Regression Results

Model Fit Statistics:

Coefficient Plot

Corrected Confidence Intervals

Confidence intervals adjusted for multiple comparisons.

Enable multiple comparisons correction to view corrected confidence intervals.

OLS Interpretation

Diagnostic Plots

Diagnostic Plot Interpretation:

Residuals vs Fitted: Check for linearity and homoscedasticity. Points should be randomly scattered around the horizontal line at 0.
Q-Q Plot: Check for normality of residuals. Points should follow the diagonal line closely.
Scale-Location: Check for homoscedasticity. The red line should be roughly horizontal.
Residuals vs Leverage: Identify influential observations. Look for points outside Cook's distance lines.
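The quantities behind these plots (fitted values, residuals, leverage, Cook's distance) can be computed directly. To stay short, this Python sketch covers only a single-predictor regression, and `simple_ols_diagnostics` is a hypothetical name. For reference, R's `plot.lm` draws Cook's distance contours at 0.5 and 1 by default.

```python
def simple_ols_diagnostics(x, y):
    """Fitted values, residuals, leverage and Cook's distance for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    p = 2  # estimated parameters: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)          # residual variance
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return fitted, resid, leverage, cooks


# The last point has an extreme x value (high leverage) and breaks the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 8.0, 9.1, 5.0]
fitted, resid, lev, cooks = simple_ols_diagnostics(x, y)
```

The last observation's raw residual is not the largest in the data. Its Cook's distance is still by far the largest, because its leverage is high.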

Logistic Regression Setup

Dependent Variable (Binary):

Note: Variable must contain only 0 and 1 values.

Independent Variables:

Multiple Comparisons Correction:

Apply Multiple Comparisons Correction

Correction Method:

Alpha Level:

Note: Confidence intervals will be adjusted based on the correction method and alpha level.

R Code for Logistic Regression:

Logistic Regression Results

Model Fit Statistics:

Coefficient Plot (Log-Odds)

Odds Ratios

Corrected Confidence Intervals

Confidence intervals adjusted for multiple comparisons (log-odds scale).

Enable multiple comparisons correction to view corrected confidence intervals.

Predicted Probabilities

Logistic Regression Interpretation

Plot Configuration

Plot Type:

X Variable:

Y Variable:

Grouping Variable (optional):

Color Variable (optional):

Size Variable (optional):

Select Variables for Correlation:

Choose variables (hold Ctrl/Cmd to select multiple):

Select at least 2 numeric variables. Hold Ctrl (Windows) or Cmd (Mac) to select multiple variables.

Variable Type Classification:

Plot Customization:

X-axis Label:

Y-axis Label:

Plot Title:

Color Scheme:

Number of Bins:

Point Size:

Transparency (0-1):

Show Data Labels

Create Faceted Plot

Download Plot

R Code for this plot:

Visualization

R Code Log - Learn the Syntax!

This tab shows all the R code equivalent to your point-and-click actions. Copy and paste to learn R syntax!

Complete R Script:

Download R Script

Download Current Dataset

Session History:

How to Use This App

📚 Step-by-Step Guide

Upload Data: Start by uploading your dataset (Stata, SPSS, R, CSV, Excel, tab-separated, or text file) in the 'Data Upload' tab.
Create Variables: Use the 'Variable Creation' tab to recode or create new measures.
Explore Descriptives: Examine variable distributions and summary statistics.
Enhanced Visualization: Create customized plots with proper variable type classification.
Check Correlations: Look for relationships between variables.
Run Regressions: Analyze relationships using OLS or logistic regression.
Learn R Code: Check the 'R Code Log' tab to see the R syntax for your actions.
Interpret Results: Use the interpretation boxes to understand your findings.

🔧 Variable Creation Guide

Numeric Recoding:

Format: old_value=new_value OR condition=new_value
Examples: 1=0, 2=1, 99= (empty = NA)
Conditions: <=0=, >=999=, <18=0, >100=1
Operators: <=, >=, <, >, ==, !=
Use empty value after = to set as missing
IMPORTANT: For negative missing-value codes such as -9 or -4, use <0= to set all negative values to NA
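The following Python sketch parses and applies recoding rules written in the format above. It accepts both the `=` form and the `->` form. A plain value means an exact match, and an empty right-hand side or `NA` means missing. The app does not document how overlapping rules are resolved, so the sketch assumes that the first matching rule wins and that values matched by no rule are kept unchanged. `parse_rules` and `recode` are hypothetical names.

```python
import math
import operator

# Two-character operators come first so '<=' is not read as '<'.
OPS = [("<=", operator.le), (">=", operator.ge), ("==", operator.eq),
       ("!=", operator.ne), ("<", operator.lt), (">", operator.gt)]

def parse_rules(spec):
    """Turn '1=0, <=0=, >=999=NA' or '1->0, <=0->NA' into (test, threshold, new)."""
    rules = []
    for raw in spec.split(","):
        raw = raw.strip()
        if not raw:
            continue
        # '->' form, otherwise split on the LAST '=' so '<=' stays on the left.
        lhs, rhs = raw.split("->", 1) if "->" in raw else raw.rsplit("=", 1)
        lhs, rhs = lhs.strip(), rhs.strip()
        new = math.nan if rhs in ("", "NA") else float(rhs)
        test = operator.eq  # plain value: exact match
        for symbol, fn in OPS:
            if lhs.startswith(symbol):
                test, lhs = fn, lhs[len(symbol):]
                break
        rules.append((test, float(lhs), new))
    return rules

def recode(values, spec):
    """Apply rules in order (assumed: first match wins; unmatched values kept)."""
    rules = parse_rules(spec)
    out = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            out.append(v)  # already missing
            continue
        for test, threshold, new in rules:
            if test(v, threshold):
                out.append(new)
                break
        else:
            out.append(v)
    return out


recoded = recode([1, 2, -9, 999, 5], "1=0, 2=1, <0=, >=999=NA")
```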

Missing Value Handling:

Enter values to treat as missing: -99, -98, 999
Choose to set as NA or replace with specific value
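Missing-value treatment amounts to replacing a list of sentinel codes. This Python sketch reads the comma-separated list as typed into the app and then either sets matches to NA (represented here as `nan`) or substitutes a fixed value. `treat_missing` is a hypothetical name.

```python
import math

def treat_missing(values, codes_text, action="na", replacement=None):
    """Replace listed missing-value codes with NA (nan) or a fixed replacement."""
    codes = {float(c) for c in codes_text.split(",") if c.strip()}
    fill = math.nan if action == "na" else replacement
    return [fill if v in codes else v for v in values]


cleaned = treat_missing([1, -99, 3, 999], "-99, -98, 999")
```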

Variable Types:

Continuous: Numeric with many unique values (age, income)
Ordinal: Ordered categories (education levels, agreement scales)
Categorical: Unordered categories (gender, party affiliation)

📖 Statistical Interpretation Guide

Descriptive Statistics:

Mean: Average value of the variable
Median: Middle value when the data are sorted
Standard Deviation: Measure of variability around the mean
Skewness: Measure of asymmetry in the distribution
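The four statistics above can be computed with Python's `statistics` module plus one formula for skewness. Software packages differ on skewness: some apply a small-sample adjustment. This sketch uses the simple moment ratio g1 = m3 / m2^1.5, so its values may not match the app's output exactly. `describe` is a hypothetical name.

```python
import statistics

def describe(values):
    """Mean, median, sample SD and moment-based skewness (g1) of a numeric list."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
        "skewness": m3 / m2 ** 1.5,      # > 0: longer right tail
    }


summary = describe([1, 2, 3, 4, 10])
```

A single large value (10) pulls the mean (4) above the median (3) and makes the skewness positive.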

Correlation Coefficients:

|r| = 0.1 to 0.3: Weak relationship
|r| = 0.3 to 0.5: Moderate relationship
|r| = 0.5 to 1.0: Strong relationship
Negative values: Inverse relationship (judge strength by the absolute value, e.g. r = -0.6 is a strong inverse relationship)
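This Python sketch computes a Pearson correlation and attaches the rule-of-thumb labels from the list above, based on the absolute value of r. The guide does not label values below 0.1, so the label "negligible" is an assumption. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def strength(r):
    """Rule-of-thumb label based on |r|; the sign only gives the direction."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"  # below 0.1: not covered by the guide (assumed label)
```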

Regression Coefficients:

Coefficient: Expected change in Y for a 1-unit change in X, holding the other independent variables constant
p-value < 0.05: Statistically significant relationship
R-squared: Proportion of variance explained by the model
Odds Ratio > 1: A 1-unit increase in X raises the odds of the outcome (logistic); an odds ratio < 1 lowers them
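These interpretations can be made concrete with a short Python sketch. A simple one-predictor OLS fit shows the slope as "change in Y per unit of X". Exponentiating a logistic coefficient gives its odds ratio. The inverse-logit function turns a linear predictor into a predicted probability. The sketch does not fit a logistic model; it only interprets coefficients you already have. The function names are hypothetical.

```python
import math

def simple_ols(x, y):
    """Intercept and slope of y ~ x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def odds_ratio(beta):
    """Multiplicative change in the odds per 1-unit increase in X."""
    return math.exp(beta)

def predicted_probability(intercept, coefs, xs):
    """Inverse logit of the linear predictor b0 + b1*x1 + ... + bk*xk."""
    eta = intercept + sum(b * v for b, v in zip(coefs, xs))
    return 1 / (1 + math.exp(-eta))
```

For example, a logistic coefficient of log(2) ≈ 0.693 means each additional unit of X doubles the odds of the outcome. This is not the same as doubling its probability.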

🎯 Tips for Large Datasets:

Consider working with a random sample for datasets with more than 10,000 observations
Start with descriptive statistics before running complex models
Check correlation matrices to identify multicollinearity
Examine diagnostic plots for assumption violations
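Two of these tips can be sketched in Python. The first draws a reproducible random sample of rows. The second scans a correlation matrix for highly correlated predictor pairs, which is a quick multicollinearity warning sign. The cutoff of |r| ≥ 0.8 and the seed value are assumptions chosen for illustration, not values used by the app. The function names are hypothetical.

```python
import itertools
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def high_correlation_pairs(data, threshold=0.8):
    """Variable pairs whose |r| meets or exceeds `threshold` (assumed cutoff)."""
    flagged = []
    for a, b in itertools.combinations(sorted(data), 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

def sample_rows(n_rows, k, seed=42):
    """Sorted indices of a reproducible simple random sample of k rows."""
    rng = random.Random(seed)  # fixed seed so the same rows are drawn each run
    return sorted(rng.sample(range(n_rows), k))


predictors = {"income": [1, 2, 3, 4, 5], "wealth": [2, 4, 6, 8, 11], "age": [5, 1, 4, 2, 3]}
pairs = high_correlation_pairs(predictors)
```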

⚠️ Troubleshooting Common Issues:

Data Loading Problems:

Ensure your file is valid and in a supported format (.dta, .sav, .rds, .csv, .xlsx, .xls, .tsv, .txt)
Check file size (must be under 100MB)
Try converting to .csv if loading fails
For ANES data: consider using a subset of variables

Variable Creation Issues:

Check that source variable exists in your dataset
Verify recoding syntax (old_value=new_value)
Use conditions carefully (<=, >=, <, >, ==, !=)
Remember: empty value after = sets to NA

Analysis Errors:

Ensure you have enough observations for analysis
Check for missing data in key variables
For regression: make sure the dependent variable is of an appropriate type
For logistic regression: dependent variable should be binary (0/1)