Upload Your Dataset
File Format Detected:
CSV/Text Import Options:
Excel Import Options:
Data Preview:
Dataset Information
Import Summary:
Variable Type Analysis
Variable Classification:
Variable Details
Variable Information:
Categorical Variable Analysis:
Data Filtering Options
Keep Specific Variables:
Select variables to keep in your working dataset (leave empty to keep all).
Filter by Conditions:
Apply filters to keep only certain observations.
Current Filters:
Variable Operations
Choose Operation Type
Variable Recoding
Create Computed Variables
Convert Categorical → Continuous
Convert Continuous → Categorical
Rename Variables
Missing Value Treatment
Variable Information
Generated R Code:
Variable Details:
Created Variables Log:
Data Preview
Variable Selection
R Code for descriptives:
Descriptive Statistics
Interpretation Guide:
Categorical Variable Frequencies
Distribution Plots
Correlation Analysis
Correlation Visualization
Network Plot Legend:
⢠Blue edges: Positive correlations
⢠Red edges: Negative correlations
⢠Edge thickness: Correlation strength
Correlation Results
Statistical Significance of Correlations
P-values give the probability of observing a correlation at least as strong as the one in your sample if the true correlation were zero (the null hypothesis):
- p < 0.001: Very strong evidence against the null hypothesis (marked with ***)
- p < 0.01: Strong evidence (marked with **)
- p < 0.05: Moderate evidence (marked with *)
- p ≥ 0.05: Insufficient evidence (not significant)
Note: Colors indicate significance levels: green (p < 0.01), yellow (p < 0.05), red (p ≥ 0.05)
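As a rough illustration of the star notation above, the following base R sketch tests one correlation and attaches the matching stars. The helper name p_to_stars is hypothetical (it is not taken from the app), and the built-in mtcars data stands in for an uploaded dataset.

```r
# Hypothetical helper: map p-values to the star notation described above.
p_to_stars <- function(p) {
  ifelse(p < 0.001, "***",
  ifelse(p < 0.01,  "**",
  ifelse(p < 0.05,  "*", "")))
}

# Pearson correlation test on built-in example data (stand-in for your data).
test <- cor.test(mtcars$mpg, mtcars$wt)
r <- unname(test$estimate)  # correlation coefficient
p <- test$p.value           # two-sided p-value

cat(sprintf("r = %.3f, p = %.2g %s\n", r, p, p_to_stars(p)))
```

The cutoffs are strict inequalities, so a p-value of exactly 0.05 receives no star.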
About Significance Tests
Significance tests help determine whether an observed correlation is distinguishable from zero or could plausibly be the result of sampling variation.
To view significance tests:
- Check the 'Show P-values' option in the analysis panel
- Generate your correlation analysis
- Return to this tab to view detailed p-values
⢠P-values for each correlation coefficient
⢠Color-coded significance levels
⢠Statistical interpretation guidance
Enable 'Show Confidence Intervals' option to see confidence intervals.
Correlation Interpretation
R Log Output
OLS Regression Setup
Multiple Comparisons Correction:
Note: Confidence intervals will be adjusted based on the correction method and alpha level.
R Code for OLS:
Regression Results
Model Fit Statistics:
Coefficient Plot
Corrected Confidence Intervals
Confidence intervals adjusted for multiple comparisons.
Enable multiple comparisons correction to view corrected confidence intervals.
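The text does not say which correction methods the app offers. As one possible sketch, assuming a Bonferroni correction across the slope coefficients, adjusted intervals can be obtained in base R by widening the confidence level passed to confint(). The model, alpha, and choice of m below are illustrative assumptions.

```r
# Example model on built-in data (stand-in for your own regression).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

alpha <- 0.05
m <- length(coef(fit)) - 1  # assumption: correct across slopes, not the intercept

ci_plain <- confint(fit, level = 1 - alpha)      # unadjusted 95% intervals
ci_bonf  <- confint(fit, level = 1 - alpha / m)  # Bonferroni-adjusted intervals

print(round(cbind(ci_plain, ci_bonf), 3))
```

Adjusted intervals are always wider than the unadjusted ones, which is the price of controlling the family-wise error rate.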
OLS Interpretation
Diagnostic Plots
Diagnostic Plot Interpretation:
Residuals vs Fitted: Check for linearity and homoscedasticity. Points should be randomly scattered around the horizontal line at 0.
Q-Q Plot: Check for normality of residuals. Points should follow the diagonal line closely.
Scale-Location: Check for homoscedasticity. The red line should be roughly horizontal.
Residuals vs Leverage: Identify influential observations. Look for points outside Cook's distance lines.
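The four diagnostic plots described above correspond to the default output of plot() on an lm object in base R. A minimal sketch, using built-in mtcars data as a stand-in for your own model:

```r
# Fit an example OLS model (variables chosen only for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw the four default diagnostics in a 2 x 2 grid:
# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout

# Numeric companion to the leverage plot: the largest Cook's distances.
cooks <- cooks.distance(fit)
print(head(sort(cooks, decreasing = TRUE), 3))
```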
Logistic Regression Setup
Note: Variable must contain only 0 and 1 values.
Multiple Comparisons Correction:
Note: Confidence intervals will be adjusted based on the correction method and alpha level.
R Code for Logistic Regression:
Logistic Regression Results
Model Fit Statistics:
Coefficient Plot (Log-Odds)
Odds Ratios
Corrected Confidence Intervals
Confidence intervals adjusted for multiple comparisons (log-odds scale).
Enable multiple comparisons correction to view corrected confidence intervals.
Predicted Probabilities
Logistic Regression Interpretation
Plot Configuration
Select Variables for Correlation:
Variable Type Classification:
Plot Customization:
R Code for this plot:
Visualization
R Code Log - Learn the Syntax!
Complete R Script:
Session History:
How to Use This App
Step-by-Step Guide
- Upload Data: Start by uploading your data file (for example Stata .dta, CSV, or Excel) in the 'Data Upload' tab.
- Create Variables: Use the 'Variable Creation' tab to recode or create new measures.
- Explore Descriptives: Examine variable distributions and summary statistics.
- Enhanced Visualization: Create customized plots with proper variable type classification.
- Check Correlations: Look for relationships between variables.
- Run Regressions: Analyze relationships using OLS or logistic regression.
- Learn R Code: Check the 'R Code Log' tab to see the R syntax for your actions.
- Interpret Results: Use the interpretation boxes to understand your findings.
Variable Creation Guide
Numeric Recoding:
- Format: old_value=new_value OR condition=new_value
- Examples: 1=0, 2=1, 99= (empty = NA)
- Conditions: <=0=, >=999=, <18=0, >100=1
- Operators: <=, >=, <, >, ==, !=
- Use empty value after = to set as missing
- IMPORTANT: For negative values like -9, -4, use: <0= to set all negative values as NA
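To show what rules such as 1=0, 2=1, 99= and <0= mean in plain R, here is a minimal sketch. It is not the code the app generates; the vector x is made-up example data, and leaving unmatched values unchanged is an assumption.

```r
# Made-up survey item with missing-data codes 99, -9 and -4.
x <- c(1, 2, 99, -9, -4, 2, 1)

# Start from a copy and always test conditions against the ORIGINAL values,
# so that an earlier rule (1 -> 0) cannot be re-matched by a later one.
recoded <- x
recoded[x == 1]  <- 0   # rule: 1=0
recoded[x == 2]  <- 1   # rule: 2=1
recoded[x == 99] <- NA  # rule: 99=  (empty target means NA)
recoded[x < 0]   <- NA  # rule: <0=  (all negative codes become NA)

print(recoded)
```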
Missing Value Handling:
- Enter values to treat as missing: -99, -98, 999
- Choose to set as NA or replace with specific value
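Both options can be sketched in base R with replace(). The variable income and the use of the median as the replacement value are illustrative assumptions, not the app's defaults.

```r
# Made-up variable containing the missing-data codes -99, -98 and 999.
income <- c(52000, -99, 61000, 999, -98, 48000)
missing_codes <- c(-99, -98, 999)

is_missing <- income %in% missing_codes

# Option 1: set the coded values to NA.
income_na <- replace(income, is_missing, NA)

# Option 2: replace them with a specific value (here: the observed median).
fill_value <- median(income_na, na.rm = TRUE)
income_filled <- replace(income, is_missing, fill_value)

print(income_na)
print(income_filled)
```

Setting values to NA is usually the safer choice, because replacing them with a fixed value shrinks the variable's variance.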
Variable Types:
- Continuous: Numeric with many unique values (age, income)
- Ordinal: Ordered categories (education levels, agreement scales)
- Categorical: Unordered categories (gender, party affiliation)
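In R these three types roughly map to numeric vectors, ordered factors, and unordered factors. A small sketch with made-up values:

```r
# Continuous: plain numeric vector.
age <- c(23, 35, 47, 58)

# Ordinal: factor with an explicit level order.
educ <- factor(c("HS", "BA", "MA", "BA"),
               levels = c("HS", "BA", "MA"), ordered = TRUE)

# Categorical: factor without any ordering.
party <- factor(c("Dem", "Rep", "Ind", "Dem"))

print(is.numeric(age))    # numeric storage for continuous data
print(is.ordered(educ))   # ordered factor supports comparisons
print(educ < "MA")        # which respondents are below the "MA" level
print(is.ordered(party))  # unordered factor: comparisons are not meaningful
```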
Statistical Interpretation Guide
Descriptive Statistics:
- Mean: Average value of the variable
- Median: Middle value when data is ordered
- Standard Deviation: Measure of variability around the mean
- Skewness: Measure of asymmetry in the distribution
Correlation Coefficients:
- |r| = 0.1 to 0.3: Weak relationship
- |r| = 0.3 to 0.5: Moderate relationship
- |r| = 0.5 to 1.0: Strong relationship
- Negative values: Inverse relationship (judge strength by the absolute value)
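These rules of thumb can be written as a small R function. The name describe_r is hypothetical, the "negligible" label for values below 0.1 is an assumption (the guide does not name that range), and boundary values are assigned to the stronger category.

```r
# Hypothetical helper: label a correlation by strength and direction.
describe_r <- function(r) {
  size <- cut(abs(r),
              breaks = c(0, 0.1, 0.3, 0.5, 1),
              labels = c("negligible", "weak", "moderate", "strong"),
              right = FALSE, include.lowest = TRUE)
  # Note: a correlation of exactly 0 is labelled "positive" by this sketch.
  direction <- ifelse(r < 0, "negative", "positive")
  paste(size, direction)
}

print(describe_r(c(-0.87, 0.2, 0.4, 0.05)))
```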
Regression Coefficients:
- Coefficient: Expected change in Y for a 1-unit change in X, holding the other predictors constant
- p-value < 0.05: Statistically significant relationship at the 5% level
- R-squared: Proportion of variance in Y explained by the model
- Odds Ratio > 1: Higher odds of the outcome as the predictor increases (logistic)
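The quantities listed above can be read off standard base R model objects. A minimal sketch on built-in mtcars data, with variables chosen only for illustration:

```r
# OLS: coefficients, p-values and R-squared.
ols <- lm(mpg ~ wt + hp, data = mtcars)
ols_table <- coef(summary(ols))  # Estimate, Std. Error, t value, Pr(>|t|)
r_squared <- summary(ols)$r.squared
print(round(ols_table, 4))
print(round(r_squared, 3))

# Logistic regression: coefficients are log-odds; exponentiate for odds ratios.
logit <- glm(am ~ wt, data = mtcars, family = binomial)
odds_ratios <- exp(coef(logit))
print(odds_ratios)
```

An odds ratio below 1 for wt means that heavier cars have lower odds of having a manual transmission (am = 1) in this example.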
Tips for Large Datasets:
- Use sampling for datasets > 10,000 observations
- Start with descriptive statistics before running complex models
- Check correlation matrices to identify multicollinearity
- Examine diagnostic plots for assumption violations
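The first and third tips can be combined in a short base R sketch: draw a random sample when the data exceed 10,000 rows, then inspect the correlation matrix for highly correlated predictors. The simulated data frame big and its column names are made up for illustration.

```r
set.seed(42)  # make the simulated data and the sample reproducible

# Simulated large dataset; x2 is deliberately almost a copy of x1.
n <- 50000
big <- data.frame(x1 = rnorm(n))
big$x2 <- big$x1 + rnorm(n, sd = 0.1)
big$y  <- 2 * big$x1 + rnorm(n)

# Work on a random sample when there are more than 10,000 observations.
max_rows <- 10000
smp <- if (nrow(big) > max_rows) big[sample(nrow(big), max_rows), ] else big

# Predictor correlations close to 1 or -1 signal multicollinearity.
print(round(cor(smp), 2))
```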
Troubleshooting Common Issues:
Data Loading Problems:
- Ensure your file is a valid .dta file
- Check file size (must be under 100MB)
- Try converting to .csv if .dta loading fails
- For ANES data: consider using a subset of variables
Variable Creation Issues:
- Check that source variable exists in your dataset
- Verify recoding syntax (old_value=new_value)
- Use conditions carefully (<=, >=, <, >, ==, !=)
- Remember: empty value after = sets to NA
Analysis Errors:
- Ensure you have enough observations for analysis
- Check for missing data in key variables
- For regression: make sure dependent variable is appropriate type
- For logistic regression: dependent variable should be binary (0/1)
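Before fitting a logistic regression, you can check the 0/1 requirement yourself. The helper is_binary01 below is a hypothetical sketch in base R, not the app's own validation code.

```r
# Hypothetical check: TRUE if the non-missing values are only 0 and/or 1.
is_binary01 <- function(x) {
  vals <- unique(x[!is.na(x)])
  length(vals) > 0 && all(vals %in% c(0, 1))
}

voted <- c(0, 1, 1, NA, 0)  # valid binary outcome with one missing value
party <- c(1, 2, 3, 1, 2)   # not binary: needs recoding first

print(is_binary01(voted))
print(is_binary01(party))
print(sum(is.na(voted)))    # missing values in the key variable
```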