Character Encoding Detector

Detect character encoding, validate Base64, and analyze text properties.


Code Examples

// Detect character encoding and validate Base64
function detectCharset(input) {
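  // Check if valid Base64: non-empty, length a multiple of 4, padding only at the end
  const trimmed = input.trim();
  const isBase64 = trimmed.length > 0 && trimmed.length % 4 === 0 &&
    /^[A-Za-z0-9+/]*={0,2}$/.test(trimmed);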

  // Detect encoding
  const hasNonAscii = /[^\x00-\x7F]/.test(input);
  let encoding = hasNonAscii ? 'UTF-8' : 'ASCII';

  // Character count (UTF-16 code units) and UTF-8 byte count
  const charCount = input.length;
  const byteCount = new Blob([input]).size;

  // Detect line breaks (CRLF, bare LF, or a mix of both)
  const hasCRLF = /\r\n/.test(input);
  const hasLF = /(?<!\r)\n/.test(input);
  const lineBreakType = hasCRLF && hasLF ? 'Mixed' : hasCRLF ? 'CRLF' : hasLF ? 'LF' : 'None';

  // Check UTF-8 validity: a string with a lone surrogate cannot round-trip through UTF-8
  const isValidUtf8 = (() => {
    try {
      const encoded = new TextEncoder().encode(input);
      const decoded = new TextDecoder('utf-8', { fatal: true }).decode(encoded);
      return decoded === input;
    } catch {
      return false;
    }
  })();

  return {
    isBase64,
    encoding,
    charCount,
    byteCount,
    lineBreakType,
    isValidUtf8
  };
}

// Example usage
const text = "Hello, 世界! 🌍";
const result = detectCharset(text);
console.log(result);
// Output: { isBase64: false, encoding: 'UTF-8', charCount: 13, byteCount: 19, ... }

How to Use

  1. Paste your text or Base64 string into the input area.
  2. The tool automatically analyzes the encoding and properties.
  3. View detailed information including encoding type, character count, and line breaks.
  4. Check if the input is valid Base64 and UTF-8.
  5. Copy the detection results for reference.

Common Use Cases

Debug Encoding Issues

Identify why text appears garbled or displays incorrectly in your application.

Validate Data

Verify that Base64 strings and text data are properly encoded before processing.

Internationalization

Ensure multi-language content uses correct encoding (UTF-8) for global compatibility.

File Format Analysis

Determine the encoding and line break format of text files from different sources.

Key Features

Encoding Detection

Automatically detects UTF-8, ASCII, Latin-1, and other character encodings.

Base64 Validation

Checks if input is valid Base64 and provides decoded size information.
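
For padded Base64, the decoded size can be worked out from the string length alone, without decoding. The sketch below is one way to do that, not necessarily how the tool does it. The helper name `base64DecodedSize` is hypothetical, and it assumes the standard alphabet with `=` padding (not the URL-safe variant).

```javascript
// Hypothetical helper: decoded byte length of a padded, standard-alphabet Base64 string.
// Returns null when the input does not look like valid Base64.
function base64DecodedSize(input) {
  const s = input.trim();
  // Padded Base64 is never empty and always has a length that is a multiple of 4.
  if (s.length === 0 || s.length % 4 !== 0) return null;
  if (!/^[A-Za-z0-9+/]*={0,2}$/.test(s)) return null;
  const padding = s.endsWith('==') ? 2 : s.endsWith('=') ? 1 : 0;
  // Every 4 Base64 characters carry 3 bytes; each '=' stands for one missing byte.
  return (s.length / 4) * 3 - padding;
}

console.log(base64DecodedSize('SGVsbG8=')); // 5 (the bytes of "Hello")
console.log(base64DecodedSize('Hello'));    // null (length is not a multiple of 4)
```

The length check matters: without it, ordinary words such as "Hello" pass a character-class test and are misreported as Base64.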

Character Analysis

Displays character count, byte count, and detected language/script.
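
"Character count" can mean different things in JavaScript, so it helps to report the units explicitly. The following is a minimal sketch with a hypothetical helper `countText`. Which of these counts the tool displays is not stated here.

```javascript
// Hypothetical helper: report three common ways of measuring text length.
function countText(input) {
  return {
    codeUnits: input.length,                          // UTF-16 code units (what .length returns)
    codePoints: Array.from(input).length,             // Unicode code points
    utf8Bytes: new TextEncoder().encode(input).length // size when encoded as UTF-8
  };
}

// "Hello, 世界! 🌍" written with escapes so the sample survives any file encoding.
const sample = 'Hello, \u4E16\u754C! \u{1F30D}';
console.log(countText(sample));
// { codeUnits: 13, codePoints: 12, utf8Bytes: 19 }
```

The emoji is a single code point, but it occupies two UTF-16 code units and four UTF-8 bytes, which is why the three numbers differ.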

Line Break Detection

Identifies CRLF (Windows), LF (Unix), or mixed line break formats.
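
One way to tell mixed line breaks apart from uniform ones is to count each kind separately. The sketch below assumes a hypothetical helper `classifyLineBreaks`. It also counts bare CR, which the description above does not mention.

```javascript
// Hypothetical helper: count each line break style and classify the text.
function classifyLineBreaks(input) {
  const crlf = (input.match(/\r\n/g) || []).length;
  // Bare LF: a \n that is not preceded by \r.
  const lf = (input.match(/(?<!\r)\n/g) || []).length;
  // Bare CR: a \r that is not followed by \n (assumption: worth reporting separately).
  const cr = (input.match(/\r(?!\n)/g) || []).length;

  const kinds = [];
  if (crlf > 0) kinds.push('CRLF');
  if (lf > 0) kinds.push('LF');
  if (cr > 0) kinds.push('CR');

  const type = kinds.length === 0 ? 'None' : kinds.length === 1 ? kinds[0] : 'Mixed';
  return { type, crlf, lf, cr };
}

console.log(classifyLineBreaks('a\r\nb\nc')); // { type: 'Mixed', crlf: 1, lf: 1, cr: 0 }
console.log(classifyLineBreaks('a\nb\nc'));   // { type: 'LF', crlf: 0, lf: 2, cr: 0 }
```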

UTF-8 Validation

Verifies if text is valid UTF-8 and suggests encoding conversions.

Bilingual Support

Full support for English and Chinese interfaces with localized suggestions.

Frequently Asked Questions

What character encodings can this tool detect?

The tool can detect UTF-8, ASCII, ISO-8859-1 (Latin-1), and identify various scripts including Latin, Cyrillic, Chinese, Japanese, Korean, Arabic, Hebrew, and Greek.

How accurate is the encoding detection?

The tool uses Unicode ranges and byte patterns for detection. While generally accurate, complex mixed-encoding texts may require manual verification. For best results, ensure your source data is properly encoded.
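
Script detection by Unicode range can be approximated with Unicode property escapes in regular expressions. This is a sketch under assumptions, not the tool's actual implementation. The helper `detectScripts` and the `SCRIPTS` list are hypothetical, and mapping Chinese to `Han`, Japanese to `Hiragana`/`Katakana`, and Korean to `Hangul` is a simplification.

```javascript
// Hypothetical list of Unicode script names to look for.
const SCRIPTS = [
  'Latin', 'Cyrillic', 'Han', 'Hiragana', 'Katakana',
  'Hangul', 'Arabic', 'Hebrew', 'Greek'
];

// Hypothetical helper: return every listed script that appears at least once in the input.
function detectScripts(input) {
  return SCRIPTS.filter((name) =>
    new RegExp(`\\p{Script=${name}}`, 'u').test(input)
  );
}

console.log(detectScripts('Hello, \u4E16\u754C'));                 // [ 'Latin', 'Han' ]
console.log(detectScripts('\u041F\u0440\u0438\u0432\u0435\u0442')); // [ 'Cyrillic' ]
```

Han characters are shared by Chinese and Japanese text, so a script match on its own does not identify the language. That is one reason mixed content may need manual verification.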

Can it handle both plain text and Base64?

Yes! The tool automatically detects whether your input is Base64-encoded or plain text, and provides appropriate analysis for both formats.

Why does my text show as invalid UTF-8?

This usually means the text contains byte sequences that are not valid in UTF-8 encoding. It might be encoded in a different format (like Latin-1 or Windows-1252) or contain corrupted data.
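
The failure is easiest to see on raw bytes. The sketch below uses two hypothetical helpers, `isValidUtf8Bytes` and `decodeLatin1`. It takes the bytes of "café" as Latin-1 would store them and shows that a strict UTF-8 decoder rejects them, while a Latin-1 interpretation recovers the text.

```javascript
// "café" as stored in Latin-1: é is the single byte 0xE9.
// In UTF-8, 0xE9 starts a three-byte sequence, so on its own it is invalid.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);

// Hypothetical helper: strict UTF-8 check on a byte array.
function isValidUtf8Bytes(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: Latin-1 maps each byte to the code point with the same value.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

console.log(isValidUtf8Bytes(latin1Bytes));                           // false
console.log(decodeLatin1(latin1Bytes) === 'caf\u00E9');               // true
console.log(isValidUtf8Bytes(new TextEncoder().encode('caf\u00E9'))); // true
```

Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range, so the byte-to-code-point shortcut above is only correct for true Latin-1 data.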

What are the encoding suggestions for?

The tool provides practical recommendations based on the detected encoding, such as converting to UTF-8 for better compatibility or setting proper charset meta tags in HTML.