Closed

Loading a file with special characters raises an error

description

I just tried to load a log file that contains my first name (Gwenaël) in the username tag, and the error came up.
Then I ran some tests and determined that special characters are not supported.

The exact error report was:
Error reading log file!
[filepath]
There is an error in XML document [error line+column]
A solution would be to change the character encoding to UTF-8.

file attachments

Closed Jun 24, 2013 at 9:46 AM by gwentreb

comments

gwentreb wrote May 2, 2013 at 5:22 PM

So apparently it was just a missing configuration in the code. I added the encoding and set it to UTF-8. Now the "bug" is that the datagrid doesn't display these special characters correctly. I'm going to take a look at it tomorrow!

dirkster wrote May 2, 2013 at 10:51 PM

It's difficult to understand the problem from your description, but remember that special characters can be escaped in XML. Have a look at the log4jmerger sample (and review the encoding options I used there): does it exhibit the same problem without the datagrid?

https://yalvlib.codeplex.com/wikipage?title=Merging%20Log%20Files&referringTitle=Documentation

...maybe this will help you nail it down, although you are probably on the right path anyway...
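To illustrate the escaping point: a numeric character reference such as &#235; is plain ASCII in the file, so an ANSI/UTF-8 mix-up cannot damage it, and XmlReader expands it back to ë when reading. A minimal sketch, assuming a simple hypothetical <user> element rather than the real log4j layout (XmlEscapeDemo and ReadUserName are illustrative names):

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: returns the text of the first <user> element in an XML string.
public static class XmlEscapeDemo
{
    public static string ReadUserName(string xml)
    {
        using (XmlReader xr = XmlReader.Create(new StringReader(xml)))
        {
            // Move to the first <user> start tag; returns false if there is none.
            if (!xr.ReadToFollowing("user"))
                return null;

            // Character references such as &#235; are expanded to 'ë' here.
            return xr.ReadElementContentAsString();
        }
    }
}
```

For example, ReadUserName("<user>Gwena&#235;l</user>") returns "Gwenaël", whatever encoding the surrounding file was saved in.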

gwentreb wrote May 3, 2013 at 12:09 PM

Apparently, the log files I'm using are encoded in ANSI. Using Notepad++, I was able to convert them to UTF-8 and... tadaaa, special characters work fine.

The thing is, we would have to ensure that the log files we import are UTF-8 encoded. If they are not, we would have to rewrite each file completely as a UTF-8 encoded one...
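That rewrite step can be sketched as decoding the bytes with the legacy code page and re-encoding the text as UTF-8. This assumes the "ANSI" files use a single-byte Western code page such as ISO-8859-1 (Windows-1252 agrees with it for accented letters like ë); LogFileConverter and ConvertToUtf8 are hypothetical names. Only apply it to files known to be ANSI, since decoding a file that is already UTF-8 this way would garble it.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: rewrites a single-byte "ANSI" log file as UTF-8.
public static class LogFileConverter
{
    public static void ConvertToUtf8(string sourcePath, string targetPath, Encoding ansiEncoding)
    {
        // Decode with the legacy code page (a BOM, if present, would take precedence).
        string text = File.ReadAllText(sourcePath, ansiEncoding);

        // Re-encode as UTF-8; UTF8Encoding(true) writes a byte order mark,
        // which lets StreamReader-based readers recognize the file later.
        File.WriteAllText(targetPath, text, new UTF8Encoding(true));
    }
}
```

For example, ConvertToUtf8(path, convertedPath, Encoding.GetEncoding("iso-8859-1")) turns the single byte 0xEB (ë) into the UTF-8 pair 0xC3 0xAB.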

I'll leave this issue as it is for the moment, since most of the log files are exported in English. I'll solve it when I have nothing else to do!!

gwentreb wrote May 3, 2013 at 12:14 PM

Here is a screenshot of the Notepad++ conversion.

dirkster wrote May 4, 2013 at 3:30 AM

Try using the System.Text.Encoding.Default encoding on your reader, as in the sample code.

You can then determine the actual encoding by checking the reader.CurrentEncoding property.
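A minimal sketch of that check (EncodingProbe and Probe are hypothetical names). Two details matter: StreamReader only settles its CurrentEncoding after the first read, and it only recognizes byte order marks. Also, Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and .NET 5+, so passing it as the fallback only means "ANSI" on the former.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: reports the encoding a StreamReader ends up using for a file.
public static class EncodingProbe
{
    public static Encoding Probe(string path, Encoding fallback)
    {
        // With detectEncodingFromByteOrderMarks, a UTF-8/UTF-16/UTF-32 BOM overrides 'fallback'.
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // BOM detection happens on the first read, so force one before asking.
            reader.Peek();
            return reader.CurrentEncoding;
        }
    }
}
```

Without a BOM the result is simply the fallback passed in, so this check alone cannot tell an ANSI file from a BOM-less UTF-8 one; that requires inspecting the content.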

If you then get garbled output in the gridview, it might be because you need to pass the encoding from the viewmodel to the gridview via a bound property. I could not find a solution for this either, but I would be surprised if there were no way to tell the gridview what the correct encoding should be... Hope this helps.

dirkster wrote Jun 23, 2013 at 12:16 PM

I have looked into this problem and copied some code from AvalonEdit.
Can you try changing the beginning of the XmlEntriesProvider class as shown below?

My addition starts at 'Encoding fileEncoding = Encoding.Default;'.
This code requires adding the attached class DetectEncoding.cs. Can you check whether it resolves your problem?

        public override IEnumerable<LogEntry> GetEntries(string dataSource, FilterParams filter)
        {
            List<LogEntry> entries = new List<LogEntry>();

            XmlReaderSettings settings = new XmlReaderSettings()
            {
                ConformanceLevel = ConformanceLevel.Fragment
            };

            settings.ValidationEventHandler += settings_ValidationEventHandler;
            NameTable nt = new NameTable();
            XmlNamespaceManager mgr = new XmlNamespaceManager(nt);
            mgr.AddNamespace("log4j", Log4jNs);

            XmlParserContext pc = new XmlParserContext(nt, mgr, "", XmlSpace.Default);

            Encoding fileEncoding = Encoding.Default;

            using (FileStream fs = new FileStream(dataSource, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                using (StreamReader reader = DetectEncoding.OpenStream(fs, Encoding.UTF8))
                {
                    // Peek() forces the first read, so the StreamReader has settled on the
                    // (auto-detected) encoding before CurrentEncoding is queried
                    reader.Peek();
                    fileEncoding = reader.CurrentEncoding;
                }
            }

            // Specify the encoding for the XML files
            // Note: special characters are still not displayed correctly in the dataGrid.
            pc.Encoding = fileEncoding;

            using (XmlReader xr = XmlReader.Create(dataSource, settings, pc))
            // ... rest of GetEntries unchanged ...

If this solution does not work, I will need a sample file to verify the problem.
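The attached DetectEncoding.cs is not reproduced in this thread. As a hypothetical stand-in (not the AvalonEdit code itself), the usual approach can be sketched as: honor a byte order mark if there is one, otherwise accept UTF-8 only if the bytes decode without errors, and otherwise fall back to a caller-supplied encoding such as the ANSI code page. EncodingSniffer and Detect are illustrative names.

```csharp
using System.Text;

// Hypothetical sketch of BOM + UTF-8-validity encoding detection.
public static class EncodingSniffer
{
    public static Encoding Detect(byte[] bytes, Encoding fallback)
    {
        // 1. A byte order mark wins.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return new UTF8Encoding(true);
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 LE (a UTF-32 LE BOM also lands here)
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 BE

        // 2. No BOM: accept UTF-8 only if every byte sequence is valid UTF-8.
        var strictUtf8 = new UTF8Encoding(false, true); // throwOnInvalidBytes: true
        try
        {
            strictUtf8.GetString(bytes);
            return new UTF8Encoding(false);
        }
        catch (DecoderFallbackException)
        {
            // 3. Invalid UTF-8 (e.g. a lone 0xEB from an ANSI file): use the fallback.
            return fallback;
        }
    }
}
```

For example, Detect(File.ReadAllBytes(dataSource), Encoding.GetEncoding("iso-8859-1")) would return ISO-8859-1 for the original ANSI log containing ë and UTF-8 for the converted one. Passing the whole file avoids misjudging a multi-byte sequence cut off at a buffer boundary; pure-ASCII files come back as UTF-8, which reads them identically.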

gwentreb wrote Jun 24, 2013 at 9:45 AM

It works well!

I added a screenshot :)