Having the good fortune to work as CSO for Talis, an innovative UK software company, at one of the most exciting times for software and the internet, I thought I would share some of the ideas and insights I am finding exciting at the moment.
Entries in software engineering (2)
Schrödinger's Web
Looking back at my post Perfect or Sloppy - RDF, Shirky and Wittgenstein and Danny's detailed response, wittgensteins-laptop (sorry you lost the original post, Danny), a couple of things are clear: I didn't do a good job of explaining what I think the issue is, and it came across as a bit 'them and us' (not my intention).
Ian also made a good point. I should clarify that the issues that were bugging me are not with RDF itself but with the layers further up the Semantic Web stack, specifically the logic and proof layers built on top of RDF.
I would like to describe how I understand the proposed Semantic Web stack and ask the community how certain questions have been addressed. It may be that I misunderstand the vision, or that the questions I have already have good answers.
As I understand it, the RDF and Ontology layers allow graphs of statements to be made and linked together. Multiple descriptions of a concept can be made, and RDF allows inconsistency. The query layer allows portions of graphs to be selected or joined together, and the logic layer allows new knowledge to be inferred from the statements and questions to be answered using the mass of RDF statements. I understand this logic to be first-order predicate calculus (FOPC)?
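To make those layers concrete, here is a minimal sketch in Python using the rdflib library (my choice of library; the example.org URIs and property names are made up for illustration): two sources each contribute statements about the same resource, the graphs are merged, and a SPARQL query joins a portion of the combined graph. The logic layer would sit on top of results like this.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # illustrative namespace only

# Two sources each describe the same work with their own statements.
source_a = Graph()
source_a.add((EX.hp1, EX.title, Literal("Harry Potter and the Philosopher's Stone")))

source_b = Graph()
source_b.add((EX.hp1, EX.publisher, Literal("Bloomsbury")))

# The RDF model lets the two descriptions simply be merged into one graph.
merged = Graph()
for g in (source_a, source_b):
    for triple in g:
        merged.add(triple)

# The query layer selects or joins portions of the merged graph.
results = merged.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?title ?publisher WHERE {
        ?work ex:title ?title ;
              ex:publisher ?publisher .
    }
""")
for row in results:
    print(row.title, "/", row.publisher)   # the joined description of ex:hp1
```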
My concern is that the logic layer is very intolerant of inconsistency or error. From what I have been able to find, the proposed answer is either to limit the scope of the logic to trusted, consistent statements, or to require user arbitration of conflicts. This is the root of my concern: I cannot see how either is possible. Inconsistency is not just generated at the system logic or schema level; it runs deeper. It is the necessary result of allowing multiple descriptions of the same thing.
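The reason classical logic is so intolerant is the principle of explosion: once a knowledge base contains both P and not-P, every conclusion follows. A toy propositional sketch (a brute-force model check written purely for illustration, not any actual SW reasoner) makes the point:

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Brute-force semantic entailment: true if every truth assignment
    that satisfies all the premises also satisfies the conclusion."""
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False
    return True

# The premises assert both P and not-P -- an inconsistent knowledge base.
premises = [lambda w: w["P"], lambda w: not w["P"]]
# Q is a completely unrelated statement.
conclusion = lambda w: w["Q"]

# No assignment satisfies the premises, so *every* conclusion is
# (vacuously) entailed -- the classical principle of explosion.
print(entails(premises, conclusion, ["P", "Q"]))   # True
```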
Inconsistency will always arise whenever humans have to make classification choices. This was one of the points in my previous post.
Danny was quite right to point out that most software today requires consistency. We all know the lengths programmers go to to ensure consistency, and this is because programmatic methods are based on predicate logic. If a program enters an inconsistent state, that thread of execution usually must end. If the inconsistency is in persistent data you are in real trouble, because restarting won't fix the problem.
Compilers enforce the consistency of the code, but the data in the system must also be consistent if programmatic logic is to be based on it.
Two principal methods are used to achieve this:
1. Limit to one description of an entity, i.e. no competing descriptions, e.g. one record per entity ID.
2. Fields marked as non-programmatic, e.g. text descriptions. The contents of these fields will not be used by the program logic; they are for human use only.
With this approach, uncertainty in programmatic fields cannot generate inconsistency, principally because there is only one version of the truth, i.e. statements are orthogonal.
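As a sketch of those two methods (field names entirely hypothetical), a conventional system might look like this:

```python
from dataclasses import dataclass

@dataclass
class BookRecord:
    isbn: str          # entity ID: the one key the system trusts
    editions: int      # programmatic field: logic is allowed to branch on this
    notes: str = ""    # non-programmatic free text: for human eyes only

# Method 1: one record per entity ID, so competing descriptions are
# impossible by construction -- there is only one version of the truth.
catalogue = {}

def upsert(record: BookRecord) -> None:
    # A new description overwrites the old one rather than sitting alongside it.
    catalogue[record.isbn] = record

upsert(BookRecord("isbn-0001", editions=1, notes="cover varies by printing"))
print(catalogue["isbn-0001"].editions)   # always exactly one answer
```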
Now contrast this with the semantic web, where by definition you will be working with descriptions from many systems. Inconsistency will be a natural feature, not an error condition.
Note the fundamental nature of the inconsistency: it is not a property of the different systems. Two identical systems will still yield inconsistency, because it is a function of how people use a system, not of the system itself.
I confused the previous example by suggesting two different systems with slightly different schemas.
This time consider two identical library systems, both of which have a schema with the concept of editions of a work, defined by the same RDF URI. In one system the librarian considers it a new edition whenever there is any difference, such as the two different covers for the same Harry Potter, and catalogues accordingly. The librarian using the other system considers it a different edition only if the contents are different.
Now, taking descriptions from both systems, you will get an inconsistency: does the work Harry Potter have one edition or two?
This is not something you can fix by giving a different URI to the editions concept for each system, because the inconsistency is the result of the classification decision made by that person for that record in that system at that time, i.e. it is not systematic. The result is that inconsistency will arise in an unpredictable way even between identical software systems with identical schemas. (It is one reason why integration of different systems remains a pain even if you use RDF.)
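Here is how that inconsistency would look on the wire, sketched again with rdflib and made-up URIs: identical schema, identical predicate, different cataloguing decisions, and the merged graph happily holds both answers.

```python
from rdflib import Graph, Literal, Namespace

LIB = Namespace("http://example.org/library/")   # the shared schema URI
WORK = Namespace("http://example.org/work/")

# Library A: the cataloguer treats the two covers as two editions.
library_a = Graph()
library_a.add((WORK.harry_potter, LIB.editionCount, Literal(2)))

# Library B: identical system and schema, but the cataloguer counts
# editions only when the contents differ.
library_b = Graph()
library_b.add((WORK.harry_potter, LIB.editionCount, Literal(1)))

# RDF is perfectly happy to hold both statements at once...
merged = Graph()
for g in (library_a, library_b):
    for triple in g:
        merged.add(triple)

for _, _, count in merged.triples((WORK.harry_potter, LIB.editionCount, None)):
    print(count)   # prints both 2 and 1

# ...it is only a logic layer that insists editionCount is single-valued
# (e.g. via a cardinality constraint) that turns this into a contradiction.
```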
This inconsistency isn't a problem in its own right. But if layers of predicate logic are working off this data, they will become unstable very quickly.
My current understanding is that the SW community is suggesting either that inconsistency be avoided (how, when it is a fundamental result of allowing multiple descriptions of the same thing?) or that the system should ask a user to arbitrate at that point (on what basis should they choose one over the other? They are both right).
It strikes me that if inconsistency is fundamental then it should be treated as such, not something to be avoided.
Isn't the SW approach today, based on predicate logic, simply using the wrong maths? Just as the AI community was before it embraced fuzziness, uncertainty and statistics? Or the classical physics community before quantum mechanics?
That transition saw AI moving from "programming" AI systems with rules and logic to creating learning systems that needed training.
It seems to me that the internet has 1 billion users capable of training it. We see examples of this in things like Google spell checking, which, rather than relying on a traditional dictionary, is based upon what people type and then retype when they get no results. When a spelling suggestion is given, the user choosing it provides further feedback, or training, as to what is useful spelling help and what is not. This turns out to work much better than the programmed approach. Other examples that spring to mind are del.icio.us and Flickr.
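A toy sketch of that feedback loop (all names invented; it says nothing about how Google actually implements it): record what people retype after a failed query, suggest the most common retype, and treat an accepted suggestion as further training.

```python
from collections import Counter, defaultdict

# For each failed query, count what users typed next -- the crowd is the
# training data, not a hand-built dictionary.
corrections = defaultdict(Counter)

def observe_retype(failed_query, retyped_query):
    corrections[failed_query][retyped_query] += 1

def suggest(query):
    if corrections[query]:
        best, _ = corrections[query].most_common(1)[0]
        return best
    return None

def accept_suggestion(query, suggestion):
    # The user clicking the suggestion is itself more training signal.
    corrections[query][suggestion] += 1

observe_retype("recieve", "receive")
observe_retype("recieve", "receive")
accept_suggestion("recieve", "receive")
print(suggest("recieve"))   # -> "receive"
```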
Realising that a work has both 1 and 2 editions at the same time seems to me to be exactly the position classical physics found itself in at the birth of quantum physics. The maths of classical physics could not cope with particles being at several locations at the same time. Neither could the classical physicist!
A new maths was required, one based upon uncertainty and probability. This maths is very well understood and forms the basis of solid-state physics, upon which electronic engineering is based, upon which, of course, the computer is based!
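What treating inconsistency as fundamental might look like in code is simply keeping a distribution over the competing classifications instead of forcing one to win. A hand-rolled sketch, not a proposal for the actual logic layer:

```python
from collections import Counter

# Each catalogue's classification is evidence, not the truth.
edition_votes = Counter()

def observe(edition_count, weight=1.0):
    edition_votes[edition_count] += weight

def belief():
    total = sum(edition_votes.values())
    return {count: votes / total for count, votes in edition_votes.items()}

observe(2)   # library A: cover variants count as editions
observe(1)   # library B: only content changes count
observe(1)   # a third catalogue agrees with B

# The work "has" 1 edition with ~0.67 belief and 2 editions with ~0.33 --
# both answers survive, and more evidence simply sharpens the distribution.
print(belief())
```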
So I guess my questions are these: Is the logic layer intended to be FOPC, and if so, why? Who is ensuring that the SW community isn't falling into the same traps the AI community did? What can be learned from the AI community?
What is the problem with using probability-based maths? It works for physics!
Maybe all of these have good answers; if so, I wasn't able to find them easily. Or perhaps my understanding of the logic layer is wrong?
Please let me know.



The million tonne beam
Over the last few years, I have been involved in many arguments about different approaches to software development. One of the recurring themes is "Software engineering should be more like traditional engineering. More repeatable". It is a worthy argument.
Often it is said, "Well, engineering as a discipline is much older than software, so of course it has a much stronger analytical approach, i.e. maths." But I wonder if it is the age of the discipline at all that makes the difference. There is one other huge difference between, say, structural engineering and software engineering: Moore's Law!
What would structural engineering methods look like if the price/performance of construction materials doubled every 18 months?
So that between 1970 and 2005 there is a million-fold increase, or more.
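The back-of-the-envelope arithmetic, for what it is worth:

```python
# A doubling every 18 months, compounded from 1970 to 2005.
years = 2005 - 1970             # 35 years
doublings = years / 1.5         # ~23.3 doublings
factor = 2 ** doublings
print(f"{doublings:.1f} doublings -> about {factor:,.0f}x")
# About ten million times -- so "a million-fold" is, if anything, conservative.
```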
A beam supporting 1000 tonnes would be lighter than carbon fibre and cheaper than paper.
The vast majority of existing types of construction could be put together safely by anyone, i.e. even the worst design would be many times stronger than the loads it carries. What would be the need for the disciplined approach to design that engineers use today?
Also, every 5 years a whole new vista of possible uses for construction would become economically viable, things you just wouldn't even think of doing with the existing price/performance. There would still be projects that stretched the very limits of what was possible, but the vast majority would be much more about working with the client to understand exactly what is required; because of the cheapness and simplicity, building in modular parts that could be thrown away or altered if not right might be the cheapest way to get to the final RIGHT solution for the customer.
I.e. the limiting factor on satisfying the customer is the customer's ability to describe what they want. So rapid, iterative approaches to showing the customer what a particular solution could mean for them in practice would be very valuable.
That is, the real cost is in the work for the customer to understand what they really need or want rather than the costs of construction or materials.
You can see what I am getting at. You might need Agile structural engineering!
Costs in today's engineering lie in a different place from those in most software projects. The comparison, for the majority of software development projects, is not a good one. Of course, for safety-critical or low-level real-time projects the story is different, but these are a very small subset of development projects.
When solutions are not pushing the envelope, performance can be sacrificed for simplicity; this is perhaps what has allowed the software stack to form and evolve. We have the raw power to hide the complexity, so most developers today do not need rigorous engineering methods to achieve customer satisfaction.
In fact, as we add yet more layers to the software stack (and I am convinced that Web 2.0 and the semantic web need new layers), we lower the skills and economic barriers to software development, enabling a new audience to participate. The concept of the developer itself shifts. For example, is the departmental boss who uses Access to create a simple database application that solves a niche problem in his department a developer?
Maybe we should talk about application authors, application developers and software engineers as very separate disciplines that would typically work at different layers in the software stack?
What happens when the average person can "author" application software just as they can now be global publishers of dynamic content on the web?
So maybe it is more correct to relate software development with the whole construction industry, not just engineering.
If I need an extension built, the local builders can handle that quite easily without a structural engineer because we are not pushing any envelopes.
If I want to build a huge dam, I do need an engineer.
The difference with software is that every 18 months the builders can do ever more impressive works.
On a slightly different point, how many more builders are there than structural engineers?
I expect the number of "Application Authors" will massively outweigh the number of software engineers. By extension, I would also expect far more innovation to come from the low-tech "Application Author" community than from the high-tech software engineer community; see Web services and the innovators dilemma.

