A lot of architecture writing on the web discusses the problem from a less-than-holistic perspective. With this blog I am attempting to start down a path toward an architecture that serves more than just the “web-related” interests, so that it is friendlier toward reporting, security, and operations teams. A lot of my success comes from taking applications that were purely “developer-centric” and teasing out the messy bits so they work more transparently for the ops teams and business leaders.
For this, the only real constraints I had were ASP.Net, a RESTful web service layer, and a three-data-center web farm model (for global clients).
It can be roughly described from the top-down as follows:
Use NGINX (a lightweight web server) as a reverse proxy to route users to the three global web farms by IP address location. Additional research has raised the possibility of inserting more thorough DDoS detection at this layer. Further research suggests serving all static content from this layer as well, potentially by combining Varnish with NGINX, to reduce the number of hops users need to get to the site's images and HTML.
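To make the geo-routing step concrete, here is a minimal sketch of the decision the proxy makes, written in Python rather than NGINX config. The CIDR blocks and farm names (farm-us, farm-eu, farm-apac) are hypothetical placeholders; a real deployment would draw the mapping from a GeoIP database, and in NGINX itself this lookup would typically live in a geo block.

```python
import ipaddress

# Hypothetical mapping of client networks to the nearest of the three web farms.
# These are documentation/test ranges, not real geo data; production data would
# come from a GeoIP database.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "farm-us",
    ipaddress.ip_network("198.51.100.0/24"): "farm-eu",
    ipaddress.ip_network("203.0.113.0/24"): "farm-apac",
}
DEFAULT_FARM = "farm-us"


def pick_farm(client_ip: str) -> str:
    """Return the upstream web farm for a client IP (first matching network wins)."""
    addr = ipaddress.ip_address(client_ip)
    for network, farm in GEO_TABLE.items():
        if addr in network:
            return farm
    # No match (unknown region, IPv6, etc.): fall back to a default farm.
    return DEFAULT_FARM
```

Keeping the mapping as data rather than logic means adding or moving a farm is a configuration change, not a code change.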
Maintain a user interface layer using ASP.Net MVC4 combined with the BackboneJS framework, along with UnderscoreJS and jQuery. An open question is whether a SPA (Single Page Application, like Hulu's) is better for your content or not. Regardless, SPAs have a lot of fans these days. The frameworks seem to boil down to BackboneJS vs. KnockoutJS. Further research revealed some opinion-based leanings toward BackboneJS: it has a larger community of developers (unverified) and has built-in hooks for a RESTful web service layer. There is also the question of which library or common method is best for sanitizing requests against XSS (cross-site scripting) and SQLi (SQL injection). I find some .Net/Java developers ignore the security layer because they feel safe within their frameworks. However, I see modern developers shifting toward faster, more responsive JavaScript libraries, so I want to keep an eye on this. The frameworks only protect you if you use their compilers.
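One common answer to the XSS half of that question is to encode user-supplied values at the moment they are written into HTML, rather than trusting the request to have been cleaned earlier. The sketch below uses Python's standard html.escape as a stand-in for whatever encoder the UI layer ends up using; render_comment and its markup are hypothetical. HTML escaping only covers element text and quoted attribute values; JavaScript, URL, and CSS contexts need their own encoding. On the client side, Underscore templates make the same distinction between <%= %> (raw) and <%- %> (HTML-escaped) interpolation.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment, encoding every user-supplied value first.

    html.escape with quote=True converts &, <, > and both quote characters,
    so the values are safe inside element text and quoted attribute values.
    """
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f'<div class="comment" data-author="{safe_author}"><p>{safe_body}</p></div>'
```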
For the caching part, I kept coming across success stories of web farms using Memcached. To keep an eye on MS Azure, I also have some interest at this point in Windows Azure Caching (Preview). However, there is a concern, because other forms of MS Azure Caching have been cost-prohibitive. Also, as an MS developer, I’m just as concerned about long-term durability when choosing newer MS technologies as I am with open source ones (Is it maintained? Is there a healthy community?). Memcached apparently does the job well in web farm situations, so it seems to be the first choice.
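Whichever cache wins, the usual way a web farm uses it is the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache with a time-to-live. In this sketch, InMemoryCache is a hypothetical stand-in shaped like the get/set calls Memcached clients commonly expose, so the example runs without a Memcached server; the key format and TTL are assumptions.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a Memcached client: get(key) and set(key, value, expire)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: Any, expire: int) -> None:
        self._store[key] = (value, time.monotonic() + expire)


def get_or_load(cache: InMemoryCache, key: str, loader: Callable[[], Any], ttl: int = 300) -> Any:
    """Cache-aside: try the cache, fall back to the loader (e.g., the DB), then populate."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, expire=ttl)
    return value
```

Every web server in the farm points at the same cache nodes, so a value loaded by one server saves a database round trip for all the others.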
So, the service layer. ASP.Net Web API wins over WCF as a lightweight RESTful web service framework that speaks JSON. Versioning would be handled through the URI, and operations would be kept to the minimum required functionality, mapped onto the HTTP verbs. Regarding speed… I’ve been on both sides of this question: use a service layer for web-to-DB communication, or a regular code layer? I know that, in theory, straight code would be faster in a small app. I know that, despite the debate, Web API would be faster than WCF in many situations. I know that any extensibility with external systems is best built as services. So, to me, this is less about writing SOA or not, and more about this: if I have a team that already has to code a services layer, why confuse them with internal/external questions? I like to simplify things as much as possible up front, because I’ve seen many complex architectures fail out of the gate: the devs didn’t get it, and ultimately they had a pressing deadline that took priority over the purity of the concept.
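As a sketch of what URI versioning plus minimal verbs could look like, here is a tiny framework-free dispatcher in Python. The orders resource, its handlers, and the route pattern are hypothetical; the point is only that the version lives in the path and each resource answers just the verbs it needs, returning 405 for the rest and 404 for unknown paths or versions.

```python
import re
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


# Hypothetical handlers for an "orders" resource in version 1 of the API.
def get_order(order_id: str) -> Response:
    return 200, {"id": int(order_id), "status": "shipped"}


def delete_order(order_id: str) -> Response:
    return 204, {}


# The version lives in the URI; each resource exposes only the verbs it needs.
ROUTES: Dict[str, Dict[str, Callable[[str], Response]]] = {
    r"^/api/v1/orders/(\d+)$": {"GET": get_order, "DELETE": delete_order},
}


def dispatch(method: str, path: str) -> Response:
    for pattern, verbs in ROUTES.items():
        match = re.match(pattern, path)
        if match:
            handler = verbs.get(method.upper())
            if handler is None:
                return 405, {"error": "method not allowed"}
            return handler(*match.groups())
    return 404, {"error": "not found"}
```

Adding a v2 would mean adding new route entries alongside the v1 ones, so existing clients keep working unchanged.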
This is where authentication will pass through, so the choice is OAuth 2.0 vs. HMAC. The traditional way is to do authentication over HTTPS, but that only encrypts traffic over the wire, not at the endpoints, which can leave the application open to man-in-the-middle attacks. Research showed that Amazon, at some point, avoided this by using HMAC instead of OAuth. Others used two-legged OAuth. Regardless, care needs to be taken here to choose a method that actually works before I start coding. Implementing an insecure authentication method out of ignorance is, in my mind, a pretty avoidable problem.
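For reference, here is the general shape of HMAC request signing: the client and server share a secret that never travels with the request, the client signs a canonical string built from the request, and the server recomputes and compares the signature. This is a minimal sketch under assumed conventions (the canonical string format, SHA-256, and a five-minute timestamp window are choices made here, and SECRET is hypothetical); it is not Amazon's actual signature algorithm. A production scheme would also track nonces to stop replays inside the window.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret issued to one API client; it is never sent over the wire.
SECRET = b"client-42-secret"
MAX_SKEW_SECONDS = 300  # reject requests whose timestamp is too far from server time


def canonical_string(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method.upper()}\n{path}\n{timestamp}\n{body_hash}".encode()


def sign(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    msg = canonical_string(method, path, timestamp, body)
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, timestamp: int, body: bytes,
           signature: str, now: Optional[int] = None) -> bool:
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

Because the body hash is part of the signed string, a tampered payload fails verification even if the attacker can see the signature.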
The data access code… In fifteen years I’ve seen a lot of paths taken here. Some of them were light and painless, but regarded by some architects as distinctly “un-MS.” Personally, MS doesn’t pay me, so I have no loyalty to their lollipop data access flavors. I have seen and used Entity Framework since its inception, and I find it a great example of an “ivory tower” concept that fails to live up to expectations in the real world. I don’t need a DAL that knows how to talk to SQL Server, MySQL, Oracle, etc., and I never really have. Even in huge applications where mainframes were still in production, this would not have helped: someone had already built that layer. So, at this point I’d prefer a super simple layer, with minimal code, tailored to the one database I have in production. If down the road a merger took place and I ended up with two databases, I’d cross that bridge then rather than hobble a solution for things that “may occur.” So: custom ADO.Net, an ORM, or both.
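A sketch of the kind of thin, single-database layer I mean, in Python with the standard sqlite3 module standing in for the one production database; CustomerRepository and the customers table are hypothetical. Every query uses placeholders, which is the same idea as a parameterized SqlCommand in ADO.Net: the driver sends user input as values, never as SQL text.

```python
import sqlite3
from typing import Optional, Tuple


class CustomerRepository:
    """Thin data access layer tailored to one database; no provider abstraction."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_by_email(self, email: str) -> Optional[Tuple[int, str, str]]:
        # "?" placeholders: the driver binds email as a value, never as SQL text.
        cur = self._conn.execute(
            "SELECT id, name, email FROM customers WHERE email = ?", (email,)
        )
        return cur.fetchone()

    def add(self, name: str, email: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO customers (name, email) VALUES (?, ?)", (name, email)
        )
        self._conn.commit()
        return cur.lastrowid
```

The whole layer is a handful of plain SQL statements, which also makes the exact queries hitting production easy for a DBA to read and index for.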
Using parameterized ADO.Net commands to communicate with the database usually means SQLi has been defeated at this point, along with ensuring that no user input is used to build query strings dynamically. Additionally, at this point we have to consider making the calls to the database over TLS (Transport Layer Security). I also had an idea that I have not seen implemented but have wondered about. My services will request data from my database, but how do I know all those requests came from the services? What if they were spoofed? What if some savvy blackhat used wget to copy my UI website onto a thumb drive, and that copy made a seemingly legitimate call back to my database? I don’t know; I could be paranoid, but these days… So, the idea is to use something (HMAC) to verify that those requests are legitimate, and to route all other traffic to a honeypot database where I can monitor requests and track the traffic over time to find my little “helper.”
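Here is a rough sketch of that honeypot idea, under loud assumptions: a small gatekeeper sits in front of the data tier (most databases cannot check an HMAC on their own), the genuine service layer shares SERVICE_KEY with it, and every request carries a tag computed over the query name and a request ID. All names here are hypothetical. Requests with a valid tag go to the primary database; everything else is logged and sent to the decoy. Request IDs would need to be unique and checked for reuse, or a captured tag could be replayed.

```python
import hashlib
import hmac
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("db-gatekeeper")

# Hypothetical key shared only between the service layer and the gatekeeper.
SERVICE_KEY = b"service-layer-key"


def service_tag(query_name: str, request_id: str) -> str:
    """What a genuine service-layer caller attaches to each data request."""
    msg = f"{query_name}|{request_id}".encode()
    return hmac.new(SERVICE_KEY, msg, hashlib.sha256).hexdigest()


def route_request(query_name: str, request_id: str, tag: str, source_ip: str) -> str:
    """Return which database should serve the request: 'primary' or 'honeypot'."""
    expected = service_tag(query_name, request_id)
    if hmac.compare_digest(expected, tag):
        return "primary"
    # Untrusted caller: keep it talking, but to a decoy, and record where it came from.
    log.warning("untagged request %s from %s routed to honeypot", query_name, source_ip)
    return "honeypot"
```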
Down to the relational database layer… It could be SQL Express, or it could be MariaDB (over MySQL). Honestly, this doesn’t concern me, because I wouldn’t use many of the “bells and whistles,” and I would treat my database like a dumb trashcan for data that may blow up at any time. Its only value to me is that it’s cheap and fast, because if we’re successful, we’ll need more of them. I’ve seen plenty of enterprise solutions use the most “pimped out” MS SQL servers they could get, and they paid handsomely for it up front and down the road. I prefer to let the programmers solve the hard problems and just use sharding to reduce the stress on cheaper databases.
Which brings me to sharding. I know sharding scales better than siloing, but I also know that the optimal sharding method requires some pretty insightful choices and a fast code layer to get the data calls routed and grouped properly. The example often given is sharding users alphabetically, but I’m curious whether there’s a more optimal way to choose the client shard key than common sense. Having studied MySpace, Amazon, and others, I see this as a really painful road that each company goes through, and it often takes a few tries to get just right.
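Alphabetical sharding tends to skew, since far more names start with S than with X. One well-known alternative is to hash a stable key such as a user ID onto a consistent hashing ring, so that adding a shard moves only the keys that land on the new shard's points instead of reshuffling everything. The sketch below is a minimal Python version with virtual nodes; the shard names and vnode count are assumptions, and I'm not claiming this is what MySpace or Amazon used.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Stable across processes and machines (unlike Python's built-in hash()).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps shard keys (e.g., user IDs) to shards on a hash ring."""

    def __init__(self, shards: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []
        self._owners: Dict[int, str] = {}
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        # Each shard gets many points ("virtual nodes") to even out the distribution.
        for i in range(self._vnodes):
            point = _hash(f"{shard}#{i}")
            bisect.insort(self._ring, point)
            self._owners[point] = shard

    def shard_for(self, key: str) -> str:
        # Take the first ring point at or after the key's hash, wrapping around the end.
        idx = bisect.bisect_left(self._ring, _hash(key))
        if idx == len(self._ring):
            idx = 0
        return self._owners[self._ring[idx]]
```

The routing code stays tiny; the hard part the paragraph above describes, picking a key that keeps related data together, is still a design decision the ring cannot make for you.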
So, at this point we have a basic architecture, but in my opinion it’s missing some very key components: a way to monitor everything, and a way to get Sales/Marketing all those reports without screwing up my database traffic. Oh, and giving the Security/Audit teams some toys would be nice.
I’ve worked with Ops guys, and I’ve learned they can be your best friends, or they can really hate you because you give them nothing to work with. I like Ops. So, I want to try out a distributed monitoring tool that has its hooks in everything without compromising the systems it watches. From what I’m reading, and from what I’ve experienced, this just isn’t an area most people think about. It’s ironic to me how most devs can debate endlessly about OOP or MVC vs. MVVM, but few have an answer to “How do you measure the ‘better-ness’ of your OOP solution?” Sometimes they say, “That’s another team’s responsibility…” Now that’s teamwork. Anyway, numbers are how we measure, not religious devotion to decoupled systems and high-minded PhD white papers from MS/Oracle.
So, the weak consensus boiled down to a couple of paths:
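As a small example of putting numbers behind a design argument, the sketch below wraps a function with a timing decorator and reports a nearest-rank percentile, which is the kind of figure (say, p95 latency before and after a refactor) I'd rather debate than paradigms. The operation names and the in-process TIMINGS store are hypothetical; in practice the samples would be shipped to a monitoring stack.

```python
import math
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, Dict, List

# Recorded call durations in milliseconds, keyed by operation name.
TIMINGS: Dict[str, List[float]] = defaultdict(list)


def timed(name: str) -> Callable:
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so failures show up in the numbers too.
                TIMINGS[name].append((time.perf_counter() - start) * 1000.0)
        return wrapper
    return decorator


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in the range 0..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]
```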
- Ganglia (for metrics) + Nagios (for alerts)
- Sensu + Collectd + Graphite + Logstash
- Splunk
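If the Graphite route wins, getting application numbers into it is simple, because Carbon accepts a plaintext protocol of one newline-terminated line per metric: the metric path, the value, and a Unix timestamp, separated by spaces. This sketch formats and sends such lines; the metric names and host are hypothetical, and in the Sensu/collectd setup the shipping would normally be handled by those agents rather than by application code.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol: path, value, timestamp."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_metric(host: str, path: str, value: float, port: int = 2003) -> None:
    """Send one metric to a Carbon plaintext listener (2003 is Carbon's default plaintext port)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))
```

A dotted naming scheme such as datacenter.farm.server.metric lets Graphite roll the numbers up per web farm or drill down to a single box.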
Now, all of that feels heavy on Ops but light on security. It’s good to know when servers are tanking and databases are hung, but I’d sure like to know when a friendly is helping me “test” my system by launching a DDoS attack on Web Farm A or a port scan on one of my service layers. So, where do we plug in Snort or some other traffic-monitoring security app?
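Snort has its own port-scan detection, so the sketch below is not Snort's logic; it only illustrates the basic heuristic such tools build on: flag a source IP that touches more than a threshold of distinct ports within a sliding time window. The thresholds are arbitrary assumptions, and the events would come from firewall or flow logs.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple


class PortScanDetector:
    """Flags a source IP that touches many distinct ports within a sliding time window."""

    def __init__(self, max_ports: int = 20, window_seconds: float = 60.0) -> None:
        self.max_ports = max_ports
        self.window = window_seconds
        self._events: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)

    def observe(self, src_ip: str, dst_port: int, ts: float) -> bool:
        """Record one connection attempt; return True if src_ip now looks like a scanner."""
        events = self._events[src_ip]
        events.append((ts, dst_port))
        # Drop attempts that have fallen out of the window.
        while events and ts - events[0][0] > self.window:
            events.popleft()
        distinct_ports: Set[int] = {port for _, port in events}
        return len(distinct_ports) > self.max_ports
```

A slow scan that spaces its probes wider than the window slips past this check, which is one reason dedicated tools track more than a single window.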
Finally, the reporting. I don’t know the statistics, but I’m pretty sure a high percentage of the “Data Warehouse” projects I’ve observed from the sidelines failed miserably… They failed in different ways. Usually, the original devs were too busy, so they just built reporting straight off the production databases. That works long enough for them to get a new job, and a couple of years later business users start complaining about load times when they fire off a historical report against a database. Hey, how are they supposed to know? It was fine when Scott wrote it two years ago… No, no one has cleaned out the history or log files or rebuilt indexes or whatever… So eventually some BI company hears the complaints and sells them a big DW package with more knobs than a space station. Oh, you wanted consulting? That’s cost-prohibitive, but we can teach your dev for 2 hours and they’ll have it… Oh, your good devs don’t have the time/interest for DW? Just give me your worst, laziest, most checked-out dev… Okay, long story short, that’s what I run into when it comes to the sad, sad land of reporting.
Which is even sadder, because REPORTS are for EXECUTIVES much of the time. This is precisely how IT departments get judged and perceived by their corporation’s executive sales and marketing leaders. Okay, so, here’s my new approach to solving this largely unseen problem in IT.
You have a standalone SQL Server Enterprise Edition database just for reporting. You set up a Quartz scheduler app to pull data every 2/4/6/24 hours from the prod databases and transform it into quantitatively friendly tables for easy reporting. Then you spend some cash and get Telerik Reporting, with its responsive design so it works on mobile, loaded up on a server and dishing out those reports. I’m pretty sure this would take less time, despite the costs, and satisfy more executives (who don’t want to come to the office to view a report). Really, outside of the data transformations, you could feasibly hand a Telerik solution to a B player on your team and it would still look like “magic rocket ships” to the leadership teams. But the data pulls… have to be fast. The new guy shouldn’t be handed Entity Framework and a blog post on how to write LINQ, then put in a corner. This almost always results in high load times and absolutely unforgivable LINQ-generated SQL. I know, it’s not LINQ’s fault that it’s smarter than the average dev, but that’s the world we’re in.
This is a really fun thought experiment for me, so I’m going to continue with posts that build out each part, to expose incorrect assumptions and show metrics where I can.
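Here is a sketch of the core of that scheduled pull, with sqlite3 standing in for both the production database and the reporting database. The orders, daily_sales, and etl_watermark tables are hypothetical, and the scheduling itself (Quartz, every 2/4/6/24 hours) is left out: run_etl is the job body it would invoke. It keeps a watermark so each run only extracts orders added since the last run, which assumes order IDs only ever increase.

```python
import sqlite3


def run_etl(prod: sqlite3.Connection, report: sqlite3.Connection) -> int:
    """Fold orders added since the last run into a daily summary table.

    Returns the number of new order rows processed.
    """
    report.execute(
        "CREATE TABLE IF NOT EXISTS etl_watermark "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), last_order_id INTEGER)"
    )
    report.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(day TEXT PRIMARY KEY, orders INTEGER, revenue REAL)"
    )
    row = report.execute("SELECT last_order_id FROM etl_watermark WHERE id = 1").fetchone()
    last_id = row[0] if row else 0

    # Extract + transform: only new rows, aggregated per calendar day by the source query.
    batches = prod.execute(
        "SELECT substr(created_at, 1, 10) AS day, COUNT(*), SUM(amount), MAX(id) "
        "FROM orders WHERE id > ? GROUP BY day",
        (last_id,),
    ).fetchall()

    # Load: merge each day's totals into the reporting table.
    new_last_id = last_id
    for day, count, revenue, max_id in batches:
        existing = report.execute(
            "SELECT orders, revenue FROM daily_sales WHERE day = ?", (day,)
        ).fetchone()
        if existing:
            report.execute(
                "UPDATE daily_sales SET orders = ?, revenue = ? WHERE day = ?",
                (existing[0] + count, existing[1] + revenue, day),
            )
        else:
            report.execute(
                "INSERT INTO daily_sales (day, orders, revenue) VALUES (?, ?, ?)",
                (day, count, revenue),
            )
        new_last_id = max(new_last_id, max_id)

    # Advance the watermark; the commit saves it together with the day totals.
    report.execute(
        "INSERT OR REPLACE INTO etl_watermark (id, last_order_id) VALUES (1, ?)",
        (new_last_id,),
    )
    report.commit()
    return sum(count for _, count, _, _ in batches)
```

The production database only ever sees one narrow, indexed range query per run, and the reports themselves never touch it.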